Commit graph

279 commits

Author SHA1 Message Date
Felipe Sateler c465a29f24 core: add ConditionUser and ConditionGroup
This adds two options that are useful for user units. In particular, it
is useful to check ConditionUser=!0 to not start for the root user.

Closes: #5187
2017-05-26 09:42:44 -04:00
NeilBrown 2d79a0bbb9 Allow TimeoutSec=0 to work as documented in mount units and elsewhere (#6013)
Since commit 36c16a7cdd ("core: rework unit timeout handling, and add
new setting RuntimeMaxSec=") TimeoutSec=0 in mount units has
cause the mount to timeout immediately instead of never as documented.

There is a similar problem with Socket.TimeoutSec and Swap.TimeoutSec.

These are easily fixed using config_parse_sec_fix_0().

Automount.TimeoutIdleSec looks like it could have the same problem,
but doesn't because the kernel treats '0' as 'no timeout'.
It handle USEC_INFINITY correctly only because that constant has
the value '-1', and when round up, it becomes zero.
To avoid possible confusion, use config_parse_sec_fix_0() as well, and
explicitly handle USEC_INFINITY.
2017-05-23 09:42:26 +02:00
Michal Koutný a2df3ea4ae job: add JobRunningTimeoutSec for JOB_RUNNING state
Unit.JobTimeoutSec starts when a job is enqueued in a transaction. The
introduced distinct Unit.JobRunningTimeoutSec starts only when the job starts
running (e.g. it groups all Exec* commands of a service or spans waiting for a
device period.)

Unit.JobRunningTimeoutSec is intended to be used by default instead of
Unit.JobTimeoutSec for device units where such behavior causes less confusion
(consider a job for a _netdev mount device, with this change the timeout will
start ticking only after the network is ready).
2017-04-25 18:00:29 +02:00
Lennart Poettering 915e6d1676 core: add RootImage= setting for using a specific image file as root directory for a service
This is similar to RootDirectory= but mounts the root file system from a
block device or loopback file instead of another directory.

This reuses the image dissector code now used by nspawn and
gpt-auto-discovery.
2017-02-07 12:19:42 +01:00
Lennart Poettering 5d997827e2 core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory=
This adds a boolean unit file setting MountAPIVFS=. If set, the three
main API VFS mounts will be mounted for the service. This only has an
effect on RootDirectory=, which it makes a ton times more useful.

(This is basically the /dev + /proc + /sys mounting code posted in the
original #4727, but rebased on current git, and with the automatic logic
replaced by explicit logic controlled by a unit file setting)
2017-02-07 11:22:05 +01:00
Lennart Poettering d2d6c096f6 core: add ability to define arbitrary bind mounts for services
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.

The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).

Fixes: #3439
2016-12-14 00:54:10 +01:00
Lennart Poettering d107589cd2 core: turn on specifier expansion for more unit file settings
Let's permit specifier expansion at a numbre of additional fields, where
arbitrary strings might be passed where this might be useful one day. (Or at
least where there's no clear reason where it wouldn't make sense to have.)
2016-12-07 18:47:32 +01:00
Djalal Harouni d6299d613f core:gperf: pass the exec_context struct directly to parse restrict namespaces
The RestrictNamespaces= takes yes, no or a list of namespaces types,
therefor config_parse_restrict_namespaces() is a bit complex and it
operates on the ExecContext, fix this by passing the offset of
ExecContext directly otherwise restricting namespaces won't work.
2016-11-15 15:04:43 +01:00
Lennart Poettering add005357d core: add new RestrictNamespaces= unit file setting
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().

RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.

This setting should be improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
2016-11-04 07:40:13 -06:00
Lukas Nykryn 87a47f99bc failure-action: generalize failure action to emergency action 2016-10-21 15:13:50 +02:00
Luca Bruno 52c239d770 core/exec: add a named-descriptor option ("fd") for streams (#4179)
This commit adds a `fd` option to `StandardInput=`,
`StandardOutput=` and `StandardError=` properties in order to
connect standard streams to externally named descriptors provided
by some socket units.

This option looks for a file descriptor named as the corresponding
stream. Custom names can be specified, separated by a colon.
If multiple name-matches exist, the first matching fd will be used.
2016-10-17 20:05:49 -04:00
Djalal Harouni 502d704e5e core:sandbox: Add ProtectKernelModules= option
This is useful to turn off explicit module load and unload operations on modular
kernels. This option removes CAP_SYS_MODULE from the capability bounding set for
the unit, and installs a system call filter to block module system calls.

This option will not prevent the kernel from loading modules using the module
auto-load feature which is a system wide operation.
2016-10-12 13:31:21 +02:00
Lennart Poettering 59eeb84ba6 core: add two new service settings ProtectKernelTunables= and ProtectControlGroups=
If enabled, these will block write access to /sys, /proc/sys and
/proc/sys/fs/cgroup.
2016-09-25 10:18:48 +02:00
Lennart Poettering cf08b48642 core: introduce MemorySwapMax= (#3659)
Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls
controls "memory.swap.max" attribute in unified cgroup.
2016-08-31 12:28:54 +02:00
WaLyong Cho 96e131ea09 core: introduce MemorySwapMax=
Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls
controls "memory.swap.max" attribute in unified cgroup.
2016-08-30 11:11:45 +09:00
Barron Rulon 4f8d40a9dc mount: add new ForceUnmount= setting for mount units, mapping to umount(8)'s "-f" switch 2016-08-27 10:46:52 -04:00
brulon e520950a03 mount: add new LazyUnmount= setting for mount units, mapping to umount(8)'s "-l" switch (#3827) 2016-08-26 17:57:22 +02:00
Zbigniew Jędrzejewski-Szmek 5f9a610ad2 Merge pull request #3905 from htejun/cgroup-v2-cpu
core: add cgroup CPU controller support on the unified hierarchy

(zj: merging not squashing to make it clear against which upstream this patch was developed.)
2016-08-14 18:03:35 -04:00
Tejun Heo 66ebf6c0a1 core: add cgroup CPU controller support on the unified hierarchy
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches.  The situation is explained in
the following documentation.

 https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu

While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads.  This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.

On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us".  [Startup]CPUWeight config
options are added with the usual compat translation.  CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.

v2: - Error in man page corrected.
    - CPU config application in cgroup_context_apply() refactored.
    - CPU accounting now works on unified hierarchy.
2016-08-07 09:45:39 -04:00
Lennart Poettering d251207d55 core: add new PrivateUsers= option to service execution
This setting adds minimal user namespacing support to a service. When set the invoked
processes will run in their own user namespace. Only a trivial mapping will be
set up: the root user/group is mapped to root, and the user/group of the
service will be mapped to itself, everything else is mapped to nobody.

If this setting is used the service runs with no capabilities on the host, but
configurable capabilities within the service.

This setting is particularly useful in conjunction with RootDirectory= as the
need to synchronize /etc/passwd and /etc/group between the host and the service
OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the
user of the service itself. But even outside the RootDirectory= case this
setting is useful to substantially reduce the attack surface of a service.

Example command to test this:

        systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh

This runs a shell as user "foobar". When typing "ps" only processes owned by
"root", by "foobar", and by "nobody" should be visible.
2016-08-03 20:42:04 +02:00
Susant Sahani 9d56542764 socket: add support to control no. of connections from one source (#3607)
Introduce MaxConnectionsPerSource= that is number of concurrent
connections allowed per IP.

RFE: 1939
2016-08-02 13:48:23 -04:00
Lennart Poettering 29206d4619 core: add a concept of "dynamic" user ids, that are allocated as long as a service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).

For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.

A simple way to test this is:

        systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-22 15:53:45 +02:00
Lennart Poettering 66dccd8d85 core: be stricter when parsing User=/Group= fields
Let's verify the validity of the syntax of the user/group names set.
2016-07-22 15:53:45 +02:00
Alessandro Puccetti 2a624c36e6 doc,core: Read{Write,Only}Paths= and InaccessiblePaths=
This patch renames Read{Write,Only}Directories= and InaccessibleDirectories=
to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept
as aliases but they are not advertised in the documentation.

Renamed variables:
`read_write_dirs` --> `read_write_paths`
`read_only_dirs` --> `read_only_paths`
`inaccessible_dirs` --> `inaccessible_paths`
2016-07-19 17:22:02 +02:00
Lennart Poettering f4170c671b execute: add a new easy-to-use RestrictRealtime= option to units
It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and
SCHED_DEADLINE is blocked, which my be used to lock up the system.
2016-06-23 01:45:45 +02:00
Topi Miettinen f3e4363593 core: Restrict mmap and mprotect with PAGE_WRITE|PAGE_EXEC (#3319) (#3379)
New exec boolean MemoryDenyWriteExecute, when set, installs
a seccomp filter to reject mmap(2) with PAGE_WRITE|PAGE_EXEC
and mprotect(2) with PAGE_EXEC.
2016-06-03 17:58:18 +02:00
Tejun Heo da4d897e75 core: add cgroup memory controller support on the unified hierarchy (#3315)
On the unified hierarchy, memory controller implements three control knobs -
low, high and max which enables more useable and versatile control over memory
usage.  This patch implements support for the three control knobs.

* MemoryLow, MemoryHigh and MemoryMax are added for memory.low, memory.high and
  memory.max, respectively.

* As all absolute limits on the unified hierarchy use "max" for no limit, make
  memory limit parse functions accept "max" in addition to "infinity" and
  document "max" for the new knobs.

* Implement compatibility translation between MemoryMax and MemoryLimit.

v2:

- Fixed missing else's in config_parse_memory_limit().
- Fixed missing newline when writing out drop-ins.
- Coding style updates to use "val > 0" instead of "val".
- Minor updates to documentation.
2016-05-27 18:10:18 +02:00
Tejun Heo ac06a0cf8a core: add support for IOReadIOPSMax and IOWriteIOPSMax
cgroup IO controller supports maximum limits for both bandwidth and IOPS but
systemd resource control currently only supports bandwidth limits.  This patch
adds support for IOReadIOPSMax and IOWriteIOPSMax when unified cgroup hierarchy
is in use.

It isn't difficult to also add BlockIOReadIOPS and BlockIOWriteIOPS for legacy
hierarchies but IO control on legacy hierarchies is half-broken anyway, so
let's leave it alone for now.
2016-05-18 13:50:56 -07:00
Tejun Heo 13c31542cc core: add io controller support on the unified hierarchy
On the unified hierarchy, blkio controller is renamed to io and the interface
is changed significantly.

* blkio.weight and blkio.weight_device are consolidated into io.weight which
  uses the standardized weight range [1, 10000] with 100 as the default value.

* blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max.
  Expansion of throttling features is being worked on to support
  work-conserving absolute limits (io.low and io.high).

* All stats are consolidated into io.stats.

This patchset adds support for the new interface.  As the interface has been
revamped and new features are expected to be added, it seems best to treat it
as a separate controller rather than trying to expand the blkio settings
although we might add automatic translation if only blkio settings are
specified.

* io.weight handling is mostly identical to blkio.weight[_device] handling
  except that the weight range is different.

* Both read and write bandwidth settings are consolidated into
  CGroupIODeviceLimit which describes all limits applicable to the device.
  This makes it less painful to add new limits.

* "max" can be used to specify the maximum limit which is equivalent to no
  config for max limits and treated as such.  If a given CGroupIODeviceLimit
  doesn't contain any non-default configs, the config struct is discarded once
  the no limit config is applied to cgroup.

* lookup_blkio_device() is renamed to lookup_block_device().

Signed-off-by: Tejun Heo <htejun@fb.com>
2016-05-05 16:43:06 -04:00
Lennart Poettering f0367da7d1 core: rename StartLimitInterval= to StartLimitIntervalSec=
We generally follow the rule that for time settings we suffix the setting name
with "Sec" to indicate the default unit if none is specified. The only
exception was the rate limiting interval settings. Fix this, and keep the old
names for compatibility.

Do the same for journald's RateLimitInterval= setting
2016-04-29 16:27:48 +02:00
Lennart Poettering 7629ec4642 core: move start ratelimiting check after condition checks
With #2564 unit start rate limiting was moved from after the condition checks
are to before they are made, in an attempt to fix #2467. This however resulted
in #2684. However, with a previous commit a concept of per socket unit trigger
rate limiting has been added, to fix #2467 more comprehensively, hence the
start limit can be moved after the condition checks again, thus fixing #2684.

Fixes: #2684
2016-04-29 16:27:48 +02:00
Lennart Poettering 8b26cdbd2a core: introduce activation rate limiting for socket units
This adds two new settings TriggerLimitIntervalSec= and TriggerLimitBurst= that
define a rate limit for activation of socket units. When the limit is hit, the
socket is is put into a failure mode. This is an alternative fix for #2467,
since the original fix resulted in issue #2684.

In a later commit the StartLimitInterval=/StartLimitBurst= rate limiter will be
changed to be applied after any start conditions checks are made. This way,
there are two separate rate limiters enforced: one at triggering time, before
any jobs are queued with this patch, as well as the start limit that is moved
again to be run immediately before the unit is activated. Condition checks are
done in between the two, and thus no longer affect the start limit.
2016-04-29 16:27:48 +02:00
Lennart Poettering 479050b363 core: drop Capabilities= setting
The setting is hardly useful (since its effect is generally reduced to zero due
to file system caps), and with the advent of ambient caps an actually useful
replacement exists, hence let's get rid of this.

I am pretty sure this was unused and our man page already recommended against
its use, hence this should be a safe thing to remove.
2016-02-13 11:59:34 +01:00
Daniel Mack 9ca6ff50ab Remove kdbus custom endpoint support
This feature will not be used anytime soon, so remove a bit of cruft.

The BusPolicy= config directive will stay around as compat noop.
2016-02-11 22:12:04 +01:00
Lennart Poettering 926db6521b Merge pull request #2574 from zonque/netclass-remove
cgroup: remove support for NetClass= directive
2016-02-10 17:03:00 +01:00
Daniel Mack 50f48ad37a cgroup: remove support for NetClass= directive
Support for net_cls.class_id through the NetClass= configuration directive
has been added in v227 in preparation for a per-unit packet filter mechanism.
However, it turns out the kernel people have decided to deprecate the net_cls
and net_prio controllers in v2. Tejun provides a comprehensive justification
for this in his commit, which has landed during the merge window for kernel
v4.5:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671

As we're aiming for full support for the v2 cgroup hierarchy, we can no
longer support this feature. Userspace tool such as nftables are moving over
to setting rules that are specific to the full cgroup path of a task, which
obsoletes these controllers anyway.

This commit removes support for tweaking details in the net_cls controller,
but keeps the NetClass= directive around for legacy compatibility reasons.
2016-02-10 16:38:56 +01:00
Lennart Poettering 89beff89ed core: treat JobTimeout=0 as equivalent to JobTimeout=infinity
Corrects an incompatibility introduced with 36c16a7cdd.

Fixes: #2537
2016-02-10 16:09:24 +01:00
Lennart Poettering aad41f0814 core: simplify how we parse TimeoutSec=, TimeoutStartSec= and TimeoutStopSec=
Let's make things more obvious by placing the parse_usec() invocation directly in config_parse_service_timeout().
2016-02-10 16:09:24 +01:00
Lennart Poettering 6bf0f408e4 core: make the StartLimitXYZ= settings generic and apply to any kind of unit, not just services
This moves the StartLimitBurst=, StartLimitInterval=, StartLimitAction=, RebootArgument= from the [Service] section
into the [Unit] section of unit files, and thus support it in all unit types, not just in services.

This way we can enforce the start limit much earlier, in particular before testing the unit conditions, so that
repeated start-up failure due to failed conditions is also considered for the start limit logic.

For compatibility the four options may also be configured in the [Service] section still, but we only document them in
their new section [Unit].

This also renamed the socket unit failure code "service-failed-permanent" into "service-start-limit-hit" to express
more clearly what it is about, after all it's only triggered through the start limit being hit.

Finally, the code in busname_trigger_notify() and socket_trigger_notify() is altered to become more alike.

Fixes: #2467
2016-02-10 13:26:56 +01:00
Lennart Poettering 36c16a7cdd core: rework unit timeout handling, and add new setting RuntimeMaxSec=
This clean-ups timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as
indication for a disabled timeout, use USEC_INFINITY which is in-line with how we do this in the rest of our code
(following the logic that 0 means "no", and USEC_INFINITY means "never").

This also replace all usec_t additions with invocations to usec_add(), so that USEC_INFINITY is properly propagated,
and sd-event considers it has indication for turning off the event source.

This also alters the deserialization of the units to restart timeouts from the time they were originally started from.
Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to
artificially prolonged timeouts if a daemon reload took place.

Finally, a new RuntimeMaxSec= setting is introduced for service units, that specifies a maximum runtime after which a
specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs.

This also simplifies the various xyz_spawn() calls of the various types in that explicit distruction of the timers is
removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn()
calls fail.

Fixes: #2249
2016-02-01 22:18:16 +01:00
Lennart Poettering d0a7c5f692 core: move parsing of rlimits into rlimit-util.[ch]
This way we can reuse it for parsing rlimit settings in "systemctl set-property" and related commands.
2016-02-01 22:18:16 +01:00
Ismo Puustinen 755d4b67a4 capabilities: added support for ambient capabilities.
This patch adds support for ambient capabilities in service files. The
idea with ambient capabilities is that the execed processes can run with
non-root user and get some inherited capabilities, without having any
need to add the capabilities to the executable file.

You need at least Linux 4.3 to use ambient capabilities. SecureBit
keep-caps is automatically added when you use ambient capabilities and
wish to change the user.

An example system service file might look like this:

[Unit]
Description=Service for testing caps

[Service]
ExecStart=/usr/bin/sleep 10000
User=nobody
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW

After starting the service it has these capabilities:

CapInh: 0000000000003000
CapPrm: 0000000000003000
CapEff: 0000000000003000
CapBnd: 0000003fffffffff
CapAmb: 0000000000003000
2016-01-12 12:14:50 +02:00
Ismo Puustinen a103496ca5 capabilities: keep bounding set in non-inverted format.
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
2016-01-12 12:14:50 +02:00
Zbigniew Jędrzejewski-Szmek 6f5d79986a core: rename Random* to RandomizedDelay*
The name RandomSec is too generic: "Sec" just specifies the default
unit type, and "Random" by itself is not enough. Rename to something
that should give the user general idea what the setting does without
looking at documentation.
2015-11-26 16:32:41 -05:00
Lennart Poettering 744c769375 core: add new RandomSec= setting for time units
This allows configuration of a random time on top of the elapse events,
in order to spread time events in a network evenly across a range.
2015-11-18 17:07:11 +01:00
Lennart Poettering edf1e71381 Merge pull request #1889 from ssahani/socket-proto
socket: Add support for socket protcol
2015-11-18 11:30:06 +01:00
Susant Sahani 74bb646ee5 socket: Add support for socket protcol
Now we don't support the socket protocol like
sctp and udplite .

This patch add a new config param
SocketProtocol: udplite/sctp

With this now we can configure the protocol as

udplite = IPPROTO_UDPLITE
sctp = IPPROTO_SCTP

Tested with nspawn:
2015-11-18 09:34:18 +05:30
Lennart Poettering 3e0c30ac56 core: add RemainAfterElapse= setting to timer units
Previously, after a timer unit elapsed we'd leave it around for good,
which has the nice benefit that starting a timer that shall trigger at a
specific point in time multiple times will only result in one trigger
instead of possibly many. With this change a new option
RemainAfterElapse= is added. It defaults to "true", to mimic the old
behaviour. If set to "false" timer units will be unloaded after they
elapsed. This is specifically useful for transient timer units.
2015-11-17 20:48:23 +01:00
Lennart Poettering 0af20ea2ee core: add new DefaultTasksMax= setting for system.conf
This allows initializing the TasksMax= setting of all units by default
to some fixed value, instead of leaving it at infinity as before.
2015-11-13 19:50:52 +01:00
Lennart Poettering f32b43bda4 core: remove support for RequiresOverridable= and RequisiteOverridable=
As discussed at systemd.conf 2015 and on also raised on the ML:

http://lists.freedesktop.org/archives/systemd-devel/2015-November/034880.html

This removes the two XyzOverridable= unit dependencies, that were
basically never used, and do not enhance user experience in any way.
Most folks looking for the functionality this provides probably opt for
the "ignore-dependencies" job mode, and that's probably a good idea.

Hence, let's simplify systemd's dependency engine and remove these two
dependency types (and their inverses).

The unit file parser and the dbus property parser will now redirect
the settings/properties to result in an equivalent non-overridable
dependency. In the case of the unit file parser we generate a warning,
to inform the user.

The dbus properties for this unit type stay available on the unit
objects, but they are now hidden from usual introspection and will
always return the empty list when queried.

This should provide enough compatibility for the few unit files that
actually ever made use of this.
2015-11-12 19:27:24 +01:00
Lennart Poettering 2a2e1b36a0 core: remove SmackFileSystemRootLabel= again
Apparently, util-linux' mount command implicitly drops the smack-related
options anyway before passing them to the kernel, if the kernel doesn't
know SMACK, hence there's no point in duplicating this in systemd.

Fixes #1696
2015-11-12 12:50:59 +01:00
Filipe Brandenburger b4c14404b3 execute: Add new PassEnvironment= directive
This directive allows passing environment variables from the system
manager to spawned services. Variables in the system manager can be set
inside a container by passing `--set-env=...` options to systemd-spawn.

Tested with an on-disk test.service unit. Tested using multiple variable
names on a single line, with an empty setting to clear the current list
of variables, with non-existing variables.

Tested using `systemd-run -p PassEnvironment=VARNAME` to confirm it
works with transient units.

Confirmed that `systemctl show` will display the PassEnvironment
settings.

Checked that man pages are generated correctly.

No regressions in `make check`.
2015-11-11 07:55:23 -08:00
Zbigniew Jędrzejewski-Szmek 36b4a7ba55 Remove snapshot unit type
Snapshots were never useful or used for anything. Many systemd
developers that I spoke to at systemd.conf2015, didn't even know they
existed, so it is fairly safe to assume that this type can be deleted
without harm.

The fundamental problem with snapshots is that the state of the system
is dynamic, devices come and go, users log in and out, timers fire...
and restoring all units to some state from the past would "undo"
those changes, which isn't really possible.

Tested by creating a snapshot, running the new binary, and checking
that the transition did not cause errors, and the snapshot is gone,
and snapshots cannot be created anymore.

New systemctl says:
Unknown operation snapshot.
Old systemctl says:
Failed to create snapshot: Support for snapshots has been removed.

IgnoreOnSnaphost settings are warned about and ignored:
Support for option IgnoreOnSnapshot= has been removed and it is ignored

http://lists.freedesktop.org/archives/systemd-devel/2015-November/034872.html
2015-11-10 19:33:06 -05:00
Lennart Poettering a4c1800284 core: accept time units for time-based resource limits
Let's make sure "LimitCPU=30min" can be parsed properly, following the
usual logic how we parse time values. Similar for LimitRTTIME=.

While we are at it, extend a bit on the man page section about resource
limits.

Fixes: #1772
2015-11-10 17:36:46 +01:00
Karel Zak 412ea7a936 core: support IEC suffixes for RLIMIT stuff
Let's make things more user-friendly and support for example

  LimitAS=16G

rather than force users to always use LimitAS=16106127360.

The change is relevant for options:

  [Default]Limit{FSIZE,DATA,STACK,CORE,RSS,AS,MEMLOCK,MSGQUEUE}

The patch introduces config_parse_bytes_limit(), it's the same as
config_parse_limit() but uses parse_size() tu support the suffixes.

Addresses: https://github.com/systemd/systemd/issues/1772
2015-11-06 11:06:52 +01:00
Lennart Poettering 7cb48925dc core: rename SmackFileSystemRoot= to SmackFileSystemRootLabel=
That way it's in sync with the other SMACK label settings.

https://github.com/systemd/systemd/pull/1664#issuecomment-150891270
2015-10-26 01:24:39 +01:00
Sangjung Woo 46a01abae9 mount: add new SmackFileSystemRoot= setting for mount unit
This option specifies the label to assign the root of the file system if
it lacks the Smack extended attribute. Note that this option will be
ignored if kernel does not support the Smack feature by runtime
checking.
2015-10-24 20:53:54 +09:00
Lennart Poettering 8dd4c05b54 core: add support for naming file descriptors passed using socket activation
This adds support for naming file descriptors passed using socket
activation. The names are passed in a new $LISTEN_FDNAMES= environment
variable, that matches the existign $LISTEN_FDS= one and contains a
colon-separated list of names.

This also adds support for naming fds submitted to the per-service fd
store using FDNAME= in the sd_notify() message.

This also adds a new FileDescriptorName= setting for socket unit files
to set the name for fds created by socket units.

This also adds a new call sd_listen_fds_with_names(), that is similar to
sd_listen_fds(), but also returns the names of the fds.

systemd-activate gained the new --fdname= switch to specify a name for
testing socket activation.

This is based on #1247 by Maciej Wereski.

Fixes #1247.
2015-10-06 11:52:48 +02:00
Lennart Poettering 55301ec028 core: add new setting Writable= to ListenSpecial= socket units
Writable= is a new boolean setting. If ture, then ListenSpecial= will
open the specified path in O_RDWR mode, rather than just O_RDONLY.

This is useful for implementing services like rfkill, where /dev/rfkill
is more useful when opened in write mode, if we want to not only save
but also restore its state.
2015-10-01 14:28:13 +02:00
Lennart Poettering 5f5d8eab1f core: allow setting WorkingDirectory= to the special value ~
If set to ~ the working directory is set to the home directory of the
user configured in User=.

This change also exposes the existing switch for the working directory
that allowed making missing working directories non-fatal.

This also changes "machinectl shell" to make use of this to ensure that
the invoked shell is by default in the user's home directory.

Fixes #1268.
2015-09-29 21:55:51 +02:00
Lennart Poettering f98f4ace4d Merge pull request #1336 from pszewczyk/functionfs_sockets_v3
core: add support for usb functionfs v3
2015-09-22 16:55:08 +02:00
Pawel Szewczyk 6b7e592310 core: Add FFSDescriptors and FFSStrings service parameters
By using these parameters functionfs service can specify ffs descriptors
and strings which should be written to ep0.
2015-09-22 16:32:16 +02:00
Pawel Szewczyk 602524469e core: Add socket type for usb functionfs endpoints
For handling functionfs endpoints additional socket type is added.
2015-09-22 16:32:16 +02:00
Daniel Mack 32ee7d3309 cgroup: add support for net_cls controllers
Add a new config directive called NetClass= to CGroup enabled units.
Allowed values are positive numbers for fix assignments and "auto" for
picking a free value automatically, for which we need to keep track of
dynamically assigned net class IDs of units. Introduce a hash table for
this, and also record the last ID that was given out, so the allocator
can start its search for the next 'hole' from there. This could
eventually be optimized with something like an irb.

The class IDs up to 65536 are considered reserved and won't be
assigned automatically by systemd. This barrier can be made a config
directive in the future.

Values set in unit files are stored in the CGroupContext of the
unit and considered read-only. The actually assigned number (which
may have been chosen dynamically) is stored in the unit itself and
is guaranteed to remain stable as long as the unit is active.

In the CGroup controller, set the configured CGroup net class to
net_cls.classid. Multiple unit may share the same net class ID,
and those which do are linked together.
2015-09-16 00:21:55 +02:00
Lennart Poettering 03a7b521e3 core: add support for the "pids" cgroup controller
This adds support for the new "pids" cgroup controller of 4.3 kernels.
It allows accounting the number of tasks in a cgroup and enforcing
limits on it.

This adds two new setting TasksAccounting= and TasksMax= to each unit,
as well as a gloabl option DefaultTasksAccounting=.

This also updated "cgtop" to optionally make use of the new
kernel-provided accounting.

systemctl has been updated to show the number of tasks for each service
if it is available.

This patch also adds correct support for undoing memory limits for units
using a MemoryLimit=infinity syntax. We do the same for TasksMax= now
and hence keep things in sync here.
2015-09-10 18:41:06 +02:00
Lennart Poettering f757855e81 nspawn: add new .nspawn files for container settings
.nspawn fiels are simple settings files that may accompany container
images and directories and contain settings otherwise passed on the
nspawn command line. This provides an efficient way to attach execution
data directly to containers.
2015-09-06 01:49:06 +02:00
Lennart Poettering 023a4f6701 core: optionally create LOGIN_PROCESS or USER_PROCESS utmp entries
When generating utmp/wtmp entries, optionally add both LOGIN_PROCESS and
INIT_PROCESS entries or even all three of LOGIN_PROCESS, INIT_PROCESS
and USER_PROCESS entries, instead of just a single INIT_PROCESS entry.

With this change systemd may be used to not only invoke a getty directly
in a SysV-compliant way but alternatively also a login(1) implementation
or even forego getty and login entirely, and invoke arbitrary shells in
a way that they appear in who(1) or w(1).

This is preparation for a later commit that adds a "machinectl shell"
operation to invoke a shell in a container, in a way that is compatible
with who(1) and w(1).
2015-08-24 22:46:45 +02:00
Kay Sievers 1b09f548c7 turn kdbus support into a runtime option
./configure --enable/disable-kdbus can be used to set the default
behavior regarding kdbus.

If no kdbus kernel support is available, dbus-dameon will be used.

With --enable-kdbus, the kernel command line option "kdbus=0" can
be used to disable kdbus.

With --disable-kdbus, the kernel command line option "kdbus=1" is
required to enable kdbus support.
2015-06-17 18:01:49 +02:00
Michael Olbrich deb0a77cf0 automount: add expire support 2015-04-21 20:23:41 +02:00
Lennart Poettering b02cb41c78 conf-parse: don't accept invalid bus names as BusName= arguments in service units 2015-01-07 23:44:08 +01:00
Lennart Poettering a354329f72 core: add new logic for services to store file descriptors in PID 1
With this change it is possible to send file descriptors to PID 1, via
sd_pid_notify_with_fds() which PID 1 will store individually for each
service, and pass via the usual fd passing logic on next invocation.
This is useful for enable daemon reload schemes where daemons serialize
their state to /run, push their fds into PID 1 and terminate, restoring
their state on next start from the data in /run and passed in from PID
1.

The fds are kept by PID 1 as long as no POLLHUP or POLLERR is seen on
them, and the service they belong to are either not dead or failed, or
have a job queued.
2015-01-06 03:16:39 +01:00
Zbigniew Jędrzejewski-Szmek 9e37c9544b core: warn and ignore SysVStartPriority=
Option was being parsed but not used for anything.
2014-11-30 19:10:40 -05:00
Zbigniew Jędrzejewski-Szmek a2c0e528b8 When warning about unsupported options, be more detailed 2014-11-30 18:49:08 -05:00
WaLyong Cho 2ca620c4ed smack: introduce new SmackProcessLabel option
In service file, if the file has some of special SMACK label in
ExecStart= and systemd has no permission for the special SMACK label
then permission error will occurred. To resolve this, systemd should
be able to set its SMACK label to something accessible of ExecStart=.
So introduce new SmackProcessLabel. If label is specified with
SmackProcessLabel= then the child systemd will set its label to
that. To successfully execute the ExecStart=, accessible label should
be specified with SmackProcessLabel=.
Additionally, by SMACK policy, if the file in ExecStart= has no
SMACK64EXEC then the executed process will have given label by
SmackProcessLabel=. But if the file has SMACK64EXEC then the
SMACK64EXEC label will be overridden.

[zj: reword man page]
2014-11-24 10:20:53 -05:00
Lennart Poettering 59fccdc587 core: introduce the concept of AssertXYZ= similar to ConditionXYZ=, but fatal for a start job if not met 2014-11-06 14:21:11 +01:00
Lennart Poettering a931ad47a8 core: introduce new Delegate=yes/no property controlling creation of cgroup subhierarchies
For priviliged units this resource control property ensures that the
processes have all controllers systemd manages enabled.

For unpriviliged services (those with User= set) this ensures that
access rights to the service cgroup is granted to the user in question,
to create further subgroups. Note that this only applies to the
name=systemd hierarchy though, as access to other controllers is not
safe for unpriviliged processes.

Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.

Delegate=yes should also be set for user@.service, so that systemd
--user can run, controlling its own cgroup tree.

This commit changes machined, systemd-nspawn@.service and user@.service
to set this boolean, in order to ensure that container management will
just work, and the user systemd instance can run fine.
2014-11-05 18:49:14 +01:00
Lennart Poettering 47cb901e38 swap: replace Discard= setting by a more generic Options= setting
For now, it's systemd itself that parses the options string, but as soon
as util-linux' swapon can take the option string directly with -o we
should pass it on unmodified.
2014-10-28 14:31:25 +01:00
Lennart Poettering f189ab18de job: optionally, when a job timeout is hit, also execute a failure action 2014-10-28 02:19:55 +01:00
Jan Synacek 86b23b07c9 swap: introduce Discard property
Process possible "discard" values from /etc/fstab.
2014-09-29 11:08:12 -04:00
Michal Sekletar 16115b0a7b socket: introduce SELinuxContextFromNet option
This makes possible to spawn service instances triggered by socket with
MLS/MCS SELinux labels which are created based on information provided by
connected peer.

Implementation of label_get_child_mls_label derived from xinetd.

Reviewed-by: Paul Moore <pmoore@redhat.com>
2014-09-19 12:32:06 +02:00
Daniel Mack 5019962312 bus: parse BusPolicy directive in service files
Add a new directive called BusPolicy to define custom endpoint policies. If
one such directive is given, an endpoint object in the service's ExecContext is
created and the given policy is added to it.
2014-09-08 14:12:54 +02:00
Lennart Poettering 3cd761e4df socket: suffix newly added TCP sockopt time properties with "Sec"
This is what we have done so far for all other time values, and hence we
should do this here. This indicates the default unit of time values
specified here, if they don't contain a unit.
2014-08-19 21:58:48 +02:00
Lennart Poettering 3bb07b7680 Revert "socket: introduce SELinuxLabelViaNet option"
This reverts commit cf8bd44339.

Needs more discussion on the mailing list.
2014-08-19 19:16:08 +02:00
Michal Sekletar cf8bd44339 socket: introduce SELinuxLabelViaNet option
This makes possible to spawn service instances triggered by socket with
MLS/MCS SELinux labels which are created based on information provided by
connected peer.

Implementation of label_get_child_label derived from xinetd.

Reviewed-by: Paul Moore <pmoore@redhat.com>
2014-08-19 18:57:12 +02:00
Susant Sahani cc567c9bea socket: Add support for TCP defer accept
TCP_DEFER_ACCEPT Allow a listener to be awakened only when data
arrives on the socket. If TCP_DEFER_ACCEPT set on a server-side
listening socket, the TCP/IP stack will not to wait for the final
ACK packet and not to initiate the process until the first packet
of real data has arrived. After sending the SYN/ACK, the server will
then wait for a data packet from a client. Now, only three packets
will be sent over the network, and the connection establishment delay
will be significantly reduced.
2014-08-14 19:55:44 -04:00
Susant Sahani 209e9dcd7b socket: Add Support for TCP keep alive variables
The tcp keep alive variables now can be configured via conf
parameter. Follwing variables are now supported by this patch.

tcp_keepalive_intvl: The number of seconds between TCP keep-alive probes

tcp_keepalive_probes: The maximum number of TCP keep-alive probes to
send before giving up and killing the connection if no response is
obtained from the other end.

tcp_keepalive_time: The number of seconds a connection needs to be
idle before TCP begins sending out keep-alive probes.
2014-08-14 19:48:57 -04:00
Lennart Poettering 4d8ddba9d7 Revert "socket: add support for TCP fast Open"
This reverts commit 9528592ff8.

Apparently TFO is actually the default at least for the server side now.

Also the setsockopt doesn't actually take a bool, but a qlen integer.
2014-08-14 16:36:32 +02:00
Susant Sahani 9528592ff8 socket: add support for TCP fast Open
TCP Fast Open (TFO) speeds up the opening of successiveTCP)
connections between two endpoints.It works by using a TFO cookie
in the initial SYN packet to authenticate a previously connected
client. It starts sending data to the client before the receipt
of the final ACK packet of the three way handshake is received,
skipping a round trip and lowering the latency in the start of
transmission of data.
2014-08-14 13:14:39 +02:00
Susant Sahani 4427c3f43a socket: add support for tcp nagle
This patch adds support for TCP TCP_NODELAY socket option. This can be
configured via NoDelay conf parameter. TCP Nagle's algorithm works by
combining a number of small outgoing messages, and sending them all at
once.  This controls the TCP_NODELAY socket option.
2014-08-14 03:15:59 +02:00
Michal Schmidt 9a8c867fec load-fragment: ConditionFirstBoot wants a bool string, not a path 2014-07-08 17:22:34 +02:00
Lennart Poettering e26807239b firstboot: get rid of firstboot generator again, introduce ConditionFirstBoot= instead
As Zbigniew pointed out a new ConditionFirstBoot= appears like the nicer
way to hook in systemd-firstboot.service on first boots (those with /etc
unpopulated), so let's do this, and get rid of the generator again.
2014-07-07 21:05:09 +02:00
Lennart Poettering 37520c1bec core: introduce new RestartForceExitStatus= service setting
This does the inverse of RestartPreventExitStatus=: it forces a restart
of a service when a certain exit status is returned by a service
process.
2014-07-03 12:51:07 +02:00
Lennart Poettering d54c499369 install: introduce new DefaultInstance= field for [Install] sections
The DefaultInstance= name is used when enabling template units when only
specifying the template name, but no instance.

Add DefaultInstance=tty1 to getty@.service, so that when the template
itself is enabled an instance for tty1 is created.

This is useful so that we "systemctl preset-all" can work properly,
because we can operate on getty@.service after finding it, and the right
instance is created.
2014-06-17 02:43:43 +02:00
Lennart Poettering 2dbd4a9454 mount: add new SloppyOptions= setting for mount units, mapping to mount(8)'s "-s" switch 2014-06-16 01:02:27 +02:00
Lennart Poettering a55654d598 core: add new ConditionNeedsUpdate= unit condition
This new condition allows checking whether /etc or /var are out-of-date
relative to /usr. This is the counterpart for the update flag managed by
systemd-update-done.service. Services that want to be started once after
/usr got updated should use:

        [Unit]
        ConditionNeedsUpdate=/etc
        Before=systemd-update-done.service

This makes sure that they are only run if /etc is out-of-date relative
to /usr. And that it will be executed after systemd-update-done.service
which is responsible for marking /etc up-to-date relative to the current
/usr.

ConditionNeedsUpdate= will also checks whether /etc is actually
writable, and not trigger if it isn't, since no update is possible then.
2014-06-13 13:26:32 +02:00
Lennart Poettering a4152e3fe2 kdbus: when uploading bus name policy, resolve users/groups out-of-process
It's not safe invoking NSS from PID 1, hence fork off worker processes
that upload the policy into the kernel for busnames.
2014-06-05 13:09:46 +02:00
Lennart Poettering 3900e5fdff socket: add SocketUser= and SocketGroup= for chown()ing sockets in the file system
This is relatively complex, as we cannot invoke NSS from PID 1, and thus
need to fork a helper process temporarily.
2014-06-05 09:55:53 +02:00
Lennart Poettering a8330cd118 core: make sure we properly parse ProtectHome= and ProtectSystem= 2014-06-04 23:04:34 +02:00
Lennart Poettering 1b8689f949 core: rename ReadOnlySystem= to ProtectSystem= and add a third value for also mounting /etc read-only
Also, rename ProtectedHome= to ProtectHome=, to simplify things a bit.

With this in place we now have two neat options ProtectSystem= and
ProtectHome= for protecting the OS itself (and optionally its
configuration), and for protecting the user's data.
2014-06-04 18:12:55 +02:00
Lennart Poettering 811ba7a0e2 socket: add new Symlinks= option for socket units
With Symlinks= we can manage one or more symlinks to AF_UNIX or FIFO
nodes in the file system, with the same lifecycle as the socket itself.

This has two benefits: first, this allows us to remove /dev/log and
/dev/initctl from /dev, thus leaving only symlinks, device nodes and
directories in the /dev tree. More importantly however, this allows us
to move /dev/log out of /dev, while still making it accessible there, so
that PrivateDevices= can provide /dev/log too.
2014-06-04 16:21:17 +02:00
Lennart Poettering bd1fe7c79d socket: optionally remove sockets/FIFOs in the file system after use 2014-06-04 13:12:34 +02:00
Lennart Poettering 417116f234 core: add new ReadOnlySystem= and ProtectedHome= settings for service units
ReadOnlySystem= uses fs namespaces to mount /usr and /boot read-only for
a service.

ProtectedHome= uses fs namespaces to mount /home and /run/user
inaccessible or read-only for a service.

This patch also enables these settings for all our long-running services.

Together they should be good building block for a minimal service
sandbox, removing the ability for services to modify the operating
system or access the user's private data.
2014-06-03 23:57:51 +02:00
Lennart Poettering 9a05490933 cgroups: simplify CPUQuota= logic
Only accept cpu quota values in percentages, get rid of period
definition.

It's not clear whether the CFS period controllable per-cgroup even has a
future in the kernel, hence let's simplify all this, hardcode the period
to 100ms and only accept percentage based quota values.
2014-05-22 11:53:12 +09:00
Lennart Poettering db785129c9 cgroup: rework startup logic
Introduce a (unsigned long) -1 as "unset" state for cpu shares/block io
weights, and keep the startup unit set around all the time.
2014-05-22 07:13:56 +09:00
WaLyong Cho 95ae05c0e7 core: add startup resource control option
Similar to CPUShares= and BlockIOWeight= respectively. However only
assign the specified weight during startup. Each control group
attribute is re-assigned as weight by CPUShares=weight and
BlockIOWeight=weight after startup.  If not CPUShares= or
BlockIOWeight= be specified, then the attribute is re-assigned to each
default attribute value. (default cpu.shares=1024, blkio.weight=1000)
If only CPUShares=weight or BlockIOWeight=weight be specified, then
that implies StartupCPUShares=weight and StartupBlockIOWeight=weight.
2014-05-22 07:13:56 +09:00
Nis Martensen f1721625e7 fix spelling of privilege 2014-05-19 00:40:44 +09:00
Lennart Poettering b2f8b02ec2 core: expose CFS CPU time quota as high-level unit properties 2014-04-25 13:27:25 +02:00
Michael Olbrich bf50056632 service: rename StartLimitAction enum to FailureAction
It's used for the FailureAction property as well.
2014-04-24 20:11:20 +02:00
Michael Olbrich 93ae25e6fd service: add FailureAction= option
It has the same possible values as StartLimitAction= and is executed
immediately if a service fails.
2014-04-24 20:11:20 +02:00
Michael Olbrich efe6e7d33a service: add support for reboot argument when triggered by StartLimitAction=
When rebooting with systemctl, an optional argument can be passed to the
reboot system call. This makes it possible the specify the argument in a
service file and use it when the service triggers a restart.
This is useful to distinguish between manual reboots and reboots caused by
failing services.
2014-04-21 09:58:53 -04:00
Lennart Poettering 7f8aa67131 core: remove tcpwrap support
tcpwrap is legacy code, that is barely maintained upstream. It's APIs
are awful, and the feature set it exposes (such as DNS and IDENT
access control) questionnable. We should not support this natively in
systemd.

Hence, let's remove the code. If people want to continue making use of
this, they can do so by plugging in "tcpd" for the processes they start.
With that scheme things are as well or badly supported as they were from
traditional inetd, hence no functionality is really lost.
2014-03-24 20:07:42 +01:00
Lennart Poettering dedabea4b3 timer: support timers that can resume the system from suspend 2014-03-24 16:24:07 +01:00
Lennart Poettering 06642d1795 timer: add timer persistance (aka anacron-like behaviour) 2014-03-21 03:43:46 +01:00
Daniel Mack 5892a914d1 busname: introduce Activating directive
Add a new config 'Activating' directive which denotes whether a busname
is actually registered on the bus. It defaults to 'yes'.

If set to 'no', the .busname unit only uploads policy, which will remain
active as long as the unit is running.
2014-03-19 02:25:36 +01:00
Lennart Poettering 3f9da41645 core: add new AcceptFD= setting to .busname units
AcceptFD= defaults to true, thus making sure that by default fd passing
is enabled for all activatable names. Since for normal bus connections
fd passing is enabled too by default this makes sure fd passing works
correctly regardless whether a service is already activated or not.

Making this configurable on both busname units and in bus connections is
messy, but unavoidable since busnames are established and may queue
messages before the connection feature negotiation is done by the
service eventually activated. Conversely, feature negotiation on bus
connections takes place before the connection acquires its names.

Of course, this means developers really should make sure to keep the
settings in .busname units in sync with what they later intend to
negotiate.
2014-03-18 20:54:32 +01:00
Daniel Mack 54d76c9286 busname: add parser for bus name policies
There are three directives to specify bus name polices in .busname
files:

 * AllowUser [username] [access]
 * AllowGroup [groupname] [access]
 * AllowWorld [access]

Where [access] is one of

 * 'see': The user/group/world is allowed to see a name on the bus
 * 'talk': The user/group/world is allowed to talk to a name
 * 'own': The user/group/world is allowed to own a name

There is no user added yet in this commit.
2014-03-07 19:14:05 +01:00
Lennart Poettering 760b9d7cba core: don't override NoNewPriviliges= from SystemCallFilter= if it is already explicitly set 2014-03-05 04:41:01 +01:00
Lennart Poettering 94828d2ddc conf-parser: config_parse_path_strv() is not generic, so let's move it into load-fragment.c
The parse code actually checked for specific lvalue names, which is
really wrong for supposedly generic parsers...
2014-03-03 21:40:55 +01:00
Lennart Poettering ca37242e52 conf-parse: rename config_parse_level() to config_parse_log_level()
"level" is a bit too generic, let's clarify what kind of level we are
referring to here.
2014-03-03 21:14:07 +01:00
Lennart Poettering e66cf1a3f9 core: introduce new RuntimeDirectory= and RuntimeDirectoryMode= unit settings
As discussed on the ML these are useful to manage runtime directories
below /run for services.
2014-03-03 17:55:32 +01:00
Lennart Poettering 4298d0b512 core: add new RestrictAddressFamilies= switch
This new unit settings allows restricting which address families are
available to processes. This is an effective way to minimize the attack
surface of services, by turning off entire network stacks for them.

This is based on seccomp, and does not work on x86-32, since seccomp
cannot filter socketcall() syscalls on that platform.
2014-02-26 02:19:28 +01:00
Lennart Poettering 5556b5fe41 core: clean up some confusing regarding SI decimal and IEC binary suffixes for sizes
According to Wikipedia it is customary to specify hardware metrics and
transfer speeds to the basis 1000 (SI decimal), while software metrics
and physical volatile memory (RAM) sizes to the basis 1024 (IEC binary).
So far we specified everything in IEC, let's fix that and be more
true to what's otherwise customary. Since we don't want to parse "Mi"
instead of "M" we document each time what the context used is.
2014-02-23 03:19:04 +01:00
Michael Scherer eef65bf3ee core: Add AppArmor profile switching
This permit to switch to a specific apparmor profile when starting a daemon. This
will result in a non operation if apparmor is disabled.
It also add a new build requirement on libapparmor for using this feature.
2014-02-21 03:44:20 +01:00
Lennart Poettering 099524d7b0 core: add new ConditionArchitecture() that checks the architecture returned by uname()'s machine field. 2014-02-21 02:43:14 +01:00
Lennart Poettering ac45f971a1 core: add Personality= option for units to set the personality for spawned processes 2014-02-19 03:27:03 +01:00
Jasper St. Pierre acfbbf5c56 Fix gperf syntax
If we put a closing bracket on its own line, gperf will complain about
empty lines. Only occurs if the option in question is disabled. So fix the
m4 macros to work properly in both cases.
2014-02-17 22:07:02 +01:00
Lennart Poettering 6a6751fe24 core: warn when unit files with unsupported options are parsed 2014-02-17 17:49:09 +01:00
Lennart Poettering 5f8640fb62 core: store and expose SELinuxContext field normalized as bool + string 2014-02-17 16:52:52 +01:00
Lennart Poettering d3b1c50833 core: add a system-wide SystemCallArchitectures= setting
This is useful to prohibit execution of non-native processes on systems,
for example 32bit binaries on 64bit systems, this lowering the attack
service on incorrect syscall and ioctl 32→64bit mappings.
2014-02-13 01:40:50 +01:00
Lennart Poettering 57183d117a core: add SystemCallArchitectures= unit setting to allow disabling of non-native
architecture support for system calls

Also, turn system call filter bus properties into complex types instead
of concatenated strings.
2014-02-13 00:24:00 +01:00
Lennart Poettering 17df7223be core: rework syscall filter
- Allow configuration of an errno error to return from blacklisted
  syscalls, instead of immediately terminating a process.

- Fix parsing logic when libseccomp support is turned off

- Only keep the actual syscall set in the ExecContext, and generate the
  string version only on demand.
2014-02-12 18:30:36 +01:00
Michael Scherer 7b52a628f8 exec: Add SELinuxContext configuration item
This permit to let system administrators decide of the domain of a service.
This can be used with templated units to have each service in a différent
domain ( for example, a per customer database, using MLS or anything ),
or can be used to force a non selinux enabled system (jvm, erlang, etc)
to start in a different domain for each service.
2014-02-10 13:18:16 +01:00
Lennart Poettering 7f112f50fe exec: introduce PrivateDevices= switch to provide services with a private /dev
Similar to PrivateNetwork=, PrivateTmp= introduce PrivateDevices= that
sets up a private /dev with only the API pseudo-devices like /dev/null,
/dev/zero, /dev/random, but not any physical devices in them.
2014-01-20 21:28:37 +01:00
Lennart Poettering e821075a23 bus: add .busname unit type to implement kdbus-style bus activation 2013-12-02 23:32:34 +01:00
Lennart Poettering 613b411c94 service: add the ability for units to join other unit's PrivateNetwork= and PrivateTmp= namespaces 2013-11-27 20:28:48 +01:00
Lennart Poettering d420282b28 core: replace OnFailureIsolate= setting by a more generic OnFailureJobMode= setting and make use of it where applicable 2013-11-26 02:26:31 +01:00
Lennart Poettering 9f5eb56a13 timer: make timer accuracy configurable
And make it default to 1min
2013-11-21 22:08:20 +01:00
Lennart Poettering 718db96199 core: convert PID 1 to libsystemd-bus
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.

This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:

- Synthesizing of "Disconnected" messages when bus connections are
  severed.

- Support for attaching multiple vtables for the same interface on the
  same path.

This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.

As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
2013-11-20 20:52:36 +01:00
Shawn Landden f0511bd7e3 core/socket: fix SO_REUSEPORT 2013-11-17 17:41:35 -05:00
Tom Gundersen accdd018ed mount/service: drop FsckPassNo support
We now treat passno as boleans in the generators, and don't need this any more. fsck itself
is able to sequentialize checks on the same local media, so in the common case the ordering
is redundant.

It is still possible to force an order by using .d fragments, in case that is desired.
2013-10-19 12:23:17 +02:00
Lennart Poettering a57f7e2c82 core: rework how we match mount units against each other
Previously to automatically create dependencies between mount units we
matched every mount unit agains all others resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.

This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.

This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.

With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.

https://bugs.freedesktop.org/show_bug.cgi?id=69740
2013-09-26 20:20:30 +02:00
Lennart Poettering ddca82aca0 cgroup: get rid of MemorySoftLimit=
The cgroup attribute memory.soft_limit_in_bytes is unlikely to stay
around in the kernel for good, so let's not expose it for now. We can
readd something like it later when the kernel guys decided on a final
API for this.
2013-09-17 14:58:00 -05:00
Lennart Poettering c3df8d3dde core: make sure scope attributes survive a reload 2013-07-30 02:54:56 +02:00
Lennart Poettering 82659fd757 core: optionally send SIGHUP in addition to the configured kill signal
This is useful to fake session ends for processes like shells.
2013-07-30 01:54:59 +02:00
Lennart Poettering 8e7076caae cgroup: split out per-device BlockIOWeight= setting into BlockIODeviceWeight=
This way we can nicely map the configuration directive to properties and
back, without requiring two different signatures for the same property.
2013-07-11 20:40:18 +02:00
Lennart Poettering b9316fb0f3 unit: save description/slice of transient units to /run
This is necessary so that these properties survive a daemon reload.
2013-07-10 23:41:03 +02:00
Lennart Poettering d28e9236e7 core: parse Slice= from the unit type specific unit file section
Since not all unit types know Slice= it belongs in the unit type
specific unit file section.
2013-07-01 02:52:17 +02:00
Lennart Poettering 4ad490007b core: general cgroup rework
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).

This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.

This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
2013-06-27 04:17:34 +02:00
Lennart Poettering a016b9228f core: add new .slice unit type for partitioning systems
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchally by the user.

Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.

Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
2013-06-17 21:36:51 +02:00
Lennart Poettering 3ecaa09bcc unit: rework trigger dependency logic
Instead of having explicit type-specific callbacks that inform the
triggering unit when a triggered unit changes state, make this generic
so that state changes are forwarded betwee any triggered and triggering
unit.

Also, get rid of UnitRef references from automount, timer, path units,
to the units they trigger and rely exclsuively on UNIT_TRIGGER type
dendencies.
2013-04-23 16:00:32 -03:00