Commit graph

373 commits

Author SHA1 Message Date
Yu Watanabe 499295fb46 socket: Symlinks= with empty string resets the list of paths. 2017-09-05 15:08:59 +09:00
iplayinsun c792ec2e35 core: merge the second CapabilityBoundingSet= lines by AND when it is prefixed with tilde (#6724)
If a unit file contains multiple CapabilityBoundingSet= or
AmbientCapabilities= lines, e.g.,
===
CapabilityBoundingSet=CAP_A CAP_B
CapabilityBoundingSet=~CAP_B CAP_C
===
before this commit, it results all capabilities except CAP_C are set to
CapabilityBoundingSet=, as each lines are always merged by OR.
This commit makes lines prefixed with ~ are merged by AND. So, for the
above example only CAP_A is set.
This makes easier to drop capabilities with drop-in config files.
2017-09-04 12:12:27 +09:00
Lennart Poettering 165a31c0db core: add two new special ExecStart= character prefixes
This patch adds two new special character prefixes to ExecStart= and
friends, in addition to the existing "-", "@" and "+":

"!"  → much like "+", except with a much reduced effect as it only
       disables the actual setresuid()/setresgid()/setgroups() calls, but
       leaves all other security features on, including namespace
       options. This is very useful in combination with
       RuntimeDirectory= or DynamicUser= and similar option, as a user
       is still allocated and used for the runtime directory, but the
       actual UID/GID dropping is left to the daemon process itself.
       This should make RuntimeDirectory= a lot more useful for daemons
       which insist on doing their own privilege dropping.

"!!" → Similar to "!", but on systems supporting ambient caps this
       becomes a NOP. This makes it relatively straightforward to write
       unit files that make use of ambient capabilities to let systemd
       drop all privs while retaining compatibility with systems that
       lack ambient caps, where priv dropping is the left to the daemon
       codes themselves.

This is an alternative approach to #6564 and related PRs.
2017-08-10 15:04:32 +02:00
Lennart Poettering 3ed0cd26ea execute: replace command flag bools by a flags field
This way, we can extend it later on in an easier way, and can pass it
along nicely.
2017-08-10 14:44:58 +02:00
Yu Watanabe 07d46372fe securebits-util: add secure_bits_{from_string,to_string_alloc}() 2017-08-07 23:40:25 +09:00
Yu Watanabe dd1f5bd0aa cap-list: add capability_set_{from_string,to_string_alloc}() 2017-08-07 23:25:11 +09:00
Yu Watanabe 70d54d9072 core: replace strcmp() == 0 with streq() 2017-08-06 13:08:40 +09:00
Zbigniew Jędrzejewski-Szmek 4bc5d27b94 Drop busname unit type
Since busname units are only useful with kdbus, they weren't actively
used. This was dead code, only compile-tested. If busname units are
ever added back, it'll be cleaner to start from scratch (possibly reverting
parts of this patch).
2017-07-23 09:29:02 -04:00
Yu Watanabe 3536f49e8f core: add {State,Cache,Log,Configuration}Directory= (#6384)
This introduces {State,Cache,Log,Configuration}Directory= those are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.

This also fixes #6391.
2017-07-18 14:34:52 +02:00
Yu Watanabe 23a7448efa core: support subdirectories in RuntimeDirectory= option 2017-07-17 16:30:53 +09:00
Yu Watanabe 53f47dfc7b core: allow preserving contents of RuntimeDirectory= over process restart
This introduces RuntimeDirectoryPreserve= option which takes a boolean
argument or 'restart'.

Closes #6087.
2017-07-17 16:22:25 +09:00
Zbigniew Jędrzejewski-Szmek 2c75fb7330 core/load-fragment: refuse units with errors in RootDirectory/RootImage/DynamicUser
Behaviour of the service is completely different with the option off, so the
service would probably mess up state on disk and do unexpected things.
2017-07-11 13:38:13 -04:00
Zbigniew Jędrzejewski-Szmek bb28e68477 core/load-fragment: refuse units with errors in certain directives
If an error is encountered in any of the Exec* lines, WorkingDirectory,
SELinuxContext, ApparmorProfile, SmackProcessLabel, Service (in .socket
units), User, or Group, refuse to load the unit. If the config stanza
has support, ignore the failure if '-' is present.

For those configuration directives, even if we started the unit, it's
pretty likely that it'll do something unexpected (like write files
in a wrong place, or with a wrong context, or run with wrong permissions,
etc). It seems better to refuse to start the unit and have the admin
clean up the configuration without giving the service a chance to mess
up stuff.

Note that all "security" options that restrict what the unit can do
(Capabilities, AmbientCapabilities, Restrict*, SystemCallFilter, Limit*,
PrivateDevices, Protect*, etc) are _not_ treated like this. Such options are
only supplementary, and are not always available depending on the architecture
and compilation options, so unit authors have to make sure that the service
runs correctly without them anyway.

Fixes #6237, #6277.
2017-07-11 13:38:02 -04:00
Zbigniew Jędrzejewski-Szmek 0004f698df Parse "timeout=0" as infinity in various generators (#6264)
This extends 2d79a0bbb9 to the kernel
command line parsing.

The parsing is changed a bit to only understand "0" as infinity. If units are
specified, parse normally, e.g. "0s" is just 0. This makes it possible to
provide a zero timeout if necessary.

Simple test is added.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1462378.
2017-07-03 14:29:32 +02:00
Lennart Poettering 7f452159b8 core: make IOSchedulingClass= and IOSchedulingPriority= settable for transient units
This patch is a bit more complex thant I hoped. In particular the single
IOScheduling= property exposed on the bus is split up into
IOSchedulingClass= and IOSchedulingPriority= (though compat is
retained). Otherwise the asymmetry between setting props and getting
them is a bit too nasty.

Fixes #5613
2017-06-26 17:43:18 +02:00
Michal Sekletar f847b8b7df load-fragment: don't print error about incorrect syntax when IPv6 is disabled (#5791) 2017-04-25 09:31:52 +02:00
Danielle Church 42d43f214e load-fragment: resolve specifiers in BindPaths/BindReadOnlyPaths (#5687) 2017-04-24 18:23:35 +02:00
Lennart Poettering 20b7a0070c core: actually make "+" prefix in ReadOnlyPaths=, InaccessiblePaths=, ReadWritablePaths= work
5327c910d2 claimed to add support for "+"
for prefixing paths with the configured RootDirectory=. But actually it
only implemented it in the backend, it did not add support for it to the
configuration file parsers. Fix that now.
2017-02-07 11:22:05 +01:00
Michal Sekletar 29e6561f89 load-fragment: fix comment to reflect changes made in 43eb109 (#5138) 2017-01-23 21:18:40 -05:00
Zbigniew Jędrzejewski-Szmek c73838280c Modify mount_propagation_flags_from_string to return a normal int code
This means that callers can distiguish an error from flags==0,
and don't have to special-case the empty string.
2016-12-17 13:57:04 -05:00
Lennart Poettering d2d6c096f6 core: add ability to define arbitrary bind mounts for services
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.

The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).

Fixes: #3439
2016-12-14 00:54:10 +01:00
Lennart Poettering 835552511e core: hook up MountFlags= to the transient unit logic
This makes "systemd-run -p MountFlags=shared -t /bin/sh" work, by making
MountFlags= to the list of properties that may be accessed transiently.
2016-12-13 21:22:13 +01:00
Zbigniew Jędrzejewski-Szmek 007f48bb89 pid1: remove unnecessary counter
The loop must terminate after at most three iterations anyway.
2016-12-11 00:21:35 -05:00
Lennart Poettering 7b07e99320 core: add specifier expansion to ReadOnlyPaths= and friends
Expanding specifiers here definitely makes sense.

Also simplifies the loop a bit, as there's no reason to keep "prev" around...
2016-12-07 18:47:32 +01:00
Lennart Poettering 744bb5b1be core: add specifier expansion to RequiresMountsFor=
This might be useful for some people, for example to pull in mounts for paths
including the machine ID or hostname.
2016-12-07 18:47:32 +01:00
Lennart Poettering 18913df9a2 core: use unit_full_printf() at a couple of locations we used unit_name_printf() before
For settings that are not taking unit names there's no reason to use
unit_name_printf(). Use unit_full_printf() instead, as the names are validated
anyway in one form or another after expansion.
2016-12-07 18:47:32 +01:00
Lennart Poettering 5125e76243 core: move specifier expansion out of service.c/socket.c
This monopolizes unit file specifier expansion in load-fragment.c, and removes
it from socket.c + service.c. This way expansion becomes an operation done exclusively at time of loading unit files.

Previously specifiers were resolved for all settings during loading of unit
files with the exception of ExecStart= and friends which were resolved in
socket.c and service.c. With this change the latter is also moved to the
loading of unit files.

Fixes: #3061
2016-12-07 18:47:32 +01:00
Djalal Harouni c92e8afebd core: improve the logic that implies no new privileges
The no_new_privileged_set variable is not used any more since commit
9b232d3241 that fixed another thing. So remove it. Also no
need to check if we are under user manager, remove that part too.
2016-11-15 15:04:31 +01:00
Zbigniew Jędrzejewski-Szmek d85a0f8028 Merge pull request #4536 from poettering/seccomp-namespaces
core: add new RestrictNamespaces= unit file setting

Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
2016-11-08 19:54:21 -05:00
Zbigniew Jędrzejewski-Szmek 54ac349445 core/load-fragment: modify existing environment instead of copying strv over and over 2016-11-05 18:54:27 -04:00
Zbigniew Jędrzejewski-Szmek 035fe294b3 core/load-fragment: port to extract_first_word 2016-11-05 15:35:51 -04:00
Zbigniew Jędrzejewski-Szmek 9a82ab9592 tree-wide: drop unneded WHITESPACE param to extract_first_word
It's the default, and NULL is shorter.
2016-11-05 15:35:51 -04:00
Lennart Poettering add005357d core: add new RestrictNamespaces= unit file setting
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().

RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.

This setting should be improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
2016-11-04 07:40:13 -06:00
Martin Pitt 1740c5a807 Merge pull request #4458 from keszybz/man-nonewprivileges
Document NoNewPrivileges default value
2016-10-28 15:35:29 +02:00
Lennart Poettering 8130926d32 core: rework syscall filter set handling
A variety of fixes:

- rename the SystemCallFilterSet structure to SyscallFilterSet. So far the main
  instance of it (the syscall_filter_sets[] array) used to abbreviate
  "SystemCall" as "Syscall". Let's stick to one of the two syntaxes, and not
  mix and match too wildly. Let's pick the shorter name in this case, as it is
  sufficiently well established to not confuse hackers reading this.

- Export explicit indexes into the syscall_filter_sets[] array via an enum.
  This way, code that wants to make use of a specific filter set, can index it
  directly via the enum, instead of having to search for it. This makes
  apply_private_devices() in particular a lot simpler.

- Provide two new helper calls in seccomp-util.c: syscall_filter_set_find() to
  find a set by its name, seccomp_add_syscall_filter_set() to add a set to a
  seccomp object.

- Update SystemCallFilter= parser to use extract_first_word().  Let's work on
  deprecating FOREACH_WORD_QUOTED().

- Simplify apply_private_devices() using this functionality
2016-10-24 17:32:50 +02:00
Zbigniew Jędrzejewski-Szmek 9b232d3241 core: do not set no_new_privileges flag in config_parse_syscall_filter
If SyscallFilter was set, and subsequently cleared, the no_new_privileges flag
was not reset properly. We don't need to set this flag here, it will be
set automatically in unit_patch_contexts() if syscall_filter is set.
2016-10-22 23:42:34 -04:00
Lukas Nykryn 87a47f99bc failure-action: generalize failure action to emergency action 2016-10-21 15:13:50 +02:00
Luca Bruno 52c239d770 core/exec: add a named-descriptor option ("fd") for streams (#4179)
This commit adds a `fd` option to `StandardInput=`,
`StandardOutput=` and `StandardError=` properties in order to
connect standard streams to externally named descriptors provided
by some socket units.

This option looks for a file descriptor named as the corresponding
stream. Custom names can be specified, separated by a colon.
If multiple name-matches exist, the first matching fd will be used.
2016-10-17 20:05:49 -04:00
Zbigniew Jędrzejewski-Szmek 3b319885c4 tree-wide: introduce free_and_replace helper
It's a common pattern, so add a helper for it. A macro is necessary
because a function that takes a pointer to a pointer would be type specific,
similarly to cleanup functions. Seems better to use a macro.
2016-10-16 23:35:39 -04:00
Zbigniew Jędrzejewski-Szmek 3ccb886283 Allow block and char classes in DeviceAllow bus properties (#4353)
Allowed paths are unified betwen the configuration file parses and the bus
property checker. The biggest change is that the bus code now allows "block-"
and "char-" classes. In addition, path_startswith("/dev") was used in the bus
code, and startswith("/dev") was used in the config file code. It seems
reasonable to use path_startswith() which allows a slightly broader class of
strings.

Fixes #3935.
2016-10-12 11:12:11 +02:00
Lennart Poettering cf08b48642 core: introduce MemorySwapMax= (#3659)
Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls
controls "memory.swap.max" attribute in unified cgroup.
2016-08-31 12:28:54 +02:00
WaLyong Cho 96e131ea09 core: introduce MemorySwapMax=
Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls
controls "memory.swap.max" attribute in unified cgroup.
2016-08-30 11:11:45 +09:00
Douglas Christman 2507992f6b load-fragment: Resolve specifiers in OnCalendar and On*Sec
Resolves #3534
2016-08-26 12:13:16 -04:00
Tejun Heo f50582649f logind: update empty and "infinity" handling for [User]TasksMax (#3835)
The parsing functions for [User]TasksMax were inconsistent.  Empty string and
"infinity" were interpreted as no limit for TasksMax but not accepted for
UserTasksMax.  Update them so that they're consistent with other knobs.

* Empty string indicates the default value.
* "infinity" indicates no limit.

While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax
handling.

v2: Update empty string to indicate the default value as suggested by Zbigniew
    Jędrzejewski-Szmek.

v3: Fixed empty UserTasksMax handling.
2016-08-18 22:57:53 -04:00
Tejun Heo 66ebf6c0a1 core: add cgroup CPU controller support on the unified hierarchy
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches.  The situation is explained in
the following documentation.

 https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu

While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads.  This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.

On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us".  [Startup]CPUWeight config
options are added with the usual compat translation.  CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.

v2: - Error in man page corrected.
    - CPU config application in cgroup_context_apply() refactored.
    - CPU accounting now works on unified hierarchy.
2016-08-07 09:45:39 -04:00
Lennart Poettering 41bf0590cc util-lib: unify parsing of nice level values
This adds parse_nice() that parses a nice level and ensures it is in the right
range, via a new nice_is_valid() helper. It then ports over a number of users
to this.

No functional changes.
2016-08-05 11:18:32 +02:00
David Michael 5124866d73 util-lib: add parse_percent_unbounded() for percentages over 100% (#3886)
This permits CPUQuota to accept greater values as documented.
2016-08-04 13:09:54 +02:00
Zbigniew Jędrzejewski-Szmek dadd6ecfa5 Merge pull request #3728 from poettering/dynamic-users 2016-07-25 16:40:26 -04:00
Lennart Poettering 43eb109aa9 core: change ExecStart=! syntax to ExecStart=+ (#3797)
As suggested by @mbiebl we already use the "!" special char in unit file
assignments for negation, hence we should not use it in a different context for
privileged execution. Let's use "+" instead.
2016-07-25 16:53:33 +02:00
Lennart Poettering 66dccd8d85 core: be stricter when parsing User=/Group= fields
Let's verify the validity of the syntax of the user/group names set.
2016-07-22 15:53:45 +02:00
Lennart Poettering b3785cd5e6 core: check for overflow when handling scaled MemoryLimit= settings
Just in case...
2016-07-22 15:33:13 +02:00
Lennart Poettering 83f8e80857 core: support percentage specifications on TasksMax=
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
2016-07-22 15:33:12 +02:00
Luca Bruno 391b81cd03 seccomp: only abort on syscall name resolution failures (#3701)
seccomp_syscall_resolve_name() can return a mix of positive and negative
(pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR.
This commit lets the syscall filter parser only abort on real parsing
failures, letting libseccomp handle pseudo-syscall number on its own
and allowing proper multiplexed syscalls filtering.
2016-07-12 11:55:26 +02:00
Torstein Husebø 61233823aa treewide: fix typos and remove accidental repetition of words 2016-07-11 16:18:43 +02:00
Lennart Poettering 616aab6085 Merge pull request #3481 from poettering/relative-memcg
various changes, most importantly regarding memory metrics
2016-06-16 13:56:23 +02:00
Zbigniew Jędrzejewski-Szmek a1feacf77f load-fragment: ignore ENOTDIR/EACCES errors (#3510)
If for whatever reason the file system is "corrupted", we want
to be resilient and ignore the error, as long as we can load the units
from a different place.

Arch bug https://bugs.archlinux.org/task/49547.

A user had an ntfs symlink (essentially a file) instead of a directory after
restoring from backup. We should just ignore that like we would treat a missing
directory, for general resiliency.

We should treat permission errors similarly. For example an unreadable
/usr/local/lib directory would prevent (user) instances of systemd from
loading any units. It seems better to continue.
2016-06-15 23:02:27 +02:00
Lennart Poettering d8cf2ac79b util: introduce physical_memory_scale() to unify how we scale by physical memory
The various bits of code did the scaling all different, let's unify this,
given that the code is not trivial.
2016-06-14 20:01:45 +02:00
Lennart Poettering 875ae5661a core: optionally, accept a percentage value for MemoryLimit= and related settings
If a percentage is used, it is taken relative to the installed RAM size. This
should make it easier to write generic unit files that adapt to the local system.
2016-06-14 19:50:38 +02:00
Lennart Poettering 9184ca48ea util-lib: introduce parse_percent() for parsing percent specifications
And port a couple of users over to it.
2016-06-14 19:50:38 +02:00
Alessandro Puccetti cf677fe686 core/execute: add the magic character '!' to allow privileged execution (#3493)
This patch implements the new magic character '!'. By putting '!' in front
of a command, systemd executes it with full privileges ignoring paramters
such as User, Group, SupplementaryGroups, CapabilityBoundingSet,
AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext,
AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies.

Fixes partially https://github.com/systemd/systemd/issues/3414
Related to https://github.com/coreos/rkt/issues/2482

Testing:
1. Create a user 'bob'
2. Create the unit file /etc/systemd/system/exec-perm.service
   (You can use the example below)
3. sudo systemctl start ext-perm.service
4. Verify that the commands starting with '!' were not executed as bob,
   4.1 Looking to the output of ls -l /tmp/exec-perm
   4.2 Each file contains the result of the id command.

`````````````````````````````````````````````````````````````````
[Unit]
Description=ext-perm

[Service]
Type=oneshot
TimeoutStartSec=0
User=bob
ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ;
    /usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre"
ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ;
    !/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2"
ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post"
ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload"
ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop"
ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post"

[Install]
WantedBy=multi-user.target]
`````````````````````````````````````````````````````````````````
2016-06-10 18:19:54 +02:00
Lennart Poettering 9d3e340639 load-fragment: don't try to do a template instance replacement if we are not an instance (#3451)
Corrects: 7aad67e7

Fixes: #3438
2016-06-09 10:49:36 +02:00
Tejun Heo e57c9ce169 core: always use "infinity" for no upper limit instead of "max" (#3417)
Recently added cgroup unified hierarchy support uses "max" in configurations
for no upper limit.  While consistent with what the kernel uses for no upper
limit, it is inconsistent with what systemd uses for other controllers such as
memory or pids.  There's no point in introducing another term.  Update cgroup
unified hierarchy support so that "infinity" is the only term that systemd
uses for no upper limit.
2016-06-03 17:49:05 +02:00
Topi Miettinen 201c1cc22a core: add pre-defined syscall groups to SystemCallFilter= (#3053) (#3157)
Implement sets of system calls to help constructing system call
filters. A set starts with '@' to distinguish from a system call.

Closes: #3053, #3157
2016-06-01 11:56:01 +02:00
Tejun Heo da4d897e75 core: add cgroup memory controller support on the unified hierarchy (#3315)
On the unified hierarchy, memory controller implements three control knobs -
low, high and max which enables more useable and versatile control over memory
usage.  This patch implements support for the three control knobs.

* MemoryLow, MemoryHigh and MemoryMax are added for memory.low, memory.high and
  memory.max, respectively.

* As all absolute limits on the unified hierarchy use "max" for no limit, make
  memory limit parse functions accept "max" in addition to "infinity" and
  document "max" for the new knobs.

* Implement compatibility translation between MemoryMax and MemoryLimit.

v2:

- Fixed missing else's in config_parse_memory_limit().
- Fixed missing newline when writing out drop-ins.
- Coding style updates to use "val > 0" instead of "val".
- Minor updates to documentation.
2016-05-27 18:10:18 +02:00
Tejun Heo 979d03117f core: update CGroupBlockIODeviceBandwidth to record both rbps and wbps
CGroupBlockIODeviceBandwith is used to keep track of IO bandwidth limits for
legacy cgroup hierarchies.  Unlike the unified hierarchy counterpart
CGroupIODeviceLimit, a CGroupBlockIODeviceBandwiddth records either a read or
write limit and has a couple issues.

* There's no way to clear specific config entry.

* When configs are cleared for an IO direction of a unit, the kernel settings
  aren't cleared accordingly creating discrepancies.

This patch updates CGroupBlockIODeviceBandwidth so that it behaves similarly to
CGroupIODeviceLimit - each entry records both rbps and wbps limits and is
cleared if both are at default values after kernel settings are updated.
2016-05-18 13:51:46 -07:00
Tejun Heo 9be572497d core: introduce CGroupIOLimitType enums
Currently, there are two cgroup IO limits, bandwidth max for read and write,
and they are hard-coded in various places.  This is fine for two limits but IO
is expected to grow more limits - low, high and max limits for bandwidth and
IOPS - and hard-coding each limit won't make sense.

This patch replaces hard-coded limits with an array indexed by
CGroupIOLimitType and accompanying string and default value tables so that new
limits can be added trivially.
2016-05-18 13:50:56 -07:00
Lennart Poettering 3103459e90 Merge pull request #3193 from htejun/cgroup-io-controller
core: add io controller support on the unified hierarchy
2016-05-16 22:05:27 +02:00
Lennart Poettering d31645adef tree-wide: port more code to use ifname_valid() 2016-05-09 15:45:31 +02:00
Tejun Heo 13c31542cc core: add io controller support on the unified hierarchy
On the unified hierarchy, blkio controller is renamed to io and the interface
is changed significantly.

* blkio.weight and blkio.weight_device are consolidated into io.weight which
  uses the standardized weight range [1, 10000] with 100 as the default value.

* blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max.
  Expansion of throttling features is being worked on to support
  work-conserving absolute limits (io.low and io.high).

* All stats are consolidated into io.stats.

This patchset adds support for the new interface.  As the interface has been
revamped and new features are expected to be added, it seems best to treat it
as a separate controller rather than trying to expand the blkio settings
although we might add automatic translation if only blkio settings are
specified.

* io.weight handling is mostly identical to blkio.weight[_device] handling
  except that the weight range is different.

* Both read and write bandwidth settings are consolidated into
  CGroupIODeviceLimit which describes all limits applicable to the device.
  This makes it less painful to add new limits.

* "max" can be used to specify the maximum limit which is equivalent to no
  config for max limits and treated as such.  If a given CGroupIODeviceLimit
  doesn't contain any non-default configs, the config struct is discarded once
  the no limit config is applied to cgroup.

* lookup_blkio_device() is renamed to lookup_block_device().

Signed-off-by: Tejun Heo <htejun@fb.com>
2016-05-05 16:43:06 -04:00
Zbigniew Jędrzejewski-Szmek a82394c889 Merge pull request #2921 from keszybz/do-not-report-masked-units-as-changed 2016-05-03 14:08:39 -04:00
Zbigniew Jędrzejewski-Szmek d43bbb52de Revert "Do not report masked units as changed (#2921)"
This reverts commit 6d10d308c6.

It got squashed by mistake.
2016-05-03 14:08:23 -04:00
Zbigniew Jędrzejewski-Szmek 8a993b61d1 Move no_alias information to shared/
This way it can be used in install.c in subsequent commit.
2016-05-01 19:40:51 -04:00
Lennart Poettering a837f08803 core: when encountering a symlink alias for non-aliasable units warn nicely
If the user defines a symlink alias for a unit whose type does not support
aliasing, detect this early and print a nice warning.

Fixe: #2730
2016-04-29 18:06:12 +02:00
Martin Pitt 025ef1d226 Merge pull request #2973 from poettering/search-path
Many fixes, in particular to the install logic
2016-04-12 18:20:13 +02:00
Nicolas Braud-Santoni 1116e14c49 load-fragment: Resolve specifiers in DeviceAllow (#3019)
Closes #1602
2016-04-12 18:00:19 +02:00
Lennart Poettering 463d0d1569 core: remove ManagerRunningAs enum
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were
mostly identical and converted from one to the other all the time. The latter
had one more value UNIT_FILE_GLOBAL however.

Let's simplify things, and remove ManagerRunningAs and replace it by
UnitFileScope everywhere, thus making the translation unnecessary. Introduce
two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking
if we are running in one or the user context.
2016-04-12 13:43:30 +02:00
Lennart Poettering a3c4eb0710 core: rework generator dir logic, move the dirs into LookupPaths structure
A long time ago – when generators where first introduced – the directories for
them were randomly created via mkdtemp(). This was changed later so that they
use fixed name directories now. Let's make use of this, and add the genrator
dirs to the LookupPaths structure and into the unit file search path maintained
in it. This has the benefit that the generator dirs are now normal part of the
search path for all tools, and thus are shown in "systemctl list-unit-files"
too.
2016-04-12 13:43:29 +02:00
Zbigniew Jędrzejewski-Szmek 6d10d308c6 Do not report masked units as changed (#2921)
* core/unit: extract checking of stat paths into helper function

The same code was repeated three times.

* core: treat masked files as "unchanged"

systemctl prints the "unit file changed on disk" warning
for a masked unit. I think it's better to print nothing in that
case.

When a masked unit is loaded, set mtime as 0. When checking
if a unit with mtime of 0 needs reload, check that the mask
is still in place.

* test-dnssec: fix build without gcrypt

Also reorder the test functions to follow the way they are called
from main().
2016-04-12 11:10:57 +02:00
Zbigniew Jędrzejewski-Szmek 3a8db9fe81 core: treat masked files as "unchanged"
systemctl prints the "unit file changed on disk" warning
for a masked unit. I think it's better to print nothing in that
case.

When a masked unit is loaded, set mtime as 0. When checking
if a unit with mtime of 0 needs reload, check that the mask
is still in place.
2016-03-31 00:38:50 -04:00
Michal Sekletar 7aad67e7f2 core: look for instance when processing template name
If first attempt to merge units failed and we are trying to do
merge the other way around and at the same time we are working with
template name, then other unit can't possibly be template, because it is
not possible to have template unit running, only instances of the
template. Thus we need to look for already active instance instead.
2016-03-16 15:40:14 +01:00
Vito Caputo 9ed794a32d tree-wide: minor formatting inconsistency cleanups 2016-02-23 14:20:34 -08:00
Vito Caputo 313cefa1d9 tree-wide: make ++/-- usage consistent WRT spacing
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands.  Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
2016-02-22 20:32:04 -08:00
Evgeny Vereshchagin bd1b973fb3 core: revert "core: resolve specifier in config_parse_exec()"
This reverts commit cb48dfca6a.

Exec*-settings resolve specifiers twice:
%%U -> config_parse_exec [cb48dfca6a] -> %U -> service_spawn -> 0

Fixes #2637
2016-02-18 11:55:53 +00:00
Lennart Poettering 479050b363 core: drop Capabilities= setting
The setting is hardly useful (since its effect is generally reduced to zero due
to file system caps), and with the advent of ambient caps an actually useful
replacement exists, hence let's get rid of this.

I am pretty sure this was unused and our man page already recommended against
its use, hence this should be a safe thing to remove.
2016-02-13 11:59:34 +01:00
Daniel Mack 9ca6ff50ab Remove kdbus custom endpoint support
This feature will not be used anytime soon, so remove a bit of cruft.

The BusPolicy= config directive will stay around as compat noop.
2016-02-11 22:12:04 +01:00
Lennart Poettering 926db6521b Merge pull request #2574 from zonque/netclass-remove
cgroup: remove support for NetClass= directive
2016-02-10 17:03:00 +01:00
Daniel Mack 50f48ad37a cgroup: remove support for NetClass= directive
Support for net_cls.class_id through the NetClass= configuration directive
has been added in v227 in preparation for a per-unit packet filter mechanism.
However, it turns out the kernel people have decided to deprecate the net_cls
and net_prio controllers in v2. Tejun provides a comprehensive justification
for this in his commit, which has landed during the merge window for kernel
v4.5:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671

As we're aiming for full support for the v2 cgroup hierarchy, we can no
longer support this feature. Userspace tool such as nftables are moving over
to setting rules that are specific to the full cgroup path of a task, which
obsoletes these controllers anyway.

This commit removes support for tweaking details in the net_cls controller,
but keeps the NetClass= directive around for legacy compatibility reasons.
2016-02-10 16:38:56 +01:00
Lennart Poettering 89beff89ed core: treat JobTimeout=0 as equivalent to JobTimeout=infinity
Corrects an incompatibility introduced with 36c16a7cdd.

Fixes: #2537
2016-02-10 16:09:24 +01:00
Lennart Poettering aad41f0814 core: simplify how we parse TimeoutSec=, TimeoutStartSec= and TimeoutStopSec=
Let's make things more obvious by placing the parse_usec() invocation directly in config_parse_service_timeout().
2016-02-10 16:09:24 +01:00
Daniel Mack b26fa1a2fb tree-wide: remove Emacs lines from all files
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
2016-02-10 13:41:57 +01:00
Lennart Poettering 36c16a7cdd core: rework unit timeout handling, and add new setting RuntimeMaxSec=
This clean-ups timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as
indication for a disabled timeout, use USEC_INFINITY which is in-line with how we do this in the rest of our code
(following the logic that 0 means "no", and USEC_INFINITY means "never").

This also replace all usec_t additions with invocations to usec_add(), so that USEC_INFINITY is properly propagated,
and sd-event considers it has indication for turning off the event source.

This also alters the deserialization of the units to restart timeouts from the time they were originally started from.
Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to
artificially prolonged timeouts if a daemon reload took place.

Finally, a new RuntimeMaxSec= setting is introduced for service units, that specifies a maximum runtime after which a
specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs.

This also simplifies the various xyz_spawn() calls of the various types in that explicit distruction of the timers is
removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn()
calls fail.

Fixes: #2249
2016-02-01 22:18:16 +01:00
Lennart Poettering d0a7c5f692 core: move parsing of rlimits into rlimit-util.[ch]
This way we can reuse it for parsing rlimit settings in "systemctl set-property" and related commands.
2016-02-01 22:18:16 +01:00
Lennart Poettering aee7c185ec Merge pull request #2306 from walyong/exec_v01
[v1] core: resolve specifier in config_parse_exec()
2016-01-26 21:52:30 +01:00
Ismo Puustinen 755d4b67a4 capabilities: added support for ambient capabilities.
This patch adds support for ambient capabilities in service files. The
idea with ambient capabilities is that the execed processes can run with
non-root user and get some inherited capabilities, without having any
need to add the capabilities to the executable file.

You need at least Linux 4.3 to use ambient capabilities. SecureBit
keep-caps is automatically added when you use ambient capabilities and
wish to change the user.

An example system service file might look like this:

[Unit]
Description=Service for testing caps

[Service]
ExecStart=/usr/bin/sleep 10000
User=nobody
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW

After starting the service it has these capabilities:

CapInh: 0000000000003000
CapPrm: 0000000000003000
CapEff: 0000000000003000
CapBnd: 0000003fffffffff
CapAmb: 0000000000003000
2016-01-12 12:14:50 +02:00
Ismo Puustinen a103496ca5 capabilities: keep bounding set in non-inverted format.
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
2016-01-12 12:14:50 +02:00
WaLyong Cho cb48dfca6a core: resolve specifier in config_parse_exec()
When parse ExecXXX=, specifiers are not resolved in
config_parse_exec(). Finally, the specifiers are set into unit
properties. So, systemctl shows not resolved speicifier on "Process:"
field.
To set the exec properties well, resolve specifiers before parse the
rvale by unit_full_printf();
2016-01-12 17:32:08 +09:00
Lennart Poettering 4afd3348c7 tree-wide: expose "p"-suffix unref calls in public APIs to make gcc cleanup easy
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.

With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.

The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).

This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.

Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:

       #define _cleanup_(function) __attribute__((cleanup(function)))

Or similar, to make the gcc feature easier to use.

Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.

See #2008.
2015-11-27 19:19:36 +01:00
Evgeny Vereshchagin 0316f2aeeb core: fix rlimit parsing
* refuse limits if soft > hard
* print an actual value instead of (null)

see https://github.com/systemd/systemd/pull/1994#issuecomment-159999123
2015-11-27 11:26:37 +00:00
Lennart Poettering f7b5b034e8 Merge pull request #1994 from karelzak/rlimits
core: support <soft:hard> ranges for RLIMIT options
2015-11-26 13:17:25 +01:00
Karel Zak 91518d20dd core: support <soft:hard> ranges for RLIMIT options
The new parser supports:

 <value>       - specify both limits to the same value
 <soft:hard>   - specify both limits

the size or time specific suffixes are supported, for example

  LimitRTTIME=1sec
  LimitAS=4G:16G

The patch introduces parse_rlimit_range() and rlim type (size, sec,
usec, etc.) specific parsers. No code is duplicated now.

The patch also sync docs for DefaultLimitXXX= and LimitXXX=.

References: https://github.com/systemd/systemd/issues/1769
2015-11-25 12:03:32 +01:00