Commit Graph

225 Commits

Author SHA1 Message Date
David Michael 5124866d73 util-lib: add parse_percent_unbounded() for percentages over 100% (#3886)
This permits CPUQuota to accept greater values as documented.
2016-08-04 13:09:54 +02:00
Lennart Poettering d251207d55 core: add new PrivateUsers= option to service execution
This setting adds minimal user namespacing support to a service. When set the invoked
processes will run in their own user namespace. Only a trivial mapping will be
set up: the root user/group is mapped to root, and the user/group of the
service will be mapped to itself, everything else is mapped to nobody.

If this setting is used the service runs with no capabilities on the host, but
configurable capabilities within the service.

This setting is particularly useful in conjunction with RootDirectory= as the
need to synchronize /etc/passwd and /etc/group between the host and the service
OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the
user of the service itself. But even outside the RootDirectory= case this
setting is useful to substantially reduce the attack surface of a service.

Example command to test this:

        systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh

This runs a shell as user "foobar". When typing "ps" only processes owned by
"root", by "foobar", and by "nobody" should be visible.
2016-08-03 20:42:04 +02:00
Zbigniew Jędrzejewski-Szmek dadd6ecfa5 Merge pull request #3728 from poettering/dynamic-users 2016-07-25 16:40:26 -04:00
Lennart Poettering 29206d4619 core: add a concept of "dynamic" user ids, that are allocated as long as a service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).

For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.

A simple way to test this is:

        systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-22 15:53:45 +02:00
Lennart Poettering f7903e8db6 core: rename MemoryLimitByPhysicalMemory transient property to MemoryLimitScale
That way, we can neatly keep this in line with the new TasksMaxScale= option.

Note that we didn't release a version with MemoryLimitByPhysicalMemory= yet,
hence this change should be unproblematic without breaking API.
2016-07-22 15:33:12 +02:00
Lennart Poettering 83f8e80857 core: support percentage specifications on TasksMax=
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
2016-07-22 15:33:12 +02:00
Alessandro Puccetti 2a624c36e6 doc,core: Read{Write,Only}Paths= and InaccessiblePaths=
This patch renames Read{Write,Only}Directories= and InaccessibleDirectories=
to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept
as aliases but they are not advertised in the documentation.

Renamed variables:
`read_write_dirs` --> `read_write_paths`
`read_only_dirs` --> `read_only_paths`
`inaccessible_dirs` --> `inaccessible_paths`
2016-07-19 17:22:02 +02:00
Ian Lee 7351ded5b9 Do not ellipsize cgroups when showing slices in --full mode (#3560)
Do not ellipsize cgroups when showing slices in --full mode
2016-06-21 16:10:31 +02:00
Lennart Poettering d58d600efd systemctl: allow percent-based MemoryLimit= settings via systemctl set-property
The unit files already accept relative, percent-based memory limit
specification, let's make sure "systemctl set-property" support this too.

Since we want the physical memory size of the destination machine to apply we
pass the percentage in a new set of properties that only exist for this
purpose, and can only be set.
2016-06-14 20:01:45 +02:00
Lennart Poettering 9184ca48ea util-lib: introduce parse_percent() for parsing percent specifications
And port a couple of users over to it.
2016-06-14 19:50:38 +02:00
Topi Miettinen f3e4363593 core: Restrict mmap and mprotect with PAGE_WRITE|PAGE_EXEC (#3319) (#3379)
New exec boolean MemoryDenyWriteExecute, when set, installs
a seccomp filter to reject mmap(2) with PAGE_WRITE|PAGE_EXEC
and mprotect(2) with PAGE_EXEC.
2016-06-03 17:58:18 +02:00
Tejun Heo e57c9ce169 core: always use "infinity" for no upper limit instead of "max" (#3417)
Recently added cgroup unified hierarchy support uses "max" in configurations
for no upper limit.  While consistent with what the kernel uses for no upper
limit, it is inconsistent with what systemd uses for other controllers such as
memory or pids.  There's no point in introducing another term.  Update cgroup
unified hierarchy support so that "infinity" is the only term that systemd
uses for no upper limit.
2016-06-03 17:49:05 +02:00
Tejun Heo da4d897e75 core: add cgroup memory controller support on the unified hierarchy (#3315)
On the unified hierarchy, memory controller implements three control knobs -
low, high and max which enables more useable and versatile control over memory
usage.  This patch implements support for the three control knobs.

* MemoryLow, MemoryHigh and MemoryMax are added for memory.low, memory.high and
  memory.max, respectively.

* As all absolute limits on the unified hierarchy use "max" for no limit, make
  memory limit parse functions accept "max" in addition to "infinity" and
  document "max" for the new knobs.

* Implement compatibility translation between MemoryMax and MemoryLimit.

v2:

- Fixed missing else's in config_parse_memory_limit().
- Fixed missing newline when writing out drop-ins.
- Coding style updates to use "val > 0" instead of "val".
- Minor updates to documentation.
2016-05-27 18:10:18 +02:00
Christian Hesse acc0269cad {machine,system}ctl: always pass &changes and &n_changes (#3350)
We have to pass addresses of changes and n_changes to
bus_deserialize_and_dump_unit_file_changes(). Otherwise we are hit by
missing information (subsequent calls to unit_file_changes_add() to
not add anything).

Also prevent null pointer dereference in
bus_deserialize_and_dump_unit_file_changes() by asserting.

Fixes #3339
2016-05-26 15:57:37 +02:00
Lennart Poettering f9e26ecc48 Merge pull request #3290 from htejun/cgroup2-io-compat
Implement compat translation between IO* and BlockIO* settings
2016-05-20 18:53:11 +02:00
Jonathan Boulle 186ad4b1a0 core/dbus: expose SELinuxContext property (#3284)
Adds support to core for systemd D-Bus clients to send the
`SELinuxContext` property . This means `systemd-run -p
SELinuxContext=foo` should now work.
2016-05-20 15:09:14 +02:00
Tejun Heo 9be572497d core: introduce CGroupIOLimitType enums
Currently, there are two cgroup IO limits, bandwidth max for read and write,
and they are hard-coded in various places.  This is fine for two limits but IO
is expected to grow more limits - low, high and max limits for bandwidth and
IOPS - and hard-coding each limit won't make sense.

This patch replaces hard-coded limits with an array indexed by
CGroupIOLimitType and accompanying string and default value tables so that new
limits can be added trivially.
2016-05-18 13:50:56 -07:00
Lennart Poettering 3103459e90 Merge pull request #3193 from htejun/cgroup-io-controller
core: add io controller support on the unified hierarchy
2016-05-16 22:05:27 +02:00
Zbigniew Jędrzejewski-Szmek 323b7dc903 tree-wide: rename draw_special_char to special_glyph
That function doesn't draw anything on it's own, just returns a string, which
sometimes is more than one character. Also remove "DRAW_" prefix from character
names, TREE_* and ARROW and BLACK_CIRCLE are unambigous on their own, don't
draw anything, and are always used as an argument to special_glyph().

Rename "DASH" to "MDASH", as there's more than one type of dash.
2016-05-09 15:17:57 -04:00
Tejun Heo 13c31542cc core: add io controller support on the unified hierarchy
On the unified hierarchy, blkio controller is renamed to io and the interface
is changed significantly.

* blkio.weight and blkio.weight_device are consolidated into io.weight which
  uses the standardized weight range [1, 10000] with 100 as the default value.

* blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max.
  Expansion of throttling features is being worked on to support
  work-conserving absolute limits (io.low and io.high).

* All stats are consolidated into io.stats.

This patchset adds support for the new interface.  As the interface has been
revamped and new features are expected to be added, it seems best to treat it
as a separate controller rather than trying to expand the blkio settings
although we might add automatic translation if only blkio settings are
specified.

* io.weight handling is mostly identical to blkio.weight[_device] handling
  except that the weight range is different.

* Both read and write bandwidth settings are consolidated into
  CGroupIODeviceLimit which describes all limits applicable to the device.
  This makes it less painful to add new limits.

* "max" can be used to specify the maximum limit which is equivalent to no
  config for max limits and treated as such.  If a given CGroupIODeviceLimit
  doesn't contain any non-default configs, the config struct is discarded once
  the no limit config is applied to cgroup.

* lookup_blkio_device() is renamed to lookup_block_device().

Signed-off-by: Tejun Heo <htejun@fb.com>
2016-05-05 16:43:06 -04:00
Tejun Heo 096a4d53dc core: fix segfault on "systemctl --set-property UNIT BlockIODeviceWeight=WEIGHT"
bus_append_unit_property_assignment() was missing an argument for
sd_bus_message_append() when processing BlockIODeviceWeight leading to
segfault.  Fix it.

Signed-off-by: Tejun Heo <htejun@fb.com>
2016-05-04 17:43:13 -04:00
Tejun Heo 3fdf9ad7ad core: bus_append_unit_property_assignment() was using the wrong parse function
It was incorrectly using cg_cpu_weight_parse() to parse BlockIOWeight.  Update
it to use cg_blkio_weight_parse() instead.

Signed-off-by: Tejun Heo <htejun@fb.com>
2016-04-30 16:12:54 -04:00
Lennart Poettering 0b2de9d90d core: fix description of "resources" service error (#3119)
The "resources" error is really just the generic error we return when
we hit some kind of error and we have no more appropriate error for the case to
return, for example because of some OS error.

Hence, reword the explanation and don't claim any relation to resource limits.

Admittedly, the "resources" service error is a bit of a misnomer, but I figure
it's kind of API now.

Fixes: #2716
2016-04-25 15:36:25 -04:00
Lennart Poettering 20b1644140 shared: move unit-specific code from bus-util.h to bus-unit-util.h
Previously we'd have generally useful sd-bus utilities in bust-util.h,
intermixed with code that is specifically for writing clients for PID 1,
wrapping job and unit handling. Let's split the latter out and move it into
bus-unit-util.c, to make the sources a bit short and easier to grok.
2016-04-22 16:06:20 +02:00
Lennart Poettering 291d565a04 core,systemctl: add bus API to retrieve processes of a unit
This adds a new GetProcesses() bus call to the Unit object which returns an
array consisting of all PIDs, their process names, as well as their full cgroup
paths. This is then used by "systemctl status" to show the per-unit process
tree.

This has the benefit that the client-side no longer needs to access the
cgroupfs directly to show the process tree of a unit. Instead, it now uses this
new API, which means it also works if -H or -M are used correctly, as the
information from the specific host is used, and not the one from the local
system.

Fixes: #2945
2016-04-22 16:06:20 +02:00