Systemd

Author	SHA1	Message	Date
Lennart Poettering	3dc5ca9787	core: support IP firewalling to be configured for transient units	2017-09-22 15:24:55 +02:00
Lennart Poettering	b1edf4456e	core: add new per-unit setting KeyringMode= for controlling kernel keyring setup Usually, it's a good thing that we isolate the kernel session keyring for the various services and disconnect them from the user keyring. However, in case of the cryptsetup key caching we actually want that multiple instances of the cryptsetup service can share the keys in the root user's user keyring, hence we need to be able to disable this logic for them. This adds KeyringMode=inherit\|private\|shared: inherit: don't do any keyring magic (this is the default in systemd --user) private: a private keyring as before (default in systemd --system) shared: the new setting	2017-09-15 16:53:35 +02:00
Lennart Poettering	00819cc151	core: add new UnsetEnvironment= setting for unit files With this setting we can explicitly unset specific variables for processes of a unit, as last step of assembling the environment block for them. This is useful to fix #6407. While we are at it, greatly expand the documentation on how the environment block for forked off processes is assembled.	2017-09-14 15:17:40 +02:00
Lennart Poettering	ae835bc75f	bus-unit-util: use STR_IN_SET() where appropriate	2017-08-31 15:45:04 +02:00
Lennart Poettering	bd5a1c91da	bus-unit-util: don't request result property from non-service units	2017-08-31 15:45:04 +02:00
Lennart Poettering	3167f78a11	core: open up LockPersonality= for transient units Let's make "systemd-run -p LockPersonality=1 -t /bin/sh" work.	2017-08-29 15:58:13 +02:00
Yu Watanabe	cffaed83e8	core: add missing properties in D-Bus API Closes #6466.	2017-08-08 00:37:02 +09:00
Yu Watanabe	3536f49e8f	core: add {State,Cache,Log,Configuration}Directory= (#6384 ) This introduces {State,Cache,Log,Configuration}Directory= those are similar to RuntimeDirectory=. They create the directories under /var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode specified in {State,Cache,Log,Configuration}DirectoryMode=. This also fixes #6391.	2017-07-18 14:34:52 +02:00
Yu Watanabe	53f47dfc7b	core: allow preserving contents of RuntimeDirectory= over process restart This introduces RuntimeDirectoryPreserve= option which takes a boolean argument or 'restart'. Closes #6087.	2017-07-17 16:22:25 +09:00
Zbigniew Jędrzejewski-Szmek	23e4a234fc	Merge pull request #6200 from poettering/ioprio-transient	2017-06-26 21:30:36 -04:00
Lennart Poettering	7f452159b8	core: make IOSchedulingClass= and IOSchedulingPriority= settable for transient units This patch is a bit more complex thant I hoped. In particular the single IOScheduling= property exposed on the bus is split up into IOSchedulingClass= and IOSchedulingPriority= (though compat is retained). Otherwise the asymmetry between setting props and getting them is a bit too nasty. Fixes #5613	2017-06-26 17:43:18 +02:00
Lennart Poettering	9efb9df9e3	core: make NotifyAccess= and FileDescriptorStoreMax= available to transient services This is helpful for debugging/testing #5606.	2017-06-26 15:14:41 +02:00
Zbigniew Jędrzejewski-Szmek	804ee07c13	Use "dollar-single-quotes" to escape shell-sensitive strings Also called "ANSI-C Quoting" in info:(bash) ANSI-C Quoting. The escaping rules are a POSIX proposal, and are described in http://austingroupbugs.net/view.php?id=249. There's a lot of back-and-forth on the details of escaping of control characters, but we'll be only using a small subset of the syntax that is common to all proposals and is widely supported. Unfortunately dash and fish and maybe some other shells do not support it (see the man page patch for a list). This allows environment variables to be safely exported using show-environment and imported into the shell. Shells which do not support this syntax will have to do something like export $(systemctl show-environment\|grep -v '=\$') or whatever is appropriate in their case. I think csh and fish do not support the A=B syntax anyway, so the change is moot for them. Fixes #5536. v2: - also escape newlines (which currently disallowed in shell values, so this doesn't really matter), and tabs (as $'\t'), and ! (as $'!'). This way quoted output can be included directly in both interactive and noninteractive bash.	2017-06-19 19:39:43 -04:00
Adrián López	ef6e596ff0	systemctl: show extra args if defined (#5379 )	2017-02-17 15:27:45 -05:00
Evgeny Vereshchagin	0d7578dc30	shared: pass unsigned_long to namespace_flag_from_string_many (#5315 ) Fixes: ``` src/shared/bus-unit-util.c: In function ‘bus_append_unit_property_assignment’: src/shared/bus-unit-util.c:570:65: warning: passing argument 2 of ‘namespace_flag_from_string_many’ from incompatible pointer type [-Wincompatible-pointer-types] r = namespace_flag_from_string_many(eq, &flags); ^ In file included from src/shared/bus-unit-util.c:31:0: src/shared/nsflags.h:41:5: note: expected ‘long unsigned int ’ but argument is of type ‘uint64_t * {aka long long unsigned int }’ int namespace_flag_from_string_many(const char name, unsigned long *ret); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Closes #5312	2017-02-12 00:38:16 -05:00
Evgeny Vereshchagin	b9e2d822d0	shared: convert unsigned long to uint64_t explicitly (#5314 ) Closes #5313	2017-02-12 00:36:34 -05:00
Lennart Poettering	915e6d1676	core: add RootImage= setting for using a specific image file as root directory for a service This is similar to RootDirectory= but mounts the root file system from a block device or loopback file instead of another directory. This reuses the image dissector code now used by nspawn and gpt-auto-discovery.	2017-02-07 12:19:42 +01:00
Lennart Poettering	20b7a0070c	core: actually make "+" prefix in ReadOnlyPaths=, InaccessiblePaths=, ReadWritablePaths= work `5327c910d2` claimed to add support for "+" for prefixing paths with the configured RootDirectory=. But actually it only implemented it in the backend, it did not add support for it to the configuration file parsers. Fix that now.	2017-02-07 11:22:05 +01:00
Lennart Poettering	5d997827e2	core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory= This adds a boolean unit file setting MountAPIVFS=. If set, the three main API VFS mounts will be mounted for the service. This only has an effect on RootDirectory=, which it makes a ton times more useful. (This is basically the /dev + /proc + /sys mounting code posted in the original #4727, but rebased on current git, and with the automatic logic replaced by explicit logic controlled by a unit file setting)	2017-02-07 11:22:05 +01:00
Zbigniew Jędrzejewski-Szmek	c73838280c	Modify mount_propagation_flags_from_string to return a normal int code This means that callers can distiguish an error from flags==0, and don't have to special-case the empty string.	2016-12-17 13:57:04 -05:00
Lennart Poettering	4ea0d7f431	core: make "Restart" service property accessible via the transient API Fixes: #4402	2016-12-14 00:54:13 +01:00
Lennart Poettering	d2d6c096f6	core: add ability to define arbitrary bind mounts for services This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow defining arbitrary bind mounts specific to particular services. This is particularly useful for services with RootDirectory= set as this permits making specific bits of the host directory available to chrooted services. The two new settings follow the concepts nspawn already possess in --bind= and --bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these latter options should probably be renamed to BindPaths= and BindReadOnlyPaths= too). Fixes: #3439	2016-12-14 00:54:10 +01:00
Lennart Poettering	835552511e	core: hook up MountFlags= to the transient unit logic This makes "systemd-run -p MountFlags=shared -t /bin/sh" work, by making MountFlags= to the list of properties that may be accessed transiently.	2016-12-13 21:22:13 +01:00
Jouke Witteveen	7ed0a4c537	bus-util: add protocol error type explanation	2016-11-29 23:19:52 +01:00
Lennart Poettering	289188ca7d	system-run: add support for configuring unit dependencies with --property= Support on the server side has already been in place for quite some time, let's also add support on the client side for this.	2016-11-16 15:03:26 +01:00
Lennart Poettering	c5a97ed132	core: GC redundant device jobs from the run queue In contrast to all other unit types device units when queued just track external state, they cannot effect state changes on their own. Hence unless a client or other job waits for them there's no reason to keep them in the job queue. This adds a concept of GC'ing jobs of this type as soon as no client or other job waits for them anymore. To ensure this works correctly we need to track which clients actually reference a job (i.e. which ones enqueued it). Unfortunately that's pretty nasty to do for direct connections, as sd_bus_track doesn't work for them. For now, work around this, by simply remembering in a boolean that a job was requested by a direct connection, and reset it when we notice the direct connection is gone. This means the GC logic works fine, except that jobs are not immediately removed when direct connections disconnect. In the longer term, a rework of the bus logic should fix this properly. For now this should be good enough, as GC works for fine all cases except this one, and thus is a clear improvement over the previous behaviour. Fixes: #1921	2016-11-16 15:03:26 +01:00
Zbigniew Jędrzejewski-Szmek	c58bd76a6a	tree-wide: make invocations of extract_first_word more uniform (#4627 ) extract_first_words deals fine with the string being NULL, so drop the upfront check for that.	2016-11-11 18:58:41 +01:00
Lennart Poettering	add005357d	core: add new RestrictNamespaces= unit file setting This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.	2016-11-04 07:40:13 -06:00
Djalal Harouni	502d704e5e	core:sandbox: Add ProtectKernelModules= option This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.	2016-10-12 13:31:21 +02:00
Zbigniew Jędrzejewski-Szmek	3ccb886283	Allow block and char classes in DeviceAllow bus properties (#4353 ) Allowed paths are unified betwen the configuration file parses and the bus property checker. The biggest change is that the bus code now allows "block-" and "char-" classes. In addition, path_startswith("/dev") was used in the bus code, and startswith("/dev") was used in the config file code. It seems reasonable to use path_startswith() which allows a slightly broader class of strings. Fixes #3935.	2016-10-12 11:12:11 +02:00
Lennart Poettering	59eeb84ba6	core: add two new service settings ProtectKernelTunables= and ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /proc/sys/fs/cgroup.	2016-09-25 10:18:48 +02:00
Evgeny Vereshchagin	29272c04a7	Merge pull request #3909 from poettering/mount-tool add a new tool for creating transient mount and automount units	2016-08-19 23:33:49 +03:00
Lennart Poettering	00d9ef8560	core: add RemoveIPC= setting This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.	2016-08-19 00:37:25 +02:00
Lennart Poettering	8673cf13c0	bus-util: unify loop around bus_append_unit_property_assignment() This is done exactly the same way a couple of times at various places, let's unify this into one version.	2016-08-18 22:23:31 +02:00
Zbigniew Jędrzejewski-Szmek	5f9a610ad2	Merge pull request #3905 from htejun/cgroup-v2-cpu core: add cgroup CPU controller support on the unified hierarchy (zj: merging not squashing to make it clear against which upstream this patch was developed.)	2016-08-14 18:03:35 -04:00
Tejun Heo	66ebf6c0a1	core: add cgroup CPU controller support on the unified hierarchy Unfortunately, due to the disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation. https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2 such as buffered write control making cgroup v2 essential for a lot of workloads. This commit implements systemd CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it. On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies. v2: - Error in man page corrected. - CPU config application in cgroup_context_apply() refactored. - CPU accounting now works on unified hierarchy.	2016-08-07 09:45:39 -04:00
Zbigniew Jędrzejewski-Szmek	d87a2ef782	Merge pull request #3884 from poettering/private-users	2016-08-06 17:04:45 -04:00
Lennart Poettering	41bf0590cc	util-lib: unify parsing of nice level values This adds parse_nice() that parses a nice level and ensures it is in the right range, via a new nice_is_valid() helper. It then ports over a number of users to this. No functional changes.	2016-08-05 11:18:32 +02:00
David Michael	5124866d73	util-lib: add parse_percent_unbounded() for percentages over 100% (#3886 ) This permits CPUQuota to accept greater values as documented.	2016-08-04 13:09:54 +02:00
Lennart Poettering	d251207d55	core: add new PrivateUsers= option to service execution This setting adds minimal user namespacing support to a service. When set the invoked processes will run in their own user namespace. Only a trivial mapping will be set up: the root user/group is mapped to root, and the user/group of the service will be mapped to itself, everything else is mapped to nobody. If this setting is used the service runs with no capabilities on the host, but configurable capabilities within the service. This setting is particularly useful in conjunction with RootDirectory= as the need to synchronize /etc/passwd and /etc/group between the host and the service OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the user of the service itself. But even outside the RootDirectory= case this setting is useful to substantially reduce the attack surface of a service. Example command to test this: systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh This runs a shell as user "foobar". When typing "ps" only processes owned by "root", by "foobar", and by "nobody" should be visible.	2016-08-03 20:42:04 +02:00
Zbigniew Jędrzejewski-Szmek	dadd6ecfa5	Merge pull request #3728 from poettering/dynamic-users	2016-07-25 16:40:26 -04:00
Lennart Poettering	29206d4619	core: add a concept of "dynamic" user ids, that are allocated as long as a service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999	2016-07-22 15:53:45 +02:00
Lennart Poettering	f7903e8db6	core: rename MemoryLimitByPhysicalMemory transient property to MemoryLimitScale That way, we can neatly keep this in line with the new TasksMaxScale= option. Note that we didn't release a version with MemoryLimitByPhysicalMemory= yet, hence this change should be unproblematic without breaking API.	2016-07-22 15:33:12 +02:00
Lennart Poettering	83f8e80857	core: support percentage specifications on TasksMax= This adds support for a TasksMax=40% syntax for specifying values relative to the system's configured maximum number of processes. This is useful in order to neatly subdivide the available room for tasks within containers.	2016-07-22 15:33:12 +02:00
Alessandro Puccetti	2a624c36e6	doc,core: Read{Write,Only}Paths= and InaccessiblePaths= This patch renames Read{Write,Only}Directories= and InaccessibleDirectories= to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept as aliases but they are not advertised in the documentation. Renamed variables: `read_write_dirs` --> `read_write_paths` `read_only_dirs` --> `read_only_paths` `inaccessible_dirs` --> `inaccessible_paths`	2016-07-19 17:22:02 +02:00
Ian Lee	7351ded5b9	Do not ellipsize cgroups when showing slices in --full mode (#3560 ) Do not ellipsize cgroups when showing slices in --full mode	2016-06-21 16:10:31 +02:00
Lennart Poettering	d58d600efd	systemctl: allow percent-based MemoryLimit= settings via systemctl set-property The unit files already accept relative, percent-based memory limit specification, let's make sure "systemctl set-property" support this too. Since we want the physical memory size of the destination machine to apply we pass the percentage in a new set of properties that only exist for this purpose, and can only be set.	2016-06-14 20:01:45 +02:00
Lennart Poettering	9184ca48ea	util-lib: introduce parse_percent() for parsing percent specifications And port a couple of users over to it.	2016-06-14 19:50:38 +02:00
Topi Miettinen	f3e4363593	core: Restrict mmap and mprotect with PAGE_WRITE\|PAGE_EXEC (#3319 ) (#3379 ) New exec boolean MemoryDenyWriteExecute, when set, installs a seccomp filter to reject mmap(2) with PAGE_WRITE\|PAGE_EXEC and mprotect(2) with PAGE_EXEC.	2016-06-03 17:58:18 +02:00
Tejun Heo	e57c9ce169	core: always use "infinity" for no upper limit instead of "max" (#3417 ) Recently added cgroup unified hierarchy support uses "max" in configurations for no upper limit. While consistent with what the kernel uses for no upper limit, it is inconsistent with what systemd uses for other controllers such as memory or pids. There's no point in introducing another term. Update cgroup unified hierarchy support so that "infinity" is the only term that systemd uses for no upper limit.	2016-06-03 17:49:05 +02:00

1 2

63 commits