Systemd

Commit Graph

Author	SHA1	Message	Date
Zbigniew Jędrzejewski-Szmek	349cc4a507	build-sys: use #if Y instead of #ifdef Y everywhere The advantage is that is the name is mispellt, cpp will warn us. $ git grep -Ee "conf.set$'(HAVE\|ENABLE)_" -l\|xargs sed -r -i "s/conf.set\('(HAVE\|ENABLE)_/conf.set10('\1_/" $ git grep -Ee '#ifn?def (HAVE\|ENABLE)' -l\|xargs sed -r -i 's/#ifdef (HAVE\|ENABLE)/#if \1/; s/#ifndef (HAVE\|ENABLE)/#if ! \1/;' $ git grep -Ee 'if.defined\(HAVE' -l\|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_])$/\1/g' $ git grep -Ee 'if.defined$ENABLE' -l\|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_])$/\1/g' + manual changes to meson.build squash! build-sys: use #if Y instead of #ifdef Y everywhere v2: - fix incorrect setting of HAVE_LIBIDN2	2017-10-04 12:09:29 +02:00
Zbigniew Jędrzejewski-Szmek	ac6e8be66e	core: use strv_isempty to check if supplementary_groups is empty With the previous commit, we know that it will be NULL if empty, but it's safe to always use strv_isempty() in case the code changes in the future.	2017-10-04 11:33:30 +02:00
Lennart Poettering	da50b85af7	core: when looking for a UID to use for a dynamic UID start with the current owner of the StateDirectory= and friends Let's optimize dynamic UID allocation a bit: if a StateDirectory= (or suchlike) is configured, we start our allocation loop from that UID and use it if it currently isn't used otherwise. This is beneficial as it saves us from having to expensively recursively chown() these directories in the typical case (which StateDirectory= does when it notices that the owner of the directory doesn't match the UID picked). With this in place we now have the a three-phase logic for allocating a dynamic UID: a) first, we try to use the owning UID of StateDirectory=, CacheDirectory=, LogDirectory= if that exists and is currently otherwise unused. b) if that didn't work out, we hash the UID from the service name c) if that didn't yield an unused UID either, randomly pick new ones until we find a free one.	2017-10-02 17:41:44 +02:00
Lennart Poettering	6c47cd7d3b	execute: make StateDirectory= and friends compatible with DynamicUser=1 and RootDirectory=/RootImage= Let's clean up the interaction of StateDirectory= (and friends) to DynamicUser=1: instead of creating these directories directly below /var/lib, place them in /var/lib/private instead if DynamicUser=1 is set, making that directory 0700 and owned by root:root. This way, if a dynamic UID is later reused, access to the old run's state directory is prohibited for that user. Then, use file system namespacing inside the service to make /var/lib/private a readable tmpfs, hiding all state directories that are not listed in StateDirectory=, and making access to the actual state directory possible. Mount all directories listed in StateDirectory= to the same places inside the service (which means they'll now be mounted into the tmpfs instance). Finally, add a symlink from the state directory name in /var/lib/ to the one in /var/lib/private, so that both the host and the service can access the path under the same location. Here's an example: let's say a service runs with StateDirectory=foo. When DynamicUser=0 is set, it will get the following setup, and no difference between what the unit and what the host sees: /var/lib/foo (created as directory) Now, if DynamicUser=1 is set, we'll instead get this on the host: /var/lib/private (created as directory with mode 0700, root:root) /var/lib/private/foo (created as directory) /var/lib/foo → private/foo (created as symlink) And from inside the unit: /var/lib/private (a tmpfs mount with mode 0755, root:root) /var/lib/private/foo (bind mounted from the host) /var/lib/foo → private/foo (the same symlink as above) This takes inspiration from how container trees are protected below /var/lib/machines: they generally reuse UIDs/GIDs of the host, but because /var/lib/machines itself is set to 0700 host users cannot access files in the container tree even if the UIDs/GIDs are reused. However, for this commit we add one further trick: inside and outside of the unit /var/lib/private is a different thing: outside it is a plain, inaccessible directory, and inside it is a world-readable tmpfs mount with only the whitelisted subdirs below it, bind mounte din. This means, from the outside the dir acts as an access barrier, but from the inside it does not. And the symlink created in /var/lib/foo itself points across the barrier in both cases, so that root and the unit's user always have access to these dirs without knowing the details of this mounting magic. This logic resolves a major shortcoming of DynamicUser=1 units: previously they couldn't safely store persistant data. With this change they can have their own private state, log and data directories, which they can write to, but which are protected from UID recycling. With this change, if RootDirectory= or RootImage= are used it is ensured that the specified state/log/cache directories are always mounted in from the host. This change of semantics I think is much preferable since this means the root directory/image logic can be used easily for read-only resource bundling (as all writable data resides outside of the image). Note that this is a change of behaviour, but given that we haven't released any systemd version with StateDirectory= and friends implemented this should be a safe change to make (in particular as previously it wasn't clear what would actually happen when used in combination). Moreover, by making this change we can later add a "+" modifier to these setings too working similar to the same modifier in ReadOnlyPaths= and friends, making specified paths relative to the container itself.	2017-10-02 17:41:44 +02:00
Lennart Poettering	72fd17682d	core: usually our enum's _INVALID and _MAX special values are named after the full type In most cases we followed the rule that the special _INVALID and _MAX values we use in our enums use the full type name as prefix (in contrast to regular values that we often make shorter), do so for ExecDirectoryType as well. No functional changes, just a little bit of renaming to make this code more like the rest.	2017-10-02 17:41:43 +02:00
Lennart Poettering	a1164ae380	core: chown() StateDirectory= and friends recursively when starting a service This is particularly useful when used in conjunction with DynamicUser=1, where the UID might change for every invocation, but is useful in other cases too, for example, when these directories are shared between systems where the UID assignments differ slightly.	2017-10-02 17:41:43 +02:00
Lennart Poettering	40a80078d2	execute: let's close glibc syslog channels too Just in case something opened them, let's make sure glibc invalidates them too. Thankfully so far no library opened log channels behind our back, at least as far as I know, hence this is actually a NOP, but let's better be safe than sorry.	2017-09-26 17:52:25 +02:00
Lennart Poettering	12145637e9	execute: normalize logging in execute.c Now that logging can implicitly reopen the log streams when needed we can log errors without any special magic, hence let's normalize things, and log the same way we do everywhere else.	2017-09-26 17:51:22 +02:00
Lennart Poettering	86ffb32560	execute: drop explicit log_open()/log_close() now that it is unnecessary	2017-09-26 17:46:34 +02:00
Lennart Poettering	2c027c62dd	execute: make use of the new logging mode in execute.c	2017-09-26 17:46:34 +02:00
Lennart Poettering	82677ae4c7	execute: downgrade a log message ERR → WARNING, since we proceed ignoring its result	2017-09-26 17:46:33 +02:00
Lennart Poettering	8002fb9747	execute: rework logging in setup_keyring() to include unit info Let's use log_unit_error() instead of log_error() everywhere (and friends).	2017-09-26 17:46:33 +02:00
Lennart Poettering	e6a7ec4b8e	io-util: add new IOVEC_INIT/IOVEC_MAKE macros This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures from a pointer and a size. On top of these IOVEC_INIT_STRING() and IOVEC_MAKE_STRING() are added which take a string and automatically determine the size of the string using strlen(). This patch removes the old IOVEC_SET_STRING() macro, given that IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the old IOVEC_SET_STRING() invocations were two characters shorter than the new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more readable and more generic as it simply resolves to a C99 literal structure initialization. Moreover, we can use very similar syntax now for initializing strings and pointer+size iovec entries. We canalso use the new macros to initialize function parameters on-the-fly or array definitions. And given that we shouldn't have so many ways to do the same stuff, let's just settle on the new macros. (This also converts some code to use _cleanup_ where dynamically allocated strings were using IOVEC_SET_STRING() before, to modernize things a bit)	2017-09-22 15:28:04 +02:00
Lennart Poettering	f1c50becda	core: make sure to log invocation ID of units also when doing structured logging	2017-09-22 15:24:55 +02:00
Jan Synacek	f679ed6116	execute: fix typo in error message (#6881 )	2017-09-21 10:38:52 +02:00
Zbigniew Jędrzejewski-Szmek	61ceaea5c8	Move one space from dbus-execute.c to execute.c The number of spaces is conserved ;)	2017-09-16 08:45:02 +02:00
Zbigniew Jędrzejewski-Szmek	3d7d3cbbda	Merge pull request #6832 from poettering/keyring-mode Add KeyringMode unit property to fix cryptsetup key caching	2017-09-15 21:24:48 +02:00
Lennart Poettering	b1edf4456e	core: add new per-unit setting KeyringMode= for controlling kernel keyring setup Usually, it's a good thing that we isolate the kernel session keyring for the various services and disconnect them from the user keyring. However, in case of the cryptsetup key caching we actually want that multiple instances of the cryptsetup service can share the keys in the root user's user keyring, hence we need to be able to disable this logic for them. This adds KeyringMode=inherit\|private\|shared: inherit: don't do any keyring magic (this is the default in systemd --user) private: a private keyring as before (default in systemd --system) shared: the new setting	2017-09-15 16:53:35 +02:00
Lennart Poettering	0460aa5c08	execute: improve and augment execution log messages Let's generate friendly messages for more cases, and make slight adjustments to the existing messages.	2017-09-15 16:43:06 +02:00
Lennart Poettering	ab2116b140	core: make sure that $JOURNAL_STREAM prefers stderr over stdout information (#6824 ) If two separate log streams are connected to stdout and stderr, let's make sure $JOURNAL_STREAM points to the latter, as that's the preferred log destination, and the environment variable has been created in order to permit services to automatically upgrade from stderr based logging to native journal logging. Also, document this behaviour. Fixes: #6800	2017-09-15 08:26:38 +02:00
Lennart Poettering	00819cc151	core: add new UnsetEnvironment= setting for unit files With this setting we can explicitly unset specific variables for processes of a unit, as last step of assembling the environment block for them. This is useful to fix #6407. While we are at it, greatly expand the documentation on how the environment block for forked off processes is assembled.	2017-09-14 15:17:40 +02:00
Lennart Poettering	21022b9dde	util-lib: wrap personality() to fix up broken glibc error handling (#6766 ) glibc appears to propagate different errors in different ways, let's fix this up, so that our own code doesn't get confused by this. See #6752 + #6737 for details. Fixes: #6755	2017-09-08 17:16:29 +03:00
Lennart Poettering	aac8c0c382	execute: minor ExecOutput handling beautification (#6711 ) Let's clean up the checking for the various ExecOutput values a bit, let's use IN_SET everywhere, and the same concepts for all three bools we pass to dprintf().	2017-09-01 09:04:27 +09:00
Lennart Poettering	e8132d63fe	seccomp: default to something resembling the current personality when locking it Let's lock the personality to the currently set one, if nothing is specifically specified. But do so with a grain of salt, and never default to any exotic personality here, but only PER_LINUX or PER_LINUX32.	2017-08-29 15:56:57 +02:00
Topi Miettinen	78e864e5b3	seccomp: LockPersonality boolean (#6193 ) Add LockPersonality boolean to allow locking down personality(2) system call so that the execution domain can't be changed. This may be useful to improve security because odd emulations may be poorly tested and source of vulnerabilities, while system services shouldn't need any weird personalities.	2017-08-29 15:54:50 +02:00
Yu Watanabe	5ce96b141a	Merge pull request #6582 from poettering/logind-tty various tty path parsing fixes	2017-08-26 22:12:48 +09:00
Lennart Poettering	165a31c0db	core: add two new special ExecStart= character prefixes This patch adds two new special character prefixes to ExecStart= and friends, in addition to the existing "-", "@" and "+": "!" → much like "+", except with a much reduced effect as it only disables the actual setresuid()/setresgid()/setgroups() calls, but leaves all other security features on, including namespace options. This is very useful in combination with RuntimeDirectory= or DynamicUser= and similar option, as a user is still allocated and used for the runtime directory, but the actual UID/GID dropping is left to the daemon process itself. This should make RuntimeDirectory= a lot more useful for daemons which insist on doing their own privilege dropping. "!!" → Similar to "!", but on systems supporting ambient caps this becomes a NOP. This makes it relatively straightforward to write unit files that make use of ambient capabilities to let systemd drop all privs while retaining compatibility with systems that lack ambient caps, where priv dropping is the left to the daemon codes themselves. This is an alternative approach to #6564 and related PRs.	2017-08-10 15:04:32 +02:00
Lennart Poettering	43b1f7092d	execute: needs_{selinux,apparmor,smack} → use_{selinux,apparmor,smack} These booleans simply store whether selinux/apparmor/smack are supposed ot be used, and chache the various mac_xyz_use() calls before we transition into the namespace, hence let's use the same verb for the variables and the functions: "use"	2017-08-10 15:02:50 +02:00
Lennart Poettering	9f6444eb92	execute: make use of IN_SET() where we can	2017-08-10 15:02:50 +02:00
Lennart Poettering	937ccce94c	execute: simplify needs_sandboxing checking Let's merge three if blocks that shall only run when sandboxing is applied into one. Note that this changes behaviour in one corner case: PrivateUsers=1 is now honours both PermissionsStartOnly= and the "+" modifier in ExecStart=, and not just the former, as before. This was an oversight, so let's fix this now, at a point in time the option isn't used much yet.	2017-08-10 15:02:50 +02:00
Lennart Poettering	1703fa41a7	core: rename EXEC_APPLY_PERMISSIONS → EXEC_APPLY_SANDBOXING "Permissions" was a bit of a misnomer, as it suggests that UNIX file permission bits are adjusted, which aren't really changed here. Instead, this is about UNIX credentials such as users or groups, as well as namespacing, hence let's use a more generic term here, without any misleading reference to UNIX file permissions: "sandboxing", which shall refer to all kinds of sandboxing technologies, including UID/GID dropping, selinux relabelling, namespacing, seccomp, and so on.	2017-08-10 15:02:50 +02:00
Lennart Poettering	584b8688d1	execute: also fold the cgroup delegate bit into ExecFlags	2017-08-10 15:02:50 +02:00
Lennart Poettering	ac6479781e	execute: also control the SYSTEMD_NSS_BYPASS_BUS through an ExecFlags field Also, correct the logic while we are at it: the variable is only required for system services, not user services.	2017-08-10 15:02:49 +02:00
Lennart Poettering	c71b2eb77e	core: don't chown() the configuration directory The configuration directory is commonly not owned by a service, but remains root-owned, hence don't change the owner automatically for it.	2017-08-10 15:02:49 +02:00
Lennart Poettering	8679efde21	execute: add one more ExecFlags flag, for controlling unconditional directory chowning Let's decouple the Manager object from the execution logic a bit more here too, and simply pass along the fact whether we should unconditionally chown the runtime/... directories via the ExecFlags field too.	2017-08-10 14:44:58 +02:00
Lennart Poettering	af635cf377	execute: let's decouple execute.c a bit from the unit logic Let's try to decouple the execution engine a bit from the Unit/Manager concept, and hence pass one more flag as part of the ExecParameters flags field.	2017-08-10 14:44:58 +02:00
Lennart Poettering	3ed0cd26ea	execute: replace command flag bools by a flags field This way, we can extend it later on in an easier way, and can pass it along nicely.	2017-08-10 14:44:58 +02:00
Lennart Poettering	a119ec7c82	util-lib: add a new skip_dev_prefix() helper This new helper removes a leading /dev if there is one. We have code doing this all over the place, let's unify this, and correct it while we are at it, by using path_startswith() rather than startswith() to drop the prefix.	2017-08-09 19:01:18 +02:00
Yu Watanabe	07d46372fe	securebits-util: add secure_bits_{from_string,to_string_alloc}()	2017-08-07 23:40:25 +09:00
Yu Watanabe	dd1f5bd0aa	cap-list: add capability_set_{from_string,to_string_alloc}()	2017-08-07 23:25:11 +09:00
Yu Watanabe	837df14040	core: do not ignore returned values	2017-08-06 23:34:55 +09:00
Yu Watanabe	ecfbc84f1c	core: define variables only when they are required Follow-up for `7f18ef0a55`.	2017-08-06 13:08:34 +09:00
Fabio Kung	7f18ef0a55	core: check which MACs to use before a new mount ns is created (#6498 ) /sys is not guaranteed to exist when a new mount namespace is created. It is only mounted under conditions specified by `namespace_info_mount_apivfs`. Checking if the three available MAC LSMs are enabled requires a sysfs mounted at /sys, so the checks are moved to before a new mount ns is created.	2017-08-01 09:15:18 +02:00
Lennart Poettering	c867611e0a	execute: don't pass unit ID in --user mode to journald for stream logging When we create a log stream connection to journald, we pass along the unit ID. With this change we do this only when we run as system instance, not as user instance, to remove the ambiguity whether a user or system unit is specified. The effect of this change is minor: journald ignores the field anyway from clients with UID != 0. This patch hence only fixes the unit attribution for the --user instance of the root user.	2017-07-31 18:01:42 +02:00
Lennart Poettering	92a17af991	execute: make some code shorter Let's simplify some lines to make it shorter.	2017-07-31 18:01:42 +02:00
Lennart Poettering	cad93f2996	core, sd-bus, logind: make use of uid_is_valid() in more places	2017-07-31 18:01:42 +02:00
Lennart Poettering	df0ff12775	tree-wide: make use of getpid_cached() wherever we can This moves pretty much all uses of getpid() over to getpid_raw(). I didn't specifically check whether the optimization is worth it for each replacement, but in order to keep things simple and systematic I switched over everything at once.	2017-07-20 20:27:24 +02:00
Yu Watanabe	3536f49e8f	core: add {State,Cache,Log,Configuration}Directory= (#6384 ) This introduces {State,Cache,Log,Configuration}Directory= those are similar to RuntimeDirectory=. They create the directories under /var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode specified in {State,Cache,Log,Configuration}DirectoryMode=. This also fixes #6391.	2017-07-18 14:34:52 +02:00
Lennart Poettering	688230d3a7	Merge pull request #6354 from walyong/smack_process_label_free core: modify resource leak and missed security context dump	2017-07-17 10:04:12 +02:00
Yu Watanabe	23a7448efa	core: support subdirectories in RuntimeDirectory= option	2017-07-17 16:30:53 +09:00
Yu Watanabe	53f47dfc7b	core: allow preserving contents of RuntimeDirectory= over process restart This introduces RuntimeDirectoryPreserve= option which takes a boolean argument or 'restart'. Closes #6087.	2017-07-17 16:22:25 +09:00
WaLyong Cho	80c21aea11	core: dump also missed security context	2017-07-13 13:12:24 +09:00
WaLyong Cho	5b8e1b7755	core: modify resource leak by SmackProcessLabel=	2017-07-13 13:12:15 +09:00
Lennart Poettering	782c925f7f	Revert "core: link user keyring to session keyring (#6275 )" (#6342 ) This reverts commit `437a85112e`. The outcome of this isn't that clear, let's revert this for now, see discussion on #6286.	2017-07-12 10:00:43 -04:00
Christian Hesse	437a85112e	core: link user keyring to session keyring (#6275 ) Commit `74dd6b515f` (core: run each system service with a fresh session keyring) broke adding keys to user keyring. Added keys could not be accessed with error message: keyctl_read_alloc: Permission denied So link the user keyring to our session keyring.	2017-07-04 09:38:31 +02:00
Lennart Poettering	7f452159b8	core: make IOSchedulingClass= and IOSchedulingPriority= settable for transient units This patch is a bit more complex thant I hoped. In particular the single IOScheduling= property exposed on the bus is split up into IOSchedulingClass= and IOSchedulingPriority= (though compat is retained). Otherwise the asymmetry between setting props and getting them is a bit too nasty. Fixes #5613	2017-06-26 17:43:18 +02:00
Zbigniew Jędrzejewski-Szmek	7e867138f5	Merge pull request #5600 from fbuihuu/make-logind-restartable Make logind restartable.	2017-06-24 18:58:36 -04:00
Franck Bui	4c47affcf1	core: remove the redundancy of 'n_fds' and 'n_storage_fds' in ExecParameters struct 'n_fds' field in the ExecParameters structure was counting the total number of file descriptors to be passed to a unit. This counter also includes the number of passed socket fds which is counted by 'n_socket_fds' already. This patch removes that redundancy by replacing 'n_fds' with 'n_storage_fds'. The new field only counts the fds passed via the storage store mechanism. That way each fd is counted at one place only. Subsequently the patch makes sure to fix code that used 'n_fds' and also wanted to iterate through all of them by explicitly adding 'n_socket_fds' + 'n_storage_fds'. Suggested by Lennart.	2017-06-08 16:21:35 +02:00
Franck Bui	9b1419111a	core: only apply NonBlocking= to fds passed via socket activation Make sure to only apply the O_NONBLOCK flag to the fds passed via socket activation. Previously the flag was also applied to the fds which came from the fd store but this was incorrect since services, after being restarted, expect that these passed fds have their flags unchanged and can be reused as before. The documentation was a bit unclear about this so clarify it.	2017-06-06 22:42:50 +02:00
Zbigniew Jędrzejewski-Szmek	52511fae7b	core: fix warning about unsigned variable (#5935 ) Fixup for `d8c92e8bc7`.	2017-05-11 08:15:28 +02:00
Lennart Poettering	4e168f4606	Merge pull request #5420 from OpenDZ/tixxdz/namespace-fixes-v2 Namespace: RootImage= RootDirectory= and MountAPIVFS fixes	2017-05-09 20:42:32 +02:00
Aggelos Avgerinos	488ab41cb8	execute: Properly log errors considering socket fds (#5910 ) Till now if the params->n_fds was 0, systemd was logging that there were more than one sockets. Thanks @gregoryp and @VFXcode who did the most work debugging this.	2017-05-08 19:09:22 -04:00
Zbigniew Jędrzejewski-Szmek	d8c92e8bc7	execute: filter out "." for ".." in EnvironmentFile= globs too This doesn't really matter much, only in case somebody would use something strange like EnvironmentFile=/etc/something/.* Make sure that "." and ".." is not returned by that glob. This makes all our globbing patterns behave the same.	2017-04-27 13:21:08 -04:00
Djalal Harouni	74e941c022	Merge pull request #5774 from keszybz/printf-annotations Printf annotation improvements	2017-04-23 01:03:42 +02:00
Zbigniew Jędrzejewski-Szmek	ba360bb05c	tree-wide: mark log_struct with _printf_ and fix fallout log_struct takes multiple format strings, each one followed by arguments. The _printf_ annotation is not sufficiently flexible to express this, but we can still annotate the first format string, though not its arguments (because their number is unknown). With the annotation, the places which specified the message id or similar as the first pattern cause a warning from -Wformat-nonliteral. This can be trivially fixed by putting the MESSAGE= first. This change will help find issues where a non-literal is erroneously used as the pattern.	2017-04-21 13:37:04 -04:00
Yu Watanabe	4d8b0f0f7a	core: downgrade error message if command is prefixed with `-` and the command is not found Fixes #5621	2017-04-03 15:38:37 +09:00
Djalal Harouni	9c988f934b	namespace: Apply MountAPIVFS= only when a Root directory is set The MountAPIVFS= documentation says that this options has no effect unless used in conjunction with RootDirectory= or RootImage= ,lets fix this and avoid to create private mount namespaces where it is not needed.	2017-03-05 21:39:43 +01:00
Zbigniew Jędrzejewski-Szmek	643f4706b0	core/execute: add (void) CID #778045.	2017-02-20 16:02:18 -05:00
Zbigniew Jędrzejewski-Szmek	2b0445262a	tree-wide: add SD_ID128_MAKE_STR, remove LOG_MESSAGE_ID Embedding sd_id128_t's in constant strings was rather cumbersome. We had SD_ID128_CONST_STR which returned a const char[], but it had two problems: - it wasn't possible to statically concatanate this array with a normal string - gcc wasn't really able to optimize this, and generated code to perform the "conversion" at runtime. Because of this, even our own code in coredumpctl wasn't using SD_ID128_CONST_STR. Add a new macro to generate a constant string: SD_ID128_MAKE_STR. It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition of the numbers, but in practice it is more convenient to use, and allows gcc to generate smarter code: $ size .libs/systemd{,-logind,-journald}{.old,} text data bss dec hex filename 1265204 149564 4808 1419576 15a938 .libs/systemd.old 1260268 149564 4808 1414640 1595f0 .libs/systemd 246805 13852 209 260866 3fb02 .libs/systemd-logind.old 240973 13852 209 255034 3e43a .libs/systemd-logind 146839 4984 34 151857 25131 .libs/systemd-journald.old 146391 4984 34 151409 24f71 .libs/systemd-journald It is also much easier to check if a certain binary uses a certain MESSAGE_ID: $ strings .libs/systemd.old\|grep MESSAGE_ID MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x $ strings .libs/systemd\|grep MESSAGE_ID MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27 MESSAGE_ID=b07a249cd024414a82dd00cd181378ff MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7 MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f MESSAGE_ID=d34d037fff1847e6ae669a370e694725 MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5 MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7 MESSAGE_ID=39f53479d3a045ac8e11786248231fbf MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d MESSAGE_ID=7b05ebc668384222baa8881179cfda54 MESSAGE_ID=9d1aaa27d60140bd96365438aad20286	2017-02-15 00:45:12 -05:00
Lennart Poettering	6818c54ca6	core: skip ReadOnlyPaths= and other permission-related mounts on PermissionsStartOnly= (#5309 ) ReadOnlyPaths=, ProtectHome=, InaccessiblePaths= and ProtectSystem= are about restricting access and little more, hence they should be disabled if PermissionsStartOnly= is used or ExecStart= lines are prefixed with a "+". Do that. (Note that we will still create namespaces and stuff, since that's about a lot more than just permissions. We'll simply disable the effect of the four options mentioned above, but nothing else mount related.) This also adds a test for this, to ensure this works as intended. No documentation updates, as the documentation are already vague enough to support the new behaviour ("If true, the permission-related execution options…"). We could clarify this further, but I think we might want to extend the switches' behaviour a bit more in future, hence leave it at this for now. Fixes: #5308	2017-02-12 00:44:46 -05:00
Lennart Poettering	376fecf670	execute: set the right exit status for CHDIR vs. CHROOT Fixes: #5125	2017-02-09 13:18:35 +01:00
Lennart Poettering	3b0e5bb524	execute: use prefix_roota() where appropriate	2017-02-09 13:18:35 +01:00
Lennart Poettering	6732edab4e	execute: set working directory to /root if User= is not set, but WorkingDirectory=~ is Or actually, try to to do the right thing depending on what is available: - If we know $HOME from User=, then use that. - If the UID for the service is 0, hardcode that WorkingDirectory=~ means WorkingDirectory=/root - In any other case (which will be the unprivileged --user case), use get_home_dir() to find the $HOME of the user we are running as. - Otherwise fail. Fixes: #5246 #5124	2017-02-09 13:17:58 +01:00
Lennart Poettering	23deef88b9	Revert "core/execute: set HOME, USER also for root users" This reverts commit `8b89628a10`. This broke #5246	2017-02-09 11:43:44 +01:00
Lennart Poettering	915e6d1676	core: add RootImage= setting for using a specific image file as root directory for a service This is similar to RootDirectory= but mounts the root file system from a block device or loopback file instead of another directory. This reuses the image dissector code now used by nspawn and gpt-auto-discovery.	2017-02-07 12:19:42 +01:00
Lennart Poettering	5d997827e2	core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory= This adds a boolean unit file setting MountAPIVFS=. If set, the three main API VFS mounts will be mounted for the service. This only has an effect on RootDirectory=, which it makes a ton times more useful. (This is basically the /dev + /proc + /sys mounting code posted in the original #4727, but rebased on current git, and with the automatic logic replaced by explicit logic controlled by a unit file setting)	2017-02-07 11:22:05 +01:00
Zbigniew Jędrzejewski-Szmek	6a93917df9	core/execute: pass the username to utmp/wtmp database Before previous commit, username would be NULL for root, and set only for other users. So the argument passed to utmp_put_init_process() would be "root" for other users and NULL for root. Seems strange. Instead, always pass the username if available.	2017-02-03 11:49:43 -05:00
Zbigniew Jędrzejewski-Szmek	8b89628a10	core/execute: set HOME, USER also for root users This changes the environment for services running as root from: LANG=C.utf8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin INVOCATION_ID=ffbdec203c69499a9b83199333e31555 JOURNAL_STREAM=8:1614518 to LANG=C.utf8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin HOME=/root LOGNAME=root USER=root SHELL=/bin/sh INVOCATION_ID=15a077963d7b4ca0b82c91dc6519f87c JOURNAL_STREAM=8:1616718 Making the environment special for the root user complicates things unnecessarily. This change simplifies both our logic (by making the setting of the variables unconditional), and should also simplify the logic in services (particularly scripts). Fixes #5124.	2017-02-03 11:49:22 -05:00
Zbigniew Jędrzejewski-Szmek	587ab01b53	core/execute.c: check asprintf return value in the usual fashion This is unlikely to fail, but we cannot rely on asprintf return value on failure, so let's just be correct here. CID #1368227.	2017-01-31 11:31:47 -05:00
Zbigniew Jędrzejewski-Szmek	56fbd56143	core/execute: reformat exec_context_named_iofds() for legibility	2017-01-31 11:23:10 -05:00
Zbigniew Jędrzejewski-Szmek	06ec51d8ef	core/execute: fix strv memleak compile_read_write_paths() returns a normal strv from strv_copy(), and setup_namespace() uses it read-only, so we should use strv_free to deallocate.	2017-01-24 22:26:10 -05:00
Zbigniew Jędrzejewski-Szmek	5b3637b44a	Merge pull request #4991 from poettering/seccomp-fix	2017-01-17 23:10:46 -05:00
Zbigniew Jędrzejewski-Szmek	70dd455c8e	pid1: provide a more detailed error message when execution fails (#5074 ) Fixes #5000.	2017-01-17 22:38:55 -05:00
Lennart Poettering	469830d142	seccomp: rework seccomp code, to improve compat with some archs This substantially reworks the seccomp code, to ensure better compatibility with some architectures, including i386. So far we relied on libseccomp's internal handling of the multiple syscall ABIs supported on Linux. This is problematic however, as it does not define clear semantics if an ABI is not able to support specific seccomp rules we install. This rework hence changes a couple of things: - We no longer use seccomp_rule_add(), but only seccomp_rule_add_exact(), and fail the installation of a filter if the architecture doesn't support it. - We no longer rely on adding multiple syscall architectures to a single filter, but instead install a separate filter for each syscall architecture supported. This way, we can install a strict filter for x86-64, while permitting a less strict filter for i386. - All high-level filter additions are now moved from execute.c to seccomp-util.c, so that we can test them independently of the service execution logic. - Tests have been added for all types of our seccomp filters. - SystemCallFilters= and SystemCallArchitectures= are now implemented in independent filters and installation logic, as they semantically are very much independent of each other. Fixes: #4575	2017-01-17 22:14:27 -05:00
Zbigniew Jędrzejewski-Szmek	4014818d53	Merge pull request #4806 from poettering/keyring-init set up a per-service session kernel keyring, and store the invocation ID in it	2016-12-13 23:24:42 -05:00
Lennart Poettering	d2d6c096f6	core: add ability to define arbitrary bind mounts for services This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow defining arbitrary bind mounts specific to particular services. This is particularly useful for services with RootDirectory= set as this permits making specific bits of the host directory available to chrooted services. The two new settings follow the concepts nspawn already possess in --bind= and --bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these latter options should probably be renamed to BindPaths= and BindReadOnlyPaths= too). Fixes: #3439	2016-12-14 00:54:10 +01:00
Lennart Poettering	b3415f5dae	core: store the invocation ID in the per-service keyring Let's store the invocation ID in the per-service keyring as a root-owned key, with strict access rights. This has the advantage over the environment-based ID passing that it also works from SUID binaries (as they key cannot be overidden by unprivileged code starting them), in contrast to the secure_getenv() based mode. The invocation ID is now passed in three different ways to a service: - As environment variable $INVOCATION_ID. This is easy to use, but may be overriden by unprivileged code (which might be a bad or a good thing), which means it's incompatible with SUID code (see above). - As extended attribute on the service cgroup. This cannot be overriden by unprivileged code, and may be queried safely from "outside" of a service. However, it is incompatible with containers right now, as unprivileged containers generally cannot set xattrs on cgroupfs. - As "invocation_id" key in the kernel keyring. This has the benefit that the key cannot be changed by unprivileged service code, and thus is safe to access from SUID code (see above). But do note that service code can replace the session keyring with a fresh one that lacks the key. However in that case the key will not be owned by root, which is easily detectable. The keyring is also incompatible with containers right now, as it is not properly namespace aware (but this is being worked on), and thus most container managers mask the keyring-related system calls. Ideally we'd only have one way to pass the invocation ID, but the different ways all have limitations. The invocation ID hookup in journald is currently only available on the host but not in containers, due to the mentioned limitations. How to verify the new invocation ID in the keyring: # systemd-run -t /bin/sh Running as unit: run-rd917366c04f847b480d486017f7239d6.service Press ^] three times within 1s to disconnect TTY. # keyctl show Session Keyring 680208392 --alswrv 0 0 keyring: _ses 250926536 ----s-rv 0 0 \_ user: invocation_id # keyctl request user invocation_id 250926536 # keyctl read 250926536 16 bytes of data in key: 9c96317c ac64495a a42b9cd7 4f3ff96b # echo $INVOCATION_ID 9c96317cac64495aa42b9cd74f3ff96b # ^D This creates a new transient service runnint a shell. Then verifies the contents of the keyring, requests the invocation ID key, and reads its payload. For comparison the invocation ID as passed via the environment variable is also displayed.	2016-12-13 20:59:36 +01:00
Lennart Poettering	74dd6b515f	core: run each system service with a fresh session keyring This patch ensures that each system service gets its own session kernel keyring automatically, and implicitly. Without this a keyring is allocated for it on-demand, but is then linked with the user's kernel keyring, which is OK behaviour for logged in users, but not so much for system services. With this change each service gets a session keyring that is specific to the service and ceases to exist when the service is shut down. The session keyring is not linked up with the user keyring and keys hence only search within the session boundaries by default. (This is useful in a later commit to store per-service material in the keyring, for example the invocation ID) (With input from David Howells)	2016-12-13 20:59:10 +01:00
Lennart Poettering	2e6dbc0fcd	Merge pull request #4538 from fbuihuu/confirm-spawn-fixes Confirm spawn fixes/enhancements	2016-11-18 11:08:06 +01:00
Franck Bui	539622bd8c	core: in confirm spawn, suggest 'f' when user selects 'n' choice	2016-11-17 18:23:32 +01:00
Franck Bui	c891efaf8a	core: confirm_spawn: always accept units with same_pgrp set for now For some reasons units remaining in the same process group as PID 1 (same_pgrp=true) fail to acquire the console even if it's not taken by anyone. So always accept for units with same_pgrp set for now.	2016-11-17 18:16:51 +01:00
Franck Bui	63d77c9254	core: include the unit name when notifying that a confirmation question timed out	2016-11-17 18:16:51 +01:00
Franck Bui	b0eb29449e	core: add 'c' in confirmation_spawn to resume the boot process	2016-11-17 18:16:50 +01:00
Franck Bui	56fde33af1	core: add 'j' in confirmation_spawn to list the jobs that are in progress	2016-11-17 18:16:50 +01:00
Franck Bui	dd6f9ac0d0	core: add 'D' in confirmat spawn to show a full dump of the unit to spawn	2016-11-17 18:16:50 +01:00
Franck Bui	eedf223a30	core: add 'i' in confirm spawn to give a short summary of the unit to spawn	2016-11-17 18:16:50 +01:00
Franck Bui	d172b175f6	core: rework the confirmation spawn prompt Previously it was "[Yes, Fail, Skip]" which is pretty misleading because it suggests that the whole word needs to be entered instead of a single char. Also this won't fit well when we'll extend the number of choices. This patch addresses this by changing the choice hint with "[y, f, s – h for help]" so it's now clear that a single letter has to be entered. It also introduces a new choice 'h' which describes all possible choices since a single letter can be not descriptive enough for new users. It also allow to stick with the same hint string regardless of how many choices we will support.	2016-11-17 18:16:50 +01:00
Franck Bui	2bcd3c26fe	core: limit the length of the confirmation question When "confirmation_spawn=1", the confirmation question can look like: Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/run/tmpfiles.d/kmod.conf? [Yes, No, Skip] which is pretty verbose and might not fit in the console width size (which is usually 80 chars) and thus question will be splitted into 2 consecutive lines. However since the question is now refreshed every 2 secs, the reprinted question will overwrite the second line of the previous one... To prevent this, this patch makes sure that the command line won't be longer than 60 chars by ellipsizing it if the command is longer: Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/ru…nf? [Yes, No, View, Skip] A following patch will introduce a new choice that will allow the user to get details on the command to be executed so it will still be possible to see the full command line.	2016-11-17 18:16:50 +01:00
Franck Bui	2bcc330942	core: in confirm_spawn, the meaning of 'n' and 's' choices are confusing Before this patch we had: - "no" which gives "failing execution" but the command is actually assumed as succeed. - "skip" which gives "skipping", but the command is assumed to have failed, which ends up with "Failed to start ..." on the console. Now we have: - "fail" which gives "failing execution" and the command is indeed assumed as failed. - "skip" which gives "skipping execution" and the command is assumed as succeed.	2016-11-17 18:16:49 +01:00
Franck Bui	3b20f877ad	core: rework ask_for_confirmation() Now the reponses are handled by ask_for_confirmation() as well as the report of any errors occuring during the process of retrieving the confirmation response. One benefit of this is that there's no need to open/close the console one more time when reporting error/status messages. The caller now just needs to care about the return values whose meanings are: - don't execute and pretend that the command failed - don't execute and pretend that the command succeeed - positive answer, execute the command Also some slight code reorganization and introduce write_confirm_error() and write_confirm_error_fd(). write_confim_message becomes unneeded.	2016-11-17 18:16:49 +01:00
Franck Bui	7d5ceb6416	core: allow to redirect confirmation messages to a different console It's rather hard to parse the confirmation messages (enabled with systemd.confirm_spawn=true) amongst the status messages and the kernel ones (if enabled). This patch gives the possibility to the user to redirect the confirmation message to a different virtual console, either by giving its name or its path, so those messages are separated from the other ones and easier to read.	2016-11-17 18:16:16 +01:00
Djalal Harouni	c92e8afebd	core: improve the logic that implies no new privileges The no_new_privileged_set variable is not used any more since commit `9b232d3241` that fixed another thing. So remove it. Also no need to check if we are under user manager, remove that part too.	2016-11-15 15:04:31 +01:00
Djalal Harouni	af964954c6	core: on DynamicUser= make sure that protecting sensitive paths is enforced (#4596 ) This adds a variable that is always set to false to make sure that protect paths inside sandbox are always enforced and not ignored. The only case when it is set to true is on DynamicUser=no and RootDirectory=/chroot is set. This allows users to use more our sandbox features inside RootDirectory= The only exception is ProtectSystem=full\|strict and when DynamicUser=yes is implied. Currently RootDirectory= is not fully compatible with these due to two reasons: * /chroot/usr\|etc has to be present on ProtectSystem=full * /chroot// has to be a mount point on ProtectSystem=strict.	2016-11-08 21:57:32 -05:00
Zbigniew Jędrzejewski-Szmek	d85a0f8028	Merge pull request #4536 from poettering/seccomp-namespaces core: add new RestrictNamespaces= unit file setting Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.	2016-11-08 19:54:21 -05:00
Zbigniew Jędrzejewski-Szmek	f97b34a629	Rename formats-util.h to format-util.h We don't have plural in the name of any other -util files and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.	2016-11-07 10:15:08 -05:00
Lennart Poettering	add005357d	core: add new RestrictNamespaces= unit file setting This new setting permits restricting whether namespaces may be created and managed by processes started by a unit. It installs a seccomp filter blocking certain invocations of unshare(), clone() and setns(). RestrictNamespaces=no is the default, and does not restrict namespaces in any way. RestrictNamespaces=yes takes away the ability to create or manage any kind of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces so that only mount and IPC namespaces may be created/managed, but no other kind of namespaces. This setting should be improve security quite a bit as in particular user namespacing was a major source of CVEs in the kernel in the past, and is accessible to unprivileged processes. With this setting the entire attack surface may be removed for system services that do not make use of namespaces.	2016-11-04 07:40:13 -06:00
Lennart Poettering	493fd52f1a	Merge pull request #4510 from keszybz/tree-wide-cleanups Tree wide cleanups	2016-11-03 13:59:20 -06:00
Djalal Harouni	cdc5d5c55e	core: intialize user aux groups and SupplementaryGroups= when DynamicUser= is set Make sure that when DynamicUser= is set that we intialize the user supplementary groups and that we also support SupplementaryGroups= Fixes: https://github.com/systemd/systemd/issues/4539 Thanks Evgeny Vereshchagin (@evverx)	2016-11-03 08:36:53 +01:00
Lennart Poettering	32e134c19f	Merge pull request #4483 from poettering/exec-order more seccomp fixes, and change of order of selinux/aa/smack and seccomp application on exec	2016-11-02 16:09:59 -06:00
Djalal Harouni	bbeea27117	core: initialize groups list before checking SupplementaryGroups= of a unit (#4533 ) Always initialize the supplementary groups of caller before checking the unit SupplementaryGroups= option. Fixes https://github.com/systemd/systemd/issues/4531	2016-11-02 10:51:35 -06:00
Lennart Poettering	5cd9cd3537	execute: apply seccomp filters after changing selinux/aa/smack contexts Seccomp is generally an unprivileged operation, changing security contexts is most likely associated with some form of policy. Moreover, while seccomp may influence our own flow of code quite a bit (much more than the security context change) make sure to apply the seccomp filters immediately before executing the binary to invoke. This also moves enforcement of NNP after the security context change, so that NNP cannot affect it anymore. (However, the security policy now has to permit the NNP change). This change has a good chance of breaking current SELinux/AA/SMACK setups, because the policy might not expect this change of behaviour. However, it's technically the better choice I think and should hence be applied. Fixes: #3993	2016-11-02 08:55:00 -06:00
Djalal Harouni	fa1f250d6f	Merge pull request #4495 from topimiettinen/block-shmat-exec seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute	2016-10-28 15:41:07 +02:00
Djalal Harouni	59e856c7d3	core: make unit argument const for apply seccomp functions	2016-10-27 09:40:22 +02:00
Djalal Harouni	50b3dfb9d6	core: lets apply working directory just after mount namespaces This makes applying groups after applying the working directory, this may allow some flexibility but at same it is not a big deal since we don't execute or do anything between applying working directory and droping groups.	2016-10-27 09:40:21 +02:00
Djalal Harouni	2b3c1b9e9d	core: get the working directory value inside apply_working_directory() Improve apply_working_directory() and lets get the current working directory inside of it.	2016-10-27 09:40:21 +02:00
Djalal Harouni	e7f1e7c6e2	core: move apply working directory code into its own apply_working_directory()	2016-10-27 09:40:21 +02:00
Djalal Harouni	93c6bb51b6	core: move the code that setups namespaces on its own function	2016-10-27 09:40:21 +02:00
Topi Miettinen	d2ffa389b8	seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute shmat(..., SHM_EXEC) can be used to create writable and executable memory, so let's block it when MemoryDenyWriteExecute is set.	2016-10-26 18:59:14 +03:00
Lennart Poettering	a3be2849b2	seccomp: add new helper call seccomp_load_filter_set() This allows us to unify most of the code in apply_protect_kernel_modules() and apply_private_devices().	2016-10-24 17:32:50 +02:00
Lennart Poettering	8d7b0c8fd7	seccomp: add new seccomp_init_conservative() helper This adds a new seccomp_init_conservative() helper call that is mostly just a wrapper around seccomp_init(), but turns off NNP and adds in all secondary archs, for best compatibility with everything else. Pretty much all of our code used the very same constructs for these three steps, hence unifying this in one small function makes things a lot shorter. This also changes incorrect usage of the "scmp_filter_ctx" type at various places. libseccomp defines it as typedef to "void", i.e. it is a pointer type (pretty poor choice already!) that casts implicitly to and from all other pointer types (even poorer choice: you defined a confusing type now, and don't even gain any bit of type safety through it...). A lot of the code assumed the type would refer to a structure, and hence aded additional "" here and there. Remove that.	2016-10-24 17:32:50 +02:00
Lennart Poettering	25a8d8a0cb	core: rework apply_protect_kernel_modules() to use seccomp_add_syscall_filter_set() Let's simplify this call, by making use of the new infrastructure. This is actually more in line with Djalal's original patch but instead of search the filter set in the array by its name we can now use the set index and jump directly to it.	2016-10-24 17:32:50 +02:00
Lennart Poettering	8130926d32	core: rework syscall filter set handling A variety of fixes: - rename the SystemCallFilterSet structure to SyscallFilterSet. So far the main instance of it (the syscall_filter_sets[] array) used to abbreviate "SystemCall" as "Syscall". Let's stick to one of the two syntaxes, and not mix and match too wildly. Let's pick the shorter name in this case, as it is sufficiently well established to not confuse hackers reading this. - Export explicit indexes into the syscall_filter_sets[] array via an enum. This way, code that wants to make use of a specific filter set, can index it directly via the enum, instead of having to search for it. This makes apply_private_devices() in particular a lot simpler. - Provide two new helper calls in seccomp-util.c: syscall_filter_set_find() to find a set by its name, seccomp_add_syscall_filter_set() to add a set to a seccomp object. - Update SystemCallFilter= parser to use extract_first_word(). Let's work on deprecating FOREACH_WORD_QUOTED(). - Simplify apply_private_devices() using this functionality	2016-10-24 17:32:50 +02:00
Lennart Poettering	e0f3720e39	core: move misplaced comment to the right place	2016-10-24 17:29:51 +02:00
Lennart Poettering	f673b62df6	core: simplify skip_seccomp_unavailable() a bit Let's prefer early-exit over deep-indented if blocks. Not behavioural change.	2016-10-24 17:29:51 +02:00
Djalal Harouni	366ddd252e	core: do not assert when sysconf(_SC_NGROUPS_MAX) fails (#4466 ) Remove the assert and check the return code of sysconf(_SC_NGROUPS_MAX). _SC_NGROUPS_MAX maps to NGROUPS_MAX which is defined in <limits.h> to 65536 these days. The value is a sysctl read-only /proc/sys/kernel/ngroups_max and the kernel assumes that it is always positive otherwise things may break. Follow this and support only positive values for all other case return either -errno or -EOPNOTSUPP. Now if there are systems that want to re-write NGROUPS_MAX then they should not pass SupplementaryGroups= in units even if it is empty, in this case nothing fails and we just ignore supplementary groups. However if SupplementaryGroups= is passed even if it is empty we have to assume that there will be groups manipulation from our side or the kernel and since the kernel always assumes that NGROUPS_MAX is positive, then follow that and support only positive values.	2016-10-24 13:13:06 +02:00
Djalal Harouni	8b6903ad4d	core: lets move the setup of working directory before group enforce This is minor but lets try to split and move bit by bit cgroups and portable environment setup before applying the security context.	2016-10-23 23:27:20 +02:00
Djalal Harouni	4d885bd326	core: first lookup and cache creds then apply them after namespace setup This fixes: https://github.com/systemd/systemd/issues/4357 Let's lookup and cache creds then apply them. We also switch from getgroups() to getgrouplist().	2016-10-23 23:24:14 +02:00
Zbigniew Jędrzejewski-Szmek	605405c6cc	tree-wide: drop NULL sentinel from strjoin This makes strjoin and strjoina more similar and avoids the useless final argument. spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/.c) git grep -e '\bstrjoin\b.NULL' -l\|xargs sed -i -r 's/strjoin$(.*), NULL$/strjoin(\1)/' This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.	2016-10-23 11:43:27 -04:00
Luca Bruno	52c239d770	core/exec: add a named-descriptor option ("fd") for streams (#4179 ) This commit adds a `fd` option to `StandardInput=`, `StandardOutput=` and `StandardError=` properties in order to connect standard streams to externally named descriptors provided by some socket units. This option looks for a file descriptor named as the corresponding stream. Custom names can be specified, separated by a colon. If multiple name-matches exist, the first matching fd will be used.	2016-10-17 20:05:49 -04:00
Zbigniew Jędrzejewski-Szmek	6b430fdb7c	tree-wide: use mfree more	2016-10-16 23:35:39 -04:00
Djalal Harouni	e66a2f658b	core: make sure to dump ProtectKernelModules= value	2016-10-12 14:12:17 +02:00
Djalal Harouni	4084e8fc89	core: check protect_kernel_modules and private_devices in order to setup NNP	2016-10-12 14:12:07 +02:00
Djalal Harouni	c575770b75	core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules= Lets go further and make /lib/modules/ inaccessible for services that do not have business with modules, this is a minor improvment but it may help on setups with custom modules and they are limited... in regard of kernel auto-load feature. This change introduce NameSpaceInfo struct which we may embed later inside ExecContext but for now lets just reduce the argument number to setup_namespace() and merge ProtectKernelModules feature.	2016-10-12 14:11:16 +02:00
Djalal Harouni	502d704e5e	core:sandbox: Add ProtectKernelModules= option This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.	2016-10-12 13:31:21 +02:00
Lennart Poettering	e0d2adfde6	core: chown() any TTY used for stdin, not just when StandardInput=tty is used (#4347 ) If stdin is supplied as an fd for transient units (using the StandardInputFileDescriptor pseudo-property for transient units), then we should also fix up the TTY ownership, not just when we opened the TTY ourselves. This simply drops the explicit is_terminal_input()-based check. Note that chown_terminal() internally does a much more appropriate isatty()-based check anyway, hence we can drop this without replacement. Fixes: #4260	2016-10-11 14:07:22 -04:00
Lennart Poettering	4b58153dd2	core: add "invocation ID" concept to service manager This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.	2016-10-07 20:14:38 +02:00
Lennart Poettering	97f0e76f18	user-util: rework maybe_setgroups() a bit Let's drop the caching of the setgroups /proc field for now. While there's a strict regime in place when it changes states, let's better not cache it since we cannot really be sure we follow that regime correctly. More importantly however, this is not in performance sensitive code, and there's no indication the cache is really beneficial, hence let's drop the caching and make things a bit simpler. Also, while we are at it, rework the error handling a bit, and always return negative errno-style error codes, following our usual coding style. This has the benefit that we can sensible hanld read_one_line_file() errors, without having to updat errno explicitly.	2016-10-06 19:04:10 +02:00
Lennart Poettering	2d6fce8d7c	core: leave PAM stub process around with GIDs updated In the process execution code of PID 1, before `096424d123` the GID settings where changed before invoking PAM, and the UID settings after. After the change both changes are made after the PAM session hooks are run. When invoking PAM we fork once, and leave a stub process around which will invoke the PAM session end hooks when the session goes away. This code previously was dropping the remaining privs (which were precisely the UID). Fix this code to do this correctly again, by really dropping them else (i.e. the GID as well). While we are at it, also fix error logging of this code. Fixes: #4238	2016-10-06 19:04:10 +02:00
Giuseppe Scrivano	36d854780c	core: do not fail in a container if we can't use setgroups It might be blocked through /proc/PID/setgroups	2016-10-06 11:49:00 +02:00
Stefan Schweter	629ff674ac	tree-wide: remove consecutive duplicate words in comments	2016-10-04 17:06:25 +02:00
Djalal Harouni	8f81a5f61b	core: Use @raw-io syscall group to filter I/O syscalls when PrivateDevices= is set Instead of having a local syscall list, use the @raw-io group which contains the same set of syscalls to filter.	2016-09-25 12:52:27 +02:00
Lennart Poettering	cefc33aee2	execute: move SMACK setup code into its own function While we are at it, move PAM code #ifdeffery into setup_pam() to simplify the main execution logic a bit.	2016-09-25 10:52:57 +02:00
Lennart Poettering	ba128bb809	execute: filter low-level I/O syscalls if PrivateDevices= is set If device access is restricted via PrivateDevices=, let's also block the various low-level I/O syscalls at the same time, so that we know that the minimal set of devices in our virtualized /dev are really everything the unit can access.	2016-09-25 10:52:57 +02:00
Lennart Poettering	096424d123	execute: drop group priviliges only after setting up namespace If PrivateDevices=yes is set, the namespace code creates device nodes in /dev that should be owned by the host's root, hence let's make sure we set up the namespace before dropping group privileges.	2016-09-25 10:42:18 +02:00
Lennart Poettering	3fbe8dbe41	execute: if RuntimeDirectory= is set, it should be writable Implicitly make all dirs set with RuntimeDirectory= writable, as the concept otherwise makes no sense.	2016-09-25 10:19:05 +02:00
Lennart Poettering	be39ccf3a0	execute: move suppression of HOME=/ and SHELL=/bin/nologin into user-util.c This adds a new call get_user_creds_clean(), which is just like get_user_creds() but returns NULL in the home/shell parameters if they contain no useful information. This code previously lived in execute.c, but by generalizing this we can reuse it in run.c.	2016-09-25 10:18:57 +02:00
Lennart Poettering	07689d5d2c	execute: split out creation of runtime dirs into its own functions	2016-09-25 10:18:54 +02:00
Lennart Poettering	59eeb84ba6	core: add two new service settings ProtectKernelTunables= and ProtectControlGroups= If enabled, these will block write access to /sys, /proc/sys and /proc/sys/fs/cgroup.	2016-09-25 10:18:48 +02:00
Lennart Poettering	72246c2a65	core: enforce seccomp for secondary archs too, for all rules Let's make sure that all our rules apply to all archs the local kernel supports.	2016-09-25 10:18:44 +02:00
Felipe Sateler	d347d9029c	seccomp: also detect if seccomp filtering is enabled In https://github.com/systemd/systemd/pull/4004 , a runtime detection method for seccomp was added. However, it does not detect the case where CONFIG_SECCOMP=y but CONFIG_SECCOMP_FILTER=n. This is possible if the architecture does not support filtering yet. Add a check for that case too. While at it, change get_proc_field usage to use PR_GET_SECCOMP prctl, as that should save a few system calls and (unnecessary) allocations. Previously, reading of /proc/self/stat was done as recommended by prctl(2) as safer. However, given that we need to do the prctl call anyway, lets skip opening, reading and parsing the file. Code for checking inspired by https://outflux.net/teach-seccomp/autodetect.html	2016-09-06 20:25:49 -03:00

1 2 3 4 5 ...

508 Commits