Systemd

Author	SHA1	Message	Date
Lennart Poettering	da6bc6ed05	execute: no need to check for NULL when function right after does anyway	2019-03-07 16:55:19 +01:00
Lennart Poettering	2fa3742d96	execute: make things a tiny bit shorter	2019-03-07 16:53:45 +01:00
Lennart Poettering	8e8009dc50	execute: use structured initialization	2019-03-07 16:53:45 +01:00
Anita Zhang	7ca69792e5	core: add ':' prefix to ExecXYZ= skip env var substitution	2019-02-20 17:58:14 +01:00
Lennart Poettering	eb5149ba74	Merge pull request #11682 from topimiettinen/private-utsname core: ProtectHostname feature	2019-02-20 14:12:15 +01:00
Topi Miettinen	aecd5ac621	core: ProtectHostname= feature Let services use a private UTS namespace. In addition, a seccomp filter is installed on set{host,domain}name and a ro bind mounts on /proc/sys/kernel/{host,domain}name.	2019-02-20 10:50:44 +02:00
Franck Bui	37ed15d7ed	namespace: make MountFlags=shared work again Since commit `0722b35934`, the root mountpoint is unconditionally turned to slave which breaks units that are using explicitly MountFlags=shared (and no other options that would implicitly require a slave root mountpoint). Here is a test case: $ systemctl cat test-shared-mount-flag.service # /etc/systemd/system/test-shared-mount-flag.service [Service] Type=simple ExecStartPre=/usr/bin/mkdir -p /mnt/tmp ExecStart=/bin/sh -c "/usr/bin/mount -t tmpfs -o size=10M none /mnt/tmp && sleep infinity" ExecStop=-/bin/sh -c "/usr/bin/umount /mnt/tmp" MountFlags=shared $ systemctl start test-shared-mount-flag.service $ findmnt /mnt/tmp $ Mount on /mnt/tmp is not visible from the host although MountFlags=shared was used. This patch fixes that and turns the root mountpoint to slave when it's really required.	2019-02-20 06:20:40 +09:00
Taro Yamada	6cff72eb0a	Add a warning about the difference in permissions between existing directories and unit settings. To follow the intent of `30c81ce`, this change does not execute chmod() and just adds warnings.	2019-01-29 09:52:21 +09:00
Taro Yamada	ff9e7900c0	Revert "Fixes #11128" This reverts commit `0bf05f0122` because it breaks `30c81ce`. Please see #11540.	2019-01-27 13:43:30 +09:00
Taro Yamada	0bf05f0122	Fixes #11128	2019-01-22 11:14:51 +01:00
Lennart Poettering	ce932d2d33	execute: make sure to call into PAM after initializing resource limits We want that pam_limits takes precedence over our settings, after all. Fixes: #11386	2019-01-18 17:31:36 +01:00
Zbigniew Jędrzejewski-Szmek	3042bbebdd	tree-wide: use c99 static for array size declarations https://hamberg.no/erlend/posts/2013-02-18-static-array-indices.html This only works with clang, unfortunately gcc doesn't seem to implement the check (tested with gcc-8.2.1-5.fc29.x86_64). Simulated error: [2/3] Compiling C object 'systemd-nspawn@exe/src_nspawn_nspawn.c.o'. ../src/nspawn/nspawn.c:3179:45: warning: array argument is too small; contains 15 elements, callee requires at least 16 [-Warray-bounds] candidate = (uid_t) siphash24(arg_machine, strlen(arg_machine), hash_key); ^ ~~~~~~~~ ../src/basic/siphash24.h:24:64: note: callee declares array parameter as static here uint64_t siphash24(const void *in, size_t inlen, const uint8_t k[static 16]); ^~~~~~~~~~~~	2019-01-04 12:37:25 +01:00
Chris Down	4e1dfa45e9	cgroup: s/cgroups? ?v?([0-9])/cgroup v\1/gI Nitpicky, but we've used a lot of random spacings and names in the past, but we're trying to be completely consistent on "cgroup vN" now. Generated by `fd -0 | xargs -0 -n1 sed -ri --follow-symlinks 's/cgroups? ?v?([0-9])/cgroup v\1/gI'`. I manually ignored places where it's not appropriate to replace (eg. "cgroup2" fstype and in src/shared/linux).	2019-01-03 11:32:40 +09:00
Michal Sekletar	4c70a4a748	core: do cgroup migration first and only then connect to journald Fixes #11162	2018-12-17 19:22:30 +01:00
Alexey Bogdanenko	8f9f3cb724	core: fix KeyringMode for user services KeyringMode option is useful for user services. Also, documentation for the option suggests that the option applies to user services. However, setting the option to any of its allowed values has no effect. This commit fixes that and removes EXEC_NEW_KEYRING flag. The flag is no longer necessary: instead of checking if the flag is set we can check if keyring_mode is not equal to EXEC_KEYRING_INHERIT.	2018-12-17 16:56:36 +01:00
Yu Watanabe	3843e8260c	missing: rename securebits.h to missing_securebits.h	2018-12-04 07:49:24 +01:00
Lennart Poettering	686d13b9f2	util-lib: split out env file parsing code into env-file.c It's quite complex, let's split this out. No code changes, just some file rearranging.	2018-12-02 13:22:29 +01:00
Lennart Poettering	4917894417	Merge pull request #10944 from poettering/redirect-file-fix StandardOutput=file: fixes	2018-11-27 13:18:26 +01:00
Lennart Poettering	41fc585a7a	core: be more careful when inheriting stdout fds to stderr We need to compare the fd name/file name if we inherit an fd from stdout to stderr. Let's do that. Fixes: #10875	2018-11-27 10:06:51 +01:00
Lennart Poettering	78f93209fc	core: when Delegate=yes is set for a unit, run ExecStartPre= and friends in a subcgroup of the unit Otherwise we might conflict with the "no-processes-in-inner-cgroup" rule of cgroupsv2. Consider nspawn starting up and initializing its cgroup hierarchy with "supervisor/" and "payload/" as subcgroup, with itself moved into the former and the payload into the latter. Now, if an ExecStartPre= is run right after it cannot be placed in the main cgroup, because that is now in inner cgroup with populated children. Hence, let's run these helpers in another sub-cgroup .control/ below it. This is somewhat ugly since it weakens the clear separation of ownership, but given that this is an explicit contract, and double opt-in should be acceptable. Fixes: #10482	2018-11-26 18:43:23 +01:00
Lennart Poettering	aa8fbc74e3	fileio: drop "newline" parameter for env file parsers Now that we don't (mis-)use the env file parser to parse kernel command lines there's no need anymore to override the used newline character set. Let's hence drop the argument and just use "\n\r" always. This nicely simplifies our code.	2018-11-14 17:01:54 +01:00
Yu Watanabe	b9c04eafb8	core: introduce exec_params_clear() Follow-up for `1ad6e8b302`. Fixes #10677.	2018-11-08 09:36:37 +01:00
Joerg Behrmann	56ef8db9f5	core: apply WorkingDirectory after enforce_user If WorkingDirectory is on NFS, root might only have the privileges of nobody and the chdir to the WorkingDirectory might fail, even if the user running the service would have the proper privileges to chdir to that directory. Fixes #10568	2018-10-31 12:07:24 +01:00
Lennart Poettering	6897dfe85a	core: add free_and_replace() at one more place	2018-10-26 19:49:15 +02:00
Lennart Poettering	2194547e3b	execute: if we fail to do namespacing, explain why we refuse to continue in a debug message	2018-10-24 17:08:12 +02:00
Evgeny Vereshchagin	2ac1ff68f2	core: stop ignoring errors in connect_logger_as When journald reaches the maximum number of active streams, it, basically, starts to decline new connections. On the client side it can be detected by getting EPIPE and, if the writing process isn't lucky enough, getting SIGPIPE soon afterwards. systemd has always ignored EPIPE, which makes it very hard to keep track of services losing logs. This patch should make it easier to detect such services by just staring at the logs carefully. In case anyone is interested, the following one-liner run as any user can be used to paralyze all the stream logging on a machine: for i in {1..4096}; do systemd-cat -t HEY-$i & done	2018-10-19 10:32:21 +02:00
Anita Zhang	90fc172e19	core: implement per unit journal rate limiting Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for services. If provided, these values get passed to the journald client context, and those values are used in the rate limiting function in the journal over the journald.conf values. Part of #10230	2018-10-18 09:56:20 +02:00
Lennart Poettering	7d853ca6bc	execute: shorten things a bit	2018-10-17 21:18:09 +02:00
Lennart Poettering	15a3e96f92	tree-wide: port various users over to sockaddr_un_set_path() CID 1396140 CID 1396141	2018-10-15 19:40:51 +02:00
Lennart Poettering	ee8d493cbd	Merge pull request #10158 from keszybz/seccomp-log-tightening Seccomp log tightening	2018-09-26 15:56:32 +02:00
Lennart Poettering	7c428bb5d5	Merge pull request #10059 from yuwata/env-exec-directory core: introduce $RUNTIME_DIRECTORY= or friends	2018-09-25 12:34:30 +02:00
Yu Watanabe	6c9c51e5e2	fs-util: make symlink_idempotent() optionally create relative link	2018-09-24 18:52:53 +03:00
Zbigniew Jędrzejewski-Szmek	b54f36c604	seccomp: reduce logging about failure to add syscall to seccomp Our logs are full of: Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldstat() / -10037, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call get_thread_area() / -10076, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call set_thread_area() / -10079, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldfstat() / -10034, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldolduname() / -10036, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldlstat() / -10035, ignoring: Numerical argument out of domain Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call waitpid() / -10073, ignoring: Numerical argument out of domain ... This is pointless and makes debug logs hard to read. Let's keep the logs in test code, but disable it in nspawn and pid1. This is done through a function parameter because those functions operate recursively and it's not possible to make the caller to log meaningfully. There should be no functional change, except the skipped debug logs.	2018-09-24 17:21:09 +02:00
Yu Watanabe	aca835ed2e	core/execute: do not use the negative errno when setup_namespace() returns -ENOANO Without this, log shows meaningless error message 'No anode', e.g., === Failed to unshare the mount namespace: Operation not permitted foo.service: Failed to set up mount namespacing: No anode foo.service: Failed at step NAMESPACE spawning /usr/bin/test: No anode === Follow-up for `1beab8b0d0`.	2018-09-18 14:31:09 +09:00
Yu Watanabe	fb2042dd55	core: add new environment variable $RUNTIME_DIRECTORY= or friends The variable is generated from RuntimeDirectory= or friends. If multiple directories are set, then they are concatenated with the separator ':'.	2018-09-13 17:02:58 +09:00
Yu Watanabe	7c1cb6f198	core: add one more assert()	2018-09-13 17:02:58 +09:00
Yu Watanabe	76a9460d44	core: fix assert() about number of built environment variables Follow-up for `4b58153dd2` and `fd63e712b2`.	2018-09-13 17:02:58 +09:00
Yu Watanabe	52e4d62550	Merge pull request #9852 from poettering/namespace-errno namespace: be more careful when handling namespacing failures	2018-08-22 11:16:29 +09:00
Lennart Poettering	1beab8b0d0	namespace: be more careful when handling namespacing failures gracefully This makes two changes to the namespacing code: 1. We'll only gracefully skip service namespacing on access failure if exclusively sandboxing options were selected, and not mount-related options that result in a very different view of the world. For example, ignoring RootDirectory=, RootImage= or Bind= is really problematic, but ReadOnlyPaths= is just a weaker sandbox. 2. The namespacing code will now return a clearly recognizable error code when it cannot enforce its namespacing, so that we cannot confuse EPERM errors from mount() with those from unshare(). Only the errors from the first unshare() are now taken as hint to gracefully disable namespacing. Fixes: #9844 #9835	2018-08-21 20:00:33 +02:00
Zbigniew Jędrzejewski-Szmek	7692fed98b	Merge pull request #9783 from poettering/get-user-creds-flags beef up get_user_creds() a bit and other improvements	2018-08-21 10:09:33 +02:00
Lennart Poettering	fafff8f1ff	user-util: rework get_user_creds() Let's fold get_user_creds_clean() into get_user_creds(), and introduce a flags argument for it to select "clean" behaviour. This flags parameter also learns two other new flags: - USER_CREDS_SYNTHESIZE_FALLBACK: in this mode the user records for root/nobody are only synthesized as fallback. Normally, the synthesized records take precedence over what is in the user database. With this flag set this is reversed, and the user database takes precedence, and the synthesized records are only used if they are missing there. This flag should be set in cases where doing NSS is deemed safe, and where there's interest in knowing the correct shell, for example if the admin changed root's shell to zsh or suchlike. - USER_CREDS_ALLOW_MISSING: if set, and a UID/GID is specified by numeric value, and there's no user/group record for it accept it anyway. This allows us to fix #9767 This then also ports all users to set the most appropriate flags. Fixes: #9767 [zj: remove one isempty() call]	2018-08-20 15:58:21 +02:00
Lennart Poettering	3cd24c1aa9	core: when setting up PAM, try to get tty of STDIN_FILENO if not set explicitly When stdin/stdout/stderr is initialized from an fd, let's read the tty name of it if we can, and pass that to PAM. This makes sure that "machinectl shell" sessions have proper TTY fields initialized that "loginctl" then shows.	2018-08-20 12:28:17 +02:00
Yu Watanabe	4c3a2b84d8	core/execute: fix dump format for Limit*= Fixes #9846.	2018-08-10 11:59:16 +02:00
Yu Watanabe	7e8d494b33	core: use memcpy_safe() Fixes #9738.	2018-08-08 17:11:43 +09:00
Zbigniew Jędrzejewski-Szmek	5b316330be	Merge pull request #9624 from poettering/service-state-flush flush out ExecStatus structures when a new service cycle begins	2018-08-02 09:50:39 +02:00
Zbigniew Jędrzejewski-Szmek	54fe2ce1b9	Merge pull request #9504 from poettering/nss-deadlock some nss deadlock love	2018-07-26 10:16:25 +02:00
Lennart Poettering	5686391b00	core: introduce new Type=exec service type Users are often surprised that "systemd-run" command lines like "systemd-run -p User=idontexist /bin/true" will return successfully, even though the logs show that the process couldn't be invoked, as the user "idontexist" doesn't exist. This is because Type=simple will only wait until fork() succeeded before returning start-up success. This patch adds a new service type Type=exec, which is very similar to Type=simple, but waits until the child process completed the execve() before returning success. It uses a pipe that has O_CLOEXEC set for this logic, so that the kernel automatically sends POLLHUP on it when the execve() succeeded but leaves the pipe open if not. This means PID 1 waits exactly until the execve() succeeded in the child, and not longer and not shorter, which is the desired functionality. Making use of this new functionality, the command line "systemd-run -p User=idontexist -p Type=exec /bin/true" will now fail, as expected.	2018-07-25 22:48:11 +02:00
Lennart Poettering	25b583d7ff	core: swap order of "n_storage_fds" and "n_socket_fds" parameters When processing fd lists to pass to activated programs we always place the socket activation fds first, and the storage fds last. Irritatingly in almost all calls the "n_storage_fds" parameter (i.e. the number of storage fds to pass) came first so far, and the "n_socket_fds" parameter second. Let's clean this up, and specify the number of fds in the order the fds themselves are passed. (Also, let's fix one more case where "unsigned" was used to size an array, while we should use "size_t" instead.)	2018-07-25 22:48:11 +02:00
Lennart Poettering	6a1d4d9fa6	core: properly reset all ExecStatus structures when entering a new unit cycle Whenever a unit is started fresh we should flush out any runtime data from the previous cycle. We are pretty good at that already, but what so far we missed was the ExecStart=/ExecStop=/… command exit status data. Let's fix that, and properly flush out that stuff too. Consider this service: [Service] ExecStart=/bin/sleep infinity ExecStop=/bin/false When this service is started, then stopped and then started again "systemctl status" would show the ExecStop= results of the previous run along with the ExecStart= results of the current one, which is very confusing. With this patch this is corrected: the data is kept right until the moment the new service cycle starts, and then flushed out. Hence "systemctl status" in that case will only show the ExecStart= data, but no ExecStop= data, like it should be. This should fix part of the confusion of #9588	2018-07-23 13:36:47 +02:00
Lennart Poettering	ee39ca20c6	core: drop "argv" field from ExecParameter structure We always initialize it from the same field in ExecCommand anyway, hence there's no point in passing it separately to exec_spawn(), after all we already pass the ExecCommand structure itself anyway. No change in behaviour.	2018-07-23 13:36:47 +02:00
Lennart Poettering	2ed26ed065	execute: use structure initialization when filling in exec status	2018-07-23 13:36:47 +02:00
Lennart Poettering	d521916d0f	pid1: tell PAM/NSS modules why we are calling them	2018-07-20 16:57:35 +02:00
Zsolt Dollenstein	566b7d23eb	Add support for opening files for appending Addresses part of #8983	2018-07-20 03:54:22 -07:00
Chris Lamb	3fe910794b	Correct a number of trivial typos.	2018-06-18 22:44:44 +02:00
Lennart Poettering	0c69794138	tree-wide: remove Lennart's copyright lines These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.	2018-06-14 10:20:20 +02:00
Lennart Poettering	818bf54632	tree-wide: drop 'This file is part of systemd' blurb This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.	2018-06-14 10:20:20 +02:00
Lennart Poettering	228af36fff	core: add new PrivateMounts= unit setting This new setting is supposed to be useful in most cases where "MountFlags=slave" is currently used, i.e. as an explicit way to run a service in its own mount namespace and decouple propagation from all mounts of the new mount namespace towards the host. The effect of MountFlags=slave and PrivateMounts=yes is mostly the same, as both cause a CLONE_NEWNS namespace to be opened, and both will result in all mounts within it to be mounted MS_SLAVE. The difference is mostly on the conceptual/philosophical level: configuring the propagation mode is nothing people should have to think about, in particular as the matter is not precisely easy to grok. Moreover, MountFlags= allows configuration of "private" and "shared" modes which don't really make much sense to use in real-life and are quite confusing. In particular MountFlags=private means mounts made on the host stay pinned for good by the service which is particularly nasty for removable media mount. And MountFlags=shared is in most ways a NOP when used alone... The main technical difference between setting only MountFlags=slave or only PrivateMounts=yes in a unit file is that the former remounts all mounts to MS_SLAVE and leaves them there, while the latter remounts them to MS_SHARED again right after. The latter is generally a nicer approach, since it disables propagation, while MS_SHARED is afterwards in effect, which is really nice as that means further namespacing down the tree will get MS_SHARED logic by default and we unify how applications see our mounts as we always pass them as MS_SHARED regardless whether any mount namespacing is used or not. The effect of PrivateMounts=yes was implied already by all the other mount namespacing options. With this new option we add an explicit knob for it, to request it without any other option used as well. See: #4393	2018-06-12 16:12:10 +02:00
Zbigniew Jędrzejewski-Szmek	a1230ff972	basic/log: add the log_struct terminator to macro This way all callers do not need to specify it. Exhaustively tested by running test-log under valgrind ;)	2018-06-04 13:46:03 +02:00
Yu Watanabe	37c56f89d2	core: setup mount namespace when RootDirectory= and RuntimeDirectory= or friends are set The directories specified by RuntimeDirectory= or friends are created on host. So, it is necessary to bind-mount them on root directory.	2018-05-25 17:33:03 +09:00
Yu Watanabe	5609f6888b	core: make StateDirectory= or friends work with DynamicUser= and RootDirectory=/RootImage= The symbolic links to private directories specified by StateDirectory= or its friends are created on the host. So, when DynamicUser= and RootDirectory=/RootImage= are set, then the executed process cannot access the private directory. This makes the private directories be mounted on the non-private place when both DynamicUser= and RootDirectory=/RootImage= are set. Fixes #8965.	2018-05-25 17:25:17 +09:00
Lennart Poettering	cdc0f9be92	Merge pull request #8817 from yuwata/cleanup-nsflags core: allow to specify RestrictNamespaces= multiple times	2018-05-24 16:49:13 +02:00
Yu Watanabe	fdff1da299	core: chown RuntimeDirectory= if DynamicUser= is set When DynamicUser= is set, then RuntimeDirectory= should be always chowned, as the service unit may enable RuntimeDirectoryPreserve=, and the uid or gid may have changed from the last run. This also makes it easier to migrate the service to use DynamicUser=.	2018-05-22 22:26:22 +09:00
Lennart Poettering	9f8168eb23	process-util: add new helper call for adjusting the OOM score And let's make use of it in execute.c	2018-05-17 20:47:21 +02:00
Lennart Poettering	34a5df58da	rlimit-util: introduce setrlimit_closest_all() This new call applies all configured resource limits in one.	2018-05-17 20:40:04 +02:00
Lennart Poettering	31ce987c2b	rlimit-util: add a common destructor call for arrays of struct rlimit	2018-05-17 20:36:52 +02:00
Lennart Poettering	6550c24c7f	rlimit-util: rework rlimit_{from|to}_string() to work without "Limit" prefix Let's make the call more generic, so that we can also easily use it for parsing "RLIMIT_xyz" style constants.	2018-05-17 20:36:52 +02:00
Felipe Sateler	57b7a260c2	core: undo the dependency inversion between unit.h and all unit types	2018-05-15 14:24:34 -04:00
Yu Watanabe	130d3d22e9	tree-wide: use strv_free_and_replace() macro	2018-05-10 00:57:34 +09:00
Yu Watanabe	aa9d574de9	load-fragment: allow to specify RestrictNamespaces= multiple times If multiple RestrictNamespaces= settings are set, then merge the settings. This also drops supporting "~yes" and "~no".	2018-05-05 11:07:37 +09:00
Yu Watanabe	86c2a9f1c2	nsflags: drop namespace_flag_{from,to}_string() This also drops namespace_flag_to_string_many_with_check(), and renames namespace_flag_{from,to}_string_many() to namespace_flags_{from,to}_string().	2018-05-05 11:07:37 +09:00
Yu Watanabe	b5a33299b0	core: disable namespace sandboxing for '+' prefixed lines Fixes #8842.	2018-05-01 13:44:06 +09:00
Lennart Poettering	da6053d0a7	tree-wide: be more careful with the type of array sizes Previously we were a bit sloppy with the index and size types of arrays, we'd regularly use unsigned. While I don't think this ever resulted in real issues I think we should be more careful there and follow a stricter regime: unless there's a strong reason not to use size_t for array sizes and indexes, size_t it should be. Any allocations we do ultimately will use size_t anyway, and converting forth and back between unsigned and size_t will always be a source of problems. Note that on 32bit machines "unsigned" and "size_t" are equivalent, and on 64bit machines our arrays shouldn't grow that large anyway, and if they do we have a problem, however that kind of overly large allocation we have protections for usually, but for overflows we do not have that so much, hence let's add it. So yeah, it's a story of the current code being already "good enough", but I think some extra type hygiene is better. This patch tries to be comprehensive, but it probably isn't and I missed a few cases. But I guess we can cover that later as we notice it. Among smaller fixes, this changes: 1. strv_length()' return type becomes size_t 2. the unit file changes array size becomes size_t 3. DNS answer and query array sizes become size_t Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745	2018-04-27 14:29:06 +02:00
Lennart Poettering	5d13a15b1d	tree-wide: drop spurious newlines (#8764) Double newlines (i.e. one empty line) are great to structure code. But let's avoid triple newlines (i.e. two empty lines), quadruple newlines, quintuple newlines, …, that's just spurious whitespace. It's an easy way to drop 121 lines of code, and keeps the coding style of our sources a bit tighter.	2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Yu Watanabe	1cc6c93a95	tree-wide: use TAKE_PTR() and TAKE_FD() macros	2018-04-05 14:26:26 +09:00
Dimitri John Ledkov	e64c2d0b5f	core: use setreuid/setregid trick to create session keyring with right ownership (#8447) Re-use the hacks used to link user keyring, when creating the session keyring. This way changing ownership of the keyring is not required, and thus invocation_id can be correctly created in restricted environments. Creating invocation_id with root permissions works and linking it into session keyring works, as at that point session keyring is possessed. Simple way to validate this is with following commands: $ journalctl -f & $ sudo systemd-run --uid 1000 /bin/sh -c 'keyctl describe @s; keyctl list @s; keyctl read `keyctl search @s user invocation_id`' which now works in LXD containers as well as on the host. Fixes: https://github.com/systemd/systemd/issues/7655	2018-03-27 12:58:10 +02:00
Lennart Poettering	959071cac2	Merge pull request #8552 from keszybz/test-improvements Test and diagnostics improvements	2018-03-23 15:26:54 +01:00
Zbigniew Jędrzejewski-Szmek	37c1d5e97d	tree-wide: warn when a directory path already exists but has bad mode/owner/type When we are attempting to create directory somewhere in the bowels of /var/lib and get an error that it already exists, it can be quite hard to diagnose what is wrong (especially for a user who is not aware that the directory must have the specified owner, and permissions not looser than what was requested). Let's print a warning in most cases. A warning is appropriate, because such state is usually a sign of borked installation and needs to be resolved by the administrator. $ build/test-fs-util Path "/tmp/test-readlink_and_make_absolute" already exists and is not a directory, refusing. (or) Directory "/tmp/test-readlink_and_make_absolute" already exists, but has mode 0775 that is too permissive (0755 was requested), refusing. (or) Directory "/tmp/test-readlink_and_make_absolute" already exists, but is owned by 1001:1000 (1000:1000 was requested), refusing. Assertion 'mkdir_safe(tempdir, 0755, getuid(), getgid(), MKDIR_WARN_MODE) >= 0' failed at ../src/test/test-fs-util.c:320, function test_readlink_and_make_absolute(). Aborting. No functional change except for the new log lines.	2018-03-23 10:26:38 +01:00
Lennart Poettering	ae2a15bc14	macro: introduce TAKE_PTR() macro This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)	2018-03-22 20:21:42 +01:00
Zbigniew Jędrzejewski-Szmek	d50b5839b0	basic/mkdir: convert bool flag to enum In preparation for subsequent changes...	2018-03-22 15:57:56 +01:00
Lennart Poettering	2b33ab0957	tree-wide: port various places over to use new rearrange_stdio()	2018-03-02 11:42:10 +01:00
Zbigniew Jędrzejewski-Szmek	30c81ce2ce	pid1: when creating service directories, don't chown existing files (#8181) This partially reverts `3536f49e8f` and `3536f49e8f`. When the user is dynamic, and we are setting up state, cache, or logs dirs, behaviour is unchanged, we always do a recursive chown. This is necessary because the user number might change between invocations. But when setting up a directory for non-dynamic user, or a runtime directory for a dynamic user, do any ownership or mode changes only when the directory is initially created. Nothing says that the files under those directories have to be all recursively owned by our user. This restores behaviour before `3536f49e8f`, so modifications to the state of the runtime directory persist between ExecStartPre's and ExecStart's, and even longer in case the directory is persistent. I think it _would_ be a nice property if setting a user would automatically propagate to ownership of any Runtime/Logs/Cache directories. But this is incompatible with another nice property, namely preserving changes to those directories made by an admin, and with allowing change of ownership of files in those directories by the service (e.g. to allow other users to access them). Of the two, I think the second property is more important. Also, it's backwards compatible. https://bugzilla.redhat.com/show_bug.cgi?id=1508495 There is no need to chmod a directory we just created, so move that step up into a branch. After that, 'effective' is only used once, so get rid of it too.	2018-02-22 11:30:59 +01:00
Yu Watanabe	2abd4e388a	core: add new setting TemporaryFileSystem= This introduces a new setting TemporaryFileSystem=. This is useful to hide files not relevant to the processes invoked by unit, while necessary files or directories can be still accessed by combining with Bind{,ReadOnly}Paths=.	2018-02-21 09:17:52 +09:00
Yu Watanabe	4ca763a902	core/namespace: make '-' prefix in Bind{,ReadOnly}Paths= work Each path in `Bind{ReadOnly}Paths=` accept '-' prefix. However, the prefix is completely ignored. This makes it work as expected.	2018-02-21 09:07:56 +09:00
Yu Watanabe	8e06d57ccb	core/execute: clear bind_mounts	2018-02-21 09:05:37 +09:00
Yu Watanabe	a635a7aec6	core/execute: simplify compile_bind_mounts() It is not necessary to re-assign error code.	2018-02-21 09:05:35 +09:00
Lennart Poettering	7b91264852	terminal-util: make resolve_dev_console() less weird Let's normalize the behaviour: return a negative errno style error code, and return the resolved string directly as argument.	2018-02-14 17:30:37 +01:00
Lennart Poettering	8854d79504	terminal-util: rework acquire_terminal() This modernizes acquire_terminal() in a couple of ways: 1. The three boolean arguments are replaced by a flags parameter, that should be more descriptive in what it does. 2. We now properly handle inotify queue overruns 3. We use _cleanup_ for closing the fds now, to shorten the code quite a bit. Behaviour should not be altered by this.	2018-02-13 21:24:37 +01:00
Yu Watanabe	34cf6c4340	core/execute: make arguments constant if possible Also make functions static if possible.	2018-02-06 16:00:50 +09:00
Yu Watanabe	e8a565cb66	core: make ExecRuntime be manager managed object Before this, each ExecRuntime object is owned by a unit. However, it may be shared with other units which enable JoinsNamespaceOf=. Thus, by the serialization/deserialization process, its sharing information, more specifically, reference counter is lost, and causes issue #7790. This makes ExecRuntime objects be managed by manager, and changes the serialization/deserialization process. Fixes #7790.	2018-02-06 16:00:34 +09:00
Zbigniew Jędrzejewski-Szmek	5035800495	Merge pull request #7763 from yuwata/fix-7761 Revert "core/execute: RuntimeDirectory= or friends requires mount namespace"	2018-01-05 12:38:29 +01:00
Lennart Poettering	2e87a1fde9	tree-wide: make use of wait_for_terminate_and_check() at various places Using wait_for_terminate_and_check() instead of wait_for_terminate() lets us simplify, shorten and unify the return value checking and logging of waitid(). Hence, let's use it all over the place.	2018-01-04 13:27:27 +01:00
Yu Watanabe	4657abb5d4	execute: make "runtime" argument const in exec_needs_mount_namespace() The argument can be const, so let's make it so.	2018-01-04 00:48:18 +09:00
Yu Watanabe	b43ee82fc1	core: RuntimeDirectory= does not request new mount namespace Now RuntimeDirectory= does not create 'private' directory. Thus, it is not necessary to request new mount namespace. Follow-up for `8092a48cc1`.	2018-01-04 00:26:35 +09:00
Yu Watanabe	42b1d8e0f5	Revert "core/execute: RuntimeDirectory= or friends requires mount namespace" This reverts commit `652bb2637a`. Fixes #7761.	2018-01-04 00:26:11 +09:00
Lennart Poettering	4c253ed1ca	tree-wide: introduce new safe_fork() helper and port everything over This adds a new safe_fork() wrapper around fork() and makes use of it everywhere. The new wrapper does a couple of things we previously did manually and separately in a safer, more correct and automatic way: 1. Optionally resets signal handlers/mask in the child 2. Sets a name on all processes we fork off right after forking off (and the patch assigns useful names for all processes we fork off now, following a systematic naming scheme: always enclosed in () – in order to indicate that these are not proper, exec()ed processes, but only forked off children, and if the process is long-running with only our own code, without execve()'ing something else, it gets an "sd-" prefix.) 3. Optionally closes all file descriptors in the child 4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe way so that the parent dying before this happens is handled safely. 5. Optionally reopens the logs 6. Optionally connects stdin/stdout/stderr to /dev/null 7. Debug logs about the forked off processes.	2017-12-25 11:48:21 +01:00
Yu Watanabe	586290017d	tree-wide: use !strv_isempty() instead of strv_length() > 0	2017-12-19 10:43:57 +09:00
Lennart Poettering	f1d34068ef	tree-wide: add DEBUG_LOGGING macro that checks whether debug logging is on (#7645) This makes things a bit easier to read I think, and also makes sure we always use the _unlikely_ wrapper around it, which so far we used sometimes and other times we didn't. Let's clean that up.	2017-12-15 11:09:00 +01:00
Lennart Poettering	234519ae6d	tree-wide: drop a few == NULL and != NULL comparisons Our CODING_STYLE suggests not comparing with NULL, but relying on C's downgrade-to-bool feature for that. Fix up some code to match these guidelines. (This is not comprehensive, the coccinelle output for this is unfortunately kinda borked)	2017-12-11 16:05:40 +01:00
Yu Watanabe	da681e1bd2	tree-wide: use cpu_set_mfree()	2017-12-06 10:32:38 +09:00
Yu Watanabe	7f59dd3566	execute: define the variable mac_selinux_context_net only when built with SELinux	2017-12-05 14:08:09 +09:00
Yu Watanabe	92b423b9b4	execute: define setup_smack() only if SMACK is enabled This suppresses the following warning ``` execute.c:2149:12: warning: ‘setup_smack’ defined but not used [-Wunused-function] static int setup_smack( ^~~~~~~~~~~ ```	2017-12-05 14:04:15 +09:00
Lennart Poettering	949befd3f0	core: support upgrading from DynamicUser=0 to DynamicUser=1 for unit directories (#7507) This makes sure we migrate /var/lib/<foo> if it exists to /var/lib/private/<foo> if DynamicUser=1 is set. This is useful to allow turning on DynamicUser= on services that previously didn't use it, and we can deal with this, and migrate the relevant directories as necessary. Note that "downgrading" from DynamicUser=1 back to DynamicUser=0 works too. However in that case we simply continue to use /var/lib/private/<foo>, which works because /var/lib/<foo> is a symlink there after all.	2017-11-30 11:52:39 +01:00
Lennart Poettering	62b9bb2661	cgroup-util: merge cg_set_tasks_access() and cg_set_group_access() into one We never use these functions separately, hence don't bother splitting them into two. Also, simplify things a bit, and maintain tables for the attribute files to chown. Let's also update those tables a bit, and include the new "cgroup.threads" file in it, that needs to be delegated too, according to the documentation.	2017-11-25 17:08:21 +01:00
jobol	37ac2744cc	core/exec: Restore SmackProcessLabel setting (#7378) Smack LSM needs the capability CAP_MAC_ADMIN to allow setting of the current Smack exec label. Consequently, dropping capabilities must be done after changing the current exec label. This is only related to Smack LSM. But for clarity and regularity, all setting of security context is moved before dropping capabilities. See Issue 7108	2017-11-21 12:01:13 +01:00
Lennart Poettering	0133d5553a	Merge pull request #7198 from poettering/stdin-stdout Add StandardInput=data, StandardInput=file:... and more	2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	befc4a800e	core: add exec_context_dump() support for fd: and file: stdio settings This was missing for using fdnames as stdio, let's add support for fdnames as well as file paths in one go.	2017-11-17 11:13:44 +01:00
Lennart Poettering	2038c3f584	core: add support for StandardInputFile= and friends These new settings permit specifying arbitrary paths as stdin/stdout/stderr locations. We try to open/create them as necessary. Some special magic is applied: 1) if the same path is specified for both input and output/stderr, we'll open it only once O_RDWR, and duplicate the fd instead. 2) If an AF_UNIX socket path is specified, we'll connect() to it, rather than open() it. This allows invoking systemd services with stdin/stdout/stderr connected to arbitrary foreign service sockets. Fixes: #3991	2017-11-17 11:13:44 +01:00
Lennart Poettering	e75a9ed176	execute: some extra asserts In some cases we checked for fd validity already explicitly, let's do this for all our fds.	2017-11-17 11:13:44 +01:00
Lennart Poettering	5073ff6bec	core: fold property_get_input_fdname() and property_get_output_fdname() into one property_get_output_fdname() already had two different control flows for stdout and stderr, it might as well handle stdin too, thus shortening our code a bit.	2017-11-17 11:13:44 +01:00
Lennart Poettering	3a274a218d	execute: fix type of open_terminal_as() flags parameter It's the flags parameter we propagate here, not the mode parameter, hence let's name it properly, and use the right type.	2017-11-17 11:13:44 +01:00
Lennart Poettering	08f3be7a38	core: add two new unit file settings: StandardInputData= + StandardInputText= Both permit configuring data to pass through STDIN to an invoked process. StandardInputText= accepts a line of text (possibly with embedded C-style escapes as well as unit specifiers), which is appended to the buffer to pass as stdin, followed by a single newline. StandardInputData= is similar, but accepts arbitrary base64 encoded data, and will not resolve specifiers or C-style escapes, nor append newlines. This may be used to pass input/configuration data to services, directly in-line from unit files, either in a cooked or in a more raw format.	2017-11-17 11:13:44 +01:00
Lennart Poettering	1fb0682e78	execute: check whether we are actually on a TTY before doing TIOCSCTTY Given that Linux assigns the same ioctl numbers to multiple subsystems, we should be careful when invoking ioctls, so that we don't end up calling something we wouldn't want to call.	2017-11-17 11:13:44 +01:00
Lennart Poettering	046a82c1b2	fd-util: add new helper move_fd() and make use of it We are using the same pattern at various places: call dup2() on an fd, and close the old fd, usually in combination with some O_CLOEXEC fiddling. Let's add a little helper for this, and port a few obvious cases over.	2017-11-17 11:13:44 +01:00
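The dup2()-then-close pattern named in the `046a82c1b2` message can be illustrated with a minimal sketch. The name `move_fd_sketch`, its signature and its error convention are assumptions made for this illustration, not systemd's actual helper.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the helper described above, not systemd's code.
 * Moves 'from' to the fd number 'to', closes 'from', and sets or clears
 * FD_CLOEXEC on the result. Returns 'to' on success, -errno on failure. */
static int move_fd_sketch(int from, int to, int cloexec) {
        int flags;

        assert(from >= 0);
        assert(to >= 0);

        if (from != to) {
                /* dup2() replaces 'to'; the duplicate starts without FD_CLOEXEC */
                if (dup2(from, to) < 0)
                        return -errno;
                close(from);
        }

        /* The "O_CLOEXEC fiddling": adjust the close-on-exec flag explicitly */
        flags = fcntl(to, F_GETFD);
        if (flags < 0)
                return -errno;

        flags = cloexec ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
        if (fcntl(to, F_SETFD, flags) < 0)
                return -errno;

        return to;
}
```

A caller would typically use such a helper to place a freshly opened fd on a fixed number such as STDIN_FILENO without leaking the original descriptor.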
Lennart Poettering	d3070fbdf6	core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one existing unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and instead just use an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441	2017-11-16 12:40:17 +01:00
Yu Watanabe	3df90f24cc	core: allow to specify errno number in SystemCallErrorNumber=	2017-11-11 21:54:24 +09:00
Yu Watanabe	8cfa775f4f	core: add support to specify errno in SystemCallFilter= This makes each system call in the SystemCallFilter= blacklist optionally take an errno name or number after a colon. The errno takes precedence over the one given by SystemCallErrorNumber=. C.f. #7173. Closes #7169.	2017-11-11 21:54:12 +09:00
Yu Watanabe	8092a48cc1	core/execute: do not create RuntimeDirectory= under private/ sub-directory RuntimeDirectory= is often used for sharing files or sockets with other services. So, if creating them under private/ sub-directory, we cannot set DynamicUser= to service units which want to share something through RuntimeDirectory=. This makes the directories given by RuntimeDirectory= be created under /run/ even if DynamicUser= is set. Fixes #7260.	2017-11-08 15:50:58 +09:00
Yu Watanabe	652bb2637a	core/execute: RuntimeDirectory= or friends requires mount namespace Since #6940, RuntimeDirectory= or their friends imply BindPaths=. So, if at least one of them is set, a mount namespace is required.	2017-11-08 15:48:51 +09:00
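The "system call, colon, errno name or number" syntax described in `8cfa775f4f` could be split roughly as follows. This is only a sketch: the function names, the three-entry errno table, the accepted numeric range and the use of -1 for "no per-call errno" are all assumptions for illustration, not the parser systemd uses.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Tiny, hypothetical name table; a real parser would know every errno name. */
static const struct { const char *name; int value; } errno_names[] = {
        { "EPERM", EPERM }, { "EACCES", EACCES }, { "ENOSYS", ENOSYS },
};

/* Accepts an errno name or a decimal number; returns -EINVAL otherwise. */
static int parse_errno_sketch(const char *s) {
        if (isdigit((unsigned char) *s)) {
                char *end;
                long v = strtol(s, &end, 10);
                /* The accepted numeric range is an assumption of this sketch */
                return (*end == '\0' && v > 0 && v <= 4095) ? (int) v : -EINVAL;
        }
        for (size_t i = 0; i < sizeof(errno_names) / sizeof(errno_names[0]); i++)
                if (strcmp(s, errno_names[i].name) == 0)
                        return errno_names[i].value;
        return -EINVAL;
}

/* Splits "name[:errno]". On success, *error holds the per-call errno, or -1
 * when none was given (a caller would then fall back to the value configured
 * with SystemCallErrorNumber=). */
static int parse_filter_entry(const char *in, char *name, size_t name_size, int *error) {
        const char *colon = strchr(in, ':');
        size_t n = colon ? (size_t) (colon - in) : strlen(in);
        int e = -1;

        if (n == 0 || n >= name_size)
                return -EINVAL;
        if (colon) {
                e = parse_errno_sketch(colon + 1);
                if (e < 0)
                        return e;
        }
        memcpy(name, in, n);
        name[n] = '\0';
        *error = e;
        return 0;
}
```

With this sketch, an entry such as "open:EACCES" yields the name "open" and the errno EACCES, while a bare "mount" leaves the errno unset so the unit-wide default applies.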
Yu Watanabe	7bcef4efe6	core: remove compile_read_write_paths() From `6c47cd7d3b`, RuntimeDirectory= and their friends also imply BindPaths=. Thus, implying ReadWritePaths= is meaningless.	2017-11-08 15:07:22 +09:00
Zbigniew Jędrzejewski-Szmek	895265ad7d	Merge pull request #7059 from yuwata/dynamic-user-7013 dynamic-user: permit the case static uid and gid are different	2017-10-18 08:37:12 +02:00
Yu Watanabe	e2b0cc3415	core: fix invalid error message The error message corresponding to EILSEQ is "Invalid or incomplete multibyte or wide character", and is not suitable in this case. So, let's show a custom error message when the function dynamic_creds_realize() returns -EILSEQ.	2017-10-18 08:57:59 +09:00
Yu Watanabe	709dbeac14	core: cleanup for enforce_groups() (#7064) SupplementaryGroups= is preprocessed in get_supplementary_groups(). So, it is not necessary to input ExecContext to enforce_groups().	2017-10-12 08:10:25 +02:00
Yu Watanabe	a8cabc612b	core: fix segfault in compile_bind_mounts() when BindPaths= or BindReadOnlyPaths= is set This fixes a bug introduced by `6c47cd7d3b`. Fixes #7055.	2017-10-11 12:28:22 +09:00
Zbigniew Jędrzejewski-Szmek	7081228acd	Merge pull request #7045 from poettering/namespace-casing some super-trivial fixes to namespace.c	2017-10-10 21:50:17 +02:00
Lennart Poettering	b74023db06	Merge pull request #7003 from yuwata/enable-dynamic-user timesyncd, journal-upload: Enable DynamicUser=	2017-10-10 10:05:43 +02:00
Lennart Poettering	bb0ff3fb1b	namespace: change NameSpace → Namespace We generally use the casing "Namespace" for the word, and that's visible in a number of user-facing interfaces, including "RestrictNamespace=" or "JoinsNamespaceOf=". Let's make sure to use the same casing internally too. As discussed in #7024	2017-10-10 09:51:58 +02:00
Michal Sekletar	6e2d7c4f13	namespace: fall back gracefully when kernel doesn't support network namespaces (#7024)	2017-10-10 09:46:13 +02:00
Yu Watanabe	c31ad02403	mkdir: introduce follow_symlink flag to mkdir_safe{,_label}()	2017-10-06 16:03:33 +09:00
Lennart Poettering	4aa1d31c89	Merge pull request #6974 from keszybz/clean-up-defines Clean up define definitions	2017-10-04 19:25:30 +02:00
Lennart Poettering	5ad90fe376	Merge pull request #6985 from yuwata/empty load-fragment: do not create empty array	2017-10-04 17:54:35 +02:00
Yu Watanabe	4c70109600	tree-wide: use IN_SET macro (#6977)	2017-10-04 16:01:32 +02:00
Zbigniew Jędrzejewski-Szmek	f9fa32f09c	build-sys: s/HAVE_SMACK/ENABLE_SMACK/ Same justification as for HAVE_UTMP.	2017-10-04 12:09:50 +02:00
Zbigniew Jędrzejewski-Szmek	349cc4a507	build-sys: use #if Y instead of #ifdef Y everywhere The advantage is that if the name is misspelt, cpp will warn us. $ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/" $ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;' $ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g' $ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g' + manual changes to meson.build squash! build-sys: use #if Y instead of #ifdef Y everywhere v2: - fix incorrect setting of HAVE_LIBIDN2	2017-10-04 12:09:29 +02:00
Zbigniew Jędrzejewski-Szmek	ac6e8be66e	core: use strv_isempty to check if supplementary_groups is empty With the previous commit, we know that it will be NULL if empty, but it's safe to always use strv_isempty() in case the code changes in the future.	2017-10-04 11:33:30 +02:00
Lennart Poettering	da50b85af7	core: when looking for a UID to use for a dynamic UID start with the current owner of the StateDirectory= and friends Let's optimize dynamic UID allocation a bit: if a StateDirectory= (or suchlike) is configured, we start our allocation loop from that UID and use it if it currently isn't used otherwise. This is beneficial as it saves us from having to expensively recursively chown() these directories in the typical case (which StateDirectory= does when it notices that the owner of the directory doesn't match the UID picked). With this in place we now have a three-phase logic for allocating a dynamic UID: a) first, we try to use the owning UID of StateDirectory=, CacheDirectory=, LogDirectory= if that exists and is currently otherwise unused. b) if that didn't work out, we hash the UID from the service name c) if that didn't yield an unused UID either, randomly pick new ones until we find a free one.	2017-10-02 17:41:44 +02:00
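The three phases a) to c) listed in `da50b85af7` can be sketched as a single selection function. Everything concrete here is an assumption for illustration: the UID range, the FNV-1a hash standing in for whatever hash is really used, the array of "used" UIDs standing in for real user database lookups, and the bounded retry count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Range assumed for this sketch; the commit message does not state one. */
#define SKETCH_UID_MIN ((uid_t) 61184)
#define SKETCH_UID_MAX ((uid_t) 65519)
#define SKETCH_UID_INVALID ((uid_t) -1)

/* Stand-in for "is this UID currently in use?" */
static bool uid_is_used(uid_t uid, const uid_t *used, size_t n_used) {
        for (size_t i = 0; i < n_used; i++)
                if (used[i] == uid)
                        return true;
        return false;
}

/* FNV-1a, used here only as a placeholder hash of the service name */
static uid_t uid_from_name(const char *name) {
        uint32_t h = 2166136261u;

        for (; *name; name++) {
                h ^= (unsigned char) *name;
                h *= 16777619u;
        }
        return SKETCH_UID_MIN + (uid_t) (h % (SKETCH_UID_MAX - SKETCH_UID_MIN + 1));
}

/* Pass SKETCH_UID_INVALID as dir_owner when no directory exists yet. */
static uid_t pick_dynamic_uid_sketch(const char *name, uid_t dir_owner,
                                     const uid_t *used, size_t n_used) {
        uid_t candidate;

        /* a) reuse the directory owner, which avoids a recursive chown() */
        if (dir_owner >= SKETCH_UID_MIN && dir_owner <= SKETCH_UID_MAX &&
            !uid_is_used(dir_owner, used, n_used))
                return dir_owner;

        /* b) derive a stable candidate from the service name */
        candidate = uid_from_name(name);
        if (!uid_is_used(candidate, used, n_used))
                return candidate;

        /* c) random probing; bounded here so the sketch always terminates */
        for (int i = 0; i < 1000; i++) {
                candidate = SKETCH_UID_MIN + (uid_t) (rand() % (int) (SKETCH_UID_MAX - SKETCH_UID_MIN + 1));
                if (!uid_is_used(candidate, used, n_used))
                        return candidate;
        }
        return SKETCH_UID_INVALID;
}
```

Phase a) is what makes restarts cheap: as long as the previous owner is still free, the directories already carry the right ownership.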
Lennart Poettering	6c47cd7d3b	execute: make StateDirectory= and friends compatible with DynamicUser=1 and RootDirectory=/RootImage= Let's clean up the interaction of StateDirectory= (and friends) to DynamicUser=1: instead of creating these directories directly below /var/lib, place them in /var/lib/private instead if DynamicUser=1 is set, making that directory 0700 and owned by root:root. This way, if a dynamic UID is later reused, access to the old run's state directory is prohibited for that user. Then, use file system namespacing inside the service to make /var/lib/private a readable tmpfs, hiding all state directories that are not listed in StateDirectory=, and making access to the actual state directory possible. Mount all directories listed in StateDirectory= to the same places inside the service (which means they'll now be mounted into the tmpfs instance). Finally, add a symlink from the state directory name in /var/lib/ to the one in /var/lib/private, so that both the host and the service can access the path under the same location. Here's an example: let's say a service runs with StateDirectory=foo. When DynamicUser=0 is set, it will get the following setup, and no difference between what the unit and what the host sees: /var/lib/foo (created as directory) Now, if DynamicUser=1 is set, we'll instead get this on the host: /var/lib/private (created as directory with mode 0700, root:root) /var/lib/private/foo (created as directory) /var/lib/foo → private/foo (created as symlink) And from inside the unit: /var/lib/private (a tmpfs mount with mode 0755, root:root) /var/lib/private/foo (bind mounted from the host) /var/lib/foo → private/foo (the same symlink as above) This takes inspiration from how container trees are protected below /var/lib/machines: they generally reuse UIDs/GIDs of the host, but because /var/lib/machines itself is set to 0700 host users cannot access files in the container tree even if the UIDs/GIDs are reused. 
However, for this commit we add one further trick: inside and outside of the unit /var/lib/private is a different thing: outside it is a plain, inaccessible directory, and inside it is a world-readable tmpfs mount with only the whitelisted subdirs below it, bind mounted in. This means, from the outside the dir acts as an access barrier, but from the inside it does not. And the symlink created in /var/lib/foo itself points across the barrier in both cases, so that root and the unit's user always have access to these dirs without knowing the details of this mounting magic. This logic resolves a major shortcoming of DynamicUser=1 units: previously they couldn't safely store persistent data. With this change they can have their own private state, log and data directories, which they can write to, but which are protected from UID recycling. With this change, if RootDirectory= or RootImage= are used it is ensured that the specified state/log/cache directories are always mounted in from the host. This change of semantics I think is much preferable since this means the root directory/image logic can be used easily for read-only resource bundling (as all writable data resides outside of the image). Note that this is a change of behaviour, but given that we haven't released any systemd version with StateDirectory= and friends implemented this should be a safe change to make (in particular as previously it wasn't clear what would actually happen when used in combination). Moreover, by making this change we can later add a "+" modifier to these settings too working similar to the same modifier in ReadOnlyPaths= and friends, making specified paths relative to the container itself.	2017-10-02 17:41:44 +02:00
Lennart Poettering	72fd17682d	core: usually our enum's _INVALID and _MAX special values are named after the full type In most cases we followed the rule that the special _INVALID and _MAX values we use in our enums use the full type name as prefix (in contrast to regular values that we often make shorter), do so for ExecDirectoryType as well. No functional changes, just a little bit of renaming to make this code more like the rest.	2017-10-02 17:41:43 +02:00
Lennart Poettering	a1164ae380	core: chown() StateDirectory= and friends recursively when starting a service This is particularly useful when used in conjunction with DynamicUser=1, where the UID might change for every invocation, but is useful in other cases too, for example, when these directories are shared between systems where the UID assignments differ slightly.	2017-10-02 17:41:43 +02:00
Lennart Poettering	40a80078d2	execute: let's close glibc syslog channels too Just in case something opened them, let's make sure glibc invalidates them too. Thankfully so far no library opened log channels behind our back, at least as far as I know, hence this is actually a NOP, but let's better be safe than sorry.	2017-09-26 17:52:25 +02:00
Lennart Poettering	12145637e9	execute: normalize logging in execute.c Now that logging can implicitly reopen the log streams when needed we can log errors without any special magic, hence let's normalize things, and log the same way we do everywhere else.	2017-09-26 17:51:22 +02:00
Lennart Poettering	86ffb32560	execute: drop explicit log_open()/log_close() now that it is unnecessary	2017-09-26 17:46:34 +02:00
Lennart Poettering	2c027c62dd	execute: make use of the new logging mode in execute.c	2017-09-26 17:46:34 +02:00
Lennart Poettering	82677ae4c7	execute: downgrade a log message ERR → WARNING, since we proceed ignoring its result	2017-09-26 17:46:33 +02:00
Lennart Poettering	8002fb9747	execute: rework logging in setup_keyring() to include unit info Let's use log_unit_error() instead of log_error() everywhere (and friends).	2017-09-26 17:46:33 +02:00
Lennart Poettering	e6a7ec4b8e	io-util: add new IOVEC_INIT/IOVEC_MAKE macros This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures from a pointer and a size. On top of these IOVEC_INIT_STRING() and IOVEC_MAKE_STRING() are added which take a string and automatically determine the size of the string using strlen(). This patch removes the old IOVEC_SET_STRING() macro, given that IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the old IOVEC_SET_STRING() invocations were two characters shorter than the new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more readable and more generic as it simply resolves to a C99 literal structure initialization. Moreover, we can use very similar syntax now for initializing strings and pointer+size iovec entries. We can also use the new macros to initialize function parameters on-the-fly or array definitions. And given that we shouldn't have so many ways to do the same stuff, let's just settle on the new macros. (This also converts some code to use _cleanup_ where dynamically allocated strings were using IOVEC_SET_STRING() before, to modernize things a bit)	2017-09-22 15:28:04 +02:00
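The "C99 literal structure initialization" mentioned in `e6a7ec4b8e` might look like the sketch below. The macro names come from the commit message, but the definitions are a guess at one plausible shape, not a copy of systemd's io-util.h; `iovec_total_len()` is a hypothetical helper added only to have something to call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Guessed definitions; only the macro names are taken from the commit message. */
#define IOVEC_INIT(base, len) { .iov_base = (base), .iov_len = (len) }
#define IOVEC_MAKE(base, len) (struct iovec) IOVEC_INIT(base, len)
/* Note: the string argument is evaluated twice in this sketch */
#define IOVEC_INIT_STRING(s) IOVEC_INIT((char*) (s), strlen(s))
#define IOVEC_MAKE_STRING(s) (struct iovec) IOVEC_INIT_STRING(s)

/* Hypothetical helper: sums the lengths of all entries */
static size_t iovec_total_len(const struct iovec *iov, size_t n) {
        size_t sum = 0;

        assert(iov || n == 0);

        for (size_t i = 0; i < n; i++)
                sum += iov[i].iov_len;
        return sum;
}
```

The INIT variants expand to a brace initializer for declarations, while the MAKE variants expand to a compound literal that can appear in array definitions or directly as an expression, which matches the "function parameters on-the-fly or array definitions" use the message describes.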
Lennart Poettering	f1c50becda	core: make sure to log invocation ID of units also when doing structured logging	2017-09-22 15:24:55 +02:00
Jan Synacek	f679ed6116	execute: fix typo in error message (#6881)	2017-09-21 10:38:52 +02:00
Zbigniew Jędrzejewski-Szmek	61ceaea5c8	Move one space from dbus-execute.c to execute.c The number of spaces is conserved ;)	2017-09-16 08:45:02 +02:00
Zbigniew Jędrzejewski-Szmek	3d7d3cbbda	Merge pull request #6832 from poettering/keyring-mode Add KeyringMode unit property to fix cryptsetup key caching	2017-09-15 21:24:48 +02:00
Lennart Poettering	b1edf4456e	core: add new per-unit setting KeyringMode= for controlling kernel keyring setup Usually, it's a good thing that we isolate the kernel session keyring for the various services and disconnect them from the user keyring. However, in case of the cryptsetup key caching we actually want that multiple instances of the cryptsetup service can share the keys in the root user's user keyring, hence we need to be able to disable this logic for them. This adds KeyringMode=inherit|private|shared: inherit: don't do any keyring magic (this is the default in systemd --user) private: a private keyring as before (default in systemd --system) shared: the new setting	2017-09-15 16:53:35 +02:00
Lennart Poettering	0460aa5c08	execute: improve and augment execution log messages Let's generate friendly messages for more cases, and make slight adjustments to the existing messages.	2017-09-15 16:43:06 +02:00
Lennart Poettering	ab2116b140	core: make sure that $JOURNAL_STREAM prefers stderr over stdout information (#6824) If two separate log streams are connected to stdout and stderr, let's make sure $JOURNAL_STREAM points to the latter, as that's the preferred log destination, and the environment variable has been created in order to permit services to automatically upgrade from stderr based logging to native journal logging. Also, document this behaviour. Fixes: #6800	2017-09-15 08:26:38 +02:00
Lennart Poettering	00819cc151	core: add new UnsetEnvironment= setting for unit files With this setting we can explicitly unset specific variables for processes of a unit, as last step of assembling the environment block for them. This is useful to fix #6407. While we are at it, greatly expand the documentation on how the environment block for forked off processes is assembled.	2017-09-14 15:17:40 +02:00
Lennart Poettering	21022b9dde	util-lib: wrap personality() to fix up broken glibc error handling (#6766) glibc appears to propagate different errors in different ways, let's fix this up, so that our own code doesn't get confused by this. See #6752 + #6737 for details. Fixes: #6755	2017-09-08 17:16:29 +03:00
Lennart Poettering	aac8c0c382	execute: minor ExecOutput handling beautification (#6711) Let's clean up the checking for the various ExecOutput values a bit, let's use IN_SET everywhere, and the same concepts for all three bools we pass to dprintf().	2017-09-01 09:04:27 +09:00
Lennart Poettering	e8132d63fe	seccomp: default to something resembling the current personality when locking it Let's lock the personality to the currently set one, if nothing is specifically specified. But do so with a grain of salt, and never default to any exotic personality here, but only PER_LINUX or PER_LINUX32.	2017-08-29 15:56:57 +02:00
Topi Miettinen	78e864e5b3	seccomp: LockPersonality boolean (#6193) Add LockPersonality boolean to allow locking down personality(2) system call so that the execution domain can't be changed. This may be useful to improve security because odd emulations may be poorly tested and a source of vulnerabilities, while system services shouldn't need any weird personalities.	2017-08-29 15:54:50 +02:00
Yu Watanabe	5ce96b141a	Merge pull request #6582 from poettering/logind-tty various tty path parsing fixes	2017-08-26 22:12:48 +09:00
Lennart Poettering	165a31c0db	core: add two new special ExecStart= character prefixes This patch adds two new special character prefixes to ExecStart= and friends, in addition to the existing "-", "@" and "+": "!" → much like "+", except with a much reduced effect as it only disables the actual setresuid()/setresgid()/setgroups() calls, but leaves all other security features on, including namespace options. This is very useful in combination with RuntimeDirectory= or DynamicUser= and similar options, as a user is still allocated and used for the runtime directory, but the actual UID/GID dropping is left to the daemon process itself. This should make RuntimeDirectory= a lot more useful for daemons which insist on doing their own privilege dropping. "!!" → Similar to "!", but on systems supporting ambient caps this becomes a NOP. This makes it relatively straightforward to write unit files that make use of ambient capabilities to let systemd drop all privs while retaining compatibility with systems that lack ambient caps, where priv dropping is left to the daemon code itself. This is an alternative approach to #6564 and related PRs.	2017-08-10 15:04:32 +02:00
Lennart Poettering	43b1f7092d	execute: needs_{selinux,apparmor,smack} → use_{selinux,apparmor,smack} These booleans simply store whether selinux/apparmor/smack are supposed to be used, and cache the various mac_xyz_use() calls before we transition into the namespace, hence let's use the same verb for the variables and the functions: "use"	2017-08-10 15:02:50 +02:00
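How the five prefixes named in `165a31c0db` might be peeled off a command line can be sketched as below. The flag names, the function name and the rule that "+", "!" and "!!" exclude one another are assumptions of this illustration; the meanings given for "-" and "@" follow the systemd.service documentation rather than this commit message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag names for this sketch */
enum {
        CMD_IGNORE_FAILURE   = 1 << 0, /* "-":  a failing exit status is ignored */
        CMD_SEPARATE_ARGV0   = 1 << 1, /* "@":  argv[0] is given separately */
        CMD_FULLY_PRIVILEGED = 1 << 2, /* "+":  run without sandboxing or UID drop */
        CMD_NO_SETUID        = 1 << 3, /* "!":  skip only the set*id()/setgroups() calls */
        CMD_AMBIENT_MAGIC    = 1 << 4, /* "!!": like "!", but a NOP with ambient caps */
};

#define CMD_PRIV_MASK (CMD_FULLY_PRIVILEGED | CMD_NO_SETUID | CMD_AMBIENT_MAGIC)

/* Consumes leading prefix characters, stores the flags in *ret_flags and
 * returns a pointer to the first character after the prefixes. */
static const char *parse_exec_prefixes(const char *s, unsigned *ret_flags) {
        unsigned flags = 0;

        assert(s);
        assert(ret_flags);

        for (;;) {
                if (s[0] == '-' && !(flags & CMD_IGNORE_FAILURE)) {
                        flags |= CMD_IGNORE_FAILURE;
                        s++;
                } else if (s[0] == '@' && !(flags & CMD_SEPARATE_ARGV0)) {
                        flags |= CMD_SEPARATE_ARGV0;
                        s++;
                } else if (s[0] == '+' && !(flags & CMD_PRIV_MASK)) {
                        flags |= CMD_FULLY_PRIVILEGED;
                        s++;
                } else if (s[0] == '!' && s[1] == '!' && !(flags & CMD_PRIV_MASK)) {
                        /* Check "!!" before "!" so the longer prefix wins */
                        flags |= CMD_AMBIENT_MAGIC;
                        s += 2;
                } else if (s[0] == '!' && !(flags & CMD_PRIV_MASK)) {
                        flags |= CMD_NO_SETUID;
                        s++;
                } else
                        break;
        }

        *ret_flags = flags;
        return s;
}
```

Given "-!/usr/bin/foo --bar", the sketch reports the ignore-failure and no-setuid flags and returns the remainder starting at "/usr/bin/foo".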
Lennart Poettering	9f6444eb92	execute: make use of IN_SET() where we can	2017-08-10 15:02:50 +02:00
Lennart Poettering	937ccce94c	execute: simplify needs_sandboxing checking Let's merge three if blocks that shall only run when sandboxing is applied into one. Note that this changes behaviour in one corner case: PrivateUsers=1 now honours both PermissionsStartOnly= and the "+" modifier in ExecStart=, and not just the former, as before. This was an oversight, so let's fix this now, at a point in time the option isn't used much yet.	2017-08-10 15:02:50 +02:00
Lennart Poettering	1703fa41a7	core: rename EXEC_APPLY_PERMISSIONS → EXEC_APPLY_SANDBOXING "Permissions" was a bit of a misnomer, as it suggests that UNIX file permission bits are adjusted, which aren't really changed here. Instead, this is about UNIX credentials such as users or groups, as well as namespacing, hence let's use a more generic term here, without any misleading reference to UNIX file permissions: "sandboxing", which shall refer to all kinds of sandboxing technologies, including UID/GID dropping, selinux relabelling, namespacing, seccomp, and so on.	2017-08-10 15:02:50 +02:00
Lennart Poettering	584b8688d1	execute: also fold the cgroup delegate bit into ExecFlags	2017-08-10 15:02:50 +02:00
Lennart Poettering	ac6479781e	execute: also control the SYSTEMD_NSS_BYPASS_BUS through an ExecFlags field Also, correct the logic while we are at it: the variable is only required for system services, not user services.	2017-08-10 15:02:49 +02:00
Lennart Poettering	c71b2eb77e	core: don't chown() the configuration directory The configuration directory is commonly not owned by a service, but remains root-owned, hence don't change the owner automatically for it.	2017-08-10 15:02:49 +02:00
Lennart Poettering	8679efde21	execute: add one more ExecFlags flag, for controlling unconditional directory chowning Let's decouple the Manager object from the execution logic a bit more here too, and simply pass along the fact whether we should unconditionally chown the runtime/... directories via the ExecFlags field too.	2017-08-10 14:44:58 +02:00
Lennart Poettering	af635cf377	execute: let's decouple execute.c a bit from the unit logic Let's try to decouple the execution engine a bit from the Unit/Manager concept, and hence pass one more flag as part of the ExecParameters flags field.	2017-08-10 14:44:58 +02:00
Lennart Poettering	3ed0cd26ea	execute: replace command flag bools by a flags field This way, we can extend it later on in an easier way, and can pass it along nicely.	2017-08-10 14:44:58 +02:00
Lennart Poettering	a119ec7c82	util-lib: add a new skip_dev_prefix() helper This new helper removes a leading /dev if there is one. We have code doing this all over the place, let's unify this, and correct it while we are at it, by using path_startswith() rather than startswith() to drop the prefix.	2017-08-09 19:01:18 +02:00
Yu Watanabe	07d46372fe	securebits-util: add secure_bits_{from_string,to_string_alloc}()	2017-08-07 23:40:25 +09:00
Yu Watanabe	dd1f5bd0aa	cap-list: add capability_set_{from_string,to_string_alloc}()	2017-08-07 23:25:11 +09:00
Yu Watanabe	837df14040	core: do not ignore returned values	2017-08-06 23:34:55 +09:00
Yu Watanabe	ecfbc84f1c	core: define variables only when they are required Follow-up for `7f18ef0a55`.	2017-08-06 13:08:34 +09:00
Fabio Kung	7f18ef0a55	core: check which MACs to use before a new mount ns is created (#6498) /sys is not guaranteed to exist when a new mount namespace is created. It is only mounted under conditions specified by `namespace_info_mount_apivfs`. Checking if the three available MAC LSMs are enabled requires a sysfs mounted at /sys, so the checks are moved to before a new mount ns is created.	2017-08-01 09:15:18 +02:00
Lennart Poettering	c867611e0a	execute: don't pass unit ID in --user mode to journald for stream logging When we create a log stream connection to journald, we pass along the unit ID. With this change we do this only when we run as system instance, not as user instance, to remove the ambiguity whether a user or system unit is specified. The effect of this change is minor: journald ignores the field anyway from clients with UID != 0. This patch hence only fixes the unit attribution for the --user instance of the root user.	2017-07-31 18:01:42 +02:00
Lennart Poettering	92a17af991	execute: make some code shorter Let's simplify some lines to make it shorter.	2017-07-31 18:01:42 +02:00
Lennart Poettering	cad93f2996	core, sd-bus, logind: make use of uid_is_valid() in more places	2017-07-31 18:01:42 +02:00
Lennart Poettering	df0ff12775	tree-wide: make use of getpid_cached() wherever we can This moves pretty much all uses of getpid() over to getpid_cached(). I didn't specifically check whether the optimization is worth it for each replacement, but in order to keep things simple and systematic I switched over everything at once.	2017-07-20 20:27:24 +02:00
Yu Watanabe	3536f49e8f	core: add {State,Cache,Log,Configuration}Directory= (#6384) This introduces {State,Cache,Log,Configuration}Directory=, which are similar to RuntimeDirectory=. They create the directories under /var/lib, /var/cache, /var/log, or /etc, respectively, with the mode specified in {State,Cache,Log,Configuration}DirectoryMode=. This also fixes #6391.	2017-07-18 14:34:52 +02:00
Lennart Poettering	688230d3a7	Merge pull request #6354 from walyong/smack_process_label_free core: modify resource leak and missed security context dump	2017-07-17 10:04:12 +02:00
Yu Watanabe	23a7448efa	core: support subdirectories in RuntimeDirectory= option	2017-07-17 16:30:53 +09:00
Yu Watanabe	53f47dfc7b	core: allow preserving contents of RuntimeDirectory= over process restart This introduces RuntimeDirectoryPreserve= option which takes a boolean argument or 'restart'. Closes #6087.	2017-07-17 16:22:25 +09:00
WaLyong Cho	80c21aea11	core: dump also missed security context	2017-07-13 13:12:24 +09:00
WaLyong Cho	5b8e1b7755	core: modify resource leak by SmackProcessLabel=	2017-07-13 13:12:15 +09:00
Lennart Poettering	782c925f7f	Revert "core: link user keyring to session keyring (#6275)" (#6342) This reverts commit `437a85112e`. The outcome of this isn't that clear, let's revert this for now, see discussion on #6286.	2017-07-12 10:00:43 -04:00
Christian Hesse	437a85112e	core: link user keyring to session keyring (#6275) Commit `74dd6b515f` (core: run each system service with a fresh session keyring) broke adding keys to the user keyring. Added keys could not be accessed, failing with the error message "keyctl_read_alloc: Permission denied". So link the user keyring to our session keyring.	2017-07-04 09:38:31 +02:00
Lennart Poettering	7f452159b8	core: make IOSchedulingClass= and IOSchedulingPriority= settable for transient units This patch is a bit more complex than I hoped. In particular the single IOScheduling= property exposed on the bus is split up into IOSchedulingClass= and IOSchedulingPriority= (though compat is retained). Otherwise the asymmetry between setting props and getting them is a bit too nasty. Fixes #5613	2017-06-26 17:43:18 +02:00
Zbigniew Jędrzejewski-Szmek	7e867138f5	Merge pull request #5600 from fbuihuu/make-logind-restartable Make logind restartable.	2017-06-24 18:58:36 -04:00
Franck Bui	4c47affcf1	core: remove the redundancy of 'n_fds' and 'n_storage_fds' in ExecParameters struct 'n_fds' field in the ExecParameters structure was counting the total number of file descriptors to be passed to a unit. This counter also includes the number of passed socket fds which is counted by 'n_socket_fds' already. This patch removes that redundancy by replacing 'n_fds' with 'n_storage_fds'. The new field only counts the fds passed via the fd store mechanism. That way each fd is counted at one place only. Subsequently the patch makes sure to fix code that used 'n_fds' and also wanted to iterate through all of them by explicitly adding 'n_socket_fds' + 'n_storage_fds'. Suggested by Lennart.	2017-06-08 16:21:35 +02:00
Franck Bui	9b1419111a	core: only apply NonBlocking= to fds passed via socket activation Make sure to only apply the O_NONBLOCK flag to the fds passed via socket activation. Previously the flag was also applied to the fds which came from the fd store but this was incorrect since services, after being restarted, expect that these passed fds have their flags unchanged and can be reused as before. The documentation was a bit unclear about this so clarify it.	2017-06-06 22:42:50 +02:00
Zbigniew Jędrzejewski-Szmek	52511fae7b	core: fix warning about unsigned variable (#5935) Fixup for `d8c92e8bc7`.	2017-05-11 08:15:28 +02:00
Lennart Poettering	4e168f4606	Merge pull request #5420 from OpenDZ/tixxdz/namespace-fixes-v2 Namespace: RootImage= RootDirectory= and MountAPIVFS fixes	2017-05-09 20:42:32 +02:00
Aggelos Avgerinos	488ab41cb8	execute: Properly log errors considering socket fds (#5910) Until now, if params->n_fds was 0, systemd logged that there was more than one socket. Thanks @gregoryp and @VFXcode who did the most work debugging this.	2017-05-08 19:09:22 -04:00
Zbigniew Jędrzejewski-Szmek	d8c92e8bc7	execute: filter out "." and ".." in EnvironmentFile= globs too This doesn't really matter much, only in case somebody would use something strange like EnvironmentFile=/etc/something/.* Make sure that "." and ".." are not returned by that glob. This makes all our globbing patterns behave the same.	2017-04-27 13:21:08 -04:00
Djalal Harouni	74e941c022	Merge pull request #5774 from keszybz/printf-annotations Printf annotation improvements	2017-04-23 01:03:42 +02:00
Zbigniew Jędrzejewski-Szmek	ba360bb05c	tree-wide: mark log_struct with _printf_ and fix fallout log_struct takes multiple format strings, each one followed by arguments. The _printf_ annotation is not sufficiently flexible to express this, but we can still annotate the first format string, though not its arguments (because their number is unknown). With the annotation, the places which specified the message id or similar as the first pattern cause a warning from -Wformat-nonliteral. This can be trivially fixed by putting the MESSAGE= first. This change will help find issues where a non-literal is erroneously used as the pattern.	2017-04-21 13:37:04 -04:00
Yu Watanabe	4d8b0f0f7a	core: downgrade error message if command is prefixed with `-` and the command is not found Fixes #5621	2017-04-03 15:38:37 +09:00

692 commits