Systemd

Author	SHA1	Message	Date
Zbigniew Jędrzejewski-Szmek	d94a24ca2e	Add macro for checking if some flags are set This way we don't need to repeat the argument twice. I didn't replace all instances. I think it's better to leave out: - asserts - comparisons like x & y == x, which are mathematically equivalent, but here we aren't checking if flags are set, but if the argument fits in the flags.	2018-06-04 11:50:44 +02:00
Yu Watanabe	858d36c1ec	path-util: introduce path_simplify() The function is similar to path_kill_slashes() but also removes initial './', trailing '/.', and '/./' in the path. When the second argument of path_simplify() is false, then it behaves as the same as path_kill_slashes(). Hence, this also replaces path_kill_slashes() with path_simplify().	2018-06-03 23:39:26 +09:00
Lennart Poettering	6f8fa29465	Merge pull request #8981 from keszybz/ratelimit-and-dbus Ratelimit renaming and dbus error message fix	2018-05-18 21:38:30 +02:00
Felipe Sateler	57b7a260c2	core: undo the dependency inversion between unit.h and all unit types	2018-05-15 14:24:34 -04:00
Yu Watanabe	af4fa99d6a	core: use _cleanup_set_free_ instread of _cleanup_(set_freep)	2018-05-14 14:13:57 +09:00
Zbigniew Jędrzejewski-Szmek	7994ac1d85	Rename ratelimit_test to ratelimit_below When I see "test", I have to think three times what the return value means. With "below" this is immediately clear. ratelimit_below(&limit) sounds almost like English and is imho immediately obvious. (I also considered ratelimit_ok, but this strongly implies that being under the limit is somehow better. Most of the times this is true, but then we use the ratelimit to detect triple-c-a-d, and "ok" doesn't fit so well there.) C.f. `a1bcaa07`.	2018-05-13 22:08:30 +02:00
David Tardon	95f14a3e21	core: use automatic cleanup more	2018-05-12 18:29:41 +02:00
Lennart Poettering	d4fd1cf208	core: enforce that scope units can be started only once Scope units are populated from PIDs specified by the bus client. We do that when a scope is started. We really shouldn't allow scopes to be started multiple times, as the PIDs then might be heavily out of date. Moreover, clients should have the guarantee that any scope they allocate has a clear runtime cycle which is not repetitive.	2018-04-27 21:52:45 +02:00
Lennart Poettering	7a9a0c05d4	Merge pull request #8765 from poettering/test-fixes some short fixes for the tests	2018-04-19 16:18:46 +02:00
Lennart Poettering	5d13a15b1d	tree-wide: drop spurious newlines (#8764 ) Double newlines (i.e. one empty lines) are great to structure code. But let's avoid triple newlines (i.e. two empty lines), quadruple newlines, quintuple newlines, …, that's just spurious whitespace. It's an easy way to drop 121 lines of code, and keeps the coding style of our sources a bit tigther.	2018-04-19 12:13:23 +02:00
Lennart Poettering	8f63253149	core: don't export per-unit metadata files in test mode We shouldn't clobber the host's /run directories with metadata we export for our units when we run in test mode.	2018-04-19 11:30:18 +02:00
Lennart Poettering	4d09e1c8ba	Merge pull request #8676 from keszybz/drop-license-boilerplate Drop license boilerplate	2018-04-10 14:53:31 +02:00
Zbigniew Jędrzejewski-Szmek	e9e8cbc83a	core: minor comment update	2018-04-07 20:05:58 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Yu Watanabe	1cc6c93a95	tree-wide: use TAKE_PTR() and TAKE_FD() macros	2018-04-05 14:26:26 +09:00
Michal Sekletar	19496554e2	core: delay adding target dependencies until all units are loaded and aliases resolved (#8381 ) Currently we add target dependencies while we are loading units. This can create ordering loops even if configuration doesn't contain any loop. Take for example following configuration, $ systemctl get-default multi-user.target $ cat /etc/systemd/system/test.service [Unit] After=default.target [Service] ExecStart=/bin/true [Install] WantedBy=multi-user.target If we encounter such unit file early during manager start-up (e.g. load queue is dispatched while enumerating devices due to SYSTEMD_WANTS in udev rules) we would add stub unit default.target and we order it Before test.service. At the same time we add implicit Before to multi-user.target. Later we merge two units and we create ordering cycle in the process. To fix the issue we will now never add any target dependencies until we loaded all the unit files and resolved all the aliases.	2018-03-23 15:28:06 +01:00
Lennart Poettering	ae2a15bc14	macro: introduce TAKE_PTR() macro This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)	2018-03-22 20:21:42 +01:00
Lennart Poettering	31dc1ca3bf	move MANAGER_IS_RELOADING() check into manager_recheck_{dbus\|journal}() (#8510 ) Let's better check this inside of the call than before it, so that we never issue this while reloading, even should these calls be called due to other reasons than just the unit notify. This makes sure the reload state is unset a bit earlier in manager_reload() so that we can safely call this function from there and they do the right thing. Follow-up for `e63ebf71ed`.	2018-03-21 12:03:45 +01:00
Evgeny Vereshchagin	e4711004d6	Merge pull request #8461 from keszybz/oss-fuzz-fixes Oss fuzz fixes	2018-03-19 00:06:44 +03:00
Zbigniew Jędrzejewski-Szmek	ca8700e922	core/unit: delay creating a stack variable until after length has been checked path_is_normalized() will reject paths longer than 4095 bytes, so it's better to not create a stack variable of unbounded size, but instead do the check first and only then do that allocation. Also use _cleanup_ to make things a bit shorter. https://oss-fuzz.com/v2/issue/5424177403133952/7000	2018-03-18 21:07:01 +01:00
Zbigniew Jędrzejewski-Szmek	e63ebf71ed	core: when reloading, delay any actions on journal and dbus connections manager_recheck_journal() and manager_recheck_dbus() would be called to early while we were deserialiazing units, before the systemd-journald.service and dbus.service have been deserialized. In effect we'd disable logging to the journald and close the bus connection. The first is not very noticable, it mostly means that logs emitted during deserialization are lost. The second is more noticeable, because manager_recheck_dbus() would call bus_done_api() and bus_done_system() and close dbus connections. Logging and bus connection would then be restored later after the respective units have been deserialized. This is easily reproduced by calling: $ sudo gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1 --method "org.freedesktop.systemd1.Manager.Reload" which works fine before `8559b3b75c`, and then starts failing with: Error: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Remote peer disconnected None of this should happen, and we should delay changing state until after deserialization is complete when reloading. manager_reload() already included the calls to manager_recheck_journal() and manager_recheck_dbus(), so the connection state will be updated after deserialization during reloading is done. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1554578.	2018-03-16 23:14:04 +01:00
Zbigniew Jędrzejewski-Szmek	dc409696cf	Introduce _cleanup_(unit_freep)	2018-03-11 16:33:58 +01:00
Zbigniew Jędrzejewski-Szmek	bea28c5adb	core/unit: voidify one snprintf statement One more follow-up for `f810b631cd`.	2018-02-26 15:49:27 +01:00
Zbigniew Jędrzejewski-Szmek	f810b631cd	Revert "Replace use of snprintf with xsprintf" This reverts commit `a7419dbc59`. _All_ changes in that commit were wrong. Fixes #8211.	2018-02-23 00:13:52 +01:00
Lennart Poettering	aa2b6f1d2b	bpf: rework how we keep track and attach cgroup bpf programs So, the kernel's management of cgroup/BPF programs is a bit misdesigned: if you attach a BPF program to a cgroup and close the fd for it it will stay pinned to the cgroup with no chance of ever removing it again (or otherwise getting ahold of it again), because the fd is used for selecting which BPF program to detach. The only way to get rid of the program again is to destroy the cgroup itself. This is particularly bad for root the cgroup (and in fact any other cgroup that we cannot realistically remove during runtime, such as /system.slice, /init.scope or /system.slice/dbus.service) as getting rid of the program only works by rebooting the system. To counter this let's closely keep track to which cgroup a BPF program is attached and let's implicitly detach the BPF program when we are about to close the BPF fd. This hence changes the bpf_program_cgroup_attach() function to track where we attached the program and changes bpf_program_cgroup_detach() to use this information. Moreover bpf_program_unref() will now implicitly call bpf_program_cgroup_detach(). In order to simplify things, bpf_program_cgroup_attach() will now implicitly invoke bpf_program_load_kernel() when necessary, simplifying the caller's side. Finally, this adds proper reference counting to BPF programs. This is useful for working with two BPF programs in parallel: the BPF program we are preparing for installation and the BPF program we so far installed, shortening the window when we detach the old one and reattach the new one.	2018-02-21 16:43:36 +01:00
Lennart Poettering	00f5ad93b5	core: change KeyringMode= to "shared" by default for non-service units in the system manager (#8172 ) Before this change all unit types would default to "private" in the system service manager and "inherit" to in the user service manager. With this change this is slightly altered: non-service units of the system service manager are now run with KeyringMode=shared. This appears to be the more appropriate choice as isolation is not as desirable for mount tools, which regularly consume key material. After all mounts are a shared resource themselves as they appear system-wide hence it makes a lot of sense to share their key material too. Fixes: #8159	2018-02-20 08:53:34 +01:00
Lennart Poettering	30663b6c25	Merge pull request #8199 from keszybz/small-things Sundry small cleanups	2018-02-19 16:55:10 +01:00
Zbigniew Jędrzejewski-Szmek	f4aa0bde1c	core: drop obsolete comment https://github.com/systemd/systemd/pull/8125#pullrequestreview-96894581	2018-02-19 15:18:54 +01:00
Lennart Poettering	a94ab7acfd	Merge pull request #8175 from keszybz/gc-cleanup Garbage collection cleanup	2018-02-15 17:47:37 +01:00
Zbigniew Jędrzejewski-Szmek	648461c07d	Merge pull request #8125 from poettering/cgroups-migrate Trivial merge conflict resolved locally.	2018-02-15 16:15:45 +01:00
Zbigniew Jędrzejewski-Szmek	1bdf279002	pid1: properly remove references to the unit from gc queue during final cleanup When various references to the unit were dropped during cleanup in unit_free(), add_to_gc_queue() could be called on this unit. If the unit was previously in the gc queue (at the time when unit_free() was called on it), this wouldn't matter, because it'd have in_gc_queue still set even though it was already removed from the queue. But if it wasn't set, then the unit could be added to the queue. Then after unit_free() would deallocate the unit, we would be left with a dangling pointer in gc_queue. A unit could be added to the gc queue in two places called from unit_free(): in the job_install calls, and in unit_ref_unset(). The first was OK, because it was above the LIST_REMOVE(gc_queue,...) call, but the second was not, because it was after that. Move the all LIST_REMOVE() calls down.	2018-02-15 14:03:53 +01:00
Zbigniew Jędrzejewski-Szmek	a946fa9bb9	pid1: free basic unit information at the very end, before freeing the unit We would free stuff like the names of the unit first, and then recurse into other structures to remove the unit from there. Technically this was OK, since the code did not access the name, but this makes debugging harder. And if any log messages are added in any of those functions, they are likely to access u->id and such other basic information about the unit. So let's move the removal of this "basic" information towards the end of unit_free().	2018-02-15 13:32:59 +01:00
Zbigniew Jędrzejewski-Szmek	2641f02e23	pid1: fix collection of cycles of units which reference one another A .socket will reference a .service unit, by registering a UnitRef with the .service unit. If this .service unit has the .socket unit listed in Wants or Sockets or such, a cycle will be created. We would not free this cycle properly, because we treated any unit with non-empty refs as uncollectable. To solve this issue, treats refs with UnitRef in u->refs_by_target similarly to the refs in u->dependencies, and check if the "other" unit is known to be needed. If it is not needed, do not treat the reference from it as preventing the unit we are looking at from being freed.	2018-02-15 13:32:53 +01:00
Zbigniew Jędrzejewski-Szmek	7f7d01ed58	pid1: include the source unit in UnitRef No functional change. The source unit manages the reference. It allocates the UnitRef structure and registers it in the target unit, and then the reference must be destroyed before the source unit is destroyed. Thus, is should be OK to include the pointer to the source unit, it should be live as long as the reference exists. v2: - rename refs to refs_by_target	2018-02-15 13:27:06 +01:00
Zbigniew Jędrzejewski-Szmek	f2f725e5cc	pid1: rename unit_check_gc to unit_may_gc "check" is unclear: what is true, what is false? Let's rename to "can_gc" and revert the return value ("positive" values are easier to grok). v2: - rename from unit_can_gc to unit_may_gc	2018-02-15 13:04:12 +01:00
Lennart Poettering	6592b9759c	core: add new new bus call for migrating foreign processes to scope/service units This adds a new bus call to service and scope units called AttachProcesses() that moves arbitrary processes into the cgroup of the unit. The primary user for this new API is systemd itself: the systemd --user instance uses this call of the systemd --system instance to migrate processes if itself gets the request to migrate processes and the kernel refuses this due to access restrictions. The primary use-case of this is to make "systemd-run --scope --user …" invoked from user session scopes work correctly on pure cgroupsv2 environments. There, the kernel refuses to migrate processes between two unprivileged-owned cgroups unless the requestor as well as the ownership of the closest parent cgroup all match. This however is not the case between the session-XYZ.scope unit of a login session and the user@ABC.service of the systemd --user instance. The new logic always tries to move the processes on its own, but if that doesn't work when being the user manager, then the system manager is asked to do it instead. The new operation is relatively restrictive: it will only allow to move the processes like this if the caller is root, or the UID of the target unit, caller and process all match. Note that this means that unprivileged users cannot attach processes to scope units, as those do not have "owning" users (i.e. they have now User= field). Fixes: #3388	2018-02-12 11:34:00 +01:00
Lennart Poettering	8559b3b75c	core: rework how we connect to the bus This removes the current bus_init() call, as it had multiple problems: it munged handling of the three bus connections we care about (private, "api" and system) into one, even though the conditions when which was ready are very different. It also added redundant logging, as the individual calls it called all logged on their own anyway. The three calls bus_init_api(), bus_init_private() and bus_init_system() are now made public. A new call manager_dbus_is_running() is added that works much like manager_journal_is_running() and is a lot more careful when checking whether dbus is around. Optionally it checks the unit's deserialized_state rather than state, in order to accomodate for cases where we cant to connect to the bus before deserializing the "subscribed" list, before coldplugging the units. manager_recheck_dbus() is added, that works a lot like manager_recheck_journal() and is invoked in unit_notify(), i.e. when units change state. All in all this should make handling a bit more alike to journal handling, and it also fixes one major bug: when running in user mode we'll now connect to the system bus early on, without conditionalizing this in anyway.	2018-02-12 11:34:00 +01:00
Lennart Poettering	004c7f169e	core: fold manager_set_exec_params() into unit_set_exec_params() Let's simplify things a bit: we so far called both functions every single time, let's just merge one into the other, so that we have fewer functions to call.	2018-02-12 11:34:00 +01:00
Lennart Poettering	1d9cc8768f	cgroup: add a new "can_delegate" flag to the unit vtable, and set it for scope and service units only Currently we allowed delegation for alluntis with cgroup backing except for slices. Let's make this a bit more strict for now, and only allow this in service and scope units. Let's also add a generic accessor unit_cgroup_delegate() for checking whether a unit has delegation turned on that checks the new bool first. Also, when doing transient units, let's explcitly refuse turning on delegation for unit types that don#t support it. This is mostly cosmetical as we wouldn't act on the delegation request anyway, but certainly helpful for debugging.	2018-02-12 11:34:00 +01:00
Lennart Poettering	548f69375e	tree-wide: use path_hash_ops instead of string_hash_ops whenever we key by a path Let's make use of our new hash_ops!	2018-02-12 11:07:55 +01:00
Franck Bui	9ea3a0e702	core: use id unit when retrieving unit file state (#8038 ) Previous code was using the basename(id->fragment_path) which returned incorrect result if the unit was an instance. For example, assuming that no instances of "template" have been created so far: $ systemctl enable template@1 Created symlink from /etc/systemd/system/multi-user.target.wants/template@1.service to /usr/lib/systemd/system/template@.service. $ systemctl is-enabled template@3.service disabled $ systemctl status template@3.service ● template@3.service - openQA Worker #3 Loaded: loaded (/usr/lib/systemd/system/template@.service; enabled; vendor preset: disabled) [...] Here the unit file states reported by "status" and "is-enabled" were different.	2018-02-07 14:08:02 +01:00
Andrei Gherzan	3f602115b7	core: Avoid empty directory warning when we are bind-mounting a file (#8069 )	2018-02-06 16:35:52 +01:00
Yu Watanabe	e8a565cb66	core: make ExecRuntime be manager managed object Before this, each ExecRuntime object is owned by a unit. However, it may be shared with other units which enable JoinsNamespaceOf=. Thus, by the serialization/deserialization process, its sharing information, more specifically, reference counter is lost, and causes issue #7790. This makes ExecRuntime objects be managed by manager, and changes the serialization/deserialization process. Fixes #7790.	2018-02-06 16:00:34 +09:00
Lennart Poettering	81e9871e87	selinux: make sure we never use /dev/null for making unit selinux access decisions	2018-01-31 19:54:25 +01:00
Lennart Poettering	adefcf2821	core: rework how we count the n_on_console counter Let's add a per-unit boolean that tells us whether our unit is currently counted or not. This way it's unlikely we get out of sync again and things are generally more robust. This also allows us to remove the counting logic specific to service units (which was in fact mostly a copy from the generic implementation), in favour of fully generic code. Replaces: #7824	2018-01-24 20:14:51 +01:00
Lennart Poettering	bb2c768545	core: add a new unit_needs_console() call This call determines whether a specific unit currently needs access to the console. It's a fancy wrapper around exec_context_may_touch_console() ultimately, however for service units we'll explicitly exclude the SERVICE_EXITED state from when we report true.	2018-01-24 19:54:26 +01:00
Lennart Poettering	62a769136d	core: rework how we track which PIDs to watch for a unit Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Unit interested in SIGCHLD events for them. This scheme allowed a specific PID to be watched by exactly 0, 1 or 2 units. With this rework this is replaced by a single hashmap which is primarily keyed by the PID and points to a Unit interested in it. However, it optionally also keyed by the negated PID, in which case it points to a NULL terminated array of additional Unit objects also interested. This scheme means arbitrary numbers of Units may now watch the same PID. Runtime and memory behaviour should not be impact by this change, as for the common case (i.e. each PID only watched by a single unit) behaviour stays the same, but for the uncommon case (a PID watched by more than one unit) we only pay with a single additional memory allocation for the array. Why this all? Primarily, because allowing exactly two units to watch a specific PID is not sufficient for some niche cases, as processes can belong to more than one unit these days: 1. sd_notify() with MAINPID= can be used to attach a process from a different cgroup to multiple units. 2. Similar, the PIDFile= setting in unit files can be used for similar setups, 3. By creating a scope unit a main process of a service may join a different unit, too. 4. On cgroupsv1 we frequently end up watching all processes remaining in a scope, and if a process opens lots of scopes one after the other it might thus end up being watch by many of them. This patch hence removes the 2-unit-per-PID limit. It also makes a couple of other changes, some of them quite relevant: - manager_get_unit_by_pid() (and the bus call wrapping it) when there's ambiguity will prefer returning the Unit the process belongs to based on cgroup membership, and only check the watch-pids hashmap if that fails. This change in logic is probably more in line with what people expect and makes things more stable as each process can belong to exactly one cgroup only. - Every SIGCHLD event is now dispatched to all units interested in its PID. Previously, there was some magic conditionalization: the SIGCHLD would only be dispatched to the unit if it was only interested in a single PID only, or the PID belonged to the control or main PID or we didn't dispatch a signle SIGCHLD to the unit in the current event loop iteration yet. These rules were quite arbitrary and also redundant as the the per-unit handlers would filter the PIDs anyway a second time. With this change we'll hence relax the rules: all we do now is dispatch every SIGCHLD event exactly once to each unit interested in it, and it's up to the unit to then use or ignore this. We use a generation counter in the unit to ensure that we only invoke the unit handler once for each event, protecting us from confusion if a unit is both associated with a specific PID through cgroup membership and through the "watch_pids" logic. It also protects us from being confused if the "watch_pids" hashmap is altered while we are dispatching to it (which is a very likely case). - sd_notify() message dispatching has been reworked to be very similar to SIGCHLD handling now. A generation counter is used for dispatching as well. This also adds a new test that validates that "watch_pid" registration and unregstration works correctly.	2018-01-23 21:29:31 +01:00
Alan Jenkins	25cd49647c	mount: forbid mount on path with symlinks It was forbidden to create mount units for a symlink. But the reason is that the mount unit needs to know the real path that will appear in /proc/self/mountinfo. The kernel dereferences all the symlinks in the path at mount time (I checked this with `mount -c` running under `strace`). This will have no effect on most systems. As recommended by docs, most systems use /etc/fstab, as opposed to native mount unit files. fstab-generator dereferences symlinks for backwards compatibility. A relatively minor issue regarding Time Of Check / Time Of Use also exists here. I can't see how to get rid of it entirely. If we pass an absolute path to mount, the racing process can replace it with a symlink. If we chdir() to the mount point and pass ".", the racing process can move the directory. The latter might potentially be nicer, except that it breaks WorkingDirectory=. I'm not saying the race is relevant to security - I just want to consider how bad the effect is. Currently, it can make the mount unit active (and hence the job return success), despite there never being a matching entry in /proc/self/mountinfo. This wart will be removed in the next commit; i.e. it will make the mount unit fail instead.	2018-01-20 22:06:34 +00:00
Lennart Poettering	75152a4d6a	tree-wide: install matches asynchronously Let's remove a number of synchronization points from our service startups: let's drop synchronous match installation, and let's opt for asynchronous instead. Also, let's use sd_bus_match_signal() instead of sd_bus_add_match() where we can.	2018-01-05 13:58:32 +01:00
Lennart Poettering	4c253ed1ca	tree-wide: introduce new safe_fork() helper and port everything over This adds a new safe_fork() wrapper around fork() and makes use of it everywhere. The new wrapper does a couple of things we previously did manually and separately in a safer, more correct and automatic way: 1. Optionally resets signal handlers/mask in the child 2. Sets a name on all processes we fork off right after forking off (and the patch assigns useful names for all processes we fork off now, following a systematic naming scheme: always enclosed in () – in order to indicate that these are not proper, exec()ed processes, but only forked off children, and if the process is long-running with only our own code, without execve()'ing something else, it gets am "sd-" prefix.) 3. Optionally closes all file descriptors in the child 4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe way so that the parent dying before this happens being handled safely. 5. Optionally reopens the logs 6. Optionally connects stdin/stdout/stderr to /dev/null 7. Debug logs about the forked off processes.	2017-12-25 11:48:21 +01:00

1 2 3 4 5 ...

494 commits