Systemd

Author	SHA1	Message	Date
Zbigniew Jędrzejewski-Szmek	c7adcb1af9	core: do not "warn" about mundane emergency actions For example in a container we'd log: Oct 17 17:01:10 rawhide systemd[1]: Started Power-Off. Oct 17 17:01:10 rawhide systemd[1]: Forcibly powering off: unit succeeded Oct 17 17:01:10 rawhide systemd[1]: Reached target Power-Off. Oct 17 17:01:10 rawhide systemd[1]: Shutting down. and on the console we'd write (in red) [ !! ] Forcibly powering off: unit succeeded This is not useful in any way, and the fact that we're calling an "emergency action" is an internal implementation detail. Let's log about c-a-d and the watchdog actions only.	2018-10-17 19:32:09 +02:00
Zbigniew Jędrzejewski-Szmek	1710d4beff	core: limit service-watchdogs=no to actual "watchdog" commands The setting is now only looked at when considering an action for a job timeout or unit start limit. It is ignored for ctrl-alt-del, SuccessAction, SuccessFailure. v2: turn the parameter into a flag field v3: rename Options to Flags	2018-10-17 19:31:50 +02:00
Lennart Poettering	93d4cb09d5	core: fix unfortunate typo in unit_is_unneeded() Follow-up for `a3c1168ac2`.	2018-10-13 13:01:08 +02:00
Zbigniew Jędrzejewski-Szmek	f436470ae1	Merge pull request #10343 from poettering/manager-state-fix various fixes for PID1's Manager object	2018-10-10 12:36:16 +02:00
Lennart Poettering	3316429f19	Merge pull request #10062 from rgushchin/device Support cgroup v2 bpf-based device controller	2018-10-09 23:29:27 +02:00
Lennart Poettering	5f616d5feb	core: add missing 'continue' statement	2018-10-09 21:11:06 +02:00
Lennart Poettering	638cece45d	core: clean up test run flags Let's make them typesafe, and let's add a nice macro helper for checking if we are in a test run, which should make testing for this much easier to read for most cases.	2018-10-09 19:43:43 +02:00
Roman Gushchin	084c700780	core: support cgroup v2 device controller Cgroup v2 provides the eBPF-based device controller, which isn't currently supported by systemd. This commit aims to provide such support. There are no user-visible changes, just the device policy and whitelist start working if cgroup v2 is used.	2018-10-09 09:47:51 -07:00
Roman Gushchin	17f149556a	core: refactor bpf firewall support into a pseudo-controller The idea is to introduce a concept of bpf-based pseudo-controllers to make adding new bpf-based features easier.	2018-10-09 09:46:08 -07:00
Lennart Poettering	0e699122b7	core: properly serialize "in_audit" per-unit boolean Fixes: #9962	2018-10-09 10:09:39 +02:00
Lennart Poettering	256f65d045	core: rearrange conditions in unit_notify() a bit This shouldn't change control flow, with one exception: we won't send notifications for boot progress to plymouth anymore during reload, which is something we really shouldn't.	2018-10-09 10:09:39 +02:00
Lennart Poettering	334415b16e	Merge pull request #10094 from keszybz/wants-loading Fix bogus fragment paths in units in .wants/.requires	2018-10-05 17:36:31 +02:00
Anita Zhang	c87700a133	Make Watchdog Signal Configurable Allows configuring the watchdog signal (with a default of SIGABRT). This allows an alternative to SIGABRT when coredumps are not desirable. Appropriate references to SIGABRT or aborting were renamed to reflect more liberal watchdog signals. Closes #8658	2018-09-26 16:14:29 +02:00
Zbigniew Jędrzejewski-Szmek	23e8c79665	pid1: drop now-unused path parameter to resolve_template()	2018-09-15 20:03:32 +02:00
Zbigniew Jędrzejewski-Szmek	5a72417084	pid1: drop unused path parameter to add_two_dependencies_by_name()	2018-09-15 20:02:00 +02:00
Zbigniew Jędrzejewski-Szmek	35d8c19ace	pid1: drop now-unused path parameter to add_dependency_by_name()	2018-09-15 19:57:52 +02:00
Zbigniew Jędrzejewski-Szmek	fda09318e3	core: rename function to better reflect semantics	2018-08-20 10:43:31 +02:00
Lennart Poettering	a3c1168ac2	core: rework StopWhenUnneeded= logic Previously, we'd act immediately on StopWhenUnneeded= when a unit state changes. With this rework we'll maintain a queue instead: whenever there's the chance that StopWhenUnneeded= might have an effect we enqueue the unit, and process it later when we have nothing better to do. This should make the implementation a bit more reliable, as the unit notify event cannot immediately enqueue tons of side-effect jobs that might contradict each other, but we do so only in a strictly ordered fashion, from the main event loop. This slightly changes the check when to consider a unit "unneeded". Previously, we'd assume that a unit in "deactivating" state could also be cleaned up. With this new logic we'll only consider units unneeded that are fully up and have no job queued. This means that whenever there's something pending for a unit we won't clean it up.	2018-08-10 16:19:01 +02:00
Yu Watanabe	fe65e88ba6	namespace: implicitly adds DeviceAllow= when RootImage= is set RootImage= may require the following settings ``` DeviceAllow=/dev/loop-control rw DeviceAllow=block-loop rwm DeviceAllow=block-blkext rwm ``` This adds the following settings implicitly when RootImage= is specified. Fixes #9737.	2018-08-06 14:02:31 +09:00
Jon Ringle	fbb48d4c66	Make final kill signal configurable Usecase is to allow changing the final kill from SIGKILL to SIGQUIT which should create a core dump useful for debugging why the service didn't stop with the SIGTERM	2018-07-23 13:44:54 +02:00
Chris Lamb	3fe910794b	Correct a number of trivial typos.	2018-06-18 22:44:44 +02:00
Lennart Poettering	0c69794138	tree-wide: remove Lennart's copyright lines These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.	2018-06-14 10:20:20 +02:00
Lennart Poettering	818bf54632	tree-wide: drop 'This file is part of systemd' blurb This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.	2018-06-14 10:20:20 +02:00
Lennart Poettering	6f40aa4547	core: add a couple of more error cases that should result in "bad-setting" This changes a number of EINVAL cases to ENOEXEC, so that we enter "bad-setting" state if they fail.	2018-06-11 12:53:12 +02:00
Lennart Poettering	c4555ad8f6	core: introduce a new load state "bad-setting" Since `bb28e68477` parsing failures of certain unit file settings will result in load failures of units. This introduces a new load state "bad-setting" that is entered in precisely this case. With this addition error messages on bad settings should be a lot more explicit, as we don't have to show some generic "errno" error in that case, but can explicitly say that a bad setting is at fault. Internally this unit load state is entered as soon as any configuration loader call returns ENOEXEC. Hence: config parser calls should return ENOEXEC now for such essential unit file settings. Turns out, they generally already do. Fixes: #9107	2018-06-11 12:53:12 +02:00
Lennart Poettering	f0831ed2a0	core: add a new unit method "catchup()" This is very similar to the existing unit method coldplug() but is called a bit later. The idea is that coldplug() restores the unit state from before any prior reload/restart, i.e. puts the deserialized state in effect. The catchup() call is then called a bit later, to catch up with the system state for which we missed notifications while we were reloading. This is only really useful for mount, swap and device mount points where we should be careful to generate all missing unit state change events (i.e. call unit_notify() appropriately) for everything that happened while we were reloading.	2018-06-07 15:28:50 +02:00
Lennart Poettering	50be4f4a46	core: rework how we track service and scope PIDs This reworks how systemd tracks processes on cgroupv1 systems where cgroup notification is not reliable. Previously, whenever we had reason to believe that new processes showed up or got removed we'd scan the cgroup of the scope or service unit for new processes, and would tidy up the list of PIDs previously watched. This scanning is relatively slow, and does not scale well. With this change behaviour is changed: instead of scanning for new/removed processes right away we do this work in a per-unit deferred event loop job. This event source is scheduled at a very low priority, so that it is executed when we have time but does not starve other event sources. This has two benefits: this expensive work is coalesced, if events happen in quick succession, and we won't delay SIGCHLD handling for too long. This patch basically replaces all direct invocation of unit_watch_all_pids() in scope.c and service.c with invocations of the new unit_enqueue_rewatch_pids() call which just enqueues a request of watching/tidying up the PID sets (with one exception: in scope_enter_signal() and service_enter_signal() we'll still do unit_watch_all_pids() synchronously first, since we really want to know all processes we are about to kill so that we can track them properly). Moreover, all direct invocations of unit_tidy_watch_pids() and unit_synthesize_cgroup_empty_event() are removed too, when the unit_enqueue_rewatch_pids() call is invoked, as the queued job will run those operations too. All of this is done on cgroupsv1 systems only, and is disabled on cgroupsv2 systems as cgroup-empty notifications are reliable there, and we do not need SIGCHLD events to track processes there. Fixes: #9138	2018-06-05 22:06:48 +02:00
Zbigniew Jędrzejewski-Szmek	79e221d078	Merge pull request #9158 from poettering/notify-auto-reload trigger OnFailure= only if Restart= is not in effect	2018-06-05 13:51:07 +02:00
Zbigniew Jędrzejewski-Szmek	a1230ff972	basic/log: add the log_struct terminator to macro This way all callers do not need to specify it. Exhaustively tested by running test-log under valgrind ;)	2018-06-04 13:46:03 +02:00
Zbigniew Jędrzejewski-Szmek	d94a24ca2e	Add macro for checking if some flags are set This way we don't need to repeat the argument twice. I didn't replace all instances. I think it's better to leave out: - asserts - comparisons like x & y == x, which are mathematically equivalent, but here we aren't checking if flags are set, but if the argument fits in the flags.	2018-06-04 11:50:44 +02:00
Yu Watanabe	858d36c1ec	path-util: introduce path_simplify() The function is similar to path_kill_slashes() but also removes initial './', trailing '/.', and '/./' in the path. When the second argument of path_simplify() is false, then it behaves the same as path_kill_slashes(). Hence, this also replaces path_kill_slashes() with path_simplify().	2018-06-03 23:39:26 +09:00
Lennart Poettering	2ad2e41a72	core: don't trigger OnFailure= deps when a unit is going to restart This adds a flags parameter to unit_notify() which can be used to pass additional notification information to the function. We then make the old reload_failure boolean parameter one of these flags, and then add a new flag that lets unit_notify() know if we are configured to restart the service. Note that this adjusts behaviour of systemd to match what the docs say. Fixes: #8398	2018-06-01 19:08:30 +02:00
Lennart Poettering	7f66b026bb	core: when we can't enqueue OnFailure= job show full error message Let's ask for the full error message and show it, there's really no reason to just show the crappy errno error.	2018-06-01 19:04:37 +02:00
Lennart Poettering	6f8fa29465	Merge pull request #8981 from keszybz/ratelimit-and-dbus Ratelimit renaming and dbus error message fix	2018-05-18 21:38:30 +02:00
Felipe Sateler	57b7a260c2	core: undo the dependency inversion between unit.h and all unit types	2018-05-15 14:24:34 -04:00
Yu Watanabe	af4fa99d6a	core: use _cleanup_set_free_ instead of _cleanup_(set_freep)	2018-05-14 14:13:57 +09:00
Zbigniew Jędrzejewski-Szmek	7994ac1d85	Rename ratelimit_test to ratelimit_below When I see "test", I have to think three times what the return value means. With "below" this is immediately clear. ratelimit_below(&limit) sounds almost like English and is imho immediately obvious. (I also considered ratelimit_ok, but this strongly implies that being under the limit is somehow better. Most of the times this is true, but then we use the ratelimit to detect triple-c-a-d, and "ok" doesn't fit so well there.) C.f. `a1bcaa07`.	2018-05-13 22:08:30 +02:00
David Tardon	95f14a3e21	core: use automatic cleanup more	2018-05-12 18:29:41 +02:00
Lennart Poettering	d4fd1cf208	core: enforce that scope units can be started only once Scope units are populated from PIDs specified by the bus client. We do that when a scope is started. We really shouldn't allow scopes to be started multiple times, as the PIDs then might be heavily out of date. Moreover, clients should have the guarantee that any scope they allocate has a clear runtime cycle which is not repetitive.	2018-04-27 21:52:45 +02:00
Lennart Poettering	7a9a0c05d4	Merge pull request #8765 from poettering/test-fixes some short fixes for the tests	2018-04-19 16:18:46 +02:00
Lennart Poettering	5d13a15b1d	tree-wide: drop spurious newlines (#8764) Double newlines (i.e. one empty line) are great to structure code. But let's avoid triple newlines (i.e. two empty lines), quadruple newlines, quintuple newlines, …, that's just spurious whitespace. It's an easy way to drop 121 lines of code, and keeps the coding style of our sources a bit tighter.	2018-04-19 12:13:23 +02:00
Lennart Poettering	8f63253149	core: don't export per-unit metadata files in test mode We shouldn't clobber the host's /run directories with metadata we export for our units when we run in test mode.	2018-04-19 11:30:18 +02:00
Lennart Poettering	4d09e1c8ba	Merge pull request #8676 from keszybz/drop-license-boilerplate Drop license boilerplate	2018-04-10 14:53:31 +02:00
Zbigniew Jędrzejewski-Szmek	e9e8cbc83a	core: minor comment update	2018-04-07 20:05:58 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Yu Watanabe	1cc6c93a95	tree-wide: use TAKE_PTR() and TAKE_FD() macros	2018-04-05 14:26:26 +09:00
Michal Sekletar	19496554e2	core: delay adding target dependencies until all units are loaded and aliases resolved (#8381) Currently we add target dependencies while we are loading units. This can create ordering loops even if configuration doesn't contain any loop. Take for example following configuration, $ systemctl get-default multi-user.target $ cat /etc/systemd/system/test.service [Unit] After=default.target [Service] ExecStart=/bin/true [Install] WantedBy=multi-user.target If we encounter such unit file early during manager start-up (e.g. load queue is dispatched while enumerating devices due to SYSTEMD_WANTS in udev rules) we would add stub unit default.target and we order it Before test.service. At the same time we add implicit Before to multi-user.target. Later we merge two units and we create ordering cycle in the process. To fix the issue we will now never add any target dependencies until we loaded all the unit files and resolved all the aliases.	2018-03-23 15:28:06 +01:00
Lennart Poettering	ae2a15bc14	macro: introduce TAKE_PTR() macro This macro will read a pointer of any type, return it, and set the pointer to NULL. This is useful as an explicit concept of passing ownership of a memory area between pointers. This takes inspiration from Rust: https://doc.rust-lang.org/std/option/enum.Option.html#method.take and was suggested by Alan Jenkins (@sourcejedi). It drops ~160 lines of code from our codebase, which makes me like it. Also, I think it clarifies passing of ownership, and thus helps readability a bit (at least for the initiated who know the new macro)	2018-03-22 20:21:42 +01:00
Lennart Poettering	31dc1ca3bf	move MANAGER_IS_RELOADING() check into manager_recheck_{dbus|journal}() (#8510) Let's better check this inside of the call than before it, so that we never issue this while reloading, even should these calls be called due to other reasons than just the unit notify. This makes sure the reload state is unset a bit earlier in manager_reload() so that we can safely call this function from there and they do the right thing. Follow-up for `e63ebf71ed`.	2018-03-21 12:03:45 +01:00
Evgeny Vereshchagin	e4711004d6	Merge pull request #8461 from keszybz/oss-fuzz-fixes Oss fuzz fixes	2018-03-19 00:06:44 +03:00
Zbigniew Jędrzejewski-Szmek	ca8700e922	core/unit: delay creating a stack variable until after length has been checked path_is_normalized() will reject paths longer than 4095 bytes, so it's better to not create a stack variable of unbounded size, but instead do the check first and only then do that allocation. Also use _cleanup_ to make things a bit shorter. https://oss-fuzz.com/v2/issue/5424177403133952/7000	2018-03-18 21:07:01 +01:00
Zbigniew Jędrzejewski-Szmek	e63ebf71ed	core: when reloading, delay any actions on journal and dbus connections manager_recheck_journal() and manager_recheck_dbus() would be called too early while we were deserializing units, before the systemd-journald.service and dbus.service have been deserialized. In effect we'd disable logging to the journald and close the bus connection. The first is not very noticeable, it mostly means that logs emitted during deserialization are lost. The second is more noticeable, because manager_recheck_dbus() would call bus_done_api() and bus_done_system() and close dbus connections. Logging and bus connection would then be restored later after the respective units have been deserialized. This is easily reproduced by calling: $ sudo gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1 --method "org.freedesktop.systemd1.Manager.Reload" which works fine before `8559b3b75c`, and then starts failing with: Error: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Remote peer disconnected None of this should happen, and we should delay changing state until after deserialization is complete when reloading. manager_reload() already included the calls to manager_recheck_journal() and manager_recheck_dbus(), so the connection state will be updated after deserialization during reloading is done. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1554578.	2018-03-16 23:14:04 +01:00
Zbigniew Jędrzejewski-Szmek	dc409696cf	Introduce _cleanup_(unit_freep)	2018-03-11 16:33:58 +01:00
Zbigniew Jędrzejewski-Szmek	bea28c5adb	core/unit: voidify one snprintf statement One more follow-up for `f810b631cd`.	2018-02-26 15:49:27 +01:00
Zbigniew Jędrzejewski-Szmek	f810b631cd	Revert "Replace use of snprintf with xsprintf" This reverts commit `a7419dbc59`. _All_ changes in that commit were wrong. Fixes #8211.	2018-02-23 00:13:52 +01:00
Lennart Poettering	aa2b6f1d2b	bpf: rework how we keep track and attach cgroup bpf programs So, the kernel's management of cgroup/BPF programs is a bit misdesigned: if you attach a BPF program to a cgroup and close the fd for it it will stay pinned to the cgroup with no chance of ever removing it again (or otherwise getting ahold of it again), because the fd is used for selecting which BPF program to detach. The only way to get rid of the program again is to destroy the cgroup itself. This is particularly bad for the root cgroup (and in fact any other cgroup that we cannot realistically remove during runtime, such as /system.slice, /init.scope or /system.slice/dbus.service) as getting rid of the program only works by rebooting the system. To counter this let's closely keep track to which cgroup a BPF program is attached and let's implicitly detach the BPF program when we are about to close the BPF fd. This hence changes the bpf_program_cgroup_attach() function to track where we attached the program and changes bpf_program_cgroup_detach() to use this information. Moreover bpf_program_unref() will now implicitly call bpf_program_cgroup_detach(). In order to simplify things, bpf_program_cgroup_attach() will now implicitly invoke bpf_program_load_kernel() when necessary, simplifying the caller's side. Finally, this adds proper reference counting to BPF programs. This is useful for working with two BPF programs in parallel: the BPF program we are preparing for installation and the BPF program we so far installed, shortening the window when we detach the old one and reattach the new one.	2018-02-21 16:43:36 +01:00
Lennart Poettering	00f5ad93b5	core: change KeyringMode= to "shared" by default for non-service units in the system manager (#8172) Before this change all unit types would default to "private" in the system service manager and to "inherit" in the user service manager. With this change this is slightly altered: non-service units of the system service manager are now run with KeyringMode=shared. This appears to be the more appropriate choice as isolation is not as desirable for mount tools, which regularly consume key material. After all mounts are a shared resource themselves as they appear system-wide hence it makes a lot of sense to share their key material too. Fixes: #8159	2018-02-20 08:53:34 +01:00
Lennart Poettering	30663b6c25	Merge pull request #8199 from keszybz/small-things Sundry small cleanups	2018-02-19 16:55:10 +01:00
Zbigniew Jędrzejewski-Szmek	f4aa0bde1c	core: drop obsolete comment https://github.com/systemd/systemd/pull/8125#pullrequestreview-96894581	2018-02-19 15:18:54 +01:00
Lennart Poettering	a94ab7acfd	Merge pull request #8175 from keszybz/gc-cleanup Garbage collection cleanup	2018-02-15 17:47:37 +01:00
Zbigniew Jędrzejewski-Szmek	648461c07d	Merge pull request #8125 from poettering/cgroups-migrate Trivial merge conflict resolved locally.	2018-02-15 16:15:45 +01:00
Zbigniew Jędrzejewski-Szmek	1bdf279002	pid1: properly remove references to the unit from gc queue during final cleanup When various references to the unit were dropped during cleanup in unit_free(), add_to_gc_queue() could be called on this unit. If the unit was previously in the gc queue (at the time when unit_free() was called on it), this wouldn't matter, because it'd have in_gc_queue still set even though it was already removed from the queue. But if it wasn't set, then the unit could be added to the queue. Then after unit_free() would deallocate the unit, we would be left with a dangling pointer in gc_queue. A unit could be added to the gc queue in two places called from unit_free(): in the job_install calls, and in unit_ref_unset(). The first was OK, because it was above the LIST_REMOVE(gc_queue,...) call, but the second was not, because it was after that. Move all the LIST_REMOVE() calls down.	2018-02-15 14:03:53 +01:00
Zbigniew Jędrzejewski-Szmek	a946fa9bb9	pid1: free basic unit information at the very end, before freeing the unit We would free stuff like the names of the unit first, and then recurse into other structures to remove the unit from there. Technically this was OK, since the code did not access the name, but this makes debugging harder. And if any log messages are added in any of those functions, they are likely to access u->id and such other basic information about the unit. So let's move the removal of this "basic" information towards the end of unit_free().	2018-02-15 13:32:59 +01:00
Zbigniew Jędrzejewski-Szmek	2641f02e23	pid1: fix collection of cycles of units which reference one another A .socket will reference a .service unit, by registering a UnitRef with the .service unit. If this .service unit has the .socket unit listed in Wants or Sockets or such, a cycle will be created. We would not free this cycle properly, because we treated any unit with non-empty refs as uncollectable. To solve this issue, treat refs with UnitRef in u->refs_by_target similarly to the refs in u->dependencies, and check if the "other" unit is known to be needed. If it is not needed, do not treat the reference from it as preventing the unit we are looking at from being freed.	2018-02-15 13:32:53 +01:00
Zbigniew Jędrzejewski-Szmek	7f7d01ed58	pid1: include the source unit in UnitRef No functional change. The source unit manages the reference. It allocates the UnitRef structure and registers it in the target unit, and then the reference must be destroyed before the source unit is destroyed. Thus, it should be OK to include the pointer to the source unit, it should be live as long as the reference exists. v2: - rename refs to refs_by_target	2018-02-15 13:27:06 +01:00
Zbigniew Jędrzejewski-Szmek	f2f725e5cc	pid1: rename unit_check_gc to unit_may_gc "check" is unclear: what is true, what is false? Let's rename to "can_gc" and revert the return value ("positive" values are easier to grok). v2: - rename from unit_can_gc to unit_may_gc	2018-02-15 13:04:12 +01:00
Lennart Poettering	6592b9759c	core: add new bus call for migrating foreign processes to scope/service units This adds a new bus call to service and scope units called AttachProcesses() that moves arbitrary processes into the cgroup of the unit. The primary user for this new API is systemd itself: the systemd --user instance uses this call of the systemd --system instance to migrate processes if itself gets the request to migrate processes and the kernel refuses this due to access restrictions. The primary use-case of this is to make "systemd-run --scope --user …" invoked from user session scopes work correctly on pure cgroupsv2 environments. There, the kernel refuses to migrate processes between two unprivileged-owned cgroups unless the requestor as well as the ownership of the closest parent cgroup all match. This however is not the case between the session-XYZ.scope unit of a login session and the user@ABC.service of the systemd --user instance. The new logic always tries to move the processes on its own, but if that doesn't work when being the user manager, then the system manager is asked to do it instead. The new operation is relatively restrictive: it will only allow to move the processes like this if the caller is root, or the UID of the target unit, caller and process all match. Note that this means that unprivileged users cannot attach processes to scope units, as those do not have "owning" users (i.e. they have no User= field). Fixes: #3388	2018-02-12 11:34:00 +01:00
Lennart Poettering	8559b3b75c	core: rework how we connect to the bus This removes the current bus_init() call, as it had multiple problems: it munged handling of the three bus connections we care about (private, "api" and system) into one, even though the conditions when which was ready are very different. It also added redundant logging, as the individual calls it called all logged on their own anyway. The three calls bus_init_api(), bus_init_private() and bus_init_system() are now made public. A new call manager_dbus_is_running() is added that works much like manager_journal_is_running() and is a lot more careful when checking whether dbus is around. Optionally it checks the unit's deserialized_state rather than state, in order to accommodate cases where we want to connect to the bus before deserializing the "subscribed" list, before coldplugging the units. manager_recheck_dbus() is added, that works a lot like manager_recheck_journal() and is invoked in unit_notify(), i.e. when units change state. All in all this should make handling a bit more alike to journal handling, and it also fixes one major bug: when running in user mode we'll now connect to the system bus early on, without conditionalizing this in any way.	2018-02-12 11:34:00 +01:00
Lennart Poettering	004c7f169e	core: fold manager_set_exec_params() into unit_set_exec_params() Let's simplify things a bit: we so far called both functions every single time, let's just merge one into the other, so that we have fewer functions to call.	2018-02-12 11:34:00 +01:00
Lennart Poettering	1d9cc8768f	cgroup: add a new "can_delegate" flag to the unit vtable, and set it for scope and service units only Currently we allowed delegation for all units with cgroup backing except for slices. Let's make this a bit more strict for now, and only allow this in service and scope units. Let's also add a generic accessor unit_cgroup_delegate() for checking whether a unit has delegation turned on that checks the new bool first. Also, when doing transient units, let's explicitly refuse turning on delegation for unit types that don't support it. This is mostly cosmetic as we wouldn't act on the delegation request anyway, but certainly helpful for debugging.	2018-02-12 11:34:00 +01:00
Lennart Poettering	548f69375e	tree-wide: use path_hash_ops instead of string_hash_ops whenever we key by a path Let's make use of our new hash_ops!	2018-02-12 11:07:55 +01:00
Franck Bui	9ea3a0e702	core: use id unit when retrieving unit file state (#8038) Previous code was using the basename(id->fragment_path) which returned incorrect result if the unit was an instance. For example, assuming that no instances of "template" have been created so far: $ systemctl enable template@1 Created symlink from /etc/systemd/system/multi-user.target.wants/template@1.service to /usr/lib/systemd/system/template@.service. $ systemctl is-enabled template@3.service disabled $ systemctl status template@3.service ● template@3.service - openQA Worker #3 Loaded: loaded (/usr/lib/systemd/system/template@.service; enabled; vendor preset: disabled) [...] Here the unit file states reported by "status" and "is-enabled" were different.	2018-02-07 14:08:02 +01:00
Andrei Gherzan	3f602115b7	core: Avoid empty directory warning when we are bind-mounting a file (#8069)	2018-02-06 16:35:52 +01:00
Yu Watanabe	e8a565cb66	core: make ExecRuntime be manager managed object Before this, each ExecRuntime object is owned by a unit. However, it may be shared with other units which enable JoinsNamespaceOf=. Thus, by the serialization/deserialization process, its sharing information, more specifically, reference counter is lost, and causes issue #7790. This makes ExecRuntime objects be managed by manager, and changes the serialization/deserialization process. Fixes #7790.	2018-02-06 16:00:34 +09:00
Lennart Poettering	81e9871e87	selinux: make sure we never use /dev/null for making unit selinux access decisions	2018-01-31 19:54:25 +01:00
Lennart Poettering	adefcf2821	core: rework how we count the n_on_console counter Let's add a per-unit boolean that tells us whether our unit is currently counted or not. This way it's unlikely we get out of sync again and things are generally more robust. This also allows us to remove the counting logic specific to service units (which was in fact mostly a copy from the generic implementation), in favour of fully generic code. Replaces: #7824	2018-01-24 20:14:51 +01:00
Lennart Poettering	bb2c768545	core: add a new unit_needs_console() call This call determines whether a specific unit currently needs access to the console. It's a fancy wrapper around exec_context_may_touch_console() ultimately, however for service units we'll explicitly exclude the SERVICE_EXITED state from when we report true.	2018-01-24 19:54:26 +01:00
Lennart Poettering	62a769136d	core: rework how we track which PIDs to watch for a unit Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Unit interested in SIGCHLD events for them. This scheme allowed a specific PID to be watched by exactly 0, 1 or 2 units. With this rework this is replaced by a single hashmap which is primarily keyed by the PID and points to a Unit interested in it. However, it is optionally also keyed by the negated PID, in which case it points to a NULL terminated array of additional Unit objects also interested. This scheme means arbitrary numbers of Units may now watch the same PID. Runtime and memory behaviour should not be impacted by this change, as for the common case (i.e. each PID only watched by a single unit) behaviour stays the same, but for the uncommon case (a PID watched by more than one unit) we only pay with a single additional memory allocation for the array. Why this all? Primarily, because allowing exactly two units to watch a specific PID is not sufficient for some niche cases, as processes can belong to more than one unit these days: 1. sd_notify() with MAINPID= can be used to attach a process from a different cgroup to multiple units. 2. Similarly, the PIDFile= setting in unit files can be used for similar setups. 3. By creating a scope unit a main process of a service may join a different unit, too. 4. On cgroupsv1 we frequently end up watching all processes remaining in a scope, and if a process opens lots of scopes one after the other it might thus end up being watched by many of them. This patch hence removes the 2-unit-per-PID limit. It also makes a couple of other changes, some of them quite relevant: - manager_get_unit_by_pid() (and the bus call wrapping it) when there's ambiguity will prefer returning the Unit the process belongs to based on cgroup membership, and only check the watch-pids hashmap if that fails. 
This change in logic is probably more in line with what people expect and makes things more stable as each process can belong to exactly one cgroup only. - Every SIGCHLD event is now dispatched to all units interested in its PID. Previously, there was some magic conditionalization: the SIGCHLD would only be dispatched to the unit if it was interested in a single PID only, or the PID belonged to the control or main PID or we didn't dispatch a single SIGCHLD to the unit in the current event loop iteration yet. These rules were quite arbitrary and also redundant as the per-unit handlers would filter the PIDs anyway a second time. With this change we'll hence relax the rules: all we do now is dispatch every SIGCHLD event exactly once to each unit interested in it, and it's up to the unit to then use or ignore this. We use a generation counter in the unit to ensure that we only invoke the unit handler once for each event, protecting us from confusion if a unit is both associated with a specific PID through cgroup membership and through the "watch_pids" logic. It also protects us from being confused if the "watch_pids" hashmap is altered while we are dispatching to it (which is a very likely case). - sd_notify() message dispatching has been reworked to be very similar to SIGCHLD handling now. A generation counter is used for dispatching as well. This also adds a new test that validates that "watch_pid" registration and unregistration works correctly.	2018-01-23 21:29:31 +01:00
Alan Jenkins	25cd49647c	mount: forbid mount on path with symlinks It was forbidden to create mount units for a symlink. But the reason is that the mount unit needs to know the real path that will appear in /proc/self/mountinfo. The kernel dereferences all the symlinks in the path at mount time (I checked this with `mount -c` running under `strace`). This will have no effect on most systems. As recommended by docs, most systems use /etc/fstab, as opposed to native mount unit files. fstab-generator dereferences symlinks for backwards compatibility. A relatively minor issue regarding Time Of Check / Time Of Use also exists here. I can't see how to get rid of it entirely. If we pass an absolute path to mount, the racing process can replace it with a symlink. If we chdir() to the mount point and pass ".", the racing process can move the directory. The latter might potentially be nicer, except that it breaks WorkingDirectory=. I'm not saying the race is relevant to security - I just want to consider how bad the effect is. Currently, it can make the mount unit active (and hence the job return success), despite there never being a matching entry in /proc/self/mountinfo. This wart will be removed in the next commit; i.e. it will make the mount unit fail instead.	2018-01-20 22:06:34 +00:00
Lennart Poettering	75152a4d6a	tree-wide: install matches asynchronously Let's remove a number of synchronization points from our service startups: let's drop synchronous match installation, and let's opt for asynchronous instead. Also, let's use sd_bus_match_signal() instead of sd_bus_add_match() where we can.	2018-01-05 13:58:32 +01:00
Lennart Poettering	4c253ed1ca	tree-wide: introduce new safe_fork() helper and port everything over This adds a new safe_fork() wrapper around fork() and makes use of it everywhere. The new wrapper does a couple of things we previously did manually and separately in a safer, more correct and automatic way: 1. Optionally resets signal handlers/mask in the child 2. Sets a name on all processes we fork off right after forking off (and the patch assigns useful names for all processes we fork off now, following a systematic naming scheme: always enclosed in () – in order to indicate that these are not proper, exec()ed processes, but only forked off children, and if the process is long-running with only our own code, without execve()'ing something else, it gets an "sd-" prefix.) 3. Optionally closes all file descriptors in the child 4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe way so that the parent dying before this happens is handled safely. 5. Optionally reopens the logs 6. Optionally connects stdin/stdout/stderr to /dev/null 7. Debug logs about the forked off processes.	2017-12-25 11:48:21 +01:00
Lennart Poettering	a8ea93a5e2	core: use empty_to_null() where we can	2017-12-07 12:13:00 +01:00
Michal Koutný	deb4e7080d	service: Don't stop unneeded units needed by restarted service (#7526) An auto-restarted unit B may depend on unit A with StopWhenUnneeded=yes. If A stops before B's restart timeout expires, it'll be started again as part of B's dependent jobs. However, if stopping takes longer than the timeout, B's running stop job collides with the start job, which also cancels B's start job. The result is that neither A nor B is active. Currently, when a service with automatic restarting fails, it transitions through the following states: 1) SERVICE_FAILED or SERVICE_DEAD to indicate the failure, 2) SERVICE_AUTO_RESTART while restart timer is running. The StopWhenUnneeded= check takes place in service_enter_dead between the two states mentioned above. We temporarily store the auto restart flag to query it during the check. Because we don't return control to the main event loop, this new service unit flag needn't be serialized. This patch prevents the pathologic situation when the service with Restart= won't restart automatically. As a side effect it also avoids restarting the dependency unit with StopWhenUnneeded=yes. Fixes: #7377	2017-12-05 16:51:19 +01:00
Lennart Poettering	50fb00b707	core: use safe_fclose() where we can	2017-11-29 12:34:12 +01:00
Lennart Poettering	45639f1be5	core: never remove "transient" and "control" directories from unit search path This changes the unit search path logic to never drop the transient and control directories from the unit search path. This is necessary as we add new entries to both during runtime, due to the "systemctl set-property" and transient unit logic. Previously, the "transient" directory was created during early boot to deal with this, but the "control" directories were not covered like that. Creating the control directories early at boot is not possible however, as /etc might be read-only then, and we do define a persistent control directory. Hence, let's create these dirs on-demand when we need them, and make sure the search path clean-up logic never drops them from the search path even if they are initially missing. (Also, always create these paths properly labelled)	2017-11-29 12:34:12 +01:00
Lennart Poettering	0126c8f3f6	core: minor simplification	2017-11-29 12:34:12 +01:00
Lennart Poettering	2e59b241ca	core: add proper escaping to writing of drop-ins/transient unit files This majorly refactors the transient unit file and drop-in writing logic, so that we properly C-escape and specifier-escape (% → %%) everything we write out, so that when we read it back again, no specifiers are parsed that aren't supposed to be parsed. This renames unit_write_drop_in() and friends to unit_write_setting(). The name change is supposed to clarify that the functions are not only used to write drop-in files, but also transient unit files. The previous "mode" parameter to this function is replaced by a more generic "flags", which knows additional flags for implicit C-style and specifier escaping before writing things out. This can cover most properties where either form of escaping is defined. For the cases where this isn't sufficient, we add helpers unit_escape_setting() and unit_concat_strv() for escaping individual strings or strvs properly. While we are at it, we also prettify generation of transient unit files: we try to reduce the number of section headers written out: previously we'd write the right section header out for each setting. With this change we do so only if the setting lives in a different section than the one before. (This should also be considered preparation for when we add proper APIs to systemd to write normal, persistent unit files through the bus API)	2017-11-29 12:34:12 +01:00
Lennart Poettering	a4634b214c	core: warn about left-over processes in cgroup on unit start Now that we don't kill control processes anymore, let's at least warn about any processes left-over in the unit cgroup at the moment of starting the unit.	2017-11-25 17:08:21 +01:00
Lennart Poettering	e98b2fbbe9	core: generalize the cgroup empty check on GC Let's move the cgroup empty check for all unit types into the generic unit_check_gc() call, out of the per-unit-type _check_gc() type. This not only allows us to share some code, but also hooks up mount and socket units with this kind of check, for free, as it was missing there previously.	2017-11-25 17:08:21 +01:00
Lennart Poettering	60c728adf7	unit: initialize bpf cgroup realization state properly Before this patch, the bpf cgroup realization state was implicitly set to "NO", meaning that the bpf configuration was realized but was turned off. That means invalidation requests for the bpf stuff (which we issue in blanket fashion when doing a daemon reload) would actually later result in us re-realizing the unit, under the assumption it was already realized once, even though in reality it never was realized before. This had the effect that after each daemon-reload we'd end up realizing all defined units, even the unloaded ones, populating cgroupfs with lots of unneeded empty cgroups. With this fix we properly set the realization state to "INVALIDATED", i.e. indicating the bpf stuff was never set up for the unit, and hence when we try to invalidate it later we won't do anything.	2017-11-25 17:08:21 +01:00
Daniel Lockyer	a7419dbc59	Replace use of snprintf with xsprintf	2017-11-24 10:36:04 +00:00
Zbigniew Jędrzejewski-Szmek	ffb70e4424	Merge pull request #7381 from poettering/cgroup-unified-delegate-rework Fix delegation in the unified hierarchy + more cgroup work	2017-11-22 07:42:08 +01:00
Lennart Poettering	3c7416b6ca	core: unify common code for preparing for forking off unit processes This introduces a new function unit_prepare_exec() that encapsulates a number of calls we do in preparation for spawning off some processes in all our unit types that do so. This allows us to neatly unify a bit of code between unit types and shorten our code.	2017-11-21 11:54:08 +01:00
Lennart Poettering	e7dfbb4e74	core: introduce SuccessAction= as unit file property SuccessAction= is similar to FailureAction= but declares what to do on success of a unit, rather than on failure. This is useful for running commands in qemu/nspawn images, that shall power down on completion. We frequently see "ExecStopPost=/usr/bin/systemctl poweroff" or so in unit files like this. Offer a simple, more declarative alternative for this. While we are at it, hook up failure action with unit_dump() and transient units too.	2017-11-20 16:37:22 +01:00
Lennart Poettering	53c35a766f	core: generalize FailureAction= move it from service to unit All kinds of units can fail, hence it makes sense to offer this as generic concept for all unit types.	2017-11-20 16:37:22 +01:00
Lennart Poettering	0133d5553a	Merge pull request #7198 from poettering/stdin-stdout Add StandardInput=data, StandardInput=file:... and more	2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	99be45a46f	fs-util: rename path_is_safe() → path_is_normalized() Already, path_is_safe() refused paths containing the "." dir. Doing that isn't strictly necessary to be "safe" by most definitions of the word. But it is necessary in order to consider a path "normalized". Hence, "path_is_safe()" is slightly misleading a name, but "path_is_normalized()" is more descriptive, hence let's rename things accordingly. No functional changes.	2017-11-17 11:13:44 +01:00
Lennart Poettering	5afe510c89	core: add a new unit file setting CollectMode= for tweaking the GC logic Right now, the option only takes one of two possible values "inactive" or "inactive-or-failed", the former being the default, and exposing the same behaviour as the status quo ante. If set to "inactive-or-failed" units may be collected by the GC logic when in the "failed" state too. This logic should be a nicer alternative to using the "-" modifier for ExecStart= and friends, as the exit data is collected and logged about and only removed when the GC comes along. This should be useful in particular for per-connection socket-activated services, as well as "systemd-run" command lines that shall leave no artifacts in the system. I was thinking about whether to expose this as a boolean, but opted for an enum instead, as I have the suspicion other tweaks like this might be added later on, in which case we extend this setting instead of having to add yet another one. Also, let's add some documentation for the GC logic.	2017-11-16 14:38:36 +01:00
Lennart Poettering	7eb2a8a125	unit: rework a bit how we keep the service fdstore from being destroyed during service restart When preparing for a restart we quickly go through the DEAD/INACTIVE service state before entering AUTO_RESTART. When doing this, we need to make sure we don't destroy the FD store. Previously this was done by checking the failure state of the unit, and keeping the FD store around when the unit failed, under the assumption that the restart logic will then get into action. This is not entirely correct however, as there might be failure states that will not result in restarts. With this commit we slightly alter the logic: a ref counter for the fd store is added, that is increased right before we handle the restart logic, and decreased again right-after. This should ensure that the fdstore lives exactly as long as it needs. Follow-up for `f0bfbfac43`.	2017-11-16 14:37:33 +01:00
Lennart Poettering	d3070fbdf6	core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one existing unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and instead just use an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441	2017-11-16 12:40:17 +01:00
Lennart Poettering	0263828039	core: rework the Delegate= unit file setting to take a list of controller names Previously it was not possible to select which controllers to enable for a unit where Delegate=yes was set, as all controllers were enabled. With this change, this is made configurable, and thus delegation units can pick specifically what they want to manage themselves, and what they don't care about.	2017-11-13 10:49:15 +01:00
Lennart Poettering	c999cf385a	core: add internal API to remove dependencies again, based on dependency mask Let's make use of the dependency mask, and add internal API to remove dependencies again, based on bits in the dependency mask.	2017-11-10 19:45:29 +01:00
Lennart Poettering	eef85c4a3f	core: track why unit dependencies came to be This replaces the dependencies Set* objects by Hashmap* objects, where the key is the depending Unit, and the value is a bitmask encoding why the specific dependency was created. The bitmask contains a number of different, defined bits, that indicate why dependencies exist, for example whether they are created due to explicitly configured deps in files, by udev rules or implicitly. Note that memory usage is not increased by this change, even though we store more information, as we manage to encode the bit mask inside the value pointer each Hashmap entry contains. Why this all? When we know how a dependency came to be, we can update dependencies correctly when a configuration source changes but others are left unaltered. Specifically: 1. We can fix UDEV_WANTS dependency generation: so far we kept adding dependencies configured that way, but if a device lost such a dependency we couldn't remove them again as there was no scheme for removing of dependencies in place. 2. We can implement "pin-pointed" reload of unit files. If we know what dependencies were created as result of configuration in a unit file, then we know what to flush out when we want to reload it. 3. It's useful for debugging: "systemd-analyze dump" now shows this information, helping substantially with understanding how systemd's dependency tree came to be the way it came to be.	2017-11-10 19:45:29 +01:00
Lubomir Rintel	19a44dfe45	core: fragments of masked units ought not be considered for NeedDaemonReload (#7060) The units that are not loaded don't have dropin_paths set. This currently results in units that have fragments to always have NeedDaemonReload=true when masked: $ find {/usr/lib,/run/user/8086}/systemd/user/meh.service* |xargs ls -ld lrwxrwxrwx. 1 lkundrak lkundrak 9 Oct 11 11:19 /run/user/8086/systemd/user/meh.service -> /dev/null -rw-rw-r--. 1 root root 49 Oct 11 10:16 /usr/lib/systemd/user/meh.service drwxrwxr-x. 2 root root 4096 Oct 11 10:50 /usr/lib/systemd/user/meh.service.d -rw-rw-r--. 1 root root 666 Oct 11 10:50 /usr/lib/systemd/user/meh.service.d/override.conf $ systemctl --user daemon-reload $ busctl --user get-property org.freedesktop.systemd1 \ /org/freedesktop/systemd1/unit/meh_2eservice \ org.freedesktop.systemd1.Unit NeedDaemonReload b true	2017-10-18 08:38:50 +02:00
Yu Watanabe	4c70109600	tree-wide: use IN_SET macro (#6977)	2017-10-04 16:01:32 +02:00
Lennart Poettering	72fd17682d	core: usually our enum's _INVALID and _MAX special values are named after the full type In most cases we followed the rule that the special _INVALID and _MAX values we use in our enums use the full type name as prefix (in contrast to regular values that we often make shorter), do so for ExecDirectoryType as well. No functional changes, just a little bit of renaming to make this code more like the rest.	2017-10-02 17:41:43 +02:00
Andreas Rammhold	ec2ce0c5d7	tree-wide: use `!IN_SET(..)` for `a != b && a != c && …` The included cocci was used to generate the changes. Thanks to @flo-wer for pointing this case out.	2017-10-02 13:09:56 +02:00
Andreas Rammhold	3742095b27	tree-wide: use IN_SET where possible In addition to the changes from #6933 this handles cases that could be matched with the included cocci file.	2017-10-02 13:09:54 +02:00
Lennart Poettering	ed77d407d3	core: log unit failure with type-specific result code This slightly changes how we log about failures. Previously, service_enter_dead() would log that a service unit failed along with its result code, and unit_notify() would do this again but without the result code. For other unit types only the latter would take effect. This cleans this up: we keep the message in unit_notify() only for debug purposes, and add type-specific log lines to all our unit types that can fail, and always place them before unit_notify() is invoked. Or in other words: the duplicate log message for service units is removed, and all other unit types get a more useful line with the precise result code.	2017-09-27 18:26:18 +02:00
Lennart Poettering	84b26d5149	core: free_and_strdup() FTW!	2017-09-27 18:26:18 +02:00
Lennart Poettering	09e2465407	cgroup: after determining that a cgroup is empty, asynchronously dispatch this This makes sure that if we learn via inotify or another event source that a cgroup is empty, and we checked that this is indeed the case (as we might get spurious notifications through inotify, as the inotify logic through the "cgroups.event" is pretty unspecific and might be triggered for a variety of reasons), then we'll enqueue a defer event for it, at a priority lower than SIGCHLD handling, so that we know for sure that if there's waitid() data for a process we used it before considering the cgroup empty notification. Fixes: #6608	2017-09-27 18:26:18 +02:00
Lennart Poettering	91a6073ef7	core: rename cgroup_queue → cgroup_realize_queue We are about to add a second cgroup-related queue, called "cgroup_empty_queue", hence let's rename "cgroup_queue" to "cgroup_realize_queue" (as that is its purpose) to minimize confusion about the two queues. Just a rename, no functional changes.	2017-09-27 17:59:25 +02:00
Zbigniew Jędrzejewski-Szmek	2e4025c0f9	core/cgroup: add a helper macro for a common pattern (#6926)	2017-09-27 17:54:06 +02:00
Jan Synacek	0cde65e263	test-cpu-set-util.c: fix typo in comment (#6916)	2017-09-26 16:07:34 +02:00
Lennart Poettering	7960b0c704	cgroup: make use of unit_cgroup_delegate() where useful It's an easy-to-use wrapper, so let's take benefit of it.	2017-09-22 20:02:23 +02:00
Lennart Poettering	915b1d0174	core: whenever a unit terminates, log its consumed resources to the journal This adds a new recognizable log message for each unit invocation that contains structured information about consumed resources of the unit as a whole after it terminated. This is particularly useful for apps that want to figure out what the resource consumption of a unit given a specific invocation ID was. The log message is only generated for units that have at least one XyzAccounting= property turned on, and currently only covers IP traffic and CPU time metrics.	2017-09-22 15:28:05 +02:00
Lennart Poettering	f1c50becda	core: make sure to log invocation ID of units also when doing structured logging	2017-09-22 15:24:55 +02:00
Lennart Poettering	58d83430e1	core: when coming back from reload/reexec, reapply all cgroup properties With this change we'll invalidate all cgroup settings after coming back from a daemon reload/reexec, so that the new settings are instantly applied. This is useful for the BPF case, because we don't serialize/deserialize the BPF program fd, and hence have to install a new, updated BPF program when coming back from the reload/reexec. However, this is also useful for the rest of the cgroup settings, as it ensures that user configuration really takes effect wherever we can.	2017-09-22 15:24:55 +02:00
Lennart Poettering	6b659ed87e	core: serialize/deserialize IP accounting across daemon reload/reexec Make sure the current IP accounting counters aren't lost during reload/reexec. Note that we destroy all BPF file objects during a reload: the BPF programs, the access and the accounting maps. The former two need to be regenerated anyway with the newly loaded configuration data, but the latter one needs to survive reloads/reexec. In this implementation I opted to only save/restore the accounting map content instead of the map itself. While this opens a (theoretic) window where IP traffic is still accounted to the old map after we read it out, and we thus miss a few bytes this has the benefit that we can alter the map layout between versions should the need arise.	2017-09-22 15:24:55 +02:00
Lennart Poettering	a79279c7fd	core: when creating the socket fds for a socket unit, join socket's cgroup first Let's make sure that a socket unit's IPAddressAllow=/IPAddressDeny= settings are in effect on all socket fds associated with it. In order to make this happen we need to make sure the cgroup the fds are associated with are the socket unit's cgroup. The only way to do that is invoking socket()+accept() in them. Since we really don't want to migrate PID 1 around we do this by forking off a helper process, which invokes socket()/accept() and sends the newly created fd to PID 1. Ugly, but works, and there's apparently no better way right now. This generalizes forking off per-unit helper processes in a new function unit_fork_helper_process(), which is then also used by the NSS chown() code of socket units.	2017-09-22 15:24:55 +02:00
Daniel Mack	377bfd2d49	manager: hook up IP accounting defaults	2017-09-22 15:24:55 +02:00
Daniel Mack	906c06f64a	cgroup, unit, fragment parser: make use of new firewall functions	2017-09-22 15:24:55 +02:00
Daniel Mack	6a48d82f02	cgroup: add fields to accommodate eBPF related details Add pointers for compiled eBPF programs as well as list heads for allowed and denied hosts for both directions.	2017-09-22 15:24:54 +02:00
Lennart Poettering	b1edf4456e	core: add new per-unit setting KeyringMode= for controlling kernel keyring setup Usually, it's a good thing that we isolate the kernel session keyring for the various services and disconnect them from the user keyring. However, in case of the cryptsetup key caching we actually want that multiple instances of the cryptsetup service can share the keys in the root user's user keyring, hence we need to be able to disable this logic for them. This adds KeyringMode=inherit|private|shared: inherit: don't do any keyring magic (this is the default in systemd --user) private: a private keyring as before (default in systemd --system) shared: the new setting	2017-09-15 16:53:35 +02:00
Jérémy Rosen	f54bcca5c1	unit : allow any unit which propagates reloads to be reloaded	2017-09-10 18:53:26 +02:00
Yu Watanabe	ada5e27657	core: StateDirectory= and friends imply RequiresMountsFor=	2017-08-31 18:19:35 +09:00
Lennart Poettering	f0d477979e	core: introduce unit_set_exec_params() The new unit_set_exec_params() call is to units what manager_set_exec_params() is to the manager object: it initializes the various fields from the relevant generic properties set.	2017-08-10 15:02:50 +02:00
Zbigniew Jędrzejewski-Szmek	0742986650	core: properly handle deserialization of unknown unit types (#6476) We just abort startup, without printing any error. Make sure we always print something, and when we cannot deserialize some unit, just ignore it and continue. Fixup for `4bc5d27b94`. Without this, we would hang in daemon-reexec after upgrade.	2017-07-31 08:05:35 +02:00
Martin Pitt	9fcaa574f0	Merge pull request #6465 from keszybz/drop-kdbus Drop kdbus-dependent code	2017-07-28 09:29:07 +02:00
Zbigniew Jędrzejewski-Szmek	4bc5d27b94	Drop busname unit type Since busname units are only useful with kdbus, they weren't actively used. This was dead code, only compile-tested. If busname units are ever added back, it'll be cleaner to start from scratch (possibly reverting parts of this patch).	2017-07-23 09:29:02 -04:00
Zbigniew Jędrzejewski-Szmek	9e4ea9cc34	Revert "core: don't load dropin data multiple times for the same unit (#5139)" This reverts commit `2d058a87ff`. When we add another name to a unit (by following an alias), we need to reload all drop-ins. This is necessary to load any additional dropins found in the dirs created from the alias name. Fixes #6334.	2017-07-22 16:03:00 -04:00
Zbigniew Jędrzejewski-Szmek	13ddc3fc2b	systemd: do not stop units bound to inactive units while coldplugging (#6316 ) When running systemd-analyze verify I would get a random subset of warnings (sometimes none, sometimes one or two): dev-mapper-luks\x2d8db85dcf\x2d6230\x2d4e88\x2d940d\x2dba176d062b31.swap: Unit is bound to inactive unit dev-mapper-luks\x2d8db85dcf\x2d6230\x2d4e88\x2d940d\x2dba176d062b31.device. Stopping, too. home.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device. Stopping, too. boot.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-56c56bfd\x2d93f0\x2d48fb\x2dbc4b\x2d90aa67144ea5.device. Stopping, too. When running with debug on, it's pretty obvious what is happening: home.mount: Changed dead -> mounted home.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device. Stopping, too. home.mount: Trying to enqueue job home.mount/stop/fail home.mount: Installed new job home.mount/stop as 27 home.mount: Enqueued job home.mount/stop as 27 ... dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Installed new job dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device/start as 47 dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Changed dead -> plugged dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Job dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device/start finished, result=done Fixes #2206, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=808151.	2017-07-11 10:45:03 +02:00
Michal Koutný	b007626897	core: dbus: Interpret released names properly (#6175 ) When a DBus name is released, NameOwnerChanged signal contains an empty string as new_owner. Commit `bbc2908` changed interpretation of the empty string to a valid name, which is not consistent with values that are sent by dbus-daemon. As a side effect, this masks symptoms of systemd-logind dbus disconnections (#2925) by completely restarting it so it can freshly reconnect to dbus.	2017-06-22 20:26:04 -04:00
Franck Bui	8b108bd0ef	core: when deserializing a unit, fully restore its cgroup state The state of a unit was not fully restored, especially the "cgroup_realized_mask/cgroup_enabled_mask" fields were missing. This could be seen with the following sequence: $ systemctl show -p TasksCurrent sshd TasksCurrent=1 $ systemctl daemon-reload $ systemctl show -p TasksCurrent sshd TasksCurrent=18446744073709551615 This was also visible with the "status" command: the "Tasks: " row wasn't shown in the status of a service after a "daemon-reload" command.	2017-05-04 09:41:23 +02:00
Franck Bui	aae7e17f9c	core: introduce cg_mask_from_string()/cg_mask_to_string()	2017-05-04 09:41:19 +02:00
Lennart Poettering	db7076bf78	Merge pull request #5164 from Werkov/ordering-for-_netdev-devices Ordering for _netdev devices	2017-04-29 18:40:19 +02:00
Michal Koutný	a2df3ea4ae	job: add JobRunningTimeoutSec for JOB_RUNNING state Unit.JobTimeoutSec starts when a job is enqueued in a transaction. The introduced distinct Unit.JobRunningTimeoutSec starts only when the job starts running (e.g. it groups all Exec* commands of a service or spans waiting for a device period.) Unit.JobRunningTimeoutSec is intended to be used by default instead of Unit.JobTimeoutSec for device units where such behavior causes less confusion (consider a job for a _netdev mount device, with this change the timeout will start ticking only after the network is ready).	2017-04-25 18:00:29 +02:00
Zbigniew Jędrzejewski-Szmek	ba360bb05c	tree-wide: mark log_struct with _printf_ and fix fallout log_struct takes multiple format strings, each one followed by arguments. The _printf_ annotation is not sufficiently flexible to express this, but we can still annotate the first format string, though not its arguments (because their number is unknown). With the annotation, the places which specified the message id or similar as the first pattern cause a warning from -Wformat-nonliteral. This can be trivially fixed by putting the MESSAGE= first. This change will help find issues where a non-literal is erroneously used as the pattern.	2017-04-21 13:37:04 -04:00
Lennart Poettering	77969722aa	core: when a unit's SourcePath points to API VFS pretend we are never out-of-date (#5487 ) If the unit's SourcePath is below /proc then it's a unit generated from a kernel resource (such as a .mount or .swap unit). And those we watch anyway, and hence should never be out-of-date. Fixes: #5461	2017-03-01 10:25:08 -05:00
Lennart Poettering	ae572acd62	core: always consider clients that pinned a unit to be subscribers If a client pins a unit, then it makes sense to also implicitly make it a subscriber. This is useful for clients that just want to watch one specific unit: they can pin it and receive its messages.	2017-02-28 18:34:58 +01:00
Zbigniew Jędrzejewski-Szmek	78e4f19ebc	Merge pull request #5444 from poettering/cgroups-revert-no-error Revert "core: simplify cg_[all_]unified()" and more.	2017-02-24 18:48:57 -05:00
AsciiWolf	13e785f7a0	Fix missing space in comments (#5439 )	2017-02-24 18:14:02 +01:00
Lennart Poettering	c22800e40e	cgroup: rename cg_unified() → cg_unified_controller() cg_unified() is a bit generic a name, let's make clear that it checks whether a specified controller is in unified mode.	2017-02-24 18:00:04 +01:00
Lennart Poettering	b4cccbc13a	cgroup: change cg_unified() to possibly return errors again We use our cgroup APIs in various contexts, including from our libraries sd-login, sd-bus. As we don't control those environments we can't rely that the unified cgroup setup logic succeeds, and hence really shouldn't assert on it. This more or less reverts `415fc41cea`.	2017-02-24 17:52:58 +01:00
Tejun Heo	415fc41cea	core: simplify cg_[all_]unified() cg_[all_]unified() test whether a specific controller or all controllers are on the unified hierarchy. While what's being asked is a simple binary question, the callers must assume that the functions may fail any time, which unnecessarily complicates their usages. This complication is unnecessary. Internally, the test result is cached anyway and there are only a few places where the test actually needs to be performed. This patch simplifies cg_[all_]unified(). * cg_[all_]unified() are updated to return bool. If the result can't be decided, assertion failure is triggered. Error handlings from their callers are dropped. * cg_unified_flush() is updated to calculate the new result synchronously and return whether it succeeded or not. Places which need to flush the test result are updated to test for failure. This ensures that all the following cg_[all_]unified() tests succeed. * Places which expected possible cg_[all_]unified() failures are updated to call and test cg_unified_flush() before calling cg_[all_]unified(). This includes functions used while setting up mounts during boot and manager_setup_cgroup().	2017-02-18 17:51:13 -05:00
Lennart Poettering	2fe917fe91	Merge pull request #4526 from keszybz/coredump-python Collect interpreter backtraces in systemd-coredump	2017-02-16 11:24:03 +01:00
Zbigniew Jędrzejewski-Szmek	2b0445262a	tree-wide: add SD_ID128_MAKE_STR, remove LOG_MESSAGE_ID Embedding sd_id128_t's in constant strings was rather cumbersome. We had SD_ID128_CONST_STR which returned a const char[], but it had two problems: - it wasn't possible to statically concatenate this array with a normal string - gcc wasn't really able to optimize this, and generated code to perform the "conversion" at runtime. Because of this, even our own code in coredumpctl wasn't using SD_ID128_CONST_STR. Add a new macro to generate a constant string: SD_ID128_MAKE_STR. It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition of the numbers, but in practice it is more convenient to use, and allows gcc to generate smarter code: $ size .libs/systemd{,-logind,-journald}{.old,} text data bss dec hex filename 1265204 149564 4808 1419576 15a938 .libs/systemd.old 1260268 149564 4808 1414640 1595f0 .libs/systemd 246805 13852 209 260866 3fb02 .libs/systemd-logind.old 240973 13852 209 255034 3e43a .libs/systemd-logind 146839 4984 34 151857 25131 .libs/systemd-journald.old 146391 4984 34 151409 24f71 .libs/systemd-journald It is also much easier to check if a certain binary uses a certain MESSAGE_ID: $ strings .libs/systemd.old\|grep MESSAGE_ID MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x $ strings .libs/systemd\|grep MESSAGE_ID MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27 MESSAGE_ID=b07a249cd024414a82dd00cd181378ff MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7 MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f MESSAGE_ID=d34d037fff1847e6ae669a370e694725 MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5 MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7 MESSAGE_ID=39f53479d3a045ac8e11786248231fbf 
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d MESSAGE_ID=7b05ebc668384222baa8881179cfda54 MESSAGE_ID=9d1aaa27d60140bd96365438aad20286	2017-02-15 00:45:12 -05:00
Lennart Poettering	631b676bb7	core: explicitly verify that BindsTo= deps are in order before dispatch start operation of a unit Let's make sure we verify that all BindsTo= are in order before we actually go and dispatch a start operation to a unit. Normally the job queue should already have made sure all deps are in order, but this might not have been sufficient in two cases: a) when the user changes deps during runtime and reloads the daemon, and b) when the user placed BindsTo= dependencies without matching After= dependencies, so that we don't actually wait for the bound to unit to be up before upping also the binding unit. See: #4725	2017-02-14 13:38:24 +01:00
Lennart Poettering	8367fea557	core: make sure to destroy all name watching bus slots when we are kicked off the bus (#5294 ) Fixes: #4528	2017-02-09 21:54:48 -05:00
Lennart Poettering	915e6d1676	core: add RootImage= setting for using a specific image file as root directory for a service This is similar to RootDirectory= but mounts the root file system from a block device or loopback file instead of another directory. This reuses the image dissector code now used by nspawn and gpt-auto-discovery.	2017-02-07 12:19:42 +01:00
Franck Bui	2d058a87ff	core: don't load dropin data multiple times for the same unit (#5139 ) When an alias is loaded, we resolve this alias to its final unit first to load the dropin data. However if the final unit was already loaded, there's no point in reloading the dropin data a second time. This patch optimizes this case. Also this allows the dropin loading code to assume that only units not yet loaded are passed down. This assumption is not yet used but might be in the future. [zj: invert the condition in the if]	2017-01-24 08:29:57 -05:00
Lennart Poettering	d71f050599	core: implicitly order units with PrivateTmp= after systemd-tmpfiles-setup.service Preparation for fixing #4401.	2016-12-27 23:25:24 +01:00
Franck Bui	ebc8968bc0	core: make mount units from /proc/self/mountinfo possibly bind to a device (#4515 ) Since commit `9d06297`, mount units from mountinfo are not bound to their devices anymore (they use the "Requires" dependency instead). This has the following drawback: if a media is mounted and the eject button is pressed then the media is unconditionally ejected leaving some inconsistent states. Since udev is the component that is reacting (no matter if the device is used or not) to the eject button, users expect that udev at least tries to unmount the media properly. This patch introduces a new property "SYSTEMD_MOUNT_DEVICE_BOUND". When set on a block device, all units that require this device will see their "Requires" dependency upgraded to a "BindTo" one. This is currently only used by cdrom devices. This patch also gives the possibility to the user to restore the previous behavior, that is, to bind a mount unit to a device. This is achieved by passing the "x-systemd.device-bound" option to mount(8). Please note that currently this is not working because libmount treats the x-* options as comments, therefore they're not available in utab for later application retrievals.	2016-12-16 17:13:58 +01:00
Zbigniew Jędrzejewski-Szmek	59ec09a83e	pid1: simplify the logic in two statements related to killing processes Generally non-inverted conditions are nicer, and ternary operators with complex conditions are a bit hard to read. No functional change.	2016-12-09 13:53:31 -05:00
Lennart Poettering	c9d5c9c0e1	core: make unit_free() accept NULL pointers We generally try to make our destructors robust regarding NULL pointers, much in the same way as glibc's free(). Do this also for unit_free(). Follow-up for #4748.	2016-12-01 00:25:51 +01:00
Lennart Poettering	2e6dbc0fcd	Merge pull request #4538 from fbuihuu/confirm-spawn-fixes Confirm spawn fixes/enhancements	2016-11-18 11:08:06 +01:00
Franck Bui	c891efaf8a	core: confirm_spawn: always accept units with same_pgrp set for now For some reason units remaining in the same process group as PID 1 (same_pgrp=true) fail to acquire the console even if it's not taken by anyone. So always accept for units with same_pgrp set for now.	2016-11-17 18:16:51 +01:00
Lennart Poettering	c5a97ed132	core: GC redundant device jobs from the run queue In contrast to all other unit types device units when queued just track external state, they cannot effect state changes on their own. Hence unless a client or other job waits for them there's no reason to keep them in the job queue. This adds a concept of GC'ing jobs of this type as soon as no client or other job waits for them anymore. To ensure this works correctly we need to track which clients actually reference a job (i.e. which ones enqueued it). Unfortunately that's pretty nasty to do for direct connections, as sd_bus_track doesn't work for them. For now, work around this, by simply remembering in a boolean that a job was requested by a direct connection, and reset it when we notice the direct connection is gone. This means the GC logic works fine, except that jobs are not immediately removed when direct connections disconnect. In the longer term, a rework of the bus logic should fix this properly. For now this should be good enough, as GC works fine for all cases except this one, and thus is a clear improvement over the previous behaviour. Fixes: #1921	2016-11-16 15:03:26 +01:00
Lennart Poettering	a2d72e265a	core: drop n_in_gc_queue field of Manager structure We count the units in the GC queue with this, but actually never make use of it, hence drop it.	2016-11-16 15:03:26 +01:00
Djalal Harouni	c92e8afebd	core: improve the logic that implies no new privileges The no_new_privileged_set variable is not used any more since commit `9b232d3241` that fixed another thing. So remove it. Also no need to check if we are under user manager, remove that part too.	2016-11-15 15:04:31 +01:00
Zbigniew Jędrzejewski-Szmek	f97b34a629	Rename formats-util.h to format-util.h We don't have plural in the name of any other -util files and this inconsistency trips me up every time I try to type this file name from memory. "formats-util" is even hard to pronounce.	2016-11-07 10:15:08 -05:00
Lennart Poettering	493fd52f1a	Merge pull request #4510 from keszybz/tree-wide-cleanups Tree wide cleanups	2016-11-03 13:59:20 -06:00
Zbigniew Jędrzejewski-Szmek	e68eedbbdc	Revert some uses of xsprintf This reverts some changes introduced in `d054f0a4d4`. xsprintf should be used in cases where we calculated the right buffer size by hand (using DECIMAL_STRING_MAX and such), and never in cases where we are printing externally specified strings of arbitrary length. Fixes #4534.	2016-11-02 22:36:29 -04:00
Zbigniew Jędrzejewski-Szmek	7fa6328cc4	Merge pull request #4481 from poettering/perpetual Add "perpetual" unit concept, sysctl fixes, networkd fixes, systemctl color fixes, nspawn discard.	2016-11-02 21:03:26 -04:00
Lennart Poettering	a581e45ae8	unit: unify some code with new unit_new_for_name() call	2016-11-02 11:29:59 -06:00
Lennart Poettering	f5869324e3	core: rework the "no_gc" unit flag to become a more generic "perpetual" flag So far "no_gc" was set on -.slice and init.scope, to units that are always running, cannot be stopped and never exist in an "inactive" state. Since these units are the only users of this flag, let's remodel it and rename it "perpetual" and let's derive more functionality off it. Specifically, refuse enqueuing stop jobs for these units, and report that they are "unstoppable" in the CanStop bus property.	2016-11-02 11:29:59 -06:00
Zbigniew Jędrzejewski-Szmek	f0bfbfac43	core: when restarting services, don't close fds We would close all the stored fds in service_release_resources(), which of course broke the whole concept of storing fds over service restart. Fixes #4408.	2016-11-01 21:20:21 -04:00
Zbigniew Jędrzejewski-Szmek	605405c6cc	tree-wide: drop NULL sentinel from strjoin This makes strjoin and strjoina more similar and avoids the useless final argument. spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/.c) git grep -e '\bstrjoin\b.NULL' -l\|xargs sed -i -r 's/strjoin$(.*), NULL$/strjoin(\1)/' This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.	2016-10-23 11:43:27 -04:00
Lukas Nykryn	87a47f99bc	failure-action: generalize failure action to emergency action	2016-10-21 15:13:50 +02:00
Luca Bruno	52c239d770	core/exec: add a named-descriptor option ("fd") for streams (#4179 ) This commit adds a `fd` option to `StandardInput=`, `StandardOutput=` and `StandardError=` properties in order to connect standard streams to externally named descriptors provided by some socket units. This option looks for a file descriptor named as the corresponding stream. Custom names can be specified, separated by a colon. If multiple name-matches exist, the first matching fd will be used.	2016-10-17 20:05:49 -04:00
Zbigniew Jędrzejewski-Szmek	ba25d39e44	pid1: do not use mtime==0 as sign of masking (#4388 ) It is allowed for unit files to have an mtime==0, so instead of assuming that any file that had mtime==0 was masked, use the load_state to filter masked units. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1384150.	2016-10-17 07:15:03 +02:00
Zbigniew Jędrzejewski-Szmek	6b430fdb7c	tree-wide: use mfree more	2016-10-16 23:35:39 -04:00
Djalal Harouni	2cd0a73547	core:sandbox: remove CAP_SYS_RAWIO on PrivateDevices=yes The rawio system calls were filtered, but CAP_SYS_RAWIO allows to access raw data through /proc, ioctl and some other exotic system calls...	2016-10-12 13:39:49 +02:00
Djalal Harouni	502d704e5e	core:sandbox: Add ProtectKernelModules= option This is useful to turn off explicit module load and unload operations on modular kernels. This option removes CAP_SYS_MODULE from the capability bounding set for the unit, and installs a system call filter to block module system calls. This option will not prevent the kernel from loading modules using the module auto-load feature which is a system wide operation.	2016-10-12 13:31:21 +02:00
Lennart Poettering	4b58153dd2	core: add "invocation ID" concept to service manager This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from an inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. 
The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycle of it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.	2016-10-07 20:14:38 +02:00
Zbigniew Jędrzejewski-Szmek	dd5e7000cb	core: complain if Before= dep on .device is declared [Unit] Before=foobar.device [Service] ExecStart=/bin/true Type=oneshot $ systemd-analyze verify before-device.service before-device.service: Dependency Before=foobar.device ignored (.device units cannot be delayed)	2016-10-01 22:53:17 +02:00
Lennart Poettering	63bb64a056	core: imply ProtectHome=read-only and ProtectSystem=strict if DynamicUser=1 Let's make sure that services that use DynamicUser=1 cannot leave files in the file system should the system accidentally have a world-writable directory somewhere. This effectively ensures that directories need to be whitelisted rather than blacklisted for access when DynamicUser=1 is set.	2016-09-25 10:42:18 +02:00
Lennart Poettering	390bc2b149	core: let's use set_contains() where appropriate	2016-08-22 16:14:21 +02:00
Lennart Poettering	fe700f46ec	core: cache last CPU usage counter, before destroying a cgroup It is useful for clients to be able to read the last CPU usage counter value of a unit even if the unit is already terminated. Hence, before destroying a unit's cgroup, cache the last CPU usage counter and return it if the cgroup is gone.	2016-08-22 16:14:21 +02:00
Lennart Poettering	05a98afd3e	core: add Ref()/Unref() bus calls for units This adds two (privileged) bus calls Ref() and Unref() to the Unit interface. The two calls may be used by clients to pin a unit into memory, so that various runtime properties aren't flushed out by the automatic GC. This is necessary to permit clients to race-freely acquire runtime results (such as process exit status/code or accumulated CPU time) on successful service termination. Ref() and Unref() are fully recursive, hence act like the usual reference counting concept in C. Taking a reference is a privileged operation, as this allows pinning units into memory which consumes resources. Transient units may also gain a reference at the time of creation, via the new AddRef property (that is only defined for transient units at the time of creation).	2016-08-22 16:14:21 +02:00
Zbigniew Jędrzejewski-Szmek	2056ec1927	Merge pull request #3965 from htejun/systemd-controller-on-unified	2016-08-19 19:58:01 -04:00
Lennart Poettering	00d9ef8560	core: add RemoveIPC= setting This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). If turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, but we need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.	2016-08-19 00:37:25 +02:00
Tejun Heo	5da38d0768	core: use the unified hierarchy for the systemd cgroup controller hierarchy Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such as bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller is on the unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. 
The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().	2016-08-17 17:44:36 -04:00
Tejun Heo	ca2f6384aa	core: rename cg_unified() to cg_all_unified() A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.	2016-08-15 18:13:36 -04:00
Tejun Heo	66ebf6c0a1	core: add cgroup CPU controller support on the unified hierarchy Unfortunately, due to the disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation. https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2 such as buffered write control making cgroup v2 essential for a lot of workloads. This commit implements systemd CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it. On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies. v2: - Error in man page corrected. - CPU config application in cgroup_context_apply() refactored. - CPU accounting now works on unified hierarchy.	2016-08-07 09:45:39 -04:00
Lennart Poettering	29206d4619	core: add a concept of "dynamic" user ids, that are allocated as long as a service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999	2016-07-22 15:53:45 +02:00
Lennart Poettering	1d98fef17d	core: when forcibly killing/aborting left-over unit processes log about it Let's log at LOG_NOTICE about any processes that we are going to SIGKILL/SIGABRT because clean termination of them didn't work. This turns the various boolean flag parameters to cg_kill(), cg_migrate() and related calls into a single binary flags parameter, simply because the function now gained even more parameters and the parameter list shouldn't get too long. Logging for killing processes is done either when the kill signal is SIGABRT or SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of LOG_TERMINATE is passed. This isn't used yet in this patch, but is made use of in a later patch.	2016-07-20 14:35:15 +02:00
Michael Biebl	595bfe7df2	Various fixes for typos found by lintian (#3705 )	2016-07-12 12:52:11 +02:00
Torstein Husebø	61233823aa	treewide: fix typos and remove accidental repetition of words	2016-07-11 16:18:43 +02:00
David Michael	4f952a3f07	core: queue loading transient units after setting their properties (#3676 ) The unit load queue can be processed in the middle of setting the unit's properties, so its load_state would no longer be UNIT_STUB for the check in bus_unit_set_properties(), which would cause it to incorrectly return an error.	2016-07-08 05:43:01 +02:00
Kyle Walker	36f20ae3b2	manager: Only invoke a single sigchld per unit within a cleanup cycle By default, each iteration of manager_dispatch_sigchld() results in a unit level sigchld event being invoked. For scope units, this results in a scope_sigchld_event() which can seemingly stall for workloads that have a large number of PIDs within the scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry as a result of pid_is_unwaited(). v2: This patch resolves this condition by only paying the cost of a sigchld in the underlying scope unit once per sigchld iteration. A new "sigchldgen" member resides within the Unit struct. The Manager is incremented via the sd event loop, accessed via sd_event_get_iteration, and the Unit member is set to the same value as the manager each time that a sigchld event is invoked. If the Manager iteration value and Unit member match, the sigchld event is not invoked for that iteration.	2016-06-30 15:16:47 -04:00
Lennart Poettering	fc40065bcd	core: when writing transient unit files, make sure all lines end with a newline This is a fix-up for `2a9a6f8ac0` which covered non-transient units, but missed the case for transient units.	2016-06-23 01:29:33 +02:00
Lennart Poettering	3f71dec5d7	unit: properly comment generated comments in unit files Fix-up for `2a9a6f8ac0`	2016-06-14 20:01:45 +02:00
Zbigniew Jędrzejewski-Szmek	2a9a6f8ac0	core/unit: append newline when writing drop ins unit_write_drop_in{,_private}{,_format} are all affected. We already append a header to the file (and section markers), so those functions can only be used to write a whole file at once. Including the newline at the end feels natural. After this commit newlines will be duplicated. They will be removed in subsequent commit. Also, rewrap the "autogenerated" header to fit within 80 columns.	2016-05-28 16:17:54 -04:00
Lennart Poettering	3103459e90	Merge pull request #3193 from htejun/cgroup-io-controller core: add io controller support on the unified hierarchy	2016-05-16 22:05:27 +02:00
Michal Sekletar	833f92ad39	core: don't log job status message in case job was effectively NOP (#3199) We currently generate log message about unit being started even when unit was started already and job didn't do anything. This is because job was requested explicitly and hence became anchor job of the transaction thus we could not eliminate it. That is fine but, let's not pollute journal with useless log messages. $ systemctl start systemd-resolved $ systemctl start systemd-resolved $ systemctl start systemd-resolved Current state: $ journalctl -u systemd-resolved | grep Started May 05 15:31:42 rawhide systemd[1]: Started Network Name Resolution. May 05 15:31:59 rawhide systemd[1]: Started Network Name Resolution. May 05 15:32:01 rawhide systemd[1]: Started Network Name Resolution. After patch applied: $ journalctl -u systemd-resolved | grep Started May 05 16:42:12 rawhide systemd[1]: Started Network Name Resolution. Fixes #1723	2016-05-16 11:24:51 -04:00
Tejun Heo	99e66921c8	core: allow slice to be overridden if cgroups aren't realized (#3246) unit_set_slice() fails with -EBUSY if the unit already has a slice associated with it. This makes it impossible to override slice through dropin config or over dbus. There's no reason to disallow slice changes as long as cgroups aren't realized. Fix it. Fixes #3240. Signed-off-by: Tejun Heo <htejun@fb.com> Reported-by: Davide Cavalca <dcavalca@fb.com>	2016-05-14 15:56:53 -04:00
Lennart Poettering	f76707da45	core: update the right mtime after finishing writing of transient units (#3203) Fixes: #3194	2016-05-06 19:22:22 +03:00
Tejun Heo	13c31542cc	core: add io controller support on the unified hierarchy On the unified hierarchy, blkio controller is renamed to io and the interface is changed significantly. * blkio.weight and blkio.weight_device are consolidated into io.weight which uses the standardized weight range [1, 10000] with 100 as the default value. * blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max. Expansion of throttling features is being worked on to support work-conserving absolute limits (io.low and io.high). * All stats are consolidated into io.stats. This patchset adds support for the new interface. As the interface has been revamped and new features are expected to be added, it seems best to treat it as a separate controller rather than trying to expand the blkio settings although we might add automatic translation if only blkio settings are specified. * io.weight handling is mostly identical to blkio.weight[_device] handling except that the weight range is different. * Both read and write bandwidth settings are consolidated into CGroupIODeviceLimit which describes all limits applicable to the device. This makes it less painful to add new limits. * "max" can be used to specify the maximum limit which is equivalent to no config for max limits and treated as such. If a given CGroupIODeviceLimit doesn't contain any non-default configs, the config struct is discarded once the no limit config is applied to cgroup. * lookup_blkio_device() is renamed to lookup_block_device(). Signed-off-by: Tejun Heo <htejun@fb.com>	2016-05-05 16:43:06 -04:00
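
The io controller commit (`13c31542cc`) describes a weight range of [1, 10000] with a default of 100, and a per-device limit record in which "max" means no limit. The Python sketch below illustrates those rules only; it is not systemd's C code. `DeviceLimit` is a hypothetical stand-in for `CGroupIODeviceLimit`, the helper names are invented for illustration, and the rendered `MAJ:MIN rbps=... wbps=...` line is assumed to follow the kernel's cgroup v2 `io.max` syntax.

```python
from dataclasses import dataclass
from typing import Optional

# Range and default as stated in the commit message.
IO_WEIGHT_MIN = 1
IO_WEIGHT_MAX = 10000
IO_WEIGHT_DEFAULT = 100


def check_io_weight(weight: int) -> int:
    """Return the weight unchanged if it lies in [1, 10000], else raise."""
    if not IO_WEIGHT_MIN <= weight <= IO_WEIGHT_MAX:
        raise ValueError(f"io.weight {weight} outside [{IO_WEIGHT_MIN}, {IO_WEIGHT_MAX}]")
    return weight


@dataclass
class DeviceLimit:
    """Hypothetical stand-in for CGroupIODeviceLimit (names are assumptions)."""
    major: int
    minor: int
    rbps: Optional[int] = None  # None models "max", i.e. no read limit
    wbps: Optional[int] = None  # None models "max", i.e. no write limit

    def is_default(self) -> bool:
        # A record holding only default values carries no configuration,
        # so a caller could discard it once "no limit" has been applied.
        return self.rbps is None and self.wbps is None

    def io_max_line(self) -> str:
        # Assumed io.max line layout: "MAJ:MIN rbps=<n|max> wbps=<n|max>".
        def fmt(value: Optional[int]) -> str:
            return "max" if value is None else str(value)
        return f"{self.major}:{self.minor} rbps={fmt(self.rbps)} wbps={fmt(self.wbps)}"
```

For example, `DeviceLimit(8, 0, rbps=1048576).io_max_line()` renders a 1 MiB/s read limit with writes left unlimited.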

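The `sigchldgen` mechanism described in commit `36f20ae3b2` above is a generation counter: a unit records the manager's iteration value when its sigchld handler runs, and the handler is skipped if the two values already match. A minimal Python sketch of that idea follows. It is not the systemd implementation; the `Manager` and `Unit` classes, the `next_iteration()` method, and the `sigchld_calls` counter are hypothetical, with a plain integer standing in for the value obtained through `sd_event_get_iteration`.

```python
class Manager:
    """Hypothetical manager whose iteration counter models the event loop."""

    def __init__(self) -> None:
        self.iteration = 0

    def next_iteration(self) -> None:
        # Stand-in for the event loop advancing to a new iteration.
        self.iteration += 1


class Unit:
    """Hypothetical unit carrying a 'sigchldgen' member, as in the commit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sigchldgen = None   # iteration in which the handler last ran
        self.sigchld_calls = 0   # illustration only: counts handler runs

    def sigchld_event(self) -> None:
        # Stand-in for the potentially expensive per-unit handler.
        self.sigchld_calls += 1


def dispatch_sigchld(manager: Manager, units_of_reaped_pids: list) -> None:
    """Invoke each unit's handler at most once per manager iteration."""
    for unit in units_of_reaped_pids:
        if unit.sigchldgen == manager.iteration:
            continue  # already handled in this iteration
        unit.sigchldgen = manager.iteration
        unit.sigchld_event()
```

With this guard, reaping many PIDs that belong to the same scope within one iteration triggers the handler once rather than once per PID.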

675 commits