Systemd

Author	SHA1	Message	Date
Michal Sekletár	d9e45bc3ab	core: introduce support for cgroup freezer With cgroup v2 the cgroup freezer is implemented as a cgroup attribute called cgroup.freeze. A cgroup can be frozen by writing "1" to the file, and the kernel will send us a notification through "cgroup.events" after the operation is finished and the processes in the cgroup have entered a quiescent state, i.e. they are not scheduled to run. Writing "0" to the attribute file does the inverse and process execution is resumed. This commit exposes the above low-level functionality through systemd's DBus API. Each unit type must provide a specialized implementation for these methods, otherwise we return an error. So far only service, scope, and slice unit types provide the support. It is possible to check if a given unit has the support using the CanFreeze() DBus property. Note that the DBus API has synchronous behavior and we dispatch the reply to freeze/thaw requests only after the kernel has notified us that the requested operation was completed.	2020-04-30 19:02:51 +02:00
Luca Boccassi	c5bc2c01ee	core: add log_get_max_level check optimization in log_unit_full Just as log_full already does, check if the log level would result in logging immediately in the macro in order to avoid doing unnecessary work that adds up in hot spots.	2020-04-21 18:05:24 +01:00
Zbigniew Jędrzejewski-Szmek	eda0cbf071	Use Finished instead of Started for Type=oneshot services (#14851) UnitStatusMessageFormats.finished_job, if present, will be called with the same arguments as job_get_done_status_message_format() to provide a format string appropriate for the context. This commit replaces "Started" with "Finished" for started oneshot units, as mentioned in the referenced issue. Closes #2458.	2020-03-05 17:24:19 +01:00
Zbigniew Jędrzejewski-Szmek	5bcf34ebf3	pid1: when showing error status, do not switch to status=temporary We would flip to status=temporary mode on the first error, and then switch back to status=auto after the initial transaction was done. This isn't very useful, because usually all the messages about successfully started units are not related to the original failure. In fact, all those messages most likely cause the information about the prime error to scroll off screen. And if the user requested quiet boot, there's no reason to think that they care about those success messages. Also, when logging about dependency cycles, treat this similarly to a unit error and show the message even if the status is "soft disabled" (before we wouldn't show it in that case).	2020-03-01 11:42:42 +01:00
Christian Göttsche	f156e60c66	core: unit_label_path(): take const unit	2020-02-04 18:36:19 +01:00
Lennart Poettering	44b0d1fd59	core: add implicit ordering dep on blockdev@.target from all mount units This way we should be able to order mounts properly against their backing services in case complex storage is used (i.e. LUKS), even if the device path used for mounting the devices is different from the expected device node of the backing service. Specifically, if we have a LUKS device /dev/mapper/foo that is mounted by this name all is trivial as the relationship can be established a priori easily. But if it is mounted via a /dev/disk/by-uuid/ symlink or similar we only can relate the device node generated to the one mounted at the moment the device is actually established. That's because the UUID of the fs is stored inside the encrypted volume and thus not knowable until the volume is set up. This patch tries to improve on this situation: an implicit After=blockdev@.target dependency is generated for all mounts, based on the data from /proc/self/mountinfo, which should be the actual device node, with all symlinks resolved. This means that as soon as the mount is established the ordering via blockdev@.target will work, and that means during shutdown it is honoured, which is what we are looking for. Note that specifying /etc/fstab entries via UUID= for LUKS devices still sucks and shouldn't be done, because it means we cannot know which LUKS device to activate to make an fs appear, and that means unless the volume is set up at boot anyway we can't really handle things automatically when putting together transactions that need the mount.	2020-01-21 20:23:44 +01:00
Lennart Poettering	b90cf10245	core: make a number of functions not used externally static	2020-01-21 11:51:45 +01:00
Lennart Poettering	eea45a3399	Merge pull request #14424 from poettering/watch-bus-name-rework pid1: simplify drastically how we watch bus names for service's BusName= setting	2020-01-15 11:46:11 +01:00
Lennart Poettering	c80a9a33d0	core: clearly refuse OnFailure= deps on units that can't fail Similarly, refuse triggering deps on units that cannot trigger. And rework how we ignore After= dependencies on device units, to work the same way. See: #14142	2020-01-09 11:03:53 +01:00
Lennart Poettering	fc67a943d9	core: drop initial ListNames() bus call from PID 1 Previously, when first connecting to the bus we'd issue a ListNames() bus call to the driver to figure out which bus names are currently active. This information was then used to initialize the initial state for services that use BusName=. This change removes the whole code for this and replaces it with something vastly simpler. First of all, the ListNames() call was issued synchronously, which meant that if dbus was for some reason synchronously calling into PID 1 we'd deadlock. As it turns out there's now a good chance it does: the nss-systemd userdb hookup means that any user dbus-daemon resolves might result in a varlink call into PID 1, and dbus resolves quite a lot of users while parsing its policy. My original goal was to fix this deadlock. But as it turns out we don't need the ListNames() call at all anymore, since #12957 has been merged. That PR was supposed to fix a race where asynchronous installation of bus matches would cause us to miss the initial owner of a bus name when a service is first started. It fixed it (correctly) by enquiring with GetNameOwner() who currently owns the name, right after installing the match. But this means whenever we start watching a bus name we issue a GetNameOwner() for it anyway, and that means that when first connecting to the bus we don't need to issue ListNames() anymore, since that just tells us the same info: which names are currently owned. Hence, let's drop ListNames() and instead make better use of the GetNameOwner() result: if it failed the name is not owned. Also, while we are at it, let's simplify the unit's owner_name_changed() callback: let's drop the "old_owner" argument. We never used that besides logging, and it's hard to synthesize from just the return of a GetNameOwner(), hence don't bother.	2020-01-06 15:21:47 +01:00
Franck Bui	d336ba9fa6	core: drop 'wants' parameter from unit_add_node_dependency() Since the Wants dependency is no longer automagically added to swap and mount units, this parameter is no longer used, hence this patch drops it.	2019-10-28 18:51:23 +01:00
Zbigniew Jędrzejewski-Szmek	c362077087	core: turn unit_load_fragment_and_dropin_optional() into a flag unit_load_fragment_and_dropin() and unit_load_fragment_and_dropin_optional() are really the same, with one minor difference in behaviour. Let's drop the second function. "_optional" in the name suggests that it's the "dropin" part that is optional. (Which it is, but in this case, we mean the fragment to be optional.) I think the new version with a flag is easier to understand.	2019-10-11 10:45:33 +02:00
Zbigniew Jędrzejewski-Szmek	a232ebcc2c	core: add support for RestartKillSignal= to override signal used for restart jobs v2: - if RestartKillSignal= is not specified, fall back to KillSignal=. This is necessary to preserve backwards compatibility (and keep KillSignal= generally useful).	2019-10-02 14:01:25 +02:00
Zbigniew Jędrzejewski-Szmek	28a2dfe801	core: add helper function to check job status Since job.h includes unit.h, and unit.h includes job.h, imports need to be adjusted to make sure unit.h is included first if the helper is used.	2019-10-01 15:05:27 +02:00
Zbigniew Jędrzejewski-Szmek	5ac1530eca	tree-wide: say "ratelimit" not "rate_limit" "ratelimit" is a real word, so we don't need to use the other form anywhere. We had both forms in various places, let's standardize on the shorter and more correct one.	2019-09-20 16:05:53 +02:00
Zbigniew Jędrzejewski-Szmek	7bf081a1e5	pid1: rename start_limit to start_ratelimit This way it is clearer what the type is. We also have auto_stop_ratelimit adjacent, and it feels ugly to have a different suffix for those two.	2019-09-20 16:05:53 +02:00
Zbigniew Jędrzejewski-Szmek	de5ae832f2	Merge pull request #13439 from yuwata/core-support-systemctl-clean-more core: support systemctl clean more	2019-09-13 16:15:02 +02:00
Yu Watanabe	810ef3180e	core: introduce unit_fork_and_watch_rm_rf()	2019-08-28 23:09:54 +09:00
Yu Watanabe	52a12341f9	core: make RuntimeDirectoryPreserve= work with non-service units	2019-08-23 00:08:16 +09:00
Yu Watanabe	95939aed21	core: introduce unit_destroy_runtime_directory() Currently `unit_will_restart()` can return true only when the unit is a service. Hence, this should not change anything.	2019-08-22 23:50:52 +09:00
Zbigniew Jędrzejewski-Szmek	5cc2cd1cd8	pid1: always log successful process termination quietly Fixes #13372.	2019-08-22 09:09:45 +02:00
Mattias Jernberg	a5a8776ae5	core: Avoid race when starting dbus services In high load scenarios it is possible for services to be started before the NameOwnerChanged signal is properly installed. Emulate a callback by also queuing a GetNameOwner when the match is installed. Fixes: #12956	2019-08-14 16:12:31 +02:00
Zbigniew Jędrzejewski-Szmek	5cfa33e0bc	Create src/shared/unit-file.[ch] for unit-file related ops So far we put such functions in install.[ch], but that is tied too closely to enable/disable. Let's start moving things to a place with a better name.	2019-07-19 16:51:14 +02:00
Zbigniew Jędrzejewski-Szmek	96cf3ec966	pid1: get rid of unit_supported() helper Another case where "open code" is easier to read than the helper.	2019-07-19 16:51:14 +02:00
Anita Zhang	31cd5f63ce	core: ExecCondition= for services Closes #10596	2019-07-17 11:35:02 +02:00
Lennart Poettering	380dc8b0a2	core: add generic "clean" operation to units This adds basic infrastructure to implement a "clean" operation for unit types. This "clean" operation is supposed to remove on-disk resources of units, and is supposed to be used in a later commit to clean our RuntimeDirectory=, StateDirectory= and so on of service units. Later commits will open this up to the bus, and hook up service units with this. This also adds a new generic ActiveState called UNIT_MAINTENANCE. It's supposed to cover all kinds of "maintenance" state of units. Specifically, this is supposed to cover the "cleaning" operations later added for service units which might take a bit of time. This high-level, generic, abstract state is called UNIT_MAINTENANCE instead of the more specific "UNIT_CLEANING", since I think this should be kept open for different operations possibly later on that could be nicely subsumed under this (for example, maybe a recursive chown()ing operation could be covered by this, and similar).	2019-07-11 12:18:51 +02:00
Lennart Poettering	261e7d9270	Merge pull request #12755 from keszybz/short-identifiers Allow using unit names in status messages	2019-07-11 00:00:51 +02:00
Zbigniew Jędrzejewski-Szmek	2a8f53c67b	Use unit->id instead of description in messages v2: - rename unit_identifier to unit_status_string	2019-07-10 13:35:26 +02:00
Zbigniew Jędrzejewski-Szmek	62c6bbbc09	tree-wide: use PROJECT_FILE instead of __FILE__ This replaces the internal uses of __FILE__ with the new macro.	2019-07-04 10:36:00 +02:00
Kai Lüke	fab347489f	bpf-firewall: custom BPF programs through IP(Ingress|Egress)FilterPath= Takes a single /sys/fs/bpf/pinned_prog string as argument, but may be specified multiple times. An empty assignment resets all previous filters. Closes https://github.com/systemd/systemd/issues/10227	2019-06-25 09:56:16 +02:00
Ben Boeckel	5238e95759	codespell: fix spelling errors	2019-04-29 16:47:18 +02:00
Lennart Poettering	fbe14fc9a7	cgroup: expose IO accounting data per unit This was the last kind of accounting still not exposed for each unit. Let's fix that. Note that this is a relatively simplistic approach: we don't expose per-device stats, but sum them all up, much like cgtop does. This kind of metric is probably the most interesting for most usecases, and covers the "systemctl status" output best. If we want per-device stats one day we can of course always add that eventually.	2019-04-12 14:25:44 +02:00
Lennart Poettering	afcfaa695c	core: implement OOMPolicy= and watch cgroups for OOM killings This adds a new per-service OOMPolicy= (along with a global DefaultOOMPolicy=) that controls what to do if a process of the service is killed by the kernel's OOM killer. It has three different values: "continue" (old behaviour), "stop" (terminate the service), "kill" (let the kernel kill all the service's processes). On top of that, track OOM killer events per unit: generate a per-unit structured, recognizable log message when we see an OOM killer event, and put the service in a failure state if an OOM killer event was seen and the selected policy was not "continue". A new "result" is defined for this case: "oom-kill". All of this relies on new cgroupv2 kernel functionality: the "memory.events" notification interface and the "memory.oom.group" attribute (which makes the kernel kill all cgroup processes automatically).	2019-04-09 11:17:58 +02:00
Lennart Poettering	0bb814c2c2	core: rename cgroup_inotify_wd → cgroup_control_inotify_wd Let's rename the .cgroup_inotify_wd field of the Unit object to .cgroup_control_inotify_wd. Let's similarly rename the hashmap .cgroup_inotify_wd_unit of the Manager object to .cgroup_control_inotify_wd_unit. Why? As preparation for a later commit that allows us to watch the "memory.events" cgroup attribute file in addition to the "cgroup.events" file we already watch with the fields above. In that later commit we'll add new fields "cgroup_memory_inotify_wd" to Unit and "cgroup_memory_inotify_wd_unit" to Manager, that are used to watch these other events file. No change in behaviour. Just some renaming.	2019-04-09 11:17:57 +02:00
Franck Bui	f75f613d25	core: reduce the number of stalled PIDs from the watched processes list when possible Some PIDs can remain in the watched list even though their processes exited a long time ago. It can easily happen if the main process of a forking service manages to spawn a child before the control process exits, for example. However, when a pid is about to be mapped to a unit by calling unit_watch_pid(), the caller usually knows if the pid should belong to this unit exclusively: if we just forked() off a child, then we can be sure that its PID is otherwise unused. In this case we take this opportunity to remove any stalled PIDs from the watched process list. If we learnt about a PID in any other form (for example via PID file, via searching, MAINPID= and so on), then we can't assume anything.	2019-03-20 10:51:49 +01:00
Lennart Poettering	a4191c9fb5	core: unify code for checking whether unit to trigger is loaded	2019-03-18 16:06:36 +01:00
Lennart Poettering	97a3f4ee05	core: rename unit_{start_limit|condition|assert}_test() to unit_test_xyz() Just some renaming, no change in behaviour. Background: I'd like to add more functions unit_test_xyz() that test various things, hence let's streamline the naming a bit.	2019-03-18 16:06:36 +01:00
Lennart Poettering	5bcffb4b54	Merge pull request #11457 from grooverdan/sendsigkill_no service: killmode=cgroup|mixed, SendSIGKILL=no services are not multiprocess	2019-02-18 13:41:52 +01:00
Filipe Brandenburger	527ede0c63	core: downgrade CPUQuotaPeriodSec= clamping logs to debug After the first warning log, further messages are downgraded to LOG_DEBUG.	2019-02-14 11:04:42 -08:00
Daniel Black	c53d2d54bd	service: make killmode=cgroup|mixed, SendSIGKILL=no services singletons KillMode=mixed and control group are used to indicate that all processes should be killed off. SendSIGKILL is used for services that require a clean shutdown. These are typically database services where a SigKilled process would result in a lengthy recovery and whose shutdown or startup time is quite variable (so Timeout settings aren't of use). Here we take these two factors and refuse to start a service if there are existing processes within a control group. While databases generally have some protection against multiple instances running, let's not stress the rigor of these. Also, ExecStartPre parts of the service aren't as rigorously written to protect against multiple use. closes #8630	2019-01-29 15:35:59 +11:00
Chris Down	4e1dfa45e9	cgroup: s/cgroups? ?v?([0-9])/cgroup v\1/gI Nitpicky, but we've used a lot of random spacings and names in the past, but we're trying to be completely consistent on "cgroup vN" now. Generated by `fd -0 | xargs -0 -n1 sed -ri --follow-symlinks 's/cgroups? ?v?([0-9])/cgroup v\1/gI'`. I manually ignored places where it's not appropriate to replace (eg. "cgroup2" fstype and in src/shared/linux).	2019-01-03 11:32:40 +09:00
Zbigniew Jędrzejewski-Szmek	303ee60151	Mark data and userdata params to specifier_printf() as const It would be very wrong if any of the specifier printf calls modified any of the objects or data being printed. Let's mark all arguments as const (primarily to make it easier for the reader to see where modifications cannot occur).	2018-12-12 16:45:33 +01:00
Lennart Poettering	a95c0505ad	core: extend comments regarding coldplug() vs. catchup()	2018-12-12 11:20:53 +01:00
Lennart Poettering	7af67e9a8b	core: allow to set exit status when using SuccessAction=/FailureAction=exit in units This adds SuccessActionExitStatus= and FailureActionExitStatus= that may be used to configure the exit status to propagate when SuccessAction=exit or FailureAction=exit is used. When not specified let's also propagate the exit status of the main process we fork off for the unit.	2018-11-27 09:44:40 +01:00
Lennart Poettering	5af8805872	cgroup: drastically simplify caching of cgroups members mask Previously we tried to be smart: when a new unit appeared and it only added controllers to the cgroup mask we'd update the cached members mask in all parents by ORing in the controller flags in their cached values. Unfortunately this was quite broken, as we missed some conditions when this cache had to be reset (for example, when a unit got unloaded). Moreover, the optimization doesn't work when a controller is removed anyway (as in that case there's no other way for the parent than to iterate through all children to see whether any other, remaining child unit still needs it). Hence, let's simplify the logic substantially: instead of updating the cache on the right events (which we didn't get right), let's simply invalidate the cache, and generate it lazily when we encounter it later. This should actually result in better behaviour as we don't have to calculate the new members mask for a whole subtree whenever we have the suspicion something changed, but can delay it to the point where we actually need the members mask. This allows us to simplify things quite a bit, which is good, since validating this cache for correctness is hard enough. Fixes: #9512	2018-11-23 13:41:37 +01:00
Lennart Poettering	5a62e5e2ac	cgroup: document what the various masks variables are used for	2018-11-23 13:41:37 +01:00
Lennart Poettering	27da878e7e	unit: drop an unused fields from Unit struct	2018-11-23 00:37:00 +01:00
Lennart Poettering	66fa4bdd70	core: add two minor comments (#10890)	2018-11-23 06:25:27 +09:00
Zbigniew Jędrzejewski-Szmek	aac99f303a	core: introduce a helper function to wrap unit_log_{success,failure} It's inline so that the compiler can easily optimize away the call to get status string.	2018-11-16 19:47:07 +01:00
Lennart Poettering	523ee2d414	core: log a recognizable message when a unit succeeds, too We already are doing it on failure, let's do it on success, too. Fixes: #10265	2018-11-16 15:22:48 +01:00
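
For readers who want to see what the substitution in commit 4e1dfa45e9 does to a string, here is a minimal Python sketch of the same regular expression. The function name normalize_cgroup_version is hypothetical and chosen only for illustration; the pattern and replacement are taken from the sed expression quoted in the commit message, with re.IGNORECASE standing in for sed's I flag.

```python
import re

# Same pattern as the sed expression s/cgroups? ?v?([0-9])/cgroup v\1/gI.
# re.IGNORECASE mirrors the "I" flag; re.sub replaces every match, like "g".
_CGROUP_VERSION = re.compile(r"cgroups? ?v?([0-9])", re.IGNORECASE)


def normalize_cgroup_version(text: str) -> str:
    """Rewrite spellings such as 'cgroupsv2' or 'CGroup 2' to 'cgroup v2'.

    Hypothetical helper, not part of systemd.
    """
    return _CGROUP_VERSION.sub(r"cgroup v\1", text)


if __name__ == "__main__":
    for sample in ("cgroupsv2", "Cgroups V1", "cgroup 2", "cgroup2"):
        print(sample, "->", normalize_cgroup_version(sample))
```

Note that the pattern also rewrites "cgroup2", which is consistent with the commit message's remark that places such as the "cgroup2" fstype had to be excluded by hand.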


267 commits
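
The kernel interface described in commit d9e45bc3ab (write "1" or "0" to cgroup.freeze, then look for the result in cgroup.events) can be sketched in a few lines of Python. This is only an illustration of the file-level protocol, not systemd's implementation: the helper names parse_cgroup_events, request_freeze, and is_frozen are hypothetical, the cgroup directory is passed in by the caller, and the sketch reads cgroup.events on demand rather than waiting for a kernel notification as PID 1 does.

```python
from pathlib import Path


def parse_cgroup_events(text: str) -> dict:
    """Parse the flat 'key value' lines of a cgroup.events file."""
    events = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key and value:
            events[key] = int(value)
    return events


def request_freeze(cgroup_dir: Path, freeze: bool) -> None:
    """Ask the kernel to freeze ("1") or thaw ("0") the cgroup."""
    (cgroup_dir / "cgroup.freeze").write_text("1" if freeze else "0")


def is_frozen(cgroup_dir: Path) -> bool:
    """Report whether cgroup.events currently says 'frozen 1'."""
    events = parse_cgroup_events((cgroup_dir / "cgroup.events").read_text())
    return events.get("frozen", 0) == 1
```

On a real system the directory would be a cgroup v2 path under /sys/fs/cgroup and writing to it requires appropriate privileges; the request and the completion are separate steps, which is why the commit describes replying to the bus call only after the kernel reports that the operation finished.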