Systemd

Author	SHA1	Message	Date
Lennart Poettering	3c7416b6ca	core: unify common code for preparing for forking off unit processes This introduces a new function unit_prepare_exec() that encapsulates a number of calls we do in preparation for spawning off some processes in all our unit types that do so. This allows us to neatly unify a bit of code between unit types and shorten our code.	2017-11-21 11:54:08 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	5afe510c89	core: add a new unit file setting CollectMode= for tweaking the GC logic Right now, the option only takes one of two possible values "inactive" or "inactive-or-failed", the former being the default, and exposing the same behaviour as the status quo ante. If set to "inactive-or-failed" units may be collected by the GC logic when in the "failed" state too. This logic should be a nicer alternative to using the "-" modifier for ExecStart= and friends, as the exit data is collected and logged about and only removed when the GC comes along. This should be useful in particular for per-connection socket-activated services, as well as "systemd-run" command lines that shall leave no artifacts in the system. I was thinking about whether to expose this as a boolean, but opted for an enum instead, as I have the suspicion other tweaks like this might be added later on, in which case we extend this setting instead of having to add yet another one. Also, let's add some documentation for the GC logic.	2017-11-16 14:38:36 +01:00
Lennart Poettering	7eb2a8a125	unit: rework a bit how we keep the service fdstore from being destroyed during service restart When preparing for a restart we quickly go through the DEAD/INACTIVE service state before entering AUTO_RESTART. When doing this, we need to make sure we don't destroy the FD store. Previously this was done by checking the failure state of the unit, and keeping the FD store around when the unit failed, under the assumption that the restart logic will then get into action. This is not entirely correct however, as there might be failure states that will not result in restarts. With this commit we slightly alter the logic: a ref counter for the fd store is added, that is increased right before we handle the restart logic, and decreased again right after. This should ensure that the fdstore lives exactly as long as it needs. Follow-up for `f0bfbfac43`.	2017-11-16 14:37:33 +01:00
Lennart Poettering	d3070fbdf6	core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one existing unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and instead just use an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441	2017-11-16 12:40:17 +01:00
Lennart Poettering	c999cf385a	core: add internal API to remove dependencies again, based on dependency mask Let's make use of the dependency mask, and add internal API to remove dependencies again, based on bits in the dependency mask.	2017-11-10 19:45:29 +01:00
Lennart Poettering	eef85c4a3f	core: track why unit dependencies came to be This replaces the dependencies Set* objects by Hashmap* objects, where the key is the depending Unit, and the value is a bitmask encoding why the specific dependency was created. The bitmask contains a number of different, defined bits, that indicate why dependencies exist, for example whether they are created due to explicitly configured deps in files, by udev rules or implicitly. Note that memory usage is not increased by this change, even though we store more information, as we manage to encode the bit mask inside the value pointer each Hashmap entry contains. Why this all? When we know how a dependency came to be, we can update dependencies correctly when a configuration source changes but others are left unaltered. Specifically: 1. We can fix UDEV_WANTS dependency generation: so far we kept adding dependencies configured that way, but if a device lost such a dependency we couldn't remove them again as there was no scheme for removing of dependencies in place. 2. We can implement "pin-pointed" reload of unit files. If we know what dependencies were created as result of configuration in a unit file, then we know what to flush out when we want to reload it. 3. It's useful for debugging: "systemd-analyze dump" now shows this information, helping substantially with understanding how systemd's dependency tree came to be the way it came to be.	2017-11-10 19:45:29 +01:00
Lennart Poettering	eae51da36e	unit: when JobTimeoutSec= is turned off, implicitly turn off JobRunningTimeoutSec= too We added JobRunningTimeoutSec= late, and Dracut configured only JobTimeoutSec= to turn off root device timeouts before. With this change we'll propagate a reset of JobTimeoutSec= into JobRunningTimeoutSec=, but only if the latter wasn't set explicitly. This should restore compatibility with older systemd versions. Fixes: #6402	2017-10-05 13:06:44 +02:00
Andreas Rammhold	3742095b27	tree-wide: use IN_SET where possible In addition to the changes from #6933 this handles cases that could be matched with the included cocci file.	2017-10-02 13:09:54 +02:00
Lennart Poettering	09e2465407	cgroup: after determining that a cgroup is empty, asynchronously dispatch this This makes sure that if we learn via inotify or another event source that a cgroup is empty, and we checked that this is indeed the case (as we might get spurious notifications through inotify, as the inotify logic through the "cgroups.event" is pretty unspecific and might be triggered for a variety of reasons), then we'll enqueue a defer event for it, at a priority lower than SIGCHLD handling, so that we know for sure that if there's waitid() data for a process we used it before considering the cgroup empty notification. Fixes: #6608	2017-09-27 18:26:18 +02:00
Lennart Poettering	91a6073ef7	core: rename cgroup_queue → cgroup_realize_queue We are about to add a second cgroup-related queue, called "cgroup_empty_queue", hence let's rename "cgroup_queue" to "cgroup_realize_queue" (as that is its purpose) to minimize confusion about the two queues. Just a rename, no functional changes.	2017-09-27 17:59:25 +02:00
Lennart Poettering	6d330fef4d	unit: remove unused fields from Unit structure	2017-09-27 17:59:25 +02:00
Lennart Poettering	f1c50becda	core: make sure to log invocation ID of units also when doing structured logging	2017-09-22 15:24:55 +02:00
Lennart Poettering	6b659ed87e	core: serialize/deserialize IP accounting across daemon reload/reexec Make sure the current IP accounting counters aren't lost during reload/reexec. Note that we destroy all BPF file objects during a reload: the BPF programs, the access and the accounting maps. The former two need to be regenerated anyway with the newly loaded configuration data, but the latter one needs to survive reloads/reexec. In this implementation I opted to only save/restore the accounting map content instead of the map itself. While this opens a (theoretic) window where IP traffic is still accounted to the old map after we read it out, and we thus miss a few bytes this has the benefit that we can alter the map layout between versions should the need arise.	2017-09-22 15:24:55 +02:00
Lennart Poettering	a79279c7fd	core: when creating the socket fds for a socket unit, join socket's cgroup first Let's make sure that a socket unit's IPAddressAllow=/IPAddressDeny= settings are in effect on all socket fds associated with it. In order to make this happen we need to make sure the cgroup the fds are associated with is the socket unit's cgroup. The only way to do that is invoking socket()+accept() in them. Since we really don't want to migrate PID 1 around we do this by forking off a helper process, which invokes socket()/accept() and sends the newly created fd to PID 1. Ugly, but works, and there's apparently no better way right now. This generalizes forking off per-unit helper processes in a new function unit_fork_helper_process(), which is then also used by the NSS chown() code of socket units.	2017-09-22 15:24:55 +02:00
Daniel Mack	906c06f64a	cgroup, unit, fragment parser: make use of new firewall functions	2017-09-22 15:24:55 +02:00
Daniel Mack	6a48d82f02	cgroup: add fields to accommodate eBPF related details Add pointers for compiled eBPF programs as well as list heads for allowed and denied hosts for both directions.	2017-09-22 15:24:54 +02:00
Lennart Poettering	f0d477979e	core: introduce unit_set_exec_params() The new unit_set_exec_params() call is to units what manager_set_exec_params() is to the manager object: it initializes the various fields from the relevant generic properties set.	2017-08-10 15:02:50 +02:00
Zbigniew Jędrzejewski-Szmek	0742986650	core: properly handle deserialization of unknown unit types (#6476) We just abort startup, without printing any error. Make sure we always print something, and when we cannot deserialize some unit, just ignore it and continue. Fixup for `4bc5d27b94`. Without this, we would hang in daemon-reexec after upgrade.	2017-07-31 08:05:35 +02:00
Zbigniew Jędrzejewski-Szmek	4bc5d27b94	Drop busname unit type Since busname units are only useful with kdbus, they weren't actively used. This was dead code, only compile-tested. If busname units are ever added back, it'll be cleaner to start from scratch (possibly reverting parts of this patch).	2017-07-23 09:29:02 -04:00
Michal Koutný	a2df3ea4ae	job: add JobRunningTimeoutSec for JOB_RUNNING state Unit.JobTimeoutSec starts when a job is enqueued in a transaction. The introduced distinct Unit.JobRunningTimeoutSec starts only when the job starts running (e.g. it groups all Exec* commands of a service or spans waiting for a device period.) Unit.JobRunningTimeoutSec is intended to be used by default instead of Unit.JobTimeoutSec for device units where such behavior causes less confusion (consider a job for a _netdev mount device, with this change the timeout will start ticking only after the network is ready).	2017-04-25 18:00:29 +02:00
Lennart Poettering	2e6dbc0fcd	Merge pull request #4538 from fbuihuu/confirm-spawn-fixes Confirm spawn fixes/enhancements	2016-11-18 11:08:06 +01:00
Franck Bui	c891efaf8a	core: confirm_spawn: always accept units with same_pgrp set for now For some reasons units remaining in the same process group as PID 1 (same_pgrp=true) fail to acquire the console even if it's not taken by anyone. So always accept for units with same_pgrp set for now.	2016-11-17 18:16:51 +01:00
Lennart Poettering	c5a97ed132	core: GC redundant device jobs from the run queue In contrast to all other unit types device units when queued just track external state, they cannot effect state changes on their own. Hence unless a client or other job waits for them there's no reason to keep them in the job queue. This adds a concept of GC'ing jobs of this type as soon as no client or other job waits for them anymore. To ensure this works correctly we need to track which clients actually reference a job (i.e. which ones enqueued it). Unfortunately that's pretty nasty to do for direct connections, as sd_bus_track doesn't work for them. For now, work around this, by simply remembering in a boolean that a job was requested by a direct connection, and reset it when we notice the direct connection is gone. This means the GC logic works fine, except that jobs are not immediately removed when direct connections disconnect. In the longer term, a rework of the bus logic should fix this properly. For now this should be good enough, as GC works fine for all cases except this one, and thus is a clear improvement over the previous behaviour. Fixes: #1921	2016-11-16 15:03:26 +01:00
Zbigniew Jędrzejewski-Szmek	7fa6328cc4	Merge pull request #4481 from poettering/perpetual Add "perpetual" unit concept, sysctl fixes, networkd fixes, systemctl color fixes, nspawn discard.	2016-11-02 21:03:26 -04:00
Lennart Poettering	a581e45ae8	unit: unify some code with new unit_new_for_name() call	2016-11-02 11:29:59 -06:00
Lennart Poettering	f5869324e3	core: rework the "no_gc" unit flag to become a more generic "perpetual" flag So far "no_gc" was set on -.slice and init.scope, two units that are always running, cannot be stopped and never exist in an "inactive" state. Since these units are the only users of this flag, let's remodel it and rename it "perpetual" and let's derive more functionality off it. Specifically, refuse enqueuing stop jobs for these units, and report that they are "unstoppable" in the CanStop bus property.	2016-11-02 11:29:59 -06:00
Zbigniew Jędrzejewski-Szmek	f0bfbfac43	core: when restarting services, don't close fds We would close all the stored fds in service_release_resources(), which of course broke the whole concept of storing fds over service restart. Fixes #4408.	2016-11-01 21:20:21 -04:00
Lukas Nykryn	87a47f99bc	failure-action: generalize failure action to emergency action	2016-10-21 15:13:50 +02:00
Lennart Poettering	4b58153dd2	core: add "invocation ID" concept to service manager This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from an inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycle of it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.	2016-10-07 20:14:38 +02:00
Evgeny Vereshchagin	6afe14ff5b	Merge pull request #3984 from poettering/refcnt permit bus clients to pin units to avoid automatic GC	2016-08-26 16:17:05 +03:00
Felipe Sateler	8dec4a9d2d	core,network: Use const qualifiers for block-local variables in macro functions (#4019) Prevents discard-qualifiers warnings when the passed variable was const	2016-08-23 12:29:30 +03:00
Lennart Poettering	fe700f46ec	core: cache last CPU usage counter, before destroying a cgroup It is useful for clients to be able to read the last CPU usage counter value of a unit even if the unit is already terminated. Hence, before destroying a unit's cgroup, cache the last CPU usage counter and return it if the cgroup is gone.	2016-08-22 16:14:21 +02:00
Lennart Poettering	05a98afd3e	core: add Ref()/Unref() bus calls for units This adds two (privileged) bus calls Ref() and Unref() to the Unit interface. The two calls may be used by clients to pin a unit into memory, so that various runtime properties aren't flushed out by the automatic GC. This is necessary to permit clients to race-freely acquire runtime results (such as process exit status/code or accumulated CPU time) on successful service termination. Ref() and Unref() are fully recursive, hence act like the usual reference counting concept in C. Taking a reference is a privileged operation, as this allows pinning units into memory which consumes resources. Transient units may also gain a reference at the time of creation, via the new AddRef property (that is only defined for transient units at the time of creation).	2016-08-22 16:14:21 +02:00
Lennart Poettering	00d9ef8560	core: add RemoveIPC= setting This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). If turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, but we need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.	2016-08-19 00:37:25 +02:00
Lennart Poettering	b4c990e91b	unit: remove orphaned cgroup_netclass_id field	2016-08-18 22:49:48 +02:00
Tejun Heo	66ebf6c0a1	core: add cgroup CPU controller support on the unified hierarchy Unfortunately, due to the disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation. https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2 such as buffered write control making cgroup v2 essential for a lot of workloads. This commit implements systemd CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it. On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies. v2: - Error in man page corrected. - CPU config application in cgroup_context_apply() refactored. - CPU accounting now works on unified hierarchy.	2016-08-07 09:45:39 -04:00
Lennart Poettering	29206d4619	core: add a concept of "dynamic" user ids, that are allocated as long as a service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999	2016-07-22 15:53:45 +02:00
Lennart Poettering	1d98fef17d	core: when forcibly killing/aborting left-over unit processes log about it Let's log at LOG_NOTICE about any processes that we are going to SIGKILL/SIGABRT because clean termination of them didn't work. This turns the various boolean flag parameters to cg_kill(), cg_migrate() and related calls into a single binary flags parameter, simply because the function now gained even more parameters and the parameter list shouldn't get too long. Logging for killing processes is done either when the kill signal is SIGABRT or SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of LOG_TERMINATE is passed. This isn't used yet in this patch, but is made use of in a later patch.	2016-07-20 14:35:15 +02:00
Kyle Walker	36f20ae3b2	manager: Only invoke a single sigchld per unit within a cleanup cycle By default, each iteration of manager_dispatch_sigchld() results in a unit level sigchld event being invoked. For scope units, this results in a scope_sigchld_event() which can seemingly stall for workloads that have a large number of PIDs within the scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry as a result of pid_is_unwaited(). v2: This patch resolves this condition by only paying the cost of a sigchld in the underlying scope unit once per sigchld iteration. A new "sigchldgen" member resides within the Unit struct. The Manager is incremented via the sd event loop, accessed via sd_event_get_iteration, and the Unit member is set to the same value as the manager each time that a sigchld event is invoked. If the Manager iteration value and Unit member match, the sigchld event is not invoked for that iteration.	2016-06-30 15:16:47 -04:00
Zbigniew Jędrzejewski-Szmek	74ad38ff0e	Merge pull request #3160 from htejun/cgroup-fixes-rev2 Cgroup fixes.	2016-05-07 15:08:57 -04:00
Lennart Poettering	1ed7ebcfca	Merge pull request #3170 from poettering/v230-preparation-fixes make virtualization detection quieter, rework unit start limit logic, detect unit file drop-in changes correctly, fix autofs state propagation	2016-05-04 10:46:13 +02:00
Lennart Poettering	072993504e	core: move enforcement of the start limit into per-unit-type code again Let's move the enforcement of the per-unit start limit from unit.c into the type-specific files again. For unit types that know a concept of "result" codes this allows us to hook up the start limit condition to it with an explicit result code. Also, this makes sure that the state checks in calls like service_start() may be done before the start limit is checked, as the start limit really should be checked last, right before everything has been verified to be in order. The generic start limit logic is left in unit.c, but the invocation of it is moved into the per-type files, in the various xyz_start() functions, so that they may place the check at the right location. Note that this change drops the enforcement entirely from device, slice, target and scope units, since these unit types generally may not fail activation, or may only be activated a single time. This is also documented now. Note that this restores the "start-limit-hit" result code that existed before `6bf0f408e4` already in the service code. However, it's not introduced for all units that have a result code concept. Fixes #3166.	2016-05-02 13:08:00 +02:00
Zbigniew Jędrzejewski-Szmek	ce99c68a33	Move no_instances information to shared/ This way it can be used in install.c in subsequent commit.	2016-05-01 19:58:59 -04:00
Zbigniew Jędrzejewski-Szmek	8a993b61d1	Move no_alias information to shared/ This way it can be used in install.c in subsequent commit.	2016-05-01 19:40:51 -04:00
Tejun Heo	ccf78df1fc	core: make unit_has_mask_realized() consider controller enable state unit_has_mask_realized() determines whether the specified unit has its cgroups set up properly given the desired target_mask; however, on the unified hierarchy, controllers need to be enabled explicitly for children and the mask of enabled controllers can deviate from target_mask. Only considering target_mask in unit_has_mask_realized() can lead to false positives and skipping enabling the requested controllers. This patch adds unit->cgroup_enabled_mask to track which controllers are enabled and updates unit_has_mask_realized() to also consider enable_mask. Signed-off-by: Tejun Heo <htejun@fb.com>	2016-04-30 16:12:54 -04:00
Lennart Poettering	291d565a04	core,systemctl: add bus API to retrieve processes of a unit This adds a new GetProcesses() bus call to the Unit object which returns an array consisting of all PIDs, their process names, as well as their full cgroup paths. This is then used by "systemctl status" to show the per-unit process tree. This has the benefit that the client-side no longer needs to access the cgroupfs directly to show the process tree of a unit. Instead, it now uses this new API, which means it also works if -H or -M are used correctly, as the information from the specific host is used, and not the one from the local system. Fixes: #2945	2016-04-22 16:06:20 +02:00
Lennart Poettering	4f4afc88ec	core: rework how transient unit files and property drop-ins work With this change the logic for placing transient unit files and drop-ins generated via "systemctl set-property" is reworked. The latter are now placed in the newly introduced "control" unit file directory. The former are now placed in the "transient" unit file directory. Note that the properties originally set when a transient unit was created will be written to and stay in the transient unit file directory, while later changes are done via drop-ins. This is preparation for a later "systemctl revert" addition, where existing drop-ins are flushed out, but the original transient definition is restored.	2016-04-12 13:43:32 +02:00
Martin Pitt	16a798deb3	Merge pull request #2569 from zonque/removals Remove some old cruft	2016-02-10 14:01:46 +01:00
Daniel Mack	b26fa1a2fb	tree-wide: remove Emacs lines from all files This should be handled fine now by .dir-locals.el, so no need to carry that stuff in every file.	2016-02-10 13:41:57 +01:00

167 commits