Systemd

Author	SHA1	Message	Date
Lennart Poettering	4b58153dd2	core: add "invocation ID" concept to service manager This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.	2016-10-07 20:14:38 +02:00
Tejun Heo	5da38d0768	core: use the unified hierarchy for the systemd cgroup controller hierarchy Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller in on unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().	2016-08-17 17:44:36 -04:00
Tejun Heo	ca2f6384aa	core: rename cg_unified() to cg_all_unified() A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.	2016-08-15 18:13:36 -04:00
Lennart Poettering	a0fef983ab	core: remember first unit failure, not last unit failure Previously, the result value of a unit was overriden with each failure that took place, so that the result always reported the last failure that took place. With this commit this is changed, so that the first failure taking place is stored instead. This should normally not matter much as multiple failures are sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT to a service due to a watchdog timeout, then this currently would be reported as "coredump" failure, rather than the "watchodg" failure it really is. Hence, in order to report information about the type of the failure, and not about the effect of it, let's change this from all unit type to store the first, not the last failure. This addresses the issue pointed out here: https://github.com/systemd/systemd/pull/3818#discussion_r73433520	2016-08-04 23:08:05 +02:00
Lennart Poettering	3862e809d0	core: when a scope was abandoned, always log about processes we kill After all, if a unit is abandoned, all processes inside of it may be considered "left over" and are something we should better log about.	2016-07-20 14:35:15 +02:00
Lennart Poettering	1d98fef17d	core: when forcibly killing/aborting left-over unit processes log about it Let's lot at LOG_NOTICE about any processes that we are going to SIGKILL/SIGABRT because clean termination of them didn't work. This turns the various boolean flag parameters to cg_kill(), cg_migrate() and related calls into a single binary flags parameter, simply because the function now gained even more parameters and the parameter listed shouldn't get too long. Logging for killing processes is done either when the kill signal is SIGABRT or SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of LOG_TERMINATE is passed. This isn't used yet in this patch, but is made use of in a later patch.	2016-07-20 14:35:15 +02:00
Evgeny Vereshchagin	bbc85a16e1	core: on unified we don't need to check u->pids: we can use proper notifications (#3531 ) Fixes: #3483	2016-06-14 14:08:01 +02:00
Zbigniew Jędrzejewski-Szmek	ce99c68a33	Move no_instances information to shared/ This way it can be used in install.c in subsequent commit.	2016-05-01 19:58:59 -04:00
Zbigniew Jędrzejewski-Szmek	8a993b61d1	Move no_alias information to shared/ This way it can be used in install.c in subsequent commit.	2016-05-01 19:40:51 -04:00
Lennart Poettering	4f4afc88ec	core: rework how transient unit files and property drop-ins work With this change the logic for placing transient unit files and drop-ins generated via "systemctl set-property" is reworked. The latter are now placed in the newly introduced "control" unit file directory. The fomer are now placed in the "transient" unit file directory. Note that the properties originally set when a transient unit was created will be written to and stay in the transient unit file directory, while later changes are done via drop-ins. This is preparation for a later "systemctl revert" addition, where existing drop-ins are flushed out, but the original transient definition is restored.	2016-04-12 13:43:32 +02:00
Lennart Poettering	2c289ea833	core: introduce MANAGER_IS_RELOADING() macro This replaces the old function call manager_is_reloading_or_reexecuting() which was used only at very few places. Use the new macro wherever we check whether we are reloading. This should hopefully make things a bit more readable, given the nature of Manager:n_reloading being a counter.	2016-04-12 13:43:30 +02:00
Lennart Poettering	1b4cd0cf11	core: exclude .slice units from "systemctl isolate" Fixes: #1969	2016-02-20 22:42:29 +01:00
Daniel Mack	b26fa1a2fb	tree-wide: remove Emacs lines from all files This should be handled fine now by .dir-locals.el, so need to carry that stuff in every file.	2016-02-10 13:41:57 +01:00
Lennart Poettering	7a7821c878	core: rework job_get_timeout() to use usec_t and handle USEC_INFINITY time events correctly	2016-02-04 00:35:43 +01:00
Lennart Poettering	36c16a7cdd	core: rework unit timeout handling, and add new setting RuntimeMaxSec= This clean-ups timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as indication for a disabled timeout, use USEC_INFINITY which is in-line with how we do this in the rest of our code (following the logic that 0 means "no", and USEC_INFINITY means "never"). This also replace all usec_t additions with invocations to usec_add(), so that USEC_INFINITY is properly propagated, and sd-event considers it has indication for turning off the event source. This also alters the deserialization of the units to restart timeouts from the time they were originally started from. Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to artificially prolonged timeouts if a daemon reload took place. Finally, a new RuntimeMaxSec= setting is introduced for service units, that specifies a maximum runtime after which a specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs. This also simplifies the various xyz_spawn() calls of the various types in that explicit distruction of the timers is removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn() calls fail. Fixes: #2249	2016-02-01 22:18:16 +01:00
Lennart Poettering	700e2d63b7	core: simplify scope unit GC checking code a bit	2015-11-13 19:50:52 +01:00
Lennart Poettering	4c9ea260ae	core: simplify things a bit by checking default_dependencies boolean in callee, not caller It's nicer to hide the check away in the various xyz_add_default_dependencies() calls, rather than making it explicit in the caller, and thus require deeper nesing.	2015-11-11 20:42:39 +01:00
Tom Gundersen	7042fc14ff	Merge pull request #1837 from poettering/grabbag2 variety of fixes	2015-11-11 02:31:29 +01:00
Zbigniew Jędrzejewski-Szmek	36b4a7ba55	Remove snapshot unit type Snapshots were never useful or used for anything. Many systemd developers that I spoke to at systemd.conf2015, didn't even know they existed, so it is fairly safe to assume that this type can be deleted without harm. The fundamental problem with snapshots is that the state of the system is dynamic, devices come and go, users log in and out, timers fire... and restoring all units to some state from the past would "undo" those changes, which isn't really possible. Tested by creating a snapshot, running the new binary, and checking that the transition did not cause errors, and the snapshot is gone, and snapshots cannot be created anymore. New systemctl says: Unknown operation snapshot. Old systemctl says: Failed to create snapshot: Support for snapshots has been removed. IgnoreOnSnaphost settings are warned about and ignored: Support for option IgnoreOnSnapshot= has been removed and it is ignored http://lists.freedesktop.org/archives/systemd-devel/2015-November/034872.html	2015-11-10 19:33:06 -05:00
Lennart Poettering	ba64af90ec	core: change return value of the unit's enumerate() call to void We cannot handle enumeration failures in a sensible way, hence let's try hard to continue without making such failures fatal, and log about it with precise error messages.	2015-11-10 21:03:49 +01:00
Lennart Poettering	b5efdb8af4	util-lib: split out allocation calls into alloc-util.[ch]	2015-10-27 13:45:53 +01:00
Lennart Poettering	8b43440b7e	util-lib: move string table stuff into its own string-table.[ch]	2015-10-27 13:25:56 +01:00
Lennart Poettering	07630cea1f	util-lib: split our string related calls from util.[ch] into its own file string-util.[ch] There are more than enough calls doing string manipulations to deserve its own files, hence do something about it. This patch also sorts the #include blocks of all files that needed to be updated, according to the sorting suggestions from CODING_STYLE. Since pretty much every file needs our string manipulation functions this effectively means that most files have sorted #include blocks now. Also touches a few unrelated include files.	2015-10-24 23:05:02 +02:00
Lennart Poettering	417800228f	core: ignore -.slice and init.scope when isolating Otherwise, we might end up trying to isolate it away when starting user instances. While we are at it, also prohibit manual start/stop of these two units. Fixes: #1507	2015-10-09 17:20:32 +02:00
Zbigniew Jędrzejewski-Szmek	7e55de3b96	Move all unit states to basic/ and extend systemctl --state=help	2015-09-28 15:09:34 -04:00
Lennart Poettering	a1e58e8ee1	tree-wide: use coccinelle to patch a lot of code to use mfree() This replaces this: free(p); p = NULL; by this: p = mfree(p); Change generated using coccinelle. Semantic patch is added to the sources.	2015-09-09 08:19:27 +02:00
Thomas Hindoe Paaboel Andersen	09d2f5b1c9	scope: do not compare a bool return with "<= 0"	2015-09-02 19:58:12 +02:00
Lennart Poettering	efdb02375b	core: unified cgroup hierarchy support This patch set adds full support the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possibly to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified heirarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicey, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefuly makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.	2015-09-01 23:52:27 +02:00
Lennart Poettering	6f883237f1	cgroup: drop "ignore_self" argument from cg_is_empty() In all cases where the function (or cg_is_empty_recursive()) ignoring the calling process is actually wrong, as a process keeps a cgroup busy regardless if its the current one or another. Hence, let's simplify things and drop the "ignore_self" parameter.	2015-09-01 18:37:01 +02:00
Lennart Poettering	d79200e26e	unit: unify how we assing slices to units This adds a new call unit_set_slice(), and simplifies unit_add_default_slice(). THis should make our code a bit more robust and simpler.	2015-08-31 13:20:43 +02:00
Lennart Poettering	21b735e798	core: add unit_dbus_interface_from_type() to unit-name.h Let's add a way to get the type-specific D-Bus interface of a unit from either its type or name to src/basic/unit-name.[ch]. That way we can share it with the client side, where it is useful in tools like cgls or machinectl. Also ports over machinectl to make use of this.	2015-08-28 02:10:10 +02:00
Lennart Poettering	cbf60d0a7f	core: only set event source name when we create an event source	2015-05-13 18:30:14 +02:00
Lennart Poettering	f2341e0a87	core,network: major per-object logging rework This changes log_unit_info() (and friends) to take a real Unit* object insted of just a unit name as parameter. The call will now prefix all logged messages with the unit name, thus allowing the unit name to be dropped from the various passed romat strings, simplifying invocations drastically, and unifying log output across messages. Also, UNIT= vs. USER_UNIT= is now derived from the Manager object attached to the Unit object, instead of getpid(). This has the benefit of correcting the field for --test runs. Also contains a couple of other logging improvements: - Drops a couple of strerror() invocations in favour of using %m. - Not only .mount units now warn if a symlinks exist for the mount point already, .automount units do that too, now. - A few invocations of log_struct() that didn't actually pass any additional structured data have been replaced by simpler invocations of log_unit_info() and friends. - For structured data a new LOG_UNIT_MESSAGE() macro has been added, that works like LOG_MESSAGE() but prefixes the message with the unit name. Similar, there's now LOG_LINK_MESSAGE() and LOG_NETDEV_MESSAGE(). - For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(), LOG_NETDEV_INTERFACE() macros have been added that generate the necessary per object fields. The old log_unit_struct() call has been removed in favour of these new macros used in raw log_struct() invocations. In addition to removing one more function call this allows generated structured log messages that contain two object fields, as necessary for example for network interfaces that are joined into another network interface, and whose messages shall be indexed by both. - The LOG_ERRNO() macro has been removed, in favour of log_struct_errno(). The latter has the benefit of ensuring that %m in format strings is properly resolved to the specified error number. - A number of logging messages have been converted to use log_unit_info() instead of log_info() - The client code in sysv-generator no longer #includes core code from src/core/. - log_unit_full_errno() has been removed, log_unit_full() instead takes an errno now, too. - log_unit_info(), log_link_info(), log_netdev_info() and friends, now avoid double evaluation of their parameters	2015-05-11 22:24:45 +02:00
Tom Gundersen	7dfbe2e3fc	core: annotate event sources	2015-04-29 17:08:31 +02:00
Thomas Hindoe Paaboel Andersen	68a01fb658	scope: use correct enum type	2015-04-28 19:03:11 +02:00
Lennart Poettering	dd305ec9c6	core: when we cannot add PID to a scope cgroup, log about it Also, place the scope unit in failed state.	2015-04-28 12:20:57 +02:00
Lennart Poettering	be847e82cf	Revert "core: do not spawn jobs or touch other units during coldplugging" This reverts commit `6e392c9c45`. We really shouldn't invent external state keeping hashmaps, if we can keep this state in the units themselves.	2015-04-24 15:51:10 +02:00
Ivan Shapovalov	6e392c9c45	core: do not spawn jobs or touch other units during coldplugging Because the order of coldplugging is not defined, we can reference a not-yet-coldplugged unit and read its state while it has not yet been set to a meaningful value. This way, already active units may get started again. We fix this by deferring such actions until all units have been at least somehow coldplugged. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=88401	2015-03-07 08:44:57 -05:00
Lennart Poettering	5ad096b3f1	core: expose consumed CPU time per unit This adds support for showing the accumulated consumed CPU time per-unit in the "systemctl status" output. The property is also readable via the bus.	2015-03-02 12:15:25 +01:00
Thomas Hindoe Paaboel Andersen	2eec67acbb	remove unused includes This patch removes includes that are not used. The removals were found with include-what-you-use which checks if any of the symbols from a header is in use.	2015-02-23 23:53:42 +01:00
Lennart Poettering	82a2b6bb5e	core: output unit status output strings to console, only if we actually are changing unit state Unit _start() and _stop() implementations can fail with -EAGAIN to delay execution temporarily. Thus, we should not output status messages before invoking these calls, but after, and only when we know that the invocation actually made a change.	2015-01-28 15:07:13 +01:00
Lennart Poettering	7b3fd6313c	scope: make attachment of initial PIDs a bit more robust	2014-12-10 22:06:44 +01:00
Ronny Chevalier	4e2744fcb5	core: remove unused variables	2014-11-30 02:35:56 +01:00
Michal Schmidt	23bbb0de4e	treewide: more log_*_errno + return simplifications	2014-11-28 18:24:30 +01:00
Michal Schmidt	da927ba997	treewide: no need to negate errno for log_*_errno() It corrrectly handles both positive and negative errno values.	2014-11-28 13:29:21 +01:00
Michal Schmidt	0a1beeb642	treewide: auto-convert the simple cases to log__errno() As a followup to `086891e5c1` "log: add an "error" parameter to all low-level logging calls and intrdouce log_error_errno() as log calls that take error numbers", use sed to convert the simple cases to use the new macros: find . -name '.[ch]' \| xargs sed -r -i -e \ 's/log_(debug\|info\|notice\|warning\|error\|emergency)$"(.)%s"(.), strerror\(-([a-zA-Z_]+)$\);/log_\1_errno(-\4, "\2%m"\3);/' Multi-line log_() invocations are not covered. And we also should add log_unit__errno().	2014-11-28 12:04:41 +01:00
Lennart Poettering	79008bddf6	log: rearrange log function naming - Rename log_meta() → log_internal(), to follow naming scheme of most other log functions that are usually invoked through macros, but never directly. - Rename log_info_object() to log_object_info(), simply because the object should be before any other parameters, to follow OO-style programming style.	2014-11-27 22:05:24 +01:00
Umut Tezduyar Lindskog	db2cb23b5b	core: send sigabrt on watchdog timeout to get the stacktrace if sigabrt doesn't do the job, follow regular shutdown routine, sigterm > sigkill.	2014-10-28 17:37:39 +01:00
Lennart Poettering	6a0f1f6d5a	sd-event: rework API to support CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM, too	2014-03-24 02:58:41 +01:00
Lennart Poettering	598459ceba	core: rework context initialization/destruction logic Let's automatically initialize the kill, exec and cgroup contexts of the various unit types when the object is constructed, instead of invididually in type-specific code. Also, when PrivateDevices= is set, set DevicePolicy= to closed.	2014-03-19 21:06:53 +01:00

1 2

65 commits