Systemd

Author	SHA1	Message	Date
Franck Bui	bda7d78ba1	pid1: preserve current value of log target across re-{load,execution} To make debugging easier, this patches allows one to change the log target and do reload/reexec without modifying configuration permanently, which makes debugging easier. Indeed if one changed the log target at runtime (via the bus or via signals), the change was lost on the next reload/reexecution. In order to restore back the default value (set via system.conf, environment variables or any other means ), the empty string in the "LogTarget" property is now supported as well as sending SIGTRMIN+26 signal.	2018-06-13 18:52:27 +02:00
Franck Bui	a6ecbf836c	pid1: preserve current value of log level across re-{load,execution} To make debugging easier, this patches allows one to change the log level and do reload/reexec without modifying configuration permanently, which makes debugging easier. Indeed if one changed the log max level at runtime (via the bus or via signals), the change was lost on the next daemon reload/reexecution. In order to restore the original value back (set via system.conf, environment variables or any other means), the empty string in the "LogLevel" property is now supported as well as sending SIGRTMIN+23 signal.	2018-06-13 18:52:27 +02:00
Lennart Poettering	bbf5fd8e41	core: subscribe to /etc/localtime timezone changes and update timer elapsation accordingly Fixes: #8233 This is our first real-life usecase for the new sd_event_add_inotify() calls we just added.	2018-06-06 10:53:56 +02:00
David Tardon	a7a7163df7	fix race between daemon-reload and other commands When "systemctl daemon-reload" is run at the same time as "systemctl start foo", the latter might hang. That's because commands like start wait for JobRemoved signal to know when the job is finished. But if the job is finished during reloading, the signal is never sent. The hang can be easily reproduced by running # for ((N=1; N>0; N++)) ; do echo $N ; systemctl daemon-reload ; done # for ((N=1; N>0; N++)) ; do echo $N ; systemctl start systemd-coredump.socket ; done in two different terminals. The start command will hang after 1-2 iterations. This keeps track of jobs that were started before reload and finished during it and sends JobRemoved after the reload has finished.	2018-05-19 11:37:00 +02:00
Felipe Sateler	57b7a260c2	core: undo the dependency inversion between unit.h and all unit types	2018-05-15 14:24:34 -04:00
Zbigniew Jędrzejewski-Szmek	a1113e0865	core/manager: make manager_enumerate() static	2018-04-24 11:44:19 +02:00
Lennart Poettering	2cb36f7c1e	Merge pull request #8575 from keszybz/non-absolute-paths Do not require absolute paths in ExecStart and friends	2018-04-17 15:54:10 +02:00
Zbigniew Jędrzejewski-Szmek	4109ede778	core/manager: split out function to verify that unit is loaded and not masked No functional change.	2018-04-16 16:07:27 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Michael Olbrich	227b8a762f	core: don't include libmount.h in a header file (#8580 ) linux/fs.h sys/mount.h, libmount.h and missing.h all include MS_* definitions. To avoid problems, only one of linux/fs.h, sys/mount.h and libmount.h should be included. And missing.h must be included last. Without this, building systemd may fail with: In file included from [...]/libmount/libmount.h:31:0, from ../systemd-238/src/core/manager.h:23, from ../systemd-238/src/core/emergency-action.h:37, from ../systemd-238/src/core/unit.h:34, from ../systemd-238/src/core/dbus-timer.h:25, from ../systemd-238/src/core/timer.c:26: [...]/sys/mount.h:57:2: error: expected identifier before numeric constant	2018-03-26 17:34:53 +02:00
Michal Sekletar	19496554e2	core: delay adding target dependencies until all units are loaded and aliases resolved (#8381 ) Currently we add target dependencies while we are loading units. This can create ordering loops even if configuration doesn't contain any loop. Take for example following configuration, $ systemctl get-default multi-user.target $ cat /etc/systemd/system/test.service [Unit] After=default.target [Service] ExecStart=/bin/true [Install] WantedBy=multi-user.target If we encounter such unit file early during manager start-up (e.g. load queue is dispatched while enumerating devices due to SYSTEMD_WANTS in udev rules) we would add stub unit default.target and we order it Before test.service. At the same time we add implicit Before to multi-user.target. Later we merge two units and we create ordering cycle in the process. To fix the issue we will now never add any target dependencies until we loaded all the unit files and resolved all the aliases.	2018-03-23 15:28:06 +01:00
Zbigniew Jędrzejewski-Szmek	e8112e67e4	Make MANAGER_TEST_RUN_MINIMAL just allocate data structures When running tests like test-unit-name, there is not point in setting up the cgroup and signals and interacting with the environment. Similarly when running fuzz testing of the parser. Add new MANAGER_TEST_RUN_BASIC which takes the role of MANAGER_TEST_RUN_MINIMAL, and redefine MANAGER_TEST_RUN_MINIMAL to just create the basic data structures.	2018-03-11 16:33:59 +01:00
Zbigniew Jędrzejewski-Szmek	c70cac548a	Introduce _cleanup_(manager_freep)	2018-03-11 16:33:57 +01:00
Lennart Poettering	5f109056d5	core: delay bus name synchronization after reload/reexec into a later event loop iteration Previously, we'd synchronize bus names immediately when we succeeded connecting to the bus, potentially even before coldplugging the units. This was problematic, as synchronizing bus names meant invoking the per-unit name change handler function which might change the unit's state — which will result in consistency when done before we coldplug things. With this change we instead enqueue a job for the event loop to resync the names in a later loop iteration, i.e. at a point where we know coldplugging has finished.	2018-02-12 11:34:00 +01:00
Lennart Poettering	8559b3b75c	core: rework how we connect to the bus This removes the current bus_init() call, as it had multiple problems: it munged handling of the three bus connections we care about (private, "api" and system) into one, even though the conditions when which was ready are very different. It also added redundant logging, as the individual calls it called all logged on their own anyway. The three calls bus_init_api(), bus_init_private() and bus_init_system() are now made public. A new call manager_dbus_is_running() is added that works much like manager_journal_is_running() and is a lot more careful when checking whether dbus is around. Optionally it checks the unit's deserialized_state rather than state, in order to accomodate for cases where we cant to connect to the bus before deserializing the "subscribed" list, before coldplugging the units. manager_recheck_dbus() is added, that works a lot like manager_recheck_journal() and is invoked in unit_notify(), i.e. when units change state. All in all this should make handling a bit more alike to journal handling, and it also fixes one major bug: when running in user mode we'll now connect to the system bus early on, without conditionalizing this in anyway.	2018-02-12 11:34:00 +01:00
Lennart Poettering	004c7f169e	core: fold manager_set_exec_params() into unit_set_exec_params() Let's simplify things a bit: we so far called both functions every single time, let's just merge one into the other, so that we have fewer functions to call.	2018-02-12 11:34:00 +01:00
Yu Watanabe	e8a565cb66	core: make ExecRuntime be manager managed object Before this, each ExecRuntime object is owned by a unit. However, it may be shared with other units which enable JoinsNamespaceOf=. Thus, by the serialization/deserialization process, its sharing information, more specifically, reference counter is lost, and causes issue #7790. This makes ExecRuntime objects be managed by manager, and changes the serialization/deserialization process. Fixes #7790.	2018-02-06 16:00:34 +09:00
Zbigniew Jędrzejewski-Szmek	5eb83fa645	Merge pull request #7991 from poettering/n-on-console a comprehensive fix for the n_on_console miscounting issue	2018-01-25 13:48:08 +03:00
Zbigniew Jędrzejewski-Szmek	f26f5b60d0	Merge pull request #7915 from poettering/pids-max-tweak	2018-01-25 10:24:35 +01:00
Lennart Poettering	adefcf2821	core: rework how we count the n_on_console counter Let's add a per-unit boolean that tells us whether our unit is currently counted or not. This way it's unlikely we get out of sync again and things are generally more robust. This also allows us to remove the counting logic specific to service units (which was in fact mostly a copy from the generic implementation), in favour of fully generic code. Replaces: #7824	2018-01-24 20:14:51 +01:00
Lennart Poettering	62a769136d	core: rework how we track which PIDs to watch for a unit Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Unit interested in SIGCHLD events for them. This scheme allowed a specific PID to be watched by exactly 0, 1 or 2 units. With this rework this is replaced by a single hashmap which is primarily keyed by the PID and points to a Unit interested in it. However, it optionally also keyed by the negated PID, in which case it points to a NULL terminated array of additional Unit objects also interested. This scheme means arbitrary numbers of Units may now watch the same PID. Runtime and memory behaviour should not be impact by this change, as for the common case (i.e. each PID only watched by a single unit) behaviour stays the same, but for the uncommon case (a PID watched by more than one unit) we only pay with a single additional memory allocation for the array. Why this all? Primarily, because allowing exactly two units to watch a specific PID is not sufficient for some niche cases, as processes can belong to more than one unit these days: 1. sd_notify() with MAINPID= can be used to attach a process from a different cgroup to multiple units. 2. Similar, the PIDFile= setting in unit files can be used for similar setups, 3. By creating a scope unit a main process of a service may join a different unit, too. 4. On cgroupsv1 we frequently end up watching all processes remaining in a scope, and if a process opens lots of scopes one after the other it might thus end up being watch by many of them. This patch hence removes the 2-unit-per-PID limit. It also makes a couple of other changes, some of them quite relevant: - manager_get_unit_by_pid() (and the bus call wrapping it) when there's ambiguity will prefer returning the Unit the process belongs to based on cgroup membership, and only check the watch-pids hashmap if that fails. This change in logic is probably more in line with what people expect and makes things more stable as each process can belong to exactly one cgroup only. - Every SIGCHLD event is now dispatched to all units interested in its PID. Previously, there was some magic conditionalization: the SIGCHLD would only be dispatched to the unit if it was only interested in a single PID only, or the PID belonged to the control or main PID or we didn't dispatch a signle SIGCHLD to the unit in the current event loop iteration yet. These rules were quite arbitrary and also redundant as the the per-unit handlers would filter the PIDs anyway a second time. With this change we'll hence relax the rules: all we do now is dispatch every SIGCHLD event exactly once to each unit interested in it, and it's up to the unit to then use or ignore this. We use a generation counter in the unit to ensure that we only invoke the unit handler once for each event, protecting us from confusion if a unit is both associated with a specific PID through cgroup membership and through the "watch_pids" logic. It also protects us from being confused if the "watch_pids" hashmap is altered while we are dispatching to it (which is a very likely case). - sd_notify() message dispatching has been reworked to be very similar to SIGCHLD handling now. A generation counter is used for dispatching as well. This also adds a new test that validates that "watch_pid" registration and unregstration works correctly.	2018-01-23 21:29:31 +01:00
Lennart Poettering	575b300b79	pid1: rework how we dispatch SIGCHLD and other signals This fundamentally makes one change: we never process more than one signal or more than one waitid() event per event loop. We'll never tight loop around waitid() or around read() on our signalfd instead, but always return to the main event loop after processing one event. By doing this we put the event priorization handling into full power again, as we'll always check for higher priority events before looking at the next signal or waitid() again. This introduces a new "defer" event source "sigchld_event". It's enabled as soon as we see SIGCHLD, and disabled as soon as waitid() reported no further children pending. It's running at a relatively high priority, one step higher than signal handling itself, but lower than /proc/self/mountinfo event handling, so that the latter always takes precedence. Since we want to process sd_notify() events at an even higher priority than SIGCHLD (as before) it is moved one priority step up, too. Fixes: #7932 Possibly fixes: #7966	2018-01-23 18:41:40 +01:00
Lennart Poettering	4259d20215	manager: add MANAGER_IS_RUNNING() for checking whether the manager is running This macro is useful as the check is not obvious, and we better abstract this away.	2018-01-23 16:43:56 +01:00
Jan Klötzke	2a12e32efa	pid1: add option to disable service watchdogs Add a "systemd.service_watchdogs=" option to the command line which disables all service runtime watchdogs and emergency actions.	2018-01-22 18:10:03 +01:00
Lennart Poettering	00b5974f70	core: propagate TasksMax= on the root slice to sysctls The cgroup "pids" controller is not supported on the root cgroup. However we expose TasksMax= on it, but currently don't actually apply it to anything. Let's correct this: if set, let's propagate things to the right sysctls. This way we can expose TasksMax= on all units in a somewhat sensible way.	2018-01-22 16:26:55 +01:00
Zbigniew Jędrzejewski-Szmek	d8eb10d61a	core: delay logging the taint string until after basic.target is reached (#7935 ) This happens to be almost the same moment as when we send READY=1 in the user instance, but the logic is slightly different, since we log taint when basic.target is reached in the system manager, but we send the notification only in the user manager. So add a separate flag for this and propagate it across reloads. Fixes #7683.	2018-01-21 21:17:54 +09:00
Lennart Poettering	af6b0ecc4c	core: make "taint" string logic a bit more generic and output it at boot The tainting logic existed for a long time, but was hidden inside the bus interfaces. Let's give it a small bit more coverage, by logging its value early at boot during initialization.	2017-12-07 11:27:07 +01:00
Lennart Poettering	49d5666cc5	manager: introduce MANAGER_IS_FINISHED() macro Let's make our finished checks a bit more readable. Checking the timestamp is not entirely obvious, hence let's abstract that a bit by adding a macro that shows what we are doing here, not how we doing it. This is particularly useful if we want to change the definition of "finished" later on, in particular, when we try to fix #7023.	2017-11-21 11:01:34 +01:00
Lennart Poettering	713f6f901d	manager: add manager_get_dump_string() It's like manager_dump(), but returns a string. This allows us to reduce some duplicate code. Also, while we are at it, turn off stdio locking while we write to the memory FILE *f.	2017-11-21 11:01:34 +01:00
Lennart Poettering	ad75b9e765	core: add manager_dump() call, and make it output timestamp data It's a wrapper around manager_dump_units() and manager_dump_jobs(), and outputs some additional timestamp data. Also, port two users of this over.	2017-11-21 10:22:28 +01:00
Lennart Poettering	9f9f034271	manager: rework the timestamps logic, so that they are an enum-index array This makes things quite a bit more systematic I think, as we can systematically operate on all timestamps, for example for the purpose of serialization/deserialization. This rework doesn't necessarily make things shorter in the individual lines, but it does reduce the line count a bit. (This is useful particularly when we want to add additional timestamps, for example to solve #7023)	2017-11-21 10:22:28 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Zbigniew Jędrzejewski-Szmek	0c2826c60c	core: in --user mode, report READY=1 as soon as basic.target is reached (#7102 ) When a user logs in, systemd-pam will wait for the user manager instance to report readiness. We don't need to wait for all the jobs to finish, it is enough if the basic startup is done and the user manager is responsive. systemd --user will now send out a READY=1 notification when either of two conditions becomes true: - basic.target/start job is gone, - the initial transaction is done. Also fixes #2863.	2017-10-24 14:48:54 +02:00
Zbigniew Jędrzejewski-Szmek	892a035c2e	core: make gc_marker unsigned (#7004 ) This matches the definition in unit.h.	2017-10-05 15:04:19 +02:00
Lennart Poettering	72fd17682d	core: usually our enum's _INVALID and _MAX special values are named after the full type In most cases we followed the rule that the special _INVALID and _MAX values we use in our enums use the full type name as prefix (in contrast to regular values that we often make shorter), do so for ExecDirectoryType as well. No functional changes, just a little bit of renaming to make this code more like the rest.	2017-10-02 17:41:43 +02:00
Lennart Poettering	09e2465407	cgroup: after determining that a cgroup is empty, asynchronously dispatch this This makes sure that if we learn via inotify or another event source that a cgroup is empty, and we checked that this is indeed the case (as we might get spurious notifications through inotify, as the inotify logic through the "cgroups.event" is pretty unspecific and might be trigger for a variety of reasons), then we'll enqueue a defer event for it, at a priority lower than SIGCHLD handling, so that we know for sure that if there's waitid() data for a process we used it before considering the cgroup empty notification. Fixes: #6608	2017-09-27 18:26:18 +02:00
Lennart Poettering	91a6073ef7	core: rename cgroup_queue → cgroup_realize_queue We are about to add second cgroup-related queue, called "cgroup_empty_queue", hence let's rename "cgroup_queue" to "cgroup_realize_queue" (as that is its purpose) to minimize confusion about the two queues. Just a rename, no functional changes.	2017-09-27 17:59:25 +02:00
Daniel Mack	377bfd2d49	manager: hook up IP accounting defaults	2017-09-22 15:24:55 +02:00
Daniel Mack	6a48d82f02	cgroup: add fields to accommodate eBPF related details Add pointers for compiled eBPF programs as well as list heads for allowed and denied hosts for both directions.	2017-09-22 15:24:54 +02:00
Zbigniew Jędrzejewski-Szmek	e0a3da1fd2	Make test_run into a flags field and disable generators again Now generators are only run in systemd --test mode, where this makes most sense (how are you going to test what would happen otherwise?). Fixes #6842. v2: - rename test_run to test_run_flags	2017-09-19 20:14:05 +02:00
Lennart Poettering	19bbdd985e	core: manager_set_exec_params() cannot fail, hence make it void Let's simplify things a bit.	2017-08-10 15:02:50 +02:00
Jouke Witteveen	15d167f8a3	core: propagate reload from RELOADING=1 notification (#6550 )	2017-08-07 11:27:24 +02:00
Yu Watanabe	3536f49e8f	core: add {State,Cache,Log,Configuration}Directory= (#6384 ) This introduces {State,Cache,Log,Configuration}Directory= those are similar to RuntimeDirectory=. They create the directories under /var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode specified in {State,Cache,Log,Configuration}DirectoryMode=. This also fixes #6391.	2017-07-18 14:34:52 +02:00
Lennart Poettering	2e6dbc0fcd	Merge pull request #4538 from fbuihuu/confirm-spawn-fixes Confirm spawn fixes/enhancements	2016-11-18 11:08:06 +01:00
Franck Bui	b0eb29449e	core: add 'c' in confirmation_spawn to resume the boot process	2016-11-17 18:16:50 +01:00
Franck Bui	7d5ceb6416	core: allow to redirect confirmation messages to a different console It's rather hard to parse the confirmation messages (enabled with systemd.confirm_spawn=true) amongst the status messages and the kernel ones (if enabled). This patch gives the possibility to the user to redirect the confirmation message to a different virtual console, either by giving its name or its path, so those messages are separated from the other ones and easier to read.	2016-11-17 18:16:16 +01:00
Lennart Poettering	c5a97ed132	core: GC redundant device jobs from the run queue In contrast to all other unit types device units when queued just track external state, they cannot effect state changes on their own. Hence unless a client or other job waits for them there's no reason to keep them in the job queue. This adds a concept of GC'ing jobs of this type as soon as no client or other job waits for them anymore. To ensure this works correctly we need to track which clients actually reference a job (i.e. which ones enqueued it). Unfortunately that's pretty nasty to do for direct connections, as sd_bus_track doesn't work for them. For now, work around this, by simply remembering in a boolean that a job was requested by a direct connection, and reset it when we notice the direct connection is gone. This means the GC logic works fine, except that jobs are not immediately removed when direct connections disconnect. In the longer term, a rework of the bus logic should fix this properly. For now this should be good enough, as GC works for fine all cases except this one, and thus is a clear improvement over the previous behaviour. Fixes: #1921	2016-11-16 15:03:26 +01:00
Lennart Poettering	a2d72e265a	core: drop n_in_gc_queue field of Manager structure We count the units in the GC queue with this, but actually never make use of it, hence drop it.	2016-11-16 15:03:26 +01:00
Lukas Nykryn	ae8c7939df	core: use emergency_action for ctr+alt+del burst Fixes #4306	2016-10-21 15:13:50 +02:00
Lennart Poettering	4b58153dd2	core: add "invocation ID" concept to service manager This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from and inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each log logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycleof it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.	2016-10-07 20:14:38 +02:00
Lukáš Nykrýn	24dd31c19e	core: add possibility to set action for ctrl-alt-del burst (#4105 ) For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough. Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.	2016-10-06 21:08:21 -04:00
Zbigniew Jędrzejewski-Szmek	232f6754f6	pid1: drop kdbus_fd and all associated logic	2016-09-09 15:16:26 +01:00
Lennart Poettering	00d9ef8560	core: add RemoveIPC= setting This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). if turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, However need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.	2016-08-19 00:37:25 +02:00
Lennart Poettering	29206d4619	core: add a concept of "dynamic" user ids, that are allocated as long as a service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999	2016-07-22 15:53:45 +02:00
Lennart Poettering	3103459e90	Merge pull request #3193 from htejun/cgroup-io-controller core: add io controller support on the unified hierarchy	2016-05-16 22:05:27 +02:00
Tejun Heo	13c31542cc	core: add io controller support on the unified hierarchy On the unified hierarchy, blkio controller is renamed to io and the interface is changed significantly. * blkio.weight and blkio.weight_device are consolidated into io.weight which uses the standardized weight range [1, 10000] with 100 as the default value. * blkio.throttle.{read\|write}_{bps\|iops}_device are consolidated into io.max. Expansion of throttling features is being worked on to support work-conserving absolute limits (io.low and io.high). * All stats are consolidated into io.stats. This patchset adds support for the new interface. As the interface has been revamped and new features are expected to be added, it seems best to treat it as a separate controller rather than trying to expand the blkio settings although we might add automatic translation if only blkio settings are specified. * io.weight handling is mostly identical to blkio.weight[_device] handling except that the weight range is different. * Both read and write bandwidth settings are consolidated into CGroupIODeviceLimit which describes all limits applicable to the device. This makes it less painful to add new limits. * "max" can be used to specify the maximum limit which is equivalent to no config for max limits and treated as such. If a given CGroupIODeviceLimit doesn't contain any non-default configs, the config struct is discarded once the no limit config is applied to cgroup. * lookup_blkio_device() is renamed to lookup_block_device(). Signed-off-by: Tejun Heo <htejun@fb.com>	2016-05-05 16:43:06 -04:00
Lennart Poettering	d8fdc62037	core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification dbus-daemon currently uses a backlog of 30 on its D-bus system bus socket. On overloaded systems this means that only 30 connections may be queued without dbus-daemon processing them before further connection attempts fail. Our cgroups-agent binary so far used D-Bus for its messaging, and hitting this limit hence may result in us losing cgroup empty messages. This patch adds a seperate cgroup agent socket of type AF_UNIX/SOCK_DGRAM. Since sockets of these types need no connection set up, no listen() backlog applies. Our cgroup-agent binary will hence simply block as long as it can't enqueue its datagram message, so that we won't lose cgroup empty messages as likely anymore. This also rearranges the ordering of the processing of SIGCHLD signals, service notification messages (sd_notify()...) and the two types of cgroup notifications (inotify for the unified hierarchy support, and agent for the classic hierarchy support). We now always process events for these in the following order: 1. service notification messages (SD_EVENT_PRIORITY_NORMAL-7) 2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6) 3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5) This is because when receiving SIGCHLD we invalidate PID information, which we need to process the service notification messages which are bound to PIDs. Hence the order between the first two items. And we want to process SIGCHLD metadata to detect whether a service is gone, before using cgroup notifications, to decide when a service is gone, since the former carries more useful metadata. Related to this: https://bugs.freedesktop.org/show_bug.cgi?id=95264 https://github.com/systemd/systemd/issues/1961	2016-05-05 12:37:04 +02:00
Lennart Poettering	2c289ea833	core: introduce MANAGER_IS_RELOADING() macro This replaces the old function call manager_is_reloading_or_reexecuting() which was used only at very few places. Use the new macro wherever we check whether we are reloading. This should hopefully make things a bit more readable, given the nature of Manager:n_reloading being a counter.	2016-04-12 13:43:30 +02:00
Lennart Poettering	463d0d1569	core: remove ManagerRunningAs enum Previously, we had two enums ManagerRunningAs and UnitFileScope, that were mostly identical and converted from one to the other all the time. The latter had one more value UNIT_FILE_GLOBAL however. Let's simplify things, and remove ManagerRunningAs and replace it by UnitFileScope everywhere, thus making the translation unnecessary. Introduce two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking if we are running in one or the user context.	2016-04-12 13:43:30 +02:00
Lennart Poettering	a3c4eb0710	core: rework generator dir logic, move the dirs into LookupPaths structure A long time ago – when generators where first introduced – the directories for them were randomly created via mkdtemp(). This was changed later so that they use fixed name directories now. Let's make use of this, and add the genrator dirs to the LookupPaths structure and into the unit file search path maintained in it. This has the benefit that the generator dirs are now normal part of the search path for all tools, and thus are shown in "systemctl list-unit-files" too.	2016-04-12 13:43:29 +02:00
Daniel Mack	50f48ad37a	cgroup: remove support for NetClass= directive Support for net_cls.class_id through the NetClass= configuration directive has been added in v227 in preparation for a per-unit packet filter mechanism. However, it turns out the kernel people have decided to deprecate the net_cls and net_prio controllers in v2. Tejun provides a comprehensive justification for this in his commit, which has landed during the merge window for kernel v4.5: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671 As we're aiming for full support for the v2 cgroup hierarchy, we can no longer support this feature. Userspace tool such as nftables are moving over to setting rules that are specific to the full cgroup path of a task, which obsoletes these controllers anyway. This commit removes support for tweaking details in the net_cls controller, but keeps the NetClass= directive around for legacy compatibility reasons.	2016-02-10 16:38:56 +01:00
Daniel Mack	b26fa1a2fb	tree-wide: remove Emacs lines from all files This should be handled fine now by .dir-locals.el, so need to carry that stuff in every file.	2016-02-10 13:41:57 +01:00
Thomas Hindoe Paaboel Andersen	71d35b6b55	tree-wide: sort includes in *.h This is a continuation of the previous include sort patch, which only sorted for .c files.	2015-11-18 23:09:02 +01:00
Lennart Poettering	0af20ea2ee	core: add new DefaultTasksMax= setting for system.conf This allows initializing the TasksMax= setting of all units by default to some fixed value, instead of leaving it at infinity as before.	2015-11-13 19:50:52 +01:00
Lennart Poettering	53f1841669	core: unify code that warns about jobs we fail to enqueue This allows us to shorten our code a bit.	2015-11-12 20:14:06 +01:00
Lennart Poettering	4bd29fe5ce	core: drop "override" flag when building transactions Now that we don't have RequiresOverridable= and RequisiteOverridable= dependencies anymore, we can get rid of tracking the "override" boolean for jobs in the job engine, as it serves no purpose anymore. While we are at it, fix some error messages we print when invoking functions that take the override parameter.	2015-11-12 19:54:07 +01:00
Lennart Poettering	f32b43bda4	core: remove support for RequiresOverridable= and RequisiteOverridable= As discussed at systemd.conf 2015 and on also raised on the ML: http://lists.freedesktop.org/archives/systemd-devel/2015-November/034880.html This removes the two XyzOverridable= unit dependencies, that were basically never used, and do not enhance user experience in any way. Most folks looking for the functionality this provides probably opt for the "ignore-dependencies" job mode, and that's probably a good idea. Hence, let's simplify systemd's dependency engine and remove these two dependency types (and their inverses). The unit file parser and the dbus property parser will now redirect the settings/properties to result in an equivalent non-overridable dependency. In the case of the unit file parser we generate a warning, to inform the user. The dbus properties for this unit type stay available on the unit objects, but they are now hidden from usual introspection and will always return the empty list when queried. This should provide enough compatibility for the few unit files that actually ever made use of this.	2015-11-12 19:27:24 +01:00
Tom Gundersen	7042fc14ff	Merge pull request #1837 from poettering/grabbag2 variety of fixes	2015-11-11 02:31:29 +01:00
Zbigniew Jędrzejewski-Szmek	36b4a7ba55	Remove snapshot unit type Snapshots were never useful or used for anything. Many systemd developers that I spoke to at systemd.conf2015, didn't even know they existed, so it is fairly safe to assume that this type can be deleted without harm. The fundamental problem with snapshots is that the state of the system is dynamic, devices come and go, users log in and out, timers fire... and restoring all units to some state from the past would "undo" those changes, which isn't really possible. Tested by creating a snapshot, running the new binary, and checking that the transition did not cause errors, and the snapshot is gone, and snapshots cannot be created anymore. New systemctl says: Unknown operation snapshot. Old systemctl says: Failed to create snapshot: Support for snapshots has been removed. IgnoreOnSnaphost settings are warned about and ignored: Support for option IgnoreOnSnapshot= has been removed and it is ignored http://lists.freedesktop.org/archives/systemd-devel/2015-November/034872.html	2015-11-10 19:33:06 -05:00
Lennart Poettering	ba64af90ec	core: change return value of the unit's enumerate() call to void We cannot handle enumeration failures in a sensible way, hence let's try hard to continue without making such failures fatal, and log about it with precise error messages.	2015-11-10 21:03:49 +01:00
Thomas Hindoe Paaboel Andersen	74165387ee	manager: remove unused function	2015-10-13 22:17:26 +02:00
Lennart Poettering	400f1a33cf	core: sort includes of manager.[ch] according to CODING_STYLE	2015-09-29 21:08:36 +02:00
Lennart Poettering	85fade1edb	Merge pull request #986 from karelzak/monitor mount: use libmount to monitor mountinfo & utab	2015-09-22 14:31:58 +02:00
Alban Crequy	287419c119	containers: systemd exits with non-zero code When a systemd service running in a container exits with a non-zero code, it can be useful to terminate the container immediately and get the exit code back to the host, when systemd-nspawn returns. This was not possible to do. This patch adds the following to make it possible: - Add a read-only "ExitCode" property on PID 1's "Manager" bus object. By default, it is 0 so the behaviour stays the same as previously. - Add a method "SetExitCode" on the same object. The method fails when called on baremetal: it is only allowed in containers or in user session. - Add support in systemctl to call "systemctl exit 42". It reuses the existing code for user session. - Add exit.target and systemd-exit.service to the system instance. - Change main() to actually call systemd-shutdown to exit() with the correct value. - Add verb 'exit' in systemd-shutdown with parameter --exit-code - Update systemctl manpage. I used the following to test it: \| $ sudo rkt --debug --insecure-skip-verify run \ \| --mds-register=false --local docker://busybox \ \| --exec=/bin/chroot -- /proc/1/root \ \| systemctl --force exit 42 \| ... \| Container rkt-895a0cba-5c66-4fa5-831c-e3f8ddc5810d failed with error code 42. \| $ echo $? \| 42 Fixes https://github.com/systemd/systemd/issues/1290	2015-09-21 17:32:45 +02:00
Daniel Mack	32ee7d3309	cgroup: add support for net_cls controllers Add a new config directive called NetClass= to CGroup enabled units. Allowed values are positive numbers for fix assignments and "auto" for picking a free value automatically, for which we need to keep track of dynamically assigned net class IDs of units. Introduce a hash table for this, and also record the last ID that was given out, so the allocator can start its search for the next 'hole' from there. This could eventually be optimized with something like an irb. The class IDs up to 65536 are considered reserved and won't be assigned automatically by systemd. This barrier can be made a config directive in the future. Values set in unit files are stored in the CGroupContext of the unit and considered read-only. The actually assigned number (which may have been chosen dynamically) is stored in the unit itself and is guaranteed to remain stable as long as the unit is active. In the CGroup controller, set the configured CGroup net class to net_cls.classid. Multiple unit may share the same net class ID, and those which do are linked together.	2015-09-16 00:21:55 +02:00
Karel Zak	d379d44255	mount: use libmount to monitor mountinfo & utab The current implementation directly monitor /proc/self/mountinfo and /run/mount/utab files. It's really not optimal because utab file is private libmount stuff without any official guaranteed semantic. The libmount since v2.26 provides API to monitor mount kernel & userspace changes and since v2.27 the monitor is usable for non-root users too. This patch replaces the current implementation with libmount based solution. Signed-off-by: Karel Zak <kzak@redhat.com>	2015-09-14 09:12:31 +02:00
Lennart Poettering	5269eb6b32	core: allocate sets of startup and failed units on-demand There's a good chance we never needs these sets, hence allocate them only when needed.	2015-09-11 18:31:49 +02:00
Lennart Poettering	03a7b521e3	core: add support for the "pids" cgroup controller This adds support for the new "pids" cgroup controller of 4.3 kernels. It allows accounting the number of tasks in a cgroup and enforcing limits on it. This adds two new setting TasksAccounting= and TasksMax= to each unit, as well as a gloabl option DefaultTasksAccounting=. This also updated "cgtop" to optionally make use of the new kernel-provided accounting. systemctl has been updated to show the number of tasks for each service if it is available. This patch also adds correct support for undoing memory limits for units using a MemoryLimit=infinity syntax. We do the same for TasksMax= now and hence keep things in sync here.	2015-09-10 18:41:06 +02:00
Lennart Poettering	efdb02375b	core: unified cgroup hierarchy support This patch set adds full support the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possibly to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified heirarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicey, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefuly makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.	2015-09-01 23:52:27 +02:00
Lennart Poettering	ae2a2c53dd	manager: don't write first-boot flag file all the time Instead, remember that we have already written it.	2015-09-01 17:20:56 +02:00
Daniel Mack	bbc2908635	core: dbus: track bus names per unit Currently, PID1 installs an unfiltered NameOwnerChanged signal match, and dispatches the signals itself. This does not scale, as right now, PID1 wakes up every time a bus client connects. To fix this, install individual matches once they are requested by unit_watch_bus_name(), and remove the watches again through their slot in unit_unwatch_bus_name(). If the bus is not available during unit_watch_bus_name(), just store name in the 'watch_bus' hashmap, and let bus_setup_api() do the installing later.	2015-08-06 10:14:41 +02:00
Lennart Poettering	b2c23da8ce	core: rename SystemdRunningAs to ManagerRunningAs It's primarily just a property of the Manager object after all, and we try to refer to PID 1 as "manager" instead of "systemd", hence let's to stick to this here too.	2015-05-11 22:51:49 +02:00
Lennart Poettering	f2341e0a87	core,network: major per-object logging rework This changes log_unit_info() (and friends) to take a real Unit* object insted of just a unit name as parameter. The call will now prefix all logged messages with the unit name, thus allowing the unit name to be dropped from the various passed romat strings, simplifying invocations drastically, and unifying log output across messages. Also, UNIT= vs. USER_UNIT= is now derived from the Manager object attached to the Unit object, instead of getpid(). This has the benefit of correcting the field for --test runs. Also contains a couple of other logging improvements: - Drops a couple of strerror() invocations in favour of using %m. - Not only .mount units now warn if a symlinks exist for the mount point already, .automount units do that too, now. - A few invocations of log_struct() that didn't actually pass any additional structured data have been replaced by simpler invocations of log_unit_info() and friends. - For structured data a new LOG_UNIT_MESSAGE() macro has been added, that works like LOG_MESSAGE() but prefixes the message with the unit name. Similar, there's now LOG_LINK_MESSAGE() and LOG_NETDEV_MESSAGE(). - For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(), LOG_NETDEV_INTERFACE() macros have been added that generate the necessary per object fields. The old log_unit_struct() call has been removed in favour of these new macros used in raw log_struct() invocations. In addition to removing one more function call this allows generated structured log messages that contain two object fields, as necessary for example for network interfaces that are joined into another network interface, and whose messages shall be indexed by both. - The LOG_ERRNO() macro has been removed, in favour of log_struct_errno(). The latter has the benefit of ensuring that %m in format strings is properly resolved to the specified error number. - A number of logging messages have been converted to use log_unit_info() instead of log_info() - The client code in sysv-generator no longer #includes core code from src/core/. - log_unit_full_errno() has been removed, log_unit_full() instead takes an errno now, too. - log_unit_info(), log_link_info(), log_netdev_info() and friends, now avoid double evaluation of their parameters	2015-05-11 22:24:45 +02:00
Lennart Poettering	8f88ecf623	core: for queued reload message there is no need to store the bus explicitly After all it can be derived from the message directly, and already is.	2015-04-29 19:02:08 +02:00
Lucas De Marchi	03455c2879	core: emit changes for NFailedUnits property By notifying the clients when this property is changed it's possible to allow "system health monitor" tools to get transitions like running<->degraded. This is an alternative to send changes on the SystemState property since the latter is more difficult to derive.	2015-02-26 09:38:50 -05:00
Thomas Hindoe Paaboel Andersen	2eec67acbb	remove unused includes This patch removes includes that are not used. The removals were found with include-what-you-use which checks if any of the symbols from a header is in use.	2015-02-23 23:53:42 +01:00
Lennart Poettering	2e5c94b9aa	core: when the user hits Ctrl-Alt-Del more than 7x per 2s, reboot immediately This should be useful for cases where clean rebooting doesn't work, and the user wants to hurry up the reboot.	2015-01-28 02:18:59 +01:00
Zbigniew Jędrzejewski-Szmek	e801700e9a	Implement masking and overriding of generators Sometimes it is necessary to stop a generator from running. Either because of a bug, or for testing, or some other reason. The only way to do that would be to rename or chmod the generator binary, which is inconvenient and does not survive upgrades. Allow masking and overriding generators similarly to units and other configuration files. For the systemd instance, masking would be more common, rather than overriding generators. For the user instances, it may also be useful for users to have generators in $XDG_CONFIG_HOME to augment or override system-wide generators. Directories are searched according to the usual scheme (/usr/lib, /usr/local/lib, /run, /etc), and files with the same name in higher priority directories override files with the same name in lower priority directories. Empty files and links to /dev/null mask a given name. https://bugs.freedesktop.org/show_bug.cgi?id=87230	2015-01-11 18:17:33 -05:00
Chris Leech	befb6d5494	mount: monitor for utab changes with inotify Parsing the mount table with libmount races against the mount command, which will handle the actual mounting before updating utab. This means the poll event on /proc/self/mountinfo can kick of a reparse in systemd before the utab information is available. This change adds in an additional event source using inotify to watch for changes to utab. It only watches for IN_MOVED_TO events, matching libmount behavior of always overwriting this file using rename(2). This does add a second pass through the mount table parsing when utab is updated.	2014-11-28 14:30:50 -05:00
Zbigniew Jędrzejewski-Szmek	06d8d842e9	manager: let manager_free() handle NULLs This makes the calling code a bit simpler.	2014-11-23 19:17:28 -05:00
Zbigniew Jędrzejewski-Szmek	ebc5788e88	manager: print warning on console before reboot It will be printed even if a prompt is blocking other messages.	2014-10-27 23:17:49 -04:00
Zbigniew Jędrzejewski-Szmek	127d5fd156	manager: convert ephemeral to enum In preparation for subsequent changes.	2014-10-27 23:02:54 -04:00
Zbigniew Jędrzejewski-Szmek	e46b13c8c7	manager: do not print anything while passwords are being queried https://bugs.freedesktop.org/show_bug.cgi?id=73942	2014-10-27 22:33:14 -04:00
Lennart Poettering	fa1b91632c	core: remove system start timeout logic again The system start timeout as previously implemented would get confused by long-running services that are included in the initial system startup transaction for example by being cron-job-like long-running services triggered immediately at boot. Such long-running jobs would be subject to the default 15min timeout, esily triggering it. Hence, remove this again. In a subsequent commit, introduce per-target job timeouts instead, that allow us to control these timeouts more finegrained.	2014-10-28 01:42:13 +01:00
Lennart Poettering	d81afec1c9	core: split up "starting" manager state into "initializing" and "starting" We'll stay in "initializing" until basic.target has reached, at which point we will enter "starting". This is preparation so that we can change the startip timeout to only apply to the first phase of startup, not the full procedure.	2014-08-22 18:10:31 +02:00
Lennart Poettering	2928b0a863	core: add support for a configurable system-wide start-up timeout When this system-wide start-up timeout is hit we execute one of the failure actions already implemented for services that fail. This should not only be useful on embedded devices, but also on laptops which have the power-button reachable when the lid is closed. This devices, when in a backpack might get powered on by accident due to the easily reachable power button. We want to make sure that the system turns itself off if it starts up due this after a while. When the system manages to fully start-up logind will suspend the machine by default if the lid is closed. However, in some cases we don't even get as far as logind, and the boot hangs much earlier, for example because we ask for a LUKS password that nobody ever enters. Yeah, this is a real-life problem on my Yoga 13, which has one of those easily accessible power buttons, even if the device is closed.	2014-08-22 18:10:31 +02:00
Stef Walter	283868e1dc	core: Verify systemd1 DBus method callers via polkit DBus methods that retrieve information can be called by anyone. DBus methods that modify state of units are verified via polkit action: org.freedesktop.systemd1.manage-units DBus methods that modify state of unit files are verified via polkit action: org.freedesktop.systemd1.manage-unit-files DBus methods that reload the entire daemon state are verified via polkit action: org.freedesktop.systemd1.reload-daemon DBus methods that modify job state are callable from the clients that started the job. root (ie: CAP_SYS_ADMIN) can continue to perform all calls, property access etc. There are several DBus methods that can only be called by root. Open up the dbus1 policy for the above methods. (Heavily modified by Lennart, making use of the new bus_verify_polkit_async() version that doesn't force us to always pass the original callback around. Also, interactive auhentication must be opt-in, not unconditional, hence I turned this off.)	2014-08-18 18:08:28 +02:00
Zbigniew Jędrzejewski-Szmek	0d8c31ff72	test-engine: fix access to unit load path Also add a bit of debugging output to help diagnose problems, add missing units, and simplify cppflags. Move test-engine to normal tests from manual tests, it should now work without destroying the system.	2014-07-20 19:48:16 -04:00
Lennart Poettering	e26807239b	firstboot: get rid of firstboot generator again, introduce ConditionFirstBoot= instead As Zbigniew pointed out a new ConditionFirstBoot= appears like the nicer way to hook in systemd-firstboot.service on first boots (those with /etc unpopulated), so let's do this, and get rid of the generator again.	2014-07-07 21:05:09 +02:00
Lennart Poettering	418b9be500	firstboot: add new component to query basic system settings on first boot, or when creating OS images offline A new tool "systemd-firstboot" can be used either interactively on boot, where it will query basic locale, timezone, hostname, root password information and set it. Or it can be used non-interactively from the command line when prepareing disk images for booting. When used non-inertactively the tool can either copy settings from the host, or take settings on the command line. $ systemd-firstboot --root=/path/to/my/new/root --copy-locale --copy-root-password --hostname=waldi The tool will be automatically invoked (interactively) now on first boot if /etc is found unpopulated. This also creates the infrastructure for generators to be notified via an environment variable whether they are running on the first boot, or not.	2014-07-07 15:25:55 +02:00

1 2 3 4 5

216 commits