Systemd

Commit Graph

Author	SHA1	Message	Date
Zbigniew Jędrzejewski-Szmek	dc409696cf	Introduce _cleanup_(unit_freep)	2018-03-11 16:33:58 +01:00
Zbigniew Jędrzejewski-Szmek	c70cac548a	Introduce _cleanup_(manager_freep)	2018-03-11 16:33:57 +01:00
Zbigniew Jędrzejewski-Szmek	8750ac0238	pid1: make use of high rt signals on hppa with newer kernels Back in `4dffec1459` we stopped using SIGRTMIN+26 and higher on hppa because they were not available. Then they became available in linux 3.18: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f25df2eff5b25f52c139d3ff31bc883eee9a0ab Instead of hard-coding the list based on architecture, let's use a runtime check like signal(7) says. (A note about implementation: RTSIG_IF_AVAILABLE is defined to take the full signal and not just an offset from SIGRTMIN so that it's still possible to grep for SIGRTMIN\+.) Add a simple "test" to print the signal values. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=84931.	2018-03-09 10:35:33 +01:00
Yu Watanabe	a1d32bac2a	Revert "core: don't setup init.scope in test mode (#8380)" (#8390) This reverts commit `a9e8ecf037`, as it breaks test-path. Fixes #8389.	2018-03-08 15:29:19 +09:00
Michal Sekletar	a9e8ecf037	core: don't setup init.scope in test mode (#8380) Reproducer: $ meson build && cd build $ ninja $ sudo useradd test $ sudo su test $ ./systemd --system --test ... Failed to create /user.slice/user-1000.slice/session-6.scope/init.scope control group: Permission denied Failed to allocate manager object: Permission denied Above error message is caused by the fact that user test didn't have its own session and we tried to set up init.scope already running as user test in the directory owned by a different user. Let's skip setting up init.scope altogether since we won't be launching processes anyway.	2018-03-07 16:41:41 +01:00
Lennart Poettering	e0a085811d	core: don't process dbus unit and job queue when there are already too many messages pending We maintain a queue of units and jobs that we are supposed to generate change/new notifications for because they were either just created or some of their properties have changed. Let's throttle processing of this queue a bit: as soon as > 1K of bus messages are queued for writing let's skip processing the queue, and then recheck on the next iteration again. Moreover, never process more than 100 units in one go, return to the event loop after that. Both limits together should put effective limits on both space and time usage of the function, delaying further operations until a later moment, when the queue is empty or the event loop is sufficiently idle again. This should keep the number of generated messages much lower than before on busy systems or where some client is hanging. Note that this also means a bad client can slow down message dispatching substantially for up to 90s if it likes to, for all clients. But that should be acceptable as we only allow trusted bus clients, anyway. Fixes: #8166	2018-02-27 19:54:29 +01:00
Lennart Poettering	30663b6c25	Merge pull request #8199 from keszybz/small-things Sundry small cleanups	2018-02-19 16:55:10 +01:00
Zbigniew Jędrzejewski-Szmek	9ecdba8cb7	Move config_parse_join_controllers to shared, add test config_parse_join_controllers would free the destination argument on failure, which is contrary to our normal style, where failed parsing has no effect. Moving it to shared also allows a test to be added.	2018-02-19 15:02:13 +01:00
Lennart Poettering	a94ab7acfd	Merge pull request #8175 from keszybz/gc-cleanup Garbage collection cleanup	2018-02-15 17:47:37 +01:00
Lennart Poettering	476a8618fc	Merge pull request #8150 from poettering/memory-accounting-by-default pid1: turn memory accounting on by default now	2018-02-15 17:22:36 +01:00
Zbigniew Jędrzejewski-Szmek	648461c07d	Merge pull request #8125 from poettering/cgroups-migrate Trivial merge conflict resolved locally.	2018-02-15 16:15:45 +01:00
Zbigniew Jędrzejewski-Szmek	2ab3050f6e	pid1: rename job_check_gc to job_may_gc The reasoning is the same as for unit_can_gc. v2: - rename can_gc to may_gc	2018-02-15 14:09:40 +01:00
Zbigniew Jędrzejewski-Szmek	2641f02e23	pid1: fix collection of cycles of units which reference one another A .socket will reference a .service unit, by registering a UnitRef with the .service unit. If this .service unit has the .socket unit listed in Wants or Sockets or such, a cycle will be created. We would not free this cycle properly, because we treated any unit with non-empty refs as uncollectable. To solve this issue, treat refs with UnitRef in u->refs_by_target similarly to the refs in u->dependencies, and check if the "other" unit is known to be needed. If it is not needed, do not treat the reference from it as preventing the unit we are looking at from being freed.	2018-02-15 13:32:53 +01:00
Zbigniew Jędrzejewski-Szmek	f2f725e5cc	pid1: rename unit_check_gc to unit_may_gc "check" is unclear: what is true, what is false? Let's rename to "can_gc" and invert the return value ("positive" values are easier to grok). v2: - rename from unit_can_gc to unit_may_gc	2018-02-15 13:04:12 +01:00
Zbigniew Jędrzejewski-Szmek	444d586333	meson: add -Dmemory-accounting-default=true\|false This makes it easy to set the default for distributions and users which want to default to off because they primarily use older kernels.	2018-02-15 12:02:41 +01:00
Zbigniew Jędrzejewski-Szmek	04a5236233	Merge pull request #8144 from poettering/journal-inotify-fixes various journal fixes	2018-02-14 13:52:17 +01:00
Alan Jenkins	8afabc5090	manager: avoid infinite loop for unexpected waitid() error (#8168) I think if we log the error as being _ignored_, we should also consider the event as handled and clear it. This was the behaviour prior to `575b300b` (PR #7968). I don't think we particularly wanted to change behaviour and keep retrying. Sometimes that's useful, other times you cause more problems by filling the logs. Plus a nearby typo fix.	2018-02-13 19:04:31 +01:00
Lennart Poettering	5f109056d5	core: delay bus name synchronization after reload/reexec into a later event loop iteration Previously, we'd synchronize bus names immediately when we succeeded connecting to the bus, potentially even before coldplugging the units. This was problematic, as synchronizing bus names meant invoking the per-unit name change handler function which might change the unit's state, which will result in inconsistency when done before we coldplug things. With this change we instead enqueue a job for the event loop to resync the names in a later loop iteration, i.e. at a point where we know coldplugging has finished.	2018-02-12 11:34:00 +01:00
Lennart Poettering	cedf508886	core: simplify manager_recheck_journal() a bit No need for an if check if we just pass along a bool anyway.	2018-02-12 11:34:00 +01:00
Lennart Poettering	217677abb0	core: tweak manager_journal_is_running() a bit more Let's also use the journal if it is currently reloading. In that state it should also be able to process our requests. Moreover, we might otherwise end up disconnecting/reconnecting from the journal without really any need to. Hence, relax the check accordingly.	2018-02-12 11:34:00 +01:00
Lennart Poettering	7d814a197a	manager: tweak manager_journal_is_running() a bit regarding test mode In test mode, let's not consider the journal to be up ever: we want all output to go to stderr.	2018-02-12 11:34:00 +01:00
Lennart Poettering	8559b3b75c	core: rework how we connect to the bus This removes the current bus_init() call, as it had multiple problems: it munged handling of the three bus connections we care about (private, "api" and system) into one, even though the conditions when which was ready are very different. It also added redundant logging, as the individual calls it called all logged on their own anyway. The three calls bus_init_api(), bus_init_private() and bus_init_system() are now made public. A new call manager_dbus_is_running() is added that works much like manager_journal_is_running() and is a lot more careful when checking whether dbus is around. Optionally it checks the unit's deserialized_state rather than state, in order to accommodate cases where we want to connect to the bus before deserializing the "subscribed" list, before coldplugging the units. manager_recheck_dbus() is added, that works a lot like manager_recheck_journal() and is invoked in unit_notify(), i.e. when units change state. All in all this should make handling a bit more alike to journal handling, and it also fixes one major bug: when running in user mode we'll now connect to the system bus early on, without conditionalizing this in any way.	2018-02-12 11:34:00 +01:00
Lennart Poettering	004c7f169e	core: fold manager_set_exec_params() into unit_set_exec_params() Let's simplify things a bit: we so far called both functions every single time, let's just merge one into the other, so that we have fewer functions to call.	2018-02-12 11:34:00 +01:00
Lennart Poettering	548f69375e	tree-wide: use path_hash_ops instead of string_hash_ops whenever we key by a path Let's make use of our new hash_ops!	2018-02-12 11:07:55 +01:00
Lennart Poettering	e0c46a7364	pid1: turn memory accounting on by default now After discussions with @htejun it appears it's OK now to enable memory accounting by default for all units without affecting system performance too badly. facebook has made good experiences with deploying memory accounting across their infrastructure. This hence turns MemoryAccounting= from opt-in to opt-out, similar to how TasksAccounting= is already handled. The other accounting options remain off, their performance impact is too big still.	2018-02-09 20:06:33 +01:00
Yu Watanabe	e8a565cb66	core: make ExecRuntime be manager managed object Before this, each ExecRuntime object is owned by a unit. However, it may be shared with other units which enable JoinsNamespaceOf=. Thus, by the serialization/deserialization process, its sharing information, more specifically, reference counter is lost, and causes issue #7790. This makes ExecRuntime objects be managed by manager, and changes the serialization/deserialization process. Fixes #7790.	2018-02-06 16:00:34 +09:00
Alan Jenkins	cc2b9e6b20	rationalize interface for opening/closing logging log_open_console() did not switch from stderr to /dev/console, when "always_reopen_console" was set. It was necessary to call log_close_console() first. By contrast, log_open() did switch between e.g. journald and kmsg according to the value of "prohibit_ipc". Let's fix log_open() to respect the values of all the log options, and we can make log_close_*() private. Also log_close_console() is changed. There was some precaution, avoiding closing the console fd if we are not PID 1. I think commit `48a601fe` made a little mistake in leaving this in, and it only served to confuse readers :). Also I changed systemd-shutdown. Now we have log_set_prohibit_ipc(), let's use it to clarify that systemd-shutdown is not expected to try and log via journald (which it is about to kill). We avoided ever asking it to, but it's more convenient for the reader if they don't have to think about that. In that sense, it's similar to using assert() to validate a function's arguments.	2018-01-27 18:01:51 +00:00
Alan Jenkins	ba30753899	pid1: when we can't log to journal, remember our fallback log target If we have to force the logging to close the journal fd, then we can open any fallback log target. E.g. kmsg, if the target was the default JOURNAL_OR_KMSG. This is the behaviour I would expect from the documentation. I couldn't find any justification in the code, for why we would want to start dropping log messages instead of sending them to the fallback target. This means we will match the behaviour of processes which we fork and which set `open_when_needed`, and with generators - which use log_set_prohibit_ipc(true) - which we fork+exec during a reload. IMO this illustrates that the log_open/log_close interface is too clunky. So with the behaviour settled, I will refactor the interface in the next commit :).	2018-01-26 22:47:16 +00:00
Zbigniew Jędrzejewski-Szmek	dc3c9f5e36	core: initialize buffer	2018-01-26 00:59:23 +09:00
Yu Watanabe	dd1db3c288	core: manager logs firmware and loader time when startup finished	2018-01-26 00:59:20 +09:00
Zbigniew Jędrzejewski-Szmek	5eb83fa645	Merge pull request #7991 from poettering/n-on-console a comprehensive fix for the n_on_console miscounting issue	2018-01-25 13:48:08 +03:00
Lennart Poettering	adefcf2821	core: rework how we count the n_on_console counter Let's add a per-unit boolean that tells us whether our unit is currently counted or not. This way it's unlikely we get out of sync again and things are generally more robust. This also allows us to remove the counting logic specific to service units (which was in fact mostly a copy from the generic implementation), in favour of fully generic code. Replaces: #7824	2018-01-24 20:14:51 +01:00
Lennart Poettering	46fb617bf9	manager: minor manager_get_show_status() simplification Since the whole function ultimately is just a fancy getter for the show_status field, let's actually return it as last step literally without an extra needless "if".	2018-01-24 19:52:29 +01:00
Lennart Poettering	5a69973ff2	manager: add some explanatory comments to manager_dispatch_idle_pipe_fd()	2018-01-24 19:52:14 +01:00
Lennart Poettering	d075092f14	pid1: make use of new "prohibit_ipc" logging flag in PID 1 Let's set it initially, and then toggle it only when we know it's safe.	2018-01-24 18:22:56 +01:00
Lennart Poettering	62a769136d	core: rework how we track which PIDs to watch for a unit Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Units interested in SIGCHLD events for them. This scheme allowed a specific PID to be watched by exactly 0, 1 or 2 units. With this rework this is replaced by a single hashmap which is primarily keyed by the PID and points to a Unit interested in it. However, it is optionally also keyed by the negated PID, in which case it points to a NULL terminated array of additional Unit objects also interested. This scheme means arbitrary numbers of Units may now watch the same PID. Runtime and memory behaviour should not be impacted by this change, as for the common case (i.e. each PID only watched by a single unit) behaviour stays the same, but for the uncommon case (a PID watched by more than one unit) we only pay with a single additional memory allocation for the array. Why this all? Primarily, because allowing exactly two units to watch a specific PID is not sufficient for some niche cases, as processes can belong to more than one unit these days: 1. sd_notify() with MAINPID= can be used to attach a process from a different cgroup to multiple units. 2. Similarly, the PIDFile= setting in unit files can be used for similar setups. 3. By creating a scope unit a main process of a service may join a different unit, too. 4. On cgroupsv1 we frequently end up watching all processes remaining in a scope, and if a process opens lots of scopes one after the other it might thus end up being watched by many of them. This patch hence removes the 2-unit-per-PID limit. It also makes a couple of other changes, some of them quite relevant: - manager_get_unit_by_pid() (and the bus call wrapping it) when there's ambiguity will prefer returning the Unit the process belongs to based on cgroup membership, and only check the watch-pids hashmap if that fails. This change in logic is probably more in line with what people expect and makes things more stable as each process can belong to exactly one cgroup only. - Every SIGCHLD event is now dispatched to all units interested in its PID. Previously, there was some magic conditionalization: the SIGCHLD would only be dispatched to the unit if it was interested in a single PID only, or the PID belonged to the control or main PID or we didn't dispatch a single SIGCHLD to the unit in the current event loop iteration yet. These rules were quite arbitrary and also redundant as the per-unit handlers would filter the PIDs anyway a second time. With this change we'll hence relax the rules: all we do now is dispatch every SIGCHLD event exactly once to each unit interested in it, and it's up to the unit to then use or ignore this. We use a generation counter in the unit to ensure that we only invoke the unit handler once for each event, protecting us from confusion if a unit is both associated with a specific PID through cgroup membership and through the "watch_pids" logic. It also protects us from being confused if the "watch_pids" hashmap is altered while we are dispatching to it (which is a very likely case). - sd_notify() message dispatching has been reworked to be very similar to SIGCHLD handling now. A generation counter is used for dispatching as well. This also adds a new test that validates that "watch_pid" registration and unregistration works correctly.	2018-01-23 21:29:31 +01:00
Lennart Poettering	575b300b79	pid1: rework how we dispatch SIGCHLD and other signals This fundamentally makes one change: we never process more than one signal or more than one waitid() event per event loop. We'll never tight loop around waitid() or around read() on our signalfd instead, but always return to the main event loop after processing one event. By doing this we put the event prioritization handling into full power again, as we'll always check for higher priority events before looking at the next signal or waitid() again. This introduces a new "defer" event source "sigchld_event". It's enabled as soon as we see SIGCHLD, and disabled as soon as waitid() reported no further children pending. It's running at a relatively high priority, one step higher than signal handling itself, but lower than /proc/self/mountinfo event handling, so that the latter always takes precedence. Since we want to process sd_notify() events at an even higher priority than SIGCHLD (as before) it is moved one priority step up, too. Fixes: #7932 Possibly fixes: #7966	2018-01-23 18:41:40 +01:00
Lennart Poettering	67ae4e8d59	core: move user lookup event priority to -11 This is internal stuff, us talking to ourselves and relatively independent of everything else, let's put this at highest priority hence.	2018-01-23 18:15:16 +01:00
Lennart Poettering	4259d20215	manager: add MANAGER_IS_RUNNING() for checking whether the manager is running This macro is useful as the check is not obvious, and we better abstract this away.	2018-01-23 16:43:56 +01:00
Lennart Poettering	4adf314b77	manager: split out send_ready and basic.target checking into functions of their own Let's shorten manager_check_finished() a bit by splitting out checking of basic.target and the two things we do when we reach it. This should not change behaviour, except for one thing: we now check basic.target's actual state for figuring out whether it is up, instead of generically checking whether it has any job queued. This is arguably more correct, and is what other code does too for similar purposes, for example manager_state()	2018-01-23 16:39:12 +01:00
Jan Klötzke	2a12e32efa	pid1: add option to disable service watchdogs Add a "systemd.service_watchdogs=" option to the command line which disables all service runtime watchdogs and emergency actions.	2018-01-22 18:10:03 +01:00
Zbigniew Jędrzejewski-Szmek	d8eb10d61a	core: delay logging the taint string until after basic.target is reached (#7935) This happens to be almost the same moment as when we send READY=1 in the user instance, but the logic is slightly different, since we log taint when basic.target is reached in the system manager, but we send the notification only in the user manager. So add a separate flag for this and propagate it across reloads. Fixes #7683.	2018-01-21 21:17:54 +09:00
Lennart Poettering	db256aab13	core: be stricter when handling PID files and MAINPID sd_notify() messages Let's be more restrictive when validating PID files and MAINPID= messages: don't accept PIDs that make no sense, and if the configuration source is not trusted, don't accept out-of-cgroup PIDs. A configuration source is considered trusted when the PID file is owned by root, or the message was received from root. This should lock things down a bit, in case service authors write out PID files from unprivileged code or use NotifyAccess=all with unprivileged code. Note that doing so was always problematic, just now it's a bit less problematic. When we open the PID file we'll now use the CHASE_SAFE chase_symlinks() logic, to ensure that we won't follow an unprivileged-owned symlink to a privileged-owned file thinking this was a valid privileged PID file, even though it really isn't. Fixes: #6632	2018-01-11 15:12:16 +01:00
Lennart Poettering	15e23e8cdf	manager: make use of pid_is_valid() where appropriate	2018-01-11 15:12:16 +01:00
Lennart Poettering	007e4b5490	manager: make use of NEWLINE macro where appropriate	2018-01-11 15:12:16 +01:00
Lennart Poettering	da5fb86100	manager: swap order in which we ellipsize/escape sd_notify() messages for debugging If we have to choose between truncated escape sequences and strings exploded to 4 times the desired length by fully escaping, prefer the latter. It's for debug only, hence doesn't really matter much.	2018-01-11 15:12:16 +01:00
Lennart Poettering	47cf8ff206	manager: rework manager_clean_environment() Let's rename it manager_sanitize_environment() which is a more precise name. Moreover, sort the environment implicitly inside it, as all our callers do that anyway afterwards and we can save some code this way. Also, update the list of env vars to drop, i.e. the env vars we manage ourselves and don't want user code to interfere with. Also sort this list to make it easier to update later on.	2018-01-10 18:30:06 +01:00
Lennart Poettering	665dfe9318	io-util: make flush_fd() return how many bytes were flushed This is useful so that callers know whether anything at all and how much was flushed. This patches through users of this function to ensure that the return values > 0 which may be returned now are not propagated in public APIs. Also, users that ignore the return value are changed to do so explicitly now.	2018-01-05 13:55:08 +01:00
Lennart Poettering	f1d34068ef	tree-wide: add DEBUG_LOGGING macro that checks whether debug logging is on (#7645) This makes things a bit easier to read I think, and also makes sure we always use the _unlikely_ wrapper around it, which so far we used sometimes and other times we didn't. Let's clean that up.	2017-12-15 11:09:00 +01:00
Lennart Poettering	e3140015a7	Merge pull request #7640 from keszybz/tainting-updates Tainting updates	2017-12-14 22:57:17 +01:00
Zbigniew Jędrzejewski-Szmek	198ce93248	core: drop taints for nobody user/group names We have a check and warning at compile time. The user cannot do anything about this at runtime, and all other taints are about checks that happen at runtime and are specific to that system (and at least potentially correctable). (The logic in the compilation-time check was updated to treat "nogroup" as OK, but not the runtime check. But I think it's better to remove the runtime check for this altogether, so this becomes moot.)	2017-12-14 22:14:38 +01:00
Lennart Poettering	fbd0b64f44	tree-wide: make use of new STRLEN() macro everywhere (#7639) Let's employ coccinelle to do this for us. Follow-up for #7625.	2017-12-14 19:02:29 +01:00
Lennart Poettering	0d53667334	tree-wide: use __fsetlocking() instead of fxyz_unlocked() Let's replace usage of fputc_unlocked() and friends by __fsetlocking(f, FSETLOCKING_BYCALLER). This turns off locking for the entire FILE, instead of doing individual per-call decision whether to use normal calls or _unlocked() calls. This has various benefits: 1. It's easier to read and easier not to forget 2. It's more comprehensive, as fprintf() and friends are covered too (as these functions have no _unlocked() counterpart) 3. Philosophically, it's a bit more correct, because it's more a property of the file handle really whether we ever pass it on to another thread, not of the operations we then apply to it. This patch reworks all pieces of code that so far used fxyz_unlocked() calls to use __fsetlocking() instead. It also reworks all places that use open_memstream(), i.e. use stdio FILE for string manipulations. Note that this is in some way a revert of `4b61c87511`.	2017-12-14 10:42:25 +01:00
Alan Jenkins	0fd402b012	core: fix undefined behaviour due to uninitialized string buffer (#7597) Failure of systemd to respond on the bus interface was bisected to `af6b0ecc` "core: make "taint" string logic a bit more generic and output it at boot". Failure was presumably caused by trying to append strings to an uninitialized buffer, leading to writing outside the unterminated buffer and hence undefined behaviour.	2017-12-10 19:58:01 +09:00
Zbigniew Jędrzejewski-Szmek	ba60adc623	Merge pull request #7572 from poettering/taint-manager "taint" logic improvements and other minor fixes	2017-12-07 21:06:28 +01:00
Lennart Poettering	90d7464d83	manager: taint the manager if the overflowuid/overflowgid aren't set to 65534	2017-12-07 12:34:46 +01:00
Lennart Poettering	af6b0ecc4c	core: make "taint" string logic a bit more generic and output it at boot The tainting logic existed for a long time, but was hidden inside the bus interfaces. Let's give it a small bit more coverage, by logging its value early at boot during initialization.	2017-12-07 11:27:07 +01:00
Lennart Poettering	e27fe688f2	manager: don't check /usr state of initrd to determine "taint-usr" taint	2017-12-07 11:09:09 +01:00
Lennart Poettering	5eb397cfad	manager: don't bother with creating /run/systemd/units/ in test mode This makes sure running "systemd --test" works again on systems running older systemd versions where the dir doesn't exist yet.	2017-12-07 11:07:55 +01:00
Lennart Poettering	279d81dd46	manager: split out code that sets up run_queue event source into function of its own Let's shorten manager_new() a bit.	2017-12-07 11:02:47 +01:00
Lennart Poettering	45639f1be5	core: never remove "transient" and "control" directories from unit search path This changes the unit search path logic to never drop the transient and control directories from the unit search path. This is necessary as we add new entries to both during runtime, due to the "systemctl set-property" and transient unit logic. Previously, the "transient" directory was created during early boot to deal with this, but the "control" directories were not covered like that. Creating the control directories early at boot is not possible however, as /etc might be read-only then, and we do define a persistent control directory. Hence, let's create these dirs on-demand when we need them, and make sure the search path clean-up logic never drops them from the search path even if they are initially missing. (Also, always create these paths properly labelled)	2017-11-29 12:34:12 +01:00
Lennart Poettering	45a7b16bae	core: don't reference rescue/emergency targets in --user mode They are only defined for system mode, hence let's not check for them in --user mode. Follow-up for #7433	2017-11-29 12:34:12 +01:00
Yu Watanabe	706424c2e2	core/manager: check the existence of the special units (#7433) In the user mode, not all special units exist. So, we need to check whether the units exist or not before operating on the units. Such a check was mistakenly dropped by `e68537f0ba`. Fixes #7426.	2017-11-23 13:25:56 +01:00
Zbigniew Jędrzejewski-Szmek	bfbcf21d75	Merge pull request #7406 from poettering/timestamp-rework timestamping rework	2017-11-22 11:55:04 +01:00
Lennart Poettering	e68537f0ba	core: make use of unit_active_or_pending() where we can Let's make use of unit_active_or_pending() where we can. Note that this change changes behaviour in one specific case: when shutdown.target is active we'll now also return that the system is in "stopping" state, not only when we try to get into it. That makes sense as shutdown.target is ordered before the actual shutdown units such as "systemd-poweroff.service", and if the state is queried between reaching those we should also report "stopping".	2017-11-21 11:01:34 +01:00
Lennart Poettering	49d5666cc5	manager: introduce MANAGER_IS_FINISHED() macro Let's make our finished checks a bit more readable. Checking the timestamp is not entirely obvious, hence let's abstract that a bit by adding a macro that shows what we are doing here, not how we are doing it. This is particularly useful if we want to change the definition of "finished" later on, in particular, when we try to fix #7023.	2017-11-21 11:01:34 +01:00
Lennart Poettering	713f6f901d	manager: add manager_get_dump_string() It's like manager_dump(), but returns a string. This allows us to reduce some duplicate code. Also, while we are at it, turn off stdio locking while we write to the memory FILE *f.	2017-11-21 11:01:34 +01:00
Lennart Poettering	ad75b9e765	core: add manager_dump() call, and make it output timestamp data It's a wrapper around manager_dump_units() and manager_dump_jobs(), and outputs some additional timestamp data. Also, port two users of this over.	2017-11-21 10:22:28 +01:00
Lennart Poettering	9f9f034271	manager: rework the timestamps logic, so that they are an enum-index array This makes things quite a bit more systematic I think, as we can systematically operate on all timestamps, for example for the purpose of serialization/deserialization. This rework doesn't necessarily make things shorter in the individual lines, but it does reduce the line count a bit. (This is useful particularly when we want to add additional timestamps, for example to solve #7023)	2017-11-21 10:22:28 +01:00
Shawn Landden	4831981d89	tree-wide: adjust fall through comments so that gcc is happy Distcc removes comments, making the comment silencing not work. I know there was a decision against a macro in commit `ec251fe7d5`	2017-11-20 13:06:25 -08:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	fd1306121d	core: never apply first boot presets in the initrd Presets are useful to initialize uninitialized /etc, but that doesn't apply to the initrd. Also, let's rename etc_empty → first_boot. After all, the variable doesn't actually reflect whether /etc is really empty, it just reflects whether /etc/machine-id existed originally or not. Moreover, we later on directly initialize manager_set_first_boot() from it, hence let's just name it the same way all through the codepath, to make this all less confusing. See: #7100	2017-11-17 11:28:17 +01:00
Lennart Poettering	d3070fbdf6	core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one existing unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and instead just use an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441	2017-11-16 12:40:17 +01:00
Zbigniew Jędrzejewski-Szmek	17f01ace62	core/manager: just return an error if we fail halfway We would continue, but still return an error at the end. This isn't useful because we'd still error-out in main(). Also, add a missing error message when we fail to mkdir.	2017-11-15 22:58:24 +01:00
Lennart Poettering	eef85c4a3f	core: track why unit dependencies came to be This replaces the dependencies Set* objects by Hashmap* objects, where the key is the depending Unit, and the value is a bitmask encoding why the specific dependency was created. The bitmask contains a number of different, defined bits, that indicate why dependencies exist, for example whether they are created due to explicitly configured deps in files, by udev rules or implicitly. Note that memory usage is not increased by this change, even though we store more information, as we manage to encode the bit mask inside the value pointer each Hashmap entry contains. Why this all? When we know how a dependency came to be, we can update dependencies correctly when a configuration source changes but others are left unaltered. Specifically: 1. We can fix UDEV_WANTS dependency generation: so far we kept adding dependencies configured that way, but if a device lost such a dependency we couldn't remove them again as there was no scheme for removing of dependencies in place. 2. We can implement "pin-pointed" reload of unit files. If we know what dependencies were created as result of configuration in a unit file, then we know what to flush out when we want to reload it. 3. It's useful for debugging: "systemd-analyze dump" now shows this information, helping substantially with understanding how systemd's dependency tree came to be the way it came to be.	2017-11-10 19:45:29 +01:00
Michal Sekletar	41dfa61d35	manager: fix connecting to bus when dbus is actually around (#7205) manager_connect_bus() is called before manager_coldplug(). As a last thing in service_coldplug() we set service state to s->deserialized_state, and thus before we do that all services are inactive and try_connect always evaluates to false. To fix that we must look at deserialized state instead of current unit state. Fixes #7146	2017-11-01 10:25:48 +01:00
Zbigniew Jędrzejewski-Szmek	0c2826c60c	core: in --user mode, report READY=1 as soon as basic.target is reached (#7102) When a user logs in, systemd-pam will wait for the user manager instance to report readiness. We don't need to wait for all the jobs to finish, it is enough if the basic startup is done and the user manager is responsive. systemd --user will now send out a READY=1 notification when either of two conditions becomes true: - basic.target/start job is gone, - the initial transaction is done. Also fixes #2863.	2017-10-24 14:48:54 +02:00
Lennart Poettering	4aa1d31c89	Merge pull request #6974 from keszybz/clean-up-defines Clean up define definitions	2017-10-04 19:25:30 +02:00
Yu Watanabe	4c70109600	tree-wide: use IN_SET macro (#6977)	2017-10-04 16:01:32 +02:00
Zbigniew Jędrzejewski-Szmek	349cc4a507	build-sys: use #if Y instead of #ifdef Y everywhere The advantage is that if the name is misspelt, cpp will warn us. $ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/" $ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;' $ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g' $ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g' + manual changes to meson.build squash! build-sys: use #if Y instead of #ifdef Y everywhere v2: - fix incorrect setting of HAVE_LIBIDN2	2017-10-04 12:09:29 +02:00
Lennart Poettering	c621849539	core: fix special directories for user services The system paths were listed where the user paths should have been listed. Correct that.	2017-10-02 17:41:44 +02:00
Lennart Poettering	72fd17682d	core: usually our enum's _INVALID and _MAX special values are named after the full type In most cases we followed the rule that the special _INVALID and _MAX values we use in our enums use the full type name as prefix (in contrast to regular values that we often make shorter), do so for ExecDirectoryType as well. No functional changes, just a little bit of renaming to make this code more like the rest.	2017-10-02 17:41:43 +02:00
Andreas Rammhold	ec2ce0c5d7	tree-wide: use `!IN_SET(..)` for `a != b && a != c && …` The included cocci was used to generate the changes. Thanks to @flo-wer for pointing this case out.	2017-10-02 13:09:56 +02:00
Andreas Rammhold	3742095b27	tree-wide: use IN_SET where possible In addition to the changes from #6933 this handles cases that could be matched with the included cocci file.	2017-10-02 13:09:54 +02:00
Lennart Poettering	09e2465407	cgroup: after determining that a cgroup is empty, asynchronously dispatch this This makes sure that if we learn via inotify or another event source that a cgroup is empty, and we checked that this is indeed the case (as we might get spurious notifications through inotify, as the inotify logic through the "cgroups.event" is pretty unspecific and might be triggered for a variety of reasons), then we'll enqueue a defer event for it, at a priority lower than SIGCHLD handling, so that we know for sure that if there's waitid() data for a process we used it before considering the cgroup empty notification. Fixes: #6608	2017-09-27 18:26:18 +02:00
Lennart Poettering	91a6073ef7	core: rename cgroup_queue → cgroup_realize_queue We are about to add a second cgroup-related queue, called "cgroup_empty_queue", hence let's rename "cgroup_queue" to "cgroup_realize_queue" (as that is its purpose) to minimize confusion about the two queues. Just a rename, no functional changes.	2017-09-27 17:59:25 +02:00
Lennart Poettering	f1c50becda	core: make sure to log invocation ID of units also when doing structured logging	2017-09-22 15:24:55 +02:00
Lennart Poettering	bd389aa734	manager: initialize timeouts when allocating a naked Manager object This way we can safely run manager objects from tests and good timeouts apply. Without this all timeouts are set to 0, which means they fire instantly, when run from tests which do not explicitly configure them (the way main.c does).	2017-09-22 15:24:54 +02:00
Zbigniew Jędrzejewski-Szmek	e0a3da1fd2	Make test_run into a flags field and disable generators again Now generators are only run in systemd --test mode, where this makes most sense (how are you going to test what would happen otherwise?). Fixes #6842. v2: - rename test_run to test_run_flags	2017-09-19 20:14:05 +02:00
Zbigniew Jędrzejewski-Szmek	a1f31f4715	core/manager: when running in test mode, use a temp dir for generated stuff When running through systemd-analyze verify or with --test, we would not run generators (environment or unit). But at the end, we would nuke the generator dirs anyway. Simplify things by actually running generators of both types, but redirecting their output to a temporary directory. This has the advantage that we test more code, and the verification is more complete. Since now we are not touching the real generator directories, we also don't delete them, which fixes #5609.	2017-09-14 19:41:24 +02:00
Zbigniew Jędrzejewski-Szmek	81fe6cdee2	pid1: improve the check guarding unit_file_preset_all() When running in systemd-analyze verify, first_boot was initialized to -1 and never changed, so we'd try to run unit_file_preset_all(). Change the check to > 0 which is more correct. Also, add a separate test for !test_run, since we wouldn't want to run presets even if we were in first boot (or /etc was empty for whatever other reason).	2017-09-14 19:07:44 +02:00
Zbigniew Jędrzejewski-Szmek	c5aaaebced	Merge pull request #6780 from poettering/agent-message Three minor fixes.	2017-09-09 22:32:37 +02:00
Lennart Poettering	d5f1532657	core: downgrade log message about inability to propagate cgroup release message If dbus is already down during shutdown, we can't propagate the cgroup release message anymore, but that's expected and nothing to warn about. Hence let's downgrade the message from LOG_WARN to LOG_DEBUG. Fixes: #6777	2017-09-08 17:24:57 +02:00
Michal Sekletar	5463fa0a88	manager: when reexecuting try to connect to bus only when dbus.service is around (#6773) Trying to connect otherwise is pointless, because if socket isn't around we won't connect. However, when dbus.socket is present we attempt to connect. That attempt can't succeed because we are then supposed to activate dbus.service as a response to connection from us. This results in deadlock. Fixes #6303	2017-09-08 15:41:44 +02:00
Alan Jenkins	d60cb656fc	manager: fix job mode when signalled to shutdown etc The irreversible job mode is required to ensure that shutdown is not interrupted by the activation of a unit with a conflict. We already used the correct job mode for `ctrl-alt-del.target`. But not for `exit.target` (SIGINT of user manager). The SIGRT shutdown signals also needed fixing. Also change SIGRTMIN+0 to isolate default.target, instead of starting it. The previous behaviour was documented. However there was no reason given for it, nor can we provide one. The problem that isolate is too aggressive anywhere outside of emergency.target (#2607) is orthogonal. This feature is "accessible by different means and only really a safety net"; it is confusing for it to differ from `systemctl default` without explanation. `AllowIsolate=yes` is retained on poweroff.target etc. for backwards compatibility. `sigpwr.target` is also an obvious candidate for linking to a shutdown target. Unfortunately it is also a possible hook for implementing some logic like system V init did, reading `/etc/powerstatus`. If we switched to starting `sigpwr.target` with REPLACE_IRREVERSIBLY, attempts to run `systemctl shutdown` from it would fail, if they had not thought to set `DefaultDependencies=no`. We had provided no examples for `sigpwr`, and the whole idea is cruft to keep legacy people happy. For the moment, I leave `sigpwr` alone, with no risk of disrupting anyone's previously-working, half-working, or untested setup. Fixes #6484. See also #6471	2017-08-31 16:17:42 +01:00
Alan Jenkins	c75fbadac6	manager: remove fallback for user/exit.target The comment here was misleading: the job can fail to enqueue for reasons other than the target not existing. The fallback caused an error to be logged, and dates back to when the "user" directory was named "session". units/session/exit.target was added later the same year. This is consistent with the documentation (man systemd), and the handling of similar signals. It's also consistent with `systemctl exit`, which is what most people would expect.	2017-08-31 16:17:41 +01:00
Lennart Poettering	19bbdd985e	core: manager_set_exec_params() cannot fail, hence make it void Let's simplify things a bit.	2017-08-10 15:02:50 +02:00
Lennart Poettering	8679efde21	execute: add one more ExecFlags flag, for controlling unconditional directory chowning Let's decouple the Manager object from the execution logic a bit more here too, and simply pass along the fact whether we should unconditionally chown the runtime/... directories via the ExecFlags field too.	2017-08-10 14:44:58 +02:00
Lennart Poettering	af635cf377	execute: let's decouple execute.c a bit from the unit logic Let's try to decouple the execution engine a bit from the Unit/Manager concept, and hence pass one more flag as part of the ExecParameters flags field.	2017-08-10 14:44:58 +02:00
Jouke Witteveen	15d167f8a3	core: propagate reload from RELOADING=1 notification (#6550)	2017-08-07 11:27:24 +02:00
Luca Bruno	28dd66ecfc	core: evaluate presets after generators have run (#6526) This commit moves the first-boot system preset-settings evaluation out of main and into the manager startup logic itself. Notably, it reverses the order between generators and presets evaluation, so that any changes performed by first-boot generators are taken into the account by presets logic. After this change, units created by a generator can be enabled as part of a preset.	2017-08-06 09:24:24 -04:00
Zbigniew Jędrzejewski-Szmek	0742986650	core: properly handle deserialization of unknown unit types (#6476) We just abort startup, without printing any error. Make sure we always print something, and when we cannot deserialize some unit, just ignore it and continue. Fixup for `4bc5d27b94`. Without this, we would hang in daemon-reexec after upgrade.	2017-07-31 08:05:35 +02:00
Lennart Poettering	4b61c87511	tree-wide: fput[cs]() → fput[cs]_unlocked() wherever that makes sense (#6396) As a follow-up for `db3f45e2d2` let's do the same for all other cases where we create a FILE* with local scope and know that no other threads hence can have access to it. For most cases this shouldn't change much really, but this should speed dbus introspection and calendar time formatting up a bit.	2017-07-21 10:35:45 +02:00
Yu Watanabe	35aba85a73	core/manager: fix memory leak (#6400) This fixes the memory leak introduced by `3536f49e8f`, which forgot to free the prefixes in the manager. Fixes #6398.	2017-07-18 17:30:52 +03:00
Yu Watanabe	3536f49e8f	core: add {State,Cache,Log,Configuration}Directory= (#6384) This introduces {State,Cache,Log,Configuration}Directory= those are similar to RuntimeDirectory=. They create the directories under /var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode specified in {State,Cache,Log,Configuration}DirectoryMode=. This also fixes #6391.	2017-07-18 14:34:52 +02:00
Zbigniew Jędrzejewski-Szmek	d233c99ac8	manager: just warn about an invalid environment entry Apart from bugs (as in #6152), this can happen if we ever make our requirements for environment entries more stringent. As with the rest of deserialization, we should just warn and continue.	2017-06-23 20:46:33 -04:00
Zbigniew Jędrzejewski-Szmek	62c460c6e2	manager: raise level of notices about failed deserialization This is something that shouldn't happen. If it does, we want to know about it.	2017-06-23 20:46:33 -04:00
Lennart Poettering	00c83b4300	core: return a friendlier error for a dbus path referring to a non-existent unit See: #6059	2017-06-22 20:54:54 -04:00
Lennart Poettering	c22800e40e	cgroup: rename cg_unified() → cg_unified_controller() cg_unified() is a bit generic a name, let's make clear that it checks whether a specified controller is in unified mode.	2017-02-24 18:00:04 +01:00
Lennart Poettering	b4cccbc13a	cgroup: change cg_unified() to possibly return errors again We use our cgroup APIs in various contexts, including from our libraries sd-login, sd-bus. As we don't control those environments we can't rely that the unified cgroup setup logic succeeds, and hence really shouldn't assert on it. This more or less reverts `415fc41cea`.	2017-02-24 17:52:58 +01:00
Lennart Poettering	ecc0eab247	Merge pull request #4670 from htejun/systemd-controller-on-unified-v2 Systemd controller on unified v2	2017-02-23 16:23:02 +01:00
Lennart Poettering	a4dde27d73	Merge pull request #5131 from keszybz/environment-generators Environment generators	2017-02-21 11:11:44 +01:00
Zbigniew Jędrzejewski-Szmek	64691d2024	manager: run environment generators Environment file generators are a lot like unit file generators, but not exactly: 1. environment file generators are run for each manager instance, and their output is (or at least can be) individualized. The generators themselves are system-wide, the same for all users. 2. environment file generators are run sequentially, in priority order. Thus, the lifetime of those files is tied to lifecycle of the manager instance. Because generators are run sequentially, later generators can use or modify the output of earlier generators. Each generator is run with no arguments, and the whole state is stored in the environment variables. The generator can echo a set of variable assignments to standard output: VAR_A=something VAR_B=something else This output is parsed, and the next and subsequent generators run with those updated variables in the environment. After the last generator is done, the environment that the manager itself exports is updated. Each generator must return 0, otherwise the output is ignored. The generators in /user-env-generator are for the user session managers, including root, and the ones in /system-env-generator are for pid1.	2017-02-20 18:49:14 -05:00
Zbigniew Jędrzejewski-Szmek	fe902fa496	core/manager: move environment serialization out to basic/env-util.c This protocol is generally useful, we might just as well reuse it for the env. generators. The implementation is changed a bit: instead of making a new strv and freeing the old one, just mutate the original. This is much faster with larger arrays, while in fact atomicity is preserved, since we only either insert the new entry or not, without being in inconsistent state. v2: - fix confusion with return value	2017-02-20 18:49:14 -05:00
Zbigniew Jędrzejewski-Szmek	71cb7d306a	core/manager: fix grammar in comment	2017-02-20 18:49:14 -05:00
Zbigniew Jędrzejewski-Szmek	c6e47247a7	basic/exec-util: add support for synchronous (ordered) execution The output of processes can be gathered, and passed back to the callee. (This commit just implements the basic functionality and tests.) After the preparation in previous commits, the change in functionality is relatively simple. For coding convenience, alarm is prepared before any children are executed, and not before. This shouldn't matter usually, since just forking of the children should be pretty quick. One could also argue that this is more correct, because we will also catch the case when (for whatever reason), forking itself is slow. Three callback functions and three levels of serialization are used: - from individual generator processes to the generator forker - from the forker back to the main process - deserialization in the main process v2: - replace a structure with an indexed array of callbacks	2017-02-20 18:49:13 -05:00
Zbigniew Jędrzejewski-Szmek	504afd7c34	core/manager: split out creation of serialization fd out to a helper There is a slight change in behaviour: the user manager for root will create a temporary file in /run/systemd, not /tmp. I don't think this matters, but simplifies implementation.	2017-02-20 18:49:09 -05:00
Tejun Heo	415fc41cea	core: simplify cg_[all_]unified() cg_[all_]unified() test whether a specific controller or all controllers are on the unified hierarchy. While what's being asked is a simple binary question, the callers must assume that the functions may fail any time, which unnecessarily complicates their usages. This complication is unnecessary. Internally, the test result is cached anyway and there are only a few places where the test actually needs to be performed. This patch simplifies cg_[all_]unified(). * cg_[all_]unified() are updated to return bool. If the result can't be decided, assertion failure is triggered. Error handlings from their callers are dropped. * cg_unified_flush() is updated to calculate the new result synchronously and return whether it succeeded or not. Places which need to flush the test result are updated to test for failure. This ensures that all the following cg_[all_]unified() tests succeed. * Places which expected possible cg_[all_]unified() failures are updated to call and test cg_unified_flush() before calling cg_[all_]unified(). This includes functions used while setting up mounts during boot and manager_setup_cgroup().	2017-02-18 17:51:13 -05:00
Zbigniew Jędrzejewski-Szmek	2b0445262a	tree-wide: add SD_ID128_MAKE_STR, remove LOG_MESSAGE_ID Embedding sd_id128_t's in constant strings was rather cumbersome. We had SD_ID128_CONST_STR which returned a const char[], but it had two problems: - it wasn't possible to statically concatenate this array with a normal string - gcc wasn't really able to optimize this, and generated code to perform the "conversion" at runtime. Because of this, even our own code in coredumpctl wasn't using SD_ID128_CONST_STR. Add a new macro to generate a constant string: SD_ID128_MAKE_STR. It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition of the numbers, but in practice it is more convenient to use, and allows gcc to generate smarter code: $ size .libs/systemd{,-logind,-journald}{.old,} text data bss dec hex filename 1265204 149564 4808 1419576 15a938 .libs/systemd.old 1260268 149564 4808 1414640 1595f0 .libs/systemd 246805 13852 209 260866 3fb02 .libs/systemd-logind.old 240973 13852 209 255034 3e43a .libs/systemd-logind 146839 4984 34 151857 25131 .libs/systemd-journald.old 146391 4984 34 151409 24f71 .libs/systemd-journald It is also much easier to check if a certain binary uses a certain MESSAGE_ID: $ strings .libs/systemd.old|grep MESSAGE_ID MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x $ strings .libs/systemd|grep MESSAGE_ID MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27 MESSAGE_ID=b07a249cd024414a82dd00cd181378ff MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7 MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f MESSAGE_ID=d34d037fff1847e6ae669a370e694725 MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5 MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7 MESSAGE_ID=39f53479d3a045ac8e11786248231fbf MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d MESSAGE_ID=7b05ebc668384222baa8881179cfda54 MESSAGE_ID=9d1aaa27d60140bd96365438aad20286	2017-02-15 00:45:12 -05:00
Zbigniew Jędrzejewski-Szmek	4440b27d41	core/manager: silence gcc warning about uninitialized variable At -O3, this was printed a hundred times for various callers of manager_add_job_by_name(). AFAICT, there is no error and `unit` is always initialized. Nevertheless, add explicit initialization to silence the noise. src/core/manager.c: In function 'manager_start_target': src/core/manager.c:1413:16: warning: 'unit' may be used uninitialized in this function [-Wmaybe-uninitialized] return manager_add_job(m, type, unit, mode, e, ret); ^ src/core/manager.c:1401:15: note: 'unit' was declared here Unit *unit; ^	2017-02-12 12:56:40 -05:00
Zbigniew Jędrzejewski-Szmek	7a6a095a9e	core/manager: make manager_load_unit*() functions always take output arg We were inconsistent, manager_load_unit_prepare() would crash if _ret was ever NULL. But none of the callers use NULL. So simplify things and require it to be non-NULL.	2017-02-12 12:40:09 -05:00
Zbigniew Jędrzejewski-Szmek	89711996b3	basic/util: move execute_directory() to separate file It's a fairly specialized function. Let's make new files for it and the tests.	2017-02-11 18:21:06 -05:00
Lennart Poettering	d53333d4b1	core: use a memfd for serialization If we can, use a memfd for serializing state during a daemon reload or reexec. Fall back to a file in /run/systemd or /tmp only if memfds are not available. See: #5016	2017-02-06 16:58:35 +01:00
Lennart Poettering	ae57dad3f9	manager: refuse reloading/reexecing when /run is overly full Let's add an extra safety check: before entering a reload/reexec, let's verify that there's enough room in /run for it. Fixes: #5016	2017-02-06 16:58:06 +01:00
Zbigniew Jędrzejewski-Szmek	a80c157506	core: downgrade "Time has been changed" to debug (#4906) That message is emitted by every systemd instance on every resume: Dec 06 08:03:38 laptop systemd[1]: Time has been changed Dec 06 08:03:38 laptop systemd[823]: Time has been changed Dec 06 08:03:38 laptop systemd[916]: Time has been changed Dec 07 08:00:32 laptop systemd[1]: Time has been changed Dec 07 08:00:32 laptop systemd[823]: Time has been changed Dec 07 08:00:32 laptop systemd[916]: Time has been changed -- Reboot -- Dec 07 08:02:46 laptop systemd[836]: Time has been changed Dec 07 08:02:46 laptop systemd[1]: Time has been changed Dec 07 08:02:46 laptop systemd[926]: Time has been changed Dec 07 19:48:12 laptop systemd[1]: Time has been changed Dec 07 19:48:12 laptop systemd[836]: Time has been changed Dec 07 19:48:12 laptop systemd[926]: Time has been changed ... Fixes #4896.	2016-12-18 13:21:19 +01:00
Zbigniew Jędrzejewski-Szmek	5a1d6cb19d	pid1,catalog: use a different MESSAGE_ID for user manager startup This adds a new message id for the end of user instance startup. User manager startup is a different beast than the system startup. Their descriptions are completely different too. Let's just separate them. Partially fixes #3351. Also remove "successful" from the description, since we don't know if the startup was successful or not.	2016-12-11 12:41:23 -05:00
Reverend Homer	8fb3f00997	tree-wide: replace all readdir cycles with FOREACH_DIRENT{,_ALL} (#4853)	2016-12-09 10:04:30 +01:00
Lennart Poettering	2e6dbc0fcd	Merge pull request #4538 from fbuihuu/confirm-spawn-fixes Confirm spawn fixes/enhancements	2016-11-18 11:08:06 +01:00
Franck Bui	b0eb29449e	core: add 'c' in confirmation_spawn to resume the boot process	2016-11-17 18:16:50 +01:00
Franck Bui	7d5ceb6416	core: allow to redirect confirmation messages to a different console It's rather hard to parse the confirmation messages (enabled with systemd.confirm_spawn=true) amongst the status messages and the kernel ones (if enabled). This patch gives the possibility to the user to redirect the confirmation message to a different virtual console, either by giving its name or its path, so those messages are separated from the other ones and easier to read.	2016-11-17 18:16:16 +01:00
Franck Bui	42bf1ae17b	core: prevent the cylon when confirmation_spawn=yes (#2194) When booting with systemd.confirm_spawn=true, the eye of cylon animation kicks in pretty quickly so user doesn't have any chance to answer the questions which services to start before the confirmation message is screwed by the cylon. This basically breaks the confirm_spawn functionality completely. This patch prevents the cylon animation to kick in when confirmation_spawn=yes. Fixes: #2194	2016-11-17 18:11:21 +01:00
Lennart Poettering	c5a97ed132	core: GC redundant device jobs from the run queue In contrast to all other unit types device units when queued just track external state, they cannot effect state changes on their own. Hence unless a client or other job waits for them there's no reason to keep them in the job queue. This adds a concept of GC'ing jobs of this type as soon as no client or other job waits for them anymore. To ensure this works correctly we need to track which clients actually reference a job (i.e. which ones enqueued it). Unfortunately that's pretty nasty to do for direct connections, as sd_bus_track doesn't work for them. For now, work around this, by simply remembering in a boolean that a job was requested by a direct connection, and reset it when we notice the direct connection is gone. This means the GC logic works fine, except that jobs are not immediately removed when direct connections disconnect. In the longer term, a rework of the bus logic should fix this properly. For now this should be good enough, as GC works fine for all cases except this one, and thus is a clear improvement over the previous behaviour. Fixes: #1921	2016-11-16 15:03:26 +01:00
Lennart Poettering	a2d72e265a	core: drop n_in_gc_queue field of Manager structure We count the units in the GC queue with this, but actually never make use of it, hence drop it.	2016-11-16 15:03:26 +01:00
Lennart Poettering	493fd52f1a	Merge pull request #4510 from keszybz/tree-wide-cleanups Tree wide cleanups	2016-11-03 13:59:20 -06:00
Zbigniew Jędrzejewski-Szmek	605405c6cc	tree-wide: drop NULL sentinel from strjoin This makes strjoin and strjoina more similar and avoids the useless final argument. spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c) git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/' This might have missed a few cases (spatch has a really hard time dealing with _cleanup_ macros), but that's no big issue, they can always be fixed later.	2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek	fb4650aa34	tree-wide: use startswith return value to avoid hardcoded offset I think it's an antipattern to have to count the number of bytes in the prefix by hand. We should do this automatically to avoid wasting programmer time, and possible errors. I didn't find any offsets that were wrong, so this change is mostly to make future development easier.	2016-10-22 16:15:46 -04:00
Lukas Nykryn	ae8c7939df	core: use emergency_action for ctrl+alt+del burst Fixes #4306	2016-10-21 15:13:50 +02:00
Zbigniew Jędrzejewski-Szmek	3ce40911bd	pid1: downgrade some rlimit warnings Since we ignore the result anyway, downgrade errors to warning. log_oom() will still emit an error, but that's mostly theoretical, so it is not worth complicating the code to avoid the small inconsistency	2016-10-19 22:17:16 -04:00
Zbigniew Jędrzejewski-Szmek	6b430fdb7c	tree-wide: use mfree more	2016-10-16 23:35:39 -04:00
Lennart Poettering	4b58153dd2	core: add "invocation ID" concept to service manager This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is generated each time a unit moves from an inactive to an activating or active state. The primary usecase for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended. The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system. The invocation ID is passed to the activated processes as environment variable. It is additionally stored as extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence is the better choice for the journal. Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is. This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot(). PID1's own logging is updated to always include the invocation ID when it logs information about a unit. A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycle of it. Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as hash from that rather than entirely randomly. This way we can derive the invocation race-freely from the messages.	2016-10-07 20:14:38 +02:00
Zbigniew Jędrzejewski-Szmek	8f4d640135	core: only warn on short reads on signal fd	2016-10-07 10:05:04 -04:00
Lennart Poettering	875ca88da5	manager: tighten incoming notification message checks Let's not accept datagrams with embedded NUL bytes. Previously we'd simply ignore everything after the first NUL byte. But given that sending us that is pretty ugly let's instead complain and refuse. With this change we'll only accept messages that have exactly zero or one NUL bytes at the very end of the datagram.	2016-10-07 12:14:33 +02:00
Lennart Poettering	045a3d5989	manager: be stricter with incoming notifications, warn properly about too large ones Let's make the kernel let us know the full, original datagram size of the incoming message. If it's larger than the buffer space provided by us, drop the whole message with a warning. Before this change the kernel would truncate the message for us to the buffer space provided, and we'd not complain about this, and simply process the incomplete message as far as it made sense.	2016-10-07 12:12:10 +02:00
Lennart Poettering	c55ae51e77	manager: don't ever busy loop when we get a notification message we can't process If the kernel doesn't permit us to dequeue/process an incoming notification datagram message it's still better to stop processing the notification messages altogether than to enter a busy loop where we keep getting notified but can't do a thing about it. With this change, manager_dispatch_notify_fd() behaviour is changed like this: - if an error indicating a spurious wake-up is seen on recvmsg(), ignore it (EAGAIN/EINTR) - if any other error is seen on recvmsg() propagate it, thus disabling processing of further wakeups - if any error is seen on later code in the function, warn about it but do not propagate it, as in this case we're not going to busy loop as the offending message is already dequeued.	2016-10-07 12:08:51 +02:00
Lukáš Nykrýn	24dd31c19e	core: add possibility to set action for ctrl-alt-del burst (#4105 ) For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough. Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.	2016-10-06 21:08:21 -04:00
Zbigniew Jędrzejewski-Szmek	a63ee40751	core: do not try to create /run/systemd/transient in test mode This prevented systemd-analyze from unprivileged operation on older systemd installations, which should be possible. Also, we shouldn't touch the file system in test mode even if we can.	2016-10-01 22:53:17 +02:00
Zbigniew Jędrzejewski-Szmek	5fd2c135f1	core: update warning message "closing all" might suggest that _all_ fds received with the notification message will be closed. Reword the message to clarify that only the "unused" ones will be closed.	2016-10-01 11:01:31 +02:00
Zbigniew Jędrzejewski-Szmek	c4bee3c40e	core: get rid of unneeded state variable No functional change.	2016-10-01 11:01:31 +02:00
Zbigniew Jędrzejewski-Szmek	a86b76753d	pid1: more informative error message for ignored notifications It's probably easier to diagnose a bad notification message if the contents are printed. But still, do anything only if debugging is on.	2016-09-29 22:57:57 +02:00
Zbigniew Jędrzejewski-Szmek	8523bf7dd5	pid1: process zero-length notification messages again This undoes `531ac2b234`. I acked that patch without looking at the code carefully enough. There are two problems: - we want to process the fds anyway - in principle empty notification messages are valid, and we should process them as usual, including logging using log_unit_debug().	2016-09-29 22:57:57 +02:00
Franck Bui	9987750e7a	pid1: don't return any error in manager_dispatch_notify_fd() (#4240 ) If manager_dispatch_notify_fd() fails and returns an error then the handling of service notifications will be disabled entirely leading to a compromised system. For example pid1 won't be able to receive the WATCHDOG messages anymore and will kill all services supposed to send such messages.	2016-09-29 19:44:34 +02:00
Jorge Niedbalski	531ac2b234	If the notification message length is 0, ignore the message (#4237 ) Fixes #4234. Signed-off-by: Jorge Niedbalski <jnr@metaklass.org>	2016-09-29 05:26:16 -04:00
Zbigniew Jędrzejewski-Szmek	232f6754f6	pid1: drop kdbus_fd and all associated logic	2016-09-09 15:16:26 +01:00
Lennart Poettering	05a98afd3e	core: add Ref()/Unref() bus calls for units This adds two (privileged) bus calls Ref() and Unref() to the Unit interface. The two calls may be used by clients to pin a unit into memory, so that various runtime properties aren't flushed out by the automatic GC. This is necessary to permit clients to race-freely acquire runtime results (such as process exit status/code or accumulated CPU time) on successful service termination. Ref() and Unref() are fully recursive, hence act like the usual reference counting concept in C. Taking a reference is a privileged operation, as this allows pinning units into memory which consumes resources. Transient units may also gain a reference at the time of creation, via the new AddRef property (that is only defined for transient units at the time of creation).	2016-08-22 16:14:21 +02:00
Zbigniew Jędrzejewski-Szmek	2056ec1927	Merge pull request #3965 from htejun/systemd-controller-on-unified	2016-08-19 19:58:01 -04:00
Lennart Poettering	00d9ef8560	core: add RemoveIPC= setting This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). If turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle. This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set. In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, but we need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.	2016-08-19 00:37:25 +02:00
Tejun Heo	5da38d0768	core: use the unified hierarchy for the systemd cgroup controller hierarchy Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such as the bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller is on the unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().	2016-08-17 17:44:36 -04:00
Tejun Heo	ca2f6384aa	core: rename cg_unified() to cg_all_unified() A following patch will update cgroup handling so that the systemd controller (/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel resource controllers are on the legacy hierarchies. This would require distinguishing whether all controllers are on cgroup v2 or only the systemd controller is. In preparation, this patch renames cg_unified() to cg_all_unified(). This patch doesn't cause any functional changes.	2016-08-15 18:13:36 -04:00
Lennart Poettering	43992e57e0	core: drop spurious newline	2016-08-03 14:52:16 +02:00
Zbigniew Jędrzejewski-Szmek	dadd6ecfa5	Merge pull request #3728 from poettering/dynamic-users	2016-07-25 16:40:26 -04:00
Lennart Poettering	29206d4619	core: add a concept of "dynamic" user ids, that are allocated as long as a service is running This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd). For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories. A simple way to test this is: systemd-run -p DynamicUser=1 /bin/sleep 99999	2016-07-22 15:53:45 +02:00
Lennart Poettering	79baeeb96d	core: change TasksMax= default for system services to 15% As it turns out, the limit of 512 tasks per service is hit by too many applications, hence let's bump it a bit, and make it relative to the system's maximum number of PIDs. With this change the new default is 15%. At the kernel's default pids_max value of 32768 this translates to 4915. At machined's default TasksMax= setting of 16384 this translates to 2457. Why 15%? Because it sounds like a round number and is close enough to 4096 which I was going for, i.e. an eight-fold increase over the old 512. Summary: old default: 512 on the host, 512 in a container; new default: 4915 on the host, 2457 in a container.	2016-07-22 15:33:13 +02:00
Thomas H. P. Andersen	f8298f7be3	core: remove duplicate includes (#3771 )	2016-07-21 10:52:07 +02:00
Lukáš Nykrýn	ccc2c98e1b	manager: don't skip sigchld handler for main and control pid for services (#3738 ) During stop, when a service has one "regular" pid, one main pid and one control pid, and the sigchld for the regular one is processed first, unit_tidy_watch_pids will skip the main and control pid and not remove them from u->pids(). But then we skip the sigchld event because we already did one in the iteration and there are two pids in u->pids. v2: Use general unit_main_pid() and unit_control_pid() instead of reaching directly to service structure.	2016-07-16 15:04:13 -04:00
Kyle Walker	1e706c8dff	manager: Fixing a debug printf formatting mistake (#3640 ) A 'llu' formatting statement was used in a debugging printf statement instead of a 'PRIu64'. Correcting that mistake here.	2016-07-01 20:03:35 +03:00
Kyle Walker	36f20ae3b2	manager: Only invoke a single sigchld per unit within a cleanup cycle By default, each iteration of manager_dispatch_sigchld() results in a unit level sigchld event being invoked. For scope units, this results in a scope_sigchld_event() which can seemingly stall for workloads that have a large number of PIDs within the scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry as a result of pid_is_unwaited(). v2: This patch resolves this condition by only paying the cost of a sigchld in the underlying scope unit once per sigchld iteration. A new "sigchldgen" member resides within the Unit struct. The Manager is incremented via the sd event loop, accessed via sd_event_get_iteration, and the Unit member is set to the same value as the manager each time that a sigchld event is invoked. If the Manager iteration value and Unit member match, the sigchld event is not invoked for that iteration.	2016-06-30 15:16:47 -04:00
Dave Reisner	222953e87f	Ensure kdbus isn't used (#3501 ) Delete the dbus1 generator and some critical wiring. This prevents kdbus from being loaded or detected. As such, it will never be used, even if the user still has a useful kdbus module loaded on their system. Sort of fixes #3480. Not really, but it's better than the current state.	2016-06-18 17:24:23 -04:00
Lukáš Nykrýn	4892084f09	manager: reduce complexity of unit_gc_sweep (#3507 ) When a unit is marked as UNSURE, we try to find out whether its state has changed over and over again. So let's not go through the UNSURE states again. Also, when we find a GOOD unit, let's propagate the GOOD state to all units that this unit references. This is a problem on machines with a lot of initscripts with different starting priority, since those units will reference each other and the original algorithm might get to n! complexity. Thanks HATAYAMA Daisuke for the expand_good_state code.	2016-06-14 14:20:56 +02:00
Franck Bui	64c3610b55	core: disable colors when displaying cylon when systemd.log_color=off (#3495 )	2016-06-10 18:33:15 +02:00
Lennart Poettering	3d0b8a55f2	manager: remove spurious newline	2016-05-26 15:34:41 +02:00
Michal Sekletar	833f92ad39	core: don't log job status message in case job was effectively NOP (#3199 ) We currently generate log message about unit being started even when unit was started already and job didn't do anything. This is because job was requested explicitly and hence became anchor job of the transaction thus we could not eliminate it. That is fine but, let's not pollute journal with useless log messages. $ systemctl start systemd-resolved $ systemctl start systemd-resolved $ systemctl start systemd-resolved Current state: $ journalctl -u systemd-resolved | grep Started May 05 15:31:42 rawhide systemd[1]: Started Network Name Resolution. May 05 15:31:59 rawhide systemd[1]: Started Network Name Resolution. May 05 15:32:01 rawhide systemd[1]: Started Network Name Resolution. After patch applied: $ journalctl -u systemd-resolved | grep Started May 05 16:42:12 rawhide systemd[1]: Started Network Name Resolution. Fixes #1723	2016-05-16 11:24:51 -04:00
Lennart Poettering	fc2fffe770	tree-wide: introduce new SOCKADDR_UN_LEN() macro, and use it everywhere The macro determines the right length of a AF_UNIX "struct sockaddr_un" to pass to connect() or bind(). It automatically figures out if the socket refers to an abstract namespace socket, or a socket in the file system, and properly handles the full length of the path field. This macro is not only safer, but also simpler to use, than the usual offsetof() + strlen() logic.	2016-05-05 22:24:36 +02:00
Lennart Poettering	d8fdc62037	core: use an AF_UNIX/SOCK_DGRAM socket for cgroup agent notification dbus-daemon currently uses a backlog of 30 on its D-bus system bus socket. On overloaded systems this means that only 30 connections may be queued without dbus-daemon processing them before further connection attempts fail. Our cgroups-agent binary so far used D-Bus for its messaging, and hitting this limit hence may result in us losing cgroup empty messages. This patch adds a separate cgroup agent socket of type AF_UNIX/SOCK_DGRAM. Since sockets of these types need no connection set up, no listen() backlog applies. Our cgroup-agent binary will hence simply block as long as it can't enqueue its datagram message, so that we won't lose cgroup empty messages as likely anymore. This also rearranges the ordering of the processing of SIGCHLD signals, service notification messages (sd_notify()...) and the two types of cgroup notifications (inotify for the unified hierarchy support, and agent for the classic hierarchy support). We now always process events for these in the following order: 1. service notification messages (SD_EVENT_PRIORITY_NORMAL-7) 2. SIGCHLD signals (SD_EVENT_PRIORITY_NORMAL-6) 3. cgroup inotify and cgroup agent (SD_EVENT_PRIORITY_NORMAL-5) This is because when receiving SIGCHLD we invalidate PID information, which we need to process the service notification messages which are bound to PIDs. Hence the order between the first two items. And we want to process SIGCHLD metadata to detect whether a service is gone, before using cgroup notifications, to decide when a service is gone, since the former carries more useful metadata. Related to this: https://bugs.freedesktop.org/show_bug.cgi?id=95264 https://github.com/systemd/systemd/issues/1961	2016-05-05 12:37:04 +02:00
Lennart Poettering	03532f0ae0	coredump,basic: generalize O_TMPFILE handling a bit This moves the O_TMPFILE handling from the coredumping code into common library code, and generalizes it as open_tmpfile_linkable() + link_tmpfile(). The existing open_tmpfile() function (which creates an unlinked temporary file that cannot be linked into the fs) is renamed to open_tmpfile_unlinkable(), to make the distinction clear. Thus, code may now choose between: a) open_tmpfile_linkable() + link_tmpfile() b) open_tmpfile_unlinkable() Depending on whether they want a file that may be linked back into the fs later on or not. In a later commit we should probably convert fopen_temporary() to make use of open_tmpfile_linkable(). Followup for: #3065	2016-04-22 16:16:53 +02:00
Lennart Poettering	4943d14306	systemctl: don't confuse sysv code with generated units The SysV compat code checks whether there's a native unit file before looking for a SysV init script. Since the newest rework generated units will show up in the unit path, and hence the checks ended up assuming that there always was a native unit file for each init script: the generated one. With this change the generated unit file directory is suppressed from the search path when this check is done, to avoid the confusion.	2016-04-12 13:43:32 +02:00
Lennart Poettering	9183df707b	install: rename generator_paths() → generator_binary_paths() This is too confusing, as this function returns the paths to the generator binaries, while usually when we refer to just the "generator path" we mean the generated unit files. Let's clean this up.	2016-04-12 13:43:31 +02:00
Lennart Poettering	07a7864324	core: move flushing of generated unit files to path-lookup.c It's very similar to the mkdir and trim operations for the generator dirs, hence let's unify this at a single place.	2016-04-12 13:43:31 +02:00
Lennart Poettering	d063a52741	core: modernize manager_build_unit_patch_cache() a bit	2016-04-12 13:43:31 +02:00
Lennart Poettering	a145334304	core: rework logic to drop duplicate and non-existing items from search path Move this into a function of its own, so that we can run it after we ran the generators, so that it takes into account removed generator dirs.	2016-04-12 13:43:30 +02:00
Lennart Poettering	cd64fd5613	path-lookup: split out logic for mkdir/rmdir of generator dirs in their own functions	2016-04-12 13:43:30 +02:00
Lennart Poettering	3959135139	core: add a separate unit directory for transient units Previously, transient units were created below the normal runtime directory /run/systemd/system. With this change they are created in a special transient directory /run/systemd/transient, which only contains data for transient units. This clarifies the life-cycle of transient units, and makes clear they are distinct from user-provided runtime units. In particular, users may now extend transient units via /run/systemd/system, without systemd interfering with the life-cycle of these files. This change also adds code so that when a transient unit exits only the drop-ins in this new directory are removed, but nothing else. Fixes: #2139	2016-04-12 13:43:30 +02:00
Lennart Poettering	92dd7c4965	core: reuse manager_get_runtime_prefix() at more places	2016-04-12 13:43:30 +02:00
Lennart Poettering	2c289ea833	core: introduce MANAGER_IS_RELOADING() macro This replaces the old function call manager_is_reloading_or_reexecuting() which was used only at very few places. Use the new macro wherever we check whether we are reloading. This should hopefully make things a bit more readable, given the nature of Manager:n_reloading being a counter.	2016-04-12 13:43:30 +02:00
Lennart Poettering	463d0d1569	core: remove ManagerRunningAs enum Previously, we had two enums ManagerRunningAs and UnitFileScope, that were mostly identical and converted from one to the other all the time. The latter had one more value UNIT_FILE_GLOBAL however. Let's simplify things, and remove ManagerRunningAs and replace it by UnitFileScope everywhere, thus making the translation unnecessary. Introduce two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking if we are running in one or the user context.	2016-04-12 13:43:30 +02:00
Lennart Poettering	a3c4eb0710	core: rework generator dir logic, move the dirs into LookupPaths structure A long time ago – when generators were first introduced – the directories for them were randomly created via mkdtemp(). This was changed later so that they use fixed name directories now. Let's make use of this, and add the generator dirs to the LookupPaths structure and into the unit file search path maintained in it. This has the benefit that the generator dirs are now normal part of the search path for all tools, and thus are shown in "systemctl list-unit-files" too.	2016-04-12 13:43:29 +02:00
Lukas Nykryn	5d512d5442	core: improve error message when starting template without instance	2016-03-30 13:54:33 +02:00
Zbigniew Jędrzejewski-Szmek	1b81db7a66	Merge pull request #2903 from keszybz/cgroup2-v3 core: cgroup2 support	2016-03-29 20:25:00 -04:00
Evgeny Vereshchagin	947292eef4	core: RuntimeWatchdogSec=infinity disables the watchdog logic	2016-03-28 17:17:32 +00:00
Tejun Heo	e57051f542	core: update invoke_sigchld_event() to handle NULL ->sigchld_event() After receiving SIGCHLD, one of the ways manager_dispatch_sigchld() maps the now zombie $PID to its unit is through manager_get_unit_by_pid_cgroup() which reads /proc/$PID/cgroup and looks up the unit associated with the cgroup path. On non-unified cgroup hierarchies, a process is immediately migrated to the root cgroup on death and the cgroup lookup would always have returned the unit associated with it, making it rather pointless but safe. On unified hierarchy, a zombie remains associated with the cgroup that it was associated with at the time of death and thus manager_get_unit_by_pid_cgroup() will look up the unit properly. However, by the time manager_dispatch_sigchld() is running, the original cgroup may have become empty and it and its associated unit might already have been removed. If the cgroup path doesn't yield a match, manager_dispatch_sigchld() keeps pruning the leaf component. This means that the function may return a slice unit for a pid and as a slice doesn't have ->sigchld_event() handler, calling invoke_sigchld_event() on it causes a segfault. This patch updates invoke_sigchld_event() so that it skips calling if the handler is not set.	2016-03-26 12:06:06 -04:00
Vito Caputo	313cefa1d9	tree-wide: make ++/-- usage consistent WRT spacing Throughout the tree there's spurious use of spaces separating ++ and -- operators from their respective operands. Make ++ and -- operator consistent with the majority of existing uses; discard the spaces.	2016-02-22 20:32:04 -08:00
Daniel Mack	50f48ad37a	cgroup: remove support for NetClass= directive Support for net_cls.class_id through the NetClass= configuration directive has been added in v227 in preparation for a per-unit packet filter mechanism. However, it turns out the kernel people have decided to deprecate the net_cls and net_prio controllers in v2. Tejun provides a comprehensive justification for this in his commit, which has landed during the merge window for kernel v4.5: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671 As we're aiming for full support for the v2 cgroup hierarchy, we can no longer support this feature. Userspace tools such as nftables are moving over to setting rules that are specific to the full cgroup path of a task, which obsoletes these controllers anyway. This commit removes support for tweaking details in the net_cls controller, but keeps the NetClass= directive around for legacy compatibility reasons.	2016-02-10 16:38:56 +01:00
Daniel Mack	b26fa1a2fb	tree-wide: remove Emacs lines from all files This should be handled fine now by .dir-locals.el, so no need to carry that stuff in every file.	2016-02-10 13:41:57 +01:00
Lennart Poettering	f0469b8c4a	core: when determining system state, don't bother with JOB_TRY_RESTART When we determine the current system state we check whether units like emergency.target are running or a job that results in them being run is queued. However, this is not the case for JOB_TRY_RESTART, since that's a NOP if the unit has not been running before. Hence, don't bother with checking for that job type.	2016-01-28 18:49:59 +01:00
Lennart Poettering	5f0f8d749d	Merge pull request #2357 from keszybz/warnings-2 Remove gcc warnings v2	2016-01-19 15:09:53 +01:00
Evgeny Vereshchagin	d9814c76ec	core: fix memory leak on reload ==1== HEAP SUMMARY: ==1== in use at exit: 61,728 bytes in 22 blocks ==1== total heap usage: 258,122 allocs, 258,100 frees, 78,219,628 bytes allocated ==1== ==1== 16 bytes in 1 blocks are definitely lost in loss record 1 of 6 ==1== at 0x4C2BBCF: malloc (vg_replace_malloc.c:299) ==1== by 0x1E350E: memdup (alloc-util.c:34) ==1== by 0x135AFB: memdup_multiply (alloc-util.h:74) ==1== by 0x140F97: manager_set_default_rlimits (manager.c:2929) ==1== by 0x1303DA: manager_set_defaults (main.c:737) ==1== by 0x133A02: main (main.c:1718) ==1== ==1== 272 bytes in 17 blocks are definitely lost in loss record 2 of 6 ==1== at 0x4C2BBCF: malloc (vg_replace_malloc.c:299) ==1== by 0x1E350E: memdup (alloc-util.c:34) ==1== by 0x135AFB: memdup_multiply (alloc-util.h:74) ==1== by 0x140F97: manager_set_default_rlimits (manager.c:2929) ==1== by 0x1303DA: manager_set_defaults (main.c:737) ==1== by 0x13480D: main (main.c:1828) ==1== ==1== LEAK SUMMARY: ==1== definitely lost: 288 bytes in 18 blocks ==1== indirectly lost: 0 bytes in 0 blocks ==1== possibly lost: 0 bytes in 0 blocks ==1== still reachable: 61,440 bytes in 4 blocks ==1== suppressed: 0 bytes in 0 blocks ==1== Reachable blocks (those to which a pointer was found) are not shown. ==1== To see them, rerun with: --leak-check=full --show-leak-kinds=all	2016-01-14 07:45:03 +00:00
Zbigniew Jędrzejewski-Szmek	b326715278	tree-wide: check if errno is greater than zero (2) Compare errno with zero in a way that tells gcc that (if the condition is true) errno is positive.	2016-01-13 15:10:17 -05:00
Evgeny Vereshchagin	37453b3a2a	core: don't enable special signals in test mode Fixes: $ systemd-analyze verify ... Failed to open /dev/tty0: Permission denied	2016-01-04 18:39:55 +00:00
Evgeny Vereshchagin	2ce2cce3ad	core: revert "manager: do not set up signals in test mode" This reverts commit `5aa1054521`. Fixes test-execute $ sudo make check TESTS=test-execute ... $ cat test-execute.log + test /tmp/test-exec_workingdirectory = /tmp/test-exec_workingdirectory Test timeout when testing exec-workingdirectory.service exec-workingdirectory.service UMask: 0022 WorkingDirectory: /tmp/test-exec_workingdirectory RootDirectory: / NonBlocking: no PrivateTmp: no PrivateNetwork: no PrivateDevices: no ProtectHome: no ProtectSystem: no IgnoreSIGPIPE: yes RuntimeDirectoryMode: 0755 StandardInput: null StandardOutput: inherit StandardError: inherit FAIL test-execute (exit status: 1)	2016-01-04 04:13:00 +00:00
Lennart Poettering	3260929919	Merge pull request #2224 from keszybz/analyze-verify-warning manager: do not set up signals in test mode	2015-12-26 18:53:50 +01:00
Zbigniew Jędrzejewski-Szmek	5aa1054521	manager: do not set up signals in test mode When we are running in test mode, we don't expect any signals. In fact ^C should end the program. This also avoids permission issues when running systemd-analyze verify.	2015-12-25 00:24:16 -05:00
Daniel Mack	8936a5e34d	core: re-sync bus name list after deserializing during daemon-reload When the daemon reloads, it does not actually give up its DBus connection, as wrongly stated in an earlier commit. However, even though the bus connection stays open, the daemon flushes out all its internal state. Hence, if there is a NameOwnerChanged signal after the flush and before the deserialization, it cannot be matched against any pending unit. To fix this, rename bus_list_names() to manager_sync_bus_names() and call it explicitly at the end of the daemon reload operation.	2015-12-23 23:31:35 +01:00
Zbigniew Jędrzejewski-Szmek	4cee3a78bb	manager: log log level changes uniformly Output the same message when a request to change the log level is received over dbus and through a signal. From the user point of view those two operations are very similar and it's easy to think that the dbus operation didn't work when the expected message is not emitted. Also "downgrade" the message level to info, since this is a normal user initiated action.	2015-12-13 14:53:52 -05:00
Zbigniew Jędrzejewski-Szmek	76b6f3f68f	manager: move status output change debug messages to set function This way we print the debug message only when the status actually changes. This also means we don't print anything when running in --user mode, where status output is always disabled.	2015-12-13 14:52:19 -05:00
Lennart Poettering	4afd3348c7	tree-wide: expose "p"-suffix unref calls in public APIs to make gcc cleanup easy GLIB has recently started to officially support the gcc cleanup attribute in its public API, hence let's do the same for our APIs. With this patch we'll define an xyz_unrefp() call for each public xyz_unref() call, to make it easy to use inside a __attribute__((cleanup())) expression. Then, all code is ported over to make use of this. The new calls are also documented in the man pages, with examples how to use them (well, I only added docs where the _unref() call itself already had docs, and the examples, only cover sd_bus_unrefp() and sd_event_unrefp()). This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we tend to call our destructors these days. Note that this defines no public macro that wraps gcc's attribute and makes it easier to use. While I think it's our duty in the library to make our stuff easy to use, I figure it's not our duty to make gcc's own features easy to use on its own. Most likely, client code which wants to make use of this should define its own: #define _cleanup_(function) __attribute__((cleanup(function))) Or similar, to make the gcc feature easier to use. Making this logic public has the benefit that we can remove three header files whose only purpose was to define these functions internally. See #2008.	2015-11-27 19:19:36 +01:00
Lennart Poettering	9ded9cd14c	core: enable TasksMax= for all services by default, and set it to 512 Also, enable TasksAccounting= for all services by default, too. See: http://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.html	2015-11-16 11:57:48 +01:00
Lennart Poettering	0af20ea2ee	core: add new DefaultTasksMax= setting for system.conf This allows initializing the TasksMax= setting of all units by default to some fixed value, instead of leaving it at infinity as before.	2015-11-13 19:50:52 +01:00
Michal Schmidt	7152869f0a	Merge pull request #1869 from poettering/kill-overridable Remove support for RequiresOverridable= and RequisiteOverridable=	2015-11-13 14:04:34 +01:00
Evgeny Vereshchagin	df0060346a	core: use SD_EVENT_PRIORITY_NORMAL-n instead of -n	2015-11-12 19:54:34 +00:00
Lennart Poettering	53f1841669	core: unify code that warns about jobs we fail to enqueue This allows us to shorten our code a bit.	2015-11-12 20:14:06 +01:00
Lennart Poettering	4bd29fe5ce	core: drop "override" flag when building transactions Now that we don't have RequiresOverridable= and RequisiteOverridable= dependencies anymore, we can get rid of tracking the "override" boolean for jobs in the job engine, as it serves no purpose anymore. While we are at it, fix some error messages we print when invoking functions that take the override parameter.	2015-11-12 19:54:07 +01:00
Lennart Poettering	9ff1a6f1d6	core: change type of distribute_fds() prototype to return void We can't handle errors of this call sanely anyway, and we never actually return any errors from the unit type that implements the call. Hence, let's make this void, in order to simplify things.	2015-11-10 21:03:49 +01:00
Lennart Poettering	ba64af90ec	core: change return value of the unit's enumerate() call to void We cannot handle enumeration failures in a sensible way, hence let's try hard to continue without making such failures fatal, and log about it with precise error messages.	2015-11-10 21:03:49 +01:00
Vito Caputo	ad231c7787	core: still make progress when throttling the manager loop Don't simply continue after sleeping, it potentially puts us in a state of spinning doing nothing slowly, if the ratelimit_test() keeps detecting the need for limiting. Observed in vms after the host had been suspended for a while, on resume systemd entered a loop of making zero progress spamming the console with: [431942.850090] systemd[1]: Looping too fast. Throttling execution a little. I see no reason to have a continue here, the intention should be to throttle execution, not circumvent it altogether.	2015-11-04 17:32:16 -08:00
Lennart Poettering	a47806fafa	sd-daemon: increase sd_notify() socket buffer size Let's make sure we don't start blocking on sd_notify() earlier than necessary, let's bump the socket buffer sizes to 8M. We already do something similar for our logging socket buffers, hence apply a similar bump here.	2015-10-31 19:09:20 +01:00
Lennart Poettering	a1a078eef2	core: bail out earlier when doing audit Let's make sure we don't even try to create the audit socket	2015-10-31 19:09:20 +01:00
Lennart Poettering	97044145b4	core,nspawn: minor coding style fixes	2015-10-31 19:09:20 +01:00
Lennart Poettering	96d66d89c9	core: constify a few things	2015-10-31 19:09:20 +01:00
David Herrmann	b215b0ede1	core: fix priority ordering in notify-handling Currently, we dispatch NOTIFY messages in a tight loop. Regardless of how much data is incoming, we always dispatch everything that is queued. This, however, completely breaks priority event-handling of sd-event. When dispatching one NOTIFY event, another completely different event might fire, or might be queued by the NOTIFY handling. However, this event will not get dispatched until all other further NOTIFY messages are handled. Those might even arrive _after_ the other event fired, and as such completely break priority ordering of sd-event (which several code paths rely on). Break this by never dispatching multiple messages. Just return after each message that was read and let sd-event handle everything else. (The patch looks scarier than it is. It basically just drops the for(;;) loop and re-indents the loop-content.)	2015-10-28 19:11:36 +01:00
Lennart Poettering	b5efdb8af4	util-lib: split out allocation calls into alloc-util.[ch]	2015-10-27 13:45:53 +01:00
Lennart Poettering	affb60b1ef	util-lib: split out umask-related code to umask-util.h	2015-10-27 13:25:56 +01:00
Lennart Poettering	8b43440b7e	util-lib: move string table stuff into its own string-table.[ch]	2015-10-27 13:25:56 +01:00
Lennart Poettering	8fcde01280	util-lib: split stat()/statfs()/statvfs() related calls into stat-util.[ch]	2015-10-27 13:25:56 +01:00
Lennart Poettering	f4f15635ec	util-lib: move a number of fs operations into fs-util.[ch]	2015-10-27 13:25:56 +01:00
Lennart Poettering	0d39fa9c69	util-lib: move more file I/O related calls into fileio.[ch]	2015-10-27 13:25:55 +01:00
Lennart Poettering	6bedfcbb29	util-lib: split string parsing related calls from util.[ch] into parse-util.[ch]	2015-10-27 13:25:55 +01:00
Lennart Poettering	c004493cde	util-lib: split out IO related calls to io-util.[ch]	2015-10-26 01:24:38 +01:00
Lennart Poettering	3ffd4af220	util-lib: split out fd-related operations into fd-util.[ch] There are more than enough to deserve their own .c file, hence move them over.	2015-10-25 13:19:18 +01:00
Lennart Poettering	07630cea1f	util-lib: split out string related calls from util.[ch] into its own file string-util.[ch] There are more than enough calls doing string manipulations to deserve their own files, hence do something about it. This patch also sorts the #include blocks of all files that needed to be updated, according to the sorting suggestions from CODING_STYLE. Since pretty much every file needs our string manipulation functions this effectively means that most files have sorted #include blocks now. Also touches a few unrelated include files.	2015-10-24 23:05:02 +02:00
Lennart Poettering	4f5dd3943b	util: split out escaping code into escape.[ch] This really deserves its own file, given how much code this is now.	2015-10-24 23:04:42 +02:00
Lennart Poettering	ac5b0c13d8	tree-wide: add more void casts for various syscall invocations	2015-10-19 23:07:18 +02:00
Thomas Hindoe Paaboel Andersen	74165387ee	manager: remove unused function	2015-10-13 22:17:26 +02:00
Lennart Poettering	8dd4c05b54	core: add support for naming file descriptors passed using socket activation This adds support for naming file descriptors passed using socket activation. The names are passed in a new $LISTEN_FDNAMES= environment variable, that matches the existing $LISTEN_FDS= one and contains a colon-separated list of names. This also adds support for naming fds submitted to the per-service fd store using FDNAME= in the sd_notify() message. This also adds a new FileDescriptorName= setting for socket unit files to set the name for fds created by socket units. This also adds a new call sd_listen_fds_with_names(), that is similar to sd_listen_fds(), but also returns the names of the fds. systemd-activate gained the new --fdname= switch to specify a name for testing socket activation. This is based on #1247 by Maciej Wereski. Fixes #1247.	2015-10-06 11:52:48 +02:00
Lennart Poettering	400f1a33cf	core: sort includes of manager.[ch] according to CODING_STYLE	2015-09-29 21:08:36 +02:00
Daniel Mack	d11885c814	Merge pull request #1335 from poettering/some-fixes A variety of mostly unrelated fixes	2015-09-22 17:04:38 +02:00
Lennart Poettering	1fc464f6fb	cgtop: underline table header Let's underline the header line of the table shown by cgtop, as is customary for tables. In order to do this, let's introduce new ANSI underline macros, and clean up the existing ones as a side effect.	2015-09-22 16:30:42 +02:00
Lennart Poettering	ed0d40229b	util: add safe_closedir() similar to safe_fclose()	2015-09-22 16:30:24 +02:00
Lennart Poettering	85fade1edb	Merge pull request #986 from karelzak/monitor mount: use libmount to monitor mountinfo & utab	2015-09-22 14:31:58 +02:00
Daniel Mack	32ee7d3309	cgroup: add support for net_cls controllers Add a new config directive called NetClass= to CGroup enabled units. Allowed values are positive numbers for fixed assignments and "auto" for picking a free value automatically, for which we need to keep track of dynamically assigned net class IDs of units. Introduce a hash table for this, and also record the last ID that was given out, so the allocator can start its search for the next 'hole' from there. This could eventually be optimized with something like an irb. The class IDs up to 65536 are considered reserved and won't be assigned automatically by systemd. This barrier can be made a config directive in the future. Values set in unit files are stored in the CGroupContext of the unit and considered read-only. The actually assigned number (which may have been chosen dynamically) is stored in the unit itself and is guaranteed to remain stable as long as the unit is active. In the CGroup controller, set the configured CGroup net class to net_cls.classid. Multiple units may share the same net class ID, and those which do are linked together.	2015-09-16 00:21:55 +02:00
Karel Zak	d379d44255	mount: use libmount to monitor mountinfo & utab The current implementation directly monitors the /proc/self/mountinfo and /run/mount/utab files. It's really not optimal because the utab file is private libmount stuff without any officially guaranteed semantics. Since v2.26 libmount provides an API to monitor kernel and userspace mount changes, and since v2.27 the monitor is usable for non-root users too. This patch replaces the current implementation with a libmount-based solution. Signed-off-by: Karel Zak <kzak@redhat.com>	2015-09-14 09:12:31 +02:00
Lennart Poettering	e7ab4d1ac9	cgroup: unify how we invalidate cgroup controller settings Let's make sure that we follow the same codepaths when adjusting a cgroup property via the dbus SetProperty() call, and when we execute the StartupCPUShares= effect.	2015-09-11 18:31:50 +02:00
Lennart Poettering	cd72bd8a46	core: invalidate idle pipe event source in manager_close_idle_pipe() In all occasions when this function is called we do so anyway, so let's move this inside, to make things easier.	2015-09-11 18:31:50 +02:00
Lennart Poettering	5269eb6b32	core: allocate sets of startup and failed units on-demand There's a good chance we never need these sets, hence allocate them only when needed.	2015-09-11 18:31:49 +02:00
Lennart Poettering	ece174c543	tree-wide: drop {} from one-line if blocks Patch via coccinelle.	2015-09-09 08:20:20 +02:00
Lennart Poettering	a1e58e8ee1	tree-wide: use coccinelle to patch a lot of code to use mfree() This replaces this: free(p); p = NULL; by this: p = mfree(p); Change generated using coccinelle. Semantic patch is added to the sources.	2015-09-09 08:19:27 +02:00
Lennart Poettering	75f86906c5	basic: rework virtualization detection API Introduce a proper enum, and don't pass around string ids anymore. This simplifies things quite a bit, and makes virtualization detection more similar to architecture detection.	2015-09-07 13:42:47 +02:00
Lennart Poettering	b3ac818be8	core: split up manager_get_unit_by_pid() Let's move the actual cgroup part of it into a new separate function manager_get_unit_by_pid_cgroup(), and then make manager_get_unit_by_pid() just a wrapper that also checks the two pid hashmaps. Then, let's make sure the various calls that want to deliver events to the owners of a PID check both hashmaps and the cgroup and deliver the event to each of them. OTOH make sure bus calls like GetUnitByPID() continue to check the PID hashmaps first and the cgroup only as fallback.	2015-09-04 09:07:31 +02:00
Lennart Poettering	fea72cc033	macro: introduce new PID_TO_PTR macros and make use of them This adds a new PID_TO_PTR() macro, plus PTR_TO_PID() and makes use of it wherever we maintain processes in a hash table. Previously we sometimes used LONG_TO_PTR() and other times ULONG_TO_PTR() for that, hence let's make this more explicit and clean up things.	2015-09-04 09:07:30 +02:00
Lennart Poettering	efdb02375b	core: unified cgroup hierarchy support This patch set adds full support for the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possible to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and add each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified hierarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicely, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefully makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.	2015-09-01 23:52:27 +02:00
Lennart Poettering	ae2a2c53dd	manager: don't write first-boot flag file all the time Instead, remember that we have already written it.	2015-09-01 17:20:56 +02:00
Lennart Poettering	90990e28c9	manager: remove ask-password fd from sd_event before closing it Otherwise we might attempt to remove a non-existing fd from epoll.	2015-08-31 13:20:44 +02:00
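
Commit 4afd3348c7 above describes pairing each public xyz_unref() call with an xyz_unrefp() variant so it can be used with gcc's cleanup attribute. The following is a minimal sketch of that pattern, not code from the systemd tree: the `Thing` type, its functions, and the `things_freed` counter are hypothetical stand-ins for a real type such as sd_bus, and the `_cleanup_` macro is defined locally as the commit message suggests client code should do. It assumes a compiler that supports `__attribute__((cleanup()))` (gcc or clang).

```c
#include <assert.h>
#include <stdlib.h>

/* Local convenience macro, written the way the commit message suggests. */
#define _cleanup_(function) __attribute__((cleanup(function)))

/* Hypothetical reference-counted object, standing in for a public type. */
typedef struct Thing {
        unsigned n_ref;
} Thing;

/* Hypothetical counter so the effect of the cleanup handler is observable. */
static unsigned things_freed = 0;

static Thing *thing_new(void) {
        Thing *t = calloc(1, sizeof(Thing));
        if (t)
                t->n_ref = 1;
        return t;
}

/* Drops one reference, frees the object when the last one is gone.
 * Always returns NULL, so callers can write t = thing_unref(t). */
static Thing *thing_unref(Thing *t) {
        if (!t)
                return NULL;

        assert(t->n_ref > 0);
        if (--t->n_ref == 0) {
                free(t);
                things_freed++;
        }
        return NULL;
}

/* The "p"-suffix variant: takes a pointer to the variable, which is the
 * signature the cleanup attribute passes to its handler. */
static void thing_unrefp(Thing **t) {
        thing_unref(*t);
}

static int use_thing(void) {
        _cleanup_(thing_unrefp) Thing *t = thing_new();

        if (!t)
                return -1;

        /* ... work with t; every return path below releases it ... */
        return 0; /* thing_unrefp(&t) runs as t goes out of scope */
}
```

Calling use_thing() once should leave things_freed at 1 without any explicit unref at the return sites, which is the convenience the commit aims for.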

740 Commits
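
Commit a1e58e8ee1 in the list above replaces the two-statement sequence `free(p); p = NULL;` with `p = mfree(p);`. The sketch below illustrates that idiom under stated assumptions: `mfree()` is written here as a small helper that frees its argument and returns NULL, and the `Config` type with its functions is hypothetical, introduced only to give the idiom something to act on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees the memory and returns NULL, so that free and reset
 * can be written as a single assignment. */
static inline void *mfree(void *memory) {
        free(memory);
        return NULL;
}

/* Hypothetical structure owning one heap-allocated string. */
typedef struct Config {
        char *name;
} Config;

/* Replaces the stored name with a copy of the given string. */
static int config_set_name(Config *c, const char *name) {
        size_t n;
        char *copy;

        assert(c);
        assert(name);

        n = strlen(name) + 1;
        copy = malloc(n);
        if (!copy)
                return -1;
        memcpy(copy, name, n);

        free(c->name);
        c->name = copy;
        return 0;
}

/* Releases the name. Because the pointer is reset to NULL in the same
 * statement, calling this twice is harmless. */
static void config_clear(Config *c) {
        assert(c);
        c->name = mfree(c->name);
}
```

The design point is that the pointer can never be left dangling between the free and the reset, since both happen in one expression.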