Commit graph

4512 commits

Author SHA1 Message Date
Yu Watanabe 938dbb292a
Merge pull request #10901 from poettering/startswith-list
add new STARTSWITH_SET() macro
2018-11-26 22:40:51 +09:00
Lennart Poettering 9630d4dd68
Merge pull request #10894 from poettering/root-cgroup-fix
A multitude of cgroup fixes
2018-11-26 14:13:01 +01:00
Lennart Poettering da9fc98ded tree-wide: port more code over to PATH_STARTSWITH_SET() 2018-11-26 14:08:46 +01:00
Lennart Poettering 49fe5c0996 tree-wide: port various places over to STARTSWITH_SET() 2018-11-26 14:08:46 +01:00
Lennart Poettering b8b6f32104 cgroup: when we unload a unit, also update all its parent's members mask
This way we can corectly ensure that when a unit that requires some
controller goes away, we propagate the removal of it all the way up, so
that the controller is turned off in all the parents too.
2018-11-23 13:41:37 +01:00
Lennart Poettering 5af8805872 cgroup: drastically simplify caching of cgroups members mask
Previously we tried to be smart: when a new unit appeared and it only
added controllers to the cgroup mask we'd update the cached members mask
in all parents by ORing in the controller flags in their cached values.
Unfortunately this was quite broken, as we missed some conditions when
this cache had to be reset (for example, when a unit got unloaded),
moreover the optimization doesn't work when a controller is removed
anyway (as in that case there's no other way for the parent to iterate
though all children if any other, remaining child unit still needs it).
Hence, let's simplify the logic substantially: instead of updating the
cache on the right events (which we didn't get right), let's simply
invalidate the cache, and generate it lazily when we encounter it later.
This should actually result in better behaviour as we don't have to
calculate the new members mask for a whole subtree whever we have the
suspicion something changed, but can delay it to the point where we
actually need the members mask.

This allows us to simplify things quite a bit, which is good, since
validating this cache for correctness is hard enough.

Fixes: #9512
2018-11-23 13:41:37 +01:00
Lennart Poettering 8a0d538815 cgroup: extend comment on what unit_release_cgroup() is for 2018-11-23 13:41:37 +01:00
Lennart Poettering 1fd3a10c38 cgroup: extend reasons when we realize the enable mask
After creating a cgroup we need to initialize its
"cgroup.subtree_control" file with the controllers its children want to
use. Currently we do so whenever the mkdir() on the cgroup succeeded,
i.e. when we know the cgroup is "fresh". Let's update the condition
slightly that we also do so when internally we assume a cgroup doesn't
exist yet, even if it already does (maybe left-over from a previous
run).

This shouldn't change anything IRL but make things a bit more robust.
2018-11-23 13:41:37 +01:00
Lennart Poettering d5095dcd30 cgroup: tighten call that detects whether we need to realize a unit's cgroup a bit, and comment why 2018-11-23 13:41:37 +01:00
Lennart Poettering 5a62e5e2ac cgroup: document what the various masks variables are used for 2018-11-23 13:41:37 +01:00
Lennart Poettering 27c4ed790a cgroup: simplify check whether it makes sense to realize a cgroup 2018-11-23 13:41:37 +01:00
Lennart Poettering e00068e71f cgroup: in unit_invalidate_cgroup() actually modify invalidation mask
Previously this would manipulate the realization mask for invalidating
the realization. This is a bit ugly though as the realization mask's
primary purpose to is to reflect in which hierarchies a cgroup currently
exists, and it's probably a good idea to keep that in sync with
realities.

We nowadays have the an explicit fields for invalidating cgroup
controller information, the "cgroup_invalidated_mask", let's use this
one instead.

The effect is pretty much the same, as the main consumer of these masks
(unit_has_mask_realize()) checks both anyway.
2018-11-23 13:41:37 +01:00
Lennart Poettering 27adcc9737 cgroup: be more careful with which controllers we can enable/disable on a cgroup
This changes cg_enable_everywhere() to return which controllers are
enabled for the specified cgroup. This information is then used to
correctly track the enablement mask currently in effect for a unit.
Moreover, when we try to turn off a controller, and this works, then
this is indicates that the parent unit might succesfully turn it off
now, too as our unit might have kept it busy.

So far, when realizing cgroups, i.e. when syncing up the kernel
representation of relevant cgroups with our own idea we would strictly
work from the root to the leaves. This is generally a good approach, as
when controllers are enabled this has to happen in root-to-leaves order.
However, when controllers are disabled this has to happen in the
opposite order: in leaves-to-root order (this is because controllers can
only be enabled in a child if it is already enabled in the parent, and
if it shall be disabled in the parent then it has to be disabled in the
child first, otherwise it is considered busy when it is attempted to
remove it in the parent).

To make things complicated when invalidating a unit's cgroup membershup
systemd can actually turn off some controllers previously turned on at
the very same time as it turns on other controllers previously turned
off. In such a case we have to work up leaves-to-root *and*
root-to-leaves right after each other. With this patch this is
implemented: we still generally operate root-to-leaves, but as soon as
we noticed we successfully turned off a controller previously turned on
for a cgroup we'll re-enqueue the cgroup realization for all parents of
a unit, thus implementing leaves-to-root where necessary.
2018-11-23 13:41:37 +01:00
Zbigniew Jędrzejewski-Szmek e5e0a79623 pid1,sd-device: use PATH_STARTSWITH_SET more 2018-11-23 13:37:47 +01:00
Lennart Poettering 26a17ca280 cgroup: add explanatory comment 2018-11-23 12:24:37 +01:00
Lennart Poettering 442ce7759c cgroup: units that aren't loaded properly should not result in cgroup controllers being pulled in
This shouldn't make much difference in real life, but is a bit cleaner.
2018-11-23 12:24:37 +01:00
Lennart Poettering 0adf88b68c cgroup: dump delegation mask too 2018-11-23 12:24:37 +01:00
Lennart Poettering 1649244588 cgroup: make unit_get_needs_bpf_firewall() static too 2018-11-23 12:24:37 +01:00
Lennart Poettering 53aea74a60 cgroup: make some functions static 2018-11-23 12:24:37 +01:00
Lennart Poettering 52fecf20b9 cgroup: fine tune when to apply cgroup attributes to the root cgroup
Let's tweak when precisely to apply cgroup attributes on the root
cgroup.

With this we now follow the following rules:

1. On cgroupsv2 we never apply any regular cgroups to the host root,
   since the attributes generally do not exist there.

2. On cgroupsv1 we do not apply any "weight" or "shares" style
   attributes to the host root cgroup, since they don't make much sense
   on the top level where there's only one group, hence no need to
   compare weights against each other. The other attributes are applied
   to the host root cgroup however.

3. In any case we don't apply attributes to the root of container
   environments (and --user roots), under the assumption that this is
   managed by the manager further up. (Note that on cgroupsv2 this is
   even enforced by the kernel)

4. BPF pseudo-attributes are applied in all cases (since we can have as
   many of them as we want)
2018-11-23 12:24:37 +01:00
Lennart Poettering 589a5f7a38 cgroup: append \n to static strings we write to cgroup attributes
This is a bit cleaner since we when we format numeric limits we append
it. And this way write_string_file() doesn't have to append it.
2018-11-23 12:24:37 +01:00
Lennart Poettering 28cfdc5aeb cgroup: tighten manager_owns_host_root_cgroup() a bit
This tightening is not strictly necessary (as the m->cgroup_root check
further down does the same), but let's make this explicit.
2018-11-23 12:24:37 +01:00
Lennart Poettering 611c4f8afb cgroup: rename {manager_owns|unit_has}_root_cgroup() → .._host_root_cgroup()
Let's emphasize that this function checks for the host root cgroup, i.e.
returns false for the root cgroup when we run in a container where
CLONE_NEWCGROUP is used. There has been some confusion around this
already, for example cgroup_context_apply() uses the function
incorrectly (which we'll fix in a later commit).

Just some refactoring, not change in behaviour.
2018-11-23 12:24:37 +01:00
Lennart Poettering 293d32df39 cgroup: add a common routine for writing to attributes, and logging about it
We can use this at quite a few places, and this allows us to shorten our
code quite a bit.
2018-11-23 12:24:37 +01:00
Lennart Poettering 39b9fefb2e cgroup: add a new macro for determining log level for cgroup attr write failures
For now, let's use it only at one place, but a follow-up commit will
make more use of it.
2018-11-23 12:24:37 +01:00
Lennart Poettering 2c74e12bb3 cgroup: ignore EPERM for a couple of more attribute writes 2018-11-23 12:24:37 +01:00
Lennart Poettering 8c83840772 cgroup: add comment explaining why we ignore EINVAL at two places
These are just copies from further down.
2018-11-23 12:24:37 +01:00
Lennart Poettering 73fe5314bf cgroup: suffix settings with "=" in log messages where appropriate 2018-11-23 12:24:37 +01:00
Lennart Poettering a0c339ed4b cgroup: only install cgroup release agent when we own the root cgroup
If we run in a container we shouldn't patch around this, and most likely
we can't anyway, and there's not much point in complaining about this.
Hence let's strictly say: the agent is private property of the host's
system instance, nothing else.
2018-11-23 12:24:37 +01:00
Lennart Poettering de8a711a58 cgroup: use structured initialization 2018-11-23 12:24:37 +01:00
Lennart Poettering 00e7b3c8e5 unit: minor optimization, use stack over heap, when we can 2018-11-23 00:46:56 +01:00
Lennart Poettering 27da878e7e unit: drop an unused fields from Unit struct 2018-11-23 00:37:00 +01:00
Lennart Poettering 66fa4bdd70 core: add two minor comments (#10890) 2018-11-23 06:25:27 +09:00
Zbigniew Jędrzejewski-Szmek baaa35ad70 coccinelle: make use of SYNTHETIC_ERRNO
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.

I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
2018-11-22 10:54:38 +01:00
Michal Koutný aa1f95d264 core: Detect initial timer state from serialized data
We keep a mark whether a single-shot timer was triggered in the caller's
variable initial. When such a timer elapses while we are
serializing/deserializing the inner state, we consider the timer
incorrectly as elapsed and don't trigger it later.

This patch exploits last_trigger timestamp that we already serialize,
hence we can eliminate the argument initial completely.

A reproducer for OnBootSec= timers:
        cat >repro.c <<EOD
        /*
         * Compile:	gcc repro.c -o repro
         * Run:		./repro
         */
        #include <errno.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/stat.h>
        #include <sys/types.h>
        #include <time.h>
        #include <unistd.h>

        int main(int argc, char *argv[]) {
        	char command[1024];
        	int pause;

        	struct timespec now;

        	while (1) {
        		usleep(rand() % 200000); // prevent periodic repeats
               		clock_gettime(CLOCK_MONOTONIC, &now);
        		printf("%i\n", now.tv_sec);

        		system("rm -f $PWD/mark");
        		snprintf(command, 1024, "systemd-run --user --on-boot=%i --timer-property=AccuracySec=100ms "
        					"touch $PWD/mark", now.tv_sec + 1);
        		system(command);
        		system("systemctl --user list-timers");
        		pause = (1000000000 - now.tv_nsec)/1000 - 70000; // fiddle to hit the middle of reloading
        		usleep(pause > 0 ? pause : 0);
        		system("systemctl --user daemon-reload");
        		sync();
        		sleep(2);
        		if (open("./mark", 0) < 0)
        			if (errno == ENOENT) {
        				printf("mark file does not exist\n");
        				break;
        			}
        	}

        	return 0;
        }
        EOD
2018-11-21 11:28:33 +01:00
Lennart Poettering 818623aca5
Merge pull request #10860 from keszybz/more-cleanup-2
Do more stuff from main macros
2018-11-21 11:07:31 +01:00
Lennart Poettering e3b8d0637d core: run env generators with non-zero umask
For PID 1 we adjust the umask to 0, but generators should not run that
way, given that they might be implemented as shell scripts and such.
Let's hence explicitly adjust the umask for them.

We already do this for unit generators. Let's do this for env
generators, too.
2018-11-20 23:35:04 +01:00
Zbigniew Jędrzejewski-Szmek 294bf0c34a Split out pretty-print.c and move pager.c and main-func.h to shared/
This is high-level functionality, and fits better in shared/ (which is for
our executables), than in basic/ (which is also for libraries).
2018-11-20 18:40:02 +01:00
Lennart Poettering bb25977244 main: don't freeze PID 1 in containers, exit with non-zero instead
After all we have a nice way to propagate total failures, hence let's
use it.
2018-11-20 17:04:07 +01:00
Lennart Poettering bb85a58208 main: use EXIT_EXCEPTION instead of EXIT_FAILURE at two more exceptional places 2018-11-20 17:04:07 +01:00
Lennart Poettering 79a224c460 main: when reloading PID 1 let's reset the default environment
Otherwise we keep collecting stuff from env generators, and we really
shouldn't.

This was working properly on reexec but not on reload, as for reexec we
would always start fresh, but for reload would reuse the Manager object
and hence its default environment set.

Fixes: #10671
2018-11-19 13:01:19 +01:00
Lennart Poettering 2fbbbf9a5f manager: log on two OOM occasions 2018-11-19 12:22:56 +01:00
Lennart Poettering b08020893a
Merge pull request #10809 from keszybz/unit-log-result
Add helper function for logging unit results
2018-11-19 11:07:07 +01:00
Chris Down a88c5b8ac4 cgroup v2: DefaultCPUAccounting=yes if CPU controller isn't required
We now don't enable the CPU controller just for CPU accounting if we are
on 4.15+ and using pure unified hierarchy, as this is provided
externally to the CPU controller. This makes CPUAccounting=yes
essentially free, so enabling it by default when it's cheap seems like a
good idea.
2018-11-18 12:21:41 +00:00
Chris Down f98c25850f cgroup v2: Don't require CPU controller for CPU accounting in 4.15+
systemd only uses functions that are as of Linux 4.15+ provided
externally to the CPU controller (currently usage_usec), so if we have a
new enough kernel, we don't need to set CGROUP_MASK_CPU for
CPUAccounting=true as the CPU controller does not need to necessarily be
enabled in this case.

Part of this patch is modelled on an earlier patch by Ryutaroh Matsumoto
(see PR #9665).
2018-11-18 12:21:41 +00:00
Lennart Poettering 46f2d09f31 conf-parse: drop unused prototype 2018-11-17 08:47:27 +01:00
Zbigniew Jędrzejewski-Szmek aac99f303a core: introduce a helper function to wrap unit_log_{success,failure}
It's inline so that the compiler can easily optimize away the call to get
status string.
2018-11-16 19:47:07 +01:00
Lennart Poettering ae3cc6ec0d
Merge pull request #10770 from poettering/unit-done-log
improvements to structure log events from PID1
2018-11-16 17:54:19 +01:00
Lennart Poettering 6415fecd4c
Merge pull request #10785 from poettering/cgroup-join-removal
remove JoinControllers= setting
2018-11-16 17:53:26 +01:00
Lennart Poettering 3382cf28b6
Merge pull request #10802 from poettering/hide-only-on
man: let's deprecate PermissionsStartOnly=
2018-11-16 17:53:01 +01:00