Commit Graph

151 Commits

Author SHA1 Message Date
Lennart Poettering b82f71c7ff tree-wide: constify a few static string tables 2019-03-25 14:04:34 +01:00
Daniel Black c53d2d54bd service: make killmode=cgroup|mixed, SendSIGKILL=no services singletons
KillMode=mixed and control group are used to indicate that all
process should be killed off. SendSIGKILL is used for services
that require a clean shutdown. These are typically database
service where a SigKilled process would result in a lengthy
recovery and who's shutdown or startup time is quite variable
(so Timeout settings aren't of use).

Here we take these two factors and refuse to start a service if
there are existing processes within a control group. Databases,
while generally having some protection against multiple instances
running, lets not stress the rigor of these. Also ExecStartPre
parts of the service aren't as rigoriously written to protect
against against multiple use.

closes #8630
2019-01-29 15:35:59 +11:00
Chris Down 4e1dfa45e9 cgroup: s/cgroups? ?v?([0-9])/cgroup v\1/gI
Nitpicky, but we've used a lot of random spacings and names in the past,
but we're trying to be completely consistent on "cgroup vN" now.

Generated by `fd -0 | xargs -0 -n1 sed -ri --follow-symlinks 's/cgroups?  ?v?([0-9])/cgroup v\1/gI'`.

I manually ignored places where it's not appropriate to replace (eg.
"cgroup2" fstype and in src/shared/linux).
2019-01-03 11:32:40 +09:00
Chris Down 5f086dc7db cgroup: Imply systemd.unified_cgroup_hierarchy=1 on cgroup_no_v1=all
cgroup_no_v1=all doesn't make a whole lot of sense with legacy hierarchy
(where we use v1 hierarchy for everything), or hybrid hierarchy (where
we still use v1 hierarchy for resource control).

Right now we have to tell people to add both cgroup_no_v1=all and
systemd.unified_cgroup_hierarchy=1 to get the desired behaviour,
however in reality it's hard to imagine any situation where someone
passes cgroup_no_v1=all but *doesn't* want to use the unified cgroup
hierarchy.

Make it so that cgroup_no_v1=all produces intuitive behaviour in systemd
by default, although it can still be disabled by passing
systemd.unified_cgroup_hierarchy=0 explicitly.
2018-12-21 13:29:27 +00:00
Yu Watanabe 938dbb292a
Merge pull request #10901 from poettering/startswith-list
add new STARTSWITH_SET() macro
2018-11-26 22:40:51 +09:00
Lennart Poettering 0cbd293e12 tree-wide: port over more cases to STR_IN_SET() 2018-11-26 14:08:46 +01:00
Lennart Poettering 67558d15ec cgroup: extend cg_mask_supported() comment a bit 2018-11-23 13:41:37 +01:00
Lennart Poettering 27adcc9737 cgroup: be more careful with which controllers we can enable/disable on a cgroup
This changes cg_enable_everywhere() to return which controllers are
enabled for the specified cgroup. This information is then used to
correctly track the enablement mask currently in effect for a unit.
Moreover, when we try to turn off a controller, and this works, then
this is indicates that the parent unit might succesfully turn it off
now, too as our unit might have kept it busy.

So far, when realizing cgroups, i.e. when syncing up the kernel
representation of relevant cgroups with our own idea we would strictly
work from the root to the leaves. This is generally a good approach, as
when controllers are enabled this has to happen in root-to-leaves order.
However, when controllers are disabled this has to happen in the
opposite order: in leaves-to-root order (this is because controllers can
only be enabled in a child if it is already enabled in the parent, and
if it shall be disabled in the parent then it has to be disabled in the
child first, otherwise it is considered busy when it is attempted to
remove it in the parent).

To make things complicated when invalidating a unit's cgroup membershup
systemd can actually turn off some controllers previously turned on at
the very same time as it turns on other controllers previously turned
off. In such a case we have to work up leaves-to-root *and*
root-to-leaves right after each other. With this patch this is
implemented: we still generally operate root-to-leaves, but as soon as
we noticed we successfully turned off a controller previously turned on
for a cgroup we'll re-enqueue the cgroup realization for all parents of
a unit, thus implementing leaves-to-root where necessary.
2018-11-23 13:41:37 +01:00
Lennart Poettering 94f344fb03 cgroup: tweak log message, so that it doesn't claim we always enable controllers when we actually disable them 2018-11-23 12:24:37 +01:00
Lennart Poettering 54b5ba1d1f cgroup: propagate errors when we cannot open cgroup.subtree_control 2018-11-23 12:24:37 +01:00
Zbigniew Jędrzejewski-Szmek baaa35ad70 coccinelle: make use of SYNTHETIC_ERRNO
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.

I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
2018-11-22 10:54:38 +01:00
Chris Down f98c25850f cgroup v2: Don't require CPU controller for CPU accounting in 4.15+
systemd only uses functions that are as of Linux 4.15+ provided
externally to the CPU controller (currently usage_usec), so if we have a
new enough kernel, we don't need to set CGROUP_MASK_CPU for
CPUAccounting=true as the CPU controller does not need to necessarily be
enabled in this case.

Part of this patch is modelled on an earlier patch by Ryutaroh Matsumoto
(see PR #9665).
2018-11-18 12:21:41 +00:00
Lennart Poettering 6415fecd4c
Merge pull request #10785 from poettering/cgroup-join-removal
remove JoinControllers= setting
2018-11-16 17:53:26 +01:00
Lennart Poettering f20db19954 cocci: simplify some if checks 2018-11-16 16:05:29 +01:00
Lennart Poettering e353faa0d6 cgroup-util: when attaching/creating cgroups in multiple hierarchies, take jointly mounted controlelrs into account
If we create a cgroup in one controller it might already have been
created in another too, if we have jointly mounted controllers. Take
that into consideration.
2018-11-16 14:54:13 +01:00
Lennart Poettering f99850a0d4 cgroup-util: FLAGS_SET()ify all things 2018-10-26 18:43:34 +02:00
Lennart Poettering 03afd78029 cgroup: when discovering which controllers the kernel supports mask with what we support
Let's use our new CGROUP_MASK_V1 and CGROUP_MASK_V2 definitions for
this.
2018-10-26 18:43:34 +02:00
Lennart Poettering ab275f2386 cgroup-util: before operating on a mounted cgroup controller check if it actually can be mounted
We now have the "BPF" pseudo-controllers. These should never be assumed
to be accessible as /sys/fs/cgroup/<controller> and not through
"cgroup.subtree_control" either, hence always check explicitly before we
go to the file system. We do this through our new CGROUP_MASK_V1 and
CGROUP_MASK_V2 definitions.
2018-10-26 18:43:34 +02:00
Lennart Poettering 604028de60 cgroup-util: disable buffering for cg_enable_everywhere() when writing to cgroup attributes
Let's better be safe than sorry.
2018-10-26 18:43:34 +02:00
Lennart Poettering 38a90d45ad cgroup-util: don't expect cg_mask_from_string()'s return value to be initialized
Also, when we fail, don't clobber the return value.

This brings the call more in-line with our usual coding style, and
removes surprises.

None of the callers seemed to care about this behaviour.
2018-10-26 18:43:34 +02:00
Lennart Poettering 0887fa711c cgroup-util: debug log if /proc/self/ns/cgroup is not available for unexpected reasons 2018-10-26 18:43:34 +02:00
Lennart Poettering 490c5a37cb tree-wide: some automatic coccinelle fixes (#10463)
Nothing fancy, just coccinelle doing its work.
2018-10-20 00:07:46 +09:00
Lennart Poettering d2b39cb606 cgroup-util: FOREACH_LINE() excorcism 2018-10-18 16:23:45 +02:00
Zbigniew Jędrzejewski-Szmek 1bcf3fc6c5 core: return true from cg_is_empty* on ENOENT 2018-10-17 17:49:57 +02:00
Roman Gushchin 084c700780 core: support cgroup v2 device controller
Cgroup v2 provides the eBPF-based device controller, which isn't currently
supported by systemd. This commit aims to provide such support.

There are no user-visible changes, just the device policy and whitelist
start working if cgroup v2 is used.
2018-10-09 09:47:51 -07:00
Roman Gushchin 17f149556a core: refactor bpf firewall support into a pseudo-controller
The idea is to introduce a concept of bpf-based pseudo-controllers
to make adding new bpf-based features easier.
2018-10-09 09:46:08 -07:00
Luke Shumaker f09e86bcaa cgroup-util: cg_kernel_controllers(): Fix comment about including "name="
Remove "arbitrary named hierarchies" from the list of things that
cg_kernel_controllers() might return, and clarify that "name="
pseudo-controllers are not included in the returned list.

/proc/cgroups does not contain "name=" pseudo-controllers, and
cg_kernel_controllers() makes no effort to enumerate them via a different
mechanism.
2018-07-20 12:12:02 -04:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Zbigniew Jędrzejewski-Szmek 65be7e0652 pid1: do not reset subtree_control on already-existing units with delegation
Fixes #8364.

Reproducer:
$ sudo systemd-run -t -p Delegate=yes bash
# mkdir /sys/fs/cgroup/system.slice/run-u6958.service/supervisor
# echo $$ > /sys/fs/cgroup/system.slice/run-u6958.service/supervisor/cgroup.procs
# echo +memory > /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control
# cat /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control
memory
# systemctl daemon-reload
# cat /sys/fs/cgroup/system.slice/run-u6958.service/cgroup.subtree_control
(empty)

With patch, the last command shows 'memory'.
2018-06-11 18:12:30 +02:00
Yu Watanabe 858d36c1ec path-util: introduce path_simplify()
The function is similar to path_kill_slashes() but also removes
initial './', trailing '/.', and '/./' in the path.
When the second argument of path_simplify() is false, then it
behaves as the same as path_kill_slashes(). Hence, this also
replaces path_kill_slashes() with path_simplify().
2018-06-03 23:39:26 +09:00
Antique 96aa6591d1 cgroup-util: fix enabling of controllers (#8816)
If enabling controller for some reason fails we need to clear error
for the FILE stream.  Enabling remaining controllers would otherwise
fail because write_string_stream_ts() checks for ferror(f) and returns
-EIO if there is one.

Broken by commit <77fa610b22>.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
2018-04-26 12:37:35 +02:00
Lennart Poettering 57ea45e11a util-lib: introduce new empty_or_root() helper (#8746)
We check the same condition at various places. Let's add a trivial,
common helper for this, and use it everywhere.

It's not going to make things much faster or much shorter, but I think a
lot more readable
2018-04-18 14:20:49 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Yu Watanabe 1cc6c93a95 tree-wide: use TAKE_PTR() and TAKE_FD() macros 2018-04-05 14:26:26 +09:00
Zbigniew Jędrzejewski-Szmek 989290dbf1 fuzz-unit-file: add __has_feature(memory_sanitizer) when skipping ListenNetlink=
https://clang.llvm.org/docs/MemorySanitizer.html#id5 documents this
check as the way to detect MemorySanitizer at compilation time. We
only need to skip the test if MemorySanitizer is used.

Also, use this condition in cg_slice_to_path(). There, the code that is
conditionalized is not harmful in any way (it's just unnecessary), so remove
the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION condition.

Fixes #8482.
2018-03-26 15:28:03 +02:00
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
Zbigniew Jędrzejewski-Szmek c028bed19b basic/cgroup-util: fix typo in debug message 2018-03-18 21:05:43 +01:00
Zbigniew Jędrzejewski-Szmek 1c56d50109 fuzz: add test case for oss-fuzz #6897 and a work-around
The orignal reproducer from oss-fuzz depends on the hostname (via %H and %c).
The hostname needs a dash for msan to report this, so a simpler case from
@evverx with the dash hardcoded is also added.

The issue is a false positive from msan, which does not instruct stpncpy
(https://github.com/google/sanitizers/issues/926). Let's add a work-around
until this is fixed.
2018-03-17 09:48:22 +01:00
Zbigniew Jędrzejewski-Szmek eef03d70c1 basic/cgroup-util: remove unused variable 2018-03-06 10:41:41 +01:00
Lennart Poettering 902c8502ad
Merge pull request #8149 from poettering/fake-root-cgroup
Properly synthesize CPU+memory accounting data for the root cgroup
2018-03-01 11:10:24 +01:00
Zbigniew Jędrzejewski-Szmek 9177fa9f2b basic/cgroup-util: simplify cg_get_keyed_attribute(), add test
I didn't like the nested loop where we'd count what we have acquired already,
since we should always know that.
2018-03-01 09:34:33 +01:00
Zbigniew Jędrzejewski-Szmek 00d4b1e684 basic: shorten the code a bit in two places
gcc complains that len might be used unitialized, but afaict, this is not true.
2018-02-26 15:47:12 +01:00
Lennart Poettering b734a4ff14 cgroup-util: rework cg_get_keyed_attribute() a bit
Let's make sure we don't clobber the return parameter on failure, to
follow our coding style. Also, break the loop early if we have all
attributes we need.

This also changes the keys parameter to a simple char**, so that we can
use STRV_MAKE() for passing the list of attributes to read.

This also makes it possible to distuingish the case when the whole
attribute file doesn't exist from one key in it missing. In the former
case we return -ENOENT, in the latter we now return -ENXIO.
2018-02-09 18:35:52 +01:00
Zbigniew Jędrzejewski-Szmek dae8b82eb9 Add mkdir_errno_wrapper() and use instead of mkdir() in various places
We'd pass pointers to mkdir and mkdir_label to call in various places. mkdir
returns the error in errno while mkdir_label returns the error directly.
2017-12-16 13:28:22 +01:00
Lennart Poettering fbd0b64f44
tree-wide: make use of new STRLEN() macro everywhere (#7639)
Let's employ coccinelle to do this for us.

Follow-up for #7625.
2017-12-14 19:02:29 +01:00
Lennart Poettering 35bbbf85e0 basic: turn off stdio locking for a couple of helper calls
These helper calls are potentially called often, and allocate FILE*
objects internally for a very short period of time, let's turn off
locking for them too.
2017-12-14 10:46:19 +01:00
Lennart Poettering 62b9bb2661 cgroup-util: merge cg_set_tasks_access() and cg-set_group_access() into one
We never use these functions seperately, hence don't bother splitting
them into to.

Also, simplify things a bit, and maintain tables for the attribute files
to chown. Let's also update those tables a bit, and include thenew
"cgroup.threads" file in it, that needs to be delegated too, according
to the documentation.
2017-11-25 17:08:21 +01:00
Daniel Lockyer 95333b2bed Replace free and nullify by mfree 2017-11-24 09:37:50 +00:00
Lennart Poettering 5e20b0a452 cgroup: properly determine cgroups zombie processes belong to
When a process becomes a zombie its cgroup might be deleted. Let's add
some minimal code to detect cases like this, so that we can still
attribute this back to the original cgroup.
2017-11-21 11:54:08 +01:00