Commit graph

4237 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 7a6d057c28
Merge pull request #9084 from yuwata/fix-8965
core: make StateDirectory= or friends works with DynamicUser= and RootDirectory=/RootImage=
2018-05-29 15:13:34 +02:00
Lennart Poettering b294e5943f core: introduce specifiers for /tmp and /var/tmp
This corresponds nicely with the specifiers we already pass for
/var/lib, /var/cache, /run and so on.

This is particular useful to update the test-path service files to
operate without guessable files, thus allowing multiple parallel
test-path invocations to pass without issues (the idea is to set $TMPDIR
early on in the test to some private directory, and then only use the
new %T or %V specifier to refer to it).
2018-05-29 11:39:15 +02:00
Yu Watanabe bbc1acaba0 core: add --dump-bus-properties option to systemd
If systemd is invoked with this option, this dumps all bus properties.
This may be useful for shell completion for `systemctl --property`.
2018-05-28 18:13:19 +09:00
Yu Watanabe 19e69a9c7a core: include sd-bus-vtable.h in dbus-*.h 2018-05-28 13:36:35 +09:00
Yu Watanabe 37c56f89d2 core: setup mount namespace when RootDirectory= and RuntimeDirectory= or friends are set
The directories specified by RuntimeDirectory= or friends are created
on host. So, it is necessary to bind-mount them on root directory.
2018-05-25 17:33:03 +09:00
Yu Watanabe 5609f6888b core: make StateDirectory= or friends works with DynamicUser= and RootDirectory=/RootImage=
The symbolic links to private directories specified by StateDirectory=
or its friends are created on the host. So, when DynamicUser= and
RootDirectory=/RootImage= are set, then the executed process cannot
access private directory.
This makes the private directories are mounted on the non-private place
when both DynamicUser= and RootDirectory=/RootImage= are set.

Fixes #8965.
2018-05-25 17:25:17 +09:00
Lennart Poettering d58ad743f9 os-util: add helpers for finding /etc/os-release
Place this new helpers in a new source file os-util.[ch], and move the
existing and related call path_is_os_tree() to it as well.
2018-05-24 17:01:57 +02:00
Lennart Poettering 1a5a177eaf fileio: accept FILE* in addition to path in parse_env_file()
Most our other parsing functions do this, let's do this here too,
internally we accept that anyway. Also, the closely related
load_env_file() and load_env_file_pairs() also do this, so let's be
systematic.
2018-05-24 17:01:57 +02:00
Lennart Poettering cdc0f9be92
Merge pull request #8817 from yuwata/cleanup-nsflags
core: allow to specify RestrictNamespaces= multiple times
2018-05-24 16:49:13 +02:00
Lennart Poettering 2ad98f977f
Merge pull request #9040 from yuwata/resolved-networkd-use-dynamic-user
Set DynamicUser= to resolved and networkd
2018-05-23 21:10:39 +02:00
Lennart Poettering 97745ac601
Merge pull request #9039 from yuwata/fix-device-allow
core: support unit specifiers in IODeviceWeight= and friends
2018-05-23 21:07:22 +02:00
Zbigniew Jędrzejewski-Szmek 14d0afb94d
Merge pull request #9065 from poettering/fixup-tab-double-newline
tree-wide: fix some TABs and double newlines
2018-05-22 17:14:48 +02:00
Lennart Poettering 8904ab86b0
Merge pull request #9062 from poettering/parse-conf-macro
add new CONFIG_PARSER_PROTOTYPE() macro
2018-05-22 16:14:49 +02:00
Zbigniew Jędrzejewski-Szmek 52d2566ac7 pid1: fix ShowStatus property
It is not const, because a) systemd can bump it on its own if
errors occur, and b) the user can change it using signals.
Also it's not boolean.

$ busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager ShowStatus
b true
$ sudo kill -SIGRTMIN+21 1
$ busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager ShowStatus
b false

Fixes #4503.
2018-05-22 16:14:20 +02:00
Lennart Poettering 56b00d0028 tree-wide: remove some double newlines in headers, too 2018-05-22 16:13:45 +02:00
Yu Watanabe fdff1da299 core: chown RuntimeDirectory= if DynamicUser= is set
When DynamicUser= is set, then RuntimeDirectory= should be always
chowned, as the service unit may enable RuntimeDirectoryPreserve=,
and the uid or gid may changed from the last run.
This also makes easier to migrate the service to use DynamicUser=.
2018-05-22 22:26:22 +09:00
Lennart Poettering a210692525 tree-wide: port over all code to the new CONFIG_PARSER_PROTOTYPE() macro
This makes most header files easier to look at. Also Emacs gets really
slow when browsing through large sections of overly long prototypes,
which is much improved by this macro.

We should probably not do something similar with too many other cases,
as macros like this might help readability for some, but make it worse
for others. But I think given the complexity of this specific prototype
and how often we use it, it's worth doing.
2018-05-22 13:18:44 +02:00
Zbigniew Jędrzejewski-Szmek 509ad7897c core/job: shortening
Follow-up for a7a7163df7.
2018-05-20 23:25:04 +09:00
Yu Watanabe c9f620bfec core: support unit specifiers in IODeviceWeight= and friends 2018-05-20 23:08:50 +09:00
Yu Watanabe 063c4b1a92 core/load-fragment: update log messages 2018-05-20 23:08:29 +09:00
Zbigniew Jędrzejewski-Szmek 424e80b4b7 rpm: add macros for common configuration dirs
%_environmnentdir /usr/lib/environment.d
%_modulesloaddir /usr/lib/modules-load.d
%_modprobedir /usr/lib/modprobe.d

This makes installing files there more convenient because people don't need to
construct the path from %_prefix/lib/… .

See https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/GBF5WJLTQVSXMHGYGBF3723ZYCWFBR7C/.
2018-05-19 17:02:59 +02:00
Zbigniew Jędrzejewski-Szmek 030caa6501 rpm: simplify redirects to /dev/null 2018-05-19 13:04:57 +02:00
Zbigniew Jędrzejewski-Szmek 28d36da64a rpm: remove confusing --user before --global
Fixes #9027.
2018-05-19 13:04:57 +02:00
David Tardon a7a7163df7 fix race between daemon-reload and other commands
When "systemctl daemon-reload" is run at the same time as "systemctl
start foo", the latter might hang. That's because commands like start
wait for JobRemoved signal to know when the job is finished. But if the
job is finished during reloading, the signal is never sent.

The hang can be easily reproduced by running

    # for ((N=1; N>0; N++)) ; do echo $N ; systemctl daemon-reload ; done
    # for ((N=1; N>0; N++)) ; do echo $N ; systemctl start systemd-coredump.socket ; done

in two different terminals. The start command will hang after 1-2
iterations.

This keeps track of jobs that were started before reload and finished
during it and sends JobRemoved after the reload has finished.
2018-05-19 11:37:00 +02:00
Lennart Poettering 6f8fa29465
Merge pull request #8981 from keszybz/ratelimit-and-dbus
Ratelimit renaming and dbus error message fix
2018-05-18 21:38:30 +02:00
Franck Bui 752bcb770b core: keep the kernel coredump defaults when systemd-coredump is disabled
If systemd-coredump is disabled (at build time), PID1 should keep the
(old) kernel defaults as they are.
2018-05-18 20:37:54 +02:00
Lennart Poettering c385b10a13
Merge pull request #8993 from keszybz/sd-resolve-coverity-and-related-fixes
sd-resolve coverity and related fixes
2018-05-18 20:30:12 +02:00
Lennart Poettering 0612ac38a7
Merge pull request #8985 from yuwata/bus-macro-3
tree-wide: use BUS_DEFINE_PROPERTY_GET* macros
2018-05-18 20:25:52 +02:00
Lennart Poettering c55b280158
Merge pull request #9026 from yuwata/followup-9021
core: refuse StateDirectory=private
2018-05-18 20:02:43 +02:00
Yu Watanabe 5e2d3a5496 core: use free_and_replace() 2018-05-18 17:35:23 +09:00
Yu Watanabe e760d687dc core: fix coding style 2018-05-18 17:34:59 +09:00
Zbigniew Jędrzejewski-Szmek 7fbb5dd5e2
Merge pull request #8940 from poettering/nspawn-attrs
nspawn: make a couple of additional container parameters configurable
2018-05-18 10:33:10 +02:00
Yu Watanabe 8994a11790 core: refuse StateDirectory=private
Follow-up for e886568873 (#9021).
2018-05-18 13:30:21 +09:00
Lennart Poettering e886568873 core: refuse StateDirectory=private, as our internal DynamicUser=1 symlink is called that way
Let's better be safe than sorry.
2018-05-18 10:59:15 +09:00
Lennart Poettering 9f8168eb23 process-util: add new helper call for adjusting the OOM score
And let's make use of it in execute.c
2018-05-17 20:47:21 +02:00
Lennart Poettering e9eb2c02f0 basic: split parsing of the OOM score adjust value into its own function in parse-util.c
And port config_parse_exec_oom_score_adjust() over to use it.

While we are at it, let's also fix config_parse_exec_oom_score_adjust()
to accept an empty string for turning off OOM score adjustments set
earlier.
2018-05-17 20:47:21 +02:00
Lennart Poettering 34a5df58da rlimit-util: introduce setrlimit_closest_all()
This new call applies all configured resource limits in one.
2018-05-17 20:40:04 +02:00
Lennart Poettering 31ce987c2b rlimit-util: add a common destructor call for arrays of struct rlimit 2018-05-17 20:36:52 +02:00
Lennart Poettering 4f424df760 core: move config_parse_limit() to the generic conf-parser.[ch]
That way we can use it in nspawn.

Also, while we are at it, let's rename the call config_parse_rlimit(),
i.e. insert the "r", to clarify what kind of limit this is about.
2018-05-17 20:36:52 +02:00
Lennart Poettering 6550c24c7f rlimit-util: rework rlimit_{from|to}_string() to work without "Limit" prefix
let's make the call more generic, so that we can also easily use it for
parsing "RLIMIT_xyz" style constants.
2018-05-17 20:36:52 +02:00
Yu Watanabe a8f2b6912e core: systemd1.manage-unit-files policy implies systemd1.manage-units
This makes e.g. `systemctl enable --now` ask password only once.

Follow-up for b07abe63d3abf03df559f7cb2c9863943df22274.
2018-05-18 00:02:58 +09:00
Yu Watanabe 51b66c7a8a core: systemd1.manage-unit-files policy implies systemd1.reload-daemon
Closes #5013.
2018-05-15 15:01:05 -07:00
Felipe Sateler 57b7a260c2 core: undo the dependency inversion between unit.h and all unit types 2018-05-15 14:24:34 -04:00
Felipe Sateler 90a8f0b9a9 core: Break circular dependency between unit.h and cgroup.h 2018-05-15 14:23:32 -04:00
Yu Watanabe 14f7edb094 core/dbus-unit: do not pass whole Unit object 2018-05-15 23:11:26 +09:00
Yu Watanabe 874bd264a0 core/dbus-unit: introduce unit_can_{start,stop,isolate}_refuse_manual() functions 2018-05-15 23:11:20 +09:00
Yu Watanabe 92c23c5a70 core: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-15 23:11:16 +09:00
Alan Jenkins 4330dc03a0 service: FileDescriptorStoreMax should also imply NotifyAccess
Commenting out "WatchdogTimeout=3min" in systemd-logind.service causes
NotifyAccess to go from "main" to "none", breaking support for logind
restart.  Let's fix that.
2018-05-15 12:33:56 +02:00
Zbigniew Jędrzejewski-Szmek 6978efcffb core/mount-setup: remove part of check which is always true
f1470e424b removed one check, but missed a similar
one a few lines down.

CID #1390949.
2018-05-14 08:50:00 +02:00
Yu Watanabe af4fa99d6a core: use _cleanup_set_free_ instread of _cleanup_(set_freep) 2018-05-14 14:13:57 +09:00
Zbigniew Jędrzejewski-Szmek 886eaf052d core: remove two unnecessary newlines 2018-05-13 22:08:30 +02:00
Zbigniew Jędrzejewski-Szmek 930c124c3f pid1: do not write invalid utf-8 in error message
We'd write a sequence that was invalid unicode and this caused the d-bus
connection to be terminated:

$ busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1/unit/dbus_2esocket org.freedesktop.systemd1.Unit SubState
s "running"
$ busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1/unit/dbus_e2socket org.freedesktop.systemd1.Unit SubState
Remote peer disconnected
$ busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1/unit/dbus_e2socket org.freedesktop.systemd1.Unit SubState
(hangs)

Fixes #8978.
2018-05-13 22:08:30 +02:00
Zbigniew Jędrzejewski-Szmek 7994ac1d85 Rename ratelimit_test to ratelimit_below
When I see "test", I have to think three times what the return value
means. With "below" this is immediately clear. ratelimit_below(&limit)
sounds almost like English and is imho immediately obvious.

(I also considered ratelimit_ok, but this strongly implies that being under the
limit is somehow better. Most of the times this is true, but then we use the
ratelimit to detect triple-c-a-d, and "ok" doesn't fit so well there.)

C.f. a1bcaa07.
2018-05-13 22:08:30 +02:00
Yu Watanabe 3ff52e8f52 dbus-manager: introduce property_get_{hashmap,set}_size() 2018-05-13 12:21:17 +09:00
Yu Watanabe d1d8547137 dbus-unit: check userdata before obtaining data 2018-05-13 12:21:15 +09:00
Yu Watanabe cb7f88fcf4 dbus-unit: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-13 12:21:13 +09:00
Yu Watanabe 6bfb45bea4 dbus-swap: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-13 12:21:11 +09:00
Yu Watanabe a54f28bc1e dbus-socket: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-13 12:21:09 +09:00
Yu Watanabe f724fd4c25 dbus-mount: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-13 12:21:08 +09:00
Yu Watanabe 23c9a63a98 dbus-manager: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-13 12:21:06 +09:00
Yu Watanabe 019b34cae6 dbus-execute: use BUS_DEFINE_PROPERTY_GET* macros 2018-05-13 12:21:04 +09:00
David Tardon 95f14a3e21 core: use automatic cleanup more 2018-05-12 18:29:41 +02:00
David Tardon c0a1bfacfe systemd-analyze: make dump work for large # of units
If there is a large number of units, the size of the generated dump
string can overstep DBus message size limit. So let's pass that string
via a fd.
2018-05-11 08:11:02 -07:00
Lennart Poettering e4915c2797
Merge pull request #8953 from yuwata/bus-macro
core: simplify dbus properties
2018-05-10 22:51:17 -07:00
Yu Watanabe 945403e6ed path-util: introduce empty_to_root() and use it many places 2018-05-11 01:47:33 +09:00
Yu Watanabe 54138a8de1 core: merge duplicated functions 2018-05-11 01:41:06 +09:00
Yu Watanabe 79a603758d core: send NULL instead of empty string 2018-05-11 01:22:49 +09:00
Yu Watanabe 9d5527f26e core: use offsetof() for Syslog{Level,Facility} dbus properties 2018-05-11 00:39:52 +09:00
Zbigniew Jędrzejewski-Szmek 717fb9bc24
Merge pull request #8950 from dtardon/cleanup
use automatic cleanup more
2018-05-10 17:23:40 +02:00
Yu Watanabe 0515650329 core: use bus_property_get_*() functions instead of NULL 2018-05-10 23:02:57 +09:00
Yu Watanabe cf9d43a8e0 core: drop property_get_syscall_errno() 2018-05-10 22:36:13 +09:00
Yu Watanabe c0159e2036 core: drop property_get_{capability_bounding_set,ambient_capabilities}() 2018-05-10 22:32:12 +09:00
Yu Watanabe 491eecb376 core: use BUS_DEFINE_PROPERTY_GET_ENUM() macro 2018-05-10 22:26:59 +09:00
Yu Watanabe 73b84e922e core: drop 'bus_' prefix from bus_property_get_protect_{home,system}() 2018-05-10 22:26:27 +09:00
Yu Watanabe b3bc33e6c6 core: simplify property_get_cpu_affinity() 2018-05-10 22:25:00 +09:00
David Tardon 921b598716 basic: use automatic cleanup more 2018-05-10 14:04:30 +02:00
Zbigniew Jędrzejewski-Szmek f1470e424b core/mount-setup: remove part of check which is always true
k was set to join_controllers at this point and only incremented, so
it cannot be null at this point.

CID #1390949.
2018-05-10 02:03:23 +02:00
Yu Watanabe 130d3d22e9 tree-wide: use strv_free_and_replace() macro 2018-05-10 00:57:34 +09:00
Zbigniew Jędrzejewski-Szmek 6b1ca2a948
Merge pull request #8898 from poettering/nspawn-mount-block
some nspawn cgroup and mount lock-down fixes
2018-05-08 12:54:58 +02:00
Yu Watanabe 348b44372f meson: generate m4 preprocessor from config.h (#8914) 2018-05-07 11:17:35 +02:00
Yu Watanabe aa9d574de9 load-fragment: allow to specify RestrictNamespaces= multiple times
If multiple RestrictNamespaces= settings are set, then merge the settings.
This also drops supporting "~yes" and "~no".
2018-05-05 11:07:37 +09:00
Yu Watanabe 86c2a9f1c2 nsflsgs: drop namespace_flag_{from,to}_string()
This also drops namespace_flag_to_string_many_with_check(), and
renames namespace_flag_{from,to}_string_many() to
namespace_flags_{from,to}_string().
2018-05-05 11:07:37 +09:00
Yu Watanabe a3f8b0ef45 nsflags: drop namespace_flag_to_string_many_with_check()
We always ignore the unused bits. So, it is not necessary to check
them.
2018-05-05 11:07:37 +09:00
Lennart Poettering 4e2c0a227e namespace: extend list of masked files by ProtectKernelTunables=
This adds a number of entries nspawn already applies to regular service
namespacing too. Most importantly let's mask /proc/kcore and
/proc/kallsyms too.
2018-05-03 17:46:31 +02:00
Lennart Poettering fe80fcc7e8 mount-setup: add a comment that the character/block device nodes are "optional" (#8893)
if we lack privs to create device nodes that's fine, and creating
/run/systemd/inaccessible/chr or /run/systemd/inaccessible/blk won't
work then. Document this in longer comments.

Fixes: #4484
2018-05-03 23:10:35 +09:00
Yu Watanabe 29a3db75fd util: rename signal_from_string_try_harder() to signal_from_string()
Also this makes the new `signal_from_string()` function reject
e.g, `SIG3` or `SIG+5`.
2018-05-03 16:52:49 +09:00
Lennart Poettering c1c80f6c37
Merge pull request #8866 from yuwata/fix-8842
core: disable namespace sandboxing for '+' prefixed lines
2018-05-02 16:15:26 +02:00
Lennart Poettering 9fc0345551
Merge pull request #8815 from poettering/get-unit-by-cgroup
add new GetUnitByControlGroup API
2018-05-02 10:51:48 +02:00
Yu Watanabe b5a33299b0 core: disable namespace sandboxing for '+' prefixed lines
Fixes #8842.
2018-05-01 13:44:06 +09:00
Lennart Poettering d4fd1cf208 core: enforce that scope units can be started only once
Scope units are populated from PIDs specified by the bus client. We do
that when a scope is started. We really shouldn't allow scopes to be
started multiple times, as the PIDs then might be heavily out of date.
Moreover, clients should have the guarantee that any scope they allocate
has a clear runtime cycle which is not repetitive.
2018-04-27 21:52:45 +02:00
Lennart Poettering da6053d0a7 tree-wide: be more careful with the type of array sizes
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.

Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.

So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.

This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:

1. strv_length()' return type becomes size_t

2. the unit file changes array size becomes size_t

3. DNS answer and query array sizes become size_t

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
2018-04-27 14:29:06 +02:00
Lennart Poettering 385f3a0d8d
Merge pull request #7599 from keszybz/slice-templates
Make user@.service independent of logind
2018-04-26 21:39:05 +02:00
Lennart Poettering be737420b7
Merge pull request #8798 from yuwata/follow-up-8675
device: fix serialization and deserialization of DeviceFound
2018-04-26 21:19:16 +02:00
Yu Watanabe d48013f8a1 core: an empty string resets delegate controllers but enables Delegate= (#8826)
This partially reverts ff1b8455c2.
2018-04-26 15:40:45 +02:00
Zbigniew Jędrzejewski-Szmek 4d86c235b8 core: include Found state in device dumps
In particular, this confirms that the Found state needs to remain a bit-field:

$ systemd-analyze dump |grep 'Found: '|sort |uniq -c
    105 	Found: found-udev
      3 	Found: found-udev,found-mount
      1 	Found: found-udev,found-swap
2018-04-26 10:19:27 +02:00
Yu Watanabe 75d0aba49b device: fix serialization and deserialization of DeviceFound
DeviceFound is a bit flag. So, it is necessary to support the case
that multiple bits are set.

Follow-up for 918e6f1c01.
2018-04-25 22:05:00 +09:00
Zbigniew Jędrzejewski-Szmek 22ce84de18 meson: do not link libsystemd_static into libcore (#8813)
(or in terms of the names of the actual files on disk, do not link
libsystemd-shared-238.a into libcore.a).

libsystemd_static is linked into libsystemd_shared, which in turn means that
anything that links to libcore and libsystemd_shared will get libsystemd_static
twice:

$ cc -o systemd 'systemd@exe/src_core_main.c.o' -Wl,--no-undefined -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -pie -DVALGRIND -Wl,--start-group src/core/libcore.a src/shared/libsystemd-shared-238.a src/shared/libsystemd-shared-238.so -pthread -lrt -lseccomp -lselinux -lmount -lblkid -Wl,--end-group -lseccomp -lpam -L/lib64 -laudit -lkmod -lmount -lrt -lcap -lacl -lcryptsetup -lgcrypt -lip4tc -lip6tc -lseccomp -lselinux -lidn -llzma -llz4 -lblkid '-Wl,-rpath,$ORIGIN/src/shared' -Wl,-rpath-link,/home/zbyszek/src/systemd/build/src/shared

This propagation of the dependency seems correct (in the sense that meson is
doing the expected thing based on the given configuration). Linking was done
this way in the original meson conversion. I was probably trying to get
everything to compile and link, I'm not sure why this particular choice was
made. In the meantime, meson has gotten better at propagating dependencies, so
it's possible that this had slightly different effect in the original
conversion, but I did not verify this. Either way, I think we should drop this.

With the patch:
$ cc -o systemd 'systemd@exe/src_core_main.c.o' -Wl,--no-undefined -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -pie -DVALGRIND -Wl,--start-group src/core/libcore.a src/shared/libsystemd-shared-238.so -pthread -lrt -lseccomp -lselinux -lmount -Wl,--end-group -lblkid -lrt -lseccomp -lpam -L/lib64 -laudit -lkmod -lselinux -lmount '-Wl,-rpath,$ORIGIN/src/shared' -Wl,-rpath-link,/home/zbyszek/src/systemd/build/src/shared

This is more correct because we're not linking the same code twice.
With the patch, libystemd_static is used in exactly four places:
- src/shared/libsystemd-shared-238.so
- src/udev/libudev.so.1.6.10
- pam_systemd.so
- test-bus-error
(compared to a bunch more executables before, including systemd,
systemd-analyze, test-hostname, test-ns, etc.)

Size savings are also noticable:

$ size /var/tmp/inst?/usr/lib/systemd/libsystemd-shared-238.so
   text	   data	    bss	    dec	    hex	filename
2397826	 578488	  15920	2992234	 2da86a	/var/tmp/inst1/usr/lib/systemd/libsystemd-shared-238.so
2397826	 578488	  15920	2992234	 2da86a	/var/tmp/inst2/usr/lib/systemd/libsystemd-shared-238.so

$ size /var/tmp/inst?/usr/lib/systemd/systemd
   text	   data	    bss	    dec	    hex	filename
1858790	 261688	   9320	2129798	 207f86	/var/tmp/inst1/usr/lib/systemd/systemd
1556358	 258704	   8072	1823134	 1bd19e	/var/tmp/inst2/usr/lib/systemd/systemd

$ du -s /var/tmp/inst?
52216	/var/tmp/inst1
50844	/var/tmp/inst2

https://github.com/google/oss-fuzz/issues/1330#issuecomment-384054530 might be related.
2018-04-25 13:47:18 +02:00
Lennart Poettering 267dd427da core: add a new GetUnitByControlGroup() bus call
This is useful for foreign container runtimes implementing the OCI
runtime spec, which only wants to deal with cgroup paths. There's
already an API to translate units into cgroup paths, with this we add
the reverse.
2018-04-25 13:43:48 +02:00
Lennart Poettering 04b56d9d9c core: hide snapshot method calls from introspection data
They are obsolete, let's hide them
2018-04-25 13:31:53 +02:00
Lennart Poettering 8e766630f0 tree-wide: drop redundant _cleanup_ macros (#8810)
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.

In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.

Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphesize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.

Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so one, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.

Note that before this patch some of the APIs (notable libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
2018-04-25 12:31:45 +02:00
Lennart Poettering 81183d9b99
Merge pull request #8802 from keszybz/errno-reform
Errno reform
2018-04-24 20:25:27 +02:00
Zbigniew Jędrzejewski-Szmek a1bcaa075b core/device: avoid bogus errno use and invert ratelimit_test()
I'm not sure if I understand the original code. AFAICS, errno does not
have to be set at all in this callback.

ratelimit_test() returns positive if we are under limit. The code would only
log if the condition happened very often, which I assume is not inteded, and
this check was supposed to prevent too much logging.
2018-04-24 14:10:27 +02:00
Zbigniew Jędrzejewski-Szmek 4355f1c9da Fix three uses of bogus errno value in logs (and returned value in one case) 2018-04-24 14:10:27 +02:00
Zbigniew Jędrzejewski-Szmek b1c05b98bf tree-wide: avoid assignment of r just to use in a comparison
This changes
  r = ...;
  if (r < 0)
to
  if (... < 0)
when r will not be used again.
2018-04-24 14:10:27 +02:00
Zbigniew Jędrzejewski-Szmek a1113e0865 core/manager: make manager_enumerate() static 2018-04-24 11:44:19 +02:00
Zbigniew Jędrzejewski-Szmek 94b01dae47 core/manager: trivial simplification 2018-04-24 11:44:19 +02:00
Zbigniew Jędrzejewski-Szmek 250e9fadbc Add %j/%J unit specifiers
Those are quite similar to %i/%I, but refer to the last dash-separated
component of the name prefix.

The new functionality of dash-dropins could largely supersede the template
functionality, so it would be tempting to overload %i/%I. But that would
not be backwards compatible. So let's add the two new letters instead.
2018-04-24 10:05:04 +02:00
Franck Bui 036d2eefae device: skip deserialization of device units when udevd is not running
Do not try to party initialize a device during deserialization if it's not
known by udev (anymore) and therefore hasn't been seen during device
enumeration.

The device unit in this case has not been initialized properly and setting it
in the "plugged" state can be confusing.

Actually this happens during every boots when PID switches to the new rootfs:
PID is reexecuted and enumerates devices but since udev is not running, the
list of enumerated devices is empty.
2018-04-20 17:49:28 +02:00
Franck Bui 918e6f1c01 device: make sure to always retroactively start device dependencies
PID1 updates the state of device units upon 2 different events:

 - when it processes an event sent by udev and in this case the device deps are
   started if the device enters in the "plugged" state.

 - when it enumerates all devices during its startup or when it is asked to
   reload its configuration data but in this case the device deps (if any) are
   not retroactively started.

When udev processes a new "add" kernel event, it first registers the new device
in its databases then sends an event to systemd.

If for any reason, systemd is asked to reload its configuration between the
previous 2 steps, it might see for the first time the new device while scanning
/sys for all devices. Only during a second step, udev will send the event for
the new device.

In this peculiar case the device deps wont be started (even though the device
is first seen by PID1).

Indeed when reloading its configurations, PID1 will put the device unit in the
"plugged" state but without starting the device deps. Thereafter PID1 will get
the event from udev for the new device but the device unit will be in "plugged"
state already therefore it won't see any need to start the device dependencies.

Rather than assuming that during the reloading of systemd manager configuration
all devices listed in udev DBs have been already processed and should be put in
the "plugged" state (done by device_coldplug()), this patch does that only for
devices which have been processed via an udev event (device_dispatch_io())
previously. In this case we set "d->found" to "DEVICE_FOUND_UDEV" and we make
also sure to no more initialize "d->found" while enumerating devices. Instead
this field is now saved/restored while devices are serialized.
2018-04-20 17:49:28 +02:00
Lennart Poettering 7a9a0c05d4
Merge pull request #8765 from poettering/test-fixes
some short fixes for the tests
2018-04-19 16:18:46 +02:00
Lennart Poettering 5d13a15b1d tree-wide: drop spurious newlines (#8764)
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.

It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
2018-04-19 12:13:23 +02:00
Lennart Poettering 8f63253149 core: don't export per-unit metadata files in test mode
We shouldn't clobber the host's /run directories with metadata we export
for our units when we run in test mode.
2018-04-19 11:30:18 +02:00
Zbigniew Jędrzejewski-Szmek ecae73d74a core: do not allow Delegate= on unsupported unit types 2018-04-18 20:07:00 +02:00
Zbigniew Jędrzejewski-Szmek ff1b8455c2 core: fix resetting of Delegate= and properly ignore invalid assignment
The default is false not true. If we say "ignoring" we must return 0.
2018-04-18 20:07:00 +02:00
Lennart Poettering 7aab22308e
Merge pull request #8708 from poettering/namespace-repeat
pid1 namespacing fixes
2018-04-18 18:46:44 +02:00
Lennart Poettering 57ea45e11a util-lib: introduce new empty_or_root() helper (#8746)
We check the same condition at various places. Let's add a trivial,
common helper for this, and use it everywhere.

It's not going to make things much faster or much shorter, but I think a
lot more readable
2018-04-18 14:20:49 +02:00
Lennart Poettering 088696fe29 namespace: rework how we resolve symlinks in mount points
Before this patch we'd resolve all symlinks of bind mounts and other
mount points to establish for a service in advance, and only then start
mounting them. This is problematic, if symlink chains jump around
between directories in a namespace tree, so that to resolve a specific
symlink chain we need to establish another mount already. A typical case
where this happens is if /etc/resolv.conf is a symlink to some file in
/run: in that case we'd normally resolve and mount /etc/resolv.conf
early on, but that's broken, as to do this properly we'd need to resolve
/etc/resolv.conf first, then figure out that /run needs to be mounted
before we can proceed, and thus reorder the order in which we apply
mounts dynamically.

With this change, whenever we are about to apply a mount, we'll do a
single step of the symlink normalization process, patch the mount entry
accordingly, and then sort the list of mounts to establish again, taking
the new path into account. This means that we can correctly deal with
the example above: we might start with wanting to mount /etc/resolv.conf
early, but after resolving it to the path in /run/ we'd push it to the
end of the list, ensuring that /run is mounted first.

(Note that this also fixes another bug: we were following symlinks on
the bind mount source relative to the root directory of the service,
rather than of the host. That's wrong though as we explicitly document
tha the source of bind mounts is always on the host.)
2018-04-18 14:17:50 +02:00
Lennart Poettering e871786273 namespace: improve logging when creating mount source nodes 2018-04-18 14:15:48 +02:00
Lennart Poettering f8b64b5723 namespace: split out calls to normalize mount entry list into new function 2018-04-18 14:15:48 +02:00
Lennart Poettering c9ef8573be namespace: don't consider raw image read-only if /home in it is writable 2018-04-18 14:15:48 +02:00
Lennart Poettering 12777909c9
Merge pull request #8417 from brauner/2018-03-09/add_bind_mount_fallback_to_private_devices
core: fall back to bind-mounts for PrivateDevices= execution environments
2018-04-18 11:56:56 +02:00
Lennart Poettering 2cb36f7c1e
Merge pull request #8575 from keszybz/non-absolute-paths
Do not require absolute paths in ExecStart and friends
2018-04-17 15:54:10 +02:00
Zbigniew Jędrzejewski-Szmek 5008da1ec1 systemd: do not require absolute paths in ExecStart
Absolute paths make everything simple and quick, but sometimes this requirement
can be annoying. A good example is calling 'test', which will be located in
/usr/bin/ or /bin depending on the distro. The need the provide the full path
makes it harder a portable unit file in such cases.

This patch uses a fixed search path (DEFAULT_PATH which was already used as the
default value of $PATH), and if a non-absolute file name is found, it is
immediately resolved to a full path using this search path when the unit is
loaded. After that, everything behaves as if an absolute path was specified. In
particular, the executable must exist when the unit is loaded.
2018-04-16 16:09:46 +02:00
Zbigniew Jędrzejewski-Szmek 4109ede778 core/manager: split out function to verify that unit is loaded and not masked
No functional change.
2018-04-16 16:07:27 +02:00
Giuseppe Scrivano ef42f561fc src/core/dbus-cgroup.c: fix typo contoller -> controller (#8717)
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-04-14 11:06:11 +02:00
Lennart Poettering b119facd27 core: minor coding style changes 2018-04-13 11:34:48 +02:00
Lennart Poettering 77cc47d8b4 load-dropin: rename variable
We are converting the unit name into its template, hence name the
variable that way, instead of the misleading 'prefix'.
2018-04-13 11:34:48 +02:00
Zbigniew Jędrzejewski-Szmek af984e137e core/namespace: rework the return semantics of clone_device_node yet again
Returning 0 on not-found/wrong-type is confusing. Let's return -ENXIO in that
case instead, and explicitly ignore it in the call site where we want to do that.
I think this is clearer and less likely to be used errenously in case another
call site is added.

C.f. 152c475f95 and 98b1d2b8d9.
2018-04-12 18:15:33 +02:00
Christian Brauner 1649861744 core: fall back to bind-mounts for PrivateDevices= execution environments
In environments where CAP_MKNOD is not available or inside
user namespaces it is still desirable to enable services to use
PrivateDevices= . So fall back to using bind-mounts on EPERM.
2018-04-12 18:15:12 +02:00
Lennart Poettering 4d09e1c8ba
Merge pull request #8676 from keszybz/drop-license-boilerplate
Drop license boilerplate
2018-04-10 14:53:31 +02:00
Zbigniew Jędrzejewski-Szmek e9e8cbc83a core: minor comment update 2018-04-07 20:05:58 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Yu Watanabe 66f5730471
core/device: remove unnecessary check (#8661)
Follow-up for 0dfb0a0abd.
2018-04-06 15:45:13 +09:00
Yu Watanabe 0dfb0a0abd core/device: trivial simplification 2018-04-05 14:26:34 +09:00
Yu Watanabe 1cc6c93a95 tree-wide: use TAKE_PTR() and TAKE_FD() macros 2018-04-05 14:26:26 +09:00
Evgeny Vereshchagin f6c63f6fc9 core: skip the removal of cgroups in the TEST_RUN_MINIMAL mode (#8622)
When `systemd` is run in the TEST_RUN_MINIMAL mode, it doesn't really
set up cgroups, so it shouldn't try to remove anything.

Closes https://github.com/systemd/systemd/issues/8474.
2018-04-03 15:04:22 +02:00
Zbigniew Jędrzejewski-Szmek 56fbd7187a sd-bus: allow description to be set for system/user busses (#8594)
sd_bus_open/sd_bus_open_system/sd_bus_open_user are convenient, but
don't allow the description to be set. After they return, the bus is
is already started, and sd_bus_set_description() fails with -EBUSY.
It would be possible to allow sd_bus_set_description() to update the
description "live", but messages are already emitted from sd_bus_open
functions, so it's better to allow the description to be set in
sd_bus_open/sd_bus_open_system/sd_bus_open_user.

Fixes message like:
Bus n/a: changing state UNSET → OPENING
2018-03-29 16:14:11 +02:00
Yu Watanabe c75436067f tree-wide: remove unused variables (#8612) 2018-03-29 12:50:50 +02:00
Lennart Poettering 771b7ead84 machine-image,mount-setup: minor coding style fixes 2018-03-28 22:04:58 +02:00
Krzysztof Nowicki 6f7729c176 core: dont't remount /sys/fs/cgroup for relabel if not needed (#8595)
The initial fix for relabelling the cgroup filesystem for
SELinux delivered in commit 8739f23e3 was based on the assumption that
the cgroup filesystem is already populated once mount_setup() is
executed, which was true for my system. What I wasn't aware is that this
is the case only when another instance of systemd was running before
this one, which can happen if systemd is used in the initrd (for ex. by
dracut).

In case of a clean systemd start-up the cgroup filesystem is actually
being populated after mount_setup() and does not need relabelling as at
that moment the SELinux policy is already loaded. Since however the root
cgroup filesystem was remounted read-only in the meantime this operation
will now fail.

To fix this check for the filesystem mount flags before relabelling and
only remount ro->rw->ro if necessary and leave the filesystem read-write
otherwise.

Fixes #7901.
2018-03-28 13:36:33 +02:00
Lennart Poettering ce9aa31496
Merge pull request #8600 from keszybz/oss-fuzz-again
Fuzzing- and test-related fixes
2018-03-28 13:01:37 +02:00
Zbigniew Jędrzejewski-Szmek 27fe58b77b core/main: preserve return value under valgrind 2018-03-28 10:38:45 +02:00
Filipe Brandenburger 2ef044ea1e core/socket: use chase_symlinks to find binary inside chroot when looking for SELinux label (#8591)
This is a follow up for this comment from @poettering:
https://github.com/systemd/systemd/pull/8405#discussion_r175719214

This updates PR #8405.

Tested manually using the same commands in
https://lists.freedesktop.org/archives/systemd-devel/2018-March/040478.html.
2018-03-28 09:00:42 +02:00
Dimitri John Ledkov e64c2d0b5f core: use setreuid/setregid trick to create session keyring with right ownership (#8447)
Re-use the hacks used to link user keyring, when creating the session
keyring. This way changing ownership of the keyring is not required, and thus
incovation_id can be correctly created in restricted environments.

Creating invocation_id with root permissions works and linking it into session
keyring works, as at that point session keyring is possessed.

Simple way to validate this is with following commands:

$ journalctl -f &
$ sudo systemd-run --uid 1000 /bin/sh -c 'keyctl describe @s; keyctl list @s; keyctl read `keyctl search @s user invocation_id`'

which now works in LXD containers as well as on the host.

Fixes: https://github.com/systemd/systemd/issues/7655
2018-03-27 12:58:10 +02:00
Lennart Poettering 08c849815c label: rework label_fix() implementations (#8583)
This reworks the SELinux and SMACK label fixing calls in a number of
ways:

1. The two separate boolean arguments of these functions are converted
   into a flags type LabelFixFlags.

2. The operations are now implemented based on O_PATH. This should
   resolve TTOCTTOU races between determining the label for the file
   system object and applying it, as it it allows to pin the object
   while we are operating on it.

3. When changing a label fails we'll query the label previously set, and
   if matches what we want to set anyway we'll suppress the error.

Also, all calls to label_fix() are now (void)ified, when we ignore the
return values.

Fixes: #8566
2018-03-27 07:38:26 +02:00
Zbigniew Jędrzejewski-Szmek ffb3c2bd70
Merge pull request #8554 from poettering/chase-trail-slash
fs-util: add new CHASE_TRAIL_SLASH flag for chase_symlinks()
2018-03-26 18:00:08 +02:00
Michael Olbrich 227b8a762f core: don't include libmount.h in a header file (#8580)
linux/fs.h sys/mount.h, libmount.h and missing.h all include MS_*
definitions.

To avoid problems, only one of linux/fs.h, sys/mount.h and libmount.h
should be included. And missing.h must be included last.

Without this, building systemd may fail with:

In file included from [...]/libmount/libmount.h:31:0,
                 from ../systemd-238/src/core/manager.h:23,
                 from ../systemd-238/src/core/emergency-action.h:37,
                 from ../systemd-238/src/core/unit.h:34,
                 from ../systemd-238/src/core/dbus-timer.h:25,
                 from ../systemd-238/src/core/timer.c:26:
[...]/sys/mount.h:57:2: error: expected identifier before numeric constant
2018-03-26 17:34:53 +02:00
Lennart Poettering 12b6b3b7a4
Merge pull request #8562 from keszybz/docs
Man page and log message fixes
2018-03-26 15:34:39 +02:00
Zbigniew Jędrzejewski-Szmek 5ce6e7f525 core/service: rework the hold-off time over message
"hold-off" is apparently confusing, because we also have HoldoffTimeoutSec=.
Let's use RestartSec= directly in the message.

Fixes #5472.
2018-03-24 14:22:42 +01:00
Lennart Poettering be6bca47ec coccinelle: run no-if-assignments.cocci again 2018-03-23 16:33:38 +01:00
Michal Sekletar 19496554e2 core: delay adding target dependencies until all units are loaded and aliases resolved (#8381)
Currently we add target dependencies while we are loading units. This
can create ordering loops even if configuration doesn't contain any
loop. Take for example following configuration,

$ systemctl get-default
multi-user.target

$ cat /etc/systemd/system/test.service
[Unit]
After=default.target

[Service]
ExecStart=/bin/true

[Install]
WantedBy=multi-user.target

If we encounter such unit file early during manager start-up (e.g. load
queue is dispatched while enumerating devices due to SYSTEMD_WANTS in
udev rules) we would add stub unit default.target and we order it Before
test.service. At the same time we add implicit Before to
multi-user.target. Later we merge two units and we create ordering cycle
in the process.

To fix the issue we will now never add any target dependencies until we
loaded all the unit files and resolved all the aliases.
2018-03-23 15:28:06 +01:00
Lennart Poettering 959071cac2
Merge pull request #8552 from keszybz/test-improvements
Test and diagnostics improvements
2018-03-23 15:26:54 +01:00
Zbigniew Jędrzejewski-Szmek 37c1d5e97d tree-wide: warn when a directory path already exists but has bad mode/owner/type
When we are attempting to create directory somewhere in the bowels of /var/lib
and get an error that it already exists, it can be quite hard to diagnose what
is wrong (especially for a user who is not aware that the directory must have
the specified owner, and permissions not looser than what was requested). Let's
print a warning in most cases. A warning is appropriate, because such state is
usually a sign of borked installation and needs to be resolved by the adminstrator.

$ build/test-fs-util

Path "/tmp/test-readlink_and_make_absolute" already exists and is not a directory, refusing.
   (or)
Directory "/tmp/test-readlink_and_make_absolute" already exists, but has mode 0775 that is too permissive (0755 was requested), refusing.
   (or)
Directory "/tmp/test-readlink_and_make_absolute" already exists, but is owned by 1001:1000 (1000:1000 was requested), refusing.

Assertion 'mkdir_safe(tempdir, 0755, getuid(), getgid(), MKDIR_WARN_MODE) >= 0' failed at ../src/test/test-fs-util.c:320, function test_readlink_and_make_absolute(). Aborting.

No functional change except for the new log lines.
2018-03-23 10:26:38 +01:00
Lennart Poettering c10d6bdb89 macro: introduce new TAKE_FD() macro
This is similar to TAKE_PTR() but operates on file descriptors, and thus
assigns -1 to the fd parameter after returning it.

Removes 60 lines from our codebase. Pretty good too I think.
2018-03-22 20:30:40 +01:00
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
Lennart Poettering 62570f6f03 fs-util: add new CHASE_TRAIL_SLASH flag for chase_symlinks()
This rearranges chase_symlinks() a bit: if no special flags are
specified it will now revert to behaviour before
b12d25a8d6. However, if the new
CHASE_TRAIL_SLASH flag is specified it will follow the behaviour
introduced by that commit.

I wasn't sure which one to make the beaviour that requires specification
of a flag to enable. I opted to make the "append trailing slash"
behaviour the one to enable by a flag, following the thinking that the
function should primarily be used to generate a normalized path, and I
am pretty sure a path without trailing slash is the more "normalized"
one, as the trailing slash is not really a part of it, but merely a
"decorator" that tells various system calls to generate ENOTDIR if the
path doesn't refer to a path.

Or to say this differently: if the slash was part of normalization then
we really should add it in all cases when the final path is a directory,
not just when the user originally specified it.

Fixes: #8544
Replaces: #8545
2018-03-22 19:54:24 +01:00
Zbigniew Jędrzejewski-Szmek d50b5839b0 basic/mkdir: convert bool flag to enum
In preparation for subsequent changes...
2018-03-22 15:57:56 +01:00
juergbi 39362f6f7d main: add NoNewPrivileges config option (#8475)
This makes it possible to disable new privileges for the whole system.
2018-03-21 23:41:19 +01:00
Zbigniew Jędrzejewski-Szmek 37cbc1d579 When mangling names, optionally emit a warning (#8400)
The warning is not emitted for absolute paths like /dev/sda or /home, which are
converted to .device and .mount unit names without any fuss.

Most of the time it's unlikely that users use invalid unit names on purpose,
so let's warn them. Warnings are silenced when --quiet is used.

$ build/systemctl show -p Id hello@foo-bar/baz
Invalid unit name "hello@foo-bar/baz" was escaped as "hello@foo-bar-baz" (maybe you should use systemd-escape?)
Id=hello@foo-bar-baz.service

$ build/systemd-run --user --slice foo-bar/baz --unit foo-bar/foo true
Invalid unit name "foo-bar/foo" was escaped as "foo-bar-foo" (maybe you should use systemd-escape?)
Invalid unit name "foo-bar/baz" was escaped as "foo-bar-baz" (maybe you should use systemd-escape?)
Running as unit: foo-bar-foo.service

Fixes #8302.
2018-03-21 15:26:47 +01:00
Zbigniew Jędrzejewski-Szmek 55c36ec0c1
Merge pull request #8508 from poettering/more-cocci
two new coccinelle rules files and their results
2018-03-21 12:50:49 +01:00
Lennart Poettering 0ba6791f46
Merge pull request #8368 from yuwata/nss-systemd-getpwent
nss-systemd: make dynamic users enumerable by `getent`
2018-03-21 12:36:47 +01:00
Lennart Poettering 31dc1ca3bf move MANAGER_IS_RELOADING() check into manager_recheck_{dbus|journal}() (#8510)
Let's better check this inside of the call than before it, so that we
never issue this while reloading, even should these calls be called due
to other reasons than just the unit notify.

This makes sure the reload state is unset a bit earlier in
manager_reload() so that we can safely call this function from there and
they do the right thing.

Follow-up for e63ebf71ed.
2018-03-21 12:03:45 +01:00
Lennart Poettering ed1738a24a
Merge pull request #8487 from keszybz/oss-fuzz-fixes
Oss fuzz fixes, another batch
2018-03-21 11:50:57 +01:00
Lennart Poettering 2062ada74c selinux: let's fully (and statically) initialize log callback union (#8512)
We can make this const and static, and initialize this ahead of time and
fully, hence let's do that.
2018-03-21 11:48:40 +01:00
Yu Watanabe f9bfa6962d core: add new dbus method GetDynamicUsers
This intruduces a new dbus method GetDynamicUsers for systemd1.Manager,
which enumerates all dynamic users realized in the system.
2018-03-21 13:11:01 +09:00
Zbigniew Jędrzejewski-Szmek e3c3d6761b core/load-fragment: reject overly long paths early
No need to go through the specifier_printf() if the path is already too long in
the unexpanded form (since specifiers increase the length of the string in all
practical cases).

In the oss-fuzz test case, valgrind reports:
  total heap usage: 179,044 allocs, 179,044 frees, 72,687,755,703 bytes allocated
and the original config file is ~500kb. This isn't really a security issue,
since the config file has to be trusted any way, but just a matter of
preventing accidental resource exhaustion.

https://oss-fuzz.com/v2/issue/4651449704251392/6977

While at it, fix order of arguments in the neighbouring log_syntax() call.
2018-03-21 00:46:13 +01:00
Lennart Poettering be6b0c2165 coccinelle: make use of DIV_ROUND_UP() wherever appropriate
Let's use our macros where we can
2018-03-20 20:59:02 +01:00
Lennart Poettering 8c637fe242
Merge pull request #8452 from keszybz/use-libmount-more
Use libmount in systemd-shutdown, add tests
2018-03-20 09:53:34 +01:00
Filipe Brandenburger c2887d565f macros: fix sysusers_create_inline (#8489)
This typo was introduced in commit dd2490ae12 when using
here-documents for the macro values.
2018-03-19 18:05:49 +01:00
Yu Watanabe ee5324aa04 tree-wide: voidify pager_open()
Even if pager_open() fails, in general, we should continue the operations.
All erroneous cases in pager_open() show log message in the function.
So, it is not necessary to check the returned value.
2018-03-19 21:04:02 +09:00
Yu Watanabe bcabcde5d2
Merge pull request #8408 from keszybz/ln-relative
bugs.fd.o bug archelogy
2018-03-19 18:32:30 +09:00
Jan Janssen ac9cea5ba3 shutdown: Don't limit unmount attempts prematurely (#8469)
Once upon a time shutdown.c didn't have the logic to check whether any
unmount attempts succeeded or not. So instead it kept looping for
a fixed amount and hoped all was right. Nowadays, we do know if we
changed anything during a iteration and also stop looping then, but
we still limit ourselves to FINALIZE_ATTEMPTS.

But, theoretically, we could have such a complicated and nested
setup that would survive that limit, leaving stuff around we
might actually be able to unmount. And we could also end up in a
situation where the extra loop with raised unmount error level could
be skipped too.

So let's just drop the retries logic and rely fully on the changed
flag.
2018-03-19 18:27:49 +09:00
Zbigniew Jędrzejewski-Szmek dd2490ae12 macros: use here-docs instead of echo (#8480)
It's common for sysusers files to contain quotes (in particular around the
comment/GECOS field), and using echo "..." is very likely to not work properly
in that case. Let's use <<EOF redirection. It's not bulletproof, but should
work in general.
2018-03-19 17:07:44 +09:00
Evgeny Vereshchagin e4711004d6
Merge pull request #8461 from keszybz/oss-fuzz-fixes
Oss fuzz fixes
2018-03-19 00:06:44 +03:00
Zbigniew Jędrzejewski-Szmek ca8700e922 core/unit: delay creating a stack variable until after length has been checked
path_is_normalized() will reject paths longer than 4095 bytes, so it's better
to not create a stack variable of unbounded size, but instead do the check first
and only then do that allocation.

Also use _cleanup_ to make things a bit shorter.

https://oss-fuzz.com/v2/issue/5424177403133952/7000
2018-03-18 21:07:01 +01:00
Rosen Penev 1e35c5ab27 systemd-link: Remove UDP Fragmentation Offload support. (#8183)
Support was killed in kernel 4.15 as well as ethtool 4.13.

Justification was lack of use by drivers and too much of a maintenance burden.
https://www.spinics.net/lists/netdev/msg443815.html

Also moved config_parse_warn_compat to conf-parser.[ch] to fix compile errors.
2018-03-18 14:28:14 +01:00
Zbigniew Jędrzejewski-Szmek 064c593899 core/service: fix memleak of USBFunctionStrings and USBFunctionDescriptors
oss-fuzz #6892.
2018-03-17 09:01:53 +01:00
Zbigniew Jędrzejewski-Szmek ba0c7754d8 core/manager: move some comments to a better place 2018-03-16 23:15:54 +01:00
Zbigniew Jędrzejewski-Szmek e63ebf71ed core: when reloading, delay any actions on journal and dbus connections
manager_recheck_journal() and manager_recheck_dbus() would be called to early
while we were deserialiazing units, before the systemd-journald.service and
dbus.service have been deserialized. In effect we'd disable logging to the
journald and close the bus connection. The first is not very noticable, it
mostly means that logs emitted during deserialization are lost. The second is
more noticeable, because manager_recheck_dbus() would call bus_done_api() and
bus_done_system() and close dbus connections. Logging and bus connection would
then be restored later after the respective units have been deserialized.

This is easily reproduced by calling:
  $ sudo gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1 --method "org.freedesktop.systemd1.Manager.Reload"
which works fine before 8559b3b75c, and then starts failing with:
  Error: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Remote peer disconnected

None of this should happen, and we should delay changing state until after
deserialization is complete when reloading. manager_reload() already included
the calls to manager_recheck_journal() and manager_recheck_dbus(), so the
connection state will be updated after deserialization during reloading is done.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1554578.
2018-03-16 23:14:04 +01:00
Zbigniew Jędrzejewski-Szmek 71ae04c400 core/umount: use libmount to enumerate /proc/swaps
example.swaps with "(deleted)" does not cause bogus entries in the list now,
but a memleak in libmount instead. The memleaks is not very important since
this code is run just once.
Reported as https://github.com/karelzak/util-linux/issues/596.

$ build/test-umount
...
/* test_swap_list("/proc/swaps") */
path=/var/tmp/swap o= f=0x0 try-ro=no dev=0:0
path=/dev/dm-2 o= f=0x0 try-ro=no dev=0:0
/* test_swap_list("/home/zbyszek/src/systemd/test/test-umount/example.swaps") */
path=/some/swapfile o= f=0x0 try-ro=no dev=0:0
path=/dev/dm-2 o= f=0x0 try-ro=no dev=0:0
==26912==
==26912== HEAP SUMMARY:
==26912==     in use at exit: 16 bytes in 1 blocks
==26912==   total heap usage: 1,546 allocs, 1,545 frees, 149,008 bytes allocated
==26912==
==26912== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==26912==    at 0x4C31C15: realloc (vg_replace_malloc.c:785)
==26912==    by 0x55C5D8C: _IO_vfscanf (in /usr/lib64/libc-2.26.so)
==26912==    by 0x55D8AEC: vsscanf (in /usr/lib64/libc-2.26.so)
==26912==    by 0x55D25C3: sscanf (in /usr/lib64/libc-2.26.so)
==26912==    by 0x53236D0: mnt_table_parse_stream (in /usr/lib64/libmount.so.1.1.0)
==26912==    by 0x53249B6: mnt_table_parse_file (in /usr/lib64/libmount.so.1.1.0)
==26912==    by 0x10D157: swap_list_get (umount.c:194)
==26912==    by 0x10B06E: test_swap_list (test-umount.c:34)
==26912==    by 0x10B24B: main (test-umount.c:56)
==26912==
==26912== LEAK SUMMARY:
==26912==    definitely lost: 16 bytes in 1 blocks
==26912==    indirectly lost: 0 bytes in 0 blocks
==26912==      possibly lost: 0 bytes in 0 blocks
==26912==    still reachable: 0 bytes in 0 blocks
==26912==         suppressed: 0 bytes in 0 blocks
2018-03-16 10:12:50 +01:00
Zbigniew Jędrzejewski-Szmek 1fd8edb53a test-umount: add a simple test for swap_list_get()
The implementation seems buggy:
/* test_swap_list("/home/zbyszek/src/systemd/test/test-umount/example.swaps") */
path=0 o= f=0x0 try-ro=no dev=0:0
path=/some/swapfile2 o= f=0x0 try-ro=no dev=0:0
path=/some/swapfile o= f=0x0 try-ro=no dev=0:0
path=/dev/dm-2 o= f=0x0 try-ro=no dev=0:0
2018-03-16 10:12:50 +01:00
Zbigniew Jędrzejewski-Szmek a6dcd22976 core/umount: use _cleanup_ 2018-03-16 10:12:50 +01:00
Zbigniew Jędrzejewski-Szmek 6fa392bf91 tests: add a simple test for the mountinfo parsing logic 2018-03-16 10:12:50 +01:00
Zbigniew Jędrzejewski-Szmek 95b862b054 shutdown: use libmount to enumerate /proc/self/mountinfo
This is analogous to 8d3ae2bd4c, except that now
src/core/umount.c not src/core/mount.c is converted.

Might help with https://bugzilla.redhat.com/show_bug.cgi?id=1554943, or not.

In the patch, mnt_free_tablep and mnt_free_iterp are declared twice. It'd
be nicer to define them just once in mount-setup.h, but then libmount.h would
have to be included there. libmount.h seems to be buggy, and declares some
defines which break other headers, and working around this is more pain than
the two duplicate lines. So let's live with the duplication for now.

This fixes memleak of MountPoint in mount_points_list_get() on error, not that
it matters any.
2018-03-16 10:09:46 +01:00
Franck Bui 848e863acc basic/macros: rename noreturn into _noreturn_ (#8456)
"noreturn" is reserved and can be used in other header files we include:

  [   16s] In file included from /usr/include/gcrypt.h:30:0,
  [   16s]                  from ../src/journal/journal-file.h:26,
  [   16s]                  from ../src/journal/journal-vacuum.c:31:
  [   16s] /usr/include/gpg-error.h:1544:46: error: expected ‘,’ or ‘;’ before ‘)’ token
  [   16s]  void gpgrt_log_bug (const char *fmt, ...)    GPGRT_ATTR_NR_PRINTF(1,2);

Here we include grcrypt.h (which in turns include gpg-error.h) *after* we
"noreturn" was defined in macro.h.
2018-03-15 14:23:46 +09:00
Evgeny Vereshchagin 3b71cf46be
Merge pull request #8441 from keszybz/oss-fuzz-fixes
Fixes for bugs found by oss-fuzz
2018-03-14 21:25:56 +03:00
Zbigniew Jędrzejewski-Szmek 20d52ab60e shared/conf-parser: fix crash when specifiers cannot be resolved in config_parse_device_allow()
oss-fuzz #6885.
2018-03-14 16:50:08 +01:00
Zbigniew Jędrzejewski-Szmek b93618644b core/umount: fix unitialized fields in MountPoint in dm_list_get()
This one might actually might cause a crash.
2018-03-14 12:38:43 +01:00
Zbigniew Jędrzejewski-Szmek d4f5c00153
Merge pull request #8429 from medhefgo/sd-shutdown
sd-shutdown improvements
2018-03-13 09:47:09 +01:00
Jan Janssen 456b2199f6 shutdown: Reduce log level of unmounts
There is little point in logging about unmounting errors if the
exact mountpoint will be successfully unmounted in a later retry
due unmounts below it having been removed.

Additionally, don't log those errors if we are going to switch back
to a initrd, because that one is also likely to finalize the remaining
mountpoints. If not, it will log errors then.
2018-03-12 18:32:26 +01:00
Jan Janssen e783b4902f umount: Don't bother remounting api and ro filesystems read-only 2018-03-12 18:32:26 +01:00
Jan Janssen 8645ffd12b umount: Try unmounting even if remounting read-only failed
In the case of some api filesystems remounting read-only fails
while unmounting succeeds.
2018-03-12 18:32:26 +01:00
Jan Janssen 3bc341bee9 umount: Provide the same mount flags too when remounting read-only
This most likely amounts to no real benefits and is just here for
completeness sake.
2018-03-12 18:32:26 +01:00
Jan Janssen 1d62d22d94 umount: Decide whether to remount read-only earlier 2018-03-12 18:32:26 +01:00
Jan Janssen 0494cae03d umount: Add more asserts and remove some unused arguments 2018-03-12 18:32:10 +01:00
Jan Janssen 659b15313b umount: Fix memory leak 2018-03-12 13:40:14 +01:00
Zbigniew Jędrzejewski-Szmek e8112e67e4 Make MANAGER_TEST_RUN_MINIMAL just allocate data structures
When running tests like test-unit-name, there is not point in setting
up the cgroup and signals and interacting with the environment. Similarly
when running fuzz testing of the parser.

Add new MANAGER_TEST_RUN_BASIC which takes the role of MANAGER_TEST_RUN_MINIMAL,
and redefine MANAGER_TEST_RUN_MINIMAL to just create the basic data structures.
2018-03-11 16:33:59 +01:00
Zbigniew Jędrzejewski-Szmek dc409696cf Introduce _cleanup_(unit_freep) 2018-03-11 16:33:58 +01:00
Zbigniew Jędrzejewski-Szmek c70cac548a Introduce _cleanup_(manager_freep) 2018-03-11 16:33:57 +01:00
Michal Sekletar aa77e234fc core: ignore errors from cg_create_and_attach() in test mode (#8401)
Reproducer:

$ meson build && cd build
$ ninja
$ sudo useradd test
$ sudo su test
$ ./systemd --system --test
...
Failed to create /user.slice/user-1000.slice/session-6.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied

Above error message is caused by the fact that user test didn't have its
own session and we tried to set up init.scope already running as user
test in the directory owned by different user.

Let's try to setup cgroup hierarchy, but if that fails return error only
when not running in the test mode.

Fixes #8072
2018-03-09 23:30:32 +01:00
Filipe Brandenburger 416be1a03b core/socket: support binary inside chroot when looking for SELinux label (#8405)
Otherwise having a .socket unit start a .service running a binary under
a chroot fails as the unit is unable to determine the SELinux label of
the binary.
2018-03-09 12:20:56 +01:00
Yu Watanabe 906bdbf5e7 core/cgroup: accepts MemorySwapMax=0 (#8366)
Also, this moves two macros from dbus-util.h to dbus-cgroup.c,
as they are only used in dbus-cgroup.c.

Fixes #8363.
2018-03-09 11:34:50 +01:00
Zbigniew Jędrzejewski-Szmek 8750ac0238 pid1: make use of high rt signals on hppa with newer kernels
Back in 4dffec1459 we stopped using SIGRTMIN+26
and higher on hppa because they were not available. Then they became available
in linux 3.18:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f25df2eff5b25f52c139d3ff31bc883eee9a0ab

Instead of hard-coding the list based on architecture, let's use a runtime
check like signal(7) says.

(A note about implementation: RTSIG_IF_AVAILABLE is defined to take the full
signal and not just an offset from SIGRTMIN so that it's still possible to
grep for SIGRTMIN\+.)

Add a simple "test" to print the signal values.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=84931.
2018-03-09 10:35:33 +01:00
Lennart Poettering 586fb20fd1
Merge pull request #8372 from keszybz/two-cleanups
Two cleanups
2018-03-08 23:23:43 +01:00
Yu Watanabe 5cbaad2f67 core: do not free heap-allocated strings (#8391)
Fixes #8387.
2018-03-08 14:21:54 +01:00
Yu Watanabe a1d32bac2a
Revert "core: don't setup init.scope in test mode (#8380)" (#8390)
This reverts commit a9e8ecf037,
as it breaks test-path.

Fixes #8389.
2018-03-08 15:29:19 +09:00
Michal Sekletar a9e8ecf037 core: don't setup init.scope in test mode (#8380)
Reproducer:

$ meson build && cd build
$ ninja
$ sudo useradd test
$ sudo su test
$ ./systemd --system --test
...
Failed to create /user.slice/user-1000.slice/session-6.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied

Above error message is caused by the fact that user test didn't have its
own session and we tried to set up init.scope already running as user
test in the directory owned by different user.

Let's skip setting up init.scope altogether since we won't be launching
processes anyway.
2018-03-07 16:41:41 +01:00
Zbigniew Jędrzejewski-Szmek f6a8265b9a core: drop unnecessary __useless_struct_to_allow_trailing_semicolon__
ISO C does not allow empty statements outside of functions, and gcc
will warn the trailing semicolons when compiling with -pedantic:

  warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]

But our code cannot compile with -pedantic anyway, at least because

  warning: ISO C does not support ‘__PRETTY_FUNCTION__’ predefined identifier [-Wpedantic]

Without -pedatnic, clang and even old gcc (3.4) generate no warnings about
those semicolons, so let's just drop __useless_struct_to_allow_trailing_semicolon__.
2018-03-06 10:41:41 +01:00
Yu Watanabe 694d57655c rpm: add missing '-p <lua>' in trigger script (#8367)
Follow-up for 32a00a9c09 (#8090).
2018-03-06 08:02:44 +01:00
Lennart Poettering 6cc7e918ff
Merge pull request #8314 from poettering/rearrange-stdio
refactor how we rearrange fds for stdin/stdout/stderr
2018-03-02 15:42:03 +01:00
Lennart Poettering 650f401123
Merge pull request #8336 from poettering/coccinelle-reallocarray
reallocarray() coccinellization
2018-03-02 15:40:52 +01:00
Zbigniew Jędrzejewski-Szmek 3cca71c456
Merge pull request #8323 from xyproto/ok_color
Make the color of the status OK configurable at build-time
2018-03-02 13:00:07 +01:00
Lennart Poettering 39f305a901 mount-setup: change bpf mount mode to 0700 (#8334)
After discussing with the kernel folks, we agreed to default to 0700 for
this. Better safe than sorry.
2018-03-02 12:55:24 +01:00
Lennart Poettering 62d74c78b5 coccinelle: add reallocarray() coccinelle script
Let's systematically make use of reallocarray() whereever we invoke
realloc() with a product of two values.
2018-03-02 12:39:07 +01:00
Lennart Poettering 2589472712
Merge pull request #8237 from sourcejedi/timer_suspend
core: let OnCalendar= timer units expire during suspend (#8231)
2018-03-02 12:11:06 +01:00
Lennart Poettering 2b33ab0957 tree-wide: port various places over to use new rearrange_stdio() 2018-03-02 11:42:10 +01:00
Alexander F Rødseth 96164a3936 Add build-time option to change the color of the "OK" status text 2018-03-02 09:00:44 +01:00
Zbigniew Jędrzejewski-Szmek 671f0f8de0 Remove /sbin from paths if split-bin is false (#8324)
Follow-up for 157baa87e4.
2018-03-01 21:48:36 +01:00
Lennart Poettering 902c8502ad
Merge pull request #8149 from poettering/fake-root-cgroup
Properly synthesize CPU+memory accounting data for the root cgroup
2018-03-01 11:10:24 +01:00
Lennart Poettering 649a5ffba8
Merge pull request #8171 from poettering/sd-bus-queue-limit
try not to overload pid1's bus message write queue
2018-02-28 18:15:40 +01:00
Alan Jenkins 13f512d324 core: don't freeze OnCalendar= timer units when the clock goes back a lot
E.g. if you have a monthly event and you set the computer clock back one
year, we can allow the next 12 monthly events to happen naturally.  In fact
we already do this when you start a Persistent=yes timer, we just need to
apply the same logic when it's running and we notice the system clock
being set backwards.
2018-02-28 17:00:07 +00:00
Alan Jenkins 9ea9faff78 core: let OnCalendar= timer units expire during suspend (#8231)
On timejumps, including suspend, timer_time_change() calls for a
re-calculation of the next elapse.  Sadly I'm not quite sure what the
intended effect of this was!  Because it was not managing to fire
OnCalendar= timers which fired during the suspend... unless the timer had
already fired once before.

Reported, entirely correctly as far as I can see, on stackexchange:
https://unix.stackexchange.com/questions/351829/systemd-timer-that-expired-while-suspended

 /* If we know the last time this was
  * triggered, schedule the job based relative
- * to that. If we don't just start from
- * now. */
+ * to that. If we don't, just start from
+ * the activation time. */

The same code is called for both the initial calculation and this
re-calculation.  If we're _not_ already active, then this is before the
activation time has been recorded in the unit, so just use the current
time as before.  The new code is mechanically adapted from the same
logic for `OnActiveSec=` (case TIMER_ACTIVE in the code which follows).

Tested with `date --set`.

Motivations:

* Rotate monitoring data from Atop into files which are named per-day.
  Fedora currently implements this with a cron job that runs at midnight,
  but that didn't handle suspend correctly either.

* unbound-anchor.timer on Fedora, is used to update DNSSEC "root trust
  anchor" daily, before the TTL expires.  It uses OnCalendar=daily
  AccuracySec=24h.  Which is a bit suspect because the TTL is 2 days, but I
  think it has the right general idea.

  None of the other timer settings are correct, because they would not
  account for time spent in suspend.  Unless you set WakeSystem
  (this feature is currently undocumented).

* So in general, we can expect to see people using OnCalendar= for the same
  cases as cron.daily and cron.monthly.  Which use anacron to keep track of
  jobs which should be run even if the system was down at the time.

  Timers which are configured to run more frequently than that, are
  unlikely to mind if they get run slightly more often that the writer
  realized, relative to the amount of time the system was really running.

* From the user report above: "I only want to use remind to show a desktop
  notification, it seems excessive to wake up the computer for that. Also,
  I would like to get the reminder first thing in the morning, so the
  OnActiveSec doesn't help with that."
2018-02-28 16:12:22 +00:00
Alan Jenkins 60933bb89b core: timer_enter_waiting(): refactor base local variable
We have two variables `b` and `base`.  `b` is declared within limited
scope; `base` is declared at the top of the function.  However `base`
is actually only used within a scope which is exclusive of `b`.  Clarify
by moving `base` inside the limited scope as well.

(Also `base` doesn't need initializing any more than `b` does.  The
declaration of `base` is now immediately followed by a case analysis of
`v->base`, which serves almost exclusively to determine the value of
`base`).
2018-02-28 15:07:30 +00:00
Zbigniew Jędrzejewski-Szmek bdad9e44e4
Merge pull request #8294 from fsateler/debian-patches
Upstreaming some debian patches
2018-02-28 09:10:16 +01:00
Ansgar Burchardt 7486f305cd Include additional directories in ProtectSystem 2018-02-27 18:56:19 -03:00
Lennart Poettering 13d92c6300 seccomp: rework functions for parsing system call filters
This reworks system call filter parsing, and replaces a couple of "bool"
function arguments by a single flags parameter.

This shouldn't change behaviour, except for one case: when we
recursively call our parsing function on our own syscall list, then
we'll lower the log level to LOG_DEBUG from LOG_WARNING, because at that
point things are just a problem in our own code rather than in the user
configuration we are parsing, and we shouldn't hence generate confusing
warnings about syntax errors.

Fixes: #8261
2018-02-27 19:59:09 +01:00
Lennart Poettering e0a085811d core: don't process dbus unit and job queue when there are already too many messages pending
We maintain a queue of units and jobs that we are supposed to generate
change/new notifications for because they were either just created or
some of their property has changed. Let's throttle processing of this
queue a bit: as soon as > 1K of bus messages are queued for writing
let's skip processing the queue, and then recheck on the next
iteration again.

Moreover, never process more than 100 units in one go, return to the
event loop after that. Both limits together should put effective limits
on both space and time usage of the function, delaying further
operations until a later moment, when the queue is empty or the the
event loop is sufficiently idle again.

This should keep the number of generated messages much lower than
before on busy systems or where some client is hanging.

Note that this also means a bad client can slow down message dispatching
substantially for up to 90s if it likes to, for all clients. But that
should be acceptable as we only allow trusted bus clients, anyway.

Fixes: #8166
2018-02-27 19:54:29 +01:00
Lennart Poettering 9fc677e3c9 core: don't bother enqueuing signal messages into busses that aren't ready yet
This is an optimization: there's no point in enqueuing unit and job
change notificiation signal messages into bus connection that aren't
fully set up yet.

This doesn't fix #8166 but should lower the load of messages enqueued
but not processed yet a bit.
2018-02-27 19:54:29 +01:00
Lennart Poettering 84df74c6f0
Merge pull request #8284 from keszybz/gcc-warning-fixes
Gcc warning fixes
2018-02-26 21:20:13 +01:00
Zbigniew Jędrzejewski-Szmek aa484f3561 tree-wide: use reallocarray instead of our home-grown realloc_multiply (#8279)
There isn't much difference, but in general we prefer to use the standard
functions. glibc provides reallocarray since version 2.26.

I moved explicit_bzero is configure test to the bottom, so that the two stdlib
functions are at the bottom.
2018-02-26 21:20:00 +01:00
Zbigniew Jędrzejewski-Szmek bea28c5adb core/unit: voidify one snprintf statement
One more follow-up for f810b631cd.
2018-02-26 15:49:27 +01:00
Zbigniew Jędrzejewski-Szmek 8012712791 core/path: add one more assert 2018-02-26 15:49:27 +01:00
Zbigniew Jędrzejewski-Szmek f810b631cd Revert "Replace use of snprintf with xsprintf"
This reverts commit a7419dbc59.

_All_ changes in that commit were wrong.

Fixes #8211.
2018-02-23 00:13:52 +01:00
Zbigniew Jędrzejewski-Szmek 94be6463bd
Merge pull request #8205 from poettering/bpf-multi
bpf/cgroup improvements
2018-02-22 14:52:48 +01:00
Lennart Poettering c5c07649c2
Merge pull request #8243 from poettering/statx-syscall-unfuck
statx() syscall macro fix + reboot() handling improvements
2018-02-22 13:15:41 +01:00
Zbigniew Jędrzejewski-Szmek 30c81ce2ce pid1: when creating service directories, don't chown existing files (#8181)
This partially reverts 3536f49e8f and
3536f49e8f.

When the user is dynamic, and we are setting up state, cache, or logs dirs,
behaviour is unchanged, we always do a recursive chown. This is necessary
because the user number might change between invocations.

But when setting up a directory for non-dynamic user, or a runtime directory
for a dynamic user, do any ownership or mode changes only when the directory
is initially created. Nothing says that the files under those directories have
to be all recursively owned by our user. This restores behaviour before
3536f49e8f, so modifications to the state of
the runtime directory persist between ExecStartPre's and ExecStart's, and even
longer in case the directory is persistent.

I think it _would_ be a nice property if setting a user would automatically
propagate to ownership of any Runtime/Logs/Cache directories. But this is
incompatible with another nice property, namely preserving changes to those
directories made by an admin, and with allowing change of ownership of files
in those directories by the service (e.g. to allow other users to access them).
Of the two, I think the second property is more important. Also, it's backwards
compatible.

https://bugzilla.redhat.com/show_bug.cgi?id=1508495

There is no need to chmod a directory we just created, so move that step
up into a branch. After that, 'effective' is only used once, so get rid of
it too.
2018-02-22 11:30:59 +01:00
Lennart Poettering 1f409a0cbb shutdown: let's not use exit() needlessly
Generally we prefer 'return' from main() over exit() so that automatic
cleanups and such work correct. Let's do that in shutdown.c too, becuase
there's not really any reason not to.

With this we are pretty good in consistently using return from main()
rather than exit() all across the codebase. Yay!
2018-02-22 10:46:26 +01:00
Lennart Poettering c01dcddf80 reboot-util: unify reboot with parameter in a single implementation
So far, we had two implementations of reboot-with-parameter doing pretty
much the same. Let's unify that in a generic implementation used by
both.

This is particulary nice as it unifies all /run/systemd/reboot-param
handling in a single .c file.
2018-02-22 10:46:26 +01:00
Lennart Poettering e3631d1c80 basic: split out update_reboot_parameter_and_warn() into its own .c/.h files
This is primarily preparation for a follow-up commit that adds a common
implementation of the other side of the reboot parameter file, i.e. the
code that reads the file and issues reboot() for it.
2018-02-22 10:46:12 +01:00
Lennart Poettering 118cf9523b tree-wide: voidify reboot() invocations
We use (void) in most cases for reboot() already, let's add it to the
others as well.
2018-02-22 10:42:06 +01:00
Lennart Poettering c52a937b46 basic: add a common syscall wrapper around reboot()
This mimics the raw_clone() call we have in place already and
establishes a new syscall wrapper raw_reboot() that wraps the kernel's
reboot() system call in a bit more low-level fashion that glibc's
reboot() wrapper. The main difference is that the extra "arg" argument
is supported.

Ultimately this just replaces the syscall wrapper implementation we
currently have at three places in our codebase by a single one.

With this change this means that all our syscall() invocations are
neatly separated out in static inline system call wrappers in our header
functions.
2018-02-22 10:42:06 +01:00
Lennart Poettering 0b1f3c768c tree-wide: reopen log when we need to log in FORK_CLOSE_ALL_FDS children
In a number of occasions we use FORK_CLOSE_ALL_FDS when forking off a
child, since we don't want to pass fds to the processes spawned (either
because we later want to execve() some other process there, or because
our child might hang around for longer than expected, in which case it
shouldn't keep our fd pinned). This also closes any logging fds, and
thus means logging is turned off in the child. If we want to do proper
logging, explicitly reopen the logs hence in the child at the right
time.

This is particularly crucial in the umount/remount children we fork off
the shutdown binary, as otherwise the children can't log, which is
why #8155 is harder to debug than necessary: the log messages we
generate about failing mount() system calls aren't actually visible on
screen, as they done in the child processes where the log fds are
closed.
2018-02-22 00:35:00 +01:00
Lennart Poettering e18805fbd0 shutdown: explicitly set a log target in shutdown.c
We used to set this, but this was dropped when shutdown got taught to
get the target passed in from the regular PID 1. Let's readd this to
make things more explanatory, and cover all grounds, since after all the
target passed is in theory an optional part of the protocol between the
regular PID 1 and the shutdown PID 1.
2018-02-22 00:33:12 +01:00
Lennart Poettering d405394c5c shutdown: always pass errno to logging functions
We have them, let's propagate them.
2018-02-22 00:32:31 +01:00
Lennart Poettering 00adeed99f umount: beef up logging when umount/remount child processes fail
Let's extend what we log if umount/remount doesn't work correctly as we
expect.

See #8155
2018-02-21 23:57:21 +01:00
Lennart Poettering 5128346127 bpf: reset "extra" IP accounting counters when turning off IP accounting for a unit
We maintain an "extra" set of IP accounting counters that are used when
we systemd is reloaded to carry over the counters from the previous run.
Let's reset these to zero whenever IP accounting is turned off. If we
don't do this then turning off IP accounting and back on later wouldn't
reset the counters, which is quite surprising and different from how our
CPU time counting works.
2018-02-21 16:43:36 +01:00
Lennart Poettering aa2b6f1d2b bpf: rework how we keep track and attach cgroup bpf programs
So, the kernel's management of cgroup/BPF programs is a bit misdesigned:
if you attach a BPF program to a cgroup and close the fd for it it will
stay pinned to the cgroup with no chance of ever removing it again (or
otherwise getting ahold of it again), because the fd is used for
selecting which BPF program to detach. The only way to get rid of the
program again is to destroy the cgroup itself.

This is particularly bad for root the cgroup (and in fact any other
cgroup that we cannot realistically remove during runtime, such as
/system.slice, /init.scope or /system.slice/dbus.service) as getting rid
of the program only works by rebooting the system.

To counter this let's closely keep track to which cgroup a BPF program
is attached and let's implicitly detach the BPF program when we are
about to close the BPF fd.

This hence changes the bpf_program_cgroup_attach() function to track
where we attached the program and changes bpf_program_cgroup_detach() to
use this information. Moreover bpf_program_unref() will now implicitly
call bpf_program_cgroup_detach().

In order to simplify things, bpf_program_cgroup_attach() will now
implicitly invoke bpf_program_load_kernel() when necessary, simplifying
the caller's side.

Finally, this adds proper reference counting to BPF programs. This
is useful for working with two BPF programs in parallel: the BPF program
we are preparing for installation and the BPF program we so far
installed, shortening the window when we detach the old one and reattach
the new one.
2018-02-21 16:43:36 +01:00
Lennart Poettering 13a141f046 namespace: protect bpf file system as part of ProtectKernelTunables=
It also exposes kernel objects, let's better include this in
ProtectKernelTunables=.
2018-02-21 16:43:36 +01:00
Lennart Poettering 6590080851 mount-setup: always use the same source as fstype for the API VFS we mount
So far, for all our API VFS mounts we used the fstype also as mount
source, let's do that for the cgroupsv2 mounts too. The kernel doesn't
really care about the source for API VFS, but it's visible to the user,
hence let's clean this up and follow the rule we otherwise follow.
2018-02-21 16:43:36 +01:00
Lennart Poettering acf7f253de bpf: use BPF_F_ALLOW_MULTI flag if it is available
This new kernel 4.15 flag permits that multiple BPF programs can be
executed for each packet processed: multiple per cgroup plus all
programs defined up the tree on all parent cgroups.

We can use this for two features:

1. Finally provide per-slice IP accounting (which was previously
   unavailable)

2. Permit delegation of BPF programs to services (i.e. leaf nodes).

This patch beefs up PID1's handling of BPF to enable both.

Note two special items to keep in mind:

a. Our inner-node BPF programs (i.e. the ones we attach to slices) do
   not enforce IP access lists, that's done exclsuively in the leaf-node
   BPF programs. That's a good thing, since that way rules in leaf nodes
   can cancel out rules further up (i.e. for example to implement a
   logic of "disallow everything except httpd.service"). Inner node BPF
   programs to accounting however if that's requested. This is
   beneficial for performance reasons: it means in order to provide
   per-slice IP accounting we don't have to add up all child unit's
   data.

b. When this code is run on pre-4.15 kernel (i.e. where
   BPF_F_ALLOW_MULTI is not available) we'll make IP acocunting on slice
   units unavailable (i.e. revert to behaviour from before this commit).
   For leaf nodes we'll fallback to non-ALLOW_MULTI mode however, which
   means that BPF delegation is not available there at all, if IP
   fw/acct is turned on for the unit. This is a change from earlier
   behaviour, where we use the BPF_F_ALLOW_OVERRIDE flag, so that our
   fw/acct would lose its effect as soon as delegation was turned on and
   some client made use of that. I think the new behaviour is the safer
   choice in this case, as silent bypassing of our fw rules is not
   possible anymore. And if people want proper delegation then the way
   out is a more modern kernel or turning off IP firewalling/acct for
   the unit algother.
2018-02-21 16:43:36 +01:00