Commit graph

4417 commits

Author SHA1 Message Date
Lennart Poettering ca193035e9
Merge pull request #10394 from yuwata/fixes-found-by-clang
Fix warnings reported by clang
2018-10-14 20:11:19 +02:00
Yu Watanabe cb16b085c0 core: set _unused_ attribute to 'reloading'
Follow-up for 4df7d537c8.
2018-10-13 23:50:04 +09:00
Lennart Poettering ea5c5f680d core: ensure it's not fatal if we cannot watch /etc/localtime
See: #9602
2018-10-13 15:13:07 +02:00
Lennart Poettering 0cb21d8c60 core: add debug logging if we cant watch /etc/localtime itself 2018-10-13 15:12:58 +02:00
Lennart Poettering 93d4cb09d5 core: fix unfortunate typo in unit_is_unneeded()
Follow-up for a3c1168ac2.
2018-10-13 13:01:08 +02:00
Lennart Poettering cf99f8eacf core: make destructive transaction error a bit more useful 2018-10-13 13:01:08 +02:00
Lennart Poettering 913c898ca0 cgroup: voidify a few things 2018-10-13 12:37:13 +02:00
Lennart Poettering b9839ac9d9 cgroup: make sure whitelist_device() always returns a valid return value
CID 1396094
2018-10-13 12:37:13 +02:00
Davide Cavalca b75f0c69b3 shared: add %g, %G specifiers for group / gid (#10368) 2018-10-13 17:26:48 +09:00
Lennart Poettering a6ee956610
Merge pull request #10356 from dtardon/covscan
assorted coverity/clang fixes
2018-10-12 18:43:04 +02:00
David Tardon f369f47c26 be consistent about sun_path length
Most places use the whole buffer for name, without leaving extra space
for the trailing NUL.
2018-10-12 12:38:49 +02:00
Lennart Poettering 2aab8a1e04
Merge pull request #10201 from yuwata/fix-10196
sd-netlink: add destroy_callback to sd_netlink_call_async() and fix memleaks in networkd
2018-10-12 11:36:08 +02:00
Yu Watanabe 958b8c7bd7 core: fix member access within null pointer
config_parse_tasks_max() is also used for parsing system.conf or
user.conf. In that case, userdata is NULL.

Fixes #10362.
2018-10-11 22:23:39 +02:00
Lennart Poettering 02619c033f
Merge pull request #10353 from keszybz/more-manager-reloading
More manager reloading cleanups
2018-10-11 17:25:03 +02:00
Zbigniew Jędrzejewski-Szmek 05067c3c1f manager: simplify error handling in manager_deserialize()
If a memory error occurred, we would still go through the path which sets the
error on ferror(). It is unlikely that ferror() returns true, but it's seems
cleaner to just propagate the error we already have.

The handling of fgets() returning NULL is also simplified: according to the man
page, it returns NULL only on EOF or error. So if feof() returns true, I don't
think we should call ferror() again.

While at it, let's set errno to 0 and check that it is set before returning it
as an error. The man pages for fgets() and feof() do not say anything about
setting errno.
2018-10-11 14:34:02 +02:00
Zbigniew Jędrzejewski-Szmek 4df7d537c8 manager: also use the reloading "cleanup" function in manager_startup
Here the behaviour is nominally changed, because we will decrease the
counter on error. But the only caller quits the program if error occurs,
so this makes no practical difference.
2018-10-11 14:34:00 +02:00
Zbigniew Jędrzejewski-Szmek d147e2b66b manager: use the _cleanup_ mechanism to do n_reloading counter handling
No functional change.
2018-10-11 14:33:22 +02:00
Zbigniew Jędrzejewski-Szmek 3d7cf72070 manager: replace fake block with a strjoina
The block was created to avoid declaring variables in the middle of the block.
We could now do that, but it's easier to just use strjoina here.
2018-10-11 14:29:34 +02:00
Zbigniew Jędrzejewski-Szmek f436470ae1
Merge pull request #10343 from poettering/manager-state-fix
various fixes for PID1's Manager object
2018-10-10 12:36:16 +02:00
Yu Watanabe 545bab1f0a sd-netlink: add destroy_callback argument to sd_netlink_call_async() 2018-10-10 14:43:05 +09:00
Lennart Poettering 3316429f19
Merge pull request #10062 from rgushchin/device
Support cgroup v2 bpf-based device controller
2018-10-09 23:29:27 +02:00
Lennart Poettering 13711093ef bpf-firewall: always use log_unit_xyz() insteadof log_xyz()
That way it's easier to figure out what the various messages belong to
2018-10-09 21:11:41 +02:00
Lennart Poettering 4cf997befa device: clean up DeviceFound flags set
No need to avoid bit 0. Also the U suffix has no effect, don't use it.
2018-10-09 21:11:22 +02:00
Lennart Poettering 5f616d5feb core: add missing 'continue' statement 2018-10-09 21:11:06 +02:00
Lennart Poettering eb523bfb51 core: include environment generator runtime in generator timestamps
Currently they aren't covered and it probably isn't worth adding another
kind of timestamp just for this, hence simply include it in the regular
generator timestamps.
2018-10-09 19:43:43 +02:00
Lennart Poettering 5ce5e1ad08 core: add a common helper call manager_ready() sharing some common code between manager_reload() and manager_startup()
Just sharing some common code. No functional changes
2018-10-09 19:43:43 +02:00
Lennart Poettering 5197be06e0 core: turn our four vacuum calls into a new helper function
Just share some code. No functional changes.
2018-10-09 19:43:43 +02:00
Lennart Poettering 1fb70e6648 core: rework how we set the objective to MANAGER_OK
Let's do so already when we are about to complete startup/reload, so
that manager_catchup() is run in a context where MANAGER_IS_RUNNING()
returns true, as the intention is.

Fixes: #9518
2018-10-09 19:43:43 +02:00
Lennart Poettering 3ca4d0b3eb core: make use of manager_loop()'s return value
The objective is returned in the return value, let's make use of that,
instead of reaching into the object.
2018-10-09 19:43:43 +02:00
Lennart Poettering 7a35fa24ff core: try to recover from failed reloads
Let's simply continue with everything we loaded, in the hope it's
somewhat useful.
2018-10-09 19:43:43 +02:00
Lennart Poettering 3ad2afb6a2 core: bring manager_startup() and manager_reload() more inline
Both functions do partly the same, let's make sure they do it in the
same order, and that we don't miss some calls.

This makes a number of changes:

1. Moves exec_runtime_vacuum() two calls down in manager_startup(). This
   should not have any effect but makes manager_startup() more like
   manager_reload().

2. Calls manager_recheck_journal(), manager_recheck_dbus(),
   manager_enqueue_sync_bus_names() in manager_startup() too. This is a
   good idea since during reeexec we pass through manager_startup() and
   hence can't assume dbus and journald weren't up yet, hence let's
   check if they are ready to be connected to.

3. Include manager_enumerate_perpetual() in manager_reload(), too. This
   is not strictly necessary, since these units are included in the
   serialization anyway, but it's still a nice thing, in particular as
   theoretically the deserialization could fail.
2018-10-09 19:43:43 +02:00
Lennart Poettering 6eb3af7a6e core: break lines in comments 2018-10-09 19:43:43 +02:00
Lennart Poettering 572986ca14 core: log in all cases in manager_startup()
We missed some cases where we'd fail without any logging at all. Let's
fix that.
2018-10-09 19:43:43 +02:00
Lennart Poettering 6a33af40da manager: rework error handling and logging in manager_reload()
let's clean up error handling and logging in manager_reload() a bit.
Specifically: make sure we log about every error we might encounter at
least and at most once.

When we encounter an error before the "point of no return" then log at
LOG_ERR about it and propagate it. Otherwise, eat it up, but warn about
it and proceed, it's the best we can do.
2018-10-09 19:43:43 +02:00
Lennart Poettering eb10d0bf8a core: add comments about n_reloading to manager_deserialize() 2018-10-09 19:43:43 +02:00
Lennart Poettering 18869883f2 core: handle OOM during deserialization always the same way
OOM failures we consider fatal, while other failures we generally skip
over.
2018-10-09 19:43:43 +02:00
Lennart Poettering b2a8a3dd10 core: clean up deserialization log messages a bit
Always, say that we ignore these kind of issues. We already say that for
many fields, but for a few this was missing.
2018-10-09 19:43:43 +02:00
Lennart Poettering 7eb4f32612 core: make sure manager_run_generators() logs about all errors
Since it's mostly a wrapper around execute_directories() it already logs
in most cases, but a few were missing. Fix that.
2018-10-09 19:43:43 +02:00
Lennart Poettering 4daf832afa core: allow manager_serialize() to fail correctly
If manager_serialize() fails in the middle (which it hopefully doesn't)
make sure to fix up m->n_reloading correctly again so that we don't
leave it > 0 when it really shouldn't be.
2018-10-09 19:43:43 +02:00
Lennart Poettering 638cece45d core: clean up test run flags
Let's make them typesafe, and let's add a nice macro helper for checking
if we are in a test run, which should make testing for this much easier
to read for most cases.
2018-10-09 19:43:43 +02:00
Lennart Poettering c52b19d65f manager: normalize /run disk space checks
Let's avoid using a variable needlessly. More importantly, special case
the error, not the regular case.
2018-10-09 19:43:43 +02:00
Lennart Poettering 86036b26a1 core: tiny tweak for cgroup trimming during manager_free()
Instead of blacklisting when not to trim the cgroup tree, let's instead
whitelist when to do it, as an excercise of being careful when being
destructive.

This should not change behaviour with exception that during switch roots
we now won't attempt to trim the cgroup tree anymore. Which is more
correct behaviour after all we serialize/deserialize during the
transition and should be needlessly destructive.
2018-10-09 19:43:43 +02:00
Lennart Poettering 3ad228ce75 core: use structure initialization for Manager
No changes in behaviour, just a nicer way to fill in the Manager
initially.
2018-10-09 19:43:43 +02:00
Lennart Poettering ed4ac965fa manager: rework test flags set
No reason to avoid bit 0.

Also, fix some tests that pass "true" as flags value, which is just
wrong.
2018-10-09 19:43:43 +02:00
Lennart Poettering af41e5086d core: rename ManagerExitCode → ManagerObjective
"ExitCode" is a bit of a misnomer in two ways: it suggests this was
about the "exit code" concept that exit()/waitid() deal with, but really
isn't. Moreover, it's not event just about exiting either, but more
often about reloading/reexecing or rebooting. Let's hence pick a new
name for this that is a bit more correct.

I initially thought about naming this the "state", but that'd be a
misnomer too, as the value really encodes a "goal" more than a current
state. Also we already have the externally visible ManagerState.

No actual changes in behaviour, just the rename.
2018-10-09 19:43:43 +02:00
Lennart Poettering 899987456c manager: add explanatory comment regarding ManagerState 2018-10-09 19:43:43 +02:00
Lennart Poettering 2cc856ac89 main: minor coding style update 2018-10-09 19:43:43 +02:00
Roman Gushchin 084c700780 core: support cgroup v2 device controller
Cgroup v2 provides the eBPF-based device controller, which isn't currently
supported by systemd. This commit aims to provide such support.

There are no user-visible changes, just the device policy and whitelist
start working if cgroup v2 is used.
2018-10-09 09:47:51 -07:00
Roman Gushchin 91cfdd8d29 core: bump mlock ulimit to 64Mb
Bpf programs are charged against memlock ulimit, and the default value
can be too tight on machines with many cgroups and attached bpf programs.

Let's bump it to 64Mb.
2018-10-09 09:46:36 -07:00
Roman Gushchin 17f149556a core: refactor bpf firewall support into a pseudo-controller
The idea is to introduce a concept of bpf-based pseudo-controllers
to make adding new bpf-based features easier.
2018-10-09 09:46:08 -07:00
Lennart Poettering cb5491ee4d
Merge pull request #10324 from poettering/audit-serialize-bool
properly serialize in_audit boolean
2018-10-09 11:59:05 +02:00
Franck Bui c6885f5f36 core: introduce systemd.early_core_pattern= kernel cmdline option
Until a core dump handler is installed by systemd-sysctl, the generation of
core dump for services is turned OFF which can make the debugging of the early
boot process harder especially since there's no easy way to restore the core
dump generation.

This patch introduces a new kernel command line option which specifies an
absolute path where the kernel should write the core dump file when an early
process crashes.

This will take effect until systemd-coredump (or any other handlers) takes
over.
2018-10-09 10:26:23 +02:00
Lennart Poettering 0e699122b7 core: properly serialize "in_audit" per-unit boolean
Fixes: #9962
2018-10-09 10:09:39 +02:00
Lennart Poettering 256f65d045 core: rearrange conditions in unit_notify() a bit
This shouldn't change control flow, with one exception: we won't send
notifications for boot progress to plymouth anymore during reload, which
is something we really shouldn't.
2018-10-09 10:09:39 +02:00
Zbigniew Jędrzejewski-Szmek 29abe1664e
Merge pull request #10159 from poettering/killall-spree-kernel-thread
killall.c fixes regarding kernel thread detection
2018-10-08 20:12:18 +02:00
Yu Watanabe c250bf671b core/dbus-execute: fix parsing CPUScheduling* and Nice for transient services
Fixes #10290.
2018-10-05 21:41:05 +02:00
Lennart Poettering 334415b16e
Merge pull request #10094 from keszybz/wants-loading
Fix bogus fragment paths in units in .wants/.requires
2018-10-05 17:36:31 +02:00
Zbigniew Jędrzejewski-Szmek 7c3733d5de pid1: remove unnecessary error reassignment
LGTM was complaining:
> Comparison is always true because r >= 0.
2018-10-02 15:36:24 +02:00
Anita Zhang c87700a133 Make Watchdog Signal Configurable
Allows configuring the watchdog signal (with a default of SIGABRT).
This allows an alternative to SIGABRT when coredumps are not desirable.

Appropriate references to SIGABRT or aborting were renamed to reflect
more liberal watchdog signals.

Closes #8658
2018-09-26 16:14:29 +02:00
Lennart Poettering ee8d493cbd
Merge pull request #10158 from keszybz/seccomp-log-tightening
Seccomp log tightening
2018-09-26 15:56:32 +02:00
Lennart Poettering d3a94b3e80 killall: (void)ify more things 2018-09-25 12:50:35 +02:00
Lennart Poettering 20ca2d10bd killall: filter out bogus PIDs
we might as well filter these too since negative PIDs have special
semantics in kill(), and we should never trigger that...
2018-09-25 12:50:35 +02:00
Lennart Poettering e45154c770 killall: use is_kernel_thread() during killing spree process filtering too
Apparently the new "bpfilter" subsystem otherwise confuses us.

See: https://lists.freedesktop.org/archives/systemd-devel/2018-September/041392.html
2018-09-25 12:50:35 +02:00
Lennart Poettering 7c428bb5d5
Merge pull request #10059 from yuwata/env-exec-directory
core: introduce $RUNTIME_DIRECTORY= or friends
2018-09-25 12:34:30 +02:00
Zbigniew Jędrzejewski-Szmek 3318fd9c24
Merge pull request #10073 from xnox/execve
Execute generators with manager's environment exported
2018-09-25 10:07:23 +02:00
Yu Watanabe 6c9c51e5e2 fs-util: make symlink_idempotent() optionally create relative link 2018-09-24 18:52:53 +03:00
Zbigniew Jędrzejewski-Szmek b54f36c604 seccomp: reduce logging about failure to add syscall to seccomp
Our logs are full of:
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldstat() / -10037, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call get_thread_area() / -10076, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call set_thread_area() / -10079, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldfstat() / -10034, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldolduname() / -10036, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldlstat() / -10035, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call waitpid() / -10073, ignoring: Numerical argument out of domain
...
This is pointless and makes debug logs hard to read. Let's keep the logs
in test code, but disable it in nspawn and pid1. This is done through a function
parameter because those functions operate recursively and it's not possible to
make the caller to log meaningfully.


There should be no functional change, except the skipped debug logs.
2018-09-24 17:21:09 +02:00
Dimitri John Ledkov a3156a8ee4 core: execute generators with manager's environmnet 2018-09-24 13:40:50 +01:00
Dimitri John Ledkov ea368f0bd2 core: execute environment_generators with manager's environment 2018-09-24 13:40:10 +01:00
Dimitri John Ledkov 78ec1bb436 exec-util: in execute_directories, support initial exec environment 2018-09-24 13:40:10 +01:00
Yu Watanabe 93bab28895 tree-wide: use typesafe_qsort() 2018-09-19 08:02:52 +09:00
Alexander Filippov 047de7e1b1 core: fix the check if CONFIG_CGROUP_BPF is on
Since the commit torvalds/linux@fdb5c4531c
the syscall BPF_PROG_ATTACH return EBADF when CONFIG_CGROUP_BPF is
turned off and as result the bpf_firewall_supported() returns the
incorrect value.

This commmit replaces the syscall BPF_PROG_ATTACH with BPF_PROG_DETACH
which is still work as expected.

Resolves openbmc/linux#159
See also systemd/systemd#7054

Signed-off-by: Alexander Filippov <a.filippov@yadro.com>
2018-09-18 16:19:51 +02:00
Yu Watanabe aca835ed2e core/execute: do not use the negative errno when setup_namespace() returns -ENOANO
Without this, log shows meaningless error message 'No anode', e.g.,
===
Failed to unshare the mount namespace: Operation not permitted
foo.service: Failed to set up mount namespacing: No anode
foo.service: Failed at step NAMESPACE spawning /usr/bin/test: No anode
===

Follow-up for 1beab8b0d0.
2018-09-18 14:31:09 +09:00
Yu Watanabe 2e4a4faea8 core/namespace: add more log messages 2018-09-18 14:31:09 +09:00
Zbigniew Jędrzejewski-Szmek 23e8c79665 pid1: drop now-unused path parameter to resolve_template() 2018-09-15 20:03:32 +02:00
Zbigniew Jędrzejewski-Szmek 5a72417084 pid1: drop unused path parameter to add_two_dependencies_by_name() 2018-09-15 20:02:00 +02:00
Zbigniew Jędrzejewski-Szmek 35d8c19ace pid1: drop now-unused path parameter to add_dependency_by_name() 2018-09-15 19:57:52 +02:00
Zbigniew Jędrzejewski-Szmek 0c062fd3eb systemd: do not pass .wants fragment path to manager_load_unit
When loading units, sometimes we'd first encounter a unit from .wants or
.requires directory. A typical case would be when multi-user.target.wants/
contains a symlink to some unit. We would prepare to load this unit using
/etc/systemd/system/multi-user.target.wants/foo.service as the fragment
path. This is always wrong. Instead, let's use NULL as the path and let
manager_load_unit() figure out the path on its own.

Fixes #9921.

    path=0x5625ed9b01a0 "/usr/lib/systemd/system/local-fs.target.wants/systemd-remount-fs.service", e=0x0,
    _ret=0x7ffe64645000) at ../src/core/manager.c:1887
    name=0x5625ed9b01ce "systemd-remount-fs.service",
    path=0x5625ed9b01a0 "/usr/lib/systemd/system/local-fs.target.wants/systemd-remount-fs.service", e=0x0,
    _ret=0x7ffe64645000) at ../src/core/manager.c:1961
    name=0x5625ed9b01ce "systemd-remount-fs.service",
    path=0x5625ed9b01a0 "/usr/lib/systemd/system/local-fs.target.wants/systemd-remount-fs.service",
    add_reference=true, mask=UNIT_DEPENDENCY_FILE) at ../src/core/unit.c:2946
    dir_suffix=0x5625ebb179ed ".wants") at ../src/core/load-dropin.c:95
    path=0x0, e=0x0, _ret=0x7ffe646452c0) at ../src/core/manager.c:1965
    name=0x5625ebb186f8 "local-fs.target", path=0x0, add_reference=true,
    mask=UNIT_DEPENDENCY_MOUNTINFO_IMPLICIT) at ../src/core/unit.c:2946
    where=0x5625ed9b3cc0 "/tmp", options=0x5625ed947110 "rw,nosuid,nodev,seclabel",
    fstype=0x5625ed95be90 "tmpfs", flags=0x7ffe64645395) at ../src/core/mount.c:1439
    where=0x5625ed9b3cc0 "/tmp", options=0x5625ed947110 "rw,nosuid,nodev,seclabel",
    fstype=0x5625ed95be90 "tmpfs", set_flags=false) at ../src/core/mount.c:1567
    at ../src/core/mount.c:1635
    ret_retval=0x7ffe64645660, ret_shutdown_verb=0x7ffe646456c0, ret_fds=0x7ffe646456d8,
    ret_switch_root_dir=0x7ffe646456b0, ret_switch_root_init=0x7ffe646456b8,
    ret_error_message=0x7ffe646456c8) at ../src/core/main.c:1669
2018-09-15 19:43:58 +02:00
Yu Watanabe fb2042dd55 core: add new environment variable $RUNTIME_DIRECTORY= or friends
The variable is generated from RuntimeDirectory= or friends.
If multiple directories are set, then they are concatenated with
the separator ':'.
2018-09-13 17:02:58 +09:00
Yu Watanabe 7c1cb6f198 core: add one more assert() 2018-09-13 17:02:58 +09:00
Yu Watanabe 76a9460d44 core: fix assert() about number of built environment variables
Follow-up for 4b58153dd2 and
fd63e712b2.
2018-09-13 17:02:58 +09:00
Yu Watanabe 209c6f03bc core/umount: use structured initializers 2018-09-10 16:48:37 +09:00
Yu Watanabe 8437c0593f tree-wide: replace device_enumerator_scan_devices()+FOREACH_DEVICE_AND_SUBSYSTEM() by FOREACH_DEVICE() 2018-09-10 16:48:37 +09:00
Yu Watanabe 0de4876496 core/socket: fix memleak in the error paths in usbffs_dispatch_eps() 2018-09-03 14:25:08 +09:00
Yu Watanabe 0c09cb0e78
Merge pull request #9977 from sourcejedi/no-remount-superblock3
Namespace fixes
2018-09-01 23:18:01 +09:00
Alan Jenkins fcac12d150 namespace: remove redundant .has_prefix=false
The MountEntry's added for EMPTY_DIR work very similarly to the TMPFS ones.
In both cases, .has_prefix is false.  In fact, .has_prefix is false in
*all* the MountEntry's we add except for the access mounts (READONLY etc).

But EMPTY_DIR stuck out by explicitly setting .has_prefix = false.
Let's remove that.
2018-09-01 17:23:01 +09:00
Alan Jenkins 4a756839e6 namespace: we always use a root_directory now
We changed to always setup the new namespace in a separate directory
(commit 0722b35)
2018-09-01 17:23:01 +09:00
Alan Jenkins ad8e66dcc4 namespace: fix mode for TemporaryFileSystem=
... when no mount options are passed.

Change the code, to avoid the following failure in the newly added tests:

exec-temporaryfilesystem-rw.service: Executing: /usr/bin/sh -x -c
'[ "$(stat -c %a /var)" == 755 ]'
++ stat -c %a /var
+ '[' 1777 == 755 ']'
Received SIGCHLD from PID 30364 (sh).
Child 30364 (sh) died (code=exited, status=1/FAILURE)

(And I spotted an opportunity to use TAKE_PTR() at the end).
2018-09-01 17:22:14 +09:00
Alan Jenkins 69338c3dfb namespace: don't try to remount superblocks
We can't remount the underlying superblocks, if we are inside a user
namespace and running Linux <= 4.17.  We can only change the per-mount
flags (MS_REMOUNT | MS_BIND).

This type of mount() call can only change the per-mount flags, so we
don't have to worry about passing the right string options now.

Fixes #9914 ("Since 1beab8b was merged, systemd has been failing to start
systemd-resolved inside unprivileged containers" ... "Failed to re-mount
'/run/systemd/unit-root/dev' read-only: Operation not permitted").

> It's basically my fault :-). I pointed out we could remount read-only
> without MS_BIND when reviewing the PR that added TemporaryFilesystem=,
> and poettering suggested to change PrivateDevices= at the same time.
> I think it's safe to change back, and I don't expect anyone will notice
> a difference in behaviour.
>
> It just surprised me to realize that
> `TemporaryFilesystem=/tmp:size=10M,ro,nosuid` would not apply `ro` to the
> superblock (underlying filesystem), like mount -osize=10M,ro,nosuid does.
> Maybe a comment could note the kernel version (v4.18), that lets you
> remount without MS_BIND inside a user namespace.

This makes the code longer and I guess this function is still ugly, sorry.
One obstacle to cleaning it up is the interaction between
`PrivateDevices=yes` and `ReadOnlyPaths=/dev`.  I've added a test for the
existing behaviour, which I think is now the correct behaviour.
2018-08-30 11:17:16 +01:00
Yu Watanabe 3343021755 dynamic-user: drop unnecessary initialization 2018-08-29 23:04:26 +09:00
Yu Watanabe 288ca7af8c dynamic-user: fix potential segfault 2018-08-28 05:09:00 +09:00
Yu Watanabe 8301aa0bf1 tree-wide: use DEFINE_TRIVIAL_REF_UNREF_FUNC() macro or friends where applicable 2018-08-27 14:01:46 +09:00
Yu Watanabe cf4b2f9906 tree-wide: use unsigned for refcount 2018-08-27 13:48:04 +09:00
Yu Watanabe 4366e598ae core: replace udev_device by sd_device 2018-08-23 04:57:39 +09:00
Yu Watanabe 6bcf00eda3 core/umount: replace udev_device by sd_device 2018-08-23 04:57:39 +09:00
Tejun Heo 6ae4283cb1 core: add IODeviceLatencyTargetSec
This adds support for the following proposed latency based IO control
mechanism.

  https://lkml.org/lkml/2018/6/5/428
2018-08-22 16:46:18 +02:00
Yu Watanabe 52e4d62550
Merge pull request #9852 from poettering/namespace-errno
namespace: be more careful when handling namespacing failures
2018-08-22 11:16:29 +09:00
Lennart Poettering 1beab8b0d0 namespace: be more careful when handling namespacing failures gracefully
This makes two changes to the namespacing code:

1. We'll only gracefully skip service namespacing on access failure if
   exclusively sandboxing options where selected, and not mount-related
   options that result in a very different view of the world. For example,
   ignoring RootDirectory=, RootImage= or Bind= is really probablematic,
   but ReadOnlyPaths= is just a weaker sandbox.

2. The namespacing code will now return a clearly recognizable error
   code when it cannot enforce its namespacing, so that we cannot
   confuse EPERM errors from mount() with those from unshare(). Only the
   errors from the first unshare() are now taken as hint to gracefully
   disable namespacing.

Fixes: #9844 #9835
2018-08-21 20:00:33 +02:00
aszlig 66c91c3a23 umount: Don't use options from fstab on remount
The fstab entry may contain comment/application-specific options, like
for example x-systemd.automount or x-initrd.mount.

With the recent switch to libmount, the mount options during remount are
now gathered via mnt_fs_get_options(), which returns the merged fstab
options with the effective options in mountinfo.

Unfortunately if one of these application-specific options are set in
fstab, the remount will fail with -EINVAL.

In systemd 238:

  Remounting '/test-x-initrd-mount' read-only in with options
  'errors=continue,user_xattr,acl'.

In systemd 239:

  Remounting '/test-x-initrd-mount' read-only in with options
  'errors=continue,user_xattr,acl,x-initrd.mount'.
  Failed to remount '/test-x-initrd-mount' read-only: Invalid argument

So instead of using mnt_fs_get_options(), we're now using both
mnt_fs_get_fs_options() and mnt_fs_get_vfs_options() and merging the
results together so we don't get any non-relevant options from fstab.

Signed-off-by: aszlig <aszlig@nix.build>
2018-08-21 19:49:51 +02:00
Zbigniew Jędrzejewski-Szmek 0566668016
Merge pull request #9712 from filbranden/socket1
socket-util: Introduce send_one_fd_iov() and receive_one_fd_iov()
2018-08-21 19:45:44 +02:00
Zbigniew Jędrzejewski-Szmek 7692fed98b
Merge pull request #9783 from poettering/get-user-creds-flags
beef up get_user_creds() a bit and other improvements
2018-08-21 10:09:33 +02:00
Zbigniew Jędrzejewski-Szmek 00c4361878
Merge pull request #9853 from poettering/uneeded-queue
rework StopWhenUnneeded=1 logic
2018-08-21 10:06:30 +02:00
Lennart Poettering fafff8f1ff user-util: rework get_user_creds()
Let's fold get_user_creds_clean() into get_user_creds(), and introduce a
flags argument for it to select "clean" behaviour. This flags parameter
also learns to other new flags:

- USER_CREDS_SYNTHESIZE_FALLBACK: in this mode the user records for
  root/nobody are only synthesized as fallback. Normally, the synthesized
  records take precedence over what is in the user database.  With this
  flag set this is reversed, and the user database takes precedence, and
  the synthesized records are only used if they are missing there. This
  flag should be set in cases where doing NSS is deemed safe, and where
  there's interest in knowing the correct shell, for example if the
  admin changed root's shell to zsh or suchlike.

- USER_CREDS_ALLOW_MISSING: if set, and a UID/GID is specified by
  numeric value, and there's no user/group record for it accept it
  anyway. This allows us to fix #9767

This then also ports all users to set the most appropriate flags.

Fixes: #9767

[zj: remove one isempty() call]
2018-08-20 15:58:21 +02:00
Lennart Poettering b2a60844c4 namespace: when creating device nodes, also create /dev/char/* symlinks
On the host these symlinks are created by udev, and we consider them API
and make use of them ourselves at various places. Hence when running a
private /dev, also create these symlinks so that lookups by major/minor
work in such an environment, too.
2018-08-20 15:58:11 +02:00
Zbigniew Jędrzejewski-Szmek a9e241dcb3
Merge pull request #9801 from yuwata/analyze-cleanups
analyze: several improvements
2018-08-20 13:12:53 +02:00
Lennart Poettering 3cd24c1aa9 core: when setting up PAM, try to get tty of STDIN_FILENO if not set explicitly
When stdin/stdout/stderr is initialized from an fd, let's read the tty
name of it if we can, and pass that to PAM.

This makes sure that "machinectl shell" sessions have proper TTY fields
initialized that "loginctl" then shows.
2018-08-20 12:28:17 +02:00
Lennart Poettering 37ec0fdd34 tree-wide: add clickable man page link to all --help texts
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.

I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
2018-08-20 11:33:04 +02:00
Zbigniew Jędrzejewski-Szmek fda09318e3 core: rename function to better reflect semantics 2018-08-20 10:43:31 +02:00
Lennart Poettering a3c1168ac2 core: rework StopWhenUnneeded= logic
Previously, we'd act immediately on StopWhenUnneeded= when a unit state
changes. With this rework we'll maintain a queue instead: whenever
there's the chance that StopWhenUneeded= might have an effect we enqueue
the unit, and process it later when we have nothing better to do.

This should make the implementation a bit more reliable, as the unit notify event
cannot immediately enqueue tons of side-effect jobs that might
contradict each other, but we do so only in a strictly ordered fashion,
from the main event loop.

This slightly changes the check when to consider a unit "unneeded".
Previously, we'd assume that a unit in "deactivating" state could also
be cleaned up. With this new logic we'll only consider units unneeded
that are fully up and have no job queued. This means that whenever
there's something pending for a unit we won't clean it up.
2018-08-10 16:19:01 +02:00
Zbigniew Jędrzejewski-Szmek b257b19e6b
Merge pull request #9848 from yuwata/fix-9835-9844
core: namespace fixes
2018-08-10 15:36:34 +02:00
Yu Watanabe 4c3a2b84d8 core/execute: fix dump format for Limit*=
Fixes #9846.
2018-08-10 11:59:16 +02:00
Yu Watanabe 763a260ae7 core/namespace: add more log messages
Suggested by #9835.
2018-08-10 14:30:35 +09:00
Yu Watanabe 7e8d494b33 core: use memcpy_safe()
Fixes #9738.
2018-08-08 17:11:43 +09:00
Lennart Poettering 91f4424012
Merge pull request #9817 from yuwata/shorten-error-logging
tree-wide: Shorten error logging and several code cleanups
2018-08-07 10:44:44 +02:00
Lennart Poettering 6f663594bc
Merge pull request #9744 from yuwata/fix-9737
Make RootImage= work with PrivateDevices=
2018-08-07 09:55:07 +02:00
Yu Watanabe fc95c359f6 tree-wide: use returned value from log_*_errno() 2018-08-07 15:48:37 +09:00
Filipe Brandenburger a0edd02e43 tree-wide: Convert compare_func's to use CMP() macro wherever possible.
Looked for definitions of functions using the *_compare_func() suffix.

Tested:
- Unit tests passed (ninja -C build/ test)
- Installed this build and booted with it.
2018-08-06 19:26:35 -07:00
Yu Watanabe 4ae25393f3 tree-wide: shorten error logging a bit
Continuation of 4027f96aa0.
2018-08-07 10:14:33 +09:00
Yu Watanabe 7bc740f480 core: add comments about timestamps stored in manager 2018-08-06 22:21:05 +09:00
Yu Watanabe fe65e88ba6 namespace: implicitly adds DeviceAllow= when RootImage= is set
RootImage= may require the following settings
```
DeviceAllow=/dev/loop-control rw
DeviceAllow=block-loop rwm
DeviceAllow=block-blkext rwm
```
This adds the following settings implicitly when RootImage= is
specified.

Fixes #9737.
2018-08-06 14:02:31 +09:00
Yu Watanabe fd870bac25 core: introduce cgroup_add_device_allow() 2018-08-06 13:42:14 +09:00
Yu Watanabe 839f187753 core/namespace: drop mount points outside of root even if RootDirectory= is not set 2018-08-06 12:51:33 +09:00
Yu Watanabe 9b68367b3a core/namespace: drop conditions depends on root is empty or not
After 0722b35934, the variable `root`
is always set.
2018-08-06 12:51:33 +09:00
Filipe Brandenburger d34673ecb8 socket-util: Introduce send_one_fd_iov() and receive_one_fd_iov()
These take a struct iovec to send data together with the passed FD.

The receive function returns the FD through an output argument. In case data is
received, but no FD is passed, the receive function will set the output
argument to -1 explicitly.

Update code in dynamic-user to use the new helpers.
2018-08-02 09:25:04 -07:00
Zbigniew Jędrzejewski-Szmek 5b316330be
Merge pull request #9624 from poettering/service-state-flush
flush out ExecStatus structures when a new service cycle begins
2018-08-02 09:50:39 +02:00
Zbigniew Jędrzejewski-Szmek 7426028b7a
Merge pull request #9720 from yuwata/fix-9702
Fix DynamicUser=yes with static User= whose UID and GID are different
2018-07-26 11:42:00 +02:00
Zbigniew Jędrzejewski-Szmek 54fe2ce1b9
Merge pull request #9504 from poettering/nss-deadlock
some nss deadlock love
2018-07-26 10:16:25 +02:00
Zbigniew Jędrzejewski-Szmek cf6e28f3cb
Merge pull request #9484 from poettering/permille-everywhere
Permille everywhere
2018-07-26 10:13:56 +02:00
Yu Watanabe 25a1df7c65 core: fix gid when DynamicUser=yes with static User=
When DynamicUser=yes and static User= are set, and the user has
different uid and gid, then as the storage socket for the dynamic
user does not contains gid, we need to obtain gid.

Follow-up for 9ec655cbbd.

Fixes #9702.
2018-07-26 15:38:18 +09:00
Lennart Poettering 5686391b00 core: introduce new Type=exec service type
Users are often surprised that "systemd-run" command lines like
"systemd-run -p User=idontexist /bin/true" will return successfully,
even though the logs show that the process couldn't be invoked, as the
user "idontexist" doesn't exist. This is because Type=simple will only
wait until fork() succeeded before returning start-up success.

This patch adds a new service type Type=exec, which is very similar to
Type=simple, but waits until the child process completed the execve()
before returning success. It uses a pipe that has O_CLOEXEC set for this
logic, so that the kernel automatically sends POLLHUP on it when the
execve() succeeded but leaves the pipe open if not. This means PID 1
waits exactly until the execve() succeeded in the child, and not longer
and not shorter, which is the desired functionality.

Making use of this new functionality, the command line
"systemd-run -p User=idontexist -p Type=exec /bin/true" will now fail,
as expected.
2018-07-25 22:48:11 +02:00
Lennart Poettering ce0d60a7c4 execute: use our usual syntax for defining bit masks 2018-07-25 22:48:11 +02:00
Lennart Poettering 25b583d7ff core: swap order of "n_storage_fds" and "n_socket_fds" parameters
When process fd lists to pass to activated programs we always place the
socket activation fds first, and the storage fds last. Irritatingly in
almost all calls the "n_storage_fds" parameter (i.e. the number of
storage fds to pass) came first so far, and the "n_socket_fds" parameter
second. Let's clean this up, and specify the number of fds in the order
the fds themselves are passed.

(Also, let's fix one more case where "unsigned" was used to size an
array, while we should use "size_t" instead.)
2018-07-25 22:48:11 +02:00
Lennart Poettering f806dfd345 tree-wide: increase granularity of percent specifications all over the place to permille
We so far had various placed we'd parse percentages with
parse_percent(). Let's make them use parse_permille() instead, which is
downward compatible (as it also parses percent values), and increases
the granularity a bit. Given that on the wire we usually normalize
relative specifications to something like UINT32_MAX anyway changing
from base-100 to base-1000 calculations can be done easily without
breaking compat.

This commit doesn't document this change in the man pages. While
allowing more precise specifcations permille is not as commonly
understood as perent I guess, hence let's keep this out of the docs for
now.
2018-07-25 16:14:45 +02:00
Zbigniew Jędrzejewski-Szmek 2b5107e162 core/main: use return log_*_errno more 2018-07-25 14:48:07 +02:00
Lennart Poettering ac7ec2883d main: use log_error_errno() at one more place 2018-07-25 12:31:50 +02:00
Zbigniew Jędrzejewski-Szmek 8d455017ee Drop more copyright headers 2018-07-24 14:40:38 +02:00
Lennart Poettering ae0db6f132
Merge pull request #9687 from yuwata/rfe-9662
analyze: several systemd-analyze plot improvements
2018-07-24 09:43:57 +02:00
Lennart Poettering 0dbff467f7
Merge pull request #9685 from yuwata/fix-9663
core: serialize and deserialize current ShowStatus
2018-07-23 21:17:07 +02:00
Yu Watanabe 665a9774e0 core: expose initrd related timestamps on bus 2018-07-24 03:47:07 +09:00
Yu Watanabe d4ee7bd849 core: serialize/deserialize several timestamps on initrd in different names 2018-07-24 03:45:51 +09:00
Yu Watanabe bee38b5cf8 core: serialize and deserialize current ShowStatus
Fixes #9663.
2018-07-23 23:42:48 +09:00
Yu Watanabe 7a293242e0 core: normalize ShowStatus 2018-07-23 21:55:26 +09:00
Jon Ringle fbb48d4c66 Make final kill signal configurable
Usecase is to allow changing the final kill from SIGKILL to SIGQUIT which
should create a core dump useful for debugging why the service didn't stop
with the SIGTERM
2018-07-23 13:44:54 +02:00
Lennart Poettering 6a1d4d9fa6 core: properly reset all ExecStatus structures when entering a new unit cycle
Whenever a unit is started fresh we should flush out any runtime data
from the previous cycle. We are pretty good at that already, but what so
far we missed was the ExecStart=/ExecStop=/… command exit status data.
Let's fix that, and properly flush out that stuff too.

Consider this service:

    [Service]
    ExecStart=/bin/sleep infinity
    ExecStop=/bin/false

When this service is started, then stopped and then started again
"systemctl status" would show the ExecStop= results of the previous run
along with the ExecStart= results of the current one, which is very
confusing. With this patch this is corrected: the data is kept right
until the moment the new service cycle starts, and then flushed out.
Hence "systemctl status" in that case will only show the ExecStart=
data, but no ExecStop= data, like it should be.

This should fix part of the confusion of #9588
2018-07-23 13:36:47 +02:00
Lennart Poettering 42cb05d5ff execute: document what the different structures are for in comments 2018-07-23 13:36:47 +02:00
Lennart Poettering ee39ca20c6 core: drop "argv" field from ExecParameter structure
We always initialize it from the same field in ExecCommand anyway, hence
there's no point in passing it separately to exec_spawn(), after all we
already pass the ExecCommand structure itself anyway.

No change in behaviour.
2018-07-23 13:36:47 +02:00
Lennart Poettering 2ed26ed065 execute: use structure initialization when filling in exec status 2018-07-23 13:36:47 +02:00
Yu Watanabe f330408d62 tree-wide: drop empty lines in comments 2018-07-23 08:44:24 +02:00
Lennart Poettering d521916d0f pid1: tell PAM/NSS modules why we are calling them 2018-07-20 16:57:35 +02:00
Lennart Poettering f606cd16d3
Merge pull request #9500 from zsol/append
Add support for opening files for appending
2018-07-20 15:45:08 +02:00