Commit Graph

4252 Commits

Author SHA1 Message Date
Anita Zhang c87700a133 Make Watchdog Signal Configurable
Allows configuring the watchdog signal (with a default of SIGABRT).
This allows an alternative to SIGABRT when coredumps are not desirable.

Appropriate references to SIGABRT or aborting were renamed to reflect
more liberal watchdog signals.

Closes #8658
2018-09-26 16:14:29 +02:00
Lennart Poettering ee8d493cbd
Merge pull request #10158 from keszybz/seccomp-log-tightening
Seccomp log tightening
2018-09-26 15:56:32 +02:00
Lennart Poettering 7c428bb5d5
Merge pull request #10059 from yuwata/env-exec-directory
core: introduce $RUNTIME_DIRECTORY= or friends
2018-09-25 12:34:30 +02:00
Zbigniew Jędrzejewski-Szmek 3318fd9c24
Merge pull request #10073 from xnox/execve
Execute generators with manager's environment exported
2018-09-25 10:07:23 +02:00
Yu Watanabe 6c9c51e5e2 fs-util: make symlink_idempotent() optionally create relative link 2018-09-24 18:52:53 +03:00
Zbigniew Jędrzejewski-Szmek b54f36c604 seccomp: reduce logging about failure to add syscall to seccomp
Our logs are full of:
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldstat() / -10037, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call get_thread_area() / -10076, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call set_thread_area() / -10079, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldfstat() / -10034, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldolduname() / -10036, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldlstat() / -10035, ignoring: Numerical argument out of domain
Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call waitpid() / -10073, ignoring: Numerical argument out of domain
...
This is pointless and makes debug logs hard to read. Let's keep the logs
in test code, but disable it in nspawn and pid1. This is done through a function
parameter because those functions operate recursively and it's not possible to
make the caller to log meaningfully.


There should be no functional change, except the skipped debug logs.
2018-09-24 17:21:09 +02:00
Dimitri John Ledkov a3156a8ee4 core: execute generators with manager's environmnet 2018-09-24 13:40:50 +01:00
Dimitri John Ledkov ea368f0bd2 core: execute environment_generators with manager's environment 2018-09-24 13:40:10 +01:00
Dimitri John Ledkov 78ec1bb436 exec-util: in execute_directories, support initial exec environment 2018-09-24 13:40:10 +01:00
Yu Watanabe 93bab28895 tree-wide: use typesafe_qsort() 2018-09-19 08:02:52 +09:00
Alexander Filippov 047de7e1b1 core: fix the check if CONFIG_CGROUP_BPF is on
Since the commit torvalds/linux@fdb5c4531c
the syscall BPF_PROG_ATTACH return EBADF when CONFIG_CGROUP_BPF is
turned off and as result the bpf_firewall_supported() returns the
incorrect value.

This commmit replaces the syscall BPF_PROG_ATTACH with BPF_PROG_DETACH
which is still work as expected.

Resolves openbmc/linux#159
See also systemd/systemd#7054

Signed-off-by: Alexander Filippov <a.filippov@yadro.com>
2018-09-18 16:19:51 +02:00
Yu Watanabe aca835ed2e core/execute: do not use the negative errno when setup_namespace() returns -ENOANO
Without this, log shows meaningless error message 'No anode', e.g.,
===
Failed to unshare the mount namespace: Operation not permitted
foo.service: Failed to set up mount namespacing: No anode
foo.service: Failed at step NAMESPACE spawning /usr/bin/test: No anode
===

Follow-up for 1beab8b0d0.
2018-09-18 14:31:09 +09:00
Yu Watanabe 2e4a4faea8 core/namespace: add more log messages 2018-09-18 14:31:09 +09:00
Yu Watanabe fb2042dd55 core: add new environment variable $RUNTIME_DIRECTORY= or friends
The variable is generated from RuntimeDirectory= or friends.
If multiple directories are set, then they are concatenated with
the separator ':'.
2018-09-13 17:02:58 +09:00
Yu Watanabe 7c1cb6f198 core: add one more assert() 2018-09-13 17:02:58 +09:00
Yu Watanabe 76a9460d44 core: fix assert() about number of built environment variables
Follow-up for 4b58153dd2 and
fd63e712b2.
2018-09-13 17:02:58 +09:00
Yu Watanabe 209c6f03bc core/umount: use structured initializers 2018-09-10 16:48:37 +09:00
Yu Watanabe 8437c0593f tree-wide: replace device_enumerator_scan_devices()+FOREACH_DEVICE_AND_SUBSYSTEM() by FOREACH_DEVICE() 2018-09-10 16:48:37 +09:00
Yu Watanabe 0de4876496 core/socket: fix memleak in the error paths in usbffs_dispatch_eps() 2018-09-03 14:25:08 +09:00
Yu Watanabe 0c09cb0e78
Merge pull request #9977 from sourcejedi/no-remount-superblock3
Namespace fixes
2018-09-01 23:18:01 +09:00
Alan Jenkins fcac12d150 namespace: remove redundant .has_prefix=false
The MountEntry's added for EMPTY_DIR work very similarly to the TMPFS ones.
In both cases, .has_prefix is false.  In fact, .has_prefix is false in
*all* the MountEntry's we add except for the access mounts (READONLY etc).

But EMPTY_DIR stuck out by explicitly setting .has_prefix = false.
Let's remove that.
2018-09-01 17:23:01 +09:00
Alan Jenkins 4a756839e6 namespace: we always use a root_directory now
We changed to always setup the new namespace in a separate directory
(commit 0722b35)
2018-09-01 17:23:01 +09:00
Alan Jenkins ad8e66dcc4 namespace: fix mode for TemporaryFileSystem=
... when no mount options are passed.

Change the code, to avoid the following failure in the newly added tests:

exec-temporaryfilesystem-rw.service: Executing: /usr/bin/sh -x -c
'[ "$(stat -c %a /var)" == 755 ]'
++ stat -c %a /var
+ '[' 1777 == 755 ']'
Received SIGCHLD from PID 30364 (sh).
Child 30364 (sh) died (code=exited, status=1/FAILURE)

(And I spotted an opportunity to use TAKE_PTR() at the end).
2018-09-01 17:22:14 +09:00
Alan Jenkins 69338c3dfb namespace: don't try to remount superblocks
We can't remount the underlying superblocks, if we are inside a user
namespace and running Linux <= 4.17.  We can only change the per-mount
flags (MS_REMOUNT | MS_BIND).

This type of mount() call can only change the per-mount flags, so we
don't have to worry about passing the right string options now.

Fixes #9914 ("Since 1beab8b was merged, systemd has been failing to start
systemd-resolved inside unprivileged containers" ... "Failed to re-mount
'/run/systemd/unit-root/dev' read-only: Operation not permitted").

> It's basically my fault :-). I pointed out we could remount read-only
> without MS_BIND when reviewing the PR that added TemporaryFilesystem=,
> and poettering suggested to change PrivateDevices= at the same time.
> I think it's safe to change back, and I don't expect anyone will notice
> a difference in behaviour.
>
> It just surprised me to realize that
> `TemporaryFilesystem=/tmp:size=10M,ro,nosuid` would not apply `ro` to the
> superblock (underlying filesystem), like mount -osize=10M,ro,nosuid does.
> Maybe a comment could note the kernel version (v4.18), that lets you
> remount without MS_BIND inside a user namespace.

This makes the code longer and I guess this function is still ugly, sorry.
One obstacle to cleaning it up is the interaction between
`PrivateDevices=yes` and `ReadOnlyPaths=/dev`.  I've added a test for the
existing behaviour, which I think is now the correct behaviour.
2018-08-30 11:17:16 +01:00
Yu Watanabe 3343021755 dynamic-user: drop unnecessary initialization 2018-08-29 23:04:26 +09:00
Yu Watanabe 288ca7af8c dynamic-user: fix potential segfault 2018-08-28 05:09:00 +09:00
Yu Watanabe 8301aa0bf1 tree-wide: use DEFINE_TRIVIAL_REF_UNREF_FUNC() macro or friends where applicable 2018-08-27 14:01:46 +09:00
Yu Watanabe cf4b2f9906 tree-wide: use unsigned for refcount 2018-08-27 13:48:04 +09:00
Yu Watanabe 4366e598ae core: replace udev_device by sd_device 2018-08-23 04:57:39 +09:00
Yu Watanabe 6bcf00eda3 core/umount: replace udev_device by sd_device 2018-08-23 04:57:39 +09:00
Tejun Heo 6ae4283cb1 core: add IODeviceLatencyTargetSec
This adds support for the following proposed latency based IO control
mechanism.

  https://lkml.org/lkml/2018/6/5/428
2018-08-22 16:46:18 +02:00
Yu Watanabe 52e4d62550
Merge pull request #9852 from poettering/namespace-errno
namespace: be more careful when handling namespacing failures
2018-08-22 11:16:29 +09:00
Lennart Poettering 1beab8b0d0 namespace: be more careful when handling namespacing failures gracefully
This makes two changes to the namespacing code:

1. We'll only gracefully skip service namespacing on access failure if
   exclusively sandboxing options where selected, and not mount-related
   options that result in a very different view of the world. For example,
   ignoring RootDirectory=, RootImage= or Bind= is really probablematic,
   but ReadOnlyPaths= is just a weaker sandbox.

2. The namespacing code will now return a clearly recognizable error
   code when it cannot enforce its namespacing, so that we cannot
   confuse EPERM errors from mount() with those from unshare(). Only the
   errors from the first unshare() are now taken as hint to gracefully
   disable namespacing.

Fixes: #9844 #9835
2018-08-21 20:00:33 +02:00
aszlig 66c91c3a23 umount: Don't use options from fstab on remount
The fstab entry may contain comment/application-specific options, like
for example x-systemd.automount or x-initrd.mount.

With the recent switch to libmount, the mount options during remount are
now gathered via mnt_fs_get_options(), which returns the merged fstab
options with the effective options in mountinfo.

Unfortunately if one of these application-specific options are set in
fstab, the remount will fail with -EINVAL.

In systemd 238:

  Remounting '/test-x-initrd-mount' read-only in with options
  'errors=continue,user_xattr,acl'.

In systemd 239:

  Remounting '/test-x-initrd-mount' read-only in with options
  'errors=continue,user_xattr,acl,x-initrd.mount'.
  Failed to remount '/test-x-initrd-mount' read-only: Invalid argument

So instead of using mnt_fs_get_options(), we're now using both
mnt_fs_get_fs_options() and mnt_fs_get_vfs_options() and merging the
results together so we don't get any non-relevant options from fstab.

Signed-off-by: aszlig <aszlig@nix.build>
2018-08-21 19:49:51 +02:00
Zbigniew Jędrzejewski-Szmek 0566668016
Merge pull request #9712 from filbranden/socket1
socket-util: Introduce send_one_fd_iov() and receive_one_fd_iov()
2018-08-21 19:45:44 +02:00
Zbigniew Jędrzejewski-Szmek 7692fed98b
Merge pull request #9783 from poettering/get-user-creds-flags
beef up get_user_creds() a bit and other improvements
2018-08-21 10:09:33 +02:00
Zbigniew Jędrzejewski-Szmek 00c4361878
Merge pull request #9853 from poettering/uneeded-queue
rework StopWhenUnneeded=1 logic
2018-08-21 10:06:30 +02:00
Lennart Poettering fafff8f1ff user-util: rework get_user_creds()
Let's fold get_user_creds_clean() into get_user_creds(), and introduce a
flags argument for it to select "clean" behaviour. This flags parameter
also learns to other new flags:

- USER_CREDS_SYNTHESIZE_FALLBACK: in this mode the user records for
  root/nobody are only synthesized as fallback. Normally, the synthesized
  records take precedence over what is in the user database.  With this
  flag set this is reversed, and the user database takes precedence, and
  the synthesized records are only used if they are missing there. This
  flag should be set in cases where doing NSS is deemed safe, and where
  there's interest in knowing the correct shell, for example if the
  admin changed root's shell to zsh or suchlike.

- USER_CREDS_ALLOW_MISSING: if set, and a UID/GID is specified by
  numeric value, and there's no user/group record for it accept it
  anyway. This allows us to fix #9767

This then also ports all users to set the most appropriate flags.

Fixes: #9767

[zj: remove one isempty() call]
2018-08-20 15:58:21 +02:00
Lennart Poettering b2a60844c4 namespace: when creating device nodes, also create /dev/char/* symlinks
On the host these symlinks are created by udev, and we consider them API
and make use of them ourselves at various places. Hence when running a
private /dev, also create these symlinks so that lookups by major/minor
work in such an environment, too.
2018-08-20 15:58:11 +02:00
Zbigniew Jędrzejewski-Szmek a9e241dcb3
Merge pull request #9801 from yuwata/analyze-cleanups
analyze: several improvements
2018-08-20 13:12:53 +02:00
Lennart Poettering 3cd24c1aa9 core: when setting up PAM, try to get tty of STDIN_FILENO if not set explicitly
When stdin/stdout/stderr is initialized from an fd, let's read the tty
name of it if we can, and pass that to PAM.

This makes sure that "machinectl shell" sessions have proper TTY fields
initialized that "loginctl" then shows.
2018-08-20 12:28:17 +02:00
Lennart Poettering 37ec0fdd34 tree-wide: add clickable man page link to all --help texts
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.

I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
2018-08-20 11:33:04 +02:00
Zbigniew Jędrzejewski-Szmek fda09318e3 core: rename function to better reflect semantics 2018-08-20 10:43:31 +02:00
Lennart Poettering a3c1168ac2 core: rework StopWhenUnneeded= logic
Previously, we'd act immediately on StopWhenUnneeded= when a unit state
changes. With this rework we'll maintain a queue instead: whenever
there's the chance that StopWhenUneeded= might have an effect we enqueue
the unit, and process it later when we have nothing better to do.

This should make the implementation a bit more reliable, as the unit notify event
cannot immediately enqueue tons of side-effect jobs that might
contradict each other, but we do so only in a strictly ordered fashion,
from the main event loop.

This slightly changes the check when to consider a unit "unneeded".
Previously, we'd assume that a unit in "deactivating" state could also
be cleaned up. With this new logic we'll only consider units unneeded
that are fully up and have no job queued. This means that whenever
there's something pending for a unit we won't clean it up.
2018-08-10 16:19:01 +02:00
Zbigniew Jędrzejewski-Szmek b257b19e6b
Merge pull request #9848 from yuwata/fix-9835-9844
core: namespace fixes
2018-08-10 15:36:34 +02:00
Yu Watanabe 4c3a2b84d8 core/execute: fix dump format for Limit*=
Fixes #9846.
2018-08-10 11:59:16 +02:00
Yu Watanabe 763a260ae7 core/namespace: add more log messages
Suggested by #9835.
2018-08-10 14:30:35 +09:00
Yu Watanabe 7e8d494b33 core: use memcpy_safe()
Fixes #9738.
2018-08-08 17:11:43 +09:00
Lennart Poettering 91f4424012
Merge pull request #9817 from yuwata/shorten-error-logging
tree-wide: Shorten error logging and several code cleanups
2018-08-07 10:44:44 +02:00
Lennart Poettering 6f663594bc
Merge pull request #9744 from yuwata/fix-9737
Make RootImage= work with PrivateDevices=
2018-08-07 09:55:07 +02:00