Commit Graph

26967 Commits

Author SHA1 Message Date
Lennart Poettering ba128bb809 execute: filter low-level I/O syscalls if PrivateDevices= is set
If device access is restricted via PrivateDevices=, let's also block the
various low-level I/O syscalls at the same time, so that we know that the
minimal set of devices in our virtualized /dev are really everything the unit
can access.
2016-09-25 10:52:57 +02:00
Lennart Poettering 1ecdba149b NEWS: update news about systemd-udevd.service 2016-09-25 10:52:57 +02:00
Lennart Poettering 0c28d51ac8 units: further lock down our long-running services
Let's make this an excercise in dogfooding: let's turn on more security
features for all our long-running services.

Specifically:

- Turn on RestrictRealtime=yes for all of them

- Turn on ProtectKernelTunables=yes and ProtectControlGroups=yes for most of
  them

- Turn on RestrictAddressFamilies= for all of them, but different sets of
  address families for each

Also, always order settings in the unit files, that the various sandboxing
features are close together.

Add a couple of missing, older settings for a numbre of unit files.

Note that this change turns off AF_INET/AF_INET6 from udevd, thus effectively
turning of networking from udev rule commands. Since this might break stuff
(that is already broken I'd argue) this is documented in NEWS.
2016-09-25 10:52:57 +02:00
Lennart Poettering f6eb19a474 units: permit importd to mount stuff
Fixes #3996
2016-09-25 10:52:57 +02:00
Lennart Poettering 6757c06a1a man: shorten the exit status table a bit
Let's merge a couple of columns, to make the table a bit shorter. This
effectively just drops whitespace, not contents, but makes the currently
humungous table much much more compact.
2016-09-25 10:52:57 +02:00
Lennart Poettering 81c8aceed4 man: the exit code/signal is stored in $EXIT_CODE, not $EXIT_STATUS 2016-09-25 10:52:57 +02:00
Lennart Poettering effbd6d2ea man: rework documentation for ReadOnlyPaths= and related settings
This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=,
InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to
the host file system. (Which wasn't true actually, as we didn't follow symlinks
at all in the most recent releases, and we know do follow them, but relative to
RootDirectory=).

This also replaces all references to the fact that all fs namespacing options
can be undone with enough privileges and disable propagation by a single one in
the documentation of ReadOnlyPaths= and friends, and then directs the read to
this in all other places.

Moreover a hint is added to the documentation of SystemCallFilter=, suggesting
usage of ~@mount in case any of the fs namespacing related options are used.
2016-09-25 10:42:18 +02:00
Lennart Poettering b2656f1b1c man: in user-facing documentaiton don't reference C function names
Let's drop the reference to the cap_from_name() function in the documentation
for the capabilities setting, as it is hardly helpful. Our readers are not
necessarily C hackers knowing the semantics of cap_from_name(). Moreover, the
strings we accept are just the plain capability names as listed in
capabilities(7) hence there's really no point in confusing the user with
anything else.
2016-09-25 10:42:18 +02:00
Lennart Poettering 8f1ad200f0 namespace: don't make the root directory of a namespace a mount if it already is one
Let's not stack mounts needlessly.
2016-09-25 10:42:18 +02:00
Lennart Poettering d944dc9553 namespace: chase symlinks for mounts to set up in userspace
This adds logic to chase symlinks for all mount points that shall be created in
a namespace environment in userspace, instead of leaving this to the kernel.
This has the advantage that we can correctly handle absolute symlinks that
shall be taken relative to a specific root directory. Moreover, we can properly
handle mounts created on symlinked files or directories as we can merge their
mounts as necessary.

(This also drops the "done" flag in the namespace logic, which was never
actually working, but was supposed to permit a partial rollback of the
namespace logic, which however is only mildly useful as it wasn't clear in
which case it would or would not be able to roll back.)

Fixes: #3867
2016-09-25 10:42:18 +02:00
Lennart Poettering 1e4e94c881 namespace: invoke unshare() only after checking all parameters
Let's create the new namespace only after we validated and processed all
parameters, right before we start with actually mounting things.

This way, the window where we can roll back is larger (not that it matters
IRL...)
2016-09-25 10:42:18 +02:00
Lennart Poettering 096424d123 execute: drop group priviliges only after setting up namespace
If PrivateDevices=yes is set, the namespace code creates device nodes in /dev
that should be owned by the host's root, hence let's make sure we set up the
namespace before dropping group privileges.
2016-09-25 10:42:18 +02:00
Lennart Poettering 920a7899de nspawn: let's mount /proc/sysrq-trigger read-only by default
LXC does this, and we should probably too. Better safe than sorry.
2016-09-25 10:42:18 +02:00
Lennart Poettering 63bb64a056 core: imply ProtectHome=read-only and ProtectSystem=strict if DynamicUser=1
Let's make sure that services that use DynamicUser=1 cannot leave files in the
file system should the system accidentally have a world-writable directory
somewhere.

This effectively ensures that directories need to be whitelisted rather than
blacklisted for access when DynamicUser=1 is set.
2016-09-25 10:42:18 +02:00
Lennart Poettering 3f815163ff core: introduce ProtectSystem=strict
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a
new setting "strict". If set, the entire directory tree of the system is
mounted read-only, but the API file systems /proc, /dev, /sys are excluded
(they may be managed with PrivateDevices= and ProtectKernelTunables=). Also,
/home and /root are excluded as those are left for ProtectHome= to manage.

In this mode, all "real" file systems (i.e. non-API file systems) are mounted
read-only, and specific directories may only be excluded via
ReadWriteDirectories=, thus implementing an effective whitelist instead of
blacklist of writable directories.

While we are at, also add /efi to the list of paths always affected by
ProtectSystem=. This is a follow-up for
b52a109ad3 which added /efi as alternative for
/boot. Our namespacing logic should respect that too.
2016-09-25 10:42:18 +02:00
Lennart Poettering 160cfdbed3 namespace: add some debug logging when enforcing InaccessiblePaths= 2016-09-25 10:42:18 +02:00
Lennart Poettering 6b7c9f8bce namespace: rework how ReadWritePaths= is applied
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths=
specification, then we'd first recursively apply the ReadOnlyPaths= paths, and
make everything below read-only, only in order to then flip the read-only bit
again for the subdirs listed in ReadWritePaths= below it.

This is not only ugly (as for the dirs in question we first turn on the RO bit,
only to turn it off again immediately after), but also problematic in
containers, where a container manager might have marked a set of dirs read-only
and this code will undo this is ReadWritePaths= is set for any.

With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be
applied to the children listed in ReadWritePaths= in the first place, so that
we do not need to turn off the RO bit for those after all.

This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the
RO bit, but never to turn it off again. Or to say this differently: if some
dirs are marked read-only via some external tool, then ReadWritePaths= will not
undo it.

This is not only the safer option, but also more in-line with what the man page
currently claims:

        "Entries (files or directories) listed in ReadWritePaths= are
        accessible from within the namespace with the same access rights as
        from outside."

To implement this change bind_remount_recursive() gained a new "blacklist"
string list parameter, which when passed may contain subdirs that shall be
excluded from the read-only mounting.

A number of functions are updated to add more debug logging to make this more
digestable.
2016-09-25 10:40:51 +02:00
Lennart Poettering 7648a565d1 namespace: when enforcing fs namespace restrictions suppress redundant mounts
If /foo is marked to be read-only, and /foo/bar too, then the latter may be
suppressed as it has no effect.
2016-09-25 10:19:15 +02:00
Lennart Poettering 6ee1a919cf namespace: simplify mount_path_compare() a bit 2016-09-25 10:19:10 +02:00
Lennart Poettering 3fbe8dbe41 execute: if RuntimeDirectory= is set, it should be writable
Implicitly make all dirs set with RuntimeDirectory= writable, as the concept
otherwise makes no sense.
2016-09-25 10:19:05 +02:00
Lennart Poettering be39ccf3a0 execute: move suppression of HOME=/ and SHELL=/bin/nologin into user-util.c
This adds a new call get_user_creds_clean(), which is just like
get_user_creds() but returns NULL in the home/shell parameters if they contain
no useful information. This code previously lived in execute.c, but by
generalizing this we can reuse it in run.c.
2016-09-25 10:18:57 +02:00
Lennart Poettering 07689d5d2c execute: split out creation of runtime dirs into its own functions 2016-09-25 10:18:54 +02:00
Lennart Poettering fe3c2583be namespace: make sure InaccessibleDirectories= masks all mounts further down
If a dir is marked to be inaccessible then everything below it should be masked
by it.
2016-09-25 10:18:51 +02:00
Lennart Poettering 59eeb84ba6 core: add two new service settings ProtectKernelTunables= and ProtectControlGroups=
If enabled, these will block write access to /sys, /proc/sys and
/proc/sys/fs/cgroup.
2016-09-25 10:18:48 +02:00
Lennart Poettering 72246c2a65 core: enforce seccomp for secondary archs too, for all rules
Let's make sure that all our rules apply to all archs the local kernel
supports.
2016-09-25 10:18:44 +02:00
Zbigniew Jędrzejewski-Szmek 6c1e2427df Merge pull request #4194 from bboozzoo/bboozzoo/nss-rootlib 2016-09-24 11:40:15 -04:00
Zbigniew Jędrzejewski-Szmek d11e656ace Merge pull request #4182 from jkoelker/routetable 2016-09-24 11:05:06 -04:00
Martin Pitt f258e94843 networkd: do not drop config for pending interfaces (#4187)
While an interface is still being processed by udev, it is in state "pending",
instead of "unmanaged". We must not flush device configuration then.

Further fixes commit 3104883ddc after commit c436d55397.

Fixes #4186
2016-09-24 10:07:45 -04:00
Maciek Borzecki 082210c7a8 build-sys: get rid of move-to-rootlibdir
Replace move-to-rootlibdir calls in post-install hooks with explicitly
used ${rootlibdir} where needed.

Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
2016-09-24 15:15:01 +02:00
Zbigniew Jędrzejewski-Szmek eb93312810 kernel-install: allow plugins to terminate the procedure (#4174)
Replaces #4103.
2016-09-24 09:03:54 -04:00
Zbigniew Jędrzejewski-Szmek 2541b135bf Merge pull request #4207 from fbuihuu/fix-journal-hmac-calculation
Fix journal hmac calculation.
2016-09-24 08:57:49 -04:00
HATAYAMA Daisuke 886cf982d3 sysctl: configure kernel parameters in the order they occur in each sysctl configuration files (#4205)
Currently, systemd-sysctl command configures kernel parameters in each sysctl
configuration files in random order due to characteristics of iterator of
Hashmap.

However, kernel parameters need to be configured in the order they occur in
each sysctl configuration files.

- For example, consider fs.suid_coredump and kernel.core_pattern. If
  fs.suid_coredump=2 is configured before kernel.core_pattern= whose default
  value is "core", then kernel outputs the following message:

      Unsafe core_pattern used with suid_dumpable=2. Pipe handler or fully qualified core dump path required.

  Note that the security issue mentioned in this message has already been fixed
  on recent kernels, so this is just a warning message on such kernels. But
  it's still confusing to users that this message is output on some boot and
  not output on another boot.

- I don't know but there could be other kernel parameters that are significant
  in the order they are configured.

- The legacy sysctl command configures kernel parameters in the order they
  occur in each sysctl configuration files. Although I didn't find any official
  specification explaining this behavior of sysctl command, I don't think there
  is any meaningful reason to change this behavior, in particular, to the
  random one.

This commit does the change by simply using OrderedHashmap instead of Hashmap.
2016-09-24 08:56:07 -04:00
Luca Bruno 48a8d337a6 nspawn: decouple --boot from CLONE_NEWIPC (#4180)
This commit is a minor tweak after the split of `--share-system`, decoupling the `--boot`
option from IPC namespacing.

Historically there has been a single `--share-system` option for sharing IPC/PID/UTS with the
host, which was incompatible with boot/pid1 mode. After the split, it is now possible to express
the requirements with better granularity.

For reference, this is a followup to #4023 which contains references to previous discussions.
I realized too late that CLONE_NEWIPC is not strictly needed for boot mode.
2016-09-24 08:30:42 -04:00
Franck Bui 33685a5a3a journal: fix HMAC calculation when appending a data object
Since commit 5996c7c295 (v190 !), the
calculation of the HMAC is broken because the hash for a data object
including a field is done in the wrong order: the field object is
hashed before the data object is.

However during verification, the hash is done in the opposite order as
objects are scanned sequentially.
2016-09-23 14:59:51 +02:00
Franck Bui 43cd879483 journal: warn when we fail to append a tag to a journal
We shouldn't silently fail when appending the tag to a journal file
since FSS protection will simply be disabled in this case.
2016-09-23 14:59:00 +02:00
AsciiWolf a4d373452d l10n: update Czech translation (#4203) 2016-09-23 07:11:26 +02:00
Wilhelm Schuster fbdec7923f machine: Disable more output when quiet flag is set (#4196) 2016-09-22 15:49:22 -04:00
Daniel Maixner 86c2fc21f9 l10n: add Czech Translation (#4195) 2016-09-21 14:42:35 +02:00
Maciek Borzecki af0a10bfa1 nss: install nss modules to ${rootlibdir}
NSS modules (libnss_*.so.*) need to be installed into
${rootlibdir} (typically /lib) in order to be used. Previously, the
modules were installed into ${libdir}, thus usually ending up in
/usr/lib, even on systems where split usr is enabled, or ${libdir} is
passed explicitly.

Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
2016-09-21 09:00:11 +02:00
Michael Pope 21dc02277d nspawn: fix comment typo in setup_timezone example (#4183) 2016-09-20 07:30:48 +02:00
Jason Kölker 2ba31d29a5 networkd: Allow specifying RouteTable for RAs 2016-09-19 03:27:46 +00:00
Jason Kölker f594276b86 networkd: Allow specifying RouteTable for DHCP 2016-09-19 03:27:42 +00:00
Felix Zhang dd8352659c journal: fix typo in comment (#4176) 2016-09-18 11:14:50 +02:00
Martin Pitt 7ce9cc1545 Revert "kernel-install: Add KERNEL_INSTALL_NOOP (#4103)"
Further discussion showed that this better gets addressed at the packaging
level.

This reverts commit 34210af7c6.
2016-09-17 16:39:00 +02:00
Martin Pitt 6ac288a990 Merge pull request #4123 from keszybz/network-file-dropins
Network file dropins
2016-09-17 10:00:19 +02:00
Michael Pope 0b493a0263 nspawn: clarify log warning for /etc/localtime not being a symbolic link (#4163) 2016-09-17 09:59:28 +02:00
Zbigniew Jędrzejewski-Szmek 881e6b5edf networkd: change message about missing Kind
If Kind is not specied, the message about "Invalid Kind" was misleading.
If Kind was specified in an invalid way, we get a message in the parsing
phase anyway. Reword the message to cover both cases better.
2016-09-16 10:32:03 -04:00
Zbigniew Jędrzejewski-Szmek bac150e9d1 man: mention that netdev,network files support dropins
Also update the description of drop-ins in systemd.unit(5) to say that .d
directories, not .conf files, are in /etc/system/system, /run/systemd/system,
etc.
2016-09-16 10:32:03 -04:00
Zbigniew Jędrzejewski-Szmek 2cc34d5b91 networkd: support drop-in dirs for .network files 2016-09-16 10:32:03 -04:00
Zbigniew Jędrzejewski-Szmek 23bb31aa0a shared/conf-parser: add config_parse_many which takes strv with dirs
This way we don't have to create a nulstr just to unpack it in a moment.
2016-09-16 10:32:03 -04:00