Commit Graph

260 Commits

Author SHA1 Message Date
Lennart Poettering 2a1288ff89 util: introduce CMSG_FOREACH() macro and make use of it everywhere
It's only marginally shorter then the usual for() loop, but certainly
more readable.
2015-06-10 19:29:47 +02:00
Jason Pleau d38e01dc96 core/namespace: Protect /usr instead of /home with ProtectSystem=yes
A small typo in ee818b8 caused /home to be put in read-only instead of
/usr when ProtectSystem was enabled (ie: not set to "no").
2015-05-31 20:29:36 +02:00
Lennart Poettering 03cfe0d514 nspawn: finish user namespace support 2015-05-21 16:32:01 +02:00
Lennart Poettering 6458ec20b5 core,nspawn: unify code that moves the root dir 2015-05-20 14:38:12 +02:00
Alban Crequy ee818b89f4 core: Private*/Protect* options with RootDirectory
When a service is chrooted with the option RootDirectory=/opt/..., then
the options PrivateDevices, PrivateTmp, ProtectHome, ProtectSystem must
mount the directories under $RootDirectory/{dev,tmp,home,usr,boot}.

The test-ns tool can test setup_namespace() with and without chroot:
 $ sudo TEST_NS_PROJECTS=/home/lennart/projects ./test-ns
 $ sudo TEST_NS_CHROOT=/home/alban/debian-tree TEST_NS_PROJECTS=/home/alban/debian-tree/home/alban/Documents ./test-ns
2015-05-18 18:47:45 +02:00
Lennart Poettering 5a8af538ae nspawn: rework custom mount point order, and add support for overlayfs
Previously all bind mount mounts were applied in the order specified,
followed by all tmpfs mounts in the order specified. This is
problematic, if bind mounts shall be placed within tmpfs mounts.

This patch hence reworks the custom mount point logic, and alwas applies
them in strict prefix-first order. This means the order of mounts
specified on the command line becomes irrelevant, the right operation
will always be executed.

While we are at it this commit also adds native support for overlayfs
mounts, as supported by recent kernels.
2015-05-13 14:07:26 +02:00
Iago López Galeiras 4543768d13 nspawn: change filesystem type from "bind" to NULL in mount() syscalls
Try to keep syscalls as minimal as possible.
2015-03-31 15:36:53 +02:00
Michal Schmidt a0827e2b12 core/namespace: fix path sorting
The comparison function we use for qsorting paths is overly indifferent.
Consider these 3 paths for sorting:
 /foo
 /bar
 /foo/foo
qsort() may compare:
 "/foo" with "/bar" => 0, indifference
 "/bar" with "/foo/foo" => 0, indifference
and assume transitively that "/foo" and "/foo/foo" are also indifferent.

But this is wrong, we want "/foo" sorted before "/foo/foo".
The comparison function must be transitive.

Use path_compare(), which behaves properly.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1184016
2015-03-16 22:17:15 +01:00
Zbigniew Jędrzejewski-Szmek 42b1b9907d core: explicitly ignore failure during cleanup
CID #1237550.
2015-03-13 23:42:17 -04:00
Zbigniew Jędrzejewski-Szmek 3164e3cbc5 core: either ignore or handle mount failures
/dev/pts/ptmx is as important as /dev/pts, so error out if that
fails. Others seem less important, since the namespace is usable
without them, so ignore failures.

CID #123755, #123754.
2015-03-13 23:42:17 -04:00
Zbigniew Jędrzejewski-Szmek dc75168823 Use space after a silencing (void)
We were using a space more often than not, and this way is
codified in CODING_STYLE.
2015-03-13 23:42:17 -04:00
Thomas Hindoe Paaboel Andersen 2eec67acbb remove unused includes
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
2015-02-23 23:53:42 +01:00
Lennart Poettering 63c372cb9d util: rework strappenda(), and rename it strjoina()
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
2015-02-03 02:05:59 +01:00
Topi Miettinen e65476622d Type of mount(2) flags is unsigned long 2015-01-01 14:39:17 -05:00
Lennart Poettering d7b8eec7dc tmpfiles: add new line type 'v' for creating btrfs subvolumes 2014-12-28 02:08:40 +01:00
Michal Schmidt 4a62c710b6 treewide: another round of simplifications
Using the same scripts as in f647962d64 "treewide: yet more log_*_errno
+ return simplifications".
2014-11-28 19:57:32 +01:00
Michal Schmidt 56f64d9576 treewide: use log_*_errno whenever %m is in the format string
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.

Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'

Plus some whitespace, linewrap, and indent adjustments.
2014-11-28 19:49:27 +01:00
Susant Sahani b77acbcf7d namespace: unchecked return value from library
fix:

CID 1237553 (#1 of 6): Unchecked return value from library
(CHECKED_RETURN

CID 1237553 (#3 of 6): Unchecked return value from library
(CHECKED_RETURN)

CID 1237553 (#4 of 6): Unchecked return value from library
(CHECKED_RETURN)

CID 1237553 (#5 of 6): Unchecked return value from library
(CHECKED_RETURN

CID 1237553 (#6 of 6): Unchecked return value from library
(CHECKED_RETURN)
2014-11-17 12:06:40 +01:00
Daniel Mack 63cc4c3138 sd-bus: sync with kdbus upstream (ABI break)
kdbus has seen a larger update than expected lately, most notably with
kdbusfs, a file system to expose the kdbus control files:

 * Each time a file system of this type is mounted, a new kdbus
   domain is created.

 * The layout inside each mount point is the same as before, except
   that domains are not hierarchically nested anymore.

 * Domains are therefore also unnamed now.

 * Unmounting a kdbusfs will automatically also detroy the
   associated domain.

 * Hence, the action of creating a kdbus domain is now as
   privileged as mounting a filesystem.

 * This way, we can get around creating dev nodes for everything,
   which is last but not least something that is not limited by
   20-bit minor numbers.

The kdbus specific bits in nspawn have all been dropped now, as nspawn
can rely on the container OS to set up its own kdbus domain, simply by
mounting a new instance.

A new set of mounts has been added to mount things *after* the kernel
modules have been loaded. For now, only kdbus is in this set, which is
invoked with mount_setup_late().
2014-11-13 20:41:52 +01:00
Lennart Poettering ecabcf8b6e selinux: clean up selinux label function naming 2014-10-23 21:36:56 +02:00
WaLyong Cho cc56fafeeb mac: rename apis with mac_{selinux/smack}_ prefix 2014-10-23 17:13:15 +02:00
Lennart Poettering a004cb4cb2 namespace: add missing 'const' to parameters 2014-10-17 13:49:08 +02:00
Zbigniew Jędrzejewski-Szmek d267c5aa3d core/namespace: remove invalid check
dir cannot be NULL here, because it was allocated with alloca.

CID #1237768.
2014-10-03 20:42:09 -04:00
Zbigniew Jędrzejewski-Szmek 1775f1ebc4 core/namespace: remove invalid check
root cannot be NULL here, because it was allocated with alloca.

CID #1237769.
2014-10-03 20:42:09 -04:00
Thomas Hindoe Paaboel Andersen 120d578e5f namespace: avoid posible use of uninitialized variable 2014-09-08 22:09:41 +02:00
Daniel Mack a610cc4f18 namespace: add support for custom kdbus endpoint
If a path to a previously created custom kdbus endpoint is passed in,
bind-mount a new devtmpfs that contains a 'bus' node, which in turn in
bind-mounted with the custom endpoint. This tmpfs then mounted over the
kdbus subtree that refers to the current bus.

This way, we can fake the bus node in order to lock down services with
a kdbus custom endpoint policy.
2014-09-08 14:12:56 +02:00
Ansgar Burchardt e2d7c1a075 drop_duplicates: copy full BindMount struct
At least

  t->ignore = f->ignore;

is missing here. Just copy the full struct to be sure.
2014-07-27 15:15:11 -04:00
Lennart Poettering 664064d60c namespace: make sure /tmp, /var/tmp and /dev are writable in namespaces we set up 2014-07-03 16:28:26 +02:00
Lennart Poettering 002b226843 namespace: fix uninitialized memory access 2014-07-03 16:28:26 +02:00
Lennart Poettering dd078a1ef8 namespace: properly label device nodes we create
https://bugzilla.redhat.com/show_bug.cgi?id=1081429
2014-06-18 00:09:46 +02:00
Lennart Poettering 051be1f71c namespace: cover /boot with ProtectSystem= again
Now that we properly exclude autofs mounts from ProtectSystem= we can
include it in the effect of ProtectSystem= again.
2014-06-06 14:48:51 +02:00
Lennart Poettering d6797c920e namespace: beef up read-only bind mount logic
Instead of blindly creating another bind mount for read-only mounts,
check if there's already one we can use, and if so, use it. Also,
recursively mark all submounts read-only too. Also, ignore autofs mounts
when remounting read-only unless they are already triggered.
2014-06-06 14:37:40 +02:00
Lennart Poettering c8835999c3 namespace: also include /root in ProtectHome=
/root can't really be autofs, and is also a home, directory, so cover it
with ProtectHome=.
2014-06-05 21:55:06 +02:00
Lennart Poettering 6d313367d9 namespace: when setting up an inaccessible mount point, unmounting everything below
This has the benefit of not triggering any autofs mount points
unnecessarily.
2014-06-05 21:35:35 +02:00
Lennart Poettering 5331194c12 core: don't include /boot in effect of ProtectSystem=
This would otherwise unconditionally trigger any /boot autofs mount,
which we probably should avoid.

ProtectSystem= will now only cover /usr and (optionally) /etc, both of
which cannot be autofs anyway.

ProtectHome will continue to cover /run/user and /home. The former
cannot be autofs either. /home could be, however is frequently enough
used (unlikey /boot) so that it isn't too problematic to simply trigger
it unconditionally via ProtectHome=.
2014-06-05 10:03:26 +02:00
Lennart Poettering 1b8689f949 core: rename ReadOnlySystem= to ProtectSystem= and add a third value for also mounting /etc read-only
Also, rename ProtectedHome= to ProtectHome=, to simplify things a bit.

With this in place we now have two neat options ProtectSystem= and
ProtectHome= for protecting the OS itself (and optionally its
configuration), and for protecting the user's data.
2014-06-04 18:12:55 +02:00
Lennart Poettering e06b6479a5 core: provide /dev/ptmx as symlink in PrivateDevices= execution environments 2014-06-04 17:21:18 +02:00
Lennart Poettering 82d252404a core: make sure PrivateDevices= makes /dev/log available
Now that we moved the actual syslog socket to
/run/systemd/journal/dev-log we can actually make /dev/log a symlink to
it, when PrivateDevices= is used, thus making syslog available to
services using PrivateDevices=.
2014-06-04 16:59:13 +02:00
Lennart Poettering 417116f234 core: add new ReadOnlySystem= and ProtectedHome= settings for service units
ReadOnlySystem= uses fs namespaces to mount /usr and /boot read-only for
a service.

ProtectedHome= uses fs namespaces to mount /home and /run/user
inaccessible or read-only for a service.

This patch also enables these settings for all our long-running services.

Together they should be good building block for a minimal service
sandbox, removing the ability for services to modify the operating
system or access the user's private data.
2014-06-03 23:57:51 +02:00
Lennart Poettering c2c13f2df4 unit: turn off mount propagation for udevd
Keep mounts done by udev rules private to udevd. Also, document how
MountFlags= may be used for this.
2014-03-20 04:16:39 +01:00
Lennart Poettering 2b85f4e19c core: Beef up PrivateDevices=
Also mount /dev/kdbus, /dev/mqueue and /dev/hugepages into the /dev for
namespaced services.
2014-03-19 16:25:11 +01:00
Lennart Poettering 94828d2ddc conf-parser: config_parse_path_strv() is not generic, so let's move it into load-fragment.c
The parse code actually checked for specific lvalue names, which is
really wrong for supposedly generic parsers...
2014-03-03 21:40:55 +01:00
Lennart Poettering 7f112f50fe exec: introduce PrivateDevices= switch to provide services with a private /dev
Similar to PrivateNetwork=, PrivateTmp= introduce PrivateDevices= that
sets up a private /dev with only the API pseudo-devices like /dev/null,
/dev/zero, /dev/random, but not any physical devices in them.
2014-01-20 21:28:37 +01:00
Lennart Poettering 6b46ea73e3 namespace: include boot id in private tmp directories
This way it is easy to only exclude directories from the current boot
from automatic clean up in /var/tmp.

Also, pick a longer name for the directories so that are globs in
tmp.conf can be simpler yet equally accurate.
2013-12-13 04:06:43 +01:00
Lennart Poettering 76cd584b8d namespace: comment typo fix 2013-11-27 20:31:51 +01:00
Lennart Poettering 613b411c94 service: add the ability for units to join other unit's PrivateNetwork= and PrivateTmp= namespaces 2013-11-27 20:28:48 +01:00
Zbigniew Jędrzejewski-Szmek d8c9d3a468 systemd: use unit name in PrivateTmp directories
Unit name is used whole in the directory name, so that the unit name
can be easily extracted from it, e.g. "/tmp/systemd-abcd.service-DEDBIF1".

https://bugzilla.redhat.com/show_bug.cgi?id=957439
2013-10-22 22:54:09 -04:00
Zbigniew Jędrzejewski-Szmek 7ff7394d9e Never call qsort on potentially NULL arrays
This extends 62678ded 'efi: never call qsort on potentially
NULL arrays' to all other places where qsort is used and it
is not obvious that the count is non-zero.
2013-10-13 17:56:54 -04:00
Maciej Wereski ea92ae33e0 "-" prefix for InaccessibleDirectories and ReadOnlyDirectories 2013-08-23 12:48:14 -04:00
Zbigniew Jędrzejewski-Szmek d5a3f0eac7 core: remove unnecessary goto in setup_namespace 2013-03-20 19:16:01 -04:00
Zbigniew Jędrzejewski-Szmek d34cd37490 Make PrivateTmp dirs also inaccessible from the outside
Currently, PrivateTmp=yes means that the service cannot see the /tmp
shared by rest of the system and is isolated from other services using
PrivateTmp, but users can access and modify /tmp as seen by the
service.

Move the private /tmp and /var/tmp directories into a 0077-mode
directory. This way unpriviledged users on the system cannot see (or
modify) /tmp as seen by the service.
2013-03-20 14:08:41 -04:00
Michal Sekletar c17ec25e4d core: reuse the same /tmp, /var/tmp and inaccessible dir
All Execs within the service, will get mounted the same
/tmp and /var/tmp directories, if service is configured with
PrivateTmp=yes. Temporary directories are cleaned up by service
itself in addition to systemd-tmpfiles. Directory which is mounted
as inaccessible is created at runtime in /run/systemd.
2013-03-15 22:56:40 -04:00
Lennart Poettering 1e41be2015 nspawn,namespaces: make sure we recursively bind mount things in
We want to make sure that everything from the host is also visible in
the sandbox.
2012-08-13 16:25:03 +02:00
Lennart Poettering ac0930c892 namespace: rework namespace support
- don't use pivot_root() anymore, just reuse root hierarchy
- first create all mounts, then mark them read-only so that we get the
  right behaviour when people want writable mounts inside of
  read-only mounts
- don't pass invalid combinations of MS_ constants to the kernel
2012-08-13 15:27:04 +02:00
Lennart Poettering 64825d3c58 fix a couple of issues found with llvm-analyze 2012-08-08 23:54:21 +02:00
Lennart Poettering c1d70f7ca5 namespace: make PrivateTmp= apply to both /tmp and /var/tmp 2012-05-14 22:41:30 +02:00
Kay Sievers 9eb977db5b util: split-out path-util.[ch] 2012-05-08 02:33:10 +02:00
Kay Sievers 4d46fec56d remove MS_* which can not be combined with current kernel code
MS_BIND|MS_MOVE can not be combined:
  do_mount()
    else if (flags & MS_BIND)
      do_loopback(&path, dev_name, flags & MS_REC);
    [...]
    else if (flags & MS_MOVE)
      do_move_mount(&path, dev_name);

MS_REMOUNT|MS_UNBINDABLE can not be combined:
  do_mount()
    if (flags & MS_REMOUNT)
      do_remount(&path, flags & ~MS_REMOUNT, mnt_flags, data_page);
    [...]
    else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
      do_change_type(&path, flags);
2012-04-18 13:37:45 +02:00
Lennart Poettering 5430f7f2bc relicense to LGPLv2.1 (with exceptions)
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.

Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.

The bits that used to be MIT continue to be MIT.

The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.
2012-04-12 00:24:39 +02:00
Kay Sievers b30e2f4c18 move libsystemd_core.la sources into core/ 2012-04-11 16:03:51 +02:00
Renamed from src/namespace.c (Browse further)