Commit graph

603 commits

Author SHA1 Message Date
Jan Engelhardt a8eaaee72a doc: correct orthography, word forms and missing/extraneous words 2015-11-06 13:45:21 +01:00
Jan Engelhardt b938cb902c doc: correct punctuation and improve typography in documentation 2015-11-06 13:00:02 +01:00
Michal Schmidt 35607a8d1c nspawn: save errno before reopening log after exec failure 2015-11-05 13:44:12 +01:00
Michal Schmidt 070edd97f3 nspawn: no fake errno
The S_ISREG test does not set errno, so don't use it in the error
message.
2015-11-05 13:44:11 +01:00
Michal Schmidt 4314d33f51 nspawn: simplify error returns
Use the "return log_error_errno(...)" idiom to have fewer curly braces.

The last hunk also fixes the return value of setup_journal(), but the
fix has no practical effect.
2015-11-05 13:44:10 +01:00
Michal Schmidt 709f6e46a3 treewide: use the negative error codes returned by our functions
Our functions return negative error codes.
Do not rely on errno being set after calling our own functions.
2015-11-05 13:44:06 +01:00
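The conventions in the last three commits (return negative error codes, don't trust errno after our own calls, and the "return log_error_errno(...)" idiom) can be sketched as below. The macro is a hypothetical stand-in, not systemd's real log_error_errno(), and check_regular_file()/setup_something() are illustrative helpers. It relies on GNU C statement expressions, as systemd itself does.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical stand-in for log_error_errno(): log the message plus
 * strerror() of the error, then evaluate to the negative error code so the
 * caller can write "return log_error_errno(r, ...);" in one line. */
#define log_error_errno(error, fmt, ...)                                    \
        ({                                                                  \
                int _e = (error);                                           \
                if (_e < 0)                                                 \
                        _e = -_e;                                           \
                fprintf(stderr, fmt ": %s\n", ##__VA_ARGS__, strerror(_e)); \
                -_e;                                                        \
        })

/* Illustrative helper: 0 on success, negative errno on failure. */
static int check_regular_file(const char *path) {
        struct stat st;

        assert(path);

        if (stat(path, &st) < 0)
                return -errno;          /* capture errno immediately */

        /* S_ISREG() does not set errno, so report an explicit code. */
        if (!S_ISREG(st.st_mode))
                return -EINVAL;

        return 0;
}

static int setup_something(const char *path) {
        int r;

        r = check_regular_file(path);
        if (r < 0)
                /* Use r, not errno: our helpers report errors via their return value. */
                return log_error_errno(r, "Failed to validate %s", path);

        return 0;
}
```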
Lennart Poettering 97044145b4 core,nspawn: minor coding style fixes 2015-10-31 19:09:20 +01:00
Susant Sahani 6cbe4ed1e1 nspawn: port to extract_first_word 2015-10-28 22:59:01 +05:30
Lennart Poettering b5efdb8af4 util-lib: split out allocation calls into alloc-util.[ch] 2015-10-27 13:45:53 +01:00
Lennart Poettering 15a5e95075 util-lib: split out printf() helpers to stdio-util.h 2015-10-27 13:25:57 +01:00
Lennart Poettering 430f0182b7 src/basic: rename audit.[ch] → audit-util.[ch] and capability.[ch] → capability-util.[ch]
The files are named too generically, so that they might conflict with
the upstream project headers. Hence, let's add a "-util" suffix, to
clarify that these are just our utility headers and not any official
upstream headers.
2015-10-27 13:25:57 +01:00
Lennart Poettering affb60b1ef util-lib: split out umask-related code to umask-util.h 2015-10-27 13:25:56 +01:00
Lennart Poettering 8fcde01280 util-lib: split stat()/statfs()/statvfs() related calls into stat-util.[ch] 2015-10-27 13:25:56 +01:00
Lennart Poettering f4f15635ec util-lib: move a number of fs operations into fs-util.[ch] 2015-10-27 13:25:56 +01:00
Lennart Poettering 4349cd7c1d util-lib: move mount related utility calls to mount-util.[ch] 2015-10-27 13:25:55 +01:00
Lennart Poettering 6bedfcbb29 util-lib: split string parsing related calls from util.[ch] into parse-util.[ch] 2015-10-27 13:25:55 +01:00
Lennart Poettering 2583fbea8e socket-util: move remaining socket-related calls from util.[ch] to socket-util.[ch] 2015-10-26 01:24:39 +01:00
Lennart Poettering b1d4f8e154 util-lib: split out user/group/uid/gid calls into user-util.[ch] 2015-10-26 01:24:38 +01:00
Lennart Poettering 3ffd4af220 util-lib: split out fd-related operations into fd-util.[ch]
There are more than enough to deserve their own .c file, hence move them
over.
2015-10-25 13:19:18 +01:00
Lennart Poettering 07630cea1f util-lib: split out string related calls from util.[ch] into its own file string-util.[ch]
There are more than enough calls doing string manipulations to deserve
their own file, hence do something about it.

This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.

Also touches a few unrelated include files.
2015-10-24 23:05:02 +02:00
Lennart Poettering 0f03c2a4c0 path-util: unify how we process paths specified on the command line
Let's introduce a common function that makes relative paths absolute and
warns about any errors while doing so.
2015-10-24 23:03:49 +02:00
Lennart Poettering 0f47436510 util-lib: get_current_dir_name() can return errors other than ENOMEM
get_current_dir_name() can return a variety of errors, not just ENOMEM,
hence don't blindly turn its errors to ENOMEM, but return correct errors
in path_make_absolute_cwd().

This trickles down into a couple of other functions, some of which
receive unrelated minor fixes too with this commit.
2015-10-24 23:03:49 +02:00
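A minimal sketch of the error propagation described above, assuming glibc's get_current_dir_name(). The function name path_make_absolute_cwd_sketch() is hypothetical. The point is that a failing get_current_dir_name() hands back its own errno (for example ENOENT when the working directory was deleted) instead of a blanket -ENOMEM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Make p absolute relative to the current working directory. On success
 * stores a newly allocated string in *ret and returns 0; otherwise returns
 * a negative errno, passing through the real cause of the failure. */
static int path_make_absolute_cwd_sketch(const char *p, char **ret) {
        char *cwd, *r;

        assert(p);
        assert(ret);

        if (p[0] == '/') {
                r = strdup(p);
                if (!r)
                        return -ENOMEM;
        } else {
                const char *sep;

                cwd = get_current_dir_name();
                if (!cwd)
                        return errno > 0 ? -errno : -ENOMEM;   /* keep the real error */

                /* Avoid "//x" when the cwd is "/". */
                sep = cwd[strlen(cwd) - 1] == '/' ? "" : "/";
                if (asprintf(&r, "%s%s%s", cwd, sep, p) < 0) {
                        free(cwd);
                        return -ENOMEM;
                }
                free(cwd);
        }

        *ret = r;
        return 0;
}
```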
Lennart Poettering 16fb773ee3 nspawn: don't try to resolve passed binary before entering namespace
Otherwise we might follow the symlinks on the host, instead of the
container.

Fixes #1400
2015-10-22 01:59:25 +02:00
Lennart Poettering 0e2656744f nspawn: rework how we determine private networking settings
Make sure we acquire CAP_NET_ADMIN if we require virtual networking.

Make sure we imply virtual ethernet correctly when a bridge is requested.

Fixes: #1511
Fixes: #1554
Fixes: #1590
2015-10-22 01:59:25 +02:00
Lennart Poettering 5bcd08db28 btrfs: beef-up btrfs support with a limited understanding of quota
With this change we understand more than just leaf quota groups for
btrfs file systems. Specifically:

- When we create a subvolume we can now optionally add the new subvolume
  to all qgroups its parent subvolume was a member of too. Alternatively,
  it is also possible to insert an intermediary quota group between the
  parent's qgroups and the subvolume's leaf qgroup, which is useful for
  a concept of "subtree" qgroups, which contain a subvolume and all its
  children.

- The remove logic for subvolumes has been updated to optionally remove
  any leaf qgroups or "subtree" qgroups, following the logic above.

- The snapshot logic for subvolumes has been updated to replicate the
  original qgroup setup of the source, if it follows the "subtree"
  design described above. It will not cover qgroup setups that introduce
  arbitrary qgroups, especially those orthogonal to the subvolume
  hierarchy.

This also tries to be more graceful when setting up /var/lib/machines as
btrfs. For example, if mkfs.btrfs is missing we don't even try to set it
up as a loopback device.

Fixes #1559
Fixes #1129
2015-10-22 01:59:25 +02:00
Iago López Galeiras d167824896 nspawn: skip /sys-as-tmpfs if we don't use private-network
Since v3.11/7dc5dbc ("sysfs: Restrict mounting sysfs"), the kernel
doesn't allow mounting sysfs if you don't have CAP_SYS_ADMIN rights over
the network namespace.

So the code introduced in d8fc6a000f that mounts /sys as a tmpfs doesn't
work with user namespaces if we don't use private-net. The reason is
that we mount sysfs inside the container while still in the network
namespace of the host, over which we don't have CAP_SYS_ADMIN.

To fix that, we mount /sys as a sysfs (instead of tmpfs) if we don't use
private network and ignore the /sys-as-a-tmpfs code if we find that /sys
is already mounted as sysfs.

Fixes #1555
2015-10-20 10:19:23 +02:00
Lennart Poettering ae3dde8012 machinectl: fix race when opening new shells with "machinectl shell"
Previously, we'd allocate the TTY, spawn a service on it, but
immediately start processing the TTY and forwarding it to whatever the
command was started on. This is however problematic, as the TTY might
actually get opened only much later by the service. We'd hence first get
EIOs on the master as the other side is still closed, consider it hung
up and terminate the session.

With this change we add a flag to the pty forwarding logic:
PTY_FORWARD_IGNORE_INITIAL_VHANGUP. If set, we'll ignore all hangups
(i.e. EIOs) on the master PTY until the first byte is successfully read.
From that point on we consider a hangup/EIO a regular connection
termination. This way, we handle the race: when we get EIO initially
we'll ignore it, until the connection is properly set up, at which time
we start honouring it.
2015-10-07 20:10:48 +02:00
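The hangup rule above boils down to a small state machine. This sketch models only that decision; struct pty_forward_sketch and pty_forward_handle_read() are hypothetical and are not systemd's PTYForward implementation. The caller passes in the result of read() on the master PTY and the errno it left.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

#define PTY_FORWARD_IGNORE_INITIAL_VHANGUP 1u

struct pty_forward_sketch {
        unsigned flags;
        bool read_from_master;   /* set once the first byte has been read */
};

enum pty_event { PTY_CONTINUE, PTY_HANGUP };

/* n is read()'s return value, err the errno it left behind when n < 0. */
static enum pty_event pty_forward_handle_read(struct pty_forward_sketch *f, ssize_t n, int err) {
        assert(f);

        if (n > 0) {
                f->read_from_master = true;   /* from now on, EIO means a real hangup */
                return PTY_CONTINUE;
        }

        if (n < 0 && err == EIO) {
                /* The service may simply not have opened the slave side yet. */
                if ((f->flags & PTY_FORWARD_IGNORE_INITIAL_VHANGUP) && !f->read_from_master)
                        return PTY_CONTINUE;
                return PTY_HANGUP;
        }

        if (n < 0 && (err == EAGAIN || err == EINTR))
                return PTY_CONTINUE;

        return PTY_HANGUP;   /* EOF or any other error ends the session */
}
```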
Lennart Poettering d8fc6a000f nspawn: mount /sys as tmpfs, and then mount only select subdirs of the real sysfs below it
This way we can hide things like /sys/firmware or /sys/hypervisor from
the container, while keeping the device tree around.

While this is a security benefit in itself it also allows us to fix
issue #1277.

Previously we'd mount /sys before creating the user namespace, in order
to be able to mount /sys/fs/cgroup/* beneath it (which resides in it),
which we can only mount outside of the user namespace. To ensure that
the user namespace owns the network namespace we'd set up the network
namespace at the same time as the user namespace. Thus, we'd still see
the /sys/class/net/ from the originating network namespace, even though
we are in our own network namespace now. With this patch, /sys is
mounted before transitioning into the user namespace as tmpfs, so that
we can also mount /sys/fs/cgroup/* into it this early. The directories
such as /sys/class/ are then later added in from the real sysfs from
inside the network and user namespace so that they actually show what is
available in it.

Fixes #1277
2015-09-30 15:19:33 +02:00
Lennart Poettering 403af78c80 nspawn: fix user namespace support
We didn't actually pass ownership of /run to the UID in the container
for several releases; let's fix that.
2015-09-30 12:48:17 +02:00
Lennart Poettering db3b1dedb2 nspawn: order includes 2015-09-30 12:24:06 +02:00
Lennart Poettering 3f6fd1ba65 util: introduce common version() implementation and use it everywhere
This also allows us to drop build.h from a ton of files, hence do so.
Since we touched the #includes of those files, let's order them properly
according to CODING_STYLE.
2015-09-29 21:08:37 +02:00
Lennart Poettering 189d5bac5c util: unify implementation of NOP signal handler
This is highly complex code after all, we really should make sure to
only keep one implementation of this extremely difficult function
around.
2015-09-29 21:08:37 +02:00
Lennart Poettering 2feceb5eb9 tree-wide: take benefit of the fact that fdset_free() returns NULL 2015-09-29 21:08:37 +02:00
Lennart Poettering 3ee897d6c2 tree-wide: port more code to use send_one_fd() and receive_one_fd()
Also, make it slightly more powerful, by accepting a flags argument, and
make it safe to handle the case where more than one cmsg attribute is
attached.
2015-09-29 21:08:37 +02:00
Krzesimir Nowak c0ffce2bd1 nspawn, machined: fix comments and error messages
A bunch of "Client -> Child" fixes and one barrier-enumerator fix.

(David: rebased on master)
2015-09-22 14:17:03 +02:00
Krzesimir Nowak 327e26d689 nspawn: close unneeded sockets in outer child
(David: Note, this is just a cleanup and doesn't fix any bugs)
2015-09-22 14:11:44 +02:00
David Herrmann d960371482 util: introduce {send,receive}_one_fd()
Introduce two new helpers that send/receive a single fd via a unix
transport. Also make nspawn use them instead of hard-coding it.

Based on a patch by Krzesimir Nowak.
2015-09-22 14:09:54 +02:00
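Passing a single fd over a unix socket uses an SCM_RIGHTS control message. The sketch below shows the technique with simplified, hypothetical signatures (send_one_fd_sketch()/receive_one_fd_sketch()); systemd's real helpers may differ, for example in taking a flags argument. The receiver walks all cmsgs and only accepts one carrying exactly one fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

static int send_one_fd_sketch(int transport_fd, int fd) {
        union {
                struct cmsghdr cmsghdr;
                char buf[CMSG_SPACE(sizeof(int))];
        } control;
        char byte = 0;   /* at least one byte of payload must accompany the fd */
        struct iovec iov = { .iov_base = &byte, .iov_len = sizeof(byte) };
        struct msghdr mh = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = &control,
                .msg_controllen = sizeof(control),
        };
        struct cmsghdr *cmsg;

        assert(transport_fd >= 0);
        assert(fd >= 0);

        memset(&control, 0, sizeof(control));
        cmsg = CMSG_FIRSTHDR(&mh);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        if (sendmsg(transport_fd, &mh, MSG_NOSIGNAL) < 0)
                return -errno;
        return 0;
}

/* Returns the received fd (>= 0, close-on-exec) or a negative errno. */
static int receive_one_fd_sketch(int transport_fd) {
        union {
                struct cmsghdr cmsghdr;
                char buf[CMSG_SPACE(sizeof(int))];
        } control;
        char byte;
        struct iovec iov = { .iov_base = &byte, .iov_len = sizeof(byte) };
        struct msghdr mh = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = &control,
                .msg_controllen = sizeof(control),
        };
        struct cmsghdr *cmsg;
        int fd = -1;

        assert(transport_fd >= 0);

        if (recvmsg(transport_fd, &mh, MSG_CMSG_CLOEXEC) < 0)
                return -errno;

        /* Don't assume the first cmsg is ours; look for one SCM_RIGHTS fd. */
        for (cmsg = CMSG_FIRSTHDR(&mh); cmsg; cmsg = CMSG_NXTHDR(&mh, cmsg))
                if (cmsg->cmsg_level == SOL_SOCKET &&
                    cmsg->cmsg_type == SCM_RIGHTS &&
                    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
                        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
                        break;
                }

        if (fd < 0)
                return -EIO;   /* message arrived without an fd */
        return fd;
}
```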
Lennart Poettering 59f448cf15 tree-wide: never use off_t unless glibc makes us use it
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
builds with a 32bit off_t, but still constantly deal with safely
converting from off_t to other types and back for no reason.

Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
2015-09-10 18:16:18 +02:00
Lennart Poettering 82116c4329 nspawn: also close uid shift socket in the parent
We should really close all parent sides of our child/parent socket
pairs.
2015-09-08 01:22:46 +02:00
Lennart Poettering 76d448820e nspawn: short reads do not set errno, hence don't try to print it 2015-09-08 01:22:26 +02:00
Lennart Poettering 4610de5022 nspawn: switch from SOCK_DGRAM to SOCK_SEQPACKET for internal socketpairs
SOCK_DGRAM and SOCK_SEQPACKET have very similar semantics when used with
socketpair(). However, SOCK_SEQPACKET has the advantage of supporting
the concept of a hangup, since it is inherently connection-oriented.

Since we use socket pairs to communicate between the nspawn main process
and the nspawn child process, where the child might die abnormally, it's
interesting to us to learn about this via hangups if the child side of
the pair is closed. Hence, let's switch to SOCK_SEQPACKET for these
internal communication sockets.

Fixes #956.
2015-09-08 01:17:47 +02:00
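The hangup advantage can be demonstrated directly: on a SOCK_SEQPACKET socketpair, recv() returns 0 once the peer end is closed. The helper read_from_child() is hypothetical and assumes the protocol never sends zero-length messages, since those would also read as 0.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a message was read (length in *ret_len), 0 if the peer hung
 * up, or a negative errno. SOCK_DGRAM has no such hangup notification. */
static int read_from_child(int fd, void *buf, size_t size, ssize_t *ret_len) {
        ssize_t n;

        assert(buf);
        assert(ret_len);

        n = recv(fd, buf, size, 0);
        if (n < 0)
                return -errno;
        if (n == 0)
                return 0;   /* child side closed, e.g. the child died */

        *ret_len = n;
        return 1;
}
```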
Lennart Poettering 07fa00f9d9 nspawn: properly propagate errors when we fail to set something up 2015-09-08 01:17:15 +02:00
Lennart Poettering 8fe0087ede nspawn: sort and clean up included header list
Let's remove unnecessary inclusions, and order the list alphabetically
as suggested in CODING_STYLE now.
2015-09-07 18:56:54 +02:00
Lennart Poettering 2b5c04d59c nspawn: remove nspawn.h, it's empty now 2015-09-07 18:47:34 +02:00
Lennart Poettering ee64508006 nspawn: split out --uid= logic into nspawn-setuid.[ch] 2015-09-07 18:44:31 +02:00
Lennart Poettering b7103bc5f4 nspawn: split out machined registration code to nspawn-register.[ch] 2015-09-07 18:44:31 +02:00
Lennart Poettering 34829a324b nspawn: split out cgroup related calls into nspawn-cgroup.[ch] 2015-09-07 18:44:30 +02:00
Lennart Poettering 9a2a5625bf nspawn: split out network related code to nspawn-network.[ch] 2015-09-07 18:44:30 +02:00
Lennart Poettering 7a8f63251d nspawn: split all port exposure code into nspawn-expose-port.[ch] 2015-09-07 18:44:30 +02:00
Lennart Poettering e83bebeff7 nspawn: split out mount related functions into a new nspawn-mount.c file 2015-09-07 18:44:30 +02:00
Lennart Poettering f757855e81 nspawn: add new .nspawn files for container settings
.nspawn files are simple settings files that may accompany container
images and directories and contain settings otherwise passed on the
nspawn command line. This provides an efficient way to attach execution
data directly to containers.
2015-09-06 01:49:06 +02:00
Lennart Poettering 98e4d8d763 nspawn: enable all controllers we can for the "payload" subcgroup we create
In the unified hierarchy delegating controller access is safe, hence
make sure to enable all controllers for the "payload" subcgroup if we
create it, so that the container will have all controllers enabled the
nspawn service itself has.
2015-09-04 09:07:31 +02:00
Lennart Poettering efdb02375b core: unified cgroup hierarchy support
This patch set adds full support for the new unified cgroup hierarchy
logic of modern kernels.

A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified, the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).

It is possible to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However, it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.

The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and add each cgroup to it.

This patch also removes cg_delete() which is unused now.

On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified hierarchy mode.

This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.

This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicely, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.

The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.

To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs, we are in
legacy mode; if it reports cgroupfs, we are in unified mode.

This patch set carefully makes sure that cgls and cgtop continue to work
as desired.

When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
2015-09-01 23:52:27 +02:00
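The statfs() check above can be sketched as follows. The enum and function names are hypothetical. As an assumption, the unified case accepts both the classic cgroupfs magic (what the text above compares against) and CGROUP2_SUPER_MAGIC, which later kernels (4.5 and newer) report for the unified hierarchy; the fallback defines cover older headers.

```c
#include <assert.h>
#include <errno.h>
#include <sys/statfs.h>
#include <linux/magic.h>

#ifndef TMPFS_MAGIC
#define TMPFS_MAGIC 0x01021994
#endif
#ifndef CGROUP_SUPER_MAGIC
#define CGROUP_SUPER_MAGIC 0x27e0eb
#endif
#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270
#endif

enum cgroup_mode { CGROUP_MODE_LEGACY, CGROUP_MODE_UNIFIED, CGROUP_MODE_UNKNOWN };

static enum cgroup_mode classify_cgroup_fs(long f_type) {
        switch (f_type) {
        case TMPFS_MAGIC:             /* tmpfs with per-controller mounts below */
                return CGROUP_MODE_LEGACY;
        case CGROUP_SUPER_MAGIC:      /* cgroupfs mounted directly */
        case CGROUP2_SUPER_MAGIC:
                return CGROUP_MODE_UNIFIED;
        default:
                return CGROUP_MODE_UNKNOWN;
        }
}

/* path would normally be "/sys/fs/cgroup". Returns 0 or a negative errno. */
static int detect_cgroup_mode(const char *path, enum cgroup_mode *ret) {
        struct statfs fs;

        assert(path);
        assert(ret);

        if (statfs(path, &fs) < 0)
                return -errno;

        *ret = classify_cgroup_fs((long) fs.f_type);
        return 0;
}
```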
Lennart Poettering a19222e1d3 nspawn: don't try to extract quotes from option string, glibc doesn't do that either
Follow-up regarding #649.
2015-08-29 19:43:48 +02:00
Eugene Yakubovich 5e5bfa6e1c nspawn: add (no)rbind option to --bind and --bind-ro
--bind and --bind-ro perform the bind mount
non-recursively. It is sometimes (often?) desirable
to do a recursive mount. This patch adds an optional
set of bind mount options in the form of:
	--bind=src-path:dst-path:options
options are comma separated and currently only
"rbind" and "norbind" are allowed.
Default value is "rbind".
2015-08-28 18:06:05 -07:00
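A minimal sketch of parsing the options field of --bind=src-path:dst-path:options, assuming only "rbind" and "norbind" are valid, that the default is recursive, and (as an assumption) that the last option wins. parse_bind_options() is hypothetical; the result would select MS_BIND with or without MS_REC.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* options may be NULL or "" (keep the default). Returns 0 and sets
 * *ret_recursive, or -EINVAL for an unknown or empty option. */
static int parse_bind_options(const char *options, bool *ret_recursive) {
        bool recursive = true;   /* default is "rbind" */
        const char *p;

        assert(ret_recursive);

        for (p = options; p && *p; ) {
                size_t n = strcspn(p, ",");

                if (n == 5 && strncmp(p, "rbind", 5) == 0)
                        recursive = true;
                else if (n == 7 && strncmp(p, "norbind", 7) == 0)
                        recursive = false;
                else
                        return -EINVAL;

                p += n;
                if (*p == ',')
                        p++;
        }

        *ret_recursive = recursive;
        return 0;
}
```

A caller would then mount with `MS_BIND | (recursive ? MS_REC : 0)`.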
Lennart Poettering c1521918b4 nspawn: make sure --template= and --machine= may be combined
Fixes #1018.

Based on a patch from Seth Jennings.
2015-08-25 20:28:31 +02:00
Thomas Hindoe Paaboel Andersen 62f176068c remove unused variables 2015-08-21 22:19:10 +02:00
Richard Maw 62f9f39a45 nspawn: Allow : characters in overlay paths
: characters can be entered with the \: escape sequence.
2015-08-07 15:50:43 +00:00
Richard Maw 872d0dbdc3 nspawn: escape paths in overlay mount options
Overlayfs uses , as an option separator and : as a list separator. These
characters are both valid in file paths, so overlayfs allows them to be
backslash-escaped in paths.
2015-08-07 15:50:43 +00:00
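The escaping step can be sketched like this. escape_overlay_path() is a hypothetical helper; escaping the backslash itself, in addition to "," and ":", is an assumption so that a literal backslash in a path survives the round trip.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Backslash-escape ',', ':' and '\' for use in overlayfs mount options
 * such as lowerdir=. Returns a newly allocated string or NULL on OOM. */
static char *escape_overlay_path(const char *path) {
        char *out, *q;
        const char *p;

        assert(path);

        out = malloc(strlen(path) * 2 + 1);   /* worst case: every char escaped */
        if (!out)
                return NULL;

        for (p = path, q = out; *p; p++) {
                if (strchr(",:\\", *p))
                        *q++ = '\\';
                *q++ = *p;
        }
        *q = '\0';

        return out;
}
```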
Richard Maw e4a5d9edee nspawn: Allow : characters in nspawn --bind paths
: characters in bind paths can be entered as the \: escape sequence.
2015-08-07 15:50:43 +00:00
Richard Maw 6330ee1083 nspawn: Allow : characters in --tmpfs path
This now accepts : characters with the \: escape sequence.

Other escape sequences are also interpreted, but having a \ in your file
path is less likely than :, so this shouldn't break anyone's existing
tools.
2015-08-07 15:50:42 +00:00
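Splitting a --bind or --tmpfs argument at the first unescaped ":" while turning "\:" into ":" can be sketched as below. extract_colon_field() is hypothetical and deliberately simplified: it treats any backslash as "take the next character literally" and does not interpret C-style sequences such as \n, which the commit says are also handled.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Extract the next ':'-separated field from *p into *ret (newly allocated).
 * Afterwards *p points past the separator, or is NULL if the input is used
 * up. Returns 0 or -ENOMEM. */
static int extract_colon_field(const char **p, char **ret) {
        const char *s;
        char *out, *q;

        assert(p && *p);
        assert(ret);

        s = *p;
        out = malloc(strlen(s) + 1);
        if (!out)
                return -ENOMEM;

        for (q = out; *s && *s != ':'; s++) {
                if (*s == '\\' && s[1] != '\0')
                        s++;             /* take the escaped character literally */
                *q++ = *s;
        }
        *q = '\0';

        *p = *s == ':' ? s + 1 : NULL;
        *ret = out;
        return 0;
}
```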
Zbigniew Jędrzejewski-Szmek 73974f6768 Merge branch 'hostnamectl-dot-v2'
Manual merge of https://github.com/systemd/systemd/pull/751.
2015-08-05 21:02:41 -04:00
Zbigniew Jędrzejewski-Szmek ae691c1d93 hostname-util: get rid of unused parameter of hostname_cleanup()
All users are now setting lowercase=false.
2015-08-05 20:49:21 -04:00
David Herrmann 97b11eedff tree-wide: introduce mfree()
Pretty trivial helper which wraps free() but returns NULL, so we can
simplify this:
        free(foobar);
        foobar = NULL;
to this:
        foobar = mfree(foobar);
2015-07-31 19:56:38 +02:00
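The helper described above is small enough to show in full as a sketch; it frees the pointer and returns NULL so that freeing and resetting happen in one assignment.

```c
#include <assert.h>
#include <stdlib.h>

/* free() the memory and return NULL, so "p = mfree(p);" both frees and
 * resets the pointer. free(NULL) is a no-op, so NULL input is fine. */
static inline void *mfree(void *memory) {
        free(memory);
        return NULL;
}
```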
Daniel Mack 2fc09a9cdd tree-wide: use free_and_strdup()
Use free_and_strdup() where appropriate and replace equivalent,
open-coded versions.
2015-07-30 13:09:01 +02:00
Mike Gilbert 3dce891505 nspawn: Don't pass uid mount option for devpts
Mounting devpts with a uid breaks pty allocation with recent glibc
versions, which expect that the kernel will set the correct owner for
user-allocated ptys.

The kernel seems to be smart enough to use the correct uid for root when
we switch to a user namespace.

This resolves #337.
2015-07-22 22:34:57 -04:00
Lennart Poettering 1434eb3838 Merge pull request #500 from zonque/fileio
fileio: consolidate write_string_file*()
2015-07-08 17:13:53 -03:00
Zbigniew Jędrzejewski-Szmek af86c44038 Remove repeated 'the's 2015-07-07 07:40:53 -04:00
Daniel Mack ad118bda15 tree-wide: fix write_string_file() user that should not create files
The latest consolidation cleanup of write_string_file() revealed some users
of that helper which should have used write_string_file_no_create() in the
past but didn't. Basically, all existing users that write to files in /sys
and /proc should not expect to write to a file that does not yet exist.
2015-07-06 19:27:20 -04:00
Daniel Mack 4c1fc3e404 fileio: consolidate write_string_file*()
Merge write_string_file(), write_string_file_no_create() and
write_string_file_atomic() into write_string_file() and provide a flags mask
that allows combinations of atomic writing, newline appending and automatic
file creation. Change all users accordingly.
2015-07-06 19:19:25 -04:00
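One way to express the consolidated, flag-driven write helper is sketched below. The flag names (WSF_CREATE, WSF_NEWLINE, WSF_ATOMIC) and write_string_file_sketch() are hypothetical, not systemd's identifiers. Without WSF_CREATE the file must already exist, which matches the /sys and /proc use case from the previous commit; in this sketch WSF_ATOMIC always creates a new inode (mode 0600 from mkostemp()) and rename()s it into place.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum {
        WSF_CREATE  = 1 << 0,   /* create the file if missing */
        WSF_NEWLINE = 1 << 1,   /* append a trailing newline */
        WSF_ATOMIC  = 1 << 2,   /* write to a temporary file, then rename() */
};

static int write_all(int fd, const char *buf, size_t len) {
        while (len > 0) {
                ssize_t n = write(fd, buf, len);
                if (n < 0) {
                        if (errno == EINTR)
                                continue;
                        return -errno;
                }
                buf += n;
                len -= (size_t) n;
        }
        return 0;
}

static int write_string_file_sketch(const char *path, const char *line, unsigned flags) {
        char *tmp = NULL;
        int fd, r;

        assert(path);
        assert(line);

        if (flags & WSF_ATOMIC) {
                if (asprintf(&tmp, "%s.tmpXXXXXX", path) < 0)
                        return -ENOMEM;
                fd = mkostemp(tmp, O_CLOEXEC);
        } else
                fd = open(path, O_WRONLY | O_TRUNC | O_CLOEXEC | O_NOCTTY |
                          ((flags & WSF_CREATE) ? O_CREAT : 0), 0644);
        if (fd < 0) {
                r = -errno;
                free(tmp);
                return r;
        }

        r = write_all(fd, line, strlen(line));
        if (r >= 0 && (flags & WSF_NEWLINE))
                r = write_all(fd, "\n", 1);
        if (close(fd) < 0 && r >= 0)
                r = -errno;

        if (r >= 0 && tmp && rename(tmp, path) < 0)
                r = -errno;
        if (r < 0 && tmp)
                (void) unlink(tmp);   /* don't leave the temporary file behind */

        free(tmp);
        return r;
}
```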
Lennart Poettering eff8efe671 Merge pull request #492 from richardmaw-codethink/nspawn-automatic-uid-shift-fix-v2
nspawn: Communicate determined UID shift to parent version 2
2015-07-06 20:53:56 +02:00
Richard Maw 825d5287d7 nspawn: Communicate determined UID shift to parent
There is logic to determine the UID shift from the file-system, rather
than having it be explicitly passed in.

However, this needs to happen in the child process that sets up the
mounts, as what's important is the UID of the mounted root, rather than
the mount-point.

Setting up the UID map needs to happen in the parent because the inner
child needs to have been started, and the outer child is no longer able
to access the uid_map file, since it lost access to it when setting up
the mounts for the inner child.

So we need to communicate the uid shift back out, along with the PID of
the inner child process.

Failing to communicate this means that the invalid UID shift, which is
the value used to specify "this needs to be determined from the file
system", is left invalid, so setting up the user namespace's UID shift
fails.
2015-07-06 13:23:19 +01:00
Lennart Poettering dbb60d6944 nspawn: fix indenting 2015-07-06 12:35:51 +02:00
David Herrmann 6acc94b621 Merge pull request #485 from poettering/sd-bus-flush-close-unref
sd-bus: introduce new sd_bus_flush_close_unref() call
2015-07-04 12:41:01 +02:00
Lennart Poettering 03976f7b4a sd-bus: introduce new sd_bus_flush_close_unref() call
sd_bus_flush_close_unref() is a call that simply combines sd_bus_flush()
(which writes all unwritten messages out) + sd_bus_close() (which
terminates the connection, releasing all unread messages) +
sd_bus_unref() (which frees the connection).

This combination of calls is used pretty frequently in systemd tools
right before exiting, and should also be relevant for most external
clients, hence it is useful to cover it in a call of its own.

Previously the combination of the three calls was already done in the
_cleanup_bus_close_unref_ macro, but this was only available internally.

Also see #327
2015-07-03 19:49:03 +02:00
Lennart Poettering 391567f479 Revert "nspawn: determine_uid_shift before forking" 2015-07-03 12:30:53 +02:00
Tom Gundersen b7a049dba5 Merge pull request #429 from richardmaw-codethink/nspawn-userns-uid-shift-autodetection-fix
nspawn: determine_uid_shift before forking
2015-06-30 18:24:14 +02:00
Richard Maw 7fe2bb84c4 nspawn: determine_uid_shift before forking
It is needed in one branch of the fork, but calculated in another
branch.

Failing to do this means using --private-users without specifying a uid
shift always fails because it tries to shift the uid to UID_INVALID.
2015-06-30 14:05:58 +00:00
Richard Maw 3c59d4f21f nspawn: Don't remount with fewer options
When we do a MS_BIND mount, it inherits the flags of its parent mount.
When we do a remount, it sets the flags to exactly what is specified.
If we are in a user namespace then these mount points have their flags
locked, so you can't reduce the protection.

As a consequence, the default setup of mount_all doesn't work with user
namespaces. However, if we ensure we add the mount flags of the parent
mount when remounting, then we aren't removing mount options, so we
aren't trying to unlock an option that we aren't allowed to.
2015-06-30 14:05:03 +00:00
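One way to carry the existing mount's flags into a remount is to read them with statvfs() and translate the ST_* bits into MS_* bits, since the two sets do not all share values (ST_RELATIME differs from MS_RELATIME, for example). This is an assumed approach for illustration, not necessarily what the commit does; both function names are hypothetical, and the remount itself needs privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/statvfs.h>

/* Map the per-mount-point ST_* flags from statvfs() to MS_* mount flags. */
static unsigned long ms_flags_from_statvfs(unsigned long f_flag) {
        static const struct { unsigned long st, ms; } map[] = {
                { ST_RDONLY,     MS_RDONLY },
                { ST_NOSUID,     MS_NOSUID },
                { ST_NODEV,      MS_NODEV },
                { ST_NOEXEC,     MS_NOEXEC },
                { ST_NOATIME,    MS_NOATIME },
                { ST_NODIRATIME, MS_NODIRATIME },
                { ST_RELATIME,   MS_RELATIME },
        };
        unsigned long ms = 0;

        for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
                if (f_flag & map[i].st)
                        ms |= map[i].ms;

        return ms;
}

/* Remount a bind mount, adding extra_flags on top of the flags it already
 * has, so no locked flag is dropped inside a user namespace. */
int remount_keeping_flags(const char *where, unsigned long extra_flags) {
        struct statvfs sv;

        assert(where);

        if (statvfs(where, &sv) < 0)
                return -errno;

        if (mount(NULL, where, NULL,
                  MS_REMOUNT | MS_BIND | extra_flags | ms_flags_from_statvfs(sv.f_flag),
                  NULL) < 0)
                return -errno;

        return 0;
}
```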
Lennart Poettering 68a313c592 nspawn: suppress warning when /etc/resolv.conf is a valid symlink
In such a case let's suppress the warning (downgrade to LOG_DEBUG),
under the assumption that the user has no config file to update in its
place, but a symlink that points to something like resolved's
automatically managed resolv.conf file.

While we are at it, also stop complaining if we cannot write /etc/resolv.conf
due to a read-only disk, given that there's little we could do about it.
2015-06-18 19:45:18 +02:00
Lennart Poettering 503546da7c nspawn: when exiting, flush all remaining bytes from the pty to stdout
This is a simpler fix for #210, it simply uses copy_bytes() for the
copying.
2015-06-17 20:54:45 +02:00
Djalal Harouni b774fb7f00 nspawn: check if kernel supports userns as early as possible
If the kernel does not support user namespaces, then one of the children
created by the nspawn parent will fail at clone(CLONE_NEWUSER) with the
generic error EINVAL and without logging the error. At the same time
the parent may also try to set up the user namespace and will fail with
another error.

To improve this, check if the kernel supports user namespaces as early
as possible.
2015-06-16 17:30:45 +01:00
Lennart Poettering 86b85cf440 Merge pull request #214 from poettering/signal-rework-2
everywhere: port everything to sigprocmask_many() and friends
2015-06-15 20:35:18 +02:00
Lennart Poettering 72c0a2c255 everywhere: port everything to sigprocmask_many() and friends
This ports a lot of manual code over to sigprocmask_many() and friends.

Also, we now consistently check for sigprocmask() failures with
assert_se(), since the call cannot realistically fail unless there's a
programming error.

Also encloses a few sd_event_add_signal() calls with (void) when we
ignore the return values for it knowingly.
2015-06-15 20:13:23 +02:00
Lennart Poettering 770b5ce4fc tmpfiles: automatically remove old machine snapshots at boot
Remove old temporary snapshots, but only at boot. Ideally we'd have
"self-destroying" btrfs snapshots that go away when the last reference
to them does. To mimic a scheme like this, at least remove the old
snapshots on fresh boots, where we know they cannot be referenced
anymore. Note that we actually remove all temporary files in
/var/lib/machines/ at boot, which should be safe since the directory has
defined semantics. In the root directory (where systemd-nspawn
--ephemeral places snapshots) we are more strict, to avoid removing
unrelated temporary files.

This also splits out nspawn/container related tmpfiles bits into a new
tmpfiles snippet, systemd-nspawn.conf.
2015-06-15 19:28:55 +02:00
Lennart Poettering 14bcf25c8b util: when creating temporary file names, allow including extra id string in it
This adds a "char *extra" parameter to tempfn_xxxxxx(), tempfn_random(),
tempfn_random_child(). If non-NULL, this string is included in the middle
of the newly created file name. This is useful for being able to
distinguish the kind of temporary file when we see one.

This also adds tests for the three calls.

For now, we don't make use of this at all, but port all users over.
2015-06-15 19:28:55 +02:00
Daniel Mack 12c2884c55 firewall: rename fw-util.[ch] → firewall-util.[ch]
The names fw-util.[ch] are too ambiguous, better rename the files to
firewall-util.[ch]. Also rename the test accordingly.
2015-06-15 14:08:02 +02:00
Lennart Poettering 5feece76fb Merge pull request #205 from endocode/iaguis/seccomp-v2
nspawn: make seccomp loading errors non-fatal
2015-06-15 11:45:48 +02:00
Iago López Galeiras 9b1cbdc6e1 nspawn: make seccomp loading errors non-fatal
seccomp_load returns -EINVAL when seccomp support is not enabled in the
kernel [1]. This should be a debug log, not an error that interrupts nspawn.
If the seccomp filter can't be set and audit is enabled, the user will
get an error message anyway.

[1]: http://man7.org/linux/man-pages/man2/prctl.2.html
2015-06-15 10:55:31 +02:00
Tom Gundersen 1c4baffc18 sd-netlink: rename from sd-rtnl 2015-06-13 19:52:54 +02:00
Tom Gundersen 31710be527 sd-rtnl: make joining broadcast groups implicit 2015-06-11 17:47:40 +02:00
Lennart Poettering ce30c8dcb4 tree-wide: whenever we fork off a foreign child process reset signal mask/handlers
Also, when the child is potentially long-running make sure to set a
death signal.

Also, ignore the result of the reset operations explicitly by casting
them to (void).
2015-06-10 01:28:58 +02:00
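The child-side preparation described above can be sketched as follows; all three function names are hypothetical. Handlers are reset to SIG_DFL, the mask is cleared, the results are deliberately ignored with (void) as the commit describes, and a long-running child additionally asks for SIGTERM when its parent dies.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Restore default dispositions. sigaction() fails with EINVAL for signals
 * that cannot be changed (such as those glibc reserves); that is ignored. */
static void reset_all_signal_handlers_sketch(void) {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = SIG_DFL;
        sa.sa_flags = SA_RESTART;

        for (int sig = 1; sig < NSIG; sig++) {
                if (sig == SIGKILL || sig == SIGSTOP)
                        continue;
                (void) sigaction(sig, &sa, NULL);
        }
}

static int reset_signal_mask_sketch(void) {
        sigset_t ss;

        if (sigemptyset(&ss) < 0)
                return -errno;
        if (sigprocmask(SIG_SETMASK, &ss, NULL) < 0)
                return -errno;
        return 0;
}

/* Call in the forked child before exec()ing a foreign binary. A robust
 * version would also check getppid() afterwards, in case the parent
 * already exited before PR_SET_PDEATHSIG took effect. */
int prepare_foreign_child_sketch(int long_running) {
        reset_all_signal_handlers_sketch();
        (void) reset_signal_mask_sketch();

        if (long_running && prctl(PR_SET_PDEATHSIG, SIGTERM) < 0)
                return -errno;

        return 0;
}
```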
Lennart Poettering 24882e06c1 util: split out signal-util.[ch] from util.[ch]
No functional changes.
2015-05-29 20:14:11 +02:00
Martin Pitt e26d6ce517 path-util: Change path_is_mount_point() symlink arg from bool to flags
This makes path_is_mount_point() consistent with fd_is_mount_point() wrt.
flags.
2015-05-29 17:42:44 +02:00
Tom Gundersen cc9fce6554 nspawn: fix memleak
This was a typo, swapping prefix_root() in place of prefix_roota().

Fixes CID 1299640.
2015-05-25 23:01:50 +02:00
Tom Gundersen 2371271c2a nspawn: avoid memleak
Simplify the code a bit, at the cost of potentially duplicating some
memory unnecessarily.

Fixes CID 1299641.
2015-05-25 22:58:26 +02:00
Tom Gundersen 4b53a9d21b nspawn: drop some debugging code
These have no effect.

Fixes CID 1299643.
2015-05-25 22:49:14 +02:00
Tom Gundersen f001a83522 nspawn: make coverity happy
Rather than checking the return value of asprintf(), we check whether
buf gets allocated; make it clear that it is OK to ignore the return value.

Fixes CID 1299644.
2015-05-25 22:27:29 +02:00
Umut Tezduyar Lindskog 637aa8a36c nspawn: be verbose about interface names
The allowed interface name length is relatively small. Let's not make
users go into the source code to figure out what happened.

--machine=debian-tree conflicts with
--machine=debian-tree2

ex: Failed to add new veth \
         interfaces (host0, vb-debian-tree): File exists
2015-05-24 22:39:09 +02:00
Lennart Poettering 5ba7a26847 nspawn: prohibit access to the kernel log buffer by default
Unless CAP_SYSLOG is explicitly passed, block all access to kmsg.
2015-05-21 20:49:24 +02:00
Lennart Poettering 050f727728 util: introduce PERSONALITY_INVALID as macro for 0xffffffffLU 2015-05-21 19:48:49 +02:00
Lennart Poettering 03cfe0d514 nspawn: finish user namespace support 2015-05-21 16:32:01 +02:00
Lennart Poettering 6458ec20b5 core,nspawn: unify code that moves the root dir 2015-05-20 14:38:12 +02:00
Alban Crequy 6b7d2e9ea4 nspawn: close extra fds before execing init
When systemd-nspawn gets exec*()ed, it inherits the following file
descriptors:
- 0, 1, 2: stdin, stdout, stderr
- SD_LISTEN_FDS_START, ... SD_LISTEN_FDS_START+LISTEN_FDS: file
  descriptors passed by the system manager (useful for socket
  activation). They are passed to the child process (process leader).
- extra lock fd: rkt passes a locked directory as an extra fd, so the
  directory remains locked as long as the container is alive.

systemd-nspawn used to close all open fds except 0, 1, 2 and the
SD_LISTEN_FDS_START..SD_LISTEN_FDS_START+LISTEN_FDS. This patch delays
the close until just before the exec, so the nspawn process (parent)
keeps the extra fds open.

This patch supersedes the previous attempt ("cloexec extraneous fds"):
http://lists.freedesktop.org/archives/systemd-devel/2015-May/031608.html
2015-05-18 22:24:15 +02:00
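Closing everything except a keep-list right before exec() can be sketched as below. close_all_fds_except() is a hypothetical helper: it enumerates /proc/self/fd (skipping the directory's own fd) and falls back to a brute-force loop up to the fd limit if /proc is unavailable. In nspawn's case the keep-list would hold the listen fds, while stdio (0, 1, 2) is always kept.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

static bool fd_in_set(int fd, const int except[], size_t n) {
        for (size_t i = 0; i < n; i++)
                if (except[i] == fd)
                        return true;
        return false;
}

/* Close every fd >= 3 not listed in except[]. */
static void close_all_fds_except(const int except[], size_t n_except) {
        struct dirent *de;
        DIR *d;

        assert(except || n_except == 0);

        d = opendir("/proc/self/fd");
        if (!d) {
                long max = sysconf(_SC_OPEN_MAX);
                if (max < 0)
                        max = 1024;
                for (int fd = 3; fd < max; fd++)
                        if (!fd_in_set(fd, except, n_except))
                                (void) close(fd);
                return;
        }

        while ((de = readdir(d))) {
                char *end;
                long fd;

                if (de->d_name[0] == '.')
                        continue;

                errno = 0;
                fd = strtol(de->d_name, &end, 10);
                if (errno != 0 || *end != '\0' || fd < 3)
                        continue;
                if (fd == dirfd(d) || fd_in_set((int) fd, except, n_except))
                        continue;

                (void) close((int) fd);
        }

        closedir(d);
}
```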
Lennart Poettering 958b66ea16 util: split all hostname related calls into hostname-util.c 2015-05-18 17:10:07 +02:00
Stefan Junker ce5b3ad450 nspawn: allow access to device nodes listed in --bind= and --bind-ro= switches
https://bugs.freedesktop.org/show_bug.cgi?id=90385
2015-05-14 22:51:05 +02:00
Iago López Galeiras 875e1014dd nspawn: skip symlink to a combined cgroup hierarchy if it already exists
If a symlink to a combined cgroup hierarchy already exists and points to
the right path, skip it. This avoids an error when the cgroups are set
manually before calling nspawn.
2015-05-13 16:03:07 +02:00
Iago López Galeiras 54b4755f15 nspawn: only mount the cgroup root if it's not already mounted
This allows the user to set the cgroups manually before calling nspawn.
2015-05-13 15:56:59 +02:00
Lennart Poettering 5a8af538ae nspawn: rework custom mount point order, and add support for overlayfs
Previously all bind mounts were applied in the order specified,
followed by all tmpfs mounts in the order specified. This is
problematic if bind mounts are to be placed within tmpfs mounts.

This patch hence reworks the custom mount point logic, and always applies
them in strict prefix-first order. This means the order of mounts
specified on the command line becomes irrelevant; the right operation
will always be executed.

While we are at it, this commit also adds native support for overlayfs
mounts, as supported by recent kernels.
2015-05-13 14:07:26 +02:00
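Prefix-first ordering can be obtained from an ordinary sort on the destination path. The sketch below is an assumption, not nspawn's code: the CustomMount type and MountKind values are hypothetical, and destinations are assumed to be normalized absolute paths (no trailing or doubled slashes). Because a parent directory's path is a string prefix of each child's path, strcmp() places it first regardless of the mount kind.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; the names do not claim to match nspawn's own. */
typedef enum {
        MOUNT_BIND,
        MOUNT_TMPFS,
        MOUNT_OVERLAY,
} MountKind;

typedef struct {
        const char *destination; /* normalized absolute path in the container */
        MountKind kind;
} CustomMount;

static int custom_mount_compare(const void *a, const void *b) {
        const CustomMount *x = a, *y = b;

        /* A string sorts before every longer string it prefixes, so "/var"
         * lands before "/var/lib/foo" whether it is a bind or tmpfs mount. */
        return strcmp(x->destination, y->destination);
}

void custom_mounts_sort(CustomMount *m, size_t n) {
        assert(m || n == 0);

        if (n > 1)
                qsort(m, n, sizeof(*m), custom_mount_compare);
}
```

With this ordering, a tmpfs on /var is mounted before a bind mount onto /var/lib/foo, even if the user listed them the other way round.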
Lennart Poettering 27023c0ef5 nspawn: pass on kill signal setting to container scope
Let's just pass on what the user set for us.
2015-05-11 22:10:36 +02:00
Lennart Poettering 1a2399e57d nspawn: when run as a service, don't ask machined for termination of ourselves 2015-04-28 21:34:23 +02:00
Lennart Poettering 773ce3d89c nspawn: make sure we install the device policy both when nspawn is run as a unit and when it is run from the command line 2015-04-28 21:34:23 +02:00
Lennart Poettering aee327b816 nspawn: don't inherit read-only flag from disk image if --ephemeral is used
When --ephemeral is used there's no need to keep the image read-only, so
let's not do that then.
2015-04-22 16:56:51 +02:00
Lennart Poettering 10a8700606 tree-wide: get rid of more strerror() calls 2015-04-21 18:05:44 +02:00
Ronny Chevalier 288a74cce5 shared: add terminal-util.[ch] 2015-04-11 00:34:02 +02:00
Ronny Chevalier 3df3e884ae shared: add random-util.[ch] 2015-04-11 00:11:13 +02:00
Ronny Chevalier 0b452006de shared: add process-util.[ch] 2015-04-10 23:54:49 +02:00
Ronny Chevalier 6482f6269c shared: add formats-util.h 2015-04-10 23:54:48 +02:00
Lennart Poettering da00518b3f path-util: fix more path_is_mount e792e890f fallout 2015-04-07 16:03:45 +02:00
Lennart Poettering f70a17f8d4 btrfs: add support for recursive btrfs snapshotting 2015-04-06 15:26:59 +02:00
Lennart Poettering e9bc1871b9 btrfs: make btrfs_subvol_snapshot() parameters a flags field 2015-04-06 14:54:58 +02:00
Lennart Poettering d9e2daaf3d btrfs: support recursively removing btrfs snapshots 2015-04-06 11:28:16 +02:00
Lennart Poettering c687863750 util: rework rm_rf() logic
- Move to its own file rm-rf.c

- Change parameters into a single flags parameter

- Remove "honour sticky" logic, it's unused these days
2015-04-06 10:57:53 +02:00
Alban Crequy 81f5049b7c nspawn: fall back to bind mount when mknod fails
Some systems abusively restrict mknod, even when the device node already
exists in /dev. This is unfortunate because it prevents systemd-nspawn
from creating the basic devices in /dev in the container.

This patch implements a workaround: when mknod fails, fall back to bind
mounts.

Additionally, /dev/console was created with a mknod with the same
major/minor as /dev/null before bind mounting a pts on it. This patch
removes the mknod and creates an empty regular file instead.

In order to test this patch, I used the following configuration, which I
think should replicate the system with the abusive restriction on mknod:

  # grep devices /proc/self/cgroup
  4:devices:/user.slice/restrict
  # cat /sys/fs/cgroup/devices/user.slice/restrict/devices.list
  c 1:9 r
  c 5:2 rw
  c 136:* rw
  # systemd-nspawn --register=false -D .

v2:
 - remove "bind", it is not needed since there is already MS_BIND
v3:
 - fix error management when calling touch()
 - fix lowercase in error message
2015-03-31 17:21:03 +02:00
Lennart Poettering 4f923a1984 nspawn: drop sd_booted() check
We have no such check in any of the other tools, hence don't have one in
nspawn either.

(This should make things nicer for Rocket, among other things)

Note: removing this check does not mean that we support running nspawn
on non-systemd. We explicitly don't. It just means that we remove the
check for running it like that. You are still on your own if you do...
2015-03-31 15:36:53 +02:00
Iago López Galeiras 4543768d13 nspawn: change filesystem type from "bind" to NULL in mount() syscalls
Try to keep syscalls as minimal as possible.
2015-03-31 15:36:53 +02:00
Zbigniew Jędrzejewski-Szmek 48861960ac nspawn: tell coverity that we ignore return value
CID #1271353.
2015-03-13 23:42:16 -04:00
David Herrmann 15411c0cb1 tree-wide: there is no ENOTSUP on linux
Replace ENOTSUP by EOPNOTSUPP as this is what linux actually uses.
2015-03-13 14:10:39 +01:00
Zbigniew Jędrzejewski-Szmek 8a16a7b4e7 nspawn: fix use-after-free and leak in error paths
CID #1257765.
2015-03-07 14:19:20 -05:00
Jay Faulkner 9a71b1122c nspawn: Map all seccomp filters to capabilities
This change makes it so all seccomp filters are mapped
to the appropriate capability and are only added if that
capability was not requested when running the container.

This unbreaks the remaining use cases broken by the
addition of seccomp filters without respecting requested
capabilities.

Co-Authored-By: Clif Houck <me@clifhouck.com>

[zj: - adapt to our coding style, make struct anonymous]
2015-03-04 23:18:09 -05:00
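The mapping can be pictured as a lookup table consulted before each seccomp filter is added. This is a hypothetical sketch: the table is illustrative (open_by_handle_at, module loading and syslog come from commits in this log; iopl/ioperm are assumed extras), and syscall_should_be_blocked() is not an nspawn function. Retained capabilities are passed as a bitmask of (1 << CAP_*).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/capability.h>

/* Illustrative table: each filtered syscall paired with the capability
 * that legitimately permits it. The real list in nspawn may differ. */
static const struct {
        const char *syscall;
        int capability;
} blocked_syscalls[] = {
        { "iopl",              CAP_SYS_RAWIO       },
        { "ioperm",            CAP_SYS_RAWIO       },
        { "open_by_handle_at", CAP_DAC_READ_SEARCH },
        { "init_module",       CAP_SYS_MODULE      },
        { "finit_module",      CAP_SYS_MODULE      },
        { "delete_module",     CAP_SYS_MODULE      },
        { "syslog",            CAP_SYSLOG          },
};

/* Returns true if a filter for `name` should be installed, i.e. the syscall
 * is on the list and its capability was NOT requested for the container. */
bool syscall_should_be_blocked(const char *name, uint64_t retained_caps) {
        assert(name);

        for (size_t i = 0; i < sizeof(blocked_syscalls) / sizeof(blocked_syscalls[0]); i++)
                if (strcmp(name, blocked_syscalls[i].syscall) == 0)
                        return !(retained_caps & (UINT64_C(1) << blocked_syscalls[i].capability));

        return false; /* not on the list: never filtered */
}
```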
Lennart Poettering c6c8f6e218 nspawn: make kill signal to use for PID 1 configurable 2015-02-25 22:06:54 +01:00
Thomas Hindoe Paaboel Andersen 2eec67acbb remove unused includes
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
2015-02-23 23:53:42 +01:00
Jan Synacek 4aab5d0cbd nspawn: fix whitespace and typo in partition table blurb 2015-02-23 15:26:58 +01:00
Lennart Poettering 6278cf6048 nspawn: chown basic device nodes to userns root 2015-02-19 12:03:39 +01:00
Lennart Poettering d15d65a01f nspawn: fix build on non-selinux systems 2015-02-19 12:03:12 +01:00
Lennart Poettering 6dac160c0a nspawn: add basic user namespacing support
(This is incomplete, /proc and /sys are still owned by root from outside
the container, not inside)
2015-02-19 11:31:08 +01:00
Lennart Poettering 9c857b9d16 nspawn: when connected to pipes for stdin/stdout, pass them as-is to PID 1
Previously we always invoked the container PID 1 on /dev/console of the
container. With this change we do so only if nspawn was invoked
interactively (i.e. its stdin/stdout was connected to a TTY). In all other
cases we directly pass through the fds unmodified.

This has the benefit that nspawn can be added into shell pipelines.

https://bugs.freedesktop.org/show_bug.cgi?id=87732
2015-02-18 23:36:20 +01:00
Lennart Poettering f36933fef6 nspawn: add support for --property= to set scope properties
This is similar to systemd-run's --property= setting.
2015-02-18 19:42:24 +01:00
Jay Faulkner d0a0ccf3fe nspawn: Allow module loading if CAP_SYS_MODULE is requested
nspawn containers currently block module loading in all cases, with
no option to disable it. This allows an admin who explicitly sets
--capability=CAP_SYS_MODULE or --capability=all to load modules.
2015-02-04 13:34:46 +01:00
Lennart Poettering 63c372cb9d util: rework strappenda(), and rename it strjoina()
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
2015-02-03 02:05:59 +01:00
Thomas Hindoe Paaboel Andersen fed6df828d remove unused variables 2015-02-02 22:58:06 +01:00
Lennart Poettering c0534580ac nspawn: when mounting the cgroup hierarchies, use the exact same mount options for the superblock as the host
Otherwise we'll generate kernel runtime warnings about non-matching
mount options.
2015-01-23 01:43:16 +01:00
Lennart Poettering bbb99c30d0 nspawn: mount /tmp in the container, don't leave this to the container's init
We really want /tmp to be properly mounted, especially in containers
that lack CAP_SYS_ADMIN or that are not fully booted up and only get a
shell, hence let's do so in nspawn already.
2015-01-23 01:27:06 +01:00
Alban Crequy 05e7da5afa nspawn: allow bind-mounting char and block files 2015-01-23 01:22:55 +01:00
Lennart Poettering c09ef2e4e8 nspawn: work around kernel bug with partition table probing on loopback devices
When we set up a loopback device with partition probing, the udev
"change" event about the configured device is first passed on to
userspace, and only then is the in-kernel partition prober started. Since
partition probing fails with EBUSY when somebody has the device open,
the probing frequently fails since udev starts probing/opening the
device as soon as it gets the notification about it, and it might do so
earlier than the kernel probing.

This patch adds a (hopefully temporary) work-around for this, that
compares the number of probed partitions of the kernel with those of
blkid and synchronously asks for reprobing until the numbers are in
sync.

This really deserves a proper kernel fix.
2015-01-20 20:40:45 +01:00
Tom Gundersen 4bbfe7ad22 nspawn: add ipvlan support 2015-01-20 00:46:13 +01:00
Lennart Poettering f6c51a8136 nspawn: support dissecting GPT images that contain only a single generic linux partition
This should allow running Ubuntu UEFI GPT Images with nspawn,
unmodified.
2015-01-19 20:24:10 +01:00
Lennart Poettering 2fbe4296c5 nspawn: wait until udev has probed a loopback device before making use of it 2015-01-19 20:24:10 +01:00
Jonathan Boulle 835214146b nspawn: fix log typos 2015-01-15 08:19:30 +01:00
Lennart Poettering aceac2f0b6 import: rename "gpt" disk image type to "raw"
After all, nspawn can now dissect MBR partition tables, too, hence
".gpt" appears a misnomer. Moreover, the .raw suffix for these files
is already pretty popular (the Fedora disk images use it for example),
hence sounds like an OK scheme to adopt.
2015-01-15 01:47:21 +01:00
Lennart Poettering 5e4074aa31 nspawn: downgrade loopback detach errors to debug
Sometimes udev or some other background daemon might keep the loopback
devices busy while we already want to detach them. Downgrade the warning
about it.

Given that we use autodetach, downgrading these messages should carry
little risk.
2015-01-15 00:51:56 +01:00
Lennart Poettering ada4799ac5 nspawn: add support for limited dissecting of MBR disk images with nspawn
With this change nspawn's -i switch can now make sense of MBR disk
images too - however only if there's only a single, bootable partition
of type 0x83 on the image. For all other cases we cannot really make
sense of the image from the partition table alone.

The big benefit of this change is that upstream Fedora Cloud Images can
now be booted unmodified with systemd-nspawn:

 # wget http://download.fedoraproject.org/pub/fedora/linux/releases/21/Cloud/Images/x86_64/Fedora-Cloud-Base-20141203-21.x86_64.raw.xz
 # unxz Fedora-Cloud-Base-20141203-21.x86_64.raw.xz
 # systemd-nspawn -i Fedora-Cloud-Base-20141203-21.x86_64.raw -b

Next stop: teach the import logic to automatically download these
images, uncompress and verify them.
2015-01-15 00:47:10 +01:00
Lennart Poettering 733d15ac7a nspawn: pass the container's init PID out via sd_notify()
This is useful for nspawn managers that want to learn when nspawn is
finished with initialization, as well as what the PID of the init system
in the container is.
2015-01-14 23:29:01 +01:00
Lennart Poettering 657bdca9e4 nspawn: fix an incorrect assert comparison 2015-01-14 23:18:33 +01:00
Lennart Poettering 30535c1692 nspawn: add file system locks for controlling access to container images
This adds three kinds of file system locks for container images:

a) a file system lock next to the actual image, in a .lck file in the
   same directory the image is located. This lock has the benefit of
   usually being located on the same NFS share as the image itself, and
   thus allows locking container images across NFS shares.

b) a file system lock in /run, named after st_dev and st_ino of the
   root of the image. This lock has the advantage that it is unique even
   if the same image is bind mounted to two different places at the same
   time, as the ino/dev stays constant for them.

c) a file system lock that is only taken when a new disk image is about
   to be created, that ensures that checking whether the name is already
   used across the search path, and actually placing the image is not
   interrupted by other code taking the name.

a + b are read-write locks. When a container is booted in read-only mode
a read lock is taken, otherwise a write lock.

Lock b is always taken after a, to avoid ABBA problems.

Lock c is mostly relevant when renaming or cloning images.
2015-01-14 23:18:33 +01:00
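Lock b and the read/write choice can be sketched as follows. This is a hypothetical illustration: the /run directory and the "inode-DEV:INO" format are assumptions, and image_inode_lock_path() / image_lock_operation() are not nspawn functions. Because the name is derived from st_dev and st_ino of the image root, two bind mounts of the same tree resolve to the same lock file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <sys/stat.h>

/* Hypothetical helper: derive the /run lock path from the image root's
 * device and inode numbers. Returns 0 on success, -1 if buf is too small. */
int image_inode_lock_path(const struct stat *st, char *buf, size_t size) {
        int n;

        assert(st);
        assert(buf);

        n = snprintf(buf, size, "/run/systemd/nspawn/locks/inode-%lu:%lu",
                     (unsigned long) st->st_dev, (unsigned long) st->st_ino);
        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Read-only boots may share the image (read lock); writable boots need
 * exclusive access (write lock). LOCK_NB makes a busy image fail fast. */
int image_lock_operation(bool read_only) {
        return (read_only ? LOCK_SH : LOCK_EX) | LOCK_NB;
}
```

The resulting operation would be passed to flock() on the opened lock file, taking lock a first and lock b second, in line with the ABBA note above.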
Lennart Poettering 8937422f3b nspawn: remove the right propagation directory 2015-01-14 23:18:33 +01:00
Lennart Poettering ab5e3a1bcc nspawn: --help typo fix 2015-01-13 20:59:07 +01:00
Lennart Poettering 0dfaa00607 nspawn: add "-n" shortcut for "--network-veth"
Now that networkd's IP masquerading support means that running
containers with "--network-veth" will provide network access out of the
box for the container, let's add a shortcut "-n" for it, to make it
easily accessible.
2015-01-13 20:17:06 +01:00
Lennart Poettering 6d0b55c272 nspawn: add new option "--port=" for exposing container ports on the local host
This exposes an IP port on the container as a local port using DNAT.
2015-01-13 13:55:15 +01:00
Lennart Poettering f2068bcce0 machined: when cloning a raw disk image, also set the NOCOW flag 2015-01-08 23:13:45 +01:00
Tom Gundersen 080e78329a nspawn: fix error message when mknod fails 2015-01-08 17:09:45 +01:00
Lennart Poettering 0ec5543c4c machinectl: make sure that "machinectl login" exits immediately when the machine it is connected to dies 2015-01-07 03:08:00 +01:00
Lennart Poettering b12afc8c5c nspawn: mount most of the cgroup tree read-only in nspawn containers except for the container's own subtree in the name=systemd hierarchy
More specifically, mount all other hierarchies in their entirety, and the
part of name=systemd above the container's subtree, read-only.
2015-01-05 01:40:51 +01:00
Lennart Poettering 814a3fdfdc nspawn: report back to systemd only very late whether we are OK
That way, systemd can actually figure out if everything is OK with
nspawn.
2014-12-29 17:54:33 +01:00
Lennart Poettering 1b9cebf638 nspawn: use the same image discovery logic in nspawn as in machined 2014-12-28 02:08:40 +01:00
Filipe Brandenburger f01ae8260d nspawn: remove spurious include of <sys/capability.h>
It does not use any functions from libcap directly. The CAP_* constants in use
through this file come from "missing.h" which will import <linux/capability.h>
and complement it with CAP_* constants not defined by the current kernel
headers.

Add an explicit import of our "capability.h" since it does use the function
capability_bounding_set_drop from that header file. Previously, that header was
implicitly imported through "cap-list.h".

Tested that "systemd-nspawn" builds cleanly and works after this change.
2014-12-25 10:55:42 -05:00
Lennart Poettering 611b312b7d nspawn,pty: port over to new ptsname_malloc() helper 2014-12-23 03:26:24 +01:00
Lennart Poettering c7b7d4493a machinectl,nspawn: don't print extra final newline if pty terminal output was newline-terminated anyway 2014-12-23 03:26:24 +01:00
Lennart Poettering 9b15b7846d run: add a new "-t" mode for invoking a binary on an allocated TTY 2014-12-23 03:26:24 +01:00
Lennart Poettering 785890acf6 machinectl: implement "bind" command to create additional bind mounts from host to container during runtime 2014-12-18 01:36:28 +01:00
Ken Werner 60e1651a31 nspawn: fix invocation of the raw clone() system call on s390 and cris
Since the order of the first and second arguments of the raw clone() system
call is reversed on s390 and cris it needs to be invoked differently.
2014-12-17 00:20:56 -05:00
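A fork-like raw clone() that accounts for the swapped argument order might look like this. It is a sketch under assumptions: raw_clone() is written here for illustration, the architecture macros __s390__ and __CRIS__ are the compiler-defined ones, and no new stack is passed (NULL), so the child runs on a copy of the parent's stack just as with fork().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raw clone() without glibc's wrapper and without a new stack (fork-like). */
pid_t raw_clone(unsigned long flags) {
#if defined(__s390__) || defined(__CRIS__)
        /* On s390 and cris the kernel expects (child_stack, flags). */
        return (pid_t) syscall(__NR_clone, NULL, flags);
#else
        /* Everywhere else the order is (flags, child_stack). */
        return (pid_t) syscall(__NR_clone, flags, NULL);
#endif
}
```

Passing SIGCHLD as the flags value makes the child report its exit to the parent like a normal fork()ed child, so waitpid() works as usual.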
Lennart Poettering b9ba4dabba nspawn: when booting in ephemeral mode, append random token to machine name
Also, when booting up an ephemeral container of / use the system
hostname as default machine name.

This way specifying -M is unnecessary when booting up an ephemeral
container, while allowing any number of ephemeral containers to run from
the same tree.
2014-12-12 17:30:25 +01:00
Lennart Poettering c4e34a612c nspawn: allow spawning ephemeral nspawn containers based on the root file system of the OS
This works now:

        # systemd-nspawn -xb -D / -M foobar

Which boots up an ephemeral container, based on the host's root file
system. Or in other words: you can now run the very same host OS you
booted your system with also in a container, on top of it, without
having it interfere. Great for testing whether the init system you are
hacking on still boots without rebooting the system!
2014-12-12 17:30:25 +01:00
Lennart Poettering df9a75e480 nspawn: don't link journals in ephemeral mode 2014-12-12 17:30:25 +01:00
Lennart Poettering 53e438e301 nspawn: properly unset arg_link_journal_try, when --link-journal= is specified 2014-12-12 17:30:25 +01:00
Lennart Poettering ec16945ebf nspawn: beef up nspawn with some btrfs magic
This adds --template= to duplicate an OS tree as a btrfs snapshot and
run it.

This also adds --ephemeral or -x to create a snapshot of an OS tree and
boot that, removing it after exit.
2014-12-12 13:35:32 +01:00
Lennart Poettering 0c3c42847d nspawn: properly validate machine names 2014-12-12 13:35:32 +01:00
Lennart Poettering 2822da4fb7 util: introduce our own gperf based capability list
This way, we can ensure we have a more complete, up-to-date list of
capabilities around, always.
2014-12-10 03:21:07 +01:00
Lennart Poettering a90e23051b nspawn: create the macvlan MAC addresses in an arch independent stable way 2014-12-10 00:26:16 +01:00
Lennart Poettering e867ceb6b9 nspawn: make sure macvlan MAC addresses are stable
https://bugs.freedesktop.org/show_bug.cgi?id=85527
2014-12-09 01:20:09 +01:00
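One way to get stable, architecture-independent MAC addresses is to hash the machine name and extract the bytes with shifts. This is a hypothetical sketch: FNV-1a is used only for brevity and is not necessarily the hash nspawn uses, and mac_from_machine_name() is an illustrative name. Extracting bytes with shifts (rather than memcpy() of the hash) is what keeps the result identical on little- and big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: derive a stable MAC address from a machine name. */
void mac_from_machine_name(const char *name, uint8_t mac[6]) {
        uint64_t h = UINT64_C(14695981039346656037); /* FNV-1a offset basis */

        assert(name);
        assert(mac);

        for (const unsigned char *p = (const unsigned char *) name; *p; p++) {
                h ^= *p;
                h *= UINT64_C(1099511628211);         /* FNV-1a prime */
        }

        /* Shift-based extraction does not depend on the host's byte order. */
        for (int i = 0; i < 6; i++)
                mac[i] = (uint8_t) (h >> (8 * i));

        mac[0] &= 0xfe; /* clear the multicast bit */
        mac[0] |= 0x02; /* set the locally administered bit */
}
```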
Lennart Poettering 04a9193940 nspawn: correct EEXIST check when creating directory to mount /tmp in
https://bugs.freedesktop.org/show_bug.cgi?id=86309
2014-12-03 17:53:33 +01:00
Zbigniew Jędrzejewski-Szmek 01dc33ce28 nspawn: fix unused variable warning 2014-11-29 11:11:10 -05:00
Zbigniew Jędrzejewski-Szmek 820d3acfe9 delta: diff returns 1 when files differ, ignore this
https://bugs.debian.org/771397
2014-11-29 11:10:51 -05:00
Michal Schmidt 4a62c710b6 treewide: another round of simplifications
Using the same scripts as in f647962d64 "treewide: yet more log_*_errno
+ return simplifications".
2014-11-28 19:57:32 +01:00
Michal Schmidt 56f64d9576 treewide: use log_*_errno whenever %m is in the format string
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.

Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'

Plus some whitespace, linewrap, and indent adjustments.
2014-11-28 19:49:27 +01:00
Michal Schmidt f647962d64 treewide: yet more log_*_errno + return simplifications
Using:
find . -name '*.[ch]' | while read f; do perl -i.mmm -e \
 'local $/;
  local $_=<>;
  s/(if\s*\([^\n]+\))\s*{\n(\s*)(log_[a-z_]*_errno\(\s*([->a-zA-Z_]+)\s*,[^;]+);\s*return\s+\g4;\s+}/\1\n\2return \3;/msg;
  print;'
 $f
done

And a couple of manual whitespace fixups.
2014-11-28 18:56:16 +01:00
Michal Schmidt da927ba997 treewide: no need to negate errno for log_*_errno()
It correctly handles both positive and negative errno values.
2014-11-28 13:29:21 +01:00
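The sign handling can be illustrated with a simplified stand-in. This is a hypothetical sketch, not systemd's log_error_errno(): the real one is a printf-style macro that also records ERRNO= in the journal. The point shown is that ENOENT and -ENOENT are treated alike and a negative code is returned, which is what makes "return log_error_errno(r, ...);" work.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for log_error_errno(). */
int log_error_errno_sketch(int error, const char *msg) {
        int e = error < 0 ? -error : error; /* accept ENOENT and -ENOENT alike */

        assert(msg);
        fprintf(stderr, "%s: %s\n", msg, strerror(e));

        /* Always hand back the negative code, ready to be returned. */
        return -e;
}
```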
Michal Schmidt 0a1beeb642 treewide: auto-convert the simple cases to log_*_errno()
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and introduce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:

find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'

Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
2014-11-28 12:04:41 +01:00
Richard Schütz 6c2d07020f nspawn: ignore EEXIST when mounting tmpfs
commit 79d80fc146 introduced a regression that
prevents mounting a tmpfs if the mount point already exists in the container's
root file system. This commit fixes the problem by ignoring EEXIST.
2014-11-22 20:05:19 -05:00
Martin Pitt 574edc9006 nspawn: Add try-{host,guest} journal link modes
--link-journal={host,guest} fail if the host does not have persistent
journalling enabled and /var/log/journal/ does not exist. Even worse, as there
is no stdout/err any more, there is no error message to point that out.

Introduce two new modes "try-host" and "try-guest" which don't fail in this
case, and instead just silently skip the guest journal setup.

Change -j to mean "try-guest" instead of "guest", and fix the wrong --help
output for it (it said "host" before).

Change systemd-nspawn@.service.in to use "try-guest" so that this unit works
with both persistent and non-persistent journals on the host without failing.

https://bugs.debian.org/770275
2014-11-21 14:27:26 +01:00
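The new modes can be modelled as the existing host/guest modes plus a "try" flag. This is a hypothetical sketch of option parsing: the LinkJournal enum and parse_link_journal() are illustrative names, not nspawn's. A "try-" value keeps the same mode but records that a missing /var/log/journal/ should be skipped silently instead of failing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; nspawn's own names may differ. */
typedef enum {
        LINK_NO,
        LINK_AUTO,
        LINK_HOST,
        LINK_GUEST,
} LinkJournal;

static const struct {
        const char *name;
        LinkJournal mode;
        bool try_only; /* skip silently if /var/log/journal/ is missing */
} link_journal_table[] = {
        { "no",        LINK_NO,    false },
        { "auto",      LINK_AUTO,  false },
        { "host",      LINK_HOST,  false },
        { "guest",     LINK_GUEST, false },
        { "try-host",  LINK_HOST,  true  },
        { "try-guest", LINK_GUEST, true  },
};

int parse_link_journal(const char *s, LinkJournal *ret_mode, bool *ret_try) {
        assert(s);
        assert(ret_mode);
        assert(ret_try);

        for (size_t i = 0; i < sizeof(link_journal_table) / sizeof(link_journal_table[0]); i++)
                if (strcmp(s, link_journal_table[i].name) == 0) {
                        *ret_mode = link_journal_table[i].mode;
                        *ret_try = link_journal_table[i].try_only;
                        return 0;
                }

        return -EINVAL; /* unknown value */
}
```

Under this model, -j would simply be shorthand for parsing "try-guest".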
Daniel Mack 63cc4c3138 sd-bus: sync with kdbus upstream (ABI break)
kdbus has seen a larger update than expected lately, most notably with
kdbusfs, a file system to expose the kdbus control files:

 * Each time a file system of this type is mounted, a new kdbus
   domain is created.

 * The layout inside each mount point is the same as before, except
   that domains are not hierarchically nested anymore.

 * Domains are therefore also unnamed now.

 * Unmounting a kdbusfs will automatically also destroy the
   associated domain.

 * Hence, the action of creating a kdbus domain is now as
   privileged as mounting a filesystem.

 * This way, we can get around creating dev nodes for everything,
   which is last but not least something that is not limited by
   20-bit minor numbers.

The kdbus specific bits in nspawn have all been dropped now, as nspawn
can rely on the container OS to set up its own kdbus domain, simply by
mounting a new instance.

A new set of mounts has been added to mount things *after* the kernel
modules have been loaded. For now, only kdbus is in this set, which is
invoked with mount_setup_late().
2014-11-13 20:41:52 +01:00
David Herrmann dfb05a1cf5 barrier: explicitly ignore return values of barrier_place()
The barrier implementation tracks remote states internally. There is no
need to check the return value of any barrier_*() function if the caller
is not interested in the result. The barrier helpers only return the state
of the remote side, which is usually not interesting as later calls to
barrier_sync() will catch this, anyway.

Shut up coverity by explicitly ignoring return values of barrier_place()
if we're not interested in it.
2014-11-04 09:49:43 +01:00
Lennart Poettering 023fb90b83 ptyforward: rework PTY forwarder logic used by nspawn to utilize the normal event loop
We really should not run manual event loops anymore, but standardize on
sd_event, so that we can run sd_bus connections from it eventually.
2014-10-31 16:55:04 +01:00
Lennart Poettering 919699ec30 units: don't order journal flushing after remote-fs.target
Instead, only depend on the actual file systems we need.

This should solve dep loops on setups where remote-fs.target is moved
into late boot.
2014-10-31 16:23:39 +01:00
Lennart Poettering fddbb89c46 nspawn: don't make up -1 as error code 2014-10-31 16:23:39 +01:00
Dave Reisner 1ab19cb167 nspawn: ignore EEXIST when creating mount point
A combination of commits f3c80515c and 79d80fc14 cause nspawn to
silently fail with a commandline such as:

  # systemd-nspawn -D /build/extra-x86_64 --bind=/usr

strace shows the culprit:

  [pid 27868] writev(2, [{"Failed to create mount point /build/extra-x86_64/usr: File exists", 82}, {"\n", 1}], 2) = 83
2014-10-29 13:42:51 -04:00
Michal Sekletar 605f81a896 util: introduce sethostname_idempotent
Function queries system hostname and applies changes only when necessary. Also,
migrate all clients of sethostname to sethostname_idempotent while at it.
2014-10-27 10:37:46 +01:00
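The query-then-set behaviour described above can be sketched like this. It is a sketch, not the systemd source: the return convention (0 when unchanged, 1 when changed, negative errno on failure) is an assumption. Comparing first means setting the name it already has needs no privileges and makes no syscall that could fail with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if the hostname was already `s`, 1 if it was changed,
 * or a negative errno value on failure (assumed convention). */
int sethostname_idempotent(const char *s) {
        char buf[HOST_NAME_MAX + 1] = {0};

        assert(s);

        if (gethostname(buf, sizeof(buf)) < 0)
                return -errno;

        if (strcmp(buf, s) == 0)
                return 0; /* nothing to do */

        if (sethostname(s, strlen(s)) < 0)
                return -errno;

        return 1;
}
```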
Daniel Mack 317cde8b80 nspawn: fix DeviceAllow list
Commit 864e17068 ("nspawn: actually allow access to /dev/net/tun in the
container") added "/dev/net/tun" to the list of allowed devices but forgot
to tweak the array length, which caused "/dev/kdbus/*" to be missed.
2014-10-17 16:07:12 +02:00
Lennart Poettering 864e17068c nspawn: actually allow access to /dev/net/tun in the container
It's not sufficient to just copy the device node over, we need to update
the policy for it too.
2014-10-10 11:11:25 +02:00
Tom Gundersen 85614d663e nspawn: copy /dev/net/tun from host
This enables tuntap support in the container (assuming the necessary capabilities are in place).
2014-10-08 15:52:07 +02:00
Tom Gundersen e8c8ddccfc nspawn: log when tearing down of loop device fails 2014-09-29 20:52:10 +02:00
Tom Gundersen 79d80fc146 nspawn: check some more return values
Most of these failures would anyway get caught later on, but now the error messages are a bit more
specific.
2014-09-25 19:10:11 +02:00
Tom Gundersen c00524c9cc nspawn: don't try to create veth link with too long ifname
Reported by: James Lott <james@lottspot.com>
2014-09-19 23:02:00 +02:00
Tom Gundersen 3125b3ef5d nspawn: fix --network-interface
Use SETLINK when modifying an existing link.
2014-08-28 12:16:07 +02:00
Lennart Poettering 1b6d7fa742 util: make use of newly added reset_signal_mask() call wherever appropriate 2014-08-26 21:12:54 +02:00
Lennart Poettering af4ec4309e notify: send STOPPING=1 from our daemons 2014-08-21 17:24:21 +02:00
Lennart Poettering 4f758c2398 nspawn: make sure that when --network-veth is used both the host and the container side get fixed MAC addresses 2014-08-04 19:15:07 +02:00
Lennart Poettering 249968612f bus: always explicitly close bus from main programs
Since b5eca3a205 we don't attempt to GC
busses anymore when unsent messages remain that keep their reference,
when they otherwise are not referenced anymore. This means that if we
explicitly want connections to go away, we need to close them.

With this change we now do so explicitly wherever we connect to the
bus from a main program (and thus know when the bus connection should go
away), or when we create a private bus connection, that really should go
away after our use.

This fixes connection leaks in the NSS and PAM modules.
2014-08-04 16:25:24 +02:00
Zbigniew Jędrzejewski-Szmek 601185b43d Unify parse_argv style
getopt is usually good at printing out a nice error message when
commandline options are invalid. It distinguishes between an unknown
option and a known option with a missing arg. It is better to let it
do its job and not use opterr=0 unless we actually want to suppress
messages. So remove opterr=0 in the few places where it wasn't really
useful.

When an error in options is encountered, we should not print a lengthy
help() and overwhelm the user, when we know precisely what is wrong
with the commandline. In addition, since help() prints to stdout, it
should not be used except when requested with -h or --help.

Also, simplify things here and there.
2014-08-03 21:46:07 -04:00
Zbigniew Jędrzejewski-Szmek 4212a3375e nspawn: fix truncation of machine names in interface names
Based on patch by Michael Marineau <michael.marineau@coreos.com>:

When deriving the network interface name from machine name strncpy was
not properly null terminating the string and the maximum string size as
returned by strlen() is actually IFNAMSIZ-1, not IFNAMSIZ.
2014-08-03 01:29:51 -04:00
Zbigniew Jędrzejewski-Szmek a2a5291b3f Reject invalid quoted strings
Strings which ended in an unfinished quote were accepted, potentially
with bad memory accesses.

Reject anything which ends in an unfinished quote, or contains
non-whitespace characters right after the closing quote.

_FOREACH_WORD now returns the invalid character in *state. But this return
value is not checked anywhere yet.

Also, make 'word' and 'state' variables const pointers, and rename 'w'
to 'word' in various places. Things are easier to read if the same name
is used consistently.

mbiebl_> am I correct that something like this doesn't work
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-passwd "Unlock EncFS"'
mbiebl_> systemd seems to strip of the quotes
mbiebl_> systemctl status shows
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-password Unlock EncFS  $RootDir $MountPoint
mbiebl_> which is pretty weird
2014-07-31 04:00:31 -04:00
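The two rejection rules can be expressed as a small standalone validator. This is a hypothetical sketch, not the _FOREACH_WORD change itself: it treats a quote as special only at the start of a word, and the whitespace set is an assumption. It returns 0 for acceptable input and -EINVAL otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define WHITESPACE " \t\n\r"

/* Hypothetical validator for whitespace-separated, optionally quoted words. */
int check_quoted_words(const char *s) {
        assert(s);

        for (;;) {
                s += strspn(s, WHITESPACE); /* skip separators */
                if (*s == '\0')
                        return 0;

                if (*s == '\'' || *s == '"') {
                        const char *end = strchr(s + 1, *s);

                        if (!end)
                                return -EINVAL; /* unfinished quote */

                        s = end + 1;
                        if (*s != '\0' && !strchr(WHITESPACE, *s))
                                return -EINVAL; /* junk right after closing quote */
                } else
                        s += strcspn(s, WHITESPACE); /* plain word */
        }
}
```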
Zbigniew Jędrzejewski-Szmek 7566e26721 barrier: initialize file descriptors with -1
Explicitly initialize descriptors using explicit assignment like
bus_error. This makes barriers follow the same conventions as
everything else and makes things a bit simpler too.

Rename barrier_init to barrier_create so it is obvious that it is
not about initialization.

Remove some parens, etc.
2014-07-18 20:12:44 -04:00
David Herrmann 3496b9eeaf nspawn: fix barrier-destroy call
I dropped the cleanup-helper before pushing so use _cleanup_() directly.
2014-07-17 11:48:39 +02:00
David Herrmann a2da110b78 nspawn: use Barrier API instead of eventfd-util
The Barrier-API simplifies cross-fork() synchronization a lot. Replace the
hard-coded eventfd-util implementation and drop it.

Compared to the old API, Barriers also handle exit() of the remote side as
abortion. This way, segfaults will not cause the parent to deadlock.

EINTR handling is currently ignored for any barrier-waits. This can easily
be added, but it isn't needed so far so I dropped it. EINTR handling in
general is ugly, anyway. You need to deal with pselect/ppoll/... variants
and make sure not to unblock signals at the wrong times. So generally,
there's little use in adding it.
2014-07-17 11:34:25 +02:00
Lennart Poettering 5aa4bb6b5b nspawn: register external network interface with machined 2014-07-10 22:48:30 +02:00
Lennart Poettering 4d9f07b492 nspawn: add new --volatile switch for booting containers in volatile (ephemeral) mode
Two modes are supported: --volatile=yes mounts only /usr into the
container, and a tmpfs as root directory. --volatile=state mounts the
full OS tree in, but overmounts /var with a tmpfs.

--volatile=yes hence boots with an unpopulated /etc and /var, starting
with pristine configuration and state.

--volatile=state hence boots with an unpopulated /var, only starting
with pristine state.
2014-07-04 03:24:42 +02:00
Lennart Poettering ce38dbc84b nspawn: when running in a service unit, use systemd for restarts
This way we can remove cgroup privileges after setup, but get them back
for the next restart, as we need it.
2014-07-03 12:51:07 +02:00
Lennart Poettering 28650077f3 nspawn: block open_by_handle_at() and others via seccomp
Let's protect ourselves against the recently reported docker security
issue. Our man page makes clear that we do not make any security
promises anyway, but well, this one is easy to mitigate, so let's do it.
While we are at it block a couple of more syscalls that are no good in
containers, too.
2014-06-30 16:22:12 +02:00
Lennart Poettering 840295fc1e nspawn: let's avoid using goto too wildly for non-cleanup purposes 2014-06-30 15:20:59 +02:00
Lennart Poettering ce9f1527b6 nspawn: simplify exit condition check 2014-06-30 15:19:00 +02:00
Luke Shumaker 8baaf7a3d8 nspawn: log a warning on failure from wait_for_terminate()
This is at the suggestion of Djalal Harouni on the mailing list, and
reflects the behavior of shared/util.c:wait_for_terminate_and_warn().
2014-06-30 15:13:53 +02:00
Luke Shumaker 6d416b9cc8 nspawn: Fix regression with exit status
Commit 113cea8 introduced a bug that caused the exit code of systemd-nspawn
to not reflect the exit code of the program executed in the container.
2014-06-30 15:13:47 +02:00
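The two entries above both concern how the parent reaps the container process: it should warn when waiting fails, and it should hand the payload's own exit code back to the caller. The sketch below shows that pattern with plain waitpid(2). The helper name wait_and_get_exit_code() and the 128+signal convention for killed children are assumptions made for illustration; this is not nspawn's code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reap `pid` and translate its termination into an
 * exit code. Returns the child's own exit status if it exited normally,
 * 128+signal if it was killed (assumed convention), or -errno after
 * printing a warning if waiting failed. */
static int wait_and_get_exit_code(pid_t pid) {
        int status;

        while (waitpid(pid, &status, 0) < 0) {
                if (errno == EINTR)
                        continue;                   /* interrupted by a signal: retry */

                int r = -errno;
                fprintf(stderr, "Warning: failed to wait for PID %d: %s\n", (int) pid, strerror(-r));
                return r;
        }

        if (WIFEXITED(status))
                return WEXITSTATUS(status);         /* propagate the payload's own exit code */
        if (WIFSIGNALED(status))
                return 128 + WTERMSIG(status);      /* shell-style convention (assumption) */
        return 1;                                   /* not reachable without WUNTRACED */
}
```

When the same PID is waited for a second time, the function warns and returns -ECHILD instead of inventing a status.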
Kay Sievers 971ff8c78b switch-root: create essential base directories at system bootup
This allows us to boot up a rootfs with only a /usr directory.
2014-06-24 18:12:31 +02:00
Kay Sievers 3577de7ac3 nspawn: create essential base directories at system bootup
This allows us to boot up a rootfs with only a /usr directory.
2014-06-24 15:41:03 +02:00
Thomas Hindoe Paaboel Andersen c8b32e11ee consistently order cleanup attribute before type 2014-06-22 00:45:15 +02:00
Lennart Poettering 5ae4d543cb os-release: define /usr/lib/os-release as fallback for /etc/os-release
The file should have been in /usr/lib/ in the first place, since it
describes the OS container in /usr (and not the configuration in /etc),
hence, let's support os-release files in /usr/lib as fallback if no
version in /etc exists, following the usual override logic.

A prior commit already enabled tmpfiles to create /etc/os-release as a
symlink to /usr/lib/os-release should it be missing, thus providing nice
compatibility with applications only checking in /etc.

While it's probably a good idea for all apps to check both locations via
fallback logic, this is only necessary early in the boot process, as long
as the /etc/os-release symlink has not been restored, in case we boot
with an empty /etc.
2014-06-13 20:11:59 +02:00
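The override logic described above can be sketched for a caller that reads os-release from a container tree. The function tries /etc first and falls back to /usr/lib only when the /etc copy is missing. The function name open_os_release() and the `root` prefix parameter are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open os-release below `root` ("" for the host itself). The /etc copy
 * wins, and /usr/lib is used only if the /etc copy does not exist.
 * Returns an open fd, or -errno. */
static int open_os_release(const char *root) {
        static const char *const candidates[] = {
                "/etc/os-release",
                "/usr/lib/os-release",
        };
        char path[4096];

        for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
                int n = snprintf(path, sizeof(path), "%s%s", root, candidates[i]);
                if (n < 0 || (size_t) n >= sizeof(path))
                        return -ENAMETOOLONG;

                int fd = open(path, O_RDONLY | O_CLOEXEC);
                if (fd >= 0)
                        return fd;
                if (errno != ENOENT)
                        return -errno;              /* a real error, not just "missing" */
        }

        return -ENOENT;
}
```

Only ENOENT triggers the fallback. Other errors, such as EACCES on /etc/os-release, are returned to the caller so the /usr/lib copy does not silently mask them.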
Lennart Poettering 06c17c39a8 nspawn: add new --tmpfs= option to mount a tmpfs on specific directories, such as /var 2014-06-11 00:44:30 +02:00
Lennart Poettering 849958d1ba tmpfiles: add new "C" line for copying files or directories 2014-06-10 23:02:40 +02:00
Zbigniew Jędrzejewski-Szmek 45f1386c9a nspawn: split long message into two lines
For names like /var/lib/container/something, the message
becomes quite long. Better to split it.

Also reword the message not to suggest that ^]^]^] only works
in the beginning.
2014-06-07 16:30:51 -04:00
Lennart Poettering d6797c920e namespace: beef up read-only bind mount logic
Instead of blindly creating another bind mount for read-only mounts,
check if there's already one we can use, and if so, use it. Also,
recursively mark all submounts read-only too. Also, ignore autofs mounts
when remounting read-only unless they are already triggered.
2014-06-06 14:37:40 +02:00
Djalal Harouni e866af3acc nspawn: make nspawn robust to container failure
nspawn and the container child use eventfd to wait and notify each other
that they are ready so the container setup can be completed.

However, in its current form the wait/notify event ignores errors that
may especially affect the child (container).

On errors the child will jump to the "child_fail" label and terminate
with _exit(EXIT_FAILURE) without notifying the parent. Since the eventfd
is created without the "EFD_NONBLOCK" flag, this leaves the parent
blocking on the eventfd_read() call. The container can also be killed
at any moment before execv() and the parent will not receive
notifications.

We can fix this by using cheap mechanisms: the new high-level eventfd
API and handling of SIGCHLD signals:

* Keep the cheap eventfd and EFD_NONBLOCK flag.

* Introduce eventfd states for parent and child to sync.
Child notifies parent with EVENTFD_CHILD_SUCCEEDED on success or
EVENTFD_CHILD_FAILED on failure and before _exit(). This prevents the
parent from waiting on an event that will never come.

* If the child is killed before execv() or before notifying the parent,
we install a NOP handler for SIGCHLD which will interrupt blocking calls
with EINTR. This gives the parent a chance to call wait() and
terminate in main().

* If there are no errors, the parent will block SIGCHLD, restore the default
handler and notify the child, which will do execv(); then the parent will pass
control to process_pty() to do its magic.

This was exposed in part by:
https://bugs.freedesktop.org/show_bug.cgi?id=76193

Reported-by: Tobias Hunger <tobias.hunger@gmail.com>
2014-05-25 11:23:35 +08:00
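The success/failure handshake from the second bullet can be sketched with a blocking eventfd (no EFD_NONBLOCK), chosen here for brevity. The state names come from the commit message, but their numeric values and the function names are assumptions. The SIGCHLD part, which covers a child killed before it can notify, is left out on purpose, so this sketch alone would still block if the child died silently.

```c
#define _GNU_SOURCE
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed values: any two distinct non-zero values work, because an eventfd
 * counter of 0 means "nothing has been written yet". */
enum {
        EVENTFD_CHILD_SUCCEEDED = 1,
        EVENTFD_CHILD_FAILED    = 2,
};

/* Child side: report the outcome of container setup exactly once, both on
 * success and on failure before calling _exit(). */
static int child_notify(int efd, int ok) {
        eventfd_t v = ok ? EVENTFD_CHILD_SUCCEEDED : EVENTFD_CHILD_FAILED;
        return eventfd_write(efd, v) < 0 ? -1 : 0;
}

/* Parent side: block until the child reports, then decode the state.
 * Returns 0 if the child succeeded, -1 if it failed or the read failed. */
static int parent_wait_for_child(int efd) {
        eventfd_t v;

        if (eventfd_read(efd, &v) < 0)
                return -1;
        return v == EVENTFD_CHILD_SUCCEEDED ? 0 : -1;
}
```

A read returns the counter's accumulated sum and resets it to zero. Each side must therefore write exactly once per handshake, because two writes of 1 would be read back as 2 and decoded as a failure.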
Djalal Harouni 113cea802d nspawn: move container wait logic into wait_for_container()
Move the container wait logic into its own wait_for_container() function
and add two status codes: CONTAINER_TERMINATED and CONTAINER_REBOOTED.
The status is stored via its argument, so that the function can:
a) Return negative on failures.
b) Return zero on success and set the status to either
   CONTAINER_REBOOTED or CONTAINER_TERMINATED.

These status codes are used to terminate nspawn or loop again in case of
CONTAINER_REBOOTED.
2014-05-25 11:23:30 +08:00
Cristian Rodríguez 590b6b9188 Use %m instead of strerror(errno) where appropriate 2014-05-25 11:18:28 +08:00
Lennart Poettering cdb2b9d05a nspawn: restore journal directory is empty check
This undoes part of commit e6a4a517be.

Instead of removing the error message about non-empty journal bind mount
directories, simply downgrade the message to a warning and proceed.
2014-05-22 15:21:01 +09:00
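The restored check amounts to asking whether a directory is empty, with a non-empty result downgraded to a warning. Below is a sketch using opendir(3) and readdir(3). Here dir_is_empty() is written from scratch for illustration, and warn_if_journal_dir_not_empty() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if `path` is an empty directory, 0 if it has entries, or
 * -errno if it cannot be read. "." and ".." are ignored. */
static int dir_is_empty(const char *path) {
        DIR *d = opendir(path);
        struct dirent *de;
        int r = 1;

        if (!d)
                return -errno;

        errno = 0;
        while ((de = readdir(d))) {
                if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                        continue;
                r = 0;                      /* found a real entry */
                break;
        }
        if (r == 1 && errno != 0)
                r = -errno;                 /* readdir() failed instead of reaching the end */

        closedir(d);
        return r;
}

/* Downgraded check: a non-empty target is only reported, and the caller
 * proceeds with the bind mount anyway. */
static void warn_if_journal_dir_not_empty(const char *path) {
        if (dir_is_empty(path) == 0)
                fprintf(stderr, "Warning: %s is not empty, proceeding anyway.\n", path);
}
```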
Djalal Harouni e6a4a517be nspawn: allow bind-mounting the journal on top of a non-empty container journal dentry
Currently, if nspawn was called with --link-journal=host or
--link-journal=auto and the right /var/log/journal/machine-id/ exists,
then bind-mounting the subdirectory into the container might fail due
to the ~/mycontainer/var/log/journal/machine-id/ of the container not
being empty.

There is no reason to check if the container journal subdir is empty
since there will be a bind mount on top of it. The user asked for a bind
mount so give it.

Note: a subsequent call with --link-journal=guest may fail due to the
/var/log/journal/machine-id/ on the host not being empty.

https://bugs.freedesktop.org/show_bug.cgi?id=76193

Reported-by: Tobias Hunger <tobias.hunger@gmail.com>
2014-05-22 09:55:23 +09:00
Nis Martensen f1721625e7 fix spelling of privilege 2014-05-19 00:40:44 +09:00
Lennart Poettering 9f24adc288 nspawn: properly format container_uuid in UUID format
http://lists.freedesktop.org/archives/systemd-devel/2014-April/018971.html
2014-05-16 19:37:19 +02:00
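The sketch below formats a 128-bit ID in the 8-4-4-4-12 UUID layout. format_uuid() is a hypothetical helper, and it assumes the ID is available as 16 raw bytes.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 128-bit ID as a lowercase 8-4-4-4-12 UUID string.
 * `buf` must hold at least 37 bytes (36 characters plus the NUL). */
static char *format_uuid(const uint8_t id[16], char buf[static 37]) {
        char *p = buf;

        for (int i = 0; i < 16; i++) {
                if (i == 4 || i == 6 || i == 8 || i == 10)
                        *p++ = '-';                  /* group separators */
                p += sprintf(p, "%02x", id[i]);      /* two hex digits per byte */
        }
        return buf;                                  /* sprintf() left the final NUL */
}
```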
Philip Lorenz 70f539ca14 nspawn: Fix erroneous OOM when building group list
change_uid_gid() never initialises sz which may cause greedy_realloc to
skip the initial buffer allocation.
2014-04-10 09:50:39 -04:00
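An uninitialized size leads to a bogus OOM because a greedy_realloc()-style helper trusts the caller's capacity counter. The helper below is hypothetical and uses a simplified signature, not systemd's. If the counter already claims enough room, the helper returns the existing pointer. That pointer is still NULL, so the caller reports out-of-memory.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: grow *p to hold at least `need` elements of `size`
 * bytes, tracking the current capacity in *allocated. */
static void *greedy_realloc(void **p, size_t *allocated, size_t need, size_t size) {
        if (*allocated >= need)
                return *p;                  /* "already big enough": no allocation */

        if (size == 0 || need > SIZE_MAX / 2 / size)
                return NULL;                /* the size computation would overflow */

        size_t a = need * 2;                /* over-allocate to amortize growth */
        void *q = realloc(*p, a * size);
        if (!q)
                return NULL;

        *p = q;
        *allocated = a;
        return q;
}
```

The fix is to start the counter at zero, as in `size_t sz = 0;`.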
Tom Gundersen d8e538ecd9 sd-rtnl: rework rtnl type system
Use a static table with all the typing information, rather than repeated
switch statements. This should make it a lot simpler to add new types.

We need to keep all the type info to be able to create containers
without exposing their implementation details to the users of the library.

As a freebie, we verify the types of appended/read attributes.

The API is extended to nicely deal with unions of container types.
2014-03-28 19:11:59 +01:00
Lennart Poettering 3d94f76c99 util: replace close_pipe() with new safe_close_pair()
safe_close_pair() is more like safe_close(), except that it handles
pairs of fds, and doesn't make any misleading allusion, as it works
equally well for socketpair()s as for pipe()s...
2014-03-24 03:22:44 +01:00
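Below is a sketch of safe_close_pair() built on a safe_close() that is a no-op for negative fds and always returns -1. Both are written from the commit descriptions rather than copied from systemd. Preserving errno and the handling of a pair whose two entries are equal are assumptions of this sketch.

```c
#include <errno.h>
#include <unistd.h>

/* Close fd if it is valid; always return -1 so callers can reset the variable. */
static int safe_close(int fd) {
        if (fd >= 0) {
                int saved = errno;          /* don't clobber the caller's errno */
                (void) close(fd);
                errno = saved;
        }
        return -1;
}

/* Close both ends of a pipe() or socketpair() and mark them invalid. */
static void safe_close_pair(int p[2]) {
        if (!p)
                return;

        if (p[0] == p[1]) {
                /* Both entries refer to the same fd (or both are -1): close once. */
                p[0] = p[1] = safe_close(p[0]);
                return;
        }

        p[0] = safe_close(p[0]);
        p[1] = safe_close(p[1]);
}
```

Calling it twice is harmless, because the second call sees two -1 entries.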
Lennart Poettering 03e334a1c7 util: replace close_nointr_nofail() by a more useful safe_close()
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:

        fd = safe_close(fd);

This will close the fd if it is open, and reset the fd variable
correctly.

By making use of this new scheme we can drop more than 200 lines of code
that were required to test for non-negative fds or to reset the closed fd
variable afterwards.
2014-03-18 19:31:34 +01:00
Tom Gundersen 039dd4afd6 nspawn: UP the host side of the veth pair after adding it to a bridge 2014-03-16 13:55:41 +01:00
Dave Reisner 7947952ede nspawn: remove unused variable 2014-03-13 21:56:07 -04:00
Brandon Philips f418f31d50 nspawn: allow -EEXIST on mkdir_safe /home/${uid}
With systemd 211 nspawn attempts to create the home directory for the
given uid. However, if the home directory already exists then it will
fail. Don't error out on -EEXIST.
2014-03-14 02:25:56 +01:00
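The fix comes down to treating "already exists" as success. Below is a simplified sketch using mkdir(2). mkdir_allow_existing() is a hypothetical stand-in for mkdir_safe(), which additionally takes ownership parameters. The extra stat() check, which still reports a non-directory in the way as an error, is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create `path`, treating an existing directory as success.
 * Returns 0 or -errno. */
static int mkdir_allow_existing(const char *path, mode_t mode) {
        struct stat st;

        if (mkdir(path, mode) >= 0)
                return 0;
        if (errno != EEXIST)
                return -errno;

        /* Something exists: accept it only if it is a directory. */
        if (stat(path, &st) < 0)
                return -errno;
        return S_ISDIR(st.st_mode) ? 0 : -ENOTDIR;
}
```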
Tom Gundersen 01dde0611b nspawn: make host0's MAC address persistent
We still need to make sure that no two MAC addresses are the same, so we use
logic similar to what is used in udev to generate MAC addresses, and base
it on a hash of the host's machine ID and the container's name.
2014-03-13 17:47:33 +01:00
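The sketch below derives a stable, locally administered unicast MAC address from the machine ID and the container name. The commit only says the logic is similar to udev's, so the FNV-1a hash used here is a simple stand-in chosen for this sketch, and container_mac() is a hypothetical name. Any hash makes collisions unlikely rather than impossible.

```c
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a (stand-in hash), continuing from `h` so inputs can be chained. */
static uint64_t fnv1a64(uint64_t h, const void *data, size_t len) {
        const unsigned char *p = data;

        for (size_t i = 0; i < len; i++) {
                h ^= p[i];
                h *= UINT64_C(0x100000001b3);   /* FNV 64-bit prime */
        }
        return h;
}

/* Derive a persistent MAC for host0 from the host's machine ID and the
 * container name. The same inputs always yield the same address. */
static void container_mac(const uint8_t machine_id[16], const char *name, uint8_t mac[6]) {
        uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */

        h = fnv1a64(h, machine_id, 16);
        h = fnv1a64(h, name, strlen(name));

        for (int i = 0; i < 6; i++)
                mac[i] = (uint8_t) (h >> (8 * i));

        mac[0] &= 0xfe;     /* clear the multicast bit: unicast address */
        mac[0] |= 0x02;     /* set the locally administered bit */
}
```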
Lennart Poettering 727fd4fda5 nspawn: honour GPT partition flags when mounting file systems following the discoverable partitions spec 2014-03-13 01:33:33 +01:00
Mantas Mikulėnas 4de8292689 nspawn: fix argv[0] for getent 2014-03-11 17:45:20 +01:00
Lennart Poettering a07f961e98 nspawn: allow using kdbus from nspawn containers 2014-03-11 17:43:41 +01:00
Lennart Poettering 8c4e25b73c nspawn: fix getent fallback 2014-03-11 03:08:54 +01:00
Lennart Poettering 0cb9fbcd44 nspawn: when resolving UIDs/GIDs for "-u", do so in forked-off /usr/bin/getent instead of in-process
When the container runs a different native architecture than the host we
shouldn't attempt to load the container's NSS modules with the host's
libc. Instead, resolve UID/GID by invoking /usr/bin/getent in the
container. The tool should be fairly universally available and allows us
to do resolving of the UID/GID with the container's libc in a parsable
format.

https://bugs.freedesktop.org/show_bug.cgi?id=75733
2014-03-11 02:41:13 +01:00
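Once getent runs in the container, the host side only has to parse its text output, which follows the passwd(5) line format. The sketch below covers that parsing step only: forking, entering the container and reading the pipe are omitted. parse_getent_passwd() is a hypothetical name, and extracting the home directory as well is an addition of this sketch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Parse a decimal ID; rejects empty strings, signs, whitespace and trailing junk. */
static int parse_id(const char *s, unsigned long *ret) {
        char *end;

        if (*s < '0' || *s > '9')
                return -EIO;
        errno = 0;
        unsigned long v = strtoul(s, &end, 10);
        if (errno != 0 || *end != '\0')
                return -EIO;
        *ret = v;
        return 0;
}

/* Parse one line as printed by `getent passwd NAME`:
 *     name:password:UID:GID:GECOS:directory:shell
 * `line` is modified in place, and *home points into it afterwards.
 * Returns 0 on success or -EIO on malformed input. */
static int parse_getent_passwd(char *line, uid_t *uid, gid_t *gid, char **home) {
        char *fields[7];
        char *p = line;
        unsigned long u, g;

        line[strcspn(line, "\n")] = '\0';          /* drop the trailing newline */

        for (int i = 0; i < 7; i++) {
                char *colon = strchr(p, ':');

                fields[i] = p;
                if (i < 6) {
                        if (!colon)
                                return -EIO;        /* too few fields */
                        *colon = '\0';
                        p = colon + 1;
                } else if (colon)
                        return -EIO;                /* too many fields */
        }

        if (parse_id(fields[2], &u) < 0 || parse_id(fields[3], &g) < 0)
                return -EIO;
        if ((unsigned long) (uid_t) u != u || (unsigned long) (gid_t) g != g)
                return -EIO;                        /* does not fit uid_t/gid_t */

        *uid = (uid_t) u;
        *gid = (gid_t) g;
        *home = fields[5];
        return 0;
}
```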