Commit graph

386 commits

Author SHA1 Message Date
Lennart Poettering b1d4f8e154 util-lib: split out user/group/uid/gid calls into user-util.[ch] 2015-10-26 01:24:38 +01:00
Lennart Poettering 3ffd4af220 util-lib: split out fd-related operations into fd-util.[ch]
There are more than enough to deserve their own .c file, hence move them
over.
2015-10-25 13:19:18 +01:00
Lennart Poettering 07630cea1f util-lib: split our string related calls from util.[ch] into its own file string-util.[ch]
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.

This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.

Also touches a few unrelated include files.
2015-10-24 23:05:02 +02:00
Lennart Poettering 0f03c2a4c0 path-util: unify how we process paths specified on the command line
Let's introduce a common function that makes relative paths absolute and
warns about any errors while doing so.
2015-10-24 23:03:49 +02:00
Lennart Poettering 0f47436510 util-lib: get_current_dir_name() can return errors other than ENOMEM
get_current_dir_name() can return a variety of errors, not just ENOMEM,
hence don't blindly turn its errors to ENOMEM, but return correct errors
in path_make_absolute_cwd().

This trickles down into a couple of other functions, some of which
receive unrelated minor fixes too with this commit.
2015-10-24 23:03:49 +02:00
Lennart Poettering 16fb773ee3 nspawn: don't try to resolve passed binary before entering namespace
Othewise we might follow the symlinks on the host, instead of the
container.

Fixes #1400
2015-10-22 01:59:25 +02:00
Lennart Poettering 0e2656744f nspawn: rework how we determine private networking settings
Make sure we acquire CAP_NET_ADMIN if we require virtual networking.

Make sure we imply virtual ethernet correctly when bridge is request.

Fixes: #1511
Fixes: #1554
Fixes: #1590
2015-10-22 01:59:25 +02:00
Lennart Poettering 5bcd08db28 btrfs: beef-up btrfs support with a limited understanding of quota
With this change we understand more than just leaf quota groups for
btrfs file systems. Specifically:

- When we create a subvolume we can now optionally add the new subvolume
  to all qgroups its parent subvolume was member of too. Alternatively
  it is also possible to insert an intermediary quota group between the
  parent's qgroups and the subvolume's leaf qgroup, which is useful for
  a concept of "subtree" qgroups, that contain a subvolume and all its
  children.

- The remove logic for subvolumes has been updated to optionally remove
  any leaf qgroups or "subtree" qgroups, following the logic above.

- The snapshot logic for subvolumes has been updated to replicate the
  original qgroup setup of the source, if it follows the "subtree"
  design described above. It will not cover qgroup setups that introduce
  arbitrary qgroups, especially those orthogonal to the subvolume
  hierarchy.

This also tries to be more graceful when setting up /var/lib/machines as
btrfs. For example, if mkfs.btrfs is missing we don't even try to set it
up as loopback device.

Fixes #1559
Fixes #1129
2015-10-22 01:59:25 +02:00
Iago López Galeiras d167824896 nspawn: skip /sys-as-tmpfs if we don't use private-network
Since v3.11/7dc5dbc ("sysfs: Restrict mounting sysfs"), the kernel
doesn't allow mounting sysfs if you don't have CAP_SYS_ADMIN rights over
the network namespace.

So the mounting /sys as a tmpfs code introduced in
d8fc6a000f doesn't work with user
namespaces if we don't use private-net. The reason is that we mount
sysfs inside the container and we're in the network namespace of the host
but we don't have CAP_SYS_ADMIN over that namespace.

To fix that, we mount /sys as a sysfs (instead of tmpfs) if we don't use
private network and ignore the /sys-as-a-tmpfs code if we find that /sys
is already mounted as sysfs.

Fixes #1555
2015-10-20 10:19:23 +02:00
Lennart Poettering ae3dde8012 machinectl: fix race when opening new shells with "machinectl shell"
Previously, we'd allocate the TTY, spawn a service on it, but
immediately start processing the TTY and forwarding it to whatever the
commnd was started on. This is however problematic, as the TTY might get
actually opened only much later by the service. We'll hence first get
EIOs on the master as the other side is still closed, and hence
considered it hung up and terminated the session.

With this change we add a flag to the pty forwarding logic:
PTY_FORWARD_IGNORE_INITIAL_VHANGUP. If set, we'll ignore all hangups
(i.e. EIOs) on the master PTY until the first byte is successfully read.
From that point on we consider a hangup/EIO a regular connection termination. This
way, we handle the race: when we get EIO initially we'll ignore it,
until the connection is properly set up, at which time we start
honouring it.
2015-10-07 20:10:48 +02:00
Lennart Poettering d8fc6a000f nspawn: mount /sys as tmpfs, and then mount only select subdirs of the real sysfs below it
This way we can hide things like /sys/firmware or /sys/hypervisor from
the container, while keeping the device tree around.

While this is a security benefit in itself it also allows us to fix
issue #1277.

Previously we'd mount /sys before creating the user namespace, in order
to be able to mount /sys/fs/cgroup/* beneath it (which resides in it),
which we can only mount outside of the user namespace. To ensure that
the user namespace owns the network namespace we'd set up the network
namespace at the same time as the user namespace. Thus, we'd still see
the /sys/class/net/ from the originating network namespace, even though
we are in our own network namespace now. With this patch, /sys is
mounted before transitioning into the user namespace as tmpfs, so that
we can also mount /sys/fs/cgroup/* into it this early. The directories
such as /sys/class/ are then later added in from the real sysfs from
inside the network and user namespace so that they actually show whatis
available in it.

Fixes #1277
2015-09-30 15:19:33 +02:00
Lennart Poettering 403af78c80 nspawn: fix user namespace support
We didn#t actually pass ownership of /run to the UID in the container
since some releases, let's fix that.
2015-09-30 12:48:17 +02:00
Lennart Poettering db3b1dedb2 nspawn: order includes 2015-09-30 12:24:06 +02:00
Lennart Poettering 3f6fd1ba65 util: introduce common version() implementation and use it everywhere
This also allows us to drop build.h from a ton of files, hence do so.
Since we touched the #includes of those files, let's order them properly
according to CODING_STYLE.
2015-09-29 21:08:37 +02:00
Lennart Poettering 189d5bac5c util: unify implementation of NOP signal handler
This is highly complex code after all, we really should make sure to
only keep one implementation of this extremely difficult function
around.
2015-09-29 21:08:37 +02:00
Lennart Poettering 2feceb5eb9 tree-wide: take benefit of the fact that fdset_free() returns NULL 2015-09-29 21:08:37 +02:00
Lennart Poettering 3ee897d6c2 tree-wide: port more code to use send_one_fd() and receive_one_fd()
Also, make it slightly more powerful, by accepting a flags argument, and
make it safe for handling if more than one cmsg attribute happens to be
attached.
2015-09-29 21:08:37 +02:00
Krzesimir Nowak c0ffce2bd1 nspawn, machined: fix comments and error messages
A bunch of "Client -> Child" fixes and one barrier-enumerator fix.

(David: rebased on master)
2015-09-22 14:17:03 +02:00
Krzesimir Nowak 327e26d689 nspawn: close unneeded sockets in outer child
(David: Note, this is just a cleanup and doesn't fix any bugs)
2015-09-22 14:11:44 +02:00
David Herrmann d960371482 util: introduce {send,receive}_one_fd()
Introduce two new helpers that send/receive a single fd via a unix
transport. Also make nspawn use them instead of hard-coding it.

Based on a patch by Krzesimir Nowak.
2015-09-22 14:09:54 +02:00
Lennart Poettering 59f448cf15 tree-wide: never use the off_t unless glibc makes us use it
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
off_t as 32bit builds though, but still constantly deal with safely
converting from off_t to other types and back for no point.

Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
2015-09-10 18:16:18 +02:00
Lennart Poettering 82116c4329 nspawn: also close uid shift socket in the parent
We should really close all parent sides of our child/parent socket
pairs.
2015-09-08 01:22:46 +02:00
Lennart Poettering 76d448820e nspawn: short reads do not set errno, hence don't try to print it 2015-09-08 01:22:26 +02:00
Lennart Poettering 4610de5022 inspawn: switch from SOCK_DGRAM to SOCK_SEQPACKET for internal socketpairs
SOCK_DGRAM and SOCK_SEQPACKET have very similar semantics when used with
socketpair(). However, SOCK_SEQPACKET has the advantage of knowing a
hangup concept, since it is inherently connection-oriented.

Since we use socket pairs to communicate between the nspawn main process
and the nspawn child process, where the child might die abnormally it's
interesting to us to learn about this via hangups if the child side of
the pair is closed. Hence, let's switch to SOCK_SEQPACKET for these
internal communication sockets.

Fixes #956.
2015-09-08 01:17:47 +02:00
Lennart Poettering 07fa00f9d9 nspawn: properly propagate errors when we fail to set soemthing up 2015-09-08 01:17:15 +02:00
Lennart Poettering 8fe0087ede nspawn: sort and clean up included header list
Let's remove unnecessary inclusions, and order the list alphabetically
as suggested in CODING_STYLE now.
2015-09-07 18:56:54 +02:00
Lennart Poettering 2b5c04d59c nspawn: remove nspawn.h, it's empty now 2015-09-07 18:47:34 +02:00
Lennart Poettering ee64508006 nspawn: split out --uid= logic into nspawn-setuid.[ch] 2015-09-07 18:44:31 +02:00
Lennart Poettering b7103bc5f4 nspawn: split out machined registration code to nspawn-register.[ch] 2015-09-07 18:44:31 +02:00
Lennart Poettering 34829a324b nspawn: split out cgroup related calls into nspawn-cgroup.[ch] 2015-09-07 18:44:30 +02:00
Lennart Poettering 9a2a5625bf nspawn: split out network related code to nspawn-network.[ch] 2015-09-07 18:44:30 +02:00
Lennart Poettering 7a8f63251d nspawn: split all port exposure code into nspawn-expose-port.[ch] 2015-09-07 18:44:30 +02:00
Lennart Poettering e83bebeff7 nspawn: split out mount related functions into a new nspawn-mount.c file 2015-09-07 18:44:30 +02:00
Lennart Poettering f757855e81 nspawn: add new .nspawn files for container settings
.nspawn fiels are simple settings files that may accompany container
images and directories and contain settings otherwise passed on the
nspawn command line. This provides an efficient way to attach execution
data directly to containers.
2015-09-06 01:49:06 +02:00
Lennart Poettering 98e4d8d763 nspawn: enable all controllers we can for the "payload" subcgroup we create
In the unified hierarchy delegating controller access is safe, hence
make sure to enable all controllers for the "payload" subcgroup if we
create it, so that the container will have all controllers enabled the
nspawn service itself has.
2015-09-04 09:07:31 +02:00
Lennart Poettering efdb02375b core: unified cgroup hierarchy support
This patch set adds full support the new unified cgroup hierarchy logic
of modern kernels.

A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).

It is possibly to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.

The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and each cgroup to it.

This patch also removes cg_delete() which is unused now.

On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified heirarchy mode.

This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.

This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicey, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.

The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.

To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in
legacy mode, if it reports cgroupfs we are in unified mode.

This patch set carefuly makes sure that cgls and cgtop continue to work
as desired.

When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
2015-09-01 23:52:27 +02:00
Lennart Poettering a19222e1d3 nspawn: don't try to extract quotes from option string, glibc doesn't do that either
Follow-up regarding #649.
2015-08-29 19:43:48 +02:00
Eugene Yakubovich 5e5bfa6e1c nspawn: add (no)rbind option to --bind and --bind-ro
--bind and --bind-ro perform the bind mount
non-recursively. It is sometimes (often?) desirable
to do a recursive mount. This patch adds an optional
set of bind mount options in the form of:
	--bind=src-path:dst-path:options
options are comma separated and currently only
"rbind" and "norbind" are allowed.
Default value is "rbind".
2015-08-28 18:06:05 -07:00
Lennart Poettering c1521918b4 nspawn: make sure --template= and --machine= my be combined
Fixes #1018.

Based on a patch from Seth Jennings.
2015-08-25 20:28:31 +02:00
Thomas Hindoe Paaboel Andersen 62f176068c remove unused variables 2015-08-21 22:19:10 +02:00
Richard Maw 62f9f39a45 nspawn: Allow : characters in overlay paths
: characters can be entered with the \: escape sequence.
2015-08-07 15:50:43 +00:00
Richard Maw 872d0dbdc3 nspawn: escape paths in overlay mount options
Overlayfs uses , as an option separator and : as a list separator. These
characters are both valid in file paths, so overlayfs allows file paths
which contain these characters to backslash escape these values.
2015-08-07 15:50:43 +00:00
Richard Maw e4a5d9edee nspawn: Allow : characters in nspawn --bind paths
: characters in bind paths can be entered as the \: escape sequence.
2015-08-07 15:50:43 +00:00
Richard Maw 6330ee1083 nspawn: Allow : characters in --tmpfs path
This now accepts : characters with the \: escape sequence.

Other escape sequences are also interpreted, but having a \ in your file
path is less likely than :, so this shouldn't break anyone's existing
tools.
2015-08-07 15:50:42 +00:00
Zbigniew Jędrzejewski-Szmek 73974f6768 Merge branch 'hostnamectl-dot-v2'
Manual merge of https://github.com/systemd/systemd/pull/751.
2015-08-05 21:02:41 -04:00
Zbigniew Jędrzejewski-Szmek ae691c1d93 hostname-util: get rid of unused parameter of hostname_cleanup()
All users are now setting lowercase=false.
2015-08-05 20:49:21 -04:00
David Herrmann 97b11eedff tree-wide: introduce mfree()
Pretty trivial helper which wraps free() but returns NULL, so we can
simplify this:
        free(foobar);
        foobar = NULL;
to this:
        foobar = mfree(foobar);
2015-07-31 19:56:38 +02:00
Daniel Mack 2fc09a9cdd tree-wide: use free_and_strdup()
Use free_and_strdup() where appropriate and replace equivalent,
open-coded versions.
2015-07-30 13:09:01 +02:00
Mike Gilbert 3dce891505 nspawn: Don't pass uid mount option for devpts
Mounting devpts with a uid breaks pty allocation with recent glibc
versions, which expect that the kernel will set the correct owner for
user-allocated ptys.

The kernel seems to be smart enough to use the correct uid for root when
we switch to a user namespace.

This resolves #337.
2015-07-22 22:34:57 -04:00
Lennart Poettering 1434eb3838 Merge pull request #500 from zonque/fileio
fileio: consolidate write_string_file*()
2015-07-08 17:13:53 -03:00