We want to emphasize bus connections as per-thread communication
primitives, hence introduce a concept of a per-thread default bus, and
make use of it everywhere.
Among other things this also adds a few things necessary for the change:
- Considerably more powerful error returning APIs in libsystemd-bus
- Adapter for connecting an sd_bus to an sd_event
- As I reworked the PolicyKit logic to the new library I also made it
asynchronous, so that PolicyKit requests of one user cannot block out
another user anymore.
- We always use the macro names for common bus error. That way it is
harder to mistype them since the compiler will notice
We were already creating the file if it was missing, and this way
containers can reconfigure the file without running into problems.
This also makes resolv.conf handling more alike to handling of
/etc/localtime, which is also not a bind mount.
Previously, if a file's bind mount destination didn't exist, nspawn
would blindly create a directory, and the subsequent bind mount would
fail. Examine the filetype of the source and ensure that, if the
destination does not exist, that it is created appropriately.
Also go one step further and ensure that the filetypes of the source
and destination match.
Commit 2e996f4d4b added an include
of linux/netlink.h
This kernel header is not self contained in the linux 2.6 kernel
which breaks compilation with an unknown type sa_family_t
A workaround is to include linux/netlink.h after sys/socket.h
Embedded folks don't need the machine registration stuff, hence it's
nice to make this optional. Also, I'd expect that machinectl will grow
additional commands quickly, for example to join existing containers and
suchlike, hence it's better keeping that separate from loginctl.
- This changes all logind cgroup objects to use slice objects rather
than fixed croup locations.
- logind can now collect minimal information about running
VMs/containers. As fixed cgroup locations can no longer be used we
need an entity that keeps track of machine cgroups in whatever slice
they might be located. Since logind already keeps track of users,
sessions and seats this is a trivial addition.
- nspawn will now register with logind and pass various bits of metadata
along. A new option "--slice=" has been added to place the container
in a specific slice.
- loginctl gained commands to list, introspect and terminate machines.
- user.slice and machine.slice will now be pulled in by logind.service,
since only logind.service requires this slice.
Session objects will now get the .session suffix, user objects the .user
suffix, nspawn containers the .nspawn suffix.
This also changes the user cgroups to be named after the numeric UID
rather than the username, since this allows us the parse these paths
standalone without requiring access to the cgroup file system.
This also changes the mapping of instanced units to cgroups. Instead of
mapping foo@bar.service to the cgroup path /user/foo@.service/bar we
will now map it to /user/foo@.service/foo@bar.service, in order to
ensure that all our objects are properly suffixed in the tree.
As discussed with Dan Berrange it's a good idea to suffix all objects in
the cgroup tree with ".something", so that when the system is
partitioned using a resource management tool we can drop objects of
different types into the same partition directory without generate
namespace conflicts.
We'l add this to the Pax Control Group document as soon as write access
to the fdo wiki is restored.
All attributes are stored as text, since root_directory is already
text, and it seems easier to have all of them in text format.
Attributes are written in the trusted. namespace, because the kernel
currently does not allow user. attributes on cgroups. This is a PITA,
and CAP_SYS_ADMIN is required to *read* the attributes. Alas.
A second pipe is opened for the child to signal the parent that the
cgroup hierarchy has been set up.
nspawn will overmount resolv.conf if it exists. Since e.g.
default install with yum doesn't create /etc/resolv.conf,
a container created with yum will not have network. This
seems undesirable, and since we overmount the file anyway,
let's create it too.
Also, mounting a read-write /etc/resolv.conf in the container
is treated as a failure, since it makes it possible to
modify hosts /etc/resolv.conf from inside the container.
Containers will now carry a label (normally derived from the root
directory name, but configurable by the user), and the container's root
cgroup is /machine/<label>. This label is called "machine name", and can
cover both containers and VMs (as soon as libvirt also makes use of
/machine/).
libsystemd-login can be used to query the machine name from a process.
This patch also includes numerous clean-ups for the cgroup code.
Before, we would initialize many fields twice: first
by filling the structure with zeros, and then a second
time with the real values. We can let the compiler do
the job for us, avoiding one copy.
A downside of this patch is that text gets slightly
bigger. This is because all zero() calls are effectively
inlined:
$ size build/.libs/systemd
text data bss dec hex filename
before 897737 107300 2560 1007597 f5fed build/.libs/systemd
after 897873 107300 2560 1007733 f6075 build/.libs/systemd
… actually less than 1‰.
A few asserts that the parameter is not null had to be removed. I
don't think this changes much, because first, it is quite unlikely
for the assert to fail, and second, an immediate SEGV is almost as
good as an assert.
This reverts commit cb96a2c69a.
It is not a mistake to pass args when -b is specified. They will simply
be passed on to the container's init.
The manpage needs fixing, that's true.
systemd-nspawn will now print the PID of the child.
An example showing how to enter the container is added
to the man page.
Support for nsenter without an explicit command was
added in https://github.com/karelzak/util-linux/commit/5758069
(post v2.22.2). So this example requires both a new kernel
and the latest util-linux.