From 44d565ed3615684c9e5e062a905c685699220b49 Mon Sep 17 00:00:00 2001 From: Lennart Poettering Date: Tue, 17 Apr 2018 18:43:42 +0200 Subject: [PATCH] doc: add introductory docs for portable services --- doc/PORTABLE_SERVICES.md | 251 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 251 insertions(+) create mode 100644 doc/PORTABLE_SERVICES.md diff --git a/doc/PORTABLE_SERVICES.md b/doc/PORTABLE_SERVICES.md new file mode 100644 index 0000000000..b989a4e549 --- /dev/null +++ b/doc/PORTABLE_SERVICES.md @@ -0,0 +1,251 @@ +# Portable Services Introduction + +This systemd version includes a preview of the "portable service" +concept. "Portable Services" are supposed to be an incremental improvement over +traditional system services, making two specific facets of container management +available to system services more readily. Specifically: + +1. The bundling of applications, i.e. packing up multiple services, their + binaries and all their dependencies in a single image, and running them + directly from it. + +2. Stricter default security policies, i.e. sand-boxing of applications. + +The primary tool for interfacing with "portable services" is the new +"portablectl" program. It's currently shipped in /usr/lib/systemd/portablectl +(i.e. not in the `$PATH`), since it's not yet considered part of the officially +supported systemd interfaces — it's a preview still after all. + +Portable services don't bring anything inherently new to the table. All they do +is put together known concepts in a slightly nicer way to cover a specific set +of use-cases in a nicer way. + +# So, what *is* a "Portable Service"? + +A portable service is ultimately just an OS tree, either inside of a directory +tree, or inside a raw disk image containing a Linux file system. This tree is +called the "image". It can be "attached" or "detached" from the system. When +"attached" specific systemd units from the image are made available on the host +system, then behaving pretty much exactly like locally installed system +services. When "detached" these units are removed again from the host, leaving +no artifacts around (except maybe messages they might have logged). + +The OS tree/image can be created with any tool of your choice. For example, you +can use `dnf --installroot=` if you like, or `debootstrap`, the image format is +entirely generic, and doesn't have to carry any specific metadata beyond what +distribution images carry anyway. Or to say this differently: the image format +doesn't define any new metadata as unit files and OS tree directories or disk +images are already sufficient, and pretty universally available these days. One +particularly nice tool for creating suitable images is +[mkosi](https://github.com/systemd/mkosi), but many other existing tools will +do too. + +If you so will, "Portable Services" are a nicer way to manage chroot() +environments, with better security, tooling and behavior. + +# Where's the difference to a "Container"? + +"Container" is a very vague term, after all it is used for +systemd-nspawn/LXC-type OS containers, for Docker/rkt-like micro service +containers, and even certain 'lightweight' VM runtimes. + +The "portable service" concept ultimately will not provide a fully isolated +environment to the payload, like containers mostly intend to. Instead they are +from the beginning more alike regular system services, can be controlled with +the same tools, are exposed the same way in all infrastructure and so on. Their +main difference is that the use a different root directory than the rest of the +system. Hence, the intention is not to run code in a different, isolated world +from the host — like most containers would do it —, but to run it in the same +world, but with stricter access controls on what the service can see and do. + +As one point of differentiation: as programs run as "portable services" are +pretty much regular system services, they won't run as PID 1 (like Docker would +do it), but as normal process. A corollary of that is that they aren't supposed +to manage anything in their own environment (such as the network) as the +execution environment is mostly shared with the rest of the system. + +The primary focus use-case of "portable services" is to extend the host system +with encapsulated extensions, but provide almost full integration with the rest +of the system, though possibly restricted by effective security knobs. This +focus includes system extensions otherwise sometimes called "super-privileged +containers". + +Note that portable services are only available for system services, not for +user services. i.e. the functionality cannot be used for the stuff +bubblewrap/flatpak is focusing on. + +# Mode of Operation + +If you have portable service image, maybe in a raw disk image called +`foobar_0.7.23.raw`, then attaching the services to the host is as easy as: + +``` +# /usr/lib/systemd/portablectl attach foobar_0.7.23.raw +``` + +This command does the following: + +1. It dissects the image, checks and validates the `/etc/os-release` data of + the image, and looks for all included unit files. + +2. It copies out all unit files with a suffix of `.service`, `.socket`, + `.target`, `.timer` and `.path`. whose name begins with the image's name + (with the .raw removed), truncated at the first underscore (if there is + one). This prefix name generated from the image name must be followed by a + ".", "-" or "@" character in the unit name. Or in other words, given the + image name of `foobar_0.7.23.raw` all unit files matching + `foobar-*.{service|socket|target|timer|path}`, + `foobar@.{service|socket|target|timer|path}` as well as + `foobar.*.{service|socket|target|timer|path}` and + `foobar.{service|socket|target|timer|path}` are copied out. These unit files + are placed in `/etc/systemd/system/` like regular unit files. Within the + images the unit files are looked for at the usual locations, i.e. in + `/usr/lib/systemd/system/` and `/etc/systemd/system/` and so on, relative to + the image's root. + +3. For each such unit file a drop-in file is created. Let's say + `foobar-waldo.service` was one of the unit files copied to + `/etc/systemd/system/`, then a drop-in file + `/etc/systemd/system/foobar-waldo.service.d/20-portable.conf` is created, + containing a few lines of additional configuration: + + ``` + [Service] + RootImage=/path/to/foobar.raw + Environment=PORTABLE=foobar + LogExtraFields=PORTABLE=foobar + ``` + +4. For each such unit a "profile" drop-in is linked in. This "profile" drop-in + generally contains security options that lock down the service. By default + the `default` profile is used, which provides a medium level of + security. There's also `trusted` which runs the service at the highest + privileges, i.e. host's root and everything. The `strict' profile comes with + the toughest security restrictions. Finally, `nonetwork` is like `default` + but without network access. Users may define their own profiles too (or + modify the existing ones) + +And that's already it. + +Note that the images need to stay around (and the same location) as long as the +portable service is attached. If an image is moved, the `RootImage=` line +written to the unit drop-in would point to an non-existing place, and break the +logic. + +The `portablectl detach` command executes the reverse operation: it looks for +the drop-ins and the unit files associated with the image, and removes them +again. + +Note that `portable attach` won't enable or start any of the units it copies +out. This still has to take place in a second, separate step. (That said We +might add options to do this automatically later on.). + +# Requirements on Images + +Note that portable services don't introduce any new image format, but most OS +images should just work the way they are. Specifically, the following +requirements are made for an image that can be attached/detached with +`portablectl`. + +1. It must contain a binary (and its dependencies) that shall be invoked, + including all its dependencies. If binary code, the code needs to be + compiled for an architecture compatible with the host. + +2. The image must either be a plain sub-directory (or btrfs subvolume) + containing the binaries and its dependencies in a classic Linux OS tree, or + must be a raw disk image either containing only one, naked file system, or + an image with a partition table understood by the Linux kernel with only a + single partition defined, or alternatively, a GPT partition table with a set + of properly marked partitions following the [Discoverable Partitions + Specification](https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/). + +3. The image must at least contain one matching unit file, with the right name + prefix and suffix (see above). The unit file is searched in the usual paths, + i.e. primarily /etc/systemd/system/ and /usr/lib/systemd/system/ within the + image. (The implementation will check a couple of other paths too, but it's + recommended to use these two paths.) + +4. The image must contain an os-release file, either in /etc/os-release or + /usr/lib/os-release. The file should follow the standard format. + +Note that generally images created by tools such as `debootstrap`, `dnf +--installroot=` or `mkosi` qualify for all of the above in one way or +another. If you wonder what the most minimal image would be that complies with +the requirements above, it could consist of this: + +``` +/usr/bin/minimald # a statically compiled binary +/usr/lib/systemd/minimal-test.service # the unit file for the service, with ExecStart=/usr/bin/minimald +/usr/lib/os-release # an os-release file explaining what this is +``` + +And that's it. + +Note that qualifying images do not have to contain an init system of their +own. If they do, it's fine, it will be ignored by the portable service logic, +but they generally don't have to, and it might make sense to avoid any, to keep +images minimal. + +Note that as no new image format or metadata is defined, it's very +straight-forward to define images than can be made use of it a number of +different ways. For example, by using `mkosi -b` you can trivially build a +single, unified image that: + +1. Can be attached as portable service, to run any container services natively + on the host. + +2. Can be run as OS container, using `systemd-nspawn`, by booting the image + with `systemd-nspawn -i -b`. + +3. Can be booted directly as VM image, using a generic VM executor such as + `virtualbox`/`qemu`/`kvm` + +4. Can be booted directly on bare-metal systems. + +Of course, to facilitate 2, 3 and 4 you need to include an init system in the +image. To facility 3 and 4 you also need to include a boot loader in the +image. As mentioned `mkosi -b` takes care of all of that for you, but any other +image generator should work too. + +# Execution Environment + +Note that the code in portable service images is run exactly like regular +services. Hence there's no new execution environment to consider. Oh, unlike +Docker would do it, as these are regular system services they aren't run as PID +1 either, but with regular PID values. + +# Access to host resources + +If services shipped with this mechanism shall be able to access host resources +(such as files or AF_UNIX sockets for IPC), use the normal `BindPaths=` and +`BindReadOnlyPaths=` settings in unit files to mount them in. In fact the +`default` profile mentioned above makes use of this to ensure +`/etc/resolv.conf`, the D-Bus system bus socket or write access to the logging +subsystem are available to the service. + +# Instantiation + +Sometimes it makes sense to instantiate the same set of services multiple +times. The portable service concept does not introduce a new logic for this. It +is recommended to use the regular unit templating of systemd for this, i.e. to +include template units such as `foobar@.service`, so that instantiation is as +simple as: + +``` +# /usr/lib/systemd/portablectl attach foobar_0.7.23.raw +# systemctl enable --now foobar@instancea.service +# systemctl enable --now foobar@instanceb.service +… +``` + +The benefit of this approach is that templating works exactly the same for +units shipped with the OS itself as for attached portable services. + +# Immutable images with local data + +It's a good idea to keep portable service images read-only during normal +operation. In fact all but the `trusted` profile will default to this kind of +behaviour, by setting the `ProtectSystem=strict` option. In this case writable +service data may be placed on the host file system. Use `StateDirectory=` in +the unit files to enable such behaviour and add a local data directory to the +services copied onto the host.