Systemd

Author	SHA1	Message	Date
Torsten Hilbrich	88fc9c9bad	systemd-nspawn: Allow setting ambient capability set The old code was only able to pass the value 0 for the inheritable and ambient capability set when a non-root user was specified. However, sometimes it is useful to run a program in its own container with a user specification and some capabilities set. This is needed when the capabilities cannot be provided by file capabilities (because the file system is mounted with MS_NOSUID for additional security). This commit introduces the option --ambient-capability and the config file option AmbientCapability=. Both are used in a similar way to the existing Capability= setting. It changes the inheritable and ambient set (which is 0 by default). The code also checks that the settings for the bounding set (as defined by Capability= and DropCapability=) and the setting for the ambient set (as defined by AmbientCapability=) are compatible. Otherwise, the operation would fail in any way. Due to the current use of -1 to indicate no support for ambient capability set the special value "all" cannot be supported. Also, the setting of ambient capability is restricted to running a single program in the container payload.	2020-12-07 19:56:59 +01:00
Yu Watanabe	db9ecf0501	license: LGPL-2.1+ -> LGPL-2.1-or-later	2020-11-09 13:23:58 +09:00
Lennart Poettering	3652872add	nspawn: add --set-credential= and --load-credential= Let's allow passing in creds to containers, so that PID 1 inside the container can pick them up.	2020-08-25 19:45:47 +02:00
Lennart Poettering	6b000af4f2	tree-wide: avoid some loaded terms https://tools.ietf.org/html/draft-knodel-terminology-02 https://lwn.net/Articles/823224/ This gets rid of most but not occasions of these loaded terms: 1. scsi_id and friends are something that is supposed to be removed from our tree (see #7594) 2. The test suite defines an API used by the ubuntu CI. We can remove this too later, but this needs to be done in sync with the ubuntu CI. 3. In some cases the terms are part of APIs we call or where we expose concepts the kernel names the way it names them. (In particular all remaining uses of the word "slave" in our codebase are like this, it's used by the POSIX PTY layer, by the network subsystem, the mount API and the block device subsystem). Getting rid of the term in these contexts would mean doing some major fixes of the kernel ABI first. Regarding the replacements: when whitelist/blacklist is used as noun we replace with with allow list/deny list, and when used as verb with allow-list/deny-list.	2020-06-25 09:00:19 +02:00
Tobias Hunger	129635333d	repart: Add UUID option to config files Add a option to provide a UUID for the partition that will get created and document that.	2020-05-25 15:48:59 +02:00
Lennart Poettering	86775e3524	nspawn: beef up --resolve-conf= modes Let's add flavours for copying stub/uplink resolv.conf versions. Let's add a more brutal "replace" mode, where we'll replace any existing destination file. Let's also change what "auto" means: instead of copying the static file, let's use the stub file, so that DNS search info is copied over. Fixes: #15340	2020-04-22 19:38:04 +02:00
Zbigniew Jędrzejewski-Szmek	0985c7c4e2	Rework cpu affinity parsing The CPU_SET_S api is pretty bad. In particular, it has a parameter for the size of the array, but operations which take two (CPU_EQUAL_S) or even three arrays (CPU_{AND,OR,XOR}_S) still take just one size. This means that all arrays must be of the same size, or buffer overruns will occur. This is exactly what our code would do, if it received an array of unexpected size over the network. ("Unexpected" here means anything different from what cpu_set_malloc() detects as the "right" size.) Let's rework this, and store the size in bytes of the allocated storage area. The code will now parse any number up to 8191, independently of what the current kernel supports. This matches the kernel maximum setting for any architecture, to make things more portable. Fixes #12605.	2019-05-29 10:20:42 +02:00
Zbigniew Jędrzejewski-Szmek	ca78ad1de9	headers: remove unneeded includes from util.h This means we need to include many more headers in various files that simply included util.h before, but it seems cleaner to do it this way.	2019-03-27 11:53:12 +01:00
Zbigniew Jędrzejewski-Szmek	b2645747b7	nspawn-oci: fix double free Also rename function to make it clear that it also frees the array object itself.	2019-03-22 17:39:12 +01:00
Lennart Poettering	de40a3037a	nspawn: add support for executing OCI runtime bundles with nspawn This is a pretty large patch, and adds support for OCI runtime bundles to nspawn. A new switch --oci-bundle= is added that takes a path to an OCI bundle. The JSON file included therein is read similar to a .nspawn settings files, however with a different feature set. Implementation-wise this mostly extends the pre-existing Settings object to carry additional properties for OCI. However, OCI supports some concepts .nspawn files did not support yet, which this patch also adds: 1. Support for "masking" files and directories. This functionatly is now also available via the new --inaccesible= cmdline command, and Inaccessible= in .nspawn files. 2. Support for mounting arbitrary file systems. (not exposed through nspawn cmdline nor .nspawn files, because probably not a good idea) 3. Ability to configure the console settings for a container. This functionality is now also available on the nspawn cmdline in the new --console= switch (not added to .nspawn for now, as it is something specific to the invocation really, not a property of the container) 4. Console width/height configuration. Not exposed through .nspawn/cmdline, but this may be controlled through $COLUMNS and $LINES like in most other UNIX tools. 5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on the cmdline, since containers likely have different user tables, and the existing --user= switch appears to be the better option) 6. OCI hook commands (no exposed in .nspawn/cmdline, as very specific to OCI) 7. Creation of additional devices nodes in /dev. Most likely not a good idea, hence not exposed in .nspawn/cmdline. There's already --bind= to achieve the same, which is the better alternative. 8. Explicit syscall filters. This is not a good idea, due to the skewed arch support, hence not exposed through .nspawn/cmdline. 9. Configuration of some sysctls on a whitelist. Questionnable, not supported in .nspawn/cmdline for now. 10. Configuration of all 5 types of capabilities. Not a useful concept, since the kernel will reduce the caps on execve() anyway. Not exposed through .nspawn/cmdline as this is not very useful hence. Note that this only implements the OCI runtime logic itself. It does not provide a runc-compatible command line tool. This is left for a later PR. Only with that in place tools such as "buildah" can use the OCI support in nspawn as drop-in replacement. Currently still missing is OCI hook support, but it's already parsed and everything, and should be easy to add. Other than that it's OCI is implemented pretty comprehensively. There's a list of incompatibilities in the nspawn-oci.c file. In a later PR I'd like to convert this into proper markdown and add it to the documentation directory.	2019-03-15 15:41:28 +01:00
Yu Watanabe	e93672eeac	tree-wide: drop missing.h from headers and use relevant missing_*.h	2018-12-06 13:31:16 +01:00
Yu Watanabe	36dd5ffd5d	util: drop missing.h from util.h	2018-12-04 10:00:34 +01:00
Jiuyang liu	a2f577fca0	add ephemeral to nspawn-settings.	2018-10-24 10:22:20 +02:00
Lennart Poettering	0c69794138	tree-wide: remove Lennart's copyright lines These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.	2018-06-14 10:20:20 +02:00
Lennart Poettering	818bf54632	tree-wide: drop 'This file is part of systemd' blurb This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.	2018-06-14 10:20:20 +02:00
Lennart Poettering	f728ab1724	nspawn: let's rename _FORCE_ENUM_WIDTH → _SETTING_FORCE_ENUM_WIDTH Just some preparation in case we need a similar hack in another enum one day.	2018-05-22 16:21:26 +02:00
Lennart Poettering	1688841f46	nspawn: similar to the previous patches, also make /etc/localtime handling more configurable Fixes: #9009	2018-05-22 16:21:26 +02:00
Lennart Poettering	4e1d6aa983	nspawn: make --link-journal= configurable through .nspawn files, too	2018-05-22 16:20:08 +02:00
Lennart Poettering	09d423e921	nspawn: add greater control over how /etc/resolv.conf is handled Fixes: #8014 #1781	2018-05-22 16:19:26 +02:00
Lennart Poettering	8904ab86b0	Merge pull request #9062 from poettering/parse-conf-macro add new CONFIG_PARSER_PROTOTYPE() macro	2018-05-22 16:14:49 +02:00
Lennart Poettering	a210692525	tree-wide: port over all code to the new CONFIG_PARSER_PROTOTYPE() macro This makes most header files easier to look at. Also Emacs gets really slow when browsing through large sections of overly long prototypes, which is much improved by this macro. We should probably not do something similar with too many other cases, as macros like this might help readability for some, but make it worse for others. But I think given the complexity of this specific prototype and how often we use it, it's worth doing.	2018-05-22 13:18:44 +02:00
Zbigniew Jędrzejewski-Szmek	b49c6ca089	systemd-nspawn: make SettingsMask 64 bit wide The use of UINT64_C() in the SettingsMask enum definition is misleading: it does not mean that individual fields have this width. E.g., with enum { FOO = UINT64_C(1) } sizeof(FOO) gives 4. It only means that the shift is done properly. So 1 << 35 is undefined, but UINT64_C(1) << 35 is the expected 64 bit constant. Thus, the use UINT64_C() is useful, because we know that the shifts are done properly, no matter what the value of _RLIMIT_MAX is, but when those fields are used in expressions, we don't know what size they will be (probably 4). Let's add a define which "hides" the enum definition behind a define which gives the same value but is actually 64 bit. I think this is a nicer solution than requiring all users to cast SETTING_RLIMIT_FIRST before use. Fixes #9035.	2018-05-22 10:51:49 +02:00
Lennart Poettering	d107bb7d63	nspawn: add a new --cpu-affinity= switch Similar as the other options added before, this is primarily useful to provide comprehensive OCI runtime compatbility, but might be useful otherwise, too.	2018-05-17 20:48:54 +02:00
Lennart Poettering	81f345dfed	nspawn: add a new --oom-score-adjust= command line switch This is primarily useful in order to provide comprehensive OCI runtime compatibility with nspawn, but might have uses outside of it.	2018-05-17 20:48:12 +02:00
Lennart Poettering	66edd96310	nspawn: add a new --no-new-privileges= cmdline option to nspawn This simply controls the PR_SET_NO_NEW_PRIVS flag for the container. This too is primarily relevant to provide OCI runtime compaitiblity, but might have other uses too, in particular as it nicely complements the existing --capability= and --drop-capability= flags.	2018-05-17 20:47:20 +02:00
Lennart Poettering	3a9530e5f1	nspawn: make the hostname of the container explicitly configurable with a new --hostname= switch Previously, the container's hostname was exclusively initialized from the machine name configured with --machine=, i.e. the internal name and the external name used for and by the container was synchronized. This adds a new option --hostname= that optionally allows the internal name to deviate from the external name. This new option is mainly useful to ultimately implement the OCI runtime spec directly in nspawn, but it might be useful on its own for some other usecases too.	2018-05-17 20:46:45 +02:00
Lennart Poettering	bf428efb07	nspawn: add new --rlimit= switch, and always set resource limits explicitly for our container payloads This ensures we set the various resource limits of our container explicitly on each invocation so that we inherit less from our callers into the payload. By default resource limits are now set to the same values Linux generally passes to the host PID 1, thus minimizing needless differences between host and container environments. The limits are now also configurable using a new --rlimit= switch. This is preparation for teaching nspawn native OCI runtime support as OCI permits setting resource limits for container payloads, and it hence probably makes sense if we do too.	2018-05-17 20:45:54 +02:00
Lennart Poettering	88614c8a28	nspawn: size_t more stuff A follow-up for #8840	2018-05-03 17:19:46 +02:00
Zbigniew Jędrzejewski-Szmek	11a1589223	tree-wide: drop license boilerplate Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd nice to obtain explicit acks from all involved authors before doing that.	2018-04-06 18:58:55 +02:00
Lennart Poettering	dccca82b1a	log: minimize includes in log.h log.h really should only include the bare minimum of other headers, as it is really pulled into pretty much everything else and already in itself one of the most basic pieces of code we have. Let's hence drop inclusion of: 1. sd-id128.h because it's entirely unneeded in current log.h 2. errno.h, dito. 3. sys/signalfd.h which we can replace by a simple struct forward declaration 4. process-util.h which was needed for getpid_cached() which we now hide in a funciton log_emergency_level() instead, which nicely abstracts the details away. 5. sys/socket.h which was needed for struct iovec, but a simple struct forward declaration suffices for that too. Ultimately this actually makes our source tree larger (since users of the functionality above must now include it themselves, log.h won't do that for them), but I think it helps to untangle our web of includes a tiny bit. (Background: I'd like to isolate the generic bits of src/basic/ enough so that we can do a git submodule import into casync for it)	2018-01-11 14:44:31 +01:00
Zbigniew Jędrzejewski-Szmek	53e1b68390	Add SPDX license identifiers to source files under the LGPL This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.	2017-11-19 19:08:15 +01:00
Lennart Poettering	960e4569e1	nspawn: implement configurable syscall whitelisting/blacklisting Now that we have ported nspawn's seccomp code to the generic code in seccomp-util, let's extend it to support whitelisting and blacklisting of specific additional syscalls. This uses similar syntax as PID1's support for system call filtering, but in contrast to that always implements a blacklist (and not a whitelist), as we prepopulate the filter with a blacklist, and the unit's system call filter logic does not come with anything prepopulated. (Later on we might actually want to invert the logic here, and whitelist rather than blacklist things, but at this point let's not do that. In case we switch this over later, the syscall add/remove logic of this commit should be compatible conceptually.) Fixes: #5163 Replaces: #5944	2017-09-12 14:06:21 +02:00
Philip Withnall	b53ede699c	nspawn: Add support for sysroot pivoting (#5258 ) Add a new --pivot-root argument to systemd-nspawn, which specifies a directory to pivot to / inside the container; while the original / is pivoted to another specified directory (if provided). This adds support for booting container images which may contain several bootable sysroots, as is common with OSTree disk images. When these disk images are booted on real hardware, ostree-prepare-root is run in conjunction with sysroot.mount in the initramfs to achieve the same results.	2017-02-08 16:54:31 +01:00
Mike Gilbert	c9f7b4d356	build-sys: add check for gperf lookup function signature (#5055 ) gperf-3.1 generates lookup functions that take a size_t length parameter instead of unsigned int. Test for this at configure time. Fixes: https://github.com/systemd/systemd/issues/5039	2017-01-10 08:39:05 +01:00
Lennart Poettering	7b4318b6a5	nspawn: add ability to configure overlay mounts to .nspawn files Fixes: #4634	2016-12-01 12:41:17 +01:00
Alessandro Puccetti	9c1e04d0fa	nspawn: introduce --notify-ready=[no\|yes] (#3474 ) This the patch implements a notificaiton mechanism from the init process in the container to systemd-nspawn. The switch --notify-ready=yes configures systemd-nspawn to wait the "READY=1" message from the init process in the container to send its own to systemd. --notify-ready=no is equivalent to the previous behavior before this patch, systemd-nspawn notifies systemd with a "READY=1" message when the container is created. This notificaiton mechanism uses socket file with path relative to the contanier "/run/systemd/nspawn/notify". The default values it --notify-ready=no. It is also possible to configure this mechanism from the .nspawn files using NotifyReady. This parameter takes the same options of the command line switch. Before this patch, systemd-nspawn notifies "ready" after the inner child was created, regardless the status of the service running inside it. Now, with --notify-ready=yes, systemd-nspawn notifies when the service is ready. This is really useful when there are dependencies between different contaniers. Fixes https://github.com/systemd/systemd/issues/1369 Based on the work from https://github.com/systemd/systemd/pull/3022 Testing: Boot a OS inside a container with systemd-nspawn. Note: modify the commands accordingly with your filesystem. 1. Create a filesystem where you can boot an OS. 2. sudo systemd-nspawn -D ${HOME}/distros/fedora-23/ sh 2.1. Create the unit file /etc/systemd/system/sleep.service inside the container (You can use the example below) 2.2. systemdctl enable sleep 2.3 exit 3. sudo systemd-run --service-type=notify --unit=notify-test ${HOME}/systemd/systemd-nspawn --notify-ready=yes -D ${HOME}/distros/fedora-23/ -b 4. In a different shell run "systemctl status notify-test" When using --notify-ready=yes the service status is "activating" for 20 seconds before being set to "active (running)". Instead, using --notify-ready=no the service status is marked "active (running)" quickly, without waiting for the 20 seconds. This patch was also test with --private-users=yes, you can test it just adding it at the end of the command at point 3. ------ sleep.service ------ [Unit] Description=sleep After=network.target [Service] Type=oneshot ExecStart=/bin/sleep 20 [Install] WantedBy=multi-user.target ------------ end ------------	2016-06-10 13:09:06 +02:00
Lennart Poettering	22b28dfdc7	nspawn: add new --network-zone= switch for automatically managed bridge devices This adds a new concept of network "zones", which are little more than bridge devices that are automatically managed by nspawn: when the first container referencing a bridge is started, the bridge device is created, when the last container referencing it is removed the bridge device is removed again. Besides this logic --network-zone= is pretty much identical to --network-bridge=. The usecase for this is to make it easy to run multiple related containers (think MySQL in one and Apache in another) in a common, named virtual Ethernet broadcast zone, that only exists as long as one of them is running, and fully automatically managed otherwise.	2016-05-09 15:45:31 +02:00
Lennart Poettering	0de7accea9	nspawn: allow configuration of user namespaces in .nspawn files In order to implement this we change the bool arg_userns into an enum UserNamespaceMode, which can take one of NO, PICK or FIXED, and replace the arg_uid_range_pick bool with it.	2016-04-25 12:16:02 +02:00
Daniel Mack	b26fa1a2fb	tree-wide: remove Emacs lines from all files This should be handled fine now by .dir-locals.el, so need to carry that stuff in every file.	2016-02-10 13:41:57 +01:00
Lennart Poettering	7732f92bad	nspawn: optionally run a stub init process as PID 1 This adds a new switch --as-pid2, which allows running commands as PID 2, while a stub init process is run as PID 1. This is useful in order to run arbitrary commands in a container, as PID1's semantics are different from all other processes regarding reaping of unknown children or signal handling.	2016-02-03 23:58:24 +01:00
Lennart Poettering	5f932eb9af	nspawn: add new --chdir= switch Fixes: #2192	2016-02-03 23:58:24 +01:00
Thomas Hindoe Paaboel Andersen	71d35b6b55	tree-wide: sort includes in *.h This is a continuation of the previous include sort patch, which only sorted for .c files.	2015-11-18 23:09:02 +01:00
Lennart Poettering	f6d6bad146	nspawn: add new --network-veth-extra= switch for defining additional veth links The new switch operates like --network-veth, but may be specified multiple times (to define multiple link pairs) and allows flexible definition of the interface names. This is an independent reimplementation of #1678, but defines different semantics, keeping the behaviour completely independent of --network-veth. It also comes will full hook-up for .nspawn files, and the matching documentation.	2015-11-12 22:04:49 +01:00
Lennart Poettering	0e2656744f	nspawn: rework how we determine private networking settings Make sure we acquire CAP_NET_ADMIN if we require virtual networking. Make sure we imply virtual ethernet correctly when bridge is request. Fixes: #1511 Fixes: #1554 Fixes: #1590	2015-10-22 01:59:25 +02:00
Lennart Poettering	2b5c04d59c	nspawn: remove nspawn.h, it's empty now	2015-09-07 18:47:34 +02:00
Lennart Poettering	7a8f63251d	nspawn: split all port exposure code into nspawn-expose-port.[ch]	2015-09-07 18:44:30 +02:00
Lennart Poettering	e83bebeff7	nspawn: split out mount related functions into a new nspawn-mount.c file	2015-09-07 18:44:30 +02:00
Lennart Poettering	f757855e81	nspawn: add new .nspawn files for container settings .nspawn fiels are simple settings files that may accompany container images and directories and contain settings otherwise passed on the nspawn command line. This provides an efficient way to attach execution data directly to containers.	2015-09-06 01:49:06 +02:00

48 commits