Commit graph

27718 commits

Author SHA1 Message Date
Sergiusz Urbaniak 4f086aab52
nspawn: R/W support for /sys, and /proc/sys
This commit adds the possibility to leave /sys, and /proc/sys read-write.
It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE
to enable this feature.

If set to "yes", /sys, and /proc/sys will be read-write.
If set to "no", /sys, and /proc/sys will be read-only.
If set to "network" /proc/sys/net will be read-write. This is useful in
use-cases, where systemd-nspawn is used in an external network
namespace.

This adds the possibility to start privileged containers which need more
control over settings in the /proc, and /sys filesystem.

This is also a follow-up on the discussion from
https://github.com/systemd/systemd/pull/4018#r76971862 where an
introduction of a simple env var to enable R/W support for those
directories was already discussed.
2016-11-18 09:50:40 +01:00
Zbigniew Jędrzejewski-Szmek 5c7119f43e test-ipcrm: skip test if nfsnobody is missing 2016-11-17 20:57:22 -05:00
Zbigniew Jędrzejewski-Szmek 041b5ae170 basic/process-util: we need to take the shorter of two strings
==30496== Conditional jump or move depends on uninitialised value(s)
==30496==    at 0x489F654: memcmp (vg_replace_strmem.c:1091)
==30496==    by 0x49BF203: getenv_for_pid (process-util.c:678)
==30496==    by 0x4993ACB: detect_container (virt.c:442)
==30496==    by 0x182DFF: test_get_process_comm (test-process-util.c:98)
==30496==    by 0x185847: main (test-process-util.c:368)
==30496==
2016-11-17 20:57:22 -05:00
Zbigniew Jędrzejewski-Szmek 347ebd0297 test-process-util: bind mount fails under selinux, skip test 2016-11-17 20:57:22 -05:00
Zbigniew Jędrzejewski-Szmek 9a4550e258 Merge pull request #4671 from poettering/namespace-bind
rework service namespace handling a bit
2016-11-17 19:40:57 -05:00
Zbigniew Jędrzejewski-Szmek a1e45b8ba3 basic/env-uil: fix assertion failure in strv_env_replace (#4688)
free_and_replace sets the setcond argument to NULL (it's designed
to be used with _clenaup_ macros), and we don't want that here.

Fixes #4684.
2016-11-17 22:25:01 +01:00
Franck Bui 539622bd8c core: in confirm spawn, suggest 'f' when user selects 'n' choice 2016-11-17 18:23:32 +01:00
Franck Bui c891efaf8a core: confirm_spawn: always accept units with same_pgrp set for now
For some reasons units remaining in the same process group as PID 1
(same_pgrp=true) fail to acquire the console even if it's not taken by anyone.

So always accept for units with same_pgrp set for now.
2016-11-17 18:16:51 +01:00
Franck Bui 63d77c9254 core: include the unit name when notifying that a confirmation question timed out 2016-11-17 18:16:51 +01:00
Franck Bui b0eb29449e core: add 'c' in confirmation_spawn to resume the boot process 2016-11-17 18:16:50 +01:00
Franck Bui 56fde33af1 core: add 'j' in confirmation_spawn to list the jobs that are in progress 2016-11-17 18:16:50 +01:00
Franck Bui dd6f9ac0d0 core: add 'D' in confirmat spawn to show a full dump of the unit to spawn 2016-11-17 18:16:50 +01:00
Franck Bui eedf223a30 core: add 'i' in confirm spawn to give a short summary of the unit to spawn 2016-11-17 18:16:50 +01:00
Franck Bui d172b175f6 core: rework the confirmation spawn prompt
Previously it was "[Yes, Fail, Skip]" which is pretty misleading because it
suggests that the whole word needs to be entered instead of a single char.

Also this won't fit well when we'll extend the number of choices.

This patch addresses this by changing the choice hint with "[y, f, s – h for help]"
so it's now clear that a single letter has to be entered.

It also introduces a new choice 'h' which describes all possible choices since
a single letter can be not descriptive enough for new users.

It also allow to stick with the same hint string regardless of how
many choices we will support.
2016-11-17 18:16:50 +01:00
Franck Bui 2bcd3c26fe core: limit the length of the confirmation question
When "confirmation_spawn=1", the confirmation question can look like:

  Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/run/tmpfiles.d/kmod.conf? [Yes, No, Skip]

which is pretty verbose and might not fit in the console width size (which is
usually 80 chars) and thus question will be splitted into 2 consecutive lines.

However since the question is now refreshed every 2 secs, the reprinted
question will overwrite the second line of the previous one...

To prevent this, this patch makes sure that the command line won't be longer
than 60 chars by ellipsizing it if the command is longer:

  Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/ru…nf? [Yes, No, View, Skip]

A following patch will introduce a new choice that will allow the user to get
details on the command to be executed so it will still be possible to see the
full command line.
2016-11-17 18:16:50 +01:00
Franck Bui 3c670f8998 core: reprint the question every 2 sec in ask_char()
ask_char() now reprints the question every 2sec automatically.

It prefixes its output with '\r' to to bring the cursor to the
beginning of the terminal line, and then print the message, redoing it
every 2sec.

As long as nothing interferes with out output this logic will have no
visible effect as we constantly overprint the visible text with the
exact same text.

However, if something is dumped in the middle, then our question won't
get lost, as we'll ask soon again.

This is useful if the question is asked to a terminal that is also
used to dump some other status messages/logs. For example when
confirmation messages are enabled during the boot
(systemd.confirm_spawn=1), the question can easily be lost if the
kernel logs are also enabled and both use the same console.

Idea suggested by Lennart Poettering.
2016-11-17 18:16:49 +01:00
Franck Bui 2bcc330942 core: in confirm_spawn, the meaning of 'n' and 's' choices are confusing
Before this patch we had:

 - "no" which gives "failing execution" but the command is actually assumed as
   succeed.

 - "skip" which gives "skipping", but the command is assumed to have failed,
   which ends up with "Failed to start ..." on the console.

Now we have:

 - "fail" which gives "failing execution" and the command is indeed assumed as
   failed.

 - "skip" which gives "skipping execution" and the command is assumed as
   succeed.
2016-11-17 18:16:49 +01:00
Franck Bui 3b20f877ad core: rework ask_for_confirmation()
Now the reponses are handled by ask_for_confirmation() as well as the report of
any errors occuring during the process of retrieving the confirmation response.

One benefit of this is that there's no need to open/close the console one more
time when reporting error/status messages.

The caller now just needs to care about the return values whose meanings are:

 - don't execute and pretend that the command failed
 - don't execute and pretend that the command succeeed
 - positive answer, execute the command

Also some slight code reorganization and introduce write_confirm_error() and
write_confirm_error_fd(). write_confim_message becomes unneeded.
2016-11-17 18:16:49 +01:00
Franck Bui 7d5ceb6416 core: allow to redirect confirmation messages to a different console
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).

This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
2016-11-17 18:16:16 +01:00
Franck Bui 42bf1ae17b core: prevent the cylon when confirmation_spawn=yes (#2194)
When booting with systemd.confirm_spawn=true, the eye of cylon
animation kicks in pretty quickly so user doesn't have any chance to
answer the questions which services to start before the confirmation
message is screwed by the cylon.

This basically breaks the confirm_spawn functionality completely.

This patch prevents the cylon animation to kick in when
confirmation_spawn=yes.

Fixes: #2194
2016-11-17 18:11:21 +01:00
Lennart Poettering 0c426957d8 update TODO 2016-11-17 18:10:30 +01:00
Lennart Poettering aa70f38b5c namespace: clarify that /proc/apm is obsolete, but leave it blocked 2016-11-17 18:10:30 +01:00
Lennart Poettering c6232fb0e9 namespace: reindent namespace tables
Let's align all our BindMount tables, let's use the same column widths in all
of them, and let's make them not any wider than necessary.

This only changes whitespace, not contents of any of the tables.
2016-11-17 18:09:16 +01:00
Lennart Poettering 5327c910d2 namespace: simplify, optimize and extend handling of mounts for namespace
This changes a couple of things in the namespace handling:

It merges the BindMount and TargetMount structures. They are mostly the same,
hence let's just use the same structue, and rely on C's implicit zero
initialization of partially initialized structures for the unneeded fields.

This reworks memory management of each entry a bit. It now contains one "const"
and one "malloc" path. We use the former whenever we can, but use the latter
when we have to, which is the case when we have to chase symlinks or prefix a
root directory. This means in the common case we don't actually need to
allocate any dynamic memory. To make this easy to use we add an accessor
function bind_mount_path() which retrieves the right path string from a
BindMount structure.

While we are at it, also permit "+" as prefix for dirs configured with
ReadOnlyPaths= and friends: if specified the root directory of the unit is
implicited prefixed.

This also drops set_bind_mount() and uses C99 structure initialization instead,
which I think is more readable and clarifies what is being done.

This drops append_protect_kernel_tunables() and
append_protect_kernel_modules() as append_static_mounts() is now simple enough
to be called directly.

Prefixing with the root dir is now done in an explicit step in
prefix_where_needed(). It will prepend the root directory on each entry that
doesn't have it prefixed yet. The latter is determined depending on an extra
bit in the BindMount structure.
2016-11-17 18:08:32 +01:00
Franck Bui f80da6f3e9 core: monitor the inotify file descriptor not the console one in acquire_terminal()
When waiting for the terminal to be release in acquire_terminal(), we
were monitoring the terminal fd instead of the inotify descriptor.

Therefore any write accesses would wake up the waiting process instead
of being wake up when the tty is closed only.
2016-11-17 09:22:45 +01:00
Martin Pitt 375fd1559b Merge pull request #4681 from keszybz/shortening
Shortening
2016-11-17 08:20:01 +01:00
Zbigniew Jędrzejewski-Szmek 4a58145f0f Merge pull request #4678 from poettering/gc-device
Automatically GC device jobs when there's no need to keep them in the job queue anymore.
Implement systemctl list-jobs --before/--after.
Allow systemd-run -p After/Before/Wants/Requires= ...
2016-11-16 21:16:13 -05:00
Zbigniew Jędrzejewski-Szmek 76d8ca2229 core/dbus-job, systemctl: shorten some code 2016-11-16 21:01:11 -05:00
Zbigniew Jędrzejewski-Szmek bc357ce5d7 systemctl: shorter list-jobs --before/--after output a bit
(before)$ systemctl list-jobs  --before --after
 JOB UNIT                                          TYPE  STATE
8769 foobar.device                                 start running
        A job waits for this job: 8669 (run-rb6da596d0cfa4e36b7c594cd973e795a.service/start)
8669 run-rb6da596d0cfa4e36b7c594cd973e795a.service start waiting
        This job waits for a job: 8769 (foobar.device/start)

2 jobs listed.

(after)$ systemctl list-jobs  --before --after
 JOB UNIT                                          TYPE  STATE
8769 foobar.device                                 start running
        waiting for job 8669 (run-rb6da596d0cfa4e36b7c594cd973e795a.service/start)
8669 run-rb6da596d0cfa4e36b7c594cd973e795a.service start waiting
        blocking job 8769 (foobar.device/start)

2 jobs listed.
2016-11-16 20:51:47 -05:00
Francesco Brozzu c4027307a2 hwdb: fix airplane mode trigger when switching from laptop to desktop on HP Pavilion x360 13 (#4680) 2016-11-16 22:57:20 +01:00
Lennart Poettering 7d992a6ede update TODO 2016-11-16 17:01:46 +01:00
Lennart Poettering 82948f6c8e systemctl: show waiting jobs when "systemctl list-jobs --after/--before" is called
Let's expose the new bus functions we added in the previous commit in
systemctl.
2016-11-16 17:01:46 +01:00
Lennart Poettering 15ea79f85c core: add bus calls for determining jobs waiting for other jobs
This should make it easier to debug job deadlocks.
2016-11-16 17:01:46 +01:00
Lennart Poettering 289188ca7d system-run: add support for configuring unit dependencies with --property=
Support on the server side has already been in place for quite some time, let's
also add support on the client side for this.
2016-11-16 15:03:26 +01:00
Lennart Poettering d4015567ad systemctl: add env var to force connection to system manager via the bus
Sometimes it is useful for debugging purposes to force systemctl to connect to
PID 1 via the bus instead of direct connection, even if the direct connection
is possible.
2016-11-16 15:03:26 +01:00
Lennart Poettering c5a97ed132 core: GC redundant device jobs from the run queue
In contrast to all other unit types device units when queued just track
external state, they cannot effect state changes on their own. Hence unless a
client or other job waits for them there's no reason to keep them in the job
queue. This adds a concept of GC'ing jobs of this type as soon as no client or
other job waits for them anymore.

To ensure this works correctly we need to track which clients actually
reference a job (i.e. which ones enqueued it). Unfortunately that's pretty
nasty to do for direct connections, as sd_bus_track doesn't work for
them. For now, work around this, by simply remembering in a boolean that a job
was requested by a direct connection, and reset it when we notice the direct
connection is gone. This means the GC logic works fine, except that jobs are
not immediately removed when direct connections disconnect.

In the longer term, a rework of the bus logic should fix this properly. For now
this should be good enough, as GC works for fine all cases except this one, and
thus is a clear improvement over the previous behaviour.

Fixes: #1921
2016-11-16 15:03:26 +01:00
Lennart Poettering 1a465207ab core: rename "clients" field of Job structure to "bus_track"
Let's make semantics of this field more similar to the same functionality in
the Unit object, in particular as we add new functionality to it later on.
2016-11-16 15:03:26 +01:00
Lennart Poettering a2d72e265a core: drop n_in_gc_queue field of Manager structure
We count the units in the GC queue with this, but actually never make use of
it, hence drop it.
2016-11-16 15:03:26 +01:00
Lennart Poettering 0a23a62729 core: a few small coding style/modernization updates for job.c 2016-11-16 15:03:26 +01:00
Lennart Poettering 984794baf4 shared: split out code for adding multiple names to sd_bus_track object
Let's introduce a new call bus_track_add_name_many() that adds a string list to
a tracking object.
2016-11-16 15:03:26 +01:00
Djalal Harouni afc402b76a Merge pull request #4658 from endocode/djalal/sandbox-various-fixes-v1
core: improve the logic that implies no new privileges and documentation fixes
2016-11-15 20:45:27 +01:00
Evgeny Vereshchagin 22f1f8f24c tests: add UNIFIED_CGROUP_HIERARCHY=[default|hybrid] (#4675)
This will simplify testing a bit.
Mainly for https://github.com/systemd/systemd/pull/4670
2016-11-15 17:38:04 +01:00
Djalal Harouni 73186d534b bus-util: print RestrictNamespaces= as a string
Allow all callers that want to print RestrictNamespaces= returned from D-Bus
as a string instead of a u64 value.
2016-11-15 15:52:42 +01:00
Djalal Harouni 97e60383c0 test: add tests for RestrictNamespaces= 2016-11-15 15:50:19 +01:00
Djalal Harouni d6299d613f core:gperf: pass the exec_context struct directly to parse restrict namespaces
The RestrictNamespaces= takes yes, no or a list of namespaces types,
therefor config_parse_restrict_namespaces() is a bit complex and it
operates on the ExecContext, fix this by passing the offset of
ExecContext directly otherwise restricting namespaces won't work.
2016-11-15 15:04:43 +01:00
Djalal Harouni 8526555680 doc: move ProtectKernelModules= documentation near ProtectKernelTunalbes= 2016-11-15 15:04:41 +01:00
Djalal Harouni 6a8c2d5915 core: property is RestrictNamespaces with s 2016-11-15 15:04:38 +01:00
Djalal Harouni a7db8614f3 doc: note when no new privileges is implied 2016-11-15 15:04:35 +01:00
Djalal Harouni c92e8afebd core: improve the logic that implies no new privileges
The no_new_privileged_set variable is not used any more since commit
9b232d3241 that fixed another thing. So remove it. Also no
need to check if we are under user manager, remove that part too.
2016-11-15 15:04:31 +01:00
David Herrmann 46b6025a88 Merge pull request #4665 from teg/networkd-split-1
networkd: split sources into subdirectories
2016-11-14 12:08:38 +01:00