239 commits

Author SHA1 Message Date
Daniel Mack 906c06f64a cgroup, unit, fragment parser: make use of new firewall functions 2017-09-22 15:24:55 +02:00
Lennart Poettering 18f573aaf9 core: make sure to dump cgroup context when unit_dump() is called for all unit types
For some reason we didn't dump the cgroup context for a number of unit
types, including service units. Not sure how this wasn't noticed
before... Add this in.
2017-09-22 15:24:54 +02:00
Lennart Poettering 1703fa41a7 core: rename EXEC_APPLY_PERMISSIONS → EXEC_APPLY_SANDBOXING
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
2017-08-10 15:02:50 +02:00
Lennart Poettering f0d477979e core: introduce unit_set_exec_params()
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
2017-08-10 15:02:50 +02:00
Lennart Poettering 19bbdd985e core: manager_set_exec_params() cannot fail, hence make it void
Let's simplify things a bit.
2017-08-10 15:02:50 +02:00
Lennart Poettering 584b8688d1 execute: also fold the cgroup delegate bit into ExecFlags 2017-08-10 15:02:50 +02:00
Abdó Roig-Maranges 1df96fcb31 core: Do not fail perpetual mount units without fragment (#6459)
mount_load does not require fragment files to be present in order to
load mount units which are perpetual, or come from /proc/self/mountinfo.

mount_verify should do the same, otherwise a synthesized '-.mount' would
be marked as failed with "No such file or directory", as it is perpetual
but not marked to come from /proc/self/mountinfo at this point.

This happens for the user instance, and I suspect it was the cause of #5375
for the system instance, without gpt-generator.
2017-07-31 12:32:09 +02:00
Yu Watanabe 3536f49e8f core: add {State,Cache,Log,Configuration}Directory= (#6384)
This introduces {State,Cache,Log,Configuration}Directory=, which are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.

This also fixes #6391.
2017-07-18 14:34:52 +02:00
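The mapping described in 3536f49e8f, from setting to parent directory, can be sketched as a small lookup table. This is only an illustration under stated assumptions: the enum, the table and `resolve_directory()` are hypothetical names that do not come from the systemd sources, and only the four prefixes named in the commit message are covered.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical enum; the names are illustrative, not systemd's. */
typedef enum {
        DIR_STATE,
        DIR_CACHE,
        DIR_LOGS,
        DIR_CONFIGURATION,
        DIR_TYPE_MAX
} DirType;

/* Prefixes as listed in the commit message. */
static const char *const dir_prefix[DIR_TYPE_MAX] = {
        [DIR_STATE]         = "/var/lib",
        [DIR_CACHE]         = "/var/cache",
        [DIR_LOGS]          = "/var/log",
        [DIR_CONFIGURATION] = "/etc",
};

/* Writes "<prefix>/<name>" into buf.
 * Returns 0 on success, -1 on invalid input or if buf is too small. */
static int resolve_directory(DirType t, const char *name, char *buf, size_t size) {
        int n;

        if ((int) t < 0 || t >= DIR_TYPE_MAX || !name || !buf)
                return -1;

        n = snprintf(buf, size, "%s/%s", dir_prefix[t], name);
        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With this sketch, `resolve_directory(DIR_STATE, "foo", buf, sizeof buf)` fills `buf` with `/var/lib/foo`. Creating the directory and applying the configured mode are left out.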
NeilBrown 83897d5470 core/mount: pass "-c" flag to /bin/umount (#6093)
"-c", which is short for "--no-canonicalize", tells /bin/umount
that the path name is canonical (no .. or symlinks etc).

systemd always uses a canonical name, so this flag is appropriate
for systemd to use.
Knowing that the path is canonical allows umount to avoid
some calls to lstat() on the path.

From v2.30 "-c" goes further and causes umount to avoid all
attempts to 'lstat()' (or similar) the path.  This is important
when automatically unmounting a filesystem, as lstat() can
hang indefinitely in some cases such as when an NFS server
is not accessible.

"-c" has been supported since util-linux 2.17 which is before the
earliest version supported by systemd.
So "-c" is safe to use now, and once util-linux v2.30 is in use,
it will allow mounts from non-responsive NFS servers to be
unmounted.
2017-06-07 15:28:23 +03:00
Zbigniew Jędrzejewski-Szmek ce954c0319 core/mount: remove repeated word 2017-02-02 11:18:34 -05:00
Yu Watanabe cfcd431890 core: add missing unit_add_to_load_queue() to mount_setup_new_unit()
unit_add_to_load_queue was present in the code before 03b8cfede9,
and was inadvertently dropped.

Fixes #5105
2017-01-23 14:06:43 +09:00
Yu Watanabe a51ee72d2e core: minor error handling fix in mount_setup_new_unit()
The function mount_setup_new_unit() should return -ENOMEM
if at least one of the `strdup` calls fails.
2017-01-23 13:59:21 +09:00
Franck Bui 03b8cfede9 core: make sure to init mount params before calling mount_is_extrinsic() (#5087)
When a new entry appears in /proc/self/mountinfo, mount_setup_unit()
allocates a new mount unit for it and starts initializing it.

mount_setup_unit() is also used to update a mount unit when a change happens in
/proc/self/mountinfo, for example a mountpoint can be remounted with additional
mount options.

This patch introduces 2 separate functions to deal with those 2 cases instead
of mount_setup_unit() dealing with both of them. The common code is small and
doing the split makes the code easier to read and less error prone if extended
later.

It also makes sure to initialize in both functions the mount parameters of the
mount unit before calling mount_is_extrinsic() since this function relies on
them.

Fixes: #4902
2017-01-16 15:19:13 -05:00
Franck Bui ebc8968bc0 core: make mount units from /proc/self/mountinfo possibly bind to a device (#4515)
Since commit 9d06297, mount units from mountinfo are not bound to their devices
anymore (they use the "Requires" dependency instead).

This has the following drawback: if a medium is mounted and the eject button is
pressed, the medium is unconditionally ejected, leaving some inconsistent
state behind.

Since udev is the component that reacts to the eject button (no matter whether
the device is in use or not), users expect udev to at least try to unmount the
medium properly.

This patch introduces a new property "SYSTEMD_MOUNT_DEVICE_BOUND". When set on
a block device, all units that require this device will see their "Requires"
dependency upgraded to a "BindsTo" one. This is currently only used by cdrom
devices.

This patch also gives the user the possibility to restore the previous
behavior, that is, to bind a mount unit to a device. This is achieved by passing
the "x-systemd.device-bound" option to mount(8). Please note that currently this
is not working because libmount treats the x-* options as comments, therefore
they're not available in utab for later retrieval by applications.
2016-12-16 17:13:58 +01:00
Lennart Poettering ad2706db7c core: rework logic to determine when we decide to add automatic deps for mounts
This adds a concept of "extrinsic" mounts. If mounts are extrinsic we consider
them managed by something else and do not add automatic ordering against
umount.target, local-fs.target, remote-fs.target.

Extrinsic mounts are considered:

- All mounts if we are running in --user mode

- API mounts such as everything below /proc, /sys, /dev, which exist from
  earliest boot to latest shutdown.

- All mounts marked as initrd mounts, if we run on the host

- The initrd's private directory /run/initramfs that should survive until last
  reboot.

This primarily merges a couple of different exclusion lists into a single
concept.
2016-12-14 10:13:52 +01:00
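The four rules listed in ad2706db7c can be sketched as a single predicate. This is a minimal illustration, not the code from mount.c: the `MountInfo` struct, `path_is_below()` and `mount_is_extrinsic_sketch()` are hypothetical, and whether the API directories themselves (as opposed to mounts below them) count is an assumption made here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical inputs; the real check works on systemd's Mount object. */
typedef struct {
        const char *where;      /* mount point */
        bool initrd_mount;      /* marked as an initrd mount */
} MountInfo;

/* True if path equals prefix or lies below it:
 * "/proc" matches "/proc/sys" but not "/procfoo". */
static bool path_is_below(const char *path, const char *prefix) {
        size_t n = strlen(prefix);

        return strncmp(path, prefix, n) == 0 &&
               (path[n] == '/' || path[n] == '\0');
}

static bool mount_is_extrinsic_sketch(const MountInfo *m, bool user_mode, bool in_initrd) {
        /* All mounts are extrinsic in --user mode. */
        if (user_mode)
                return true;

        /* API mounts exist from earliest boot to latest shutdown. */
        if (path_is_below(m->where, "/proc") ||
            path_is_below(m->where, "/sys") ||
            path_is_below(m->where, "/dev"))
                return true;

        /* initrd mounts are extrinsic when we run on the host. */
        if (m->initrd_mount && !in_initrd)
                return true;

        /* The initrd's private directory has to survive until the end. */
        if (path_is_below(m->where, "/run/initramfs"))
                return true;

        return false;
}
```

A caller would skip the automatic ordering against umount.target, local-fs.target and remote-fs.target whenever the predicate returns true.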
Lennart Poettering c9d5c9c0e1 core: make unit_free() accept NULL pointers
We generally try to make our destructors robust regarding NULL pointers, much
in the same way as glibc's free(). Do this also for unit_free().

Follow-up for #4748.
2016-12-01 00:25:51 +01:00
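The convention c9d5c9c0e1 refers to, a destructor that tolerates NULL the way free() does, looks roughly like the sketch below. `UnitSketch` is a hypothetical, heavily reduced stand-in for the real Unit object.

```c
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for systemd's Unit object. */
typedef struct {
        char *id;
        char *description;
} UnitSketch;

/* Destructor that, like free(), is a no-op when handed NULL. */
static void unit_sketch_free(UnitSketch *u) {
        if (!u)
                return;

        free(u->id);
        free(u->description);
        free(u);
}
```

Callers can then release a possibly unset pointer on an error path without first checking it.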
Franck Bui 7d5ceb6416 core: allow to redirect confirmation messages to a different console
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).

This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
2016-11-17 18:16:16 +01:00
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
Lennart Poettering 1201cae704 core: change mount_synthesize_root() return to int
Let's propagate the error here, instead of eating it up early.

In a later change we should probably also change mount_enumerate() to propagate
errors up, but that would mean we'd have to change the unit vtable, and thus
change all unit types, hence is quite an invasive change.
2016-11-02 11:39:49 -06:00
Lennart Poettering a581e45ae8 unit: unify some code with new unit_new_for_name() call 2016-11-02 11:29:59 -06:00
Lennart Poettering 11222d0fe0 core: make the root mount perpetual too
Now that we have a proper concept of "perpetual" units, let's make the root
mount one too, since it also cannot go away.
2016-11-02 11:29:59 -06:00
Zbigniew Jędrzejewski-Szmek 3b319885c4 tree-wide: introduce free_and_replace helper
It's a common pattern, so add a helper for it. A macro is necessary
because a function that takes a pointer to a pointer would be type specific,
similarly to cleanup functions. Seems better to use a macro.
2016-10-16 23:35:39 -04:00
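The pattern 3b319885c4 describes can be sketched as follows. The macro name and body here are an illustrative assumption and may differ from the helper systemd actually ships. The point is that a macro works for any pointer type, which a function taking a pointer to a pointer would not.

```c
#include <stdlib.h>

/* Sketch: free the old value of a, move b into a, and reset b so the
 * allocation has exactly one owner afterwards. Works for any pointer type. */
#define FREE_AND_REPLACE_SKETCH(a, b)   \
        do {                            \
                free(a);                \
                (a) = (b);              \
                (b) = NULL;             \
        } while (0)
```

Typical use is to build a new value in a temporary and swap it in only once every step has succeeded.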
Zbigniew Jędrzejewski-Szmek b744e8937c Merge pull request #4067 from poettering/invocation-id
Add an "invocation ID" concept to the service manager
2016-10-11 13:40:50 -04:00
Lennart Poettering 052364d41f core: simplify if branches a bit
We do the same thing in two branches, let's merge them. Let's also add an
explanatory comment, while we are at it.
2016-10-10 22:57:02 +02:00
Lennart Poettering f2aed3070d core: make use of IN_SET() in various places in mount.c 2016-10-10 22:57:02 +02:00
Lennart Poettering 1f0958f640 core: when determining whether a process exit status is clean, consider whether it is a command or a daemon
SIGTERM should be considered a clean exit code for daemons (i.e. long-running
processes, as a daemon without SIGTERM handler may be shut down without issues
via SIGTERM still) while it should not be considered a clean exit code for
commands (i.e. short-running processes).

Let's add two different clean checking modes for this, and use the right one at
the appropriate places.

Fixes: #4275
2016-10-10 22:57:01 +02:00
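The two checking modes from 1f0958f640 can be sketched like this. All names are hypothetical, and only SIGTERM is treated as a clean signal because it is the only one the commit message mentions. The real helper may accept more.

```c
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names; systemd's own helper and enums may look different. */
typedef enum { ENDED_BY_EXIT, ENDED_BY_SIGNAL } EndKind;
typedef enum { CLEAN_FOR_COMMAND, CLEAN_FOR_DAEMON } CleanMode;

/* status is the exit code for ENDED_BY_EXIT and the signal number
 * for ENDED_BY_SIGNAL. */
static bool is_clean_exit_sketch(EndKind kind, int status, CleanMode mode) {
        if (kind == ENDED_BY_EXIT)
                return status == EXIT_SUCCESS;

        /* Only daemons may be terminated by SIGTERM and still count as clean. */
        if (kind == ENDED_BY_SIGNAL)
                return mode == CLEAN_FOR_DAEMON && status == SIGTERM;

        return false;
}
```

The caller chooses the mode: the daemon mode for a long-running main process, the command mode for short-running helper commands.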
Lennart Poettering 4b58153dd2 core: add "invocation ID" concept to service manager
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128-bit ID is
generated each time a unit moves from an inactive to an activating or active
state.

The primary use case for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.

The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.

The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.

Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.

This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().

PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.

A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycle of it.

Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation ID race-freely from the
messages.
2016-10-07 20:14:38 +02:00
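A service can pick up the ID from its environment, as 4b58153dd2 describes. The sketch below does not use the sd_id128_get_invocation() API the commit adds. It parses the variable by hand and assumes that the variable is called INVOCATION_ID and holds 32 hexadecimal digits, neither of which the commit message states.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Converts one hex digit to its value, or returns -1. */
static int unhex(char c) {
        if (c >= '0' && c <= '9')
                return c - '0';
        if (c >= 'a' && c <= 'f')
                return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
                return c - 'A' + 10;
        return -1;
}

/* Reads the 128-bit invocation ID from the environment into id[16].
 * Assumption: the variable is named INVOCATION_ID and is 32 hex digits long.
 * Returns 0 on success, -1 if it is unset or malformed. */
static int get_invocation_id_sketch(uint8_t id[16]) {
        const char *e = getenv("INVOCATION_ID");

        if (!e || strlen(e) != 32)
                return -1;

        for (size_t i = 0; i < 16; i++) {
                int hi = unhex(e[2 * i]), lo = unhex(e[2 * i + 1]);

                if (hi < 0 || lo < 0)
                        return -1;

                id[i] = (uint8_t) (hi << 4 | lo);
        }

        return 0;
}
```

Because a service may alter its own environment, a value read this way is informational only, which is the reason the commit gives for having journald use the extended attribute instead.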
Barron Rulon 49915de245 mount: add SloppyOptions= to mount_dump() 2016-08-27 10:47:46 -04:00
Barron Rulon 4f8d40a9dc mount: add new ForceUnmount= setting for mount units, mapping to umount(8)'s "-f" switch 2016-08-27 10:46:52 -04:00
brulon e520950a03 mount: add new LazyUnmount= setting for mount units, mapping to umount(8)'s "-l" switch (#3827) 2016-08-26 17:57:22 +02:00
Lennart Poettering 00d9ef8560 core: add RemoveIPC= setting
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). If turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.

This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.

In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks. However, we need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
2016-08-19 00:37:25 +02:00
Lennart Poettering a0fef983ab core: remember first unit failure, not last unit failure
Previously, the result value of a unit was overridden with each failure that
took place, so that the result always reported the last failure that took
place.

With this commit this is changed, so that the first failure taking place is
stored instead. This should normally not matter much as multiple failures are
sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT
to a service due to a watchdog timeout, then this currently would be reported
as "coredump" failure, rather than the "watchdog" failure it really is. Hence,
in order to report information about the type of the failure, and not about
the effect of it, let's change this for all unit types to store the first, not
the last failure.

This addresses the issue pointed out here:

https://github.com/systemd/systemd/pull/3818#discussion_r73433520
2016-08-04 23:08:05 +02:00
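The rule a0fef983ab introduces amounts to a guarded assignment. A minimal sketch, with hypothetical result codes loosely modelled on the ones named in the commit message:

```c
/* Hypothetical result codes, loosely modelled on the commit message. */
typedef enum {
        RESULT_SUCCESS,
        RESULT_FAILURE_WATCHDOG,
        RESULT_FAILURE_CORE_DUMP
} ResultSketch;

/* Record a failure only if none has been recorded yet, so the first wins. */
static void record_result(ResultSketch *stored, ResultSketch f) {
        if (*stored == RESULT_SUCCESS)
                *stored = f;
}
```

In the watchdog case the unit first records the watchdog failure. The later core dump caused by SIGABRT then no longer overwrites it.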
Lennart Poettering c39f1ce24d core: turn various execution flags into a proper flags parameter
The ExecParameters structure contains a number of bit-flags, that were so far
exposed as bool:1, change this to a proper, single binary bit flag field. This
makes things a bit more expressive, and is helpful as we add more flags, since
these booleans are passed around in various callers, for example
service_spawn(), whose signature can be made much shorter now.

Not all bit booleans from ExecParameters are moved into the flags field for
now, but this can be added later.
2016-08-04 16:27:07 +02:00
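The refactoring in c39f1ce24d replaces several one-bit booleans with a single flags field. The sketch below shows only the pattern. The flag names, the struct and both functions are hypothetical and are not the ones used in the systemd sources.

```c
#include <stdbool.h>

/* Hypothetical flag names; only the pattern matters here. */
typedef enum {
        EXEC_SKETCH_APPLY_PERMISSIONS = 1 << 0,
        EXEC_SKETCH_APPLY_CHROOT      = 1 << 1,
        EXEC_SKETCH_APPLY_TTY_STDIN   = 1 << 2
} ExecFlagsSketch;

typedef struct {
        ExecFlagsSketch flags;  /* replaces several "bool x:1" members */
} ExecParametersSketch;

/* One flags argument instead of one bool parameter per option. */
static void spawn_sketch(ExecParametersSketch *p, ExecFlagsSketch flags) {
        p->flags = flags;
}

/* Testing a flag is a simple mask operation. */
static bool applies_chroot(const ExecParametersSketch *p) {
        return (p->flags & EXEC_SKETCH_APPLY_CHROOT) != 0;
}
```

A call site then reads `spawn_sketch(&p, EXEC_SKETCH_APPLY_PERMISSIONS | EXEC_SKETCH_APPLY_CHROOT)`, which names each option instead of passing a row of positional `true`/`false` arguments.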
Lennart Poettering eb18df724b Merge pull request #2471 from michaelolbrich/transient-mounts
allow transient mounts and automounts
2016-08-04 16:16:04 +02:00
Lennart Poettering 29206d4619 core: add a concept of "dynamic" user ids, that are allocated as long as a service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).

For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.

A simple way to test this is:

        systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-22 15:53:45 +02:00
Lennart Poettering cf6f7f66a4 core: add minor comment
Let's explain #3444 briefly in the sources, too.
2016-06-06 22:03:31 +02:00
michaelolbrich 53203e5f8f mount: make sure we get into MOUNT_DEAD state after a successful umount (#3444)
Without this code the following can happen:
1. Open a file to keep a mount busy
2. Try to stop the corresponding mount unit with systemctl
   -> umount fails and the failure is remembered in mount->result
3. Close the file and umount the filesystem manually
   -> mount_dispatch_io() calls "mount_enter_dead(mount, MOUNT_SUCCESS)"
   -> Old error in mount->result is reused and the mount unit enters a
      failed state

Clear the old error result when 'mountinfo' reports a successful umount to
fix this.
2016-06-06 21:59:51 +02:00
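The sequence described in 53203e5f8f can be reproduced with a two-field model of a mount unit. All types and functions below are hypothetical simplifications, and `mount_gone_sketch()` stands in for the part of mount_dispatch_io() that reacts to 'mountinfo'.

```c
typedef enum { MOUNT_SK_MOUNTED, MOUNT_SK_DEAD, MOUNT_SK_FAILED } MountStateSketch;
typedef enum { MOUNT_SK_SUCCESS, MOUNT_SK_FAILURE_EXIT_CODE } MountResultSketch;

typedef struct {
        MountStateSketch state;
        MountResultSketch result;
} MountSketch;

/* Entering the dead state keeps an earlier error unless it was cleared. */
static void mount_enter_dead_sketch(MountSketch *m, MountResultSketch f) {
        if (f != MOUNT_SK_SUCCESS)
                m->result = f;

        m->state = m->result == MOUNT_SK_SUCCESS ? MOUNT_SK_DEAD : MOUNT_SK_FAILED;
}

/* Called when 'mountinfo' shows that the mount point is gone. */
static void mount_gone_sketch(MountSketch *m) {
        /* The fix: a successful umount supersedes a stale error that was
         * left behind by an earlier, failed umount attempt. */
        m->result = MOUNT_SK_SUCCESS;
        mount_enter_dead_sketch(m, MOUNT_SK_SUCCESS);
}
```

Without the reset in `mount_gone_sketch()`, a unit whose earlier umount attempt failed would end up in the failed state even though the filesystem was unmounted cleanly afterwards.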
Michael Olbrich b294b79fb0 mount: use get_mount_parameters_fragment() consistently
There are multiple different checks, that all mean the same thing:
Is it an explicitly configured mount unit where actions need to be taken to
mount it, or is it just mirroring 'mountinfo':
'from_fragment' is set if fragment_path is not NULL, and
get_mount_parameters_fragment() just wraps that and returns fragment_path.

Use get_mount_parameters_fragment() everywhere to be consistent.
This is just a cleanup without functional change.
2016-06-06 07:33:54 +02:00
Zbigniew Jędrzejewski-Szmek 94ad3616c8 core/mount: add helper function for mount states 2016-05-07 16:19:53 -04:00
Lennart Poettering 1ed7ebcfca Merge pull request #3170 from poettering/v230-preparation-fixes
make virtualization detection quieter, rework unit start limit logic, detect unit file drop-in changes correctly, fix autofs state propagation
2016-05-04 10:46:13 +02:00
Lennart Poettering fae03ed32a automount: rework propagation between automount and mount units
Port the propagation logic to the generic Unit->trigger_notify() callback logic
in the unit vtable, that is called for a unit not only when the triggered unit
of it changes state but also when a job for that unit finishes. This, firstly
allows us to make the code a bit cleaner and more generic, but more
importantly, allows us to notice correctly when a mount job fails, and
propagate that back to autofs client processes.

Fixes: #2181
2016-05-02 16:51:45 +02:00
Lennart Poettering 072993504e core: move enforcement of the start limit into per-unit-type code again
Let's move the enforcement of the per-unit start limit from unit.c into the
type-specific files again. For unit types that know a concept of "result" codes
this allows us to hook up the start limit condition to it with an explicit
result code. Also, this makes sure that the state checks in calls like
service_start() may be done before the start limit is checked, as the start
limit really should be checked last, right before everything has been verified
to be in order.

The generic start limit logic is left in unit.c, but the invocation of it is
moved into the per-type files, in the various xyz_start() functions, so that
they may place the check at the right location.

Note that this change drops the enforcement entirely from device, slice, target
and scope units, since these unit types generally may not fail activation, or
may only be activated a single time. This is also documented now.

Note that this restores the "start-limit-hit" result code that existed before
6bf0f408e4 already in the service code. However,
it's now introduced for all units that have a result code concept.

Fixes #3166.
2016-05-02 13:08:00 +02:00
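The ordering 072993504e argues for, state checks first and the start limit last, can be sketched with a toy unit type. Everything here is hypothetical: the limit is a plain counter without a time interval, and the names do not come from unit.c or service.c.

```c
#include <stdbool.h>

typedef enum { SVC_DEAD, SVC_RUNNING, SVC_FAILED } SvcState;
typedef enum { SVC_SUCCESS, SVC_FAILURE_START_LIMIT_HIT } SvcResult;

typedef struct {
        SvcState state;
        SvcResult result;
        unsigned starts;        /* starts counted so far */
        unsigned burst;         /* permitted number of starts */
} SvcSketch;

/* Generic limit logic (deliberately simple: no time interval). */
static bool start_limit_ok(SvcSketch *s) {
        if (s->starts >= s->burst)
                return false;

        s->starts++;
        return true;
}

/* Type-specific start function: state checks first, start limit last. */
static int svc_start_sketch(SvcSketch *s) {
        if (s->state == SVC_RUNNING)
                return 0;       /* already active: nothing is counted */

        if (!start_limit_ok(s)) {
                /* Explicit result code for the limit condition. */
                s->state = SVC_FAILED;
                s->result = SVC_FAILURE_START_LIMIT_HIT;
                return -1;
        }

        s->state = SVC_RUNNING;
        return 0;
}
```

Because the limit is consulted only after the state checks, a redundant start request for an already running unit does not consume any of its start budget.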
Zbigniew Jędrzejewski-Szmek ce99c68a33 Move no_instances information to shared/
This way it can be used in install.c in subsequent commit.
2016-05-01 19:58:59 -04:00
Zbigniew Jędrzejewski-Szmek 8a993b61d1 Move no_alias information to shared/
This way it can be used in install.c in subsequent commit.
2016-05-01 19:40:51 -04:00
Lennart Poettering 365007369b Merge pull request #3069 from Werkov/fix-dependencies-for-bind-mounts
Always create dependencies for bind mounts
2016-04-29 12:50:29 +02:00
Michal Koutný d3bd0986bb Always create dependencies for loop device mounts
In case a file is on a networked filesystem, we may tag the fstab record with
the _netdev option, however, correct dependencies will be created for this mount.
2016-04-25 13:25:00 +02:00
Michal Koutný 26919ac110 Always create dependencies for bind mounts
Dependencies were not created for _netdev mountpoints, the reasoning for this
is in the commit fc676b00, i.e. to avoid adding dependencies for network
mountpoints where What= appears like a path. Thus proposing this semantically
more correct condition when dependencies are added for _actual_ bind mounts
irrespective of the network flag.

Consequently it allows adding the _netdev option to bind mounts, which includes
them in remote-fs.target, which simplifies configuration.
2016-04-25 13:12:02 +02:00
Lennart Poettering 291d565a04 core,systemctl: add bus API to retrieve processes of a unit
This adds a new GetProcesses() bus call to the Unit object which returns an
array consisting of all PIDs, their process names, as well as their full cgroup
paths. This is then used by "systemctl status" to show the per-unit process
tree.

This has the benefit that the client-side no longer needs to access the
cgroupfs directly to show the process tree of a unit. Instead, it now uses this
new API, which means it also works correctly if -H or -M are used, as the
information from the specific host is used, and not the one from the local
system.

Fixes: #2945
2016-04-22 16:06:20 +02:00
Zbigniew Jędrzejewski-Szmek 3ae5990c6e tree-wide: introduce PATH_IN_SET macro 2016-04-16 22:57:05 -04:00
Lennart Poettering 463d0d1569 core: remove ManagerRunningAs enum
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were
mostly identical and converted from one to the other all the time. The latter
had one more value UNIT_FILE_GLOBAL however.

Let's simplify things, and remove ManagerRunningAs and replace it by
UnitFileScope everywhere, thus making the translation unnecessary. Introduce
two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking
if we are running in the system or the user context.
2016-04-12 13:43:30 +02:00
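The simplification in 463d0d1569 leaves a single scope enum plus two predicate macros. The sketch below uses hypothetical names throughout, and how the real macros treat the third, "global" value is an assumption made here.

```c
/* Hypothetical scope enum with the extra "global" value the commit mentions. */
typedef enum { SCOPE_SYSTEM, SCOPE_GLOBAL, SCOPE_USER } ScopeSketch;

typedef struct {
        ScopeSketch scope;
} ManagerSketch;

/* Assumption: anything that is not the system scope counts as user context. */
#define MANAGER_SKETCH_IS_SYSTEM(m) ((m)->scope == SCOPE_SYSTEM)
#define MANAGER_SKETCH_IS_USER(m)   ((m)->scope != SCOPE_SYSTEM)
```

Code that previously converted between the two enums can then test the manager directly, for example `if (MANAGER_SKETCH_IS_USER(m))`.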