Commit Graph

294 Commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek aac99f303a core: introduce a helper function to wrap unit_log_{success,failure}
It's inline so that the compiler can easily optimize away the call to get
status string.
2018-11-16 19:47:07 +01:00
Lennart Poettering 523ee2d414 core: log a recognizable message when a unit succeeds, too
We already are doing it on failure, let's do it on success, too.

Fixes: #10265
2018-11-16 15:22:48 +01:00
Lennart Poettering 91bbd9b796 core: make log messages about unit processes exiting recognizable 2018-11-16 15:22:48 +01:00
Lennart Poettering 7c047d7443 core: make log messages about units entering a 'failed' state recognizable
Let's make this recognizable, and carry result information in a
structure fashion.
2018-11-16 15:22:48 +01:00
Yu Watanabe b9c04eafb8 core: introduce exec_params_clear()
Follow-up for 1ad6e8b302.

Fixes #10677.
2018-11-08 09:36:37 +01:00
Lennart Poettering 1ad6e8b302 core: split environment block mantained by PID 1's Manager object in two
This splits the "environment" field of Manager into two:
transient_environment and client_environment. The former is generated
from configuration file, kernel cmdline, environment generators. The
latter is the one the user can control with "systemctl set-environment"
and similar.

Both sets are merged transparently whenever needed. Separating the two
sets has the benefit that we can safely flush out the former while
keeping the latter during daemon reload cycles, so that env var settings
from env generators or configuration files do not accumulate, but
dynamic API changes are kept around.

Note that this change is not entirely transparent to users: if the user
first uses "set-environment" to override a transient variable, and then
uses "unset-environment" to unset it again things will revert to the
original transient variable now, while previously the variable was fully
removed. This change in behaviour should not matter too much though I
figure.

Fixes: #9972
2018-10-31 18:00:53 +01:00
Lennart Poettering d68c645bd3 core: rework serialization
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.

In order to implement this all serialization functions are move to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.

While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end.
2018-10-26 10:52:41 +02:00
Yu Watanabe 5e1ee764e1 core: include error cause in log message 2018-10-20 01:40:42 +09:00
Zbigniew Jędrzejewski-Szmek 5a72417084 pid1: drop unused path parameter to add_two_dependencies_by_name() 2018-09-15 20:02:00 +02:00
Zbigniew Jędrzejewski-Szmek 35d8c19ace pid1: drop now-unused path parameter to add_dependency_by_name() 2018-09-15 19:57:52 +02:00
Yu Watanabe fc95c359f6 tree-wide: use returned value from log_*_errno() 2018-08-07 15:48:37 +09:00
Zbigniew Jędrzejewski-Szmek 5b316330be
Merge pull request #9624 from poettering/service-state-flush
flush out ExecStatus structures when a new service cycle begins
2018-08-02 09:50:39 +02:00
Lennart Poettering 5686391b00 core: introduce new Type=exec service type
Users are often surprised that "systemd-run" command lines like
"systemd-run -p User=idontexist /bin/true" will return successfully,
even though the logs show that the process couldn't be invoked, as the
user "idontexist" doesn't exist. This is because Type=simple will only
wait until fork() succeeded before returning start-up success.

This patch adds a new service type Type=exec, which is very similar to
Type=simple, but waits until the child process completed the execve()
before returning success. It uses a pipe that has O_CLOEXEC set for this
logic, so that the kernel automatically sends POLLHUP on it when the
execve() succeeded but leaves the pipe open if not. This means PID 1
waits exactly until the execve() succeeded in the child, and not longer
and not shorter, which is the desired functionality.

Making use of this new functionality, the command line
"systemd-run -p User=idontexist -p Type=exec /bin/true" will now fail,
as expected.
2018-07-25 22:48:11 +02:00
Lennart Poettering 6a1d4d9fa6 core: properly reset all ExecStatus structures when entering a new unit cycle
Whenever a unit is started fresh we should flush out any runtime data
from the previous cycle. We are pretty good at that already, but what so
far we missed was the ExecStart=/ExecStop=/… command exit status data.
Let's fix that, and properly flush out that stuff too.

Consider this service:

    [Service]
    ExecStart=/bin/sleep infinity
    ExecStop=/bin/false

When this service is started, then stopped and then started again
"systemctl status" would show the ExecStop= results of the previous run
along with the ExecStart= results of the current one, which is very
confusing. With this patch this is corrected: the data is kept right
until the moment the new service cycle starts, and then flushed out.
Hence "systemctl status" in that case will only show the ExecStart=
data, but no ExecStop= data, like it should be.

This should fix part of the confusion of #9588
2018-07-23 13:36:47 +02:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Lennart Poettering 6f40aa4547 core: add a couple of more error cases that should result in "bad-setting"
This changes a number of EINVAL cases to ENOEXEC, so that we enter
"bad-setting" state if they fail.
2018-06-11 12:53:12 +02:00
Lennart Poettering 04eb582acc core: enumerate perpetual units in a separate per-unit-type method
Previously the enumerate() callback defined for each unit type would do
two things:

1. It would create perpetual units (i.e. -.slice, system.slice, -.mount and
   init.scope)

2. It would enumerate units from /proc/self/mountinfo, /proc/swaps and
   the udev database

With this change these two parts are split into two seperate methods:
enumerate() now only does #2, while enumerate_perpetual() is responsible
for #1. Why make this change? Well, perpetual units should have a
slightly different effect that those found through enumeration: as
perpetual units should be up unconditionally, perpetually and thus never
change state, they should also not pull in deps by their state changing,
not even when the state is first set to active. Thus, their state is
generally initialized through the per-device coldplug() method in
similar  fashion to the deserialized state from a previous run would be
put into place. OTOH units found through regular enumeration should
result in state changes (and thus pull in deps due to state changes),
hence their state should be put in effect in the catchup() method
instead. Hence, given this difference, let's also separate the
functions, so that the rule is:

1. What is created in enumerate_perpetual() should be started in
   coldplug()

2. What is created in enumerate() should be started in catchup().
2018-06-07 15:29:17 +02:00
Lennart Poettering 485ae697ba core: rework device_found_node() prototype
let's drop the "now" argument, it's exactly what MANAGER_IS_RUNNING()
returns, hence let's use that instead to simplify things.

Moreover, let's change the add/found argument pair to become found/mask,
which allows us to change multiple flags at the same time into opposing
directions, which will be useful later on.

Also, let's change the return type to void. It's a notifier call where
callers will ignore the return value anyway as it is nothing actionable.

Should not change behaviour.
2018-06-07 13:36:19 +02:00
Zbigniew Jędrzejewski-Szmek 79e221d078
Merge pull request #9158 from poettering/notify-auto-reload
trigger OnFailure= only if Restart= is not in effect
2018-06-05 13:51:07 +02:00
Yu Watanabe 858d36c1ec path-util: introduce path_simplify()
The function is similar to path_kill_slashes() but also removes
initial './', trailing '/.', and '/./' in the path.
When the second argument of path_simplify() is false, then it
behaves as the same as path_kill_slashes(). Hence, this also
replaces path_kill_slashes() with path_simplify().
2018-06-03 23:39:26 +09:00
Lennart Poettering 2ad2e41a72 core: don't trigger OnFailure= deps when a unit is going to restart
This adds a flags parameter to unit_notify() which can be used to pass
additional notification information to the function. We the make the old
reload_failure boolean parameter one of these flags, and then add a new
flag that let's unit_notify() if we are configured to restart the
service.

Note that this adjusts behaviour of systemd to match what the docs say.

Fixes: #8398
2018-06-01 19:08:30 +02:00
Felipe Sateler 57b7a260c2 core: undo the dependency inversion between unit.h and all unit types 2018-05-15 14:24:34 -04:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Michael Olbrich 227b8a762f core: don't include libmount.h in a header file (#8580)
linux/fs.h sys/mount.h, libmount.h and missing.h all include MS_*
definitions.

To avoid problems, only one of linux/fs.h, sys/mount.h and libmount.h
should be included. And missing.h must be included last.

Without this, building systemd may fail with:

In file included from [...]/libmount/libmount.h:31:0,
                 from ../systemd-238/src/core/manager.h:23,
                 from ../systemd-238/src/core/emergency-action.h:37,
                 from ../systemd-238/src/core/unit.h:34,
                 from ../systemd-238/src/core/dbus-timer.h:25,
                 from ../systemd-238/src/core/timer.c:26:
[...]/sys/mount.h:57:2: error: expected identifier before numeric constant
2018-03-26 17:34:53 +02:00
Zbigniew Jędrzejewski-Szmek 95b862b054 shutdown: use libmount to enumerate /proc/self/mountinfo
This is analogous to 8d3ae2bd4c, except that now
src/core/umount.c not src/core/mount.c is converted.

Might help with https://bugzilla.redhat.com/show_bug.cgi?id=1554943, or not.

In the patch, mnt_free_tablep and mnt_free_iterp are declared twice. It'd
be nicer to define them just once in mount-setup.h, but then libmount.h would
have to be included there. libmount.h seems to be buggy, and declares some
defines which break other headers, and working around this is more pain than
the two duplicate lines. So let's live with the duplication for now.

This fixes memleak of MountPoint in mount_points_list_get() on error, not that
it matters any.
2018-03-16 10:09:46 +01:00
Lennart Poettering a94ab7acfd
Merge pull request #8175 from keszybz/gc-cleanup
Garbage collection cleanup
2018-02-15 17:47:37 +01:00
Zbigniew Jędrzejewski-Szmek 648461c07d Merge pull request #8125 from poettering/cgroups-migrate
Trivial merge conflict resolved locally.
2018-02-15 16:15:45 +01:00
Zbigniew Jędrzejewski-Szmek f2f725e5cc pid1: rename unit_check_gc to unit_may_gc
"check" is unclear: what is true, what is false? Let's rename to "can_gc" and
revert the return value ("positive" values are easier to grok).

v2:
- rename from unit_can_gc to unit_may_gc
2018-02-15 13:04:12 +01:00
Lennart Poettering 004c7f169e core: fold manager_set_exec_params() into unit_set_exec_params()
Let's simplify things a bit: we so far called both functions every
single time, let's just merge one into the other, so that we have fewer
functions to call.
2018-02-12 11:34:00 +01:00
Lennart Poettering 548f69375e tree-wide: use path_hash_ops instead of string_hash_ops whenever we key by a path
Let's make use of our new hash_ops!
2018-02-12 11:07:55 +01:00
Yu Watanabe f2e18ef1a3 core: remove unnecessary initialization 2018-02-09 16:36:37 +09:00
Yu Watanabe e8a565cb66 core: make ExecRuntime be manager managed object
Before this, each ExecRuntime object is owned by a unit. However,
it may be shared with other units which enable JoinsNamespaceOf=.
Thus, by the serialization/deserialization process, its sharing
information, more specifically, reference counter is lost, and
causes issue #7790.

This makes ExecRuntime objects be managed by manager, and changes
the serialization/deserialization process.

Fixes #7790.
2018-02-06 16:00:34 +09:00
Yu Watanabe 9189979213 core/mount: dump TimeoutSec= 2018-01-30 17:09:59 +09:00
Lennart Poettering 832316370b mount,swap: write event loop priority as "SD_EVENT_PRIORITY_NORMAL-x"
We do that in all other cases, let's do it here too. Since
SD_EVENT_PRIORITY_NORMAL evaluates to zero there's zero effective
difference, but it makes things easier to grok and grep for if we always
express relative priorities within PID 1 only.
2018-01-23 18:13:01 +01:00
Alan Jenkins b6ba0c164d mount: don't consider activated until /sbin/mount returns
So far, we considered mount units activated as soon as the mount
appeared.  This avoided seeing a difference between mounts started by
systemd, and e.g. by running `mount` from a terminal.
(`umount` was not handled this way).

However in some cases, options passed to `mount` require additional
system calls after the mount is successfully created.  E.g. the
`private` mount option, or the `ro` option on bind mounts.
It seems best to wait for mount to finish doing that.  E.g. in
the `private` case, the current behaviour could theoretically cause
non-deterministic results, as child mounts inherit the
private/shared propagation setting from their parent.

This also avoids a special case in mount_reload().
2018-01-23 11:09:18 +00:00
Alan Jenkins 5701836121 mount: clarify that umount retries do not (anymore) allow multiple timeouts
It _looks_ as if, back when we used to retry unsuccessful calls to umount,
this would have inflated the effective timeout.  Multiplying it by
RETRY_UMOUNT_MAX.  Which is set to 32.

I'm surprised if it's true: I would have expected it to be noticed during
the work on NFS timeouts.  But I can't see what would have stopped it.

Clarify that I do not expect this to happen anymore.  I think each
individual umount call is allowed up to the full timeout, but if umount
ever exited with a signal status, we would stop retrying.

To be extra clear, make sure that we do not retry in the event that umount
perversely returned EXIT_SUCCESS after receiving SIGTERM.
2018-01-23 11:09:18 +00:00
Alan Jenkins 006aabbd05 mount: mountinfo event is supposed to always arrive before SIGCHLD
"Due to the io event priority logic we can be sure the new mountinfo is
loaded before we process the SIGCHLD for the mount command."

I think this is a reasonable expectation.  But if it works, then the
other comment must be false:

"Note that mount(8) returning and the kernel sending us a mount table
change event might happen out-of-order."

Therefore we can clean up the code for the latter.

If this is working as advertised, then we can make sure that mount units
fail if the mount we thought we were creating did not actually appear,
due to races or trickery (or because /sbin/mount did something unexpected
despite returning EXIT_SUCCESS).

Include a specific warning message for this failure.

If we give up when the mount point is still mounted after 32 successful
calls to /sbin/umount, that seems a fairly similar case.  So make that
message a LOG_WARN as well (not LOG_DEBUG). Also, this was recently changed to only
retry while umount is returning EXIT_SUCCESS; in that case in particular
there would be no other messages in the log to suggest what had happened.
2018-01-23 11:09:06 +00:00
Alan Jenkins 25cd49647c mount: forbid mount on path with symlinks
It was forbidden to create mount units for a symlink.  But the reason is
that the mount unit needs to know the real path that will appear in
/proc/self/mountinfo.  The kernel dereferences *all* the symlinks in the
path at mount time (I checked this with `mount -c` running under `strace`).

This will have no effect on most systems.  As recommended by docs, most
systems use /etc/fstab, as opposed to native mount unit files.
fstab-generator dereferences symlinks for backwards compatibility.

A relatively minor issue regarding Time Of Check / Time Of Use also exists
here.  I can't see how to get rid of it entirely.  If we pass an absolute
path to mount, the racing process can replace it with a symlink.  If we
chdir() to the mount point and pass ".", the racing process can move the
directory.  The latter might potentially be nicer, except that it breaks
WorkingDirectory=.

I'm not saying the race is relevant to security - I just want to consider
how bad the effect is.  Currently, it can make the mount unit active (and
hence the job return success), despite there never being a matching entry
in /proc/self/mountinfo.  This wart will be removed in the next commit;
i.e. it will make the mount unit fail instead.
2018-01-20 22:06:34 +00:00
Alan Jenkins 3cc9685649 core: prevent spurious retries of umount
Testing the previous commit with `systemctl stop tmp.mount` logged the
reason for failure as expected, but unexpectedly the message was repeated
32 times.

The retry is a special case for umount; it is only supposed to cover the
case where the umount command was _successful_, but there was still some
remaining mount(s) underneath.  Fix it by making sure to test the first
condition :).

Re-tested with and without a preceding `mount --bind /mnt /tmp`,
and using `findmnt` to check the end result.
2018-01-13 17:22:46 +00:00
Alan Jenkins 5804e1b6ff core: fix output (logging) for mount units (#7603)
Documentation - systemd.exec - strongly implies mount units get logging.

It is safe for mounts to depend on systemd-journald.socket.  There is no
cyclic dependency generated.  This is because the root, -.mount, was
already deliberately set to EXEC_OUTPUT_NULL.  See comment in
mount_load_root_mount().  And /run is excluded from being a mount unit.

Nor does systemd-journald depend on /var.  It starts earlier, initially
logging to /run.

Tested before/after using `systemctl stop tmp.mount`.
2018-01-13 13:03:13 +00:00
rkolchmeyer 65d36b4950 core: Fix edge case when processing /proc/self/mountinfo (#7811)
Currently, if there are two /proc/self/mountinfo entries with the same
mount point path, the mount setup flags computed for the second of
these two entries will overwrite the mount setup flags computed for
the first of these two entries. This is the root cause of issue #7798.
This patch changes mount_setup_existing_unit to prevent the
just_mounted mount setup flag from being overwritten if it is set to
true. This will allow all mount units created from /proc/self/mountinfo
entries to be initialized properly.

Fixes: #7798
2018-01-05 19:28:23 +01:00
Lennart Poettering a4634b214c core: warn about left-over processes in cgroup on unit start
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
2017-11-25 17:08:21 +01:00
Lennart Poettering 3c7416b6ca core: unify common code for preparing for forking off unit processes
This introduces a new function unit_prepare_exec() that encapsulates a
number of calls we do in preparation for spawning off some processes in
all our unit types that do so.

This allows us to neatly unify a bit of code between unit types and
shorten our code.
2017-11-21 11:54:08 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering d3070fbdf6 core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald
And let's make use of it to implement two new unit settings with it:

1. LogLevelMax= is a new per-unit setting that may be used to configure
   log priority filtering: set it to LogLevelMax=notice and only
   messages of level "notice" and lower (i.e. more important) will be
   processed, all others are dropped.

2. LogExtraFields= is a new per-unit setting for configuring per-unit
   journal fields, that are implicitly included in every log record
   generated by the unit's processes. It takes field/value pairs in the
   form of FOO=BAR.

Also, related to this, one exisiting unit setting is ported to this new
facility:

3. The invocation ID is now pulled from /run/systemd/units/ instead of
   cgroupfs xattrs. This substantially relaxes requirements of systemd
   on the kernel version and the privileges it runs with (specifically,
   cgroupfs xattrs are not available in containers, since they are
   stored in kernel memory, and hence are unsafe to permit to lesser
   privileged code).

/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.

Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.

This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.

Fixes: #4089
Fixes: #3041
Fixes: #4441
2017-11-16 12:40:17 +01:00
Yu Watanabe 74b1731c75 core/mount: fstype may be NULL 2017-11-12 14:27:25 +01:00
Lennart Poettering 3e3852b3c6 core: make "tmpfs" dependencies on swapfs a "default" dep, not an "implicit"
There should be a way to turn this logic of, and DefaultDependencies=
appears to be the right option for that, hence let's downgrade this
dependency type from "implicit" to "default, and thus honour
DefaultDependencies=.

This also drops mount_get_fstype() as we only have a single user needing
this now.

A follow-up for #7076.
2017-11-10 19:52:41 +01:00
Lennart Poettering eef85c4a3f core: track why unit dependencies came to be
This replaces the dependencies Set* objects by Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.

The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.

Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.

Why this all? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:

1. We can fix UDEV_WANTS dependency generation: so far we kept adding
   dependencies configured that way, but if a device lost such a
   dependency we couldn't them again as there was no scheme for removing
   of dependencies in place.

2. We can implement "pin-pointed" reload of unit files. If we know what
   dependencies were created as result of configuration in a unit file,
   then we know what to flush out when we want to reload it.

3. It's useful for debugging: "systemd-analyze dump" now shows
   this information, helping substantially with understanding how
   systemd's dependency tree came to be the way it came to be.
2017-11-10 19:45:29 +01:00
Alan Jenkins 79aafbd122 core: distinguish "Killing"/"Terminating"/"Stopping" for mount unit timeout
Update the timeout warnings for remount and unmount.  For consistency with
mount, for accuracy, and for consistency with their equivalents in
service.c.
2017-11-01 15:28:50 +00:00