Commit graph

701 commits

Author SHA1 Message Date
Frantisek Sumsal 1d6cc5d0e5 tree-wide: coccinelle fixes 2020-10-04 12:32:21 +02:00
Frantisek Sumsal 973bc32ab6 core: consolidate alloc & put operations into one statement 2020-09-14 16:13:44 +02:00
Zbigniew Jędrzejewski-Szmek 90e74a66e6 tree-wide: define iterator inside of the macro 2020-09-08 12:14:05 +02:00
Zbigniew Jędrzejewski-Szmek 12375b95dd core/unit: reduce scope of variables 2020-09-08 12:07:05 +02:00
Christian Göttsche a3f5fd964b selinux: create unit invocation links with default SELinux context 2020-09-01 15:48:53 +02:00
Zbigniew Jędrzejewski-Szmek c2911d48ff Rework how we cache mtime to figure out if units changed
Instead of assuming that more-recently modified directories have higher mtime,
just look for any mtime changes, up or down. Since we don't want to remember
individual mtimes, hash them to obtain a single value.

This should help us behave properly in the case when the time jumps backwards
during boot: various files might have mtimes that in the future, but we won't
care. This fixes the following scenario:

We have /etc/systemd/system with T1. T1 is initially far in the past.
We have /run/systemd/generator with time T2.
The time is adjusted backwards, so T2 will be always in the future for a while.
Now the user writes new files to /etc/systemd/system, and T1 is updated to T1'.
Nevertheless, T1 < T1' << T2.
We would consider our cache to be up-to-date, falsely.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek c149d2b491 pid1: use the cache mtime not clock to "mark" load attempts
We really only care if the cache has been reloaded between the time when we
last attempted to load this unit and now. So instead of recording the actual
time we try to load the unit, just store the timestamp of the cache. This has
the advantage that we'll notice if the cache mtime jumps forward or backward.

Also rename fragment_loadtime to fragment_not_found_time. It only gets set when
we failed to load the unit and the old name was suggesting it is always set.

In https://bugzilla.redhat.com/show_bug.cgi?id=1871327
(and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1867930
and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1872068) we try
to load a non-existent unit over and over from transaction_add_job_and_dependencies().
My understanding is that the clock was in the future during inital boot,
so cache_mtime is always in the future (since we don't touch the fs after initial boot),
so no matter how many times we try to load the unit and set
fragment_loadtime / fragment_not_found_time, it is always higher than cache_mtime,
so manager_unit_cache_should_retry_load() always returns true.
2020-08-31 20:53:38 +02:00
Zbigniew Jędrzejewski-Szmek 81be23886d core: rename manager_unit_file_maybe_loadable_from_cache()
The name is misleading, since we aren't really loading the unit from cache — if
this function returns true, we'll try to load the unit from disk, updating the
cache in the process.
2020-08-31 20:53:38 +02:00
Lennart Poettering bb0c0d6f29 core: add credentials logic
Fixes: #15778 #16060
2020-08-25 19:45:35 +02:00
Lennart Poettering 45374f6503
Merge pull request #15662 from Werkov/fix-cgroup-disable
Fix unsetting cgroup restrictions
2020-08-25 17:36:07 +02:00
Michal Koutný 4c591f3996 cgroup: Introduce family queueing instead of siblings
The unit_add_siblings_to_cgroup_realize_queue does more than mere
siblings queueing, hence define a family of a unit as (immediate)
children of the unit and immediate children of all ancestors.

Working with this abstraction simplifies the queuing calls and it
shouldn't change the functionality.
2020-08-19 11:41:53 +02:00
Michal Koutný f23ba94db3 cgroup: Implicit unit_invalidate_cgroup_members_masks
Merge members mask invalidation into
unit_add_siblings_to_cgroup_realize_queue, this way unit_realize_cgroup
needn't be called with members mask invalidation.

We have to retain the members mask invalidation in unit_load -- although
active units would have cgroups (re)realized (unit_load queues for
realization), the realization would happen with potentially stale mask.
2020-08-19 11:41:53 +02:00
Michal Koutný fb46fca7e0 cgroup: Eager realization in unit_free
unit_free(u) realizes direct parent and invalidates members mask of all
ancestors. This isn't sufficient in v1 controller hierarchies since
siblings of the freed unit may have existed only because of the removed
unit.

We cannot be lazy about the siblings because if parent(u) is also
removed, it'd migrate and rmdir cgroups for siblings(u). However,
realized masks of siblings(u) won't reflect this change.

This was a non-issue earlier, because we weren't removing cgroup
directories properly (effectively matching the stale realized mask),
removal failed because of tasks left by missing migration (see previous
commit).

Therefore, ensure realization of all units necessary to clean up after
the free'd unit.

Fixes: #14149
2020-08-19 11:41:53 +02:00
Luca Boccassi b3d133148e core: new feature MountImages
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.

Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.
2020-08-05 21:34:55 +01:00
Zbigniew Jędrzejewski-Szmek 02b0109af5
Merge pull request #15955 from anitazha/nullorempty
core: check null_or_empty_path for masked units instead of /dev/null
2020-07-08 22:18:17 +02:00
Zbigniew Jędrzejewski-Szmek cd990847b9 tree-wide: more repeated words 2020-07-07 12:08:22 +02:00
Anita Zhang 640f3b143d core: check null_or_empty for masked units instead of /dev/null
There's some inconsistency in the what is considered a masked unit:
some places (i.e. load-fragment.c) use `null_or_empty()` while others
check if the file path is symlinked to "/dev/null". Since the latter
doesn't account for things like non-absolute symlinks to "/dev/null",
this commit switches the check for "/dev/null" to use `null_or_empty_path()`
2020-07-03 02:33:50 -07:00
Luca Boccassi 7233e91af0 core: store timestamps of unit load attempts
When the system is under heavy load, it can happen that the unit cache
is refreshed for an unrelated reason (in the test I simulate this by
attempting to start a non-existing unit). The new unit is found and
accounted for in the cache, but it's ignored since we are loading
something else.
When we actually look for it, by attempting to start it, the cache is
up to date so no refresh happens, and starting fails although we have
it loaded in the cache.

When the unit state is set to UNIT_NOT_FOUND, mark the timestamp in
u->fragment_loadtime. Then when attempting to load again we can check
both if the cache itself needs a refresh, OR if it was refreshed AFTER
the last failed attempt that resulted in the state being
UNIT_NOT_FOUND.

Update the test so that this issue reproduces more often.
2020-06-30 16:50:00 +02:00
Luca Boccassi 0cffae953a core: add device mapper to allow-list with DevicePolicy=closed and RootImage
To set up a verity/cryptsetup RootImage the forked child needs to
ioctl /dev/mapper/control and create a new mapper.
If PrivateDevices=yes and/or DevicePolicy=closed are used, this is
blocked by the cgroup setting, so add an exception like it's done
for loop devices (and also add a dependency on the kernel modules
implementing them).
2020-06-26 18:39:45 +02:00
Zbigniew Jędrzejewski-Szmek de7fef4b6e tree-wide: use set_ensure_put()
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.

Note: I did not fix errors in return value handing. This will be done separate
to keep the patch comprehensible. No functional change is intended in this
patch.
2020-06-22 16:32:37 +02:00
Lennart Poettering 24bd74ae03
Merge pull request #15940 from keszybz/names-set-optimization
Try to optimize away Unit.names set
2020-06-10 18:52:08 +02:00
Zbigniew Jędrzejewski-Szmek e2ea005681 core: do not touch instance from unit_choose_id()
unit_choose_id() is about marking one of the aliases of the unit as the main
name. With the preparatory work in previous patches, all aliases of the unit
must have the same instance, so the operation to update the instance is a noop.
2020-06-10 09:45:58 +02:00
Zbigniew Jędrzejewski-Szmek ada4b34ec7 core: rework error messages in unit_add_name()
They were added recently in acd1987a18. We can
make them more informative by using unit_type_to_string() and not repeating
unit names as much. Also, %m should not be used together with SYNTHETIC_ERRNO().
2020-06-10 09:42:20 +02:00
Zbigniew Jędrzejewski-Szmek d383acad25 core: when adding names to unit, require matching instance strings
We would check that the instance is present in both units (or missing in both).
But when it is defined, it should be the same in both. The comment in the code
was explicitly saying that differing instance strings are allowed, but this
mostly seems to be a left-over from old times. The man page is pretty clear:

> the instance (if any) is always uniquely defined for a given unit and all its
> aliases.
2020-06-10 09:42:20 +02:00
Zbigniew Jędrzejewski-Szmek 4562c35527 core: store unit aliases in a separate set
We allocated the names set for each unit, but in the majority of cases, we'd
put only one name in the set:

$ systemctl show --value -p Names '*'|grep .|grep -v ' '|wc -l
564
$ systemctl show --value -p Names '*'|grep .|grep ' '|wc -l
16

So let's add a separate .id field, and only store aliases in the set, and only
create the set if there's at least one alias. This requires a bit of gymnastics
in the code, but I think this optimization is worth the trouble, because we
save one object for many loaded units.

In particular set_complete_move() wasn't very useful because the target
unit would always have at least one name defined, i.e. the optimization to
move the whole set over would never fire.
2020-06-10 09:36:58 +02:00
Zbigniew Jędrzejewski-Szmek c9e0695675 core: set source_mtime after load dropins
Dropins may specify SourcePath= too, but we would do the stat only
after loading the main fragment, before loading of the drop-ins.

Fixes #13634.
2020-06-02 22:53:55 +02:00
Zbigniew Jędrzejewski-Szmek f6173cb955 core: define UnitDependency iterators in loops
Reduced scope of variables is always nice.
2020-05-28 18:53:35 +02:00
Zbigniew Jędrzejewski-Szmek db868d45f9 core: make unit_set_invocation_id static
No functional change.
2020-05-28 18:47:01 +02:00
Lennart Poettering c8aa4b5b86 core: voidify one function return 2020-05-26 23:52:22 +02:00
Lennart Poettering 4c42543429 core: also log about left-over processes during unit stop
Only log at LOG_INFO level, i.e. make this informational. During start
let's leave it at LOG_WARNING though.

Of course, it's ugly leaving processes around like that either in start
or in stop, but at start its more dangerous than on stop, so be tougher
there.
2020-05-26 23:52:13 +02:00
Lennart Poettering a0b191b705 condition: add ConditionEnvironment=
Prompted by the discussions in #15180.

This is a bit more complex than I hoped, since for PID 1 we need to pass
in the synethetic environment block in we generate on demand.
2020-05-15 16:05:33 +02:00
Lennart Poettering e1e214c56b
Merge pull request #15265 from fbuihuu/mount-fixes
Mount fixes
2020-05-15 11:13:45 +02:00
Lennart Poettering f3dc6af20f core: automatically update StandardOuput=syslog to =journal (and similar for StandardError=)
Let's go one step further and upgrade implicitly. Usually =syslog
assignments are historic artifacts only. Let's upgrade the lines
automatically, and politely suggest people update their unit
files/configuration (and drop the lines altogether, without
replacement).

Fixes: #15807
2020-05-15 00:05:46 +02:00
Benjamin Robin 20c3acfaad tree-wide: Replace assert() by assert_se() when there is side effect 2020-05-10 09:23:12 +02:00
Zbigniew Jędrzejewski-Szmek be32732168 basic/set: let set_put_strdup() create the set with string hash ops
If we're using a set with _put_strdup(), most of the time we want to use
string hash ops on the set, and free the strings when done. This defines
the appropriate a new string_hash_ops_free structure to automatically free
the keys when removing the set, and makes set_put_strdup() and set_put_strdupv()
instantiate the set with those hash ops.

hashmap_put_strdup() was already doing something similar.

(It is OK to instantiate the set earlier, possibly with a different hash ops
structure. set_put_strdup() will then use the existing set. It is also OK
to call set_free_free() instead of set_free() on a set with
string_hash_ops_free, the effect is the same, we're just overriding the
override of the cleanup function.)

No functional change intended.
2020-05-06 16:54:06 +02:00
Lennart Poettering c92391f52f
Merge pull request #15692 from keszybz/preset-cleanup
Make systemctl list-unit-files output more useful
2020-05-06 08:19:37 +02:00
Zbigniew Jędrzejewski-Szmek 8f7b256665 shared/install: optionally cache the preset list
When doing list-unit-files with --root, we would re-read the preset
list for every unit. This uses a cache to only do it once. The time
for list-unit-files goes down by about ~30%.

unit_file_query_preset() is also called from src/core/. This patch does not
touch that path, since the saving there are smaller, since preset status is
only read on demand over dbus, and caching would be more complicated.
2020-05-05 21:50:31 +02:00
Michal Sekletár d9e45bc3ab core: introduce support for cgroup freezer
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.

This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.

Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
2020-04-30 19:02:51 +02:00
Frantisek Sumsal 86b52a3958 tree-wide: fix spelling errors
Based on a report from Fossies.org using Codespell.

Followup to #15436
2020-04-21 23:21:08 +02:00
Lennart Poettering 33b58dfb41 core: automatically add udev dependency for units using RootImage=
We use udev to wait for /dev/loopX devices to be fully proped hence we
need an implicit ordering dependency on it, for RootImage= to work
reliably in early boot, too.

Fixes: #14972
2020-04-21 16:31:06 +02:00
Zbigniew Jędrzejewski-Szmek 162392b75a tree-wide: spellcheck using codespell
Fixes #15436.
2020-04-16 18:00:40 +02:00
Zbigniew Jędrzejewski-Szmek a1a11d5610
Merge pull request #15365 from poettering/remount-fs-pstore-fix
pid1: automatically generate systemd-remount-fs.service deps, plus enable systemd-pstore from sysinit.target
2020-04-12 17:16:07 +02:00
Lennart Poettering 9b3c65ed36
Merge pull request #15352 from poettering/user-group-name-valdity-rework
user/group name validity rework
2020-04-09 18:49:22 +02:00
Lennart Poettering 611cb82612
Merge pull request #15318 from fbuihuu/inherit-umask-for-user-units
pid1: by default make user units inherit their umask from the user ma…
2020-04-09 17:15:55 +02:00
Franck Bui 5e37d1930b pid1: by default make user units inherit their umask from the user manager
This patch changes the way user managers set the default umask for the units it
manages.

Indeed one can expect that if user manager's umask is redefined through PAM
(via /etc/login.defs or pam_umask), all its children including the units it
spawns have their umask set to the new value.

Hence make user units inherit their umask value from their parent instead of
the hard coded value 0022 but allow them to override this value via their unit
file.

Note that reexecuting managers with 'systemctl daemon-reexec' after changing
UMask= has no effect. To take effect managers need to be restarted with
'systemct restart' instead. This behavior was already present before this
patch.

Fixes #6077.
2020-04-09 14:17:07 +02:00
Lennart Poettering 7a8867abfa user-util: rework how we validate user names
This reworks the user validation infrastructure. There are now two
modes. In regular mode we are strict and test against a strict set of
valid chars. And in "relaxed" mode we just filter out some really
obvious, dangerous stuff. i.e. strict is whitelisting what is OK, but
"relaxed" is blacklisting what is really not OK.

The idea is that we use strict mode whenver we allocate a new user
(i.e. in sysusers.d or homed), while "relaxed" mode is when we process
users registered elsewhere, (i.e. userdb, logind, …)

The requirements on user name validity vary wildly. SSSD thinks its fine
to embedd "@" for example, while the suggested NAME_REGEX field on
Debian does not even allow uppercase chars…

This effectively liberaralizes a lot what we expect from usernames.

The code that warns about questionnable user names is now optional and
only used at places such as unit file parsing, so that it doesn't show
up on every userdb query, but only when processing configuration files
that know better.

Fixes: #15149 #15090
2020-04-08 17:11:20 +02:00
Lennart Poettering f3b7a79b97 core: automatically add dependency on systemd-remount-fs.service if StateDirectory= is used
And similar for other settings that require a writable /var/.

Rationale: if these options are used for early-boot services (such as
systemd-pstore.service) we need /var/ writable. And if /var/ is on the
root fs, then systemd-remount-fs.service is the service that ensures
that /var/ is writable.

This allows us to remove explicit deps in services such as
systemd-pstore.service.
2020-04-08 16:29:25 +02:00
Franck Bui b862c25716 device: drop refuse_after
Scheduling devices after a given unit can be useful to start device *jobs* at a
specific time in the transaction, see commit 4195077ab4.

This (hidden) change was introduced by commit eef85c4a3f.
2020-04-01 10:35:14 +02:00
Wen Yang acd1987a18 core/unit: print info when unit_add_name failed
When there are hundreds of mounts on the server, it will take a long
time to analyze the failure of a certain mount unit. So it is useful
to print the reason why unit_add_name() failed.
2020-03-27 18:46:13 +01:00
Zbigniew Jędrzejewski-Szmek 5bcf34ebf3 pid1: when showing error status, do not switch to status=temporary
We would flip to status=temporary mode on the first error, and then switch back
to status=auto after the initial transaction was done. This isn't very useful,
because usually all the messages about successfully started units and not
related to the original failure. In fact, all those messages most likely cause
the information about the prime error to scroll off screen. And if the user
requested quiet boot, there's no reason to think that they care about those
success messages.

Also, when logging about dependency cycles, treat this similarly to a unit
error and show the message even if the status is "soft disabled" (before we
wouldn't show it in that case).
2020-03-01 11:42:42 +01:00