The BLKID and ELFUTILS strings were present twice. Let's reaarange things so that
each times requires definition in exactly one place.
Also let's sort things a bit:
the "heavy hitters" like PAM/MAC first,
then crypto libs,
then other libs, alphabetically,
compressors,
and external compat integrations.
I think it's useful for users to group similar concepts together to some extent.
For example, when checking what compression is available, it helps a lot to have
them listed together.
FDISK is renamed to LIBFDISK to make it clear that this is about he library and
the executable.
... when called with a valid environment variable name. This means that
any time we call it with a fixed string, it is guaranteed to return 0.
(Also when the variable is not present in the environment block.)
Currently, a loss of power after the machine-id was written but before
all units with ConditionFirstBoot=yes ran would lead to the next boot
finding a valid machine-id, thus not being marked first boot and not
re-running these units.
To make the first boot mechanism more robust, instead of writing
/etc/machine-id very early, fill it with a marker value "uninitialized"
and overmount it with a transiently provisioned machine-id. Then, after
the first boots completes (when systemd-machine-id-commit.service runs),
write the real machine-id to disk.
This mechanism is of course only invoked on first boot. If a first boot
is not detected, the machine-id is handled as previously.
Fixes: #4511
When /etc/machine-id contains the string "uninitialized" instead of
a valid machine-id, treat this like the file was missing and mark this
boot as the first (-> units with ConditionFirstBoot=yes will run).
Fixes#16991fb39af4ce4 replaced `free_arguments()` with
`reset_arguments()`, which frees arg_* variables as before, but also resets all
of them to the default values. `reset_arguments()` was positioned
in such a way that it overrode some arg_* values still in use at shutdown.
To avoid further unintentional resets, I moved `reset_arguments()`
right before the return, when nothing else will be using the arg_* variables.
Previously, we'd create them from user-runtime-dir@.service. That has
one benefit: since this service runs privileged, we can create the full
set of device nodes. It has one major drawback though: it security-wise
problematic to create files/directories in directories as privileged
user in directories owned by unprivileged users, since they can use
symlinks to redirect what we want to do. As a general rule we hence
avoid this logic: only unpriv code should populate unpriv directories.
Hence, let's move this code to an appropriate place in the service
manager. This means we lose the inaccessible block device node, but
since there's already a fallback in place, this shouldn't be too bad.
arg_system == true and getpid() == 1 hold under the very same condition
this early in the main() function (this only changes later when we start
parsing command lines, where arg_system = true is set if users invoke us
in test mode even when getpid() != 1.
Hence, let's simplify things, and merge a couple of if branches and not
pretend they were orthogonal.
Cache it early in startup of the system manager, right after `/run/systemd` is
created, so that further access to it can be done without accessing the EFI
filesystem at all.
Let systemd load a set of pre-compiled AppArmor profile files from a policy
cache at /etc/apparmor/earlypolicy. Maintenance of that policy cache must be
done outside of systemd.
After successfully loading the profiles systemd will attempt to change to a
profile named systemd.
If systemd is already confined in a profile, it will not load any profile files
and will not attempt to change it's profile.
If anything goes wrong, systemd will only log failures. It will not fail to
start.
This is a follow-up for 9f83091e3c.
Instead of reading the mtime off the configuration files after reading,
let's do so before reading, but with the fd we read the data from. This
is not only cleaner (as it allows us to save one stat()), but also has
the benefit that we'll detect changes that happen while we read the
files.
This also reworks unit file drop-ins to use the common code for
determining drop-in mtime, instead of reading system clock for that.
That's reduce the number of functions dealing with configuration
parsing/loading and should make the code simpler especially since this function
was used only once.
No functional change.
Most complexity of this patch is due to the fact that some manager settings
(basically the watchdog properties) can be set at runtime and in this case the
runtime values must be retained over daemon-reload or daemon-reexec.
For consistency sake, all watchdog properties behaves now the same way, that
is:
- Values defined by config files can be overridden by writing the new value
through their respective D-BUS properties. In this case, these values are
preserved over reload/reexec until the special value '0' or USEC_INFINITY
is written, which will then restore the last values loaded from the config
files. If the restored value is '0' or 'USEC_INFINITY', the watchdogs will
be disabled and the corresponding device will be closed.
- Reading the properties from a user instance will return the USEC_INFINITY
value as these properties are only meaningful for PID1.
- Writing to one of the watchdog properties of a user instance's will be a
NOP.
Fixes: #15453
Let's allow more memory to be locked on beefy machines than on small
ones. The previous limit of 64M is the lower bound still. This
effectively means on a 4GB machine we can lock 512M, which should be
more than enough, but still not lock up the machine entirely under
pressure.
Fixes: #15053
The commit b3ac5f8cb9 has changed the system mount propagation to
shared by default, and according to the following patch:
https://github.com/opencontainers/runc/pull/208
When starting the container, the pouch daemon will call runc to execute
make-private.
However, if the systemctl daemon-reexec is executed after the container
has been started, the system mount propagation will be changed to share
again by default, and the make-private operation above will have no chance
to execute.
In practice we are very unlikely to fail at this point, but for
consistency, we should always warn when allocation fails, and
we have free_and_strdup_warn() for this.
Reload the internal selabel cache automatically on SELinux policy reloads so non pid-1 daemons are participating.
Run the reload function `mac_selinux_reload()` not manually on daemon-reload, but rather pass it as callback to libselinux.
Trigger the callback prior usage of the systemd internal selabel cache by depleting the selinux netlink socket via `avc_netlink_check_nb()`.
Improves: a9dfac21ec ("core: reload SELinux label cache on daemon-reload")
Improves: #13363
systemd.show-status=error is useful for the case where people care about errors
only.
If people want to have a quiet boot, they most likely don't want to see all
status output even if there is a delay in boot, so make "quiet" imply
systemd.show-status=error instead of systemd.show-status=auto.
Fixes#14976.
We would say "Enabling" also for SHOW_STATUS_AUTO, which is actually
"soft off". So just print the exact state to make things easier to understand.
Also add a helper function to avoid repeating the enum value list.
For #14814.
Reloading the SELinux label cache here enables a light-wight follow-up of a SELinux policy change, e.g. adding a label for a RuntimeDirectory.
Closes: #13363
This makes the code do what the documentation says. The code had no inkling
about initrd.target, so I think this change is fairly risky. As a fallback,
default.target will be loaded, so initramfses which relied on current behaviour
will still work, as along as they don't have a different initrd.target.
In an initramfs created with recent dracut:
$ ls -l usr/lib/systemd/system/{default.target,initrd.target}
lrwxrwxrwx. usr/lib/systemd/system/default.target -> initrd.target
-rw-r--r--. usr/lib/systemd/system/initrd.target
So at least for dracut, there should be no difference.
Also avoid a pointless allocation.
This partially reverts a07a7324ad.
We have two pieces of information: the value and a boolean.
config_parse_timeout_abort() added in the reverted commit would write
the boolean to the usec_t value, making a mess.
The code is reworked to have just one implementation and two wrappers
which pass two pointers.
TasksMax= and DefaultTasksMax= can be specified as percentages. We don't
actually document of what the percentage is relative to, but the implementation
uses the smallest of /proc/sys/kernel/pid_max, /proc/sys/kernel/threads-max,
and /sys/fs/cgroup/pids.max (when present). When the value is a percentage,
we immediately convert it to an absolute value. If the limit later changes
(which can happen e.g. when systemd-sysctl runs), the absolute value becomes
outdated.
So let's store either the percentage or absolute value, whatever was specified,
and only convert to an absolute value when the value is used. For example, when
starting a unit, the absolute value will be calculated when the cgroup for
the unit is created.
Fixes#13419.