Reload the internal selabel cache automatically on SELinux policy reloads so non pid-1 daemons are participating.
Run the reload function `mac_selinux_reload()` not manually on daemon-reload, but rather pass it as callback to libselinux.
Trigger the callback prior usage of the systemd internal selabel cache by depleting the selinux netlink socket via `avc_netlink_check_nb()`.
Improves: a9dfac21ec ("core: reload SELinux label cache on daemon-reload")
Improves: #13363
Inside format_bytes, we return NULL if the value is UINT64_MAX. This
makes some kind of sense where this has some other semantic meaning than
being a value, but in this case the value is both a.) not the default
(so we definitely want to display it), and b.) means "infinity" (or
"max" in cgroup terminology).
This patch adds a small wrapper around format_bytes that can be used for
these cases, to avoid the following situation:
[root@tangsanjiao ~]# cat /sys/fs/cgroup/workload.slice/memory.low
max
[root@tangsanjiao ~]# systemctl show workload.slice -p MemoryLow
MemoryLow=infinity
[root@tangsanjiao ~]# systemctl status workload.slice | grep low:
Memory: 14.9G (low: (null))
After the patch:
[root@tangsanjiao ~]# systemctl status workload.slice | grep low:
Memory: 15.1G (low: infinity)
`log_enforcing()` and `log_enforcing_errno()` are only used for important messages, which describe failures in enforced mode.
They are guarded either by `!mac_selinux_use()` or `!label_hnd` checks, where the latter is itself guarded by the former.
Only SELinux enabled systems print these logs.
This helps to configure a system in permissive mode, without getting surprising failures when switching to enforced mode.
If we do, we operate on a separate set of logs and runtime objects
The namespace is configured via argv[1].
Fixes: #12123Fixes: #10230#9519
(These latter two issues ask for slightly different stuff, but the
usecases generally can be solved by running separate instances of
journald now, hence also declaring that as "Fixes:")
Previously there was a weird asymmetry: initially we'd resolve the
specified prefix path when chasing symlinks together with the actual
path we were supposed to cover, except when we hit an absolute symlink
where we'd use the root as it was. Let's unify handling here: the prefix
path is never resolved, and always left as it is.
This in particular fixes issues with symlinks in the prefix path, as
that confused the check that made sure we never left the root directory.
Fixes: #14634
Replaces: #14635
Let's remove a function of questionnable utility.
strv_clear() frees the items of a string array, but not the array
itself. i.e. it half-drestructs a string array and makes it empty. This
is not too useful an operation since we almost never need to just do
that, we also want to free the whole thing. In fact, strv_clear() is
only used in one of our .c file, and there it appears like unnecessary
optimization, given that for each array with n elements it leaves the
number of free()s we need to at O(n) which is not really an optimization
at all (it goes from n+1 to n, that's all).
Prompted by the discussions on #14605
In SecureBoot mode this is probably not what you want. As your cmdline
is cryptographically signed like when using Type #2 EFI Unified Kernel
Images (https://systemd.io/BOOT_LOADER_SPECIFICATION/) The user's
intention is then that the cmdline should not be modified. You want to
make sure that the system starts up as exactly specified in the signed
artifact.
This way we can use libxcrypt specific functionality such as
crypt_gensalt() and thus take benefit of the newer algorithms libxcrypt
implements. (Also adds support for a new env var $SYSTEMD_CRYPT_PREFIX
which may be used to select the hash algorithm to use for libxcrypt.)
Also, let's move the weird crypt.h inclusion into libcrypt.h so that
there's a single place for it.
We'd just print nothing and exit with 0. If the user gave an explicit
name, we should fail. If a pattern didn't match, we should at least warn.
$ networkctl status enx54ee75cb1dc0a* --no-pager && echo $?
No interfaces matched.
0
$ networkctl status enx54ee75cb1dc0a --no-pager
Interface "enx54ee75cb1dc0a" not found.
1
In subsequent commits, calls to if_nametoindex() will be replaced by a wrapper
that falls back to alternative name resolution over netlink. netlink support
requires libsystemd (for sd-netlink), and we don't want to add any functions
that require netlink in basic/. So stuff that calls if_nametoindex() for user
supplied interface names, and everything that depends on that, needs to be
moved.
We don't need a seperate output parameter that is of type int. glibc() says
that the type is "unsigned", but the kernel thinks it's "int". And the
"alternative names" interface also uses ints. So let's standarize on ints,
since it's clearly not realisitic to have interface numbers in the upper half
of unsigned int range.
Let's add the actual unicode names of the glyphs we use. Let's also add
in comments what the width expectations of these glyphs are on the
console.
Also, remove the "mdash" definition. First of all it wasn't used, but
what's worse the glyph encoded was actually an "ndash"...
Fixes: #14075
The variable is always initialized, but the compiler might not notice
that. With gcc-9.2.1-1.fc31:
$ CFLAGS='-Werror=maybe-uninitialized -Og' meson build
$ ninja -C build
[...]
../src/basic/tmpfile-util.c: In function ‘mkostemp_safe’:
../src/basic/tmpfile-util.c:76:12: error: ‘fd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
76 | if (fd < 0)
| ^
On my machine stat returns size 22, but only 20 bytes are read:
openat(AT_FDCWD, "/sys/firmware/efi/efivars/LoaderTimeInitUSec-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=22, ...}) = 0
read(3, "\6\0\0\0", 4) = 4
read(3, "7\0001\0001\0003\0005\0002\0007\0\0\0", 18) = 16
Failed to read LoaderTimeInitUSec: Input/output error
Let's just accept that the kernel is returning inconsistent results.
It seems to happen two only two variables on my machine:
/sys/firmware/efi/efivars/LoaderTimeInitUSec-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f
/sys/firmware/efi/efivars/LoaderTimeMenuUSec-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f
so it might be related to the way we write them.
Let's improve memory allocation for call such as strv_extend() that just
one item to an strv: these are often called in a loop, where they used
to be very ineffecient, since we'd allocate byte-exact space. With this
change let's improve on that, by allocating exponentially by rounding up
to the next exponent of 2. This way we get GREEDY_REALLOC()-like
behaviour without passing around state.
In fact this should be good enough so that we could replace existing
loops around GREEDY_REALLOC() for strv build-up with plain strv_extend()
and get similar behaviour.
This adds a new safe_fork() flag. If set the child process' fd 1 becomes
fd 2 of the caller. This is useful for invoking tools (such as various
mkfs/fsck implementations) that output status messages to stdout, but
which we invoke and don't want to pollute stdout with their output.
This is not a new system call at all (since kernel 2.2), however it's
not exposed in glibc (a wrapper is exposed however in sigqueue(), but it
substantially simplifies the system call). Since we want a nice fallback
for sending signals on non-pidfd systems for pidfd_send_signal() let's
wrap rt_sigqueueinfo() since it takes the same siginfo_t parameter.
Reloading the SELinux label cache here enables a light-wight follow-up of a SELinux policy change, e.g. adding a label for a RuntimeDirectory.
Closes: #13363
usually we want to create new files with mode 0666 (modulated by the
umask). Sometimes we want more restrictive access though, let's add an
explicit flag support for that.
(Note that we don't bother with arbitrary access modes to keep things
simple: just "open as umask permits" and "private to me", nothing else)
This adds xfopenat() which is to fopen() what xopendirat() is to
opendir(), i.e. the "at" counterpart to fopen().
(Similar to the xopendir() case, we prefix this with "x", in case libc
gains this natively eventually.)
This makes the code do what the documentation says. The code had no inkling
about initrd.target, so I think this change is fairly risky. As a fallback,
default.target will be loaded, so initramfses which relied on current behaviour
will still work, as along as they don't have a different initrd.target.
In an initramfs created with recent dracut:
$ ls -l usr/lib/systemd/system/{default.target,initrd.target}
lrwxrwxrwx. usr/lib/systemd/system/default.target -> initrd.target
-rw-r--r--. usr/lib/systemd/system/initrd.target
So at least for dracut, there should be no difference.
Also avoid a pointless allocation.
This makes the naming more consistent: we now have
bootctl systemd-efi-options,
$SYSTEMD_EFI_OPTIONS
and the SystemdOptions EFI variable.
(SystemdEFIOptions would be redundant, because it is only used in the context
of efivars, and users don't interact with that name directly.)
bootctl is adjusted to use 2sp indentation, similarly to systemctl and other
programs.
Remove the prefix with the old name from 'bootctl systemd-efi-options' output,
since it's redundant and we don't want the old name anyway.
We would only write to the field, and take the address. All *readers* were
removed in 2841493927. (The explanation for why
the field wasn't removed back then is that the patch underwent a few iterations,
with the initial version adding translation back and forth. Later versions of
the patch simply emit a warning and ignore the old value. Apparently nobody
noticed that the value became unused.)
TasksMax= and DefaultTasksMax= can be specified as percentages. We don't
actually document of what the percentage is relative to, but the implementation
uses the smallest of /proc/sys/kernel/pid_max, /proc/sys/kernel/threads-max,
and /sys/fs/cgroup/pids.max (when present). When the value is a percentage,
we immediately convert it to an absolute value. If the limit later changes
(which can happen e.g. when systemd-sysctl runs), the absolute value becomes
outdated.
So let's store either the percentage or absolute value, whatever was specified,
and only convert to an absolute value when the value is used. For example, when
starting a unit, the absolute value will be calculated when the cgroup for
the unit is created.
Fixes#13419.
This partially reverts db11487d10 (the logic to
calculate the correct value is removed, we always use the same setting as for
the system manager). Distributions have an easy mechanism to override this if
they wish.
I think making this configurable is better, because different distros clearly
want different defaults here, and making this configurable is nice and clean.
If we don't make it configurable, distros which either have to carry patches,
or what would be worse, rely on some other configuration mechanism, like
/etc/profile. Those other solutions do not apply everywhere (they usually
require the shell to be used at some point), so it is better if we provide
a nice way to override the default.
Fixes #13469.
We already handle it specially in get_timezones(), hence we should OK it
here too, even if the timezone file doesn't actually exist.
Prompted by:
https://serverfault.com/questions/991172/invalid-time-zone-utc
(Yes, Ubuntu should install the UTC timezone data unconditionally: it
should not be an option, even if all other timezone data is excluded,
but since it's our business to validate user input but not out business
to validate distros, let's just accept "UTC" unconditionally, it's magic
after all)
f5947a5e92 dropped missing.h and
replaced with the more specific headers but did not add
missing_fcntl.h in places that use O_TMPFILE. This is needed for
some older versions of glibc.
Discussed in #13743, the -.service semantic conflicts with the
existing root mount and slice names, making this feature not
uniformly extensible to all types. Change the name to be
<type>.d instead.
Updating to this format also extends the top-level dropin to
unit types.