This is based on @jsynacek's patch from #8837, but adds the new URL in
two flavours instead of replacing the old, also making @keszybz happy.
Replaces: #8837
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.
Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.
So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.
This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:
1. strv_length()' return type becomes size_t
2. the unit file changes array size becomes size_t
3. DNS answer and query array sizes become size_t
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
This is used as 'systemd-analyze show-config systemd/logind.conf', which
will dump
/etc/systemd/system/user@.service
/etc/systemd/system/user@.service.d/*.conf
/run/systemd/system/user@.service.d/*.conf
/usr/local/lib/systemd/system/user@.service.d/*.conf
/usr/lib/systemd/system/user@.service.d/*.conf
The idea is to make it easy to dump the configuration using the same locations
and order that systemd programs use themselves (including masking, in the right
order, etc.). This is the generic variant that works with any configuration
scheme that follows the same general rules:
$ systemd-analyze cat-config systemd/system.conf
$ systemd-analyze cat-config systemd/user.conf
$ systemd-analyze cat-config systemd/logind.conf
$ systemd-analyze cat-config systemd/sleep.conf
$ systemd-analyze cat-config systemd/journald.conf
$ systemd-analyze cat-config systemd/journal-remote.conf
$ systemd-analyze cat-config systemd/journal-upload.conf
$ systemd-analyze cat-config systemd/coredump.conf
$ systemd-analyze cat-config systemd/resolved.conf
$ systemd-analyze cat-config systemd/timesyncd.conf
$ systemd-analyze cat-config udev/udev.conf
We use MTUs all over the place, let's add a unified, strict parser for
it, that takes MTU ranges into account.
We already have parse_ifindex() close-by, hence this appears to be a
natural addition, in particular as the range checking is not entirely
trivial to do, as it depends on the protocol used.
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.
In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.
Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphesize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.
Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so one, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.
Note that before this patch some of the APIs (notable libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
With the recent terminal_urlify() APIs we'll now sometimes generate
clickable link CSO sequences. Hence we should also be able to remove
them again from strings. This beefs up the logic to do so.
Follow-up for: 23b27b39d2
Those are quite similar to %i/%I, but refer to the last dash-separated
component of the name prefix.
The new functionality of dash-dropins could largely supersede the template
functionality, so it would be tempting to overload %i/%I. But that would
not be backwards compatible. So let's add the two new letters instead.
Newer terminals (in particular gnome-terminal) understand special escape
sequence for formatting clickable links. Let's support that to make our
tool output more clickable where that's appropriate.
For details see this:
https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda
The one big issue is that 'less' currently doesn't grok this, and
doesn't ignore sequence like regular terminal implementations do if they
don't support it. Hence for now, let's disable URL output if a pager is
used. We should revisit that though as soon as less added support for it
and enough time passed for it to enter various distributions.
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.
It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
We check the same condition at various places. Let's add a trivial,
common helper for this, and use it everywhere.
It's not going to make things much faster or much shorter, but I think a
lot more readable
If the flag is set only a single step of the normalization is executed,
and the resulting path is returned.
This allows callers to normalize piecemeal, taking into account every
single intermediary path of the normalization.
We have plenty of code in our codebase that outputs tables to the
console, and all is homegrown and awful. Let's replace it with a generic
implementation that can do automatically what the old implementations
did manually.
Features:
1. Ellipsation (for fields overly long) and alignment (for
fields overly short)
2. Sorting of rows
3. automatically copies formatting from the same cell in the row above
4. Heavy use of varargs to make putting together tables easy
5. can expand and compress tables, with weights
6. Has a minimal understanding of unicode wide characters in order to
match unicode strings to character cell terminals.
7. Columns can be reordered and individually turned off.
8. pretty printing for various data types
And more.
This primarily changes to things:
1. Ellipsation to 0, 1 or 2 characters is now supported. Previously we'd
hit an assert if the new lengths was < 3, this is now permitted. The
result strings won't show too much info still of course, but the code
becomes a bit more generic and robust to use.
2. If a UTF-8 mode is disabled and the input string is pure ASCII, then
"..." is used for ellipsation, otherwise (as before) "…". This means
on a pure-ASCII system we should remain pure-ASCII, matching
behaviour otherwise exposed with special_glyph() and friends. Note
that we'll use "…" for ellipsiation as soon as either the locale
settings indicate an UTF-8 mode or the input string already contains
non-ASCII unicode characters.
Testing for these special cases is improved.
We go through the whole file system, so this test can take arbitrary time. But
this test is still quite useful, so let's at least try to make it more efficent
by not descending at all into the directories we would filter out later on
anyway.
Also increase the timeout, in case the previous step doesn't help enough.
Absolute paths make everything simple and quick, but sometimes this requirement
can be annoying. A good example is calling 'test', which will be located in
/usr/bin/ or /bin depending on the distro. The need the provide the full path
makes it harder a portable unit file in such cases.
This patch uses a fixed search path (DEFAULT_PATH which was already used as the
default value of $PATH), and if a non-absolute file name is found, it is
immediately resolved to a full path using this search path when the unit is
loaded. After that, everything behaves as if an absolute path was specified. In
particular, the executable must exist when the unit is loaded.
Doing manager_load_unit() followed by UNIT_VTABLE(unit)->start(unit) would
result in an assertion failure in ->start() if the unit failed to load
properly. Something like this is okey-ish is tests, since the test units are
not expected to fail to load, but the reason for failure is clearer if we
fail immediately.
$ sudo swapoff -av
swapoff /dev/vda4
$ sudo systemctl hibernate
Failed to hibernate system via logind: Not enough swap space for hibernation
Fixes#6729.
The Linux kernel is adding support for configuring the offset
into a disk. This allows swapfiles to be more usable as users
will no longer need to set the offset on their kernel command
line.
Use this API in systemd when hibernating as well.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
Running `test-path` under an umask such as 027 fails with:
Assertion '(s.st_mode & S_IRWXO) == 0004' failed at ../src/test/test-path.c:247, function test_path_makedirectory_directorymode(). Aborting.
Looking at directory /tmp/test-path_makedirectory, it was indeed created
with mode 0740, applying the umask to the requested 0744.
Set an explicit umask for this test, to ensure reproducible results.
We have the same code for this in place at various locations, let's
unify that. Also, let's repurpose test-fs-util.c as a test for this new
helper cal..
Not that it matters much, but it seems cleaner to also count those
inputs, even if they do not consume extra storage space.
The test is extended to include an empty input and counts in the test are
adjusted to include it.
When we are attempting to create directory somewhere in the bowels of /var/lib
and get an error that it already exists, it can be quite hard to diagnose what
is wrong (especially for a user who is not aware that the directory must have
the specified owner, and permissions not looser than what was requested). Let's
print a warning in most cases. A warning is appropriate, because such state is
usually a sign of borked installation and needs to be resolved by the adminstrator.
$ build/test-fs-util
Path "/tmp/test-readlink_and_make_absolute" already exists and is not a directory, refusing.
(or)
Directory "/tmp/test-readlink_and_make_absolute" already exists, but has mode 0775 that is too permissive (0755 was requested), refusing.
(or)
Directory "/tmp/test-readlink_and_make_absolute" already exists, but is owned by 1001:1000 (1000:1000 was requested), refusing.
Assertion 'mkdir_safe(tempdir, 0755, getuid(), getgid(), MKDIR_WARN_MODE) >= 0' failed at ../src/test/test-fs-util.c:320, function test_readlink_and_make_absolute(). Aborting.
No functional change except for the new log lines.
The warning is not emitted for absolute paths like /dev/sda or /home, which are
converted to .device and .mount unit names without any fuss.
Most of the time it's unlikely that users use invalid unit names on purpose,
so let's warn them. Warnings are silenced when --quiet is used.
$ build/systemctl show -p Id hello@foo-bar/baz
Invalid unit name "hello@foo-bar/baz" was escaped as "hello@foo-bar-baz" (maybe you should use systemd-escape?)
Id=hello@foo-bar-baz.service
$ build/systemd-run --user --slice foo-bar/baz --unit foo-bar/foo true
Invalid unit name "foo-bar/foo" was escaped as "foo-bar-foo" (maybe you should use systemd-escape?)
Invalid unit name "foo-bar/baz" was escaped as "foo-bar-baz" (maybe you should use systemd-escape?)
Running as unit: foo-bar-foo.service
Fixes#8302.
We have only three bits of space, i.e. 8 possible classes. Immediately reject
anything outside of that range. Add the fuzzer test case and an additional
unit test.
oss-fuzz #6908.
Also fix one case where the presence of a newline was used to generate
an invalid environment assignment.
Tested: with mkosi, which builds the local tree and run ninja tests.
example.swaps with "(deleted)" does not cause bogus entries in the list now,
but a memleak in libmount instead. The memleaks is not very important since
this code is run just once.
Reported as https://github.com/karelzak/util-linux/issues/596.
$ build/test-umount
...
/* test_swap_list("/proc/swaps") */
path=/var/tmp/swap o= f=0x0 try-ro=no dev=0:0
path=/dev/dm-2 o= f=0x0 try-ro=no dev=0:0
/* test_swap_list("/home/zbyszek/src/systemd/test/test-umount/example.swaps") */
path=/some/swapfile o= f=0x0 try-ro=no dev=0:0
path=/dev/dm-2 o= f=0x0 try-ro=no dev=0:0
==26912==
==26912== HEAP SUMMARY:
==26912== in use at exit: 16 bytes in 1 blocks
==26912== total heap usage: 1,546 allocs, 1,545 frees, 149,008 bytes allocated
==26912==
==26912== 16 bytes in 1 blocks are definitely lost in loss record 1 of 1
==26912== at 0x4C31C15: realloc (vg_replace_malloc.c:785)
==26912== by 0x55C5D8C: _IO_vfscanf (in /usr/lib64/libc-2.26.so)
==26912== by 0x55D8AEC: vsscanf (in /usr/lib64/libc-2.26.so)
==26912== by 0x55D25C3: sscanf (in /usr/lib64/libc-2.26.so)
==26912== by 0x53236D0: mnt_table_parse_stream (in /usr/lib64/libmount.so.1.1.0)
==26912== by 0x53249B6: mnt_table_parse_file (in /usr/lib64/libmount.so.1.1.0)
==26912== by 0x10D157: swap_list_get (umount.c:194)
==26912== by 0x10B06E: test_swap_list (test-umount.c:34)
==26912== by 0x10B24B: main (test-umount.c:56)
==26912==
==26912== LEAK SUMMARY:
==26912== definitely lost: 16 bytes in 1 blocks
==26912== indirectly lost: 0 bytes in 0 blocks
==26912== possibly lost: 0 bytes in 0 blocks
==26912== still reachable: 0 bytes in 0 blocks
==26912== suppressed: 0 bytes in 0 blocks
This seems to be a false positive in msan:
https://github.com/google/sanitizers/issues/767.
I don't see anything wrong with the code either, and valgrind does not see the
issue. Anyway, let's add the test case.
We don't have msan hooked up yet, but hopefully we'll in the future.
oss-fuzz #6884.
When running tests like test-unit-name, there is not point in setting
up the cgroup and signals and interacting with the environment. Similarly
when running fuzz testing of the parser.
Add new MANAGER_TEST_RUN_BASIC which takes the role of MANAGER_TEST_RUN_MINIMAL,
and redefine MANAGER_TEST_RUN_MINIMAL to just create the basic data structures.
This mainly gets around a kernel bug making it possible to
have non-existent paths in /proc/self/mountinfo, but it should also
prevent flaky failures that can happen if something changes immediately
after or during reading /proc/self/mountinfo.
Closes https://github.com/systemd/systemd/issues/8286.
Suspend to Hibernate is a new sleep method that invokes suspend
for a predefined period of time before automatically waking up
and hibernating the system.
It's similar to HybridSleep however there isn't a performance
impact on every suspend cycle.
It's intended to use with systems that may have a higher power
drain in their supported suspend states to prevent battery and
data loss over an extended suspend cycle.
Signed-off-by: Mario Limonciello <mario.limonciello@dell.com>
This should make it a bit easier to search for real file descriptor leaks.
```
$ valgrind --leak-check=full --track-fds=yes ./build/test-fileio
...
==29457==
==29457== FILE DESCRIPTORS: 4 open at exit.
==29457== Open file descriptor 3: /tmp/test-systemd_writing_tmpfile.lyV5Rc
==29457== at 0x4B9AD9E: open (open.c:43)
==29457== by 0x4B19B24: __gen_tempname (tempname.c:261)
==29457== by 0x4BA5CC3: mkostemp64 (mkostemp64.c:32)
==29457== by 0x48F739B: mkostemp_safe (fileio.c:1206)
==29457== by 0x10D968: test_writing_tmpfile (test-fileio.c:620)
==29457== by 0x10E930: main (test-fileio.c:767)
==29457==
```
This helps get around a bug confusing `glibc` and making the test bail
out with the following error under `asan` on `x86`:
Fatal error: glibc detected an invalid stdio handle
Aborted (core dumped)
The bug has been reported in https://github.com/google/sanitizers/issues/778,
but it is unlikely to be fixed anytime soon.
The unit files for test-execute are named like
`exec-(setting-name-in-lower-character)-(optional-text).service`.
However, test units for AmbientCapabilities= are not following this.
So, let's rename them for the consistency.
This does not change anything in the functionality of the test.
Quite often we need to set up a number of fds as stdin/stdout/stderr of
a process we are about to start. Add a generic implementation for a
routine doing that that takes care to do so properly:
1. Can handle the case where stdin/stdout/stderr where previously
closed, and the fds to set as stdin/stdout/stderr hence likely in the
0..2 range. handling this properly is nasty, since we need to first
move the fds out of this range in order to later move them back in, to
make things fully robust.
2. Can optionally open /dev/null in case for one or more of the fds, in
a smart way, sharing the open file if possible between multiple of
the fds.
3. Guarantees that O_CLOEXEC is not set on the three fds, even if the fds
already were in the 0..2 range and hence possibly weren't moved.
The nobody user/group may not synthesized by systemd.
To run tests the functionalities in such situation, this adds tests
by user/group by daemon, as it is expected to exists all environments.
"-m" is specified as a short form of "--monitor" in the option struct,
but not included in getopt_long's optstring. Update the optstring
to be consistent with the option struct.
Several tests request nobody user or group. If they are badly
configured, then tests may fail.
This makes test-execute check nobody user and group are configured
correctly before running such tests.
Fixes#8276.
gcc warns about unitialized memory access because it notices that ssize_t which
is < 0 could be cast to positive int value. We know that this can't really
happen because only -1 can be returned, but OTOH, in principle a large
*positive* value cannot be cast properly. This is unlikely too, since xattrs
cannot be too large, but it seems cleaner to just use a size_t to return the
value and avoid the cast altoghter. This makes the code simpler and gcc is
happy too.
The following warning goes away:
[113/1502] Compiling C object 'src/basic/basic@sta/xattr-util.c.o'.
In file included from ../src/basic/alloc-util.h:28:0,
from ../src/basic/xattr-util.c:30:
../src/basic/xattr-util.c: In function ‘fd_getcrtime_at’:
../src/basic/macro.h:207:60: warning: ‘b’ may be used uninitialized in this function [-Wmaybe-uninitialized]
UNIQ_T(A,aq) < UNIQ_T(B,bq) ? UNIQ_T(A,aq) : UNIQ_T(B,bq); \
^
../src/basic/xattr-util.c:155:19: note: ‘b’ was declared here
usec_t a, b;
^
If log_do_header() was called with overly long parameters, it'd generate
improper output. Essentially, it'd be truncated at random point, in particular
missing a newline at the end, so it'd run with the next field, usually MESSAGE=.
log_do_header is called with parameters from compiled code (file name, lien
nubmer, etc), so in practice this was unlikely to ever be a problem, but it is
possible. In particular, if systemd was compiled from sources in some deeply
nested directory (which happens for example in mock and other build roots), the
filename could be very long.
As a safety measure, let's truncate all parameters to 256 bytes. So we have
5 fields which are 256 bytes (plus the field name prefix), and a few other
fields with fixed width. This must always fit in the 2048 byte buffer.
I don't think there's much gain in calculating the required length precisely,
since it's a lot of fields and a few bytes allocated on the stack don't matter.
This patch adds safe_atoux16 for parsing an unsigned hexadecimal 16bit int, and
uses that for parsing USB device and vendor IDs.
This fixes a compile error with gcc-8 because while we know that USB IDs are 2 bytes,
the compiler does not know that.
../src/udev/udev-builtin-hwdb.c:80:38: error: '%04X' directive output may be
truncated writing between 4 and 8 bytes into a region of size between 2 and 6
[-Werror=format-truncation=]
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
This improves the BPF/cgroup detection logic, and looks whether
BPF_ALLOW_MULTI is supported. This flag allows execution of multiple
BPF filters in a recursive fashion for a whole cgroup tree. It enables
us to properly report IP accounting for slice units, as well as
delegation of BPF support to units without breaking our own IP
accounting.
When synthetisation is turned off, there's just too many ways those tests can
go wrong. We are not interested in verifying that the db on disk is correct,
let's just skip all checks.
In the first version of this patch, I recorded if we detected a mismatch during
configuration and only skipped tests in that case, but actually it is possible
to change the host configuration between our configuration phase and running
of the tests. It's just more robust to skip always. (This is particularly true
if tests are installed.)
This introduces a new setting TemporaryFileSystem=. This is useful
to hide files not relevant to the processes invoked by unit, while
necessary files or directories can be still accessed by combining
with Bind{,ReadOnly}Paths=.
This makes it easier to see what is going on. Crashes may happen in a
nested test_{uid,gid}_to_name_one() function, and the default backtrace
doesn't show the actual string being tested.
The Linux kernel exposes the birth time now for files through statx()
hence make use of it where available. We keep the xattr logic in place
for this however, since only a subset of file systems on Linux currently
expose the birth time. NFS and tmpfs for example do not support it. OTOH
there are other file systems that do support the birth time but might
not support xattrs (smb…), hence make the best of the two, in particular
in order to deal with journal files copied between file system types and
to maintain compatibility with older file systems that are updated to
newer version of the file system.
config_parse_join_controllers would free the destination argument on failure,
which is contrary to our normal style, where failed parsing has no effect.
Moving it to shared also allows a test to be added.
We already have the terminal open, hence pass the fd we got to
ask_password_tty(), so that it doesn't have to reopen it a second time.
This is mostly an optimization, but it has the nice benefit of making us
independent from RLIMIT_NOFILE issues and so on, as we don't need to
allocate another fd needlessly.
This new helper not only removes a file from a directory but also
ensures its space on disk is deallocated, by either punching a hole over
the full file or truncating the file afterwards if the file's link
counter is 0. This is useful in "vacuuming" algorithms to ensure that
client's can't keep the disk space the vacuuming is supposed to recover
pinned simply by keeping an fd open to it.
This is similar to string_hash_ops but operates one file system paths
specifically. It will ensure that "/foo//bar" and "///foo/bar" are
considered to be the same path for hashmap purposes.
This makes use of the existing path_compare() API, and adds a matching
hashing function for it.
Note that relative and absolute paths will hash to different values,
however whether the path is suffixed with a slash or not is not
detected. This matches the existing path_compare() behaviour, and
follows the logic that on Linux there can't be two different objects at
path /foo/bar and /foo/bar/ either.
This adds some paranoia code that moves some of the fds we allocate for
longer periods of times to fds > 2 if they are allocated below this
boundary. This is a paranoid safety thing, in order to avoid that
external code might end up erroneously use our fds under the assumption
they were valid stdin/stdout/stderr. Think: some app closes
stdin/stdout/stderr and then invokes 'fprintf(stderr, …' which causes
writes on our fds.
This both adds the helper to do the moving as well as ports over a
number of users to this new logic. Since we don't want to litter all our
code with invocations of this I tried to strictly focus on fds we keep
open for long periods of times only and only in code that is frequently
loaded into foreign programs (under the assumptions that in our own
codebase we are smart enough to always keep stdin/stdout/stderr
allocated to avoid this pitfall). Specifically this means all code used
by NSS and our sd-xyz API:
1. our logging APIs
2. sd-event
3. sd-bus
4. sd-resolve
5. sd-netlink
This changed was inspired by this:
https://github.com/systemd/systemd/issues/8075#issuecomment-363689755
This shows that apparently IRL there are programs that do close
stdin/stdout/stderr, and we should accomodate for that.
Note that this won't fix any bugs, this just makes sure that buggy
programs are less likely to interfere with out own code.
This is preparation for emulating the "usage_usec" keyed attribute of
the "cpu.stat" property of the root cgroup from data in /proc. Similar,
for emulating the "memory.current" attribute.
It's not good if the paths are in different order. With --user, we expect
more paths, but it must be a strict superset, and the order for the ones
that appear in both sets must be the same.
$ diff -u <(build/systemd-analyze --global unit-paths) <(build/systemd-analyze --user unit-paths)|colordiff
--- /proc/self/fd/14 2018-02-08 14:11:45.425353107 +0100
+++ /proc/self/fd/15 2018-02-08 14:11:45.426353116 +0100
@@ -1,6 +1,17 @@
+/home/zbyszek/.config/systemd/system.control
+/run/user/1000/systemd/system.control
+/run/user/1000/systemd/transient
+/run/user/1000/systemd/generator.early
+/home/zbyszek/.config/systemd/user
/etc/systemd/user
+/run/user/1000/systemd/user
/run/systemd/user
+/run/user/1000/systemd/generator
+/home/zbyszek/.local/share/systemd/user
+/home/zbyszek/.local/share/flatpak/exports/share/systemd/user
+/var/lib/flatpak/exports/share/systemd/user
/usr/local/share/systemd/user
/usr/share/systemd/user
/usr/local/lib/systemd/user
/usr/lib/systemd/user
+/run/user/1000/systemd/generator.late
A test is added so that we don't regress on this.
The function `strv_join_quoted()` is now not used, and has a bug
in the buffer size calculation when the strings needs to escaped,
as reported in #8056.
So, let's remove the function.
Closes#8056.
Red is used for highligting, the same as grep does. Except when the line is
highlighted red already, because it has high priority, in which case plain ansi
highlight is used for the matched substring.
Coloring is implemented for short and cat outputs, and not for other types.
I guess we could also add it for verbose output in the future.
Case sensitive or case insensitive matching can be requested using
--case-sensitive[=yes|no].
Unless specified, matching is case sensitive if the pattern contains any
uppercase letters, and case insensitive otherwise. This matches what
forward-search does in emacs, and recently also --ignore-case in less. This
works surprisingly well, because usually when one is wants to do case-sensitive
matching, the pattern is usually camel-cased. In the less frequent case when
case-sensitive matching is required with an all-lowercase pattern,
--case-sensitive can be used to override the automatic logic.
Previously, we'd maintain two hashmaps keyed by PIDs, pointing to Unit
interested in SIGCHLD events for them. This scheme allowed a specific
PID to be watched by exactly 0, 1 or 2 units.
With this rework this is replaced by a single hashmap which is primarily
keyed by the PID and points to a Unit interested in it. However, it
optionally also keyed by the negated PID, in which case it points to a
NULL terminated array of additional Unit objects also interested. This
scheme means arbitrary numbers of Units may now watch the same PID.
Runtime and memory behaviour should not be impact by this change, as for
the common case (i.e. each PID only watched by a single unit) behaviour
stays the same, but for the uncommon case (a PID watched by more than
one unit) we only pay with a single additional memory allocation for the
array.
Why this all? Primarily, because allowing exactly two units to watch a
specific PID is not sufficient for some niche cases, as processes can
belong to more than one unit these days:
1. sd_notify() with MAINPID= can be used to attach a process from a
different cgroup to multiple units.
2. Similar, the PIDFile= setting in unit files can be used for similar
setups,
3. By creating a scope unit a main process of a service may join a
different unit, too.
4. On cgroupsv1 we frequently end up watching all processes remaining in
a scope, and if a process opens lots of scopes one after the other it
might thus end up being watch by many of them.
This patch hence removes the 2-unit-per-PID limit. It also makes a
couple of other changes, some of them quite relevant:
- manager_get_unit_by_pid() (and the bus call wrapping it) when there's
ambiguity will prefer returning the Unit the process belongs to based on
cgroup membership, and only check the watch-pids hashmap if that
fails. This change in logic is probably more in line with what people
expect and makes things more stable as each process can belong to
exactly one cgroup only.
- Every SIGCHLD event is now dispatched to all units interested in its
PID. Previously, there was some magic conditionalization: the SIGCHLD
would only be dispatched to the unit if it was only interested in a
single PID only, or the PID belonged to the control or main PID or we
didn't dispatch a signle SIGCHLD to the unit in the current event loop
iteration yet. These rules were quite arbitrary and also redundant as
the the per-unit handlers would filter the PIDs anyway a second time.
With this change we'll hence relax the rules: all we do now is
dispatch every SIGCHLD event exactly once to each unit interested in
it, and it's up to the unit to then use or ignore this. We use a
generation counter in the unit to ensure that we only invoke the unit
handler once for each event, protecting us from confusion if a unit is
both associated with a specific PID through cgroup membership and
through the "watch_pids" logic. It also protects us from being
confused if the "watch_pids" hashmap is altered while we are
dispatching to it (which is a very likely case).
- sd_notify() message dispatching has been reworked to be very similar
to SIGCHLD handling now. A generation counter is used for dispatching
as well.
This also adds a new test that validates that "watch_pid" registration
and unregstration works correctly.
The new flag returns the O_PATH fd of the final component, which may be
converted into a proper fd by open()ing it again through the
/proc/self/fd/xyz path.
Together with O_SAFE this provides us with a somewhat safe way to open()
files in directories potentially owned by unprivileged code, where we
want to refuse operation if any symlink tricks are played pointing to
privileged files.
When the flag is specified we won't transition to a privilege-owned
file or directory from an unprivileged-owned one. This is useful when
privileged code wants to load data from a file unprivileged users have
write access to, and validates the ownership, but want's to make sure
that no symlink games are played to read a root-owned system file
belonging to a different context.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
This adds a "watch-bind" feature to sd-bus connections. If set and the
AF_UNIX socket we are connecting to doesn't exist yet, we'll establish
an inotify watch instead, and wait for the socket to appear. In other
words, a missing AF_UNIX just makes connecting slower.
This is useful for daemons such as networkd or resolved that shall be
able to run during early-boot, before dbus-daemon is up, and want to
connect to dbus-daemon as soon as it becomes ready.
Let's rework touch_file() so that it works correctly on sockets, fifos,
and device nodes: let's open an O_PATH file descriptor first and operate
based on that, if we can. This is usually the better option as it this
means we can open AF_UNIX nodes in the file system, and update their
timestamps and ownership correctly. It also means we can correctly touch
symlinks and block/character devices without triggering their drivers.
Moreover, by operating on an O_PATH fd we can make sure that we
operate on the same inode the whole time, and it can't be swapped out in
the middle.
While we are at it, rework the call so that we try to adjust as much as
we can before returning on error. This is a good idea as we call the
function quite often without checking its result, and hence it's best to
leave the files around in the most "correct" fashion possible.
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.
All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
This reduces the meson man=false target count to 1281.
v2:
- link test-engine with libshared instead of libsystemd_static
Previous version built fine on F27, but fails on F26 with the following error:
/usr/bin/ld: /tmp/ccr8HRGw.ltrans6.ltrans.o: undefined reference to symbol '__start_BUS_ERROR_MAP@@SD_SHARED'
/home/zbyszek/fedora/systemd/systemd-9d5aae75c64f5583a110f03b94816aacc03bbf4d/x86_64-redhat-linux-gnu/src/shared/libsystemd-shared-236.so: error adding symbols: DSO missing from command line
v3:
- add libudev_basic
gcrypt_util_sources had to be moved because otherwise they appeared twice
in libshared.so halfproducts, causing an error.
-fvisibility=default is added to libbasic, libshared_static so that the symbols
appear properly in the exported symbol list in libshared.
The advantage is that files are not compiled twice. When configured with -Dman=false,
the ninja target list is reduced from 1588 to 1347 targets. The difference in compilation
time is small (<10%). I think this is because of -O0 and ccache and multiple cores, and
in different settings the compilation time could be reduced. The main advantage is that
errors and warnings are not reported twice.
This adds a simple condition/assert/match to the service manager, to
udev's .link handling and to networkd, for matching the kernel version
string.
In this version we only do fnmatch() based globbing, but we might want
to extend that to version comparisons later on, if we like, by slightly
extending the syntax with ">=", "<=", ">", "<" and "==" expressions.
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:
1. Optionally resets signal handlers/mask in the child
2. Sets a name on all processes we fork off right after forking off (and
the patch assigns useful names for all processes we fork off now,
following a systematic naming scheme: always enclosed in () – in order
to indicate that these are not proper, exec()ed processes, but only
forked off children, and if the process is long-running with only our
own code, without execve()'ing something else, it gets am "sd-" prefix.)
3. Optionally closes all file descriptors in the child
4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
way so that the parent dying before this happens being handled
safely.
5. Optionally reopens the logs
6. Optionally connects stdin/stdout/stderr to /dev/null
7. Debug logs about the forked off processes.
We already use the "_static" suffix for libshared_static ("shared" is the name
of the library, "static" is the format) and other libs, so let's rename for
consistency.
Also change libsystemd_static_sources to libsystemd_sources, since the same
list is used for both and shorter is better.
Up until now, the behaviour in systemd has (mostly) been to silently
ignore failures to action unit directives that refer to an unavailble
controller. The addition of AssertControlGroupController and its
conditional counterpart allow explicit specification of the desired
behaviour when such a situation occurs.
As for how this can happen, it is possible that a particular controller
is not available in the cgroup hierarchy. One possible reason for this
is that, in the running kernel, the controller simply doesn't exist --
for example, the CPU controller in cgroup v2 has only recently been
merged and was out of tree until then. Another possibility is that the
controller exists, but has been forcibly disabled by `cgroup_disable=`
on the kernel command line.
In future this will also support whatever comes out of issue #7624,
`DefaultXAccounting=never`, or similar.
This changes dns_name_between() to deal properly with checking whether B
is between A and C if A and C are equal. Previously we simply returned
-EINVAL in this case, refusing checking. With this change we correct
behaviour: if A and C are equal, then B is "between" both if it is
different from them. That's logical, since we do < and > comparisons, not
<= and >=, and that means that anything "right of A" and "left of C"
lies in between with wrap-around at the ends. And if A and C are equal
that means everything lies between, except for A itself.
This fixes handling of domains using NSEC3 "white lies", for example the
.it TLD.
Fixes: #7421
We currently look for "nobody" and "nfsnobody" when testing groups, both
of which do not exist on Ubuntu, our main testing environment. Let's
extend the tests slightly to also use "nogroup" if it exists.
We already synthesize records for both "root" and "nobody" in
nss-systemd. Let's do the same in our own NSS wrappers that are supposed
to bypass NSS if possible. Previously this was done for "root" only, but
let's clean this up, and do the same for "nobody" too, so that we
synthesize records the same way everywhere, regardless whether in NSS or
internally.
This adds uid_is_system() and gid_is_system(), similar in style to
uid_is_dynamic(). That a helper like this is useful is illustrated by
the fact that test-condition.c didn't get the check right so far, which
this patch fixes.
%h is a special specifier because we look at $HOME (unless running suid, but
let's say that this case does not apply to tmpfiles, since the code is
completely unready to be run suid). For all other specifiers we query the user
db and use those values directly. I'm not sure if this exception is good, but
let's just "document" status quo for now. If this is changes, it should be in
a separate PR.
It's failing on artful s390x and i386:
Running /tmp/autopkgtest.Pexzdu/build.lfO/debian/build-deb/systemd-tmpfiles on 'f /tmp/test-systemd-tmpfiles.c236s1uq/arg - - - - %m'
expect: '01234567890123456789012345678901'
actual: 'e84bc78d162e472a8ac9759f5f1e4e0e'
--- stderr ---
Traceback (most recent call last):
File "/tmp/autopkgtest.Pexzdu/build.lfO/debian/src/test/test-systemd-tmpfiles.py", line 129, in <module>
test_valid_specifiers(user=False)
File "/tmp/autopkgtest.Pexzdu/build.lfO/debian/src/test/test-systemd-tmpfiles.py", line 89, in test_valid_specifiers
test_content('f {} - - - - %m', '{}'.format(id128.get_machine().hex), user=user)
File "/tmp/autopkgtest.Pexzdu/build.lfO/debian/src/test/test-systemd-tmpfiles.py", line 84, in test_content
assert content == expected
AssertionError
-------
Let's skip the test for now until this is resolved properly on the autopkgtest
side.
sd_path_home() returns ENXIO when a variable (such as $XDG_RUNTIME_DIR) is not
defined. Previously we used ENOKEY for unresolvable specifiers. To avoid having
two codes, or translating ENXIO to ENOKEY, I replaced ENOKEY use with ENXIO.
v2:
- use sd_path_home and change to ENXIO everywhere
The code intentionally ignored unknown specifiers, treating them as text. This
needs to change because otherwise we can never add a new specifier in a backwards
compatible way. So just treat an unknown (potential) specifier as an error.
In principle this is a break of backwards compatibility, but the previous
behaviour was pretty much useless, since the expanded value could change every
time we add new specifiers, which we do all the time.
As a compromise for backwards compatibility, only fail on alphanumerical
characters. This should cover the most cases where an unescaped percent
character is used, like size=5% and such, which behave the same as before with
this patch. OTOH, this means that we will not be able to use non-alphanumerical
specifiers without breaking backwards compatibility again. I think that's an
acceptable compromise.
v2:
- add NEWS entry
v3:
- only fail on alphanumerical
This adds a new flavour of strextend(), called
strextend_with_separator(), which takes an optional separator string. If
specified, the separator is inserted between each appended string, as
well as before the first one, but only if the original string was
non-empty.
This new call is particularly useful when appending new options to mount
option strings and suchlike, which need to be comma-separated, and
initially start out from an empty string.
Let's optimize things a bit, and instead of having to strip whitespace
first before decoding base64, let's do that implicitly while doing so.
Given that base64 was designed the way it was designed specifically to
be tolerant to whitespace changes, it's a good idea to do this
automatically and implicitly.
In this way, individual errors in files can be treated differently than a
failure of the whole service.
A test is added to check that the expected value is returned.
Some parts are commented out, because it is not. This will be fixed in
a subsequent commit.
Now the function returns an empty string when given an empty string.
Not sure if this is the best option (maybe this should be an error?),
but at least the behaviour is well defined.
The kernel will reply with -ENOTDIR when we try to access a non-directory under
a name which ends with a slash. But our functions would strip the trailing slash
under various circumstances. Keep the trailing slash, so that
path_is_mount_point("/path/to/file/") return -ENOTDIR when /path/to/file/ is a file.
Tests are added for this change in behaviour.
Also, when called with a trailing slash, path_is_mount_point() would get
"" from basename(), and call name_to_handle_at(3, "", ...), and always
return -ENOENT. Now it'll return -ENOTDIR if the mount point is a file, and
true if it is a directory and a mount point.
v2:
- use strip_trailing_chars()
v3:
- instead of stripping trailing chars(), do the opposite — preserve them.
The code in install-printf.c and unit-printf.c for these is pretty much
the same and very generic. Let's move this all over to the generic
specifier.c, and share the implementations.
The test was written so far under the assumption that if two mounts are
placed onto the same location the "upper" mount is listed later in
/proc/self/mountinfo. This appears not to be guaranteed however, as
running the tests in a normal nspawn shows.
This patch fixes that: it reverses the hashmap of mounts we build:
instead of keying by path, we key by mnt_id, and if we notice that
path_get_mnt_id() doesn't match what a line in /proc/self/mountinfo
says, we use the returned ID to check if maybe another line agrees.
Fixes: #7431
This is a simple wrapper around name_to_handle_at_loop() and
fd_fdinfo_mnt_id() to query the mnt ID of a path. It uses
name_to_handle_at() where it can, and falls back to to
fd_fdinfo_mnt_id() where that doesn't work.
This is a best-effort thing of course, since neither name_to_handle_at()
nor the fdinfo logic work on all kernels.
First of all, let's rename it to read_etc_hostname(), to make clearer
what kind of configuration it actually reads: the file format defined in
/etc/hostname and nothing else.
Secondly: let's port this to use read_line(), i.e. the new way to read
lines from a file in a safe, bounded way.
Thirdly: let's strip leading/trailing whitespace from what we are
reading. Given that we are already pretty lenient what we read (comments
and empty lines), let's be permissive regarding whitespace too.
Fourthly: let's actually validate the hostname when reading it. So far
we tried to make it valid, but that's not always possible (for example,
we can't make an empty hostname valid, ever).
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
All this function does is place some data in an in-memory read-only fd,
that may be read back to get the original data back.
Doing this in a way that works everywhere, given the different kernels
we support as well as different privilege levels is surprisingly
complex.
Linux doesn't have faccess(), hence let's emulate it. Linux has access()
and faccessat() but neither allows checking the access rights of an fd
passed in directly.
This path to ping is compatible with both debian-like and usr-merged
distros. This keeps the test simple, and should thus pass everywhere.
Fixes: #7267