device_path_make_{major_minor|canonical) generate device node paths
given a mode_t and a dev_t. We have similar code all over the place,
let's unify this in one place. The former will generate a "/dev/char/"
or "/dev/block" path, and never go to disk. The latter then goes to disk
and resolves that path to the actual path of the device node.
device_path_parse_major_minor() reverses device_path_make_major_minor(),
also withozut going to disk.
We have similar code doing something like this at various places, let's
unify this in a single set of functions. This also allows us to teach
them special tricks, for example handling of the
/run/systemd/inaccessible/{blk|chr} device nodes, which we use for
masking device nodes, and which do not exist in /dev/char/* and
/dev/block/*
We should probably drop path_join() entirely in the long run (and
then rename path_join_many() to it?), but for now let's make one a
wrapper for the other.
Let's be a bit more careful when parsing major/minor pairs, and filter
out more corner cases. This also means using safe_atou() rather than
sscanf() to avoid weird negative unsigned handling and such.
As it turns out glibc and the Linux kernel have different ideas about
the size of dev_t and how many bits exist for the major and the minor.
When validating major/minor numbers we should check against the kernel's
actual sizes, hence add macros for this.
Let's simplify things and drop the logic that /var/lib/machines is setup
as auto-growing btrfs loopback file /var/lib/machines.raw.
THis was done in order to make quota available for machine management,
but quite frankly never really worked properly, as we couldn't grow the
file system in sync with its use properly. Moreover philosophically it's
problematic overriding the admin's choice of file system like this.
Let's hence drop this, and simplify things. Deleting code is a good
feeling.
Now that regular file systems provide project quota we could probably
add per-machine quota support based on that, hence the btrfs quota
argument is not that interesting anymore (though btrfs quota is a bit
more powerful as it allows recursive quota, i.e. that the machine pool
gets an overall quota in addition to per-machine quota).
Fixes: #2728
This is also supposed to be preparation for doing #10234 eventually,
where a very similar operation is requested: instead of importing a tree
to /var/lib/machines it would need to be imported into
/var/lib/portables/.
This adds two optional functions that may be passed to the various copy
functions. One is invoked whenever we start copying a new file object,
the other while we copy file payload in each loop iteration.
When the caller passes one or both they can get notifications about copy
progress, for example to log where things are.
This is to startswith() what PATH_STARTSWITH_SET() is to
path_startswith().
Or in other words, checks if the specified string has any of the listed
prefixes, and if so, returns the remainder of the string.
This changes cg_enable_everywhere() to return which controllers are
enabled for the specified cgroup. This information is then used to
correctly track the enablement mask currently in effect for a unit.
Moreover, when we try to turn off a controller, and this works, then
this is indicates that the parent unit might succesfully turn it off
now, too as our unit might have kept it busy.
So far, when realizing cgroups, i.e. when syncing up the kernel
representation of relevant cgroups with our own idea we would strictly
work from the root to the leaves. This is generally a good approach, as
when controllers are enabled this has to happen in root-to-leaves order.
However, when controllers are disabled this has to happen in the
opposite order: in leaves-to-root order (this is because controllers can
only be enabled in a child if it is already enabled in the parent, and
if it shall be disabled in the parent then it has to be disabled in the
child first, otherwise it is considered busy when it is attempted to
remove it in the parent).
To make things complicated when invalidating a unit's cgroup membershup
systemd can actually turn off some controllers previously turned on at
the very same time as it turns on other controllers previously turned
off. In such a case we have to work up leaves-to-root *and*
root-to-leaves right after each other. With this patch this is
implemented: we still generally operate root-to-leaves, but as soon as
we noticed we successfully turned off a controller previously turned on
for a cgroup we'll re-enqueue the cgroup realization for all parents of
a unit, thus implementing leaves-to-root where necessary.
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.
I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
Commit 100d5f6ee6 (user-util: add new wrappers for [...] database
files), ammended by commit 4f07ffa8f5 (Use #if instead of #ifdef for
ENABLE_GSHADOW) moved code from sysuser to basic/user-util.
In doing so, the combination of both commits properly propagated the
ENABLE_GSHADOW conditions around the function manipulating gshadow, but
they forgot to protect the inclusion of the gshadow.h header.
Fix that to be able to build on C libraries that do not provide gshadow
(e.g. uClibc-ng, where it does not exist.)
This changes DEFINE_MAIN_FUNCTION_WITH_POSITIVE_FAILURE() to propagate positive
return values as they were, i.e. stops mapping them all to EXIT_FAILURE. This
was suggested in review, but I thought that we only ever return EXIT_FAILURE,
so we don't need to propagate multiple return values.
I was wrong. Turns out that we already *do* have multiple positive return
values, when we call external binaries and propagate the result. systemd-inhibit
is one example, and b453c447e0 actually broke
this propagation. This commit fixes it.
In systemd-fsck we have the opposite case: we have only one failure value, and the
code needs to be adjusted, so that it keeps returning EXIT_FAILURE.
All other users of DEFINE_MAIN_FUNCTION_WITH_POSITIVE_FAILURE() return <= 1, and
are unaffected by this change.
We generally want to close the pager last. This patch closes the pager last,
after the static destuctor calls. This means that they can do logging and such
like during normal program runtime.
This doesn't have much effect on the final build, because we link libbasic.a
into libsystemd-shared.so, so in the end, all the object built from basic/
end up in libsystemd-shared. And when the static library is linked into binaries,
any objects that are included in it but are not used are trimmed. Hence, the
size of output artifacts doesn't change:
$ du -sb /var/tmp/inst*
54181861 /var/tmp/inst1 (old)
54207441 /var/tmp/inst1s (old split-usr)
54182477 /var/tmp/inst2 (new)
54208041 /var/tmp/inst2s (new split-usr)
(The negligible change in size is because libsystemd-shared.so is bigger
by a few hundred bytes. I guess it's because symbols are named differently
or something like that.)
The effect is on the build process, in particular partial builds. This change
effectively moves the requirements on some build steps toward the leaves of the
dependency tree. Two effects:
- when building items that do not depend on libsystemd-shared, we
build less stuff for libbasic.a (which wouldn't be used anyway,
so it's a net win).
- when building items that do depend on libshared, we reduce libbasic.a as a
synchronization point, possibly allowing better parallelism.
Method:
1. copy list of .h files from src/basic/meson.build to /tmp/basic
2. $ for i in $(grep '.h$' /tmp/basic); do echo $i; git --no-pager grep "include \"$i\"" src/basic/ 'src/lib*' 'src/nss-*' 'src/journal/sd-journal.c' |grep -v "${i%.h}.c";echo ;done | less
Code was not doing a wait() after kill() due to checking for a return value > 0, and was leaving zombie processes. This affected things like sd-bus unixexec connections.
As suggest here:
https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html#Attribute-Syntax
"You may optionally specify attribute names with ‘__’ preceding and
following the name. This allows you to use them in header files without
being concerned about a possible macro of the same name. For example,
you may use the attribute name __noreturn__ instead of noreturn. "
This way, we can extend the macro a bit with stuff pulled in from other
headers without this affecting everything which pulls in macro.h, which
is one of our most basic headers.
This is just refactoring, no change in behaviour, in prepartion for
later changes.
It was only used in one place, where we don't actually need it, and
it is too easy to forget to update it when adding new items to the table.
Let's just drop it.
systemd only uses functions that are as of Linux 4.15+ provided
externally to the CPU controller (currently usage_usec), so if we have a
new enough kernel, we don't need to set CGROUP_MASK_CPU for
CPUAccounting=true as the CPU controller does not need to necessarily be
enabled in this case.
Part of this patch is modelled on an earlier patch by Ryutaroh Matsumoto
(see PR #9665).
I decided to use a separate definition for this because it's too easy to return
positive from functions which don't need this distinction and only return
negative on error and success otherwise.
If we create a cgroup in one controller it might already have been
created in another too, if we have jointly mounted controllers. Take
that into consideration.
The function takes a pointer to a random block of memory and
the length of that block. It shouldn't crash every time it sees
a zero byte at the beginning there.
This should help the dev-kmsg fuzzer to keep going.
With gcc-7.1.1-3.fc26.aarch64:
../src/basic/json.c: In function ‘json_format’:
../src/basic/json.c:1409:40: warning: comparison is always true due to limited range of data type [-Wtype-limits]
if (*q >= 0 && *q < ' ')
^~
../src/basic/json.c: In function ‘inc_lines_columns’:
../src/basic/json.c:1762:31: warning: comparison is always true due to limited range of data type [-Wtype-limits]
} else if (*s >= 0 && *s < 127) /* Process ASCII chars quickly */
^~
Cast to (signed char) silences the warning, but a cast to (int) for some reason
doesn't.
Now that we don't (mis-)use the env file parser to parse kernel command
lines there's no need anymore to override the used newline character
set. Let's hence drop the argument and just "\n\r" always. This nicely
simplifies our code.
This introduces a wrapper around extrac_first_word() called
proc_cmdline_extract_first(), which suppresses "rd." parameters
depending on the specified calls.
This allows us to share more code between proc_cmdline_parse_given() and
proc_cmdline_get_key(), and makes it easier to reuse this logic for
other purposes.
Normally, we want to immediately quit on ^C. But when we are running under
less, people may set SYSTEMD_LESS without K, in which case they can use ^C to
communicate with less, and e.g. start and stop following input.
Fixes#6405.
All users of the macro (except for one, in serialize.c), use the macro in
connection with read_line(), so they must include fileio.h. Let's not play
libc games and require multiple header file to be included for the most common
use of a function.
The removal of def.h includes is not exact. I mostly went over the commits that
switch over to use read_line() and add def.h at the same time and reverted the
addition of def.h in those files.
Pretty much everything uses just the first argument, and this doesn't make this
common pattern more complicated, but makes it simpler to pass multiple options.
This makes DEPTH_MAX lower value, as test-json fails with stack
overflow.
Note that the test can pass with 8k, but for safety, here set to 4k.
Fixes#10738.
This helper is useful to ensure pidns/userns joining is properly
executed (as that requires a fork after the setns()). This is
particularly important when it comes to /proc/self/ access or
SCM_CREDENTIALS, but is generally the safer mode of operation.
Rename rdrand64 to rdrand, and switch from uint64_t to unsigned long.
This produces code that will compile/assemble on both x86-64 and x86-32.
This could be useful when running a 32-bit copy of systemd on a modern
Intel processor.
RDRAND is inherently arch-specific, so relying on the compiler-defined
'long' type seems reasonable.
We only use this when we don't require the best randomness. The primary
usecase for this is UUID generation, as this means we don't drain
randomness from the kernel pool for them. Since UUIDs are usually not
secrets RDRAND should be goot enough for them to avoid real-life
collisions.
Originally, the high_quality_required boolean argument controlled two
things: whether to extend any random data we successfully read with
pseudo-random data, and whether to return -ENODATA if we couldn't read
any data at all.
The boolean got replaced by RANDOM_EXTEND_WITH_PSEUDO, but this name
doesn't really cover the second part nicely. Moreover hiding both
changes of behaviour under a single flag is confusing. Hence, let's
split this part off under a new flag, and use it from random_bytes().
This should normally not happen, but given that the man page suggests
something about this in the context of interruption, let's handle this
and propagate an I/O error.
It's more descriptive, since we also have a function random_bytes()
which sounds very similar.
Also rename pseudorandom_bytes() to pseudo_random_bytes(). This way the
two functions are nicely systematic, one returning genuine random bytes
and the other pseudo random ones.
It's cheap to get RDRAND and given that srand() is anyway not really
useful for trusted randomness let's use RDRAND for it, after all we have
all the hard work for that already in place.
Otherwise doing comparing a CGroupMask (which is unsigned in effect)
with the result of CGROUP_CONTROLLER_TO_MASK() will result in warnings
about signedness differences.
We now have the "BPF" pseudo-controllers. These should never be assumed
to be accessible as /sys/fs/cgroup/<controller> and not through
"cgroup.subtree_control" either, hence always check explicitly before we
go to the file system. We do this through our new CGROUP_MASK_V1 and
CGROUP_MASK_V2 definitions.
Also, when we fail, don't clobber the return value.
This brings the call more in-line with our usual coding style, and
removes surprises.
None of the callers seemed to care about this behaviour.
This was mostly prompted by seeing the expression "in_initrd() && flags
& PROC_CMDLINE_RD_STRICT", which uses & and && without any brackets.
Let's make that a bit more readable and hide all doubts about operator
precedence.
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.
In order to implement this all serialization functions are move to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.
While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end.