strerror() is not thread safe. Let's hence avoid it where that is easy.
(Ideally we'd not use it at all anymore, but that's sometimes a bit
nasty, not in this case though, where it is very easy to avoid)
Follow-up for: 27c3112dcb
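A sketch of one way to report an errno value without calling strerror(), assuming a libc with the "%m" printf extension (glibc has it). The helper name format_errno() is made up for illustration and is not part of this patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: write "<what>: <error text>" into buf without calling
 * strerror(). "%m" expands to the message for the current value of errno. */
static int format_errno(char *buf, size_t size, const char *what, int error) {
        assert(buf);
        assert(what);

        /* Accept negative errno-style values too. */
        errno = error < 0 ? -error : error;
        return snprintf(buf, size, "%s: %m", what);
}
```

The message ends up in a buffer owned by the caller, e.g. format_errno(buf, sizeof buf, "open", -ENOENT).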
This reworks the logic introduced in
a5cede8c24 (#13693).
First of all, let's move this out of util.c, since only PID 1 really
needs this, and there's no real need to have it in util.c.
Then, fix freeing of the variable. It previously relied on
STATIC_DESTRUCTOR_REGISTER() which however relies on static_destruct()
to be called explicitly. Currently only the main-func.h macros do that,
and PID 1 does not. (It might be worth investigating whether to do that,
but it's not trivial.) Hence the freeing wasn't applied.
Finally, an OOM check was missing, add it in.
Let's make this robust towards parallel updates to group lists. This is
not going to happen IRL, but it makes me sleep better at night: let's
iterate a couple of times in case the list is updated while we are at
it.
Follow-up for: f5e0b942af
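The retry idea can be sketched as follows. The example is built around getgroups(2), which is an assumption about the code in question; the function name and the retry limit of 10 are made up as well:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: fetch the supplementary group list, retrying if the
 * list grows between the size query and the actual read.
 * Returns the number of entries, or a negative errno-style value. */
static int getgroups_retry(gid_t **ret) {
        assert(ret);

        for (unsigned attempt = 0; attempt < 10; attempt++) {
                gid_t *list;
                int n, m, saved_errno;

                n = getgroups(0, NULL);         /* query the current size */
                if (n < 0)
                        return -errno;
                if (n == 0) {
                        *ret = NULL;
                        return 0;
                }

                list = malloc((size_t) n * sizeof(gid_t));
                if (!list)
                        return -ENOMEM;

                m = getgroups(n, list);
                if (m >= 0) {
                        *ret = list;
                        return m;
                }

                saved_errno = errno;
                free(list);
                if (saved_errno != EINVAL)
                        return -saved_errno;
                /* EINVAL: the list grew in the meantime, try again */
        }

        return -EIO;
}
```

The caller owns the returned array and releases it with free().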
When showing logs from a container, we would fail to show various lines:
Oct 29 09:50:51 krowka systemd-nspawn[61376]: Detected architecture x86-64.
Oct 29 09:50:51 krowka systemd-nspawn[61376]: [1B blob data]
Oct 29 09:50:51 krowka systemd-nspawn[61376]: Welcome to Fedora 32 (Rawhide)!
Oct 29 09:50:51 krowka systemd-nspawn[61376]: [1B blob data]
Those are only harmless \r characters that trail the line. We already replace
tabs and strip various ansi characters that we deem inconsequential, so let's
also strip trailing carriage returns. Non-trailing ones are different, because
they change what would be displayed.
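A minimal sketch of the trailing carriage return stripping, using a hypothetical helper rather than the actual output code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the line once every '\r' at the
 * very end has been dropped. Carriage returns elsewhere are left alone,
 * since they change what would be displayed. */
static size_t strip_trailing_cr(const char *buf, size_t len) {
        assert(buf || len == 0);

        while (len > 0 && buf[len - 1] == '\r')
                len--;

        return len;
}
```

With this, a message consisting of a single "\r" shrinks to length zero, and "a\rb" keeps its full length.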
Virtual filesystems such as sysfs or procfs use kernfs, and kernfs can work
with two sorts of virtual files.
One sort uses "seq_file", and the results of the first read are buffered for
the second read. The other sort uses "raw" reads which always go direct to the
device.
In the latter case, the content of the virtual file must be retrieved with a
single read, otherwise a subsequent read might get the new value instead of
finding EOF immediately. That's the reason why the usage of fread(3) is
prohibited in this case as it always performs a second call to read(2) looking
for EOF which is subject to the race described previously.
Fixes: #13585.
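To illustrate the single-read approach, here is a sketch with a hypothetical helper name. It treats a completely filled buffer as an error, because issuing a second read(2) to look for more data is exactly what must be avoided. EINTR handling is left out for brevity:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a virtual file with exactly one read(2) call.
 * Returns the number of bytes read (buf is NUL-terminated), or a negative
 * errno-style value. */
static ssize_t read_virtual_file_once(const char *path, char *buf, size_t size) {
        ssize_t n;
        int fd, saved_errno;

        assert(path);
        assert(buf);
        assert(size > 1);

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -errno;

        /* One read(2) only: no second call to look for EOF. */
        n = read(fd, buf, size - 1);
        saved_errno = errno;
        close(fd);

        if (n < 0)
                return -saved_errno;
        if ((size_t) n == size - 1)
                return -E2BIG;  /* cannot tell whether the content was cut off */

        buf[n] = '\0';
        return n;
}
```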
See https://bugzilla.redhat.com/show_bug.cgi?id=1763488: when we say that
'foo@*.service' is not a valid unit name, this is not clear enough. Let's
include the name of the operation that does not support globbing in the
error message:
$ build/systemctl enable 'foo@*.service'
Glob pattern passed to enable, but globs are not supported for this.
Invalid unit name "foo@*.service" escaped as "foo@\x2a.service".
...
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because depending on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would think that an fd is returned). This patch uses a separate output parameter,
so there is no confusion.
(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
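The shape of such an interface might look like this. The helper below is hypothetical and much simpler than chase_symlinks(); it only shows the return value carrying the status while the fd travels through a separate output parameter:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch: the return value is always 0 or a negative
 * errno-style value. The fd, if wanted, is handed out via ret_fd only. */
static int open_path_sketch(const char *path, int *ret_fd) {
        int fd;

        assert(path);

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -errno;

        if (ret_fd)
                *ret_fd = fd;   /* ownership passes to the caller */
        else
                close(fd);      /* caller only wanted the status */

        return 0;
}
```

A static analyzer can now see that a call with ret_fd == NULL never hands out a descriptor.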
The default message for ENOSPC is very misleading: it says that the disk is
filled, but in fact the inotify watch limit is the problem.
So let's introduce and use a wrapper that simply calls inotify_add_watch(2) and
which fixes the error message up in case ENOSPC is returned.
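A sketch of what such a wrapper could look like. The function names and the message wording are made up for illustration and are not taken from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>

/* Hypothetical helper: pick a message for an error from inotify_add_watch(2).
 * ENOSPC means the watch limit is exhausted, not that the disk is full. */
static const char *inotify_error_message(int error) {
        if (error == ENOSPC)
                return "inotify watch limit reached";

        return strerror(error);
}

/* Hypothetical wrapper: returns the watch descriptor, or a negative
 * errno-style value after printing a message. */
static int inotify_add_watch_logged(int fd, const char *path, uint32_t mask) {
        int wd, saved_errno;

        assert(path);

        wd = inotify_add_watch(fd, path, mask);
        if (wd < 0) {
                saved_errno = errno;    /* fprintf() may clobber errno */
                fprintf(stderr, "Failed to add a watch for %s: %s\n",
                        path, inotify_error_message(saved_errno));
                return -saved_errno;
        }

        return wd;
}
```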
PID1 may modify the environment passed by the kernel when it starts
running. Commit 9d48671c62 unset $HOME for
example.
In case PID1 is going to switch to a new root and execute a new system manager
which is not systemd, we should restore the original environment as the new
manager might expect some variables to be set by default (more specifically
$HOME).
Currently systemd will treat smb3 as a local filesystem and cause
boot failures. Add smb3 to the list of remote filesystems
to fix this issue.
Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
On one of my test machines, test-path-util was failing because the
find_binary("xxxx-xxxx") was returning -EACCES instead of -ENOENT. This
happens because the PATH entry on that host contains a directory which
the user in question doesn't have access to. Typically applications
ignore permission errors when searching through PATH, for example in
bash:
$ whoami
cdown
$ PATH=/root:/bin type sh
sh is /bin/sh
This behaviour is present on zsh and other shells as well. This
patch brings our PATH search behaviour closer to other major Unix tools.
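A sketch of a PATH search that skips inaccessible directories. It is a simplified stand-in for find_binary(), with a made-up name and signature:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: look for an executable called name in the
 * colon-separated list path. On success the full path is written to ret
 * and 0 is returned, otherwise a negative errno-style value. */
static int find_binary_sketch(const char *name, const char *path, char *ret, size_t size) {
        int last_error = -ENOENT;

        assert(name);
        assert(path);
        assert(ret);

        for (const char *p = path;;) {
                size_t n = strcspn(p, ":");

                if (n > 0) {
                        int k = snprintf(ret, size, "%.*s/%s", (int) n, p, name);

                        if (k < 0 || (size_t) k >= size)
                                last_error = -ENAMETOOLONG;
                        else if (access(ret, X_OK) == 0)
                                return 0;
                        else if (errno != EACCES)
                                /* Entries we may not access are ignored, as shells do. */
                                last_error = -errno;
                }

                if (p[n] == '\0')
                        break;
                p += n + 1;
        }

        return last_error;
}
```

With PATH=/root:/bin and an unprivileged user, a lookup of a missing binary then reports -ENOENT rather than -EACCES.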
ARPHRD_NETROM was excluded, most likely just because it is protocol No. 0,
and ARPHRD_CISCO was reported under its alias name "HDLC". Let's just
allow defined aliases under the main name.
Our biggest object in libsystemd was a table full of zeros, for the arphdr
names. Let's use a switch (which gcc nicely optimizes for us), instead of a
table with a gap between 826 and 65534:
$ ls -l build{,2}/src/basic/a6ba3eb@@basic@sta/arphrd-list.c.o
-rw-rw-r--. 1 zbyszek zbyszek 540232 Sep 22 00:29 build/src/basic/a6ba3eb\@\@basic\@sta/arphrd-list.c.o
-rw-rw-r--. 1 zbyszek zbyszek 20512 Sep 25 11:56 build2/src/basic/a6ba3eb\@\@basic\@sta/arphrd-list.c.o
$ ls -l build{,2}/src/shared/libsystemd-shared-243.so
-rwxrwxr-x. 1 zbyszek zbyszek 6774368 Sep 22 00:29 build/src/shared/libsystemd-shared-243.so
-rwxrwxr-x. 1 zbyszek zbyszek 6254808 Sep 25 12:16 build2/src/shared/libsystemd-shared-243.so
No functional change.
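The difference between the two approaches, sketched with a handful of entries. The EX_ constants are local stand-ins that mirror the kernel's ARPHRD_* values, and the function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins mirroring a few kernel ARPHRD_* values. */
enum {
        EX_ARPHRD_NETROM   = 0,
        EX_ARPHRD_ETHER    = 1,
        EX_ARPHRD_LOOPBACK = 772,
        EX_ARPHRD_NONE     = 0xFFFE,
        EX_ARPHRD_VOID     = 0xFFFF,
};

/* A table indexed by id would need 65536 slots, nearly all of them NULL.
 * A switch costs nothing for the unused range. */
static const char *arphrd_name_sketch(int id) {
        switch (id) {
        case EX_ARPHRD_NETROM:
                return "NETROM";
        case EX_ARPHRD_ETHER:
                return "ETHER";
        case EX_ARPHRD_LOOPBACK:
                return "LOOPBACK";
        case EX_ARPHRD_NONE:
                return "NONE";
        case EX_ARPHRD_VOID:
                return "VOID";
        default:
                return NULL;
        }
}
```

Note that id 0 (NETROM) is an ordinary case here and needs no special treatment.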
Introduce support for configuring cpus and mems for processes using
cgroup v2 CPUSET controller. This allows users to limit which cpus
and memory NUMA nodes can be used by processes to better utilize
system resources.
The cgroup v2 interfaces to control it are cpuset.cpus and cpuset.mems
where the requested configuration is written. However, it doesn't mean
that the requested configuration will be actually used as parent cgroup
may limit the cpus or mems as well. In order to reflect the real
configuration cgroup v2 provides read-only files cpuset.cpus.effective
and cpuset.mems.effective which are exported to users as well.
In various circumstances, overriding the kernel commandline can be inconvenient.
People have different bootloaders, and e.g. the grub config can be pretty scary.
grubby helps, but it isn't always available.
This option adds an alternative mechanism that can be quite convenient on EFI
systems. cmdline settings have higher priority, because they can be (usually)
changed on the bootloader prompt.
$SYSTEMD_EFI_OPTIONS can be used to override, same as $SYSTEMD_PROC_CMDLINE.
I want to use efivars.[ch] in proc-cmdline.c, but most of the efivars stuff is
not needed in basic/. Move the file from shared/ to basic/, but then move back
most of the higher-level functions to the new shared/efi-loader.c file.
It is of course related to /proc/cmdline parsing, but is higher-level
functionality built on top of it. It should be in shared/ because it
is something to be used by pid1 and related utilities, not something for
low-level libraries.
This way less stuff needs to be in basic. Initially, I wanted to move all the
parts of cgroup-utils.[ch] that depend on efivars.[ch] to shared, because
efivars.[ch] is in shared/. Later on, I decided to split efivars.[ch], so the
move done in this patch is not necessary anymore. Nevertheless, it is still
valid on its own. If at some point we want to expose libbasic, it is better
to not have stuff that belongs in libshared there.
This avoids the use of the global variable.
Also rename cgroup_unified_update() to cgroup_unified_cached() and
cgroup_unified_flush() to cgroup_unified() to better reflect their new roles.
This function had two users (apart from tests), and both only used one
argument. And it seems likely that if we need to pass more directories,
either the _nulstr() or the _strv() form would be used. Let's simplify
the code.
In a current VirtualBox installation the board_vendor is set to "Oracle
Corporation". So we need to add this to the dmi_vendor_table for a
reliable detection.
This fixes #13429
Signed-off-by: Jan Losinski <losinski@wh2.tu-dresden.de>
Traditionally, user logins had a $PATH in which /bin was before /sbin, while
root logins had a $PATH with /sbin first. This allows the tricks that
consolehelper is doing to work. But even if we ignore consolehelper, having the
path in this order might have been used by admins for other purposes, and
keeping the order in user sessions will make the adoption of systemd
user sessions a bit easier.
Fixes #733.
https://bugzilla.redhat.com/show_bug.cgi?id=1744059
OOM handling in manager_default_environment wasn't really correct.
Now the (theoretical) malloc failure in strv_new() is handled.
Please note that this has no effect on:
- systems with merged /bin-/sbin (e.g. arch)
- when there are no binaries that differ between the two locations.
E.g. on my F30 laptop there is exactly one program that is affected:
/usr/bin/setup -> consolehelper.
There is less and less stuff that relies on consolehelper, but there's still
some.
So for "clean" systems this makes no difference, but helps with legacy setups.
$ dnf repoquery --releasever=31 --qf %{name} --whatrequires usermode
anaconda-live
audit-viewer
beesu
chkrootkit
driftnet
drobo-utils-gui
hddtemp
mate-system-log
mock
pure-ftpd
setuptool
subscription-manager
system-config-httpd
system-config-rootpassword
system-switch-java
system-switch-mail
usermode-gtk
vpnc-consoleuser
wifi-radar
xawtv
New functions are called valid_user_group_name_compat() and
valid_user_group_name_or_id_compat() and accept dots in the user
or group name. No functional change except the tests.
This reverts commit 8a07b4033e.
The tests are kept. test-networkd-conf is adjusted to pass.
This fixes #13276. I think current rules are extremely confusing, as the
case in test-networkd-conf shows. We apply some kinds of unescaping (relating
to quoting), but not others (related to escaping of special characters).
But fixing this is hard, because people have adjusted quoting to match
our rules, and if we make the rules "better", things might break in unexpected
places.
Add a comment line explaining that the syscall defines might be
defined to invalid negative numbers, as libseccomp redefines them
to negative numbers if not defined by the kernel headers, which is
not obvious just from reading the code checking for defined && > 0.
The #ifndef check used to work for missing __NR_* syscall defines, but
unfortunately libseccomp now redefines missing syscall number to negative
numbers, in their public header file, e.g.:
https://github.com/seccomp/libseccomp/blob/master/include/seccomp.h.in#L801
When systemd is built, since it includes <seccomp.h>, it pulls in the
incorrect negative value for any __NR_* syscall define that's included in
the seccomp.h header (for those syscalls that the kernel headers don't
yet define, e.g. when built with older/stable-distro kernels). This leads
to bugs like:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1821625
This changes the check so that it can override the negative number that
libseccomp defines, instead of trying to use the negative syscall number.
To avoid gcc warnings (which are failures with meson --werror), this checks
without generating a redefinition gcc warning.
I have no idea why libseccomp decided to define missing syscalls
to negative numbers inside their *public* header file, causing
problems like this.
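The check described above can be sketched with the preprocessor alone. The macro name EXAMPLE_NR_newcall and both numbers are made up; the first #define merely simulates what <seccomp.h> does for a syscall the kernel headers do not know:

```c
#include <assert.h>

/* Simulated libseccomp behaviour: a missing syscall number gets defined
 * to a negative placeholder (name and value are hypothetical). */
#define EXAMPLE_NR_newcall (-10042)

/* A plain "#ifndef EXAMPLE_NR_newcall" would be skipped here and leave the
 * negative value in place. Test for "defined and positive" instead, and
 * #undef first so that the compiler sees no redefinition. */
#if !(defined EXAMPLE_NR_newcall && EXAMPLE_NR_newcall > 0)
#  if defined EXAMPLE_NR_newcall
#    undef EXAMPLE_NR_newcall
#  endif
#  define EXAMPLE_NR_newcall 4242       /* hypothetical real number */
#endif

static long example_syscall_number(void) {
        assert(EXAMPLE_NR_newcall > 0);
        return EXAMPLE_NR_newcall;
}
```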
Instead of checking for the STA_UNSYNC flag in the timex status, check
the maximum error. It is updated by the kernel, increasing at a rate of
500 ppm. The maximum value is 16 seconds, which triggers the STA_UNSYNC
flag.
This follows timedatex and allows timedated to correctly detect a clock
synchronized by chronyd when configured to not synchronize the RTC.
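A sketch of the maximum error check, assuming Linux and adjtimex(2). The function names are hypothetical; the 16 second limit is the one mentioned above, expressed in microseconds because that is the unit of the maxerror field:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/timex.h>

#define MAXERROR_LIMIT_USEC 16000000L   /* 16 s */

/* The clock counts as synchronized while the maximum error reported by the
 * kernel is still below the limit. */
static bool maxerror_is_synced(long maxerror_usec) {
        assert(maxerror_usec >= 0);
        return maxerror_usec < MAXERROR_LIMIT_USEC;
}

/* Returns 1 if synchronized, 0 if not, or a negative errno-style value. */
static int clock_is_synced_sketch(void) {
        struct timex tx = { 0 };        /* modes == 0: read-only query */

        if (adjtimex(&tx) < 0)
                return -errno;

        return maxerror_is_synced(tx.maxerror);
}
```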
C's strerror() function does not return a "const char *" pointer
for the string. That has historic reasons and C99 even comments
that "[t]he array pointed to shall not be modified by the program".
Make the strerror_safe() wrapper correct this and be more strict
in this regard.
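A sketch of such a wrapper. The body is an assumption made for illustration, and the _sketch suffix marks it as not being the real function; the point is only the const-qualified return type:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns a const pointer, so callers cannot modify the string by accident.
 * Note that this is no more thread safe than strerror() itself. */
static inline const char *strerror_safe_sketch(int error) {
        assert(error != INT_MIN);       /* abs(INT_MIN) would be undefined */

        /* abs() lets callers pass negative errno-style values. */
        return strerror(abs(error));
}
```

Code such as strerror_safe_sketch(r)[0] = 'x' now fails to compile instead of silently writing to the string.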
In recent systemd-nspawn we wouldn't parse init args like systemd.log-level=debug.
This is because we wouldn't even look at /proc/1/cmdline.
$ systemd-nspawn -n cat /proc/1/stat
1 (cat) R 0 1 1 34816 ....
^^^^^
34816 is 136:0 a.k.a. /dev/pts/0.
So far, we'd use hashmap_free_free to free both keys and values along with
the hashmap. I think it's better to make this more encapsulated: in this variant
the way contents are freed can be decided when the hashmap is created, and
users of the hashmap can always use hashmap_free.
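The idea can be shown with a toy container standing in for the hashmap. All names below are hypothetical and the real hashmap API looks different; what matters is that the destructors are chosen once at creation time and a single free function does the rest:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef void (*free_func_t)(void *p);

typedef struct Entry {
        void *key, *value;
        struct Entry *next;
} Entry;

typedef struct Store {
        Entry *entries;
        free_func_t free_key, free_value;       /* decided at creation time */
} Store;

static Store *store_new(free_func_t free_key, free_func_t free_value) {
        Store *s = calloc(1, sizeof(Store));
        if (!s)
                return NULL;

        s->free_key = free_key;
        s->free_value = free_value;
        return s;
}

static int store_put(Store *s, void *key, void *value) {
        Entry *e;

        assert(s);

        e = malloc(sizeof(Entry));
        if (!e)
                return -ENOMEM;

        e->key = key;
        e->value = value;
        e->next = s->entries;
        s->entries = e;
        return 0;
}

/* Users always call this one function, whatever the ownership rules are. */
static Store *store_free(Store *s) {
        if (!s)
                return NULL;

        while (s->entries) {
                Entry *e = s->entries;

                s->entries = e->next;
                if (s->free_key)
                        s->free_key(e->key);
                if (s->free_value)
                        s->free_value(e->value);
                free(e);
        }

        free(s);
        return NULL;
}
```

A store created with store_new(free, free) owns its keys and values, while store_new(NULL, NULL) merely borrows them.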
This could already be done by calling unit_name_is_*(), but if we don't know
if the argument is a valid unit name, it is more convenient to have a single
function which returns the type or possibly an error if the unit name is not
valid.
The values in the enum are sorted "by length". Not really important, but it
seems more natural to me.
Some distros install nologin as /usr/sbin/nologin, others as
/sbin/nologin.
Since we can't rely on merged-usr everywhere (where the path wouldn't
matter), make the path build time configurable via -Dnologin-path=.
Closes #13028
The enum order will be used to order jobs in the job queue.
Make sure that unit types that fork additional processes come first to
maximize parallelism.
This restores proper speed with asan builds with gcc 9.1.1.
Fixes #12997.
$ rpm -q gcc
gcc-9.1.1-2.fc31.x86_64
$ time ASAN_OPTIONS=strict_string_checks=1:detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1 UBSAN_OPTIONS=print_stacktrace=1:print_summary=1:halt_on_error=1 build-rawhide-sanitize/test-conf-parser
(old) 86.99s user 20.22s system 361% cpu 29.635 total
(new) 3.05s user 0.29s system 99% cpu 3.377 total
Size is increased a bit:
$ size build/systemd
(old) 1683421 246100 1208 1930729 1d75e9 build/systemd
(new) 1688237 246100 1208 1935545 1d88b9 build/systemd
... but that's <0.1%, so we don't really care.
The comment explains the reason: we'd wait for the second \n
and then ungetc() it. Then the buffered \n would cause a problem
when the next prompt was issued, so in effect it wasn't possible
to answer the second question.
It's hard to even say what exactly this combination means. Escaping is
necessary when quoting to have quotes within the string. So the escaping of
quote characters is inherently tied to quoting. When unquoting, it seems
natural to remove escaping which was done for the quoting purposes. But with
both flags we would be expected to re-add this escaping after unquoting? Or
maybe keep the escaping which is not necessary for quoting but otherwise
present? This all seems too complicated, let's just forbid such usage and
always fully unescape when unquoting.
Let's hide non-UTF-8 locales by default. It's 2019 after all.
Let's add an undocumented env var to reenable listing them though.
This should substantially shorten the list of choices we offer users,
and only show realistic choices.
Note that only firstboot and localectl make use of this information, and
both allow configuration of values outside of these lists, hence all
this change does is hide legacy options, but they are still available if
you know what you do, and that's how it should be.
Let's drop the 'static' logic when a parameter can be NULL.
I think asan/ubsan are right here, judging by the C99 spec language:
"A declaration of a parameter as ‘‘array of type’’ shall be adjusted to
‘‘qualified pointer to type’’, where the type qualifiers (if any) are
those specified within the [ and ] of the array type derivation. If the
keyword static also appears within the [ and ] of the array type
derivation, then for each call to the function, the value of the
corresponding actual argument shall provide access to the first element
of an array with at least as many elements as specified by the size
expression."
If we specify NULL, then we certainly don't provide access to any valid
array.
Fixes: #13039
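The two declaration styles side by side, as a small sketch with made-up function names:

```c
#include <assert.h>
#include <stddef.h>

/* "static 2" promises that v gives access to at least two elements, so
 * passing NULL here would break that promise. */
static int sum_pair(const int v[static 2]) {
        return v[0] + v[1];
}

/* NULL is an accepted argument here, hence a plain pointer without "static". */
static int sum_pair_or_zero(const int *v) {
        if (v == NULL)
                return 0;

        return v[0] + v[1];
}
```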
This decorator tells compilers that the memory we return is shorter than
it actually is, thus triggering misleading bad memory access complaints.
Fixes: #13026
The implementation is pretty straightforward: when we get a request to
clean some type of resources we fork off a process doing that, and while
it is running we are in the "cleaning" state.
This adds basic infrastructure to implement a "clean" operation for unit
types. This "clean" operation is supposed to remove on-disk resources of
units, and is supposed to be used in a later commit to clean our
RuntimeDirectory=, StateDirectory= and so on of service units.
Later commits will open this up to the bus, and hook up service units
with this.
This also adds a new generic ActiveState called UNIT_MAINTENANCE. It's
supposed to cover all kinds of "maintenance" state of units.
Specifically, this is supposed to cover the "cleaning" operations later
added for service units which might take a bit of time. This high-level,
generic, abstract state is called UNIT_MAINTENANCE instead of the
more specific "UNIT_CLEANING", since I think this should be kept open
for different operations possibly later on that could be nicely subsumed
under this (for example, maybe a recursive chown()ing operation could be
covered by this, and similar).
This new flag suppresses error if the top-level path specified doesn't
exist. This is useful since suppressing this on the caller side isn't
easy, since ENOENT might be propagated for some reason from further inside
and we can't distinguish that.
While we are at it, also be a bit more careful with the various
combinations of flags.
(Note that in some cases rm_rf() was already ignoring ENOENT from
unlink() or rmdir(), however that was pretty useless, since we always
open() the top-level path with O_DIRECTORY and if that hit ENOENT we
didn't ignore the failure).
And obviously CONFIG_LINE=0 is also not logged.
The way that log_syntax_internal now looks is becoming a bit crazy, but we
can't easily conditionalize on both unit and config_file, and we have different
types, so it's not easy to make this more compact.
When using build/ directory inside of the source directory:
__FILE__: ../src/test/test-log.c
RELATIVE_SOURCE_PATH: ..
PROJECT_FILE: src/test/test-log.c
When using a build directory outside of the source directory:
__FILE__: ../../../home/zbyszek/src/systemd-work/src/test/test-log.c
RELATIVE_SOURCE_PATH: ../../../home/zbyszek/src/systemd-work
PROJECT_FILE: src/test/test-log.c
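The relation between the three values in the examples above can be written as a small function. This is a hypothetical runtime equivalent for illustration only; the real PROJECT_FILE is computed at compile time:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: skip the relative source path and the '/' after it. */
static const char *project_file_sketch(const char *file, const char *relative_source_path) {
        size_t n;

        assert(file);
        assert(relative_source_path);

        n = strlen(relative_source_path);
        if (strncmp(file, relative_source_path, n) == 0 && file[n] == '/')
                return file + n + 1;

        return file;    /* prefix does not match, leave the path alone */
}
```

Both build layouts shown above yield "src/test/test-log.c" this way.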
Whenever I see EXTRACT_QUOTES, I'm always confused whether it means to
leave the quotes in or to take them out. Let's say "unquote", like we
say "cunescape".
Make it possible to set the NUMA allocation policy for the manager. The manager's
policy is by default inherited by all forked off processes. However, it
is possible to override the policy on a per-service basis. Currently we
support these policies: default, prefer, bind, interleave, local.
See man 2 set_mempolicy for details on each policy.
The overall NUMA policy actually consists of two parts: the policy itself and
a bitmask representing the NUMA nodes where the policy is effective. The node
mask can be specified using the related option, NUMAMask. The default mask can
be overwritten on a per-service level.
It's possible for a zombie process to have live threads. These are not listed
in /sys in "cgroup.procs" for cgroupsv2, but they show up in
"cgroup.threads" (cgroupv2) or "tasks" (cgroupv1) nodes. When killing a
cgroup (v2 only) with SIGKILL, let's also kill threads after killing processes,
so the live threads of a zombie get killed too.
Closes #12262.
prefix_root() is equivalent to path_join() in almost all ways, hence
let's remove it.
There are subtle differences though: prefix_root() will try to shorten
multiple "/" before and after the prefix. path_join() doesn't do that.
This means prefix_root() might return a string shorter than both its
inputs combined, while path_join() never does that. I like the
path_join() semantics better, hence I think dropping prefix_root() is
totally OK. In the end the strings generated by both functions should
always be identical in terms of path_equal() if not streq().
This leaves prefix_roota() in place. Ideally we'd have path_joina(), but
I don't think we can reasonably implement that as a macro. Or maybe we
can? (if so, sounds like something for a later PR)
Also add in a few missing OOM checks.
The OCI changes in #9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).
Fixes #12539
cap_last_cap() returns the last valid cap (instead of the number of
valid caps). To iterate through all known caps we hence need to use a <=
check, and not a < check like for all other cases. We got this right
usually, but in three cases we did not.
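A sketch of the loop bound in question. The helper is hypothetical and takes the last valid capability as a plain parameter rather than calling cap_last_cap():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a bit mask covering every capability from 0 up
 * to and including last_cap. */
static uint64_t all_caps_mask(unsigned long last_cap) {
        uint64_t mask = 0;

        assert(last_cap < 64);

        /* last_cap is itself a valid capability, hence "<=" and not "<". */
        for (unsigned long i = 0; i <= last_cap; i++)
                mask |= UINT64_C(1) << i;

        return mask;
}
```

With a "<" check the highest capability would silently be missing from the mask.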
Using C11 thread-local storage in destructors causes uninitialized
read. Let's avoid that using a direct comparison instead of using
the cached values. As this code path is taken only when compiled
with -DVALGRIND=1, the performance cost shouldn't matter too much.
Fixes #12814
Allocating a pty is done in a couple of places so let's introduce a new helper
which does the job.
Also the new function, as well as openpt_in_namespace(), returns both pty
master and slave so the callers don't need to know about the pty slave
allocation details.
For the same reasons machine_openpt() prototype has also been changed to return
both pty master and slave so callers don't need to allocate a pty slave which
might be in a different namespace.
Finally openpt_in_namespace() has been renamed into
openpt_allocate_in_namespace().
fstat(2) is fine with O_PATH fds.
For changing ownership of a file opened with O_PATH, there's fchownat(2).
Only changing permissions is problematic but we introduced fchmod_opath() for
that purpose.
Only changing ownership back to root is not enough: we also need to
change the access mode, otherwise the user might have set 666 first, and
thus allow everyone access before and after the chown().
Inspired by #12431 let's also rework chmod_and_chown() and make sure we
never add more rights to a file not owned by the right user.
Also, let's make chmod_and_chown() just a wrapper around
fchmod_and_chown().
Let's also change strategy: instead of chown()ing first and stating
after on failure and suppressing errors, let's avoid the chown in the
first place, in the interest of keeping things minimal.