Fixes#17035. We use "," as the separator between arguments in fstab and crypttab
options field, but the kernel started using "," within arguments. Users will need
to escape those nested commas.
`startswith` already returns the string with the prefix skipped, so we
can simplify this further and avoid using a magic value.
Noticed in passing.
Co-authored-by: Lennart Poettering <lennart@poettering.net>
SD_DHCP6_OPTION_IA_NA does not exist in DHCP6_ADVERTISE packet if DHCP server only provides prefix delegation. So the attempt to send the DHCP6_REQUEST packet fails on r = dhcp6_option_append_ia(&opt, &optlen, &client->lease->ia); forever.
When invoked as non-root, we would suggest re-running as root without any
further hint. But this immediately spawns a machine from the local directory,
which can be rather surprising. So let's give a better hint.
(In general, I don't think commandline programs should do "significant" things
when invoked without any arguments. In this regard it would be better if
systemd-nspawn would not spawn a machine from the current directory if called
with no arguments and at least "-D ." would be required.)
If the whole call is simple and we don't need to look at the return value
apart from the conditional, let's use a form without assignment of the return
value. When the function call is more complicated, it still makes sense to
use a temporary variable.
Lennart wanted to do this back in
01c33c1eff.
For better or worse, this wasn't done because I thought that turning on MountAPIVFS
is a compat break for RootDirectory and people might be negatively surprised by it.
Without this, search for binaries doesn't work (access_fd() requires /proc).
Let's turn it on, but still allow overriding to "no".
When RootDirectory=/, MountAPIVFS=1 doesn't work. This might be a buglet on its
own, but this patch doesn't change the situation.
Right now, we always say `/etc/crypttab` even if the source was fully
derived from the kargs.
Let's match what `systemd-fstab-generator` does and use `/proc/cmdline`
when that's the case.
(Well, at least the ones where that makes sense. Where it does't make
sense are the ones that re invoked on the root path, which cannot
possibly be a symlink.)
Let's make umount_verbose() more like mount_verbose_xyz(), i.e. take log
level and flags param. In particular the latter matters, since we
typically don't actually want to follow symlinks when unmounting.
This is a follow-up for cae1e8fb88c5a6b0960a2d0be3df8755f0c78462: we
also call the detach ioctls in the shutdown code, hence add the fsync()s
there too, just to be safe.
Let's make sure to use strna() on the strings returned by fd_get_path()
where we knowingly ignore any failures. We got this right in most cases,
but two were missing.
We use it pretty much everywhere else, hence use it here too.
This also changes the error generated from EOPNOTSUPP to ENOSYS, to
match the other cases where we do such a check. One user checked for
EOPNOTSUPP which is updated to check for ENOSYS instead.
Currently the systemd-shutdown command attempts to stop swaps, DM
(crypt, LVM2) and loop devices, but it doesn't attempt to stop MD
RAID devices, which means that if the RAID is set up on crypt,
loop, etc. device, it won't be able to stop those underlying devices.
This code extends the shutdown application to also attempt stopping
the MD RAID devices.
Signed-off-by: Hubert Kario <hubert@kario.pl>
The directory backend needs a file system path, and not a raw block
device. That's only supported for the LUKS2 backend.
Let's make this clearer in the man page and also generate a better error
message if attempted anyway.
Fixes: #17068
When exiting, let's explicitly wait for our worker processes to finish
first. That's useful if unmounting of /home/ is scheduled to happen
right after homed is down, as we then can be sure that the home
directories are properly unmounted and detached by the time homed is
fully terminated (otherwise it might happen that our worker gets killed
by the service manager, thus leaving the home directory and its backing
devices up/left for auto-clean which might be async).
Likely fixes#16842
When debugging homed while being logged into a user account manged by
homed it is a good idea to be able to run a second copy of homed. In
order to not collide with its AF_UNIX socket and bus name use, let's add
a new env var $SYSTEMD_HOME_DEBUG_SUFFIX, when set the busnames/socket
names are suffixed by it. When setting this while debugging one can
invoke an additional copy without interfering with the host one.
This has two advantages:
- we save a bit of IO in early boot because we don't look for executables
which we might never call
- if the executable is in a different place and it was specified as a
non-absolute path, it is OK if it moves to a different place. This should
solve the case paths are different in the initramfs.
Since the executable path is only available quite late, the call to
mac_selinux_get_child_mls_label() which uses the path needs to be moved down
too.
Fixes#16076.
Similar to free_and_replace. I think this should be uppercase to make it
clear that this is a macro. free_and_replace should probably be uppercased
too.
Other tools silently ignore non-executable names found in path. By checking
F_OK, we would could pick non-executable path even though there is an executable
one later.
We had a special test case that the second semicolon would be interpreted
as an executable name. We would then try to find the executable and rely
on ";" not being found to cause ENOEXEC to be returned. I think that's just
crazy. Let's treat the second semicolon as a separator and ignore the
whole thing as we would whitespace.
test-execute is quite long and even with the test name it takes a moment
to find the relevant spot when something fails. Let's make things easier
by printing the exact location.
It's just too useful to immediately see with "systemd-dissect" what
"systemd-repart" generated for us without having to populate it with
/etc/os-release. Hence let's log a message if /etc/os-release is
missing, but proceed otherwise and show the partition table.
Let's suppress the secondary arch data, since we never ever want to
mount it if we found the primary arch.
Previously we only suppressed in the Verity case, but there's little
reason to entertain the idea of a secondary arch in non-Verity
environments either, we are not going to use them, and should not do
decryption or anything like that.
Uppercase first char of log message, and indicate correct program name.
Reindent comment table at one place.
Use correct, specific, enum type at one more place.
By default we'll run a container in --console=interactive and
--console=read-only mode depending if we are invoked on a tty or not so
that the container always gets a /dev/console allocated, i.e is always
suitable to run a full init system /as those typically expect a
/dev/console to exist).
With the new --console=autopipe mode we do something similar, but
slightly different: when not invoked on a tty we'll use --console=pipe.
This means, if you invoke some tool in a container with this you'll get
full inetractivity if you invoke it on a tty but things will also be
very nicely pipeable. OTOH you cannot invoke a full init system like
this, because you might or might not become a /dev/console this way...
Prompted-by: #17070
(I named this "autopipe" rather than "auto" or so, since the default
mode probably should be named "auto" one day if we add a name for it,
and this is so similar to "auto" except that it uses pipes in the
non-tty case).
Instead of first becoming a controlling process of the payload pty
as side effect of opening it (without O_NOCTTY), and then possibly
dropping it again, let's do it cleanly an reverse the logic: let's open
the pty without becoming its controller first. Only after everything
went the way we wanted it to go become the controller explicitly.
This has the benefit that the PID 1 stub process we run (as effect of
--as-pid2) doesn't have to lose the tty explicitly, but can just
continue running with things. And we explicitly make the tty controlling
right before invoking actual payload.
In order to make sure everything works as expected validate that the
stub PID 1 in the container really has no conrolling tty by issuing the
TIOCNOTTY tty and expecting ENOTTY, and log about it.
This shouldn't change behaviour much, it just makes thins a bit cleaner,
in particular as we'll not trigger SIGHUP on ourselves (since we are
controller and session leader) due to TIOCNOTTY which we then have to
explicitly ignore.
Just some refactoring: let's place the various verity related parameters
in a common structure, and pass that around instead of the individual
parameters.
Also, let's load the PKCS#7 signature data when finding metadata
right-away, instead of delaying this until we need it. In all cases we
call this there's not much time difference between the metdata finding
and the loading, hence this simplifies things and makes sure root hash
data and its signature is now always acquired together.
Graphics tablet devices comprise multiple event nodes, usually a Pen, Finger
and Pad node (that's how the kernel postfixes them). Pen and Pad are labeled
as ID_INPUT_TABLET but the pad doesn't actually send stylus events - it
doesn't usually have BTN_TOOL_PEN, merely BTN_STYLUS.
For the last several years, libwacom has set ID_INPUT_TABLET_PAD for all pad
devices known to it based on vid/pid and a "* Pad" name match. That does not
cover devices not in libwacom. libinput relies on ID_INPUT_TABLET_PAD to
initialize the pad backend.
We can't drop ID_INPUT_TABLET without breaking userspace, but we can add
ID_INPUT_TABLET_PAD ourselves - where a device has BTN_0 in addition to
BTN_STYLUS, let's add it as a pad.
There are some devices (notably: bamboos) that use BTN_LEFT instead of BTN_0
but they are relatively rare and there's a risk of mislabeling those devices,
so let's just stick with BTN_0 only.
RC_LOCAL_SCRIPT_PATH_START and RC_LOCAL_SCRIPT_PATH_STOP were was originally
added in the conversion to meson based on the autotools name. In
4450894653 RC_LOCAL_SCRIPT_PATH_STOP was dropped.
We don't need to use such a long name.
By settings AI_ADDRCONFIG in hints we cannot for example resolve "localhost"
when the local machine only has a loopback interface. This seems like an
unnecessary restriction, drop it.
Inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1839007.
With new directive SystemCallLog= it's possible to list system calls to be
logged. This can be used for auditing or temporarily when constructing system
call filters.
---
v5: drop intermediary, update HASHMAP_FOREACH_KEY() use
v4: skip useless debug messages, actually parse directive
v3: don't declare unused variables with old libseccomp
v2: fix build without seccomp or old libseccomp
Define explicit action "kill" for SystemCallErrorNumber=.
In addition to errno code, allow specifying "kill" as action for
SystemCallFilter=.
---
v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes,
init syscall_errno
v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit
parsing without seccomp
v4: fix build without seccomp
v3: drop log action
v2: action -> number
On centos7 ci:
--- test-libcrypt-util begin ---
Found container virtualization none.
/* test_hash_password */
ew3bU1.hoKk4o: yes
$1$gc5rWpTB$wK1aul1PyBn9AX1z93stk1: no
$2b$12$BlqcGkB/7BFvNMXKGxDea.5/8D6FTny.cbNcHW/tqcrcyo6ZJd8u2: no
$5$lGhDrcrao9zb5oIK$05KlOVG3ocknx/ThreqXE/gk.XzFFBMTksc4t2CPDUD: no
$6$c7wB/3GiRk0VHf7e$zXJ7hN0aLZapE.iO4mn/oHu6.prsXTUG/5k1AxpgR85ELolyAcaIGRgzfwJs3isTChMDBjnthZyaMCfCNxo9I.: no
$y$j9T$$9cKOWsAm4m97WiYk61lPPibZpy3oaGPIbsL4koRe/XD: no
Since the loop to check various xcrypt functions is already in place,
adding one more is cheap. And it is nicer to check for the function
directly. People like to backport things, so we might get lucky even
without having libxcrypt.
They are only used under src/home/, but I want to add tests in test-libcrypt-util.c.
And the functions are almost trivial, so I think it is OK to move them to shared.
This lets the libc/xcrypt allocate as much storage area as it needs.
Should fix#16965:
testsuite-46.sh[74]: ==74==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f3e972e1080 at pc 0x7f3e9be8deed bp 0x7ffce4f28530 sp 0x7ffce4f27ce0
testsuite-46.sh[74]: WRITE of size 131232 at 0x7f3e972e1080 thread T0
testsuite-46.sh[74]: #0 0x7f3e9be8deec (/usr/lib/clang/10.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x9feec)
testsuite-46.sh[74]: #1 0x559cd05a6412 in user_record_make_hashed_password /systemd-meson-build/../build/src/home/user-record-util.c:818:21
testsuite-46.sh[74]: #2 0x559cd058fb03 in create_home /systemd-meson-build/../build/src/home/homectl.c:1112:29
testsuite-46.sh[74]: #3 0x7f3e9b5b3058 in dispatch_verb /systemd-meson-build/../build/src/shared/verbs.c:103:24
testsuite-46.sh[74]: #4 0x559cd058c101 in run /systemd-meson-build/../build/src/home/homectl.c:3325:16
testsuite-46.sh[74]: #5 0x559cd058c00a in main /systemd-meson-build/../build/src/home/homectl.c:3328:1
testsuite-46.sh[74]: #6 0x7f3e9a88b151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
testsuite-46.sh[74]: #7 0x559cd0583e7d in _start (/usr/bin/homectl+0x24e7d)
testsuite-46.sh[74]: Address 0x7f3e972e1080 is located in stack of thread T0 at offset 32896 in frame
testsuite-46.sh[74]: #0 0x559cd05a60df in user_record_make_hashed_password /systemd-meson-build/../build/src/home/user-record-util.c:789
testsuite-46.sh[74]: This frame has 6 object(s):
testsuite-46.sh[74]: [32, 40) 'priv' (line 790)
testsuite-46.sh[74]: [64, 72) 'np' (line 791)
testsuite-46.sh[74]: [96, 104) 'salt' (line 809)
testsuite-46.sh[74]: [128, 32896) 'cd' (line 810)
testsuite-46.sh[74]: [33152, 33168) '.compoundliteral' <== Memory access at offset 32896 partially underflows this variable
testsuite-46.sh[74]: [33184, 33192) 'new_array' (line 832) <== Memory access at offset 32896 partially underflows this variable
testsuite-46.sh[74]: HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
testsuite-46.sh[74]: (longjmp and C++ exceptions *are* supported)
testsuite-46.sh[74]: SUMMARY: AddressSanitizer: stack-buffer-overflow (/usr/lib/clang/10.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x9feec)
It seems 'struct crypt_data' is 32896 bytes, but libclang_rt wants more, at least 33168?
Like it's customary in our codebase bus_error_message() internally takes
abs() of the passed error anyway, hence no need to explicitly negate it.
We mostly got this right, but in too many cases we didn't. Fix that.
We already do this for socket and automount units, do it for path units
too: if the triggered service keeps hitting the start limit, then fail
the triggering unit too, so that we don#t busy loop forever.
(Note that this leaves only timer units out in the cold for this kind of
protection, but it shouldn't matter there, as they are naturally
protected against busy loops: they are scheduled by time anyway).
Fixes: #16669
In 4c2ef32767 we enabled propagating
triggered unit state to the triggering unit for service units in more
load states, so that we don't accidentally stop tracking state
correctly.
Do the same for our other triggering unit states: automounts, paths, and
timers.
Also, make this an assertion rather than a simple test. After all it
should never happen that we get called for half-loaded units or units of
the wrong type. The load routines should already have made this
impossible.
In shell, inside of double quotes only a select few chars should be
escaped. If other chars are escaped this has no effect. Correct the list
of chars that need such escaping.
Also, make sure we can read back the stuff we wrote out without loss.
Fixes: #16788
We want to be able to use BusName= in services that run during early boot
already, and thus don't synthesize deps on dbus there. Instead add them
when Type=dbus is set, because in that case we actually really need
D-Bus support.
Fixes: #17037
Symbolic names and number in the appropriate range are allowed
(log_level_from_string() DTRT already).
The target names are more messy, so we leave the verification to the service.
Heavily inspired by #15622. This adds:
systemctl service-log-level systemd-resolved
systemctl service-log-level systemd-resolved info
systemctl service-log-target systemd-resolved
systemctl service-log-target systemd-resolved console
We already have systemctl verbs log-level, log-target, and service-watchdogs.
Those two new verbs tie nicely into this scheme.
if we allocate a bunch of hash tables all at the same time, with none
earlier than the other, there's a good chance we'll initialize the
shared hash key multiple times, so that some threads will see a
different shared hash key than others.
Let's fix that, and make sure really everyone sees the same hash key.
Fixes: #17007
In containers we might lack the privs to up the socket buffers. Let's
not complain so loudly about that. Let's hence downgrade this to debug
logging if it's a permission problem.
(This wasn't an issue before b92f350789
because back then the failures wouldn't be detected at all.)
Previously we would call umount from ExecStartPost= of
systemd-cryptsetup instance in order to get rid of the keydev
mount (i.e. filesystem containing keyfile). Let's generate unit to
handle umount. Making this symmetrical (both mount and umount of keydev
are handled by units) fixes the problem with lingering keydev mounts.
Motivation for the change is the issue where keydev mount would stay
around even if device was successfully unlocked and mount is no longer
needed. That could happen previously because when generator options are
not prefixed with "rd." we run generators twice (e.g. rd.luks.key=...).
In such case disk is unlocked in initramfs phase of boot (assuming the
initrd image contains the generator and is able to handle unlocking of
LUKS devices). After switchroot we however enqueue start job for
systemd-cryptsetup instance (because units are regenerated second time)
and that pulls in its dependencies into transaction. Later the main
systemd-cryptsetup unit not actually started since it is already active
and has RemainaAfterExit=yes. Nevertheless, dependencies get activated
and keydev mount is attached again. Because previously we called umount
from ExecStartPost= of systemd-cryptsetup instance the umount is not
called second time and keydev filesystem stays lingering.
Otherwise our request will possibly fail if something else is already
enqeued, but given this is an explicit user request, let's not allow
things to fail.
Fixes: #16702
We generally don't support prefix being != /usr, and this is hardcoded
all over the place. In the systemd.pc file it wasn't so far. Let's
adjust this to match the rest of the codebase.
If actual_brightness is larger than max_brightness, then fall back to
use brightness attribute.
Also, if the saved value is invalid, then this makes remove the file in
/var/lib/systemd/backlight.
Hopefully fixes#17011.
A variety of sockopts exist both for IPv4 and IPv6 but require a
different pair of sockopt level/option number. Let's add helpers for
these that internally determine the right sockopt to call.
This should shorten code that generically wants to support both ipv4 +
ipv6 and for the first time adds correct support for some cases where we
only called the ipv4 versions, and not the ipv6 options.
Multicast snooping enabled bridges maintain a database for multicast
port memberships to decide which mulicast packet is supposed to
egress on which port.
This patch teaches networkd to add entries to this database manually
by adding `[BridgeMDB]` sections to `.network` configuration files.
When 4dfaa528d4 was first commited its callers relied on `errno` instead of the
return value for error reporting. Which worked fine, since internally
under all conditions base were set — even if ugly and not inline with
our coding style. Things then got broken in
f8606626ed where suddenly additional
syscalls might end up being done in the function, thus corrupting `errno`.
Two functional changes:
- "/" is now refused. The test is adjusted.
- The trailing NUL is *not* included in the returned size for abstract size. The
comments in sockaddr_un_set_path() indicate that this is the right thing to do,
and the code in socket_address_parse() wasn't doing that.
Closes#12624.
The formatting in systemd.socket.xml is updated a bit.
Currently in_addr_port_ifindex_name_to_string() always prints the ifindex
numerically. This is not super useful since the interface numbers are
semi-random. Should we use interface names in preference?
We would set .type to a fake value. All real callers (outside of tests)
immediately overwrite .type with a proper value after calling
socket_address_parse(). So let's not set it and adjust the few places
that relied on it being set to the fake value.
socket_address_parse() is modernized to only set the output argument on
success.
One special syntax is not supported anymore: "iface:port" would be parsed as an
interface name plus numerical port, equivalent to "[::]%iface:port". This was
added in 542563babd, but was undocumented, and we had no tests for it. It seems
that this actually wasn't doing anything useful, because the kernel only uses the
scope identifier for link-local addresses.
We don't try to resolve invalid ifnames as all. A different return
code is used. This difference will be verified later in test_socket_address_parse()
when socket_address_parse() is converted to use
in_addr_port_ifindex_name_from_string_auto().