We should chown what we allocate ourselves, i.e. any pty we allocate
ourselves. But for stuff we propagate, let's avoid that: we shouldn't
make more changes than necessary.
Fixes: #17229
If we set O_NONBLOCK on stdin/stdout directly this means the flag is
left set when we abort abnormally, as we don't get the chance to reset
it again on exit. This might confuse progrms invoking us. Moreover, if
programs invoking us continue to write to the stdout passed to us, they
might be confused by non-blocking mode too.
Hence, let's avoid this if possible: let's reopen stdin/stdout and set
O_NONBLOCK only on the reopend fds, leaving the original fds as they
are.
Prompted-by: https://github.com/systemd/systemd/pull/17070#issuecomment-702304802
*** Running /home/zbyszek/src/systemd-work/test/test-sysusers/test-14.input (with login.defs symlinked)
login.defs specifies UID allocation range 401–555 that is different than the built-in defaults (201–998)
login.defs specifies GID allocation range 405–666 that is different than the built-in defaults (201–990)
Also, even if login.defs are not present, don't start allocating at 1, but at
SYSTEM_UID_MIN.
Fixes#9769.
The test is adjusted. Actually, it was busted before, because sysusers would
never use SYSTEM_GID_MIN, so if SYSTEM_GID_MIN was different than
SYSTEM_UID_MIN, the tests would fail. On all "normal" systems the two are
equal, so we didn't notice. Since sysusers now always uses the minimum of the
two, we only need to substitute one value.
We don't (and shouldn't I think) look at them when determining the type of the
user, but they should be used during user/group allocation. (For example, an
admin may specify SYS_UID_MIN==200 to allow statically numbered users that are
shared with other systems in the range 1–199.)
It makes little sense to make the boundary between systemd and user guids
configurable. Nevertheless, a completely fixed compile-time define is not
enough in two scenarios:
- the systemd_uid_max boundary has moved over time. The default used to be
500 for a long time. Systems which are upgraded over time might have users
in the wrong range, but changing existing systems is complicated and
expensive (offline disks, backups, remote systems, read-only media, etc.)
- systems are used in a heterogenous enviornment, where some vendors pick
one value and others another.
So let's make this boundary overridable using /etc/login.defs.
Fixes#3855, #10184.
THis doesn't change the condition's logic at all, but is an attempt to
make things a bit more readable: instead of checking log_target !=
LOG_TARGET_AUTO let's actually list the targets where we want to
consider journal/syslog/kmsg, to make things a bit less confusing. After
all the message here is not to avoid them if LOG_TARGET_AUTO is set, but
to definitely do them in the other cases.
Let's explicitly deactivate all home dirs on shutdown, in order to
properly synchronizing unmounting and avoiding blocking devices.
Previously, we'd rely on automatic deactivation when home directories
become unused. However, that scheme is asynchronous, and ongoing
deactviations might conflicts with attempts to unmount /home. Let's fix
that by providing an explicit service systemd-homed-activate.service
whose only job is to have a ExecStop= line that explicitly deactivates
all home directories on shutdown. This service can the be ordered after
home.mount and similar, ensuring that we'll first deactivate all homes
before deactivating /home itself during shutdown.
This is kept separate from systemd-homed.service so that it is possible
to restart systemd-homed.service without deactivating all home
directories.
Fixes: #16842
If the hostname of a system is set to an fqdn, glibc traditionally
derives a search domain from it if none is explicitly configured.
This is a bit weird, and we currently don't do that in our own search
path logic.
Following #17193 let's turn this behaviour off for now.
Yes, this has a slight chance of pissing people off who think this
behaviour is good. If this is indeed an issue, we can revisit the issue
but in that case if we readd the concept we should do it properly:
derive the search domain from the fqdn in our codebase too and report it
in resolvectl, and in our generated stub files. But I have the suspicion
most people who set the hostname to an fqdn aren#t even aware of this
behaviour nor want it, so let's wait until people complain.
Fixes: #17193
It can be one of "foreign", "missing", "stub", "static", "uplink",
depending on how /etc/resolv.conf is set up:
foreign → someone/something else manages /etc/resolv.conf,
systemd-resolved is just the consumer
missing → /etc/resolv.conf is missing altogether
stub/static/uplink → the file is managed by resolved, with the
well-known modes
Fixes: #17159
This basically implements fc58c0c7bf for gshadow.
gpasswd may not have a lock/unlock that behaves the same as passwd, but
according to gshadow(5) the logic of the password field is the same.
This is like membarrier() I guess and basically just exposes CPU
functionality via kernel syscall on some archs. Let's whitelist it for
everyone.
Fixes: #17197
This effectively reverts commit 67acde4869.
After commits 569ad251ad and
67acde4869, -EACCES errors are ignored,
and thus 'udevadm trigger' succeeds even when it is invoked by non-root
users. Moreover, on -EACCES error, log messages are shown in debug
level, so usually we see no message, and users are easily confused
why uevents for devices are not triggered.
It always was the intention to expose this as trusted field _TID=, i.e.
automatically determine it from journald via some SCM_xyz field or so,
but this is never happened, and it's unlikely this will be added anytime
soon to the kernel either, hence let's just generate this sender side,
even if it means it's untrusted.
Let's turn off the search domain logic if a trailing dot is specified
when looking up hostnames and RRs via the Varlink + D-Bus APIs (and thus
also when doing so via nss-resolve). (This doesn't affect lookups via
the stub, since for the any search path logic is done client side
anyway)
It might make sense to force the DNS protocol in this case too (and
disable LLMR + mDNS), but we'll leave that for a different PR — if it
even makes sense. It might also make sense to disable the logic of never
routing single-label lookups to the Internet if a trailing to is
specified, but this needs more discussion too.
Strivtly speaking, this breaks backward compatibility. But setting
too large value into them, then their networking easily breaks.
Note that typically 100 for them is event too large. So, ommiting the
values equal or higher than 1024, and dropping support of k, M, and G
suffixes is OK for normal appropriate use cases.
See discussion in #16643.
Let's open the device node to modify with O_PATH, and then adjust it
only after verifying everything is in order. This fixes a race where the
a device appears, disappears and quickly reappers, while we are still
running the rules for the first appearance: when going by path we'd
possibly adjust half of the old and half of the new node. By O_PATH we
can pin the node while we operate on it, thus removing the race.
Previously, we'd do a superficial racey check if the device node changed
undearneath us, and would propagate EEXIST in that case, failing the
rule set. With this change we'll instead gracefully handle this, exactly
like in the pre-existing case when the device node disappeared in the
meantime.
Fixes#16991fb39af4ce4 replaced `free_arguments()` with
`reset_arguments()`, which frees arg_* variables as before, but also resets all
of them to the default values. `reset_arguments()` was positioned
in such a way that it overrode some arg_* values still in use at shutdown.
To avoid further unintentional resets, I moved `reset_arguments()`
right before the return, when nothing else will be using the arg_* variables.
If a namespace with PrivateTmp=true is constructed we need to restore
the context of the namespaces /tmp directory (i.e.
/tmp/systemd-private-XXXXX/tmp) to the (default) context of /tmp .
Otherwise filetransitions might result in the namespaces tmp directory
having the wrong context.
After https://github.com/systemd/systemd/pull/16981 only the presence of crypt_gensalt_ra
is checked, but there are cases where that function is available but crypt_preferred_method
is not, and they are used in the same ifdef.
Add a check for the latter as well.
Fixes#17035. We use "," as the separator between arguments in fstab and crypttab
options field, but the kernel started using "," within arguments. Users will need
to escape those nested commas.
`startswith` already returns the string with the prefix skipped, so we
can simplify this further and avoid using a magic value.
Noticed in passing.
Co-authored-by: Lennart Poettering <lennart@poettering.net>
SD_DHCP6_OPTION_IA_NA does not exist in DHCP6_ADVERTISE packet if DHCP server only provides prefix delegation. So the attempt to send the DHCP6_REQUEST packet fails on r = dhcp6_option_append_ia(&opt, &optlen, &client->lease->ia); forever.
When invoked as non-root, we would suggest re-running as root without any
further hint. But this immediately spawns a machine from the local directory,
which can be rather surprising. So let's give a better hint.
(In general, I don't think commandline programs should do "significant" things
when invoked without any arguments. In this regard it would be better if
systemd-nspawn would not spawn a machine from the current directory if called
with no arguments and at least "-D ." would be required.)
If the whole call is simple and we don't need to look at the return value
apart from the conditional, let's use a form without assignment of the return
value. When the function call is more complicated, it still makes sense to
use a temporary variable.
Lennart wanted to do this back in
01c33c1eff.
For better or worse, this wasn't done because I thought that turning on MountAPIVFS
is a compat break for RootDirectory and people might be negatively surprised by it.
Without this, search for binaries doesn't work (access_fd() requires /proc).
Let's turn it on, but still allow overriding to "no".
When RootDirectory=/, MountAPIVFS=1 doesn't work. This might be a buglet on its
own, but this patch doesn't change the situation.
Right now, we always say `/etc/crypttab` even if the source was fully
derived from the kargs.
Let's match what `systemd-fstab-generator` does and use `/proc/cmdline`
when that's the case.
(Well, at least the ones where that makes sense. Where it does't make
sense are the ones that re invoked on the root path, which cannot
possibly be a symlink.)
Let's make umount_verbose() more like mount_verbose_xyz(), i.e. take log
level and flags param. In particular the latter matters, since we
typically don't actually want to follow symlinks when unmounting.
This is a follow-up for cae1e8fb88c5a6b0960a2d0be3df8755f0c78462: we
also call the detach ioctls in the shutdown code, hence add the fsync()s
there too, just to be safe.
Let's make sure to use strna() on the strings returned by fd_get_path()
where we knowingly ignore any failures. We got this right in most cases,
but two were missing.
We use it pretty much everywhere else, hence use it here too.
This also changes the error generated from EOPNOTSUPP to ENOSYS, to
match the other cases where we do such a check. One user checked for
EOPNOTSUPP which is updated to check for ENOSYS instead.
Currently the systemd-shutdown command attempts to stop swaps, DM
(crypt, LVM2) and loop devices, but it doesn't attempt to stop MD
RAID devices, which means that if the RAID is set up on crypt,
loop, etc. device, it won't be able to stop those underlying devices.
This code extends the shutdown application to also attempt stopping
the MD RAID devices.
Signed-off-by: Hubert Kario <hubert@kario.pl>
The directory backend needs a file system path, and not a raw block
device. That's only supported for the LUKS2 backend.
Let's make this clearer in the man page and also generate a better error
message if attempted anyway.
Fixes: #17068
When exiting, let's explicitly wait for our worker processes to finish
first. That's useful if unmounting of /home/ is scheduled to happen
right after homed is down, as we then can be sure that the home
directories are properly unmounted and detached by the time homed is
fully terminated (otherwise it might happen that our worker gets killed
by the service manager, thus leaving the home directory and its backing
devices up/left for auto-clean which might be async).
Likely fixes#16842
When debugging homed while being logged into a user account manged by
homed it is a good idea to be able to run a second copy of homed. In
order to not collide with its AF_UNIX socket and bus name use, let's add
a new env var $SYSTEMD_HOME_DEBUG_SUFFIX, when set the busnames/socket
names are suffixed by it. When setting this while debugging one can
invoke an additional copy without interfering with the host one.
This has two advantages:
- we save a bit of IO in early boot because we don't look for executables
which we might never call
- if the executable is in a different place and it was specified as a
non-absolute path, it is OK if it moves to a different place. This should
solve the case paths are different in the initramfs.
Since the executable path is only available quite late, the call to
mac_selinux_get_child_mls_label() which uses the path needs to be moved down
too.
Fixes#16076.
Similar to free_and_replace. I think this should be uppercase to make it
clear that this is a macro. free_and_replace should probably be uppercased
too.
Other tools silently ignore non-executable names found in path. By checking
F_OK, we would could pick non-executable path even though there is an executable
one later.
We had a special test case that the second semicolon would be interpreted
as an executable name. We would then try to find the executable and rely
on ";" not being found to cause ENOEXEC to be returned. I think that's just
crazy. Let's treat the second semicolon as a separator and ignore the
whole thing as we would whitespace.
test-execute is quite long and even with the test name it takes a moment
to find the relevant spot when something fails. Let's make things easier
by printing the exact location.
It's just too useful to immediately see with "systemd-dissect" what
"systemd-repart" generated for us without having to populate it with
/etc/os-release. Hence let's log a message if /etc/os-release is
missing, but proceed otherwise and show the partition table.
Let's suppress the secondary arch data, since we never ever want to
mount it if we found the primary arch.
Previously we only suppressed in the Verity case, but there's little
reason to entertain the idea of a secondary arch in non-Verity
environments either, we are not going to use them, and should not do
decryption or anything like that.
Uppercase first char of log message, and indicate correct program name.
Reindent comment table at one place.
Use correct, specific, enum type at one more place.
By default we'll run a container in --console=interactive and
--console=read-only mode depending if we are invoked on a tty or not so
that the container always gets a /dev/console allocated, i.e is always
suitable to run a full init system /as those typically expect a
/dev/console to exist).
With the new --console=autopipe mode we do something similar, but
slightly different: when not invoked on a tty we'll use --console=pipe.
This means, if you invoke some tool in a container with this you'll get
full inetractivity if you invoke it on a tty but things will also be
very nicely pipeable. OTOH you cannot invoke a full init system like
this, because you might or might not become a /dev/console this way...
Prompted-by: #17070
(I named this "autopipe" rather than "auto" or so, since the default
mode probably should be named "auto" one day if we add a name for it,
and this is so similar to "auto" except that it uses pipes in the
non-tty case).
Instead of first becoming a controlling process of the payload pty
as side effect of opening it (without O_NOCTTY), and then possibly
dropping it again, let's do it cleanly an reverse the logic: let's open
the pty without becoming its controller first. Only after everything
went the way we wanted it to go become the controller explicitly.
This has the benefit that the PID 1 stub process we run (as effect of
--as-pid2) doesn't have to lose the tty explicitly, but can just
continue running with things. And we explicitly make the tty controlling
right before invoking actual payload.
In order to make sure everything works as expected validate that the
stub PID 1 in the container really has no conrolling tty by issuing the
TIOCNOTTY tty and expecting ENOTTY, and log about it.
This shouldn't change behaviour much, it just makes thins a bit cleaner,
in particular as we'll not trigger SIGHUP on ourselves (since we are
controller and session leader) due to TIOCNOTTY which we then have to
explicitly ignore.
Just some refactoring: let's place the various verity related parameters
in a common structure, and pass that around instead of the individual
parameters.
Also, let's load the PKCS#7 signature data when finding metadata
right-away, instead of delaying this until we need it. In all cases we
call this there's not much time difference between the metdata finding
and the loading, hence this simplifies things and makes sure root hash
data and its signature is now always acquired together.
Graphics tablet devices comprise multiple event nodes, usually a Pen, Finger
and Pad node (that's how the kernel postfixes them). Pen and Pad are labeled
as ID_INPUT_TABLET but the pad doesn't actually send stylus events - it
doesn't usually have BTN_TOOL_PEN, merely BTN_STYLUS.
For the last several years, libwacom has set ID_INPUT_TABLET_PAD for all pad
devices known to it based on vid/pid and a "* Pad" name match. That does not
cover devices not in libwacom. libinput relies on ID_INPUT_TABLET_PAD to
initialize the pad backend.
We can't drop ID_INPUT_TABLET without breaking userspace, but we can add
ID_INPUT_TABLET_PAD ourselves - where a device has BTN_0 in addition to
BTN_STYLUS, let's add it as a pad.
There are some devices (notably: bamboos) that use BTN_LEFT instead of BTN_0
but they are relatively rare and there's a risk of mislabeling those devices,
so let's just stick with BTN_0 only.
RC_LOCAL_SCRIPT_PATH_START and RC_LOCAL_SCRIPT_PATH_STOP were was originally
added in the conversion to meson based on the autotools name. In
4450894653 RC_LOCAL_SCRIPT_PATH_STOP was dropped.
We don't need to use such a long name.
By settings AI_ADDRCONFIG in hints we cannot for example resolve "localhost"
when the local machine only has a loopback interface. This seems like an
unnecessary restriction, drop it.
Inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1839007.
With new directive SystemCallLog= it's possible to list system calls to be
logged. This can be used for auditing or temporarily when constructing system
call filters.
---
v5: drop intermediary, update HASHMAP_FOREACH_KEY() use
v4: skip useless debug messages, actually parse directive
v3: don't declare unused variables with old libseccomp
v2: fix build without seccomp or old libseccomp
Define explicit action "kill" for SystemCallErrorNumber=.
In addition to errno code, allow specifying "kill" as action for
SystemCallFilter=.
---
v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes,
init syscall_errno
v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit
parsing without seccomp
v4: fix build without seccomp
v3: drop log action
v2: action -> number
On centos7 ci:
--- test-libcrypt-util begin ---
Found container virtualization none.
/* test_hash_password */
ew3bU1.hoKk4o: yes
$1$gc5rWpTB$wK1aul1PyBn9AX1z93stk1: no
$2b$12$BlqcGkB/7BFvNMXKGxDea.5/8D6FTny.cbNcHW/tqcrcyo6ZJd8u2: no
$5$lGhDrcrao9zb5oIK$05KlOVG3ocknx/ThreqXE/gk.XzFFBMTksc4t2CPDUD: no
$6$c7wB/3GiRk0VHf7e$zXJ7hN0aLZapE.iO4mn/oHu6.prsXTUG/5k1AxpgR85ELolyAcaIGRgzfwJs3isTChMDBjnthZyaMCfCNxo9I.: no
$y$j9T$$9cKOWsAm4m97WiYk61lPPibZpy3oaGPIbsL4koRe/XD: no
Since the loop to check various xcrypt functions is already in place,
adding one more is cheap. And it is nicer to check for the function
directly. People like to backport things, so we might get lucky even
without having libxcrypt.