When doing list-unit-files with --root, we would re-read the preset
list for every unit. This uses a cache to only do it once. The time
for list-unit-files goes down by about ~30%.
unit_file_query_preset() is also called from src/core/. This patch does not
touch that path, since the saving there are smaller, since preset status is
only read on demand over dbus, and caching would be more complicated.
This changes the calendarspec parser to allow expressions such as
"00:05..05", i.e. a range where start and end is the same. It also
allows expressions such as "00:1-2/3", i.e. where the repetition value
does not fit even once in the specified range. With this patch both
cases will now be optimized away, i.e. the range is removed and a fixed
value is used, which is functionally equivalent.
See #15030 for an issue where the inability to parse such expressions
caused confusion.
I think it's probably better to accept these gracefully and optimizing
them away instead of refusing them with a plain EINVAL. With a tool such
as "systemd-analyze" calendar it should be easy to figure out the
normalized form with the redundant bits optimized away.
Let's try to mangle table contents a bit to make them more suitable as
JSON field names. Specifically when we see "foo bar" convert this to
"foo_bar" as field name, as variable/field names are generally assumed
to be without spaces.
Systems where a mount point is expected to be read-write needs a way to
fail mount units that fallback as read-only.
Add a property to allow setting the -w option when calling mount(8).
$ systemctl --no-pager --root /tmp/root2/ cat ctrl-alt-del.target
Failed to resolve symlink /tmp/root2/etc/systemd/system/ctrl-alt-del.target pointing to /usr/lib/systemd/system/reboot.target, ignoring: Channel number out of range
...
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
Let's define a new, generic bus interface that any daemon can implement
for querying/setting the log level.
We can turn this into something more powerful later on, but for now,
only expose three properties: the log level, log target and the syslog
identifier (with the former two being writable).
This is supposed to be generic, so that it can be implemented by 3rd
party daemons too, eventually.
It's not that I think that "hostname" is vastly superior to "host name". Quite
the opposite — the difference is small, and in some context the two-word version
does fit better. But in the tree, there are ~200 occurrences of the first, and
>1600 of the other, and consistent spelling is more important than any particular
spelling choice.
The watchdog ping is performed for every iteration of manager event
loop. This results in a lot of ioctls on watchdog device driver
especially during boot or if services are aggressively using sd_notify.
Depending on the watchdog device driver this may have performance
impact on embedded systems.
The patch skips sending the watchdog to device driver if the ping is
requested before half of the watchdog timeout.
It's pretty, and it highlights that the pw prompt is kinda special and
needs user input.
We suppress the emoji entirel if there's no emoji support (i.e. this
means we suppress the ASCII replacement), since it carries no additional
information, it is just decoration to highlight a line.
An stdio FILE* stream usually refers to something with a file
descriptor, but that's just "usually". It doesn't have to, when taking
fmemopen() and similar into account. Most of our calls to fileno()
assumed the call couldn't fail. In most cases this was correct, but in
some cases where we didn't know whether we work on files or memory we'd
use the returned fd as if it was unconditionally valid while it wasn't,
and passed it to a multitude of kernel syscalls. Let's fix that, and do
something reasonably smart when encountering this case.
(Running test-fileio with this patch applied will remove tons of ioctl()
calls on -1).
This makes the Environment entries more round-trippable: a similar format is
used for input and output. It is certainly more useful for users, because
showing [unprintable] on anything non-trivial makes systemctl show -p Environment
useless in many cases.
Fixes: #14723 and https://bugzilla.redhat.com/show_bug.cgi?id=1525593.
$ systemctl --user show -p Environment run-*.service
Environment=ASDF=asfd "SPACE= "
Environment=ASDF=asfd "SPACE=\n\n\n"
Environment=ASDF=asfd "TAB=\t\\" "FOO=X X"
This reworks the user validation infrastructure. There are now two
modes. In regular mode we are strict and test against a strict set of
valid chars. And in "relaxed" mode we just filter out some really
obvious, dangerous stuff. i.e. strict is whitelisting what is OK, but
"relaxed" is blacklisting what is really not OK.
The idea is that we use strict mode whenver we allocate a new user
(i.e. in sysusers.d or homed), while "relaxed" mode is when we process
users registered elsewhere, (i.e. userdb, logind, …)
The requirements on user name validity vary wildly. SSSD thinks its fine
to embedd "@" for example, while the suggested NAME_REGEX field on
Debian does not even allow uppercase chars…
This effectively liberaralizes a lot what we expect from usernames.
The code that warns about questionnable user names is now optional and
only used at places such as unit file parsing, so that it doesn't show
up on every userdb query, but only when processing configuration files
that know better.
Fixes: #15149#15090
Many of the convenience functions from sd-bus operate on verbose sets
of discrete strings for destination/path/interface/member.
For most callers, destination/path/interface are uniform, and just the
member is distinct.
This commit introduces a new struct encapsulating the
destination/path/interface pointers called BusAddress, and wrapper
functions which take a BusAddress* instead of three strings, and just
pass the encapsulated strings on to the sd-bus convenience functions.
Future commits will update call sites to use these helpers throwing
out a bunch of repetitious destination/path/interface strings littered
throughout the codebase, replacing them with some appropriately named
static structs passed by pointer to these new helpers.
Stop generating device dependencies for extrinsic mounts: we already exclude
extrinsic mounts from the usual start-up and shutdown dependencies but some
extra deps added by generator_write_device_deps() were remaining.
Giving --echo to systemd-ask-password allows to echo the user input instead
of masking it. This is useful when querying for usernames or similar.
Showing "(press TAB for no echo)" does not make sense there, so do not.
Note that pressing TAB or ESC still disables echo.
In case the dissected image has a filesystem, don't scan for partitions. This
avoids problems with services using a `RootImage=` in early boot when udevd is
not yet started.
This adds SYSTEMD_GENERATOR_PATH and SYSTEMD_ENVIRONMENT_GENERATOR_PATH
environment variables that will be read in the same manner as
SYSTEMD_UNIT_PATH is. i.e. if set, these paths will be used and a
trailing empty entry means that the usual paths will be appended, while
no trailing entry means that solely the given paths are used.
It fully initializes the address structure, so no need for pre-initialization,
and also returns the length of the address, so no need to recalculate using
SOCKADDR_UN_LEN().
socklen_t is unsigned, so let's not use an int for it. (It doesn't matter, but
seems cleaner and more portable to not assume anything about the type.)
As of the commit aae9a96d4b removing --follow
option in systemctl command, OUTPUT_FOLLOW has never been set anywhere. Let's
remove it.
The condition expression of the if-statement in show_journal() that refers to
OUTPUT_FOLLOW now thus evaluates always to true. Hence, the call of
sd_journal_wait() is in dead code, and the outer infinite for-loop is
meaningless, which we remove as cleanup.
There is no functional change by this commit.
This prevents an error in pam_systemd when logging in.
sshd[2623165]: pam_unix(sshd:session): session opened for user tony.stark(uid=10001) by (uid=0)
sshd[2623165]: pam_systemd(sshd:session): Failed to get user record: Invalid argument
Bug: https://bugs.gentoo.org/708824
For #8495: it is arguably useful to not show the length of the password
in public spaces. It is possible to press TAB or BS to cancel the asterisks,
but this is not very discoverable. Let's make it discoverable by showing
a message (in gray). The message is "erased" after the first character
is entered.
test-ask-password-api would crash if ^D was pressed.
If think the callers generally expect a non-empty strv as reply. Let's
return an error if we have nothing to return.
Also modernize test-ask-password-api a bit.
Previously, when doing an async PK query we'd store the original
callback/userdata pair and call it again after the PK request is
complete. This is problematic, since PK queries might be slow and in the
meantime the userdata might be released and re-acquired. Let's avoid
this by always traversing through the message handlers so that we always
re-resolve the callback and userdata pair and thus can be sure it's
up-to-date and properly valid.
When we do an async pk request, let's store which action/details we used
for the original request, and when we are called for the second time,
let's compare. If the action/details changed, let's not allow the access
to go through.
We use those strings as hash keys. While writing "a...b" looks strange,
"a///b" does not look so strange. Both syntaxes would actually result in the
value being correctly written to the file, but they would confuse our
de-deplication over keys. So let's normalize. Output also becomes nicer.
Add test.
In SecureBoot mode this is probably not what you want. As your cmdline
is cryptographically signed like when using Type #2 EFI Unified Kernel
Images (https://systemd.io/BOOT_LOADER_SPECIFICATION/) The user's
intention is then that the cmdline should not be modified. You want to
make sure that the system starts up as exactly specified in the signed
artifact.
This way we can use libxcrypt specific functionality such as
crypt_gensalt() and thus take benefit of the newer algorithms libxcrypt
implements. (Also adds support for a new env var $SYSTEMD_CRYPT_PREFIX
which may be used to select the hash algorithm to use for libxcrypt.)
Also, let's move the weird crypt.h inclusion into libcrypt.h so that
there's a single place for it.
In subsequent commits, calls to if_nametoindex() will be replaced by a wrapper
that falls back to alternative name resolution over netlink. netlink support
requires libsystemd (for sd-netlink), and we don't want to add any functions
that require netlink in basic/. So stuff that calls if_nametoindex() for user
supplied interface names, and everything that depends on that, needs to be
moved.
If we have a template unit template@.service, it should be allowed to specify a
dependency on a unit without an instance, bar@.service. When the unit is created,
the instance will be propagated into the target, so template@inst.service will
depend on bar@inst.service.
This commit changes unit_dependency_name_compatible(), which makes the manager
accept links like that, and unit_file_verify_alias(), so that the installation
function will agree to create a symlink like that, and finally the tests are
adjusted to pass.
It might be needed to follow symlinks more deeply when we're looking for
enablement symlinks pointing to the removed service.
Let's consider the case where service 'old' is being renamed 'new' (will happen
most likely during package upgrade). Before the service is going to be renamed,
there's the following enablement symlink:
/etc/systemd/system/multi-user.target.wants/old.service -> /usr/lib/systemd/system/old.service
In order to rename 'old' into 'new' and transparently restart the service, the
old name is still provided as a 'static' alias for the new service. This should
also help keeping backward compatibilities since the old name might still be
embedded in unit files, scripts, generators and such.
Hence after the package is upgraded, the following symlinks including the
enablement symlink are present:
/usr/lib/systemd/system/old.service -> new.service
/etc/systemd/system/multi-user.target.wants/old.service -> /usr/lib/systemd/system/old.service
If later the user decides to disable the service, we should figure out that the
enablement symlink (which still has the old name) is actually referring to 'new'
(indirectly) even if it points to the alias.
This mostly reuses existing checkers used by pid1, so handling of aliases
should be consistent. Hopefully, with the test it'll be clearer what it
happening.
Support for .wants/.requires "aliases" is restored. Those are still used in the
wild quite a bit, so we need to support them.
See https://github.com/systemd/systemd/pull/13119 for a discussion of aliases
with an instance that point to a different template: this is allowed.
When in a userns environment we cannot take away per-mount point flags
set on a mount point that was passed to us. Hence we need to be careful
to always check the actual mount flags in place and manipulate only
those flags of them that we actually want to change and not reset more
as side-effect.
We mostly got this right already in
bind_remount_recursive_with_mountinfo(), but didn't in the simpler
bind_remount_one_with_mountinfo(). Catch up.
(The old code assumed that the MountEntry.flags field contained the
right flag settings, but it actually doesn't for new mounts we just
established as for those mount() establishes the initial flags for us,
and we have to read them back to figure out which ones the kernel
picked.)
Fixes: #13622
This cleans up the function in multiple ways:
- change order of parameters to follow our usualy system of putting
return parameters last
- rename return parameter "ret" as we usually do
- don't initialize local variables we override immediately anyway
- downgrade log messages to LOG_DEBUG (since we don't log about any
other errors here above LOG_DEBUG, as this is mostly an "API"-style
function)
- handle that mnt_fs_get_vfs_options() may return NULL (according to
docs)
- manually map the ST_xyz to MS_xyz flags on statvfs(), because while
they are mostly the same, they aren't entirely the same, MS_RELATIME and
ST_RELATIME are defined differently (sad!)
To make this easier to understand, let's always log (at debug level)
when we accept or reject each device:
/swapfile: detection of swap file offset on Btrfs is not supported
/swapfile: is a candidate device.
/dev/zram0: ignoring zram swap
/dev/vdb: ignoring device with lower priority
/dev/vdc: ignoring device with lower usable space
...
If we know that hibernation will fail, refuse. This includes cases where
/sys/power/resume is set and doesn't match any device, or
/sys/power/resume_offset is set and we're not on btrfs and it doesn't match.
If /sys/power/resume is not set at all, we still accept the device with the
highest priority (see 6d176522f5 and
88bc86fcf8)
Tested cases:
1. no swap active → refuse
2. just zram swap active → refuse
3. swapfile on btrfs with /sys/power/resume{,_offset} set → OK
4. swapfile on btrfs with /sys/power/resume set, offset not set → refuse
5. swapfile on btrfs with /sys/power/resume set to nonexistent device, offset set → refuse
6. /sys/power/resume not set, offset set, candidate exists → OK (*)
7. /sys/power/resume not set, offset not set, candidate exists → OK
(*) I think this should fail, but I'm leaving that for the next commit.
Change the behavior of string arrays in a bus property map. Previously,
passing the same strv pointer to more than one map entry would result in
the old strv being freed and overwritten. With this change, an existing
strv pointer is appended to.
This is important if we want to create one strv comprised of multiple
dependencies. This makes it so callers don't have to create one strv per
dependency and subsequently merge them into one strv.
When calculation of swap file offset is unsupported, rely on the
/sys/power/resume & /sys/power/resume_offset values if configured
rather than requiring a matching swap entry to be identified.
Refactor to use dev_t for comparison of resume= device instead of string.
This has been requested many times before. Let's add it finally.
GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.
To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).
Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.
To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
It turns out that this is not necessary. When we try to resolve alias@inst, we
first check alias@inst, and if that is not found, fall back to alias@. Since we
already created a symlink for alias@, we will find that and the result will be
the same.
To support ProtectHome=y in a user namespace (which mounts the inaccessible
nodes), the nodes need to be accessible by the user. Create these paths and
devices in the user runtime directory so they can be used later if needed.
Don't try to show top level drop-in for non-existent units or when trying to
instantiate non-instantiated units:
$ systemctl cat nonexistent@.service
Assertion 'name' failed at src/shared/dropin.c:143, function unit_file_find_dirs(). Aborting.
$ systemctl cat systemd-journald@.service
Assertion 'name' failed at src/shared/dropin.c:143, function unit_file_find_dirs(). Aborting.
Ideally, we would want to report this over back over dbus. But that is pretty hard,
because the unitfile parsing logic doesn't provide any feedback.
systemd-analyze verify also doesn't notice the issue, because it doesn't look
at the [Install] section at all. Let's print a message in the logs at least.
If we call LOOP_CLR_FD and LOOP_CTL_REMOVE too rapidly, the kernel cannot deal
with that (5.3.13-300.fc31.x86_64 running on dual core
Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz).
$ sudo strace -eioctl build/test-dissect-image /tmp/foobar3.img
ioctl(3, TCGETS, 0x7ffcee47de20) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(4, LOOP_CTL_GET_FREE) = 9
ioctl(5, LOOP_SET_FD, 3) = 0
ioctl(5, LOOP_SET_STATUS64, {lo_offset=0, lo_number=0, lo_flags=LO_FLAGS_READ_ONLY|LO_FLAGS_AUTOCLEAR|LO_FLAGS_PARTSCAN, lo_file_name="", ...}) = 0
ioctl(5, BLKGETSIZE64, [299999744]) = 0
ioctl(5, CDROM_GET_CAPABILITY, 0) = -1 EINVAL (Invalid argument)
ioctl(5, BLKSSZGET, [512]) = 0
Waiting for device (parent + 0 partitions) to appear...
Found root partition, writable of type btrfs at #-1 (/dev/block/7:9)
ioctl(5, LOOP_CLR_FD) = 0
ioctl(3, LOOP_CTL_REMOVE, 9) = -1 EBUSY (Device or resource busy)
Failed to remove loop device: Device or resource busy
This seems to be clear race condition, and attaching strace is generally enough
to "win" the race. But even with strace attached, we will fail occasionally.
Let's wait a bit and retry. With the wait, on my machine, the second attempt
always succeeds:
...
Found root partition, writable of type btrfs at #-1 (/dev/block/7:9)
ioctl(5, LOOP_CLR_FD) = 0
ioctl(3, LOOP_CTL_REMOVE, 9) = -1 EBUSY (Device or resource busy)
ioctl(3, LOOP_CTL_REMOVE, 9) = 9
+++ exited with 0 +++
Without the wait, all 64 attempts will occasionally fail.
The function no longer returns the fd. This complicated semantics, because it
wasn't clear what holds the ownership: the return value or the output
parameter. There were no users of the fd in the return value, so let's
simplify things conceptually and only return the fd once.
Reduce the scope of variables.
LOOP_CLR_FD was called on the wrong fd. Let's use a cleanup function to make
this automatic and reduce chances of a mixup in the future.
CID 1408498.
CID 1409488.
This code was added in 903659e7b2. The change
that is done here is a simple fix to avoid use of a
unitialized/wrongly-initialized variable, but the bigger issue is that nothing
looks at the returned result to distinguish between 0 and a positive return
value.
At the beginning of seccomp_memory_deny_write_execute architectures
can set individual filter_syscall, block_syscall, shmat_syscall values.
The former two are then used in the call to add_seccomp_syscall_filter
but shmat_syscall is not.
Right now all shmat_syscall values are the same, so the change is a
no-op, but if ever an architecture is added/modified this would be a
subtle source for a mistake so fix it by using shmat_syscall later.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
If seccomp_memory_deny_write_execute was fatally failing to load rules it
already returned a bad retval.
But if any adding filters failed it skipped the subsequent seccomp_load and
always returned an rc of 0 even if no rule was loaded at all.
Lets fix this requiring to (non fatally-failing) load at least one rule set.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>