It fully initializes the address structure, so no need for pre-initialization,
and also returns the length of the address, so no need to recalculate using
SOCKADDR_UN_LEN().
socklen_t is unsigned, so let's not use an int for it. (It doesn't matter, but
seems cleaner and more portable to not assume anything about the type.)
As of the commit aae9a96d4b removing --follow
option in systemctl command, OUTPUT_FOLLOW has never been set anywhere. Let's
remove it.
The condition expression of the if-statement in show_journal() that refers to
OUTPUT_FOLLOW now thus evaluates always to true. Hence, the call of
sd_journal_wait() is in dead code, and the outer infinite for-loop is
meaningless, which we remove as cleanup.
There is no functional change by this commit.
This prevents an error in pam_systemd when logging in.
sshd[2623165]: pam_unix(sshd:session): session opened for user tony.stark(uid=10001) by (uid=0)
sshd[2623165]: pam_systemd(sshd:session): Failed to get user record: Invalid argument
Bug: https://bugs.gentoo.org/708824
For #8495: it is arguably useful to not show the length of the password
in public spaces. It is possible to press TAB or BS to cancel the asterisks,
but this is not very discoverable. Let's make it discoverable by showing
a message (in gray). The message is "erased" after the first character
is entered.
test-ask-password-api would crash if ^D was pressed.
If think the callers generally expect a non-empty strv as reply. Let's
return an error if we have nothing to return.
Also modernize test-ask-password-api a bit.
Previously, when doing an async PK query we'd store the original
callback/userdata pair and call it again after the PK request is
complete. This is problematic, since PK queries might be slow and in the
meantime the userdata might be released and re-acquired. Let's avoid
this by always traversing through the message handlers so that we always
re-resolve the callback and userdata pair and thus can be sure it's
up-to-date and properly valid.
When we do an async pk request, let's store which action/details we used
for the original request, and when we are called for the second time,
let's compare. If the action/details changed, let's not allow the access
to go through.
We use those strings as hash keys. While writing "a...b" looks strange,
"a///b" does not look so strange. Both syntaxes would actually result in the
value being correctly written to the file, but they would confuse our
de-deplication over keys. So let's normalize. Output also becomes nicer.
Add test.
In SecureBoot mode this is probably not what you want. As your cmdline
is cryptographically signed like when using Type #2 EFI Unified Kernel
Images (https://systemd.io/BOOT_LOADER_SPECIFICATION/) The user's
intention is then that the cmdline should not be modified. You want to
make sure that the system starts up as exactly specified in the signed
artifact.
This way we can use libxcrypt specific functionality such as
crypt_gensalt() and thus take benefit of the newer algorithms libxcrypt
implements. (Also adds support for a new env var $SYSTEMD_CRYPT_PREFIX
which may be used to select the hash algorithm to use for libxcrypt.)
Also, let's move the weird crypt.h inclusion into libcrypt.h so that
there's a single place for it.
In subsequent commits, calls to if_nametoindex() will be replaced by a wrapper
that falls back to alternative name resolution over netlink. netlink support
requires libsystemd (for sd-netlink), and we don't want to add any functions
that require netlink in basic/. So stuff that calls if_nametoindex() for user
supplied interface names, and everything that depends on that, needs to be
moved.
If we have a template unit template@.service, it should be allowed to specify a
dependency on a unit without an instance, bar@.service. When the unit is created,
the instance will be propagated into the target, so template@inst.service will
depend on bar@inst.service.
This commit changes unit_dependency_name_compatible(), which makes the manager
accept links like that, and unit_file_verify_alias(), so that the installation
function will agree to create a symlink like that, and finally the tests are
adjusted to pass.
It might be needed to follow symlinks more deeply when we're looking for
enablement symlinks pointing to the removed service.
Let's consider the case where service 'old' is being renamed 'new' (will happen
most likely during package upgrade). Before the service is going to be renamed,
there's the following enablement symlink:
/etc/systemd/system/multi-user.target.wants/old.service -> /usr/lib/systemd/system/old.service
In order to rename 'old' into 'new' and transparently restart the service, the
old name is still provided as a 'static' alias for the new service. This should
also help keeping backward compatibilities since the old name might still be
embedded in unit files, scripts, generators and such.
Hence after the package is upgraded, the following symlinks including the
enablement symlink are present:
/usr/lib/systemd/system/old.service -> new.service
/etc/systemd/system/multi-user.target.wants/old.service -> /usr/lib/systemd/system/old.service
If later the user decides to disable the service, we should figure out that the
enablement symlink (which still has the old name) is actually referring to 'new'
(indirectly) even if it points to the alias.
This mostly reuses existing checkers used by pid1, so handling of aliases
should be consistent. Hopefully, with the test it'll be clearer what it
happening.
Support for .wants/.requires "aliases" is restored. Those are still used in the
wild quite a bit, so we need to support them.
See https://github.com/systemd/systemd/pull/13119 for a discussion of aliases
with an instance that point to a different template: this is allowed.
When in a userns environment we cannot take away per-mount point flags
set on a mount point that was passed to us. Hence we need to be careful
to always check the actual mount flags in place and manipulate only
those flags of them that we actually want to change and not reset more
as side-effect.
We mostly got this right already in
bind_remount_recursive_with_mountinfo(), but didn't in the simpler
bind_remount_one_with_mountinfo(). Catch up.
(The old code assumed that the MountEntry.flags field contained the
right flag settings, but it actually doesn't for new mounts we just
established as for those mount() establishes the initial flags for us,
and we have to read them back to figure out which ones the kernel
picked.)
Fixes: #13622
This cleans up the function in multiple ways:
- change order of parameters to follow our usualy system of putting
return parameters last
- rename return parameter "ret" as we usually do
- don't initialize local variables we override immediately anyway
- downgrade log messages to LOG_DEBUG (since we don't log about any
other errors here above LOG_DEBUG, as this is mostly an "API"-style
function)
- handle that mnt_fs_get_vfs_options() may return NULL (according to
docs)
- manually map the ST_xyz to MS_xyz flags on statvfs(), because while
they are mostly the same, they aren't entirely the same, MS_RELATIME and
ST_RELATIME are defined differently (sad!)
To make this easier to understand, let's always log (at debug level)
when we accept or reject each device:
/swapfile: detection of swap file offset on Btrfs is not supported
/swapfile: is a candidate device.
/dev/zram0: ignoring zram swap
/dev/vdb: ignoring device with lower priority
/dev/vdc: ignoring device with lower usable space
...
If we know that hibernation will fail, refuse. This includes cases where
/sys/power/resume is set and doesn't match any device, or
/sys/power/resume_offset is set and we're not on btrfs and it doesn't match.
If /sys/power/resume is not set at all, we still accept the device with the
highest priority (see 6d176522f5 and
88bc86fcf8)
Tested cases:
1. no swap active → refuse
2. just zram swap active → refuse
3. swapfile on btrfs with /sys/power/resume{,_offset} set → OK
4. swapfile on btrfs with /sys/power/resume set, offset not set → refuse
5. swapfile on btrfs with /sys/power/resume set to nonexistent device, offset set → refuse
6. /sys/power/resume not set, offset set, candidate exists → OK (*)
7. /sys/power/resume not set, offset not set, candidate exists → OK
(*) I think this should fail, but I'm leaving that for the next commit.
Change the behavior of string arrays in a bus property map. Previously,
passing the same strv pointer to more than one map entry would result in
the old strv being freed and overwritten. With this change, an existing
strv pointer is appended to.
This is important if we want to create one strv comprised of multiple
dependencies. This makes it so callers don't have to create one strv per
dependency and subsequently merge them into one strv.
When calculation of swap file offset is unsupported, rely on the
/sys/power/resume & /sys/power/resume_offset values if configured
rather than requiring a matching swap entry to be identified.
Refactor to use dev_t for comparison of resume= device instead of string.
This has been requested many times before. Let's add it finally.
GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.
To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).
Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.
To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
It turns out that this is not necessary. When we try to resolve alias@inst, we
first check alias@inst, and if that is not found, fall back to alias@. Since we
already created a symlink for alias@, we will find that and the result will be
the same.
To support ProtectHome=y in a user namespace (which mounts the inaccessible
nodes), the nodes need to be accessible by the user. Create these paths and
devices in the user runtime directory so they can be used later if needed.
Don't try to show top level drop-in for non-existent units or when trying to
instantiate non-instantiated units:
$ systemctl cat nonexistent@.service
Assertion 'name' failed at src/shared/dropin.c:143, function unit_file_find_dirs(). Aborting.
$ systemctl cat systemd-journald@.service
Assertion 'name' failed at src/shared/dropin.c:143, function unit_file_find_dirs(). Aborting.
Ideally, we would want to report this over back over dbus. But that is pretty hard,
because the unitfile parsing logic doesn't provide any feedback.
systemd-analyze verify also doesn't notice the issue, because it doesn't look
at the [Install] section at all. Let's print a message in the logs at least.
If we call LOOP_CLR_FD and LOOP_CTL_REMOVE too rapidly, the kernel cannot deal
with that (5.3.13-300.fc31.x86_64 running on dual core
Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz).
$ sudo strace -eioctl build/test-dissect-image /tmp/foobar3.img
ioctl(3, TCGETS, 0x7ffcee47de20) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(4, LOOP_CTL_GET_FREE) = 9
ioctl(5, LOOP_SET_FD, 3) = 0
ioctl(5, LOOP_SET_STATUS64, {lo_offset=0, lo_number=0, lo_flags=LO_FLAGS_READ_ONLY|LO_FLAGS_AUTOCLEAR|LO_FLAGS_PARTSCAN, lo_file_name="", ...}) = 0
ioctl(5, BLKGETSIZE64, [299999744]) = 0
ioctl(5, CDROM_GET_CAPABILITY, 0) = -1 EINVAL (Invalid argument)
ioctl(5, BLKSSZGET, [512]) = 0
Waiting for device (parent + 0 partitions) to appear...
Found root partition, writable of type btrfs at #-1 (/dev/block/7:9)
ioctl(5, LOOP_CLR_FD) = 0
ioctl(3, LOOP_CTL_REMOVE, 9) = -1 EBUSY (Device or resource busy)
Failed to remove loop device: Device or resource busy
This seems to be clear race condition, and attaching strace is generally enough
to "win" the race. But even with strace attached, we will fail occasionally.
Let's wait a bit and retry. With the wait, on my machine, the second attempt
always succeeds:
...
Found root partition, writable of type btrfs at #-1 (/dev/block/7:9)
ioctl(5, LOOP_CLR_FD) = 0
ioctl(3, LOOP_CTL_REMOVE, 9) = -1 EBUSY (Device or resource busy)
ioctl(3, LOOP_CTL_REMOVE, 9) = 9
+++ exited with 0 +++
Without the wait, all 64 attempts will occasionally fail.
The function no longer returns the fd. This complicated semantics, because it
wasn't clear what holds the ownership: the return value or the output
parameter. There were no users of the fd in the return value, so let's
simplify things conceptually and only return the fd once.
Reduce the scope of variables.
LOOP_CLR_FD was called on the wrong fd. Let's use a cleanup function to make
this automatic and reduce chances of a mixup in the future.
CID 1408498.
CID 1409488.
This code was added in 903659e7b2. The change
that is done here is a simple fix to avoid use of a
unitialized/wrongly-initialized variable, but the bigger issue is that nothing
looks at the returned result to distinguish between 0 and a positive return
value.
At the beginning of seccomp_memory_deny_write_execute architectures
can set individual filter_syscall, block_syscall, shmat_syscall values.
The former two are then used in the call to add_seccomp_syscall_filter
but shmat_syscall is not.
Right now all shmat_syscall values are the same, so the change is a
no-op, but if ever an architecture is added/modified this would be a
subtle source for a mistake so fix it by using shmat_syscall later.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
If seccomp_memory_deny_write_execute was fatally failing to load rules it
already returned a bad retval.
But if any adding filters failed it skipped the subsequent seccomp_load and
always returned an rc of 0 even if no rule was loaded at all.
Lets fix this requiring to (non fatally-failing) load at least one rule set.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
So far we silently convert negative return values from run() as
EXIT_FAILURE, which is how UNIX expects it. In many cases it would be
very useful for the caller to retrieve the actual error number we exit
with. Let's generically return that via sd_notify()'s ERRNO= attribute.
This means callers can set $NOTIFY_SOCKET and get the actual error
number delivered at their doorstep just like that.
The code was using timeout=0 as the default option string. This option string
was ultimately passed to generator_write_timeouts(), which only looks for
comment=systemd.device-timeout= or x-systemd.device-timeout=, i.e. the whole
call path was bogus. Let's rework this: generator_write_timeouts() now writes
any timeouts if configured by the user. create_disk() writes out it's own
timeout, but with lower priority. Since the code path that was calling
timeout=0 was not effective, the only change is that we stop overwriting the
timeout if explicitly configured by the user.
In both code paths, ignore failure to write.
This adds json_dispatch_const_string() which is similar to
json_dispatch_string() but doesn't store a strdup()'ed copy of the
string, but a pointer directly into the JSON record.
This should simplify cases where the json variant sticks around long
enough anyway.
Let's add a concept of normalization: as preparation for signing json
records let's add a mechanism to bring JSON records into a well-defined
order so that we can safely validate JSON records.
This adds two booleans to each JsonVariant object: "sorted" and
"normalized". The latter indicates whether a variant is fully sorted
(i.e. all keys of objects listed in alphabetical order) recursively down
the tree. The former is a weaker property: it only checks whether the
keys of the object itself are sorted. All variants which are
"normalized" are also "sorted", but not vice versa.
The knowledge of the "sorted" property is then used to optimize
searching for keys in the variant by using bisection.
Both properties are determined at the moment the variants are allocated.
Since our objects are immutable this is safe.
This will call json_variant_sensitive() internally while parsing for
each allocated sub-variant. This is better than calling it a posteriori
at the end, because partially parsed variants will always be properly
erased from memory this way.
An object marked with this flag will be erased from memory when it is
freed. This is useful for dealing with sensitive data (key material,
passphrases) encoded in JSON objects.
We can break if KEYCTL_READ return value is equal to our buffer size.
From keyctl(2):
On a successful return, the return value is always the total size of
the payload data. To determine whether the buffer was of sufficient
size, check to see that the return value is less than or equal to the
value supplied in arg4.