Even with the new keyed hash table journal feature: if an attacker
manages to get access to the journal file ID, they could synthesize records
that result in hash collisions. Let's rotate automatically when we
notice that, so that a new journal file ID is generated, our performance
is restored and the attacker has to guess a new file ID before being
able to trigger the issue again.
That said, untrusted peers should never get access to journal files in
the first place...
This adds a new (incompatible) feature to journal files: if enabled the
hash function used for the hash tables is no longer jenkins hash with a
zero key, but siphash keyed by the file uuid that is included in the
file header anyway. This should make our hash tables more robust against
collision attacks, as long as the attacker has no read access to the
journal files. We switch from jenkins to siphash simply because it's
better known and the rest of our codebase is standardizing on it.
This is hardening in order to make collision attacks harder for clients
that can forge log messages but have no read access to the logs. It has
no effect on clients that have read access.
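To illustrate the idea of keying the hash by the file ID, here is a minimal, self-contained sketch of SipHash-2-4 taking a 16-byte key. The function name and the way the key is passed are assumptions made for this example; it is not the code used by the journal, only the publicly specified algorithm written out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL64(x, b) (((x) << (b)) | ((x) >> (64 - (b))))

/* Read eight bytes as a little-endian 64-bit integer. */
static uint64_t load_le64(const uint8_t *p) {
        uint64_t r = 0;
        for (int i = 0; i < 8; i++)
                r |= (uint64_t) p[i] << (8 * i);
        return r;
}

/* One SipRound as defined by the SipHash specification. */
static void sipround(uint64_t v[4]) {
        v[0] += v[1]; v[1] = ROTL64(v[1], 13); v[1] ^= v[0]; v[0] = ROTL64(v[0], 32);
        v[2] += v[3]; v[3] = ROTL64(v[3], 16); v[3] ^= v[2];
        v[0] += v[3]; v[3] = ROTL64(v[3], 21); v[3] ^= v[0];
        v[2] += v[1]; v[1] = ROTL64(v[1], 17); v[1] ^= v[2]; v[2] = ROTL64(v[2], 32);
}

/* SipHash-2-4 of 'data', keyed by 16 bytes (for example a file ID). */
uint64_t siphash24(const void *data, size_t len, const uint8_t key[16]) {
        const uint8_t *in = data;
        size_t full = len & ~(size_t) 7;
        uint64_t k0, k1, b;

        assert(data || len == 0);
        assert(key);

        k0 = load_le64(key);
        k1 = load_le64(key + 8);

        uint64_t v[4] = {
                k0 ^ 0x736f6d6570736575ULL,
                k1 ^ 0x646f72616e646f6dULL,
                k0 ^ 0x6c7967656e657261ULL,
                k1 ^ 0x7465646279746573ULL,
        };

        /* Compress all complete 8-byte words. */
        for (size_t i = 0; i < full; i += 8) {
                uint64_t m = load_le64(in + i);
                v[3] ^= m;
                sipround(v); sipround(v);
                v[0] ^= m;
        }

        /* Last word: remaining bytes plus the length in the top byte. */
        b = (uint64_t) len << 56;
        for (size_t i = full; i < len; i++)
                b |= (uint64_t) in[i] << (8 * (i - full));
        v[3] ^= b;
        sipround(v); sipround(v);
        v[0] ^= b;

        /* Finalization: four more rounds. */
        v[2] ^= 0xff;
        sipround(v); sipround(v); sipround(v); sipround(v);

        return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

With a zero key every file hashes a given payload to the same bucket; with a per-file key the bucket assignment changes whenever the key changes, which is what rotation relies on.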
Let's prefix this with "jenkins_" since it wraps the jenkins hash. We
want to add support for other hash functions to journald soon, hence
better be clear with what this is. In particular as all other symbols
defined by lookup3.h actually are prefixed "jenkins_".
Let's clean this up a bit, following our usual nomenclature to name
return parameters ret-xyz.
This is mostly a bit of renaming, but there's also some minor other
changes: if we return a pointer to a mmap'ed object plus its offset, in
almost all cases we are happy if either parameter is NULL in case the
caller is not interested in it. Let's fix the remaining case to do this
too, to minimize surprises.
The object flags field is a bitmask, hence don't sloppily define
_OBJECT_COMPRESSED_MAX as one more than the previous flag. That worked OK
as long as we only had two flags, but will fall apart as soon as we have
three. Let's fix this.
(It's kinda sloppy how the string table is built here, as it will be
quite sparse as soon as we have more enum entries, but let's keep it for
now.)
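The problem with "last flag plus one" can be seen in a small sketch. The flag names below are invented for illustration and the third flag is hypothetical; only the arithmetic matters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit flags; FLAG_COMPRESSED_C is a hypothetical third flag. */
enum {
        FLAG_COMPRESSED_A = 1 << 0,
        FLAG_COMPRESSED_B = 1 << 1,
        FLAG_COMPRESSED_C = 1 << 2,

        /* Sloppy: with two flags, 2 + 1 = 3 happens to equal the mask 1|2.
         * With three flags, 4 + 1 = 5 (binary 101) no longer covers bit 1. */
        FLAG_SLOPPY_MAX = FLAG_COMPRESSED_C + 1,

        /* Robust: build the mask by OR-ing every defined flag. */
        FLAG_COMPRESSION_MASK = FLAG_COMPRESSED_A | FLAG_COMPRESSED_B | FLAG_COMPRESSED_C,
};

/* Returns true if 'flags' contains no bits outside the known mask. */
static bool flags_are_known(uint8_t flags) {
        return (flags & ~FLAG_COMPRESSION_MASK) == 0;
}
```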
Instead of reading these files at startup and never again, let's read
them when we need them. As an optimization (in particular as some of
these files contain the data for many fields at once) let's cache the
results as long as the stat data (i.e. mtime) remains stable.
Also, while we are at it, if we can't read any of these props, let's not
fail everything, but continue without the data.
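A rough sketch of such a stat-based cache follows. The type and function names are made up for this example, and comparing device, inode, size and whole-second mtime is an assumption about what "stable" means; the real code may compare different fields.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

typedef struct CachedFile {
        bool valid;        /* true once 'contents' has been filled in */
        struct stat st;    /* stat data seen when 'contents' was read */
        char *contents;    /* NUL-terminated file contents */
        size_t size;
} CachedFile;

static bool stat_unchanged(const struct stat *a, const struct stat *b) {
        return a->st_dev == b->st_dev &&
               a->st_ino == b->st_ino &&
               a->st_size == b->st_size &&
               a->st_mtime == b->st_mtime;
}

/* Returns 1 if the file was (re-)read, 0 if the cached copy is still
 * current, and a negative errno value on failure. On failure the cache
 * is left untouched, so callers may continue with the old data. */
int cached_file_read(const char *path, CachedFile *c) {
        struct stat st;

        assert(path);
        assert(c);

        if (stat(path, &st) < 0)
                return -errno;

        if (c->valid && stat_unchanged(&c->st, &st))
                return 0;

        FILE *f = fopen(path, "r");
        if (!f)
                return -errno;

        char *buf = malloc((size_t) st.st_size + 1);
        if (!buf) {
                fclose(f);
                return -ENOMEM;
        }

        size_t n = fread(buf, 1, (size_t) st.st_size, f);
        fclose(f);
        buf[n] = 0;

        free(c->contents);
        c->contents = buf;
        c->size = n;
        c->st = st;
        c->valid = true;
        return 1;
}
```

A caller that treats a negative return value as "no data available" and carries on gets the tolerant behaviour described above.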
Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a
pkcs7 signature, with the public key in the kernel keyring,
is available. Use it if libcryptsetup supports it.
Since cryptsetup 2.3.0 a new API to verify dm-verity volumes by a
pkcs7 signature, with the public key in the kernel keyring,
is available. Use it if libcryptsetup supports it in the
veritysetup helper binary.
https://tools.ietf.org/html/draft-knodel-terminology-02
https://lwn.net/Articles/823224/
This gets rid of most but not all occasions of these loaded terms:
1. scsi_id and friends are something that is supposed to be removed from
our tree (see #7594)
2. The test suite defines an API used by the ubuntu CI. We can remove
this too later, but this needs to be done in sync with the ubuntu CI.
3. In some cases the terms are part of APIs we call or where we expose
concepts the kernel names the way it names them. (In particular all
remaining uses of the word "slave" in our codebase are like this,
it's used by the POSIX PTY layer, by the network subsystem, the mount
API and the block device subsystem). Getting rid of the term in these
contexts would mean doing some major fixes of the kernel ABI first.
Regarding the replacements: when whitelist/blacklist is used as a noun we
replace it with allow list/deny list, and when used as a verb with
allow-list/deny-list.
This fixes commit db2b8d2e28 that
rectified parsing empty values but broke parsing explicit infinity.
Intended parsing semantics will be captured in a testcase in a follow up
commit.
Ref: #16248
Presently, CLI utilities such as systemctl will check whether they have a tty
attached or not to decide whether to parse /proc/cmdline or EFI variable
SystemdOptions looking for systemd.log_* entries.
But this check will be misleading if these tools are being launched by a
daemon, such as a monitoring daemon or automation service that runs in
background.
Make log handling of CLI tools uniform by never checking /proc/cmdline or EFI
variables to determine the logging level.
Furthermore, introduce a new log_setup_cli() shortcut to set up common options
used by most command-line utilities.
Also use double space before the tracking args at the end. Without
the comma this looks ugly, but it's a bit better with the double space.
At least it doesn't look like a variable with a type.
I'm not sure if I understand the code correctly, but it seems that if
storing in the second set failed, we'd return with the first set having
no reference on the link object, and the link object could be freed in the
future, leaving the set with a dangling reference.
On error, we'd just free the object, and not close the fd.
While at it, let's use set_ensure_consume() to make sure we don't leak
the object if it was already in the set. I'm not sure if that condition
can be achieved.
_cleanup_set_free_ is enough for unit_files, because unit_files is
allocated in set_put_strdup(), which uses string_hash_ops_free.
This fixes a leak if marker was already present in the table.
This combines set_ensure_allocated() with set_consume(). The cool thing is that
because we know the hash ops, we can correctly free the item if appropriate.
Similarly to set_consume(), the goal is to simplify handling of the case where
the item needs to be freed on error and if already present in the set.
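The ownership rule can be sketched with a toy string set. Everything here (ToySet, toy_set_ensure_consume(), the array-backed storage) is hypothetical and only mirrors the described semantics: the set is allocated on demand, and the item is freed whenever it does not end up stored.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* A toy, array-backed set of heap-allocated strings. */
typedef struct ToySet {
        char **items;
        size_t n;
} ToySet;

/* Small helper so the example does not depend on POSIX strdup(). */
char *toy_strdup(const char *s) {
        size_t l = strlen(s) + 1;
        char *c = malloc(l);
        return c ? memcpy(c, s, l) : NULL;
}

/* Takes ownership of 'item' in all cases. Returns 1 if the item was
 * added, 0 if an equal string was already present (item is freed), and
 * -ENOMEM on allocation failure (item is freed). */
int toy_set_ensure_consume(ToySet **s, char *item) {
        assert(s);
        assert(item);

        if (!*s) {
                *s = calloc(1, sizeof(ToySet));
                if (!*s) {
                        free(item);
                        return -ENOMEM;
                }
        }

        for (size_t i = 0; i < (*s)->n; i++)
                if (strcmp((*s)->items[i], item) == 0) {
                        free(item);
                        return 0;
                }

        char **t = realloc((*s)->items, ((*s)->n + 1) * sizeof(char *));
        if (!t) {
                free(item);
                return -ENOMEM;
        }

        t[(*s)->n++] = item;
        (*s)->items = t;
        return 1;
}

void toy_set_free(ToySet *s) {
        if (!s)
                return;
        for (size_t i = 0; i < s->n; i++)
                free(s->items[i]);
        free(s->items);
        free(s);
}
```

The caller never has to free the item itself, which removes the error paths where a leak could otherwise hide.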
* Drop mac_selinux_use() condition from mac_selinux_free(): if the
passed pointer holds memory we want to free it even if SELinux is
disabled
* Drop NULL-check because man:freecon(3) states that freecon(NULL) is a
well-defined NOP
* Assert that on non-SELinux builds the passed pointer is always NULL,
to avoid memory leaks
Previously we used the existence of a specific AF_UNIX socket in the
abstract namespace as lock for disabling lookup recursions. (for
breaking out of the loop: userdb synthesized from nss → nss synthesized
from userdb → userdb synthesized from nss → …)
I did it like that because it promised to work the same both in static
and in dynamically linked environments and is accessible easily from any
programming language.
However, it has a weakness regarding reuse attacks: the socket is
securely hashed (siphash) from the thread ID in combination with the
AT_RANDOM secret. Thus it should not be guessable by an attacker in
advance. That's only true if a thread takes the lock only once and
keeps it forever. However, if a thread takes and releases it multiple
times an attacker might monitor that and quickly take the lock
after the first iteration for follow-up iterations.
It's not a big issue given that userdb (as the primary user for this)
never released the lock and we never made the concept a public
interface, and it was only included in one release so far, but it's
something that deserves fixing. (moreover it's a local DoS only, only
permitting to disable native userdb lookups)
With this rework the libnss_systemd.so.2 module will now export two
additional symbols. These symbols are not used by glibc, but can be used
by arbitrary programs: one can be used to disable nss-systemd, the other
to check if it is currently disabled.
The lock is per-thread. It's slightly less pretty, since it requires
people to manually link against C code via dlopen()/dlsym(), but it
should work safely without the aforementioned weakness.
This just adds a _cleanup_ helper call encapsulating dlclose().
This also means libsystemd-shared is linked against libdl now. I don't
think this is much of an issue, since libdl is part of glibc anyway, and
anything but exotic. It's not an optional part of the OS (think: NSS
requires dynamic linking), hence this pulls in no deps and is almost
certainly loaded into all process' memory anyway.
[zj: use DEFINE_TRIVIAL_CLEANUP_FUNC().]
This reverts commit 53aa85af24.
The reason is that that patch changes the dbus api to be different than
the types declared by introspection api.
Replaces #16122.
This reverts commit 097537f07a.
At least Fedora and Debian have already reverted this at the distro
level because it causes more problems than it solves. Arch is debating
reverting it as well [0] but would strongly prefer that this happens
upstream first. Fixes #15188.
[0] https://bugs.archlinux.org/task/66458
If an autostart file for GNOME has a phase specified, then this implies
it is a session service that needs to be started at a specific time.
We have no way of handling the ordering, and while it does make sense
to explicitly hide these services with X-systemd-skip, there is no point
in even trying to handle them.
Since #15533 we didn't create the mount point for selinuxfs anymore.
Before it we created it twice because we mount selinuxfs twice: once the
superblock, and once we remount its bind mount read-only. The second
mkdir would mean we'd chown() the host version of selinuxfs (since
there's only one selinuxfs superblock kernel-wide).
The right time to create the mount point is once: before we mount the
selinuxfs. But not a second time for the remount.
Fixes: #16032
We'd try to map a zero-byte buffer from a NULL pointer, which is undefined behaviour.
src/systemd/src/libsystemd/sd-bus/bus-message.c:3161:60: runtime error: applying zero offset to null pointer
#0 0x7f6ff064e691 in find_part /work/build/../../src/systemd/src/libsystemd/sd-bus/bus-message.c:3161:60
#1 0x7f6ff0640788 in message_peek_body /work/build/../../src/systemd/src/libsystemd/sd-bus/bus-message.c:3283:16
#2 0x7f6ff064e8db in enter_struct_or_dict_entry /work/build/../../src/systemd/src/libsystemd/sd-bus/bus-message.c:3967:21
#3 0x7f6ff06444ac in bus_message_enter_struct /work/build/../../src/systemd/src/libsystemd/sd-bus/bus-message.c:4009:13
#4 0x7f6ff0641dde in sd_bus_message_enter_container /work/build/../../src/systemd/src/libsystemd/sd-bus/bus-message.c:4136:21
#5 0x7f6ff0619874 in sd_bus_message_dump /work/build/../../src/systemd/src/libsystemd/sd-bus/bus-dump.c:178:29
#6 0x4293d9 in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/fuzz/fuzz-bus-message.c:39:9
#7 0x441986 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:558:15
#8 0x44121e in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:470:3
#9 0x443164 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) /src/libfuzzer/FuzzerLoop.cpp:770:7
#10 0x4434bc in fuzzer::Fuzzer::Loop(std::__1::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) /src/libfuzzer/FuzzerLoop.cpp:799:3
#11 0x42d2bc in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:846:6
#12 0x42978a in main /src/libfuzzer/FuzzerMain.cpp:19:10
#13 0x7f6fef13c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#14 0x407808 in _start (out/fuzz-bus-message+0x407808)
set_put()/set_ensure_put() return 0, not -EEXIST, if the entry is already
found in the set. In this case this does not make any difference, but let's
not confuse the reader.
We would say "ignoring", but invalidate the peer anyway.
Let's only do that if we modified the peer irreparably.
Also add comments explaining allocation handling.
Patch contains a coccinelle script, but it only works in some cases. Many
parts were converted by hand.
Note: I did not fix errors in return value handling. This will be done
separately to keep the patch comprehensible. No functional change is intended
in this patch.
It's such a common operation to allocate the set and put an item in it,
that it deserves a helper. set_ensure_put() has the same return values
as set_put().
Comes with tests!
Access to bit fields is less efficient, and since the Manager is a singleton,
a byte or two of space in the structure doesn't matter at all. (And in this
particular case, because of alignment issues, we wouldn't save anything
anyway.)
The key macros added in commit 6fe95d3020 look strange at first sight.
Add a comment with just the tablet name after each line, so that it's
obvious that these lines address device-specific issues of the EFI
firmware, and not broken/old code.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
This is an attempt to clean up the POP3/SMTP/LPR/… DHCP lease server
data logic in networkd. This reduces code duplication and fixes a number
of bugs.
This removes any support for collecting POP3/SMTP/LPR servers acquired
via local DHCP client leases since no one uses that, and given how old
these protocols are I doubt this will change. It keeps support for
configuring them for the dhcp server however.
The differences between the DNS/NTP/SIP/POP3/SMTP/LPR configuration
logics are minimized.
This removes the relevant symbols from sd-network.h (which is an
internal API only at this point after all).
This is unfortunately not well tested, given the old code for this had
barely any tests. But the new code should not perform worse at least,
and allow us to release, since it corrects some interfaces visible in
the .network configuration format.
Fixes: #15943
../src/core/main.c: In function 'main':
../src/core/main.c:2637:32: error: implicit declaration of function 'cache_efi_options_variable'; did you mean 'systemd_efi_options_variable'? [-Werror=implicit-function-declaration]
(void) cache_efi_options_variable();
^~~~~~~~~~~~~~~~~~~~~~~~~~
systemd_efi_options_variable
Strictly speaking this is a compat breakage, but given the tool was
added only in the last release, let's try to sail under the radar, and
fix this early before anyone notices it wasn't always supported.
When a key is pressed, the EFI firmware gives us a 64-bit word that
contains the modifier key code in the upper 32 bits, the scan code in
the middle 16 bits, and a unicode character in the low 16 bits.
Some bogus EFI firmwares will put the unicode character in the scan code
area, for instance on the EZpad mini 4s tablet.
Others will even put the unicode character in both the scan code and
unicode areas. This is the case for instance on the Teclast X98+ II
tablet.
Add workarounds for these corner cases, only for the carriage return key
right now. Some more workarounds may be needed, e.g. for volume keys,
but I cannot test it.
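The layout of the key word described above can be written down as a few helpers. The names are invented for this sketch, and the three carriage-return encodings are taken from the description (regular, character in the scan code area, character in both areas); modifiers are ignored here for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHAR_CR 0x000D /* Unicode carriage return */

/* Pack modifiers (upper 32 bits), scan code (middle 16 bits) and
 * Unicode character (low 16 bits) into one 64-bit key word. */
static inline uint64_t key_pack(uint32_t modifiers, uint16_t scan, uint16_t uni) {
        return ((uint64_t) modifiers << 32) | ((uint64_t) scan << 16) | uni;
}

static inline uint32_t key_modifiers(uint64_t k) { return (uint32_t) (k >> 32); }
static inline uint16_t key_scan(uint64_t k)      { return (uint16_t) (k >> 16); }
static inline uint16_t key_unicode(uint64_t k)   { return (uint16_t) k; }

/* Returns true for the regular encoding of carriage return and for the
 * two bogus firmware variants described in the text. */
static inline bool key_is_carriage_return(uint64_t k) {
        uint16_t scan = key_scan(k), uni = key_unicode(k);

        return (scan == 0 && uni == CHAR_CR) ||        /* well-behaved firmware */
               (scan == CHAR_CR && uni == 0) ||        /* character in the scan code area */
               (scan == CHAR_CR && uni == CHAR_CR);    /* character in both areas */
}
```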
Partially fixes #8466.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
The original logic was logging an "ignored" debug message, but it was still
going ahead and calling proc_cmdline_parse_given() on the NULL line. Fix that
to skip that explicitly when the EFI variable wasn't really read.
Depending on whether the system has been scheduled for shutdown or for reboot,
print the corresponding message (and not only "Shutdown"). Printing the "wrong"
message when rebooting will mislead and panic people. I get these messages via
cron from remote servers, and it would be bad if those systems actually *did*
shut down, as the email from cron is telling me. Those messages cause an
adrenaline spike in our team, which wouldn't happen if the message was
"correct".
Fixes #16129.
Cache it early in startup of the system manager, right after `/run/systemd` is
created, so that further access to it can be done without accessing the EFI
filesystem at all.
The only way to control "ShowStatus" property programmatically was to use the
signal API and wait until the property "ShowStatus" switched to the new value.
This interface is rather cumbersome to use and doesn't allow temporarily
overriding the current setting and later restoring the overridden value in a
race-free manner.
The new method also accepts the empty string as argument, which restores the
initial value of ShowStatus, i.e. the value before it was overridden by this
method.
Fixes: #11447.
Half of find_hibernation_location() logged at debug level, the other
half logged at error level, and the third half didn't log at all.
Let's clean this up somewhat. Since can_sleep() is probably more
a library-style function let's downgrade everything to LOG_DEBUG and
then make sure sleep.c logs at error level, as the main program.
Prompted by the discussion on #16110, let's migrate more code to
fd_wait_for_event().
This only leaves 7 places where we call into poll()/ppoll() directly in
our entire codebase. (one of which is fd_wait_for_event() itself)
Use -Dstandalone-binaries=yes to enable building and installing this standalone
version of the binary without a dependency on the systemd-shared solib.
Also move the list of sources for systemd-tmpfiles to its own meson.build file.
"less" doesn't properly reset its terminal on SIGTERM, it does so only
on SIGINT. Let's thus configure SIGINT instead of SIGTERM.
I think this is something less should fix too, and clean up things
correctly on SIGTERM, too. However, given that we explicitly enable
SIGINT behaviour by passing "K" to $LESS I figure it makes sense if we
also send SIGINT instead of SIGTERM to match it.
Fixes: #16084
unit_choose_id() is about marking one of the aliases of the unit as the main
name. With the preparatory work in previous patches, all aliases of the unit
must have the same instance, so the operation to update the instance is a noop.
Upon an incoming connection for an accepting socket, we'd create a unit like
foo@0.service, then figure out that the instance name should be e.g. "0-41-0",
and then add the name foo@0-41-0.service to the unit. This obviously violates
the rule that any service needs to have a constant instance part.
So let's reverse the order: we first determine the instance name and then
create the unit with the correct name from the start.
There are two cases where we don't know the instance name:
- analyze-verify: we just do a quick check that the instance unit can be
created. So let's use a bogus instance string.
- selinux: the code wants to load the service unit to extract the ExecStart path
and query it for the selinux label. Do the same as above.
Note that in both cases it is possible that the real unit that is loaded could
be different than the one with the bogus instance value, for example if there
is a dropin for a specific instance name. We can't do much about this, since we
can't figure out the instance name in advance. The old code had the same
shortcoming.
They were added recently in acd1987a18. We can
make them more informative by using unit_type_to_string() and not repeating
unit names as much. Also, %m should not be used together with SYNTHETIC_ERRNO().
We would check that the instance is present in both units (or missing in both).
But when it is defined, it should be the same in both. The comment in the code
was explicitly saying that differing instance strings are allowed, but this
mostly seems to be a left-over from old times. The man page is pretty clear:
> the instance (if any) is always uniquely defined for a given unit and all its
> aliases.
We allocated the names set for each unit, but in the majority of cases, we'd
put only one name in the set:
$ systemctl show --value -p Names '*'|grep .|grep -v ' '|wc -l
564
$ systemctl show --value -p Names '*'|grep .|grep ' '|wc -l
16
So let's add a separate .id field, and only store aliases in the set, and only
create the set if there's at least one alias. This requires a bit of gymnastics
in the code, but I think this optimization is worth the trouble, because we
save one object for many loaded units.
In particular set_complete_move() wasn't very useful because the target
unit would always have at least one name defined, i.e. the optimization to
move the whole set over would never fire.
poll() sets POLLNVAL inside of the poll structures if an invalid fd is
passed. So far we generally didn't check for that, thus not taking
notice of the error. Given that this specific kind of error is generally
indication of a programming error, and given that our code is embedded
into our projects via NSS or because people link against our library,
let's explicitly check for this and convert it to EBADF.
(I ran into a busy loop because of this missing check when some of my
test code accidentally closed an fd it shouldn't close, so this is a
real thing)
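A minimal sketch of that check, written as a small wrapper around poll(). The wrapper name and return convention are assumptions for this example and are not necessarily those of fd_wait_for_event().

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Waits for 'events' on a single fd. Returns the reported revents (> 0)
 * when the fd is ready, 0 on timeout, and a negative errno value on
 * failure. POLLNVAL is turned into -EBADF instead of being returned as
 * if it were a regular event. */
int wait_for_fd_event(int fd, short events, int timeout_ms) {
        struct pollfd p = { .fd = fd, .events = events };
        int r;

        assert(fd >= 0);

        r = poll(&p, 1, timeout_ms);
        if (r < 0)
                return -errno;
        if (r == 0)
                return 0;

        /* The fd is not open: almost certainly a programming error. */
        if (p.revents & POLLNVAL)
                return -EBADF;

        return p.revents;
}
```

Without the POLLNVAL check a caller that loops until the fd becomes readable would spin, since poll() keeps returning immediately for a closed fd.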
Let systemd load a set of pre-compiled AppArmor profile files from a policy
cache at /etc/apparmor/earlypolicy. Maintenance of that policy cache must be
done outside of systemd.
After successfully loading the profiles systemd will attempt to change to a
profile named systemd.
If systemd is already confined in a profile, it will not load any profile files
and will not attempt to change its profile.
If anything goes wrong, systemd will only log failures. It will not fail to
start.
Let's make sure $XDG_RUNTIME_DIR for the user instance and /run for the
system instance is always organized the same way: the "inaccessible"
device nodes should be placed in a subdir of either called "systemd" and
a subdir of that called "inaccessible".
This way we can emphasize the common behaviour, and only differ where
really necessary.
Follow-up for #13823
On systemd systems we generally don't need to chdir() to root, we don't
need to setup /dev/ ourselves (as PID 1 does that during earliest boot),
and we don't need to set the OOM adjustment values, as that's done via
unit files.
Hence, drop this. If people want to use udev from other init systems
they should do this on their own. I am very sure it's a good thing to do
it from outside of udevd, so that fewer privileges are required by udevd. In
particular the dev_setup() stuff is something that people who build
their own non-systemd distros want to set up themselves anyway, in
particular as they already have to mount devtmpfs themselves anyway.
Note that this only drops stuff that isn't really necessary for testing
stuff, i.e. process properties and settings that don't matter if you
quickly want to invoke udev from a terminal session to test something.
This doesn't fix anything IRL, but is a bit cleaner, since it makes sure
that arg_type is properly passed to crypt_load() in all cases.
We actually never set arg_type to CRYPT_LUKS2, which is why this wasn't
noticed before, but theoretically this might change one day, and
existing comments suggest it as possible value for arg_type, hence let's
process it properly.
Let's do automatic discovery only for our native LUKS/LUKS2 headers,
since they are Linux stuff, and let's require BitLocker to be
requested explicitly.
This makes sure cryptsetup with neither "luks" nor "bitlk" in the
option string will work. Right now it would fail because we'd load the
superblock once with luks and once with bitlk and one of them would
necessarily fail.
Follow-up for #15979
dm-verity support in dissect-image at the moment is restricted to GPT
volumes.
If the image is a single-filesystem type without a partition table (e.g. squashfs)
and a roothash/verity file are passed, set the verity flag and mark as
read-only.
The usual behaviour when a timeout expires is to terminate/kill the
service. This is what users usually want in production systems. To debug
services that fail to start/stop (especially sporadic failures) it
might be necessary to trigger the watchdog machinery and write core
dumps, though. Likewise, it is usually just a waste of time to
gracefully stop a stuck service. Instead it might save time to go
directly into kill mode.
This commit adds two new options to services: TimeoutStartFailureMode=
and TimeoutStopFailureMode=. Both take the same values and tweak the
behavior of systemd when a start/stop timeout expires:
* 'terminate': is the default behaviour as it has always been,
* 'abort': triggers the watchdog machinery and will send SIGABRT
(unless WatchdogSignal was changed) and
* 'kill' will directly send SIGKILL.
To handle the stop failure mode in stop-post state too a new
final-watchdog state needs to be introduced.
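The three modes can be summarized as a mapping from mode to the first signal sent when the timeout expires. This is only a sketch: the enum and function names are invented, and it assumes the default stop signal SIGTERM for 'terminate' and treats a non-positive watchdog signal as "not configured".

```c
#include <signal.h>

/* Hypothetical names, for illustration only. */
typedef enum ToyTimeoutFailureMode {
        TOY_TIMEOUT_TERMINATE,  /* default: regular graceful termination */
        TOY_TIMEOUT_ABORT,      /* trigger the watchdog machinery */
        TOY_TIMEOUT_KILL,       /* go directly to SIGKILL */
} ToyTimeoutFailureMode;

/* Returns the first signal to send once a start/stop timeout expires.
 * 'watchdog_signal' stands in for a configured WatchdogSignal value;
 * pass 0 to use the default SIGABRT. */
int toy_timeout_signal(ToyTimeoutFailureMode mode, int watchdog_signal) {
        switch (mode) {
        case TOY_TIMEOUT_ABORT:
                return watchdog_signal > 0 ? watchdog_signal : SIGABRT;
        case TOY_TIMEOUT_KILL:
                return SIGKILL;
        case TOY_TIMEOUT_TERMINATE:
        default:
                return SIGTERM;
        }
}
```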
The fact that m->show_status was serialized/deserialized made any further
customisation of this setting via system.conf impossible. IOW the value was
basically always locked unless it was changed via signals.
This patch reworks the handling of m->show_status but also makes sure that if
the value was changed via the signal API then this value is kept and preserved
across PID1 reexecuting or reloading.
Note: this effectively means that once the value is set via the signal
interface, it can be changed again only through the signal API.
The name 'manager_get_show_status()' suggests that the function simply reads
the property 'show_status' of the manager and hence returns a 'StatusType'
value.
However it was doing more than that since it contained the logic (based on
'show_status' but also on the state of the manager) to figure out if status
message could be emitted to the console.
Hence this patch renames the function to 'manager_should_show_status()'. The
previous name will be reused in a later patch to effectively return the value
of 'show_status' property.
No functional change.
Single filesystem images are mounted from the /dev/block/X:Y symlink
rather than /dev/loopZ, so we need to wait for udev to create it or
mounting will be racy and occasionally fail.