It's cheap to get RDRAND and given that srand() is anyway not really
useful for trusted randomness let's use RDRAND for it, after all we have
all the hard work for that already in place.
The sd_device object always has syspath and sysname, but subsytem may not.
Also, it may take some costs to get subsystem.
So, let's drop subsystem from logs.
A race condition happens when calling ask_password_auto() multiple times
to unlock several disks on boot and effectively no password caching is
utilized. This patch fixes it by polling the cache when waiting for
the password.
This splits the "environment" field of Manager into two:
transient_environment and client_environment. The former is generated
from configuration file, kernel cmdline, environment generators. The
latter is the one the user can control with "systemctl set-environment"
and similar.
Both sets are merged transparently whenever needed. Separating the two
sets has the benefit that we can safely flush out the former while
keeping the latter during daemon reload cycles, so that env var settings
from env generators or configuration files do not accumulate, but
dynamic API changes are kept around.
Note that this change is not entirely transparent to users: if the user
first uses "set-environment" to override a transient variable, and then
uses "unset-environment" to unset it again things will revert to the
original transient variable now, while previously the variable was fully
removed. This change in behaviour should not matter too much though I
figure.
Fixes: #9972
If networkd starts earlier than all network interfaces are initialized,
then uninitialized interfaces are staying in pending state and cannot
become up.
With this, such interfaces are started after receiving 'change' event.
If WorkingDirectory is on NFS, root might only have the privileges of
nobody and the chdir to the WorkingDirectory might fail, even if the
user running the service would have the proper privileges to chdir to
that directory.
Fixes#10568
There is difference between time set by the user and real elapsed time because of accuracy feature.
If you change the system date(or time) between these times, the timer drops.
You can easily reproduce it with the following command.
-----------------------------------------------------------
$ systemd-run --on-active=3s ls; sleep 3; date -s "`date`"
-----------------------------------------------------------
In the following command, the problem is rarely reproduced. But it exists.
---------------------------------------------------------------------------------------------
$ systemd-run --on-active=3s --timer-property=AccuracySec=1us ls ; sleep 1; date -s "`date`"
---------------------------------------------------------------------------------------------
Note : Global AccuracySec value.
----------------------------------------------------------------------
$ cat /etc/systemd/system.conf
DefaultTimerAccuracySec=1min
----------------------------------------------------------------------
If unit_deserialize() fails (because one read line is overly long), it returns
an error and we would have assumed that the next read would point to the next
unit to deserialize.
But instead unit_deserialize() can leave the file offset in the middle of a
line.
Therefore we need to ignore and skip the current unit in this case too.
While at it, move unit deserialization in a dedicated functions. That should
make the code easier to read.
With lz4 1.8.3, this function can now decompress partial results into a smaller
buffer. The release news don't say anything interesting, but the test case that
was previously failing now works OK.
Fixes#10259.
A test is added. It shows that with *older* lz4, a partial decompression can
occur with the returned size smaller then the requested number of bytes _and_
smaller then the size of the compressed data:
(lz4-libs-1.8.2-1.fc29.x86_64)
Compressed 4194304 → 16464
Decompressed → 4194304
Decompressed partial 12/4194304 → 4194304
Decompressed partial 1/1 → -2 (bad)
Decompressed partial 2/2 → -2 (bad)
Decompressed partial 3/3 → -2 (bad)
Decompressed partial 4/4 → -2 (bad)
Decompressed partial 5/5 → -2 (bad)
Decompressed partial 6/6 → 6 (good)
Decompressed partial 7/7 → 6 (good)
Decompressed partial 8/8 → 6 (good)
Decompressed partial 9/9 → 6 (good)
Decompressed partial 10/10 → 6 (good)
Decompressed partial 11/11 → 6 (good)
Decompressed partial 12/12 → 6 (good)
Decompressed partial 13/13 → 6 (good)
Decompressed partial 14/14 → 6 (good)
Decompressed partial 15/15 → 6 (good)
Decompressed partial 16/16 → 6 (good)
Decompressed partial 17/17 → 6 (good)
Decompressed partial 18/18 → -16459 (bad)
(lz4-libs-1.8.3-1.fc29.x86_64)
Compressed 4194304 → 16464
Decompressed → 4194304
Decompressed partial 12/4194304 → 12
Decompressed partial 1/1 → 1 (good)
Decompressed partial 2/2 → 2 (good)
Decompressed partial 3/3 → 3 (good)
Decompressed partial 4/4 → 4 (good)
...
If we got such a short "successful" decompression in decompress_startswith() as
implemented before this patch, we could be confused and return a false negative
result. But it turns out that this only occurs with small output buffer
sizes. We use greedy_realloc() to manager the buffer, so it is always at least
64 bytes. I couldn't hit a case where decompress_startswith() would actually
return a bogus result. But since the lack of proof is not conclusive, the code
for *older* lz4 is changed too, just to be safe. We cannot rule out that on a
different architecture or with some unlucky compressed string we could hit this
corner case.
The fallback code is guarded by a version check. The check uses a function not
the compile-time define, because there was no soversion bump in lz4 or new
symbols, and we could be compiled against a newer lz4 and linked at runtime
with an older one. (This happens routinely e.g. when somebody upgrades a subset
of distro packages.)
I thought this might fail with lz4 < 1.8.3, but it seems that because of
greedy_realloc, we always use a buffer that is large enough, and it always
passes.
For example, <luks.uuid>=/keyfile:LABEL="KEYFILE FS" previously wouldn't
work, because we truncated label at the first whitespace character,
i.e. LABEL="KEYFILE".
lz4-r130 was released on May 29th, 2015. Let's drop the work-around for older
versions. In particular, we won't test any new code against those ancient
releases, so we shouldn't pretend they are supported.
I went through my antique collection of fuzzers the other day
to see which ones I hadn't sent upstream yet. This one
seems to be nice to have and ready to be merged. As far as I can
tell, it hasn't managed to find anything useful yet,
but it's better to be safe than sorry especially when it comes to networking
code :-)
After curl 7.20.0, this function never returns negative error codes.
Make this consistent with the other call to this function and only
compare against CURLM_OK.
Otherwise doing comparing a CGroupMask (which is unsigned in effect)
with the result of CGROUP_CONTROLLER_TO_MASK() will result in warnings
about signedness differences.
We now have the "BPF" pseudo-controllers. These should never be assumed
to be accessible as /sys/fs/cgroup/<controller> and not through
"cgroup.subtree_control" either, hence always check explicitly before we
go to the file system. We do this through our new CGROUP_MASK_V1 and
CGROUP_MASK_V2 definitions.
Also, when we fail, don't clobber the return value.
This brings the call more in-line with our usual coding style, and
removes surprises.
None of the callers seemed to care about this behaviour.
That way we can pin a specific inode and analyze it and manipulate it
without it being swapped out beneath our hands.
Fixes a vulnerability originally found by Jann Horn from Google.
CVE-2018-15687
LP: #1796692https://bugzilla.redhat.com/show_bug.cgi?id=1639076
When we start a service process we pass the selected watchdog timeout to
it with the $WATCHDOG_USEC environment variable. If the unit file is
reconfigured later, we need to make sure to continue to honour the
original timeout, i.e. watch $WATCHDOG_USEC was set to, otherwise we'll
expect the ping at a different time as the service process is sending it
to us.
Hence, whenever we start a unit, save the watchdog timeout, and stick to
that for everything we do.
Fixes: #9467
Let's make sure we always use the right watchdog timeout: when a service
has overwritten it, then stick to it, also for follow-up processes of
the same service.
This was mostly prompted by seeing the expression "in_initrd() && flags
& PROC_CMDLINE_RD_STRICT", which uses & and && without any brackets.
Let's make that a bit more readable and hide all doubts about operator
precedence.
Let's be more careful with what we serialize: let's ensure we never
serialize strings that are longer than LONG_LINE_MAX, so that we know we
can read them back with read_line(…, LONG_LINE_MAX, …) safely.
In order to implement this all serialization functions are move to
serialize.[ch], and internally will do line size checks. We'd rather
skip a serialization line (with a loud warning) than write an overly
long line out. Of course, this is just a second level protection, after
all the data we serialize shouldn't be this long in the first place.
While we are at it also clean up logging: while serializing make sure to
always log about errors immediately. Also, (void)ify all calls we don't
expect errors in (or catch errors as part of the general
fflush_and_check() at the end.
The predicate function manager_timestamp_shall_serialize() simply says
whether to serialize or not serialize a timestamp, and should make
things a bit easier to read.
This should be much better than fgets(), as we can read substantially
longer lines and overly long lines result in proper errors.
Fixes a vulnerability discovered by Jann Horn at Google.
CVE-2018-15686
LP: #1796402https://bugzilla.redhat.com/show_bug.cgi?id=1639071
"killing" is very UNIX terminology, and not really what this is about.
Let's be more correct and say "send a UNIX signal" for the operation.
Otherwise things are really weird if users call "journalctl --rotate"
from the command line, which internally asks systemd to send SIGUSR2 to
to journald: when german locale is selected this asks the user — roughly
transliterated — whether they want to "eliminate" journald, which is
definitely not the intended meaning.
Before this when asked for rotation we'd only rotate files we have open
anyway. However there might be a number of other files on disk that are
active (i.e. not archived yet) but not open. Let's take care of those
too, so that rotation is always comprehensive, and the user gets the
guarantee that afterthe rotation all stored data is in archived files.
Fixes: #1017
journalctl --vacuum-*= only vacuums archived files. To archive all
active files the rotate operation is used. Let's add a new switch that
combines both, so that the user a single command to first move all
running journal files into archival and then vacuum them.
See: #1017
Let's split out the part that actually renames the file in case we can't
open it into a new function journal_file_dispose().
This way we can reuse the function in other cases where we want to open
a file but can't.
Let's split the function in three: the part where we archive the old
file into journal_file_archive(), and the part where we initiate the
deferred closing into journal_file_initiate_close().
journal_file_rotate() then simply becomes a wrapper around these two
calls, and the opening of the new journal file.
This useful so that we can archive journal files without having to open
new ones, i.e. to do only the archival part of the rotation, without the
rotation part.
Let's store array sizes and indexes in size_t. And let's count numbers
of files in uint64_t (simply because that is the type of the input
parameter for this of the function)
journald calls fd_get_path() a lot (it probably shouldn't, there's some
room for improvement there, but I'll leave that for another time), hence
it's worth optimizing the call a bit, in particular as it's easy.
Previously we'd open the dir /proc/self/fd/ first, before reading the
symlink inside it. This means the whole function requires three system
calls: open(), readlinkat(), close(). The reason for doing it this way
is to distinguish the case when we see ENOENT because /proc is not
mounted and the case when the fd doesn't exist.
With this change we'll directly go for the readlink(), and only if that
fails do an access() to see if /proc is mounted at all.
This optimizes the common case (where the fd is valid and /proc
mounted), in favour of the uncommon case (where the fd doesn#t exist or
/proc is not mounted).
I noticed while profiling journald that we invoke readlinkat() a ton on
open /proc/self/fd/<fd>, and that the returned paths are more often than
not longer than the 99 chars used before, when we look at archived
journal files. This means for these cases we generally need to execute
two rather than one syscalls.
Let's increase the buffer size a tiny bit, so that we reduce the number
of syscalls executed. This is really a low-hanging fruit of
optimization.
Our current set of flags allows an option to be either
use just in initrd or both in initrd and normal system.
This new flag is intended to be used in the case where
you want apply some settings just in initrd or just
in normal system.
Don't add an implicit RequiresMountsFor depenency for the
WorkingDirectory of a unit if the "-" character was used to
indicate that "a missing working directory is not considered fatal"
(see systemd.exec(5)). Otherwise systemd might fail the unit
because of missing dependencies.
This doesn't change anything in the generated source, but I think makes
semantically more sense, as these structures have undefined size, and we
only want to know the size up to the data field in these cases.
First of all, let's fix logging: let's simply log the same message as we
do on success, so that there's always the same pair of these messages
around, regardless if the suspend was successful or not. To distuingish
a successful suspend from a failed one, check the ERRNO= field of the
structured message.
In most ways a failed suspend cycle is not distuingishable from a
successful one that took no time, hence let's treat it this way, and
always pair the success message with a failure message.
This also changes a more important concept: the post-suspend callouts
are now called also called on failure, following the same logic: let's
always run them in pairs: for every pre callout a post callout has to
follow.
Let's be a tiny bit more careful here.
Also, let's rearrange things to simplify them a bit, and to not use "r"
outside of its immediate scope of validity.
Also, let's rename it to rtc_write_wake_alarm(). Both changes together
make sure rtc_write_wake_alarm() and rtc_read_time() are more alike in
their naming and semantics.