$ git grep -e 'This program is free software' -l |grep -v LICENSE | \
xargs perl -i -0pe 's/ \* This program.*?for more details.\s*\*\n( \* You should have.*licenses.>.\n)?//gms'
For some reason they were missed previously. All those files seem to
have proper SDPX tags.
1) mv /var/tmp /var/tmp.old
2) mkdir /tmp/varrr
3) ln -s /tmp/varrr /var/tmp
Now, when a service has PrivateTmp=yes, during namespace setup,
/tmp is first mounted over with a new mount. Then, when /var/tmp
is being resolved, it points to /tmp/varrr, which by then doesn't
exist, because it had already been obscured.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
On overlayfs, FTW_MOUNT causes nftw to not list *any* files because the
condition used by glibc to verify that it's on the same mountpoint doesn't work
on overlayfs, see https://bugzilla.suse.com/show_bug.cgi?id=1096807 for the
details.
However using FTW_MOUNT doesn't seem to be really needed when walking through
the keymap directorie tree. So until the glibc or the kernel is fixed (which
might take some time), let's make localectl works with overlayfs.
There's a small side effect here, by which regular (non-directory) files with
bind mounts will be parsed while they were skipped by the previous logic.
To make debugging easier, this patches allows one to change the log target and
do reload/reexec without modifying configuration permanently, which makes
debugging easier.
Indeed if one changed the log target at runtime (via the bus or via signals),
the change was lost on the next reload/reexecution.
In order to restore back the default value (set via system.conf, environment
variables or any other means ), the empty string in the "LogTarget" property is
now supported as well as sending SIGTRMIN+26 signal.
To make debugging easier, this patches allows one to change the log level and
do reload/reexec without modifying configuration permanently, which makes
debugging easier.
Indeed if one changed the log max level at runtime (via the bus or via
signals), the change was lost on the next daemon reload/reexecution.
In order to restore the original value back (set via system.conf, environment
variables or any other means), the empty string in the "LogLevel" property is
now supported as well as sending SIGRTMIN+23 signal.
Let's add "const" where we don't change structures passed.
Also, we generally use "unsigned char" for IP prefix length values, do
so here too. Previously different parts of the sd-radv.h API used
different types for this.
sd_radv_stop is called from two places. if sd_radv_stop is alrady
success then just don't try to close it .
```
systemd-networkd[604]: RADV: Stopping IPv6 Router Advertisement daemon
systemd-networkd[604]: RADV: Unable to send last Router Advertisement with router lifetime set to zero: Bad file descriptor <==================HERE
systemd-networkd[604]: RADV: Updated prefix 2a0a:*:*:fc::/64 preferred 1h valid 2h
systemd-networkd[604]: RADV: Started IPv6 Router Advertisement daemon
```
Closes one of the issue #8960
C++03: "An rvalue of arithmetic, enumeration, pointer, or pointer to member
type can be converted to an rvalue of type bool. A zero value, null pointer
value, or null member pointer value is converted to false; any other value is
converted to true"
C should behave the same because pointers are scalars in C, but let's verify
that.
They are not needed, because anything that is non-zero is converted
to true.
C11:
> 6.3.1.2: When any scalar value is converted to _Bool, the result is 0 if the
> value compares equal to 0; otherwise, the result is 1.
https://stackoverflow.com/questions/31551888/casting-int-to-bool-in-c-c
DNS queries need timeout values to detect whether a DNS server is
unresponsive or, if the query is sent over UDP, whether a DNS message
was lost and has to be resent. The total time that it takes to answer a
query to arrive is t + RTT, where t is the maximum time that the DNS
server that is being queried needs to answer the query.
An authoritative server stores a copy of the zone that it serves in main
memory or secondary storage, so t is very small and therefore the time
that it takes to answer a query is almost entirely determined by the
RTT. Modern authoritative server software keeps its zones in main memory
and, for example, Knot DNS and NSD are able to answer in less than
100 µs [1]. So iterative resolvers continuously measure the RTT to
optimize their query timeouts and to resend queries more quickly if they
are lost.
systemd-resolved is a stub resolver: it forwards DNS queries to an
upstream resolver and waits for an answer. So the time that it takes for
systemd-resolved to answer a query is determined by the RTT and the time
that it takes the upstream resolver to answer the query.
It seems common for iterative resolver software to set a total timeout
for the query. Such total timeout subsumes the timeout of all queries
that the iterative has to make to answer a query. For example, BIND
seems to use a default timeout of 10 s.
At the moment systemd-resolved derives its query timeout entirely from
the RTT and does not consider the query timeout of the upstream
resolver. Therefore it often mistakenly degrades the feature set of its
upstream resolvers if it takes them longer than usual to answer a query.
It has been reported to be a considerable problem in practice, in
particular if DNSSEC=yes. So the query timeout systemd-resolved should
be derived from the timeout of the upstream resolved and the RTT to the
upstream resolver.
At the moment systemd-resolved measures the RTT as the time that it
takes the upstream resolver to answer a query. This clearly leads to
incorrect measurements. In order to correctly measure the RTT
systemd-resolved would have to measure RTT separately and continuously,
for example with a query with an empty question section or a query for
the SOA RR of the root zone so that the upstream resolver would be able
to answer to query without querying another server. However, this
requires significant changes to systemd-resolved. So it seems best to
postpone them until other issues have been addressed and to set the
resend timeout to a fixed value for now.
As mentioned, BIND seems to use a timeout of 10 s, so perhaps 12 s is a
reasonable value that also accounts for common RTT values. If we assume
that the we are going to retry, it could be less. So it should be enough
to set the resend timeout to DNS_TIMEOUT_MAX_USEC as
DNS_SERVER_FEATURE_RETRY_ATTEMPTS * DNS_TIMEOUT_MAX_USEC = 15 s.
However, this will not solve the incorrect feature set degradation and
should be seen as a temporary change until systemd-resolved does
probe the feature set of an upstream resolver independently from the
actual queries.
[1] https://www.knot-dns.cz/benchmark/
Let's always write "1 << 0", "1 << 1" and so on, except where we need
more than 31 flag bits, where we write "UINT64(1) << 0", and so on to force
64bit values.
This new setting is supposed to be useful in most cases where
"MountFlags=slave" is currently used, i.e. as an explicit way to run a
service in its own mount namespace and decouple propagation from all
mounts of the new mount namespace towards the host.
The effect of MountFlags=slave and PrivateMounts=yes is mostly the same,
as both cause a CLONE_NEWNS namespace to be opened, and both will result
in all mounts within it to be mounted MS_SLAVE. The difference is mostly
on the conceptual/philosophical level: configuring the propagation mode
is nothing people should have to think about, in particular as the
matter is not precisely easyto grok. Moreover, MountFlags= allows configuration
of "private" and "slave" modes which don't really make much sense to use
in real-life and are quite confusing. In particular PrivateMounts=private means
mounts made on the host stay pinned for good by the service which is
particularly nasty for removable media mount. And PrivateMounts=shared
is in most ways a NOP when used a alone...
The main technical difference between setting only MountFlags=slave or
only PrivateMounts=yes in a unit file is that the former remounts all
mounts to MS_SLAVE and leaves them there, while that latter remounts
them to MS_SHARED again right after. The latter is generally a nicer
approach, since it disables propagation, while MS_SHARED is afterwards
in effect, which is really nice as that means further namespacing down
the tree will get MS_SHARED logic by default and we unify how
applications see our mounts as we always pass them as MS_SHARED
regardless whether any mount namespacing is used or not.
The effect of PrivateMounts=yes was implied already by all the other
mount namespacing options. With this new option we add an explicit knob
for it, to request it without any other option used as well.
See: #4393
This drops needless safety checks that ensure we only reference block
devices for blockio/io settings. The backing code was already able to
accept regular file system paths too, in which case the backing device
node of that file system would be used. Hence, let's drop the artificial
restrictions and open up this underlying functionality.
We document the rule that return values >= 0 of functions are supposed
to indicate success, and that in case of success all return parameters
should be initialized. Let's actually do so.
Just a tiny coding style fix-up.
Jun 11 14:29:12 krowka systemd[1]: /etc/systemd/system/workingdir.service:6: = path is not normalizedWorkingDirectory: /../../etc
↓
Jun 11 14:32:12 krowka systemd[1]: /etc/systemd/system/workingdir.service:6: WorkingDirectory= path is not normalized: /../../etc
Since bb28e68477 parsing failures of
certain unit file settings will result in load failures of units. This
introduces a new load state "bad-setting" that is entered in precisely
this case.
With this addition error messages on bad settings should be a lot more
explicit, as we don't have to show some generic "errno" error in that
case, but can explicitly say that a bad setting is at fault.
Internally this unit load state is entered as soon as any configuration
loader call returns ENOEXEC. Hence: config parser calls should return
ENOEXEC now for such essential unit file settings. Turns out, they
generally already do.
Fixes: #9107
The load_error is only valid in some load_state cases, lets generate
prettier messages for other cases too, by reusing the
bus_unit_validate_load_state() call which does jus that.
Clients (such as systemctl) ignored LoadError unles LoadState was
"error" before. With this change they could even show LoadError in other
cases and it would show a useful name.
Let's use a switch() statement, cover more cases with pretty messages.
Also let's rename it to "validate", as that's more specific that
"check", as it implies checking for a "valid"/"good" state, which is
what this function does.
This makes two changes: first of all we will now explicitly check
whether a domain to test against an NSEC record is actually below the
signer's name. This is relevant for NSEC records that chain up the end
and the beginning of a zone: we shouldn't alow that NSEC record to match
against domains outside of the zone.
This also fixes how we handle NSEC checks for domains that are prefixes
of the NSEC RR domain itself, fixing #8164 which triggers this specific
case. The non-wildcard NSEC check is simplified for that, we can
directly make our between check, there's no need to find the "Next
Closer" first, as the between check should not be affected by additional
prefixes. For the wild card NSEC check we'll prepend the asterisk in
this case to the NSEC RR itself to make a correct check.
Fixes: #8164
oss-fuzz flags this as:
==1==WARNING: MemorySanitizer: use-of-uninitialized-value
0. 0x7fce77519ca5 in ascii_is_valid systemd/src/basic/utf8.c:252:9
1. 0x7fce774d203c in ellipsize_mem systemd/src/basic/string-util.c:544:13
2. 0x7fce7730a299 in print_multiline systemd/src/shared/logs-show.c:244:37
3. 0x7fce772ffdf3 in output_short systemd/src/shared/logs-show.c:495:25
4. 0x7fce772f5a27 in show_journal_entry systemd/src/shared/logs-show.c:1077:15
5. 0x7fce772f66ad in show_journal systemd/src/shared/logs-show.c:1164:29
6. 0x4a2fa0 in LLVMFuzzerTestOneInput systemd/src/fuzz/fuzz-journal-remote.c:64:21
...
I didn't reproduce the issue, but this looks like an obvious error: the length
is specified, so we shouldn't use the string with any functions for normal
C-strings.
This introduces a has_data boolean field in struct unit_files which can
be used to detect the end of the array.
Use a _cleanup_ for struct unit_files in acquire_time_data and its
callers. Code for acquire_time_data is also simplified by replacing
goto's with straight returns.
Tested: By running the commands below, also checking them under valgrind.
- build/systemd-analyze blame
- build/systemd-analyze critical-chain
- build/systemd-analyze plot
Fixes: Coverity finding CID 996464.
Let's add some protection for split horizon setups, where different
zones are visible on the same global DNS servers depending on where you
come from.
Fixes: #9196
This simplifies the code a bit and hopefully fixes Coverity finding
CID 1382966. There was not actually a resource leak here (Coverity
seemed to be confused by thinking log_oom() could actually return 0),
but the fix doesn't hurt and should make this code more resilient to
future refactorings.
Tested: builds fine, manually called scsi_id, seems to work ok.
This fixes an insecure use of tainted data as argument to functions that
allocate memory and read from files, which could be tricked into getting
networkctl to allocate a large amount of memory and fill it with file
data.
This was uncovered by Coverity. Fixes CID 1393254.
This should fix a leak of the allocated Prefix if sd_radv_prefix_new
fails for some reason.
The code was already initializing prefix to NULL and using TAKE_PTR to
return it, so only the _cleanup_ was missing.
Fixes Coverity finding CID 1382976.
This shouldn't be necessary, since read() should never return a size
larger than the size of the buffer passed in, but Coverity doesn't seem
to understand that.
We could possibly fix this with a model file for Coverity, but given
changing the code is not that much of a biggie, let's just do that
instead.
Fixes CID 996458: Overflowed or truncated value (or a value computed
from an overflowed or truncated value) `pos` used as array index.
Tested: `ninja -C build/ test`, builds without warnings, test cases pass.
The refactor in #9200 inadvertently dropped the variable assignment to
traverse the device and its hierarchy in add_matches_for_device().
This was uncovered by Coverity (CID #1393310).
Fix that by restoring the assignment.
Tested: `journalctl /dev/sda` now filters journalctl output again.
This is mostly fall-out from d1a1f0aaf0,
however some cases are older bugs.
There might be more issues lurking, this was a simple grep for "%m"
across the tree, with all lines removed that mention "errno" at all.
Previously the enumerate() callback defined for each unit type would do
two things:
1. It would create perpetual units (i.e. -.slice, system.slice, -.mount and
init.scope)
2. It would enumerate units from /proc/self/mountinfo, /proc/swaps and
the udev database
With this change these two parts are split into two seperate methods:
enumerate() now only does #2, while enumerate_perpetual() is responsible
for #1. Why make this change? Well, perpetual units should have a
slightly different effect that those found through enumeration: as
perpetual units should be up unconditionally, perpetually and thus never
change state, they should also not pull in deps by their state changing,
not even when the state is first set to active. Thus, their state is
generally initialized through the per-device coldplug() method in
similar fashion to the deserialized state from a previous run would be
put into place. OTOH units found through regular enumeration should
result in state changes (and thus pull in deps due to state changes),
hence their state should be put in effect in the catchup() method
instead. Hence, given this difference, let's also separate the
functions, so that the rule is:
1. What is created in enumerate_perpetual() should be started in
coldplug()
2. What is created in enumerate() should be started in catchup().
This reworks how device units are "powered on".
This makes sure that any device changes that might have happened while
we were restarting/reloading will be noticed properly. For that we'll
now properly serialize/deserialize both the device unit state and the
device "found" flags, and restore these initially in the "coldplug"
phase of the manager deserialization. While enumerating the udev devices
during startup we'll put together a new "found" flags mask, which we'll
the switch to in the "catchup" phase of the manager deserialization,
which follows the "coldplug" phase.
Note that during the "coldplug" phase no unit state change events are
generated, which is different for the "catchall" phase which will do
that. Thus we correctly make sure that the deserialized state won't pull
in new deps, but any device's change while we were reloading would.
Fixes: #8832
Replaces: #8675
This is very similar to the existing unit method coldplug() but is
called a bit later. The idea is that that coldplug() restores the unit
state from before any prior reload/restart, i.e. puts the deserialized
state in effect. The catchup() call is then called a bit later, to
catch up with the system state for which we missed notifications while
we were reloading. This is only really useful for mount, swap and device
mount points were we should be careful to generate all missing unit
state change events (i.e. call unit_notify() appropriately) for
everything that happened while we were reloading.
let's drop the "now" argument, it's exactly what MANAGER_IS_RUNNING()
returns, hence let's use that instead to simplify things.
Moreover, let's change the add/found argument pair to become found/mask,
which allows us to change multiple flags at the same time into opposing
directions, which will be useful later on.
Also, let's change the return type to void. It's a notifier call where
callers will ignore the return value anyway as it is nothing actionable.
Should not change behaviour.
We do this checks as protection against bind mount cycles on the same
file system. However, the check wasn't really effective for that, as
it would only detect cycles A → B → A this way. By using
fs_is_mount_point() we'll also detect cycles A → A.
Also, while we are at it, make these file system boundary checks
optional. This is not used anywhere, but might be eventually...
Most importantly though add a longer blurb explanation the why.
This fixes the copy routines on overlay filesystem, which typically
returns the underlying st_dev for files, symlinks, etc.
The value of st_dev is guaranteed to be the same for directories, so
checking it on directories only fixes this code on overlay filesystem
and still keeps it from traversing mount points (which was the original
intent.)
There's a small side effect here, by which regular (non-directory) files
with bind mounts will be copied by the new logic (while they were
skipped by the previous logic.)
Tested: ./build/test-copy with an overlay on /tmp.
Fixes: #9134
This adds a function sd_bus_slot_set_destroy_callback() to set a function
which can free userdata or perform other cleanups.
sd_bus_slot_get_destory_callback() queries the callback, and is included
for completeness.
Without something like this, for floating asynchronous callbacks, which might
be called or not, depending on the sequence of events, it's hard to perform
resource cleanup. The alternative would be to always perform the cleanup from
the caller too, but that requires more coordination and keeping of some shared
state. It's nicer to keep the cleanup contained between the callback and the
function that requests the callback.
This shows a minor memleak:
==1883== 24 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1883== at 0x4C2DBAB: malloc (vg_replace_malloc.c:299)
==1883== by 0x4E9D385: malloc_multiply (alloc-util.h:69)
==1883== by 0x4EA2959: bus_request_name_async_may_reload_dbus (bus-util.c:1841)
==1883== by ...
The exchange of messages is truncated at two different points: once right
after the first callback is requested, and the second time after the full
sequence has run (usually resulting in an error because of policy).
LOG_MESSAGE is just a wrapper, but it keeps the arguments indented together
with the format string, so put the argument inside of the macro invocation.
(No functional change.)
Also use lowercase for "trust anchor" — it should either be all capitaled or not
at all, and it's not a proper name, so let's make it all lowercase.
Also, add a newline, to make the string more readable. "%s" can expand to
something that is quite long.
It's a bit prettier that day as the function won't silently overwrite
any possibly pre-initialized field, and destroy it right before we
allocate a new event source.
This tests a couple of corner cases of the sd-event API including
changing priorities of existing event sources, as well as overflow
conditions of the inotify queue.
This adds a new call sd_event_add_inotify() which allows watching for
inotify events on specified paths.
sd-event will try to minimize the number of inotify fds allocated, and
will try to add file watches to the same inotify fd objects as far as
that's possible. Doing this kind of inotify object should optimize
behaviour in programs that watch a limited set of mostly independent
files as in most cases a single inotify object will suffice for watching
all files.
Traditionally, this kind of coalescing logic (i.e. that multiple event
sources are implemented on top of a single inotify object) was very hard
to do, as the inotify API had serious limitations: it only allowed
adding watches by path, and would implicitly merge watches installed on
the same node via different path, without letting the caller know about
whether such merging took place or not.
With the advent of O_PATH this issue can be dealt with to some point:
instead of adding a path to watch to an inotify object with
inotify_add_watch() right away, we can open the path with O_PATH first,
call fstat() on the fd, and check the .st_dev/.st_ino fields of that
against a list of watches we already have in place. If we find one we
know that the inotify_add_watch() will update the watch mask of the
existing watch, otherwise it will create a new watch. To make this
race-free we use inotify_add_watch() on the /proc/self/fd/ path of the
O_PATH fd, instead of the original path, so that we do the checking and
watch updating with guaranteed the same inode.
This approach let's us deal safely with inodes that may appear under
various different paths (due to symlinks, hardlinks, bind mounts, fs
namespaces). However it's not a perfect solution: currently the kernel
has no API for changing the watch mask of an existing watch -- unless
you have a path or fd to the original inode. This means we can "merge"
the watches of the same inode of multiple event sources correctly, but
we cannot "unmerge" it again correctly in many cases, as access to the
original inode might have been lost, due to renames, mount/unmount, or
deletions. We could in theory always keep open an O_PATH fd of the inode
to watch so that we can change the mask anytime we want, but this is
highly problematics, as it would consume too many fds (and in fact the
scarcity of fds is the reason why watch descriptors are a separate
concepts from fds) and would keep the backing mounts busy (wds do not
keep mounts busy, fds do). The current implemented approach to all this:
filter in userspace and accept that the watch mask on some inode might
be higher than necessary due to earlier installed event sources that
might have ceased to exist. This approach while ugly shouldn't be too
bad for most cases as the same inodes are probably wacthed for the same
masks in most implementations.
In order to implement priorities correctly a seperate inotify object is
allocated for each priority that is used. This way we get separate
per-priority event queues, of which we never dequeue more than a few
events at a time.
Fixes: #3982
Scope units don't have a main or control process we can watch, hence
let's explicitly watch the PIDs contained in them early on, just to make
things more robust and have at least something to watch.
This reworks how systemd tracks processes on cgroupv1 systems where
cgroup notification is not reliable. Previously, whenever we had reason
to believe that new processes showed up or got removed we'd scan the
cgroup of the scope or service unit for new processes, and would tidy up
the list of PIDs previously watched. This scanning is relatively slow,
and does not scale well. With this change behaviour is changed: instead
of scanning for new/removed processes right away we do this work in a
per-unit deferred event loop job. This event source is scheduled at a
very low priority, so that it is executed when we have time but does not
starve other event sources. This has two benefits: this expensive work is
coalesced, if events happen in quick succession, and we won't delay
SIGCHLD handling for too long.
This patch basically replaces all direct invocation of
unit_watch_all_pids() in scope.c and service.c with invocations of the
new unit_enqueue_rewatch_pids() call which just enqueues a request of
watching/tidying up the PID sets (with one exception: in
scope_enter_signal() and service_enter_signal() we'll still do
unit_watch_all_pids() synchronously first, since we really want to know
all processes we are about to kill so that we can track them properly.
Moreover, all direct invocations of unit_tidy_watch_pids() and
unit_synthesize_cgroup_empty_event() are removed too, when the
unit_enqueue_rewatch_pids() call is invoked, as the queued job will run
those operations too.
All of this is done on cgroupsv1 systems only, and is disabled on
cgroupsv2 systems as cgroup-empty notifications are reliable there, and
we do not need SIGCHLD events to track processes there.
Fixes: #9138
This way we don't need to repeat the argument twice.
I didn't replace all instances. I think it's better to leave out:
- asserts
- comparisons like x & y == x, which are mathematically equivalent, but
here we aren't checking if flags are set, but if the argument fits in the
flags.
Previously, we'd not care about failures that were seen earlier and
remain in "exited" state. This could be triggered if the main process of
a service failed while ExecStartPost= was still running, as in that case
we'd not immediately act on the main process failure because we needed
to wait for ExecStartPost= to finish, before acting on it.
Fixes: #8929
Let's show a message at the time of logout i.e. entering the "closing"
state, not just e.g. once the user closes `tmux` and the session can be
removed completely. (At least when KillUserProcesses=no applies. My
thinking was we can spare the log noise if we're killing the processes
anyway).
These are two independent events. I think the logout event is quite
significant in the session lifecycle. It will be easier for a user who
does not know logind details to understand why "Removed session" doesn't
appear at logout time, if we have a specific message we can show at this
time :).
Tested using tmux and KillUserProcesses=no. I can also confirm the extra
message doesn't show when using KillUserProcesses=yes. Maybe it looks a
bit mysterious when you use KillOnlyUsers= / KillExcludeUsers=, but
hopefully not alarmingly so.
I was looking at systemd-logind messages on my system, because I can
reproduce two separate problems with Gnome on Fedora 28 where
sessions are unexpectedly in state "closing". (One where a GUI session
limps along in a degraded state[1], and another where spice-vdagent is left
alive after logout, keeping the session around[2]). It logged when
sessions were created and removed, but it didn't log when the session
entered the "closing" state.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1583240#c1
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1583261Closes#9096
The function is similar to path_kill_slashes() but also removes
initial './', trailing '/.', and '/./' in the path.
When the second argument of path_simplify() is false, then it
behaves as the same as path_kill_slashes(). Hence, this also
replaces path_kill_slashes() with path_simplify().
This "netdevsim" as implied by the name is a tool for network developers and is a simulator.
This simulated networking device is used for testing various networking APIs and at this time
is particularly focused on testing hardware offloading related interfaces.
We don't need a temporary variable when parsing just one number, because
our parsing functions do not touch the output variable on error.
TAKE_PTR is more expressive than 'n = NULL'.
First, ellipsize() and ellipsize_mem() should not read past the input
buffer. Those functions take an explicit length for the input data, so they
should not assume that the buffer is terminated by a nul.
Second, ellipsization was off in various cases where wide on multi-byte
characters were used.
We had some basic test for ellipsize(), but apparently it wasn't enough to
catch more serious cases.
Should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=8686.
* it fails with gcc8 when -O1 or -Os is used (and -ftree-vrp which is added by -O2 and higher isn't used)
../git/src/basic/time-util.c: In function 'format_timespan':
../git/src/basic/time-util.c:508:46: error: '%0*llu' directive output between 1 and 2147483647 bytes may cause result to exceed 'INT_MAX' [-Werror=format-truncation=]
"%s"USEC_FMT".%0*"PRI_USEC"%s",
^~~~
../git/src/basic/time-util.c:508:60: note: format string is defined here
"%s"USEC_FMT".%0*"PRI_USEC"%s",
../git/src/basic/time-util.c:508:46: note: directive argument in the range [0, 18446744073709551614]
"%s"USEC_FMT".%0*"PRI_USEC"%s",
^~~~
../git/src/basic/time-util.c:507:37: note: 'snprintf' output 4 or more bytes (assuming 2147483651) into a destination of size 4294967295
k = snprintf(p, l,
^~~~~~~~~~~~~~
"%s"USEC_FMT".%0*"PRI_USEC"%s",
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
p > buf ? " " : "",
~~~~~~~~~~~~~~~~~~~
a,
~~
j,
~~
b,
~~
table[i].suffix);
~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
[zj: change 'char' to 'signed char']
sparc sets the carry bit when a syscall fails. Use this information to
set errno and return -1 as appropriate.
The added test case calls raw_clone() with flags known to be invalid
according to the clone(2) manpage.
We already do that in get_process_cmdline(), which is very similar in
behaviour otherwise. Hence, let's be safe and also filter them in
get_process_comm(). Let's try to retain as much information as we can
though and escape rather than suppress unprintable characters. Let's not
increase comm names beyond the kernel limit on such names however.
Also see discussion about this here:
https://marc.info/?l=linux-api&m=152649570404881&w=2
For short buffer sizes cellescape() was a bit wasteful, as it might
suffice to to drop a single character to find enough place for the full
four byte ellipsis, if that one character was a four character escape.
With this rework we'll guarantee to drop the minimum number of
characters from the end to fit in the ellipsis.
If the buffers we write to are large this doesn't matter much. However,
if they are short (as they are when talking about the process comm
field) then it starts to matter that we put as much information as we
can in the space we get.
This adds a flags parameter to unit_notify() which can be used to pass
additional notification information to the function. We the make the old
reload_failure boolean parameter one of these flags, and then add a new
flag that let's unit_notify() if we are configured to restart the
service.
Note that this adjusts behaviour of systemd to match what the docs say.
Fixes: #8398
'systemctl disable --runtime' would disable a unit, but only if it was enabled
with '--runtime', and silently do nothing if the unit was enabled persistently.
And similarly 'systemctl disable' would do nothing if the unit was enabled in
/run. This just doesn't seem useful.
This pathch changes enable/disable and mask/unmask to be asymmetrical. enable
and mask create symlinks in /etc or /run, depending on whether --runtime was
specified. disable and unmask remove symlinks from both locations. --runtime
cannot be specified for the disable and unmask verbs.
The advantage is that 'disable' now means that the unit is disabled, period.
And similarly for 'unmask', all masks are removed.
Similarly for preset and preset-all, they now cannot be called with --runtime,
and are asymmetrical: when they enable a unit, symlinks are created in /etc.
When they disable a unit, all symlinks are nuked.
$ systemctl --root=/ enable bluetooth
Created symlink /etc/systemd/system/dbus-org.bluez.service → /usr/lib/systemd/system/bluetooth.service.
Created symlink /etc/systemd/system/bluetooth.target.wants/bluetooth.service → /usr/lib/systemd/system/bluetooth.service.
$ systemctl --root=/ --runtime enable bluetooth
Created symlink /run/systemd/system/dbus-org.bluez.service → /usr/lib/systemd/system/bluetooth.service.
Created symlink /run/systemd/system/bluetooth.target.wants/bluetooth.service → /usr/lib/systemd/system/bluetooth.service.
$ systemctl --root=/ disable bluetooth
Removed /run/systemd/system/bluetooth.target.wants/bluetooth.service.
Removed /run/systemd/system/dbus-org.bluez.service.
Removed /etc/systemd/system/bluetooth.target.wants/bluetooth.service.
Removed /etc/systemd/system/dbus-org.bluez.service.
$ systemctl --root=/ disable --runtime bluetooth
--runtime cannot be used with disable
$ systemctl --root=/ mask --runtime bluetooth
Created symlink /run/systemd/system/bluetooth.service → /dev/null.
$ systemctl --root=/ mask bluetooth
Created symlink /etc/systemd/system/bluetooth.service → /dev/null.
$ systemctl --root=/ unmask bluetooth
Removed /run/systemd/system/bluetooth.service.
Removed /etc/systemd/system/bluetooth.service.
$ systemctl --root=/ unmask --runtime bluetooth
--runtime cannot be used with unmask
$ systemctl --root=/ --runtime enable bluetooth
Created symlink /run/systemd/system/dbus-org.bluez.service → /usr/lib/systemd/system/bluetooth.service.
Created symlink /run/systemd/system/bluetooth.target.wants/bluetooth.service → /usr/lib/systemd/system/bluetooth.service.
$ systemctl --root=/ enable bluetooth
Created symlink /etc/systemd/system/dbus-org.bluez.service → /usr/lib/systemd/system/bluetooth.service.
Created symlink /etc/systemd/system/bluetooth.target.wants/bluetooth.service → /usr/lib/systemd/system/bluetooth.service.
$ systemctl --root=/ preset bluetooth
Removed /run/systemd/system/bluetooth.target.wants/bluetooth.service.
Removed /run/systemd/system/dbus-org.bluez.service.
Removed /etc/systemd/system/bluetooth.target.wants/bluetooth.service.
Removed /etc/systemd/system/dbus-org.bluez.service.
$ systemctl --root=/ preset --runtime bluetooth
--runtime cannot be used with preset
$ systemctl preset-all --runtime
--runtime cannot be used with preset-all
UINTN is the integer type equalling the native ptr size. Let's fix the
casting warnings described in #7788 by casting the the pointers and
values to this type first. That way we cast integers to the right size
first before turning them into pointers, and pointers are first
covnerted to integers of the right size before converting them into
integers.
Not tested, since I lack i386 EFI systems, but I think this is simple
enough to be correct event without testing.
Fixes: #7788
Also remove the comma from the comment everywhere, I think the comma
unnecessarilly put emphasis on the clause after the comma.
Fixes#9090.
Reproducer:
systemd-journal-remote --split-mode=none -o /tmp/msg6.journal --trust=all --listen-http=8080
systemd-journal-upload -u http://localhost:8080
journalctl --file /tmp/msg6.journal -o verbose -n1
The boot id is stored twice, and different code paths use either one or the
other. So we need to store it both in the header and as a field for full
compatibility.
journalctl -o short would display those entries, but journalctl -o short-full
would refuse. If the entry is bad, just fall back to the receive-side realtime
timestamp like we would if it was completely missing.
The journal verification functions would reject such an entry. It would probably
still display fine (because we prefer _SOURCE_REALTIME_TIMESTAMP= if present), but
it seems wrong to create an entry that would not pass verification.
If the timestamp is above 9999-12-30, (or 2038-something-something on 32 bit),
use XXXX-XX-XX XX:XX:XX as the replacement.
The problem with refusing to print timestamps is that our code accepts such
timestamps, so we can't really just refuse to process them afterwards. Also, it
makes journal files non-portable, because suddently we might completely refuse
to print entries which are totally OK on a different machine.
This makes the fuzzing much more efficient. Optionally provide output is
$SYSTEMD_FUZZ_OUTPUT is set, which makes debugging of any failures much easier.
The case from 056129deb73df17ece4212db39d2ca0842d9a49c is still detected properly.
The parser never accepted "__"-prefixed fields in binary format, but there was
a comment questioning this decision. Let's make it official, and remove the
comment.
Also, for clarity, let's move the dunder field parsing after the field
verification check. This doesn't change much, because invalid fields cannot be
known special fields, but is seems cleaner to first verify the validity of the
name, and then check if it is one of the known ones.
$ build-asan/fuzz-journal-remote test/fuzz-regressions/fuzz-journal-remote/crash-96dee870ea66d03e89ac321eee28ea63a9b9aa45
...
Ignoring invalid field: "S\020"
Ignoring invalid field: "S\020"
...
If the field name includes nul bytes, we won't print all of the name.
But that seems enough of a corner case to ignore.
We'd look for a '=' separator using memchr, i.e. ignoring any nul bytes in the
string, but then do a strndup, which would terminate on any nul byte, and then
again do a memcmp, which would access memory past the chunk allocated by strndup.
Of course, we probably shouldn't allow keys with nul bytes in them. But we
currently do, so there might be journal files like that out there. So let's fix
the journal-reading code first.
`fuzz-journal-remote` seems to be failing under `msan` as soon as it starts:
$ sudo infra/helper.py run_fuzzer systemd fuzz-journal-remote
Running: docker run --rm -i --privileged -e FUZZING_ENGINE=libfuzzer -v /home/vagrant/oss-fuzz/build/out/systemd:/out -t gcr.io/oss-fuzz-base/base-runner run_fuzzer fuzz-journal-remote
Using seed corpus: fuzz-journal-remote_seed_corpus.zip
/out/fuzz-journal-remote -rss_limit_mb=2048 -timeout=25 /tmp/fuzz-journal-remote_corpus -max_len=65536 < /dev/null
INFO: Seed: 3380449479
INFO: Loaded 2 modules (36336 inline 8-bit counters): 36139 [0x7ff36ea31d39, 0x7ff36ea3aa64), 197 [0x9998c8, 0x99998d),
INFO: Loaded 2 PC tables (36336 PCs): 36139 [0x7ff36ea3aa68,0x7ff36eac7d18), 197 [0x999990,0x99a5e0),
INFO: 2 files found in /tmp/fuzz-journal-remote_corpus
INFO: seed corpus: files: 2 min: 4657b max: 7790b total: 12447b rss: 97Mb
Uninitialized bytes in __interceptor_pwrite64 at offset 24 inside [0x7fffdd4d7230, 240)
==15==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7ff36e685e8a in journal_file_init_header /work/build/../../src/systemd/src/journal/journal-file.c:436:13
#1 0x7ff36e683a9d in journal_file_open /work/build/../../src/systemd/src/journal/journal-file.c:3333:21
#2 0x7ff36e68b8f6 in journal_file_open_reliably /work/build/../../src/systemd/src/journal/journal-file.c:3520:13
#3 0x4a3f35 in open_output /work/build/../../src/systemd/src/journal-remote/journal-remote.c:70:13
#4 0x4a34d0 in journal_remote_get_writer /work/build/../../src/systemd/src/journal-remote/journal-remote.c:136:21
#5 0x4a550f in get_source_for_fd /work/build/../../src/systemd/src/journal-remote/journal-remote.c:183:13
#6 0x4a46bd in journal_remote_add_source /work/build/../../src/systemd/src/journal-remote/journal-remote.c:235:13
#7 0x4a271c in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/fuzz/fuzz-journal-remote.c:36:9
#8 0x4f27cc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:524:13
#9 0x4efa0b in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:448:3
#10 0x4f8e96 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:732:7
#11 0x4f9f73 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:752:3
#12 0x4bf329 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:756:6
#13 0x4ac391 in main /src/libfuzzer/FuzzerMain.cpp:20:10
#14 0x7ff36d14982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#15 0x41f9d8 in _start (/out/fuzz-journal-remote+0x41f9d8)
Uninitialized value was stored to memory at
#0 0x7ff36e61cd41 in sd_id128_randomize /work/build/../../src/systemd/src/libsystemd/sd-id128/sd-id128.c:288:16
#1 0x7ff36e685cec in journal_file_init_header /work/build/../../src/systemd/src/journal/journal-file.c:426:13
#2 0x7ff36e683a9d in journal_file_open /work/build/../../src/systemd/src/journal/journal-file.c:3333:21
#3 0x7ff36e68b8f6 in journal_file_open_reliably /work/build/../../src/systemd/src/journal/journal-file.c:3520:13
#4 0x4a3f35 in open_output /work/build/../../src/systemd/src/journal-remote/journal-remote.c:70:13
#5 0x4a34d0 in journal_remote_get_writer /work/build/../../src/systemd/src/journal-remote/journal-remote.c:136:21
#6 0x4a550f in get_source_for_fd /work/build/../../src/systemd/src/journal-remote/journal-remote.c:183:13
#7 0x4a46bd in journal_remote_add_source /work/build/../../src/systemd/src/journal-remote/journal-remote.c:235:13
#8 0x4a271c in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/fuzz/fuzz-journal-remote.c:36:9
#9 0x4f27cc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:524:13
#10 0x4efa0b in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:448:3
#11 0x4f8e96 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:732:7
#12 0x4f9f73 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:752:3
#13 0x4bf329 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:756:6
#14 0x4ac391 in main /src/libfuzzer/FuzzerMain.cpp:20:10
#15 0x7ff36d14982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
Uninitialized value was created by an allocation of 't' in the stack frame of function 'sd_id128_randomize'
#0 0x7ff36e61cb00 in sd_id128_randomize /work/build/../../src/systemd/src/libsystemd/sd-id128/sd-id128.c:274
SUMMARY: MemorySanitizer: use-of-uninitialized-value /work/build/../../src/systemd/src/journal/journal-file.c:436:13 in journal_file_init_header
Exiting
MS: 0 ; base unit: 0000000000000000000000000000000000000000
artifact_prefix='./'; Test unit written to ./crash-847911777b3096783f4ee70a69ab6d28380c810b
[vagrant@localhost oss-fuzz]$ sudo infra/helper.py check_build --sanitizer=memory systemd
Running: docker run --rm -i --privileged -e FUZZING_ENGINE=libfuzzer -e SANITIZER=memory -v /home/vagrant/oss-fuzz/build/out/systemd:/out -t gcr.io/oss-fuzz-base/base-runner test_all
INFO: performing bad build checks for /out/fuzz-dhcp-server.
INFO: performing bad build checks for /out/fuzz-journal-remote.
INFO: performing bad build checks for /out/fuzz-unit-file.
INFO: performing bad build checks for /out/fuzz-dns-packet.
4 fuzzers total, 0 seem to be broken (0%).
Check build passed.
It's a false positive which is most likely caused by
https://github.com/google/sanitizers/issues/852. I think it could be got around
by avoiding `getrandom` when the code is compiled with `msan`
We shouldn't just log arbitrary stuff, in particular newlines and control chars
Now:
Unknown dunder line __CURSORFACILITY=6\nSYSLOG_IDENTIFIER=/USR/SBIN/CRON\nMES…, ignoring.
Unknown dunder line __REALTIME_TIME[TAMP=1404101101501874\n__MONOTONIC_TIMEST…, ignoring.
It's not supposed to be the most efficient, but instead fast and simple to use.
I kept the logic in ellipsize_mem() to use unicode ellipsis even in non-unicode
locales. I'm not quite convinced things should be this way, especially that with
this patch it'd actually be simpler to always use "…" in unicode locale and "..."
otherwise, but Lennart wanted it this way for some reason.
Something is wrong with the entry (probably a missing timestamp), so no point
in rotating. But suppress the error in process_source(), so that the processing
of the data stream continues.
Also, just return 0 from writer_write() on success, the only caller doesn't
care.
The code to open journal files seems like the wrong place to enforce this. We
already check during boot and refuse to boot if machine-id is missing, no need
to enforce this here. In particular, it seems better to write logs from
journald even if they are not completely functional rather than refuse to
operate at all, and systemd-journal-remote also writes journal files and may
even be run on a system without systemd at all.
The docker image that oss-fuzz uses has an empty /etc/machine-id. Obviously
this is an error in the docker, but docker is fact of life, and it seems better
for systemd-journal-remote to work in such an incomplete environment.
We currently return -ENOMEDIUM when /etc/machine-id is empty, and -EINVAL when
it is all zeros. But -EINVAL is also used for invalid args. The distinction
between empty and all-zero is not very important, let's use the same return
code.
Also document -ENOENT and -ENOMEDIUM since they can be a bit surprising.