* logind: trivial simplification
free_and_strdup() handles NULL arg, so make use of that.
* boot: fix two typos
* pid1: rewrite check in ignore_proc() to not check condition twice
It's harmless, but it seems nicer to evaluate a condition just a single time.
* core/execute: reformat exec_context_named_iofds() for legibility
* core/execute.c: check asprintf return value in the usual fashion
This is unlikely to fail, but we cannot rely on asprintf return value
on failure, so let's just be correct here.
CID #1368227.
* core/timer: use (void)
CID #1368234.
* journal-file: check asprintf return value in the usual fashion
This is unlikely to fail, but we cannot rely on asprintf return value
on failure, so let's just be correct here.
CID #1368236.
* shared/cgroup-show: use (void)
CID #1368243.
* cryptsetup: do not return uninitialized value on error
CID #1368416.
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requiremnts how the comment should look. Adjust it the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.
(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
https://bugzilla.redhat.com/show_bug.cgi?id=1416201
$ journalctl -b
Journal file /var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~ uses an unsupported feature, ignoring file.
Use SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~ to see the details.
-- No entries --
$ journalctl --file=/var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~
Journal file /var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~ uses incompatible flag lz4-compressed disabled at compilation time.
Failed to open journal file /var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~: Protocol not supported
mmap cache statistics: 0 hit, 1 miss
Failed to open files: Protocol not supported
Never permit that we write to journal files that have newer timestamps than our
local wallclock has. If we'd accept that, then the entries in the file might
end up not being ordered strictly.
Let's refuse this with ETXTBSY, and then immediately rotate to use a new file,
so that each file remains strictly ordered also be wallclock internally.
When iterating through partially synced journal files we need to be prepared
for hitting with invalid entries (specifically: non-initialized). Instead of
generated an error and giving up, let's simply try to preceed with the next one
that is valid (and debug log about this).
This reworks the logic introduced with caeab8f626
to iteration in both directions, and tries to look for valid entries located
after the invalid one. It also extends the behaviour to both iterating through
the global entry array and per-data object entry arrays.
Fixes: #4088
Let's make dissecting of borked journal files more expressive: if we encounter
an object whose first 8 bytes are all zeroes, then let's assume the object was
simply never initialized, and say so.
Previously, this would be detected as "overly short object", which is true too
in a away, but it's a lot more helpful printing different debug options for the
case where the size is not initialized at all and where the size is initialized
to some bogus value.
No function behaviour change, only a different log messages for both cases.
Let's make it easier to figure out when we see an invalid journal file, why we
consider it invalid, and add some minimal debug logging for it.
This log output is normally not seen (after all, this all is library code),
unless debug logging is exlicitly turned on.
Since commit 5996c7c295 (v190 !), the
calculation of the HMAC is broken because the hash for a data object
including a field is done in the wrong order: the field object is
hashed before the data object is.
However during verification, the hash is done in the opposite order as
objects are scanned sequentially.
The only code path which makes a journal durable is via
journal_file_set_offline().
When we perform a rotate the journal's header->state is being set to
STATE_ARCHIVED prior to journal_file_set_offline() being called.
In journal_file_set_offline(), we short-circuit the entire offline when
f->header->state != STATE_ONLINE.
This all results in none of the journal_file_set_offline() fsync() calls
being reached when rotate archives a journal, so archived journals are
never explicitly made durable.
What we do now is instead of setting the f->header->state to
STATE_ARCHIVED directly in journal_file_rotate() prior to
journal_file_close(), we set an archive flag in f->archive for the
journal_file_set_offline() machinery to honor by committing
STATE_ARCHIVED instead of STATE_OFFLINE when set.
Prior to this, rotated journals were never getting fsync() explicitly
performed on them, since journal_file_set_offline() short-circuited.
Obviously this is undesirable, and depends entirely on the underlying
filesystem as to how much durability was achieved when simply closing
the file.
Note that this problem existed prior to the recent asynchronous fsync
changes, but those changes do facilitate our performing this durable
offline on rotate without blocking, regardless of the underlying
filesystem sync-on-close semantics.
Previously, when we used a bisection table for seeking through a corrupted
file, and the end of the bisection table was corrupted we'd most likely fail
the entire seek operation. Improve the situation: if we encounter invalid
entries in a bisection table, linearly go backwards until we find a working
entry again.
When we linearly iterate through a corrupted journal file, and we encounter a
read error, don't consider this fatal, but merely as EOF condition (and log
about it).
Early in journal_file_set_offline() f->header->state is tested to see if
it's != STATE_ONLINE, and since there's no need to do anything if the
journal isn't online, the function simply returned here.
Since moving part of the offlining process to a separate thread, there
are two problems here:
1. We can't simply check f->header->state, because if there is an
offline thread active it may modify f->header->state.
2. Even if the journal is deemed offline, the thread responsible may
still need joining, so a bare return may leak the thread's resources
like its stack.
To address #1, the helper journal_file_is_offlining() is called prior to
accessing f->header->state.
If journal_file_is_offlining() returns true, f->header->state isn't even
checked, because an offlining journal is obviously online, and we'll
just continue with the normal set offline code path.
If journal_file_is_offlining() returns false, then it's safe to check
f->header->state, because the offline_state is beyond the point of
modifying f->header->state, and there's a memory barrier in the helper.
If we find f->header->state is != STATE_ONLINE, then we call the
idempotent journal_file_set_offline_thread_join() on the way out of the
function, to join a potential lingering offline thread.
Show the various timestamps in hexadecimal too. This is useful for matching the
timestamps included in cursor strings (which are encoded in hex, too), with the
references in the journal header.
Also, expose this via the "journalctl --file=-" syntax for STDIN. This feature
remains undocumented though, as it is probably not too useful in real-life as
this still requires fds that support mmaping and seeking, i.e. does not work
for pipes, for which reading from STDIN is most commonly used.
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
When we rotate journals, we must set offline and close the current one,
but don't generally need to wait for this to complete.
Instead, we'll initiate an asynchronous offline via
journal_file_set_offline(oldfile, false), and add the file to a
per-server set of deferred closes to be closed later when they
won't block.
There's one complication however; journal_file_open() via
journal_file_verify_header() assumes that any writable journal in the
online state is the product of an unclean shutdown or other form of
corruption.
Thus there's a need for journal_file_open() to be aware of deferred
closes and synchronize with their completion when opening preexisting
journals for writing. To facilitate this the deferred closes set is
supplied to the journal_file_open() function where the deferred closes
may be closed synchronously before verifying the header in such
circumstances.
This adds a wait flag to journal_file_set_offline(), when false the offline is
performed asynchronously in a separate thread.
When wait is true, if an asynchronous offline is already in-progress it is
restarted and waited for. Otherwise the offline is performed synchronously
without the use of a thread.
journal_file_set_online() cancels or waits for the asynchronous offline to
complete if in-flight, depending on where in the offline process the thread
happens to be. If the thread is in the fsync() phase, it is cancelled and
waiting is unnecessary. Otherwise, the thread is joined before proceeding.
A new offline_state member is added to JournalFile which is used via
atomic operations for communicating between the offline thread and the
journal_file_set_{offline,online}() functions.
ISO/IEC 9899:1999 §7.21.1/2 says:
Where an argument declared as size_t n specifies the length of the array
for a function, n can have the value zero on a call to that
function. Unless explicitly stated otherwise in the description of a
particular function in this subclause, pointer arguments on such a call
shall still have valid values, as described in 7.1.4.
In base64_append_width memcpy was called as memcpy(x, NULL, 0). GCC 4.9
started making use of this and assumes This worked fine under -O0, but
does something strange under -O3.
This patch fixes a bug in base64_append_width(), fixes a possible bug in
journal_file_append_entry_internal(), and makes use of the new function
to simplify the code in other places.
Remove the check that triggers rotation of the journal file when the arriving log entry had a monotonic timestamp smaller that the previous log entry. This check causes unnecessary rotations when journal-remote was receiving from multiple senders, therefore monotonicity can not be guaranteed. Also, it does not offer any useful functionality for systemd-journald.
This primarily contains some minor coding style fixups for 7a24f3bf2f and earlier changes. Specifically:
* Don't log at log levels above LOG_DEBUG from "library" code like journal-file.c
* Don't negate errno values before passing them to log_debug_errno(), as the call can handle this fine anyway
* Cast some calls we knowingly ignore the return values of to (void)
* Don't clobber function call-by-ref return values on failure
* Don't mix function calls and variable declarations in one line
There's also one more relevant change: when failing to enqueue a journal change fs event, we'll run it immediately.
compress_blob took src, src_size, dst and *dst_size, but dst_size
wasn't used as an input parameter with the size of dst, but only as an
output parameter. dst was implicitly assumed to be at least src_size-1.
This code wasn't *wrong*, because the only real caller in
journal-file.c got it right. But it was misleading, and the tests in
test-compress.c got it wrong, and worked only because the output
buffer happened to be the same size as input buffer. So add a seperate
dst_allocated_size parameter to make it explicit what the size of the
buffer is, and to allow test to proceed with different output buffer
sizes.
For low end embedded systems 4 MiB for each journal file is a lot of
memory. Journald will use at least 512 KiB even if JOURNAL_FILE_SIZE_MIN is
set to less than that so just use 512 KiB.
When reading stuff, we should only return EIO when an actual read error
occured, not when we don't like the data for whatever reason.
We already return ENODATA for all other kinds of file truncation, hence
do the same for the most obvious kind, so that callers know what ENODATA
means.
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
Let's make sure we handle compression errors properly, and don't
misunderstand an error for success.
Also, let's actually compress things if lz4 is enabled.
Fixes#1662.
Implement a maximum limit on number of journal files to keep around.
Enforcing a limit is useful on this since our performance when viewing
pays a heavy penalty for each journal file to interleve. This setting is
turned on now by default, and set to 100.
Also, actully implement what 348ced9097
promised: use whatever we find on disk at startup as lower bound on how
much disk space we can use. That commit introduced some provisions to
implement this, but actually never did.
This also adds "journalctl --vacuum-files=" to vacuum files on disk by
their number explicitly.
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
off_t as 32bit builds though, but still constantly deal with safely
converting from off_t to other types and back for no point.
Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
When a new journal file is created we write the header first, then sync
and only then create the data and field hash tables in them. That means
to other processes it might appear that the files have a valid header
but not data and field hash tables. Our reader code should be able to
deal with this.
With this change we'll not map the two hash tables right-away after
opening a file for reading anymore (because that will of course fail if
the objects are missing), but delay this until the first time we access
them. On top of that, when we want to look something up in the hash
tables and we notice they aren't initialized yet, we consider them
empty.
This improves handling of some journal files reported in #487.
This way users have the freedom to set or unset the FS_NOCOW_FL flag on
their journal files by setting it on the journal directory. Since our
default tmpfiles configuration now sets this flag on the directory the
flag is set by default on new files, however people can opt-out of this
by masking the tmpfiles file for it.
like:
src/shared/install.c: In function ‘unit_file_lookup_state’:
src/shared/install.c:1861:16: warning: ‘r’ may be used uninitialized in
this function [-Wmaybe-uninitialized]
return r < 0 ? r : state;
^
src/shared/install.c:1796:13: note: ‘r’ was declared here
int r;
^
Logs constantly show
systemd-journald[395]: Failed to set file attributes: Inappropriate ioctl for device
This is because ext4 does not support FS_NOCOW_FL.
[zj: fold into one conditional as suggested on the ML and
fix (preexisting) r/errno confusion in error message.]
Commit 668c965af "journal: skipping of exhausted journal files is bad if
direction changed" fixed a correctness issue, but it also significantly
limited the cases where the optimization that skips exhausted journal
files could apply.
As a result, some journalctl queries are much slower in v219 than in v218.
(e.g. queries where a "--since" cutoff should have quickly eliminated
older journal files from consideration, but didn't.)
If already in the initial iteration find_location_with_matches() finds
no entry, the journal file's location is not updated. This is fine,
except that:
- We must update at least f->last_direction. The optimization relies on
it. Let's separate that from journal_file_save_location() and update
it immediately after the direction checks.
- The optimization was conditional on "f->current_offset > 0", but it
would always be 0 in this scenario. This check is unnecessary for the
optimization.
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
Leave it to the compiler to figure out whether it shall inline stuff or
not.
Only place where using static inline is OK to use is in in header
files, really.
btrfs' COW logic results in heavily fragment journal files, which is
detrimental for perfomance. Hence, turn off COW for journal files as we
create them.
Turning off COW comes at the cost of data integrity guarantees, but this
should be acceptable, given that we do our own checksumming, and
generally have a pretty conservative write pattern.
Also see discussion on linux-btrfs:
http://www.spinics.net/lists/linux-btrfs/msg41001.html
Our write pattern is quite awful for CoW file systems (btrfs...), as we
keep updating file parts in the beginning of the file. This results in
fragmented journal files. Hence: when rotating files, defragment them,
since at that point we know that no further write accesses will be made.
Since the file headers might be replaced by zeroed pages now due to
sigbus we should make sure we don't end up dividing by zero because we
don't check values read from journal file headers for changes.
Even though we use fallocate() it appears that file systems like btrfs
will trigger SIGBUS on certain low-disk-space situation. We should
handle that, hence catch the signal, add it to a list of invalidated
pages, and replace the page with an empty memory area. After each write
check if SIGBUS was triggered, and consider the write invalid if it was.
This should make journald a lot more robust with file systems where
fallocate() is not reliable, for example all CoW file systems
(btrfs...), where changing written data can fail with disk full errors.
https://bugzilla.redhat.com/show_bug.cgi?id=1045810
In next_beyond_location() when we find a candidate entry in a journal
file, save its location information in struct JournalFile.
The purpose of remembering the locations of candidate entries is to be
able to save work in the next iteration. This patch does only the
remembering part.
LOCATION_SEEK means the location identifies a candidate entry.
When a winner is picked from among candidates, it becomes
LOCATION_DISCRETE.
LOCATION_TAIL here signifies we've iterated the file to the end (or the
beginning in the case of reversed direction).
try_context() is such a hot path that the hashmap lookup is expensive.
The number of contexts is small - it is the number of object types.
Using a hashmap is overkill. A plain array will do.
Before:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m9.445s
user 0m9.228s
sys 0m0.213s
After:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m5.438s
user 0m5.266s
sys 0m0.170s
Note that numbers 0 and -1 are both replaced with OBJECT_UNUSED,
because they are treated the same everywhere (e.g. type_to_context()
translates them both to 0).