There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
As it turns out machine_name_is_valid() does the exact same thing as
hostname_is_valid() these days, as it just invoked that and checked the
name length was < 64. However, hostname_is_valid() checks the length
against HOST_NAME_MAX anyway (which is 64 on Linux), hence any
additional check is redundant.
We hence replace machine_name_is_valid() by a macro that simply maps it
to hostname_is_valid() but sets the allow_trailing_dot parameter to
false. We also move this this call to hostname-util.h, to the same place
as the hostname_is_valid() declaration.
remove_directory will always return 0 so this can never happen.
Besides that, d->path and d are freed so we would end up with
a null pointer dereference anyway.
Commit 668c965af "journal: skipping of exhausted journal files is bad if
direction changed" fixed a correctness issue, but it also significantly
limited the cases where the optimization that skips exhausted journal
files could apply.
As a result, some journalctl queries are much slower in v219 than in v218.
(e.g. queries where a "--since" cutoff should have quickly eliminated
older journal files from consideration, but didn't.)
If already in the initial iteration find_location_with_matches() finds
no entry, the journal file's location is not updated. This is fine,
except that:
- We must update at least f->last_direction. The optimization relies on
it. Let's separate that from journal_file_save_location() and update
it immediately after the direction checks.
- The optimization was conditional on "f->current_offset > 0", but it
would always be 0 in this scenario. This check is unnecessary for the
optimization.
include-what-you-use automatically does this and it makes finding
unnecessary harder to spot. The only content of poll.h is a include
of sys/poll.h so should be harmless.
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
If we scale our buffer to be wide enough for the format string, we
should expect that the calculation was correct.
char_array_0() invocations are removed, since snprintf nul-terminates
the output in any case.
A similar wrapper is used for strftime calls, but only in timedatectl.c.
This reverts commit b914ea8d37.
We really need to put a limit on all our resources, everywhere, and in
particular if we operate on external data.
Hence, let's reintroduce the limit, but bump it substantially, so that
it is guaranteed to be higher than any realistic RLIMIT_NOFILE setting.
EOF is meaningless if the direction of iteration changes.
Move the EOF optimization under the direction check.
This fixes test-journal-interleaving for me.
Thanks to Filipe Brandenburger for telling me about the failure.
next_with_matches() is odd in that its "unit64_t *offset" parameter is
both input and output. In other it's purely for output.
The function is called from two places in next_beyond_location(). In
both of them "&cp" is used as the argument and in both cases cp is
guaranteed to equal f->current_offset.
Let's just have next_with_matches() ignore "*offset" on input and
operate with f->current_offset.
I did not investigate why it is, but it makes my usual benchmark run
reproducibly faster:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m4.032s
user 0m3.896s
sys 0m0.135s
(Compare to preceding commit, where real was 4.4s.)
I accidentally broke the detection of duplicate entries in 7943f42275
"journal: optimize iteration by returning previously found candidate
entry".
When we have a known location of a candidate entry, we must not return
from next_beyond_location() immediately. We must go through the
duplicates detection to make sure the candidate differs from the
already iterated entry.
This fix slows down iteration a bit, but it's still faster than it
was before the rework.
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m4.448s
user 0m4.298s
sys 0m0.149s
(Compare with results from commit 7943f42275, where real was 5.3s before
the rework.)
In next_beyond_location() when the JournalFile's location type is
LOCATION_SEEK, it means there's nothing to do, because we already have
the location of the candidate entry. Do an early return. Note that now
next_beyond_location() does not anymore guarantee on return that the
entry is mapped, but previous patches made sure the caller does not
care.
This optimization is at least as good as "journal: optimize iteration:
skip files that cannot improve current candidate entry" was.
Timing results on my workstation, using:
$ time ./journalctl -q --since=2014-06-01 --until=2014-07-01 > /dev/null
Before "Revert "journal: optimize iteration: skip files that cannot
improve current candidate entry":
real 0m5.349s
user 0m5.166s
sys 0m0.181s
Now:
real 0m3.901s
user 0m3.724s
sys 0m0.176s
If from a previous iteration we know we are at the end of a journal
file, don't bother looking into the file again. This is complicated by
the fact that the EOF does not have to be permanent (think of
"journalctl -f"). So we also check if the number of entries in the
journal file changed.
This optimization has a similar effect as "journal: optimize iteration:
skip whole files behind current location" had.
set_location() is called from real_journal_next() when a winning entry
has been picked from among the candidates in journal files.
The location type is always set to LOCATION_DISCRETE. No need to pass
it as a parameter.
The per-JournalFile location information is already updated at this
point. No need for having the direction and offset here.
In next_beyond_location() when we find a candidate entry in a journal
file, save its location information in struct JournalFile.
The purpose of remembering the locations of candidate entries is to be
able to save work in the next iteration. This patch does only the
remembering part.
LOCATION_SEEK means the location identifies a candidate entry.
When a winner is picked from among candidates, it becomes
LOCATION_DISCRETE.
LOCATION_TAIL here signifies we've iterated the file to the end (or the
beginning in the case of reversed direction).
Note that numbers 0 and -1 are both replaced with OBJECT_UNUSED,
because they are treated the same everywhere (e.g. type_to_context()
translates them both to 0).
The only user is sd_journal_enumerate_unique() and, as explained in
the previous commit (fed67c38e3 "journal: map objects to context set by
caller, not by actual object type"), the use of them there is now
superfluous. Let's remove them.
This reverts major parts of commits:
ae97089d49 journal: fix access to munmapped memory in
sd_journal_enumerate_unique
06cc69d44c sd-journal: fix sd_journal_enumerate_unique skipping values
Tested with an "--enable-debug" build and "journalctl --list-boots".
It gives the expected number of results. Additionally, if I then revert
the previous commit ("journal: map objects to context set by caller, not
to actual object type"), it crashes with SIGSEGV, as expected.
Suppose that while iterating we have already looked into a journal file
and got a candidate for the next entry. And we are considering to look
into another journal file because it may contain an entry that is nearer
to the current location than the candidate.
We should skip the whole journal file if we can tell by looking at its
header that none of its entries can precede the candidate.
Before:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m20.518s
user 0m19.989s
sys 0m0.328s
After:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m9.445s
user 0m9.228s
sys 0m0.213s
Interleaving of entries from many journal files is expensive. But there
is room for optimization.
We can skip looking into journal files whose entries all lie before the
current iterating location. We can tell if that's the case from looking
at the journal file header. This saves a huge amount of work if one has
many of mostly not interleaved journal files.
On my workstation with 90 journal files in /var/log/journal/ID/
totalling 3.4 GB I get these results:
Before:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 5m54.258s
user 2m4.263s
sys 3m48.965s
After:
$ time ./journalctl --since=2014-06-01 --until=2014-07-01 > /dev/null
real 0m20.518s
user 0m19.989s
sys 0m0.328s
The high "sys" time in the original was caused by putting more stress on
the mmap-cache than it could handle. With the patch the working set
now consists of fewer mmap windows and mmap-cache is not thrashing.
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
sd_journal_enumerate_unique will lock its mmap window to prevent it
from being released by calling mmap_cache_get with keep_always=true.
This call may return windows that are wider, but compatible with the
parameters provided to it.
This can result in a mismatch where the window to be released cannot
properly be selected, because we have more than one window matching the
parameters of mmap_cache_release. Therefore, introduce a release_cookie
to be used when releasing the window.
https://bugs.freedesktop.org/show_bug.cgi?id=79380
systemctl would call sd_j_enumerate_unique() interleaved with
sd_j_next(). But the latter can remove a file if it detects an
error in it. In those circumstances sd_j_enumerate_unique would
restart with the first file in hashmap. With many corrupted files
sd_j_enumerate_unique might iterate over the list multiple times.
Avoid this by jumping to the next file in unique list if possible,
or setting a flag that tells sd_j_enumerate_unique that it is done
otherwise.
It is redundant to store 'hash' and 'compare' function pointers in
struct Hashmap separately. The functions always comprise a pair.
Store a single pointer to struct hash_ops instead.
systemd keeps hundreds of hashmaps, so this saves a little bit of
memory.
String which ended in an unfinished quote were accepted, potentially
with bad memory accesses.
Reject anything which ends in a unfished quote, or contains
non-whitespace characters right after the closing quote.
_FOREACH_WORD now returns the invalid character in *state. But this return
value is not checked anywhere yet.
Also, make 'word' and 'state' variables const pointers, and rename 'w'
to 'word' in various places. Things are easier to read if the same name
is used consistently.
mbiebl_> am I correct that something like this doesn't work
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-passwd "Unlock EncFS"'
mbiebl_> systemd seems to strip of the quotes
mbiebl_> systemctl status shows
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-password Unlock EncFS $RootDir $MountPoint
mbiebl_> which is pretty weird
Add liblz4 as an optional dependency when requested with --enable-lz4,
and use it in preference to liblzma for journal blob and coredump
compression. To retain backwards compatibility, XZ is used to
decompress old blobs.
Things will function correctly only with lz4-119.
Based on the benchmarks found on the web, lz4 seems to be the best
choice for "quick" compressors atm.
For pkg-config status, see http://code.google.com/p/lz4/issues/detail?id=135.
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
If we encounter an inconsistency in a file, let's just
ignore it. Otherwise, after previous patch, we would try,
and fail, to use this file in every invocation of sd_journal_next
or sd_journal_previous that happens afterwards.
If -flto is used then gcc will generate a lot more warnings than before,
among them a number of use-without-initialization warnings. Most of them
without are false positives, but let's make them go away, because it
doesn't really matter.
sd_j_e_u needs to keep a reference to an object while comparing it
with possibly duplicate objects in other files. Because the size of
mmap cache is limited, with enough files and object to compare to,
at some point the object being compared would be munmapped, resulting
in a segmentation fault.
Fix this issue by turning keep_always into a reference count that can
be increased and decreased. Other callers which set keep_always=true
are unmodified: their references are never released but are ignored
when the whole file is closed, which happens at some point. keep_always
is increased in sd_j_e_u and later on released.
This adds the new library call sd_journal_open_container() and a new
"-M" switch to journalctl. Particular care is taken that journalctl's
"-b" switch resolves to the current boot ID of the container, not the
host.
A few asserts are replaced with 'return -EINVAL'. I think that
assert should not be used to check argument in public functions.
Fields in struct sd_journal are rearranged to make it less
swiss-cheesy.
The order was different in various places, which makes it harder to
read to code. Also consistently use ternany for all direction checks.
Remove one free(NULL).
This allows the caller to explicitly specify which journal files
should be opened. The same functionality could be achieved before
by creating a directory and playing around with symlinks. It
is useful to debug stuff and explore the journal, and has been
requested before.
Waiting is supported, the journal will notice modifications on
the files supplied when opening the journal, but will not add
any new files.
AND term usually don't have many subterms (4 seems to be the maximum
sensible number, e.g. _BOOT_ID && _SYSTEMD_UNIT && _PID && MESSAGE_ID).
Nevertheless, the cost of checking each subterm can be relatively
high, especially when the nested terms are compound, and it
makes sense to minimize the number of checks.
Instead of looping to the end and then again over the whole list once
again after at least one term changed the offset, start the loop at
the term which caused the change. This way ½ terms in the AND match
are not checked unnecessarily again.
This is the just the library part.
SD_JOURNAL_CURRENT_USER flags is added to sd_j_open(), to open
files from current user.
SD_JOURNAL_SYSTEM_ONLY is renamed to SD_JOURNAL_SYSTEM,
and changed to mean to (also) open system files. This way various
flags can be combined, which gives them nicer semantics, especially
if other ones are added later.
Backwards compatibility is kept, because SD_JOURNAL_SYSTEM_ONLY
is equivalent to SD_JOURNAL_SYSTEM if used alone, and before there
we no other flags.
Also reworded a few debug messages for brevity, and added a log
statement which prints out the filter at debug level:
Journal filter: (((UNIT=sys-module-configfs.device AND _PID=1) OR (COREDUMP_UNIT=sys-module-configfs.device AND MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1) OR _SYSTEMD_UNIT=sys-module-configfs.device) AND _BOOT_ID=4e3c518ab0474c12ac8de7896fe6b154)
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
This reverts commit 4826f0b7b5.
Because statfs.t_type can be int on some architecures, we have to cast
the const magic to the type, otherwise the compiler warns about
signed/unsigned comparison, because the magic can be 32 bit unsigned.
statfs(2) man page is also wrong on some systems, because
f_type is not __SWORD_TYPE on some architecures.
The following program:
int main(int argc, char**argv)
{
struct statfs s;
statfs(argv[1], &s);
printf("sizeof(f_type) = %d\n", sizeof(s.f_type));
printf("sizeof(__SWORD_TYPE) = %d\n", sizeof(__SWORD_TYPE));
printf("sizeof(long) = %d\n", sizeof(long));
printf("sizeof(int) = %d\n", sizeof(int));
if (sizeof(s.f_type) == sizeof(int)) {
printf("f_type = 0x%x\n", s.f_type);
} else {
printf("f_type = 0x%lx\n", s.f_type);
}
return 0;
}
executed on s390x gives for a btrfs:
sizeof(f_type) = 4
sizeof(__SWORD_TYPE) = 8
sizeof(long) = 8
sizeof(int) = 4
f_type = 0x9123683e
This reverts commit a858b64ddd.
This reverts commit aea275c431.
This reverts commit fc6e6d245e.
This reverts commit c4073a27c5.
This reverts commit cddf148028.
This reverts commit 8c68a70170.
The constants are now casted to __SWORD_TYPE, which should resolve the
compiler warnings about signed vs unsigned.
After talking to Kay, we concluded:
This should be fixed in the kernel, not worked around in userspace tools.
Architectures cannot use int and expect magic constants lager than INT_MAX
to work correctly. The kernel header needs to be fixed.
Even coreutils cannot handle it:
#define RAMFS_MAGIC 0x858458f6
# stat -f -c%t /
ffffffff858458f6
#define BTRFS_SUPER_MAGIC 0x9123683E
# stat -f -c%t /mnt
ffffffff9123683e
Although I found the perfect working macro to fix the thing :)
__extension__ ({ \
bool _ret = false; \
switch(f) { case c: _ret=true; }; \
( _ret ); \
})
On some architectures (like s390x) the kernel has the type int for
f_type, but long in userspace.
Assigning the 32 bit magic constants from linux/magic.h to the 31 bit
signed f_type in the kernel, causes f_type to be negative for some
constants.
glibc extends the int to long for those architecures in 64 bit mode, so
the negative int becomes a negative long, which cannot be simply
compared to the original magic constant, because the compiler would
automatically cast the constant to long.
To workaround this issue, we also compare to the (int)MAGIC value in a
macro. Of course, we could do #ifdef with the architecure, but it has to
be maintained, and the magic constants are 32 bit anyway.
Someday, when the int is unsigned or long for all architectures, we can
remove this macro again. Until then, keep it as simple as it can be.
Instead of making a type up, just use __SWORD_TYPE, after reading
statfs(2).
Too bad, this does not fix s390x because __SWORD_TYPE is (long int) and
the kernel uses (int) to fill in the field!!!!!!
statfs.f_type is signed but the filesystem magics are unsigned.
Casting the magics to signed will not make the signed.
Problem seen on big-endian 64bit s390x with __fsword_t 8 bytes.
Casting statfs.f_type to unsigned on the other hand will get us what we
need.
https://bugzilla.redhat.com/show_bug.cgi?id=953217
When using "-p" and "-b" in combination with "-u", the output is not
what you would expect. The reason is the sd_journal_add_disjunction()
call in add_matches_for_unit() and add_matches_for_user_unit(), which
adds two ORs without taking the other conditions to every OR.
Adding another level on top with AND and sd_journal_add_conjunction()
solves the problem.
Output before:
$ journalctl -o short-monotonic -ab -p 0 -u sshd.service
-- Reboot --
[ 3.216305] lenovo systemd[1]: Starting OpenSSH server daemon...
-- Reboot --
[ 3.168666] lenovo systemd[1]: Starting OpenSSH server daemon...
[ 3.169639] lenovo systemd[1]: Started OpenSSH server daemon.
[36285.635389] lenovo systemd[1]: Stopped OpenSSH server daemon.
-- Reboot --
[ 10.838657] lenovo systemd[1]: Starting OpenSSH server daemon...
[ 10.913698] lenovo systemd[1]: Started OpenSSH server daemon.
[ 6881.035183] lenovo systemd[1]: Stopped OpenSSH server daemon.
-- Reboot --
[ 6.636228] lenovo systemd[1]: Starting OpenSSH server daemon...
[ 6.662573] lenovo systemd[1]: Started OpenSSH server daemon.
[ 6.681148] lenovo sshd[397]: Server listening on 0.0.0.0 port 22.
[ 6.681379] lenovo sshd[397]: Server listening on :: port 22.
As we see, the output is from _every_ boot and priority 0 is not taken
into account.
Output after patch:
$ journalctl -o short-monotonic -ab -p 0 -u sshd.service
-- Logs begin at Sun 2013-02-24 20:54:44 CET, end at Tue 2013-03-19 14:58:21 CET. --
Increasing the priority:
$ journalctl -o short-monotonic -ab -p 6 -u sshd.service
-- Logs begin at Sun 2013-02-24 20:54:44 CET, end at Tue 2013-03-19 14:59:12 CET. --
[ 6.636228] lenovo systemd[1]: Starting OpenSSH server daemon...
[ 6.662573] lenovo systemd[1]: Started OpenSSH server daemon.
[ 6.681148] lenovo sshd[397]: Server listening on 0.0.0.0 port 22.
[ 6.681379] lenovo sshd[397]: Server listening on :: port 22.
This function should be used when filling in "struct pollfd"'s .events
field for watching the journal. It will always return POLLIN for now,
but we should keep our options open to change this later on.
This mimics libsystemd-bus' sd_bus_get_events() call with the same
purpose.
In order to write tests for the catalog functions, they
are made non-static and start taking a 'database' parameter,
which is the name of a file with the preprocessed catalog
entries.
This makes it possible to make test-catalog part of the
normal test suite, since it now only operates on files
in /tmp.
Some more tests are added.
- Reword messages a bit
- Correct check whether EACCES is in the set of errors
- Don't complain if no journal files are found
- allocate Set object for errors lazily since in the best case we don't
need it at all.
- don't consider it an error if /run/log/journal doesn't exist (because
that's the usual case actually, if storage is enabled)
There are many ways in which we can get those checks wrong, so it is
better to warn and then error out on a real access failure.
The error messages are wrapped to <80 lines, because their primary
use is to be displayed in the terminal, and it is easier to read them
this way. Reading them in the journal can be a bit trickier, but
this is a bug in logs-show.c.
Seems natural to be able to specify relative directory,
e.g. with journalctl -D. And even if, this should be checked
in front-end code, not in the library.
Logs written by journald from the initramfs may be written to a
directory with the name created from a random machine-id. Afterwards,
when the root filesystem has been mounted and machine-id reinitalized,
logs will be written to the directory with a name created from the
proper machine-id. When logs are flushed to /var/log/journal,
everything is copied to one output directory.
When journalctl without '-m' is run after the logs have been flushed
to /var/log/journal, all messages are shown. However, when run while
logs are still in /run/log/journal, those stored under the random
machine-id will not be shown.
Make journalctl behave the same regardless whether persistent storage
has been enabled or not, and slurp all files from /run/log/journal
even without '-m'.
Documentation states that 0 is correct, and all other
similar functions return 0 on success.
Pointed-out-by: Steven Hiscocks <steven-systemd@hiscocks.me.uk>
This introduces a new data threshold setting for sd_journal objects
which controls the maximum size of objects to decompress. This is
relieves the library from having to decompress full data objects even
if a client program is only interested in the initial part of them.
This speeds up "systemd-coredumpctl" drastically when invoked without
parameters.
Some filesystem magics are too big to fit in 31 bits,
and are wrapped to negative. f_type is an int on 32 bits, so
it is signed, and we get a warning on comparison.
The message catalog can be used to attach short help texts to log lines,
keyed by their MESSAGE_ID= fields. This is useful to help the
administrator understand the context and cause of a message, find
possible solutions and find further related documentation.
Since this is keyed off MESSAGE_ID= this will only work for native
journal messages.
The message catalog supports i18n, and is useful to augment english
language system messages with explanations in the local language.
This commit only includes short explanatory messages for a few example
message IDs, we'll add more complete documentation for the relevant
systemd messages later on.
Network file systems generally do not offer inotify() that would work
across the network. We hence cannot rely on inotify() exclusiely in
those case. Provide an API to determine these cases, and suggest doing
manual regular rechecks.
Note that this is not complete yet, as we need to rescan journal dirs on
network file systems explicitly to find new/removed files
The new 'unique' API allows listing all unique field values that a field
specified by a field name can take in all entries of the journal. This
allows answering queries such as "What units logged to the journal?",
"What hosts have logged into the journal?", "Which boot IDs have logged
into the journal?".
Ultimately this allows implementation of tools similar to lastlog based
on journal data.
Note that listing these field values will not work for journal files
created with older journald, as the field values are not indexed in
older files.
The mmap cache doesn't guarantee that we can look at two files at the
same time. Hence make sure to look at the entries to compare one
after the other, instead of at the same time when comparing them, and
reposition the window in between.
instead of having one simple per-file cache implement an more
comprehensive one that works for multiple files and can actually
maintain multiple maps per file and per object type.
This adds forward-secure authentication of journal files. This patch
includes key generation as well as tagging of journal files,
Verification of journal files will be added in a later patch.
The default of 2047 hash table entries turned out to result in way too
many collisions for bigger files, hence scale the hash table size by the
estimated maximum file size.
Previously, when the main data hash table grows too full the performance
simply started to decrease drastically. Instead, now simply rotate to a
new journal file as the hash table gets to full, so that we can start
with a new fresh empty hash table.
we now can take multiple matches, and they will apply as AND if they
apply to different fields and OR if they apply to the same fields. Also,
terms of this kind can be combined with an overreaching OR.
There's now sd_journal_new_directory() for watching specific journal
directories. This is exposed in journalctl -D.
sd_journal_wait() and sd_journal_process() now return whether changes in
the journal are invalidating or just appending.
We now create inotify kernel watches only when we actually need them