If a file was opened for writing, and then closed immediately without
actually writing any entries, on subsequent opening, it would be
considered "corrupted". This should be totally fine, and even in
read mode, an empty file can become non-empty later on.
Add liblz4 as an optional dependency when requested with --enable-lz4,
and use it in preference to liblzma for journal blob and coredump
compression. To retain backwards compatibility, XZ is used to
decompress old blobs.
Things will function correctly only with lz4-119.
Based on the benchmarks found on the web, lz4 seems to be the best
choice for "quick" compressors atm.
For pkg-config status, see http://code.google.com/p/lz4/issues/detail?id=135.
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
With a corrupted file, we can get in a situation where two entries
in the entry array point to the same object. Then journal_file_next_entry
will find the first one using generic_arrray_bisect, and try to move to
the second one, but since the address is the same, generic_array_get will
return the first one. journal_file_next_entry ends up in an infinite loop.
https://bugzilla.redhat.com/show_bug.cgi?id=1047039
gcc (4.8.2, arm) does not understand that journal_file_append_field()
will always set 'fo' when it returns 0, so this warning is bogus.
Anyway, fix it by initialiting fo = NULL.
In trying to track down a stupid linker bug, I noticed a bunch of
memset() calls that should be using memzero() to make it more "obvious"
that the options are correct (i.e. 0 is not the length, but the data to
set). So fix up all current calls to memset(foo, 0, length) to
memzero(foo, length).
sd_j_e_u needs to keep a reference to an object while comparing it
with possibly duplicate objects in other files. Because the size of
mmap cache is limited, with enough files and object to compare to,
at some point the object being compared would be munmapped, resulting
in a segmentation fault.
Fix this issue by turning keep_always into a reference count that can
be increased and decreased. Other callers which set keep_always=true
are unmodified: their references are never released but are ignored
when the whole file is closed, which happens at some point. keep_always
is increased in sd_j_e_u and later on released.
While all the libc implementations I know return NULL when memchr's size
parameter is 0, without accessing any memory, passing NULL to memchr is
still invalid:
C11 7.24.1p2: Where an argument declared as "size_t n" specifies the length
of the array for a function, n can have the value zero on a call to that
function. Unless explicitly stated otherwise in the description of a
particular function in this subclause, pointer arguments on such a call
shall still have valid values, as described in 7.1.4. On such a call, a
function that locates a character finds no occurrence, a function that
compares two character sequences returns zero, and a function that copies
characters copies zero characters.
see http://llvm.org/bugs/show_bug.cgi?id=18247
let's just do a single fallocate() as far as possible, and don't
distuingish between allocated space and file size.
This way we can save a syscall for each append, which makes quite some
benefits.
This extends 62678ded 'efi: never call qsort on potentially
NULL arrays' to all other places where qsort is used and it
is not obvious that the count is non-zero.
Bump the minimal size of the journal so that we can be sure creating the
journal file will always succeed. Previously the minimum size was
smaller than a empty jounral file...
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
code in src/shared/macro.h only defined MAX/MIN in case
they were not defined previously. however the MAX/MIN
macros implemented in glibc are not of the "safe" kind but defined
as:
define MIN(a,b) (((a)<(b))?(a):(b))
define MAX(a,b) (((a)>(b))?(a):(b))
Avoid nasty side effects by using our own versions instead.
Also fix the warnings derived from this change.
[zj: - modify MAX3 macro to fix warning about _a shadowing _a,
- do bootchart/svg.c too,
- remove unused MIN3.]
Add option to force journal sync with fsync. Default timeout is 5min.
Interval configured via SyncIntervalSec option at journal.conf. Synced
journal files will be marked as OFFLINE.
Manual sync can be performed via sending SIGUSR1.
This introduces a new data threshold setting for sd_journal objects
which controls the maximum size of objects to decompress. This is
relieves the library from having to decompress full data objects even
if a client program is only interested in the initial part of them.
This speeds up "systemd-coredumpctl" drastically when invoked without
parameters.
When traversing entry array chains for a bisection or for retrieving an
item by index we previously always started at the beginning of the
chain. Since we tend to look at the same chains repeatedly, let's cache
where we have been the last time, and maybe we can skip ahead with this
the next time.
This turns most bisections and index lookups from O(log(n)*log(n)) into
O(log(n)). More importantly however, we seek around on disk much less,
which is good to reduce buffer cache and seek times on rotational disks.
The new 'unique' API allows listing all unique field values that a field
specified by a field name can take in all entries of the journal. This
allows answering queries such as "What units logged to the journal?",
"What hosts have logged into the journal?", "Which boot IDs have logged
into the journal?".
Ultimately this allows implementation of tools similar to lastlog based
on journal data.
Note that listing these field values will not work for journal files
created with older journald, as the field values are not indexed in
older files.
Let's clean up our terminology a bit. New terminology:
FSS = Forward Secure Sealing
FSPRG = Forward Secure Pseudo-Random Generator
FSS is the combination of FSPRG and a HMAC.
Sealing = process of adding authentication tags to the journal.
Verification = process of checking authentication tags to the journal.
Sealing Key = The key used for adding authentication tags to the journal.
Verification Key = The key used for checking authentication tags of the journal.
Key pair = The pair of Sealing Key and Verification Key
Internally, the Sealing Key is the combination of the FSPRG State plus
change interval/start time.
Internally, the Verification Key is the combination of the FSPRG Seed
plus change interval/start time.
instead of having one simple per-file cache implement an more
comprehensive one that works for multiple files and can actually
maintain multiple maps per file and per object type.
This adds forward-secure authentication of journal files. This patch
includes key generation as well as tagging of journal files,
Verification of journal files will be added in a later patch.
The default of 2047 hash table entries turned out to result in way too
many collisions for bigger files, hence scale the hash table size by the
estimated maximum file size.
Previously, when the main data hash table grows too full the performance
simply started to decrease drastically. Instead, now simply rotate to a
new journal file as the hash table gets to full, so that we can start
with a new fresh empty hash table.
After all the point of the realtime clock (in contrast to the monotonic
clock) is that it does not have to be strictly monotonic, hence don't
enforce this when flushing the journal from /run to /var.
we now can take multiple matches, and they will apply as AND if they
apply to different fields and OR if they apply to the same fields. Also,
terms of this kind can be combined with an overreaching OR.
According to the man pages of posix_fallocate, it returns zero on
success or an error number on failure; however, errno is not set
on failure. If the kernel or a library other than glibc does not
support the function for example, EOPNOTSUPP will be returned and
the error will not be handled properly with original code.
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.
Apparently the perfomance price for compression is to steep to apply it
for all objects >= 64 and < 512 in size, as measured by Arjan Van De
Ven, hence increase the threshold to 512 which yields better results.