log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
We use the same check at two places, let's add a tiny helper function
for it, since it's not entirely trivialy, and we changes this before
multiple times, and it's a good thing if we can change it at one place
only instead of multiple.
This ensures that in all threads we fork off in the background in our
code we mask out all signals, so that our thread won't end up getting
signals delivered the main process should be getting.
We always set the signal mask before forking off the thread, so that the
thread has the right mask set from its earliest existance on.
Instead of compiling those files twice, once for libsystemd and once for
libshared, compile once as a static archive and then link into both.
This reduce the meson target for man=no compile to 1291.
gcrypt_util_sources had to be moved because otherwise they appeared twice
in libshared.so halfproducts, causing an error.
-fvisibility=default is added to libbasic, libshared_static so that the symbols
appear properly in the exported symbol list in libshared.
The advantage is that files are not compiled twice. When configured with -Dman=false,
the ninja target list is reduced from 1588 to 1347 targets. The difference in compilation
time is small (<10%). I think this is because of -O0 and ccache and multiple cores, and
in different settings the compilation time could be reduced. The main advantage is that
errors and warnings are not reported twice.
This makes things a bit easier to read I think, and also makes sure we
always use the _unlikely_ wrapper around it, which so far we used
sometimes and other times we didn't. Let's clean that up.
Let's replace usage of fputc_unlocked() and friends by __fsetlocking(f,
FSETLOCKING_BYCALLER). This turns off locking for the entire FILE*,
instead of doing individual per-call decision whether to use normal
calls or _unlocked() calls.
This has various benefits:
1. It's easier to read and easier not to forget
2. It's more comprehensive, as fprintf() and friends are covered too
(as these functions have no _unlocked() counterpart)
3. Philosophically, it's a bit more correct, because it's more a
property of the file handle really whether we ever pass it on to another
thread, not of the operations we then apply to it.
This patch reworks all pieces of codes that so far used fxyz_unlocked()
calls to use __fsetlocking() instead. It also reworks all places that
use open_memstream(), i.e. use stdio FILE* for string manipulations.
Note that this in some way a revert of 4b61c87511.
The "nobody" user might possibly be seen by the journal or coredumping
code if unmapped userns-using processes are somehow visible to them.
Let's make sure we don't do the ACL magic for this user either, since
this is a special system user that might be backed by different real
users in different contexts.
This adds uid_is_system() and gid_is_system(), similar in style to
uid_is_dynamic(). That a helper like this is useful is illustrated by
the fact that test-condition.c didn't get the check right so far, which
this patch fixes.
N_IOVEC_OBJECT_FIELDS is bumped 14 → 18 (see dispatch_message_real() and
count!)
N_IOVEC_PAYLOAD_FIELDS is bumped 15 → 16 (see
server_space_usage_message() and count!)
Also, add comments, to make clear what is what.
Coverity says that's undefined. I'm pretty sure we always would get a nan, but
let's avoid (formally) undefined behaviour since that can cause compilers to do
strange things.
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
And let's make use of it to implement two new unit settings with it:
1. LogLevelMax= is a new per-unit setting that may be used to configure
log priority filtering: set it to LogLevelMax=notice and only
messages of level "notice" and lower (i.e. more important) will be
processed, all others are dropped.
2. LogExtraFields= is a new per-unit setting for configuring per-unit
journal fields, that are implicitly included in every log record
generated by the unit's processes. It takes field/value pairs in the
form of FOO=BAR.
Also, related to this, one exisiting unit setting is ported to this new
facility:
3. The invocation ID is now pulled from /run/systemd/units/ instead of
cgroupfs xattrs. This substantially relaxes requirements of systemd
on the kernel version and the privileges it runs with (specifically,
cgroupfs xattrs are not available in containers, since they are
stored in kernel memory, and hence are unsafe to permit to lesser
privileged code).
/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.
Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.
This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.
Fixes: #4089Fixes: #3041Fixes: #4441
When we drop messages of a unit, we log about. Let's add some structured
data to that. Let's include how many messages we dropped, but more
importantly, let's link up the message we generate to the unit we
dropped the messages from by using the "OBJECT" logic, i.e. by
generating OBJECT_SYSTEMD_UNIT= fields and suchlike, that "journalctl
-u" and friends already look for.
Fixes: #6494
clang warns about a few sites like this:
../src/journal/journal-file.c:1780:48: warning: taking address of packed member 'entry_offset' of class or structure 'DataObject' may result in an unaligned pointer value [-Waddress-of-packed-member]
&o->data.entry_offset,
^~~~~~~~~~~~~~~~~~~~
but DataObject.entry_offset will always be 8-byte aligned as long as
the DataObject structure is aligned. Similarly in other cases, the
field is always aligned. Let's just silence the warning to avoid noise.
gcc does not know -Waddress-of-packed-member, and would warn about an unknown
warning, so we need to conditionalize on __clang__.
../src/journal/journald-native.c:341:13: warning: variable 'context' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (ucred && pid_is_valid(ucred->pid)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/journal/journald-native.c:350:42: note: uninitialized use occurs here
context, ucred, tv, label, label_len);
^~~~~~~
../src/journal/journald-native.c:335:31: note: initialize the variable 'context' to silence this warning
ClientContext *context;
^
= NULL
Very nice reporting!
Functions that we call can handle context == NULL, so it's enough to simply
initialize the variable.
This option allows restricting the shown fields in the output modes that
would normally show all fields. It allows clients that are only
interested in a subset of the fields to access those more efficiently.
Also, it makes the resulting size of the output more predictable.
It has no effect on the various `short` output modes, because those
already only show a subset of the fields.
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().
This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.
(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.
This also makes three related changes:
- When a NUL byte is read we'll not recognize this as alternative line
break, instead of silently dropping everything after it. (see #4863)
- The reason for a line-break is now encoded in the log record, if it
wasn't a plain newline. Specifically, we distuingish "nul",
"line-max" and "eof", for line breaks due to NUL byte, due to the
maximum line length as configured with LineMax= or due to end of
stream. This data is stored in the new implicit _LINE_BREAK= field.
It's not synthesized for plain \n line breaks.
- A randomized 128bit ID is assigned to each log stream.
With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
See: #4863
Replaces: #4875
Introduce journal_file_check_object(), which does lightweight object
sanity checks, and use it in journal_file_move_to_object(), so that we
will catch certain corrupted objects in the journal file.
This fixes#6447, where we had only partially written out OBJECT_ENTRY
(ObjectHeader written, but rest of object zero bytes), causing
"journalctl --list-boots" to fail.
$ builddir.vanilla/journalctl --list-boots -D bug6447/
Failed to determine boots: No data available
$ builddir.patched/journalctl --list-boots -D bug6447/
-52 22633da1c5374a728d6c215e2c301dc2 Mon 2017-07-10 05:29:21 EEST—Mon 2017-07-10 05:31:51 EEST
-51 2253aab9ea7e4a2598f2abda82939eff Mon 2017-07-10 05:32:22 EEST—Mon 2017-07-10 05:36:49 EEST
-50 ef0d85d35c74486fa4104f9d6391b6ba Mon 2017-07-10 05:40:33 EEST—Mon 2017-07-10 05:40:40 EEST
[...]
Note that journal_file_check_object() is similar to
journal_file_object_verify(). The most expensive checks are omitted, as
they would slow down every journal_file_move_to_object() call too much.
With this implementation, the added overhead is small, for example when
dumping some journal content to /dev/null
(built with -Dbuildtype=debugoptimized -Db_ndebug=true):
Performance counter stats for 'builddir.vanilla/journalctl -D 76f4d4c3406945f9a60d3ca8763aa754/':
12542,311634 task-clock:u (msec) # 1,000 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
80 100 page-faults:u # 0,006 M/sec
41 786 963 456 cycles:u # 3,332 GHz
105 453 864 770 instructions:u # 2,52 insn per cycle
24 342 227 334 branches:u # 1940,809 M/sec
105 709 217 branch-misses:u # 0,43% of all branches
12,545199291 seconds time elapsed
Performance counter stats for 'builddir.patched/journalctl -D 76f4d4c3406945f9a60d3ca8763aa754/':
12734,723233 task-clock:u (msec) # 1,000 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
80 693 page-faults:u # 0,006 M/sec
42 661 017 429 cycles:u # 3,350 GHz
107 696 985 865 instructions:u # 2,52 insn per cycle
24 950 526 745 branches:u # 1959,252 M/sec
101 762 806 branch-misses:u # 0,41% of all branches
12,737527327 seconds time elapsed
Fixes#6447.
'journalctl --vacuum-*' does not suppress output message with --quiet.
Let journal_directory_vacuum honors --quiet to fix the problem.
BugLink: https://bugs.launchpad.net/bugs/1692188
Cache client metadata, in order to be improve runtime behaviour under
pressure.
This is inspired by @vcaputo's work, specifically:
https://github.com/systemd/systemd/pull/2280
That code implements related but different semantics.
For a longer explanation what this change implements please have a look
at the long source comment this patch adds to journald-context.c.
After this commit:
# time bash -c 'dd bs=$((1024*1024)) count=$((1*1024)) if=/dev/urandom | systemd-cat'
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.2783 s, 95.2 MB/s
real 0m11.283s
user 0m0.007s
sys 0m6.216s
Before this commit:
# time bash -c 'dd bs=$((1024*1024)) count=$((1*1024)) if=/dev/urandom | systemd-cat'
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 52.0788 s, 20.6 MB/s
real 0m52.099s
user 0m0.014s
sys 0m7.170s
As side effect, this corrects the journal's rate limiter feature: we now
always use the unit name as key for the ratelimiter.
Let's be a bit stricter in what we end up logging: ignore invalid unit
name specifications. Let's validate all input!
As we ignore unit names passed in from unprivileged clients anyway the
effect of this additional check is minimal.
(Also, no need to initialize the identifier/unit_id fields of stream
objects to NULL if empty strings are passed, the default is NULL
anyway...)
As a follow-up for db3f45e2d2 let's do the
same for all other cases where we create a FILE* with local scope and
know that no other threads hence can have access to it.
For most cases this shouldn't change much really, but this should speed
dbus introspection and calender time formatting up a bit.
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
Introduces window_matches_fd() for the fd matching case in try_context(),
In find_mmap() we're already walking a list of windows by fd, checking
this is pointless work in a potentially hot loop with many windows.
Instead of context_detach_window() and a manual attach of the new
window, simply call context_attach_window() which performs the
detach first if appropriate.
journal_file_move_to_object() can skip the second
journal_file_move_to() call if the first one already mapped a
sufficiently large area.
Now that mmap_cache_get() returns the size of the mapped area
when asked, ask for the size and only perform the second call if
the required size exceeds the mapped size instead of the object
header size.
This results in a nice performance boost in my testing, even with
a corpus of many small logs burning much CPU time elsewhere:
Before:
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.330s
user 0m16.281s
sys 0m0.046s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.409s
user 0m16.358s
sys 0m0.048s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.625s
user 0m16.558s
sys 0m0.061s
After:
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.311s
user 0m15.257s
sys 0m0.046s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.201s
user 0m15.135s
sys 0m0.062s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.170s
user 0m15.113s
sys 0m0.053s
If requested, return the actual mapping size to the caller in
addition to the address.
journal_file_move_to_object() often performs two successive
mmap_cache_get() calls via journal_file_move_to(); one to get the
object header, then another to get the entire object when it's
larger than the header's size.
If mmap_cache_get() returned the actual mapping's size, it's
probable that the second mmap_cache_get() could be skipped when
the established mapping already encompassed the desired size.
This way we have a MMapFileDescriptor reference external to the cache,
and can supply the handle directly to mmap_cache_get(), eliminating
hashmap lookups entirely from the hot path.
e268b81e moved an fflush() from output_json() to the generic
output_journal(), when it probably should have deleted all fflush()
calls from logs-show.c altogether.
The caller supplies the FILE * to these functions, and should be in
charge of flushing as needed. The current implementation essentially
defeats any buffering stdio was bringing to the table, resulting in
extraneous tiny write() calls in commands like `journalctl -b`.
This commit removes the fflush() call from output_journal(), and adds
them to journalctl before waiting for more entries and at completion.
This way in the hot path when journalctl loops on entries stdio can
combine multiple entries into bulkier write() calls.
MESSAGE=data\n and MESSAGE\n40000000data\n are both valid serializations, so
they should be stored in the journal. Before, MESSAGE, SYSLOG_FACILITY,
SYSLOG_IDENTIFIER, PRIORITY, and OBJECT_PID would be only honoured if they were
given in the first form.
Fixed#5973.
For all except the last entry in a single packet, we would dispatch the
message to the journal, but not forward it, nor perform proper cleanup.
Rewrite the code to process each entry in a helper function, and make
server_process_native_message() just call this function in a loop.
Fixes#5643.
v2:
- properly decrement *remaining when processing entry separator
timespec::tv_nsec can have different sizes depending on the
host architecture. On x32 in particular, it is 8 bytes long
while the long int type is only 4 bytes long. Hence, using
ld as a format specifier will trigger a format error. Thus,
explicitly cast timespec::tv_nsec to nsec_t and use PRI_NSEC
as the format specifier to make sure the sizes for both match.
This reverts commit 6355e75610.
The previously mentioned commit inadvertently broke a lot of SELinux related
functionality for both unprivileged users and systemd instances running as
MANAGER_USER. In particular, setting the correct SELinux context after a User=
directive is used would fail to work since we attempt to set the security
context after changing UID. Additionally, it causes activated socket units to
be mislabeled for systemd --user processes since setsockcreatecon() would never
be called.
Reverting this fixes the issues with labeling outlined above, and reinstates
SELinux access checks on unprivileged user services.
Try to honor --show-cursor in more situations by never terminating early
when we didn't read any logs.
In particular, sd_journal_previous_skip() now returns 0 when it didn't
actually skip anything (for example with --lines=0), which resulted in
--show-cursor not working anymore.
Using conf.set() with a boolean argument does the right thing:
either #ifdef or #undef. This means that conf.set can be used unconditionally.
Previously I used '1' as the placeholder value, and that needs to be changed to
'true' for consistency (under meson 1 cannot be used in boolean context). All
checks need to be adjusted.
When some error occurs during the initialization of JournalFile,
the JournalFile can be left without hash tables created. When later
trying to append an entry to that file, the assertion in
journal_file_link_data() fails, and journald crashes.
This patch fix this issue by checking *_hash_table_size in
journal_file_verify_header().
Also detect libgpg-error. Require both to be present for HAVE_CRYPT,
even though libgpg-error is only used in src/resolve. If one is available,
the other should be too, so it doesn't seem worth the trouble to make two
separate conditions.
When caller invokes sd_journal_open() we usually open at least one
directory with journal files. add_root_directory() function increments
current_invalidate_counter. After sd_journal_open() returns
current_invalidate_counter != last_invalidate_counter.
After caller waits for journal events (e.g. waits for new messages in
journal) then it usually calls sd_journal_process(). However, on first
call to sd_journal_process(), function determine_change() returns
SD_JOURNAL_INVALIDATE even though no journal files were
deleted/moved. This is because current_invalidate_counter !=
last_invalidate_counter.
After the fix we make sure counters has the same value before we begin
processing inotify events.
Shell scripts should be executable so that meson reports their
invocation succinctly (does not print 'sh' '-e').
Python scripts should not be executable so that meson does the
detection of the right python binary itself.
Add -u everywhere to catch potential errors.
The indentation for emacs'es meson-mode is added .dir-locals.
All files are reindented automatically, using the lasest meson-mode from git.
Indentation should now be fairly consistent.
This simplifies things and leads to a smaller installation footprint.
libsystemd_internal and libsystemd_journal_internal are linked into
libystemd-shared and available to all programs linked to libsystemd-shared.
libsystemd_journal_internal is not needed anymore, and libsystemd-shared
is used everwhere. The few exceptions are: libsystemd.so, test-engine,
test-bus-error, and various loadable modules.
This implementation assumes that the arguments in compiler.cmd_array()
don't contain any spaces. Since we are only interested in compilation
on Linux, I think this is a safe assumption.
Solution suggested by Nirbheek Chauhan.
It's crucial that we can build systemd using VS2010!
... er, wait, no, that's not the official reason. We need to shed old systems
by requring python 3! Oh, no, it's something else. Maybe we need to throw out
345 years of knowlege accumulated in autotools? Whatever, this new thing is
cool and shiny, let's use it.
This is not complete, I'm throwing it out here for your amusement and critique.
- rules for sd-boot are missing. Those might be quite complicated.
- rules for tests are missing too. Those are probably quite simple and
repetitive, but there's lots of them.
- it's likely that I didn't get all the conditions right, I only tested "full"
compilation where most deps are provided and nothing is disabled.
- busname.target and all .busname units are skipped on purpose.
Otherwise, installation into $DESTDIR has the same list of files and the
autoconf install, except for .la files.
It'd be great if people had a careful look at all the library linking options.
I added stuff until things compiled, and in the end there's much less linking
then in the old system. But it seems that there's still a lot of unnecessary
deps.
meson has a `shared_module` statement, which sounds like something appropriate
for our nss and pam modules. Unfortunately, I couldn't get it to work. For the
nss modules, we need an .so version of '2', but `shared_module` disallows the
version argument. For the pam module, it also didn't work, I forgot the reason.
The handling of .m4 and .in and .m4.in files is rather awkward. It's likely
that this could be simplified. If make support is ever dropped, I think it'd
make sense to switch to a different templating system so that two different
languages and not required, which would make everything simpler yet.
v2:
- use get_pkgconfig_variable
- use sh not bash
- use add_project_arguments
v3:
- drop required:true and fix progs/prog typo
v4:
- use find_library('bz2')
- add TTY_GID definition
- define __SANE_USERSPACE_TYPES__
- use join_paths(prefix, ...) is used on all paths to make them all absolute
v5:
- replace all declare_dependency's with []
- add more conf.get guards around optional components
v6:
- drop -pipe, -Wall which are the default in meson
- use compiler.has_function() and compiler.has_header_symbol instead of the
hand-rolled checks.
- fix duplication in 'liblibsystemd' library name
- use the right .sym file for pam_systemd
- rename 'compiler' to 'cc': shorter, and more idiomatic.
v7:
- use ENABLE_ENVIRONMENT_D not HAVE_ENVIRONMENT_D
- rename prefix to prefixdir, rootprefix to rootprefixdir
("prefix" is too common of a name and too easy to overwrite by mistake)
- wrap more stuff with conf.get('ENABLE...') == 1
- use rootprefix=='/' and rootbindir as install_dir, to fix paths under
split-usr==true.
v8:
- use .split() also for src/coredump. Now everything is consistent ;)
- add rootlibdir option and use it on the libraries that require it
v9:
- indentation
v10:
- fix check for qrencode and libaudit
v11:
- unify handling of executable paths, provide options for all progs
This makes the meson build behave slightly differently than the
autoconf-based one, because we always first try to find the executable in the
filesystem, and fall back to the default. I think different handling of
loadkeys, setfont, and telinit was just a historical accident.
In addition to checking in $PATH, also check /usr/sbin/, /sbin for programs.
In Fedora $PATH includes /usr/sbin, (and /sbin is is a symlink to /usr/sbin),
but in Debian, those directories are not included in the path.
C.f. https://github.com/mesonbuild/meson/issues/1576.
- call all the options 'xxx-path' for clarity.
- sort man/rules/meson.build properly so it's stable
Both gcc and clang issue a host of warnings about void pointers used in
arithmetic. The warning must be ignored in that file to avoid multiple
warnings.
Makefile.am used to set this for all libsystemd-journal-internal.a sources,
because there's no finer granularity for warnings. Let's just set it for
this one file.
I think it's nice to mark the test as skipped instead of omitting
it entirely, hence #ifdefs in the code instead of excluding the test
in Makefile.am/meson.build.
Native journal messages (_TRANSPORT=journal) typically don't have a
syslog facility attached to it. As a result when forwarding the messages
to syslog they ended up with facility 0 (LOG_KERN).
Apply syslog_fixup_facility() so we use LOG_USER instead.
Fixes: #5640
It is possible to overflow uint64_t while validating the header of
a journal file. To prevent this, the addition itself is checked to
be within the limits of UINT64_MAX first.
To keep this readable, I have introduced two stack variables which
hold the converted values during validation.
The only functional change is that log_notice("No journal files were found.")
is not printed any more with --quiet. log_error("No journal files were opened
due to insufficient permissions.") is still printed.
I wasn't quite sure where to put this function, but shared/ seems to be the
right place and none of the existing files seem to fit too well.
v2: rename journal_access_check to journal_access_check_and_warn.
The cg_pid_get_path_shifted() is called twice during
server_dispatch_message(). We can get rid of the second by passing the
path to dispatch_message_real().
SD_ID128_MAKE is clearly not a standard C macro, so let’s point the user
to its documentation to let them know which header they need and what
they can then do with MESSAGE_XYZ.
Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatanate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
"conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.
Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:
$ size .libs/systemd{,-logind,-journald}{.old,}
text data bss dec hex filename
1265204 149564 4808 1419576 15a938 .libs/systemd.old
1260268 149564 4808 1414640 1595f0 .libs/systemd
246805 13852 209 260866 3fb02 .libs/systemd-logind.old
240973 13852 209 255034 3e43a .libs/systemd-logind
146839 4984 34 151857 25131 .libs/systemd-journald.old
146391 4984 34 151409 24f71 .libs/systemd-journald
It is also much easier to check if a certain binary uses a certain MESSAGE_ID:
$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
The compiler should not be able to optimize out the memset, because optarg is global
memory. In this case, not making the argument an empty string is nicer, so just use
an open-coded version of string_erase from before the explicit_bzero change.
76ec966f0e changed the code from ESHUTDOWN to ERFKILL, but missed one
spot in bus-common-errors.c. Fix that.
The code in transaction.c was checking for ERFKILL, but I'm not sure if this
mismatch had any effect, i.e. if there were any code paths in which the wrong
code actually made difference.
Also add comments when ESHUTDOWN is used in the journal code, so it's easy to
distinguish those cases when grepping. Standarize on the same capitalization.
(There's also a bunch of uses in sd-bus.c, but that's clearly different.)
* logind: trivial simplification
free_and_strdup() handles NULL arg, so make use of that.
* boot: fix two typos
* pid1: rewrite check in ignore_proc() to not check condition twice
It's harmless, but it seems nicer to evaluate a condition just a single time.
* core/execute: reformat exec_context_named_iofds() for legibility
* core/execute.c: check asprintf return value in the usual fashion
This is unlikely to fail, but we cannot rely on asprintf return value
on failure, so let's just be correct here.
CID #1368227.
* core/timer: use (void)
CID #1368234.
* journal-file: check asprintf return value in the usual fashion
This is unlikely to fail, but we cannot rely on asprintf return value
on failure, so let's just be correct here.
CID #1368236.
* shared/cgroup-show: use (void)
CID #1368243.
* cryptsetup: do not return uninitialized value on error
CID #1368416.
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requiremnts how the comment should look. Adjust it the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.
(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
https://bugzilla.redhat.com/show_bug.cgi?id=1416201
$ journalctl -b
Journal file /var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~ uses an unsupported feature, ignoring file.
Use SYSTEMD_LOG_LEVEL=debug journalctl --file=/var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~ to see the details.
-- No entries --
$ journalctl --file=/var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~
Journal file /var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~ uses incompatible flag lz4-compressed disabled at compilation time.
Failed to open journal file /var/log/journal/ad18f69b80264b52bb3b766240742383/system@0005467d92e23784-a6571c8b69d09124.journal~: Protocol not supported
mmap cache statistics: 0 hit, 1 miss
Failed to open files: Protocol not supported
After parsing the --verify-key argument, overwrite it with null bytes.
This minimizes (but does not completely eliminate) the time frame within
which another process on the system can extract the verification key
from the journalctl command line.
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
The journalctl man page says: "-m, --merge Show entries interleaved from all
available journals, including remote ones.", but current version of journalctl
doesn't live up to this promise. This patch simply adds
"/var/log/journal/remote" to search path if --merge flag is used.
Should fix issue #3618
This changes journald to not write to /var/log/journal until it received
SIGUSR1 for the first time, thus having been requested to flush the runtime
journal to disk.
This makes the journal work nicer with systems which have the root file system
writable early, but still need to rearrange /var before journald should start
writing and creating files to it, for example because ACLs need to be applied
first, or because /var is to be mounted from another file system, NFS or tmpfs
(as is the case for systemd.volatile=state).
Before this change we required setupts with /var split out to mount the root
disk read-only early on, and ship an /etc/fstab that remounted it writable only
after having placed /var at the right place. But even that was racy for various
preparations as journald might end up accessing the file system before it was
entirely set up, as soon as it was writable.
With this change we make scheduling when to start writing to /var/log/journal
explicit. This means persistent mode now requires
systemd-journal-flush.service in the mix to work, as otherwise journald would
never write to the directory.
See: #1397
This improves kernel command line parsing in a number of ways:
a) An kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar-xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_". With
this change, which was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's systemd.debug-shell,
not systemd.debug_shell). Considering both as equivalent is inspired how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"systemd.unit" will no result in a complain that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
Make sure to populate the cache in cache_space_refresh() at least once
otherwise it's possible that the system boots fast enough (and the journal
flush service is finished) before the invalidate cache timeout (30 us) has
expired.
Fixes: #4790
Let's remove chase_symlinks_prefix() and instead introduce a flags parameter to
chase_symlinks(), with a flag CHASE_PREFIX_ROOT that exposes the behaviour of
chase_symlinks_prefix().
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
I'm seeing strange decompression errors with lz4, which
might be content-dependent. Extend test-compression to allow
testing specific content.
(Edit: PEBKAC: lzcat and lz4cat are not the same beast.
Nevertheless, the test might still be useful in the future.)
This stripping is contolled by a new boolean parameter. When the parameter
is true, it means that the caller does not care about the distinction between
initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed
parameters in the initramfs, and only on the unprefixed parameters in real
root. If the parameter is false, behaviour is the same as before.
Changes by caller:
log.c (systemd.log_*): changed to accept rd-dot-prefix params
pid1: no change, custom logic
cryptsetup-generator: no change, still accepts rd-dot-prefix params
debug-generator: no change, does not accept rd-dot-prefix params
fsck: changed to accept rd-dot-prefix params
fstab-generator: no change, custom logic
gpt-auto-generator: no change, custom logic
hibernate-resume-generator: no change, does not accept rd-dot-prefix params
journald: changed to accept rd-dot-prefix params
modules-load: no change, still accepts rd-dot-prefix params
quote-check: no change, does not accept rd-dot-prefix params
udevd: no change, still accepts rd-dot-prefix params
I added support for "rd." params in the three cases where I think it's
useful: logging, fsck options, journald forwarding options.
This makes journald use the common option parsing functionality.
One behavioural change is implemented:
"systemd.journald.forward_to_syslog" is now equivalent to
"systemd.journald.forward_to_syslog=1".
I think it's nicer to use this way.
Now that determine_space_for() only deals with storage space (cached) values,
rename it so it reflects the fact that only the cached storage space values are
updated.
Updating min_use is rather an unusual operation that is limited when we first
open the journal files, therefore extracts it from determine_space_for() and
create a function of its own and call this new function when needed.
determine_space_for() is now dealing with storage space (cached) values only.
There should be no functional changes.
The set of storage space values we cache are calculated according to a couple
of filesystem statistics (free blocks, block size).
This patch caches the vfs stats we're interested in so these values are
available later and coherent with the rest of the space cached values.
This patch makes system_journal_open() stop emitting the space usage
message. The caller is now free to emit this message when appropriate.
When restarting the journal, we can now emit the message *after*
flushing the journal (if required) so that all flushed log entries are
written in the persistent journal *before* the status message.
This is required since the status message is always younger than the
flushed entries.
Fixes#4190.
This commit simply extracts from determine_space_for() the code which emits the
storage usage message and put it into a function of its own so it can be reused
by others paths later.
No functional changes.
This structure keeps track of specificities for a given journal type
(persistent or volatile) such as metrics, name, etc...
The cached space values are now moved in this structure so that each
journal has its own set of cached values.
Previously only one set existed and we didn't know if the cached
values were for the runtime journal or the persistent one.
When doing:
determine_space_for(s, runtime_metrics, ...);
determine_space_for(s, system_metrics, ...);
the second call returned the cached values for the runtime metrics.
This commit simply extracts from determine_space_for() the code which
determines the FS usage where the passed path lives (statvfs(3)) and put it
into a function of its own so it can be reused by others paths later.
No functional changes.
Never permit that we write to journal files that have newer timestamps than our
local wallclock has. If we'd accept that, then the entries in the file might
end up not being ordered strictly.
Let's refuse this with ETXTBSY, and then immediately rotate to use a new file,
so that each file remains strictly ordered also be wallclock internally.
As soon as we notice that the clock jumps backwards, rotate journal files. This
is beneficial, as this makes sure that the entries in journal files remain
strictly ordered internally, and thus the bisection algorithm applied on it is
not confused.
This should help avoiding borked wallclock-based bisection on journal files as
witnessed in #4278.
Let's use the earliest linearized event timestamp for journal entries we have:
the event dispatch timestamp from the event loop, instead of requerying the
timestamp at the time of writing.
This makes the time a bit more accurate, allows us to query the kernel time one
time less per event loop, and also makes sure we always use the same timestamp
for both attempts to write an entry to a journal file.
When iterating through partially synced journal files we need to be prepared
for hitting with invalid entries (specifically: non-initialized). Instead of
generated an error and giving up, let's simply try to preceed with the next one
that is valid (and debug log about this).
This reworks the logic introduced with caeab8f626
to iteration in both directions, and tries to look for valid entries located
after the invalid one. It also extends the behaviour to both iterating through
the global entry array and per-data object entry arrays.
Fixes: #4088
Let's make dissecting of borked journal files more expressive: if we encounter
an object whose first 8 bytes are all zeroes, then let's assume the object was
simply never initialized, and say so.
Previously, this would be detected as "overly short object", which is true too
in a away, but it's a lot more helpful printing different debug options for the
case where the size is not initialized at all and where the size is initialized
to some bogus value.
No function behaviour change, only a different log messages for both cases.
Let's make it easier to figure out when we see an invalid journal file, why we
consider it invalid, and add some minimal debug logging for it.
This log output is normally not seen (after all, this all is library code),
unless debug logging is exlicitly turned on.
When date is changed in system to future and normal user logs to new journal file, and then date is changed back to present time, the "journalctl --list-boot" command goes to forever loop. This commit tries to fix this problem by checking first the boot id list if the found boot id was already in that list. If it is found, then stopping the boot id find loop.
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
Currently, the ratelimit does not handle the number of suppressed messages accurately.
Even though the number of messages reaches the limit, it still allows to add one extra messages to journal.
This patch fixes the problem.
This patch fixes wrong calculation of burst_modulate(), which now calculates
the values smaller than really expected ones if available disk space is
strictly more than 1MB.
In particular, if available disk space is strictly more than 1MB and strictly
less than 16MB, the resulted value becomes smaller than its original one.
>>> (math.log2(1*1024**2)-16) / 4
1.0
>>> (math.log2(16*1024**2)-16) / 4
2.0
>>> (math.log2(256*1024**2)-16) / 4
3.0
→ This matches the comment in the function.
Since commit 5996c7c295 (v190 !), the
calculation of the HMAC is broken because the hash for a data object
including a field is done in the wrong order: the field object is
hashed before the data object is.
However during verification, the hash is done in the opposite order as
objects are scanned sequentially.
The key used to be jammed next to the local file path. Based on the format string on line 1675, I determined that the order of arguments was written incorrectly, and updated the function based on that assumption.
Before:
```
Please write down the following secret verification key. It should be stored
at a safe location and should not be saved locally on disk.
/var/log/journal/9b47c1a5b339412887a197b7654673a7/fss8f66d6-f0a998-f782d0-1fe522/18fdb8-35a4e900
The sealing key is automatically changed every 15min.
```
After:
```
Please write down the following secret verification key. It should be stored
at a safe location and should not be saved locally on disk.
d53ed4-cc43d6-284e10-8f0324/18fdb8-35a4e900
The sealing key is automatically changed every 15min.
```
According to its manual page, flags given to mkostemp(3) shouldn't include
O_RDWR, O_CREAT or O_EXCL flags as these are always included. Beyond
those, the only flag that all callers (except a few tests where it
probably doesn't matter) use is O_CLOEXEC, so set that unconditionally.
When the system journal becomes re-opened post-flush with the runtime
journal open, it implies we've recovered from something like an ENOSPC
situation where the system journal rotate had failed, leaving the system
journal closed, causing the runtime journal to be opened post-flush.
For the duration of the unavailable system journal, we log to the
runtime journal. But when the system journal gets opened (space made
available, for example), we need to close the runtime journal before new
journal writes will go to the system journal. Calling
server_flush_to_var() after opening the system journal with a runtime
journal present, post-flush, achieves this while preserving the runtime
journal's contents in the system journal.
The combination of the present flushed flag file and the runtime journal
being open is a state where we should be logging to the system journal,
so it's appropriate to resume doing so once we've successfully opened
the system journal.
If journals get into a closed state like when rotate fails due to
ENOSPC, when space is made available it currently goes unnoticed leaving
the journals in a closed state indefinitely.
By calling system_journal_open() on entry to find_journal() we ensure
the journal has been opened/created if possible.
Also moved system_journal_open() up to after open_journal(), before
find_journal().
Fixes https://github.com/systemd/systemd/issues/3968
It is useful to look at a (possibly inactive) container or other os tree
with --root=/path/to/container. This is similar to specifying
--directory=/path/to/container/var/log/journal --directory=/path/to/container/run/systemd/journal
(if using --directory multiple times was allowed), but doesn't require
as much typing.
The directory argument that is given to sd_j_o_d was ignored when
SD_JOURNAL_OS_ROOT was given, and directories relative to the root of the host
file system were used. With that flag, sd_j_o_d should do the same as
sd_j_open_container: use the path as "prefix", i.e. the directory relative to
which everything happens.
Instead of touching sd_j_o_d, journal_new is fixed to do what sd_j_o_c
was doing, and treat the specified path as prefix when SD_JOURNAL_OS_ROOT is
specified.
Fixed (master) versions of libtool pass -fsanitize=address correctly
into CFLAGS and LDFLAGS allowing ASAN to be used without any special
configure tricks..however ASAN triggers in lookup3.c for the same
reasons valgrind does. take the alternative codepath if
__SANITIZE_ADDRESS__ is defined as well.
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).
Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limite
exposure in suid binaries.
This also ports a couple of users over to these new APIs.
The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.
Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.