log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
We use the same check at two places, let's add a tiny helper function
for it, since it's not entirely trivialy, and we changes this before
multiple times, and it's a good thing if we can change it at one place
only instead of multiple.
This ensures that in all threads we fork off in the background in our
code we mask out all signals, so that our thread won't end up getting
signals delivered the main process should be getting.
We always set the signal mask before forking off the thread, so that the
thread has the right mask set from its earliest existance on.
Instead of compiling those files twice, once for libsystemd and once for
libshared, compile once as a static archive and then link into both.
This reduce the meson target for man=no compile to 1291.
gcrypt_util_sources had to be moved because otherwise they appeared twice
in libshared.so halfproducts, causing an error.
-fvisibility=default is added to libbasic, libshared_static so that the symbols
appear properly in the exported symbol list in libshared.
The advantage is that files are not compiled twice. When configured with -Dman=false,
the ninja target list is reduced from 1588 to 1347 targets. The difference in compilation
time is small (<10%). I think this is because of -O0 and ccache and multiple cores, and
in different settings the compilation time could be reduced. The main advantage is that
errors and warnings are not reported twice.
This makes things a bit easier to read I think, and also makes sure we
always use the _unlikely_ wrapper around it, which so far we used
sometimes and other times we didn't. Let's clean that up.
Let's replace usage of fputc_unlocked() and friends by __fsetlocking(f,
FSETLOCKING_BYCALLER). This turns off locking for the entire FILE*,
instead of doing individual per-call decision whether to use normal
calls or _unlocked() calls.
This has various benefits:
1. It's easier to read and easier not to forget
2. It's more comprehensive, as fprintf() and friends are covered too
(as these functions have no _unlocked() counterpart)
3. Philosophically, it's a bit more correct, because it's more a
property of the file handle really whether we ever pass it on to another
thread, not of the operations we then apply to it.
This patch reworks all pieces of codes that so far used fxyz_unlocked()
calls to use __fsetlocking() instead. It also reworks all places that
use open_memstream(), i.e. use stdio FILE* for string manipulations.
Note that this in some way a revert of 4b61c87511.
The "nobody" user might possibly be seen by the journal or coredumping
code if unmapped userns-using processes are somehow visible to them.
Let's make sure we don't do the ACL magic for this user either, since
this is a special system user that might be backed by different real
users in different contexts.
This adds uid_is_system() and gid_is_system(), similar in style to
uid_is_dynamic(). That a helper like this is useful is illustrated by
the fact that test-condition.c didn't get the check right so far, which
this patch fixes.
N_IOVEC_OBJECT_FIELDS is bumped 14 → 18 (see dispatch_message_real() and
count!)
N_IOVEC_PAYLOAD_FIELDS is bumped 15 → 16 (see
server_space_usage_message() and count!)
Also, add comments, to make clear what is what.
Coverity says that's undefined. I'm pretty sure we always would get a nan, but
let's avoid (formally) undefined behaviour since that can cause compilers to do
strange things.
So far I avoided adding license headers to meson files, but they are pretty
big and important and should carry license headers like everything else.
I added my own copyright, even though other people modified those files too.
But this is mostly symbolic, so I hope that's OK.
And let's make use of it to implement two new unit settings with it:
1. LogLevelMax= is a new per-unit setting that may be used to configure
log priority filtering: set it to LogLevelMax=notice and only
messages of level "notice" and lower (i.e. more important) will be
processed, all others are dropped.
2. LogExtraFields= is a new per-unit setting for configuring per-unit
journal fields, that are implicitly included in every log record
generated by the unit's processes. It takes field/value pairs in the
form of FOO=BAR.
Also, related to this, one exisiting unit setting is ported to this new
facility:
3. The invocation ID is now pulled from /run/systemd/units/ instead of
cgroupfs xattrs. This substantially relaxes requirements of systemd
on the kernel version and the privileges it runs with (specifically,
cgroupfs xattrs are not available in containers, since they are
stored in kernel memory, and hence are unsafe to permit to lesser
privileged code).
/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.
Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.
This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.
Fixes: #4089Fixes: #3041Fixes: #4441
When we drop messages of a unit, we log about. Let's add some structured
data to that. Let's include how many messages we dropped, but more
importantly, let's link up the message we generate to the unit we
dropped the messages from by using the "OBJECT" logic, i.e. by
generating OBJECT_SYSTEMD_UNIT= fields and suchlike, that "journalctl
-u" and friends already look for.
Fixes: #6494
clang warns about a few sites like this:
../src/journal/journal-file.c:1780:48: warning: taking address of packed member 'entry_offset' of class or structure 'DataObject' may result in an unaligned pointer value [-Waddress-of-packed-member]
&o->data.entry_offset,
^~~~~~~~~~~~~~~~~~~~
but DataObject.entry_offset will always be 8-byte aligned as long as
the DataObject structure is aligned. Similarly in other cases, the
field is always aligned. Let's just silence the warning to avoid noise.
gcc does not know -Waddress-of-packed-member, and would warn about an unknown
warning, so we need to conditionalize on __clang__.
../src/journal/journald-native.c:341:13: warning: variable 'context' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (ucred && pid_is_valid(ucred->pid)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/journal/journald-native.c:350:42: note: uninitialized use occurs here
context, ucred, tv, label, label_len);
^~~~~~~
../src/journal/journald-native.c:335:31: note: initialize the variable 'context' to silence this warning
ClientContext *context;
^
= NULL
Very nice reporting!
Functions that we call can handle context == NULL, so it's enough to simply
initialize the variable.
This option allows restricting the shown fields in the output modes that
would normally show all fields. It allows clients that are only
interested in a subset of the fields to access those more efficiently.
Also, it makes the resulting size of the output more predictable.
It has no effect on the various `short` output modes, because those
already only show a subset of the fields.
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().
This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.
(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.
This also makes three related changes:
- When a NUL byte is read we'll not recognize this as alternative line
break, instead of silently dropping everything after it. (see #4863)
- The reason for a line-break is now encoded in the log record, if it
wasn't a plain newline. Specifically, we distuingish "nul",
"line-max" and "eof", for line breaks due to NUL byte, due to the
maximum line length as configured with LineMax= or due to end of
stream. This data is stored in the new implicit _LINE_BREAK= field.
It's not synthesized for plain \n line breaks.
- A randomized 128bit ID is assigned to each log stream.
With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
See: #4863
Replaces: #4875
Introduce journal_file_check_object(), which does lightweight object
sanity checks, and use it in journal_file_move_to_object(), so that we
will catch certain corrupted objects in the journal file.
This fixes#6447, where we had only partially written out OBJECT_ENTRY
(ObjectHeader written, but rest of object zero bytes), causing
"journalctl --list-boots" to fail.
$ builddir.vanilla/journalctl --list-boots -D bug6447/
Failed to determine boots: No data available
$ builddir.patched/journalctl --list-boots -D bug6447/
-52 22633da1c5374a728d6c215e2c301dc2 Mon 2017-07-10 05:29:21 EEST—Mon 2017-07-10 05:31:51 EEST
-51 2253aab9ea7e4a2598f2abda82939eff Mon 2017-07-10 05:32:22 EEST—Mon 2017-07-10 05:36:49 EEST
-50 ef0d85d35c74486fa4104f9d6391b6ba Mon 2017-07-10 05:40:33 EEST—Mon 2017-07-10 05:40:40 EEST
[...]
Note that journal_file_check_object() is similar to
journal_file_object_verify(). The most expensive checks are omitted, as
they would slow down every journal_file_move_to_object() call too much.
With this implementation, the added overhead is small, for example when
dumping some journal content to /dev/null
(built with -Dbuildtype=debugoptimized -Db_ndebug=true):
Performance counter stats for 'builddir.vanilla/journalctl -D 76f4d4c3406945f9a60d3ca8763aa754/':
12542,311634 task-clock:u (msec) # 1,000 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
80 100 page-faults:u # 0,006 M/sec
41 786 963 456 cycles:u # 3,332 GHz
105 453 864 770 instructions:u # 2,52 insn per cycle
24 342 227 334 branches:u # 1940,809 M/sec
105 709 217 branch-misses:u # 0,43% of all branches
12,545199291 seconds time elapsed
Performance counter stats for 'builddir.patched/journalctl -D 76f4d4c3406945f9a60d3ca8763aa754/':
12734,723233 task-clock:u (msec) # 1,000 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
80 693 page-faults:u # 0,006 M/sec
42 661 017 429 cycles:u # 3,350 GHz
107 696 985 865 instructions:u # 2,52 insn per cycle
24 950 526 745 branches:u # 1959,252 M/sec
101 762 806 branch-misses:u # 0,41% of all branches
12,737527327 seconds time elapsed
Fixes#6447.
'journalctl --vacuum-*' does not suppress output message with --quiet.
Let journal_directory_vacuum honors --quiet to fix the problem.
BugLink: https://bugs.launchpad.net/bugs/1692188
Cache client metadata, in order to be improve runtime behaviour under
pressure.
This is inspired by @vcaputo's work, specifically:
https://github.com/systemd/systemd/pull/2280
That code implements related but different semantics.
For a longer explanation what this change implements please have a look
at the long source comment this patch adds to journald-context.c.
After this commit:
# time bash -c 'dd bs=$((1024*1024)) count=$((1*1024)) if=/dev/urandom | systemd-cat'
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.2783 s, 95.2 MB/s
real 0m11.283s
user 0m0.007s
sys 0m6.216s
Before this commit:
# time bash -c 'dd bs=$((1024*1024)) count=$((1*1024)) if=/dev/urandom | systemd-cat'
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 52.0788 s, 20.6 MB/s
real 0m52.099s
user 0m0.014s
sys 0m7.170s
As side effect, this corrects the journal's rate limiter feature: we now
always use the unit name as key for the ratelimiter.
Let's be a bit stricter in what we end up logging: ignore invalid unit
name specifications. Let's validate all input!
As we ignore unit names passed in from unprivileged clients anyway the
effect of this additional check is minimal.
(Also, no need to initialize the identifier/unit_id fields of stream
objects to NULL if empty strings are passed, the default is NULL
anyway...)
As a follow-up for db3f45e2d2 let's do the
same for all other cases where we create a FILE* with local scope and
know that no other threads hence can have access to it.
For most cases this shouldn't change much really, but this should speed
dbus introspection and calender time formatting up a bit.
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
Introduces window_matches_fd() for the fd matching case in try_context(),
In find_mmap() we're already walking a list of windows by fd, checking
this is pointless work in a potentially hot loop with many windows.
Instead of context_detach_window() and a manual attach of the new
window, simply call context_attach_window() which performs the
detach first if appropriate.
journal_file_move_to_object() can skip the second
journal_file_move_to() call if the first one already mapped a
sufficiently large area.
Now that mmap_cache_get() returns the size of the mapped area
when asked, ask for the size and only perform the second call if
the required size exceeds the mapped size instead of the object
header size.
This results in a nice performance boost in my testing, even with
a corpus of many small logs burning much CPU time elsewhere:
Before:
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.330s
user 0m16.281s
sys 0m0.046s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.409s
user 0m16.358s
sys 0m0.048s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.625s
user 0m16.558s
sys 0m0.061s
After:
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.311s
user 0m15.257s
sys 0m0.046s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.201s
user 0m15.135s
sys 0m0.062s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.170s
user 0m15.113s
sys 0m0.053s
If requested, return the actual mapping size to the caller in
addition to the address.
journal_file_move_to_object() often performs two successive
mmap_cache_get() calls via journal_file_move_to(); one to get the
object header, then another to get the entire object when it's
larger than the header's size.
If mmap_cache_get() returned the actual mapping's size, it's
probable that the second mmap_cache_get() could be skipped when
the established mapping already encompassed the desired size.
This way we have a MMapFileDescriptor reference external to the cache,
and can supply the handle directly to mmap_cache_get(), eliminating
hashmap lookups entirely from the hot path.
e268b81e moved an fflush() from output_json() to the generic
output_journal(), when it probably should have deleted all fflush()
calls from logs-show.c altogether.
The caller supplies the FILE * to these functions, and should be in
charge of flushing as needed. The current implementation essentially
defeats any buffering stdio was bringing to the table, resulting in
extraneous tiny write() calls in commands like `journalctl -b`.
This commit removes the fflush() call from output_journal(), and adds
them to journalctl before waiting for more entries and at completion.
This way in the hot path when journalctl loops on entries stdio can
combine multiple entries into bulkier write() calls.
MESSAGE=data\n and MESSAGE\n40000000data\n are both valid serializations, so
they should be stored in the journal. Before, MESSAGE, SYSLOG_FACILITY,
SYSLOG_IDENTIFIER, PRIORITY, and OBJECT_PID would be only honoured if they were
given in the first form.
Fixed#5973.
For all except the last entry in a single packet, we would dispatch the
message to the journal, but not forward it, nor perform proper cleanup.
Rewrite the code to process each entry in a helper function, and make
server_process_native_message() just call this function in a loop.
Fixes#5643.
v2:
- properly decrement *remaining when processing entry separator
timespec::tv_nsec can have different sizes depending on the
host architecture. On x32 in particular, it is 8 bytes long
while the long int type is only 4 bytes long. Hence, using
ld as a format specifier will trigger a format error. Thus,
explicitly cast timespec::tv_nsec to nsec_t and use PRI_NSEC
as the format specifier to make sure the sizes for both match.
This reverts commit 6355e75610.
The previously mentioned commit inadvertently broke a lot of SELinux related
functionality for both unprivileged users and systemd instances running as
MANAGER_USER. In particular, setting the correct SELinux context after a User=
directive is used would fail to work since we attempt to set the security
context after changing UID. Additionally, it causes activated socket units to
be mislabeled for systemd --user processes since setsockcreatecon() would never
be called.
Reverting this fixes the issues with labeling outlined above, and reinstates
SELinux access checks on unprivileged user services.
Try to honor --show-cursor in more situations by never terminating early
when we didn't read any logs.
In particular, sd_journal_previous_skip() now returns 0 when it didn't
actually skip anything (for example with --lines=0), which resulted in
--show-cursor not working anymore.
Using conf.set() with a boolean argument does the right thing:
either #ifdef or #undef. This means that conf.set can be used unconditionally.
Previously I used '1' as the placeholder value, and that needs to be changed to
'true' for consistency (under meson 1 cannot be used in boolean context). All
checks need to be adjusted.
When some error occurs during the initialization of JournalFile,
the JournalFile can be left without hash tables created. When later
trying to append an entry to that file, the assertion in
journal_file_link_data() fails, and journald crashes.
This patch fix this issue by checking *_hash_table_size in
journal_file_verify_header().
Also detect libgpg-error. Require both to be present for HAVE_CRYPT,
even though libgpg-error is only used in src/resolve. If one is available,
the other should be too, so it doesn't seem worth the trouble to make two
separate conditions.
When caller invokes sd_journal_open() we usually open at least one
directory with journal files. add_root_directory() function increments
current_invalidate_counter. After sd_journal_open() returns
current_invalidate_counter != last_invalidate_counter.
After caller waits for journal events (e.g. waits for new messages in
journal) then it usually calls sd_journal_process(). However, on first
call to sd_journal_process(), function determine_change() returns
SD_JOURNAL_INVALIDATE even though no journal files were
deleted/moved. This is because current_invalidate_counter !=
last_invalidate_counter.
After the fix we make sure counters has the same value before we begin
processing inotify events.
Shell scripts should be executable so that meson reports their
invocation succinctly (does not print 'sh' '-e').
Python scripts should not be executable so that meson does the
detection of the right python binary itself.
Add -u everywhere to catch potential errors.
The indentation for emacs'es meson-mode is added .dir-locals.
All files are reindented automatically, using the lasest meson-mode from git.
Indentation should now be fairly consistent.
This simplifies things and leads to a smaller installation footprint.
libsystemd_internal and libsystemd_journal_internal are linked into
libystemd-shared and available to all programs linked to libsystemd-shared.
libsystemd_journal_internal is not needed anymore, and libsystemd-shared
is used everwhere. The few exceptions are: libsystemd.so, test-engine,
test-bus-error, and various loadable modules.
This implementation assumes that the arguments in compiler.cmd_array()
don't contain any spaces. Since we are only interested in compilation
on Linux, I think this is a safe assumption.
Solution suggested by Nirbheek Chauhan.
It's crucial that we can build systemd using VS2010!
... er, wait, no, that's not the official reason. We need to shed old systems
by requring python 3! Oh, no, it's something else. Maybe we need to throw out
345 years of knowlege accumulated in autotools? Whatever, this new thing is
cool and shiny, let's use it.
This is not complete, I'm throwing it out here for your amusement and critique.
- rules for sd-boot are missing. Those might be quite complicated.
- rules for tests are missing too. Those are probably quite simple and
repetitive, but there's lots of them.
- it's likely that I didn't get all the conditions right, I only tested "full"
compilation where most deps are provided and nothing is disabled.
- busname.target and all .busname units are skipped on purpose.
Otherwise, installation into $DESTDIR has the same list of files and the
autoconf install, except for .la files.
It'd be great if people had a careful look at all the library linking options.
I added stuff until things compiled, and in the end there's much less linking
then in the old system. But it seems that there's still a lot of unnecessary
deps.
meson has a `shared_module` statement, which sounds like something appropriate
for our nss and pam modules. Unfortunately, I couldn't get it to work. For the
nss modules, we need an .so version of '2', but `shared_module` disallows the
version argument. For the pam module, it also didn't work, I forgot the reason.
The handling of .m4 and .in and .m4.in files is rather awkward. It's likely
that this could be simplified. If make support is ever dropped, I think it'd
make sense to switch to a different templating system so that two different
languages and not required, which would make everything simpler yet.
v2:
- use get_pkgconfig_variable
- use sh not bash
- use add_project_arguments
v3:
- drop required:true and fix progs/prog typo
v4:
- use find_library('bz2')
- add TTY_GID definition
- define __SANE_USERSPACE_TYPES__
- use join_paths(prefix, ...) is used on all paths to make them all absolute
v5:
- replace all declare_dependency's with []
- add more conf.get guards around optional components
v6:
- drop -pipe, -Wall which are the default in meson
- use compiler.has_function() and compiler.has_header_symbol instead of the
hand-rolled checks.
- fix duplication in 'liblibsystemd' library name
- use the right .sym file for pam_systemd
- rename 'compiler' to 'cc': shorter, and more idiomatic.
v7:
- use ENABLE_ENVIRONMENT_D not HAVE_ENVIRONMENT_D
- rename prefix to prefixdir, rootprefix to rootprefixdir
("prefix" is too common of a name and too easy to overwrite by mistake)
- wrap more stuff with conf.get('ENABLE...') == 1
- use rootprefix=='/' and rootbindir as install_dir, to fix paths under
split-usr==true.
v8:
- use .split() also for src/coredump. Now everything is consistent ;)
- add rootlibdir option and use it on the libraries that require it
v9:
- indentation
v10:
- fix check for qrencode and libaudit
v11:
- unify handling of executable paths, provide options for all progs
This makes the meson build behave slightly differently than the
autoconf-based one, because we always first try to find the executable in the
filesystem, and fall back to the default. I think different handling of
loadkeys, setfont, and telinit was just a historical accident.
In addition to checking in $PATH, also check /usr/sbin/, /sbin for programs.
In Fedora $PATH includes /usr/sbin, (and /sbin is is a symlink to /usr/sbin),
but in Debian, those directories are not included in the path.
C.f. https://github.com/mesonbuild/meson/issues/1576.
- call all the options 'xxx-path' for clarity.
- sort man/rules/meson.build properly so it's stable
Both gcc and clang issue a host of warnings about void pointers used in
arithmetic. The warning must be ignored in that file to avoid multiple
warnings.
Makefile.am used to set this for all libsystemd-journal-internal.a sources,
because there's no finer granularity for warnings. Let's just set it for
this one file.