Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for
services. If provided, these values get passed to the journald
client context, and those values are used in the rate limiting
function in the journal over the the journald.conf values.
Part of #10230
This makes use of rlimit_nofile_bump() in all tools that access the
journal. In some cases this replaces older code to achieve this, and
others we add it in where it was missing.
All over the place we define local variables for the various sockopts
that take a bool-like "int" value. Sometimes they are const, sometimes
static, sometimes both, sometimes neither.
Let's clean this up, introduce a common const variable "const_int_one"
(as well as one matching "const_int_zero") and use it everywhere, all
acorss the codebase.
Also, while we are at it, beef it up, by adding json-seq support (i.e.
https://tools.ietf.org/html/rfc7464). This is particularly useful in
conjunction with jq's --seq switch.
Let's remove redundancy and not advertise "journalctl --new-id128"
anymore, now that we have "systemd-id128 new" in a proper tool.
This allows us to reduce the overly large journalctl command set a bit.
Note that this just removes the --help and man text, the call remains
available for compat reasons.
The raison d'etre for this program is printing machine-app-specific IDs. We
provide a library function for that, but not a convenient API. We can hardly
ask people to quickly hack their own C programs or call libsystemd through CFFI
in python or another scripting language if they just want to print an ID.
Verb 'new' was already available as 'journalctl --new-id128', but this makes
it more discoverable.
v2:
- rename binary to systemd-id128
- make --app-specific= into a switch that applies to boot-id and machine-id
This makes it so that tests no longer need to know the absolute paths to the
source and build dirs, instead using the systemd-runtest.env file to get these
paths when running from the build tree.
Confirmed that test-catalog works on `ninja test`, when called standalone and
also when the environment file is not present, in which case it will use the
installed location under /usr/lib/systemd/catalog.
The location can now also be overridden for this test by setting the
$SYSTEMD_CATALOG_DIR environment variable.
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.
I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
The function replaces a couple commas, a semicolon and the final newline with
zero bytes in the string passed to it. The 'const' seems to have been added
by accident during a bulk edit (more specifically 3b3154df7e).
Even if built without gcrypt, show the relevant options in help message.
Otherwise, the help message diverges from the man page or suggestions
by the shell completion.
Looked for definitions of functions using the *_compare_func() suffix.
Tested:
- Unit tests passed (ninja -C build/ test)
- Installed this build and booted with it.
This is useful if someone wants to recreate the original syslog datagram. We
already include timestamp information as _SOURCE_REALTIME_TIMESTAMP=, and in
normal use that timestamp, converted back to the form used by syslog
(Mth dd HH:MM:SS) would usually give the value. But there are various
circumstances where this might not be true. Most obviously, if the datagram is
sent a bit later after being prepared, the time is rounded to the nearest
second, and it might be off. This is especially bad around New Year when the
syslog timestamp wraps around. Then the same timezone and locale need to be
used to recreate the original timestamp. In the end doing this reliably is
complicated, and it seems much easier to just unconditionally include the
original timestamp.
If the original timestamp cannot be located, we store the full log line.
This way, it should be always possible to recreate the original input.
Example:
MESSAGE=x
SYSLOG_TIMESTAMP=Sep 15 15:07:58
SYSLOG_RAW
^]^@^@^@^@^@^@^@<13>Sep 15 15:07:58 HOST: x^@y
_PID=3318
_SOURCE_REALTIME_TIMESTAMP=1530743976393553
Fixes#2398.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
They are not needed, because anything that is non-zero is converted
to true.
C11:
> 6.3.1.2: When any scalar value is converted to _Bool, the result is 0 if the
> value compares equal to 0; otherwise, the result is 1.
https://stackoverflow.com/questions/31551888/casting-int-to-bool-in-c-c
The refactor in #9200 inadvertently dropped the variable assignment to
traverse the device and its hierarchy in add_matches_for_device().
This was uncovered by Coverity (CID #1393310).
Fix that by restoring the assignment.
Tested: `journalctl /dev/sda` now filters journalctl output again.
This is mostly fall-out from d1a1f0aaf0,
however some cases are older bugs.
There might be more issues lurking, this was a simple grep for "%m"
across the tree, with all lines removed that mention "errno" at all.
The journal verification functions would reject such an entry. It would probably
still display fine (because we prefer _SOURCE_REALTIME_TIMESTAMP= if present), but
it seems wrong to create an entry that would not pass verification.
The code to open journal files seems like the wrong place to enforce this. We
already check during boot and refuse to boot if machine-id is missing, no need
to enforce this here. In particular, it seems better to write logs from
journald even if they are not completely functional rather than refuse to
operate at all, and systemd-journal-remote also writes journal files and may
even be run on a system without systemd at all.
The docker image that oss-fuzz uses has an empty /etc/machine-id. Obviously
this is an error in the docker, but docker is fact of life, and it seems better
for systemd-journal-remote to work in such an incomplete environment.
In journal_file_set_online() the offline thread doesn't need to be
joined if it's been canceled before actually reaching the phase of
writing the offline state.
When dealing with a large number of template instances, for example
when launching daemons per VRF, it is hard for operators to correlate
log lines to arguments.
Add a new with-unit mode which, if available, prefixes unit and user
unit names when displaying its log messages instead of the syslog
identifier. It will also use the full timestamp with timezones, like
the short-full mode.
Most our other parsing functions do this, let's do this here too,
internally we accept that anyway. Also, the closely related
load_env_file() and load_env_file_pairs() also do this, so let's be
systematic.
This makes most header files easier to look at. Also Emacs gets really
slow when browsing through large sections of overly long prototypes,
which is much improved by this macro.
We should probably not do something similar with too many other cases,
as macros like this might help readability for some, but make it worse
for others. But I think given the complexity of this specific prototype
and how often we use it, it's worth doing.
This simplifies the use of tempfiles in tests and fixes "leaked"
temporary files in test-fileio, test-catalog, test-conf-parser.
Not the whole tree is converted.
Configuration through environment variable is inconvenient with meson, because
they cannot be convieniently changed and/or are not preserved during
reconfiguration (https://github.com/mesonbuild/meson/issues/1503).
This adds -Dvalgrind=true/false, which has the advantage that it can be set
at any time with meson configure -Dvalgrind=... and ninja will rebuild targets
as necessary. Additional minor advantages are better consistency with the
options for hashmap debugging, and typo avoidance with '#if' instead of '#ifdef'.
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.
Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.
So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.
This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:
1. strv_length()' return type becomes size_t
2. the unit file changes array size becomes size_t
3. DNS answer and query array sizes become size_t
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
This drops a good number of type-specific _cleanup_ macros, and patches
all users to just use the generic ones.
In most recent code we abstained from defining type-specific macros, and
this basically removes all those added already, with the exception of
the really low-level ones.
Having explicit macros for this is not too useful, as the expression
without the extra macro is generally just 2ch wider. We should generally
emphesize generic code, unless there are really good reasons for
specific code, hence let's follow this in this case too.
Note that _cleanup_free_ and similar really low-level, libc'ish, Linux
API'ish macros continue to be defined, only the really high-level OO
ones are dropped. From now on this should really be the rule: for really
low-level stuff, such as memory allocation, fd handling and so one, go
ahead and define explicit per-type macros, but for high-level, specific
program code, just use the generic _cleanup_() macro directly, in order
to keep things simple and as readable as possible for the uninitiated.
Note that before this patch some of the APIs (notable libudev ones) were
already used with the high-level macros at some places and with the
generic _cleanup_ macro at others. With this patch we hence unify on the
latter.
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.
It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
It might happen that we try to bisect through a chain of offset arrays in the
journal whose last element was just allocated but no item yet written
to. In that case that array will be all NUL, but it might still end up
in our array chain cache. If it does, we cannot use it for bisection,
since for bisection we need to know the value of the first entry in that
array, but if it's uninitialized it does not have a first value. Hence,
as a simple fix, in this unlikely case, simply ignore the chain cache.
This is supposed to fix the issue pointed out in #8432, but in a more
permissive way, as this case isn't strictly a badly formatted journal
but actually a valid state (though one within a very short time window),
and we should make the best of it, and handle it gracefully.
Background: in each journal file entries are linked up in large arrays
of offsets. In each array the entries are strictly ordered by the
offsets of the entries, which permits search by bisection. These arrays
are allocated with a fixed size and then filled up as entries are added
to the journal file. If an array is fully filled up, a new array
(double in size as the old one) is appended to the journal file, and
linked up. This means, the journal file will contain a series of chained
up arrays, each time doubling in size, and strictly ordered. When
looking for an entry we maintain a "chain cache", which allows us to
bypass traversing the chain in full if we look for entries close to each
other in a short time. With the fix above we make sure we don't
erroneously use a chain cache item that doesn't carry enough information
for this bisection to work.
Original issue identified (with patch) by @Kxuan.
Replaces: #8432
This is similar to TAKE_PTR() but operates on file descriptors, and thus
assigns -1 to the fd parameter after returning it.
Removes 60 lines from our codebase. Pretty good too I think.
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
This rearranges chase_symlinks() a bit: if no special flags are
specified it will now revert to behaviour before
b12d25a8d6. However, if the new
CHASE_TRAIL_SLASH flag is specified it will follow the behaviour
introduced by that commit.
I wasn't sure which one to make the beaviour that requires specification
of a flag to enable. I opted to make the "append trailing slash"
behaviour the one to enable by a flag, following the thinking that the
function should primarily be used to generate a normalized path, and I
am pretty sure a path without trailing slash is the more "normalized"
one, as the trailing slash is not really a part of it, but merely a
"decorator" that tells various system calls to generate ENOTDIR if the
path doesn't refer to a path.
Or to say this differently: if the slash was part of normalization then
we really should add it in all cases when the final path is a directory,
not just when the user originally specified it.
Fixes: #8544
Replaces: #8545
The warning is not emitted for absolute paths like /dev/sda or /home, which are
converted to .device and .mount unit names without any fuss.
Most of the time it's unlikely that users use invalid unit names on purpose,
so let's warn them. Warnings are silenced when --quiet is used.
$ build/systemctl show -p Id hello@foo-bar/baz
Invalid unit name "hello@foo-bar/baz" was escaped as "hello@foo-bar-baz" (maybe you should use systemd-escape?)
Id=hello@foo-bar-baz.service
$ build/systemd-run --user --slice foo-bar/baz --unit foo-bar/foo true
Invalid unit name "foo-bar/foo" was escaped as "foo-bar-foo" (maybe you should use systemd-escape?)
Invalid unit name "foo-bar/baz" was escaped as "foo-bar-baz" (maybe you should use systemd-escape?)
Running as unit: foo-bar-foo.service
Fixes#8302.
We update the boot ID whenever the file is opened for writing (i.e. set
to ONLINE stat), even if we never write a single entry to it. Hence,
don't insist that the last entry's boot ID matches the file header.
As pointed out by Matthijs van Duin:
https://lists.freedesktop.org/archives/systemd-devel/2018-March/040499.html
Previously the compression threshold was hardcoded to 512, which meant that
smaller values wouldn't be compressed. This left some storage savings on the
table, so instead, we make that number tunable.
Even if pager_open() fails, in general, we should continue the operations.
All erroneous cases in pager_open() show log message in the function.
So, it is not necessary to check the returned value.
"noreturn" is reserved and can be used in other header files we include:
[ 16s] In file included from /usr/include/gcrypt.h:30:0,
[ 16s] from ../src/journal/journal-file.h:26,
[ 16s] from ../src/journal/journal-vacuum.c:31:
[ 16s] /usr/include/gpg-error.h:1544:46: error: expected ‘,’ or ‘;’ before ‘)’ token
[ 16s] void gpgrt_log_bug (const char *fmt, ...) GPGRT_ATTR_NR_PRINTF(1,2);
Here we include grcrypt.h (which in turns include gpg-error.h) *after* we
"noreturn" was defined in macro.h.
At various places we only want to close fds if they are not
stdin/stdout/stderr, i.e. fds 0, 1, 2. Let's add a unified helper call
for that, and port everything over.
When running journalctl --user-unit=foo as an unprivileged user we could get
the usual hint:
Hint: You are currently not seeing messages from the system and other users.
Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
...
But with --user-unit our filter is:
(((_UID=0 OR _UID=1000) AND OBJECT_SYSTEMD_USER_UNIT=foo.service) OR
((_UID=0 OR _UID=1000) AND COREDUMP_USER_UNIT=foo.service) OR
(_UID=1000 AND USER_UNIT=foo.service) OR
(_UID=1000 AND _SYSTEMD_USER_UNIT=foo.service))
so we would never see messages from other users.
We could still see messages from the system. In fact, on my machine the
only messages with OBJECT_SYSTEMD_USER_UNIT= are from the system:
journalctl $(journalctl -F OBJECT_SYSTEMD_USER_UNIT|sed 's/.*/OBJECT_SYSTEMD_USER_UNIT=\0/')
Thus, a more correct hint is that we cannot see messages from the system.
Make it so.
Fixes#7887.
The Linux kernel exposes the birth time now for files through statx()
hence make use of it where available. We keep the xattr logic in place
for this however, since only a subset of file systems on Linux currently
expose the birth time. NFS and tmpfs for example do not support it. OTOH
there are other file systems that do support the birth time but might
not support xattrs (smb…), hence make the best of the two, in particular
in order to deal with journal files copied between file system types and
to maintain compatibility with older file systems that are updated to
newer version of the file system.
Let's add a common implementation for regular file checks, that are
careful to return the right error code (EISDIR/EISLNK/EBADFD) when we
are encountering a wrong file node.
Let's make sure we aren't confused if a journal file is replaced by a
different one (for example due to rotation) if we are in a q overflow:
let's compare the inode/device information, and if it changed replace
any open file object as needed.
Fixes: #8198
Let's be more careful with the naming, and indicate that the function
is about *named* journal files, and will validate the name as needed.
(in opposition to add_any_file() which doesn't care about names)
Coverity now started warning about this ("Calling unlinkat without checking
return value (as is done elsewhere 12 out of 15 times).", and it is right:
most of the time we should at list print a log message so people can figure
out something is wrong when this happens.
v2:
- use warning level in journald too (this is unlikely to happen ever, so it
should be safe to something that is visible by default).
If `journalctl` take a long time to process messages, and during that
time journal file rotation occurs, a `journalctl` client will keep
those rotated files open until it calls `sd_journal_process()`, which
typically happens as a result of calling `sd_journal_wait()` below in
the "following" case. By periodically calling `sd_journal_process()`
during the processing loop we shrink the window of time a client
instance has open file descriptors for rotated (deleted) journal
files.
(Lennart: slightly reworked version, that dropped some of the commenting
which was solved otherwise)
In that case we have no inotify fd yet, and there's nothing to process
hence. Let's make the call a NOP.
(Previously, without this change we'd end up trying to read off inotify
fd -1, which is quite a problem... 😢)
This ensures that clients can't keep all files pinned interfering with
our vacuuming logic.
This should fix the last issue pointed out in #7998 and #8032Fixes: #7998
This adds proper handling of IN_Q_OVERFLOW: when the inotify queue runs
over we'll reiterate all directories we are looking at. At the same time
we'll mark all files and directories we encounter that way with a
generation counter we first increased. All files and directories not
marked like this are then unloaded.
With this logic we do the best when the inotify queue overflows: we
synchronize our in-memory state again with what's on disk.
This contains some refactoring of the directory logic, to share more
code between uuid directories and "root" directories and generally make
things a bit more readable by splitting things up into smaller bits.
See: #7998#8032
If `journalctl` take a long time to process messages, and during that
time journal file rotation occurs, a `journalctl` client will keep
those rotated files open until it calls `sd_journal_process()`, which
typically happens as a result of calling `sd_journal_wait()` below in
the "following" case. By periodically calling `sd_journal_process()`
during the processing loop we shrink the window of time a client
instance has open file descriptors for rotated (deleted) journal
files.
**Warning**
This change does not appear to solve the case of a "paused" output
stream. If somebody is using `journalctl | less` and pauses the
output, then without a background thread periodically listening for
inotify delete events and cleaning up, journal logs will eventually
stop flowing in cases where a journal client with enough open files
causes the "free" disk space threshold to be crossed.