Let's just use the path that is already stored in JournalStorage,
instead of generating our own. While we are at it, split out the loop
into its own function.
Write a user unit's invocation ID to /run/user/<uid>/systemd/units/ similar
to how a system unit's invocation ID is written to /run/systemd/units/.
This lets the journal read and add a user unit's invocation ID to the
_SYSTEMD_INVOCATION_ID field of logs instead of the user manager's
invocation ID.
Fixes#12474
This cleans up and unifies the outut of --help texts a bit:
1. Highlight the human friendly description string, not the command
line via ANSI sequences. Previously both this description string and
the brief command line summary was marked with the same ANSI
highlight sequence, but given we auto-page to less and less does not
honour multi-line highlights only the command line summary was
affectively highlighted. Rationale: for highlighting the description
instead of the command line: the command line summary is relatively
boring, and mostly the same for out tools, the description on the
other hand is pregnant, important and captions the whole thing and
hence deserves highlighting.
2. Always suffix "Options" with ":" in the help text
3. Rename "Flags" → "Options" in one case
4. Move commands to the top in a few cases
5. add coloring to many more help pages
6. Unify on COMMAND instead of {COMMAND} in the command line summary.
Some tools did it one way, others the other way. I am not sure what
precisely {} is supposed to mean, that uppercasing doesn't, hence
let's simplify and stick to the {}-less syntax
And minor other tweaks.
For some unrelated stuff I wanted the machine ID in UUID format, and it
was annoying doing that manually. So let's add a switch for this, so
that this works:
systemd-id128 machine-id -u
journald assumes that getsockopt(SO_PEERCRED) correctly identifies the
process on the remote end of the socket. However, this is incorrect
according to man 7 socket:
The returned credentials are those that were in effect at the
time of the call to connect(2) or socketpair(2).
This becomes a problem when a new process inherits the stdout stream
from a parent. First, log messages from the child process will
be attributed to the parent. Second, the struct ucred used by journald
becomes invalid as soon as the parent exits. Further sendmsg calls then
fail with ENOENT. Logs for the child process then vanish from the journal.
Fix this by using recvmsg on the stdout stream, and refreshing the cached
struct ucred if SCM_CREDENTIALS indicate a new process.
Fixes#13708
Right now the `systemd-journal-remote` service does not constrain its
resource usage (I just run out of space on my 100GB partition, for
example). This patch does not change that, but it at least makes it
possible to run something like:
journalctl --directory /var/log/journal/remote --rotate --vacuum-size=90G
fixes#2376
Co-authored-by: Mike Auty <ikelos@gentoo.org>
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.
(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
"ratelimit" is a real word, so we don't need to use the other form anywhere.
We had both forms in various places, let's standarize on the shorter and more
correct one.
The use of an unordered hashmap means that the output of
'journalctl --update-catalog' differs between runs despite there being no
changes in the input files.
By changing all instances of Hashmap to OrderedHashmap we fix this, and now
the catalog is reproducible.
Motivation: https://reproducible-builds.org
Signed-off-by: Daniel Edgecumbe <git@esotericnonsense.com>
journalctl --unit= already did this, and allows you to tail all the logs
for a certain slice easily. It seemed only natural to make --user-unit
behave in a similar way.
The _SYSTEMD_USER_SLICE field was not documented as being added by
journald, so I have added that to the documentation too.
Furthermore, I have documented the existing behaviour of --unit= and the
new behaviour of --user-unit=
The behaviour was actually not documented before, so I am also OK with
removing the match for the --unit= command instead. The user would then
have to manually provide _SYSTEMD_SLICE= filter to journalctl in both
cases. Both options work for me.
Ubunut autopkgtest fails with:
405/501 test-journal-flush FAIL 0.74 s (killed by signal 6 SIGABRT)
--- command ---
SYSTEMD_KBD_MODEL_MAP='/tmp/autopkgtest.BgjJJv/build.yAM/systemd/src/locale/kbd-model-map' SYSTEMD_LANGUAGE_FALLBACK_MAP='/tmp/autopkgtest.BgjJJv/build.yAM/systemd/src/locale/language-fallback-map' PATH='/tmp/autopkgtest.BgjJJv/build.yAM/systemd/build-deb:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games' /tmp/autopkgtest.BgjJJv/build.yAM/systemd/build-deb/test-journal-flush
--- stderr ---
Assertion 'r >= 0' failed at src/journal/test-journal-flush.c:48, function main(). Aborting.
-------
It's hard to say what is going on here without any error messages whatsoever.
The test goes into deep details of journal file handling, so it needs to also
do logging on its own.
https://bugzilla.redhat.com/show_bug.cgi?id=1715699
> /dev/mapper/live-rw 6.4G 5.7G 648M 91% /
> systemd-journald[905]: Fixed min_use=1.0M max_use=648.7M max_size=81.0M min_size=512.0K keep_free=973.1M n_max_files=100
When journald is started, we pick keep_free as 15% of the disk size. When the
fs is almost filled, we will only keep one journal file around and rotate very
often (because min_size is very small).
Let's set min use to something reasonable, so that we get more useful logs that
will cover at least the full boot.
Some cases considered in the PR:
> /dev/mapper/live-rw 6.4G 5.7G 648M 91% /
keep_free→MIN(327,100)→100 MB.
min_use→16MB.
effective range: 16 MB – 548 MB
> /dev/mapper/fedora_krowka-root 78G 69G 5.7G 93% /
keep_free → MIN(4GB, 100MB)→100MB
min_use→16MB
effective range: 16 MB – 5.6 GB
(but then there's the max_use limit, which cuts the range down)
> 4TB, 4GB free
keep_free → MIN(209715, 100) → 100 MB
min_use→16MB
effective range: 16 MB – 4.9 GB
(also effectively limited by max_use)
Also replace unneeded width suffixes with spaces, I think this is more
readable, and drop DEFAULT_ prefixes in cases where this setting is
simply a bound, and cannot be overridden by user config, hence is not
a default.
Not everybody has those dirs in the filesystem (and they don't need to).
When creating an installation package using $DESTDIR, it is easy enough to
remove or ignore those directories, but if installing into a real root, it
is ugly to create and remove them. Let's add an option so people can skip
it if they want.
Inspired by #12930.
The functions to retrieve and print process cmdlines were based on the
assumption that they contain printable ASCII, and everything else
should be filtered out. That assumption doesn't hold in today's world,
where people are free to use unicode everywhere.
This replaces the custom cmdline reading code with a more generic approach
using utf8_escape_non_printable_full().
For kernel threads, truncation is done on the parenthesized name, so we'll
get "[worker]", "[worker…]", …, "[w…]", "[…", "…" as we reduce the number of
available columns.
This implementation is most likely slower for very long cmdlines, but I don't
think this is very important. The common case is to have short commandlines,
and should print those properly. Absurdly long cmdlines are the exception,
which needs to be handled correctly and safely, but speed is not too important.
Fixes#12532.
v2:
- use size_t for the number of columns. This change propagates into various
other functions that call get_process_cmdline(), increasing the size of the
patch, but the changes are rather trivial.
When journalctl is compiled with PCRE2 support, let's return a non-zero
exit code when --grep is used and no match for given pattern is found.
This should allow users to use journalctl --grep in scripts instead of
piping journalctl into grep
Fixes#8152
The latter is identical to the former, but becomes a NOP if
/var/log/journal is on the same mount as /, and thus during shutdown
unmounting /var is not necessary and hence we can keep logging until the
very end.
When emitting the calendarspec warning we want to see some color.
Follow-up for 04220fda5c.
Exceptions:
- systemctl, because it has a lot hand-crafted coloring
- tmpfiles, sysusers, stdio-bridge, etc, because they are also used in
services and I'm not sure if this wouldn't mess up something.
We had all kinds of indentation: 2 sp, 3 sp, 4 sp, 8 sp, and mixed.
4 sp was the most common, in particular the majority of scripts under test/
used that. Let's standarize on 4 sp, because many commandlines are long and
there's a lot of nesting, and with 8sp indentation less stuff fits. 4 sp
also seems to be the default indentation, so this will make it less likely
that people will mess up if they don't load the editor config. (I think people
often use vi, and vi has no support to load project-wide configuration
automatically. We distribute a .vimrc file, but it is not loaded by default,
and even the instructions in it seem to discourage its use for security
reasons.)
Also remove the few vim config lines that were left. We should either have them
on all files, or none.
Also remove some strange stuff like '#!/bin/env bash', yikes.
The journal files might not be tiny hence let's write them to /var/tmp/
instead of /tmp. Also, let's turn on NOCOW on the files, as these tests
might apparently be slow on btrfs.
Fixes: #12210
systemd-journal-remote always wrote the boot-id of the device it was running on
to the header of its journal files. When the source had a different boot-id
(because it was generated on a different boot, or a different device), the
boot-ids in the file were inconsistent. The _BOOT_ID field was that of the
source, but the journal file header and each entry object header were that of
the device systemd-journal-remote ran on. This breaks journalctl --list-boots
on any of these files.
Set the boot-id in the header to be that of the source. This also fixes the
entry object headers.
Let's be helpful to static analyzers which care about whether we
knowingly ignore return values. We do in these cases, since they are
usually part of error paths.
Let's unify the two similar code paths to watch /run/systemd/journal.
The code in manager.c is similar, but it uses mkdir_p_label(), and unifying
that would be too much trouble, so let's just adjust the error messages to
be the same.
CID #1400224.
The thread launched in journal_file_set_offline() accesses a memory
mapped file, so it needs to handle SIGBUS. Leave SIGBUS unblocked on the
offlining thread so that it uses the same handler as the main thread.
The result of triggering SIGBUS in a thread where it's blocked is
undefined in Linux. The tested implementations were observed to cause
the default handler to run, taking down the whole journald process.
We can leave SIGBUS unblocked in multiple threads since it's handler is
thread-safe. If SIGBUS is sent to the journald process asynchronously
(i.e. with kill, sigqueue, or raise), either thread handling it will
result in the same behavior: it will install the default handler and
reraise the signal, killing the process.
Fixes: #12042
let's use mkdir_parents() (because its shorter), and 0755 as access
mode, so that things have the access mode tmpfiles.d also suggests.
Prompted by: #11903
The option cursor-file takes a filename as argument. If the file exists and
contains a valid cursor, this is used to start the output after this position.
At the end, the last cursor gets written to the file.
This allows for an easy implementation of a timer that regularly looks in the
journal for some messages.
journalctl --cursor-file err-cursor -b -p err
journalctl --cursor-file audit-cursor -t audit --grep DENIED
Or you might want to walk the journal in steps of 10 messages:
journalctl --cursor-file ./curs -n10 --since=today -t systemd
We could potentially create an unterminated string and then call normal string
operations on it. Let's be more careful: first remove the suffix we ignore anyway,
then find if the string is of acceptable length, and possibly ignore it if it
is too long. The code rejects lengths above 31 bytes. Language names that are
actually used are much shorter, so this doesn't matter much.
In normal use, this allow us to drop dead entries from the cache and reduces
the cache size so that we don't evict entries unnecessarily. The time limit is
there mostly to serve as a guard against malicious logging from many different
PIDs.
This is far from perfect, but should give mostly reasonable values. My
assumption is that if somebody has a few hundred MB of memory, they are
unlikely to have thousands of processes logging. A hundred would already be a
lot. So let's scale the cache size propritionally to the total memory size,
with clamping on both ends.
The formula gives 64 cache entries for each GB of RAM.
Found by inspecting results of running this small program:
int main(int argc, const char **argv) {
for (int i = 1; i < argc; i++) {
FILE *f;
char line[1024], prev[1024], *r;
int lineno;
prev[0] = '\0';
lineno = 1;
f = fopen(argv[i], "r");
if (!f)
exit(1);
do {
r = fgets(line, sizeof(line), f);
if (!r)
break;
if (strcmp(line, prev) == 0)
printf("%s:%d: error: dup %s", argv[i], lineno, line);
lineno++;
strcpy(prev, line);
} while (!feof(f));
fclose(f);
}
}
We immediately read the whole contents into memory, making thigs much more
expensive. Sealed fds should be used instead since they are more efficient
on our side.
We'd first parse all or most of the message, and only then consider if it
is not too large. Also, when encountering a single field over the limit,
we'd still process the preceding part of the message. Let's be stricter,
and check size limits early, and let's refuse the whole message if it fails
any of the size limits.
We allocate a iovec entry for each field, so with many short entries,
our memory usage and processing time can be large, even with a relatively
small message size. Let's refuse overly long entries.
CVE-2018-16865
https://bugzilla.redhat.com/show_bug.cgi?id=1653861
What from I can see, the problem is not from an alloca, despite what the CVE
description says, but from the attack multiplication that comes from creating
many very small iovecs: (void* + size_t) for each three bytes of input message.
This fixes a crash where we would read the commandline, whose length is under
control of the sending program, and then crash when trying to create a stack
allocation for it.
CVE-2018-16864
https://bugzilla.redhat.com/show_bug.cgi?id=1653855
The message actually doesn't get written to disk, because
journal_file_append_entry() returns -E2BIG.
Nitpicky, but we've used a lot of random spacings and names in the past,
but we're trying to be completely consistent on "cgroup vN" now.
Generated by `fd -0 | xargs -0 -n1 sed -ri --follow-symlinks 's/cgroups? ?v?([0-9])/cgroup v\1/gI'`.
I manually ignored places where it's not appropriate to replace (eg.
"cgroup2" fstype and in src/shared/linux).
Continuation of a3ebe5eb620e49f0d24082876cafc7579261e64f:
in other places we sometimes use assert_se(), and sometimes normal error
handling. sigfillset and sigaddset can only fail if mask is NULL (which cannot
happen if we are passing in a reference), or if the signal number is invalid
(which really shouldn't happen when we are using a constant like SIGCHLD. If
SIGCHLD is invalid, we have a bigger problem). So let's simplify things and
always use assert_se() in those cases.
In sigset_add_many() we could conceivably pass an invalid signal, so let's keep
normal error handling here. The caller can do assert_se() around the
sigprocmask_many() call if appropriate.
'>= 0' is used for consistency with the rest of the codebase.
With cgroupsv1 a zombie process is migrated to root cgroup in all
hierarchies. This was changed for unified hierarchy and /proc/PID/cgroup
reports cgroup to which process belonged before it exited.
Be more suspicious about cgroup path reported by the kernel and use
unit_id provided by the log client if the kernel reports that process is
running in the root cgroup.
Users tend to care the most about 'log->unit_id' mapping so systemctl
status can correctly report last log lines. Also we wouldn't be able to
infer anything useful from "/" path anyway.
See: 2e91fa7f6d
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.
No code changes, just some rearranging of source files.
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.
I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
This way, we can extend the macro a bit with stuff pulled in from other
headers without this affecting everything which pulls in macro.h, which
is one of our most basic headers.
This is just refactoring, no change in behaviour, in prepartion for
later changes.
It's possible for sscanf to receive strings containing all three fields
and not matching the template at the same time. When this happens the
value of k doesn't change, which basically means that process_audit_string
tries to access memory randomly. Sometimes it works and sometimes it doesn't :-)
See also https://bugzilla.redhat.com/show_bug.cgi?id=1059314.
Now that we don't (mis-)use the env file parser to parse kernel command
lines there's no need anymore to override the used newline character
set. Let's hence drop the argument and just "\n\r" always. This nicely
simplifies our code.
All users of the macro (except for one, in serialize.c), use the macro in
connection with read_line(), so they must include fileio.h. Let's not play
libc games and require multiple header file to be included for the most common
use of a function.
The removal of def.h includes is not exact. I mostly went over the commits that
switch over to use read_line() and add def.h at the same time and reverted the
addition of def.h in those files.
Pretty much everything uses just the first argument, and this doesn't make this
common pattern more complicated, but makes it simpler to pass multiple options.
With lz4 1.8.3, this function can now decompress partial results into a smaller
buffer. The release news don't say anything interesting, but the test case that
was previously failing now works OK.
Fixes#10259.
A test is added. It shows that with *older* lz4, a partial decompression can
occur with the returned size smaller then the requested number of bytes _and_
smaller then the size of the compressed data:
(lz4-libs-1.8.2-1.fc29.x86_64)
Compressed 4194304 → 16464
Decompressed → 4194304
Decompressed partial 12/4194304 → 4194304
Decompressed partial 1/1 → -2 (bad)
Decompressed partial 2/2 → -2 (bad)
Decompressed partial 3/3 → -2 (bad)
Decompressed partial 4/4 → -2 (bad)
Decompressed partial 5/5 → -2 (bad)
Decompressed partial 6/6 → 6 (good)
Decompressed partial 7/7 → 6 (good)
Decompressed partial 8/8 → 6 (good)
Decompressed partial 9/9 → 6 (good)
Decompressed partial 10/10 → 6 (good)
Decompressed partial 11/11 → 6 (good)
Decompressed partial 12/12 → 6 (good)
Decompressed partial 13/13 → 6 (good)
Decompressed partial 14/14 → 6 (good)
Decompressed partial 15/15 → 6 (good)
Decompressed partial 16/16 → 6 (good)
Decompressed partial 17/17 → 6 (good)
Decompressed partial 18/18 → -16459 (bad)
(lz4-libs-1.8.3-1.fc29.x86_64)
Compressed 4194304 → 16464
Decompressed → 4194304
Decompressed partial 12/4194304 → 12
Decompressed partial 1/1 → 1 (good)
Decompressed partial 2/2 → 2 (good)
Decompressed partial 3/3 → 3 (good)
Decompressed partial 4/4 → 4 (good)
...
If we got such a short "successful" decompression in decompress_startswith() as
implemented before this patch, we could be confused and return a false negative
result. But it turns out that this only occurs with small output buffer
sizes. We use greedy_realloc() to manager the buffer, so it is always at least
64 bytes. I couldn't hit a case where decompress_startswith() would actually
return a bogus result. But since the lack of proof is not conclusive, the code
for *older* lz4 is changed too, just to be safe. We cannot rule out that on a
different architecture or with some unlucky compressed string we could hit this
corner case.
The fallback code is guarded by a version check. The check uses a function not
the compile-time define, because there was no soversion bump in lz4 or new
symbols, and we could be compiled against a newer lz4 and linked at runtime
with an older one. (This happens routinely e.g. when somebody upgrades a subset
of distro packages.)
I thought this might fail with lz4 < 1.8.3, but it seems that because of
greedy_realloc, we always use a buffer that is large enough, and it always
passes.
lz4-r130 was released on May 29th, 2015. Let's drop the work-around for older
versions. In particular, we won't test any new code against those ancient
releases, so we shouldn't pretend they are supported.
Before this when asked for rotation we'd only rotate files we have open
anyway. However there might be a number of other files on disk that are
active (i.e. not archived yet) but not open. Let's take care of those
too, so that rotation is always comprehensive, and the user gets the
guarantee that afterthe rotation all stored data is in archived files.
Fixes: #1017
journalctl --vacuum-*= only vacuums archived files. To archive all
active files the rotate operation is used. Let's add a new switch that
combines both, so that the user a single command to first move all
running journal files into archival and then vacuum them.
See: #1017
Let's split out the part that actually renames the file in case we can't
open it into a new function journal_file_dispose().
This way we can reuse the function in other cases where we want to open
a file but can't.
Let's split the function in three: the part where we archive the old
file into journal_file_archive(), and the part where we initiate the
deferred closing into journal_file_initiate_close().
journal_file_rotate() then simply becomes a wrapper around these two
calls, and the opening of the new journal file.
This useful so that we can archive journal files without having to open
new ones, i.e. to do only the archival part of the rotation, without the
rotation part.
Let's store array sizes and indexes in size_t. And let's count numbers
of files in uint64_t (simply because that is the type of the input
parameter for this of the function)
Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for
services. If provided, these values get passed to the journald
client context, and those values are used in the rate limiting
function in the journal over the the journald.conf values.
Part of #10230
This makes use of rlimit_nofile_bump() in all tools that access the
journal. In some cases this replaces older code to achieve this, and
others we add it in where it was missing.