Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatanate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
"conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.
Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:
$ size .libs/systemd{,-logind,-journald}{.old,}
text data bss dec hex filename
1265204 149564 4808 1419576 15a938 .libs/systemd.old
1260268 149564 4808 1414640 1595f0 .libs/systemd
246805 13852 209 260866 3fb02 .libs/systemd-logind.old
240973 13852 209 255034 3e43a .libs/systemd-logind
146839 4984 34 151857 25131 .libs/systemd-journald.old
146391 4984 34 151409 24f71 .libs/systemd-journald
It is also much easier to check if a certain binary uses a certain MESSAGE_ID:
$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
gperf-3.1 generates lookup functions that take a size_t length
parameter instead of unsigned int. Test for this at configure time.
Fixes: https://github.com/systemd/systemd/issues/5039
This changes journald to not write to /var/log/journal until it received
SIGUSR1 for the first time, thus having been requested to flush the runtime
journal to disk.
This makes the journal work nicer with systems which have the root file system
writable early, but still need to rearrange /var before journald should start
writing and creating files to it, for example because ACLs need to be applied
first, or because /var is to be mounted from another file system, NFS or tmpfs
(as is the case for systemd.volatile=state).
Before this change we required setupts with /var split out to mount the root
disk read-only early on, and ship an /etc/fstab that remounted it writable only
after having placed /var at the right place. But even that was racy for various
preparations as journald might end up accessing the file system before it was
entirely set up, as soon as it was writable.
With this change we make scheduling when to start writing to /var/log/journal
explicit. This means persistent mode now requires
systemd-journal-flush.service in the mix to work, as otherwise journald would
never write to the directory.
See: #1397
Updating min_use is rather an unusual operation that is limited when we first
open the journal files, therefore extracts it from determine_space_for() and
create a function of its own and call this new function when needed.
determine_space_for() is now dealing with storage space (cached) values only.
There should be no functional changes.
The set of storage space values we cache are calculated according to a couple
of filesystem statistics (free blocks, block size).
This patch caches the vfs stats we're interested in so these values are
available later and coherent with the rest of the space cached values.
This commit simply extracts from determine_space_for() the code which emits the
storage usage message and put it into a function of its own so it can be reused
by others paths later.
No functional changes.
This structure keeps track of specificities for a given journal type
(persistent or volatile) such as metrics, name, etc...
The cached space values are now moved in this structure so that each
journal has its own set of cached values.
Previously only one set existed and we didn't know if the cached
values were for the runtime journal or the persistent one.
When doing:
determine_space_for(s, runtime_metrics, ...);
determine_space_for(s, system_metrics, ...);
the second call returned the cached values for the runtime metrics.
As soon as we notice that the clock jumps backwards, rotate journal files. This
is beneficial, as this makes sure that the entries in journal files remain
strictly ordered internally, and thus the bisection algorithm applied on it is
not confused.
This should help avoiding borked wallclock-based bisection on journal files as
witnessed in #4278.
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
In this mode, messages from processes which are not part of the session
land in the main journal file, and only output of processes which are
properly part of the session land in the user's journal. This is
confusing, in particular because systemd-coredump runs outside of the
login session.
"Deprecate" SplitMode=login by removing it from documentation, to
discourage people from using it.
When we rotate journals, we must set offline and close the current one,
but don't generally need to wait for this to complete.
Instead, we'll initiate an asynchronous offline via
journal_file_set_offline(oldfile, false), and add the file to a
per-server set of deferred closes to be closed later when they
won't block.
There's one complication however; journal_file_open() via
journal_file_verify_header() assumes that any writable journal in the
online state is the product of an unclean shutdown or other form of
corruption.
Thus there's a need for journal_file_open() to be aware of deferred
closes and synchronize with their completion when opening preexisting
journals for writing. To facilitate this the deferred closes set is
supplied to the journal_file_open() function where the deferred closes
may be closed synchronously before verifying the header in such
circumstances.
The format of the journald disk usage log entry was changed back and
forth a few times. It is annoying to have a very verbose message, but
if it is short it is hard to understand. But we have a tool for this,
the catalogue.
$ journalctl -x -u systemd-journald
Jan 23 18:48:50 rawhide systemd-journald[891]: Runtime journal (/run/log/journal/) is 8.0M, max 196.2M, 188.2M free.
-- Subject: Disk space used by the journal
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Runtime journal (/run/log/journal/) is currently using 8.0M.
-- Maximum allowed usage is set to 196.2M.
-- Leaving at least 294.3M free (of currently available 1.9G of disk space).
-- Enforced usage limit is thus 196.2M, of which 188.2M are still available.
--
-- The limits controlling how much disk space is used by the journal may
-- be configured with SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=,
-- RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize= settings in
-- /etc/systemd/journald.conf. See journald.conf(5) for details.
Jan 23 18:48:50 rawhide systemd-journald[891]: System journal (/var/log/journal/) is 480.1M, max 1.6G, 1.2G free.
-- Subject: Disk space used by the journal
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- System journal (/var/log/journal/) is currently using 480.1M.
-- Maximum allowed usage is set to 1.6G.
-- Leaving at least 2.5G free (of currently available 5.8G of disk space).
-- Enforced usage limit is thus 1.6G, of which 1.2G are still available.
--
-- The limits controlling how much disk space is used by the journal may
-- be configured with SystemMaxUse=, SystemKeepFree=, SystemMaxFileSize=,
-- RuntimeMaxUse=, RuntimeKeepFree=, RuntimeMaxFileSize= settings in
-- /etc/systemd/journald.conf. See journald.conf(5) for details.
The code to format the iovec is shared with log.c. All call sites to
server_driver_message are changed to include the additional "MESSAGE="
part, but the new functionality is not used and change in functionality
is not expected.
iovec is preallocated, so the maximum number of messages is limited.
In server_driver_message N_IOVEC_PAYLOAD_FIELDS is currently set to 1.
New code is not oom safe, it will fail if memory cannot be allocated.
This will be fixed in subsequent commit.
Most of the function is moved to acl-util.c to make it possible to
add tests in subsequent commit.
Setting of the mode in server_fix_perms is removed:
- we either just created the file ourselves, and the permission be better right,
- or the file was already there, and we should not modify the permissions.
server_fix_perms is renamed to server_fix_acls to better reflect new
meaning, and made static because it is only used in one file.
With this new "--sync" switch we add a synchronous way to sync
everything queued to disk, and return only after that's complete. This
command gives the guarantee that anything queued before has hit the disk
before the command returns.
While we are at it, also improve the man pages and help text for
journalctl a bit.
Otherwise we might run into deadlocks, when journald blocks on the
notify socket on PID 1, and PID 1 blocks on IPC to dbus-daemon and
dbus-daemon blocks on logging to journald. Break this cycle by making
sure that journald never ever blocks on PID 1.
Note that this change disables support for event loop watchdog support,
as these messages are sent in blocking style by sd-event. That should
not be a big loss though, as people reported frequent problems with the
watchdog hitting journald on excessively slow IO.
Fixes: #1505.
The files are named too generically, so that they might conflict with
the upstream project headers. Hence, let's add a "-util" suffix, to
clarify that this are just our utility headers and not any official
upstream headers.
Implement a maximum limit on number of journal files to keep around.
Enforcing a limit is useful on this since our performance when viewing
pays a heavy penalty for each journal file to interleve. This setting is
turned on now by default, and set to 100.
Also, actully implement what 348ced9097
promised: use whatever we find on disk at startup as lower bound on how
much disk space we can use. That commit introduced some provisions to
implement this, but actually never did.
This also adds "journalctl --vacuum-files=" to vacuum files on disk by
their number explicitly.
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
Now that we actually can distuingish system and normal users there's no
point in taking session information into account anymore when splitting
up logs.
This has the beenfit with that coredump information will actually end up
in each user's own journal.
This will let journald forward logs as messages sent to all logged in
users (like wall).
Two options are added:
* ForwardToWall (default yes)
* MaxLevelWall (default emerg)
'ForwardToWall' is overridable by kernel command line option
'systemd.journald.forward_to_wall'.
This is used to emulate the traditional syslogd behaviour of sending
emergency messages to all logged in users.
Pass on the line on which a section was decleared to the parsers, so they
can distinguish between multiple sections (if they chose to). Currently
no parsers take advantage of this, but a follow-up patch will do that
to distinguish
[Address]
Address=192.168.0.1/24
Label=one
[Address]
Address=192.168.0.2/24
Label=two
from
[Address]
Address=192.168.0.1/24
Label=one
Address=192.168.0.2/24
Label=two
In order to avoid a deadlock between journald looking up the
"systemd-journal" group name, and nscd (or anyother NSS backing daemon)
logging something back to the journal avoid all NSS in journald the same
way as we avoid it from PID 1.
With this change we rely on the kernel file system logic to adjust the
group of created journal files via the SETGID bit on the journal
directory. To ensure that it is always set, even after the user created
it with a simply "mkdir" on the shell we fix it up via tmpfiles on boot.
When journald encounters a message with OBJECT_PID= set
coming from a priviledged process (UID==0), additional fields
will be added to the message:
OBJECT_UID=,
OBJECT_GID=,
OBJECT_COMM=,
OBJECT_EXE=,
OBJECT_CMDLINE=,
OBJECT_AUDIT_SESSION=,
OBJECT_AUDIT_LOGINUID=,
OBJECT_SYSTEMD_CGROUP=,
OBJECT_SYSTEMD_SESSION=,
OBJECT_SYSTEMD_OWNER_UID=,
OBJECT_SYSTEMD_UNIT= or OBJECT_SYSTEMD_USER_UNIT=.
This is for other logging daemons, like setroubleshoot, to be able to
augment their logs with data about the process.
https://bugzilla.redhat.com/show_bug.cgi?id=951627
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
The information about the unit for which files are being parsed
is passed all the way down. This way messages land in the journal
with proper UNIT=... or USER_UNIT=... attribution.
'systemctl status' and 'journalctl -u' not displaying those messages
has been a source of confusion for users, since the journal entry for
a misspelt setting was often logged quite a bit earlier than the
failure to start a unit.
Based-on-a-patch-by: Oleksii Shevchuk <alxchk@gmail.com>
Add option to force journal sync with fsync. Default timeout is 5min.
Interval configured via SyncIntervalSec option at journal.conf. Synced
journal files will be marked as OFFLINE.
Manual sync can be performed via sending SIGUSR1.
The point is to allow the use of journald functions by other binaries.
Before, journald code was split into multiple files (journald-*.[ch]),
but all those files all required functions from journald.c. And
journald.c has its own main(). Now, it is possible to link against
those functions, e.g. from test binaries.
This constitutes a fix for https://bugzilla.redhat.com/show_bug.cgi?id=872638.
The patch does the following:
1. rename journald.h to journald-server.h and move corresponding code
to journald-server.c.
2. add journald-server.c and other journald-*.c parts to
libsystemd-journal-internal.
3. remove journald-syslog.c from test_journal_syslog_SOURCES, since
it is now contained in libsystemd-journal-internal.
There are no code changes, apart from the removal of a few static's,
to allow function calls between files.