At various places we only want to close fds if they are not
stdin/stdout/stderr, i.e. fds 0, 1, 2. Let's add a unified helper call
for that, and port everything over.
Then it can be used in the asserts in logging functions without causing
infinite recursion. The error is just printed to stderr, it should be
good enough for the common case.
If log_do_header() was called with overly long parameters, it'd generate
improper output. Essentially, it'd be truncated at random point, in particular
missing a newline at the end, so it'd run with the next field, usually MESSAGE=.
log_do_header is called with parameters from compiled code (file name, lien
nubmer, etc), so in practice this was unlikely to ever be a problem, but it is
possible. In particular, if systemd was compiled from sources in some deeply
nested directory (which happens for example in mock and other build roots), the
filename could be very long.
As a safety measure, let's truncate all parameters to 256 bytes. So we have
5 fields which are 256 bytes (plus the field name prefix), and a few other
fields with fixed width. This must always fit in the 2048 byte buffer.
I don't think there's much gain in calculating the required length precisely,
since it's a lot of fields and a few bytes allocated on the stack don't matter.
log_dispatch_internal has only one caller where the extra_field/extra
params are not null: log_unit_full. When log_unit_full() was called,
when we got to log_dispatch_internal, our header would look like this:
PRIORITY=7
SYSLOG_FACILITY=3
CODE_FILE=../src/core/manager.c
CODE_LINE=2145
CODE_FUNC=manager_invoke_sigchld_event
USER_UNIT=gnome-terminal-server.service
65dffa7a3b984a6d9a46f0b8fb57710bUSER_INVOCATION_ID=
SYSLOG_IDENTIFIER=systemd
It took me a while to understand why I'm not seeing mangled messages in the
journal (after all, "" is a valid rvalue for log messages). The answer is that
journald rejects any field name which starts with a digit, and the MESSAGE_ID
that was used here starts with a digit. Hence, those lines would be silently
filtered out.
Previously, we'd try to open kmsg on failure of the journal/syslog even
if no automatic fallback to kmsg was requested — and we wouldn't even
use the open connection afterwards...
This adds some paranoia code that moves some of the fds we allocate for
longer periods of times to fds > 2 if they are allocated below this
boundary. This is a paranoid safety thing, in order to avoid that
external code might end up erroneously use our fds under the assumption
they were valid stdin/stdout/stderr. Think: some app closes
stdin/stdout/stderr and then invokes 'fprintf(stderr, …' which causes
writes on our fds.
This both adds the helper to do the moving as well as ports over a
number of users to this new logic. Since we don't want to litter all our
code with invocations of this I tried to strictly focus on fds we keep
open for long periods of times only and only in code that is frequently
loaded into foreign programs (under the assumptions that in our own
codebase we are smart enough to always keep stdin/stdout/stderr
allocated to avoid this pitfall). Specifically this means all code used
by NSS and our sd-xyz API:
1. our logging APIs
2. sd-event
3. sd-bus
4. sd-resolve
5. sd-netlink
This changed was inspired by this:
https://github.com/systemd/systemd/issues/8075#issuecomment-363689755
This shows that apparently IRL there are programs that do close
stdin/stdout/stderr, and we should accomodate for that.
Note that this won't fix any bugs, this just makes sure that buggy
programs are less likely to interfere with out own code.
log_open_console() did not switch from stderr to /dev/console, when
"always_reopen_console" was set. It was necessary to call
log_close_console() first.
By contrast, log_open() did switch between e.g. journald and kmsg according
to the value of "prohibit_ipc".
Let's fix log_open() to respect the values of all the log options, and we
can make log_close_*() private.
Also log_close_console() is changed. There was some precaution, avoiding
closing the console fd if we are not PID 1. I think commit 48a601fe made
a little mistake in leaving this in, and it only served to confuse
readers :).
Also I changed systemd-shutdown. Now we have log_set_prohibit_ipc(), let's
use it to clarify that systemd-shutdown is not expected to try and log via
journald (which it is about to kill). We avoided ever asking it to, but
it's more convenient for the reader if they don't have to think about that.
In that sense, it's similar to using assert() to validate a function's
arguments.
This removes LOG_TARGET_SAFE. It's made redundant by the new
"prohibit-ipc" logging flag, as it used to have a similar effect: avoid
logging to the journal/syslog, i.e. any local services in order to avoid
deadlocks when we lock from PID 1 or its utility processes (such as
generators).
All previous users of LOG_TARGET_SAFE are switched over to the new
setting. This makes things a bit safer for all, as not even the
SYSTEMD_LOG_TARGET env var can be used to accidentally log to the
journal anymore in these programs.
If set, we'll avoid logging to any IPC log targets, i.e. syslog or the
journal, but allow stderr, kmsg, console logging.
This is useful as PID 1 wants to turn this off explicitly as long as the
journal is not up.
Previously we'd open/close the log stream to these services whenever
needed but this is incompatible with the "open_when_needed" logic
introduced in #6915, which might open the log streams whenever it likes,
including possibly inside of the child process we fork off that'll
become journald later on. Hence, let's make this all explicit, and
instead of managing when we open/close log streams add a boolean that
clearly prohibits the IPC targets when needed, so that opening can be
done at any time, but will honour this.
See: #7985
This doesn't matter much, and we don't rely on it, but I think it's much
nicer if we log_set_target() and log_set_upgrade_syslog_to_journal() can
be called in either order and have the same effect.
The macro used utf8.h functions without including that. Let's clean this
up, by moving that code inside of log.c.
Let's also make the call return -EINVAL in all cases. This is in line
with log_oom() which also returns a well-defined error code even though
it doesn#t take one.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
And let's make use of it to implement two new unit settings with it:
1. LogLevelMax= is a new per-unit setting that may be used to configure
log priority filtering: set it to LogLevelMax=notice and only
messages of level "notice" and lower (i.e. more important) will be
processed, all others are dropped.
2. LogExtraFields= is a new per-unit setting for configuring per-unit
journal fields, that are implicitly included in every log record
generated by the unit's processes. It takes field/value pairs in the
form of FOO=BAR.
Also, related to this, one exisiting unit setting is ported to this new
facility:
3. The invocation ID is now pulled from /run/systemd/units/ instead of
cgroupfs xattrs. This substantially relaxes requirements of systemd
on the kernel version and the privileges it runs with (specifically,
cgroupfs xattrs are not available in containers, since they are
stored in kernel memory, and hence are unsafe to permit to lesser
privileged code).
/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.
Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.
This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.
Fixes: #4089Fixes: #3041Fixes: #4441
This we can then make use in execute.c to make error logging a bit less
special when preparing for process execution, as we can still log but
don't have any fds open continously.
This adds a new recognizable log message for each unit invocation that
contains structured information about consumed resources of the unit as
a whole after it terminated. This is particular useful for apps that
want to figure out what the resource consumption of a unit given a
specific invocation ID was.
The log message is only generated for units that have at least one
XyzAccounting= property turned on, and currently only covers IP traffic and CPU
time metrics.
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().
This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.
(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
secure_getenv does not work when the process has a nonempty permitted
capability set, which means that it's unduly hard to configure logging in
systemd-logind, systemd-resolved, and others.
secure_getenv is useful for code in libraries which might get called from a
setuid application. log_parse_environment() is never called from our library
code, but directly form various top-level executables. None of them are
installed suid, and none are prepared to be used this way, since many
additional changes would be required to make that safe. We may just as well
drop the check and allow SYSTEMD_LOG_* to properly parsed.
Fixes#4900.
The single log level is split into an array of log levels. Which index in the
array is used can be determined for each compilation unit separately by setting
a macro before including log.h. All compilation units use the same index
(LOG_REALM_SYSTEMD), so there should be no functional change.
v2:
- the "realm" is squished into the level (upper bits that are not used by
priority or facility), and unsquished later in functions in log.c.
v3:
- rename REALM_PLUS_LEVEL to LOG_REALM_PLUS_LEVEL and REALM to LOG_REALM_REMOVE_LEVEL.
This is useful when we want to avoid printf formatting on the message.
It's nicer than using log_struct with "%s" as the format, because printf
is slow and with a large message (like from a backtrace) this would require
extra unnecessary memory.
I'm not exposing all the fields in the wrapper: only level and errno.
Those are the most likely to be useful.
Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatanate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
"conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.
Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:
$ size .libs/systemd{,-logind,-journald}{.old,}
text data bss dec hex filename
1265204 149564 4808 1419576 15a938 .libs/systemd.old
1260268 149564 4808 1414640 1595f0 .libs/systemd
246805 13852 209 260866 3fb02 .libs/systemd-logind.old
240973 13852 209 255034 3e43a .libs/systemd-logind
146839 4984 34 151857 25131 .libs/systemd-journald.old
146391 4984 34 151409 24f71 .libs/systemd-journald
It is also much easier to check if a certain binary uses a certain MESSAGE_ID:
$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
This improves kernel command line parsing in a number of ways:
a) An kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar-xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_". With
this change, which was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's systemd.debug-shell,
not systemd.debug_shell). Considering both as equivalent is inspired how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"systemd.unit" will no result in a complain that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
systemd.journal-fields(7) documents CODE_FUNC=. Internally, we were
inconsistent: sd_journal_print uses CODE_FUNC=, log.h has CODE_FUNCTION=,
python-systemd and bootchart also used CODE_FUNC=, when they were internal.
Most external projects use sd_journal_* functions, so CODE_FUNC=,
python-systemd still uses CODE_FUNC=, as does systemd-bootchart, and
independent reimplementations in golang-github-coreos-go-systemd, qtbase,
network manager, glib, pulseaudio. Hence, I don't think there's much
choice.
Those square brackets don't fit how our other messages look like; we use colons
everywhere else. The "[a:b]" format was originally added in
ed5bcfbe3c, and remained unchanged for 7 years,
but in the meantime other conventions evolved.
The new version is also one character shorter.
[/etc/systemd/system/systemd-networkd.service.d/override.conf:2] Failed to parse sec value, ignoring: ...
↓
/etc/systemd/system/systemd-networkd.service.d/override.conf:2: Failed to parse sec value, ignoring: ...
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
This reverts some changes introduced in d054f0a4d4.
xsprintf should be used in cases where we calculated the right buffer
size by hand (using DECIMAL_STRING_MAX and such), and never in cases where
we are printing externally specified strings of arbitrary length.
Fixes#4534.
This stripping is contolled by a new boolean parameter. When the parameter
is true, it means that the caller does not care about the distinction between
initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed
parameters in the initramfs, and only on the unprefixed parameters in real
root. If the parameter is false, behaviour is the same as before.
Changes by caller:
log.c (systemd.log_*): changed to accept rd-dot-prefix params
pid1: no change, custom logic
cryptsetup-generator: no change, still accepts rd-dot-prefix params
debug-generator: no change, does not accept rd-dot-prefix params
fsck: changed to accept rd-dot-prefix params
fstab-generator: no change, custom logic
gpt-auto-generator: no change, custom logic
hibernate-resume-generator: no change, does not accept rd-dot-prefix params
journald: changed to accept rd-dot-prefix params
modules-load: no change, still accepts rd-dot-prefix params
quote-check: no change, does not accept rd-dot-prefix params
udevd: no change, still accepts rd-dot-prefix params
I added support for "rd." params in the three cases where I think it's
useful: logging, fsck options, journald forwarding options.
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
Commit d054f0a4 ("tree-wide: use xsprintf() where applicable") used a
semantic patch approach to change a number of locations from
snprintf(buf, sizeof(buf), FMT, ...)
to
xsprintf(buf, FMT, ...)
The problem is that xsprintf() wraps the snprintf() in an
assert_message_se(), so if snprintf() reports an overflow of the
destination buffer, the binary will now terminate.
This hit a user running a version of systemd that was built from a
deeply nested system path.
Fix this by
a) Switching back to snprintf() for this particular case. We should really
rather truncate the location string than crash in such situations.
b) Increasing the size of that static string buffer, to make the event more
unlikely.
The macro determines the right length of a AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
The code to format the iovec is shared with log.c. All call sites to
server_driver_message are changed to include the additional "MESSAGE="
part, but the new functionality is not used and change in functionality
is not expected.
iovec is preallocated, so the maximum number of messages is limited.
In server_driver_message N_IOVEC_PAYLOAD_FIELDS is currently set to 1.
New code is not oom safe, it will fail if memory cannot be allocated.
This will be fixed in subsequent commit.