This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.
This takes inspiration from Rust:
https://doc.rust-lang.org/std/option/enum.Option.html#method.take
and was suggested by Alan Jenkins (@sourcejedi).
It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
Let's make use this at various places we call fsync(), to make things
fully reliable, as the kernel devs suggest to first fsync() files and
then fsync() the directories they are located in.
This commint adds a new command line parameter to sytemd-coredump. The
parameter should be mappend to core_pattern's placeholder %h - hostname.
The field _HOSTNAME holds the name from the kernel's namespaces which might be
different then the one comming from process' namespaces.
It is true that the real hostname is usually available in the field
COREDUMP_ENVIRON (environment variables) but I believe it is more reliable to
use the value passed by kernel.
----
The length of iovec is no longer static and hence I corrected the declarations
of the functions set_iovec_field and set_iovec_field_free.
Thank you @yuwata and @poettering!
First, let's rename it to disable_coredumps(), as in the rest of our
codebase we spell it "coredump" rather than "core_dump", so let's stick
to that.
However, also log about failures to turn off core dumpling on LOG_DEBUG,
because debug logging is always a good idea.
Let's replace usage of fputc_unlocked() and friends by __fsetlocking(f,
FSETLOCKING_BYCALLER). This turns off locking for the entire FILE*,
instead of doing individual per-call decision whether to use normal
calls or _unlocked() calls.
This has various benefits:
1. It's easier to read and easier not to forget
2. It's more comprehensive, as fprintf() and friends are covered too
(as these functions have no _unlocked() counterpart)
3. Philosophically, it's a bit more correct, because it's more a
property of the file handle really whether we ever pass it on to another
thread, not of the operations we then apply to it.
This patch reworks all pieces of codes that so far used fxyz_unlocked()
calls to use __fsetlocking() instead. It also reworks all places that
use open_memstream(), i.e. use stdio FILE* for string manipulations.
Note that this in some way a revert of 4b61c87511.
Our CODING_STYLE suggests not comparing with NULL, but relying on C's
downgrade-to-bool feature for that. Fix up some code to match these
guidelines. (This is not comprehensive, the coccinelle output for this
is unfortunately kinda borked)
The "nobody" user might possibly be seen by the journal or coredumping
code if unmapped userns-using processes are somehow visible to them.
Let's make sure we don't do the ACL magic for this user either, since
this is a special system user that might be backed by different real
users in different contexts.
This adds uid_is_system() and gid_is_system(), similar in style to
uid_is_dynamic(). That a helper like this is useful is illustrated by
the fact that test-condition.c didn't get the check right so far, which
this patch fixes.
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().
This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.
(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
As a follow-up for db3f45e2d2 let's do the
same for all other cases where we create a FILE* with local scope and
know that no other threads hence can have access to it.
For most cases this shouldn't change much really, but this should speed
dbus introspection and calender time formatting up a bit.
This was exposed by the previous commit. This could be potentially
unpleasant, but we are saved by the fact that this code path was only
taken for journald crashes, where we control COMM and know that it doesn't
contain any special characters. Use log_dispatch which does not do any
format processing to push the message out.
We check these a number of times, hence let's unify these checks here.
This also allows us to make the PID 1 check more elaborate as we can
check both the PID and the cgroup. Checking the PID has the benefit that
we'll also cover cases where PID 1 might still be in the root cgroup, and
the cgroup check has the benefit that we also cover crashes in forked
off crasher processes (the way we actually do it in systemd)
Given that this is a field primarily processed by computers, and not so
much by humans, assign "1" instead of "yes". Also, use parse_boolean()
as we usually do for parsing it again.
This makes things more alike udev options (as one example), such as
SYSTEMD_READY where we also spit out "1" and "0", and parse with
parse_boolean().
We would only log a terse message when pid1 or systemd-journald crashed.
It seems better to reuse the normal code paths as much as possible,
with the following differences:
- if pid1 crashes, we cannot launch the helper, so we don't analyze the
coredump, just write it to file directly from the helper invoked by the
kernel;
- if journald crashes, we can produce the backtrace, but we don't log full
structured messages.
With comparison to previous code, advantages are:
- we go through most of the steps, so for example vacuuming is performed,
- we gather and log more data. In particular for journald and pid1 crashes we
generate a backtrace, and for pid1 crashes we record the metadata (fdinfo,
maps, etc.),
- coredumpctl shows pid1 crashes.
A disavantage (inefficiency) is that we gather metadata for journald crashes
which is then ignored because _TRANSPORT=kernel does not support structued
messages.
Messages for the systemd-journald "crash" have _TRANSPORT=kernel, and
_TRANSPORT=journal for the pid1 "crash".
Feb 26 16:27:55 systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=11/SEGV
Feb 26 16:27:55 systemd[1]: systemd-journald.service: Unit entered failed state.
Feb 26 16:37:54 systemd-coredump[18801]: Process 18729 (systemd-journal) of user 0 dumped core.
Feb 26 16:37:54 systemd-coredump[18801]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.36c14bf3c6ce4c38914f441038990979.18729.1488145074000000.lz4
Feb 26 16:37:54 systemd-coredump[18801]: Stack trace of thread 18729:
Feb 26 16:37:54 systemd-coredump[18801]: #0 0x00007f46d6a06b8d fsync (libpthread.so.0)
Feb 26 16:37:54 systemd-coredump[18801]: #1 0x00007f46d71bfc47 journal_file_set_online (libsystemd-shared-233.so)
Feb 26 16:37:54 systemd-coredump[18801]: #2 0x00007f46d71c1c31 journal_file_append_object (libsystemd-shared-233.so)
Feb 26 16:37:54 systemd-coredump[18801]: #3 0x00007f46d71c3405 journal_file_append_data (libsystemd-shared-233.so)
Feb 26 16:37:54 systemd-coredump[18801]: #4 0x00007f46d71c4b7c journal_file_append_entry (libsystemd-shared-233.so)
Feb 26 16:37:54 systemd-coredump[18801]: #5 0x00005577688cf056 write_to_journal (systemd-journald)
Feb 26 16:37:54 systemd-coredump[18801]: #6 0x00005577688d2e98 dispatch_message_real (systemd-journald)
Feb 26 16:37:54 kernel: systemd-coredum: 9 output lines suppressed due to ratelimiting
Feb 26 16:37:54 systemd-journald[18810]: Journal started
Feb 26 16:50:59 systemd-coredump[19229]: Due to PID 1 having crashed coredump collection will now be turned off.
Feb 26 16:51:00 systemd[1]: Caught <SEGV>, dumped core as pid 19228.
Feb 26 16:51:00 systemd[1]: Freezing execution.
Feb 26 16:51:00 systemd-coredump[19229]: Process 19228 (systemd) of user 0 dumped core.
Stack trace of thread 19228:
#0 0x00007fab82075c47 kill (libc.so.6)
#1 0x000055fdf7c38b6b crash (systemd)
#2 0x00007fab824175c0 __restore_rt (libpthread.so.0)
#3 0x00007fab82148573 epoll_wait (libc.so.6)
#4 0x00007fab8366f84a sd_event_wait (libsystemd-shared-233.so)
#5 0x00007fab836701de sd_event_run (libsystemd-shared-233.so)
#6 0x000055fdf7c4a380 manager_loop (systemd)
#7 0x000055fdf7c402c2 main (systemd)
#8 0x00007fab82060401 __libc_start_main (libc.so.6)
#9 0x000055fdf7c3818a _start (systemd)
Poor machine ;)
Unit systemd-coredump@1-3854-0.service is failed/failed, not counting it.
TIME PID UID GID SIG COREFILE EXE
Fri 2017-02-24 11:11:00 EST 10002 1000 1000 6 none /home/zbyszek/src/systemd-work/.libs/lt-Sat 2017-02-25 00:49:32 EST 26921 0 0 11 error /usr/libexec/fprintd
Sat 2017-02-25 11:56:30 EST 30703 1000 1000 - - /usr/bin/python3.5
Sat 2017-02-25 13:16:54 EST 3275 1000 1000 11 present /usr/bin/bash
Sat 2017-02-25 17:25:40 EST 4049 1000 1000 11 truncated /usr/bin/bash
For info and gdb output, the filename is marked in red and "(truncated)" is
appended. (Red is necessary because the annotation is hard to see when running
under a pager.)
Fixed#3883.
We logged about this, but did not attach information directly to the log
entry. It *would* be nice to log the full untruncated size, but afaict, to do
this, we would have to read the full data from the kernel. Doing this just to
log that information seems a bit excessive, in particular when the limit could
be set quite low. So for now let's just add a boolean field.
Most of the fields in the context array come from the kernel (passed
through argv), but two are special: comm and exe. We allocate them
ourselves. We forgot to initialize context[CONTEXT_COMM] with the value
we allocated (introduced in 9aa8202314).
To simplify things, just set context[CONTEXT_COMM] and context[CONTEXT_EXE],
and free those two fields at the end.
Fixes#5442.
Our coredump handler operates on a "context" supplied by the kernel via
the core_pattern arguments. When we pass off a coredump for processing
to coredumpd we pass along enough information for this context to be
reconstructed. This information is passed in the usual journal fields,
and that means we extended the 1s granularity timestamp to 1µs
granularity by appending 6 zeroes. We need to chop them off again when
reconstructing the original kernel context.
Fixes: #4779
(Note that we only do this for the journal metadata, not for the xattrs,
as the xattrs are only supposed to store the original 1:1 info we
acquired from the kernel.)
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatanate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
"conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.
Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:
$ size .libs/systemd{,-logind,-journald}{.old,}
text data bss dec hex filename
1265204 149564 4808 1419576 15a938 .libs/systemd.old
1260268 149564 4808 1414640 1595f0 .libs/systemd
246805 13852 209 260866 3fb02 .libs/systemd-logind.old
240973 13852 209 255034 3e43a .libs/systemd-logind
146839 4984 34 151857 25131 .libs/systemd-journald.old
146391 4984 34 151409 24f71 .libs/systemd-journald
It is also much easier to check if a certain binary uses a certain MESSAGE_ID:
$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
The entry must be a single entry in the journal export format, including the
terminating double newline. The MESSAGE field is now generated on the sender
side.
The advantage is that the reporter can easily pass additional metadata.
Continuing with the example of the python excepthook:
COREDUMP_PYTHON_EXECUTABLE=/usr/bin/python3
COREDUMP_PYTHON_VERSION=3.5.2 (default, Sep 14 2016, 11:28:32)
[GCC 6.2.1 20160901 (Red Hat 6.2.1-1)]
COREDUMP_PYTHON_THREAD_INFO=sys.thread_info(name='pthread', lock='semaphore', version='NPTL 2.24')
COREDUMP_PYTHON_EXCEPTION_TYPE=ZeroDivisionError
COREDUMP_PYTHON_EXCEPTION_VALUE=division by zero
MESSAGE=Process 29514 (systemd_coredump_exception_handler.py) of user zbyszek failed with ZeroDivisionError: division by zero
Traceback (most recent call last):
File "systemd_coredump_exception_handler.py", line 134, in <module>
g()
File "systemd_coredump_exception_handler.py", line 133, in g
f()
File "systemd_coredump_exception_handler.py", line 131, in f
div0 = 1 / 0
ZeroDivisionError: division by zero
Local variables in innermost frame:
a=3
h=<function f at 0x7efdc14b6ea0>
One consideration is whether to use the Journal Export Format, or send packets
over a UNIX socket instead. The advantage of current solution is that although
parsing is more complicated on the receiver side, it is much easier to use on the
sender side. I hope this can be used by various languages for which writing
binary structures to a UNIX socket is harder and more likely to be done wrong
than piping of a simple textyish format.
In preparation for subsequenct changes...
Various stack allocations are changed to use the heap. This might be minimally
slower, but probably doesn't matter. The upside is that we will now properly
free all memory that is allocated.
For normal arches this doesn't matter, but on arm32 arg_journal_size_max was smaller
than the other *SizeMax variables. This doesn't seem useful.
This is anothet part of the fix in 5206a724a0.
In file included from ./src/basic/macro.h:415:0,
from ./src/shared/acl-util.h:28,
from src/coredump/coredump.c:36:
src/coredump/coredump.c: In function ‘submit_coredump’:
src/coredump/coredump.c:711:26: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:711:17: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
src/coredump/coredump.c:711:26: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 8 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:711:17: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
src/coredump/coredump.c:741:27: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:182:28: note: in expansion of macro ‘log_full’
#define log_debug(...) log_full(LOG_DEBUG, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:741:17: note: in expansion of macro ‘log_debug’
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^~~~~~~~~
src/coredump/coredump.c:741:27: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 8 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:182:28: note: in expansion of macro ‘log_full’
#define log_debug(...) log_full(LOG_DEBUG, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:741:17: note: in expansion of macro ‘log_debug’
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^~~~~~~~~
src/coredump/coredump.c:768:34: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:768:25: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
coredump had code to check if copy_bytes() hit the max_bytes limit,
and refuse further processing in that case.
But in 84ee096044, the return convention for copy_bytes() was changed
from -EFBIG to 1 for the case when the limit is hit, so the condition
check in coredump couldn't ever trigger.
But it seems that *do* want to process such truncated cores [1].
So change the code to detect truncation properly, but instead of
returning an error, give a nice log entry.
[1] https://github.com/systemd/systemd/issues/3883#issuecomment-239106337
Should fix (or at least alleviate) #3883.
Back when external storage was initially added in 34c10968cb, this mode of
storage was added. This could have made some sense back when XZ compression was
used, and an uncompressed core on disk could be used as short-lived cache file
which does require costly decompression. But now fast LZ4 compression is used
(by default) both internally and externally, so we have duplicated storage,
using the same compression and same default maximum core size in both cases,
but with different expiration lifetimes. Even the uncompressed-external,
compressed-internal mode is not very useful: for small files, decompression
with LZ4 is fast enough not to matter, and for large files, decompression is
still relatively fast, but the disk-usage penalty is very big.
An additional problem with the two modes of storage is that it complicates
the code and makes it much harder to return a useful error message to the user
if we cannot find the core file, since if we cannot find the file we have to
check the internal storage first.
This patch drops "both" storage mode. Effectively this means that if somebody
configured coredump this way, they will get a warning about an unsupported
value for Storage, and the default of "external" will be used.
I'm pretty sure that this mode is very rarely used anyway.
If ulimit is smaller than page_size(), function save_external_coredump()
returns -EBADSLT and this causes skipping whole core dumping part in
submit_coredump(). Initializing coredump_size to UINT64_MAX prevents
evaluating a condition with uninitialized varialbe which leads to
calling allocate_journal_field() with coredump_fd = -1 which causes
aborting.
Signed-off-by: Matej Habrnal <mhabrnal@redhat.com>
The kernel treats values below a certain threshold (minfmt->min_coredump
which is initialized do ELF_EXEC_PAGESIZE, which varies between architectures,
but is usually the same as PAGE_SIZE) as disabling coredumps [1].
Any core image below ELF_EXEC_PAGESIZE will yield an invalid backtrace anyway [2],
so follow the kernel and not try to parse or store such images.
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/coredump.c#n660
[2] systemd-coredump[16260]: Process 16258 (sleep) of user 1002 dumped core.
Stack trace of thread 16258:
#0 0x00007f1d8b3d3810 n/a (n/a)
https://bugzilla.redhat.com/show_bug.cgi?id=1309172#c19