This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
If we scale our buffer to be wide enough for the format string, we
should expect that the calculation was correct.
char_array_0() invocations are removed, since snprintf nul-terminates
the output in any case.
A similar wrapper is used for strftime calls, but only in timedatectl.c.
This undoes a small part of 13790add4b
which was erroneously added, given that zero length datagrams are OK,
and hence zero length reads on a SOCK_DGRAM be no means mean EOF.
Making use of the fd storage capability of the previous commit, allow
restarting journald by serilizing stream state to /run, and pushing open
fds to PID 1.
Even though we use fallocate() it appears that file systems like btrfs
will trigger SIGBUS on certain low-disk-space situation. We should
handle that, hence catch the signal, add it to a list of invalidated
pages, and replace the page with an empty memory area. After each write
check if SIGBUS was triggered, and consider the write invalid if it was.
This should make journald a lot more robust with file systems where
fallocate() is not reliable, for example all CoW file systems
(btrfs...), where changing written data can fail with disk full errors.
https://bugzilla.redhat.com/show_bug.cgi?id=1045810
This is mostly likely the audit socket, and we really should close it
if we cannot make sense of it, since as long as it is open the kernel
might disable the kmsg forwarding of audit msgs, and we should avoid
that, since audit msgs might get completely lost then.
I also downgraded the log message we show a bit, after all things should
really work fine, and we proceed fine with it.
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
systemd-journald would refuse to start if it received an unknown
socket from systemd. This is annoying, because the failure more for
systemd-journald is unpleasant: systemd will keep restarting journald,
but most likely the same error will occur every time. It is better
to continue. journald will try to open missing sockets on its own,
so things should mostly work.
One question is whether to close the sockets which cannot be parsed or
to keep them open. Either way we might lose some messages. This
failure is most likely for the audit socket (selinux issues), which
can be opened multiple times so this not a problem, so I decided to
keep them open because it makes it easier to debug the issue after the
system is fully started.
Commit 74055aa762 'journalctl: add new --flush command and make use of
it in systemd-journal-flush.service' broke flushing because journald
checks for the /run/systemd/journal/flushed file before opening the
permanent journal. When the creation of this file was postponed,
flushing stoppage ensued.
This new command will ask the journal daemon to flush all log data
stored in /run to /var, and wait for it to complete. This is useful, so
that in case of Storage=persistent we can order systemd-tmpfiles-setup
afterwards, to ensure any possibly newly created directory in /var/log
gets proper access mode and owners.
runtime journal is migrated to system journal when only
"/run/systemd/journal/flushed" exist. It's ok but according to this
the system journal directory size(max use) can be over the config. If
journal is not rotated during some time the journal directory can be
remained as over the config(or default) size. To avoid, do
server_vacuum just after the system journal migration from runtime.
It is redundant to store 'hash' and 'compare' function pointers in
struct Hashmap separately. The functions always comprise a pair.
Store a single pointer to struct hash_ops instead.
systemd keeps hundreds of hashmaps, so this saves a little bit of
memory.
String which ended in an unfinished quote were accepted, potentially
with bad memory accesses.
Reject anything which ends in a unfished quote, or contains
non-whitespace characters right after the closing quote.
_FOREACH_WORD now returns the invalid character in *state. But this return
value is not checked anywhere yet.
Also, make 'word' and 'state' variables const pointers, and rename 'w'
to 'word' in various places. Things are easier to read if the same name
is used consistently.
mbiebl_> am I correct that something like this doesn't work
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-passwd "Unlock EncFS"'
mbiebl_> systemd seems to strip of the quotes
mbiebl_> systemctl status shows
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-password Unlock EncFS $RootDir $MountPoint
mbiebl_> which is pretty weird
Special care is needed so that we get an error message if the
file failed to parse, but not when it is missing. To avoid duplicating
the same error check in every caller, add an additional 'warn' boolean
to tell config_parse whether a message should be issued.
This makes things both shorter and more robust wrt. to error reporting.
Now that we actually can distuingish system and normal users there's no
point in taking session information into account anymore when splitting
up logs.
This has the beenfit with that coredump information will actually end up
in each user's own journal.
Introduce a new configuration file /etc/systemd/coredump.conf to
configure when to place coredumps in the journal and when on disk.
Since the coredumps are quite large, default to storing them only on
disk.
We shouldn't destroy IPC objects of system users on logout.
http://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html
This introduces SYSTEM_UID_MAX defined to the maximum UID of system
users. This value is determined compile-time, either as configure switch
or from /etc/login.defs. (We don't read that file at runtime, since this
is really a choice for a system builder, not the end user.)
While we are at it we then also update journald to use SYSTEM_UID_MAX
when we decide whether to split out log data for a specific client.
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
GCC optimizes strlen("string constant") to a constant, even with -O0.
Thus, replace patterns like sizeof("string constant")-1 with
strlen("string constant") where possible, for clarity. In particular,
for expressions intended to add up the lengths of components going into
a string, this often makes it clearer that the expression counts the
trailing '\0' exactly once, by putting the +1 for the '\0' at the end of
the expression, rather than hidden in a sizeof in the middle of the
expression.
This will let journald forward logs as messages sent to all logged in
users (like wall).
Two options are added:
* ForwardToWall (default yes)
* MaxLevelWall (default emerg)
'ForwardToWall' is overridable by kernel command line option
'systemd.journald.forward_to_wall'.
This is used to emulate the traditional syslogd behaviour of sending
emergency messages to all logged in users.
Bring some arrays that are used for DEFINE_STRING_TABLE_LOOKUP() in the
same order than the enums they reference.
Also, pass the corresponding _MAX value to the array initalizer where
appropriate.
Prior to 3.2, /proc/sys/kernel/hostname isn't a pollable file and
sd_event_add_io will return EPERM. Ignore this failure, since it isn't
critical to journald operation.
Reported and tested by user sraue on IRC.
Previously the returned object of constructor functions where sometimes
returned as last, sometimes as first and sometimes as second parameter.
Let's clean this up a bit. Here are the new rules:
1. The object the new object is derived from is put first, if there is any
2. The object we are creating will be returned in the next arguments
3. This is followed by any additional arguments
Rationale:
For functions that operate on an object we always put that object first.
Constructors should probably not be too different in this regard. Also,
if the additional parameters might want to use varargs which suggests to
put them last.
Note that this new scheme only applies to constructor functions, not to
all other functions. We do give a lot of freedom for those.
Note that this commit only changes the order of the new functions we
added, for old ones we accept the wrong order and leave it like that.
Before, journald would remove journal files until both MaxUse= and
KeepFree= settings would be satisfied. The first one depends (if set
automatically) on the size of the file system and is constant. But
the second one depends on current use of the file system, and a spike
in disk usage would cause journald to delete journal files, trying to
reach usage which would leave 15% of the disk free. This behaviour is
surprising for the user who doesn't expect his logs to be purged when
disk usage goes above 85%, which on a large disk could be some
gigabytes from being full. In addition attempting to keep 15% free
provides an attack vector where filling the disk sufficiently disposes
of almost all logs.
Instead, obey KeepFree= only as a limit on adding additional files.
When replacing old files with new, ignore KeepFree=. This means that
if journal disk usage reached some high point that at some later point
start to violate the KeepFree= constraint, journald will not add files
to go above this point, but it will stay (slightly) below it. When
journald is restarted, it forgets the previous maximum usage value,
and sets the limit based on the current usage, so if disk remains to
be filled, journald might use one journal-file-size less on each
restart, if restarts happen just after rotation. This seems like a
reasonable compromise between implementation complexity and robustness.
With this change a failing event source handler will not cause the
entire event loop to fail. Instead, we just disable the specific event
source, log a message at debug level and go on.
This also introduces a new concept of "exit code" which can be stored in
the event loop and is returned by sd_event_loop(). We also rename "quit"
to "exit" everywhere else.
Altogether this should make things more robus and keep errors local
while still providing a way to return event loop errors in a clear way.
In the time it takes to process incoming log messages, the process we
are logging details for may exit. This means the cgroup data is no
longer available from '/proc'. Unfortunately, the way the code was
structured before, we never log _SYSTEMD_UNIT if we don't have this
cgroup information.
Add an else if case that allows the passed in unit_id to be logged even
if we couldn't capture cgroup information. This ensures a command like
`journalctl -u run-XXX` will return all log messages from a oneshot
process.
Instead of individually checking for containers in each user do this
once in a new call proc_cmdline() that read the file only if we are not
in a container.
Before, when the user journal file was rotated, journal_file_rotate
could close the old file and fail to open the new file. In that
case, we would leave the old (deallocated) file in the hashmap.
On subsequent accesses, we could retrieve this stale entry, leading
to a segfault.
When journal_file_rotate fails with the file pointer set to 0,
old file is certainly gone, and cannot be used anymore.
https://bugzilla.redhat.com/show_bug.cgi?id=890463
In order to avoid a deadlock between journald looking up the
"systemd-journal" group name, and nscd (or anyother NSS backing daemon)
logging something back to the journal avoid all NSS in journald the same
way as we avoid it from PID 1.
With this change we rely on the kernel file system logic to adjust the
group of created journal files via the SETGID bit on the journal
directory. To ensure that it is always set, even after the user created
it with a simply "mkdir" on the shell we fix it up via tmpfiles on boot.
Vacuuming behaviour is a bit confusing, and/or we have some bugs,
so those additional messages should help to find out what's going
on. Also, rotation of journal files shouldn't be happening too
often, so the level of the messages is bumped to info, so that
they'll be logged under normal operation.
Since the journal can handle multiple lines just well natively,
and rsyslog can be configured to handle them as well, there is no need
to truncate messages from syslog() after the first newline.
Reproducer:
1. Add following four lines to /etc/rsyslog.conf
----------
$EscapeControlCharactersOnReceive off
$ActionFileDefaultTemplate RSYSLOG_SysklogdFileFormat
$SpaceLFOnReceive on
$DropTrailingLFOnReception off
----------
3. Restart rsyslog
# service rsyslog restart
4. Compile and run the following program
----------
#include <stdio.h>
#include <syslog.h>
int main()
{
syslog(LOG_INFO, "aaa%caaa", '\n');
return 0;
}
----------
Actual results:
Below message appears in /var/log/messages.
----------
Sep 7 19:19:39 localhost test2: aaa
----------
Expected results:
Below message, which worked prior to systemd-journald
appears in /var/log/messages.
----------
Sep 7 19:19:39 localhost test2: aaa aaa
https://bugzilla.redhat.com/show_bug.cgi?id=855313
When using Storage=none there is no point in collecting all the
information just to throw them away. After this change journald
consumes a lot less CPU time when only forwarding messages.
Reporting of the free space was bogus, since the remaining space
was compared with the maximum allowed, instead of the current
use being compared with the maximum allowed. Simplify and fix
by reporting limits directly at the point where they are calculated.
Also, assign a UUID to the message.
When journald encounters a message with OBJECT_PID= set
coming from a priviledged process (UID==0), additional fields
will be added to the message:
OBJECT_UID=,
OBJECT_GID=,
OBJECT_COMM=,
OBJECT_EXE=,
OBJECT_CMDLINE=,
OBJECT_AUDIT_SESSION=,
OBJECT_AUDIT_LOGINUID=,
OBJECT_SYSTEMD_CGROUP=,
OBJECT_SYSTEMD_SESSION=,
OBJECT_SYSTEMD_OWNER_UID=,
OBJECT_SYSTEMD_UNIT= or OBJECT_SYSTEMD_USER_UNIT=.
This is for other logging daemons, like setroubleshoot, to be able to
augment their logs with data about the process.
https://bugzilla.redhat.com/show_bug.cgi?id=951627
Since the system journal wasn't open yet, available_space() returned 0.
Before:
systemd-journal[22170]: Allowing system journal files to grow to 4.0G.
systemd-journal[22170]: Journal size currently limited to 0B due to SystemKeepFree.
After:
systemd-journal[22178]: Allowing system journal files to grow to 4.0G.
systemd-journal[22178]: Journal size currently limited to 3.0G due to SystemKeepFree.
Also, when failing to write a message, show how much space was needed:
"Failed to write entry (26 items, 260123456 bytes) despite vacuuming, ignoring: ...".
In the following scenario:
server creates system.journal
server creates user-1000.journal
both journals share the same seqnum_id.
Then
server writes to user-1000.journal first,
and server writes to system.journal a bit later,
and everything is fine.
The server then terminates (crash, reboot, rsyslog testing,
whatever), and user-1000.journal has entries which end with
a lower seqnum than system.journal. Now
server is restarted
server opens user-1000.journal and writes entries to it...
BAM! duplicate seqnums for the same seqnum_id.
Now, we usually don't see that happen, because system.journal
is closed last, and opened first. Since usually at least one
message is written during boot and lands in the system.journal,
the seqnum is initialized from it, and is set to a number higher
than than anything found in user journals. Nevertheless, if
system.journal is corrupted and is rotated, it can happen that
an entry is written to the user journal with a seqnum that is
a duplicate with an entry found in the corrupted system.journal~.
When browsing the journal, journalctl can fall into a loop
where it tries to follow the seqnums, and tries to go the
next location by seqnum, and is transported back in time to
to the older duplicate seqnum. There is not way to find
out the maximum seqnum used in a multiple files, without
actually looking at all of them. But we don't want to do
that because it would be slow, and actually it isn't really
possible, because a file might e.g. be temporarily unaccessible.
Fix the problem by using different seqnum series for user
journals. Using the same seqnum series for rotated journals
is still fine, because we know that nothing will write
to the rotated journal anymore.
Likely related:
https://bugs.freedesktop.org/show_bug.cgi?id=64566https://bugs.freedesktop.org/show_bug.cgi?id=59856https://bugs.freedesktop.org/show_bug.cgi?id=64296https://bugs.archlinux.org/task/35581https://bugzilla.novell.com/show_bug.cgi?id=817778
Possibly related:
https://bugs.freedesktop.org/show_bug.cgi?id=64293
Since 11ec7ce, journald isn't setting the ACLs properly anymore if
the files had no ACLs to begin with: acl_set_fd fails with EINVAL.
An ACL with ACL_USER or ACL_GROUP entries but no ACL_MASK entry is
invalid, so make sure a mask exists before trying to set the ACL.
Bump the minimal size of the journal so that we can be sure creating the
journal file will always succeed. Previously the minimum size was
smaller than a empty jounral file...
Session objects will now get the .session suffix, user objects the .user
suffix, nspawn containers the .nspawn suffix.
This also changes the user cgroups to be named after the numeric UID
rather than the username, since this allows us the parse these paths
standalone without requiring access to the cgroup file system.
This also changes the mapping of instanced units to cgroups. Instead of
mapping foo@bar.service to the cgroup path /user/foo@.service/bar we
will now map it to /user/foo@.service/foo@bar.service, in order to
ensure that all our objects are properly suffixed in the tree.
The information about the unit for which files are being parsed
is passed all the way down. This way messages land in the journal
with proper UNIT=... or USER_UNIT=... attribution.
'systemctl status' and 'journalctl -u' not displaying those messages
has been a source of confusion for users, since the journal entry for
a misspelt setting was often logged quite a bit earlier than the
failure to start a unit.
Based-on-a-patch-by: Oleksii Shevchuk <alxchk@gmail.com>
Containers will now carry a label (normally derived from the root
directory name, but configurable by the user), and the container's root
cgroup is /machine/<label>. This label is called "machine name", and can
cover both containers and VMs (as soon as libvirt also makes use of
/machine/).
libsystemd-login can be used to query the machine name from a process.
This patch also includes numerous clean-ups for the cgroup code.
Avoid the dynamic allocation for the _UID, _GID, and _PID strings.
The maximum size of the string can be determined at compile time.
The code has only been compile tested.
When systemd was compiled without audit support, do not collect the
audit session and loginuid in the journal. This is saving a couple of
syscalls and memory allocations per log message.
Before, we would initialize many fields twice: first
by filling the structure with zeros, and then a second
time with the real values. We can let the compiler do
the job for us, avoiding one copy.
A downside of this patch is that text gets slightly
bigger. This is because all zero() calls are effectively
inlined:
$ size build/.libs/systemd
text data bss dec hex filename
before 897737 107300 2560 1007597 f5fed build/.libs/systemd
after 897873 107300 2560 1007733 f6075 build/.libs/systemd
… actually less than 1‰.
A few asserts that the parameter is not null had to be removed. I
don't think this changes much, because first, it is quite unlikely
for the assert to fail, and second, an immediate SEGV is almost as
good as an assert.
Add option to force journal sync with fsync. Default timeout is 5min.
Interval configured via SyncIntervalSec option at journal.conf. Synced
journal files will be marked as OFFLINE.
Manual sync can be performed via sending SIGUSR1.
Previously all journal files were owned by "adm". In order to allow
specific users to read the journal files without granting it access to
the full "adm" powers, introduce a new specific group for this.
"systemd-journal" has to be created by the packaging scripts manually at
installation time. It's a good idea to assign a static UID/GID to this
group, since /var/log/journal might be shared across machines via NFS.
This commit also grants read access to the journal files by default to
members of the "wheel" and "adm" groups via file system ACLs, since
these "almost-root" groups should be able to see what's going on on the
system. These ACLs are created by "make install". Packagers probably
need to duplicate this logic in their postinst scripts.
This also adds documentation how to grant access to the journal to
additional users or groups via fs ACLs.
Code above this attempted to load loginuid, if this failed for
whatever reason, we'd still end up using that value (0) in place of
realuid. Fix this by setting a bool when we know the loginuid is
valid.
This fixes journal messages showing up in per-user journals in
gnome-ostree (not configured with loginuid, but I'll shortly fix
that).
New file output.h with output flags and modes.
--full parameter also for cgls and loginctl.
Include 'all' parameter in flags (show_cgroup_by_path, show_cgroup,
show_cgroup_and_extra, show_cgroup_and_extra_by_spec).
get_process_cmdline with max_length == 0 will not ellipsize output.
Replace LINE_MAX with 0 in some calls of get_process_cmdline.
[zj: Default to --full when under pager for clgs.
Drop '-f' since it wasn't documented and didn't actually work.
Reindent a bit.
]
This introduces a new data threshold setting for sd_journal objects
which controls the maximum size of objects to decompress. This is
relieves the library from having to decompress full data objects even
if a client program is only interested in the initial part of them.
This speeds up "systemd-coredumpctl" drastically when invoked without
parameters.
The point is to allow the use of journald functions by other binaries.
Before, journald code was split into multiple files (journald-*.[ch]),
but all those files all required functions from journald.c. And
journald.c has its own main(). Now, it is possible to link against
those functions, e.g. from test binaries.
This constitutes a fix for https://bugzilla.redhat.com/show_bug.cgi?id=872638.
The patch does the following:
1. rename journald.h to journald-server.h and move corresponding code
to journald-server.c.
2. add journald-server.c and other journald-*.c parts to
libsystemd-journal-internal.
3. remove journald-syslog.c from test_journal_syslog_SOURCES, since
it is now contained in libsystemd-journal-internal.
There are no code changes, apart from the removal of a few static's,
to allow function calls between files.