Let's not have #ifdeffery both in the consumers and the providers of the
selinux glue code. Unless the code is particularly complex, let's do the
ifdeffery only in the provider of the selinux glue code, and let's keep
the consumers simple and just invoke it.
It fully initializes the address structure, so no need for pre-initialization,
and also returns the length of the address, so no need to recalculate using
SOCKADDR_UN_LEN().
socklen_t is unsigned, so let's not use an int for it. (It doesn't matter, but
seems cleaner and more portable to not assume anything about the type.)
If we do, we operate on a separate set of logs and runtime objects
The namespace is configured via argv[1].
Fixes: #12123Fixes: #10230#9519
(These latter two issues ask for slightly different stuff, but the
usecases generally can be solved by running separate instances of
journald now, hence also declaring that as "Fixes:")
We immediately read the whole contents into memory, making thigs much more
expensive. Sealed fds should be used instead since they are more efficient
on our side.
We'd first parse all or most of the message, and only then consider if it
is not too large. Also, when encountering a single field over the limit,
we'd still process the preceding part of the message. Let's be stricter,
and check size limits early, and let's refuse the whole message if it fails
any of the size limits.
We allocate a iovec entry for each field, so with many short entries,
our memory usage and processing time can be large, even with a relatively
small message size. Let's refuse overly long entries.
CVE-2018-16865
https://bugzilla.redhat.com/show_bug.cgi?id=1653861
What from I can see, the problem is not from an alloca, despite what the CVE
description says, but from the attack multiplication that comes from creating
many very small iovecs: (void* + size_t) for each three bytes of input message.
All over the place we define local variables for the various sockopts
that take a bool-like "int" value. Sometimes they are const, sometimes
static, sometimes both, sometimes neither.
Let's clean this up, introduce a common const variable "const_int_one"
(as well as one matching "const_int_zero") and use it everywhere, all
acorss the codebase.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
And let's make use of it to implement two new unit settings with it:
1. LogLevelMax= is a new per-unit setting that may be used to configure
log priority filtering: set it to LogLevelMax=notice and only
messages of level "notice" and lower (i.e. more important) will be
processed, all others are dropped.
2. LogExtraFields= is a new per-unit setting for configuring per-unit
journal fields, that are implicitly included in every log record
generated by the unit's processes. It takes field/value pairs in the
form of FOO=BAR.
Also, related to this, one exisiting unit setting is ported to this new
facility:
3. The invocation ID is now pulled from /run/systemd/units/ instead of
cgroupfs xattrs. This substantially relaxes requirements of systemd
on the kernel version and the privileges it runs with (specifically,
cgroupfs xattrs are not available in containers, since they are
stored in kernel memory, and hence are unsafe to permit to lesser
privileged code).
/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.
Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.
This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.
Fixes: #4089Fixes: #3041Fixes: #4441
../src/journal/journald-native.c:341:13: warning: variable 'context' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (ucred && pid_is_valid(ucred->pid)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/journal/journald-native.c:350:42: note: uninitialized use occurs here
context, ucred, tv, label, label_len);
^~~~~~~
../src/journal/journald-native.c:335:31: note: initialize the variable 'context' to silence this warning
ClientContext *context;
^
= NULL
Very nice reporting!
Functions that we call can handle context == NULL, so it's enough to simply
initialize the variable.
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().
This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.
(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
Cache client metadata, in order to be improve runtime behaviour under
pressure.
This is inspired by @vcaputo's work, specifically:
https://github.com/systemd/systemd/pull/2280
That code implements related but different semantics.
For a longer explanation what this change implements please have a look
at the long source comment this patch adds to journald-context.c.
After this commit:
# time bash -c 'dd bs=$((1024*1024)) count=$((1*1024)) if=/dev/urandom | systemd-cat'
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.2783 s, 95.2 MB/s
real 0m11.283s
user 0m0.007s
sys 0m6.216s
Before this commit:
# time bash -c 'dd bs=$((1024*1024)) count=$((1*1024)) if=/dev/urandom | systemd-cat'
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 52.0788 s, 20.6 MB/s
real 0m52.099s
user 0m0.014s
sys 0m7.170s
As side effect, this corrects the journal's rate limiter feature: we now
always use the unit name as key for the ratelimiter.
MESSAGE=data\n and MESSAGE\n40000000data\n are both valid serializations, so
they should be stored in the journal. Before, MESSAGE, SYSLOG_FACILITY,
SYSLOG_IDENTIFIER, PRIORITY, and OBJECT_PID would be only honoured if they were
given in the first form.
Fixed#5973.
For all except the last entry in a single packet, we would dispatch the
message to the journal, but not forward it, nor perform proper cleanup.
Rewrite the code to process each entry in a helper function, and make
server_process_native_message() just call this function in a loop.
Fixes#5643.
v2:
- properly decrement *remaining when processing entry separator
This reverts commit 6355e75610.
The previously mentioned commit inadvertently broke a lot of SELinux related
functionality for both unprivileged users and systemd instances running as
MANAGER_USER. In particular, setting the correct SELinux context after a User=
directive is used would fail to work since we attempt to set the security
context after changing UID. Additionally, it causes activated socket units to
be mislabeled for systemd --user processes since setsockcreatecon() would never
be called.
Reverting this fixes the issues with labeling outlined above, and reinstates
SELinux access checks on unprivileged user services.
Native journal messages (_TRANSPORT=journal) typically don't have a
syslog facility attached to it. As a result when forwarding the messages
to syslog they ended up with facility 0 (LOG_KERN).
Apply syslog_fixup_facility() so we use LOG_USER instead.
Fixes: #5640
The macro determines the right length of a AF_UNIX "struct sockaddr_un" to pass to
connect() or bind(). It automatically figures out if the socket refers to an
abstract namespace socket, or a socket in the file system, and properly handles
the full length of the path field.
This macro is not only safer, but also simpler to use, than the usual
offsetof() + strlen() logic.
The stream event source has a priority of SD_EVENT_PRIORITY_NORMAL+5,
and stdout source +10, but the native and syslog event sources are left
at the default of 0.
As a result, any heavy native or syslog logger can cause starvation of
the other loggers. This is trivially demonstrated by running:
dd if=/dev/urandom bs=8k | od | systemd-cat & # native spammer
systemd-run echo hello & # stream logger
journalctl --follow --output=verbose --no-pager --identifier=echo &
... and wait, and wait, the "hello" never comes.
Now kill %1, "hello" arrives finally.
Let's distuingish the cases where our code takes an active role in
selinux management, or just passively reports whatever selinux
properties are set.
mac_selinux_have() now checks whether selinux is around for the passive
stuff, and mac_selinux_use() for the active stuff. The latter checks the
former, plus also checks UID == 0, under the assumption that only when
we run priviliged selinux management really makes sense.
Fixes: #1941