For the definition of assert_cc() we try to use static_assert and check
for it with "#ifdef". But that can only work if assert.h is imported
before. Hence let's do so.
This is a nice little optimization when using static const strings: we
can now use them directly as JsonVariant objecs, without any additional
allocation.
Simply as a safety precaution so that json objects we read are not
arbitrary amounts deep, so that code that processes json objects
recursively can't be easily exploited (by hitting stack limits).
Follow-up for oss-fuzz#10908
(Nice is that we can accomodate for this counter without increasing the
size of the JsonVariant object.)
Let's move things around a bit, so that the trailing unused whitespace
within the structure due to padding is placed together, so that it is
easier to use for new fields. (Found with pahole)
Let's clean up the failure codepaths, by using _cleanup_.
This relies on the new behaviour of env_append() introduced in the
previous commit that guarantess the list always remains properly NULL
terminated
Following discussions with some kernel folks at All Systems Go! it
appears that file descriptors are not really as expensive as they used
to be (both memory and performance-wise) and it should thus be OK to allow
programs (including unprivileged ones) to have more of them without ill
effects.
Unfortunately we can't just raise the RLIMIT_NOFILE soft limit
globally for all processes, as select() and friends can't handle fds
>= 1024, and thus unexpecting programs might fail if they accidently get
an fd outside of that range. We can however raise the hard limit, so
that programs that need a lot of fds can opt-in into getting fds beyond
the 1024 boundary, simply by bumping the soft limit to the now higher
hard limit.
This is useful for all our client code that accesses the journal, as the
journal merging logic might need a lot of fds. Let's add a unified
function for bumping the limit in a robust way.
This simply adds a new constant we can use for bumping RLIMIT_NOFILE to
a "high" value. It default to 256K for now, which is pretty high, but
smaller than the kernel built-in limit of 1M.
Previously, some tools that needed a higher RLIMIT_NOFILE bumped it to
16K. This new define goes substantially higher than this, following the
discussion with the kernel folks.
All over the place we define local variables for the various sockopts
that take a bool-like "int" value. Sometimes they are const, sometimes
static, sometimes both, sometimes neither.
Let's clean this up, introduce a common const variable "const_int_one"
(as well as one matching "const_int_zero") and use it everywhere, all
acorss the codebase.
The helper is supposed to properly handle cases where .sun_path does not
contain a NUL byte, and thus copies out the path suffix a NUL as
necessary.
This also reworks the more specific socket_address_unlink() to be a
wrapper around the more generic sockaddr_un_unlink()
This makes use of assert_cc() to guard against missing CASE macros,
instead of a manual implementation that might result in a static
variable to be allocated.
More importantly though this changes the base type for the array used to
determine the number of arguments for the compile time check from "int"
to "long double". This is done in order to avoid warnings from "ubsan"
that possibly large constants are assigned to small types. "long double"
hopefully isn't vulnerable to that.
Fixes: #10332
Mempool use is enabled or disabled based on the mempool_use_allowed symbol that
is linked in.
Should fix assert crashes in external programs caused by #9792.
Replaces #10286.
v2:
- use two different source files instead of a gcc constructor
linux/capability.h's CAP_TO_MASK potentially shifts a signed int "1"
(i.e. 32bit wide) left by 31 which means it becomes negative. That's
just weird, and ubsan complains about it. Let's introduce our own macro
CAP_TO_MASK_CORRECTED which doesn't fall into this trap, and make use of
it.
Fixes: #10347
As preparation for OCI support in nspawn, let's add a JSON parser.
The json.h file contains an explanation why this is new code instead of
just us linking against an existing JSON library.
Cgroup v2 provides the eBPF-based device controller, which isn't currently
supported by systemd. This commit aims to provide such support.
There are no user-visible changes, just the device policy and whitelist
start working if cgroup v2 is used.
Let's make sure the integers we parse out are not larger than USHRT_MAX.
This is a good idea as the kernel's TIOCSWINSZ ioctl for sizing
terminals can't take larger values, and we shouldn't risk an overflow.
Comes with tests.
Also add direct test for $SYSTEMD_PROC_CMDLINE.
In test-proc-cmdline, "true" was masquerading as PROC_CMDLINE_STRIP_RD_PREFIX,
fix that. Also, reorder functions to match call order.
Previously, together with kill_dots true, patch like
".", "./.", ".//.//" would all return an empty string.
That is wrong. There must be one "." left to reference
the current directory.
Also, the comment with examples was wrong.
Apparently FAT on some recent kernels can't do RENAME_NOREPLACE, and of
course cannot do linkat()/unlinkat() either (as the hard link concept
does not exist on FAT). Add a fallback using an explicit beforehand
faccessat() check. This sucks, but what we can do if the safe operations
are not available?
Fixes: #10063
LGTM was complaining:
> Multiplication result may overflow 'int' before it is converted to 'long'.
Fix this by changing all types to ssize_t and add a check for overflow
while at it.
v2: fix error in free_and_strndup()
When the orignal and copied message were the same, but shorter than specified
length l, memory read past the end of the buffer would be performed. A test
case is included: a string that had an embedded NUL ("q\0") is used to replace
"q".
v3: Fix one more bug in free_and_strndup and add tests.
v4: Some style fixed based on review, one more use of free_and_replace, and
make the tests more comprehensive.
Allows configuring the watchdog signal (with a default of SIGABRT).
This allows an alternative to SIGABRT when coredumps are not desirable.
Appropriate references to SIGABRT or aborting were renamed to reflect
more liberal watchdog signals.
Closes#8658
Let's change utf16_to_utf8() prototype to refer to utf16 chars with char16_t rather than void
Let's not cast away a "const" needlessly.
Let's add a few comments.
Let's fix the calculations of the buffer size to allocate, and how long
to run the loop in case of uneven byte numbers
Let's fix an indentation issue.
Let's avoid yoda comparisons.
Let's drop unnecessary ().
Let's make sure we convert 16bit values to 32bit before shifting them by
10bit to the left, to avoid overflows.
Let's avoid comparisons between signed literals and unsigned variables,
in particular if the literals are outside of the minimum range C
requires for "int".
Let's avoid a few casts in the function. Also, let's drop the "const"
when returning the string, for similar reasons as strchr() and friends
drop it: so that we don't add a const if the user passes in a non-const
string.
"because blacklisted" is not grammatical, it should be something like
"becuase it is blacklisted", which in turn in too verbose. Let's use
a shorter form.
Quoting https://github.com/systemd/systemd/issues/10074:
> detect_vm_uml() reads /proc/cpuinfo with read_full_file()
> read_full_file() has a file max limit size of READ_FULL_BYTES_MAX=(4U*1024U*1024U)
> Unfortunately, the size of my /proc/cpuinfo is bigger, approximately:
> echo $(( 4* $(cat /proc/cpuinfo | wc -c)))
> 9918072
> This causes read_full_file() to fail and the Condition test fallout.
Let's just read line by line until we find an intersting line. This also
helps if not running under UML, because we avoid reading as much data.
optind may be used in each verb, e.g., udevadm. So, let's initialize
optind before calling verbs.
Without this, e.g., udevadm -d hwdb --update causes error in parsing arguments.
Both SO_SNDBUFFORCE and SO_RCVBUFFORCE requires capability 'net_admin'.
If this capability is not granted to the service the first attempt to increase
the recv/snd buffers (via sd_notify()) with SO_RCVBUFFORCE/SO_SNDBUFFORCE will
fail, even if the requested size is lower than the limit enforced by the
kernel.
If apparmor is used, the DENIED logs for net_admin will show up. These log
entries are seen as red warning light, because they could indicate that a
program has been hacked and tries to compromise the system.
It would be nicer if they can be avoided without giving services (relying on
sd_notify) net_admin capability or dropping DENIED logs for all such services
via their apparmor profile.
I'm not sure if sd_notify really needs to forcibly increase the buffer sizes,
but at least if the requested size is below the kernel limit, the capability
(hence the log entries) should be avoided.
Hence let's first ask politely for increasing the buffers and only if it fails
then ignore the kernel limit if we have sufficient privileges.
According to RFC2616[1], HTTP header names are case-insensitive. So
it's totally valid to have a header starting with either `Date:` or
`date:`.
However, when systemd-importd pulls an image from an HTTP server, it
parses HTTP headers by comparing header names as-is, without any
conversion. That causes failures when some HTTP servers return headers
with different combinations of upper-/lower-cases.
An example:
https://alpha.release.flatcar-linux.net/amd64-usr/current/flatcar_developer_container.bin.bz2 returns `Etag: "pe89so9oir60"`,
while https://alpha.release.core-os.net/amd64-usr/current/coreos_developer_container.bin.bz2
returns `ETag: "f03372edea9a1e7232e282c346099857"`.
Since systemd-importd expects to see `ETag`, the etag for the Container Linux image
is correctly interpreted as a part of the hidden file name.
However, it cannot parse etag for Flatcar Linux, so the etag the Flatcar Linux image
is not appended to the hidden file name.
```
$ sudo ls -al /var/lib/machines/
-r--r--r-- 1 root root 3303014400 Aug 21 20:07 '.raw-https:\x2f\x2falpha\x2erelease\x2ecore-os\x2enet\x2famd64-usr\x2fcurrent\x2fcoreos_developer_container\x2ebin\x2ebz2.\x22f03372edea9a1e7232e282c346099857\x22.raw'
-r--r--r-- 1 root root 3303014400 Aug 17 06:15 '.raw-https:\x2f\x2falpha\x2erelease\x2eflatcar-linux\x2enet\x2famd64-usr\x2fcurrent\x2fflatcar_developer_container\x2ebin\x2ebz2.raw'
```
As a result, when the Flatcar image is removed and downloaded again,
systemd-importd is not able to determine if the file has been already
downloaded, so it always download it again. Then it fails to rename it
to an expected name, because there's already a hidden file.
To fix this issue, let's introduce a new helper function
`memory_startswith_no_case()`, which compares memory regions in a
case-insensitive way. Use this function in `curl_header_strdup()`.
See also https://github.com/kinvolk/kube-spawn/issues/304
[1]: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
This work add support to generic netlink to sd-netlink.
See https://lwn.net/Articles/208755/
networkd: add support FooOverUDP support to IPIP tunnel netdev
https://lwn.net/Articles/614348/
Example conf:
/lib/systemd/network/1-fou-tunnel.netdev
```
[NetDev]
Name=fou-tun
Kind=fou
[FooOverUDP]
Port=5555
Protocol=4
```
/lib/systemd/network/ipip-tunnel.netdev
```
[NetDev]
Name=ipip-tun
Kind=ipip
[Tunnel]
Independent=true
Local=10.65.208.212
Remote=10.65.208.211
FooOverUDP=true
FOUDestinationPort=5555
```
$ ip -d link show ipip-tun
```
5: ipip-tun@NONE: <POINTOPOINT,NOARP> mtu 1472 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ipip 10.65.208.212 peer 10.65.208.211 promiscuity 0
ipip remote 10.65.208.211 local 10.65.208.212 ttl inherit pmtudisc encap fou encap-sport auto encap-dport 5555 noencap-csum noencap-csum6 noencap-remcsum numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
```
Pretty much all intel cpus have had RDRAND in a long time. While
CPU-internal RNG are widely not trusted, for seeding hash tables it's
perfectly OK to use: we don't high quality entropy in that case, hence
let's use it.
This is only hooked up with 'high_quality_required' is false. If we
require high quality entropy the kernel is the only source we should
use.
Let's fold get_user_creds_clean() into get_user_creds(), and introduce a
flags argument for it to select "clean" behaviour. This flags parameter
also learns to other new flags:
- USER_CREDS_SYNTHESIZE_FALLBACK: in this mode the user records for
root/nobody are only synthesized as fallback. Normally, the synthesized
records take precedence over what is in the user database. With this
flag set this is reversed, and the user database takes precedence, and
the synthesized records are only used if they are missing there. This
flag should be set in cases where doing NSS is deemed safe, and where
there's interest in knowing the correct shell, for example if the
admin changed root's shell to zsh or suchlike.
- USER_CREDS_ALLOW_MISSING: if set, and a UID/GID is specified by
numeric value, and there's no user/group record for it accept it
anyway. This allows us to fix#9767
This then also ports all users to set the most appropriate flags.
Fixes: #9767
[zj: remove one isempty() call]
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.
I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
tmpfiles now passes an O_PATH fd to btrfs_subvol_make_fd() under the
assumption it will accept it like mkdirat() does. So far this assumption
was wrong, let's correct that.
Without that tmpfiles' on btrfs file systems failed systematically...
Looked for definitions of functions using the *_compare_func() suffix.
Tested:
- Unit tests passed (ninja -C build/ test)
- Installed this build and booted with it.
Macro returns -1, 0, 1 depending on whether a < b, a == b or a > b.
It's safe to use on unsigned types.
Add tests to confirm corner cases are properly covered.
Drop __extension__, since we don't use gcc -Wpedantic or -ansi.
Reformat code for spacing. Add spaces after commas almost everywhere.
Reindent code blocks in macro definitions, for consistency.
When clients don't follow protocol and use the same object from
different threads, then we previously would silently corrupt memory.
With this assert we'll fail with an assert(). This doesn't fix anything
but certainly makes mis-uses easier to detect and debug.
Triggered by https://bugzilla.redhat.com/show_bug.cgi?id=1609349
These take a struct iovec to send data together with the passed FD.
The receive function returns the FD through an output argument. In case data is
received, but no FD is passed, the receive function will set the output
argument to -1 explicitly.
Update code in dynamic-user to use the new helpers.
This flag mimics what "O_NOFOLLOW|O_PATH" does for open(2) that is
chase_symlinks() will not resolve the final pathname component if it's a
symlink and instead will return a file descriptor referring to the symlink
itself.
Note: if CHASE_SAFE is also passed, no safety checking is performed on the
transition done if the symlink would have been followed.
machined exposes the pseudo-container ".host" as a reference to the host
system, and this means "machinectl login .host" and "machinectl shell
.host" get your a login/shell on the host. systemd-run currently doesn't
allow that. Let's fix that, and make sd-bus understand ".host" as an
alias for connecting to the host system.
If 'v' is negative, it's wrong to add the decimal to it, as we'd
actually need to subtract it in this case. But given that we don't want
to allow negative vaues anyway, simply check earlier whether what we
have parsed so far was negative, and react to that before adding the
decimal to it.
We often open the parent directory of a path. Let's add a common helper
for that, that shortens our code a bit and adds some extra safety
checks, for example it will fail if used on the root directory (which
doesn't really have a parent).
The helper is actually generalized from a function in btrfs-util.[ch]
which already existed for this purpose.
Remove "arbitrary named hierarchies" from the list of things that
cg_kernel_controllers() might return, and clarify that "name="
pseudo-controllers are not included in the returned list.
/proc/cgroups does not contain "name=" pseudo-controllers, and
cg_kernel_controllers() makes no effort to enumerate them via a different
mechanism.
These custom macros make the expression go through a function, in order
to prevent ASSERT_SIDE_EFFECT false positives on our macros such as
assert_se() and assert_return() that cannot be disabled and will always
evaluate their expressions.
This technique has been described and recommended in:
https://community.synopsys.com/s/question/0D534000046Yuzb/suppressing-assertsideeffect-for-functions-that-allow-for-sideeffects
Tested by doing a local cov-build and uploading the resulting tarball to
scan.coverity.com, confirmed that the ASSERT_SIDE_EFFECT false positives
were gone.
key_serial_t is defined in keyutil.h, which wasn't included in the header list
in the test, so the test always failed. We were always compiling stuff with
!HAVE_KEY_SERIAL_T.
We could try to add keyutil.h to the test, but then we'd have to first check if
it is available, which just doesn't seem worth the trouble.
key_serial_t should always be defined as int32_t. Let's keep the uncoditional
define, since repeated compatible typedefs are not a problem, and it allows us
to compile even if the header file is missing. If there's ever a change in the
definition, we'll have to adjust the code for the different type anyway, and
our compiler will tell us.
Using _GNU_SOURCE is better because that's how we include the headers in the
actual build, and some headers define different stuff when it is defined.
sys/stat.h for example defines 'struct statx' conditionally.
Unfortunately this needs libshared to link to libkmod. Before it was linked
into systemd-udevd, udevadm, and systemd each seperately. On most systems this
doesn't make much difference, because at least systemd would be installed, but
it might not be in small chroots. It is a small library, so I hope this is not
a big issue.
Starting with glibc 2.27.9000-36.fc29, include file sys/stat.h will have a
definition for struct statx, in which case include file linux/stat.h should be
avoided, in order to prevent a duplicate definition.
In file included from ../src/basic/missing.h:18,
from ../src/basic/util.h:28,
from ../src/basic/hashmap.h:10,
from ../src/shared/bus-util.h:12,
from ../src/libsystemd/sd-bus/bus-creds.c:11:
/usr/include/linux/stat.h:99:8: error: redefinition of ‘struct statx’
struct statx {
^~~~~
In file included from /usr/include/sys/stat.h:446,
from ../src/basic/util.h:19,
from ../src/basic/hashmap.h:10,
from ../src/shared/bus-util.h:12,
from ../src/libsystemd/sd-bus/bus-creds.c:11:
/usr/include/bits/statx.h:36:8: note: originally defined here
struct statx
^~~~~
Extend our meson.build to look for struct statx when only sys/stat.h is
included and, in that case, do not include linux/stat.h anymore.
Tested that systemd builds correctly when using a glibc version that includes a
definition for struct statx.
glibc Fedora RPM update:
28cb5d31fc
glibc upstream commit:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=fd70af45528d59a00eb3190ef6706cb299488fcd