Also, even if login.defs is not present, don't start allocating at 1, but at
SYSTEM_UID_MIN.
Fixes #9769.
The test is adjusted. Actually, it was busted before, because sysusers would
never use SYSTEM_GID_MIN, so if SYSTEM_GID_MIN was different than
SYSTEM_UID_MIN, the tests would fail. On all "normal" systems the two are
equal, so we didn't notice. Since sysusers now always uses the minimum of the
two, we only need to substitute one value.
We don't (and shouldn't I think) look at them when determining the type of the
user, but they should be used during user/group allocation. (For example, an
admin may specify SYS_UID_MIN==200 to allow statically numbered users that are
shared with other systems in the range 1–199.)
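The lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the sysusers code: parse_login_defs() and allocation_floor() are hypothetical names, and the compiled-in defaults are passed in as plain arguments.

```python
import re

def parse_login_defs(text):
    """Collect the numeric settings from login.defs-style text."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        match = re.match(r"^(\w+)\s+(\d+)$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values

def allocation_floor(defs, compiled_uid_min, compiled_gid_min):
    """Lowest id considered for allocation.

    Falls back to the compiled-in values (hypothetical parameters here)
    when login.defs does not set SYS_UID_MIN / SYS_GID_MIN, and uses the
    minimum of the two, as the message above describes.
    """
    uid_min = defs.get("SYS_UID_MIN", compiled_uid_min)
    gid_min = defs.get("SYS_GID_MIN", compiled_gid_min)
    return min(uid_min, gid_min)
```

With "SYS_UID_MIN 200" in login.defs and no SYS_GID_MIN below it, the floor is 200, which leaves the range 1–199 to statically numbered users.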
It makes little sense to make the boundary between system and user UIDs/GIDs
configurable. Nevertheless, a completely fixed compile-time define is not
enough in two scenarios:
- the system_uid_max boundary has moved over time. The default used to be
500 for a long time. Systems which are upgraded over time might have users
in the wrong range, but changing existing systems is complicated and
expensive (offline disks, backups, remote systems, read-only media, etc.)
- systems are used in a heterogeneous environment, where some vendors pick
one value and others another.
So let's make this boundary overridable using /etc/login.defs.
Fixes #3855, #10184.
Fixes #17035. We use "," as the separator between arguments in fstab and crypttab
options field, but the kernel started using "," within arguments. Users will need
to escape those nested commas.
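A rough sketch of such splitting, in Python for illustration only. The message does not say which escape syntax is used; a backslash before the comma is assumed here, and split_options() is a hypothetical helper, not the actual parser.

```python
def split_options(options):
    """Split an options field on commas, keeping backslash-escaped commas.

    Assumption: "\\," stands for a literal comma inside one argument.
    """
    parts, current = [], []
    i = 0
    while i < len(options):
        ch = options[i]
        if ch == "\\" and i + 1 < len(options) and options[i + 1] == ",":
            current.append(",")          # escaped comma stays in the argument
            i += 2
        elif ch == ",":
            parts.append("".join(current))   # unescaped comma ends the argument
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts
```

For example, an option string written as ro,context=system_u:object_r:tmp_t:s0:c1\,c2,noexec would be split into three arguments, with the middle one keeping its inner comma.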
We had a special test case that the second semicolon would be interpreted
as an executable name. We would then try to find the executable and rely
on ";" not being found to cause ENOEXEC to be returned. I think that's just
crazy. Let's treat the second semicolon as a separator and ignore the
whole thing as we would whitespace.
test-execute is quite long and even with the test name it takes a moment
to find the relevant spot when something fails. Let's make things easier
by printing the exact location.
Define explicit action "kill" for SystemCallErrorNumber=.
In addition to errno code, allow specifying "kill" as action for
SystemCallFilter=.
---
v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes,
init syscall_errno
v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit
parsing without seccomp
v4: fix build without seccomp
v3: drop log action
v2: action -> number
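The parsing rule can be sketched like this. It is a Python illustration, not the C helper named in the changelog: the sentinel value and the numeric upper bound are assumptions made for the example.

```python
import errno

ACTION_KILL = -1      # hypothetical sentinel for the "kill" action
ERRNO_MAX = 4095      # assumed upper bound for numeric errno values

# Map every errno name known to this platform (aliases included) to its number.
_NAME_TO_ERRNO = {
    name: getattr(errno, name)
    for name in dir(errno)
    if name.startswith("E") and isinstance(getattr(errno, name), int)
}

def parse_errno_or_action(value):
    """Accept "kill", an errno name such as "EPERM", or a positive number."""
    if value == "kill":
        return ACTION_KILL
    if value.isascii() and value.isdigit():
        number = int(value)
        if 1 <= number <= ERRNO_MAX:
            return number
        raise ValueError(f"errno out of range: {value}")
    if value in _NAME_TO_ERRNO:
        return _NAME_TO_ERRNO[value]
    raise ValueError(f"not an errno or action: {value}")

def errno_or_action_to_string(number):
    """Inverse of the parser; unknown numbers are printed numerically."""
    if number == ACTION_KILL:
        return "kill"
    return errno.errorcode.get(number, str(number))
```

Keeping the action in the same integer as the errno means callers only need to carry one value around, at the cost of reserving a value that can never be a valid errno.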
On centos7 ci:
--- test-libcrypt-util begin ---
Found container virtualization none.
/* test_hash_password */
ew3bU1.hoKk4o: yes
$1$gc5rWpTB$wK1aul1PyBn9AX1z93stk1: no
$2b$12$BlqcGkB/7BFvNMXKGxDea.5/8D6FTny.cbNcHW/tqcrcyo6ZJd8u2: no
$5$lGhDrcrao9zb5oIK$05KlOVG3ocknx/ThreqXE/gk.XzFFBMTksc4t2CPDUD: no
$6$c7wB/3GiRk0VHf7e$zXJ7hN0aLZapE.iO4mn/oHu6.prsXTUG/5k1AxpgR85ELolyAcaIGRgzfwJs3isTChMDBjnthZyaMCfCNxo9I.: no
$y$j9T$$9cKOWsAm4m97WiYk61lPPibZpy3oaGPIbsL4koRe/XD: no
This lets the libc/xcrypt allocate as much storage area as it needs.
Should fix #16965:
testsuite-46.sh[74]: ==74==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f3e972e1080 at pc 0x7f3e9be8deed bp 0x7ffce4f28530 sp 0x7ffce4f27ce0
testsuite-46.sh[74]: WRITE of size 131232 at 0x7f3e972e1080 thread T0
testsuite-46.sh[74]: #0 0x7f3e9be8deec (/usr/lib/clang/10.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x9feec)
testsuite-46.sh[74]: #1 0x559cd05a6412 in user_record_make_hashed_password /systemd-meson-build/../build/src/home/user-record-util.c:818:21
testsuite-46.sh[74]: #2 0x559cd058fb03 in create_home /systemd-meson-build/../build/src/home/homectl.c:1112:29
testsuite-46.sh[74]: #3 0x7f3e9b5b3058 in dispatch_verb /systemd-meson-build/../build/src/shared/verbs.c:103:24
testsuite-46.sh[74]: #4 0x559cd058c101 in run /systemd-meson-build/../build/src/home/homectl.c:3325:16
testsuite-46.sh[74]: #5 0x559cd058c00a in main /systemd-meson-build/../build/src/home/homectl.c:3328:1
testsuite-46.sh[74]: #6 0x7f3e9a88b151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
testsuite-46.sh[74]: #7 0x559cd0583e7d in _start (/usr/bin/homectl+0x24e7d)
testsuite-46.sh[74]: Address 0x7f3e972e1080 is located in stack of thread T0 at offset 32896 in frame
testsuite-46.sh[74]: #0 0x559cd05a60df in user_record_make_hashed_password /systemd-meson-build/../build/src/home/user-record-util.c:789
testsuite-46.sh[74]: This frame has 6 object(s):
testsuite-46.sh[74]: [32, 40) 'priv' (line 790)
testsuite-46.sh[74]: [64, 72) 'np' (line 791)
testsuite-46.sh[74]: [96, 104) 'salt' (line 809)
testsuite-46.sh[74]: [128, 32896) 'cd' (line 810)
testsuite-46.sh[74]: [33152, 33168) '.compoundliteral' <== Memory access at offset 32896 partially underflows this variable
testsuite-46.sh[74]: [33184, 33192) 'new_array' (line 832) <== Memory access at offset 32896 partially underflows this variable
testsuite-46.sh[74]: HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
testsuite-46.sh[74]: (longjmp and C++ exceptions *are* supported)
testsuite-46.sh[74]: SUMMARY: AddressSanitizer: stack-buffer-overflow (/usr/lib/clang/10.0.1/lib/linux/libclang_rt.asan-x86_64.so+0x9feec)
It seems 'struct crypt_data' is 32896 bytes, but libclang_rt wants more, at least 33168?
Two functional changes:
- "/" is now refused. The test is adjusted.
- The trailing NUL is *not* included in the returned size for abstract sockets. The
comments in sockaddr_un_set_path() indicate that this is the right thing to do,
and the code in socket_address_parse() wasn't doing that.
Closes #12624.
The formatting in systemd.socket.xml is updated a bit.
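To illustrate the two size rules, here is a small Python sketch. It is not the C code: the "@" prefix for abstract names, the offset of 2 and the 108-byte path limit are assumptions about the usual Linux layout, and sockaddr_un_size() is a hypothetical name.

```python
SUN_PATH_OFFSET = 2    # assumed offsetof(struct sockaddr_un, sun_path) on Linux
SUN_PATH_MAX = 108     # assumed sizeof(sun_path) on Linux

def sockaddr_un_size(path):
    """Address size for an AF_UNIX path, following the two changes above."""
    raw = path.encode()
    if path == "/":
        raise ValueError('"/" is refused')
    if path.startswith("@"):
        # Abstract socket: "@" becomes the leading NUL, and no trailing
        # NUL is counted in the returned size.
        if len(raw) > SUN_PATH_MAX:
            raise ValueError("name too long")
        return SUN_PATH_OFFSET + len(raw)
    if not path.startswith("/"):
        raise ValueError("not an absolute path or abstract name")
    # File system socket: the trailing NUL is part of the size.
    if len(raw) + 1 > SUN_PATH_MAX:
        raise ValueError("path too long")
    return SUN_PATH_OFFSET + len(raw) + 1
```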
Currently in_addr_port_ifindex_name_to_string() always prints the ifindex
numerically. This is not super useful since the interface numbers are
semi-random. Should we use interface names in preference?
We would set .type to a fake value. All real callers (outside of tests)
immediately overwrite .type with a proper value after calling
socket_address_parse(). So let's not set it and adjust the few places
that relied on it being set to the fake value.
socket_address_parse() is modernized to only set the output argument on
success.
We would use the return value from the tested function to decide
what to print as "expected", which is confusing when something is wrong
with the tested function.
This is useful for duplicating trees that contain hardlinks: we keep
track of potential hardlinks and try to reproduce them within the
destination tree. (We do not hardlink between source and destination!).
This is useful for trees like ostree images which heavily use hardlinks
and which are otherwise exploded into separate copies of all files when
we duplicate the trees.
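The bookkeeping can be sketched as below. This is a minimal Python illustration of the idea rather than the actual copy code: copy_tree_keep_hardlinks() is a hypothetical name, and symlinks to directories are not handled.

```python
import os
import shutil

def copy_tree_keep_hardlinks(src, dst):
    """Copy src to dst, recreating hardlinks inside the destination tree."""
    seen = {}   # (st_dev, st_ino) of a source file -> first copy in dst
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            st = os.lstat(source)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # Seen this inode before: link to our own earlier copy,
                # never to the source tree.
                os.link(seen[key], target)
                continue
            shutil.copy2(source, target, follow_symlinks=False)
            if st.st_nlink > 1:
                seen[key] = target   # potential hardlink, remember it
```

Only files with a link count above one are tracked, so the table stays small for trees that contain few hardlinks.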
Behaviour is not identical, as shown by the tests in test-strv.
The combination of EXTRACT_UNQUOTE without EXTRACT_RELAX only appears in
the test, so it doesn't seem particularly important. OTOH, the difference
in handling of squished parameters could make a difference. New behaviour
is what both bash and python do, so I think we can ignore this corner case.
This change has the following advantages:
- the duplication of code paths that do a very similar thing is removed
- extract_one_word() / strv_split_extract() return a proper error code.
This is useful information, I don't know why we forgot to add it there.
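For comparison, this is how Python's shlex module treats such input. The sketch assumes that "squished parameters" means quoted and unquoted segments written without whitespace between them; split_like_shell() is just a thin hypothetical wrapper.

```python
import shlex

def split_like_shell(line):
    """Split a line the way shlex does in POSIX mode.

    Adjacent quoted and unquoted segments are joined into one word,
    and an unterminated quote raises ValueError instead of being
    silently accepted.
    """
    return shlex.split(line)

print(split_like_shell('a"b c"d e'))    # ['ab cd', 'e']
```

An input such as 'a "unterminated' raises ValueError here, which is the Python counterpart of returning a proper error code from the split function.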
gcc doesn't like arithmetic on a pointer to a function or void*, so don't
print signedness info there. It doesn't matter anyway.
C says function pointers can be different... Though I guess our code isn't
prepared for that.
If the directory (/var/lib/private is most likely) has borked permissions, the
test will fail with a cryptic message and EXIT_STATE_DIRECTORY or similar. The
message from the child with more details gets lost somewhere. Let's avoid running
the test in that case and provide a simple error message instead.
E.g. systemd-238-12.git07f8cd5.fc28.ppc64 (which I encountered on a test machine)
has /var/lib/private with 0755.
I had to move STRV_MAKE to macro.h. There is a circular dependency between
extract-word.h, strv.h, and string-util.h that makes it hard to define the
inline function otherwise.
This is cleaner that way given that we create our own half-virtualized
device tree, and really shouldn't pull selinux labelling and access
control into that, we can only lose, in particular as our overmounted
/sys/ actually lacks /sys/fs/selinux.
(This fixes udev test woes introduced by #16821 where suddenly the test
would fail because libselinux assumed selinux was on, but selinuxfs
wasn't actually available)
JSON strings must be utf-8-clean. We also verify this in json_parse_string()
so we would reject a message with invalid utf-8 anyway.
It would probably be slightly cheaper to detect non-conforming strings in
serialization, but then we'd have to fail serialization. By doing this early,
we give the caller a chance to handle the error nicely.
The test is adjusted to contain a valid utf-8 string after decoding of the
utf-32 encoding in json ("विवेकख्यातिरविप्लवा हानोपायः।", something about the
cessation of ignorance).
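The "check early" approach can be sketched as follows, in Python and with hypothetical names (json_string_new() and serialize() are not the library's API). Validation happens when the string value is created, so serialization itself has no failure path for bad input.

```python
import json

def json_string_new(raw):
    """Create a JSON string value from bytes, rejecting invalid UTF-8 early."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as error:
        # The caller learns about the problem here and can handle it,
        # long before anything is serialized.
        raise ValueError("string is not valid UTF-8") from error

def serialize(value):
    """Serialization can assume that every string was already validated."""
    return json.dumps(value, ensure_ascii=False)
```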
There is little point in #defining and #undefining CAP_LAST_CAP multiple times.
The check is only done in developer mode. After all, it's not an error to
compile on a newer kernel, and we shouldn't even warn in that case.
I find this version much more readable.
Add replacement defines so that when acl/libacl.h is not available, the
ACL_{READ,WRITE,EXECUTE} constants are also defined. Those constants were
declared in the kernel headers already in 1da177e4c3f41524e886b7f1b8a0c1f,
so they should be the same pretty much everywhere.
Ideally we would like to hide all other services' credentials for all
services. That would imply for us to enable mount namespacing for all
services, which is something we cannot do, both due to compatibility
with the status quo ante, and because a number of services legitimately
should be able to install mounts in the host hierarchy.
Hence we do the second best thing, we hide the credentials automatically
for all services that opt into mount namespacing otherwise. This is
quite different from other mount sandboxing options: usually you have to
explicitly opt into each. However, given that the credentials logic is a
brand new concept we invented right here and now, and particularly
security sensitive, it's OK to reverse this, and by default hide
credentials whenever we can (i.e. whenever mount namespacing is
otherwise opted into).
Long story short: if you want to hide other services' credentials, the
most basic option is to just turn on PrivateMounts= and there you go,
they should all be gone.
When removing a directory tree as unprivileged user we might encounter
files owned by us but not deletable since the containing directory might
have the "r" bit missing in its access mode. Let's try to deal with
this: optionally if we get EACCES try to set the bit and see if it works
then.
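A minimal sketch of the retry logic, in Python. It is not the rm_rf() implementation: remove_tree() and its fix_access flag are hypothetical, and the sketch simply adds all owner permission bits to the containing directory before retrying.

```python
import errno
import os
import stat

def _retry(operation, directory, fix_access):
    """Run operation; on EACCES optionally fix the directory mode and retry."""
    try:
        return operation()
    except OSError as error:
        if error.errno != errno.EACCES or not fix_access:
            raise
        mode = stat.S_IMODE(os.lstat(directory).st_mode)
        os.chmod(directory, mode | stat.S_IRWXU)   # assumption: add owner rwx
        return operation()

def _remove_children(path, fix_access):
    for name in _retry(lambda: os.listdir(path), path, fix_access):
        child = os.path.join(path, name)
        st = _retry(lambda c=child: os.lstat(c), path, fix_access)
        if stat.S_ISDIR(st.st_mode):
            _remove_children(child, fix_access)
            _retry(lambda c=child: os.rmdir(c), path, fix_access)
        else:
            _retry(lambda c=child: os.unlink(c), path, fix_access)

def remove_tree(path, fix_access=False):
    """Remove path recursively; fix_access enables the chmod fallback."""
    _remove_children(path, fix_access)
    os.rmdir(path)
```

The mode is only touched after a failure, so trees with sane permissions are removed without any extra chmod calls.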
Kernel 5.8 gained a hidepid= implementation that is truly per procfs,
which allows us to mount a distinct one into every unit, with
individual hidepid= settings. Let's expose this via two new settings:
ProtectProc= (wrapping hidepid=) and ProcSubset= (wrapping subset=).
Replaces: #11670
It's not entirely clear what shall be passed via parameter and what via
struct, but these two definitely fit well with the other protect_xyz
fields, hence let's move them over.
We probably should move a lot more fields into the structure
actually (most? all even?).
This patch adds seccomp support to the riscv64 architecture. seccomp
support is available in the riscv64 kernel since version 5.5, and it
has just been added to the libseccomp library.
riscv64 uses generic syscalls like aarch64, so I used that architecture
as a reference to find which code has to be modified.
With this patch, the testsuite passes successfully, including the
test-seccomp test. The system boots and works fine with kernel 5.4 (i.e.
without seccomp support) and kernel 5.5 (i.e. with seccomp support). I
have also verified that the "SystemCallFilter=~socket" option prevents a
service from using the ping utility when running on kernel 5.5.
For some reason this failed in koji build on s390x:
--- command ---
16:12:46 PATH='/builddir/build/BUILD/systemd-stable-246.1/s390x-redhat-linux-gnu:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin' SYSTEMD_LANGUAGE_FALLBACK_MAP='/builddir/build/BUILD/systemd-stable-246.1/src/locale/language-fallback-map' SYSTEMD_KBD_MODEL_MAP='/builddir/build/BUILD/systemd-stable-246.1/src/locale/kbd-model-map' /builddir/build/BUILD/systemd-stable-246.1/s390x-redhat-linux-gnu/test-acl-util
--- stdout ---
-rw-r-----. 1 mockbuild mock 0 Aug 7 16:12 /tmp/test-empty.7RzmEc
other::---
--- stderr ---
Assertion 'r >= 0' failed at src/test/test-acl-util.c:42, function test_add_acls_for_user(). Aborting.
The concept is flawed, and mostly useless. Let's finally remove it.
It has been deprecated since 90a2ec10f2 (6 years ago) and we started to
warn since 55dadc5c57 (1.5 years ago).
Let's get rid of it altogether.
Let's make /run/host the sole place we pass stuff from host to container
in and place the "inaccessible" nodes in /run/host too.
In contrast to the previous two commits this is a minor compat break, but
not a relevant one I think. Previously the container manager would place
these nodes in /run/systemd/inaccessible/ and that's where PID 1 in the
container would try to add them too when missing. Container manager and
PID 1 in the container would thus manage the same dir together.
With this change the container manager now passes an immutable directory
to the container and leaves /run/systemd entirely untouched, and managed
exclusively by PID 1 inside the container, which gives a clear
separation of who manages what.
In order to make sure systemd then uses the /run/host/inaccessible/
nodes this commit changes PID 1 to look for that dir and, if it exists,
symlink it to /run/systemd/inaccessible.
Now, this will work fine if new nspawn and new PID 1 in the container
work together, as then the symlink is created and the difference between
the two dirs won't matter.
For the case where an old nspawn invokes a new PID 1: in this case
things work as they always worked: the dir is managed together.
For the case where a different container manager invokes a new PID 1: in
this case the nodes aren't typically passed in, and PID 1 in the
container will try to create them and will likely fail partially (though
gracefully) when trying to create char/block device nodes. This is fine
though as there are fallbacks in place for that case.
For the case where a new nspawn invokes an old PID 1: this is where the
(minor) incompatibility happens: in this case new nspawn will place the
nodes in the /run/host/inaccessible/ subdir, but the PID 1 in the
container won't look for them there. Since the nodes are also not
pre-created in /run/systemd/inaccessible/ PID 1 will try to create them
there as if a different container manager sets them up. This is of
course not sexy, but is not a total loss, since as mentioned fallbacks
are in place anyway. Hence I think it's OK to accept this minor
incompatibility.
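The decision PID 1 makes can be sketched roughly like this. It is a Python illustration with hypothetical names; the two directories are passed in as parameters so the logic can be tried outside of /run.

```python
import os

def setup_inaccessible(host_dir, systemd_dir):
    """Decide where the "inaccessible" nodes come from.

    host_dir stands for /run/host/inaccessible and systemd_dir for
    /run/systemd/inaccessible. Returns "symlink" if the nodes passed in
    by the container manager are used, "create" if PID 1 has to set up
    the directory itself.
    """
    if os.path.isdir(host_dir):
        if not os.path.lexists(systemd_dir):
            os.symlink(host_dir, systemd_dir)
        return "symlink"
    # Nothing was passed in: manage the directory ourselves. Creating
    # the device nodes may partially fail here; fallbacks cover that.
    os.makedirs(systemd_dir, exist_ok=True)
    return "create"
```

The first branch is the new-nspawn-with-new-PID-1 case, the second covers the cases where nothing is found under /run/host.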
The usage in unit_get_own_mask is redundant, we only need to apply
disable_mask at the end before application, i.e. calculating enable or
target mask.
(IOW, we allow all configurations, but disabling affects effective
controls.)
Modify tests accordingly and add testing of enable mask.
This is intended as cleanup, with no effect but changing unit_dump
output.
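The idea can be shown with plain bit masks. This is a simplified Python sketch with hypothetical controller bits and function names, not the actual cgroup code: the own mask reports what is configured, and the disable mask is applied only when the effective mask is computed.

```python
CPU, IO, MEMORY, PIDS = 1, 2, 4, 8    # hypothetical controller bits

def own_mask(configured):
    """What the unit itself asks for; disabling is not applied here."""
    return configured

def effective_mask(requested, disable_mask):
    """Apply the disable mask once, at the end, before application."""
    return requested & ~disable_mask
```

A unit configured with CPU and MEMORY keeps both bits in its own mask even when MEMORY is disabled; only the effective mask drops it.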
Unprivileged test-fs-util fails on my system since /sys/dev/block is
inaccessible for unprivileged users, so let's skip encrypted path test if we
get EACCES or similar.
Follows the same pattern and features as RootImage, but allows an
arbitrary mount point under / to be specified by the user, and
multiple values - like BindPaths.
Original implementation by @topimiettinen at:
https://github.com/systemd/systemd/pull/14451
Reworked to use dissect's logic instead of bare libmount() calls
and other review comments.
Thanks Topi for the initial work to come up with and implement
this useful feature.