This reworks the user validation infrastructure. There are now two
modes. In regular mode we are strict and test against a strict set of
valid chars. And in "relaxed" mode we just filter out some really
obvious, dangerous stuff. i.e. strict is whitelisting what is OK, but
"relaxed" is blacklisting what is really not OK.
The idea is that we use strict mode whenver we allocate a new user
(i.e. in sysusers.d or homed), while "relaxed" mode is when we process
users registered elsewhere, (i.e. userdb, logind, …)
The requirements on user name validity vary wildly. SSSD thinks its fine
to embedd "@" for example, while the suggested NAME_REGEX field on
Debian does not even allow uppercase chars…
This effectively liberaralizes a lot what we expect from usernames.
The code that warns about questionnable user names is now optional and
only used at places such as unit file parsing, so that it doesn't show
up on every userdb query, but only when processing configuration files
that know better.
Fixes: #15149#15090
split() and FOREACH_WORD really should die, and everything be moved to
extract_first_word() and friends, but let's at least make sure that for
the remaining code using it we can't deadlock by not progressing in the
word iteration.
Fixes: #15305
The kernel does not sanitize /proc/cmdline. E.g. when running under qemu, it is
easy to pass a string with newline by mistake. We use read_one_line_file(), so
we would read only the first list of the file, and
write_string_file(WRITE_STRING_FILE_VERIFY_ON_FAILURE) would fail because the
target file is obviously different. Change to a kernel-generated file to avoid
the issue.
v2:
- use /proc/version instead of /proc/uptime for attempted writes, so the test
test passes even if test_write_string_file_verify() takes more than 10 ms ;]
I think the two names were both pretty bad. They did not give a proper hint
what the difference between the two functions is, and sd_path_home sounds like
it is somehow related to /home or home directories or whatever, when in fact
both functions return the same set of paths as either a colon-delimited string
or a strv. "_strv" suffix is used by various functions in sd-bus, so let's
reuse that.
Those functions are not public yet, so let's rename.
In 1a29610f5f the change inadvertedly
disabled names with digit as the first character. This follow-up change
allows a digit as the first character in compat mode.
Fixes: #15141
It returns 32 bits, unsigned on amd64, so it's probably similar everywhere
with glibc. But let's make the code generic, without assuming specific size
or signedness.
The man pages state that the '+' prefix in Exec* directives should
ignore filesystem namespacing options such as PrivateTmp. Now it does.
This is very similar to #8842, just with PrivateTmp instead of
PrivateDevices.
Fix the following issues in test-cgroup:
1, commit 65be7e0652 ("pid1: do not reset subtree_control on
already-existing units with delegation") changed the return value
of cg_create () as follows:
"Returns 0 if the group already existed, 1 on success, negative
otherwise."
So we need to modify the test cases related to cg_create ().
2, commit efdb02375b ("core: unified cgroup hierarchy support")
changed cg_delete () to cg_rmdir (), so the test cases also need
to be adjusted a bit.
3. There is no cleanup of "test-a". If we execute test-cgroup
multiple times, we will encounter an error.
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
For #8495: it is arguably useful to not show the length of the password
in public spaces. It is possible to press TAB or BS to cancel the asterisks,
but this is not very discoverable. Let's make it discoverable by showing
a message (in gray). The message is "erased" after the first character
is entered.
test-ask-password-api would crash if ^D was pressed.
If think the callers generally expect a non-empty strv as reply. Let's
return an error if we have nothing to return.
Also modernize test-ask-password-api a bit.
We use those strings as hash keys. While writing "a...b" looks strange,
"a///b" does not look so strange. Both syntaxes would actually result in the
value being correctly written to the file, but they would confuse our
de-deplication over keys. So let's normalize. Output also becomes nicer.
Add test.
This way we can use libxcrypt specific functionality such as
crypt_gensalt() and thus take benefit of the newer algorithms libxcrypt
implements. (Also adds support for a new env var $SYSTEMD_CRYPT_PREFIX
which may be used to select the hash algorithm to use for libxcrypt.)
Also, let's move the weird crypt.h inclusion into libcrypt.h so that
there's a single place for it.
We'd just print nothing and exit with 0. If the user gave an explicit
name, we should fail. If a pattern didn't match, we should at least warn.
$ networkctl status enx54ee75cb1dc0a* --no-pager && echo $?
No interfaces matched.
0
$ networkctl status enx54ee75cb1dc0a --no-pager
Interface "enx54ee75cb1dc0a" not found.
1
In subsequent commits, calls to if_nametoindex() will be replaced by a wrapper
that falls back to alternative name resolution over netlink. netlink support
requires libsystemd (for sd-netlink), and we don't want to add any functions
that require netlink in basic/. So stuff that calls if_nametoindex() for user
supplied interface names, and everything that depends on that, needs to be
moved.
If we have a template unit template@.service, it should be allowed to specify a
dependency on a unit without an instance, bar@.service. When the unit is created,
the instance will be propagated into the target, so template@inst.service will
depend on bar@inst.service.
This commit changes unit_dependency_name_compatible(), which makes the manager
accept links like that, and unit_file_verify_alias(), so that the installation
function will agree to create a symlink like that, and finally the tests are
adjusted to pass.
This mostly reuses existing checkers used by pid1, so handling of aliases
should be consistent. Hopefully, with the test it'll be clearer what it
happening.
Support for .wants/.requires "aliases" is restored. Those are still used in the
wild quite a bit, so we need to support them.
See https://github.com/systemd/systemd/pull/13119 for a discussion of aliases
with an instance that point to a different template: this is allowed.
Let's add the actual unicode names of the glyphs we use. Let's also add
in comments what the width expectations of these glyphs are on the
console.
Also, remove the "mdash" definition. First of all it wasn't used, but
what's worse the glyph encoded was actually an "ndash"...
Fixes: #14075
This has been requested many times before. Let's add it finally.
GPT auto-discovery for /var is a bit more complex than for other
partition types: the other partitions can to some degree be shared
between multiple OS installations on the same disk (think: swap, /home,
/srv). However, /var is inherently something bound to an installation,
i.e. specific to its identity, or actually *is* its identity, and hence
something that cannot be shared.
To deal with this this new code is particularly careful when it comes to
/var: it will not mount things blindly, but insist that the UUID of the
partition matches a hashed version of the machine-id of the
installation, so that each installation has a very specific /var
associated with it, and would never use any other. (We actually use
HMAC-SHA256 on the GPT partition type for /var, keyed by the machine-id,
since machine-id is something we want to keep somewhat private).
Setting the right UUID for installations takes extra care. To make
things a bit simpler to set up, we avoid this safety check for nspawn
and RootImage= in unit files, under the assumption that such container
and service images unlikely will have multiple installations on them.
The check is hence only required when booting full machines, i.e. in
in systemd-gpt-auto-generator.
To help with putting together images for full machines, PR #14368
introduces a repartition tool that can automatically fill in correctly
calculated UUIDs on first boot if images have the var partition UUID
initialized to all zeroes. With that in place systems can be put
together in a way that on first boot the machine ID is determined and
the partition table automatically adjusted to have the /var partition
with the right UUID.
To support ProtectHome=y in a user namespace (which mounts the inaccessible
nodes), the nodes need to be accessible by the user. Create these paths and
devices in the user runtime directory so they can be used later if needed.
Like with shmat already the actual results of the test
test_memory_deny_write_execute_mmap depend on kernel/libseccomp/glibc
of the platform it is running on.
There are known-good platforms, but on the others do not assert success
(which implies test has actually failed as no seccomp blocking was achieved),
but instead make the check dependent to the success of the mmap call
on that platforms.
Finally the assert of the munmap on that valid pointer should return ==0,
so that is what the check should be for in case of p != MAP_FAILED.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Let's add a concept of normalization: as preparation for signing json
records let's add a mechanism to bring JSON records into a well-defined
order so that we can safely validate JSON records.
This adds two booleans to each JsonVariant object: "sorted" and
"normalized". The latter indicates whether a variant is fully sorted
(i.e. all keys of objects listed in alphabetical order) recursively down
the tree. The former is a weaker property: it only checks whether the
keys of the object itself are sorted. All variants which are
"normalized" are also "sorted", but not vice versa.
The knowledge of the "sorted" property is then used to optimize
searching for keys in the variant by using bisection.
Both properties are determined at the moment the variants are allocated.
Since our objects are immutable this is safe.
This will call json_variant_sensitive() internally while parsing for
each allocated sub-variant. This is better than calling it a posteriori
at the end, because partially parsed variants will always be properly
erased from memory this way.
Change test_set_ambient_caps() to test_apply_ambient_caps(), since the
function capability_ambient_set_apply() not only sets ambient
capabilities, but clears inherited capabilities that are not explicitly
requested by the caller.
The conf-parser machinery already removed whitespace before and after "=", no
need to repeat this step.
The test is adjusted to pass. It was testing an code path that doesn't happen
normally, no point in doing that.
EOF is defined to -1, hence on platforms that have "char" unsigned we
can't compare it as-is, except if we accept an implicit cast. let's make
it an explicit cast, acknowledging the issue.
Fixes: #14118
If we ignore any uknown section, we will not be able to show any
warning if a typo in a section name is made. Let's reverse our
approach, and explicitly list sections to ignore instead.
I opted to make use the same section list for this, instead of adding a second
list, because this list is passed through to many functions and adding yet
another parameter to the long signature would be very noisy.
TasksMax= and DefaultTasksMax= can be specified as percentages. We don't
actually document of what the percentage is relative to, but the implementation
uses the smallest of /proc/sys/kernel/pid_max, /proc/sys/kernel/threads-max,
and /sys/fs/cgroup/pids.max (when present). When the value is a percentage,
we immediately convert it to an absolute value. If the limit later changes
(which can happen e.g. when systemd-sysctl runs), the absolute value becomes
outdated.
So let's store either the percentage or absolute value, whatever was specified,
and only convert to an absolute value when the value is used. For example, when
starting a unit, the absolute value will be calculated when the cgroup for
the unit is created.
Fixes#13419.
This partially reverts db11487d10 (the logic to
calculate the correct value is removed, we always use the same setting as for
the system manager). Distributions have an easy mechanism to override this if
they wish.
I think making this configurable is better, because different distros clearly
want different defaults here, and making this configurable is nice and clean.
If we don't make it configurable, distros which either have to carry patches,
or what would be worse, rely on some other configuration mechanism, like
/etc/profile. Those other solutions do not apply everywhere (they usually
require the shell to be used at some point), so it is better if we provide
a nice way to override the default.
Fixes #13469.
This makes it easier to see what unit_name_is_valid() returns at a glance.
The output is not whitespace clean, but I think it's good enough for a test.
With meson-0.52.0-1.module_f31+6771+f5d842eb.noarch I get:
src/test/meson.build:19: WARNING: Overriding previous value of environment variable 'PATH' with a new one
When we're using *prepend*, the whole point is to modify an existing variable,
so meson shouldn't warn. But let's set avoid the warning and shorten things by
setting the final value immediately.
The code in cgroup.c has support for all hierarchies, but the test,
as written, will only work on unified. Since the test is really about
bpf code, and not the legacy devices controller, let's just skip
the test.
It turns out that the kernel verifier would reject a program we would build
if there was a whitelist, but no entries in the whitelist matched.
The program would approximately like this:
0: (61) r2 = *(u32 *)(r1 +0)
1: (54) w2 &= 65535
2: (61) r3 = *(u32 *)(r1 +0)
3: (74) w3 >>= 16
4: (61) r4 = *(u32 *)(r1 +4)
5: (61) r5 = *(u32 *)(r1 +8)
48: (b7) r0 = 0
49: (05) goto pc+1
50: (b7) r0 = 1
51: (95) exit
and insn 50 is unreachable, which is illegal. We would then either keep a
previous version of the program or allow everything. Make sure we build a
valid program that simply rejects everything.
64MB is not that much, but let's not be greedy, esp. because we may run
many things in parallel.
Also, rlim_cur should never be higher than rlim_max, so let's simplify our
code.
Discussed in #13743, the -.service semantic conflicts with the
existing root mount and slice names, making this feature not
uniformly extensible to all types. Change the name to be
<type>.d instead.
Updating to this format also extends the top-level dropin to
unit types.
When showing logs from a container, we would fail to show various lines:
Oct 29 09:50:51 krowka systemd-nspawn[61376]: Detected architecture x86-64.
Oct 29 09:50:51 krowka systemd-nspawn[61376]: [1B blob data]
Oct 29 09:50:51 krowka systemd-nspawn[61376]: Welcome to Fedora 32 (Rawhide)!
Oct 29 09:50:51 krowka systemd-nspawn[61376]: [1B blob data]
Those are only harmless \r characters that trail the line. We already replace
tabs and strip various ansi characters that we deem inconsequential, so let's
also strip trailing carriage returns. Non-trailing ones are different, because
they change what would be displayed.
chase_symlinks() would return negative on error, and either a non-negative status
or a non-negative fd when CHASE_OPEN was given. This made the interface quite
complicated, because dependning on the flags used, we would get two different
"types" of return object. Coverity was always confused by this, and flagged
every use of chase_symlinks() without CHASE_OPEN as a resource leak (because it
would this that an fd is returned). This patch uses a saparate output parameter,
so there is no confusion.
(I think it is OK to have functions which return either an error or an fd. It's
only returning *either* an fd or a non-fd that is confusing.)
.sun_path has 108 bytes, and we'd write a string of 108 bytes + NUL.
I added this test, but I don't know what it was supposed to test. Let's
just remove.
Fixes#13713. CID#1405854.
ARPHRD_NETROM was excluded, most likely just because it is protocol No. 0,
and ARPHRD_CISCO was reported under its alias name "HDLC". Let's just
allow defined aliases under the main name.
Our biggest object in libsystemd was a table full of zeros, for the arphdr
names. Let's use a switch (which gcc nicely optimizes for us), instead a
table with a gap between 826 and 65534:
$ ls -l build{,2}/src/basic/a6ba3eb@@basic@sta/arphrd-list.c.o
-rw-rw-r--. 1 zbyszek zbyszek 540232 Sep 22 00:29 build/src/basic/a6ba3eb\@\@basic\@sta/arphrd-list.c.o
-rw-rw-r--. 1 zbyszek zbyszek 20512 Sep 25 11:56 build2/src/basic/a6ba3eb\@\@basic\@sta/arphrd-list.c.o
$ ls -l build{,2}/src/shared/libsystemd-shared-243.so
-rwxrwxr-x. 1 zbyszek zbyszek 6774368 Sep 22 00:29 build/src/shared/libsystemd-shared-243.so
-rwxrwxr-x. 1 zbyszek zbyszek 6254808 Sep 25 12:16 build2/src/shared/libsystemd-shared-243.so
No functional change.
Introduce support for configuring cpus and mems for processes using
cgroup v2 CPUSET controller. This allows users to limit which cpus
and memory NUMA nodes can be used by processes to better utilize
system resources.
The cgroup v2 interfaces to control it are cpuset.cpus and cpuset.mems
where the requested configuration is written. However, it doesn't mean
that the requested configuration will be actually used as parent cgroup
may limit the cpus or mems as well. In order to reflect the real
configuration cgroup v2 provides read-only files cpuset.cpus.effective
and cpuset.mems.effective which are exported to users as well.
In various circumstances, overriding the kernel commandline can be inconvenient.
People have different bootloaders, and e.g. the grub config can be pretty scary.
grubby helps, but it isn't always available.
This option adds an alternative mechanism that can quite convenient on EFI
systems. cmdline settings have higher priority, because they can be (usually)
changed on the bootloader prompt.
$SYSTEMD_EFI_OPTIONS can be used to override, same as $SYSTEMD_PROC_CMDLINE.
I want to use efivars.[ch] in proc-cmdline.c, but most of the efivars stuff is
not needed in basic/. Move the file from shared/ to basic/, but then move back
most of the higher-level functions to the new shared/efi-loader.c file.
It if of course related to /proc/cmdline parsing, but is higher-level
functionality built on top of it. It should be in shared/ because it
is something to be used by pid1 and related utilities, not something for
level-level libraries.
This way less stuff needs to be in basic. Initially, I wanted to move all the
parts of cgroup-utils.[ch] that depend on efivars.[ch] to shared, because
efivars.[ch] is in shared/. Later on, I decide to split efivars.[ch], so the
move done in this patch is not necessary anymore. Nevertheless, it is still
valid on its own. If at some point we want to expose libbasic, it is better to
to not have stuff that belong in libshared there.
This avoid the use of the global variable.
Also rename cgroup_unified_update() to cgroup_unified_cached() and
cgroup_unified_flush() to cgroup_unified() to better reflect their new roles.
This function had two users (apart from tests), and both only used one
argument. And it seems likely that if we need to pass more directories,
either the _nulstr() or the _strv() form would be used. Let's simplify
the code.
Add a fido_id program meant to be run for devices in the hidraw
subsystem via an IMPORT directive. The program parses the HID report
descriptor and assigns the ID_SECURITY_TOKEN environment variable if a
declared usage matches the FIDO_CTAPHID_USAGE declared in the FIDO CTAP
specification. This replaces the previous approach of whitelisting all
known security token models manually.
This commit is accompanied by a test suite and a fuzzer target for the
descriptor parsing routine.
Fixes: #11996.
When we would iterate over the lookup paths for each unit, making the list as
short as possible was important for performance. With the current cache, it
doesn't matter much. Two classes of paths were being removed:
- paths which don't exist in the filesystem
- paths which symlink to a path earlier in the search list
Both of those points cause problems with the caching code:
- if a user creates a directory that didn't exist before and puts units there,
now we will notice the new mtime an properly load the unit. When the path
was removed from list, we wouldn't.
- we now properly detect whether a unit path is on the path or not.
Before, if e.g. /lib/systemd/system, /usr/lib/systemd/systemd were both on
the path, and /lib was a symlink to /usr/lib, the second directory would be
pruned from the path. Then, the code would think that a symlink
/etc/systemd/system/foo.service→/lib/systemd/system/foo.service is an alias,
but /etc/systemd/system/foo.service→/usr/lib/systemd/system/foo.service would
be considered a link (in the systemctl link sense).
Removing the pruning has a slight negative performance impact in case of
usr-merge systems which have systemd compiled with non-usr-merge paths.
Non-usr-merge systems are deprecated, and this impact should be very small, so
I think it's OK. If it turns out to be an issue, the loop in function that
builds the cache could be improved to skip over "duplicate" directories with
same logic that the cache pruning did before. I didn't want to add this,
becuase it complicates the code to improve a corner case.
Fixes#13272.
New functions are called valid_user_group_name_compat() and
valid_user_group_name_or_id_compat() and accept dots in the user
or group name. No functional change except the tests.
This particular test case keeps intermittently failing due to crashing
LSan when running under clang+ASan. Generally, sanitizers don't
like seccomp filters, so the best option here is to just switch this
test off for this scenario.
v2:
- do not watch mtime of transient and generated dirs
We'd reload the map after every transient unit we created, which we don't
need to do, since we create those units ourselves and know their fragment
path.
This reworks how we load units from disk. Instead of chasing symlinks every
time we are asked to load a unit by name, we slurp all symlinks from disk
and build two hashmaps:
1. from unit name to either alias target, or fragment on disk
(if an alias, we put just the target name in the hashmap, if a fragment
we put an absolute path, so we can distinguish both).
2. from a unit name to all aliases
Reading all this data can be pretty costly (40 ms) on my machine, so we keep it
around for reuse.
The advantage is that we can reliably know what all the aliases of a given unit
are. This means we can reliably load dropins under all names. This fixes#11972.
It turns out most possible symlinks are invalid, because the type has to match,
and template units can only be linked to template units.
I'm not sure if the existing code made the same checks consistently. At least
I don't see the same rules expressed in a single place.
This code is not part of the public API of sd-netlink, nor used by it
internally and hence should not be in the sd-netlink directory.
Also, move the test case for it to src/test/.
This file was testing a mix of functions from src/core/load-fragment.c and some
from src/shared/install.c. Let's name it more appropriately. I want to add
tests for the new unit-file.c too.
So far, we'd use hashmap_free_free to free both keys and values along with
the hashmap. I think it's better to make this more encapsulated: in this variant
the way contents are freed can be decided when the hashmap is created, and
users of the hashmap can always use hashmap_free.
This could already be done by calling unit_name_is_*(), but if we don't know
if the argument is a valid unit name, it is more convenient to have a single
function which returns the type or possibly an error if the unit name is not
valid.
The values in the enum are sorted "by length". Not really important, but it
seems more natural to me.
Some distros install nologin as /usr/sbin/nologin, others as
/sbin/nologin.
Since we can't really on merged-usr everywhere (where the path wouldn't
matter), make the path build time configurable via -Dnologin-path=.
Closes#13028
It's hard to even say what exactly this combination means. Escaping is
necessary when quoting to have quotes within the string. So the escaping of
quote characters is inherently tied to quoting. When unquoting, it seems
natural to remove escaping which was done for the quoting purposes. But with
both flags we would be expected to re-add this escaping after unqouting? Or
maybe keep the escaping which is not necessary for quoting but otherwise
present? This all seems too complicated, let's just forbid such usage and
always fully unescape when unquoting.
Coverity was complaining that read() does not terminate the data. But
we did that termination earlier, so covirity is wrong (CID#1402306, CID#1402340).
Let's modernize the style a bit nevertheless.
(size_t) cast is needed to avoid the warning about comparison, iff
the value is not a constant.
When using build/ directory inside of the source directory:
__FILE__: ../src/test/test-log.c
RELATIVE_SOURCE_PATH: ..
PROJECT_FILE: src/test/test-log.c
When using a build directory outside of the source directory:
__FILE__: ../../../home/zbyszek/src/systemd-work/src/test/test-log.c
RELATIVE_SOURCE_PATH: ../../../home/zbyszek/src/systemd-work
PROJECT_FILE: src/test/test-log.c
Before only one comparison was allowed. Let's make this more flexible:
ConditionKernelVersion = ">=4.0" "<=4.5"
Fixes#12881.
This also fixes expressions like "ConditionKernelVersion=>" which would
evaluate as true.
Whenever I see EXTRACT_QUOTES, I'm always confused whether it means to
leave the quotes in or to take them out. Let's say "unquote", like we
say "cunescape".
job_compare return value is undefined in case the jobs have a loop
between them, so better make a test to make sure transaction cycle
detection catches it.
The test-engine Test2 tests the cycle detection when units a, b and d
all start at once
,-------------------after-----------------,
v |
a/start ---after---> d/start ---after---> b/start
Extend the test with Test11 that adds i.service which causes a and d
stop (by unordered Conflicts=) while starting b. Because stops precede
starts, we effectively eliminate the job cycle and all transaction jobs
should be applicable.
,-------------------after-----------------,
v |
a/stop <---after--- d/stop <---after--- b/start
. . ^
. . |
'. . . . . . . . . i/start ---after------'
Takes a single /sys/fs/bpf/pinned_prog string as argument, but may be
specified multiple times. An empty assignment resets all previous filters.
Closes https://github.com/systemd/systemd/issues/10227
prefix_root() is equivalent to path_join() in almost all ways, hence
let's remove it.
There are subtle differences though: prefix_root() will try shorten
multiple "/" before and after the prefix. path_join() doesn't do that.
This means prefix_root() might return a string shorter than both its
inputs combined, while path_join() never does that. I like the
path_join() semantics better, hence I think dropping prefix_root() is
totally OK. In the end the strings generated by both functon should
always be identical in terms of path_equal() if not streq().
This leaves prefix_roota() in place. Ideally we'd have path_joina(), but
I don't think we can reasonably implement that as a macro. or maybe we
can? (if so, sounds like something for a later PR)
Also add in a few missing OOM checks
This does the following:
- rename enum udev_builtin_cmd -> UdevBuiltinCmd
- rename struct udev_builtin -> UdevBuiltin
- move type definitions to udev-rules.h
- move prototypes of functions defined in udev-rules.c to udev-rules.h
- drop to use strbuf
- propagate critical errors in applying rules,
- drop limitation for number of tokens per line.
Should finally fix oss-fuzz-14688.
8688c29b5a wasn't enough.
The buffer retrieved from memstream has the size that the same as the written
data. When we write do write(f, s, strlen(s)), then no terminating NUL is written,
and the buffer is not (necessarilly) a proper C string.