let's return ENOSYS in that case, to make things a bit less confusng.
Previously we'd just propagate ENOENT, which people might mistake as
applying to the object being modified rather than /proc/ just not being
there.
Let's return ENOSYS instead, i.e. an error clearly indicating that some
kernel API is not available. This hopefully should put people on a
better track.
Note that we only do the procfs check in the error path, which hopefully
means it's the less likely path.
We probably can add similar bits to more suitable codepaths dealing with
/proc/self/fd, but for now, let's pick to the ones noticed in #14745.
Fixes: #14745
When accessing journal files we generally are fine when values change
beneath our feet, while we are looking at them, as long as they change
from something valid to zero. This is required since we nowadays
forcibly unallocate journal files on vacuuming, to ensure they are
actually released.
However, we need to make sure that the validity checks we enforce are
done on suitable copies of the fields in the file. Thus provide a macro
that forces a copy, and disallows the compiler from merging our copy
with the actually memory where it is from.
Let's be extra careful whenever we return from recvmsg() and see
MSG_CTRUNC set. This generally means we ran into a programming error, as
we didn't size the control buffer large enough. It's an error condition
we should at least log about, or propagate up. Hence do that.
This is particularly important when receiving fds, since for those the
control data can be of any size. In particular on stream sockets that's
nasty, because if we miss an fd because of control data truncation we
cannot recover, we might not even realize that we are one off.
(Also, when failing early, if there's any chance the socket might be
AF_UNIX let's close all received fds, all the time. We got this right
most of the time, but there were a few cases missing. God, UNIX is hard
to use)
ubsan complains that we add an offset to a NULL ptr here in some cases.
Which isn't really a bug though, since we only use it as the end
condition for a for loop, but we can still fix it...
Fixes: #15522
It's not that I think that "hostname" is vastly superior to "host name". Quite
the opposite — the difference is small, and in some context the two-word version
does fit better. But in the tree, there are ~200 occurrences of the first, and
>1600 of the other, and consistent spelling is more important than any particular
spelling choice.
We use udev to wait for /dev/loopX devices to be fully proped hence we
need an implicit ordering dependency on it, for RootImage= to work
reliably in early boot, too.
Fixes: #14972
ebf963c551 changed the 'sep' argument to always
be either " " or "\n", which broke the indentation logic for the first line
in base64_append_width(). Since it now always is one character, and never NULL,
let's change the type to char and simplify the logic a bit.
$ COLUMNS=30 build/test-dns-packet test/test-resolve/org~20200417.pkts
============== test/test-resolve/org~20200417.pkts ==============
org IN DNSKEY 256 3 RSASHA1-NSEC3-SHA1
AwEAAcLPVEcg0hFBheXQf
QOqqLiRgckk69o2KTAsq3
lNRY0c9mnEjzZDGsGmXNy
2EQ6yelkIYYus7KLor2Fz
x59hEqcM82zqkdHV6hXvZ
yjxxSHG3nl8xQS6gF8mdI
YouDTWWhTInfjSKoIeDok
Hq3S67EjSngV7/wVCMTbI
amS0NF4H
-- Flags: ZONE_KEY
-- Key tag: 37022
...
$ COLUMNS=120 build/test-dns-packet test/test-resolve/org~20200417.pkts
============== test/test-resolve/org~20200417.pkts ==============
org IN DNSKEY 256 3 RSASHA1-NSEC3-SHA1 AwEAAcLPVEcg0hFBheXQfQOqqLiRgckk69o2KTAsq3lNRY0c9mnEjzZDGsGmXNy2EQ6yelkIYYus7KLor
2Fzx59hEqcM82zqkdHV6hXvZyjxxSHG3nl8xQS6gF8mdIYouDTWWhTInfjSKoIeDokHq3S67EjSngV7/w
VCMTbIamS0NF4H
-- Flags: ZONE_KEY
-- Key tag: 37022
...
proot provides userspace-powered emulation of chroot and mount --bind,
lending it to be used on environments without unprivileged user
namespaces, or in otherwise restricted environments like Android.
In order to achieve this, proot makes use of the kernel's ptrace()
facility, which we can use in order to detect its presence. Since it
doesn't use any kind of namespacing, including PID namespacing, we don't
need to do any tricks when trying to get the tracer's metadata.
For our purposes, proot is listed as a "container", since we mostly use
this also as the bucket for non-container-but-container-like
technologies like WSL. As such, it seems like a good fit for this
section as well.
An stdio FILE* stream usually refers to something with a file
descriptor, but that's just "usually". It doesn't have to, when taking
fmemopen() and similar into account. Most of our calls to fileno()
assumed the call couldn't fail. In most cases this was correct, but in
some cases where we didn't know whether we work on files or memory we'd
use the returned fd as if it was unconditionally valid while it wasn't,
and passed it to a multitude of kernel syscalls. Let's fix that, and do
something reasonably smart when encountering this case.
(Running test-fileio with this patch applied will remove tons of ioctl()
calls on -1).
This patch changes the way user managers set the default umask for the units it
manages.
Indeed one can expect that if user manager's umask is redefined through PAM
(via /etc/login.defs or pam_umask), all its children including the units it
spawns have their umask set to the new value.
Hence make user units inherit their umask value from their parent instead of
the hard coded value 0022 but allow them to override this value via their unit
file.
Note that reexecuting managers with 'systemctl daemon-reexec' after changing
UMask= has no effect. To take effect managers need to be restarted with
'systemct restart' instead. This behavior was already present before this
patch.
Fixes#6077.
non-underlined yellow uses RGB ANSI sequences while the underlined
version uses the paletted ANSI sequences. Let's unify that and use the
RGB sequence for both cases, so that underlined or not doesn't alter the
color.
This reworks the user validation infrastructure. There are now two
modes. In regular mode we are strict and test against a strict set of
valid chars. And in "relaxed" mode we just filter out some really
obvious, dangerous stuff. i.e. strict is whitelisting what is OK, but
"relaxed" is blacklisting what is really not OK.
The idea is that we use strict mode whenver we allocate a new user
(i.e. in sysusers.d or homed), while "relaxed" mode is when we process
users registered elsewhere, (i.e. userdb, logind, …)
The requirements on user name validity vary wildly. SSSD thinks its fine
to embedd "@" for example, while the suggested NAME_REGEX field on
Debian does not even allow uppercase chars…
This effectively liberaralizes a lot what we expect from usernames.
The code that warns about questionnable user names is now optional and
only used at places such as unit file parsing, so that it doesn't show
up on every userdb query, but only when processing configuration files
that know better.
Fixes: #15149#15090
From https://github.com/microsoft/WSL/issues/423#issuecomment-221627364:
> it's unlikely we'll change it to something that doesn't contain "Microsoft"
> or "WSL".
... but well, it happened. If they change it incompatibly w/o adding an stable
detection mechanism, I think we should not add yet another detection method.
But adding a different casing of "microsoft" is not a very big step, so let's
do that.
Follow-up for #11932.
split() and FOREACH_WORD really should die, and everything be moved to
extract_first_word() and friends, but let's at least make sure that for
the remaining code using it we can't deadlock by not progressing in the
word iteration.
Fixes: #15305
Some fdopendir() calls remain where safe_close() is manually
performed, those could be simplified as well by converting to
use the _cleanup_close_ machinery, but makes things less trivial
to review so left for a future cleanup.
With the addition of _cleanup_close_ there's a repetitious
pattern of assigning -1 to the fd after a successful fdopen
to prevent its close on cleanup now that the FILE * owns the
fd.
This introduces a wrapper that instead takes a pointer to the
fd being opened, and always overwrites the fd with -1 on success.
A future commit will cleanup all the fdopen call sites to use the
wrapper and elide the manual -1 fd assignment.
When we are supposed to accept numeric UIDs formatted as string, then
let's check that first, before passing things on to
valid_user_group_name_full(), since that might log about, and not the
other way round.
See: #15201
Follow-up for: 93c23c9297
https://github.com/systemd/systemd/pull/14133 made
capability_ambient_set_apply() acquire capabilities that were explicitly
asked for and drop all others. This change means the function is called
even with an empty capability set, opening up a code path for users
without ambient capabilities to call this function. This function will
error with EINVAL out on kernels < 4.3 because PR_CAP_AMBIENT is not
understood. This turns capability_ambient_set_apply() into a noop for
kernels < 4.3
Fixes https://github.com/systemd/systemd/issues/15225
I put the helper functions in a separate header file, because they don't fit
anywhere else. pthread_mutex_{lock,unlock} is used in two places: nss-systemd
and hashmap. I don't indent to convert hashmap to use the helpers, because
there it'd make the code more complicated. Is it worth to create a new header
file even if the only use is in nss-systemd.c? I think yes, because it feels
clean and also I think it's likely that pthread_mutex_{lock,unlock} will be
used in other places later.
In 1a29610f5f the change inadvertedly
disabled names with digit as the first character. This follow-up change
allows a digit as the first character in compat mode.
Fixes: #15141
A common pattern in the codebase is reading a cgroup memory value
and converting it to a uint64_t. Let's make it a helper and refactor a
few places to use it so it's more concise.
Without changing the SELinux label for private /dev of a service, it will take
a generic file system label:
system_u:object_r:tmpfs_t:s0
After this change it is the same as without `PrivateDevices=yes`:
system_u:object_r:device_t:s0
This helps writing SELinux policies, as the same rules for `/dev` will apply
despite any `PrivateDevices=yes` setting.
This ensures we will recieve SIGTSTP if the user presses Ctrl-Z.
We continue blocking SIGCHLD to ensure the child is processed by
wait_for_terminate_and_check.
Fixes: https://github.com/systemd/systemd/issues/9806
As in 2a5fcfae02
and in 3e67e5c992
using /usr/bin/env allows bash to be looked up in PATH
rather than being hard-coded.
As with the previous changes the same arguments apply
- distributions have scripts to rewrite shebangs on installation and
they know what locations to rely on.
- For tests/compilation we should rather rely on the user to have setup
there PATH correctly.
In particular this makes testing from git easier on NixOS where do not provide
/bin/bash to improve compose-ability.
Reload the internal selabel cache automatically on SELinux policy reloads so non pid-1 daemons are participating.
Run the reload function `mac_selinux_reload()` not manually on daemon-reload, but rather pass it as callback to libselinux.
Trigger the callback prior usage of the systemd internal selabel cache by depleting the selinux netlink socket via `avc_netlink_check_nb()`.
Improves: a9dfac21ec ("core: reload SELinux label cache on daemon-reload")
Improves: #13363
Inside format_bytes, we return NULL if the value is UINT64_MAX. This
makes some kind of sense where this has some other semantic meaning than
being a value, but in this case the value is both a.) not the default
(so we definitely want to display it), and b.) means "infinity" (or
"max" in cgroup terminology).
This patch adds a small wrapper around format_bytes that can be used for
these cases, to avoid the following situation:
[root@tangsanjiao ~]# cat /sys/fs/cgroup/workload.slice/memory.low
max
[root@tangsanjiao ~]# systemctl show workload.slice -p MemoryLow
MemoryLow=infinity
[root@tangsanjiao ~]# systemctl status workload.slice | grep low:
Memory: 14.9G (low: (null))
After the patch:
[root@tangsanjiao ~]# systemctl status workload.slice | grep low:
Memory: 15.1G (low: infinity)
`log_enforcing()` and `log_enforcing_errno()` are only used for important messages, which describe failures in enforced mode.
They are guarded either by `!mac_selinux_use()` or `!label_hnd` checks, where the latter is itself guarded by the former.
Only SELinux enabled systems print these logs.
This helps to configure a system in permissive mode, without getting surprising failures when switching to enforced mode.