When a unit job finishes early (e.g. when fork(2) fails) triggered unit goes
through states
stopped->failed (or failed->failed),
in case a ExecStart= command fails unit passes through
stopped->starting->failed.
The former transition doesn't result in unit active/inactive timestamp being
updated and timer (OnUnitActiveSec= or OnUnitInactiveSec=) would use an expired
timestamp triggering immediately again (repeatedly).
This patch exploits timer's last trigger timestamp to ensure the timer isn't
triggered more frequently than OnUnitActiveSec=/OnUnitInactiveSec= period.
Steps to reproduce:
0) Create sample units:
cat >~/.config/systemd/user/looper.service <<EOD
[Service]
ExecStart=/usr/bin/sleep 2
EOD
cat >~/.config/systemd/user/looper.timer <<EOD
[Timer]
AccuracySec=5
OnUnitActiveSec=5
EOD
1) systemctl --user daemon-reload
2) systemctl --user start looper.timer
# to have first activation timestamp/sentinel
systemctl --user start looper.service
o Observe the service is being regularly triggered.
3) systemctl set-property user@$UID.service TasksMax=2
o Observe the tight looping as long as the looper.service cannot be started.
Ref: #5969
This happens to be almost the same moment as when we send READY=1 in the user
instance, but the logic is slightly different, since we log taint when
basic.target is reached in the system manager, but we send the notification
only in the user manager. So add a separate flag for this and propagate it
across reloads.
Fixes#7683.
For people who use debug messages, maybe it is helpful to know that
PrivateDevices= failed due to mknod(), and which device node.
(The other (un-logged) failures could be while mounting filesystems e.g. no
CAP_SYS_ADMIN which is the common case, or missing /dev/shm or /dev/pts,
or missing /dev/ptmx).
#7886 caused PrivateDevices= to silently fail-open.
https://github.com/systemd/systemd/pull/7886#issuecomment-358542849
Allow PrivateDevices= to succeed, in creating /dev/ptmx, even though
DeviceControl=closed applies.
No specific justification was given for blocking mknod of /dev/ptmx. Only
that we didn't seem to need it, because we weren't creating it correctly as
a device node.
This should have no behavioural effect; it just confused me.
All the other mount directories in this function are created as 0755.
Some of the mounts are allowed to fail - mqueue and hugepages.
If the /dev/mqueue mount target was created with the permissive mode 01777,
to match the filesystem we're trying to mount there, then a mount failure
would allow unprivileged users to write to the /dev filesystem, e.g. to
exhaust the available space. There is no reason to allow this.
(Allowing the user read access (0755) seems a reasonable idea though, e.g. for
quicker troubleshooting.)
We do not allow failure of the /dev/shm mount, so it doesn't matter that
it is created as 01777. But on the same grounds, we have no *reason* to
create it as any specific mode. 0755 is equally fine.
This function will be clearer by using 0755 throughout, to avoid
unintentionally implying some connection between the mode of the mount
target, and the mode of the mounted filesystem.
If /dev/tty did not exist, or had st_rdev == 0, we ignored it. And the
same is true for null, zero, full, random, urandom.
If /dev/ptmx did not exist, we treated this as a failure. If /dev/ptmx had
st_rdev == 0, we ignored it.
This was a very recent change, but there was no reason for ptmx creation
specifically to treat st_rdev == 0 differently from non-existence. This
confuses me when reading it.
Change the creation of /dev/ptmx so that st_rdev == 0 is
treated as failure.
This still leaves /dev/ptmx as a special case with stricter handling.
However it is consistent with the immediately preceding creation of
/dev/pts/, which is treated as essential, and is directly related to ptmx.
I don't know why we check st_rdev. But I'd prefer to have only one
unanswered question here, and not to have a second unanswered question
added on top.
…otherwise try to clone it as a device node
On most contemporary distros /dev/ptmx is a device node, and
/dev/pts/ptmx has 000 inaccessible permissions. In those cases
the symlink /dev/ptmx -> /dev/pts/ptmx breaks the pseudo tty support.
In that case we better clone the device node.
OTOH, in nspawn containers (and possibly others), /dev/pts/ptmx has
normal permissions, and /dev/ptmx is a symlink. In that case make the
same symlink.
fixes#7878
Testing the previous commit with `systemctl stop tmp.mount` logged the
reason for failure as expected, but unexpectedly the message was repeated
32 times.
The retry is a special case for umount; it is only supposed to cover the
case where the umount command was _successful_, but there was still some
remaining mount(s) underneath. Fix it by making sure to test the first
condition :).
Re-tested with and without a preceding `mount --bind /mnt /tmp`,
and using `findmnt` to check the end result.
Documentation - systemd.exec - strongly implies mount units get logging.
It is safe for mounts to depend on systemd-journald.socket. There is no
cyclic dependency generated. This is because the root, -.mount, was
already deliberately set to EXEC_OUTPUT_NULL. See comment in
mount_load_root_mount(). And /run is excluded from being a mount unit.
Nor does systemd-journald depend on /var. It starts earlier, initially
logging to /run.
Tested before/after using `systemctl stop tmp.mount`.
in other way we will get a warning during build:
../src/core/dbus-util.h:55:13: warning: ‘bus_set_transient_errno’
defined but not used [-Wunused-function]
int bus_set_transient_##function(
Let's be more restrictive when validating PID files and MAINPID=
messages: don't accept PIDs that make no sense, and if the configuration
source is not trusted, don't accept out-of-cgroup PIDs. A configuratin
source is considered trusted when the PID file is owned by root, or the
message was received from root.
This should lock things down a bit, in case service authors write out
PID files from unprivileged code or use NotifyAccess=all with
unprivileged code. Note that doing so was always problematic, just now
it's a bit less problematic.
When we open the PID file we'll now use the CHASE_SAFE chase_symlinks()
logic, to ensure that we won't follow an unpriviled-owned symlink to a
privileged-owned file thinking this was a valid privileged PID file,
even though it really isn't.
Fixes: #6632
If we have to chose between truncated escape sequences and strings
exploded to 4 times the desried length by fully escaping, prefer the
latter.
It's for debug only, hence doesn't really matter much.
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.
Let's hence drop inclusion of:
1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
declaration
4. process-util.h which was needed for getpid_cached() which we now hide
in a funciton log_emergency_level() instead, which nicely abstracts
the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
forward declaration suffices for that too.
Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.
(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
The aim of this change is to make sure that we properly log about all
D-Bus connection problems. After all, we only ever attempt to get on the
bus if dbus-daemon is around, so any failure in the process should be
treated as an error.
bus_init_system() is only called from bus_init() and in
bus_init() we have a bool flag which governs whether we should attempt
to connect to the system bus or not.
Hence if we are in bus_init_system() then it is clear we got called from
a context where connection to the bus is actually required and therefore
shouldn't be treated as the "best effort" type of operation. Same
applies to bus_init_api().
We make use of those error codes in bus_init() and log high level
message that informs admin about what is going on (and is easy to spot
and makes sense to an end user).
Also "retrying later" bit is actually a lie. We won't retry unless we
are explicitly told to reconnect via SIGUSR1 or re-executed. This is
because bus_init() is always called from the context where dbus-daemon
is already around and hence bus_init() won't be called again from
unit_notify().
Fixes#7782
By default systemd-shutdown will wait for 90s after SIGTERM was sent
for all processes to exit. This is way too long and effectively defeats
an emergency watchdog reboot via "reboot-force" actions. Instead now
use DefaultTimeoutStopSec which is configurable.
First, let's rename it to disable_coredumps(), as in the rest of our
codebase we spell it "coredump" rather than "core_dump", so let's stick
to that.
However, also log about failures to turn off core dumpling on LOG_DEBUG,
because debug logging is always a good idea.
Let's rename it manager_sanitize_environment() which is a more precise
name. Moreover, sort the environment implicitly inside it, as all our
callers do that anyway afterwards and we can save some code this way.
Also, update the list of env vars to drop, i.e. the env vars we manage
ourselves and don't want user code to interfear with. Also sort this
list to make it easier to update later on.
If the system is finally shutting down it makes no sense to write core
dumps as the last remaining processes are terminated / killed. This is
especially significant in case of a "force reboot" where all processes
are hit concurrently with a SIGTERM and no orderly shutdown of
processes takes place.
Currently, if there are two /proc/self/mountinfo entries with the same
mount point path, the mount setup flags computed for the second of
these two entries will overwrite the mount setup flags computed for
the first of these two entries. This is the root cause of issue #7798.
This patch changes mount_setup_existing_unit to prevent the
just_mounted mount setup flag from being overwritten if it is set to
true. This will allow all mount units created from /proc/self/mountinfo
entries to be initialized properly.
Fixes: #7798
This way, clients can install the very same match on direct and broker
connections as in both cases the messages will originate from the o.f.s1
service.
Let's remove a number of synchronization points from our service
startups: let's drop synchronous match installation, and let's opt for
asynchronous instead.
Also, let's use sd_bus_match_signal() instead of sd_bus_add_match()
where we can.
We'd like to use inotify to get notified when AF_UNIX sockets become
connectable. That happens at the moment of listen(), but this is doesn't
necessarily create in a watchable inotify event. Hence, let's synthesize
one whenever we generically create a socket, or when we know we created
it for a D-Bus server.
Ideally we wouldn't have to do this, and the kernel would generate an
event anyway for this. Doing this explicitly isn't too bad however, as
the event is still nicely associated with the AF_UNIX socket node, and
we generate all D-Bus sockets in our code hence it's safe.
This is useful so that callers know whether anything at all and how much
was flushed.
This patches through users of this functions to ensure that the return
values > 0 which may be returned now are not propagated in public APIs.
Also, users that ignore the return value are changed to do so explicitly
now.
Using wait_for_terminate_and_check() instead of wait_for_terminate()
let's us simplify, shorten and unify the return value checking and
logging of waitid(). Hence, let's use it all over the place.
This new flag will cause safe_fork() to wait for the forked off child
before returning. This allows us to unify a number of cases where we
immediately wait on the forked off child, witout running any code in the
parent after the fork, and without direct interest in the precise exit
status of the process, except recgonizing EXIT_SUCCESS vs everything
else.
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.
All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
We don't use the return value, and we don't have to, as the call already
initializes &ret, which is the one we return as exit code from the
process.
CID#1384230