Commit graph

3657 commits

Author SHA1 Message Date
Michal Koutný 204d140c4d core/timer: Prevent timer looping when unit cannot start
When a unit job finishes early (e.g. when fork(2) fails) triggered unit goes
through states
        stopped->failed (or failed->failed),
in case a ExecStart= command fails unit passes through
        stopped->starting->failed.

The former transition doesn't result in unit active/inactive timestamp being
updated and timer (OnUnitActiveSec= or OnUnitInactiveSec=) would use an expired
timestamp triggering immediately again (repeatedly).

This patch exploits timer's last trigger timestamp to ensure the timer isn't
triggered more frequently than OnUnitActiveSec=/OnUnitInactiveSec= period.

Steps to reproduce:

0) Create sample units:

cat >~/.config/systemd/user/looper.service <<EOD
[Service]
ExecStart=/usr/bin/sleep 2
EOD

cat >~/.config/systemd/user/looper.timer <<EOD
[Timer]
AccuracySec=5
OnUnitActiveSec=5
EOD

1) systemctl --user daemon-reload

2) systemctl --user start looper.timer
   # to have first activation timestamp/sentinel
   systemctl --user start looper.service

o  Observe the service is being regularly triggered.

3) systemctl set-property user@$UID.service TasksMax=2

o  Observe the tight looping as long as the looper.service cannot be started.

Ref: #5969
2018-01-22 17:13:00 +01:00
Zbigniew Jędrzejewski-Szmek d8eb10d61a core: delay logging the taint string until after basic.target is reached (#7935)
This happens to be almost the same moment as when we send READY=1 in the user
instance, but the logic is slightly different, since we log taint when
basic.target is reached in the system manager, but we send the notification
only in the user manager. So add a separate flag for this and propagate it
across reloads.

Fixes #7683.
2018-01-21 21:17:54 +09:00
Alan Jenkins 68f7480b7e
Merge pull request #7913 from sourcejedi/devpts
3 nitpicks from core/namespace.c
2018-01-18 21:56:26 +00:00
Alan Jenkins 225874dc9c core: clone_device_node(): add debug message
For people who use debug messages, maybe it is helpful to know that
PrivateDevices= failed due to mknod(), and which device node.

(The other (un-logged) failures could be while mounting filesystems e.g. no
CAP_SYS_ADMIN which is the common case, or missing /dev/shm or /dev/pts,
or missing /dev/ptmx).
2018-01-18 13:58:13 +00:00
Alan Jenkins 5a7f87a9e0 core: un-break PrivateDevices= by allowing it to mknod /dev/ptmx
#7886 caused PrivateDevices= to silently fail-open.
https://github.com/systemd/systemd/pull/7886#issuecomment-358542849

Allow PrivateDevices= to succeed, in creating /dev/ptmx, even though
DeviceControl=closed applies.

No specific justification was given for blocking mknod of /dev/ptmx.  Only
that we didn't seem to need it, because we weren't creating it correctly as
a device node.
2018-01-18 12:10:20 +00:00
Alan Jenkins 8d95368210 core: namespace: remove unnecessary mode on /dev/shm mount target
This should have no behavioural effect; it just confused me.

All the other mount directories in this function are created as 0755.
Some of the mounts are allowed to fail - mqueue and hugepages.
If the /dev/mqueue mount target was created with the permissive mode 01777,
to match the filesystem we're trying to mount there, then a mount failure
would allow unprivileged users to write to the /dev filesystem, e.g. to
exhaust the available space.  There is no reason to allow this.

(Allowing the user read access (0755) seems a reasonable idea though, e.g. for
quicker troubleshooting.)

We do not allow failure of the /dev/shm mount, so it doesn't matter that
it is created as 01777.  But on the same grounds, we have no *reason* to
create it as any specific mode.  0755 is equally fine.

This function will be clearer by using 0755 throughout, to avoid
unintentionally implying some connection between the mode of the mount
target, and the mode of the mounted filesystem.
2018-01-17 18:04:34 +00:00
Alan Jenkins 98b1d2b8d9 core: namespace: nitpick /dev/ptmx error handling
If /dev/tty did not exist, or had st_rdev == 0, we ignored it.  And the
same is true for null, zero, full, random, urandom.

If /dev/ptmx did not exist, we treated this as a failure.  If /dev/ptmx had
st_rdev == 0, we ignored it.

This was a very recent change, but there was no reason for ptmx creation
specifically to treat st_rdev == 0 differently from non-existence.  This
confuses me when reading it.

Change the creation of /dev/ptmx so that st_rdev == 0 is
treated as failure.

This still leaves /dev/ptmx as a special case with stricter handling.
However it is consistent with the immediately preceding creation of
/dev/pts/, which is treated as essential, and is directly related to ptmx.

I don't know why we check st_rdev.  But I'd prefer to have only one
unanswered question here, and not to have a second unanswered question
added on top.
2018-01-17 13:28:32 +00:00
Alan Jenkins e41090db89
Merge pull request #7886 from gdamjan/fix-ptmx
namespace: make /dev/ptmx a copy of the host not a symlink
2018-01-17 09:24:00 +00:00
Дамјан Георгиевски 414b304ba2 namespace: only make the symlink /dev/ptmx if it was already a symlink
…otherwise try to clone it as a device node

On most contemporary distros /dev/ptmx is a device node, and
/dev/pts/ptmx has 000 inaccessible permissions. In those cases
the symlink /dev/ptmx -> /dev/pts/ptmx breaks the pseudo tty support.

In that case we better clone the device node.

OTOH, in nspawn containers (and possibly others), /dev/pts/ptmx has
normal permissions, and /dev/ptmx is a symlink. In that case make the
same symlink.

fixes #7878
2018-01-17 01:19:46 +01:00
Дамјан Георгиевски b5e99f23ed namespace: extract clone_device_node function from mount_private_dev 2018-01-16 21:41:10 +01:00
Zbigniew Jędrzejewski-Szmek e0b6d3cabe
Merge pull request #7816 from poettering/chase-pid
Make MAINPID= and PIDFile= handling more restrictive (and other stuff)
2018-01-15 14:14:34 +04:00
Zbigniew Jędrzejewski-Szmek 67ddb52432
Merge pull request #7855 from poettering/log-h-includes
log.h #include cleanups
2018-01-15 13:43:09 +04:00
Alan Jenkins 3cc9685649 core: prevent spurious retries of umount
Testing the previous commit with `systemctl stop tmp.mount` logged the
reason for failure as expected, but unexpectedly the message was repeated
32 times.

The retry is a special case for umount; it is only supposed to cover the
case where the umount command was _successful_, but there was still some
remaining mount(s) underneath.  Fix it by making sure to test the first
condition :).

Re-tested with and without a preceding `mount --bind /mnt /tmp`,
and using `findmnt` to check the end result.
2018-01-13 17:22:46 +00:00
Alan Jenkins 5804e1b6ff core: fix output (logging) for mount units (#7603)
Documentation - systemd.exec - strongly implies mount units get logging.

It is safe for mounts to depend on systemd-journald.socket.  There is no
cyclic dependency generated.  This is because the root, -.mount, was
already deliberately set to EXEC_OUTPUT_NULL.  See comment in
mount_load_root_mount().  And /run is excluded from being a mount unit.

Nor does systemd-journald depend on /var.  It starts earlier, initially
logging to /run.

Tested before/after using `systemctl stop tmp.mount`.
2018-01-13 13:03:13 +00:00
0xAX aad67b80c5 dbus-execute: define bus_set_transient_errno() only if HAVE_SECCOMP (#7869)
in other way we will get a warning during build:

../src/core/dbus-util.h:55:13: warning: ‘bus_set_transient_errno’
defined but not used [-Wunused-function]

    int bus_set_transient_##function(
2018-01-13 08:48:53 +09:00
Lennart Poettering db256aab13 core: be stricter when handling PID files and MAINPID sd_notify() messages
Let's be more restrictive when validating PID files and MAINPID=
messages: don't accept PIDs that make no sense, and if the configuration
source is not trusted, don't accept out-of-cgroup PIDs. A configuratin
source is considered trusted when the PID file is owned by root, or the
message was received from root.

This should lock things down a bit, in case service authors write out
PID files from unprivileged code or use NotifyAccess=all with
unprivileged code. Note that doing so was always problematic, just now
it's a bit less problematic.

When we open the PID file we'll now use the CHASE_SAFE chase_symlinks()
logic, to ensure that we won't follow an unpriviled-owned symlink to a
privileged-owned file thinking this was a valid privileged PID file,
even though it really isn't.

Fixes: #6632
2018-01-11 15:12:16 +01:00
Lennart Poettering 15e23e8cdf manager: make use of pid_is_valid() where appropriate 2018-01-11 15:12:16 +01:00
Lennart Poettering 007e4b5490 manager: make use of NEWLINE macro where appropriate 2018-01-11 15:12:16 +01:00
Lennart Poettering d6552eaa6c dbus-util: properly parse timeout values
This makes transient TimeoutStopSec= properties work. After all they are
64bit entitites, not 32bit ones.
2018-01-11 15:12:16 +01:00
Lennart Poettering da5fb86100 manager: swap order in which we ellipsize/escape sd_notify() messages for debugging
If we have to chose between truncated escape sequences and strings
exploded to 4 times the desried length by fully escaping, prefer the
latter.

It's for debug only, hence doesn't really matter much.
2018-01-11 15:12:16 +01:00
Lennart Poettering 8895eb7815 unit: log when we cannot add a watch on a specific PID 2018-01-11 15:07:14 +01:00
Lennart Poettering dccca82b1a log: minimize includes in log.h
log.h really should only include the bare minimum of other headers, as
it is really pulled into pretty much everything else and already in
itself one of the most basic pieces of code we have.

Let's hence drop inclusion of:

1. sd-id128.h because it's entirely unneeded in current log.h
2. errno.h, dito.
3. sys/signalfd.h which we can replace by a simple struct forward
   declaration
4. process-util.h which was needed for getpid_cached() which we now hide
   in a funciton log_emergency_level() instead, which nicely abstracts
   the details away.
5. sys/socket.h which was needed for struct iovec, but a simple struct
   forward declaration suffices for that too.

Ultimately this actually makes our source tree larger (since users of
the functionality above must now include it themselves, log.h won't do
that for them), but I think it helps to untangle our web of includes a
tiny bit.

(Background: I'd like to isolate the generic bits of src/basic/ enough
so that we can do a git submodule import into casync for it)
2018-01-11 14:44:31 +01:00
Michal Sekletar dc7118ba09 dbus: propagate errors from bus_init_system() and bus_init_api()
The aim of this change is to make sure that we properly log about all
D-Bus connection problems. After all, we only ever attempt to get on the
bus if dbus-daemon is around, so any failure in the process should be
treated as an error.

bus_init_system() is only called from bus_init() and in
bus_init() we have a bool flag which governs whether we should attempt
to connect to the system bus or not.
Hence if we are in bus_init_system() then it is clear we got called from
a context where connection to the bus is actually required and therefore
shouldn't be treated as the "best effort" type of operation. Same
applies to bus_init_api().

We make use of those error codes in bus_init() and log high level
message that informs admin about what is going on (and is easy to spot
and makes sense to an end user).

Also "retrying later" bit is actually a lie. We won't retry unless we
are explicitly told to reconnect via SIGUSR1 or re-executed. This is
because bus_init() is always called from the context where dbus-daemon
is already around and hence bus_init() won't be called again from
unit_notify().

Fixes #7782
2018-01-11 14:41:34 +01:00
Zbigniew Jędrzejewski-Szmek a9883559e3
Merge pull request #7846 from poettering/nobody-getenv
some assorted fixes and additions, in particular a way to turn off "nobody" synthesizing on a specific system
2018-01-10 20:18:51 +01:00
Jan Klötzke e73c54b838 shutdown: make kill timeout configurable (#7835)
By default systemd-shutdown will wait for 90s after SIGTERM was sent
for all processes to exit. This is way too long and effectively defeats
an emergency watchdog reboot via "reboot-force" actions. Instead now
use DefaultTimeoutStopSec which is configurable.
2018-01-10 19:00:20 +01:00
Lennart Poettering e557b1a655 util: minor tweaks to disable_core_dumps()
First, let's rename it to disable_coredumps(), as in the rest of our
codebase we spell it "coredump" rather than "core_dump", so let's stick
to that.

However, also log about failures to turn off core dumpling on LOG_DEBUG,
because debug logging is always a good idea.
2018-01-10 18:44:09 +01:00
Lennart Poettering 47cf8ff206 manager: rework manager_clean_environment()
Let's rename it manager_sanitize_environment() which is a more precise
name. Moreover, sort the environment implicitly inside it, as all our
callers do that anyway afterwards and we can save some code this way.

Also, update the list of env vars to drop, i.e. the env vars we manage
ourselves and don't want user code to interfear with. Also sort this
list to make it easier to update later on.
2018-01-10 18:30:06 +01:00
Lennart Poettering ad5d4b1703 cocci: use strempty() at more places
This shortens the code by a few lines.
2018-01-10 17:11:19 +01:00
Jan Klötzke 27b372c1c2 shutdown: prevent core dumps in final shutdown stage
If the system is finally shutting down it makes no sense to write core
dumps as the last remaining processes are terminated / killed. This is
especially significant in case of a "force reboot" where all processes
are hit concurrently with a SIGTERM and no orderly shutdown of
processes takes place.
2018-01-10 10:54:40 +01:00
Jan Klötzke 9ce1759311 tree-wide: introduce disable_core_dumps helper and port existing users
Changes the core_pattern to prevent any core dumps by the kernel. Does
nothing if we're in a container environment as this is system wide
setting.
2018-01-10 10:54:40 +01:00
Zbigniew Jędrzejewski-Szmek 2df4611205
Merge pull request #7714 from poettering/sd-bus-watch-connect
many sd-bus improvements
2018-01-05 19:46:38 +01:00
rkolchmeyer 65d36b4950 core: Fix edge case when processing /proc/self/mountinfo (#7811)
Currently, if there are two /proc/self/mountinfo entries with the same
mount point path, the mount setup flags computed for the second of
these two entries will overwrite the mount setup flags computed for
the first of these two entries. This is the root cause of issue #7798.
This patch changes mount_setup_existing_unit to prevent the
just_mounted mount setup flag from being overwritten if it is set to
true. This will allow all mount units created from /proc/self/mountinfo
entries to be initialized properly.

Fixes: #7798
2018-01-05 19:28:23 +01:00
Lennart Poettering a5ea30b75b pid1: set org.freedesktop.systemd1 as sender service name for direct connections
This way, clients can install the very same match on direct and broker
connections as in both cases the messages will originate from the o.f.s1
service.
2018-01-05 13:58:33 +01:00
Lennart Poettering 75152a4d6a tree-wide: install matches asynchronously
Let's remove a number of synchronization points from our service
startups: let's drop synchronous match installation, and let's opt for
asynchronous instead.

Also, let's use sd_bus_match_signal() instead of sd_bus_add_match()
where we can.
2018-01-05 13:58:32 +01:00
Lennart Poettering 0c0b930647 tree-wide: make name requesting asynchronous in all our services
This optimizes service startup a bit, and makes it less prone to
deadlocks.
2018-01-05 13:58:32 +01:00
Lennart Poettering 5b5e6deabb bus: touch() the AF_UNIX sockets we listen() on after the fact
We'd like to use inotify to get notified when AF_UNIX sockets become
connectable. That happens at the moment of listen(), but this is doesn't
necessarily create in a watchable inotify event. Hence, let's synthesize
one whenever we generically create a socket, or when we know we created
it for a D-Bus server.

Ideally we wouldn't have to do this, and the kernel would generate an
event anyway for this. Doing this explicitly isn't too bad however, as
the event is still nicely associated with the AF_UNIX socket node, and
we generate all D-Bus sockets in our code hence it's safe.
2018-01-05 13:55:08 +01:00
Lennart Poettering 665dfe9318 io-util: make flush_fd() return how many bytes where flushed
This is useful so that callers know whether anything at all and how much
was flushed.

This patches through users of this functions to ensure that the return
values > 0 which may be returned now are not propagated in public APIs.

Also, users that ignore the return value are changed to do so explicitly
now.
2018-01-05 13:55:08 +01:00
Zbigniew Jędrzejewski-Szmek 5035800495
Merge pull request #7763 from yuwata/fix-7761
Revert "core/execute: RuntimeDirectory= or friends requires mount namespace"
2018-01-05 12:38:29 +01:00
Lennart Poettering d2e0ac3d1e tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork() 2018-01-04 13:27:27 +01:00
Lennart Poettering 2e87a1fde9 tree-wide: make use of wait_for_terminate_and_check() at various places
Using wait_for_terminate_and_check() instead of wait_for_terminate()
let's us simplify, shorten and unify the return value checking and
logging of waitid().  Hence, let's use it all over the place.
2018-01-04 13:27:27 +01:00
Lennart Poettering 1f5d1e0247 process-spec: add another flag FORK_WAIT to safe_fork()
This new flag will cause safe_fork() to wait for the forked off child
before returning. This allows us to unify a number of cases where we
immediately wait on the forked off child, witout running any code in the
parent after the fork, and without direct interest in the precise exit
status of the process, except recgonizing EXIT_SUCCESS vs everything
else.
2018-01-04 13:27:27 +01:00
Lennart Poettering 7d4904fe7a process-util: rework wait_for_terminate_and_warn() to take a flags parameter
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.

All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
2018-01-04 13:27:27 +01:00
Lennart Poettering b6e1fff13d process-util: add another fork_safe() flag for enabling LOG_ERR/LOG_WARN logging 2018-01-04 13:27:26 +01:00
Lennart Poettering 8ed7742aa2 ip-address-access: let's exit the loop after invalidating our entry a (#7803)
CID#1382967
2018-01-04 13:24:40 +01:00
Lennart Poettering 3046b6db1d
main: don't bother with the return value of invoke_mainloop() (#7802)
We don't use the return value, and we don't have to, as the call already
initializes &ret, which is the one we return as exit code from the
process.

CID#1384230
2018-01-04 12:55:21 +01:00
Zbigniew Jędrzejewski-Szmek 97149f405c core: fix mac_selinux_setup return value check
Introduced in 74da609f0d. CID #1384210.
2018-01-04 11:31:37 +01:00
Zbigniew Jędrzejewski-Szmek 1330648562 core: double free in bus_timer_set_transient_property
Introduced in 3e3c5a4571. CID #1384233.
2018-01-04 11:31:37 +01:00
Yu Watanabe 4657abb5d4 execute: make "runtime" argument const in exec_needs_mount_namespace()
The argument can be const, then let's make so.
2018-01-04 00:48:18 +09:00
Yu Watanabe b43ee82fc1 core: RuntimeDirectory= does not request new mount namespace
Now RuntimeDirectory= does not create 'private' directory.
Thus, it is not neccessary to request new mount namespace.

Follow-up for 8092a48cc1.
2018-01-04 00:26:35 +09:00
Yu Watanabe 42b1d8e0f5 Revert "core/execute: RuntimeDirectory= or friends requires mount namespace"
This reverts commit 652bb2637a.

Fixes #7761.
2018-01-04 00:26:11 +09:00