This adds a new bus call to service and scope units called
AttachProcesses() that moves arbitrary processes into the cgroup of the
unit. The primary user for this new API is systemd itself: the systemd
--user instance uses this call of the systemd --system instance to
migrate processes if itself gets the request to migrate processes and
the kernel refuses this due to access restrictions.
The primary use-case of this is to make "systemd-run --scope --user …"
invoked from user session scopes work correctly on pure cgroupsv2
environments. There, the kernel refuses to migrate processes between two
unprivileged-owned cgroups unless the requestor as well as the ownership
of the closest parent cgroup all match. This however is not the case
between the session-XYZ.scope unit of a login session and the
user@ABC.service of the systemd --user instance.
The new logic always tries to move the processes on its own, but if
that doesn't work when being the user manager, then the system manager
is asked to do it instead.
The new operation is relatively restrictive: it will only allow to move
the processes like this if the caller is root, or the UID of the target
unit, caller and process all match. Note that this means that
unprivileged users cannot attach processes to scope units, as those do
not have "owning" users (i.e. they have now User= field).
Fixes: #3388
This splits out the code that translates a unit name into a Unit* object
from method_get_unit(), and reuses it all other functions that operate
similar to it. This effectively means all those calls now optionally
take an empty unit string which now means the same as the client's unit.
This useful behaviour of the GetUnit() bus call is thus extended to all
other matching bus calls.
Similar, the same logic from method_load_unit() is also generalized and
reused wherever appropriate.
StartLimitIntervalSec= and DefaultStartLimitIntervalSec= are the
last options whose suffix is 'Sec' instead of 'USec'.
All the other option has suffix 'USec'. So, let's rename them.
The tainting logic existed for a long time, but was hidden inside the
bus interfaces. Let's give it a small bit more coverage, by logging its
value early at boot during initialization.
Let's make our finished checks a bit more readable. Checking the
timestamp is not entirely obvious, hence let's abstract that a bit by
adding a macro that shows what we are doing here, not how we doing it.
This is particularly useful if we want to change the definition of
"finished" later on, in particular, when we try to fix#7023.
It's like manager_dump(), but returns a string. This allows us to reduce
some duplicate code. Also, while we are at it, turn off stdio locking
while we write to the memory FILE *f.
This makes things quite a bit more systematic I think, as we can
systematically operate on all timestamps, for example for the purpose of
serialization/deserialization.
This rework doesn't necessarily make things shorter in the individual
lines, but it does reduce the line count a bit.
(This is useful particularly when we want to add additional timestamps,
for example to solve #7023)
This reverts commit ee043777be.
It broke almost everywhere it touched. The places that
handn't been converted, were mostly followed by special
handling for the invalid PID `0`. That explains why they
tested for `pid < 0` instead of `pid <= 0`.
I think that one was the first commit I reviewed, heh.
In some cases there might be unit symlinks in .wants/ or .requires/
directories even though the unit is otherwise fully removed. In this
case, don't fail removal, but still remove the symlinks.
This reworks the symlink marking logic to always add unit files that we
are missing to the changes list, but proceed with any symlink removal
for them. This way we'll still generate useful hints that a unit is
missing if you invoke "systemctl disable idontexist.service", but also
still remove any link to it.
Fixes: #4995
Previously, we'd refuse the GetUnitProcesses() bus call if the unit file
couldn't be loaded. Which is wrong, as admins should be able to inspect
services whose unit files was deleted. Change this logic, so that we
permit introspecting the processes of any unit that is loaded,
regardless if it has a unit file or not.
(Note that we won't load unit files in GetUnitProcess(), but only
operate on already loaded ones. That's because only loaded units can
have processes — as that's how our GC logic works — and hence loading
the unit just for the process tree is pointless, as it would be empty).
See: #4995
Let's more verbose error messages when validating the input parameters fails.
Also, call path_is_os_tree() properly, as it doesn't return a boolean, but
possibly also an error. Finally, check for the existance of the new init
process with chase_symlinks() to properly handle possible symlinks on the init
binary (which might actually be pretty likely).
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
It may be desired by users to know what targets a particular service is
installed into. Improve user friendliness by teaching the is-enabled
command to show such information when used with --full.
This patch makes use of the newly added UnitFileFlags and adds
UNIT_FILE_DRY_RUN flag into it. Since the API had already been modified,
it's now easy to add the new dry-run feature for other commands as
well. As a next step, --dry-run could be added to systemctl, which in
turn might pave the way for a long requested dry-run feature when
running systemctl start.
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.
The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.
The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.
The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.
Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.
This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().
PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.
A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.
Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
This adds two (privileged) bus calls Ref() and Unref() to the Unit interface.
The two calls may be used by clients to pin a unit into memory, so that various
runtime properties aren't flushed out by the automatic GC. This is necessary
to permit clients to race-freely acquire runtime results (such as process exit
status/code or accumulated CPU time) on successful service termination.
Ref() and Unref() are fully recursive, hence act like the usual reference
counting concept in C. Taking a reference is a privileged operation, as this
allows pinning units into memory which consumes resources.
Transient units may also gain a reference at the time of creation, via the new
AddRef property (that is only defined for transient units at the time of
creation).
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).
For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.
A simple way to test this is:
systemd-run -p DynamicUser=1 /bin/sleep 99999
The unit load queue can be processed in the middle of setting the
unit's properties, so its load_state would no longer be UNIT_STUB
for the check in bus_unit_set_properties(), which would cause it to
incorrectly return an error.
This new method returns information by unit names. Instead of ListUnitsByPatterns
this method returns information of inactive and even unexisting units.
Moved dbus unit reply logic into a separate shared function.
Resolves https://github.com/coreos/fleet/pull/1418
We generally follow the rule that for time settings we suffix the setting name
with "Sec" to indicate the default unit if none is specified. The only
exception was the rate limiting interval settings. Fix this, and keep the old
names for compatibility.
Do the same for journald's RateLimitInterval= setting
This adds a new GetProcesses() bus call to the Unit object which returns an
array consisting of all PIDs, their process names, as well as their full cgroup
paths. This is then used by "systemctl status" to show the per-unit process
tree.
This has the benefit that the client-side no longer needs to access the
cgroupfs directly to show the process tree of a unit. Instead, it now uses this
new API, which means it also works if -H or -M are used correctly, as the
information from the specific host is used, and not the one from the local
system.
Fixes: #2945
Fixes#2191:
$ systemctl --root=/ enable sddm
Created symlink /etc/systemd/system/display-manager.service, pointing to /usr/lib/systemd/system/sddm.service.
$ sudo build/systemctl --root=/ enable gdm
Failed to enable unit, file /etc/systemd/system/display-manager.service already exists and is a symlink to /usr/lib/systemd/system/sddm.service.
$ sudo build/systemctl --root= enable sddm
$ sudo build/systemctl --root= enable gdm
Failed to enable unit: File /etc/systemd/system/display-manager.service already exists and is a symlink to /usr/lib/systemd/system/sddm.service.
(I tried a few different approaches to pass the error information back to the
caller. Adding a new parameter to hold the error results in a gigantic patch
and a lot of hassle to pass the args arounds. Adding this information to the
changes array is straightforward and can be more easily extended in the
future.)
In case local installation is performed, the full set of errors can be reported
and we do that. When running over dbus, only the first error is reported.
With any masked unit that would that would be enabled by presets, we'd get:
test@rawhide $ sudo systemctl preset-all
Failed to execute operation: Unit file is masked.
test@rawhide $ sudo systemctl --root=/ preset-all
Operation failed: Cannot send after transport endpoint shutdown
Simply ignore those units:
test@rawhide $ sudo systemctl preset-all
Unit xxx.service is masked, ignoring.
If the error code ever leaks (we print the strerror error instead of providing
our own), the message for ESHUTDOWN is "Cannot send after transport endpoint
shutdown", which can be misleading. In particular it suggest that some
mishandling of the dbus connection occured. Let's change that to ERFKILL which
has the advantage that a) it sounds implausible as actual error, b) has the
connotation of disabling something manually.
We don't allow using config symlinks to enable units, but the error message we
printed was awful. Fix that, and generate a more readable error.
Fixes#3010.
This allows dropping all user configuration and reverting back to the vendor
default of a unit file. It basically undoes what "systemctl edit", "systemctl
set-property" and "systemctl mask" do.
Now, that the search path logic knows the unit path for transient units we also
can introduce an explicit unit file state "transient" that clarifies to the
user what kind of unit file he is encountering.
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were
mostly identical and converted from one to the other all the time. The latter
had one more value UNIT_FILE_GLOBAL however.
Let's simplify things, and remove ManagerRunningAs and replace it by
UnitFileScope everywhere, thus making the translation unnecessary. Introduce
two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking
if we are running in one or the user context.
A long time ago – when generators where first introduced – the directories for
them were randomly created via mkdtemp(). This was changed later so that they
use fixed name directories now. Let's make use of this, and add the genrator
dirs to the LookupPaths structure and into the unit file search path maintained
in it. This has the benefit that the generator dirs are now normal part of the
search path for all tools, and thus are shown in "systemctl list-unit-files"
too.
Add path argument to clock_is_localtime() and default to "/etc/adjtime" if it's
NULL. This makes the function testable.
Add test-clock: initial test cases for some scenarios, using a temporary file.
This also checks the behaviour with a NULL (i. e. the system's /etc/adjtime)
file.
How to reproduce
$ systemctl set-default multi-user # https://github.com/systemd/systemd/issues/2298
$ systemctl preset-all
Failed to execute operation: Too many levels of symbolic links
$ systemctl poweroff
Fixes:
==1==
==1== HEAP SUMMARY:
==1== in use at exit: 65,645 bytes in 7 blocks
==1== total heap usage: 40,539 allocs, 40,532 frees, 30,147,547 bytes allocated
==1==
==1== 109 (24 direct, 85 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 7
==1== at 0x4C2BBCF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x4C2DE2F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x23DA71: unit_file_changes_add (install.c:233)
==1== by 0x23E45D: remove_marked_symlinks_fd (install.c:453)
==1== by 0x23E267: remove_marked_symlinks_fd (install.c:405)
==1== by 0x23E641: remove_marked_symlinks (install.c:494)
==1== by 0x243A91: execute_preset (install.c:2190)
==1== by 0x244343: unit_file_preset_all (install.c:2351)
==1== by 0x18AAA2: method_preset_all_unit_files (dbus-manager.c:1846)
==1== by 0x1D8157: method_callbacks_run (bus-objects.c:420)
==1== by 0x1DA9E9: object_find_and_run (bus-objects.c:1257)
==1== by 0x1DB02B: bus_process_object (bus-objects.c:1373)
==1==
==1== LEAK SUMMARY:
==1== definitely lost: 24 bytes in 1 blocks
==1== indirectly lost: 85 bytes in 1 blocks
==1== possibly lost: 0 bytes in 0 blocks
==1== still reachable: 65,536 bytes in 5 blocks
==1== suppressed: 0 bytes in 0 blocks
==1== Reachable blocks (those to which a pointer was found) are not shown.
==1== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1==
==1== For counts of detected and suppressed errors, rerun with: -v
==1== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Fixes:
==1== by 0x23E44C: remove_marked_symlinks_fd (install.c:453)
==1== by 0x23E256: remove_marked_symlinks_fd (install.c:405)
==1== by 0x23E630: remove_marked_symlinks (install.c:494)
==1== by 0x2427A0: unit_file_disable (install.c:1876)
==1== by 0x18A633: method_disable_unit_files_generic (dbus-manager.c:1760)
==1== by 0x18A6CA: method_disable_unit_files (dbus-manager.c:1768)
==1== by 0x1D8146: method_callbacks_run (bus-objects.c:420)
==1== by 0x1DA9D8: object_find_and_run (bus-objects.c:1257)
==1== by 0x1DB01A: bus_process_object (bus-objects.c:1373)
==1==
==1== 228 (48 direct, 180 indirect) bytes in 2 blocks are definitely lost in loss record 8 of 14
==1== at 0x4C2BBCF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x4C2DE2F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x23DA60: unit_file_changes_add (install.c:233)
==1== by 0x23DDB2: create_symlink (install.c:298)
==1== by 0x240C5C: install_info_symlink_wants (install.c:1328)
==1== by 0x240FC8: install_info_apply (install.c:1384)
==1== by 0x241211: install_context_apply (install.c:1439)
==1== by 0x242563: unit_file_enable (install.c:1830)
==1== by 0x18A06E: method_enable_unit_files_generic (dbus-manager.c:1650)
==1== by 0x18A141: method_enable_unit_files (dbus-manager.c:1660)
==1== by 0x1D8146: method_callbacks_run (bus-objects.c:420)
==1== by 0x1DA9D8: object_find_and_run (bus-objects.c:1257)
==1==
==1== 467 (144 direct, 323 indirect) bytes in 3 blocks are definitely lost in loss record 9 of 14
==1== at 0x4C2DD9F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1== by 0x23DA60: unit_file_changes_add (install.c:233)
==1== by 0x23DE97: create_symlink (install.c:320)
==1== by 0x242CFC: unit_file_set_default (install.c:1951)
==1== by 0x18A881: method_set_default_target (dbus-manager.c:1802)
==1== by 0x1D8146: method_callbacks_run (bus-objects.c:420)
==1== by 0x1DA9D8: object_find_and_run (bus-objects.c:1257)
==1== by 0x1DB01A: bus_process_object (bus-objects.c:1373)
==1== by 0x259143: process_message (sd-bus.c:2567)
==1== by 0x259326: process_running (sd-bus.c:2609)
==1== by 0x259BDC: bus_process_internal (sd-bus.c:2798)
==1== by 0x259CAD: sd_bus_process (sd-bus.c:2817)
==1==
==1== LEAK SUMMARY:
==1== definitely lost: 216 bytes in 6 blocks
==1== indirectly lost: 560 bytes in 14 blocks
==1== possibly lost: 0 bytes in 0 blocks
==1== still reachable: 65,536 bytes in 5 blocks
==1== suppressed: 0 bytes in 0 blocks
==1== Reachable blocks (those to which a pointer was found) are not shown.
==1== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1==