For executables which take a verb, we should list the verbs first, and
then options which modify those verbs second. The general layout of
the man page is from general description to specific details, usually
Overview, Commands, Options, Return Value, Examples, References.
Introduce support for configuring cpus and mems for processes using
cgroup v2 CPUSET controller. This allows users to limit which cpus
and memory NUMA nodes can be used by processes to better utilize
system resources.
The cgroup v2 interfaces to control it are cpuset.cpus and cpuset.mems
where the requested configuration is written. However, it doesn't mean
that the requested configuration will be actually used as parent cgroup
may limit the cpus or mems as well. In order to reflect the real
configuration cgroup v2 provides read-only files cpuset.cpus.effective
and cpuset.mems.effective which are exported to users as well.
The "Ex" variant was originally only added for ExecStartXYZ= but it makes
sense to have feature parity for the rest of the exec command properties
as well (e.g. ExecReload=, ExecStop=, etc).
v2:
- do not watch mtime of transient and generated dirs
We'd reload the map after every transient unit we created, which we don't
need to do, since we create those units ourselves and know their fragment
path.
This reworks how we load units from disk. Instead of chasing symlinks every
time we are asked to load a unit by name, we slurp all symlinks from disk
and build two hashmaps:
1. from unit name to either alias target, or fragment on disk
(if an alias, we put just the target name in the hashmap, if a fragment
we put an absolute path, so we can distinguish both).
2. from a unit name to all aliases
Reading all this data can be pretty costly (40 ms) on my machine, so we keep it
around for reuse.
The advantage is that we can reliably know what all the aliases of a given unit
are. This means we can reliably load dropins under all names. This fixes#11972.
If for some reason we do not know some signal, instead of silently
skipping it, let's print it numerically. Likewise, 'show' is not the
right place to do value filtering for exit codes. If pid1 accepted it,
let's just print it with no fuss.
We were passing 1/4th of the size in bytes as argument. So depending
on the size of the array, either we'd only transfer a subset of values,
or we'd get an alignment error.
The info printed in this function is the same as the non-Ex version of the
property so there's no point double printing.
Other places that print ExecXYZEx= properties are left alone since the
displayed information is different.
Similar variables had differing names: unit, path, unit_path. We also
have file system paths in surrounding code. Let's make this easier for
the reader and use "dbus_path" consistently.
It had two users, but it is just a very thin wrapper around
unit_file_find_dropin_paths(), so using it seems more complicated than directly
invoking unit_file_find_dropin_paths() twice.
"systemctl --failed" suggested I pass "--all" to see units in the inactive
state as well. I thought this was not very useful. If you explicitly
asked for units in a specific state, then you already know you have
narrowed it down. And if you ran "systemctl --state=inactive", it is even
more strange to see this message.
@keszybz suggests we probably don't want to suggest "list-unit-files"
either :-). Let's only suggest that if the user passed "--state=inactive".
Finally, this means the output for "systemctl --failed" could be just
"0 loaded units listed". In this case, we don't need any highlight on that
text, to distinguish it from the hint. This matches "list-unit-files".
This also means we happen to avoid using red highlight, when there are zero
failed units, as if that itself was a failure. @kesbyz pointed out that
old behaviour was a bit weird.
let's add [static] where it was missing so far
Drop [static] on parameters that can be NULL.
Add an assert() around parameters that have [static] and can't be NULL
hence.
Add some "const" where it was forgotten.
Make possible to set NUMA allocation policy for manager. Manager's
policy is by default inherited to all forked off processes. However, it
is possible to override the policy on per-service basis. Currently we
support, these policies: default, prefer, bind, interleave, local.
See man 2 set_mempolicy for details on each policy.
Overall NUMA policy actually consists of two parts. Policy itself and
bitmask representing NUMA nodes where is policy effective. Node mask can
be specified using related option, NUMAMask. Default mask can be
overwritten on per-service level.
Originally, `systemctl cat` would match only active units, for example:
$ systemctl cat sshd.service
would cat the sshd.service unit file even if the service was inactive.
However:
$ systemctl cat ssh*
would show it only if it was active.
Let's unify the behavior and cat all unit files regardless of a state,
if no state was given explicitly to filter.
(before)
$ build/systemctl list-machines
Need to be root.
$ sudo build/systemctl list-machines
NAME STATE FAILED JOBS
krowka (host) running 0 0
rawhide running 0 0
2 machines listed.
(after)
$ build/systemctl list-machines
NAME STATE FAILED JOBS
krowka (host) running 0 0
rawhide n/a 0 0
2 machines listed.
The output for non-root is missing some bits of information, but we display
what information is missing nicely, and e.g. in the case when no machines are
running at all, or only VMs are running, the unprivileged output would be the
same as the privileged one.
In cgroup v2 we have protection tunables -- currently MemoryLow and
MemoryMin (there will be more in future for other resources, too). The
design of these protection tunables requires not only intermediate
cgroups to propagate protections, but also the units at the leaf of that
resource's operation to accept it (by setting MemoryLow or MemoryMin).
This makes sense from an low-level API design perspective, but it's a
good idea to also have a higher-level abstraction that can, by default,
propagate these resources to children recursively. In this patch, this
happens by having descendants set memory.low to N if their ancestor has
DefaultMemoryLow=N -- assuming they don't set a separate MemoryLow
value.
Any affected unit can opt out of this propagation by manually setting
`MemoryLow` to some value in its unit configuration. A unit can also
stop further propagation by setting `DefaultMemoryLow=` with no
argument. This removes further propagation in the subtree, but has no
effect on the unit itself (for that, use `MemoryLow=0`).
Our use case in production is simplifying the configuration of machines
which heavily rely on memory protection tunables, but currently require
tweaking a huge number of unit files to make that a reality. This
directive makes that significantly less fragile, and decreases the risk
of misconfiguration.
After this patch is merged, I will implement DefaultMemoryMin= using the
same principles.
Some chattrs only work sensible if you set them right after opening a
file for create (think: FS_NOCOW_FL). Others only work when they are
applied when the file is fully written (think: FS_IMMUTABLE_FL). Let's
take that into account when copying files and applying a chattr to them.
Commit d85515edcf changed logic how reboot is
executed. That commit changed behavior to use emergency action reboot code path
to perform the reboot.
This inadvertently broke rebooting with argument:
$ systemctl reboot custom-reason
Restore original behavior so that if reboot service unit similar to
systemd-reboot.service is executed it is possible to override reboot reason
with "systemctl reboot ARG".
When "systemctl reboot ARG" is executed ARG is placed in file
/run/systemd/reboot-param and reboot is issued using logind's Reboot
dbus-service.
If RebootArgument is specified in systemd-reboot.service it takes precedence
over what systemctl sets.
Fixes: #11828
(This also removes support for booting into the EFI firmware setup
without logind. That's because otherwise the non-EFI fallback logind
implements can't work.)
Fixes: #9896
Now only two operations are left. Let's just move this into the caller,
since it should make things simpler, clearer and shorter, in particular
as there's only a single user for this.
find_default_boot_entry() is only used by systemctl.c, and currently
handles one log message in the caller instead of the callee. Let's
simplify that and move it over, too
Run: systemctl show -a dbus.service | grep -E "SELinuxContext|AppArmorProfile|SmackProcessLabel"
Before patch:
SELinuxContext=[unprintable]
AppArmorProfile=[unprintable]
SmackProcessLabel=[unprintable]
After patch:
SELinuxContext=[""|"value of context"]
AppArmorProfile=[""|"value of context"]
SmackProcessLabel=[""|"value of context"]
This 'root' field contains the root path of the partition we found the
snippet in. The 'kernel', 'initrd', 'efi', … fields are relative to this
path.
This becomes particularly useful later when we add support for loading
snippets from both the ESP and XBOOTLDR, but already simplifies the code
for us a bit in systemctl.
This has been irritating me for quite a while: let's prefix these enum
values with a common prefix, like we do for almost all other enums.
No change in behaviour, just some renaming.
https://bugzilla.redhat.com/show_bug.cgi?id=1656639
Using "--" is a trick that is hard to discover. Let's give users a hint:
$ build/systemctl status -.service
build/systemctl: invalid option -- '.'
Hint: to specify units starting with a dash, use "--":
build/systemctl [OPTIONS...] {COMMAND} -- -.service ...
I use program_invocation_name because that's what getopt seems to use.
"::" is used in the option string so that getopt doesn't complain about
a missing argument in case somebody passes "-." as the argument. After all
"." is not a real option.
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.
No code changes, just some rearranging of source files.
Let's not honour PropertiesChanged signals unless the Jobs properties is
empy. After all we shouldn't consider a service finished unless its
state is inactive/failed *and* no job is queued for it anymore.
Similar to the previous commit: in many cases no further fd processing
needs to be done in forked of children before execve() or any of its
flavours are called. In those case we can use FORK_RLIMIT_NOFILE_SAFE
instead.
Whenever we invoke external, foreign code from code that has
RLIMIT_NOFILE's soft limit bumped to high values, revert it to 1024
first. This is a safety precaution for compatibility with programs using
select() which cannot operate with fds > 1024.
This commit adds the call to rlimit_nofile_safe() to all invocations of
exec{v,ve,l}() and friends that either are in code that we know runs
with RLIMIT_NOFILE bumped up (which is PID 1 and all journal code for
starters) or that is part of shared code that might end up there.
The calls are placed as early as we can in processes invoking a flavour
of execve(), but after the last time we do fd manipulations, so that we
can still take benefit of the high fd limits for that.
Previously, "systemctl edit" exclusively used the service manager's
per-unit FragmentPath property to figure out which file to edit, when
operating on a non-template unit. If for some reason loading the unit
file failed entirely though (LoadState=error), then FragmentPath would
be empty, and thus the unit not editable.
Let's fix this, by falling back to client-side unit file searching in
this case.
(Also, various other clean-ups to make the relevant functions follow our
coding style)
Fixes: #9561
Having systemctl disable/unmask remove all symlinks in /etc and /run is
unintuitive and breaks existing use cases.
systemctl should behave symmetrically.
A "systemctl --runtime unmask" should undo a "systemctl --runtime mask"
action.
Say you have a service, which was masked by the admin in /etc.
If you temporarily want to mask the execution of the service (say in a
script), you'd create a runtime mask via "systemctl --runtime mask".
It is is now no longer possible to undo this temporary mask without
nuking the admin changes, unless you start rm'ing files manually.
While it is useful to be able to remove all enablement/mask symlinks in
one go, this should be done via a separate command line switch, like
"systemctl --all unmask".
This reverts commit 4910b35078.
Fixes: #9393
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.
I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
Pretty much everything uses just the first argument, and this doesn't make this
common pattern more complicated, but makes it simpler to pass multiple options.
This makes use of rlimit_nofile_bump() in all tools that access the
journal. In some cases this replaces older code to achieve this, and
others we add it in where it was missing.
Let's split exit code handling in two: "r" is only used for errno-style
errors, and "ret" is used for exit() codes. Then, let's use EXIT_SUCCESS
for checking whether the latter is already used.
This way it should always be clear what kind of error we are processing,
and when we propaate one into the other.
Moreover this allows us to drop "q" form all inner loops, avoiding
confusion when to use "q" and when "r" to store received errors.
Fixes: #9704