Make possible to set NUMA allocation policy for manager. Manager's
policy is by default inherited to all forked off processes. However, it
is possible to override the policy on per-service basis. Currently we
support, these policies: default, prefer, bind, interleave, local.
See man 2 set_mempolicy for details on each policy.
Overall NUMA policy actually consists of two parts. Policy itself and
bitmask representing NUMA nodes where is policy effective. Node mask can
be specified using related option, NUMAMask. Default mask can be
overwritten on per-service level.
Originally, `systemctl cat` would match only active units, for example:
$ systemctl cat sshd.service
would cat the sshd.service unit file even if the service was inactive.
However:
$ systemctl cat ssh*
would show it only if it was active.
Let's unify the behavior and cat all unit files regardless of a state,
if no state was given explicitly to filter.
(before)
$ build/systemctl list-machines
Need to be root.
$ sudo build/systemctl list-machines
NAME STATE FAILED JOBS
krowka (host) running 0 0
rawhide running 0 0
2 machines listed.
(after)
$ build/systemctl list-machines
NAME STATE FAILED JOBS
krowka (host) running 0 0
rawhide n/a 0 0
2 machines listed.
The output for non-root is missing some bits of information, but we display
what information is missing nicely, and e.g. in the case when no machines are
running at all, or only VMs are running, the unprivileged output would be the
same as the privileged one.
In cgroup v2 we have protection tunables -- currently MemoryLow and
MemoryMin (there will be more in future for other resources, too). The
design of these protection tunables requires not only intermediate
cgroups to propagate protections, but also the units at the leaf of that
resource's operation to accept it (by setting MemoryLow or MemoryMin).
This makes sense from an low-level API design perspective, but it's a
good idea to also have a higher-level abstraction that can, by default,
propagate these resources to children recursively. In this patch, this
happens by having descendants set memory.low to N if their ancestor has
DefaultMemoryLow=N -- assuming they don't set a separate MemoryLow
value.
Any affected unit can opt out of this propagation by manually setting
`MemoryLow` to some value in its unit configuration. A unit can also
stop further propagation by setting `DefaultMemoryLow=` with no
argument. This removes further propagation in the subtree, but has no
effect on the unit itself (for that, use `MemoryLow=0`).
Our use case in production is simplifying the configuration of machines
which heavily rely on memory protection tunables, but currently require
tweaking a huge number of unit files to make that a reality. This
directive makes that significantly less fragile, and decreases the risk
of misconfiguration.
After this patch is merged, I will implement DefaultMemoryMin= using the
same principles.
Some chattrs only work sensible if you set them right after opening a
file for create (think: FS_NOCOW_FL). Others only work when they are
applied when the file is fully written (think: FS_IMMUTABLE_FL). Let's
take that into account when copying files and applying a chattr to them.
Commit d85515edcf changed logic how reboot is
executed. That commit changed behavior to use emergency action reboot code path
to perform the reboot.
This inadvertently broke rebooting with argument:
$ systemctl reboot custom-reason
Restore original behavior so that if reboot service unit similar to
systemd-reboot.service is executed it is possible to override reboot reason
with "systemctl reboot ARG".
When "systemctl reboot ARG" is executed ARG is placed in file
/run/systemd/reboot-param and reboot is issued using logind's Reboot
dbus-service.
If RebootArgument is specified in systemd-reboot.service it takes precedence
over what systemctl sets.
Fixes: #11828
(This also removes support for booting into the EFI firmware setup
without logind. That's because otherwise the non-EFI fallback logind
implements can't work.)
Fixes: #9896
Now only two operations are left. Let's just move this into the caller,
since it should make things simpler, clearer and shorter, in particular
as there's only a single user for this.
find_default_boot_entry() is only used by systemctl.c, and currently
handles one log message in the caller instead of the callee. Let's
simplify that and move it over, too
Run: systemctl show -a dbus.service | grep -E "SELinuxContext|AppArmorProfile|SmackProcessLabel"
Before patch:
SELinuxContext=[unprintable]
AppArmorProfile=[unprintable]
SmackProcessLabel=[unprintable]
After patch:
SELinuxContext=[""|"value of context"]
AppArmorProfile=[""|"value of context"]
SmackProcessLabel=[""|"value of context"]
This 'root' field contains the root path of the partition we found the
snippet in. The 'kernel', 'initrd', 'efi', … fields are relative to this
path.
This becomes particularly useful later when we add support for loading
snippets from both the ESP and XBOOTLDR, but already simplifies the code
for us a bit in systemctl.
This has been irritating me for quite a while: let's prefix these enum
values with a common prefix, like we do for almost all other enums.
No change in behaviour, just some renaming.
https://bugzilla.redhat.com/show_bug.cgi?id=1656639
Using "--" is a trick that is hard to discover. Let's give users a hint:
$ build/systemctl status -.service
build/systemctl: invalid option -- '.'
Hint: to specify units starting with a dash, use "--":
build/systemctl [OPTIONS...] {COMMAND} -- -.service ...
I use program_invocation_name because that's what getopt seems to use.
"::" is used in the option string so that getopt doesn't complain about
a missing argument in case somebody passes "-." as the argument. After all
"." is not a real option.
This splits out a bunch of functions from fileio.c that have to do with
temporary files. Simply to make the header files a bit shorter, and to
group things more nicely.
No code changes, just some rearranging of source files.
Let's not honour PropertiesChanged signals unless the Jobs properties is
empy. After all we shouldn't consider a service finished unless its
state is inactive/failed *and* no job is queued for it anymore.
Similar to the previous commit: in many cases no further fd processing
needs to be done in forked of children before execve() or any of its
flavours are called. In those case we can use FORK_RLIMIT_NOFILE_SAFE
instead.
Whenever we invoke external, foreign code from code that has
RLIMIT_NOFILE's soft limit bumped to high values, revert it to 1024
first. This is a safety precaution for compatibility with programs using
select() which cannot operate with fds > 1024.
This commit adds the call to rlimit_nofile_safe() to all invocations of
exec{v,ve,l}() and friends that either are in code that we know runs
with RLIMIT_NOFILE bumped up (which is PID 1 and all journal code for
starters) or that is part of shared code that might end up there.
The calls are placed as early as we can in processes invoking a flavour
of execve(), but after the last time we do fd manipulations, so that we
can still take benefit of the high fd limits for that.
Previously, "systemctl edit" exclusively used the service manager's
per-unit FragmentPath property to figure out which file to edit, when
operating on a non-template unit. If for some reason loading the unit
file failed entirely though (LoadState=error), then FragmentPath would
be empty, and thus the unit not editable.
Let's fix this, by falling back to client-side unit file searching in
this case.
(Also, various other clean-ups to make the relevant functions follow our
coding style)
Fixes: #9561
Having systemctl disable/unmask remove all symlinks in /etc and /run is
unintuitive and breaks existing use cases.
systemctl should behave symmetrically.
A "systemctl --runtime unmask" should undo a "systemctl --runtime mask"
action.
Say you have a service, which was masked by the admin in /etc.
If you temporarily want to mask the execution of the service (say in a
script), you'd create a runtime mask via "systemctl --runtime mask".
It is is now no longer possible to undo this temporary mask without
nuking the admin changes, unless you start rm'ing files manually.
While it is useful to be able to remove all enablement/mask symlinks in
one go, this should be done via a separate command line switch, like
"systemctl --all unmask".
This reverts commit 4910b35078.
Fixes: #9393
Ideally, coccinelle would strip unnecessary braces too. But I do not see any
option in coccinelle for this, so instead, I edited the patch text using
search&replace to remove the braces. Unfortunately this is not fully automatic,
in particular it didn't deal well with if-else-if-else blocks and ifdefs, so
there is an increased likelikehood be some bugs in such spots.
I also removed part of the patch that coccinelle generated for udev, where we
returns -1 for failure. This should be fixed independently.
Pretty much everything uses just the first argument, and this doesn't make this
common pattern more complicated, but makes it simpler to pass multiple options.
This makes use of rlimit_nofile_bump() in all tools that access the
journal. In some cases this replaces older code to achieve this, and
others we add it in where it was missing.
Let's split exit code handling in two: "r" is only used for errno-style
errors, and "ret" is used for exit() codes. Then, let's use EXIT_SUCCESS
for checking whether the latter is already used.
This way it should always be clear what kind of error we are processing,
and when we propaate one into the other.
Moreover this allows us to drop "q" form all inner loops, avoiding
confusion when to use "q" and when "r" to store received errors.
Fixes: #9704
Actually check the return code from logind_schedule_shutdown() and proceed to
immediate shutdown if that fails. Negative return codes can be returned if
systemctl is compiled without logind support, or if logind otherwise failed
(either too old, disabled/masked, or it is incomplete
systemd-shim/systemd-service implementation).
If logind is not supported, don't try to schedule a shutdown,
immediately poweroff. This is the behavior indicated by the current
message given to the user, but the command is returning an error. I
believe this was broken on this commit:
7f96539d45
This is a bit like the info link in most of GNU's --help texts, but we
don't do info but man pages, and we make them properly clickable on
terminal supporting that, because awesome.
I think it's generally advisable to link up our (brief) --help texts and
our (more comprehensive) man pages a bit, so this should be an easy and
straight-forward way to do it.
This makes it possible to wait until boot is finished without having to poll
for this command repeatedly, instead using the syntax:
$ systemctl is-system-running --wait
Waiting is implemented by waiting for the StartupFinished signal to be posted
on the bus.
Register the matcher before checking for the property to avoid race conditions.
Tested by artificially delaying startup with a oneshot service and calling this
command, checked that it emitted `running` and exited with a 0 return code as
soon as the delay service completed startup.
Also tested that booting to degraded state unblocks the command.
Inserted a delay between getting the property and waiting for the signal and
confirmed this seems to work free of race conditions.
Updated the --help text (under --wait) and the man page to document the new
feature.
the service manager serializes ExecStop= execution data after
ExecStart=, like it makes sense and how it should be expected. However,
systemctl previously would reverse them when deserializing them locally,
and thus show ExecStop= results before ExecStart= results. And that's
confusing. Let's fix that.
This got broken in 9d9dd746d4, because a template
is not a valid unit, so the check for being masked failed. Avoid this by
handling templates specially. Fixes#9554.
Also, this improves 'cat' with masked units:
(before) $ systemctl cat foofoofoo@.service
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
Failed to derive unit name prefix from unit name: Invalid argument
(after) $ build/systemctl cat foofoofoo@.service
In check_triggering_units(), the call to unit_is_masked() is replaced with an
open-coded check. This is a bit unfortunate, but unit_is_masked() now requires
LookupPaths to be initialized, which we don't have or need in this case, so it
seems easiest to just accept this tiny code duplication.
Back in 2012 the project was renamed, see the release notes for v 0.105
[https://cgit.freedesktop.org/polkit/tree/NEWS#n754]. Let's update our
documentation and comments to do the same. Referring to PolicyKit is confusing
to users because at the time the polkit api changed too, and we support the new
version. I updated NEWS too, since all the references to PolicyKit there were
added after the rename.
"PolicyKit" is unchanged in various URLs and method call names.
The kernel added support for a new cgroup memory controller knob memory.min in
bf8d5d52ffe8 ("memcg: introduce memory.min") which was merged during v4.18
merge window.
Add MemoryMin to support memory.min.