This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
seccomp_syscall_resolve_name() can return a mix of positive and negative
(pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR.
This commit lets the syscall filter parser only abort on real parsing
failures, letting libseccomp handle pseudo-syscall number on its own
and allowing proper multiplexed syscalls filtering.
If for whatever reason the file system is "corrupted", we want
to be resilient and ignore the error, as long as we can load the units
from a different place.
Arch bug https://bugs.archlinux.org/task/49547.
A user had an ntfs symlink (essentially a file) instead of a directory after
restoring from backup. We should just ignore that like we would treat a missing
directory, for general resiliency.
We should treat permission errors similarly. For example an unreadable
/usr/local/lib directory would prevent (user) instances of systemd from
loading any units. It seems better to continue.
If a percentage is used, it is taken relative to the installed RAM size. This
should make it easier to write generic unit files that adapt to the local system.
This patch implements the new magic character '!'. By putting '!' in front
of a command, systemd executes it with full privileges ignoring paramters
such as User, Group, SupplementaryGroups, CapabilityBoundingSet,
AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext,
AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies.
Fixes partially https://github.com/systemd/systemd/issues/3414
Related to https://github.com/coreos/rkt/issues/2482
Testing:
1. Create a user 'bob'
2. Create the unit file /etc/systemd/system/exec-perm.service
(You can use the example below)
3. sudo systemctl start ext-perm.service
4. Verify that the commands starting with '!' were not executed as bob,
4.1 Looking to the output of ls -l /tmp/exec-perm
4.2 Each file contains the result of the id command.
`````````````````````````````````````````````````````````````````
[Unit]
Description=ext-perm
[Service]
Type=oneshot
TimeoutStartSec=0
User=bob
ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ;
/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre"
ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ;
!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2"
ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post"
ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload"
ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop"
ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post"
[Install]
WantedBy=multi-user.target]
`````````````````````````````````````````````````````````````````
Recently added cgroup unified hierarchy support uses "max" in configurations
for no upper limit. While consistent with what the kernel uses for no upper
limit, it is inconsistent with what systemd uses for other controllers such as
memory or pids. There's no point in introducing another term. Update cgroup
unified hierarchy support so that "infinity" is the only term that systemd
uses for no upper limit.
On the unified hierarchy, memory controller implements three control knobs -
low, high and max which enables more useable and versatile control over memory
usage. This patch implements support for the three control knobs.
* MemoryLow, MemoryHigh and MemoryMax are added for memory.low, memory.high and
memory.max, respectively.
* As all absolute limits on the unified hierarchy use "max" for no limit, make
memory limit parse functions accept "max" in addition to "infinity" and
document "max" for the new knobs.
* Implement compatibility translation between MemoryMax and MemoryLimit.
v2:
- Fixed missing else's in config_parse_memory_limit().
- Fixed missing newline when writing out drop-ins.
- Coding style updates to use "val > 0" instead of "val".
- Minor updates to documentation.
CGroupBlockIODeviceBandwith is used to keep track of IO bandwidth limits for
legacy cgroup hierarchies. Unlike the unified hierarchy counterpart
CGroupIODeviceLimit, a CGroupBlockIODeviceBandwiddth records either a read or
write limit and has a couple issues.
* There's no way to clear specific config entry.
* When configs are cleared for an IO direction of a unit, the kernel settings
aren't cleared accordingly creating discrepancies.
This patch updates CGroupBlockIODeviceBandwidth so that it behaves similarly to
CGroupIODeviceLimit - each entry records both rbps and wbps limits and is
cleared if both are at default values after kernel settings are updated.
Currently, there are two cgroup IO limits, bandwidth max for read and write,
and they are hard-coded in various places. This is fine for two limits but IO
is expected to grow more limits - low, high and max limits for bandwidth and
IOPS - and hard-coding each limit won't make sense.
This patch replaces hard-coded limits with an array indexed by
CGroupIOLimitType and accompanying string and default value tables so that new
limits can be added trivially.
On the unified hierarchy, blkio controller is renamed to io and the interface
is changed significantly.
* blkio.weight and blkio.weight_device are consolidated into io.weight which
uses the standardized weight range [1, 10000] with 100 as the default value.
* blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max.
Expansion of throttling features is being worked on to support
work-conserving absolute limits (io.low and io.high).
* All stats are consolidated into io.stats.
This patchset adds support for the new interface. As the interface has been
revamped and new features are expected to be added, it seems best to treat it
as a separate controller rather than trying to expand the blkio settings
although we might add automatic translation if only blkio settings are
specified.
* io.weight handling is mostly identical to blkio.weight[_device] handling
except that the weight range is different.
* Both read and write bandwidth settings are consolidated into
CGroupIODeviceLimit which describes all limits applicable to the device.
This makes it less painful to add new limits.
* "max" can be used to specify the maximum limit which is equivalent to no
config for max limits and treated as such. If a given CGroupIODeviceLimit
doesn't contain any non-default configs, the config struct is discarded once
the no limit config is applied to cgroup.
* lookup_blkio_device() is renamed to lookup_block_device().
Signed-off-by: Tejun Heo <htejun@fb.com>
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were
mostly identical and converted from one to the other all the time. The latter
had one more value UNIT_FILE_GLOBAL however.
Let's simplify things, and remove ManagerRunningAs and replace it by
UnitFileScope everywhere, thus making the translation unnecessary. Introduce
two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking
if we are running in one or the user context.
A long time ago – when generators where first introduced – the directories for
them were randomly created via mkdtemp(). This was changed later so that they
use fixed name directories now. Let's make use of this, and add the genrator
dirs to the LookupPaths structure and into the unit file search path maintained
in it. This has the benefit that the generator dirs are now normal part of the
search path for all tools, and thus are shown in "systemctl list-unit-files"
too.
* core/unit: extract checking of stat paths into helper function
The same code was repeated three times.
* core: treat masked files as "unchanged"
systemctl prints the "unit file changed on disk" warning
for a masked unit. I think it's better to print nothing in that
case.
When a masked unit is loaded, set mtime as 0. When checking
if a unit with mtime of 0 needs reload, check that the mask
is still in place.
* test-dnssec: fix build without gcrypt
Also reorder the test functions to follow the way they are called
from main().
systemctl prints the "unit file changed on disk" warning
for a masked unit. I think it's better to print nothing in that
case.
When a masked unit is loaded, set mtime as 0. When checking
if a unit with mtime of 0 needs reload, check that the mask
is still in place.
If first attempt to merge units failed and we are trying to do
merge the other way around and at the same time we are working with
template name, then other unit can't possibly be template, because it is
not possible to have template unit running, only instances of the
template. Thus we need to look for already active instance instead.
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
The setting is hardly useful (since its effect is generally reduced to zero due
to file system caps), and with the advent of ambient caps an actually useful
replacement exists, hence let's get rid of this.
I am pretty sure this was unused and our man page already recommended against
its use, hence this should be a safe thing to remove.
Support for net_cls.class_id through the NetClass= configuration directive
has been added in v227 in preparation for a per-unit packet filter mechanism.
However, it turns out the kernel people have decided to deprecate the net_cls
and net_prio controllers in v2. Tejun provides a comprehensive justification
for this in his commit, which has landed during the merge window for kernel
v4.5:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bd1060a1d671
As we're aiming for full support for the v2 cgroup hierarchy, we can no
longer support this feature. Userspace tool such as nftables are moving over
to setting rules that are specific to the full cgroup path of a task, which
obsoletes these controllers anyway.
This commit removes support for tweaking details in the net_cls controller,
but keeps the NetClass= directive around for legacy compatibility reasons.
This clean-ups timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as
indication for a disabled timeout, use USEC_INFINITY which is in-line with how we do this in the rest of our code
(following the logic that 0 means "no", and USEC_INFINITY means "never").
This also replace all usec_t additions with invocations to usec_add(), so that USEC_INFINITY is properly propagated,
and sd-event considers it has indication for turning off the event source.
This also alters the deserialization of the units to restart timeouts from the time they were originally started from.
Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to
artificially prolonged timeouts if a daemon reload took place.
Finally, a new RuntimeMaxSec= setting is introduced for service units, that specifies a maximum runtime after which a
specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs.
This also simplifies the various xyz_spawn() calls of the various types in that explicit distruction of the timers is
removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn()
calls fail.
Fixes: #2249
This patch adds support for ambient capabilities in service files. The
idea with ambient capabilities is that the execed processes can run with
non-root user and get some inherited capabilities, without having any
need to add the capabilities to the executable file.
You need at least Linux 4.3 to use ambient capabilities. SecureBit
keep-caps is automatically added when you use ambient capabilities and
wish to change the user.
An example system service file might look like this:
[Unit]
Description=Service for testing caps
[Service]
ExecStart=/usr/bin/sleep 10000
User=nobody
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW
After starting the service it has these capabilities:
CapInh: 0000000000003000
CapPrm: 0000000000003000
CapEff: 0000000000003000
CapBnd: 0000003fffffffff
CapAmb: 0000000000003000
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
When parse ExecXXX=, specifiers are not resolved in
config_parse_exec(). Finally, the specifiers are set into unit
properties. So, systemctl shows not resolved speicifier on "Process:"
field.
To set the exec properties well, resolve specifiers before parse the
rvale by unit_full_printf();
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.
The new parser supports:
<value> - specify both limits to the same value
<soft:hard> - specify both limits
the size or time specific suffixes are supported, for example
LimitRTTIME=1sec
LimitAS=4G:16G
The patch introduces parse_rlimit_range() and rlim type (size, sec,
usec, etc.) specific parsers. No code is duplicated now.
The patch also sync docs for DefaultLimitXXX= and LimitXXX=.
References: https://github.com/systemd/systemd/issues/1769
Now we don't support the socket protocol like
sctp and udplite .
This patch add a new config param
SocketProtocol: udplite/sctp
With this now we can configure the protocol as
udplite = IPPROTO_UDPLITE
sctp = IPPROTO_SCTP
Tested with nspawn:
As discussed at systemd.conf 2015 and on also raised on the ML:
http://lists.freedesktop.org/archives/systemd-devel/2015-November/034880.html
This removes the two XyzOverridable= unit dependencies, that were
basically never used, and do not enhance user experience in any way.
Most folks looking for the functionality this provides probably opt for
the "ignore-dependencies" job mode, and that's probably a good idea.
Hence, let's simplify systemd's dependency engine and remove these two
dependency types (and their inverses).
The unit file parser and the dbus property parser will now redirect
the settings/properties to result in an equivalent non-overridable
dependency. In the case of the unit file parser we generate a warning,
to inform the user.
The dbus properties for this unit type stay available on the unit
objects, but they are now hidden from usual introspection and will
always return the empty list when queried.
This should provide enough compatibility for the few unit files that
actually ever made use of this.
3d793d2905 broke parsing of unit file
names that include backslashes, as extract_first_word() strips those.
Fix this, by introducing a new EXTRACT_RETAIN_ESCAPE flag which disables
looking at any flags, thus being compatible with the classic
FOREACH_WORD() behaviour.
This directive allows passing environment variables from the system
manager to spawned services. Variables in the system manager can be set
inside a container by passing `--set-env=...` options to systemd-spawn.
Tested with an on-disk test.service unit. Tested using multiple variable
names on a single line, with an empty setting to clear the current list
of variables, with non-existing variables.
Tested using `systemd-run -p PassEnvironment=VARNAME` to confirm it
works with transient units.
Confirmed that `systemctl show` will display the PassEnvironment
settings.
Checked that man pages are generated correctly.
No regressions in `make check`.
Let's make sure "LimitCPU=30min" can be parsed properly, following the
usual logic how we parse time values. Similar for LimitRTTIME=.
While we are at it, extend a bit on the man page section about resource
limits.
Fixes: #1772
Let's not convert RLIM_INFINITY to "unsigned long long" and then back to
rlim_t, but let's leave it in the right type right-away.
Parse resource limits as 64 bit in all cases, as according to the man
page that's what libc does anyway.
Make sure setting a resource limit to (uint64_t) -1 results in a parsing
error, and isn't implicitly converted to RLIM_INFINITY.
Let's generate a simple error, and that's it. Let's not try to be smart
and record the last word that failed.
Also, let's make sure we don't compare numeric values with 0 by relying
on C's downgrade-to-bool feature, as suggested in CODING_STYLE.
Let's make things more user-friendly and support for example
LimitAS=16G
rather than force users to always use LimitAS=16106127360.
The change is relevant for options:
[Default]Limit{FSIZE,DATA,STACK,CORE,RSS,AS,MEMLOCK,MSGQUEUE}
The patch introduces config_parse_bytes_limit(), it's the same as
config_parse_limit() but uses parse_size() tu support the suffixes.
Addresses: https://github.com/systemd/systemd/issues/1772
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
- Really warn in all error cases, not just some. We need to make sure
that all errors are logged to not confuse the user.
- Explicitly check for EINVAL error code before claiming anything about
invalid escapes, could be ENOMEM after all.
This adds support for naming file descriptors passed using socket
activation. The names are passed in a new $LISTEN_FDNAMES= environment
variable, that matches the existign $LISTEN_FDS= one and contains a
colon-separated list of names.
This also adds support for naming fds submitted to the per-service fd
store using FDNAME= in the sd_notify() message.
This also adds a new FileDescriptorName= setting for socket unit files
to set the name for fds created by socket units.
This also adds a new call sd_listen_fds_with_names(), that is similar to
sd_listen_fds(), but also returns the names of the fds.
systemd-activate gained the new --fdname= switch to specify a name for
testing socket activation.
This is based on #1247 by Maciej Wereski.
Fixes#1247.
- Rely everywhere that we use abs() on the error code passed in anyway,
thus don't need to explicitly negate what we pass in
- Never attach synthetic error number information to log messages. Only
log about errors we *receive* with the error number we got there,
don't log any synthetic error, that don#t even propagate, but just eat
up.
- Be more careful with attaching exactly the error we get, instead of
errno or unrelated errors randomly.
- Fix one occasion where the error number and line number got swapped.
- Make sure we never tape over OOM issues, or inability to resolve
specifiers
If set to ~ the working directory is set to the home directory of the
user configured in User=.
This change also exposes the existing switch for the working directory
that allowed making missing working directories non-fatal.
This also changes "machinectl shell" to make use of this to ensure that
the invoked shell is by default in the user's home directory.
Fixes#1268.
Tested with a dummy service running 'sleep', modifying its CPUAffinity,
restarting the service and checking the ^Cpus_allowed entries in the
/proc/PID/status file.
Some additional files related to single socket may appear in the
filesystem and they should be opened and passed to related service.
This commit adds optional list of file descriptors, which are
dynamically discovered and opened.
Add a new config directive called NetClass= to CGroup enabled units.
Allowed values are positive numbers for fix assignments and "auto" for
picking a free value automatically, for which we need to keep track of
dynamically assigned net class IDs of units. Introduce a hash table for
this, and also record the last ID that was given out, so the allocator
can start its search for the next 'hole' from there. This could
eventually be optimized with something like an irb.
The class IDs up to 65536 are considered reserved and won't be
assigned automatically by systemd. This barrier can be made a config
directive in the future.
Values set in unit files are stored in the CGroupContext of the
unit and considered read-only. The actually assigned number (which
may have been chosen dynamically) is stored in the unit itself and
is guaranteed to remain stable as long as the unit is active.
In the CGroup controller, set the configured CGroup net class to
net_cls.classid. Multiple unit may share the same net class ID,
and those which do are linked together.
Let's stop using the "unsigned long" type for weights/shares, and let's
just use uint64_t for this, as that's what we expose on the bus.
Unify parsers, and always validate the range for these fields.
Correct the default blockio weight to 500, since that's what the kernel
actually uses.
When parsing the weight/shares settings from unit files accept the empty
string as a way to reset the weight/shares value. When getting it via
the bus, uniformly map (uint64_t) -1 to unset.
Open up StartupCPUShares= and StartupBlockIOWeight= to transient units.
This adds support for the new "pids" cgroup controller of 4.3 kernels.
It allows accounting the number of tasks in a cgroup and enforcing
limits on it.
This adds two new setting TasksAccounting= and TasksMax= to each unit,
as well as a gloabl option DefaultTasksAccounting=.
This also updated "cgtop" to optionally make use of the new
kernel-provided accounting.
systemctl has been updated to show the number of tasks for each service
if it is available.
This patch also adds correct support for undoing memory limits for units
using a MemoryLimit=infinity syntax. We do the same for TasksMax= now
and hence keep things in sync here.
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
off_t as 32bit builds though, but still constantly deal with safely
converting from off_t to other types and back for no point.
Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
.nspawn fiels are simple settings files that may accompany container
images and directories and contain settings otherwise passed on the
nspawn command line. This provides an efficient way to attach execution
data directly to containers.
We store the properties for transient units in drop-ins anyway, and
units don't have to have fragment files, hence don't bother with them,
and don't create them.
When generating utmp/wtmp entries, optionally add both LOGIN_PROCESS and
INIT_PROCESS entries or even all three of LOGIN_PROCESS, INIT_PROCESS
and USER_PROCESS entries, instead of just a single INIT_PROCESS entry.
With this change systemd may be used to not only invoke a getty directly
in a SysV-compliant way but alternatively also a login(1) implementation
or even forego getty and login entirely, and invoke arbitrary shells in
a way that they appear in who(1) or w(1).
This is preparation for a later commit that adds a "machinectl shell"
operation to invoke a shell in a container, in a way that is compatible
with who(1) and w(1).
This is so that, when called in a loop, unquote_first_word can
distinguish between reaching the end of a string because it has consumed
all the input before the end, and consuming all the input.
This is important because we later add a flag that allows
char *in = "";
char *out;
unquote_first_word(&in, &out, flags);
To put "" in out, and set in = NULL, so the trailing empty string of the
input can be consumed, and mark that the input has been consumed.
This is consistent with how an empty string works in an ExecStart=
statement. We should not differentiate between an empty string and
whitespace only (since they look the same.)
Update the test case with whitespace only to reflect that the list is
reset.
Tested that `test-unit-file` passes and other test cases are not
affected. Installed the patched systemd binaries on a machine, booted
it, looked for out of the usual behavior but did not find any.
Convert config_parse_exec() from using FOREACH_WORD_QUOTED into a loop
of unquote_first_word.
Loop through the arguments only once (the FOREACH_WORD_QUOTED
implementation did it twice, once to count them and another time to
process and store them.)
Use newly introduced flag UNQUOTE_UNESCAPE_RELAX to preserve
unrecognized escape sequences such as regexps matches such as "\w",
"\d", etc. (Valid escape sequences such as "\s" or "\b" still need an
extra backslash if literals are desired for regexps.)
Differences in behavior:
- Handle ; (command separator) in special, so that only ; on its own is
valid for that purpose, an quoted semicolon ";" or ';' will now behave
as a literal semicolon. This is probably what was initially intended.
- Handle \; (to introduce a literal semicolon) in special, so that only \;
is turned into a semicolon but not \\; or "\\;" or "\;" which are kept
as a literal \; in the output. This is probably what was initially
intended.
Known issues:
- Using an empty string (for example, ExecStartPre=<empty>) will empty
the list and remove the existing commands, but using whitespace only
(for example, ExecStartPre=<spaces>) will not. This is a pre-existing
issue and will be dealt with in a follow up commit.
Tested:
- Unit tests passing. Also `make distcheck` still works as expected.
- Installed it on a local machine and booted with it, checked console
output, systemctl and journalctl output, did not notice any issues
running the patched systemd binaries.
Relevant bug: https://bugs.freedesktop.org/show_bug.cgi?id=90794
The cunescape() helper function used to handle unknown escaping sequences
gracefully by copying them over verbatim.
Commit 527b7a42 ("util: rework cunescape(), improve error handling") added
a flag to make that behavior optional, and changed to default to error out
with -EINVAL otherwise.
However, config_parse_exec(), which is used to parse the
Exec{Start,Stop}{Post,Pre,} directives of unit files, was not changed along
with that commit, which means that directives with improperly escaped
command line strings are no longer parsed.
Relevant bugreports include:
https://bugs.freedesktop.org/show_bug.cgi?id=90794https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787256
Fix this by passing UNESCAPE_RELAX to config_parse_exec() in order to
restore the original behavior.
Given that socket_address_parse() is mostly a "library" call it
shouldn't log on its own, but leave that to its caller.
This patch removes logging from the call in case IPv6 is not available
but and IPv6 address shall be parsed. Instead a new call
socket_address_parse_and_warn() is introduced which first invokes
socket_address_parse() and then logs if necessary.
This should fix "make check" on ipv6-less kernels:
http://lists.freedesktop.org/archives/systemd-devel/2015-April/031385.html
An Exec*= line with whitespace after modifiers, like
ExecStart=- /bin/true
is considered to have an empty command path. This is as specified, but causes
systemd to crash with
Assertion 'skip < l' failed at ../src/core/load-fragment.c:607, function config_parse_exec(). Aborting.
Aborted (core dumped)
Fix this by logging an error instead and ignoring the invalid line.
Add corresponding test cases. Also add a test case for a completely empty value
which resets the command list.
https://launchpad.net/bugs/1454173
It's primarily just a property of the Manager object after all, and we
try to refer to PID 1 as "manager" instead of "systemd", hence let's to
stick to this here too.
A variety of changes:
- Make sure all our calls distuingish OOM from other errors if OOM is
not the only error possible.
- Be much stricter when parsing escaped paths, do not accept trailing or
leading escaped slashes.
- Change unit validation to take a bit mask for allowing plain names,
instance names or template names or an combination thereof.
- Refuse manipulating invalid unit name
Change cunescape() to return a normal error code, so that we can
distuingish OOM errors from parse errors.
This also adds a flags parameter to control whether "relaxed" or normal
parsing shall be done. If set no parse failures are generated, and the
only reason why cunescape() can fail is OOM.
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
The handling of the command name and other arguments is unified. This
simplifies things and should make them more predictable for users.
Incidentally, this makes ExecStart handling match the .desktop file
specification, apart for the requirment for an absolute path.
https://bugs.freedesktop.org/show_bug.cgi?id=86171
Also, rename filename_is_safe() to filename_is_valid(), since it
actually does a full validation for what the kernel will accept as file
name, it's not just a heuristic.
In service file, if the file has some of special SMACK label in
ExecStart= and systemd has no permission for the special SMACK label
then permission error will occurred. To resolve this, systemd should
be able to set its SMACK label to something accessible of ExecStart=.
So introduce new SmackProcessLabel. If label is specified with
SmackProcessLabel= then the child systemd will set its label to
that. To successfully execute the ExecStart=, accessible label should
be specified with SmackProcessLabel=.
Additionally, by SMACK policy, if the file in ExecStart= has no
SMACK64EXEC then the executed process will have given label by
SmackProcessLabel=. But if the file has SMACK64EXEC then the
SMACK64EXEC label will be overridden.
[zj: reword man page]