This patch adds support for ambient capabilities in service files. The
idea with ambient capabilities is that the execed processes can run with
non-root user and get some inherited capabilities, without having any
need to add the capabilities to the executable file.
You need at least Linux 4.3 to use ambient capabilities. SecureBit
keep-caps is automatically added when you use ambient capabilities and
wish to change the user.
An example system service file might look like this:
[Unit]
Description=Service for testing caps
[Service]
ExecStart=/usr/bin/sleep 10000
User=nobody
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW
After starting the service it has these capabilities:
CapInh: 0000000000003000
CapPrm: 0000000000003000
CapEff: 0000000000003000
CapBnd: 0000003fffffffff
CapAmb: 0000000000003000
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
When parse ExecXXX=, specifiers are not resolved in
config_parse_exec(). Finally, the specifiers are set into unit
properties. So, systemctl shows not resolved speicifier on "Process:"
field.
To set the exec properties well, resolve specifiers before parse the
rvale by unit_full_printf();
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.
The new parser supports:
<value> - specify both limits to the same value
<soft:hard> - specify both limits
the size or time specific suffixes are supported, for example
LimitRTTIME=1sec
LimitAS=4G:16G
The patch introduces parse_rlimit_range() and rlim type (size, sec,
usec, etc.) specific parsers. No code is duplicated now.
The patch also sync docs for DefaultLimitXXX= and LimitXXX=.
References: https://github.com/systemd/systemd/issues/1769
Now we don't support the socket protocol like
sctp and udplite .
This patch add a new config param
SocketProtocol: udplite/sctp
With this now we can configure the protocol as
udplite = IPPROTO_UDPLITE
sctp = IPPROTO_SCTP
Tested with nspawn:
As discussed at systemd.conf 2015 and on also raised on the ML:
http://lists.freedesktop.org/archives/systemd-devel/2015-November/034880.html
This removes the two XyzOverridable= unit dependencies, that were
basically never used, and do not enhance user experience in any way.
Most folks looking for the functionality this provides probably opt for
the "ignore-dependencies" job mode, and that's probably a good idea.
Hence, let's simplify systemd's dependency engine and remove these two
dependency types (and their inverses).
The unit file parser and the dbus property parser will now redirect
the settings/properties to result in an equivalent non-overridable
dependency. In the case of the unit file parser we generate a warning,
to inform the user.
The dbus properties for this unit type stay available on the unit
objects, but they are now hidden from usual introspection and will
always return the empty list when queried.
This should provide enough compatibility for the few unit files that
actually ever made use of this.
3d793d2905 broke parsing of unit file
names that include backslashes, as extract_first_word() strips those.
Fix this, by introducing a new EXTRACT_RETAIN_ESCAPE flag which disables
looking at any flags, thus being compatible with the classic
FOREACH_WORD() behaviour.
This directive allows passing environment variables from the system
manager to spawned services. Variables in the system manager can be set
inside a container by passing `--set-env=...` options to systemd-spawn.
Tested with an on-disk test.service unit. Tested using multiple variable
names on a single line, with an empty setting to clear the current list
of variables, with non-existing variables.
Tested using `systemd-run -p PassEnvironment=VARNAME` to confirm it
works with transient units.
Confirmed that `systemctl show` will display the PassEnvironment
settings.
Checked that man pages are generated correctly.
No regressions in `make check`.
Let's make sure "LimitCPU=30min" can be parsed properly, following the
usual logic how we parse time values. Similar for LimitRTTIME=.
While we are at it, extend a bit on the man page section about resource
limits.
Fixes: #1772
Let's not convert RLIM_INFINITY to "unsigned long long" and then back to
rlim_t, but let's leave it in the right type right-away.
Parse resource limits as 64 bit in all cases, as according to the man
page that's what libc does anyway.
Make sure setting a resource limit to (uint64_t) -1 results in a parsing
error, and isn't implicitly converted to RLIM_INFINITY.
Let's generate a simple error, and that's it. Let's not try to be smart
and record the last word that failed.
Also, let's make sure we don't compare numeric values with 0 by relying
on C's downgrade-to-bool feature, as suggested in CODING_STYLE.
Let's make things more user-friendly and support for example
LimitAS=16G
rather than force users to always use LimitAS=16106127360.
The change is relevant for options:
[Default]Limit{FSIZE,DATA,STACK,CORE,RSS,AS,MEMLOCK,MSGQUEUE}
The patch introduces config_parse_bytes_limit(), it's the same as
config_parse_limit() but uses parse_size() tu support the suffixes.
Addresses: https://github.com/systemd/systemd/issues/1772
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
- Really warn in all error cases, not just some. We need to make sure
that all errors are logged to not confuse the user.
- Explicitly check for EINVAL error code before claiming anything about
invalid escapes, could be ENOMEM after all.
This adds support for naming file descriptors passed using socket
activation. The names are passed in a new $LISTEN_FDNAMES= environment
variable, that matches the existign $LISTEN_FDS= one and contains a
colon-separated list of names.
This also adds support for naming fds submitted to the per-service fd
store using FDNAME= in the sd_notify() message.
This also adds a new FileDescriptorName= setting for socket unit files
to set the name for fds created by socket units.
This also adds a new call sd_listen_fds_with_names(), that is similar to
sd_listen_fds(), but also returns the names of the fds.
systemd-activate gained the new --fdname= switch to specify a name for
testing socket activation.
This is based on #1247 by Maciej Wereski.
Fixes#1247.
- Rely everywhere that we use abs() on the error code passed in anyway,
thus don't need to explicitly negate what we pass in
- Never attach synthetic error number information to log messages. Only
log about errors we *receive* with the error number we got there,
don't log any synthetic error, that don#t even propagate, but just eat
up.
- Be more careful with attaching exactly the error we get, instead of
errno or unrelated errors randomly.
- Fix one occasion where the error number and line number got swapped.
- Make sure we never tape over OOM issues, or inability to resolve
specifiers
If set to ~ the working directory is set to the home directory of the
user configured in User=.
This change also exposes the existing switch for the working directory
that allowed making missing working directories non-fatal.
This also changes "machinectl shell" to make use of this to ensure that
the invoked shell is by default in the user's home directory.
Fixes#1268.
Tested with a dummy service running 'sleep', modifying its CPUAffinity,
restarting the service and checking the ^Cpus_allowed entries in the
/proc/PID/status file.
Some additional files related to single socket may appear in the
filesystem and they should be opened and passed to related service.
This commit adds optional list of file descriptors, which are
dynamically discovered and opened.
Add a new config directive called NetClass= to CGroup enabled units.
Allowed values are positive numbers for fix assignments and "auto" for
picking a free value automatically, for which we need to keep track of
dynamically assigned net class IDs of units. Introduce a hash table for
this, and also record the last ID that was given out, so the allocator
can start its search for the next 'hole' from there. This could
eventually be optimized with something like an irb.
The class IDs up to 65536 are considered reserved and won't be
assigned automatically by systemd. This barrier can be made a config
directive in the future.
Values set in unit files are stored in the CGroupContext of the
unit and considered read-only. The actually assigned number (which
may have been chosen dynamically) is stored in the unit itself and
is guaranteed to remain stable as long as the unit is active.
In the CGroup controller, set the configured CGroup net class to
net_cls.classid. Multiple unit may share the same net class ID,
and those which do are linked together.
Let's stop using the "unsigned long" type for weights/shares, and let's
just use uint64_t for this, as that's what we expose on the bus.
Unify parsers, and always validate the range for these fields.
Correct the default blockio weight to 500, since that's what the kernel
actually uses.
When parsing the weight/shares settings from unit files accept the empty
string as a way to reset the weight/shares value. When getting it via
the bus, uniformly map (uint64_t) -1 to unset.
Open up StartupCPUShares= and StartupBlockIOWeight= to transient units.
This adds support for the new "pids" cgroup controller of 4.3 kernels.
It allows accounting the number of tasks in a cgroup and enforcing
limits on it.
This adds two new setting TasksAccounting= and TasksMax= to each unit,
as well as a gloabl option DefaultTasksAccounting=.
This also updated "cgtop" to optionally make use of the new
kernel-provided accounting.
systemctl has been updated to show the number of tasks for each service
if it is available.
This patch also adds correct support for undoing memory limits for units
using a MemoryLimit=infinity syntax. We do the same for TasksMax= now
and hence keep things in sync here.
off_t is a really weird type as it is usually 64bit these days (at least
in sane programs), but could theoretically be 32bit. We don't support
off_t as 32bit builds though, but still constantly deal with safely
converting from off_t to other types and back for no point.
Hence, never use the type anymore. Always use uint64_t instead. This has
various benefits, including that we can expose these values directly as
D-Bus properties, and also that the values parse the same in all cases.
.nspawn fiels are simple settings files that may accompany container
images and directories and contain settings otherwise passed on the
nspawn command line. This provides an efficient way to attach execution
data directly to containers.
We store the properties for transient units in drop-ins anyway, and
units don't have to have fragment files, hence don't bother with them,
and don't create them.
When generating utmp/wtmp entries, optionally add both LOGIN_PROCESS and
INIT_PROCESS entries or even all three of LOGIN_PROCESS, INIT_PROCESS
and USER_PROCESS entries, instead of just a single INIT_PROCESS entry.
With this change systemd may be used to not only invoke a getty directly
in a SysV-compliant way but alternatively also a login(1) implementation
or even forego getty and login entirely, and invoke arbitrary shells in
a way that they appear in who(1) or w(1).
This is preparation for a later commit that adds a "machinectl shell"
operation to invoke a shell in a container, in a way that is compatible
with who(1) and w(1).
This is so that, when called in a loop, unquote_first_word can
distinguish between reaching the end of a string because it has consumed
all the input before the end, and consuming all the input.
This is important because we later add a flag that allows
char *in = "";
char *out;
unquote_first_word(&in, &out, flags);
To put "" in out, and set in = NULL, so the trailing empty string of the
input can be consumed, and mark that the input has been consumed.
This is consistent with how an empty string works in an ExecStart=
statement. We should not differentiate between an empty string and
whitespace only (since they look the same.)
Update the test case with whitespace only to reflect that the list is
reset.
Tested that `test-unit-file` passes and other test cases are not
affected. Installed the patched systemd binaries on a machine, booted
it, looked for out of the usual behavior but did not find any.
Convert config_parse_exec() from using FOREACH_WORD_QUOTED into a loop
of unquote_first_word.
Loop through the arguments only once (the FOREACH_WORD_QUOTED
implementation did it twice, once to count them and another time to
process and store them.)
Use newly introduced flag UNQUOTE_UNESCAPE_RELAX to preserve
unrecognized escape sequences such as regexps matches such as "\w",
"\d", etc. (Valid escape sequences such as "\s" or "\b" still need an
extra backslash if literals are desired for regexps.)
Differences in behavior:
- Handle ; (command separator) in special, so that only ; on its own is
valid for that purpose, an quoted semicolon ";" or ';' will now behave
as a literal semicolon. This is probably what was initially intended.
- Handle \; (to introduce a literal semicolon) in special, so that only \;
is turned into a semicolon but not \\; or "\\;" or "\;" which are kept
as a literal \; in the output. This is probably what was initially
intended.
Known issues:
- Using an empty string (for example, ExecStartPre=<empty>) will empty
the list and remove the existing commands, but using whitespace only
(for example, ExecStartPre=<spaces>) will not. This is a pre-existing
issue and will be dealt with in a follow up commit.
Tested:
- Unit tests passing. Also `make distcheck` still works as expected.
- Installed it on a local machine and booted with it, checked console
output, systemctl and journalctl output, did not notice any issues
running the patched systemd binaries.
Relevant bug: https://bugs.freedesktop.org/show_bug.cgi?id=90794
The cunescape() helper function used to handle unknown escaping sequences
gracefully by copying them over verbatim.
Commit 527b7a42 ("util: rework cunescape(), improve error handling") added
a flag to make that behavior optional, and changed to default to error out
with -EINVAL otherwise.
However, config_parse_exec(), which is used to parse the
Exec{Start,Stop}{Post,Pre,} directives of unit files, was not changed along
with that commit, which means that directives with improperly escaped
command line strings are no longer parsed.
Relevant bugreports include:
https://bugs.freedesktop.org/show_bug.cgi?id=90794https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787256
Fix this by passing UNESCAPE_RELAX to config_parse_exec() in order to
restore the original behavior.
Given that socket_address_parse() is mostly a "library" call it
shouldn't log on its own, but leave that to its caller.
This patch removes logging from the call in case IPv6 is not available
but and IPv6 address shall be parsed. Instead a new call
socket_address_parse_and_warn() is introduced which first invokes
socket_address_parse() and then logs if necessary.
This should fix "make check" on ipv6-less kernels:
http://lists.freedesktop.org/archives/systemd-devel/2015-April/031385.html
An Exec*= line with whitespace after modifiers, like
ExecStart=- /bin/true
is considered to have an empty command path. This is as specified, but causes
systemd to crash with
Assertion 'skip < l' failed at ../src/core/load-fragment.c:607, function config_parse_exec(). Aborting.
Aborted (core dumped)
Fix this by logging an error instead and ignoring the invalid line.
Add corresponding test cases. Also add a test case for a completely empty value
which resets the command list.
https://launchpad.net/bugs/1454173
It's primarily just a property of the Manager object after all, and we
try to refer to PID 1 as "manager" instead of "systemd", hence let's to
stick to this here too.
A variety of changes:
- Make sure all our calls distuingish OOM from other errors if OOM is
not the only error possible.
- Be much stricter when parsing escaped paths, do not accept trailing or
leading escaped slashes.
- Change unit validation to take a bit mask for allowing plain names,
instance names or template names or an combination thereof.
- Refuse manipulating invalid unit name
Change cunescape() to return a normal error code, so that we can
distuingish OOM errors from parse errors.
This also adds a flags parameter to control whether "relaxed" or normal
parsing shall be done. If set no parse failures are generated, and the
only reason why cunescape() can fail is OOM.
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
The handling of the command name and other arguments is unified. This
simplifies things and should make them more predictable for users.
Incidentally, this makes ExecStart handling match the .desktop file
specification, apart for the requirment for an absolute path.
https://bugs.freedesktop.org/show_bug.cgi?id=86171
Also, rename filename_is_safe() to filename_is_valid(), since it
actually does a full validation for what the kernel will accept as file
name, it's not just a heuristic.
In service file, if the file has some of special SMACK label in
ExecStart= and systemd has no permission for the special SMACK label
then permission error will occurred. To resolve this, systemd should
be able to set its SMACK label to something accessible of ExecStart=.
So introduce new SmackProcessLabel. If label is specified with
SmackProcessLabel= then the child systemd will set its label to
that. To successfully execute the ExecStart=, accessible label should
be specified with SmackProcessLabel=.
Additionally, by SMACK policy, if the file in ExecStart= has no
SMACK64EXEC then the executed process will have given label by
SmackProcessLabel=. But if the file has SMACK64EXEC then the
SMACK64EXEC label will be overridden.
[zj: reword man page]
In order to make object destruction easier (in particular in combination
with _cleanup_) we usually make destructors deal with NULL objects as
NOPs. Change the calendar spec destructor to follow the same scheme.
It is redundant to store 'hash' and 'compare' function pointers in
struct Hashmap separately. The functions always comprise a pair.
Store a single pointer to struct hash_ops instead.
systemd keeps hundreds of hashmaps, so this saves a little bit of
memory.
Add a new directive called BusPolicy to define custom endpoint policies. If
one such directive is given, an endpoint object in the service's ExecContext is
created and the given policy is added to it.
String which ended in an unfinished quote were accepted, potentially
with bad memory accesses.
Reject anything which ends in a unfished quote, or contains
non-whitespace characters right after the closing quote.
_FOREACH_WORD now returns the invalid character in *state. But this return
value is not checked anywhere yet.
Also, make 'word' and 'state' variables const pointers, and rename 'w'
to 'word' in various places. Things are easier to read if the same name
is used consistently.
mbiebl_> am I correct that something like this doesn't work
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-passwd "Unlock EncFS"'
mbiebl_> systemd seems to strip of the quotes
mbiebl_> systemctl status shows
mbiebl_> ExecStart=/usr/bin/encfs --extpass='/bin/systemd-ask-password Unlock EncFS $RootDir $MountPoint
mbiebl_> which is pretty weird
Special care is needed so that we get an error message if the
file failed to parse, but not when it is missing. To avoid duplicating
the same error check in every caller, add an additional 'warn' boolean
to tell config_parse whether a message should be issued.
This makes things both shorter and more robust wrt. to error reporting.
Also, rename ProtectedHome= to ProtectHome=, to simplify things a bit.
With this in place we now have two neat options ProtectSystem= and
ProtectHome= for protecting the OS itself (and optionally its
configuration), and for protecting the user's data.
With Symlinks= we can manage one or more symlinks to AF_UNIX or FIFO
nodes in the file system, with the same lifecycle as the socket itself.
This has two benefits: first, this allows us to remove /dev/log and
/dev/initctl from /dev, thus leaving only symlinks, device nodes and
directories in the /dev tree. More importantly however, this allows us
to move /dev/log out of /dev, while still making it accessible there, so
that PrivateDevices= can provide /dev/log too.
ReadOnlySystem= uses fs namespaces to mount /usr and /boot read-only for
a service.
ProtectedHome= uses fs namespaces to mount /home and /run/user
inaccessible or read-only for a service.
This patch also enables these settings for all our long-running services.
Together they should be good building block for a minimal service
sandbox, removing the ability for services to modify the operating
system or access the user's private data.
Only accept cpu quota values in percentages, get rid of period
definition.
It's not clear whether the CFS period controllable per-cgroup even has a
future in the kernel, hence let's simplify all this, hardcode the period
to 100ms and only accept percentage based quota values.
Similar to CPUShares= and BlockIOWeight= respectively. However only
assign the specified weight during startup. Each control group
attribute is re-assigned as weight by CPUShares=weight and
BlockIOWeight=weight after startup. If not CPUShares= or
BlockIOWeight= be specified, then the attribute is re-assigned to each
default attribute value. (default cpu.shares=1024, blkio.weight=1000)
If only CPUShares=weight or BlockIOWeight=weight be specified, then
that implies StartupCPUShares=weight and StartupBlockIOWeight=weight.
tcpwrap is legacy code, that is barely maintained upstream. It's APIs
are awful, and the feature set it exposes (such as DNS and IDENT
access control) questionnable. We should not support this natively in
systemd.
Hence, let's remove the code. If people want to continue making use of
this, they can do so by plugging in "tcpd" for the processes they start.
With that scheme things are as well or badly supported as they were from
traditional inetd, hence no functionality is really lost.
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
There are three directives to specify bus name polices in .busname
files:
* AllowUser [username] [access]
* AllowGroup [groupname] [access]
* AllowWorld [access]
Where [access] is one of
* 'see': The user/group/world is allowed to see a name on the bus
* 'talk': The user/group/world is allowed to talk to a name
* 'own': The user/group/world is allowed to own a name
There is no user added yet in this commit.
This new unit settings allows restricting which address families are
available to processes. This is an effective way to minimize the attack
surface of services, by turning off entire network stacks for them.
This is based on seccomp, and does not work on x86-32, since seccomp
cannot filter socketcall() syscalls on that platform.
According to Wikipedia it is customary to specify hardware metrics and
transfer speeds to the basis 1000 (SI decimal), while software metrics
and physical volatile memory (RAM) sizes to the basis 1024 (IEC binary).
So far we specified everything in IEC, let's fix that and be more
true to what's otherwise customary. Since we don't want to parse "Mi"
instead of "M" we document each time what the context used is.
This permit to switch to a specific apparmor profile when starting a daemon. This
will result in a non operation if apparmor is disabled.
It also add a new build requirement on libapparmor for using this feature.
This is useful to prohibit execution of non-native processes on systems,
for example 32bit binaries on 64bit systems, this lowering the attack
service on incorrect syscall and ioctl 32→64bit mappings.
- Allow configuration of an errno error to return from blacklisted
syscalls, instead of immediately terminating a process.
- Fix parsing logic when libseccomp support is turned off
- Only keep the actual syscall set in the ExecContext, and generate the
string version only on demand.
Previously we'd open the connection in the originating namespace, which
meant most peers of the bus would not be able to make sense of the
PID/UID/... identity of us since we didn't exist in the namespace they
run in. However they require this identity for privilege decisions,
hence disallowing access to anything from the host.
Instead, when connecting to a container, create a temporary subprocess,
make it join the container's namespace and then connect from there to
the kdbus instance. This is similar to how we do it for socket
conections already.
THis also unifies the namespacing code used by machinectl and the bus
APIs.
The only problem is that libgen.h #defines basename to point to it's
own broken implementation instead of the GNU one. This can be fixed
by #undefining basename.
Pass on the line on which a section was decleared to the parsers, so they
can distinguish between multiple sections (if they chose to). Currently
no parsers take advantage of this, but a follow-up patch will do that
to distinguish
[Address]
Address=192.168.0.1/24
Label=one
[Address]
Address=192.168.0.2/24
Label=two
from
[Address]
Address=192.168.0.1/24
Label=one
Address=192.168.0.2/24
Label=two
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.
This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:
- Synthesizing of "Disconnected" messages when bus connections are
severed.
- Support for attaching multiple vtables for the same interface on the
same path.
This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.
As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
We now treat passno as boleans in the generators, and don't need this any more. fsck itself
is able to sequentialize checks on the same local media, so in the common case the ordering
is redundant.
It is still possible to force an order by using .d fragments, in case that is desired.
The code was actually safe, because b should
never be null, because if rvalue is empty, a different
branch is taken. But we *do* check for NULL in the
loop above, so it's better to also check here for symmetry.
Previously to automatically create dependencies between mount units we
matched every mount unit agains all others resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.
This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.
This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.
With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.
https://bugs.freedesktop.org/show_bug.cgi?id=69740
The cgroup attribute memory.soft_limit_in_bytes is unlikely to stay
around in the kernel for good, so let's not expose it for now. We can
readd something like it later when the kernel guys decided on a final
API for this.
Previously the specifier calls could only indicate OOM by returning
NULL. With this change they will return negative errno-style error codes
like everything else.
BlockIOReadBandwidth and BlockIOWriteBandwidth both use
config_parse_blockio_bandwidth to set up CGroupBlockIODeviceBandwidth,
We should set the read value based on the left values
in config files.
Transient units can be created via the bus API. They are configured via
the method call parameters rather than on-disk files. They are subject
to normal GC. Transient units currently may only be created for
services (however, we will extend this), and currently only ExecStart=
and the cgroup parameters can be configured (also to be extended).
Transient units require a unique name, that previously had no
configuration file on disk.
A tool systemd-run is added that makes use of this functionality to run
arbitrary command lines as transient services:
$ systemd-run /bin/ping www.heise.de
Will cause systemd to create a new transient service and run ping in it.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
This complements existing functionality of setting variables
through 'systemctl set-environment', the kernel command line,
and through normal environment variables for systemd in session
mode.
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchally by the user.
Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.
Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
Instead of having explicit type-specific callbacks that inform the
triggering unit when a triggered unit changes state, make this generic
so that state changes are forwarded betwee any triggered and triggering
unit.
Also, get rid of UnitRef references from automount, timer, path units,
to the units they trigger and rely exclsuively on UNIT_TRIGGER type
dendencies.
The information about the unit for which files are being parsed
is passed all the way down. This way messages land in the journal
with proper UNIT=... or USER_UNIT=... attribution.
'systemctl status' and 'journalctl -u' not displaying those messages
has been a source of confusion for users, since the journal entry for
a misspelt setting was often logged quite a bit earlier than the
failure to start a unit.
Based-on-a-patch-by: Oleksii Shevchuk <alxchk@gmail.com>
Internally we store all time values in usec_t, however parse_usec()
actually was used mostly to parse values in seconds (unless explicit
units were specified to define a different unit). Hence, be clear about
this and name the function about what we pass into it, not what we get
out of it.
Previously, it would set all caps, but it should drop them all, anything
else makes little sense.
Also, document that this works as it does, and what to do in order to
assign all caps to the bounding set.
https://bugzilla.redhat.com/show_bug.cgi?id=914705
This introduces a new static list of known attributes and their special
semantics. This means that cgroup attribute values can now be
automatically translated from user to kernel notation for command line
set settings, too.
This also adds proper support for multi-line attributes.
In the x32 ABI, syscall numbers start at 0x40000000. Mask that bit on
x32 for lookups in the syscall_names array and syscall_filter and ensure
that syscall.h is parsed correctly.
[zj: added SYSCALL_TO_INDEX, INDEX_TO_SYSCALL macros.]
strncmp() could be used with size bigger then the size of the string,
because MAX was used instead of MIN.
If failing, print just the offending mount flag.
Whenever a message fails, mention the offending word, instead
of just giving the whole line. If one bad word causes just this
word to be rejected, print only the word. If one bad word causes
the whole line to be rejected, print the whole line too.
https://bugs.freedesktop.org/show_bug.cgi?id=56874
A service that only sets the scheduling policy to round-robin
fails to be started. This is because the cpu_sched_priority is
initialized to 0 and is not adjusted when the policy is changed.
Clamp the cpu_sched_priority when the scheduler policy is set. Use
the current policy to validate the new priority.
Change the manual page to state that the given range only applies
to the real-time scheduling policies.
Add a testcase that verifies this change:
$ make test-sched-prio; ./test-sched-prio
[test/sched_idle_bad.service:6] CPU scheduling priority is out of range, ignoring: 1
[test/sched_rr_bad.service:7] CPU scheduling priority is out of range, ignoring: 0
[test/sched_rr_bad.service:8] CPU scheduling priority is out of range, ignoring: 100
The behaviour of the common name##_from_string conversion is surprising.
It accepts not only the strings from name##_table but also any number
that falls within the range of the table. The order of items in most of
our tables is an internal affair. It should not be visible to the user.
I know of a case where the surprising numeric conversion leads to a crash.
We will allow the direct numeric conversion only for the tables where the
mapping of strings to numeric values has an external meaning. This holds
for the following lookup tables:
- netlink_family, ioprio_class, ip_tos, sched_policy - their numeric
values are stable as they are defined by the Linux kernel interface.
- log_level, log_facility_unshifted - the well-known syslog interface.
We allow the user to use numeric values whose string names systemd does
not know. For instance, the user may want to test a new kernel featuring
a scheduling policy that did not exist when his systemd version was
released. A slightly unpleasant effect of this is that the
name##_to_string conversion cannot return pointers to constant strings
anymore. The strings have to be allocated on demand and freed by the
caller.
Add specifier expansion to Path and String conditions.
Specifier expansion for conditions will help create instance
and user session units by allowing us to template conditions
based on the instance or user session parameters.
An example would be a system-wide user session service file
that conditionally runs based on whether a user has the
service configured through a configuration file in ~/.config/.
There is no point in clearing the bits of a "struct stat" when the very
next statement just calls stat or fstat to fill in that same memory.
[zj: two more places]
Names= is a source of errors, simply because alias names specified like
this only become relevant after a unit has been loaded but cannot be
used to load a unit.
Let's get rid of the confusion and drop this field. To establish alias
names peope should use symlinks, which have the the benefit of being
useful as key to load a unit, even though they are not taken into
account if unit names are listed but they haven't been explicitly
referenced before.
This also ensures that caps dropped from the bounding set are also
dropped from the inheritable set, to be extra-secure. Usually that should
change very little though as the inheritable set is empty for all our uses
anyway.
UnitPath= is also writable via native units and may be used by generators
to clarify from which file a unit is generated. This patch also hooks up
the cryptsetup and fstab generators to set UnitPath= accordingly.
This should help making the boot process a bit easier to explore and
understand for the administrator. The simple idea is that "systemctl
status" now shows a link to documentation alongside the other status and
decriptionary information of a service.
This patch adds the necessary fields to all our shipped units if we have
proper documentation for them.
RequiresMountsFor= is a shortcut for adding requires and after
dependencies to all mount units neeed for the specified paths.
This solves a couple of issues regarding dep loop cycles for encrypted
swap.
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.