The files are named too generically, so that they might conflict with
the upstream project headers. Hence, let's add a "-util" suffix, to
clarify that this are just our utility headers and not any official
upstream headers.
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
Before, we'd always reset acquired terminals, which is not really
desired, as we expose a setting TTYReset= which is supposed to control
whether the TTY is reset or not. Previously that setting would only
enable a second resetting of the TTY, which is of course pointless...
Hence, move the implicit resetting out of acquire_terminal() and make
the callers do it if they need it.
When starting a transient service, allow setting stdin/stdout/stderr fds
for it, by passing them in via the bus.
This also simplifies some of the serialization code for units.
This adds support for caching harddisk passwords in the kernel keyring
if it is available, thus supporting caching without Plymouth being
around.
This is also useful for hooking up "gdm-auto-login" with the collected
boot-time harddisk password, in order to support gnome keyring
passphrase unlocking via the HDD password, if it is the same.
Any passwords added to the kernel keyring this way have a timeout of
2.5min at which time they are purged from the kernel.
This adds support for naming file descriptors passed using socket
activation. The names are passed in a new $LISTEN_FDNAMES= environment
variable, that matches the existign $LISTEN_FDS= one and contains a
colon-separated list of names.
This also adds support for naming fds submitted to the per-service fd
store using FDNAME= in the sd_notify() message.
This also adds a new FileDescriptorName= setting for socket unit files
to set the name for fds created by socket units.
This also adds a new call sd_listen_fds_with_names(), that is similar to
sd_listen_fds(), but also returns the names of the fds.
systemd-activate gained the new --fdname= switch to specify a name for
testing socket activation.
This is based on #1247 by Maciej Wereski.
Fixes#1247.
If set to ~ the working directory is set to the home directory of the
user configured in User=.
This change also exposes the existing switch for the working directory
that allowed making missing working directories non-fatal.
This also changes "machinectl shell" to make use of this to ensure that
the invoked shell is by default in the user's home directory.
Fixes#1268.
This cleans up exec_child() function by moving mac_smack_apply_pid()
and setup_pam() to the same condition block, since both of them have
the same condition (i.e params->apply_permissions). It improves
readability without changing its operation.
When 'SmackProcessLabel=' is used in user@.service file, all processes
launched in systemd user session should be labeled as the designated name
of 'SmackProcessLabel' directive. However, if systemd has its own smack
label using '--with-smack-run-label' configuration, '(sd-pam)' is
labeled as the specific name of '--with-smack-run-label'. If
'SmackProcessLabel=' is used in user@.service file without
'--with-smack-run-label' configuration, (sd-pam) is labeled as "_" since
systemd (i.e. pid=1) is labeled as "_".
This is mainly because setup_pam() function is called before applying
smack label to child process. This patch fixes it by calling setup_pam()
after setting the smack label.
If we spawn a unit with a non-empty 'PAMName=', we fork off a
child-process _inside_ the unit, known as '(sd-pam)', which watches the
session. It waits for the main-process to exit and then finishes it via
pam_close_session(3).
However, the '(sd-pam)' setup is highly asynchronous. There is no
guarantee that process gets spawned before we finish the unit setup.
Therefore, there might be a root-owned process inside of the cgroup of
the unit, thus causing cg_migrate() to error-out with EPERM.
This patch makes setup_pam() synchronous and waits for the '(sd-pam)'
setup to finish before continuing. This guarantees that setresuid(2) was
at least tried before we continue with the child setup of the real unit.
Note that if setresuid(2) fails, we already warn loudly about it. You
really must make sure that you own the passed user if using 'PAMName='.
It seems very plausible to rely on that assumption.
When Group is set in the unit, the runtime directories are owned by
this group and not the default group of the user (same for cgroup paths
and standard outputs)
Fix#1231
Turns this:
r = -errno;
log_error_errno(errno, "foo");
into this:
r = log_error_errno(errno, "foo");
and this:
r = log_error_errno(errno, "foo");
return r;
into this:
return log_error_errno(errno, "foo");
When generating utmp/wtmp entries, optionally add both LOGIN_PROCESS and
INIT_PROCESS entries or even all three of LOGIN_PROCESS, INIT_PROCESS
and USER_PROCESS entries, instead of just a single INIT_PROCESS entry.
With this change systemd may be used to not only invoke a getty directly
in a SysV-compliant way but alternatively also a login(1) implementation
or even forego getty and login entirely, and invoke arbitrary shells in
a way that they appear in who(1) or w(1).
This is preparation for a later commit that adds a "machinectl shell"
operation to invoke a shell in a container, in a way that is compatible
with who(1) and w(1).
If a service has both ExecStart= and ExecStartPost= set with
Type=simple, then it might happen that we have two children create the
runtime directory of a service (as configured with RuntimeDirectory=) at
the same time. Previously we did this with mkdir_safe() which will
create the dir only if it is missing, but if it already exists will at
least verify the access mode and ownership to match the right values.
This is problematic in this case, since it creates and then adjusts the
settings, thus it might happen that one child creates the directory with
root owner, another one then verifies it, and only afterwards the
directory ownership is fixed by the original child, while the second
child already failed.
With this change we'll now always adjust the access mode, so that we
know that it is right. In the worst case this means we adjust the
mode/ownership even though its unnecessary, but this should have no
negative effect.
https://bugzilla.redhat.com/show_bug.cgi?id=1226509
When command path has access label and no SmackProcessLabel= is not
set, default process label will be set. But if the default process
label has no rule for the access label of the command path then smack
access error will be occurred.
So, if the command path has execute label then the child have to set
its label to the same of execute label of command path instead of
default process label.
The latest consolidation cleanup of write_string_file() revealed some users
of that helper which should have used write_string_file_no_create() in the
past but didn't. Basically, all existing users that write to files in /sys
and /proc should not expect to write to a file which is not yet existant.
Merge write_string_file(), write_string_file_no_create() and
write_string_file_atomic() into write_string_file() and provide a flags mask
that allows combinations of atomic writing, newline appending and automatic
file creation. Change all users accordingly.
Similar to SmackProcessLabel=, if this configuration is set, systemd
executes processes with given SMACK label. If unit has
SmackProcessLabel=, this config is overwritten.
But, do NOT be confused with SMACK64EXEC of execute file. This default
execute process label(and also label which is set by
SmackProcessLabel=) is set fork-ed process SMACK subject label and
used to access the execute file.
If the execution file has also SMACK64EXEC, finally executed process
has SMACK64EXEC subject.
While if the execution file has no SMACK64EXEC, the executed process
has label of this config(or label which is set by
SmackProcessLabel=). Because if execution file has no SMACK64EXEC then
excuted process inherits label from caller process(in this case, the
caller is systemd).
./configure --enable/disable-kdbus can be used to set the default
behavior regarding kdbus.
If no kdbus kernel support is available, dbus-dameon will be used.
With --enable-kdbus, the kernel command line option "kdbus=0" can
be used to disable kdbus.
With --disable-kdbus, the kernel command line option "kdbus=1" is
required to enable kdbus support.
Commit 72c0a2c25 ("everywhere: port everything to sigprocmask_many()
and friends") reworked code tree-wide to use the new sigprocmask_many()
helper. In this, it caused a regression in pam_setup, because it
dropped a line to initialize the 'ss' signal mask which is later used
in sigwait().
While at it, move the variable declaration to an inner scope.
This ports a lot of manual code over to sigprocmask_many() and friends.
Also, we now consistly check for sigprocmask() failures with
assert_se(), since the call cannot realistically fail unless there's a
programming error.
Also encloses a few sd_event_add_signal() calls with (void) when we
ignore the return values for it knowingly.
Also, when the child is potentially long-running make sure to set a
death signal.
Also, ignore the result of the reset operations explicitly by casting
them to (void).
When a service is chrooted with the option RootDirectory=/opt/..., then
the options PrivateDevices, PrivateTmp, ProtectHome, ProtectSystem must
mount the directories under $RootDirectory/{dev,tmp,home,usr,boot}.
The test-ns tool can test setup_namespace() with and without chroot:
$ sudo TEST_NS_PROJECTS=/home/lennart/projects ./test-ns
$ sudo TEST_NS_CHROOT=/home/alban/debian-tree TEST_NS_PROJECTS=/home/alban/debian-tree/home/alban/Documents ./test-ns
This changes log_unit_info() (and friends) to take a real Unit* object
insted of just a unit name as parameter. The call will now prefix all
logged messages with the unit name, thus allowing the unit name to be
dropped from the various passed romat strings, simplifying invocations
drastically, and unifying log output across messages. Also, UNIT= vs.
USER_UNIT= is now derived from the Manager object attached to the Unit
object, instead of getpid(). This has the benefit of correcting the
field for --test runs.
Also contains a couple of other logging improvements:
- Drops a couple of strerror() invocations in favour of using %m.
- Not only .mount units now warn if a symlinks exist for the mount
point already, .automount units do that too, now.
- A few invocations of log_struct() that didn't actually pass any
additional structured data have been replaced by simpler invocations
of log_unit_info() and friends.
- For structured data a new LOG_UNIT_MESSAGE() macro has been added,
that works like LOG_MESSAGE() but prefixes the message with the unit
name. Similar, there's now LOG_LINK_MESSAGE() and
LOG_NETDEV_MESSAGE().
- For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(),
LOG_NETDEV_INTERFACE() macros have been added that generate the
necessary per object fields. The old log_unit_struct() call has been
removed in favour of these new macros used in raw log_struct()
invocations. In addition to removing one more function call this
allows generated structured log messages that contain two object
fields, as necessary for example for network interfaces that are
joined into another network interface, and whose messages shall be
indexed by both.
- The LOG_ERRNO() macro has been removed, in favour of
log_struct_errno(). The latter has the benefit of ensuring that %m in
format strings is properly resolved to the specified error number.
- A number of logging messages have been converted to use
log_unit_info() instead of log_info()
- The client code in sysv-generator no longer #includes core code from
src/core/.
- log_unit_full_errno() has been removed, log_unit_full() instead takes
an errno now, too.
- log_unit_info(), log_link_info(), log_netdev_info() and friends, now
avoid double evaluation of their parameters
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
include-what-you-use automatically does this and it makes finding
unnecessary harder to spot. The only content of poll.h is a include
of sys/poll.h so should be harmless.
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
Among other things, avoid log_struct() unless we really need it.
Also, use "r" as variable to store function errors in, instead of "err".
"r" is pretty much what we use everywhere else, hence using the same
here make sense.
FInally, in the child, when we want to log, make sure to open the
logging framework first, since it is explicitly closed in preparation
for the exec().
When systemd starts a service, it first opened /run/systemd/journal/stdout
socket, and only later switched to the right user.group (if they are
specified). Later on, journald looked at the credentials, and saw
root.root, because credentials are stored at the time the socket is
opened. As a result, all messages passed over _TRANSPORT=stdout were
logged with _UID=0, _GID=0.
Drop real uid and gid temporarily to fix the issue.
We need original socket_fd around otherwise mac_selinux_get_child_mls_label
fails with -EINVAL return code. Also don't call setexeccon twice but rather pass
context value of SELinuxContext option as an extra argument.
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
- Rename log_meta() → log_internal(), to follow naming scheme of most
other log functions that are usually invoked through macros, but never
directly.
- Rename log_info_object() to log_object_info(), simply because the
object should be before any other parameters, to follow OO-style
programming style.
In service file, if the file has some of special SMACK label in
ExecStart= and systemd has no permission for the special SMACK label
then permission error will occurred. To resolve this, systemd should
be able to set its SMACK label to something accessible of ExecStart=.
So introduce new SmackProcessLabel. If label is specified with
SmackProcessLabel= then the child systemd will set its label to
that. To successfully execute the ExecStart=, accessible label should
be specified with SmackProcessLabel=.
Additionally, by SMACK policy, if the file in ExecStart= has no
SMACK64EXEC then the executed process will have given label by
SmackProcessLabel=. But if the file has SMACK64EXEC then the
SMACK64EXEC label will be overridden.
[zj: reword man page]
For priviliged units this resource control property ensures that the
processes have all controllers systemd manages enabled.
For unpriviliged services (those with User= set) this ensures that
access rights to the service cgroup is granted to the user in question,
to create further subgroups. Note that this only applies to the
name=systemd hierarchy though, as access to other controllers is not
safe for unpriviliged processes.
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
Delegate=yes should also be set for user@.service, so that systemd
--user can run, controlling its own cgroup tree.
This commit changes machined, systemd-nspawn@.service and user@.service
to set this boolean, in order to ensure that container management will
just work, and the user systemd instance can run fine.
If we don't have privileges to setup the namespaces then we are most likely
running inside some sort of unprivileged container, hence not being able to
create namespace is not a problem because spawned service can't access host
system anyway.
Since aa_change_onexec return the error code in errno, and return
-1, the current code do not give any useful information when
something fail. This make apparmor easier to debug, as seen on
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=760526
This makes possible to spawn service instances triggered by socket with
MLS/MCS SELinux labels which are created based on information provided by
connected peer.
Implementation of label_get_child_mls_label derived from xinetd.
Reviewed-by: Paul Moore <pmoore@redhat.com>
If BusPolicy= was passed, the parser function will have created
an ExecContext->bus_endpoint object, along with policy information.
In that case, create a kdbus endpoint, and pass its path name to the
namespace logic, to it will be mounted over the actual 'bus' node.
At endpoint creation time, no policy is updloaded. That is done after
fork(), through a separate call. This is necessary because we don't
know the real uid of the process earlier than that.
If a path to a previously created custom kdbus endpoint is passed in,
bind-mount a new devtmpfs that contains a 'bus' node, which in turn in
bind-mounted with the custom endpoint. This tmpfs then mounted over the
kdbus subtree that refers to the current bus.
This way, we can fake the bus node in order to lock down services with
a kdbus custom endpoint policy.
This factors out one conditional branch that has grown way too big, and
makes the code more readable by using return statements rather than jump
labels.
This way, the list of arguments to that function gets more comprehensive,
and we can get around passing lots of NULL and 0 arguments from socket.c,
swap.c and mount.c.
It also allows for splitting up the code in exec_spawn().
While at it, make ExecContext const in execute.c.
This makes possible to spawn service instances triggered by socket with
MLS/MCS SELinux labels which are created based on information provided by
connected peer.
Implementation of label_get_child_label derived from xinetd.
Reviewed-by: Paul Moore <pmoore@redhat.com>
A new tool "systemd-firstboot" can be used either interactively on boot,
where it will query basic locale, timezone, hostname, root password
information and set it. Or it can be used non-interactively from the
command line when prepareing disk images for booting. When used
non-inertactively the tool can either copy settings from the host, or
take settings on the command line.
$ systemd-firstboot --root=/path/to/my/new/root --copy-locale --copy-root-password --hostname=waldi
The tool will be automatically invoked (interactively) now on first boot
if /etc is found unpopulated.
This also creates the infrastructure for generators to be notified via
an environment variable whether they are running on the first boot, or
not.
Also, rename ProtectedHome= to ProtectHome=, to simplify things a bit.
With this in place we now have two neat options ProtectSystem= and
ProtectHome= for protecting the OS itself (and optionally its
configuration), and for protecting the user's data.
ReadOnlySystem= uses fs namespaces to mount /usr and /boot read-only for
a service.
ProtectedHome= uses fs namespaces to mount /home and /run/user
inaccessible or read-only for a service.
This patch also enables these settings for all our long-running services.
Together they should be good building block for a minimal service
sandbox, removing the ability for services to modify the operating
system or access the user's private data.
tcpwrap is legacy code, that is barely maintained upstream. It's APIs
are awful, and the feature set it exposes (such as DNS and IDENT
access control) questionnable. We should not support this natively in
systemd.
Hence, let's remove the code. If people want to continue making use of
this, they can do so by plugging in "tcpd" for the processes they start.
With that scheme things are as well or badly supported as they were from
traditional inetd, hence no functionality is really lost.
safe_close_pair() is more like safe_close(), except that it handles
pairs of fds, and doesn't make and misleading allusion, as it works
similarly well for socketpairs() as for pipe()s...
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
This new unit settings allows restricting which address families are
available to processes. This is an effective way to minimize the attack
surface of services, by turning off entire network stacks for them.
This is based on seccomp, and does not work on x86-32, since seccomp
cannot filter socketcall() syscalls on that platform.
This permit to switch to a specific apparmor profile when starting a daemon. This
will result in a non operation if apparmor is disabled.
It also add a new build requirement on libapparmor for using this feature.
- Allow configuration of an errno error to return from blacklisted
syscalls, instead of immediately terminating a process.
- Fix parsing logic when libseccomp support is turned off
- Only keep the actual syscall set in the ExecContext, and generate the
string version only on demand.
Let's always call the security labels the same way:
SMACK: "Smack Label"
SELINUX: "SELinux Security Context"
And the low-level encapsulation is called "seclabel". Now let's hope we
stick to this vocabulary in future, too, and don't mix "label"s and
"security contexts" and so on wildly.
This permit to let system administrators decide of the domain of a service.
This can be used with templated units to have each service in a différent
domain ( for example, a per customer database, using MLS or anything ),
or can be used to force a non selinux enabled system (jvm, erlang, etc)
to start in a different domain for each service.
Similar to PrivateNetwork=, PrivateTmp= introduce PrivateDevices= that
sets up a private /dev with only the API pseudo-devices like /dev/null,
/dev/zero, /dev/random, but not any physical devices in them.
It is nicer to predefine patterns using configure time check instead of
using casts everywhere.
Since we do not need to use any flags, include "%" in the format instead
of excluding it like PRI* macros.
Also, introduce a new environment variable named $WATCHDOG_PID which
cotnains the PID of the process that is supposed to send the keep-alive
events. This is similar how $LISTEN_FDS and $LISTEN_PID work together,
and protects against confusing processes further down the process tree
due to inherited environment.
The only problem is that libgen.h #defines basename to point to it's
own broken implementation instead of the GNU one. This can be fixed
by #undefining basename.
Previously we did operations like attach, trim or migrate only on the
controllers that were enabled for a specific unit. With this changes we
will now do them for all supproted controllers, and fall back to all
possible prefix paths if the specified paths do not exist.
This fixes issues if a controller is being disabled for a unit where it
was previously enabled, and makes sure that all processes stay as "far
down" the tree as groups exist.
Make Type=idle communication bidirectional: when bootup is finished,
the manager, as before, signals idling Type=idle jobs to continue.
However, if the boot takes too long, idling jobs signal the manager
that they have had enough, wait a tiny bit more, and continue, taking
ownership of the console. The manager, when signalled that Type=idle
jobs are done, makes a note and will not write to the console anymore.
This is a cosmetic issue, but quite noticable, so let's just fix it.
Based on Harald Hoyer's patch.
https://bugs.freedesktop.org/show_bug.cgi?id=54247http://unix.stackexchange.com/questions/51805/systemd-messages-after-starting-login/
The affected files in this patch had inconsistent use of tabs vs. spaces
for indentation, and this patch eliminates the stray tabs.
Also, the opening brace of sigchld_hdl() in activate.c was moved so the
opening braces are consistent throughout the file.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
Because "export key=val" is not supported by systemd, an error is logged
where the invalid assignment is coming from.
Introduce strv_env_clean_log() to log invalid environment assignments,
where logging is possible and allowed.
parse_env_file_internal() is modified to allow WHITESPACE in keys, to
report the issues later on.
Before, we would initialize many fields twice: first
by filling the structure with zeros, and then a second
time with the real values. We can let the compiler do
the job for us, avoiding one copy.
A downside of this patch is that text gets slightly
bigger. This is because all zero() calls are effectively
inlined:
$ size build/.libs/systemd
text data bss dec hex filename
before 897737 107300 2560 1007597 f5fed build/.libs/systemd
after 897873 107300 2560 1007733 f6075 build/.libs/systemd
… actually less than 1‰.
A few asserts that the parameter is not null had to be removed. I
don't think this changes much, because first, it is quite unlikely
for the assert to fail, and second, an immediate SEGV is almost as
good as an assert.
Implement this with a proper state machine, so that newlines and
escaped chars can appear in string assignments. This should bring the
parser much closer to shell.
Currently, PrivateTmp=yes means that the service cannot see the /tmp
shared by rest of the system and is isolated from other services using
PrivateTmp, but users can access and modify /tmp as seen by the
service.
Move the private /tmp and /var/tmp directories into a 0077-mode
directory. This way unpriviledged users on the system cannot see (or
modify) /tmp as seen by the service.
All Execs within the service, will get mounted the same
/tmp and /var/tmp directories, if service is configured with
PrivateTmp=yes. Temporary directories are cleaned up by service
itself in addition to systemd-tmpfiles. Directory which is mounted
as inaccessible is created at runtime in /run/systemd.
Similar to already existing is_terminal_input().
Note that the only current user (connect_logger_as) is never called
for EXEC_OUTPUT_TTY, so it won't mind whether we accept it.
journald is supposed to work. Failure to connect to its socket implies
losing messages. It should be a very unusual event. Log the failure with
LOG_CRIT.
Just because this unit's stdout/stderr failed to connect to the journal
does not necessarily mean that we shouldn't try to log the failure using
a structured entry, so let's use log_struct_unit.
Almost every unit logs to the journal. If journald gets a permanent
failure, units would not be able to start (exit code 209/STDOUT).
Add a fallback to /dev/null to avoid making the system entirely
unusable in such a case.
Now, actually check if the environment variable names and values used
are valid, before accepting them. With this in place are at some places
more rigid than POSIX, and less rigid at others. For example, this code
allows lower-case environment variables (which POSIX suggests not to
use), but it will not allow non-UTF8 variable values.
All in all this should be a good middle ground of what to allow and what
not to allow as environment variables.
(This also splits out all environment related calls into env-util.[ch])
In the x32 ABI, syscall numbers start at 0x40000000. Mask that bit on
x32 for lookups in the syscall_names array and syscall_filter and ensure
that syscall.h is parsed correctly.
[zj: added SYSCALL_TO_INDEX, INDEX_TO_SYSCALL macros.]
The behaviour of the common name##_from_string conversion is surprising.
It accepts not only the strings from name##_table but also any number
that falls within the range of the table. The order of items in most of
our tables is an internal affair. It should not be visible to the user.
I know of a case where the surprising numeric conversion leads to a crash.
We will allow the direct numeric conversion only for the tables where the
mapping of strings to numeric values has an external meaning. This holds
for the following lookup tables:
- netlink_family, ioprio_class, ip_tos, sched_policy - their numeric
values are stable as they are defined by the Linux kernel interface.
- log_level, log_facility_unshifted - the well-known syslog interface.
We allow the user to use numeric values whose string names systemd does
not know. For instance, the user may want to test a new kernel featuring
a scheduling policy that did not exist when his systemd version was
released. A slightly unpleasant effect of this is that the
name##_to_string conversion cannot return pointers to constant strings
anymore. The strings have to be allocated on demand and freed by the
caller.
- don't use pivot_root() anymore, just reuse root hierarchy
- first create all mounts, then mark them read-only so that we get the
right behaviour when people want writable mounts inside of
read-only mounts
- don't pass invalid combinations of MS_ constants to the kernel
This adds a timeout if the TTY cannot be acquired and makes sure we
always output the question to the console, never to the TTY of the
respective service.
As described in
https://bugs.freedesktop.org/show_bug.cgi?id=50184
the journal currently doesn't set fields such as _SYSTEMD_UNIT
properly for messages coming from processes that have already
terminated. This means among other things that "systemctl status" may
not show some of the output of services that wrote messages just
before they exited.
This patch fixes this by having processes that log to the journal
write their unit identifier to journald when the connection to
/run/systemd/journal/stdout is opened. Journald stores the unit ID
and uses it to fill in _SYSTEMD_UNIT when it cannot be obtained
normally (i.e. from the cgroup). To prevent impersonating another
unit, this information is only used when the caller is root.
This doesn't fix the general problem of getting metadata about
messages from terminated processes (which requires some kernel
support), but it allows "systemctl status" and similar queries to do
the Right Thing for units that log via stdout/stderr.
This also ensures that caps dropped from the bounding set are also
dropped from the inheritable set, to be extra-secure. Usually that should
change very little though as the inheritable set is empty for all our uses
anyway.
We want to avoid a deadlock when a service has ExecStartPre= programs
that wait for the job queue to run empty because of Type=idle, but which
themselves keep the queue non-empty because START_PRE was considered
ACTIVATING and hence the job not complete. With this patch we alter the
state translation table so that it is impossible ever to wait for
Type=idle unit, hence removing the deadlock.
The PAM helper thread needs to capture the death signal from the
parent, but is prohibited from doing so since when the child dies
as normal user, the kernel won't allow it to send a TERM to the
PAM helper thread which is running as root.
This causes the PAM threads to never exit, accumulating after
user sessions exit.
There is however really no need to keep the PAM threads running as
root, so, we can just setresuid() to the same user as defined in the
unit file for the parent thread (User=). This makes the TERM signal
arrive as normal. In case setresuid() fails, we ignore the error, so
we at least fall back to the current behaviour.
Type=idle is much like Type=simple, however between the fork() and the
exec() in the child we wait until PID 1 informs us that no jobs are
left.
This is mostly a cosmetic fix to make gettys appear only after all boot
output is finished and complete.
Note that this does not impact the normal job logic as we do not delay
the completion of any jobs. We just delay the invocation of the actual
binary, and only for services that otherwise would be of Type=simple.
Previously, we were brutally and onconditionally killing all processes
in a service's cgroup before starting the service anew, in order to
ensure that StartPre lines cannot be misused to spawn long-running
processes.
On logind-less systems this has the effect that restarting sshd
necessarily calls all active ssh sessions, which is usually not
desirable.
With this patch control processes for a service are placed in a
sub-cgroup called "control/". When starting a service anew we simply
kill this cgroup, but not the main cgroup, in order to avoid killing any
long-running non-control processes from previous runs.
https://bugzilla.redhat.com/show_bug.cgi?id=805942
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.