Commit Graph

67 Commits

Author SHA1 Message Date
Torstein Husebø 61233823aa treewide: fix typos and remove accidental repetition of words 2016-07-11 16:18:43 +02:00
Lennart Poettering f4170c671b execute: add a new easy-to-use RestrictRealtime= option to units
It takes a boolean value. If true, access to SCHED_RR, SCHED_FIFO and
SCHED_DEADLINE is blocked, which my be used to lock up the system.
2016-06-23 01:45:45 +02:00
Alessandro Puccetti cf677fe686 core/execute: add the magic character '!' to allow privileged execution (#3493)
This patch implements the new magic character '!'. By putting '!' in front
of a command, systemd executes it with full privileges ignoring paramters
such as User, Group, SupplementaryGroups, CapabilityBoundingSet,
AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext,
AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies.

Fixes partially https://github.com/systemd/systemd/issues/3414
Related to https://github.com/coreos/rkt/issues/2482

Testing:
1. Create a user 'bob'
2. Create the unit file /etc/systemd/system/exec-perm.service
   (You can use the example below)
3. sudo systemctl start ext-perm.service
4. Verify that the commands starting with '!' were not executed as bob,
   4.1 Looking to the output of ls -l /tmp/exec-perm
   4.2 Each file contains the result of the id command.

`````````````````````````````````````````````````````````````````
[Unit]
Description=ext-perm

[Service]
Type=oneshot
TimeoutStartSec=0
User=bob
ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ;
    /usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre"
ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ;
    !/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2"
ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post"
ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload"
ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop"
ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post"

[Install]
WantedBy=multi-user.target]
`````````````````````````````````````````````````````````````````
2016-06-10 18:19:54 +02:00
Topi Miettinen f3e4363593 core: Restrict mmap and mprotect with PAGE_WRITE|PAGE_EXEC (#3319) (#3379)
New exec boolean MemoryDenyWriteExecute, when set, installs
a seccomp filter to reject mmap(2) with PAGE_WRITE|PAGE_EXEC
and mprotect(2) with PAGE_EXEC.
2016-06-03 17:58:18 +02:00
Lennart Poettering 479050b363 core: drop Capabilities= setting
The setting is hardly useful (since its effect is generally reduced to zero due
to file system caps), and with the advent of ambient caps an actually useful
replacement exists, hence let's get rid of this.

I am pretty sure this was unused and our man page already recommended against
its use, hence this should be a safe thing to remove.
2016-02-13 11:59:34 +01:00
Daniel Mack 9ca6ff50ab Remove kdbus custom endpoint support
This feature will not be used anytime soon, so remove a bit of cruft.

The BusPolicy= config directive will stay around as compat noop.
2016-02-11 22:12:04 +01:00
Daniel Mack b26fa1a2fb tree-wide: remove Emacs lines from all files
This should be handled fine now by .dir-locals.el, so need to carry that
stuff in every file.
2016-02-10 13:41:57 +01:00
Lennart Poettering 1e22b5cda0 core: don't reset /dev/console if stdin/stdout/stderr as passed as fd in a transient service
Otherwise we might end resetting /dev/console all the time when a transient service starts or stops.

Fixes #2377
Fixes #2198
Fixes #2061
2016-01-28 16:25:39 +01:00
Ismo Puustinen 755d4b67a4 capabilities: added support for ambient capabilities.
This patch adds support for ambient capabilities in service files. The
idea with ambient capabilities is that the execed processes can run with
non-root user and get some inherited capabilities, without having any
need to add the capabilities to the executable file.

You need at least Linux 4.3 to use ambient capabilities. SecureBit
keep-caps is automatically added when you use ambient capabilities and
wish to change the user.

An example system service file might look like this:

[Unit]
Description=Service for testing caps

[Service]
ExecStart=/usr/bin/sleep 10000
User=nobody
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_RAW

After starting the service it has these capabilities:

CapInh: 0000000000003000
CapPrm: 0000000000003000
CapEff: 0000000000003000
CapBnd: 0000003fffffffff
CapAmb: 0000000000003000
2016-01-12 12:14:50 +02:00
Ismo Puustinen a103496ca5 capabilities: keep bounding set in non-inverted format.
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
2016-01-12 12:14:50 +02:00
Thomas Hindoe Paaboel Andersen 71d35b6b55 tree-wide: sort includes in *.h
This is a continuation of the previous include sort patch, which
only sorted for .c files.
2015-11-18 23:09:02 +01:00
Filipe Brandenburger b4c14404b3 execute: Add new PassEnvironment= directive
This directive allows passing environment variables from the system
manager to spawned services. Variables in the system manager can be set
inside a container by passing `--set-env=...` options to systemd-spawn.

Tested with an on-disk test.service unit. Tested using multiple variable
names on a single line, with an empty setting to clear the current list
of variables, with non-existing variables.

Tested using `systemd-run -p PassEnvironment=VARNAME` to confirm it
works with transient units.

Confirmed that `systemctl show` will display the PassEnvironment
settings.

Checked that man pages are generated correctly.

No regressions in `make check`.
2015-11-11 07:55:23 -08:00
Lennart Poettering a34ceba66f core: add support for setting stdin/stdout/stderr for transient services
When starting a transient service, allow setting stdin/stdout/stderr fds
for it, by passing them in via the bus.

This also simplifies some of the serialization code for units.
2015-10-08 12:55:15 +02:00
Lennart Poettering 8dd4c05b54 core: add support for naming file descriptors passed using socket activation
This adds support for naming file descriptors passed using socket
activation. The names are passed in a new $LISTEN_FDNAMES= environment
variable, that matches the existign $LISTEN_FDS= one and contains a
colon-separated list of names.

This also adds support for naming fds submitted to the per-service fd
store using FDNAME= in the sd_notify() message.

This also adds a new FileDescriptorName= setting for socket unit files
to set the name for fds created by socket units.

This also adds a new call sd_listen_fds_with_names(), that is similar to
sd_listen_fds(), but also returns the names of the fds.

systemd-activate gained the new --fdname= switch to specify a name for
testing socket activation.

This is based on #1247 by Maciej Wereski.

Fixes #1247.
2015-10-06 11:52:48 +02:00
Lennart Poettering 5f5d8eab1f core: allow setting WorkingDirectory= to the special value ~
If set to ~ the working directory is set to the home directory of the
user configured in User=.

This change also exposes the existing switch for the working directory
that allowed making missing working directories non-fatal.

This also changes "machinectl shell" to make use of this to ensure that
the invoked shell is by default in the user's home directory.

Fixes #1268.
2015-09-29 21:55:51 +02:00
Lennart Poettering efdb02375b core: unified cgroup hierarchy support
This patch set adds full support the new unified cgroup hierarchy logic
of modern kernels.

A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is
added. If specified the unified hierarchy is mounted to /sys/fs/cgroup
instead of a tmpfs. No further hierarchies are mounted. The kernel
command line option defaults to off. We can turn it on by default as
soon as the kernel's APIs regarding this are stabilized (but even then
downstream distros might want to turn this off, as this will break any
tools that access cgroupfs directly).

It is possibly to choose for each boot individually whether the unified
or the legacy hierarchy is used. nspawn will by default provide the
legacy hierarchy to containers if the host is using it, and the unified
otherwise. However it is possible to run containers with the unified
hierarchy on a legacy host and vice versa, by setting the
$UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0,
respectively.

The unified hierarchy provides reliable cgroup empty notifications for
the first time, via inotify. To make use of this we maintain one
manager-wide inotify fd, and each cgroup to it.

This patch also removes cg_delete() which is unused now.

On kernel 4.2 only the "memory" controller is compatible with the
unified hierarchy, hence that's the only controller systemd exposes when
booted in unified heirarchy mode.

This introduces a new enum for enumerating supported controllers, plus a
related enum for the mask bits mapping to it. The core is changed to
make use of this everywhere.

This moves PID 1 into a new "init.scope" implicit scope unit in the root
slice. This is necessary since on the unified hierarchy cgroups may
either contain subgroups or processes but not both. PID 1 hence has to
move out of the root cgroup (strictly speaking the root cgroup is the
only one where processes and subgroups are still allowed, but in order
to support containers nicey, we move PID 1 into the new scope in all
cases.) This new unit is also used on legacy hierarchy setups. It's
actually pretty useful on all systems, as it can then be used to filter
journal messages coming from PID 1, and so on.

The root slice ("-.slice") is now implicitly created and started (and
does not require a unit file on disk anymore), since
that's where "init.scope" is located and the slice needs to be started
before the scope can.

To check whether we are in unified or legacy hierarchy mode we use
statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in
legacy mode, if it reports cgroupfs we are in unified mode.

This patch set carefuly makes sure that cgls and cgtop continue to work
as desired.

When invoking nspawn as a service it will implicitly create two
subcgroups in the cgroup it is using, one to move the nspawn process
into, the other to move the actual container processes into. This is
done because of the requirement that cgroups may either contain
processes or other subgroups.
2015-09-01 23:52:27 +02:00
Thomas Hindoe Paaboel Andersen 2307f37e46 execute: make the invalid entry of the enum -1
Set _EXEC_UTMP_MODE_INVALID to -1. This matches the return value from
string_table_lookup.
2015-08-25 21:15:54 +02:00
Lennart Poettering 023a4f6701 core: optionally create LOGIN_PROCESS or USER_PROCESS utmp entries
When generating utmp/wtmp entries, optionally add both LOGIN_PROCESS and
INIT_PROCESS entries or even all three of LOGIN_PROCESS, INIT_PROCESS
and USER_PROCESS entries, instead of just a single INIT_PROCESS entry.

With this change systemd may be used to not only invoke a getty directly
in a SysV-compliant way but alternatively also a login(1) implementation
or even forego getty and login entirely, and invoke arbitrary shells in
a way that they appear in who(1) or w(1).

This is preparation for a later commit that adds a "machinectl shell"
operation to invoke a shell in a container, in a way that is compatible
with who(1) and w(1).
2015-08-24 22:46:45 +02:00
Dimitri John Ledkov f00929ad62 Default to /usr/bin/u?mount, configurable, rather than hard-coded /bin/u?mount. 2015-05-13 15:48:28 +02:00
Lennart Poettering f2341e0a87 core,network: major per-object logging rework
This changes log_unit_info() (and friends) to take a real Unit* object
insted of just a unit name as parameter. The call will now prefix all
logged messages with the unit name, thus allowing the unit name to be
dropped from the various passed romat strings, simplifying invocations
drastically, and unifying log output across messages. Also, UNIT= vs.
USER_UNIT= is now derived from the Manager object attached to the Unit
object, instead of getpid(). This has the benefit of correcting the
field for --test runs.

Also contains a couple of other logging improvements:

- Drops a couple of strerror() invocations in favour of using %m.

- Not only .mount units now warn if a symlinks exist for the mount
  point already, .automount units do that too, now.

- A few invocations of log_struct() that didn't actually pass any
  additional structured data have been replaced by simpler invocations
  of log_unit_info() and friends.

- For structured data a new LOG_UNIT_MESSAGE() macro has been added,
  that works like LOG_MESSAGE() but prefixes the message with the unit
  name. Similar, there's now LOG_LINK_MESSAGE() and
  LOG_NETDEV_MESSAGE().

- For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(),
  LOG_NETDEV_INTERFACE() macros have been added that generate the
  necessary per object fields. The old log_unit_struct() call has been
  removed in favour of these new macros used in raw log_struct()
  invocations. In addition to removing one more function call this
  allows generated structured log messages that contain two object
  fields, as necessary for example for network interfaces that are
  joined into another network interface, and whose messages shall be
  indexed by both.

- The LOG_ERRNO() macro has been removed, in favour of
  log_struct_errno(). The latter has the benefit of ensuring that %m in
  format strings is properly resolved to the specified error number.

- A number of logging messages have been converted to use
  log_unit_info() instead of log_info()

- The client code in sysv-generator no longer #includes core code from
  src/core/.

- log_unit_full_errno() has been removed, log_unit_full() instead takes
  an errno now, too.

- log_unit_info(), log_link_info(), log_netdev_info() and friends, now
  avoid double evaluation of their parameters
2015-05-11 22:24:45 +02:00
Thomas Hindoe Paaboel Andersen 2eec67acbb remove unused includes
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
2015-02-23 23:53:42 +01:00
Thomas Hindoe Paaboel Andersen c1ff5570f4 Add missing includes in header files
This fixes various issues found by globally reordering the include
sections of all .c files.
2015-02-12 20:44:32 +01:00
Lennart Poettering 4c08c8242a core: don't fail to run services in --user instances if $HOME is missing
Otherwise we cannot even invoke systemd-exit.service anymore, thus not
even exit.

https://bugs.freedesktop.org/show_bug.cgi?id=83100
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=759320
2015-02-12 12:21:16 +01:00
Zbigniew Jędrzejewski-Szmek f1acf85a36 core: make exec_command_free_list return NULL 2014-12-18 19:26:21 -05:00
WaLyong Cho 2ca620c4ed smack: introduce new SmackProcessLabel option
In service file, if the file has some of special SMACK label in
ExecStart= and systemd has no permission for the special SMACK label
then permission error will occurred. To resolve this, systemd should
be able to set its SMACK label to something accessible of ExecStart=.
So introduce new SmackProcessLabel. If label is specified with
SmackProcessLabel= then the child systemd will set its label to
that. To successfully execute the ExecStart=, accessible label should
be specified with SmackProcessLabel=.
Additionally, by SMACK policy, if the file in ExecStart= has no
SMACK64EXEC then the executed process will have given label by
SmackProcessLabel=. But if the file has SMACK64EXEC then the
SMACK64EXEC label will be overridden.

[zj: reword man page]
2014-11-24 10:20:53 -05:00
Lennart Poettering a931ad47a8 core: introduce new Delegate=yes/no property controlling creation of cgroup subhierarchies
For priviliged units this resource control property ensures that the
processes have all controllers systemd manages enabled.

For unpriviliged services (those with User= set) this ensures that
access rights to the service cgroup is granted to the user in question,
to create further subgroups. Note that this only applies to the
name=systemd hierarchy though, as access to other controllers is not
safe for unpriviliged processes.

Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.

Delegate=yes should also be set for user@.service, so that systemd
--user can run, controlling its own cgroup tree.

This commit changes machined, systemd-nspawn@.service and user@.service
to set this boolean, in order to ensure that container management will
just work, and the user systemd instance can run fine.
2014-11-05 18:49:14 +01:00
Lukas Nykryn 7491ccf2cb environment: append unit_id to error messages regarding EnvironmentFile 2014-10-17 16:05:57 +02:00
Jan Synacek 86b23b07c9 swap: introduce Discard property
Process possible "discard" values from /etc/fstab.
2014-09-29 11:08:12 -04:00
Michal Sekletar 16115b0a7b socket: introduce SELinuxContextFromNet option
This makes possible to spawn service instances triggered by socket with
MLS/MCS SELinux labels which are created based on information provided by
connected peer.

Implementation of label_get_child_mls_label derived from xinetd.

Reviewed-by: Paul Moore <pmoore@redhat.com>
2014-09-19 12:32:06 +02:00
Daniel Mack e44da745d1 service: hook up custom endpoint logic
If BusPolicy= was passed, the parser function will have created
an ExecContext->bus_endpoint object, along with policy information.

In that case, create a kdbus endpoint, and pass its path name to the
namespace logic, to it will be mounted over the actual 'bus' node.

At endpoint creation time, no policy is updloaded. That is done after
fork(), through a separate call. This is necessary because we don't
know the real uid of the process earlier than that.
2014-09-08 14:15:02 +02:00
Daniel Mack bb7dd0b04a bus: add kdbus endpoint types
Add types to describe endpoints and associated policy entries,
and add a BusEndpoint instace to ExecContext.
2014-09-08 11:06:45 +02:00
Daniel Mack 9fa95f8539 exec: factor out most function arguments of exec_spawn() to ExecParameters
This way, the list of arguments to that function gets more comprehensive,
and we can get around passing lots of NULL and 0 arguments from socket.c,
swap.c and mount.c.

It also allows for splitting up the code in exec_spawn().

While at it, make ExecContext const in execute.c.
2014-09-05 12:18:57 +02:00
Lennart Poettering 3bb07b7680 Revert "socket: introduce SELinuxLabelViaNet option"
This reverts commit cf8bd44339.

Needs more discussion on the mailing list.
2014-08-19 19:16:08 +02:00
Michal Sekletar cf8bd44339 socket: introduce SELinuxLabelViaNet option
This makes possible to spawn service instances triggered by socket with
MLS/MCS SELinux labels which are created based on information provided by
connected peer.

Implementation of label_get_child_label derived from xinetd.

Reviewed-by: Paul Moore <pmoore@redhat.com>
2014-08-19 18:57:12 +02:00
Lennart Poettering 1b8689f949 core: rename ReadOnlySystem= to ProtectSystem= and add a third value for also mounting /etc read-only
Also, rename ProtectedHome= to ProtectHome=, to simplify things a bit.

With this in place we now have two neat options ProtectSystem= and
ProtectHome= for protecting the OS itself (and optionally its
configuration), and for protecting the user's data.
2014-06-04 18:12:55 +02:00
Lennart Poettering 417116f234 core: add new ReadOnlySystem= and ProtectedHome= settings for service units
ReadOnlySystem= uses fs namespaces to mount /usr and /boot read-only for
a service.

ProtectedHome= uses fs namespaces to mount /home and /run/user
inaccessible or read-only for a service.

This patch also enables these settings for all our long-running services.

Together they should be good building block for a minimal service
sandbox, removing the ability for services to modify the operating
system or access the user's private data.
2014-06-03 23:57:51 +02:00
Lennart Poettering 7f8aa67131 core: remove tcpwrap support
tcpwrap is legacy code, that is barely maintained upstream. It's APIs
are awful, and the feature set it exposes (such as DNS and IDENT
access control) questionnable. We should not support this natively in
systemd.

Hence, let's remove the code. If people want to continue making use of
this, they can do so by plugging in "tcpd" for the processes they start.
With that scheme things are as well or badly supported as they were from
traditional inetd, hence no functionality is really lost.
2014-03-24 20:07:42 +01:00
Lennart Poettering 760b9d7cba core: don't override NoNewPriviliges= from SystemCallFilter= if it is already explicitly set 2014-03-05 04:41:01 +01:00
Lennart Poettering 517d56b1d0 missing: if RLIMIT_RTTIME is not defined by the libc, then we need a new define for the max number of rlimits, too 2014-03-05 02:31:09 +01:00
Lennart Poettering e66cf1a3f9 core: introduce new RuntimeDirectory= and RuntimeDirectoryMode= unit settings
As discussed on the ML these are useful to manage runtime directories
below /run for services.
2014-03-03 17:55:32 +01:00
Lennart Poettering b64a3d86bc execute: no need to include seccomp.h from execute.h 2014-03-03 17:55:32 +01:00
Lennart Poettering 4298d0b512 core: add new RestrictAddressFamilies= switch
This new unit settings allows restricting which address families are
available to processes. This is an effective way to minimize the attack
surface of services, by turning off entire network stacks for them.

This is based on seccomp, and does not work on x86-32, since seccomp
cannot filter socketcall() syscalls on that platform.
2014-02-26 02:19:28 +01:00
Michael Scherer eef65bf3ee core: Add AppArmor profile switching
This permit to switch to a specific apparmor profile when starting a daemon. This
will result in a non operation if apparmor is disabled.
It also add a new build requirement on libapparmor for using this feature.
2014-02-21 03:44:20 +01:00
Lennart Poettering ac45f971a1 core: add Personality= option for units to set the personality for spawned processes 2014-02-19 03:27:03 +01:00
Lennart Poettering 5f8640fb62 core: store and expose SELinuxContext field normalized as bool + string 2014-02-17 16:52:52 +01:00
Lennart Poettering 57183d117a core: add SystemCallArchitectures= unit setting to allow disabling of non-native
architecture support for system calls

Also, turn system call filter bus properties into complex types instead
of concatenated strings.
2014-02-13 00:24:00 +01:00
Lennart Poettering 17df7223be core: rework syscall filter
- Allow configuration of an errno error to return from blacklisted
  syscalls, instead of immediately terminating a process.

- Fix parsing logic when libseccomp support is turned off

- Only keep the actual syscall set in the ExecContext, and generate the
  string version only on demand.
2014-02-12 18:30:36 +01:00
Ronny Chevalier c0467cf387 syscallfilter: port to libseccomp 2014-02-12 18:30:36 +01:00
Michael Scherer 7b52a628f8 exec: Add SELinuxContext configuration item
This permit to let system administrators decide of the domain of a service.
This can be used with templated units to have each service in a différent
domain ( for example, a per customer database, using MLS or anything ),
or can be used to force a non selinux enabled system (jvm, erlang, etc)
to start in a different domain for each service.
2014-02-10 13:18:16 +01:00
Lennart Poettering 7f112f50fe exec: introduce PrivateDevices= switch to provide services with a private /dev
Similar to PrivateNetwork=, PrivateTmp= introduce PrivateDevices= that
sets up a private /dev with only the API pseudo-devices like /dev/null,
/dev/zero, /dev/random, but not any physical devices in them.
2014-01-20 21:28:37 +01:00