Commit graph

628 commits

Author SHA1 Message Date
Lennart Poettering 31ce987c2b rlimit-util: add a common destructor call for arrays of struct rlimit 2018-05-17 20:36:52 +02:00
Lennart Poettering 6550c24c7f rlimit-util: rework rlimit_{from|to}_string() to work without "Limit" prefix
let's make the call more generic, so that we can also easily use it for
parsing "RLIMIT_xyz" style constants.
2018-05-17 20:36:52 +02:00
Felipe Sateler 57b7a260c2 core: undo the dependency inversion between unit.h and all unit types 2018-05-15 14:24:34 -04:00
Yu Watanabe 130d3d22e9 tree-wide: use strv_free_and_replace() macro 2018-05-10 00:57:34 +09:00
Yu Watanabe aa9d574de9 load-fragment: allow to specify RestrictNamespaces= multiple times
If multiple RestrictNamespaces= settings are set, then merge the settings.
This also drops supporting "~yes" and "~no".
2018-05-05 11:07:37 +09:00
Yu Watanabe 86c2a9f1c2 nsflsgs: drop namespace_flag_{from,to}_string()
This also drops namespace_flag_to_string_many_with_check(), and
renames namespace_flag_{from,to}_string_many() to
namespace_flags_{from,to}_string().
2018-05-05 11:07:37 +09:00
Yu Watanabe b5a33299b0 core: disable namespace sandboxing for '+' prefixed lines
Fixes #8842.
2018-05-01 13:44:06 +09:00
Lennart Poettering da6053d0a7 tree-wide: be more careful with the type of array sizes
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.

Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.

So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.

This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:

1. strv_length()' return type becomes size_t

2. the unit file changes array size becomes size_t

3. DNS answer and query array sizes become size_t

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
2018-04-27 14:29:06 +02:00
Lennart Poettering 5d13a15b1d tree-wide: drop spurious newlines (#8764)
Double newlines (i.e. one empty lines) are great to structure code. But
let's avoid triple newlines (i.e. two empty lines), quadruple newlines,
quintuple newlines, …, that's just spurious whitespace.

It's an easy way to drop 121 lines of code, and keeps the coding style
of our sources a bit tigther.
2018-04-19 12:13:23 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Yu Watanabe 1cc6c93a95 tree-wide: use TAKE_PTR() and TAKE_FD() macros 2018-04-05 14:26:26 +09:00
Dimitri John Ledkov e64c2d0b5f core: use setreuid/setregid trick to create session keyring with right ownership (#8447)
Re-use the hacks used to link user keyring, when creating the session
keyring. This way changing ownership of the keyring is not required, and thus
incovation_id can be correctly created in restricted environments.

Creating invocation_id with root permissions works and linking it into session
keyring works, as at that point session keyring is possessed.

Simple way to validate this is with following commands:

$ journalctl -f &
$ sudo systemd-run --uid 1000 /bin/sh -c 'keyctl describe @s; keyctl list @s; keyctl read `keyctl search @s user invocation_id`'

which now works in LXD containers as well as on the host.

Fixes: https://github.com/systemd/systemd/issues/7655
2018-03-27 12:58:10 +02:00
Lennart Poettering 959071cac2
Merge pull request #8552 from keszybz/test-improvements
Test and diagnostics improvements
2018-03-23 15:26:54 +01:00
Zbigniew Jędrzejewski-Szmek 37c1d5e97d tree-wide: warn when a directory path already exists but has bad mode/owner/type
When we are attempting to create directory somewhere in the bowels of /var/lib
and get an error that it already exists, it can be quite hard to diagnose what
is wrong (especially for a user who is not aware that the directory must have
the specified owner, and permissions not looser than what was requested). Let's
print a warning in most cases. A warning is appropriate, because such state is
usually a sign of borked installation and needs to be resolved by the adminstrator.

$ build/test-fs-util

Path "/tmp/test-readlink_and_make_absolute" already exists and is not a directory, refusing.
   (or)
Directory "/tmp/test-readlink_and_make_absolute" already exists, but has mode 0775 that is too permissive (0755 was requested), refusing.
   (or)
Directory "/tmp/test-readlink_and_make_absolute" already exists, but is owned by 1001:1000 (1000:1000 was requested), refusing.

Assertion 'mkdir_safe(tempdir, 0755, getuid(), getgid(), MKDIR_WARN_MODE) >= 0' failed at ../src/test/test-fs-util.c:320, function test_readlink_and_make_absolute(). Aborting.

No functional change except for the new log lines.
2018-03-23 10:26:38 +01:00
Lennart Poettering ae2a15bc14 macro: introduce TAKE_PTR() macro
This macro will read a pointer of any type, return it, and set the
pointer to NULL. This is useful as an explicit concept of passing
ownership of a memory area between pointers.

This takes inspiration from Rust:

https://doc.rust-lang.org/std/option/enum.Option.html#method.take

and was suggested by Alan Jenkins (@sourcejedi).

It drops ~160 lines of code from our codebase, which makes me like it.
Also, I think it clarifies passing of ownership, and thus helps
readability a bit (at least for the initiated who know the new macro)
2018-03-22 20:21:42 +01:00
Zbigniew Jędrzejewski-Szmek d50b5839b0 basic/mkdir: convert bool flag to enum
In preparation for subsequent changes...
2018-03-22 15:57:56 +01:00
Lennart Poettering 2b33ab0957 tree-wide: port various places over to use new rearrange_stdio() 2018-03-02 11:42:10 +01:00
Zbigniew Jędrzejewski-Szmek 30c81ce2ce pid1: when creating service directories, don't chown existing files (#8181)
This partially reverts 3536f49e8f and
3536f49e8f.

When the user is dynamic, and we are setting up state, cache, or logs dirs,
behaviour is unchanged, we always do a recursive chown. This is necessary
because the user number might change between invocations.

But when setting up a directory for non-dynamic user, or a runtime directory
for a dynamic user, do any ownership or mode changes only when the directory
is initially created. Nothing says that the files under those directories have
to be all recursively owned by our user. This restores behaviour before
3536f49e8f, so modifications to the state of
the runtime directory persist between ExecStartPre's and ExecStart's, and even
longer in case the directory is persistent.

I think it _would_ be a nice property if setting a user would automatically
propagate to ownership of any Runtime/Logs/Cache directories. But this is
incompatible with another nice property, namely preserving changes to those
directories made by an admin, and with allowing change of ownership of files
in those directories by the service (e.g. to allow other users to access them).
Of the two, I think the second property is more important. Also, it's backwards
compatible.

https://bugzilla.redhat.com/show_bug.cgi?id=1508495

There is no need to chmod a directory we just created, so move that step
up into a branch. After that, 'effective' is only used once, so get rid of
it too.
2018-02-22 11:30:59 +01:00
Yu Watanabe 2abd4e388a core: add new setting TemporaryFileSystem=
This introduces a new setting TemporaryFileSystem=. This is useful
to hide files not relevant to the processes invoked by unit, while
necessary files or directories can be still accessed by combining
with Bind{,ReadOnly}Paths=.
2018-02-21 09:17:52 +09:00
Yu Watanabe 4ca763a902 core/namespace: make '-' prefix in Bind{,ReadOnly}Paths= work
Each path in `Bind{ReadOnly}Paths=` accept '-' prefix. However,
the prefix is completely ignored.
This makes it work as expected.
2018-02-21 09:07:56 +09:00
Yu Watanabe 8e06d57ccb core/execute: clear bind_mounts 2018-02-21 09:05:37 +09:00
Yu Watanabe a635a7aec6 core/execute: simplify compile_bind_mounts()
It is not necessary to re-assign error code.
2018-02-21 09:05:35 +09:00
Lennart Poettering 7b91264852 terminal-util: make resolve_dev_console() less weird
Let's normalize the behaviour: return a negative errno style error code,
and return the resolved string directly as argument.
2018-02-14 17:30:37 +01:00
Lennart Poettering 8854d79504 terminal-util: rework acquire_terminal()
This modernizes acquire_terminal() in a couple of ways:

1. The three boolean arguments are replaced by a flags parameter, that
   should be more descriptive in what it does.

2. We now properly handle inotify queue overruns

3. We use _cleanup_ for closing the fds now, to shorten the code quite a
   bit.

Behaviour should not be altered by this.
2018-02-13 21:24:37 +01:00
Yu Watanabe 34cf6c4340 core/execute: make arguments constant if possible
Also make functions static if possible.
2018-02-06 16:00:50 +09:00
Yu Watanabe e8a565cb66 core: make ExecRuntime be manager managed object
Before this, each ExecRuntime object is owned by a unit. However,
it may be shared with other units which enable JoinsNamespaceOf=.
Thus, by the serialization/deserialization process, its sharing
information, more specifically, reference counter is lost, and
causes issue #7790.

This makes ExecRuntime objects be managed by manager, and changes
the serialization/deserialization process.

Fixes #7790.
2018-02-06 16:00:34 +09:00
Zbigniew Jędrzejewski-Szmek 5035800495
Merge pull request #7763 from yuwata/fix-7761
Revert "core/execute: RuntimeDirectory= or friends requires mount namespace"
2018-01-05 12:38:29 +01:00
Lennart Poettering 2e87a1fde9 tree-wide: make use of wait_for_terminate_and_check() at various places
Using wait_for_terminate_and_check() instead of wait_for_terminate()
let's us simplify, shorten and unify the return value checking and
logging of waitid().  Hence, let's use it all over the place.
2018-01-04 13:27:27 +01:00
Yu Watanabe 4657abb5d4 execute: make "runtime" argument const in exec_needs_mount_namespace()
The argument can be const, then let's make so.
2018-01-04 00:48:18 +09:00
Yu Watanabe b43ee82fc1 core: RuntimeDirectory= does not request new mount namespace
Now RuntimeDirectory= does not create 'private' directory.
Thus, it is not neccessary to request new mount namespace.

Follow-up for 8092a48cc1.
2018-01-04 00:26:35 +09:00
Yu Watanabe 42b1d8e0f5 Revert "core/execute: RuntimeDirectory= or friends requires mount namespace"
This reverts commit 652bb2637a.

Fixes #7761.
2018-01-04 00:26:11 +09:00
Lennart Poettering 4c253ed1ca tree-wide: introduce new safe_fork() helper and port everything over
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:

1. Optionally resets signal handlers/mask in the child

2. Sets a name on all processes we fork off right after forking off (and
   the patch assigns useful names for all processes we fork off now,
   following a systematic naming scheme: always enclosed in () – in order
   to indicate that these are not proper, exec()ed processes, but only
   forked off children, and if the process is long-running with only our
   own code, without execve()'ing something else, it gets am "sd-" prefix.)

3. Optionally closes all file descriptors in the child

4. Optionally sets a PR_SET_DEATHSIG to SIGTERM in the child, in a safe
   way so that the parent dying before this happens being handled
   safely.

5. Optionally reopens the logs

6. Optionally connects stdin/stdout/stderr to /dev/null

7. Debug logs about the forked off processes.
2017-12-25 11:48:21 +01:00
Yu Watanabe 586290017d tree-wide: use !strv_isempty() instead of strv_length() > 0 2017-12-19 10:43:57 +09:00
Lennart Poettering f1d34068ef tree-wide: add DEBUG_LOGGING macro that checks whether debug logging is on (#7645)
This makes things a bit easier to read I think, and also makes sure we
always use the _unlikely_ wrapper around it, which so far we used
sometimes and other times we didn't. Let's clean that up.
2017-12-15 11:09:00 +01:00
Lennart Poettering 234519ae6d tree-wide: drop a few == NULL and != NULL comparison
Our CODING_STYLE suggests not comparing with NULL, but relying on C's
downgrade-to-bool feature for that. Fix up some code to match these
guidelines. (This is not comprehensive, the coccinelle output for this
is unfortunately kinda borked)
2017-12-11 16:05:40 +01:00
Yu Watanabe da681e1bd2 tree-wide: use cpu_set_mfree() 2017-12-06 10:32:38 +09:00
Yu Watanabe 7f59dd3566 execute: define the variable mac_selinux_contex_net only when build with SELinux 2017-12-05 14:08:09 +09:00
Yu Watanabe 92b423b9b4 execute: define setup_smack() only if SMACK is enabled
This suppresses the following warning
```
execute.c:2149:12: warning: ‘setup_smack’ defined but not used [-Wunused-function]
 static int setup_smack(
            ^~~~~~~~~~~
```
2017-12-05 14:04:15 +09:00
Lennart Poettering 949befd3f0
core: support upgrading from DynamicUser=0 to DynamicUser=1 for unit directories (#7507)
This makes sure we migrate /var/lib/<foo> if it exists to
/var/lib/private/<foo> if DynamicUser=1 is set. This is useful to allow
turning on DynamicUser= on services that previously didn't use it, and
we can deal with this, and migrate the relevant directories as
necessary.

Note that "downgrading" from DynamicUser=1 backto DynamicUser=0 works
too. However in that case we simply continue to use
/var/lib/private/<foo>, which works because /var/lib/<foo> is a symlink
there after all.
2017-11-30 11:52:39 +01:00
Lennart Poettering 62b9bb2661 cgroup-util: merge cg_set_tasks_access() and cg-set_group_access() into one
We never use these functions seperately, hence don't bother splitting
them into to.

Also, simplify things a bit, and maintain tables for the attribute files
to chown. Let's also update those tables a bit, and include thenew
"cgroup.threads" file in it, that needs to be delegated too, according
to the documentation.
2017-11-25 17:08:21 +01:00
jobol 37ac2744cc core/exec: Restore SmackProcessLabel setting (#7378)
Smack LSM needs the capability CAP_MAC_ADMIN to allow
setting of the current Smack exec label. Consequently,
dropping capabilities must be done after changing the
current exec label.

This is only related to Smack LSM. But for clarity and
regularity, all setting of security context moved before
dropping capabilities.

See Issue 7108
2017-11-21 12:01:13 +01:00
Lennart Poettering 0133d5553a
Merge pull request #7198 from poettering/stdin-stdout
Add StandardInput=data, StandardInput=file:... and more
2017-11-19 19:49:11 +01:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering befc4a800e core: add exec_context_dump() support for fd: and file: stdio settings
This was missing for using fdnames as stdio, let's add support for
fdnames as well as file paths in one go.
2017-11-17 11:13:44 +01:00
Lennart Poettering 2038c3f584 core: add support for StandardInputFile= and friends
These new settings permit specifiying arbitrary paths as
stdin/stdout/stderr locations. We try to open/create them as necessary.
Some special magic is applied:

1) if the same path is specified for both input and output/stderr, we'll
   open it only once O_RDWR, and duplicate them fd instead.

2) If we an AF_UNIX socket path is specified, we'll connect() to it,
   rather than open() it. This allows invoking systemd services with
   stdin/stdout/stderr connected to arbitrary foreign service sockets.

Fixes: #3991
2017-11-17 11:13:44 +01:00
Lennart Poettering e75a9ed176 execute: some extra asserts
In some cases we checked for fd validity already explicitly, let's do
this for all our fds.
2017-11-17 11:13:44 +01:00
Lennart Poettering 5073ff6bec core: fold property_get_input_fdname() and property_get_output_fdname() into one
property_get_output_fdname() already had two different control flows for
stdout and stderr, it might as well handle stdin too, thus shortening
our code a bit.
2017-11-17 11:13:44 +01:00
Lennart Poettering 3a274a218d execute: fix type of open_terminal_as() flags parameter
It's the flags parameter we propagate here, not the mode parameter,
hence let's name it properly, and use the right type.
2017-11-17 11:13:44 +01:00
Lennart Poettering 08f3be7a38 core: add two new unit file settings: StandardInputData= + StandardInputText=
Both permit configuring data to pass through STDIN to an invoked
process. StandardInputText= accepts a line of text (possibly with
embedded C-style escapes as well as unit specifiers), which is appended
to the buffer to pass as stdin, followed by a single newline.
StandardInputData= is similar, but accepts arbitrary base64 encoded
data, and will not resolve specifiers or C-style escapes, nor append
newlines.

This may be used to pass input/configuration data to services, directly
in-line from unit files, either in a cooked or in a more raw format.
2017-11-17 11:13:44 +01:00
Lennart Poettering 1fb0682e78 execute: check whether we are actually on a TTY before doing TIOCSCTTY
Given that Linux assigns the same ioctl numbers ot multiple subsystems,
we should be careful when invoking ioctls, so that we don't end up
calling something we wouldn't want to call.
2017-11-17 11:13:44 +01:00
Lennart Poettering 046a82c1b2 fd-util: add new helper move_fd() and make use of it
We are using the same pattern at various places: call dup2() on an fd,
and close the old fd, usually in combination with some O_CLOEXEC
fiddling. Let's add a little helper for this, and port a few obvious
cases over.
2017-11-17 11:13:44 +01:00
Lennart Poettering d3070fbdf6 core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald
And let's make use of it to implement two new unit settings with it:

1. LogLevelMax= is a new per-unit setting that may be used to configure
   log priority filtering: set it to LogLevelMax=notice and only
   messages of level "notice" and lower (i.e. more important) will be
   processed, all others are dropped.

2. LogExtraFields= is a new per-unit setting for configuring per-unit
   journal fields, that are implicitly included in every log record
   generated by the unit's processes. It takes field/value pairs in the
   form of FOO=BAR.

Also, related to this, one exisiting unit setting is ported to this new
facility:

3. The invocation ID is now pulled from /run/systemd/units/ instead of
   cgroupfs xattrs. This substantially relaxes requirements of systemd
   on the kernel version and the privileges it runs with (specifically,
   cgroupfs xattrs are not available in containers, since they are
   stored in kernel memory, and hence are unsafe to permit to lesser
   privileged code).

/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.

Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.

This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.

Fixes: #4089
Fixes: #3041
Fixes: #4441
2017-11-16 12:40:17 +01:00
Yu Watanabe 3df90f24cc core: allow to specify errno number in SystemCallErrorNumber= 2017-11-11 21:54:24 +09:00
Yu Watanabe 8cfa775f4f core: add support to specify errno in SystemCallFilter=
This makes each system call in SystemCallFilter= blacklist optionally
takes errno name or number after a colon. The errno takes precedence
over the one given by SystemCallErrorNumber=.

C.f. #7173.
Closes #7169.
2017-11-11 21:54:12 +09:00
Yu Watanabe 8092a48cc1 core/execute: do not create RuntimeDirectory= under private/ sub-directory
RuntimeDirectory= often used for sharing files or sockets with other
services. So, if creating them under private/ sub-directory, we cannot
set DynamicUser= to service units which want to share something through
RuntimeDirectory=.
This makes the directories given by RuntimeDirectory= are created under
/run/ even if DynamicUser= is set.

Fixes #7260.
2017-11-08 15:50:58 +09:00
Yu Watanabe 652bb2637a core/execute: RuntimeDirectory= or friends requires mount namespace
Since #6940, RuntimeDirectory= or their friends imply BindPaths=.
So, if at least one of them are set, mount namespace is required.
2017-11-08 15:48:51 +09:00
Yu Watanabe 7bcef4efe6 core: remove compile_read_write_paths()
From 6c47cd7d3b, RuntimeDirectory= and
their friends also imply BindPaths=. Thus, implying ReadWritePaths=
is meaningless.
2017-11-08 15:07:22 +09:00
Zbigniew Jędrzejewski-Szmek 895265ad7d Merge pull request #7059 from yuwata/dynamic-user-7013
dynamic-user: permit the case static uid and gid are different
2017-10-18 08:37:12 +02:00
Yu Watanabe e2b0cc3415 core: fix invalid error message
The error message corresponds to EILSEQ is "Invalid or incomplete
multibyte or wide character", and is not suitable in this case.
So, let's show a custom error message when the function
dynamic_creds_realize() returns -EILSEQ.
2017-10-18 08:57:59 +09:00
Yu Watanabe 709dbeac14 core: cleanup for enforce_groups() (#7064)
SupplementaryGroups= is preprocessed in get_supplementary_groups().
So, it is not necessary to input ExecContext to enforce_groups().
2017-10-12 08:10:25 +02:00
Yu Watanabe a8cabc612b core: fix segfault in compile_bind_mounts() when BindPaths= or BindReadOnlyPaths= is set
This fixes a bug introduced by 6c47cd7d3b.

Fixes #7055.
2017-10-11 12:28:22 +09:00
Zbigniew Jędrzejewski-Szmek 7081228acd Merge pull request #7045 from poettering/namespace-casing
some super-trivial fixes to namespace.c
2017-10-10 21:50:17 +02:00
Lennart Poettering b74023db06 Merge pull request #7003 from yuwata/enable-dynamic-user
timesyncd, journal-upload: Enable DynamicUser=
2017-10-10 10:05:43 +02:00
Lennart Poettering bb0ff3fb1b namespace: change NameSpace → Namespace
We generally use the casing "Namespace" for the word, and that's visible
in a number of user-facing interfaces, including "RestrictNamespace=" or
"JoinsNamespaceOf=". Let's make sure to use the same casing internally
too.

As discussed in #7024
2017-10-10 09:51:58 +02:00
Michal Sekletar 6e2d7c4f13 namespace: fall back gracefully when kernel doesn't support network namespaces (#7024) 2017-10-10 09:46:13 +02:00
Yu Watanabe c31ad02403 mkdir: introduce follow_symlink flag to mkdir_safe{,_label}() 2017-10-06 16:03:33 +09:00
Lennart Poettering 4aa1d31c89 Merge pull request #6974 from keszybz/clean-up-defines
Clean up define definitions
2017-10-04 19:25:30 +02:00
Lennart Poettering 5ad90fe376 Merge pull request #6985 from yuwata/empty
load-fragment: do not create empty array
2017-10-04 17:54:35 +02:00
Yu Watanabe 4c70109600 tree-wide: use IN_SET macro (#6977) 2017-10-04 16:01:32 +02:00
Zbigniew Jędrzejewski-Szmek f9fa32f09c build-sys: s/HAVE_SMACK/ENABLE_SMACK/
Same justification as for HAVE_UTMP.
2017-10-04 12:09:50 +02:00
Zbigniew Jędrzejewski-Szmek 349cc4a507 build-sys: use #if Y instead of #ifdef Y everywhere
The advantage is that is the name is mispellt, cpp will warn us.

$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build

squash! build-sys: use #if Y instead of #ifdef Y everywhere

v2:
- fix incorrect setting of HAVE_LIBIDN2
2017-10-04 12:09:29 +02:00
Zbigniew Jędrzejewski-Szmek ac6e8be66e core: use strv_isempty to check if supplementary_groups is empty
With the previous commit, we know that it will be NULL if empty, but
it's safe to always use strv_isempty() in case the code changes
in the future.
2017-10-04 11:33:30 +02:00
Lennart Poettering da50b85af7 core: when looking for a UID to use for a dynamic UID start with the current owner of the StateDirectory= and friends
Let's optimize dynamic UID allocation a bit: if a StateDirectory= (or
suchlike) is configured, we start our allocation loop from that UID and
use it if it currently isn't used otherwise. This is beneficial as it
saves us from having to expensively recursively chown() these
directories in the typical case (which StateDirectory= does when it
notices that the owner of the directory doesn't match the UID picked).

With this in place we now have the a three-phase logic for allocating a
dynamic UID:

a) first, we try to use the owning UID of StateDirectory=,
   CacheDirectory=, LogDirectory= if that exists and is currently
   otherwise unused.

b) if that didn't work out, we hash the UID from the service name

c) if that didn't yield an unused UID either, randomly pick new ones
   until we find a free one.
2017-10-02 17:41:44 +02:00
Lennart Poettering 6c47cd7d3b execute: make StateDirectory= and friends compatible with DynamicUser=1 and RootDirectory=/RootImage=
Let's clean up the interaction of StateDirectory= (and friends) to
DynamicUser=1: instead of creating these directories directly below
/var/lib, place them in /var/lib/private instead if DynamicUser=1 is
set, making that directory 0700 and owned by root:root. This way, if a
dynamic UID is later reused, access to the old run's state directory is
prohibited for that user. Then, use file system namespacing inside the
service to make /var/lib/private a readable tmpfs, hiding all state
directories that are not listed in StateDirectory=, and making access to
the actual state directory possible. Mount all directories listed in
StateDirectory= to the same places inside the service (which means
they'll now be mounted into the tmpfs instance). Finally, add a symlink
from the state directory name in /var/lib/ to the one in
/var/lib/private, so that both the host and the service can access the
path under the same location.

Here's an example: let's say a service runs with StateDirectory=foo.
When DynamicUser=0 is set, it will get the following setup, and no
difference between what the unit and what the host sees:

        /var/lib/foo (created as directory)

Now, if DynamicUser=1 is set, we'll instead get this on the host:

        /var/lib/private (created as directory with mode 0700, root:root)
        /var/lib/private/foo (created as directory)
        /var/lib/foo → private/foo (created as symlink)

And from inside the unit:

        /var/lib/private (a tmpfs mount with mode 0755, root:root)
        /var/lib/private/foo (bind mounted from the host)
        /var/lib/foo → private/foo (the same symlink as above)

This takes inspiration from how container trees are protected below
/var/lib/machines: they generally reuse UIDs/GIDs of the host, but
because /var/lib/machines itself is set to 0700 host users cannot access
files in the container tree even if the UIDs/GIDs are reused. However,
for this commit we add one further trick: inside and outside of the unit
/var/lib/private is a different thing: outside it is a plain,
inaccessible directory, and inside it is a world-readable tmpfs mount
with only the whitelisted subdirs below it, bind mounte din.  This
means, from the outside the dir acts as an access barrier, but from the
inside it does not. And the symlink created in /var/lib/foo itself
points across the barrier in both cases, so that root and the unit's
user always have access to these dirs without knowing the details of
this mounting magic.

This logic resolves a major shortcoming of DynamicUser=1 units:
previously they couldn't safely store persistant data. With this change
they can have their own private state, log and data directories, which
they can write to, but which are protected from UID recycling.

With this change, if RootDirectory= or RootImage= are used it is ensured
that the specified state/log/cache directories are always mounted in
from the host. This change of semantics I think is much preferable since
this means the root directory/image logic can be used easily for
read-only resource bundling (as all writable data resides outside of the
image). Note that this is a change of behaviour, but given that we
haven't released any systemd version with StateDirectory= and friends
implemented this should be a safe change to make (in particular as
previously it wasn't clear what would actually happen when used in
combination). Moreover, by making this change we can later add a "+"
modifier to these setings too working similar to the same modifier in
ReadOnlyPaths= and friends, making specified paths relative to the
container itself.
2017-10-02 17:41:44 +02:00
Lennart Poettering 72fd17682d core: usually our enum's _INVALID and _MAX special values are named after the full type
In most cases we followed the rule that the special _INVALID and _MAX
values we use in our enums use the full type name as prefix (in contrast
to regular values that we often make shorter), do so for
ExecDirectoryType as well.

No functional changes, just a little bit of renaming to make this code
more like the rest.
2017-10-02 17:41:43 +02:00
Lennart Poettering a1164ae380 core: chown() StateDirectory= and friends recursively when starting a service
This is particularly useful when used in conjunction with DynamicUser=1,
where the UID might change for every invocation, but is useful in other
cases too, for example, when these directories are shared between
systems where the UID assignments differ slightly.
2017-10-02 17:41:43 +02:00
Lennart Poettering 40a80078d2 execute: let's close glibc syslog channels too
Just in case something opened them, let's make sure glibc invalidates
them too.

Thankfully so far no library opened log channels behind our back, at
least as far as I know, hence this is actually a NOP, but let's better
be safe than sorry.
2017-09-26 17:52:25 +02:00
Lennart Poettering 12145637e9 execute: normalize logging in execute.c
Now that logging can implicitly reopen the log streams when needed we
can log errors without any special magic, hence let's normalize things,
and log the same way we do everywhere else.
2017-09-26 17:51:22 +02:00
Lennart Poettering 86ffb32560 execute: drop explicit log_open()/log_close() now that it is unnecessary 2017-09-26 17:46:34 +02:00
Lennart Poettering 2c027c62dd execute: make use of the new logging mode in execute.c 2017-09-26 17:46:34 +02:00
Lennart Poettering 82677ae4c7 execute: downgrade a log message ERR → WARNING, since we proceed ignoring its result 2017-09-26 17:46:33 +02:00
Lennart Poettering 8002fb9747 execute: rework logging in setup_keyring() to include unit info
Let's use log_unit_error() instead of log_error() everywhere (and
friends).
2017-09-26 17:46:33 +02:00
Lennart Poettering e6a7ec4b8e io-util: add new IOVEC_INIT/IOVEC_MAKE macros
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().

This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.

(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
2017-09-22 15:28:04 +02:00
Lennart Poettering f1c50becda core: make sure to log invocation ID of units also when doing structured logging 2017-09-22 15:24:55 +02:00
Jan Synacek f679ed6116 execute: fix typo in error message (#6881) 2017-09-21 10:38:52 +02:00
Zbigniew Jędrzejewski-Szmek 61ceaea5c8 Move one space from dbus-execute.c to execute.c
The number of spaces is conserved ;)
2017-09-16 08:45:02 +02:00
Zbigniew Jędrzejewski-Szmek 3d7d3cbbda Merge pull request #6832 from poettering/keyring-mode
Add KeyringMode unit property to fix cryptsetup key caching
2017-09-15 21:24:48 +02:00
Lennart Poettering b1edf4456e core: add new per-unit setting KeyringMode= for controlling kernel keyring setup
Usually, it's a good thing that we isolate the kernel session keyring
for the various services and disconnect them from the user keyring.
However, in case of the cryptsetup key caching we actually want that
multiple instances of the cryptsetup service can share the keys in the
root user's user keyring, hence we need to be able to disable this logic
for them.

This adds KeyringMode=inherit|private|shared:

    inherit: don't do any keyring magic (this is the default in systemd --user)
    private: a private keyring as before (default in systemd --system)
    shared: the new setting
2017-09-15 16:53:35 +02:00
Lennart Poettering 0460aa5c08 execute: improve and augment execution log messages
Let's generate friendly messages for more cases, and make slight
adjustments to the existing messages.
2017-09-15 16:43:06 +02:00
Lennart Poettering ab2116b140 core: make sure that $JOURNAL_STREAM prefers stderr over stdout information (#6824)
If two separate log streams are connected to stdout and stderr, let's
make sure $JOURNAL_STREAM points to the latter, as that's the preferred
log destination, and the environment variable has been created in order
to permit services to automatically upgrade from stderr based logging to
native journal logging.

Also, document this behaviour.

Fixes: #6800
2017-09-15 08:26:38 +02:00
Lennart Poettering 00819cc151 core: add new UnsetEnvironment= setting for unit files
With this setting we can explicitly unset specific variables for
processes of a unit, as last step of assembling the environment block
for them. This is useful to fix #6407.

While we are at it, greatly expand the documentation on how the
environment block for forked off processes is assembled.
2017-09-14 15:17:40 +02:00
Lennart Poettering 21022b9dde util-lib: wrap personality() to fix up broken glibc error handling (#6766)
glibc appears to propagate different errors in different ways, let's fix
this up, so that our own code doesn't get confused by this.

See #6752 + #6737 for details.

Fixes: #6755
2017-09-08 17:16:29 +03:00
Lennart Poettering aac8c0c382 execute: minor ExecOutput handling beautification (#6711)
Let's clean up the checking for the various ExecOutput values a bit,
let's use IN_SET everywhere, and the same concepts for all three bools
we pass to dprintf().
2017-09-01 09:04:27 +09:00
Lennart Poettering e8132d63fe seccomp: default to something resembling the current personality when locking it
Let's lock the personality to the currently set one, if nothing is
specifically specified. But do so with a grain of salt, and never
default to any exotic personality here, but only PER_LINUX or
PER_LINUX32.
2017-08-29 15:56:57 +02:00
Topi Miettinen 78e864e5b3 seccomp: LockPersonality boolean (#6193)
Add LockPersonality boolean to allow locking down personality(2)
system call so that the execution domain can't be changed.
This may be useful to improve security because odd emulations
may be poorly tested and source of vulnerabilities, while
system services shouldn't need any weird personalities.
2017-08-29 15:54:50 +02:00
Yu Watanabe 5ce96b141a Merge pull request #6582 from poettering/logind-tty
various tty path parsing fixes
2017-08-26 22:12:48 +09:00
Lennart Poettering 165a31c0db core: add two new special ExecStart= character prefixes
This patch adds two new special character prefixes to ExecStart= and
friends, in addition to the existing "-", "@" and "+":

"!"  → much like "+", except with a much reduced effect as it only
       disables the actual setresuid()/setresgid()/setgroups() calls, but
       leaves all other security features on, including namespace
       options. This is very useful in combination with
       RuntimeDirectory= or DynamicUser= and similar option, as a user
       is still allocated and used for the runtime directory, but the
       actual UID/GID dropping is left to the daemon process itself.
       This should make RuntimeDirectory= a lot more useful for daemons
       which insist on doing their own privilege dropping.

"!!" → Similar to "!", but on systems supporting ambient caps this
       becomes a NOP. This makes it relatively straightforward to write
       unit files that make use of ambient capabilities to let systemd
       drop all privs while retaining compatibility with systems that
       lack ambient caps, where priv dropping is the left to the daemon
       codes themselves.

This is an alternative approach to #6564 and related PRs.
2017-08-10 15:04:32 +02:00
Lennart Poettering 43b1f7092d execute: needs_{selinux,apparmor,smack} → use_{selinux,apparmor,smack}
These booleans simply store whether selinux/apparmor/smack are supposed
ot be used, and chache the various mac_xyz_use() calls before we
transition into the namespace, hence let's use the same verb for the
variables and the functions: "use"
2017-08-10 15:02:50 +02:00
Lennart Poettering 9f6444eb92 execute: make use of IN_SET() where we can 2017-08-10 15:02:50 +02:00
Lennart Poettering 937ccce94c execute: simplify needs_sandboxing checking
Let's merge three if blocks that shall only run when sandboxing is applied
into one.

Note that this changes behaviour in one corner case: PrivateUsers=1 is
now honours both PermissionsStartOnly= and the "+" modifier in
ExecStart=, and not just the former, as before. This was an oversight,
so let's fix this now, at a point in time the option isn't used much
yet.
2017-08-10 15:02:50 +02:00
Lennart Poettering 1703fa41a7 core: rename EXEC_APPLY_PERMISSIONS → EXEC_APPLY_SANDBOXING
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
2017-08-10 15:02:50 +02:00
Lennart Poettering 584b8688d1 execute: also fold the cgroup delegate bit into ExecFlags 2017-08-10 15:02:50 +02:00
Lennart Poettering ac6479781e execute: also control the SYSTEMD_NSS_BYPASS_BUS through an ExecFlags field
Also, correct the logic while we are at it: the variable is only
required for system services, not user services.
2017-08-10 15:02:49 +02:00
Lennart Poettering c71b2eb77e core: don't chown() the configuration directory
The configuration directory is commonly not owned by a service, but
remains root-owned, hence don't change the owner automatically for it.
2017-08-10 15:02:49 +02:00
Lennart Poettering 8679efde21 execute: add one more ExecFlags flag, for controlling unconditional directory chowning
Let's decouple the Manager object from the execution logic a bit more
here too, and simply pass along the fact whether we should
unconditionally chown the runtime/... directories via the ExecFlags
field too.
2017-08-10 14:44:58 +02:00
Lennart Poettering af635cf377 execute: let's decouple execute.c a bit from the unit logic
Let's try to decouple the execution engine a bit from the Unit/Manager
concept, and hence pass one more flag as part of the ExecParameters flags
field.
2017-08-10 14:44:58 +02:00
Lennart Poettering 3ed0cd26ea execute: replace command flag bools by a flags field
This way, we can extend it later on in an easier way, and can pass it
along nicely.
2017-08-10 14:44:58 +02:00
Lennart Poettering a119ec7c82 util-lib: add a new skip_dev_prefix() helper
This new helper removes a leading /dev if there is one. We have code
doing this all over the place, let's unify this, and correct it while
we are at it, by using path_startswith() rather than startswith() to
drop the prefix.
2017-08-09 19:01:18 +02:00
Yu Watanabe 07d46372fe securebits-util: add secure_bits_{from_string,to_string_alloc}() 2017-08-07 23:40:25 +09:00
Yu Watanabe dd1f5bd0aa cap-list: add capability_set_{from_string,to_string_alloc}() 2017-08-07 23:25:11 +09:00
Yu Watanabe 837df14040 core: do not ignore returned values 2017-08-06 23:34:55 +09:00
Yu Watanabe ecfbc84f1c core: define variables only when they are required
Follow-up for 7f18ef0a55.
2017-08-06 13:08:34 +09:00
Fabio Kung 7f18ef0a55 core: check which MACs to use before a new mount ns is created (#6498)
/sys is not guaranteed to exist when a new mount namespace is created.
It is only mounted under conditions specified by
`namespace_info_mount_apivfs`.

Checking if the three available MAC LSMs are enabled requires a sysfs
mounted at /sys, so the checks are moved to before a new mount ns is
created.
2017-08-01 09:15:18 +02:00
Lennart Poettering c867611e0a execute: don't pass unit ID in --user mode to journald for stream logging
When we create a log stream connection to journald, we pass along the
unit ID. With this change we do this only when we run as system
instance, not as user instance, to remove the ambiguity whether a user
or system unit is specified. The effect of this change is minor:
journald ignores the field anyway from clients with UID != 0. This patch
hence only fixes the unit attribution for the --user instance of the
root user.
2017-07-31 18:01:42 +02:00
Lennart Poettering 92a17af991 execute: make some code shorter
Let's simplify some lines to make it shorter.
2017-07-31 18:01:42 +02:00
Lennart Poettering cad93f2996 core, sd-bus, logind: make use of uid_is_valid() in more places 2017-07-31 18:01:42 +02:00
Lennart Poettering df0ff12775 tree-wide: make use of getpid_cached() wherever we can
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
2017-07-20 20:27:24 +02:00
Yu Watanabe 3536f49e8f core: add {State,Cache,Log,Configuration}Directory= (#6384)
This introduces {State,Cache,Log,Configuration}Directory= those are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.

This also fixes #6391.
2017-07-18 14:34:52 +02:00
Lennart Poettering 688230d3a7 Merge pull request #6354 from walyong/smack_process_label_free
core: modify resource leak and missed security context dump
2017-07-17 10:04:12 +02:00
Yu Watanabe 23a7448efa core: support subdirectories in RuntimeDirectory= option 2017-07-17 16:30:53 +09:00
Yu Watanabe 53f47dfc7b core: allow preserving contents of RuntimeDirectory= over process restart
This introduces RuntimeDirectoryPreserve= option which takes a boolean
argument or 'restart'.

Closes #6087.
2017-07-17 16:22:25 +09:00
WaLyong Cho 80c21aea11 core: dump also missed security context 2017-07-13 13:12:24 +09:00
WaLyong Cho 5b8e1b7755 core: modify resource leak by SmackProcessLabel= 2017-07-13 13:12:15 +09:00
Lennart Poettering 782c925f7f Revert "core: link user keyring to session keyring (#6275)" (#6342)
This reverts commit 437a85112e.

The outcome of this isn't that clear, let's revert this for now, see
discussion on #6286.
2017-07-12 10:00:43 -04:00
Christian Hesse 437a85112e core: link user keyring to session keyring (#6275)
Commit  74dd6b515f (core: run each system
service with a fresh session keyring) broke adding keys to user keyring.
Added keys could not be accessed with error message:

keyctl_read_alloc: Permission denied

So link the user keyring to our session keyring.
2017-07-04 09:38:31 +02:00
Lennart Poettering 7f452159b8 core: make IOSchedulingClass= and IOSchedulingPriority= settable for transient units
This patch is a bit more complex thant I hoped. In particular the single
IOScheduling= property exposed on the bus is split up into
IOSchedulingClass= and IOSchedulingPriority= (though compat is
retained). Otherwise the asymmetry between setting props and getting
them is a bit too nasty.

Fixes #5613
2017-06-26 17:43:18 +02:00
Zbigniew Jędrzejewski-Szmek 7e867138f5 Merge pull request #5600 from fbuihuu/make-logind-restartable
Make logind restartable.
2017-06-24 18:58:36 -04:00
Franck Bui 4c47affcf1 core: remove the redundancy of 'n_fds' and 'n_storage_fds' in ExecParameters struct
'n_fds' field in the ExecParameters structure was counting the total number of
file descriptors to be passed to a unit.

This counter also includes the number of passed socket fds which is counted by
'n_socket_fds' already.

This patch removes that redundancy by replacing 'n_fds' with
'n_storage_fds'. The new field only counts the fds passed via the storage store
mechanism.  That way each fd is counted at one place only.

Subsequently the patch makes sure to fix code that used 'n_fds' and also wanted
to iterate through all of them by explicitly adding 'n_socket_fds' + 'n_storage_fds'.

Suggested by Lennart.
2017-06-08 16:21:35 +02:00
Franck Bui 9b1419111a core: only apply NonBlocking= to fds passed via socket activation
Make sure to only apply the O_NONBLOCK flag to the fds passed via socket
activation.

Previously the flag was also applied to the fds which came from the fd store
but this was incorrect since services, after being restarted, expect that these
passed fds have their flags unchanged and can be reused as before.

The documentation was a bit unclear about this so clarify it.
2017-06-06 22:42:50 +02:00
Zbigniew Jędrzejewski-Szmek 52511fae7b core: fix warning about unsigned variable (#5935)
Fixup for d8c92e8bc7.
2017-05-11 08:15:28 +02:00
Lennart Poettering 4e168f4606 Merge pull request #5420 from OpenDZ/tixxdz/namespace-fixes-v2
Namespace: RootImage= RootDirectory= and MountAPIVFS fixes
2017-05-09 20:42:32 +02:00
Aggelos Avgerinos 488ab41cb8 execute: Properly log errors considering socket fds (#5910)
Till now if the params->n_fds was 0, systemd was logging that there were
more than one sockets.

Thanks @gregoryp and @VFXcode who did the most work debugging this.
2017-05-08 19:09:22 -04:00
Zbigniew Jędrzejewski-Szmek d8c92e8bc7 execute: filter out "." for ".." in EnvironmentFile= globs too
This doesn't really matter much, only in case somebody would use
something strange like

  EnvironmentFile=/etc/something/.*

Make sure that "." and ".." is not returned by that glob. This makes
all our globbing patterns behave the same.
2017-04-27 13:21:08 -04:00
Djalal Harouni 74e941c022 Merge pull request #5774 from keszybz/printf-annotations
Printf annotation improvements
2017-04-23 01:03:42 +02:00
Zbigniew Jędrzejewski-Szmek ba360bb05c tree-wide: mark log_struct with _printf_ and fix fallout
log_struct takes multiple format strings, each one followed by arguments.
The _printf_ annotation is not sufficiently flexible to express this,
but we can still annotate the first format string, though not its
arguments (because their number is unknown).

With the annotation, the places which specified the message id or similar
as the first pattern cause a warning from -Wformat-nonliteral. This can
be trivially fixed by putting the MESSAGE= first.

This change will help find issues where a non-literal is erroneously used
as the pattern.
2017-04-21 13:37:04 -04:00
Yu Watanabe 4d8b0f0f7a core: downgrade error message if command is prefixed with - and the command is not found
Fixes #5621
2017-04-03 15:38:37 +09:00
Djalal Harouni 9c988f934b namespace: Apply MountAPIVFS= only when a Root directory is set
The MountAPIVFS= documentation says that this options has no effect
unless used in conjunction with RootDirectory= or RootImage= ,lets fix
this and avoid to create private mount namespaces where it is not
needed.
2017-03-05 21:39:43 +01:00
Zbigniew Jędrzejewski-Szmek 643f4706b0 core/execute: add (void)
CID #778045.
2017-02-20 16:02:18 -05:00
Zbigniew Jędrzejewski-Szmek 2b0445262a tree-wide: add SD_ID128_MAKE_STR, remove LOG_MESSAGE_ID
Embedding sd_id128_t's in constant strings was rather cumbersome. We had
SD_ID128_CONST_STR which returned a const char[], but it had two problems:
- it wasn't possible to statically concatanate this array with a normal string
- gcc wasn't really able to optimize this, and generated code to perform the
  "conversion" at runtime.
Because of this, even our own code in coredumpctl wasn't using
SD_ID128_CONST_STR.

Add a new macro to generate a constant string: SD_ID128_MAKE_STR.
It is not as elegant as SD_ID128_CONST_STR, because it requires a repetition
of the numbers, but in practice it is more convenient to use, and allows gcc
to generate smarter code:

$ size .libs/systemd{,-logind,-journald}{.old,}
   text	   data	    bss	    dec	    hex	filename
1265204	 149564	   4808	1419576	 15a938	.libs/systemd.old
1260268	 149564	   4808	1414640	 1595f0	.libs/systemd
 246805	  13852	    209	 260866	  3fb02	.libs/systemd-logind.old
 240973	  13852	    209	 255034	  3e43a	.libs/systemd-logind
 146839	   4984	     34	 151857	  25131	.libs/systemd-journald.old
 146391	   4984	     34	 151409	  24f71	.libs/systemd-journald

It is also much easier to check if a certain binary uses a certain MESSAGE_ID:

$ strings .libs/systemd.old|grep MESSAGE_ID
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x
MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x

$ strings .libs/systemd|grep MESSAGE_ID
MESSAGE_ID=c7a787079b354eaaa9e77b371893cd27
MESSAGE_ID=b07a249cd024414a82dd00cd181378ff
MESSAGE_ID=641257651c1b4ec9a8624d7a40a9e1e7
MESSAGE_ID=de5b426a63be47a7b6ac3eaac82e2f6f
MESSAGE_ID=d34d037fff1847e6ae669a370e694725
MESSAGE_ID=7d4958e842da4a758f6c1cdc7b36dcc5
MESSAGE_ID=1dee0369c7fc4736b7099b38ecb46ee7
MESSAGE_ID=39f53479d3a045ac8e11786248231fbf
MESSAGE_ID=be02cf6855d2428ba40df7e9d022f03d
MESSAGE_ID=7b05ebc668384222baa8881179cfda54
MESSAGE_ID=9d1aaa27d60140bd96365438aad20286
2017-02-15 00:45:12 -05:00
Lennart Poettering 6818c54ca6 core: skip ReadOnlyPaths= and other permission-related mounts on PermissionsStartOnly= (#5309)
ReadOnlyPaths=, ProtectHome=, InaccessiblePaths= and ProtectSystem= are
about restricting access and little more, hence they should be disabled
if PermissionsStartOnly= is used or ExecStart= lines are prefixed with a
"+". Do that.

(Note that we will still create namespaces and stuff, since that's about
a lot more than just permissions. We'll simply disable the effect of
the four options mentioned above, but nothing else mount related.)

This also adds a test for this, to ensure this works as intended.

No documentation updates, as the documentation are already vague enough
to support the new behaviour ("If true, the permission-related execution
options…"). We could clarify this further, but I think we might want to
extend the switches' behaviour a bit more in future, hence leave it at
this for now.

Fixes: #5308
2017-02-12 00:44:46 -05:00
Lennart Poettering 376fecf670 execute: set the right exit status for CHDIR vs. CHROOT
Fixes: #5125
2017-02-09 13:18:35 +01:00
Lennart Poettering 3b0e5bb524 execute: use prefix_roota() where appropriate 2017-02-09 13:18:35 +01:00
Lennart Poettering 6732edab4e execute: set working directory to /root if User= is not set, but WorkingDirectory=~ is
Or actually, try to to do the right thing depending on what is
available:

- If we know $HOME from User=, then use that.
- If the UID for the service is 0, hardcode that WorkingDirectory=~ means WorkingDirectory=/root
- In any other case (which will be the unprivileged --user case), use
  get_home_dir() to find the $HOME of the user we are running as.
- Otherwise fail.

Fixes: #5246 #5124
2017-02-09 13:17:58 +01:00
Lennart Poettering 23deef88b9 Revert "core/execute: set HOME, USER also for root users"
This reverts commit 8b89628a10.

This broke #5246
2017-02-09 11:43:44 +01:00
Lennart Poettering 915e6d1676 core: add RootImage= setting for using a specific image file as root directory for a service
This is similar to RootDirectory= but mounts the root file system from a
block device or loopback file instead of another directory.

This reuses the image dissector code now used by nspawn and
gpt-auto-discovery.
2017-02-07 12:19:42 +01:00
Lennart Poettering 5d997827e2 core: add a per-unit setting MountAPIVFS= for mounting /dev, /proc, /sys in conjunction with RootDirectory=
This adds a boolean unit file setting MountAPIVFS=. If set, the three
main API VFS mounts will be mounted for the service. This only has an
effect on RootDirectory=, which it makes a ton times more useful.

(This is basically the /dev + /proc + /sys mounting code posted in the
original #4727, but rebased on current git, and with the automatic logic
replaced by explicit logic controlled by a unit file setting)
2017-02-07 11:22:05 +01:00
Zbigniew Jędrzejewski-Szmek 6a93917df9 core/execute: pass the username to utmp/wtmp database
Before previous commit, username would be NULL for root, and set only
for other users. So the argument passed to utmp_put_init_process()
would be "root" for other users and NULL for root. Seems strange.
Instead, always pass the username if available.
2017-02-03 11:49:43 -05:00
Zbigniew Jędrzejewski-Szmek 8b89628a10 core/execute: set HOME, USER also for root users
This changes the environment for services running as root from:

LANG=C.utf8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
INVOCATION_ID=ffbdec203c69499a9b83199333e31555
JOURNAL_STREAM=8:1614518

to

LANG=C.utf8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
HOME=/root
LOGNAME=root
USER=root
SHELL=/bin/sh
INVOCATION_ID=15a077963d7b4ca0b82c91dc6519f87c
JOURNAL_STREAM=8:1616718

Making the environment special for the root user complicates things
unnecessarily. This change simplifies both our logic (by making the setting
of the variables unconditional), and should also simplify the logic in
services (particularly scripts).

Fixes #5124.
2017-02-03 11:49:22 -05:00
Zbigniew Jędrzejewski-Szmek 587ab01b53 core/execute.c: check asprintf return value in the usual fashion
This is unlikely to fail, but we cannot rely on asprintf return value
on failure, so let's just be correct here.

CID #1368227.
2017-01-31 11:31:47 -05:00
Zbigniew Jędrzejewski-Szmek 56fbd56143 core/execute: reformat exec_context_named_iofds() for legibility 2017-01-31 11:23:10 -05:00
Zbigniew Jędrzejewski-Szmek 06ec51d8ef core/execute: fix strv memleak
compile_read_write_paths() returns a normal strv from strv_copy(), and
setup_namespace() uses it read-only, so we should use strv_free to deallocate.
2017-01-24 22:26:10 -05:00
Zbigniew Jędrzejewski-Szmek 5b3637b44a Merge pull request #4991 from poettering/seccomp-fix 2017-01-17 23:10:46 -05:00
Zbigniew Jędrzejewski-Szmek 70dd455c8e pid1: provide a more detailed error message when execution fails (#5074)
Fixes #5000.
2017-01-17 22:38:55 -05:00
Lennart Poettering 469830d142 seccomp: rework seccomp code, to improve compat with some archs
This substantially reworks the seccomp code, to ensure better
compatibility with some architectures, including i386.

So far we relied on libseccomp's internal handling of the multiple
syscall ABIs supported on Linux. This is problematic however, as it does
not define clear semantics if an ABI is not able to support specific
seccomp rules we install.

This rework hence changes a couple of things:

- We no longer use seccomp_rule_add(), but only
  seccomp_rule_add_exact(), and fail the installation of a filter if the
  architecture doesn't support it.

- We no longer rely on adding multiple syscall architectures to a single filter,
  but instead install a separate filter for each syscall architecture
  supported. This way, we can install a strict filter for x86-64, while
  permitting a less strict filter for i386.

- All high-level filter additions are now moved from execute.c to
  seccomp-util.c, so that we can test them independently of the service
  execution logic.

- Tests have been added for all types of our seccomp filters.

- SystemCallFilters= and SystemCallArchitectures= are now implemented in
  independent filters and installation logic, as they semantically are
  very much independent of each other.

Fixes: #4575
2017-01-17 22:14:27 -05:00
Zbigniew Jędrzejewski-Szmek 4014818d53 Merge pull request #4806 from poettering/keyring-init
set up a per-service session kernel keyring, and store the invocation ID in it
2016-12-13 23:24:42 -05:00
Lennart Poettering d2d6c096f6 core: add ability to define arbitrary bind mounts for services
This adds two new settings BindPaths= and BindReadOnlyPaths=. They allow
defining arbitrary bind mounts specific to particular services. This is
particularly useful for services with RootDirectory= set as this permits making
specific bits of the host directory available to chrooted services.

The two new settings follow the concepts nspawn already possess in --bind= and
--bind-ro=, as well as the .nspawn settings Bind= and BindReadOnly= (and these
latter options should probably be renamed to BindPaths= and BindReadOnlyPaths=
too).

Fixes: #3439
2016-12-14 00:54:10 +01:00
Lennart Poettering b3415f5dae core: store the invocation ID in the per-service keyring
Let's store the invocation ID in the per-service keyring as a root-owned key,
with strict access rights. This has the advantage over the environment-based ID
passing that it also works from SUID binaries (as they key cannot be overidden
by unprivileged code starting them), in contrast to the secure_getenv() based
mode.

The invocation ID is now passed in three different ways to a service:

- As environment variable $INVOCATION_ID. This is easy to use, but may be
  overriden by unprivileged code (which might be a bad or a good thing), which
  means it's incompatible with SUID code (see above).

- As extended attribute on the service cgroup. This cannot be overriden by
  unprivileged code, and may be queried safely from "outside" of a service.
  However, it is incompatible with containers right now, as unprivileged
  containers generally cannot set xattrs on cgroupfs.

- As "invocation_id" key in the kernel keyring. This has the benefit that the
  key cannot be changed by unprivileged service code, and thus is safe to
  access from SUID code (see above). But do note that service code can replace
  the session keyring with a fresh one that lacks the key. However in that case
  the key will not be owned by root, which is easily detectable. The keyring is
  also incompatible with containers right now, as it is not properly namespace
  aware (but this is being worked on), and thus most container managers mask
  the keyring-related system calls.

Ideally we'd only have one way to pass the invocation ID, but the different
ways all have limitations. The invocation ID hookup in journald is currently
only available on the host but not in containers, due to the mentioned
limitations.

How to verify the new invocation ID in the keyring:

 # systemd-run -t /bin/sh
 Running as unit: run-rd917366c04f847b480d486017f7239d6.service
 Press ^] three times within 1s to disconnect TTY.
 # keyctl show
 Session Keyring
  680208392 --alswrv      0     0  keyring: _ses
  250926536 ----s-rv      0     0   \_ user: invocation_id
 # keyctl request user invocation_id
 250926536
 # keyctl read 250926536
 16 bytes of data in key:
 9c96317c ac64495a a42b9cd7 4f3ff96b
 # echo $INVOCATION_ID
 9c96317cac64495aa42b9cd74f3ff96b
 # ^D

This creates a new transient service runnint a shell. Then verifies the
contents of the keyring, requests the invocation ID key, and reads its payload.
For comparison the invocation ID as passed via the environment variable is also
displayed.
2016-12-13 20:59:36 +01:00
Lennart Poettering 74dd6b515f core: run each system service with a fresh session keyring
This patch ensures that each system service gets its own session kernel keyring
automatically, and implicitly. Without this a keyring is allocated for it
on-demand, but is then linked with the user's kernel keyring, which is OK
behaviour for logged in users, but not so much for system services.

With this change each service gets a session keyring that is specific to the
service and ceases to exist when the service is shut down. The session keyring
is not linked up with the user keyring and keys hence only search within the
session boundaries by default.

(This is useful in a later commit to store per-service material in the keyring,
for example the invocation ID)

(With input from David Howells)
2016-12-13 20:59:10 +01:00
Lennart Poettering 2e6dbc0fcd Merge pull request #4538 from fbuihuu/confirm-spawn-fixes
Confirm spawn fixes/enhancements
2016-11-18 11:08:06 +01:00
Franck Bui 539622bd8c core: in confirm spawn, suggest 'f' when user selects 'n' choice 2016-11-17 18:23:32 +01:00
Franck Bui c891efaf8a core: confirm_spawn: always accept units with same_pgrp set for now
For some reasons units remaining in the same process group as PID 1
(same_pgrp=true) fail to acquire the console even if it's not taken by anyone.

So always accept for units with same_pgrp set for now.
2016-11-17 18:16:51 +01:00
Franck Bui 63d77c9254 core: include the unit name when notifying that a confirmation question timed out 2016-11-17 18:16:51 +01:00
Franck Bui b0eb29449e core: add 'c' in confirmation_spawn to resume the boot process 2016-11-17 18:16:50 +01:00
Franck Bui 56fde33af1 core: add 'j' in confirmation_spawn to list the jobs that are in progress 2016-11-17 18:16:50 +01:00
Franck Bui dd6f9ac0d0 core: add 'D' in confirmat spawn to show a full dump of the unit to spawn 2016-11-17 18:16:50 +01:00
Franck Bui eedf223a30 core: add 'i' in confirm spawn to give a short summary of the unit to spawn 2016-11-17 18:16:50 +01:00
Franck Bui d172b175f6 core: rework the confirmation spawn prompt
Previously it was "[Yes, Fail, Skip]" which is pretty misleading because it
suggests that the whole word needs to be entered instead of a single char.

Also this won't fit well when we'll extend the number of choices.

This patch addresses this by changing the choice hint with "[y, f, s – h for help]"
so it's now clear that a single letter has to be entered.

It also introduces a new choice 'h' which describes all possible choices since
a single letter can be not descriptive enough for new users.

It also allow to stick with the same hint string regardless of how
many choices we will support.
2016-11-17 18:16:50 +01:00
Franck Bui 2bcd3c26fe core: limit the length of the confirmation question
When "confirmation_spawn=1", the confirmation question can look like:

  Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/run/tmpfiles.d/kmod.conf? [Yes, No, Skip]

which is pretty verbose and might not fit in the console width size (which is
usually 80 chars) and thus question will be splitted into 2 consecutive lines.

However since the question is now refreshed every 2 secs, the reprinted
question will overwrite the second line of the previous one...

To prevent this, this patch makes sure that the command line won't be longer
than 60 chars by ellipsizing it if the command is longer:

  Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/ru…nf? [Yes, No, View, Skip]

A following patch will introduce a new choice that will allow the user to get
details on the command to be executed so it will still be possible to see the
full command line.
2016-11-17 18:16:50 +01:00
Franck Bui 2bcc330942 core: in confirm_spawn, the meaning of 'n' and 's' choices are confusing
Before this patch we had:

 - "no" which gives "failing execution" but the command is actually assumed as
   succeed.

 - "skip" which gives "skipping", but the command is assumed to have failed,
   which ends up with "Failed to start ..." on the console.

Now we have:

 - "fail" which gives "failing execution" and the command is indeed assumed as
   failed.

 - "skip" which gives "skipping execution" and the command is assumed as
   succeed.
2016-11-17 18:16:49 +01:00
Franck Bui 3b20f877ad core: rework ask_for_confirmation()
Now the reponses are handled by ask_for_confirmation() as well as the report of
any errors occuring during the process of retrieving the confirmation response.

One benefit of this is that there's no need to open/close the console one more
time when reporting error/status messages.

The caller now just needs to care about the return values whose meanings are:

 - don't execute and pretend that the command failed
 - don't execute and pretend that the command succeeed
 - positive answer, execute the command

Also some slight code reorganization and introduce write_confirm_error() and
write_confirm_error_fd(). write_confim_message becomes unneeded.
2016-11-17 18:16:49 +01:00
Franck Bui 7d5ceb6416 core: allow to redirect confirmation messages to a different console
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).

This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
2016-11-17 18:16:16 +01:00
Djalal Harouni c92e8afebd core: improve the logic that implies no new privileges
The no_new_privileged_set variable is not used any more since commit
9b232d3241 that fixed another thing. So remove it. Also no
need to check if we are under user manager, remove that part too.
2016-11-15 15:04:31 +01:00
Djalal Harouni af964954c6 core: on DynamicUser= make sure that protecting sensitive paths is enforced (#4596)
This adds a variable that is always set to false to make sure that
protect paths inside sandbox are always enforced and not ignored. The only
case when it is set to true is on DynamicUser=no and RootDirectory=/chroot
is set. This allows users to use more our sandbox features inside RootDirectory=

The only exception is ProtectSystem=full|strict and when DynamicUser=yes
is implied. Currently RootDirectory= is not fully compatible with these
due to two reasons:

* /chroot/usr|etc has to be present on ProtectSystem=full
* /chroot// has to be a mount point on ProtectSystem=strict.
2016-11-08 21:57:32 -05:00
Zbigniew Jędrzejewski-Szmek d85a0f8028 Merge pull request #4536 from poettering/seccomp-namespaces
core: add new RestrictNamespaces= unit file setting

Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
2016-11-08 19:54:21 -05:00
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
Lennart Poettering add005357d core: add new RestrictNamespaces= unit file setting
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().

RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.

This setting should be improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
2016-11-04 07:40:13 -06:00
Lennart Poettering 493fd52f1a Merge pull request #4510 from keszybz/tree-wide-cleanups
Tree wide cleanups
2016-11-03 13:59:20 -06:00
Djalal Harouni cdc5d5c55e core: intialize user aux groups and SupplementaryGroups= when DynamicUser= is set
Make sure that when DynamicUser= is set that we intialize the user
supplementary groups and that we also support SupplementaryGroups=

Fixes: https://github.com/systemd/systemd/issues/4539

Thanks Evgeny Vereshchagin (@evverx)
2016-11-03 08:36:53 +01:00
Lennart Poettering 32e134c19f Merge pull request #4483 from poettering/exec-order
more seccomp fixes, and change of order of selinux/aa/smack and seccomp application on exec
2016-11-02 16:09:59 -06:00
Djalal Harouni bbeea27117 core: initialize groups list before checking SupplementaryGroups= of a unit (#4533)
Always initialize the supplementary groups of caller before checking the
unit SupplementaryGroups= option.

Fixes https://github.com/systemd/systemd/issues/4531
2016-11-02 10:51:35 -06:00
Lennart Poettering 5cd9cd3537 execute: apply seccomp filters after changing selinux/aa/smack contexts
Seccomp is generally an unprivileged operation, changing security contexts is
most likely associated with some form of policy. Moreover, while seccomp may
influence our own flow of code quite a bit (much more than the security context
change) make sure to apply the seccomp filters immediately before executing the
binary to invoke.

This also moves enforcement of NNP after the security context change, so that
NNP cannot affect it anymore. (However, the security policy now has to permit
the NNP change).

This change has a good chance of breaking current SELinux/AA/SMACK setups, because
the policy might not expect this change of behaviour. However, it's technically
the better choice I think and should hence be applied.

Fixes: #3993
2016-11-02 08:55:00 -06:00
Djalal Harouni fa1f250d6f Merge pull request #4495 from topimiettinen/block-shmat-exec
seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
2016-10-28 15:41:07 +02:00
Djalal Harouni 59e856c7d3 core: make unit argument const for apply seccomp functions 2016-10-27 09:40:22 +02:00
Djalal Harouni 50b3dfb9d6 core: lets apply working directory just after mount namespaces
This makes applying groups after applying the working directory, this
may allow some flexibility but at same it is not a big deal since we
don't execute or do anything between applying working directory and
droping groups.
2016-10-27 09:40:21 +02:00
Djalal Harouni 2b3c1b9e9d core: get the working directory value inside apply_working_directory()
Improve apply_working_directory() and lets get the current working directory
inside of it.
2016-10-27 09:40:21 +02:00
Djalal Harouni e7f1e7c6e2 core: move apply working directory code into its own apply_working_directory() 2016-10-27 09:40:21 +02:00
Djalal Harouni 93c6bb51b6 core: move the code that setups namespaces on its own function 2016-10-27 09:40:21 +02:00
Topi Miettinen d2ffa389b8 seccomp: also block shmat(..., SHM_EXEC) for MemoryDenyWriteExecute
shmat(..., SHM_EXEC) can be used to create writable and executable
memory, so let's block it when MemoryDenyWriteExecute is set.
2016-10-26 18:59:14 +03:00
Lennart Poettering a3be2849b2 seccomp: add new helper call seccomp_load_filter_set()
This allows us to unify most of the code in apply_protect_kernel_modules() and
apply_private_devices().
2016-10-24 17:32:50 +02:00
Lennart Poettering 8d7b0c8fd7 seccomp: add new seccomp_init_conservative() helper
This adds a new seccomp_init_conservative() helper call that is mostly just a
wrapper around seccomp_init(), but turns off NNP and adds in all secondary
archs, for best compatibility with everything else.

Pretty much all of our code used the very same constructs for these three
steps, hence unifying this in one small function makes things a lot shorter.

This also changes incorrect usage of the "scmp_filter_ctx" type at various
places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type
(pretty poor choice already!) that casts implicitly to and from all other
pointer types (even poorer choice: you defined a confusing type now, and don't
even gain any bit of type safety through it...). A lot of the code assumed the
type would refer to a structure, and hence aded additional "*" here and there.
Remove that.
2016-10-24 17:32:50 +02:00
Lennart Poettering 25a8d8a0cb core: rework apply_protect_kernel_modules() to use seccomp_add_syscall_filter_set()
Let's simplify this call, by making use of the new infrastructure.

This is actually more in line with Djalal's original patch but instead of
search the filter set in the array by its name we can now use the set index and
jump directly to it.
2016-10-24 17:32:50 +02:00
Lennart Poettering 8130926d32 core: rework syscall filter set handling
A variety of fixes:

- rename the SystemCallFilterSet structure to SyscallFilterSet. So far the main
  instance of it (the syscall_filter_sets[] array) used to abbreviate
  "SystemCall" as "Syscall". Let's stick to one of the two syntaxes, and not
  mix and match too wildly. Let's pick the shorter name in this case, as it is
  sufficiently well established to not confuse hackers reading this.

- Export explicit indexes into the syscall_filter_sets[] array via an enum.
  This way, code that wants to make use of a specific filter set, can index it
  directly via the enum, instead of having to search for it. This makes
  apply_private_devices() in particular a lot simpler.

- Provide two new helper calls in seccomp-util.c: syscall_filter_set_find() to
  find a set by its name, seccomp_add_syscall_filter_set() to add a set to a
  seccomp object.

- Update SystemCallFilter= parser to use extract_first_word().  Let's work on
  deprecating FOREACH_WORD_QUOTED().

- Simplify apply_private_devices() using this functionality
2016-10-24 17:32:50 +02:00
Lennart Poettering e0f3720e39 core: move misplaced comment to the right place 2016-10-24 17:29:51 +02:00
Lennart Poettering f673b62df6 core: simplify skip_seccomp_unavailable() a bit
Let's prefer early-exit over deep-indented if blocks. Not behavioural change.
2016-10-24 17:29:51 +02:00
Djalal Harouni 366ddd252e core: do not assert when sysconf(_SC_NGROUPS_MAX) fails (#4466)
Remove the assert and check the return code of sysconf(_SC_NGROUPS_MAX).

_SC_NGROUPS_MAX maps to NGROUPS_MAX which is defined in <limits.h> to
65536 these days. The value is a sysctl read-only
/proc/sys/kernel/ngroups_max and the kernel assumes that it is always
positive otherwise things may break. Follow this and support only
positive values for all other case return either -errno or -EOPNOTSUPP.

Now if there are systems that want to re-write NGROUPS_MAX then they
should not pass SupplementaryGroups= in units even if it is empty, in
this case nothing fails and we just ignore supplementary groups. However
if SupplementaryGroups= is passed even if it is empty we have to assume
that there will be groups manipulation from our side or the kernel and
since the kernel always assumes that NGROUPS_MAX is positive, then
follow that and support only positive values.
2016-10-24 13:13:06 +02:00
Djalal Harouni 8b6903ad4d core: lets move the setup of working directory before group enforce
This is minor but lets try to split and move bit by bit cgroups and
portable environment setup before applying the security context.
2016-10-23 23:27:20 +02:00
Djalal Harouni 4d885bd326 core: first lookup and cache creds then apply them after namespace setup
This fixes: https://github.com/systemd/systemd/issues/4357

Let's lookup and cache creds then apply them. We also switch from
getgroups() to getgrouplist().
2016-10-23 23:24:14 +02:00
Zbigniew Jędrzejewski-Szmek 605405c6cc tree-wide: drop NULL sentinel from strjoin
This makes strjoin and strjoina more similar and avoids the useless final
argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
2016-10-23 11:43:27 -04:00
Luca Bruno 52c239d770 core/exec: add a named-descriptor option ("fd") for streams (#4179)
This commit adds a `fd` option to `StandardInput=`,
`StandardOutput=` and `StandardError=` properties in order to
connect standard streams to externally named descriptors provided
by some socket units.

This option looks for a file descriptor named as the corresponding
stream. Custom names can be specified, separated by a colon.
If multiple name-matches exist, the first matching fd will be used.
2016-10-17 20:05:49 -04:00
Zbigniew Jędrzejewski-Szmek 6b430fdb7c tree-wide: use mfree more 2016-10-16 23:35:39 -04:00