357 commits

Author SHA1 Message Date
Yu Watanabe 6c9c51e5e2 fs-util: make symlink_idempotent() optionally create relative link 2018-09-24 18:52:53 +03:00
Zbigniew Jędrzejewski-Szmek 5a72417084 pid1: drop unused path parameter to add_two_dependencies_by_name() 2018-09-15 20:02:00 +02:00
Zbigniew Jędrzejewski-Szmek 35d8c19ace pid1: drop now-unused path parameter to add_dependency_by_name() 2018-09-15 19:57:52 +02:00
Yu Watanabe 0de4876496 core/socket: fix memleak in the error paths in usbffs_dispatch_eps() 2018-09-03 14:25:08 +09:00
Yu Watanabe 8301aa0bf1 tree-wide: use DEFINE_TRIVIAL_REF_UNREF_FUNC() macro or friends where applicable 2018-08-27 14:01:46 +09:00
Zbigniew Jędrzejewski-Szmek 7692fed98b
Merge pull request #9783 from poettering/get-user-creds-flags
beef up get_user_creds() a bit and other improvements
2018-08-21 10:09:33 +02:00
Lennart Poettering fafff8f1ff user-util: rework get_user_creds()
Let's fold get_user_creds_clean() into get_user_creds(), and introduce a
flags argument for it to select "clean" behaviour. This flags parameter
also learns two other new flags:

- USER_CREDS_SYNTHESIZE_FALLBACK: in this mode the user records for
  root/nobody are only synthesized as fallback. Normally, the synthesized
  records take precedence over what is in the user database.  With this
  flag set this is reversed, and the user database takes precedence, and
  the synthesized records are only used if they are missing there. This
  flag should be set in cases where doing NSS is deemed safe, and where
  there's interest in knowing the correct shell, for example if the
  admin changed root's shell to zsh or suchlike.

- USER_CREDS_ALLOW_MISSING: if set, and a UID/GID is specified by
  numeric value, and there's no user/group record for it, accept it
  anyway. This allows us to fix #9767.

This then also ports all users to set the most appropriate flags.

Fixes: #9767

[zj: remove one isempty() call]
2018-08-20 15:58:21 +02:00
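The "allow missing" behaviour described in the commit above can be sketched as follows. This is a minimal illustration and not the actual get_user_creds() implementation: the function name resolve_uid() and the flag CREDS_ALLOW_MISSING are hypothetical stand-ins, and only the numeric-UID case is modelled.

```c
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for USER_CREDS_ALLOW_MISSING */
enum { CREDS_ALLOW_MISSING = 1 << 0 };

/* Resolve a user name or numeric UID string. Returns 0 on success,
 * -ESRCH if there is no matching record and the flag does not permit that. */
static int resolve_uid(const char *user, unsigned flags, uid_t *ret) {
        struct passwd *pw;
        char *end = NULL;
        unsigned long v;

        if (user[0] >= '0' && user[0] <= '9') {
                errno = 0;
                v = strtoul(user, &end, 10);
                if (errno == 0 && *end == '\0' && v < 0xFFFFFFFFUL) {
                        pw = getpwuid((uid_t) v);
                        /* Numeric UID without a record: accept only if asked to */
                        if (!pw && !(flags & CREDS_ALLOW_MISSING))
                                return -ESRCH;
                        *ret = (uid_t) v;
                        return 0;
                }
        }

        /* Names always need a record; the flag only covers numeric values */
        pw = getpwnam(user);
        if (!pw)
                return -ESRCH;

        *ret = pw->pw_uid;
        return 0;
}
```

Folding the behaviour into a flags argument, rather than keeping a second "clean" function, lets each caller state exactly which relaxations it wants.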
Filipe Brandenburger a0edd02e43 tree-wide: Convert compare_func's to use CMP() macro wherever possible.
Looked for definitions of functions using the *_compare_func() suffix.

Tested:
- Unit tests passed (ninja -C build/ test)
- Installed this build and booted with it.
2018-08-06 19:26:35 -07:00
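For readers unfamiliar with the idiom, a three-way comparison macro used inside a qsort()-style comparison function might look like the sketch below. The macro body here is an assumption for illustration (it evaluates its arguments more than once) and is not necessarily how systemd defines CMP(); uint_compare_func() is a hypothetical example following the *_compare_func() naming the commit mentions.

```c
#include <stdlib.h>

/* Illustrative three-way comparison: yields -1, 0 or 1.
 * Assumption: arguments have no side effects, since they are evaluated twice. */
#define CMP(a, b) (((a) > (b)) - ((a) < (b)))

/* Comparison function suitable for qsort() over an array of unsigned */
static int uint_compare_func(const void *a, const void *b) {
        const unsigned *x = a, *y = b;

        /* Unlike "return *x - *y", this cannot wrap around or overflow */
        return CMP(*x, *y);
}
```

The main benefit over hand-written subtraction is correctness for values whose difference does not fit into an int.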
Zbigniew Jędrzejewski-Szmek 5b316330be
Merge pull request #9624 from poettering/service-state-flush
flush out ExecStatus structures when a new service cycle begins
2018-08-02 09:50:39 +02:00
Lennart Poettering 5686391b00 core: introduce new Type=exec service type
Users are often surprised that "systemd-run" command lines like
"systemd-run -p User=idontexist /bin/true" will return successfully,
even though the logs show that the process couldn't be invoked, as the
user "idontexist" doesn't exist. This is because Type=simple will only
wait until fork() succeeded before returning start-up success.

This patch adds a new service type Type=exec, which is very similar to
Type=simple, but waits until the child process completed the execve()
before returning success. It uses a pipe that has O_CLOEXEC set for this
logic, so that the kernel automatically sends POLLHUP on it when the
execve() succeeded but leaves the pipe open if not. This means PID 1
waits exactly until the execve() succeeded in the child, and not longer
and not shorter, which is the desired functionality.

Making use of this new functionality, the command line
"systemd-run -p User=idontexist -p Type=exec /bin/true" will now fail,
as expected.
2018-07-25 22:48:11 +02:00
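The close-on-exec pipe trick described above can be sketched in a few lines. This is a simplified variant and not the PID 1 code: the helper name spawn_wait_exec() is hypothetical, the parent blocks in read() instead of watching for POLLHUP from an event loop, and on failure the child additionally writes its errno into the pipe so the parent can report why execve() failed.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 once the child's execve() succeeded (child PID in *ret_pid),
 * or a negative errno-style value if the process could not be executed. */
static int spawn_wait_exec(char *const argv[], pid_t *ret_pid) {
        int pfd[2], child_errno = 0, saved;
        ssize_t n;
        pid_t pid;

        if (pipe(pfd) < 0)
                return -errno;

        /* Close-on-exec on the write end: a successful execve() closes it for us */
        if (fcntl(pfd[1], F_SETFD, FD_CLOEXEC) < 0) {
                saved = errno;
                close(pfd[0]);
                close(pfd[1]);
                return -saved;
        }

        pid = fork();
        if (pid < 0) {
                saved = errno;
                close(pfd[0]);
                close(pfd[1]);
                return -saved;
        }

        if (pid == 0) {
                close(pfd[0]);
                execv(argv[0], argv);

                /* Only reached if execv() failed: tell the parent why */
                saved = errno;
                if (write(pfd[1], &saved, sizeof saved) < 0) {
                        /* nothing more we can do here */
                }
                _exit(127);
        }

        close(pfd[1]);
        do
                n = read(pfd[0], &child_errno, sizeof child_errno);
        while (n < 0 && errno == EINTR);
        close(pfd[0]);

        if (n == 0) {
                /* EOF without data: the kernel closed the write end on execve() */
                *ret_pid = pid;
                return 0;
        }

        (void) waitpid(pid, NULL, 0); /* reap the failed child */
        return n == (ssize_t) sizeof child_errno ? -child_errno : -EIO;
}
```

The parent learns about success exactly when the kernel closes the last write end, so it waits neither longer nor shorter than the execve() itself.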
Lennart Poettering 6a1d4d9fa6 core: properly reset all ExecStatus structures when entering a new unit cycle
Whenever a unit is started fresh we should flush out any runtime data
from the previous cycle. We are pretty good at that already, but what we
missed so far was the ExecStart=/ExecStop=/… command exit status data.
Let's fix that, and properly flush out that stuff too.

Consider this service:

    [Service]
    ExecStart=/bin/sleep infinity
    ExecStop=/bin/false

When this service is started, then stopped and then started again
"systemctl status" would show the ExecStop= results of the previous run
along with the ExecStart= results of the current one, which is very
confusing. With this patch this is corrected: the data is kept right
until the moment the new service cycle starts, and then flushed out.
Hence "systemctl status" in that case will only show the ExecStart=
data, but no ExecStop= data, like it should be.

This should fix part of the confusion of #9588
2018-07-23 13:36:47 +02:00
Lennart Poettering ee39ca20c6 core: drop "argv" field from ExecParameter structure
We always initialize it from the same field in ExecCommand anyway, hence
there's no point in passing it separately to exec_spawn(), after all we
already pass the ExecCommand structure itself anyway.

No change in behaviour.
2018-07-23 13:36:47 +02:00
Lennart Poettering 0c69794138 tree-wide: remove Lennart's copyright lines
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
2018-06-14 10:20:20 +02:00
Lennart Poettering 818bf54632 tree-wide: drop 'This file is part of systemd' blurb
This part of the copyright blurb stems from the GPL use recommendations:

https://www.gnu.org/licenses/gpl-howto.en.html

The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.

Hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
2018-06-14 10:20:20 +02:00
Zbigniew Jędrzejewski-Szmek bbac65bcc2
Merge pull request #9157 from poettering/unit-config-load-error
introduce a new "bad-setting" unit load state in order to improve "systemctl status" output when bad settings are used
2018-06-11 14:37:10 +02:00
xginn8 a98f7575ae Add counter for socket unit refuse events (#9217)
core: add counter for socket unit rejection events
2018-06-11 12:56:26 +02:00
Lennart Poettering 6f40aa4547 core: add a couple of more error cases that should result in "bad-setting"
This changes a number of EINVAL cases to ENOEXEC, so that we enter
"bad-setting" state if they fail.
2018-06-11 12:53:12 +02:00
Zbigniew Jędrzejewski-Szmek 79e221d078
Merge pull request #9158 from poettering/notify-auto-reload
trigger OnFailure= only if Restart= is not in effect
2018-06-05 13:51:07 +02:00
Yu Watanabe 858d36c1ec path-util: introduce path_simplify()
The function is similar to path_kill_slashes() but also removes
initial './', trailing '/.', and '/./' in the path.
When the second argument of path_simplify() is false, it
behaves the same as path_kill_slashes(). Hence, this also
replaces path_kill_slashes() with path_simplify().
2018-06-03 23:39:26 +09:00
Lennart Poettering 2ad2e41a72 core: don't trigger OnFailure= deps when a unit is going to restart
This adds a flags parameter to unit_notify() which can be used to pass
additional notification information to the function. We then make the old
reload_failure boolean parameter one of these flags, and then add a new
flag that lets unit_notify() know if we are configured to restart the
service.

Note that this adjusts behaviour of systemd to match what the docs say.

Fixes: #8398
2018-06-01 19:08:30 +02:00
Yu Watanabe 7be9df7d4a core/socket: fix coding style 2018-05-31 11:09:41 +09:00
Zbigniew Jędrzejewski-Szmek 7994ac1d85 Rename ratelimit_test to ratelimit_below
When I see "test", I have to think three times what the return value
means. With "below" this is immediately clear. ratelimit_below(&limit)
sounds almost like English and is imho immediately obvious.

(I also considered ratelimit_ok, but this strongly implies that being under the
limit is somehow better. Most of the times this is true, but then we use the
ratelimit to detect triple-c-a-d, and "ok" doesn't fit so well there.)

C.f. a1bcaa07.
2018-05-13 22:08:30 +02:00
Lennart Poettering da6053d0a7 tree-wide: be more careful with the type of array sizes
Previously we were a bit sloppy with the index and size types of arrays,
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting forth and back between
unsigned and size_t will always be a source of problems.

Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem, however that kind of overly large allocation
we have protections for usually, but for overflows we do not have that
so much, hence let's add it.

So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.

This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:

1. strv_length()'s return type becomes size_t

2. the unit file changes array size becomes size_t

3. DNS answer and query array sizes become size_t

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
2018-04-27 14:29:06 +02:00
Zbigniew Jędrzejewski-Szmek 11a1589223 tree-wide: drop license boilerplate
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.

I also kept any copyright lines. We can probably remove them, but it'd be nice to
obtain explicit acks from all involved authors before doing that.
2018-04-06 18:58:55 +02:00
Yu Watanabe 1cc6c93a95 tree-wide: use TAKE_PTR() and TAKE_FD() macros 2018-04-05 14:26:26 +09:00
Filipe Brandenburger 2ef044ea1e core/socket: use chase_symlinks to find binary inside chroot when looking for SELinux label (#8591)
This is a follow up for this comment from @poettering:
https://github.com/systemd/systemd/pull/8405#discussion_r175719214

This updates PR #8405.

Tested manually using the same commands in
https://lists.freedesktop.org/archives/systemd-devel/2018-March/040478.html.
2018-03-28 09:00:42 +02:00
Lennart Poettering c10d6bdb89 macro: introduce new TAKE_FD() macro
This is similar to TAKE_PTR() but operates on file descriptors, and thus
assigns -1 to the fd parameter after returning it.

Removes 60 lines from our codebase. Pretty good too I think.
2018-03-22 20:30:40 +01:00
Filipe Brandenburger 416be1a03b core/socket: support binary inside chroot when looking for SELinux label (#8405)
Otherwise having a .socket unit start a .service running a binary under
a chroot fails as the unit is unable to determine the SELinux label of
the binary.
2018-03-09 12:20:56 +01:00
Lennart Poettering 2ae7ee58fa bpf: beef up bpf detection, check if BPF_F_ALLOW_MULTI is supported
This improves the BPF/cgroup detection logic, and looks whether
BPF_F_ALLOW_MULTI is supported. This flag allows execution of multiple
BPF filters in a recursive fashion for a whole cgroup tree. It enables
us to properly report IP accounting for slice units, as well as
delegation of BPF support to units without breaking our own IP
accounting.
2018-02-21 16:43:36 +01:00
Lennart Poettering a94ab7acfd
Merge pull request #8175 from keszybz/gc-cleanup
Garbage collection cleanup
2018-02-15 17:47:37 +01:00
Zbigniew Jędrzejewski-Szmek 7f7d01ed58 pid1: include the source unit in UnitRef
No functional change.

The source unit manages the reference. It allocates the UnitRef structure and
registers it in the target unit, and then the reference must be destroyed
before the source unit is destroyed. Thus, it should be OK to include the
pointer to the source unit, it should be live as long as the reference exists.

v2:
- rename refs to refs_by_target
2018-02-15 13:27:06 +01:00
Zbigniew Jędrzejewski-Szmek f2f725e5cc pid1: rename unit_check_gc to unit_may_gc
"check" is unclear: what is true, what is false? Let's rename to "can_gc" and
invert the return value ("positive" values are easier to grok).

v2:
- rename from unit_can_gc to unit_may_gc
2018-02-15 13:04:12 +01:00
Lennart Poettering 004c7f169e core: fold manager_set_exec_params() into unit_set_exec_params()
Let's simplify things a bit: we so far called both functions every
single time, let's just merge one into the other, so that we have fewer
functions to call.
2018-02-12 11:34:00 +01:00
Yu Watanabe e8a565cb66 core: make ExecRuntime be manager managed object
Before this, each ExecRuntime object is owned by a unit. However,
it may be shared with other units which enable JoinsNamespaceOf=.
Thus, by the serialization/deserialization process, its sharing
information, more specifically, reference counter is lost, and
causes issue #7790.

This makes ExecRuntime objects be managed by manager, and changes
the serialization/deserialization process.

Fixes #7790.
2018-02-06 16:00:34 +09:00
Lennart Poettering d2e0ac3d1e tree-wide: unify the process name we pass to wait_for_terminate_and_check() with the one we pass to safe_fork() 2018-01-04 13:27:27 +01:00
Lennart Poettering 7d4904fe7a process-util: rework wait_for_terminate_and_warn() to take a flags parameter
This renames wait_for_terminate_and_warn() to
wait_for_terminate_and_check(), and adds a flags parameter, that
controls how much to log: there's one flag that means we log about
abnormal stuff, and another one that controls whether we log about
non-zero exit codes. Finally, there's a shortcut flag value for logging
in both cases, as that's what we usually use.

All callers are accordingly updated. At three occasions duplicate logging
is removed, i.e. where the old function was called but logged in the
caller, too.
2018-01-04 13:27:27 +01:00
Lennart Poettering 4c253ed1ca tree-wide: introduce new safe_fork() helper and port everything over
This adds a new safe_fork() wrapper around fork() and makes use of it
everywhere. The new wrapper does a couple of things we previously did
manually and separately in a safer, more correct and automatic way:

1. Optionally resets signal handlers/mask in the child

2. Sets a name on all processes we fork off right after forking off (and
   the patch assigns useful names for all processes we fork off now,
   following a systematic naming scheme: always enclosed in () – in order
   to indicate that these are not proper, exec()ed processes, but only
   forked off children, and if the process is long-running with only our
   own code, without execve()'ing something else, it gets an "sd-" prefix.)

3. Optionally closes all file descriptors in the child

4. Optionally sets a PR_SET_PDEATHSIG to SIGTERM in the child, in a safe
   way so that the parent dying before this happens is handled
   safely.

5. Optionally reopens the logs

6. Optionally connects stdin/stdout/stderr to /dev/null

7. Debug logs about the forked off processes.
2017-12-25 11:48:21 +01:00
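Item 4 of the list above addresses a race: the parent may die between fork() and the prctl() call, in which case no signal would ever be delivered. One common way to close that window is sketched below. The function names are hypothetical, and this covers only the death-signal aspect, not the other things safe_fork() does.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork() wrapper: the child gets 'sig' when the parent dies.
 * Returns like fork(): child PID in the parent, 0 in the child, -1 on error. */
static pid_t fork_with_deathsig(int sig) {
        pid_t parent = getpid(), pid;

        pid = fork();
        if (pid != 0)
                return pid; /* parent, or fork() failure */

        if (prctl(PR_SET_PDEATHSIG, (unsigned long) sig) < 0)
                _exit(EXIT_FAILURE);

        /* If the parent died before prctl() took effect we have already been
         * reparented, and the signal will never arrive. Detect that and bail out. */
        if (getppid() != parent)
                _exit(EXIT_FAILURE);

        return 0;
}

/* Usage example: run a child that exits with 'code' and return what we reaped */
static int run_child_exit_code(int code) {
        int status;
        pid_t pid = fork_with_deathsig(SIGTERM);

        if (pid < 0)
                return -1;
        if (pid == 0)
                _exit(code);

        if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
                return -1;
        return WEXITSTATUS(status);
}
```

Recording the parent PID before fork() is what makes the later getppid() comparison meaningful.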
Yu Watanabe 845001221d core/socket: shorten socket_fdname() 2017-12-23 19:32:40 +09:00
Yu Watanabe 827d9bf297 core/socket: dump more settings 2017-12-23 19:32:38 +09:00
Yu Watanabe e045e325df basic: introduce socket_protocol_{from,to}_name()
And use them where they can be applicable.
2017-12-23 19:32:04 +09:00
Yu Watanabe 9c0320e7ab core: implement transient socket unit 2017-12-23 18:47:33 +09:00
Yu Watanabe 038ed5a4b6 core/socket: add socket_port_type_from_string() 2017-12-23 18:46:16 +09:00
Yu Watanabe 836bb1cd42 core:socket: fix string in socket_exec_command_table 2017-12-23 18:45:59 +09:00
Lennart Poettering a4634b214c core: warn about left-over processes in cgroup on unit start
Now that we don't kill control processes anymore, let's at least warn
about any processes left-over in the unit cgroup at the moment of
starting the unit.
2017-11-25 17:08:21 +01:00
Zbigniew Jędrzejewski-Szmek ffb70e4424
Merge pull request #7381 from poettering/cgroup-unified-delegate-rework
Fix delegation in the unified hierarchy + more cgroup work
2017-11-22 07:42:08 +01:00
Lennart Poettering 3c7416b6ca core: unify common code for preparing for forking off unit processes
This introduces a new function unit_prepare_exec() that encapsulates a
number of calls we do in preparation for spawning off some processes in
all our unit types that do so.

This allows us to neatly unify a bit of code between unit types and
shorten our code.
2017-11-21 11:54:08 +01:00
Shawn Landden 4831981d89 tree-wide: adjust fall through comments so that gcc is happy
Distcc removes comments, making the comment silencing
not work.

I know there was a decision against a macro in commit
ec251fe7d5
2017-11-20 13:06:25 -08:00
Zbigniew Jędrzejewski-Szmek 53e1b68390 Add SPDX license identifiers to source files under the LGPL
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
2017-11-19 19:08:15 +01:00
Lennart Poettering d3070fbdf6 core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald
And let's make use of it to implement two new unit settings with it:

1. LogLevelMax= is a new per-unit setting that may be used to configure
   log priority filtering: set it to LogLevelMax=notice and only
   messages of level "notice" and lower (i.e. more important) will be
   processed, all others are dropped.

2. LogExtraFields= is a new per-unit setting for configuring per-unit
   journal fields, that are implicitly included in every log record
   generated by the unit's processes. It takes field/value pairs in the
   form of FOO=BAR.

Also, related to this, one existing unit setting is ported to this new
facility:

3. The invocation ID is now pulled from /run/systemd/units/ instead of
   cgroupfs xattrs. This substantially relaxes requirements of systemd
   on the kernel version and the privileges it runs with (specifically,
   cgroupfs xattrs are not available in containers, since they are
   stored in kernel memory, and hence are unsafe to permit to lesser
   privileged code).

/run/systemd/units/ is a new directory, which contains a number of files
and symlinks encoding the above information. PID 1 creates and manages
these files, and journald reads them from there.

Note that this is supposed to be a direct path between PID 1 and the
journal only, due to the special runtime environment the journal runs
in. Normally, today we shouldn't introduce new interfaces that (mis-)use
a file system as IPC framework, and instead just use an IPC system, but this
is very hard to do between the journal and PID 1, as long as the IPC
system is a subject PID 1 manages, and itself a client to the journal.

This patch cleans up a couple of types used in journal code:
specifically we switch to size_t for a couple of memory-sizing values,
as size_t is the right choice for everything that is memory.

Fixes: #4089
Fixes: #3041
Fixes: #4441
2017-11-16 12:40:17 +01:00
Lennart Poettering eef85c4a3f core: track why unit dependencies came to be
This replaces the dependencies Set* objects by Hashmap* objects, where
the key is the depending Unit, and the value is a bitmask encoding why
the specific dependency was created.

The bitmask contains a number of different, defined bits, that indicate
why dependencies exist, for example whether they are created due to
explicitly configured deps in files, by udev rules or implicitly.

Note that memory usage is not increased by this change, even though we
store more information, as we manage to encode the bit mask inside the
value pointer each Hashmap entry contains.

Why this all? When we know how a dependency came to be, we can update
dependencies correctly when a configuration source changes but others
are left unaltered. Specifically:

1. We can fix UDEV_WANTS dependency generation: so far we kept adding
   dependencies configured that way, but if a device lost such a
   dependency we couldn't remove them again as there was no scheme for removing
   of dependencies in place.

2. We can implement "pin-pointed" reload of unit files. If we know what
   dependencies were created as result of configuration in a unit file,
   then we know what to flush out when we want to reload it.

3. It's useful for debugging: "systemd-analyze dump" now shows
   this information, helping substantially with understanding how
   systemd's dependency tree came to be the way it came to be.
2017-11-10 19:45:29 +01:00
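The point about memory usage rests on storing the bitmask directly in the pointer-sized value slot of each hashmap entry instead of allocating a separate object. A minimal sketch of that encoding follows; the bit names and helper functions are hypothetical and do not reflect the actual mask layout in the code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "why does this dependency exist" bits */
enum {
        DEP_FROM_FILE     = 1 << 0, /* explicitly configured in a unit file */
        DEP_FROM_UDEV     = 1 << 1, /* created by a udev rule */
        DEP_FROM_IMPLICIT = 1 << 2, /* added implicitly by the manager */
};

/* The mask lives in the pointer value itself, so no extra allocation is needed */
static void *mask_to_ptr(unsigned mask) {
        return (void *) (uintptr_t) mask;
}

static unsigned ptr_to_mask(const void *p) {
        return (unsigned) (uintptr_t) p;
}

/* Record one more reason for an existing (or new, value == NULL) dependency */
static void *dep_value_add(void *value, unsigned reason) {
        return mask_to_ptr(ptr_to_mask(value) | reason);
}

/* Drop one reason. A NULL result means no reason is left and the
 * dependency entry itself can be removed from the map. */
static void *dep_value_drop(void *value, unsigned reason) {
        return mask_to_ptr(ptr_to_mask(value) & ~reason);
}
```

With this scheme, reloading a unit file only has to clear the file bit on each entry, and entries that still carry other bits survive.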
Yu Watanabe 4c70109600 tree-wide: use IN_SET macro (#6977) 2017-10-04 16:01:32 +02:00
Andreas Rammhold ec2ce0c5d7
tree-wide: use !IN_SET(..) for a != b && a != c && …
The included cocci was used to generate the changes.

Thanks to @flo-wer for pointing this case out.
2017-10-02 13:09:56 +02:00
Andreas Rammhold 3742095b27
tree-wide: use IN_SET where possible
In addition to the changes from #6933 this handles cases that could be
matched with the included cocci file.
2017-10-02 13:09:54 +02:00
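The rewrite performed by the IN_SET commits above replaces chains such as "a == b || a == c" (or "a != b && a != c") with a single membership test. The sketch below shows the idea with a portable C99 definition; it is an assumption for illustration and not how systemd implements IN_SET(), and the state names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

static bool in_set_int(int x, const int *set, size_t n) {
        for (size_t i = 0; i < n; i++)
                if (set[i] == x)
                        return true;
        return false;
}

/* Illustrative definition: builds a temporary array from the candidates */
#define IN_SET(x, ...)                                              \
        in_set_int((x), (const int[]){ __VA_ARGS__ },               \
                   sizeof((const int[]){ __VA_ARGS__ }) / sizeof(int))

/* Hypothetical unit states */
typedef enum { STATE_DEAD, STATE_LISTENING, STATE_RUNNING, STATE_FAILED } DemoState;

/* Before: s == STATE_LISTENING || s == STATE_RUNNING */
static bool state_is_active(DemoState s) {
        return IN_SET(s, STATE_LISTENING, STATE_RUNNING);
}

/* Before: s != STATE_LISTENING && s != STATE_RUNNING */
static bool state_is_inactive(DemoState s) {
        return !IN_SET(s, STATE_LISTENING, STATE_RUNNING);
}
```

The negated form reads more naturally than a chain of inequality tests joined by &&, which is easy to get wrong.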
Zbigniew Jędrzejewski-Szmek 9500b9209b Merge pull request #6928 from poettering/cgroup-empty-race
rework cgroup empty notification handling (i.e. a fix for #6608)
2017-09-28 08:48:21 +02:00
Lennart Poettering ed77d407d3 core: log unit failure with type-specific result code
This slightly changes how we log about failures. Previously,
service_enter_dead() would log that a service unit failed along with its
result code, and unit_notify() would do this again but without the
result code. For other unit types only the latter would take effect.

This cleans this up: we keep the message in unit_notify() only for debug
purposes, and add type-specific log lines to all our unit types that can
fail, and always place them before unit_notify() is invoked.

Or in other words: the duplicate log message for service units is
removed, and all other unit types get a more useful line with the
precise result code.
2017-09-27 18:26:18 +02:00
Lennart Poettering 22b20752e2 socket: if RemoveOnStop= is turned on for a socket, try to unlink() pre-existing symlinks
Normally, Symlinks= failing is not considered fatal nor destructive.
Let's slightly alter behaviour here if RemoveOnStop= is turned on. In
that case the user in a way opted for destructive behaviour and we do
unlink all sockets and symlinks when the socket unit goes down. And that
means we might as well unlink any pre-existing symlinks if this mode is selected.

Yeah, it's a bit of a stretch to do this, but @OhNoMoreGit is right: if
RemoveOnStop= is on we are destructive regarding any pre-existing
symlinks on stop, and it would be quite weird if we wouldn't be on
start.
2017-09-27 17:53:00 +02:00
Lennart Poettering 1af87ab7d6 socket: create leading directories for socket symlinks
It really doesn't hurt creating prefix directories if necessary, as we
tend to do that for other file nodes we create, too.

Fixes: #6920
2017-09-27 17:53:00 +02:00
Lennart Poettering 95f7fbbf88 socket: make sure we warn loudly about symlinks we can't create
Note that this change does not make symlink creation failing fatal. I am
not entirely sure about whether it should be, but I am leaning towards
not making it fatal for two reasons: symlinks like this tend to be a
compatibility feature, and hence unlikely to be essential for operation; and
making it fatal would in a way break compatibility, and while doing that is not off the
table, we should probably avoid it if we are not entirely sure it's a
good thing.

Note that this also changes plain symlink() to symlink_idempotent() so
that existing symlinks with the right destination are nothing we log
about.

Fixes: #6920
2017-09-27 17:53:00 +02:00
Jan Synacek 0cde65e263 test-cpu-set-util.c: fix typo in comment (#6916) 2017-09-26 16:07:34 +02:00
Lennart Poettering 88af31f922 socket: assign socket units to a default slice unconditionally
Due to the chown() logic socket units might end up with processes even
if no explicit command is defined for them, hence let's make sure these
processes are in the right cgroup, and that means within a slice.

Mount, swap and service units unconditionally are assigned to a slice
already, let's do the same here, too.

(This becomes more important as soon as the ebpf/firewall stuff is
merged, as there'll be another reason to fork off processes then)
2017-09-22 20:09:21 +02:00
Lennart Poettering a79279c7fd core: when creating the socket fds for a socket unit, join socket's cgroup first
Let's make sure that a socket unit's IPAddressAllow=/IPAddressDeny=
settings are in effect on all socket fds associated with it. In order to
make this happen we need to make sure the cgroup the fds are associated
with are the socket unit's cgroup. The only way to do that is invoking
socket()+accept() in them. Since we really don't want to migrate PID 1
around we do this by forking off a helper process, which invokes
socket()/accept() and sends the newly created fd to PID 1. Ugly, but
works, and there's apparently no better way right now.

This generalizes forking off per-unit helper processes in a new function
unit_fork_helper_process(), which is then also used by the NSS chown()
code of socket units.
2017-09-22 15:24:55 +02:00
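The helper-process approach described above relies on passing a file descriptor back over an AF_UNIX socket with SCM_RIGHTS. The following sketch shows that mechanism only: all function names are hypothetical, the helper does not actually join a cgroup, and error handling is reduced to the minimum.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Prepare a msghdr carrying one dummy byte plus room for one fd */
static void prepare_msg(struct msghdr *mh, struct iovec *iov, char *byte,
                        void *control, size_t control_size) {
        memset(mh, 0, sizeof *mh);
        memset(control, 0, control_size);
        iov->iov_base = byte;
        iov->iov_len = 1;
        mh->msg_iov = iov;
        mh->msg_iovlen = 1;
        mh->msg_control = control;
        mh->msg_controllen = control_size;
}

static int send_one_fd(int transport, int fd) {
        union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } control;
        struct msghdr mh;
        struct iovec iov;
        struct cmsghdr *cmsg;
        char byte = 0;

        prepare_msg(&mh, &iov, &byte, control.buf, sizeof control.buf);
        cmsg = CMSG_FIRSTHDR(&mh);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(transport, &mh, 0) < 0 ? -errno : 0;
}

static int receive_one_fd(int transport) {
        union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } control;
        struct msghdr mh;
        struct iovec iov;
        struct cmsghdr *cmsg;
        char byte;
        ssize_t n;
        int fd;

        prepare_msg(&mh, &iov, &byte, control.buf, sizeof control.buf);
        n = recvmsg(transport, &mh, 0);
        if (n < 0)
                return -errno;

        cmsg = CMSG_FIRSTHDR(&mh);
        if (n == 0 || !cmsg || cmsg->cmsg_level != SOL_SOCKET ||
            cmsg->cmsg_type != SCM_RIGHTS || cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
                return -EIO; /* helper died or sent something unexpected */

        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
}

/* Create a socket inside a forked helper and hand it to the caller */
static int create_socket_in_helper(int family, int type) {
        int pair[2], fd, saved;
        pid_t pid;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
                return -errno;

        pid = fork();
        if (pid < 0) {
                saved = errno;
                close(pair[0]);
                close(pair[1]);
                return -saved;
        }

        if (pid == 0) {
                close(pair[0]);
                /* A real helper would first move itself into the unit's cgroup here */
                fd = socket(family, type, 0);
                _exit(fd >= 0 && send_one_fd(pair[1], fd) == 0 ? 0 : 1);
        }

        close(pair[1]);
        fd = receive_one_fd(pair[0]);
        close(pair[0]);
        (void) waitpid(pid, NULL, 0);
        return fd;
}
```

The received descriptor refers to the same open socket the helper created, so whatever was associated with it at creation time stays attached even though the parent never left its own cgroup.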
Daniel Mack 906c06f64a cgroup, unit, fragment parser: make use of new firewall functions 2017-09-22 15:24:55 +02:00
Lennart Poettering 18f573aaf9 core: make sure to dump cgroup context when unit_dump() is called for all unit types
For some reason we didn't dump the cgroup context for a number of unit
types, including service units. Not sure how this wasn't noticed
before... Add this in.
2017-09-22 15:24:54 +02:00
Lennart Poettering 1703fa41a7 core: rename EXEC_APPLY_PERMISSIONS → EXEC_APPLY_SANDBOXING
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
2017-08-10 15:02:50 +02:00
Lennart Poettering f0d477979e core: introduce unit_set_exec_params()
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
2017-08-10 15:02:50 +02:00
Lennart Poettering 19bbdd985e core: manager_set_exec_params() cannot fail, hence make it void
Let's simplify things a bit.
2017-08-10 15:02:50 +02:00
Lennart Poettering 584b8688d1 execute: also fold the cgroup delegate bit into ExecFlags 2017-08-10 15:02:50 +02:00
Lennart Poettering 3ed0cd26ea execute: replace command flag bools by a flags field
This way, we can extend it later on in an easier way, and can pass it
along nicely.
2017-08-10 14:44:58 +02:00
Yu Watanabe 3536f49e8f core: add {State,Cache,Log,Configuration}Directory= (#6384)
This introduces {State,Cache,Log,Configuration}Directory=, which are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.

This also fixes #6391.
2017-07-18 14:34:52 +02:00
Zbigniew Jędrzejewski-Szmek e3f791a2b3 basic/path-util: allow flags for path_equal_or_files_same
No functional change, just a new parameter and tests that
AT_SYMLINK_NOFOLLOW works as expected.
2017-06-17 12:37:16 -04:00
AsciiWolf 13e785f7a0 Fix missing space in comments (#5439) 2017-02-24 18:14:02 +01:00
Lennart Poettering 1c876927e4 copy: change the various copy_xyz() calls to take a unified flags parameter
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
2017-02-17 10:22:28 +01:00
Zbigniew Jędrzejewski-Szmek ec251fe7d5 tree-wide: adjust fall through comments so that gcc is happy
gcc 7 adds -Wimplicit-fallthrough=3 to -Wextra. There are a few ways
we could deal with that. After we take into account the need to stay compatible
with older versions of the compiler (and other compilers), I don't think adding
__attribute__((fallthrough)), even as a macro, is worth the trouble. It sticks
out too much, a comment is just as good. But gcc has some very specific
requirements for how the comment should look. Adjust it to the specific form that it
likes. I don't think the extra stuff we had in those comments was adding much
value.

(Note: the documentation seems to be wrong, and seems to describe a different
pattern from the one that is actually used. I guess either the docs or the code
will have to change before gcc 7 is finalized.)
2017-01-31 14:04:55 -05:00
Zbigniew Jędrzejewski-Szmek da3bddc993 core: add missing "=" in message
For consistency. Also drop "e.g." because it's somewhat redundant with the
ellipsis and the message is pretty long already.

Follow-up for 4d1fe20a58.
2017-01-11 16:37:34 -05:00
Stefan Hajnoczi 359a5bcf78 core: add AF_VSOCK support to socket units
Accept AF_VSOCK listen addresses in socket unit files.  Both guest and
host can now take advantage of socket activation.

The QEMU guest agent has recently been modified to support socket
activation and can run over AF_VSOCK with this patch.
2017-01-10 15:29:04 +00:00
Lennart Poettering 41733ae1e0 core: fix sockaddr length calculation for sockaddr_pretty() (#4966)
Let's simply store the socket address length in the SocketPeer object so
that we can use it when invoking sockaddr_pretty():

This fixes the issue described in #4943, but avoids calling
getpeername() twice.
2016-12-29 11:21:37 +01:00
Lennart Poettering 4d1fe20a58 core: improve log message about missing Listen setting (#4988)
Fixes: #4987
2016-12-29 10:39:30 +01:00
Stefan Hajnoczi b9495e8d58 core: prevent invalid socket symlink target dereference (#4895)
socket_find_symlink_target() returns a pointer to
p->address.sockaddr.un.sun_path when the first byte is non-zero without
checking that this is AF_UNIX socket.  Since sockaddr is a union this
byte could be non-zero for AF_INET sockets.

Existing callers happen to be safe, but this is an accident waiting to happen.
Use socket_address_get_path() since it checks for AF_UNIX.
2016-12-16 11:20:27 +01:00
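The hazard described above comes from the address being a union: the bytes that hold sun_path for AF_UNIX hold the port number for AF_INET. A sketch of a family-checking accessor follows. The union layout and the name sockaddr_get_path() are assumptions for illustration; they are not the actual socket_address_get_path() code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical union mirroring the idea of a multi-family socket address */
union sockaddr_union {
        struct sockaddr sa;
        struct sockaddr_in in;
        struct sockaddr_un un;
};

/* Return the file system path of an AF_UNIX address, or NULL for anything else */
static const char *sockaddr_get_path(const union sockaddr_union *a) {
        /* Check the family first: for other families these bytes mean something else */
        if (a->sa.sa_family != AF_UNIX)
                return NULL;

        /* A leading NUL byte means an abstract or unnamed socket, i.e. no path */
        if (a->un.sun_path[0] == 0)
                return NULL;

        return a->un.sun_path;
}
```

For an AF_INET address with port 8080, the first byte of sun_path is non-zero, so a check on that byte alone would wrongly treat the port as a path.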
Lennart Poettering 5125e76243 core: move specifier expansion out of service.c/socket.c
This monopolizes unit file specifier expansion in load-fragment.c, and removes
it from socket.c + service.c. This way expansion becomes an operation done exclusively at time of loading unit files.

Previously specifiers were resolved for all settings during loading of unit
files with the exception of ExecStart= and friends which were resolved in
socket.c and service.c. With this change the latter is also moved to the
loading of unit files.

Fixes: #3061
2016-12-07 18:47:32 +01:00
Franck Bui 7d5ceb6416 core: allow to redirect confirmation messages to a different console
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).

This patch lets the user redirect the confirmation messages to a different
virtual console, given either by name or by path, so that those messages are
separated from the others and easier to read.
2016-11-17 18:16:16 +01:00
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
Zbigniew Jędrzejewski-Szmek b744e8937c Merge pull request #4067 from poettering/invocation-id
Add an "invocation ID" concept to the service manager
2016-10-11 13:40:50 -04:00
Lennart Poettering 1f0958f640 core: when determining whether a process exit status is clean, consider whether it is a command or a daemon
SIGTERM should be considered a clean exit code for daemons (i.e. long-running
processes, as a daemon without SIGTERM handler may be shut down without issues
via SIGTERM still) while it should not be considered a clean exit code for
commands (i.e. short-running processes).

Let's add two different clean checking modes for this, and use the right one at
the appropriate places.

Fixes: #4275
2016-10-10 22:57:01 +02:00
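
The two checking modes from 1f0958f640 could be sketched roughly as follows. The enum and function names are hypothetical, and only the SIGTERM case named in the commit message is modelled; the real check in systemd may accept further signals or exit codes:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names: which kind of process is being judged. */
typedef enum {
        EXAMPLE_CLEAN_COMMAND, /* short-running process */
        EXAMPLE_CLEAN_DAEMON,  /* long-running process */
} ExampleCleanMode;

/* code and status follow the si_code/si_status convention of waitid(). */
static bool example_is_clean_exit(int code, int status, ExampleCleanMode mode) {
        if (code == CLD_EXITED)
                return status == EXIT_SUCCESS;

        /* Being terminated by SIGTERM is clean for a daemon only. */
        if (code == CLD_KILLED)
                return mode == EXAMPLE_CLEAN_DAEMON && status == SIGTERM;

        return false;
}
```

A caller judging an ExecStartPre= style command would pass the command mode, while the main process of a long-running service would be judged in daemon mode.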
Lennart Poettering 4b58153dd2 core: add "invocation ID" concept to service manager
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128-bit ID is
generated each time a unit moves from an inactive to an activating or active
state.

The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.

The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.

The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.

Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.

This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().

PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.

A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycle of it.

Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as a hash of that rather than
entirely randomly. This way we can derive the invocation ID race-freely from the
messages.
2016-10-07 20:14:38 +02:00
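
For the environment variable route described in 4b58153dd2, a service could read its own invocation ID roughly like this. This is a sketch that avoids linking against libsystemd; it assumes the variable is named INVOCATION_ID and carries the 128-bit ID as 32 hexadecimal digits, neither of which the commit message spells out. The helper names are hypothetical. Programs that link libsystemd would call sd_id128_get_invocation() instead:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns the value of one hexadecimal digit, or -1 if c is not one. */
static int example_hex_value(char c) {
        if (c >= '0' && c <= '9')
                return c - '0';
        if (c >= 'a' && c <= 'f')
                return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
                return c - 'A' + 10;
        return -1;
}

/* Parses exactly 32 hex digits into 16 bytes; false on malformed input. */
static bool example_parse_id128(const char *s, uint8_t out[16]) {
        if (!s || strlen(s) != 32)
                return false;

        for (size_t i = 0; i < 16; i++) {
                int hi = example_hex_value(s[2 * i]);
                int lo = example_hex_value(s[2 * i + 1]);

                if (hi < 0 || lo < 0)
                        return false;
                out[i] = (uint8_t) ((hi << 4) | lo);
        }
        return true;
}

/* Assumption: the ID is exported as $INVOCATION_ID. */
static bool example_get_invocation_id(uint8_t out[16]) {
        return example_parse_id128(getenv("INVOCATION_ID"), out);
}
```

Since a service may alter its own environment, a value obtained this way is only as trustworthy as the process reading it, which is why the commit prefers the extended attribute for the journal.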
Paweł Szewczyk 00bb64ecfa core: Fix USB functionfs activation and clarify its documentation (#4188)
It was unclear what the path in the service file should look like for USB
functionfs activation. Because of this it was treated differently in different
places, which made this feature unusable.

This patch fixes the path to be the *mount directory* of functionfs, not the
ep0 file path, and clarifies in the documentation that ListenUSBFunction should
be the location of the functionfs mount point, not the ep0 file itself.
2016-09-26 18:45:47 +02:00
Lennart Poettering 00d9ef8560 core: add RemoveIPC= setting
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). If turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.

This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.

In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, but we need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
2016-08-19 00:37:25 +02:00
Zbigniew Jędrzejewski-Szmek 3bb81a80bd Merge pull request #3818 from poettering/exit-status-env
beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable
2016-08-05 20:55:08 -04:00
Zbigniew Jędrzejewski-Szmek 80a58668d9 socket: add helper function to remove code duplication 2016-08-05 08:24:00 -04:00
Zbigniew Jędrzejewski-Szmek ea8f50f808 core/socket: include remote address in the message when dropping connection
Without the address the message is not very useful.

Aug 04 23:52:21 rawhide systemd[1]: testlimit.socket: Too many incoming connections (4) from source ::1, dropping connection.
2016-08-05 08:16:31 -04:00
Zbigniew Jędrzejewski-Szmek 3ebcd323bd systemd: do not serialize peer, bump count when deserializing socket instead 2016-08-05 08:16:31 -04:00
Zbigniew Jędrzejewski-Szmek 166cf510c2 core/socket: rework SocketPeer refcounting
Make functions and definitions that don't need to be shared local to
socket.c.
2016-08-05 08:12:31 -04:00
Zbigniew Jędrzejewski-Szmek 9a73653c3e systemd: convert peers_by_address to a set 2016-08-04 23:53:07 -04:00
Lennart Poettering a0fef983ab core: remember first unit failure, not last unit failure
Previously, the result value of a unit was overridden with each failure, so
that the result always reported the last failure that took place.

With this commit this is changed, so that the first failure taking place is
stored instead. This should normally not matter much as multiple failures are
sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT
to a service due to a watchdog timeout, then this would currently be reported
as a "coredump" failure, rather than the "watchdog" failure it really is. Hence,
in order to report information about the type of the failure, and not about
the effect of it, let's change this for all unit types to store the first, not
the last failure.

This addresses the issue pointed out here:

https://github.com/systemd/systemd/pull/3818#discussion_r73433520
2016-08-04 23:08:05 +02:00
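
The rule introduced by a0fef983ab amounts to recording a failure only while the stored result is still "success". A minimal sketch, with hypothetical type and function names:

```c
#include <assert.h>

/* Hypothetical result values for a unit. */
typedef enum {
        EXAMPLE_RESULT_SUCCESS,
        EXAMPLE_RESULT_WATCHDOG,
        EXAMPLE_RESULT_CORE_DUMP,
} ExampleResult;

/* Stores the failure only if no earlier failure was recorded. */
static void example_record_result(ExampleResult *result, ExampleResult failure) {
        assert(result);

        if (*result == EXAMPLE_RESULT_SUCCESS)
                *result = failure;
}
```

In the watchdog scenario from the commit message, the watchdog failure is recorded first, so the core dump caused by the subsequent SIGABRT no longer replaces it.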
Lennart Poettering c39f1ce24d core: turn various execution flags into a proper flags parameter
The ExecParameters structure contains a number of bit-flags that were so far
exposed as bool:1. Change this to a proper, single binary bit flag field. This
makes things a bit more expressive, and is helpful as we add more flags, since
these booleans are passed around in various callers, for example
service_spawn(), whose signature can be made much shorter now.

Not all bit booleans from ExecParameters are moved into the flags field for
now, but this can be added later.
2016-08-04 16:27:07 +02:00
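
The refactoring pattern in c39f1ce24d, several bool:1 members folded into one flags field, can be sketched as below. All identifiers here are hypothetical illustrations and are not taken from systemd's headers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag set; each former bool:1 member becomes one bit. */
typedef enum {
        EXAMPLE_EXEC_APPLY_PERMISSIONS = 1U << 0,
        EXAMPLE_EXEC_APPLY_CHROOT      = 1U << 1,
        EXAMPLE_EXEC_IS_CONTROL        = 1U << 2,
} ExampleExecFlags;

/* Hypothetical helper: a single flags argument replaces three bool
 * parameters, so adding a flag later does not change the signature. */
static bool example_spawn_applies_chroot(ExampleExecFlags flags) {
        return (flags & EXAMPLE_EXEC_APPLY_CHROOT) != 0;
}
```

A call site then reads as `example_spawn_applies_chroot(EXAMPLE_EXEC_APPLY_PERMISSIONS | EXAMPLE_EXEC_APPLY_CHROOT)`, which names each option instead of passing a row of anonymous true/false values.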
Susant Sahani 9d56542764 socket: add support to control no. of connections from one source (#3607)
Introduce MaxConnectionsPerSource=, the number of concurrent connections
allowed per source IP address.

RFE: 1939
2016-08-02 13:48:23 -04:00
Lennart Poettering 29206d4619 core: add a concept of "dynamic" user ids, that are allocated as long as a service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).

For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.

A simple way to test this is:

        systemd-run -p DynamicUser=1 /bin/sleep 99999
2016-07-22 15:53:45 +02:00
Lennart Poettering 8e38570ebe tree-wide: htonl() is weird, let's use htobe32() instead (#3538)
Super-important change, yeah!
2016-06-15 01:26:01 +02:00
Martin Pitt d75103d4c6 Merge pull request #3202 from poettering/socket-fixes
don't reopen socket fds when reloading the daemon
2016-05-08 21:09:35 +02:00
Evgeny Vereshchagin 1745fa70e7 core: dump TriggerLimitIntervalSec and TriggerLimitBurst too 2016-05-06 21:03:16 +00:00
Lennart Poettering 60d9771c59 core: rework how we flush incoming traffic when a socket unit goes down
Previously, we'd simply close and reopen the socket file descriptors. This is
problematic however, as we won't transition through the SOCKET_CHOWN state
then, and thus the file ownership won't be correct for the sockets.

Rework the flushing logic, and actually read any queued data from the sockets
for flushing, and accept any queued messages and disconnect them.
2016-05-06 13:29:26 +02:00
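
The "read any queued data" half of the flushing approach in 60d9771c59 might look roughly like the following. This is a sketch with a hypothetical function name, not the code from the commit; it covers only draining a socket without blocking and leaves out accepting and disconnecting queued connections:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Reads and discards everything currently queued on a socket, without
 * blocking and without closing the file descriptor. Returns 0 once the
 * queue is empty, or a negative errno value on failure. */
static int example_flush_socket(int fd) {
        char buf[4096];

        for (;;) {
                ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

                if (n > 0)
                        continue; /* discarded some data, look for more */
                if (n == 0)
                        return 0; /* end of stream (or an empty datagram) */
                if (errno == EINTR)
                        continue;
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                        return 0; /* nothing left in the queue */
                return -errno;
        }
}
```

Because the descriptor stays open, the socket keeps whatever ownership and permissions were applied to it earlier, which is the property the commit is after.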