Commit graph

30229 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek e3f46367f5 test-conf-parser: add some basic tests for config_parse()
This function is pretty important, but we weren't calling it directly
even once in tests.

v2: add a few tests for escaping and line continuations
2017-09-23 11:08:57 +02:00
Lennart Poettering 5a89faf0e0 fileio: initialize errno to zero before we do fread()
if there was something in the read buffer already errno might not be set
on error, let's detect that case.
2017-09-22 21:05:03 +02:00
Lennart Poettering ff0e7e05c9 fileio: try to read one byte too much in read_full_stream()
Let's read one byte more than the file size we read from stat() on the
first fread() invocation. That way, the first read() will already be
short and indicate eof to fread().

This is a minor optimization, and replaces #3908.
2017-09-22 21:03:33 +02:00
Lennart Poettering 9dd1b1e869 fileio: move fsync() logic into write_string_stream_ts()
That way, write_string_stream_ts() becomes more powerful, and we can
remove duplicate code from  write_string_file_atomic() and
write_string_file_ts().
2017-09-22 20:59:39 +02:00
Lennart Poettering b183713383 fileio: make write_string_stream() accept flags parameter
Let's make write_string_stream() and write_string_file() more alike, and
pass the same flag set so that we can remove a number of boolean
parameters.
2017-09-22 20:55:34 +02:00
Lennart Poettering 2eabcc772b fileio: support writing atomic files with timestamp
Let's make sure "ts" is taken into account when writing atomic files,
too.
2017-09-22 20:45:06 +02:00
Lennart Poettering 2351e44d3e cgroup-util: replace one use of fgets() by read_line() 2017-09-22 20:34:15 +02:00
Lennart Poettering f4b51a2d09 fileio: rework read_one_line_file() on top of read_line() 2017-09-22 20:34:15 +02:00
Lennart Poettering 189912440f def: add new constant LONG_LINE_MAX
LONG_LINE_MAX is much like LINE_MAX, but longer.

As it turns out LINE_MAX at 4096 is too short for many usecases. Since
the general concept of having a common maximum line length limit makes
sense let's add our own, and make it larger (1MB for now).
2017-09-22 20:34:15 +02:00
Lennart Poettering 4f9a66a32d fileio: add new helper call read_line() as bounded getline() replacement
read_line() is much like getline(), and returns a line read from a
FILE*, of arbitrary sizes. In contrast to gets() it will grow the buffer
dynamically, and in contrast to getline() it will place a user-specified
boundary on the line.
2017-09-22 20:34:15 +02:00
Lennart Poettering 88af31f922 socket: assign socket units to a default slice unconditionally
Due to the chown() logic socket units might end up with processes even
if no explicit command is defined for them, hence let's make sure these
processes are in the right cgroup, and that means within a slice.

Mount, swap and service units unconditionally are assigned to a slice
already, let's do the same here, too.

(This becomes more important as soon as the ebpf/firewall stuff is
merged, as there'll be another reason to fork off processes then)
2017-09-22 20:09:21 +02:00
Lennart Poettering 7960b0c704 cgroup: make use of unit_cgroup_delegate() where useful
It's an easy-to-use wrapper, so let's take benefit of it.
2017-09-22 20:02:23 +02:00
Lennart Poettering 40853aa53f cgroup: rework which files we chown() on delegation
On cgroupsv2 we should also chown()/chmod() the subtree_control file,
so that children can use controllers the way they like.

On cgroupsv1 we should also chown()/chmod() cgroups.clone_children, as
not setting this for new cgroups makes little sense, and hence delegated
clients should be able to write to it.

Note that error handling for both cases is different. subtree_control
matters so we check for errors, but the clone_children/tasks stuff
doesn't really, as it's legacy stuff. Hence we only log errors and
proceed.

Fixes: #6216
2017-09-22 20:00:53 +02:00
Lennart Poettering 5beac75e44 cgroup-util: downgrade log messages from library code to LOG_DEBUG
These errors don't really matter, that's why we log and proceed in the
current code. However, we currently log at LOG_WARNING, but we really
shouldn't given that this is library code. Hence downgrade this to
LOG_DEBUG.
2017-09-22 19:57:07 +02:00
John Lin a195dd8e5a man: Requires= needs After= to deactivate "this unit" (#6869)
Fixes: #6856
2017-09-22 19:15:28 +02:00
Lennart Poettering 2b0ba1a417 Merge pull request #6879 from marcelhollerbach/testsuite-fix
time-util: testsuite fix
2017-09-22 18:47:59 +02:00
Zbigniew Jędrzejewski-Szmek d2561cfdf7 install: consider globally enabled units as "enabled" for the user
We would not consider symlinks in /etc/systemd/user/*.{wants,requires}/
towards the user unit being "enabled", because the symlinks were not
located in "config" paths. But this is confusing to users, since those units
are clearly enabled and will be started. So let's muddle the definition of
enablement a bit to include the paths only accessible to root when looking for
enabled user units.

Fixes #4432.
2017-09-22 18:40:26 +02:00
Zbigniew Jędrzejewski-Szmek d9b4b48f3f install: consider non-Alias=/non-DefaultInstance= symlinks as "indirect" enablement
I think this matches the spirit of "indirect" well: the unit
*might* be active, even though it is not "installed" in the
sense of symlinks created based on the [Install] section.

The changes to test-install-root touch the same lines as in the previous
commit; the change in each case is from
   assert_se(unit_file_get_state(...) >= 0 && state == UNIT_FILE_ENABLED)
to
   assert_se(unit_file_get_state(...) >= 0 && state == UNIT_FILE_DISABLED)
to
   assert_se(unit_file_get_state(...) >= 0 && state == UNIT_FILE_INDIRECT)
in the last two commits.
2017-09-22 18:23:02 +02:00
Zbigniew Jędrzejewski-Szmek 5cd8ae3152 install: only consider names in Alias= as "enabling"
When a unit has a symlink that makes an alias in the filesystem,
but that name is not specified in [Install], it is confusing
is the unit is shown as "enabled". Look only for names specified
in Alias=.

Fixes #6338.

v2:
- Fix indentation.
- Fix checking for normal enablement, when the symlink name is the same as the
  unit name. This case wasn't handled properly in v1.

v3:
- Rework the patch to also handle templates properly:
  A template templ@.service with DefaultInstance=foo will be considered
  enabled only when templ@foo.service symlink is found. Symlinks with
  other instance names do not count, which matches the logic for aliases
  to normal units. Tests are updated.
2017-09-22 18:12:52 +02:00
Lennart Poettering 22c8321b09 update TODO 2017-09-22 15:28:05 +02:00
Lennart Poettering 9f2e6892a2 bpf: set BPF_F_ALLOW_OVERRIDE when attaching a cgroup program if Delegate=yes is set
Let's permit installing BPF programs in cgroup subtrees if
Delegeate=yes. Let's not document this precise behaviour for now though,
as most likely the logic here should become recursive, but that's only
going to happen if the kernel starts supporting that. Until then,
support this in a non-recursive fashion.
2017-09-22 15:28:05 +02:00
Lennart Poettering 1c382774c5 man: document two more special units 2017-09-22 15:28:05 +02:00
Lennart Poettering 1180181a51 man: remove double newlines in systemd.special man page header
The <!-- --> comment lines resulted in double newlines in the man page
header, which looks quite ugly. Let's rearrange a bit so that these
comments don't result in changes in the output.
2017-09-22 15:28:05 +02:00
Lennart Poettering ee859930d3 man: drop misplaced "," before "-.slice" 2017-09-22 15:28:05 +02:00
Lennart Poettering fb3ae275cb main: bump RLIMIT_NOFILE for the root user substantially
On current kernels BPF_MAP_TYPE_LPM_TRIE bpf maps are charged against
RLIMIT_MEMLOCK even for privileged users that have CAP_IPC_LOCK. Given
that mlock() generally ignores RLIMIT_MEMLOCK if CAP_IPC_LOCK is set
this appears to be an oversight in the kernel. Either way, until that's
fixed, let's just bump RLIMIT_MEMLOCK for the root user considerably, as
the default is quite limiting, and doesn't permit us to create more than
a few TRIE maps.
2017-09-22 15:28:05 +02:00
Lennart Poettering c4ad3f43ef rlimit: don't assume getrlimit() always succeeds
In times of seccomp it might very well fail, and given that we return
failures from this function anyway, let's also propagate getrlimit()
failures, just to be safe.
2017-09-22 15:28:05 +02:00
Lennart Poettering 915b1d0174 core: whenever a unit terminates, log its consumed resources to the journal
This adds a new recognizable log message for each unit invocation that
contains structured information about consumed resources of the unit as
a whole after it terminated. This is particular useful for apps that
want to figure out what the resource consumption of a unit given a
specific invocation ID was.

The log message is only generated for units that have at least one
XyzAccounting= property turned on, and currently only covers IP traffic and CPU
time metrics.
2017-09-22 15:28:05 +02:00
Lennart Poettering 8e5430c4bd nspawn: set up a new session keyring for the container process
keyring material should not leak into the container. So far we relied on
seccomp to deny access to the keyring, but given that we now made the
seccomp configurable, and access to keyctl() and friends may optionally
be permitted to containers now let's make sure we disconnect the callers
keyring from the keyring of PID 1 in the container.
2017-09-22 15:28:04 +02:00
Lennart Poettering e6a7ec4b8e io-util: add new IOVEC_INIT/IOVEC_MAKE macros
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().

This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.

(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
2017-09-22 15:28:04 +02:00
Lennart Poettering 646cc98dc8 job: change result field for log message about job result RESULT= → JOB_RESULT=
So, currently, some of the structured log messages we generated based on
jobs carry the result in RESULT=, and others in JOB_RESULT=. Let's
streamline this, as stick to JOB_RESULT= in one place.

This is kind of an API break, but given that currently most software has
to check both fields anyway, I think we can get away with it.

Why unify on JOB_RESULT= rather than RESULT=? Well, we manage different
types of result codes in systemd. Most importanlty besides job results
there are also service results, and we should be explicit in what we
mean here.
2017-09-22 15:24:55 +02:00
Lennart Poettering dba1bd4396 documentation: document nss-systemd's internal environment variables in ENVIRONMENT.md 2017-09-22 15:24:55 +02:00
Lennart Poettering f1c50becda core: make sure to log invocation ID of units also when doing structured logging 2017-09-22 15:24:55 +02:00
Daniel Mack 8d8631d4c9 man: document the new ip accounting and filting directives 2017-09-22 15:24:55 +02:00
Lennart Poettering cf3b4be101 cgroup: refuse to return accounting data if accounting isn't turned on
We used to be a bit sloppy on this, and handed out accounting data even
for units where accounting wasn't explicitly enabled. Let's be stricter
here, so that we know the accounting data is actually fully valid. This
is necessary, as the accounting data is no longer stored exclusively in
cgroupfs, but is partly maintained external of that, and flushed during
unit starts. We should hence only expose accounting data we really know
is fully current.
2017-09-22 15:24:55 +02:00
Lennart Poettering 58d83430e1 core: when coming back from reload/reexec, reapply all cgroup properties
With this change we'll invalidate all cgroup settings after coming back
from a daemon reload/reexec, so that the new settings are instantly
applied.

This is useful for the BPF case, because we don't serialize/deserialize
the BPF program fd, and hence have to install a new, updated BPF program
when coming back from the reload/reexec. However, this is also useful
for the rest of the cgroup settings, as it ensures that user
configuration really takes effect wherever we can.
2017-09-22 15:24:55 +02:00
Lennart Poettering 6b659ed87e core: serialize/deserialize IP accounting across daemon reload/reexec
Make sure the current IP accounting counters aren't lost during
reload/reexec.

Note that we destroy all BPF file objects during a reload: the BPF
programs, the access and the accounting maps. The former two need to be
regenerated anyway with the newly loaded configuration data, but the
latter one needs to survive reloads/reexec. In this implementation I
opted to only save/restore the accounting map content instead of the map
itself. While this opens a (theoretic) window where IP traffic is still
accounted to the old map after we read it out, and we thus miss a few
bytes this has the benefit that we can alter the map layout between
versions should the need arise.
2017-09-22 15:24:55 +02:00
Lennart Poettering a79279c7fd core: when creating the socket fds for a socket unit, join socket's cgroup first
Let's make sure that a socket unit's IPAddressAllow=/IPAddressDeny=
settings are in effect on all socket fds associated with it. In order to
make this happen we need to make sure the cgroup the fds are associated
with are the socket unit's cgroup. The only way to do that is invoking
socket()+accept() in them. Since we really don't want to migrate PID 1
around we do this by forking off a helper process, which invokes
socket()/accept() and sends the newly created fd to PID 1. Ugly, but
works, and there's apparently no better way right now.

This generalizes forking off per-unit helper processes in a new function
unit_fork_helper_process(), which is then also used by the NSS chown()
code of socket units.
2017-09-22 15:24:55 +02:00
Lennart Poettering 5ed272cf92 socket-label: let's use IN_SET, so that we have to call socket_address_family() only once 2017-09-22 15:24:55 +02:00
Lennart Poettering 078ba556da core: warn loudly if IP firewalling is configured but not in effect 2017-09-22 15:24:55 +02:00
Daniel Mack db3a59308c Add test for eBPF firewall code 2017-09-22 15:24:55 +02:00
Lennart Poettering 1274b6c687 ip-address-access: minimize IP address lists
Let's drop redundant items from the IP address list after parsing. Let's
also mask out redundant bits hidden by the prefixlength.
2017-09-22 15:24:55 +02:00
Lennart Poettering 2ba6e7381b mkosi: when the build fails, show its log output, and propagate error 2017-09-22 15:24:55 +02:00
Lennart Poettering 3dc5ca9787 core: support IP firewalling to be configured for transient units 2017-09-22 15:24:55 +02:00
Lennart Poettering c21c99060b cgroup: dump the newly added IP settings in the cgroup context 2017-09-22 15:24:55 +02:00
Daniel Mack 0e97c93fe5 systemctl: report accounted network traffic in "systemctl status"
This hooks up the eposed D-Bus values and displays them like this:

-bash-4.3# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-11-11 20:10:36 CET; 1min 29s ago
 Main PID: 33 (httpd)
   Status: "Total requests: 22514; Idle/Busy workers 92/7;Requests/sec: 259; Bytes served/sec:  87KB/sec"
  Network: 15.8M in, 51.1M out
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   CGroup: /system.slice/httpd.service
           ├─ 33 /usr/sbin/httpd -DFOREGROUND
           ├─ 37 /usr/sbin/httpd -DFOREGROUND
           ├─112 /usr/sbin/httpd -DFOREGROUND
           └─119 /usr/sbin/httpd -DFOREGROUND
2017-09-22 15:24:55 +02:00
Daniel Mack 377bfd2d49 manager: hook up IP accounting defaults 2017-09-22 15:24:55 +02:00
Daniel Mack 906c06f64a cgroup, unit, fragment parser: make use of new firewall functions 2017-09-22 15:24:55 +02:00
Daniel Mack 1988a9d120 Add firewall eBPF compiler 2017-09-22 15:24:55 +02:00
Daniel Mack 6a48d82f02 cgroup: add fields to accommodate eBPF related details
Add pointers for compiled eBPF programs as well as list heads for allowed
and denied hosts for both directions.
2017-09-22 15:24:54 +02:00
Daniel Mack b36672e072 Add IP address address ACL representation and parser
Add a config directive parser that takes multiple space separated IPv4
or IPv6 addresses with optional netmasks in CIDR notation rvalue and
puts a parsed version of it to linked list of IPAddressAccessItem objects.
The code actually using this will be added later.
2017-09-22 15:24:54 +02:00