Commit graph

30206 commits

Author SHA1 Message Date
Lennart Poettering ee859930d3 man: drop misplaced "," before "-.slice" 2017-09-22 15:28:05 +02:00
Lennart Poettering fb3ae275cb main: bump RLIMIT_NOFILE for the root user substantially
On current kernels BPF_MAP_TYPE_LPM_TRIE bpf maps are charged against
RLIMIT_MEMLOCK even for privileged users that have CAP_IPC_LOCK. Given
that mlock() generally ignores RLIMIT_MEMLOCK if CAP_IPC_LOCK is set
this appears to be an oversight in the kernel. Either way, until that's
fixed, let's just bump RLIMIT_MEMLOCK for the root user considerably, as
the default is quite limiting, and doesn't permit us to create more than
a few TRIE maps.
2017-09-22 15:28:05 +02:00
Lennart Poettering c4ad3f43ef rlimit: don't assume getrlimit() always succeeds
In times of seccomp it might very well fail, and given that we return
failures from this function anyway, let's also propagate getrlimit()
failures, just to be safe.
2017-09-22 15:28:05 +02:00
Lennart Poettering 915b1d0174 core: whenever a unit terminates, log its consumed resources to the journal
This adds a new recognizable log message for each unit invocation that
contains structured information about consumed resources of the unit as
a whole after it terminated. This is particular useful for apps that
want to figure out what the resource consumption of a unit given a
specific invocation ID was.

The log message is only generated for units that have at least one
XyzAccounting= property turned on, and currently only covers IP traffic and CPU
time metrics.
2017-09-22 15:28:05 +02:00
Lennart Poettering 8e5430c4bd nspawn: set up a new session keyring for the container process
keyring material should not leak into the container. So far we relied on
seccomp to deny access to the keyring, but given that we now made the
seccomp configurable, and access to keyctl() and friends may optionally
be permitted to containers now let's make sure we disconnect the callers
keyring from the keyring of PID 1 in the container.
2017-09-22 15:28:04 +02:00
Lennart Poettering e6a7ec4b8e io-util: add new IOVEC_INIT/IOVEC_MAKE macros
This adds IOVEC_INIT() and IOVEC_MAKE() for initializing iovec structures
from a pointer and a size. On top of these IOVEC_INIT_STRING() and
IOVEC_MAKE_STRING() are added which take a string and automatically
determine the size of the string using strlen().

This patch removes the old IOVEC_SET_STRING() macro, given that
IOVEC_MAKE_STRING() is now useful for similar purposes. Note that the
old IOVEC_SET_STRING() invocations were two characters shorter than the
new ones using IOVEC_MAKE_STRING(), but I think the new syntax is more
readable and more generic as it simply resolves to a C99 literal
structure initialization. Moreover, we can use very similar syntax now
for initializing strings and pointer+size iovec entries. We canalso use
the new macros to initialize function parameters on-the-fly or array
definitions. And given that we shouldn't have so many ways to do the
same stuff, let's just settle on the new macros.

(This also converts some code to use _cleanup_ where dynamically
allocated strings were using IOVEC_SET_STRING() before, to modernize
things a bit)
2017-09-22 15:28:04 +02:00
Lennart Poettering 646cc98dc8 job: change result field for log message about job result RESULT= → JOB_RESULT=
So, currently, some of the structured log messages we generated based on
jobs carry the result in RESULT=, and others in JOB_RESULT=. Let's
streamline this, as stick to JOB_RESULT= in one place.

This is kind of an API break, but given that currently most software has
to check both fields anyway, I think we can get away with it.

Why unify on JOB_RESULT= rather than RESULT=? Well, we manage different
types of result codes in systemd. Most importanlty besides job results
there are also service results, and we should be explicit in what we
mean here.
2017-09-22 15:24:55 +02:00
Lennart Poettering dba1bd4396 documentation: document nss-systemd's internal environment variables in ENVIRONMENT.md 2017-09-22 15:24:55 +02:00
Lennart Poettering f1c50becda core: make sure to log invocation ID of units also when doing structured logging 2017-09-22 15:24:55 +02:00
Daniel Mack 8d8631d4c9 man: document the new ip accounting and filting directives 2017-09-22 15:24:55 +02:00
Lennart Poettering cf3b4be101 cgroup: refuse to return accounting data if accounting isn't turned on
We used to be a bit sloppy on this, and handed out accounting data even
for units where accounting wasn't explicitly enabled. Let's be stricter
here, so that we know the accounting data is actually fully valid. This
is necessary, as the accounting data is no longer stored exclusively in
cgroupfs, but is partly maintained external of that, and flushed during
unit starts. We should hence only expose accounting data we really know
is fully current.
2017-09-22 15:24:55 +02:00
Lennart Poettering 58d83430e1 core: when coming back from reload/reexec, reapply all cgroup properties
With this change we'll invalidate all cgroup settings after coming back
from a daemon reload/reexec, so that the new settings are instantly
applied.

This is useful for the BPF case, because we don't serialize/deserialize
the BPF program fd, and hence have to install a new, updated BPF program
when coming back from the reload/reexec. However, this is also useful
for the rest of the cgroup settings, as it ensures that user
configuration really takes effect wherever we can.
2017-09-22 15:24:55 +02:00
Lennart Poettering 6b659ed87e core: serialize/deserialize IP accounting across daemon reload/reexec
Make sure the current IP accounting counters aren't lost during
reload/reexec.

Note that we destroy all BPF file objects during a reload: the BPF
programs, the access and the accounting maps. The former two need to be
regenerated anyway with the newly loaded configuration data, but the
latter one needs to survive reloads/reexec. In this implementation I
opted to only save/restore the accounting map content instead of the map
itself. While this opens a (theoretic) window where IP traffic is still
accounted to the old map after we read it out, and we thus miss a few
bytes this has the benefit that we can alter the map layout between
versions should the need arise.
2017-09-22 15:24:55 +02:00
Lennart Poettering a79279c7fd core: when creating the socket fds for a socket unit, join socket's cgroup first
Let's make sure that a socket unit's IPAddressAllow=/IPAddressDeny=
settings are in effect on all socket fds associated with it. In order to
make this happen we need to make sure the cgroup the fds are associated
with are the socket unit's cgroup. The only way to do that is invoking
socket()+accept() in them. Since we really don't want to migrate PID 1
around we do this by forking off a helper process, which invokes
socket()/accept() and sends the newly created fd to PID 1. Ugly, but
works, and there's apparently no better way right now.

This generalizes forking off per-unit helper processes in a new function
unit_fork_helper_process(), which is then also used by the NSS chown()
code of socket units.
2017-09-22 15:24:55 +02:00
Lennart Poettering 5ed272cf92 socket-label: let's use IN_SET, so that we have to call socket_address_family() only once 2017-09-22 15:24:55 +02:00
Lennart Poettering 078ba556da core: warn loudly if IP firewalling is configured but not in effect 2017-09-22 15:24:55 +02:00
Daniel Mack db3a59308c Add test for eBPF firewall code 2017-09-22 15:24:55 +02:00
Lennart Poettering 1274b6c687 ip-address-access: minimize IP address lists
Let's drop redundant items from the IP address list after parsing. Let's
also mask out redundant bits hidden by the prefixlength.
2017-09-22 15:24:55 +02:00
Lennart Poettering 2ba6e7381b mkosi: when the build fails, show its log output, and propagate error 2017-09-22 15:24:55 +02:00
Lennart Poettering 3dc5ca9787 core: support IP firewalling to be configured for transient units 2017-09-22 15:24:55 +02:00
Lennart Poettering c21c99060b cgroup: dump the newly added IP settings in the cgroup context 2017-09-22 15:24:55 +02:00
Daniel Mack 0e97c93fe5 systemctl: report accounted network traffic in "systemctl status"
This hooks up the eposed D-Bus values and displays them like this:

-bash-4.3# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/etc/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-11-11 20:10:36 CET; 1min 29s ago
 Main PID: 33 (httpd)
   Status: "Total requests: 22514; Idle/Busy workers 92/7;Requests/sec: 259; Bytes served/sec:  87KB/sec"
  Network: 15.8M in, 51.1M out
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   CGroup: /system.slice/httpd.service
           ├─ 33 /usr/sbin/httpd -DFOREGROUND
           ├─ 37 /usr/sbin/httpd -DFOREGROUND
           ├─112 /usr/sbin/httpd -DFOREGROUND
           └─119 /usr/sbin/httpd -DFOREGROUND
2017-09-22 15:24:55 +02:00
Daniel Mack 377bfd2d49 manager: hook up IP accounting defaults 2017-09-22 15:24:55 +02:00
Daniel Mack 906c06f64a cgroup, unit, fragment parser: make use of new firewall functions 2017-09-22 15:24:55 +02:00
Daniel Mack 1988a9d120 Add firewall eBPF compiler 2017-09-22 15:24:55 +02:00
Daniel Mack 6a48d82f02 cgroup: add fields to accommodate eBPF related details
Add pointers for compiled eBPF programs as well as list heads for allowed
and denied hosts for both directions.
2017-09-22 15:24:54 +02:00
Daniel Mack b36672e072 Add IP address address ACL representation and parser
Add a config directive parser that takes multiple space separated IPv4
or IPv6 addresses with optional netmasks in CIDR notation rvalue and
puts a parsed version of it to linked list of IPAddressAccessItem objects.
The code actually using this will be added later.
2017-09-22 15:24:54 +02:00
Daniel Mack 71e5200f94 Add abstraction model for BPF programs
This object takes a number of bpf_insn members and wraps them together with
the in-kernel reference id. Will be needed by the firewall code.
2017-09-22 15:24:54 +02:00
Daniel Mack 3f0c2342c0 build-sys: add new kernel bpf.h drop-in
The defines we need are pretty comprehensive and new, hence copy in the
full header from the kernel.
2017-09-22 15:24:54 +02:00
Lennart Poettering f4912f3a74 in-addr-util: add new helper call in_addr_prefix_from_string_auto()
This is much like in_addr_prefix_from_string(), but automatically
determines whether IPv4 or IPv6 addresses are specified. Also adds a
test for it.
2017-09-22 15:24:54 +02:00
Lennart Poettering 4e2d527361 in-addr-util: prefix return parameters with ret_ 2017-09-22 15:24:54 +02:00
Lennart Poettering 5a941f5f21 in-addr-util: be more systematic with naming our functions
Let's rename all our functions that process IPv4 in_addr structures
in4_addr_xyz(), following the already establishing naming logic for
this.

Leave the in_addr_xyz() prefix for functions that process the IPv4/IPv6
in_addr_union union instead.
2017-09-22 15:24:54 +02:00
Lennart Poettering bd389aa734 manager: initialize timeouts when allocating a naked Manager object
This way we can safely run manager objects from tests and good timeouts
apply. Without this all timeouts are set 0, which means they fire
instantly, when run from tests which do not explicitly configure them
(the way main.c does).
2017-09-22 15:24:54 +02:00
Lennart Poettering 10bd3e2e4c manager: watching the cgroup2 inotify fd is safe in test runs too
Less deviation between test runs and normal runs is always a good idea,
hence enable more stuff that is safe in test runs
2017-09-22 15:24:54 +02:00
Lennart Poettering 7cce4fb7f7 cgroup: always invalidate "cpu" and "cpuacct" together
This doesn't really matter, as we never invalidate cpuacct explicitly,
and there's no real reason to care for it explicitly, however it's
prettier if we always treat cpu and cpuacct as belonging together, the
same way we conisder "io" and "blkio" to belong together.
2017-09-22 15:24:54 +02:00
Lennart Poettering 8b238b13b1 cgroup-util: minor coding style adjustment 2017-09-22 15:24:54 +02:00
Lennart Poettering 18f573aaf9 core: make sure to dump cgroup context when unit_dump() is called for all unit types
For some reason we didn't dump the cgroup context for a number of unit
types, including service units. Not sure how this wasn't noticed
before... Add this in.
2017-09-22 15:24:54 +02:00
Marcel Hollerbach 214cc95d7b time-util: mktime_or_timegm are changing the struct tm
after that wm_day etc. seems to be changed. Moving the check infront of
the mktime_or_timegm fixes that.
2017-09-22 14:01:33 +02:00
Marcel Hollerbach 3fd4929b96 time-util: correctly handle the timezone when parsing
The timezone was cut off the string once the timezone was not UTC.
If it is not UTC but a other timezone that matches tzname[0] or
tzname[1], then we can leave it to the impl function to parse that
correctly. If not we can just fallback to whatever is the current
timezone is in the given t_timezone.

This should fix the testuite and tests.
2017-09-22 14:01:33 +02:00
Lennart Poettering ec20fe5ffb journald: make maximum size of stream log lines configurable and bump it to 48K (#6838)
This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.

This also makes three related changes:

- When a NUL byte is read we'll not recognize this as alternative line
  break, instead of silently dropping everything after it. (see #4863)

- The reason for a line-break is now encoded in the log record, if it
  wasn't a plain newline. Specifically, we distuingish "nul",
  "line-max" and "eof", for line breaks due to NUL byte, due to the
  maximum line length as configured with LineMax= or due to end of
  stream. This data is stored in the new implicit _LINE_BREAK= field.
  It's not synthesized for plain \n line breaks.

- A randomized 128bit ID is assigned to each log stream.

With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=86465
See: #4863
Replaces: #4875
2017-09-22 10:22:24 +02:00
Tommi Rantala 24754f3694 journal: add object sanity check to journal_file_move_to_object()
Introduce journal_file_check_object(), which does lightweight object
sanity checks, and use it in journal_file_move_to_object(), so that we
will catch certain corrupted objects in the journal file.

This fixes #6447, where we had only partially written out OBJECT_ENTRY
(ObjectHeader written, but rest of object zero bytes), causing
"journalctl --list-boots" to fail.

  $ builddir.vanilla/journalctl --list-boots -D bug6447/
  Failed to determine boots: No data available

  $ builddir.patched/journalctl --list-boots -D bug6447/
  -52 22633da1c5374a728d6c215e2c301dc2 Mon 2017-07-10 05:29:21 EEST—Mon 2017-07-10 05:31:51 EEST
  -51 2253aab9ea7e4a2598f2abda82939eff Mon 2017-07-10 05:32:22 EEST—Mon 2017-07-10 05:36:49 EEST
  -50 ef0d85d35c74486fa4104f9d6391b6ba Mon 2017-07-10 05:40:33 EEST—Mon 2017-07-10 05:40:40 EEST
  [...]

Note that journal_file_check_object() is similar to
journal_file_object_verify(). The most expensive checks are omitted, as
they would slow down every journal_file_move_to_object() call too much.

With this implementation, the added overhead is small, for example when
dumping some journal content to /dev/null
(built with -Dbuildtype=debugoptimized -Db_ndebug=true):

 Performance counter stats for 'builddir.vanilla/journalctl -D 76f4d4c3406945f9a60d3ca8763aa754/':

      12542,311634      task-clock:u (msec)       #    1,000 CPUs utilized
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            80 100      page-faults:u             #    0,006 M/sec
    41 786 963 456      cycles:u                  #    3,332 GHz
   105 453 864 770      instructions:u            #    2,52  insn per cycle
    24 342 227 334      branches:u                # 1940,809 M/sec
       105 709 217      branch-misses:u           #    0,43% of all branches

      12,545199291 seconds time elapsed

 Performance counter stats for 'builddir.patched/journalctl -D 76f4d4c3406945f9a60d3ca8763aa754/':

      12734,723233      task-clock:u (msec)       #    1,000 CPUs utilized
                 0      context-switches:u        #    0,000 K/sec
                 0      cpu-migrations:u          #    0,000 K/sec
            80 693      page-faults:u             #    0,006 M/sec
    42 661 017 429      cycles:u                  #    3,350 GHz
   107 696 985 865      instructions:u            #    2,52  insn per cycle
    24 950 526 745      branches:u                # 1959,252 M/sec
       101 762 806      branch-misses:u           #    0,41% of all branches

      12,737527327 seconds time elapsed

Fixes #6447.
2017-09-22 10:32:20 +03:00
Lennart Poettering d82611c918 Merge pull request #6853 from sourcejedi/GetAll
sd-bus: fix response for GetAll on non-existent objects
2017-09-21 21:41:55 +02:00
Zbigniew Jędrzejewski-Szmek a4041e4f68 Link to the right glibc commit in comment (#6884)
Reported by Marcos Mello.

Fixes #6882.
2017-09-21 20:54:16 +02:00
Zbigniew Jędrzejewski-Szmek 7a7ec2bf71 install: move and rename to lowercase two functions
No reason to make them look like macros.
2017-09-21 18:36:45 +02:00
Zbigniew Jędrzejewski-Szmek 3ec530a189 timedatectl: be more explicit what "ntp synchronized" means
The documentation explained that the message doesn't really mean what it says,
but I think it's better to just make the message more straightforward.

Fixes #6554.
2017-09-21 16:16:45 +02:00
Marcel Hollerbach ff69484a1f time-util: fix shadowing of timezone
timezone was shadowing timezone from time.h which leads to a buildbreak
since systemd is built with -Werror
2017-09-21 14:39:08 +02:00
Jan Synacek f679ed6116 execute: fix typo in error message (#6881) 2017-09-21 10:38:52 +02:00
Lennart Poettering 70b089d7dc Merge pull request #6847 from keszybz/disable-enable-generators
Disable and optionally again enable generators in test mode
2017-09-20 19:51:44 +02:00
Zbigniew Jędrzejewski-Szmek 23115a607e path-lookup: fix minor memleak
Introduced in a1f31f4715.
2017-09-19 20:14:22 +02:00
Zbigniew Jędrzejewski-Szmek 641c0fd14e analyze-verify: add --generators switch to enable generators again 2017-09-19 20:14:22 +02:00