Commit graph

2403 commits

Author SHA1 Message Date
Reverend Homer 8fb3f00997 tree-wide: replace all readdir cycles with FOREACH_DIRENT{,_ALL} (#4853) 2016-12-09 10:04:30 +01:00
Lennart Poettering e187369587 tree-wide: stop using canonicalize_file_name(), use chase_symlinks() instead
Let's use chase_symlinks() everywhere, and stop using GNU
canonicalize_file_name() everywhere. For most cases this should not change
behaviour, however increase exposure of our function to get better tested. Most
importantly in a few cases (most notably nspawn) it can take the correct root
directory into account when chasing symlinks.
2016-12-01 00:25:51 +01:00
Jouke Witteveen 7ed0a4c537 bus-util: add protocol error type explanation 2016-11-29 23:19:52 +01:00
Zbigniew Jędrzejewski-Szmek ee43050b40 Merge pull request #4692 from poettering/networkd-dhcp
Various networkd/DHCP fixes.
2016-11-22 23:22:04 -05:00
Lennart Poettering 17cbb288fa nspawn: add fallback top normal copy/reflink when we cannot btrfs snapshot
Given that other file systems (notably: xfs) support reflinks these days, let's
extend the file system snapshotting logic to fall back to plan copies or
reflinks when full btrfs subvolume snapshots are not available.

This essentially makes "systemd-nspawn --ephemeral" and "systemd-nspawn
--template=" available on non-btrfs subvolumes. Of course, both operations will
still be slower on non-btrfs than on btrfs (simply because reflinking each file
individually in a directory tree is still slower than doing this in one step
for a whole subvolume), but it's probably good enough for many cases, and we
should provide the users with the tools, they have to figure out what's good
for them.

Note that "machinectl clone" already had a fallback like this in place, this
patch generalizes this, and adds similar support to our other cases.
2016-11-22 13:35:09 +01:00
Lennart Poettering b6e953f24c nspawn: add ability to run nspawn without container locks applied
This adds a new undocumented env var $SYSTEMD_NSPAWN_LOCK. When set to "0",
nspawn will not attempt to lock the image.

Fixes: #4037
2016-11-22 13:35:09 +01:00
Lennart Poettering 546dbec532 shared: make sure image_path_lock() return parameters are always initialized on success
We forgot to initialize the "global" return parameter in one case. Fix that.
2016-11-22 13:35:09 +01:00
Lennart Poettering 1a1b13c957 seccomp: add @filesystem syscall group (#4537)
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
2016-11-21 19:29:12 -05:00
Lennart Poettering 08a4849ec9 shared: add new API to validate a string as hostname or IP address 2016-11-21 22:58:26 +01:00
Lennart Poettering 2e6dbc0fcd Merge pull request #4538 from fbuihuu/confirm-spawn-fixes
Confirm spawn fixes/enhancements
2016-11-18 11:08:06 +01:00
Lennart Poettering 289188ca7d system-run: add support for configuring unit dependencies with --property=
Support on the server side has already been in place for quite some time, let's
also add support on the client side for this.
2016-11-16 15:03:26 +01:00
Lennart Poettering c5a97ed132 core: GC redundant device jobs from the run queue
In contrast to all other unit types device units when queued just track
external state, they cannot effect state changes on their own. Hence unless a
client or other job waits for them there's no reason to keep them in the job
queue. This adds a concept of GC'ing jobs of this type as soon as no client or
other job waits for them anymore.

To ensure this works correctly we need to track which clients actually
reference a job (i.e. which ones enqueued it). Unfortunately that's pretty
nasty to do for direct connections, as sd_bus_track doesn't work for
them. For now, work around this, by simply remembering in a boolean that a job
was requested by a direct connection, and reset it when we notice the direct
connection is gone. This means the GC logic works fine, except that jobs are
not immediately removed when direct connections disconnect.

In the longer term, a rework of the bus logic should fix this properly. For now
this should be good enough, as GC works for fine all cases except this one, and
thus is a clear improvement over the previous behaviour.

Fixes: #1921
2016-11-16 15:03:26 +01:00
Lennart Poettering 984794baf4 shared: split out code for adding multiple names to sd_bus_track object
Let's introduce a new call bus_track_add_name_many() that adds a string list to
a tracking object.
2016-11-16 15:03:26 +01:00
Djalal Harouni 73186d534b bus-util: print RestrictNamespaces= as a string
Allow all callers that want to print RestrictNamespaces= returned from D-Bus
as a string instead of a u64 value.
2016-11-15 15:52:42 +01:00
Zbigniew Jędrzejewski-Szmek c58bd76a6a tree-wide: make invocations of extract_first_word more uniform (#4627)
extract_first_words deals fine with the string being NULL, so drop the upfront
check for that.
2016-11-11 18:58:41 +01:00
Franck Bui 51b9bb4f8e nfsflags: drop useless include file 'seccomp-util.h'
This also fixes the build when seccomp is disabled.
2016-11-10 10:19:24 +01:00
Zbigniew Jędrzejewski-Szmek d85a0f8028 Merge pull request #4536 from poettering/seccomp-namespaces
core: add new RestrictNamespaces= unit file setting

Merging, not rebasing, because this touches many files and there were tree-wide cleanups in the mean time.
2016-11-08 19:54:21 -05:00
Zbigniew Jędrzejewski-Szmek a809cee582 Merge pull request #4612 from keszybz/format-strings
Format string tweaks (and a small fix on 32bit)
2016-11-08 08:09:40 -05:00
Martin Pitt ca91fd2aca Merge pull request #4509 from keszybz/foreach-word-quoted
Remove FOREACH_WORD_QUOTED
2016-11-08 09:41:51 +01:00
Zbigniew Jędrzejewski-Szmek 70887c5f29 tree-wide: add PRI_[NU]SEC, and use time format strings more 2016-11-07 22:49:09 -05:00
Zbigniew Jędrzejewski-Szmek f97b34a629 Rename formats-util.h to format-util.h
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
2016-11-07 10:15:08 -05:00
Zbigniew Jędrzejewski-Szmek 9a82ab9592 tree-wide: drop unneded WHITESPACE param to extract_first_word
It's the default, and NULL is shorter.
2016-11-05 15:35:51 -04:00
Ronny Chevalier 9bda42660d Merge pull request #4578 from evverx/no-hostname-memleak
journalctl: fix memleak
2016-11-05 15:23:31 +01:00
Lennart Poettering add005357d core: add new RestrictNamespaces= unit file setting
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().

RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namspace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.

This setting should be improve security quite a bit as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
2016-11-04 07:40:13 -06:00
Zbigniew Jędrzejewski-Szmek cf88547034 Merge pull request #4548 from keszybz/seccomp-help
systemd-analyze syscall-filter
2016-11-03 20:27:45 -04:00
Evgeny Vereshchagin 29d87223d5 acl-util: fix memleak
Fixes:
$ ./libtool --mode execute valgrind --leak-check=full ./journalctl >/dev/null
==22309== Memcheck, a memory error detector
==22309== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22309== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==22309== Command: /home/vagrant/systemd/.libs/lt-journalctl
==22309==
Hint: You are currently not seeing messages from other users and the system.
      Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
      Pass -q to turn off this notice.
==22309==
==22309== HEAP SUMMARY:
==22309==     in use at exit: 8,680 bytes in 4 blocks
==22309==   total heap usage: 5,543 allocs, 5,539 frees, 9,045,618 bytes allocated
==22309==
==22309== 488 (56 direct, 432 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 4
==22309==    at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==22309==    by 0x6F37A0A: __new_var_obj_p (__libobj.c:36)
==22309==    by 0x6F362F7: __acl_init_obj (acl_init.c:28)
==22309==    by 0x6F37731: __acl_from_xattr (__acl_from_xattr.c:54)
==22309==    by 0x6F36087: acl_get_file (acl_get_file.c:69)
==22309==    by 0x4F15752: acl_search_groups (acl-util.c:172)
==22309==    by 0x113A1E: access_check_var_log_journal (journalctl.c:1836)
==22309==    by 0x113D8D: access_check (journalctl.c:1889)
==22309==    by 0x115681: main (journalctl.c:2236)
==22309==
==22309== LEAK SUMMARY:
==22309==    definitely lost: 56 bytes in 1 blocks
==22309==    indirectly lost: 432 bytes in 1 blocks
==22309==      possibly lost: 0 bytes in 0 blocks
==22309==    still reachable: 8,192 bytes in 2 blocks
==22309==         suppressed: 0 bytes in 0 blocks
2016-11-03 22:07:49 +00:00
Evgeny Vereshchagin 12104159ed journalctl: fix memleak
bash-4.3# journalctl --no-hostname >/dev/null

=================================================================
==288==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 48492 byte(s) in 2694 object(s) allocated from:
    #0 0x7fb4aba13e60 in malloc (/lib64/libasan.so.3+0xc6e60)
    #1 0x7fb4ab5b2cc4 in malloc_multiply src/basic/alloc-util.h:70
    #2 0x7fb4ab5b3194 in parse_field src/shared/logs-show.c:98
    #3 0x7fb4ab5b4918 in output_short src/shared/logs-show.c:347
    #4 0x7fb4ab5b7cb7 in output_journal src/shared/logs-show.c:977
    #5 0x5650e29cd83d in main src/journal/journalctl.c:2581
    #6 0x7fb4aabdb730 in __libc_start_main (/lib64/libc.so.6+0x20730)

SUMMARY: AddressSanitizer: 48492 byte(s) leaked in 2694 allocation(s).

Closes: #4568
2016-11-03 21:23:22 +00:00
Lennart Poettering 493fd52f1a Merge pull request #4510 from keszybz/tree-wide-cleanups
Tree wide cleanups
2016-11-03 13:59:20 -06:00
Zbigniew Jędrzejewski-Szmek d5efc18b60 seccomp-util, analyze: export comments as a help string
Just to make the whole thing easier for users.
2016-11-03 09:35:36 -04:00
Zbigniew Jędrzejewski-Szmek 40eb6a8014 seccomp-util: move @default to the first position
Now that the list is user-visible, @default should be first.
2016-11-03 09:35:36 -04:00
Zbigniew Jędrzejewski-Szmek a6eccc3647 Do not raise in switch root if paths are too long
If we encounter the (unlikely) situation where the combined path to the
new root and a path to a mount to be moved together exceed maximum path length,
we shouldn't crash, but fail this path instead.
2016-11-02 22:36:43 -04:00
Lennart Poettering 133ddbbeae seccomp: add two new syscall groups
@resources contains various syscalls that alter resource limits and memory and
scheduling parameters of processes. As such they are good candidates to block
for most services.

@basic-io contains a number of basic syscalls for I/O, similar to the list
seccomp v1 permitted but slightly more complete. It should be useful for
building basic whitelisting for minimal sandboxes
2016-11-02 08:50:00 -06:00
Lennart Poettering cd5bfd7e60 seccomp: include pipes and memfd in @ipc
These system calls clearly fall in the @ipc category, hence should be listed
there, simply to avoid confusion and surprise by the user.
2016-11-02 08:50:00 -06:00
Lennart Poettering a8c157ff30 seccomp: drop execve() from @process list
The system call is already part in @default hence implicitly allowed anyway.
Also, if it is actually blocked then systemd couldn't execute the service in
question anymore, since the application of seccomp is immediately followed by
it.
2016-11-02 08:49:59 -06:00
Lennart Poettering c79aff9a82 seccomp: add clock query and sleeping syscalls to "@default" group
Timing and sleep are so basic operations, it makes very little sense to ever
block them, hence don't.
2016-11-02 08:49:59 -06:00
Zbigniew Jędrzejewski-Szmek aa34055ffb seccomp: allow specifying arm64, mips, ppc (#4491)
"Secondary arch" table for mips is entirely speculative…
2016-11-01 09:33:18 -06:00
Evgeny Vereshchagin 492466c1b5 Merge pull request #4442 from keszybz/detect-virt-userns
detect-virt: add --private-users switch to check if a userns is active; add Condition=private-users
2016-10-27 13:16:16 +03:00
Zbigniew Jędrzejewski-Szmek 0809d7740c condition: simplify condition_test_virtualization
Rewrite the function to be slightly simpler. In particular, if a specific
match is found (like ConditionVirtualization=yes), simply return an answer
immediately, instead of relying that "yes" will not be matched by any of
the virtualization names below.

No functional change.
2016-10-26 20:12:52 -04:00
Zbigniew Jędrzejewski-Szmek 239a5707e1 shared/condition: add ConditionVirtualization=[!]private-users
This can be useful to silence warnings about units which fail in userns
container.
2016-10-26 20:12:52 -04:00
Lennart Poettering f6281133de seccomp: add test-seccomp test tool
This validates the system call set table and many of our seccomp-util.c APIs.
2016-10-24 17:32:51 +02:00
Lennart Poettering a3be2849b2 seccomp: add new helper call seccomp_load_filter_set()
This allows us to unify most of the code in apply_protect_kernel_modules() and
apply_private_devices().
2016-10-24 17:32:50 +02:00
Lennart Poettering 60f547cf68 seccomp: two fixes for the syscall set tables
"oldumount()" is not a syscall, but simply a wrapper for it, the actual syscall
nr is called "umount" (and the nr of umount() is called umount2 internally).

"sysctl()" is not a syscall, but "_syscall()" is. Fix this in the table.

Without these changes libseccomp cannot actually translate the tables in full.
This wasn't noticed before as the code was written defensively for this case.
2016-10-24 17:32:50 +02:00
Lennart Poettering 8d7b0c8fd7 seccomp: add new seccomp_init_conservative() helper
This adds a new seccomp_init_conservative() helper call that is mostly just a
wrapper around seccomp_init(), but turns off NNP and adds in all secondary
archs, for best compatibility with everything else.

Pretty much all of our code used the very same constructs for these three
steps, hence unifying this in one small function makes things a lot shorter.

This also changes incorrect usage of the "scmp_filter_ctx" type at various
places. libseccomp defines it as typedef to "void*", i.e. it is a pointer type
(pretty poor choice already!) that casts implicitly to and from all other
pointer types (even poorer choice: you defined a confusing type now, and don't
even gain any bit of type safety through it...). A lot of the code assumed the
type would refer to a structure, and hence aded additional "*" here and there.
Remove that.
2016-10-24 17:32:50 +02:00
Lennart Poettering 8130926d32 core: rework syscall filter set handling
A variety of fixes:

- rename the SystemCallFilterSet structure to SyscallFilterSet. So far the main
  instance of it (the syscall_filter_sets[] array) used to abbreviate
  "SystemCall" as "Syscall". Let's stick to one of the two syntaxes, and not
  mix and match too wildly. Let's pick the shorter name in this case, as it is
  sufficiently well established to not confuse hackers reading this.

- Export explicit indexes into the syscall_filter_sets[] array via an enum.
  This way, code that wants to make use of a specific filter set, can index it
  directly via the enum, instead of having to search for it. This makes
  apply_private_devices() in particular a lot simpler.

- Provide two new helper calls in seccomp-util.c: syscall_filter_set_find() to
  find a set by its name, seccomp_add_syscall_filter_set() to add a set to a
  seccomp object.

- Update SystemCallFilter= parser to use extract_first_word().  Let's work on
  deprecating FOREACH_WORD_QUOTED().

- Simplify apply_private_devices() using this functionality
2016-10-24 17:32:50 +02:00
Lennart Poettering ec2ebfd524 update-done: minor clean-ups
This is a follow-up for fb8b0869a7, and makes a
couple of minor clean-up changes:

- The field name in the timestamp file is changed from "TimestampNSec=" to
  "TIMESTAMP_NSEC=". This is done simply to reflect the fact that we parse the
  file with the env var file parser, and hence the contents should better
  follow the usual capitalization of env vars, i.e. be all uppercase.

- Needless negation of the errno parameter log_error_errno() and friends has
  been removed.

- Instead of manually calculating the nsec remainder of the timestamp, use
  timespec_store().

- We now check whether we were able to write the timestamp file in full with
  fflush_and_check() the way we usually do it.
2016-10-24 17:29:51 +02:00
Jan Synacek 3b3557c410 shared, systemctl: teach is-enabled to show installation targets
It may be desired by users to know what targets a particular service is
installed into. Improve user friendliness by teaching the is-enabled
command to show such information when used with --full.

This patch makes use of the newly added UnitFileFlags and adds
UNIT_FILE_DRY_RUN flag into it. Since the API had already been modified,
it's now easy to add the new dry-run feature for other commands as
well. As a next step, --dry-run could be added to systemctl, which in
turn might pave the way for a long requested dry-run feature when
running systemctl start.
2016-10-24 10:19:08 +02:00
Jan Synacek b3796dd834 install: introduce UnitFileFlags
Introduce a new enum to get rid of some boolean arguments of unit_file_*
functions. It unifies the code, makes it a bit cleaner and extensible.
2016-10-24 10:19:08 +02:00
Zbigniew Jędrzejewski-Szmek 605405c6cc tree-wide: drop NULL sentinel from strjoin
This makes strjoin and strjoina more similar and avoids the useless final
argument.

spatch -I . -I ./src -I ./src/basic -I ./src/basic -I ./src/shared -I ./src/shared -I ./src/network -I ./src/locale -I ./src/login -I ./src/journal -I ./src/journal -I ./src/timedate -I ./src/timesync -I ./src/nspawn -I ./src/resolve -I ./src/resolve -I ./src/systemd -I ./src/core -I ./src/core -I ./src/libudev -I ./src/udev -I ./src/udev/net -I ./src/udev -I ./src/libsystemd/sd-bus -I ./src/libsystemd/sd-event -I ./src/libsystemd/sd-login -I ./src/libsystemd/sd-netlink -I ./src/libsystemd/sd-network -I ./src/libsystemd/sd-hwdb -I ./src/libsystemd/sd-device -I ./src/libsystemd/sd-id128 -I ./src/libsystemd-network --sp-file coccinelle/strjoin.cocci --in-place $(git ls-files src/*.c)

git grep -e '\bstrjoin\b.*NULL' -l|xargs sed -i -r 's/strjoin\((.*), NULL\)/strjoin(\1)/'

This might have missed a few cases (spatch has a really hard time dealing
with _cleanup_ macros), but that's no big issue, they can always be fixed
later.
2016-10-23 11:43:27 -04:00
Zbigniew Jędrzejewski-Szmek 6309e51ea3 shared/install: fix %u expansion when running under sudo
Test case:
[Install]
DefaultInstance=bond1
WantedBy= foobar-U-%U.device
WantedBy= foobar-u-%u.device

$ sudo systemctl --root=/ enable testing4@.service
(before)
Created symlink /etc/systemd/system/foobar-U-0.device.wants/testing4@bond1.service → /etc/systemd/system/testing4@.service.
Created symlink /etc/systemd/system/foobar-u-zbyszek.device.wants/testing4@bond1.service → /etc/systemd/system/testing4@.service.

(after)
Created symlink /etc/systemd/system/foobar-U-0.device.wants/testing4@bond1.service → /etc/systemd/system/testing4@.service.
Created symlink /etc/systemd/system/foobar-u-root.device.wants/testing4@bond1.service → /etc/systemd/system/testing4@.service.

It doesn't make much sense to use a different user for %U and %u.
2016-10-20 09:28:51 -04:00
Zbigniew Jędrzejewski-Szmek cfd559c9a8 shared/install: fix DefaultInstance expansion in %n, %N
We should substitute DefaultInstance if the instance is not specified.

Test case:
[Install]
DefaultInstance=bond1
WantedBy= foobar-n-%n.device
WantedBy= foobar-N-%N.device

$ systemctl --root=/ enable testing4@.service
Created symlink /etc/systemd/system/foobar-n-testing4@bond1.service.device.wants/testing4@bond1.service → /etc/systemd/system/testing4@.service.
Created symlink /etc/systemd/system/foobar-N-testing4@bond1.device.wants/testing4@bond1.service → /etc/systemd/system/testing4@.service.

(before, the symlink would be created with empty %n, %N parts).
2016-10-20 09:28:46 -04:00