Commit graph

16453 commits

Author SHA1 Message Date
Zbigniew Jędrzejewski-Szmek 5611ddebe4 loginctl: report tty in session listings
Without the tty it's really hard to tell which session is which.

New output:
$ ./loginctl
   SESSION        UID USER             SEAT             TTY
        13       1002 zbyszek          seat0            tty3
        c1         42 gdm              seat0            /dev/tty1
        11       1002 zbyszek          seat0            tty4
         3       1002 zbyszek          seat0            /dev/tty2
        17       1002 zbyszek          seat0            tty5
        18       1002 zbyszek          seat0            tty6
6 sessions listed.
2016-10-16 15:59:22 -04:00
Zbigniew Jędrzejewski-Szmek e8f215d34c loginctl: drop casts in printf 2016-10-16 15:59:22 -04:00
Lukáš Nykrýn 08a28eeca7 virt: add possibility to skip the check for chroot (#4374)
https://bugzilla.redhat.com/show_bug.cgi?id=1379852
2016-10-15 13:54:58 -04:00
Tejun Heo 7d862ab8c2 core: make settings for unified cgroup hierarchy supersede the ones for legacy hierarchy (#4269)
There are overlapping control group resource settings for the unified and
legacy hierarchies.  To help transition, the settings are translated back and
forth.  When both versions of a given setting are present, the one matching the
cgroup hierarchy type in use is used.  Unfortunately, this is more confusing to
use and document than necessary because there is no clear static precedence.

Update the translation logic so that the settings for the unified hierarchy are
always preferred.  systemd.resource-control man page is updated to reflect the
change and reorganized so that the deprecated settings are at the end in its
own section.
2016-10-14 21:07:16 -04:00
Thomas H. P. Andersen 5c4624e082 nspawn: remove unused variable (#4369) 2016-10-14 00:30:28 +03:00
Lennart Poettering 8bfdf29b24 Merge pull request #4243 from endocode/djalal/sandbox-first-protection-kernelmodules-v1
core:sandbox: Add ProtectKernelModules= and some fixes
2016-10-13 18:36:29 +02:00
Zbigniew Jędrzejewski-Szmek f5df066d1d Merge pull request #653 from dvdhrm/bus-gold 2016-10-13 12:25:56 -04:00
Daniel Mack d02b5af3f3 Merge pull request #4363 from stefan-it/replace-while-loops
basic,coredump: use for loop instead of while
2016-10-13 15:56:23 +02:00
Evgeny Vereshchagin f0bef277a4 nspawn: cleanup and chown the synced cgroup hierarchy (#4223)
Fixes: #4181
2016-10-13 09:50:46 -04:00
Zbigniew Jędrzejewski-Szmek c1a9199ec4 Merge pull request #4362 from poettering/journalbootlistfix 2016-10-13 07:45:09 -04:00
Stefan Schweter aa7530d681 coredump: use for() loop instead of while() 2016-10-12 22:49:01 +02:00
Stefan Schweter e7f1334f07 basic: use for() loop instead of while() 2016-10-12 22:48:41 +02:00
Lennart Poettering 6612379adf Merge pull request #4358 from fsateler/pam-config
Pam config fixes
2016-10-12 20:41:52 +02:00
Lennart Poettering 3cc44bf91b journalctl: say in which directory we vacuum stuff
Fixes: #4060
2016-10-12 20:25:20 +02:00
Lennart Poettering 8da830bca9 journalctl: don't claim the journal was stored on disk
Let's just say that the journal takes up space in the file system, not on disk,
as tmpfs is definitely a file system, but not a disk.

Fixes: #4059
2016-10-12 20:25:20 +02:00
Lennart Poettering ae739cc1ed journal: refuse opening journal files from the future for writing
Never permit that we write to journal files that have newer timestamps than our
local wallclock has. If we'd accept that, then the entries in the file might
end up not being ordered strictly.

Let's refuse this with ETXTBSY, and then immediately rotate to use a new file,
so that each file remains strictly ordered also be wallclock internally.
2016-10-12 20:25:20 +02:00
Lennart Poettering 7c07001711 journald: automatically rotate journal files when the clock jumps backwards
As soon as we notice that the clock jumps backwards, rotate journal files. This
is beneficial, as this makes sure that the entries in journal files remain
strictly ordered internally, and thus the bisection algorithm applied on it is
not confused.

This should help avoiding borked wallclock-based bisection on journal files as
witnessed in #4278.
2016-10-12 20:25:20 +02:00
Lennart Poettering 0f972d66d4 journald: use the event loop dispatch timestamp for journal entries
Let's use the earliest linearized event timestamp for journal entries we have:
the event dispatch timestamp from the event loop, instead of requerying the
timestamp at the time of writing.

This makes the time a bit more accurate, allows us to query the kernel time one
time less per event loop, and also makes sure we always use the same timestamp
for both attempts to write an entry to a journal file.
2016-10-12 20:25:20 +02:00
Lennart Poettering 989793d341 journal: when iterating through entry arrays and we hit an invalid one keep going
When iterating through partially synced journal files we need to be prepared
for hitting with invalid entries (specifically: non-initialized). Instead of
generated an error and giving up, let's simply try to preceed with the next one
that is valid (and debug log about this).

This reworks the logic introduced with caeab8f626
to iteration in both directions, and tries to look for valid entries located
after the invalid one. It also extends the behaviour to both iterating through
the global entry array and per-data object entry arrays.

Fixes: #4088
2016-10-12 20:25:20 +02:00
Lennart Poettering 1c69f0966a journal: add an explicit check for uninitialized objects
Let's make dissecting of borked journal files more expressive: if we encounter
an object whose first 8 bytes are all zeroes, then let's assume the object was
simply never initialized, and say so.

Previously, this would be detected as "overly short object", which is true too
in a away, but it's a lot more helpful printing different debug options for the
case where the size is not initialized at all and where the size is initialized
to some bogus value.

No function behaviour change, only a different log messages for both cases.
2016-10-12 20:25:20 +02:00
Lennart Poettering ded5034e7a journal: also check that our entry arrays are properly ordered
Let's and extra check, reusing check_properly_ordered() also for
journal_file_next_entry_for_data().
2016-10-12 20:25:20 +02:00
Lennart Poettering b6da4ed045 journal: split out check for properly ordered arrays into its own function
This adds a new call check_properly_ordered(), which we can reuse later, and
makes the code a bit more readable.
2016-10-12 20:25:20 +02:00
Lennart Poettering aa598ba5b6 journal: split out array index inc/dec code into a new call bump_array_index()
This allows us to share a bit more code between journal_file_next_entry() and
journal_file_next_entry_for_data().
2016-10-12 20:25:20 +02:00
Lennart Poettering 202fd896e5 journal: when we encounter a broken journal file, add some debug logging
Let's make it easier to figure out when we see an invalid journal file, why we
consider it invalid, and add some minimal debug logging for it.

This log output is normally not seen (after all, this all is library code),
unless debug logging is exlicitly turned on.
2016-10-12 20:25:20 +02:00
hese10 ec02a6c90a Avoid forever loop for journalctl --list-boots command (#4278)
When date is changed in system to future and normal user logs to new journal file, and then date is changed back to present time, the "journalctl --list-boot" command goes to forever loop. This commit tries to fix this problem by checking first the boot id list if the found boot id was already in that list. If it is found, then stopping the boot id find loop.
2016-10-12 18:40:28 +02:00
Felipe Sateler 95cbf84564 systemd-user: add pam_unix account module
Otherwise systemd-user@ fails because systemd validates the account

Fixes: #4342
2016-10-12 11:56:36 -03:00
Djalal Harouni 4982dbcc30 test: add test to make sure that ProtectKernelModules=yes disconnect mount propagation 2016-10-12 14:12:36 +02:00
Djalal Harouni e66a2f658b core: make sure to dump ProtectKernelModules= value 2016-10-12 14:12:17 +02:00
Djalal Harouni 4084e8fc89 core: check protect_kernel_modules and private_devices in order to setup NNP 2016-10-12 14:12:07 +02:00
Djalal Harouni c575770b75 core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules=
Lets go further and make /lib/modules/ inaccessible for services that do
not have business with modules, this is a minor improvment but it may
help on setups with custom modules and they are limited... in regard of
kernel auto-load feature.

This change introduce NameSpaceInfo struct which we may embed later
inside ExecContext but for now lets just reduce the argument number to
setup_namespace() and merge ProtectKernelModules feature.
2016-10-12 14:11:16 +02:00
Djalal Harouni 625d8769fa test: add test to make sure that CAP_SYS_RAWIO was removed on PrivateDevices=yes 2016-10-12 13:47:59 +02:00
Djalal Harouni 2cd0a73547 core:sandbox: remove CAP_SYS_RAWIO on PrivateDevices=yes
The rawio system calls were filtered, but CAP_SYS_RAWIO allows to access raw
data through /proc, ioctl and some other exotic system calls...
2016-10-12 13:39:49 +02:00
Djalal Harouni 3ae33295f0 test: add capability tests for ProtectKernelModules=
This just adds capabilities test.
2016-10-12 13:36:27 +02:00
Djalal Harouni 502d704e5e core:sandbox: Add ProtectKernelModules= option
This is useful to turn off explicit module load and unload operations on modular
kernels. This option removes CAP_SYS_MODULE from the capability bounding set for
the unit, and installs a system call filter to block module system calls.

This option will not prevent the kernel from loading modules using the module
auto-load feature which is a system wide operation.
2016-10-12 13:31:21 +02:00
Lennart Poettering 18e51a022c Merge pull request #4351 from keszybz/nspawn-debugging
Enhance nspawn debug logs for mount/unmount operations
2016-10-12 11:21:11 +02:00
Zbigniew Jędrzejewski-Szmek 3ccb886283 Allow block and char classes in DeviceAllow bus properties (#4353)
Allowed paths are unified betwen the configuration file parses and the bus
property checker. The biggest change is that the bus code now allows "block-"
and "char-" classes. In addition, path_startswith("/dev") was used in the bus
code, and startswith("/dev") was used in the config file code. It seems
reasonable to use path_startswith() which allows a slightly broader class of
strings.

Fixes #3935.
2016-10-12 11:12:11 +02:00
Andrew Jeddeloh 9f1008d513 networkd: add dbus interface for lease raw options (#3528)
Add a dbus object to represent dhcp leases and their raw options (i.e.
options 224-254).
2016-10-11 21:28:22 -04:00
0xAX 74e7579c17 core/main: get rid from excess check of ACTION_TEST (#4350)
If `--test` command line option was passed, the systemd set skip_setup
to true during bootup. But after this we check again that arg_action is
test or help and opens pager depends on result.

We should skip setup in a case when `--test` is passed, but it is also
safe to set skip_setup in a case of `--help`. So let's remove first
check and move skip_setup = true to the second check.
2016-10-11 17:30:04 -04:00
Zbigniew Jędrzejewski-Szmek 7ef7147041 missing: add a bunch of mount flags 2016-10-11 17:24:03 -04:00
Evgeny Vereshchagin 8492849ee5 nspawn: let's mount(/tmp) inside the user namespace (#4340)
Fixes:
host# systemd-nspawn -D ... -U -b systemd.unit=multi-user.target
...
$ grep /tmp /proc/self/mountinfo
154 145 0:41 / /tmp rw - tmpfs tmpfs rw,seclabel,uid=1036124160,gid=1036124160

$ umount /tmp
umount: /root/tmp: not mounted

$ systemctl poweroff
...
[FAILED] Failed unmounting Temporary Directory.
2016-10-11 17:18:27 -04:00
Zbigniew Jędrzejewski-Szmek 60e76d4897 nspawn,mount-util: add [u]mount_verbose and use it in nspawn
This makes it easier to debug failed nspawn invocations:

Mounting sysfs on /var/lib/machines/fedora-rawhide/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev (MS_NOSUID|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/dev/shm (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=1777,uid=1450901504,gid=1450901504")...
Mounting tmpfs on /var/lib/machines/fedora-rawhide/run (MS_NOSUID|MS_NODEV|MS_STRICTATIME "mode=755,uid=1450901504,gid=1450901504")...
Bind-mounting /sys/fs/selinux on /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_BIND "")...
Remounting /var/lib/machines/fedora-rawhide/sys/fs/selinux (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting proc on /proc (MS_NOSUID|MS_NOEXEC|MS_NODEV "")...
Bind-mounting /proc/sys on /proc/sys (MS_BIND "")...
Remounting /proc/sys (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Bind-mounting /proc/sysrq-trigger on /proc/sysrq-trigger (MS_BIND "")...
Remounting /proc/sysrq-trigger (MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_BIND|MS_REMOUNT "")...
Mounting tmpfs on /tmp (MS_STRICTATIME "mode=1777,uid=0,gid=0")...
Mounting tmpfs on /sys/fs/cgroup (MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_STRICTATIME "mode=755,uid=0,gid=0")...
Mounting cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr")...
Failed to mount cgroup on /sys/fs/cgroup/systemd (MS_NOSUID|MS_NOEXEC|MS_NODEV "none,name=systemd,xattr"): No such file or directory
2016-10-11 16:50:07 -04:00
Zbigniew Jędrzejewski-Szmek add554f4e1 nspawn: small cleanups in get_controllers()
- check for oom after strdup
- no need to truncate the line since we're only extracting one field anyway
- use STR_IN_SET
2016-10-11 16:46:58 -04:00
Zbigniew Jędrzejewski-Szmek ada5412039 nspawn: simplify arg_us_cgns passing
We would check the condition cg_ns_supported() twice. No functional
change.
2016-10-11 16:46:58 -04:00
Lennart Poettering e0d2adfde6 core: chown() any TTY used for stdin, not just when StandardInput=tty is used (#4347)
If stdin is supplied as an fd for transient units (using the
StandardInputFileDescriptor pseudo-property for transient units), then we
should also fix up the TTY ownership, not just when we opened the TTY
ourselves.

This simply drops the explicit is_terminal_input()-based check. Note that
chown_terminal() internally does a much more appropriate isatty()-based check
anyway, hence we can drop this without replacement.

Fixes: #4260
2016-10-11 14:07:22 -04:00
Thomas H. P. Andersen f68c9dd5c6 resolve: remove unsed counter (#4349)
It was introduced but never used in 45ec7efb.
2016-10-11 13:51:03 -04:00
Zbigniew Jędrzejewski-Szmek 56b4c80b42 Merge pull request #4348 from poettering/docfixes
Various smaller documentation fixes.
2016-10-11 13:49:15 -04:00
Zbigniew Jędrzejewski-Szmek b744e8937c Merge pull request #4067 from poettering/invocation-id
Add an "invocation ID" concept to the service manager
2016-10-11 13:40:50 -04:00
Lennart Poettering 2cdbbc9a34 man: avoid using the term "loaded" for units currently in memory, since we also have a unit state of that name
Fixes: #3971
2016-10-11 17:55:04 +02:00
Lennart Poettering 57c9e04781 pager: tiny beautification 2016-10-11 17:46:59 +02:00
Stefan Schweter 1f1a5e8b40 udevadm: use parse_sec instead of atoi for timeout option (#4331)
log_error method is used instead of fprintf
2016-10-11 09:08:04 +02:00
Zbigniew Jędrzejewski-Szmek ec72b96366 Merge pull request #4337 from poettering/exit-code
Fix for #4275 and more
2016-10-10 21:24:57 -04:00
Thomas H. P. Andersen 01b0669e9a resolved: initialize variable (#4338)
r was not initialized and would be used if "tcp" was the only option
used for the stub. We should initialize it to 0 to indicate that no
error happened in the udp case.
2016-10-10 20:12:40 -04:00
Martin Pitt c637e72b7a Merge pull request #4336 from dandedrick/journal-remote-non-blocking
Journal remote non blocking
2016-10-10 23:13:26 +02:00
Lennart Poettering 052364d41f core: simplify if branches a bit
We do the same thing in two branches, let's merge them. Let's also add an
explanatory comment, while we are at it.
2016-10-10 22:57:02 +02:00
Lennart Poettering f2aed3070d core: make use of IN_SET() in various places in mount.c 2016-10-10 22:57:02 +02:00
Lennart Poettering 1f0958f640 core: when determining whether a process exit status is clean, consider whether it is a command or a daemon
SIGTERM should be considered a clean exit code for daemons (i.e. long-running
processes, as a daemon without SIGTERM handler may be shut down without issues
via SIGTERM still) while it should not be considered a clean exit code for
commands (i.e. short-running processes).

Let's add two different clean checking modes for this, and use the right one at
the appropriate places.

Fixes: #4275
2016-10-10 22:57:01 +02:00
Lennart Poettering 38107f5a4a core: lower exit status "level" at one place
When we print information about PID 1's crashdump subprocess failing. In this
case we *know* that we do not generate LSB exit codes, as it's basically PID 1
itself that exited there.
2016-10-10 22:56:55 +02:00
0xAX f6dd106c73 main: use strdup instead of free_and_strdup to initialize default unit (#4335)
Previously we've used free_and_strdup() to fill arg_default_unit with unit
name, If we didn't pass default unit name through a kernel command line or
command line arguments. But we can use just strdup() instead of
free_and_strdup() for this, because we will start fill arg_default_unit
only if it wasn't set before.
2016-10-10 22:11:36 +02:00
Lennart Poettering 41e2036eb8 exit-status: kill is_clean_exit_lsb(), move logic to sysv-generator
Let's get rid of is_clean_exit_lsb(), let's move the logic for the special
handling of the two LSB exit codes into the sysv-generator by writing out
appropriate SuccessExitStatus= lines if the LSB header exists. This is not only
semantically more correct, bug also fixes a bug as the code in service.c that
chose between is_clean_exit_lsb() and is_clean_exit() based this check on
whether a native unit files was available for the unit. However, that check was
bogus since a long time, since the SysV generator was introduced and native
SysV script support was removed from PID 1, as in that case a unit file always
existed.
2016-10-10 21:48:08 +02:00
Dan Dedrick 800d3f3478 journal-remote: make the child pipe non-blocking
We are going to add this child as a source to our event loop so we don't
want to block when reading data from it as this will prevent us from
processing other events. Specifically this will block the signalfds
which means if we are waiting for data from curl we won't handle SIGTERM
or SIGINT until we happen to get more data.
2016-10-10 15:11:01 -04:00
Lennart Poettering 3b8769bda8 install: let's always refer to the actual setting in errors 2016-10-10 20:11:49 +02:00
Lennart Poettering 56ecbcc048 exit-status: reorder the exit status switch table
Let's make sure it's in the same order as the actual enum defining the exit
statuses.
2016-10-10 20:11:21 +02:00
Lennart Poettering 65e3fd83c9 exit-status: remove ExitStatus typedef
Do not make up our own type for ExitStatus, but use the type used by POSIX for
this, which is "int".  In particular as we never used that type outside of the
definition of exit_status_to_string() where we internally cast the paramter to
(int) every single time we used it.

Hence, let's simplify things, drop the type and use the kernel type directly.
2016-10-10 20:08:41 +02:00
Susant Sahani 53c06862c1 networkd: rename Rename CheckSum → Checksum (#4312) 2016-10-10 19:52:12 +02:00
Lennart Poettering 6dca2fe325 Merge pull request #4332 from keszybz/nspawn-arguments-3
nspawn --private-users parsing, v2
2016-10-10 19:51:51 +02:00
0xAX c76cf844d6 tree-wide: pass return value of make_null_stdio() to warning instead of errno (#4328)
as @poettering suggested in the #4320
2016-10-10 19:51:33 +02:00
Evgeny Vereshchagin a0f72a24e0 Merge pull request #4310 from keszybz/nspawn-autodetect
Autodetect systemd version in containers started by systemd-nspawn
2016-10-10 20:47:25 +03:00
Zbigniew Jędrzejewski-Szmek be7157316c nspawn: better error messages for parsing errors
In particular, the check for arg_uid_range <= 0 is moved to the end, so that
"foobar:0" gives "Failed to parse UID", and not "UID range cannot be 0.".
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek ae209204d8 nspawn,man: fix parsing of numeric args for --private-users, accept any boolean
This is like the previous reverted commit, but any boolean is still accepted,
not just "yes" and "no". Man page is adjusted to match the code.
2016-10-10 11:55:06 -04:00
Zbigniew Jędrzejewski-Szmek 6c2058b35e Revert "nspawn: fix parsing of numeric arguments for --private-users"
This reverts commit bfd292ec35.
2016-10-10 11:17:40 -04:00
Felipe Sateler baed1fedba login: drop fedora-specific PAM config, add note to DISTRO_PORTING (#4314)
It is impossible to ship a fully generic PAM configuration upstream.
Therefore, ship a minimal configuration with the systemd --user requirements,
and add a note to DISTRO_PORTING documenting this.

Fixes #4284
2016-10-10 15:40:05 +02:00
Lennart Poettering 7a9ee77204 Merge pull request #4323 from keszybz/resolved-in-userns
A fix to get resolved to start in userns
2016-10-10 09:37:01 +02:00
0xAX 10c961b9c9 main: initialize default unit little later (#4321)
systemd fills arg_default_unit during startup with default.target
value. But arg_default_unit may be overwritten in parse_argv() or
parse_proc_cmdline_item().

Let's check value of arg_default_unit after calls of parse_argv()
and parse_proc_cmdline_item() and fill it with default.target if
it wasn't filled before. In this way we will not spend unnecessary
time to for filling arg_default_unit with default.target.
2016-10-09 22:57:03 -04:00
0xAX 9fc932bff1 tree-wide: print warning in a failure case of make_null_stdio() (#4320)
The make_null_stdio() may fail. Let's check its result and print
warning message instead of keeping silence.
2016-10-09 22:55:24 -04:00
Zbigniew Jędrzejewski-Szmek 0f4db364c9 resolved: also disable stub listener on EPERM
When running in a user namespace without private networking, resolved would
fail to start. There isn't much difference between EADDRINUSE and EPERM,
so treat them the same, except for the warning message text.
2016-10-09 21:22:23 -04:00
Zbigniew Jędrzejewski-Szmek 424e490b94 resolved: simplify error handling in manager_dns_stub_{udp,tcp}_fd()
Make sure an error is always printed… When systemd-resolved is started in a
user namespace without private network, it would fail on setsockopt, but the
error wouldn't be particularly informative:
"Failed to start manager: permission denied."
2016-10-09 21:22:23 -04:00
Evgeny Vereshchagin 763368943a Merge pull request #4319 from keszybz/nspawn-arguments
Nspawn arguments parsing and man page update
2016-10-10 03:26:17 +03:00
Lans Zhang 59991e3fe3 sd-boot: trigger to record further logs to tcg 2.0 final event log area (#4302)
According to TCG EFI Protocol Specification for TPM 2.0 family,
all events generated after the invocation of EFI_TCG2_GET_EVENT_LOG
shall be stored in an instance of an EFI_CONFIGURATION_TABLE aka
EFI TCG 2.0 final events table. Hence, it is necessary to trigger the
internal switch through calling get_event_log() in order to allow
to retrieve the logs from OS runtime.

msekletar:
> I've looked at EDK2 and indeed log entry is added to FinalEventsTable only after 
> EFI_TCG2_PROTOCOL.GetEventLog was called[1][2]. Also, same patch was currently
> merged to shim by Peter Jones [3].

[1] https://github.com/tianocore/edk2/blob/master/SecurityPkg/Tcg/Tcg2Dxe/Tcg2Dxe.c#L698
[2] https://github.com/tianocore/edk2/blob/master/SecurityPkg/Tcg/Tcg2Dxe/Tcg2Dxe.c#L824
[3] rhinstaller/shim#64
2016-10-09 18:59:54 -04:00
Zbigniew Jędrzejewski-Szmek bfd292ec35 nspawn: fix parsing of numeric arguments for --private-users
The documentation says lists "yes", "no", "pick", and numeric arguments.
But parse_boolean was attempted first, so various numeric arguments were
misinterpreted.

In particular, this fixes --private-users=0 to mean the same thing as
--private-users=0:65536.

While at it, use strndupa to avoid some error handling.
Also give a better error for an empty UID range. I think it's likely that
people will use --private-users=0:0 thinking that the argument means UID:GID.
2016-10-09 11:52:35 -04:00
Zbigniew Jędrzejewski-Szmek 27eb8e9028 nspawn: reindent table 2016-10-09 11:51:18 -04:00
Zbigniew Jędrzejewski-Szmek a8725a06e6 nspawn: also fall back to legacy cgroup hierarchy for old containers
Current systemd version detection routine cannot detect systemd 230,
only systmed >= 231. This means that we'll still use the legacy hierarchy
in some cases where we wouldn't have too. If somebody figures out a nice
way to detect systemd 230 this can be later improved.
2016-10-08 19:03:53 -04:00
0xAX 084f580557 machinectl: enable pager on help (#4313)
as its output is fairly long.
2016-10-08 17:49:33 -04:00
Zbigniew Jędrzejewski-Szmek 0fd9563fde nspawn: use mixed cgroup hierarchy only when container has new systemd
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy.
So make an educated guess, and use the mixed hierarchy in that case.

Tested by running the host with mixed hierarchy (i.e. simply using a recent
kernel with systemd from git), and booting first a container with older systemd,
and then one with a newer systemd.

Fixes #4008.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 27e29a1e43 nspawn: fix spurious reboot if container process returns 133 2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek b006762524 nspawn: move the main loop body out to a new function
The new function has 416 lines by itself!

"return log_error_errno" is used to nicely reduce the volume of error
handling code.

A few minor issues are fixed on the way:
- positive value was used as error value (EIO), causing systemd-nspawn
  to return success, even though it shouldn't.
- In two places random values were used as error status, when the
  actual value was in an unusual place (etc_password_lock, notify_socket).

Those are the only functional changes.

There is another potential issue, which is marked with a comment, and left
unresolved: the container can also return 133 by itself, causing a spurious
reboot.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 98afd6af3a nspawn: check env var first, detect second
If we are going to use the env var to override the detection result
anyway, there is not point in doing the detection, especially that
it can fail.
2016-10-08 14:48:41 -04:00
Zbigniew Jędrzejewski-Szmek 5a46d55fc8 path-util: add a function to peek into a container and guess systemd version
This is a bit crude and only works for new systemd versions which
have libsystemd-shared.
2016-10-08 14:48:41 -04:00
Stefan Schweter a60f4d0b44 systemd-resolve: use sha256 for local-part of openpgp key (#4193) 2016-10-08 13:59:34 +02:00
Susant Sahani e63be0847c networkd: address add support to configure flags (#4201)
This patch enables to configure

IFA_F_HOMEADDRESS
IFA_F_NODAD
IFA_F_MANAGETEMPADDR
IFA_F_NOPREFIXROUTE
IFA_F_MCAUTOJOIN
2016-10-08 13:05:41 +02:00
Lennart Poettering 3157b2d9d2 Merge pull request #4061 from dm0-/coreos-1545
resolved: add an option to disable the stub resolver
2016-10-07 23:38:03 +02:00
David Michael 1ae4329575 resolved: add an option to control the DNS stub listener 2016-10-07 12:14:38 -07:00
Lennart Poettering 4b58153dd2 core: add "invocation ID" concept to service manager
This adds a new invocation ID concept to the service manager. The invocation ID
identifies each runtime cycle of a unit uniquely. A new randomized 128bit ID is
generated each time a unit moves from and inactive to an activating or active
state.

The primary usecase for this concept is to connect the runtime data PID 1
maintains about a service with the offline data the journal stores about it.
Previously we'd use the unit name plus start/stop times, which however is
highly racy since the journal will generally process log data after the service
already ended.

The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel,
except that it applies to an individual unit instead of the whole system.

The invocation ID is passed to the activated processes as environment variable.
It is additionally stored as extended attribute on the cgroup of the unit. The
latter is used by journald to automatically retrieve it for each log logged
message and attach it to the log entry. The environment variable is very easily
accessible, even for unprivileged services. OTOH the extended attribute is only
accessible to privileged processes (this is because cgroupfs only supports the
"trusted." xattr namespace, not "user."). The environment variable may be
altered by services, the extended attribute may not be, hence is the better
choice for the journal.

Note that reading the invocation ID off the extended attribute from journald is
racy, similar to the way reading the unit name for a logging process is.

This patch adds APIs to read the invocation ID to sd-id128:
sd_id128_get_invocation() may be used in a similar fashion to
sd_id128_get_boot().

PID1's own logging is updated to always include the invocation ID when it logs
information about a unit.

A new bus call GetUnitByInvocationID() is added that allows retrieving a bus
path to a unit by its invocation ID. The bus path is built using the invocation
ID, thus providing a path for referring to a unit that is valid only for the
current runtime cycleof it.

Outlook for the future: should the kernel eventually allow passing of cgroup
information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we
can alter the invocation ID to be generated as hash from that rather than
entirely randomly. This way we can derive the invocation race-freely from the
messages.
2016-10-07 20:14:38 +02:00
Lennart Poettering e5d855d364 util: use SPECIAL_ROOT_SLICE macro where appropriate 2016-10-07 20:14:38 +02:00
Lennart Poettering 0474ef7b3e log: minor fixes
Most important is a fix to negate the error number if necessary, before we
first access it.
2016-10-07 20:14:38 +02:00
Lennart Poettering 398a50cdd1 journal: fix format string used for usec_t 2016-10-07 20:14:38 +02:00
Lennart Poettering d473176a74 journal: complete slice info in journal metadata
We are already attaching the system slice information to log messages, now add
theuser slice info too, as well as the object slice info.
2016-10-07 20:14:38 +02:00
Lennart Poettering 766c94ad6b bus-util: generalize helper for ID128 prpoerties
This way, we can make use of this in other code, too.
2016-10-07 20:14:38 +02:00
Lennart Poettering 4a39c77419 strv: fix STRV_FOREACH_BACKWARDS() to be a single statement only
Let's make sure people invoking STRV_FOREACH_BACKWARDS() as a single statement
of an if statement don't fall into a trap, and find the tail for the list via
strv_length().
2016-10-07 20:14:38 +02:00
Lennart Poettering f767d3de65 Merge pull request #4304 from poettering/notify-nul-check
3 minor improvements for notification message handling
2016-10-07 18:30:53 +02:00
Zbigniew Jędrzejewski-Szmek 8f4d640135 core: only warn on short reads on signal fd 2016-10-07 10:05:04 -04:00
Susant Sahani 1644102735 networkd: remote checksum offload for vxlan (#4110)
This patch adds support to remote checksum checksum offload to VXLAN.
This patch adds RemoteCheckSumTx and RemoteCheckSumRx vxlan configuration
to enable remote checksum offload for transmit and receive on the VXLAN tunnel.
2016-10-07 09:46:18 -04:00
rwmjones 171b533800 architecture: Add support for the RISC-V architecture. (#4305)
RISC-V is an open source ISA in development since 2010 at UCB.
For more information, see https://riscv.org/

I am adding RISC-V support to Fedora:
https://fedoraproject.org/wiki/Architectures/RISC-V

There are three major variants of the architecture (32-, 64- and
128-bit).  The 128-bit variant is a paper exercise, but the other
two really exist in silicon.  RISC-V is always little endian.

On Linux, the default kernel uname(2) can return "riscv" for all
variants.  However a patch was added recently which makes the kernel
return one of "riscv32" or "riscv64" (or in future "riscv128").  So
systemd should be prepared to handle any of "riscv", "riscv32" or
"riscv64" (in future, "riscv128" but that is not included in the
current patch).  If the kernel returns "riscv" then you need to use
the pointer size in order to know the real variant.

The Fedora/RISC-V kernel only ever returns "riscv64" since we're
only doing Fedora for 64 bit at the moment, and we've patched the
kernel so it doesn't return "riscv".

As well as the major bitsize variants, there are also architecture
extensions.  However I'm trying to ensure that uname(2) does *not*
return any other information about those in utsname.machine, so that
we don't end up with "riscv64abcde" nonsense.  Instead those
extensions will be exposed in /proc/cpuinfo similar to how flags
work in x86.
2016-10-07 14:56:27 +02:00
Lennart Poettering 875ca88da5 manager: tighten incoming notification message checks
Let's not accept datagrams with embedded NUL bytes. Previously we'd simply
ignore everything after the first NUL byte. But given that sending us that is
pretty ugly let's instead complain and refuse.

With this change we'll only accept messages that have exactly zero or one NUL
bytes at the very end of the datagram.
2016-10-07 12:14:33 +02:00
Lennart Poettering 045a3d5989 manager: be stricter with incomining notifications, warn properly about too large ones
Let's make the kernel let us know the full, original datagram size of the
incoming message. If it's larger than the buffer space provided by us, drop the
whole message with a warning.

Before this change the kernel would truncate the message for us to the buffer
space provided, and we'd not complain about this, and simply process the
incomplete message as far as it made sense.
2016-10-07 12:12:10 +02:00
Lennart Poettering c55ae51e77 manager: don't ever busy loop when we get a notification message we can't process
If the kernel doesn't permit us to dequeue/process an incoming notification
datagram message it's still better to stop processing the notification messages
altogether than to enter a busy loop where we keep getting notified but can't
do a thing about it.

With this change, manager_dispatch_notify_fd() behaviour is changed like this:

- if an error indicating a spurious wake-up is seen on recvmsg(), ignore it
  (EAGAIN/EINTR)

- if any other error is seen on recvmsg() propagate it, thus disabling
  processing of further wakeups

- if any error is seen on later code in the function, warn about it but do not
  propagate it, as in this cas we're not going to busy loop as the offending
  message is already dequeued.
2016-10-07 12:08:51 +02:00
Lukáš Nykrýn 24dd31c19e core: add possibility to set action for ctrl-alt-del burst (#4105)
For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough.

Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
2016-10-06 21:08:21 -04:00
Lennart Poettering 97f0e76f18 user-util: rework maybe_setgroups() a bit
Let's drop the caching of the setgroups /proc field for now. While there's a
strict regime in place when it changes states, let's better not cache it since
we cannot really be sure we follow that regime correctly.

More importantly however, this is not in performance sensitive code, and
there's no indication the cache is really beneficial, hence let's drop the
caching and make things a bit simpler.

Also, while we are at it, rework the error handling a bit, and always return
negative errno-style error codes, following our usual coding style. This has
the benefit that we can sensible hanld read_one_line_file() errors, without
having to updat errno explicitly.
2016-10-06 19:04:10 +02:00
Lennart Poettering 7429b2eb83 tree-wide: drop some misleading compiler warnings
gcc at some optimization levels thinks thes variables were used without
initialization. it's wrong, but let's make the message go anyway.
2016-10-06 19:04:10 +02:00
Lennart Poettering 2d6fce8d7c core: leave PAM stub process around with GIDs updated
In the process execution code of PID 1, before
096424d123 the GID settings where changed before
invoking PAM, and the UID settings after. After the change both changes are
made after the PAM session hooks are run. When invoking PAM we fork once, and
leave a stub process around which will invoke the PAM session end hooks when
the session goes away. This code previously was dropping the remaining privs
(which were precisely the UID). Fix this code to do this correctly again, by
really dropping them else (i.e. the GID as well).

While we are at it, also fix error logging of this code.

Fixes: #4238
2016-10-06 19:04:10 +02:00
Lennart Poettering 729c6467df sd-bus: add DNS errors to the errno translation table
We generate these, hence we should also add errno translations for them.
2016-10-06 19:04:10 +02:00
Lennart Poettering 6f21e066f6 resolved: properly handle BADCOOKIE DNS error
Add this new error code (documented in RFC7873) to our list of known errors.
2016-10-06 19:04:09 +02:00
Lennart Poettering 19526c6679 sd-bus: add a few missing entries to the error translation tables
These were forgotten, let's add some useful mappings for all errors we define.
2016-10-06 19:04:09 +02:00
Lennart Poettering 429b435026 sd-device/networkd: unify code to get a socket for issuing netdev ioctls on
As suggested here:

https://github.com/systemd/systemd/pull/4296#issuecomment-251911349

Let's try AF_INET first as socket, but let's fall back to AF_NETLINK, so that
we can use a protocol-independent socket here if possible. This has the benefit
that our code will still work even if AF_INET/AF_INET6 is made unavailable (for
exmple via seccomp), at least on current kernels.
2016-10-06 19:04:01 +02:00
Lennart Poettering e057995bb1 Merge pull request #4280 from giuseppe/unprivileged-user
[RFC] run systemd in an unprivileged container
2016-10-06 15:44:27 +02:00
Lennart Poettering 8ffce876de Merge pull request #4199 from dvdhrm/hwdb-order
hwdb: return conflicts in a well-defined order
2016-10-06 11:58:13 +02:00
Giuseppe Scrivano 36d854780c core: do not fail in a container if we can't use setgroups
It might be blocked through /proc/PID/setgroups
2016-10-06 11:49:00 +02:00
Giuseppe Scrivano f006b30bd5 audit: disable if cannot create NETLINK_AUDIT socket 2016-10-06 11:49:00 +02:00
Susant Sahani 197e280932 networkd: fix coding style (#4294) 2016-10-06 11:45:07 +02:00
Yuki Inoguchi d2665e0866 journald, ratelimit: fix inaccurate message suppression in journal_rate_limit_test() (#4291)
Currently, the ratelimit does not handle the number of suppressed messages accurately.
Even though the number of messages reaches the limit, it still allows to add one extra messages to journal.

This patch fixes the problem.
2016-10-06 11:44:51 +02:00
Giuseppe Scrivano 77531863ca Fix typo 2016-10-05 18:36:48 +02:00
Tobias Jungel f6bb7ac5c6 networkd: use BridgeFDB as well on bridge ports (#4253)
[BridgeFDB] did not apply to bridge ports so far. This patch adds the proper
handling. In case of a bridge interface the correct flag NTF_MASTER is now set
in the netlink call. FDB MAC addresses are now applied in
link_enter_set_addresses to make sure the link is setup.
2016-10-05 17:06:40 +02:00
hbrueckner 6abfd30372 seccomp: add support for the s390 architecture (#4287)
Add seccomp support for the s390 architecture (31-bit and 64-bit)
to systemd.

This requires libseccomp >= 2.3.1.
2016-10-05 13:58:55 +02:00
Djalal Harouni 41eb436265 nspawn: add log message to let users know that nspawn needs an empty /dev directory (#4226)
Fixes https://github.com/systemd/systemd/issues/3695

At the same time it adds a protection against userns chown of inodes of
a shared mount point.
2016-10-05 06:57:02 +02:00
Stefan Schweter 629ff674ac tree-wide: remove consecutive duplicate words in comments 2016-10-04 17:06:25 +02:00
Michael Olbrich 5076f4219e list: LIST_INSERT_BEFORE: update head if necessary (#4261)
If the new item is inserted before the first item in the list, then the
head must be updated as well.
Add a test to the list unit test to check for this.
2016-10-04 16:15:37 +02:00
Michael Olbrich c080fbce9c automount: make sure the expire event is restarted after a daemon-reload (#4265)
If the corresponding mount unit is deserialized after the automount unit
then the expire event is set up in automount_trigger_notify(). However, if
the mount unit is deserialized first then the automount unit is still in
state AUTOMOUNT_DEAD and automount_trigger_notify() aborts without setting
up the expire event.
Explicitly call automount_start_expire() during coldplug to make sure that
the expire event is set up as necessary.

Fixes #4249.
2016-10-04 16:13:27 +02:00
Alban Crequy 19caffac75 nspawn: set shared propagation mode for the container 2016-10-03 14:19:27 +02:00
Zbigniew Jędrzejewski-Szmek a63ee40751 core: do not try to create /run/systemd/transient in test mode
This prevented systemd-analyze from unprivileged operation on older systemd
installations, which should be possible.
Also, we shouldn't touch the file system in test mode even if we can.
2016-10-01 22:53:17 +02:00
Zbigniew Jędrzejewski-Szmek d941ea22e3 analyze-verify: honour $SYSTEMD_UNIT_PATH, allow system paths to be ignored
SYSTEMD_UNIT_PATH=foobar: systemd-analyze verify barbar/unit.service
will load units from barbar/, foobar/, /etc/systemd/system/, etc.

SYSTEMD_UNIT_PATH= systemd-analyze verify barbar/unit.service
will load units only from barbar/, which is useful e.g. when testing
systemd's own units on a system with an older version of systemd installed.
2016-10-01 22:53:17 +02:00
Zbigniew Jędrzejewski-Szmek dd5e7000cb core: complain if Before= dep on .device is declared
[Unit]
Before=foobar.device

[Service]
ExecStart=/bin/true
Type=oneshot

$ systemd-analyze verify before-device.service
before-device.service: Dependency Before=foobar.device ignored (.device units cannot be delayed)
2016-10-01 22:53:17 +02:00
Martin Pitt 93a0884126 systemctl: Add --wait option to wait until started units terminate again
Fixes #3830
2016-10-01 17:58:59 +02:00
Martin Pitt d7247512a9 nss-resolve: return NOTFOUND instead of UNAVAIL on resolution errors
It needs to be possible to tell apart "the nss-resolve module does not exist"
(which can happen when running foreign-architecture programs) from "the queried
DNS name failed DNSSEC validation" or other errors. So return NOTFOUND for these
cases too, and only keep UNAVAIL for the cases where we cannot handle the given
address family.

This makes it possible to configure a fallback to "dns" without breaking
DNSSEC, with "resolve [!UNAVAIL=return] dns". Add this to the manpage.

This does not change behaviour if resolved is not running, as that already
falls back to the "dns" glibc module.

Fixes #4157
2016-10-01 16:59:06 +02:00
Martin Pitt 46c7a7ac87 nss-resolve: simplify error handling
Handle general errors from the resolved call in _nss_resolve_gethostbyaddr2_r()
the same say as in the other variants: Just "goto fail" as that does exactly
the same.
2016-10-01 16:43:29 +02:00
Zbigniew Jędrzejewski-Szmek 5fd2c135f1 core: update warning message
"closing all" might suggest that _all_ fds received with the notification message
will be closed. Reword the message to clarify that only the "unused" ones will be
closed.
2016-10-01 11:01:31 +02:00
Zbigniew Jędrzejewski-Szmek c4bee3c40e core: get rid of unneeded state variable
No functional change.
2016-10-01 11:01:31 +02:00
Elias Probst 82936769a8 networkd: fix "parametres" typo (#4244) 2016-09-30 13:25:25 +02:00
Martin Pitt 6740ec4a65 Merge pull request #4225 from keszybz/coredump
coredump: remove Storage=both support, various fixes for sd-coredump and coredumpctl
2016-09-30 11:16:51 +02:00
Martin Pitt b9fe94cad9 resolved: don't query domain-limited DNS servers for other domains (#3621)
DNS servers which have route-only domains should only be used for
the specified domains. Routing queries about other domains there is a privacy
violation, prone to fail (as that DNS server was not meant to be used for other
domains), and puts unnecessary load onto that server.

Introduce a new helper function dns_server_limited_domains() that checks if the
DNS server should only be used for some selected domains, i. e. has some
route-only domains without "~.". Use that when determining whether to query it
in the scope, and when writing resolv.conf.

Extend the test_route_only_dns() case to ensure that the DNS server limited to
~company does not appear in resolv.conf. Add test_route_only_dns_all_domains()
to ensure that a server that also has ~. does appear in resolv.conf as global
name server. These reproduce #3420.

Add a new test_resolved_domain_restricted_dns() test case that verifies that
domain-limited DNS servers are only being used for those domains. This
reproduces #3421.

Clarify what a "routing domain" is in the manpage.

Fixes #3420
Fixes #3421
2016-09-30 09:30:08 +02:00
Zbigniew Jędrzejewski-Szmek a86b76753d pid1: more informative error message for ignored notifications
It's probably easier to diagnose a bad notification message if the
contents are printed. But still, do anything only if debugging is on.
2016-09-29 22:57:57 +02:00
Zbigniew Jędrzejewski-Szmek 8523bf7dd5 pid1: process zero-length notification messages again
This undoes 531ac2b234. I acked that patch without looking at the code
carefully enough. There are two problems:
- we want to process the fds anyway
- in principle empty notification messages are valid, and we should
  process them as usual, including logging using log_unit_debug().
2016-09-29 22:57:57 +02:00
Franck Bui 9987750e7a pid1: don't return any error in manager_dispatch_notify_fd() (#4240)
If manager_dispatch_notify_fd() fails and returns an error then the handling of
service notifications will be disabled entirely leading to a compromised system.

For example pid1 won't be able to receive the WATCHDOG messages anymore and
will kill all services supposed to send such messages.
2016-09-29 19:44:34 +02:00
Jorge Niedbalski 531ac2b234 If the notification message length is 0, ignore the message (#4237)
Fixes #4234.

Signed-off-by: Jorge Niedbalski <jnr@metaklass.org>
2016-09-29 05:26:16 -04:00
Zbigniew Jędrzejewski-Szmek 73a99163a7 coredump,catalog: give better notice when a core file is truncated
coredump had code to check if copy_bytes() hit the max_bytes limit,
and refuse further processing in that case.
But in 84ee096044, the return convention for copy_bytes() was changed
from -EFBIG to 1 for the case when the limit is hit, so the condition
check in coredump couldn't ever trigger.
But it seems that *do* want to process such truncated cores [1].
So change the code to detect truncation properly, but instead of
returning an error, give a nice log entry.

[1] https://github.com/systemd/systemd/issues/3883#issuecomment-239106337

Should fix (or at least alleviate) #3883.
2016-09-28 23:50:29 +02:00
Zbigniew Jędrzejewski-Szmek 6e9ef6038f coredump: log if the core is too large to store or generate backtrace
Another fix for #4161.
2016-09-28 23:49:01 +02:00
Zbigniew Jędrzejewski-Szmek bb7c5bad4a coredumpctl: delay the "on tty" refusal until as late as possible
For the user, if the core file is missing or inaccessible, it is
more interesting that the fact that they forgot to pipe to a file.
So delay the failure from the check until after we have verified
that the file or the COREDUMP field are present.

Partially fixes #4161.

Also, error reporting on failure was duplicated. save_core() now
always prints an error message (because it knows the paths involved,
so can the most useful message), and the callers don't have to.
2016-09-28 23:49:01 +02:00
Zbigniew Jędrzejewski-Szmek 062b99e8be coredumpctl: tighten print_field() code
Propagate errors properly, so that if we hit oom or an error in the
journal, the whole command will fail. This is important when using
the output in scripts.

Support the output of multiple values for the same field with -F.
The journal supports that, and our official commands should too, as
far as it makes sense. -F can be used to print user-defined fields
(e.g. somebody could use a TAG field with multiple occurences), so
we should support that too. That seems better than silently printing
the last value found as was done before.

We would iterate trying to match the same field with all possible
field names. Once we find something, cut the loop short, since we
know that nothing else can match.
2016-09-28 23:49:01 +02:00
Zbigniew Jędrzejewski-Szmek 04de587942 coredumpctl: rework presence reporting
The column for "present" was easy to miss, especially if somebody had no
coredumps present at all, in which case the column of spaces of width one
wasn't visually distinguished from the neighbouring columns. Replace this
with an explicit text, one of: "missing", "journal", "present", "error".

$ coredumpctl
TIME                            PID   UID   GID SIG COREFILE EXE
Mon 2016-09-26 22:46:31 CEST   8623     0     0  11 missing  /usr/bin/bash
Mon 2016-09-26 22:46:35 CEST   8639  1001  1001  11 missing  /usr/bin/bash
Tue 2016-09-27 01:10:46 CEST  16110  1001  1001  11 journal  /usr/bin/bash
Tue 2016-09-27 01:13:20 CEST  16290  1001  1001  11 journal  /usr/bin/bash
Tue 2016-09-27 01:33:48 CEST  17867  1001  1001  11 present  /usr/bin/bash
Tue 2016-09-27 01:37:55 CEST  18549     0     0  11 error    /usr/bin/bash

Also, use access(…, R_OK), so that we can report a present but inaccessible
file different than a missing one.
2016-09-28 23:49:01 +02:00
Zbigniew Jędrzejewski-Szmek 47f5064207 coredumpctl: report corefile presence properly
In 'list', show present also for coredumps stored in the journal.

In 'status', replace "File" with "Storage" line that is always present.
Possible values:
   Storage: none
   Storage: journal
   Storage: /path/to/file (inacessible)
   Storage: /path/to/file

Previously the File field be only present if the file was accessible, so users
had to manually extract the file name precisely in the cases where it was
needed, i.e. when coredumpctl couldn't access the file. It's much more friendly
to always show something. This output is designed for human consumption, so
it's better to be a bit verbose.

The call to sd_j_set_data_threshold is moved, so that status is always printed
with the default of 64k, list uses 4k, and coredump retrieval is done with the
limit unset. This should make checking for the presence of the COREDUMP field
not too costly.
2016-09-28 23:49:01 +02:00
Zbigniew Jędrzejewski-Szmek 554ed50f90 coredumpctl: report user unit properly 2016-09-28 23:49:01 +02:00
Zbigniew Jędrzejewski-Szmek cfeead6c77 coredumpctl: fix spurious "more than one entry matches" warning
sd_journal_previous() returns 0 if it didn't do any move, so the
warning was stupidly always printed.
2016-09-28 23:49:01 +02:00
Zbigniew Jędrzejewski-Szmek 954d3a51af coredumpctl: fix handling of files written to fd
Added in 9fe13294a9 (by me :[```), and later obfuscated in d0c8806d4a, if an
uncompressed external file or an internally stored coredump was supposed to be
written to a file descriptor, nothing would be written.
2016-09-28 23:49:01 +02:00
Zbigniew Jędrzejewski-Szmek fc6cec8613 coredump: remove Storage=both option
Back when external storage was initially added in 34c10968cb, this mode of
storage was added. This could have made some sense back when XZ compression was
used, and an uncompressed core on disk could be used as short-lived cache file
which does require costly decompression. But now fast LZ4 compression is used
(by default) both internally and externally, so we have duplicated storage,
using the same compression and same default maximum core size in both cases,
but with different expiration lifetimes. Even the uncompressed-external,
compressed-internal mode is not very useful: for small files, decompression
with LZ4 is fast enough not to matter, and for large files, decompression is
still relatively fast, but the disk-usage penalty is very big.

An additional problem with the two modes of storage is that it complicates
the code and makes it much harder to return a useful error message to the user
if we cannot find the core file, since if we cannot find the file we have to
check the internal storage first.

This patch drops "both" storage mode. Effectively this means that if somebody
configured coredump this way, they will get a warning about an unsupported
value for Storage, and the default of "external" will be used.
I'm pretty sure that this mode is very rarely used anyway.
2016-09-28 23:49:01 +02:00
Vito Caputo 95cbb83c20 journal: add stdout_stream_scan() comment (#4102)
When s->length is zero this function doesn't do anything, note that in a
comment.
2016-09-28 07:35:48 +02:00
Evgeny Vereshchagin cc238590e4 Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
2016-09-28 04:50:30 +03:00
Martin Pitt b8fafaf4a1 Merge pull request #4220 from keszybz/show-and-formatting-fixes
Show and formatting fixes
2016-09-27 16:25:27 +02:00
Susant Sahani 629abfc23f basic: fix for IPv6 status (#4224)
Even if
```
   cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
```

is disabled

cat /proc/net/sockstat6

```
TCP6: inuse 2
UDP6: inuse 1
UDPLITE6: inuse 0
RAW6: inuse 0
FRAG6: inuse 0 memory 0
 ```

Looking for /proc/net/if_inet6 is the right choice.
2016-09-27 15:55:13 +02:00
Djalal Harouni cdfbd1fb26 test: make sure that {readonly|inaccessible|readwrite}paths disconnect mount propagation
Better safe.
2016-09-27 09:24:46 +02:00
Djalal Harouni f78b36f016 test: add tests for simple ReadOnlyPaths= case 2016-09-27 09:24:43 +02:00
Zbigniew Jędrzejewski-Szmek 9aa8fa701b test-bus-creds: are more debugging info
This test sometimes fails in semaphore, but not when run interactively,
so it's hard to debug.
2016-09-26 22:22:28 +02:00
Keith Busch b4c6f71b82 udev/path_id: introduce support for NVMe devices (#4169)
This appends the nvme name and namespace identifier attribute the the
PCI path for by-path links. Symlinks like the following are now present:

lrwxrwxrwx. 1 root root 13 Sep 16 12:12 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Sep 16 12:12 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1

Cc: Michal Sekletar <sekletar.m@gmail.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
2016-09-26 21:01:07 +02:00
Paweł Szewczyk 00bb64ecfa core: Fix USB functionfs activation and clarify its documentation (#4188)
There was no certainty about how the path in service file should look
like for usb functionfs activation. Because of this it was treated
differently in different places, which made this feature unusable.

This patch fixes the path to be the *mount directory* of functionfs, not
ep0 file path and clarifies in the documentation that ListenUSBFunction should be
the location of functionfs mount point, not ep0 file itself.
2016-09-26 18:45:47 +02:00
Zbigniew Jędrzejewski-Szmek bc3bb330b8 machinectl: prefer user@ to --uid=user for shell (#4006)
It seems to me that the explicit positional argument should have higher
priority than "an option".
2016-09-26 11:45:31 -04:00
HATAYAMA Daisuke eeb084806b journald,ratelimit: fix wrong calculation of burst_modulate() (#4218)
This patch fixes wrong calculation of burst_modulate(), which now calculates
the values smaller than really expected ones if available disk space is
strictly more than 1MB.

In particular, if available disk space is strictly more than 1MB and strictly
less than 16MB, the resulted value becomes smaller than its original one.

>>> (math.log2(1*1024**2)-16) / 4
1.0
>>> (math.log2(16*1024**2)-16) / 4
2.0
>>> (math.log2(256*1024**2)-16) / 4
3.0
→ This matches the comment in the function.
2016-09-26 11:36:20 -04:00
Matej Habrnal a5ca3649d3 coredump: initialize coredump_size in submit_coredump() (#4219)
If ulimit is smaller than page_size(), function save_external_coredump()
returns -EBADSLT and this causes skipping whole core dumping part in
submit_coredump(). Initializing coredump_size to UINT64_MAX prevents
evaluating a condition with uninitialized varialbe which leads to
calling allocate_journal_field() with coredump_fd = -1 which causes
aborting.

Signed-off-by: Matej Habrnal <mhabrnal@redhat.com>
2016-09-26 11:28:58 -04:00
Torstein Husebø d23a0044a3 treewide: fix typos (#4217) 2016-09-26 11:32:47 +02:00
Djalal Harouni 615a1f4b26 test: add CAP_MKNOD tests for PrivateDevices= 2016-09-25 13:04:30 +02:00
Djalal Harouni 8f81a5f61b core: Use @raw-io syscall group to filter I/O syscalls when PrivateDevices= is set
Instead of having a local syscall list, use the @raw-io group which
contains the same set of syscalls to filter.
2016-09-25 12:52:27 +02:00
Djalal Harouni b6c432ca7e core:namespace: simplify ProtectHome= implementation
As with previous patch simplify ProtectHome and don't care about
duplicates, they will be sorted by most restrictive mode and cleaned.
2016-09-25 12:41:16 +02:00
Djalal Harouni f471b2afa1 core: simplify ProtectSystem= implementation
ProtectSystem= with all its different modes and other options like
PrivateDevices= + ProtectKernelTunables= + ProtectHome= are orthogonal,
however currently it's a bit hard to parse that from the implementation
view. Simplify it by giving each mode its own table with all paths and
references to other Protect options.

With this change some entries are duplicated, but we do not care since
duplicate mounts are first sorted by the most restrictive mode then
cleaned.
2016-09-25 12:21:25 +02:00
Djalal Harouni 49accde7bd core:sandbox: add more /proc/* entries to ProtectKernelTunables=
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface,
filesystems configuration and IRQ tuning readonly.

Most of these interfaces now days should be in /sys but they are still
available through /proc, so just protect them. This patch does not touch
/proc/net/...
2016-09-25 11:30:11 +02:00
Djalal Harouni 2652c6c103 core:namespace: simplify mount calculation
Move out mount calculation on its own function. Actually the logic is
smart enough to later drop nop and duplicates mounts, this change
improves code readability.
---
 src/core/namespace.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)
2016-09-25 11:25:00 +02:00
Djalal Harouni 11a30cec2a core:namespace: put paths protected by ProtectKernelTunables= in
Instead of having all these paths everywhere, put the ones that are
protected by ProtectKernelTunables= into their own table. This way it
is easy to add paths and track which ones are protected.
2016-09-25 11:16:44 +02:00
Djalal Harouni 9c94d52e09 core:namespace: minor improvements to append_mounts() 2016-09-25 11:03:21 +02:00
Lennart Poettering cefc33aee2 execute: move SMACK setup code into its own function
While we are at it, move PAM code #ifdeffery into setup_pam() to simplify the
main execution logic a bit.
2016-09-25 10:52:57 +02:00
Lennart Poettering cd2902c954 namespace: drop all mounts outside of the new root directory
There's no point in mounting these, if they are outside of the root directory
we'll move to.
2016-09-25 10:52:57 +02:00
Lennart Poettering 54500613a4 main: minor simplification 2016-09-25 10:52:57 +02:00
Lennart Poettering ba128bb809 execute: filter low-level I/O syscalls if PrivateDevices= is set
If device access is restricted via PrivateDevices=, let's also block the
various low-level I/O syscalls at the same time, so that we know that the
minimal set of devices in our virtualized /dev are really everything the unit
can access.
2016-09-25 10:52:57 +02:00
Lennart Poettering 8f1ad200f0 namespace: don't make the root directory of a namespace a mount if it already is one
Let's not stack mounts needlessly.
2016-09-25 10:42:18 +02:00
Lennart Poettering d944dc9553 namespace: chase symlinks for mounts to set up in userspace
This adds logic to chase symlinks for all mount points that shall be created in
a namespace environment in userspace, instead of leaving this to the kernel.
This has the advantage that we can correctly handle absolute symlinks that
shall be taken relative to a specific root directory. Moreover, we can properly
handle mounts created on symlinked files or directories as we can merge their
mounts as necessary.

(This also drops the "done" flag in the namespace logic, which was never
actually working, but was supposed to permit a partial rollback of the
namespace logic, which however is only mildly useful as it wasn't clear in
which case it would or would not be able to roll back.)

Fixes: #3867
2016-09-25 10:42:18 +02:00
Lennart Poettering 1e4e94c881 namespace: invoke unshare() only after checking all parameters
Let's create the new namespace only after we validated and processed all
parameters, right before we start with actually mounting things.

This way, the window where we can roll back is larger (not that it matters
IRL...)
2016-09-25 10:42:18 +02:00
Lennart Poettering 096424d123 execute: drop group priviliges only after setting up namespace
If PrivateDevices=yes is set, the namespace code creates device nodes in /dev
that should be owned by the host's root, hence let's make sure we set up the
namespace before dropping group privileges.
2016-09-25 10:42:18 +02:00
Lennart Poettering 920a7899de nspawn: let's mount /proc/sysrq-trigger read-only by default
LXC does this, and we should probably too. Better safe than sorry.
2016-09-25 10:42:18 +02:00
Lennart Poettering 63bb64a056 core: imply ProtectHome=read-only and ProtectSystem=strict if DynamicUser=1
Let's make sure that services that use DynamicUser=1 cannot leave files in the
file system should the system accidentally have a world-writable directory
somewhere.

This effectively ensures that directories need to be whitelisted rather than
blacklisted for access when DynamicUser=1 is set.
2016-09-25 10:42:18 +02:00
Lennart Poettering 3f815163ff core: introduce ProtectSystem=strict
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a
new setting "strict". If set, the entire directory tree of the system is
mounted read-only, but the API file systems /proc, /dev, /sys are excluded
(they may be managed with PrivateDevices= and ProtectKernelTunables=). Also,
/home and /root are excluded as those are left for ProtectHome= to manage.

In this mode, all "real" file systems (i.e. non-API file systems) are mounted
read-only, and specific directories may only be excluded via
ReadWriteDirectories=, thus implementing an effective whitelist instead of
blacklist of writable directories.

While we are at, also add /efi to the list of paths always affected by
ProtectSystem=. This is a follow-up for
b52a109ad3 which added /efi as alternative for
/boot. Our namespacing logic should respect that too.
2016-09-25 10:42:18 +02:00
Lennart Poettering 160cfdbed3 namespace: add some debug logging when enforcing InaccessiblePaths= 2016-09-25 10:42:18 +02:00
Lennart Poettering 6b7c9f8bce namespace: rework how ReadWritePaths= is applied
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths=
specification, then we'd first recursively apply the ReadOnlyPaths= paths, and
make everything below read-only, only in order to then flip the read-only bit
again for the subdirs listed in ReadWritePaths= below it.

This is not only ugly (as for the dirs in question we first turn on the RO bit,
only to turn it off again immediately after), but also problematic in
containers, where a container manager might have marked a set of dirs read-only
and this code will undo this is ReadWritePaths= is set for any.

With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be
applied to the children listed in ReadWritePaths= in the first place, so that
we do not need to turn off the RO bit for those after all.

This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the
RO bit, but never to turn it off again. Or to say this differently: if some
dirs are marked read-only via some external tool, then ReadWritePaths= will not
undo it.

This is not only the safer option, but also more in-line with what the man page
currently claims:

        "Entries (files or directories) listed in ReadWritePaths= are
        accessible from within the namespace with the same access rights as
        from outside."

To implement this change bind_remount_recursive() gained a new "blacklist"
string list parameter, which when passed may contain subdirs that shall be
excluded from the read-only mounting.

A number of functions are updated to add more debug logging to make this more
digestable.
2016-09-25 10:40:51 +02:00
Lennart Poettering 7648a565d1 namespace: when enforcing fs namespace restrictions suppress redundant mounts
If /foo is marked to be read-only, and /foo/bar too, then the latter may be
suppressed as it has no effect.
2016-09-25 10:19:15 +02:00
Lennart Poettering 6ee1a919cf namespace: simplify mount_path_compare() a bit 2016-09-25 10:19:10 +02:00
Lennart Poettering 3fbe8dbe41 execute: if RuntimeDirectory= is set, it should be writable
Implicitly make all dirs set with RuntimeDirectory= writable, as the concept
otherwise makes no sense.
2016-09-25 10:19:05 +02:00
Lennart Poettering be39ccf3a0 execute: move suppression of HOME=/ and SHELL=/bin/nologin into user-util.c
This adds a new call get_user_creds_clean(), which is just like
get_user_creds() but returns NULL in the home/shell parameters if they contain
no useful information. This code previously lived in execute.c, but by
generalizing this we can reuse it in run.c.
2016-09-25 10:18:57 +02:00
Lennart Poettering 07689d5d2c execute: split out creation of runtime dirs into its own functions 2016-09-25 10:18:54 +02:00
Lennart Poettering fe3c2583be namespace: make sure InaccessibleDirectories= masks all mounts further down
If a dir is marked to be inaccessible then everything below it should be masked
by it.
2016-09-25 10:18:51 +02:00
Lennart Poettering 59eeb84ba6 core: add two new service settings ProtectKernelTunables= and ProtectControlGroups=
If enabled, these will block write access to /sys, /proc/sys and
/proc/sys/fs/cgroup.
2016-09-25 10:18:48 +02:00
Lennart Poettering 72246c2a65 core: enforce seccomp for secondary archs too, for all rules
Let's make sure that all our rules apply to all archs the local kernel
supports.
2016-09-25 10:18:44 +02:00
Zbigniew Jędrzejewski-Szmek e4662e553c journal-remote: fix error format string
Bug introduced in 1b4cd64683.
2016-09-24 21:46:48 -04:00
Zbigniew Jędrzejewski-Szmek bd5b9f0a12 systemctl: suppress errors with "show" for nonexistent units and properties
Show is documented to be program-parseable, and printing the warning about
about a non-existent unit, while useful for humans, broke a lot of scripts.
Restore previous behaviour of returning success and printing empty or useless
stuff for units which do not exist, and printing empty values for properties
which do not exists.

With SYSTEMD_LOG_LEVEL=debug, hints are printed, but the return value is
still 0.

This undoes parts of e33a06a and 3dced37b7 and fixes #3856.

We might consider adding an explicit switch to fail on missing units/properties
(e.g. --ensure-exists or similar), and make -P foobar equivalent to
--ensure-exists --property=foobar.
2016-09-24 21:09:39 -04:00
Zbigniew Jędrzejewski-Szmek 1cf03a4f8e systemctl,networkctl,busctl,backlight: use STRPTR_IN_SET 2016-09-24 20:22:05 -04:00
Zbigniew Jędrzejewski-Szmek c7bf9d5183 basic/strv: add STRPTR_IN_SET
Also some trivial tests for STR_IN_SET and STRPTR_IN_SET.
2016-09-24 20:13:28 -04:00
Zbigniew Jędrzejewski-Szmek 72240b52f1 systemctl: use STR_IN_SET 2016-09-24 19:17:31 -04:00
Zbigniew Jędrzejewski-Szmek d11e656ace Merge pull request #4182 from jkoelker/routetable 2016-09-24 11:05:06 -04:00