Commit graph

823 commits

Author SHA1 Message Date
Lennart Poettering e5d855d364 util: use SPECIAL_ROOT_SLICE macro where appropriate 2016-10-07 20:14:38 +02:00
Lennart Poettering 0474ef7b3e log: minor fixes
Most important is a fix to negate the error number if necessary, before we
first access it.
2016-10-07 20:14:38 +02:00
Lennart Poettering 4a39c77419 strv: fix STRV_FOREACH_BACKWARDS() to be a single statement only
Let's make sure people invoking STRV_FOREACH_BACKWARDS() as a single statement
of an if statement don't fall into a trap, and find the tail for the list via
strv_length().
2016-10-07 20:14:38 +02:00
rwmjones 171b533800 architecture: Add support for the RISC-V architecture. (#4305)
RISC-V is an open source ISA in development since 2010 at UCB.
For more information, see https://riscv.org/

I am adding RISC-V support to Fedora:
https://fedoraproject.org/wiki/Architectures/RISC-V

There are three major variants of the architecture (32-, 64- and
128-bit).  The 128-bit variant is a paper exercise, but the other
two really exist in silicon.  RISC-V is always little endian.

On Linux, the default kernel uname(2) can return "riscv" for all
variants.  However a patch was added recently which makes the kernel
return one of "riscv32" or "riscv64" (or in future "riscv128").  So
systemd should be prepared to handle any of "riscv", "riscv32" or
"riscv64" (in future, "riscv128" but that is not included in the
current patch).  If the kernel returns "riscv" then you need to use
the pointer size in order to know the real variant.

The Fedora/RISC-V kernel only ever returns "riscv64" since we're
only doing Fedora for 64 bit at the moment, and we've patched the
kernel so it doesn't return "riscv".

As well as the major bitsize variants, there are also architecture
extensions.  However I'm trying to ensure that uname(2) does *not*
return any other information about those in utsname.machine, so that
we don't end up with "riscv64abcde" nonsense.  Instead those
extensions will be exposed in /proc/cpuinfo similar to how flags
work in x86.
2016-10-07 14:56:27 +02:00
Lennart Poettering 97f0e76f18 user-util: rework maybe_setgroups() a bit
Let's drop the caching of the setgroups /proc field for now. While there's a
strict regime in place when it changes states, let's better not cache it since
we cannot really be sure we follow that regime correctly.

More importantly however, this is not in performance sensitive code, and
there's no indication the cache is really beneficial, hence let's drop the
caching and make things a bit simpler.

Also, while we are at it, rework the error handling a bit, and always return
negative errno-style error codes, following our usual coding style. This has
the benefit that we can sensible hanld read_one_line_file() errors, without
having to updat errno explicitly.
2016-10-06 19:04:10 +02:00
Lennart Poettering 429b435026 sd-device/networkd: unify code to get a socket for issuing netdev ioctls on
As suggested here:

https://github.com/systemd/systemd/pull/4296#issuecomment-251911349

Let's try AF_INET first as socket, but let's fall back to AF_NETLINK, so that
we can use a protocol-independent socket here if possible. This has the benefit
that our code will still work even if AF_INET/AF_INET6 is made unavailable (for
exmple via seccomp), at least on current kernels.
2016-10-06 19:04:01 +02:00
Giuseppe Scrivano 36d854780c core: do not fail in a container if we can't use setgroups
It might be blocked through /proc/PID/setgroups
2016-10-06 11:49:00 +02:00
Giuseppe Scrivano f006b30bd5 audit: disable if cannot create NETLINK_AUDIT socket 2016-10-06 11:49:00 +02:00
Stefan Schweter 629ff674ac tree-wide: remove consecutive duplicate words in comments 2016-10-04 17:06:25 +02:00
Michael Olbrich 5076f4219e list: LIST_INSERT_BEFORE: update head if necessary (#4261)
If the new item is inserted before the first item in the list, then the
head must be updated as well.
Add a test to the list unit test to check for this.
2016-10-04 16:15:37 +02:00
Evgeny Vereshchagin cc238590e4 Merge pull request #4185 from endocode/djalal-sandbox-first-protection-v1
core:sandbox: Add new ProtectKernelTunables=, ProtectControlGroups=, ProtectSystem=strict and fixes
2016-09-28 04:50:30 +03:00
Martin Pitt b8fafaf4a1 Merge pull request #4220 from keszybz/show-and-formatting-fixes
Show and formatting fixes
2016-09-27 16:25:27 +02:00
Susant Sahani 629abfc23f basic: fix for IPv6 status (#4224)
Even if
```
   cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
```

is disabled

cat /proc/net/sockstat6

```
TCP6: inuse 2
UDP6: inuse 1
UDPLITE6: inuse 0
RAW6: inuse 0
FRAG6: inuse 0 memory 0
 ```

Looking for /proc/net/if_inet6 is the right choice.
2016-09-27 15:55:13 +02:00
Lennart Poettering d944dc9553 namespace: chase symlinks for mounts to set up in userspace
This adds logic to chase symlinks for all mount points that shall be created in
a namespace environment in userspace, instead of leaving this to the kernel.
This has the advantage that we can correctly handle absolute symlinks that
shall be taken relative to a specific root directory. Moreover, we can properly
handle mounts created on symlinked files or directories as we can merge their
mounts as necessary.

(This also drops the "done" flag in the namespace logic, which was never
actually working, but was supposed to permit a partial rollback of the
namespace logic, which however is only mildly useful as it wasn't clear in
which case it would or would not be able to roll back.)

Fixes: #3867
2016-09-25 10:42:18 +02:00
Lennart Poettering 6b7c9f8bce namespace: rework how ReadWritePaths= is applied
Previously, if ReadWritePaths= was nested inside a ReadOnlyPaths=
specification, then we'd first recursively apply the ReadOnlyPaths= paths, and
make everything below read-only, only in order to then flip the read-only bit
again for the subdirs listed in ReadWritePaths= below it.

This is not only ugly (as for the dirs in question we first turn on the RO bit,
only to turn it off again immediately after), but also problematic in
containers, where a container manager might have marked a set of dirs read-only
and this code will undo this is ReadWritePaths= is set for any.

With this patch behaviour in this regard is altered: ReadOnlyPaths= will not be
applied to the children listed in ReadWritePaths= in the first place, so that
we do not need to turn off the RO bit for those after all.

This means that ReadWritePaths=/ReadOnlyPaths= may only be used to turn on the
RO bit, but never to turn it off again. Or to say this differently: if some
dirs are marked read-only via some external tool, then ReadWritePaths= will not
undo it.

This is not only the safer option, but also more in-line with what the man page
currently claims:

        "Entries (files or directories) listed in ReadWritePaths= are
        accessible from within the namespace with the same access rights as
        from outside."

To implement this change bind_remount_recursive() gained a new "blacklist"
string list parameter, which when passed may contain subdirs that shall be
excluded from the read-only mounting.

A number of functions are updated to add more debug logging to make this more
digestable.
2016-09-25 10:40:51 +02:00
Lennart Poettering be39ccf3a0 execute: move suppression of HOME=/ and SHELL=/bin/nologin into user-util.c
This adds a new call get_user_creds_clean(), which is just like
get_user_creds() but returns NULL in the home/shell parameters if they contain
no useful information. This code previously lived in execute.c, but by
generalizing this we can reuse it in run.c.
2016-09-25 10:18:57 +02:00
Zbigniew Jędrzejewski-Szmek c7bf9d5183 basic/strv: add STRPTR_IN_SET
Also some trivial tests for STR_IN_SET and STRPTR_IN_SET.
2016-09-24 20:13:28 -04:00
Martin Pitt 6ac288a990 Merge pull request #4123 from keszybz/network-file-dropins
Network file dropins
2016-09-17 10:00:19 +02:00
Zbigniew Jędrzejewski-Szmek 43688c49d1 tree-wide: rename config_parse_many to …_nulstr
In preparation for adding a version which takes a strv.
2016-09-16 10:32:03 -04:00
Zbigniew Jędrzejewski-Szmek e77e0f51fe Merge pull request #4131 from intelfx/update-done-timestamps-precision
condition: ignore nanoseconds in timestamps for ConditionNeedsUpdate=

Fixes #4130.
2016-09-15 22:53:00 -04:00
Ivan Shapovalov 3a730176b3 time-util: export timespec_load_nsec() 2016-09-15 05:21:09 +03:00
Martin Pitt 2d88def959 Merge pull request #4133 from keszybz/strerror-removal
Strerror removal and other janitorial cleanups
2016-09-14 11:17:58 +02:00
Zbigniew Jędrzejewski-Szmek 481a2b02a3 Always use unicode ellipsis when ellipsizing
We were already unconditionally using the unicode character when the
input string was not pure ASCII, leading to different behaviour in
depending on the input string.

systemd[1]: Starting printit.service.
python3[19962]: foooooooooooooooooooooooooooooooooooo…oooo
python3[19964]: fooąęoooooooooooooooooooooooooooooooo…oooo
python3[19966]: fooąęoooooooooooooooooooooooooooooooo…ąęąę
python3[19968]: fooąęoooooooooooooooooąęąęąęąęąęąęąęą…ąęąę
systemd[1]: Started printit.service.
2016-09-13 20:10:57 -04:00
Topi Miettinen 646853bdd8 fileio: simplify mkostemp_safe() (#4090)
According to its manual page, flags given to mkostemp(3) shouldn't include
O_RDWR, O_CREAT or O_EXCL flags as these are always included. Beyond
those, the only flag that all callers (except a few tests where it
probably doesn't matter) use is O_CLOEXEC, so set that unconditionally.
2016-09-13 08:20:38 +02:00
Seraphime Kirkovski 07b0b339d6 machinectl: split OS field in two; print ip addresses (#4058)
This splits the OS field in two : one for the distribution name
and one for the the version id.
Dashes are written for missing fields.
This also prints ip addresses of known machines. The `--max-addresses`
option specifies how much ip addresses we want to see. The default is 1.
When more than one address is written for a machine, a `,` follows it.
If there are more ips than `--max-addresses`, `...` follows the last
address.
2016-08-31 20:06:57 +02:00
Yann E. MORIN 1d9ed17178 basic/fileio: we always have O_TMPFILE now
fileio makes use of O_TMPFILE when it is available.

We now always have O_TMPFILE, defined in missing.h if missing
from the toolchain headers.

Have fileio include missing.h and drop the guards around the
use of O_TMPFILE.
2016-08-29 12:49:10 +02:00
Yann E. MORIN daad709a7c missing.h: add missing definitions for __O_TMPFILE
Currently, a missing __O_TMPFILE was only defined for i386 and x86_64,
leaving any other architectures with an "old" toolchain fail miserably
at build time:
    src/import/export-raw.c: In function 'reflink_snapshot':
    src/import/export-raw.c:271:26: error: 'O_TMPFILE' undeclared (first use in this function)
             new_fd = open(d, O_TMPFILE|O_CLOEXEC|O_NOCTTY|O_RDWR, 0600);
                              ^

__O_TMPFILE (and O_TMPFILE) are available since glibc 2.19. However, a
lot of existing toolchains are still using glibc-2.18, and some even
before that, and it is not really possible to update those toolchains.

Instead of defining it only for i386 and x86_64, define __O_TMPFILE
with the specific values for those archs where it is different from the
generic value. Use the values as found in the Linux kernel (v4.8-rc3,
current as of time of commit).

---
Note: tested on ARM (build+run), with glibc-2.18 and linux headers 3.12.
Untested on other archs, though (I have no board to test this).

Changes v1 -> v2:
  - add a comment specifying some are hexa, others are octal.
2016-08-29 12:40:22 +02:00
Zbigniew Jędrzejewski-Szmek 2056ec1927 Merge pull request #3965 from htejun/systemd-controller-on-unified 2016-08-19 19:58:01 -04:00
0xAX e6c9fa74a5 terminal-util: remove unnecessary check of result of isatty() (#4000)
After the call of the isatty() we check its result twice in the
open_terminal(). There are no sense to check result of isatty() that
it is less than zero and return -errno, because as described in
documentation:

isatty() returns 1 if fd is an open file descriptor referring to a
terminal;  otherwise 0 is returned, and errno is set to indicate the
error.

So it can't be less than zero.
2016-08-19 18:51:54 -04:00
Evgeny Vereshchagin 29272c04a7 Merge pull request #3909 from poettering/mount-tool
add a new tool for creating transient mount and automount units
2016-08-19 23:33:49 +03:00
Lennart Poettering 16d901e251 Merge pull request #3987 from keszybz/console-color-setup
Rework console color setup
2016-08-19 19:36:09 +02:00
Zbigniew Jędrzejewski-Szmek acf553b04d terminal-util: use getenv_bool for $SYSTEMD_COLORS
This changes the semantics a bit: before, SYSTEMD_COLORS= would be treated as
"yes", same as SYSTEMD_COLORS=xxx and SYSTEMD_COLORS=1, and only
SYSTEMD_COLORS=0 would be treated as "no". Now, only valid booleans are treated
as "yes". This actually matches how $SYSTEMD_COLORS was announced in NEWS.
2016-08-19 11:57:37 -04:00
Zbigniew Jędrzejewski-Szmek 158fbf7661 systemd: ignore lack of tty when checking whether colors should be enabled
When started by the kernel, we are connected to the console, and we'll set TERM
properly to some value in fixup_environment(). We'll then enable or disable
colors based on the value of $SYSTEMD_COLORS and $TERM.

When reexecuting, TERM should be already set, so we can use this value.
Effectively, behaviour is the same as before affd7ed1a was reverted, but instead
of reopening the console before configuring color output, we just ignore what
stdout is connected to and decide based on the variables only.
2016-08-19 11:34:22 -04:00
Lennart Poettering cbf138ebef Merge pull request #3988 from keszybz/journald-dynamic-users
Journald dynamic users
2016-08-19 10:41:26 +02:00
Zbigniew Jędrzejewski-Szmek 61755fdae0 journald: do not create split journals for dynamic users
Dynamic users should be treated like system users, and their logs
should end up in the main system journal.
2016-08-18 23:34:40 -04:00
Tejun Heo f50582649f logind: update empty and "infinity" handling for [User]TasksMax (#3835)
The parsing functions for [User]TasksMax were inconsistent.  Empty string and
"infinity" were interpreted as no limit for TasksMax but not accepted for
UserTasksMax.  Update them so that they're consistent with other knobs.

* Empty string indicates the default value.
* "infinity" indicates no limit.

While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax
handling.

v2: Update empty string to indicate the default value as suggested by Zbigniew
    Jędrzejewski-Szmek.

v3: Fixed empty UserTasksMax handling.
2016-08-18 22:57:53 -04:00
Lennart Poettering 2ae0858e6c hostnamectl: rework pretty hostname validation (#3985)
Rework 17eb9a9ddb a bit.

Let's make sure we don't clobber the input parameter args[1], following our
coding style to not clobber parameters unless explicitly indicated. (in
particular, as we don't want to have our changes appear in the command line
shown in "ps"...)

No functional change.
2016-08-18 21:16:16 -04:00
Lennart Poettering 450442cf93 add a new tool for creating transient mount and automount units
This adds "systemd-mount" which is for transient mount and automount units what
"systemd-run" is for transient service, scope and timer units.

The tool allows establishing mounts and automounts during runtime. It is very
similar to the usual /bin/mount commands, but can pull in additional
dependenices on access (for example, it pulls in fsck automatically), an take
benefit of the automount logic.

This tool is particularly useful for mount removable file systems (such as USB
sticks), as the automount logic (together with automatic unmount-on-idle), as
well as automatic fsck on first access ensure that the removable file system
has a high chance to remain in a fully clean state even when it is unplugged
abruptly, and returns to a clean state on the next re-plug.

This is a follow-up for #2471, as it adds a simple client-side for the
transient automount logic added in that PR.

In later work it might make sense to invoke this tool automatically from udev
rules in order to implement a simpler and safer version of removable media
management á la udisks.
2016-08-18 22:41:19 +02:00
Tejun Heo 5da38d0768 core: use the unified hierarchy for the systemd cgroup controller hierarchy
Currently, systemd uses either the legacy hierarchies or the unified hierarchy.
When the legacy hierarchies are used, systemd uses a named legacy hierarchy
mounted on /sys/fs/cgroup/systemd without any kernel controllers for process
management.  Due to the shortcomings in the legacy hierarchy, this involves a
lot of workarounds and complexities.

Because the unified hierarchy can be mounted and used in parallel to legacy
hierarchies, there's no reason for systemd to use a legacy hierarchy for
management even if the kernel resource controllers need to be mounted on legacy
hierarchies.  It can simply mount the unified hierarchy under
/sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies.
This disables a significant amount of fragile workaround logics and would allow
using features which depend on the unified hierarchy membership such bpf cgroup
v2 membership test.  In time, this would also allow deleting the said
complexities.

This patch updates systemd so that it prefers the unified hierarchy for the
systemd cgroup controller hierarchy when legacy hierarchies are used for kernel
resource controllers.

* cg_unified(@controller) is introduced which tests whether the specific
  controller in on unified hierarchy and used to choose the unified hierarchy
  code path for process and service management when available.  Kernel
  controller specific operations remain gated by cg_all_unified().

* "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to
  force the use of legacy hierarchy for systemd cgroup controller.

* nspawn: By default nspawn uses the same hierarchies as the host.  If
  UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all.  If
  0, legacy for all.

* nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of
  three options - legacy, only systemd controller on unified, and unified.  The
  value is passed into mount setup functions and controls cgroup configuration.

* nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount
  option is moved to mount_legacy_cgroup_hierarchy() so that it can take an
  appropriate action depending on the configuration of the host.

v2: - CGroupUnified enum replaces open coded integer values to indicate the
      cgroup operation mode.
    - Various style updates.

v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.

v4: Restored legacy container on unified host support and fixed another bug in
    detect_unified_cgroup_hierarchy().
2016-08-17 17:44:36 -04:00
Tejun Heo ca2f6384aa core: rename cg_unified() to cg_all_unified()
A following patch will update cgroup handling so that the systemd controller
(/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel
resource controllers are on the legacy hierarchies.  This would require
distinguishing whether all controllers are on cgroup v2 or only the systemd
controller is.  In preparation, this patch renames cg_unified() to
cg_all_unified().

This patch doesn't cause any functional changes.
2016-08-15 18:13:36 -04:00
Zbigniew Jędrzejewski-Szmek 5f9a610ad2 Merge pull request #3905 from htejun/cgroup-v2-cpu
core: add cgroup CPU controller support on the unified hierarchy

(zj: merging not squashing to make it clear against which upstream this patch was developed.)
2016-08-14 18:03:35 -04:00
Tejun Heo 66ebf6c0a1 core: add cgroup CPU controller support on the unified hierarchy
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches.  The situation is explained in
the following documentation.

 https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu

While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads.  This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.

On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us".  [Startup]CPUWeight config
options are added with the usual compat translation.  CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.

v2: - Error in man page corrected.
    - CPU config application in cgroup_context_apply() refactored.
    - CPU accounting now works on unified hierarchy.
2016-08-07 09:45:39 -04:00
Zbigniew Jędrzejewski-Szmek 3bb81a80bd Merge pull request #3818 from poettering/exit-status-env
beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable
2016-08-05 20:55:08 -04:00
Lennart Poettering ceab9e2dee Merge pull request #3900 from keszybz/fix-3607
Fix 3607
2016-08-05 17:03:09 +02:00
Lennart Poettering 41bf0590cc util-lib: unify parsing of nice level values
This adds parse_nice() that parses a nice level and ensures it is in the right
range, via a new nice_is_valid() helper. It then ports over a number of users
to this.

No functional changes.
2016-08-05 11:18:32 +02:00
Vito Caputo 82b103a7ce fileio: fix MIN/MAX mixup (#3896)
The intention is to clamp the value to READ_FULL_BYTES_MAX, which
would be the minimum of the two.
2016-08-05 00:09:23 -04:00
Zbigniew Jędrzejewski-Szmek 163f561e2a basic/set: remove some spurious spaces 2016-08-04 23:53:07 -04:00
Vito Caputo c2d11a6302 fileio: fix read_full_stream() bugs (#3887)
read_full_stream() _always_ allocated twice the memory needed, due to
only breaking the realloc() && fread() loop when fread() returned 0,
requiring another iteration and exponentially enlarged buffer just to
discover the EOF condition.

This also caused file sizes >2MiB && <= 4MiB to erroneously be treated
as E2BIG, due to the inappropriately doubled buffer size exceeding
4*1024*1024.

Also made the 4*1024*1024 magic number a READ_FULL_BYTES_MAX constant.
2016-08-04 22:52:02 +02:00
Lennart Poettering 992e8f224b util-lib: rework /tmp and /var/tmp handling code
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).

Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limite
exposure in suid binaries.

This also ports a couple of users over to these new APIs.

The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.

Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.
2016-08-04 16:27:07 +02:00
David Michael 5124866d73 util-lib: add parse_percent_unbounded() for percentages over 100% (#3886)
This permits CPUQuota to accept greater values as documented.
2016-08-04 13:09:54 +02:00