Commit graph

28428 commits

Author SHA1 Message Date
Tejun Heo bd15ab41a1 nspawn: fix cgroup mode detection
cgroup mode detection is broken in two different ways.

* detect_unified_cgroup_hierarchy() is called too nested in outer_child().
  sync_cgroup() which is used by run() also needs to know the requested cgroup
  mode but it's currently always getting CGROUP_UNIFIED_UNKNOWN.  This makes it
  skip syncing the inner cgroup hierarchy on some config combinations.

   $ cat /proc/self/cgroup | grep systemd
   1:name=systemd:/user.slice/user-0.slice/session-c1.scope

   $ UNIFIED_CGROUP_HIERARCHY=0 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
   ...
   [root@container ~]# cat /proc/self/cgroup | grep systemd
   1:name=systemd:/machine.slice/machine-container.x86_64.scope
   $ exit

   $ UNIFIED_CGROUP_HIERARCHY=1 SYSTEMD_NSPAWN_USE_CGNS=0 systemd-nspawn -M container
   [root@container ~]# cat /proc/self/cgroup | grep 0::
   0::/
   $ exit

  Note how the unified hierarchy case's path is not synchronized with the host.
  This for example can cause issues when there are multiple such containers.

  Fixed by moving detect_unified_cgroup_hierarchy() invocation to main().

* inner_child() was invoking cg_unified_flush().  inner_child() executes fully
  scoped and can't determine which cgroup mode the host was in.  It doesn't
  make sense to keep flushing the detected mode when the host mode can't
  change.

  Fixed by replacing cg_unified_flush() invocations in outer_child() and
  inner_child() with one in main().
2017-02-18 17:49:06 -05:00
Lucas Werkmeister 1e94df4471 journalctl: add reference to sd-id128(3) to output (#5382)
SD_ID128_MAKE is clearly not a standard C macro, so let’s point the user
to its documentation to let them know which header they need and what
they can then do with MESSAGE_XYZ.
2017-02-18 16:36:25 -05:00
Lucas Werkmeister b22319ead4 man: sd-id128: fix journalctl option name (#5381)
--new-id works because it’s an unambiguous prefix, but the full option
name is --new-id128.
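
The prefix behaviour is the usual one for GNU-style long options: an abbreviation is accepted as long as it matches exactly one option. A minimal sketch in Python, whose argparse applies the same rule by default. The parser below is a hypothetical stand-in, not journalctl's actual option table; the second option is there only to show an ambiguous prefix.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser; only the name --new-id128 comes from the text above.
    parser = argparse.ArgumentParser(prog="demo", allow_abbrev=True)
    parser.add_argument("--new-id128", action="store_true")
    parser.add_argument("--no-pager", action="store_true")
    return parser


def parse(argv):
    return build_parser().parse_args(argv)


# "--new-id" matches exactly one option, so it is accepted as an abbreviation.
abbreviated = parse(["--new-id"])
# The full spelling keeps working even if more options are added later.
full = parse(["--new-id128"])
```

A prefix such as `--n` would match both options here and is rejected, which is why documentation is better off naming the full option.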
2017-02-18 16:34:28 -05:00
Zbigniew Jędrzejewski-Szmek 581a07f9f0 Merge pull request #5369 from poettering/nspawn-resolved
fixes for running nspawn+resolved in combination
2017-02-18 11:54:34 -05:00
Lennart Poettering dec718065b units: order systemd-nspawn@.service after systemd-resolved.service
This way, the nspawn internal check whether resolved is running will
succeed if it is enabled.

Fixes: #4649
2017-02-17 16:06:31 -05:00
Lennart Poettering b053cd5f8e nspawn: tweak check whether resolved is around a bit
Let's check D-Bus instead of files in /run to see if resolved is
running. This is a bit nicer as bus names are automatically cleaned up
when resolved dies, which is not the case for files in /run.

See: #4649
2017-02-17 16:06:31 -05:00
Lennart Poettering 4d1f490c93 units: enable resolved bus activation through a symlink in /etc
The change:
-/usr/lib/systemd/system/dbus-org.freedesktop.resolve1.service
+/etc/systemd/system/dbus-org.freedesktop.resolve1.service

Without this, talking to the resolved bus API will activate resolved
regardless of whether it is enabled. Let's fix that.
2017-02-17 16:03:47 -05:00
Martin Pitt cc39016131 test: re-drop assumption that /run is a mount point (#5377)
Commit 436e916ea introduced the assumption into test-stat-util that /run
is a tmpfs mount point. This is not the case in build chroots such as
Fedora's mock or Debian's sbuild. So only assert that /run is a tmpfs
and not a btrfs if /run is actually a mount point. This will then still
be asserted with installed tests.
2017-02-17 15:29:02 -05:00
Adrián López ef6e596ff0 systemctl: show extra args if defined (#5379) 2017-02-17 15:27:45 -05:00
Zbigniew Jędrzejewski-Szmek 52d1f5e569 Merge pull request #5373 from poettering/coredump-timestamp-fixes
various coredump fixes
2017-02-17 15:23:52 -05:00
Zbigniew Jędrzejewski-Szmek cbe8c50958 Merge pull request #5347 from poettering/local-nta
more resolved fixes
2017-02-17 15:00:36 -05:00
Lennart Poettering 925c81cd20 missing: add renameat2() definition for 64bit arm (#5378)
Following a similar commit in casync:

https://github.com/systemd/casync/pull/10
2017-02-17 13:10:09 -05:00
Lennart Poettering c5d3ee266b Merge pull request #5275 from ssahani/fix-dropin-net-section
networkd: fix drop-in conf directory configs overwriting each other
2017-02-17 18:03:04 +01:00
Viktor Mihajlovski ecc11cf70c udev: fix id_net_name_path for virtio-ccw interfaces (#5357)
The CCW id_net_name_path detection didn't account for virtio
interfaces on the CCW bus. As a result the default interface
names for virtio-ccw interfaces would use the old eth<x>
format instead of enc<busid>.

Since virtio-pci interface naming follows the naming rules
of the parent bus, the names_ccw() logic was changed to apply
the CCW interface naming rules to virtio interfaces as well,
e.g. enc2000 for an interface with a CCW bus id 0.0.2000.
As virtio interfaces are apt to get the otherwise unusual
CCW bus id 0.0.0000, the last '0' is now preserved in this
case.

The virtio subsystem skipping loop has been moved from
names_pci() into a function skip_virtio() that can be reused
for all bus types with virtio network devices.

Since virtio-ccw interfaces use single CCW addresses the ccwgroup
requirement was relaxed and the C definitions were changed
accordingly.
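
The CCW naming rule described above can be sketched as follows. This is an illustration in Python, not the names_ccw() code itself; the function name is hypothetical, and the way leading zeroes and dots are trimmed is an assumption inferred from the enc2000 example.

```python
def ccw_interface_name(bus_id: str) -> str:
    """Derive an enc<busid> style name from a CCW bus id such as "0.0.2000"."""
    # Assumption: leading zeroes and dots carry no information and are dropped.
    trimmed = bus_id.lstrip("0.")
    # An all-zero bus id would trim to nothing, so keep its last '0'.
    if not trimmed:
        trimmed = bus_id[-1:]
    return "enc" + trimmed
```

Under these assumptions, "0.0.2000" maps to "enc2000" and the all-zero id "0.0.0000" maps to "enc0" instead of a bare "enc".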
2017-02-17 16:18:01 +01:00
Zbigniew Jędrzejewski-Szmek 48317c39e2 network: change condition in if testing section presence
section_line and filename should be set together or not at all. Change the
if to test filename, since it's the first of the pair and it seems more natural
to test that.
2017-02-17 09:34:25 -05:00
Zbigniew Jędrzejewski-Szmek fd45e522dd networkd: immediately transfer ownership of route->section
The code was not incorrect previously, but I think it's easier to follow the
ownership (and the code is more likely to remain correct when updated later on),
if freeing of NetworkConfigSection* is immediately made the responsibility of
route_free(), so instead of relying on route_free() not freeing ->section
if adding to the network hashmap failed, make this freeing unconditional.
2017-02-17 09:28:17 -05:00
Lennart Poettering e4363cd8ae Merge pull request #5333 from poettering/machined-copy-files-userns
machined userns fixes
2017-02-17 13:51:58 +01:00
Lennart Poettering ea2aa0343f Merge pull request #5366 from poettering/default-hostname-fix
fallback hostname fixes
2017-02-17 13:51:27 +01:00
Lennart Poettering aa10469e17 man: document that user namespacing complicates file copies 2017-02-17 11:47:20 +01:00
Lennart Poettering 6d337300f2 coredump: store the full coredump kernel context in xattrs on the coredump file
We didn't include the resource limit field so far; add it.
2017-02-17 11:35:31 +01:00
Lennart Poettering 80002f6640 coredump: when reconstructing original kernel coredump context, chop off trailing zeroes
Our coredump handler operates on a "context" supplied by the kernel via
the core_pattern arguments. When we pass off a coredump for processing
to coredumpd we pass along enough information for this context to be
reconstructed. This information is passed in the usual journal fields,
and that means we extended the 1s granularity timestamp to 1µs
granularity by appending 6 zeroes. We need to chop them off again when
reconstructing the original kernel context.
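
The round trip can be sketched like this. It is a Python illustration with hypothetical function names, assuming the timestamp travels as a decimal string; the actual coredump code is C and is not shown here.

```python
EXTRA_DIGITS = 6  # going from 1 s to 1 µs granularity adds six decimal digits


def to_journal_timestamp(kernel_seconds: str) -> str:
    # Extend a seconds timestamp to microseconds by appending zeroes.
    return kernel_seconds + "0" * EXTRA_DIGITS


def to_kernel_timestamp(journal_usec: str) -> str:
    # Undo the extension; refuse values that were not produced by it.
    suffix = "0" * EXTRA_DIGITS
    if len(journal_usec) <= EXTRA_DIGITS or not journal_usec.endswith(suffix):
        raise ValueError("not a zero-extended seconds timestamp: " + journal_usec)
    return journal_usec[:-EXTRA_DIGITS]
```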

Fixes: #4779
2017-02-17 11:35:19 +01:00
Lennart Poettering 76341acc38 udevd: use signal_to_string() instead of strsignal() at one place
strsignal() sucks, as it tries to generate human readable strings from
something that isn't really human readable by concept. Let's use
signal_to_string() instead, making this more grokkable. Difference is:
SIGINT gets translated → "SIGINT" rather than → "Interrupted".
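
The same contrast exists in Python's standard library, which makes for a quick illustration. This is not systemd's signal_to_string(); the two helper names below are hypothetical, and the descriptive text returned by strsignal() depends on the C library.

```python
import signal


def symbolic_name(signo: int) -> str:
    # Name of the signal constant, stable across locales and C libraries.
    return signal.Signals(signo).name


def description(signo: int) -> str:
    # Wraps the C library's strsignal(); the wording varies by platform.
    return signal.strsignal(signo) or ""
```

Here `symbolic_name(signal.SIGINT)` yields "SIGINT", which is what someone grepping logs would look for.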
2017-02-17 11:18:22 +01:00
Lennart Poettering d14bcb4ea7 coredump: include signal name in journal metadata
(Note that we only do this for the journal metadata, not for the xattrs,
as the xattrs are only supposed to store the original 1:1 info we
acquired from the kernel.)
2017-02-17 11:18:18 +01:00
Lennart Poettering 86562420ff coredump: fix handling of special crashes
When we encounter a "special" crash we should not continue processing it
the usual way.
2017-02-17 10:59:21 +01:00
Lennart Poettering 6993d26469 resolved: try to authenticate SOA on negative replies
For caching negative replies we need the SOA TTL information. Hence,
let's authenticate all auxiliary SOA RRs through DS requests on all
negative requests.
2017-02-17 10:25:16 +01:00
Lennart Poettering 74a3ed7408 resolved: extend various timeouts
Let's increase a number of timeouts as they apparently are too short for
some real-world lookups.

See:

https://github.com/systemd/systemd/issues/4003#issuecomment-279842616

In particular we change the following timeouts:

1) The first UDP retry we increase 500ms → 750ms. This is a good idea,
   since some servers take relatively long to respond to trivial lookups,
   and giving up our first attempt also has the effect of trying a
   different server for the next attempt which has the side effect that
   we'll run two down-grade iterations in parallel, on both servers.
   Hence, let's give servers a bit more time in the first iteration.

2) Permit 24 retries instead of just 16 per transaction. If we end up
   downgrading all the way down to UDP for a lookup we already need 5
   iterations for that. If we want to permit a couple of lost packets for
   each (let's say 4), then we already need 20 iterations.

3) Increase the overall query timeout on the service side to 60s (from
   45s), simply because very long and slow DNSSEC + CNAME chains (such as
   us.ynuf.alipay.com) hit this boundary too easily. The client side
   timeout for the bus method call is increased to 90s, in order to have
   room for the dbus reply to go through.
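
The retry budget in point 2 is plain arithmetic. A small sketch using only the numbers quoted above; the constant and function names are hypothetical.

```python
DOWNGRADE_ITERATIONS = 5   # iterations needed to downgrade all the way, per the text
LOST_PACKETS_ALLOWED = 4   # tolerated losses per iteration ("let's say 4")
OLD_RETRY_LIMIT = 16
NEW_RETRY_LIMIT = 24


def attempts_needed(iterations: int, lost_per_iteration: int) -> int:
    # Each downgrade iteration may burn this many attempts on lost packets.
    return iterations * lost_per_iteration


needed = attempts_needed(DOWNGRADE_ITERATIONS, LOST_PACKETS_ALLOWED)
fits_old_limit = needed <= OLD_RETRY_LIMIT
fits_new_limit = needed <= NEW_RETRY_LIMIT
```

With these inputs the old limit of 16 is exceeded while the new limit of 24 leaves a little headroom.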
2017-02-17 10:25:16 +01:00
Lennart Poettering 2d4a4e1419 resolved: initialize all return values on successful exit of dns_cache_lookup()
Following our coding style, on success we should initialize all return
parameters of a function. We missed two cases for dns_cache_lookup() (but
covered all others), fix them too.
2017-02-17 10:25:16 +01:00
Lennart Poettering 1fdeaeb741 resolved: show rcode in debug output for incoming replies
This is the most important piece of information of replies, hence show
this in the first log message about it.

(Wireshark shows it too in the short summary, hence this definitely
makes sense...)
2017-02-17 10:25:16 +01:00
Lennart Poettering 7d581a6576 resolved: don't downgrade feature level if we get RCODE on UDP level
Retrying a transaction via TCP is a good approach for mitigating
packet loss. However, it's not a good way to fix a bad RCODE if we
already downgraded to UDP level for it. Hence, don't do this.

This is a small tweak only, but shortens the time we spend on
downgrading when a specific domain continuously returns a bad rcode.
2017-02-17 10:25:16 +01:00
Lennart Poettering 201d99584e resolved: cache SERVFAIL responses for 30s
Some domains (such as us.ynuf.alipay.com) almost appear as if they actively
want to sabotage our DNSSEC work. Specifically, they unconditionally
return SERVFAIL on SOA lookups and always only after a 1s delay (at
least). This is pretty bad for our validation logic, as we use SOA
lookups to distinguish zones from non-terminal names. Moreover, SERVFAIL
is an error that is typically returned if we send requests a server
doesn't grok, and thus is reason for us to downgrade our protocol and
try again. In case of these zones this means we'll accept the SERVFAIL
response only after a full iterative downgrade to our lowest feature
level: TCP. In combination with the 1s delays this has the effect of
making us hit our transaction timeout way too easily.

As a first attempt to improve the situation: let's start caching SERVFAIL
responses in our cache, after the full downgrade for a short period of
time.

Conceptually this is exposed as "weird rcode" caching, but for now we
only consider SERVFAIL a "weird rcode" worthy of caching. Later on we
might want to add more.
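
The "weird rcode" idea can be sketched as a small time-limited cache. This is a Python illustration with hypothetical names, keyed by name only for brevity; it does not reflect how resolved's cache is actually structured.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SERVFAIL = 2               # DNS RCODE value for SERVFAIL (RFC 1035)
WEIRD_RCODES = {SERVFAIL}  # only SERVFAIL qualifies for now, per the text
WEIRD_RCODE_TTL = 30.0     # seconds, from the commit subject


class RcodeCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[int, float]] = {}

    def put(self, name: str, rcode: int) -> bool:
        # Remember only rcodes considered worth caching; report whether we did.
        if rcode not in WEIRD_RCODES:
            return False
        self._entries[name.lower()] = (rcode, self._clock() + WEIRD_RCODE_TTL)
        return True

    def get(self, name: str) -> Optional[int]:
        # Return the cached rcode, or None if absent or expired.
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        rcode, expires = entry
        if self._clock() >= expires:
            del self._entries[key]
            return None
        return rcode
```

Adding another rcode later would only mean extending `WEIRD_RCODES`, which mirrors the "later on we might want to add more" remark.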
2017-02-17 10:25:15 +01:00
Lennart Poettering dc349f5f7a resolved: lengthen timeout for TCP transactions
When we are doing a TCP transaction the kernel will automatically resend
all packets for us, there's no need to do that ourselves. Hence:
increase the timeout for TCP transactions substantially, to give the
kernel enough time to connect to the peer, without interrupting it when
we become impatient.
2017-02-17 10:25:15 +01:00
Lennart Poettering 97277567b8 resolved: when DNSSEC mode is disabled, don't go beyond EDNS0 feature level
There's no point in talking to a server in DNSSEC mode when we don't
actually want to verify anything.

See: #5352
2017-02-17 10:25:15 +01:00
Lennart Poettering cbb1aabb99 resolved: when accepting a query candidate as final answer, propagate authentication bool even on failure
If we accept a query candidate, let's also propagate the authenticated
flag for it, so that we can properly report back to the clients whether
lookups failed due to provable non-existence.
2017-02-17 10:25:15 +01:00
Lennart Poettering 2b2d98c175 resolved: propagate AD bit for NXDOMAIN into stub replies
When we managed to prove non-existence of a name, then we should
properly propagate this to clients by setting the AD bit on NXDOMAIN.

See: #4621
2017-02-17 10:25:15 +01:00
Lennart Poettering 941dd29450 resolved: automatically downgrade reply bits on send
Doesn't really change anything, but makes things a bit simpler to read.
2017-02-17 10:25:15 +01:00
Lennart Poettering ce7c8b20df resolved: when the dns server feature level grace period elapses, flush caches
The cache might contain all kinds of unauthenticated data that we really
shouldn't be using if we upgrade our feature level and suddenly are able
to get authenticated data again.

Might fix: #4866
2017-02-17 10:25:15 +01:00
Lennart Poettering 97c2ea2645 resolved: fix NSEC proofs for missing TLDs
For the wildcard NSEC check we need to generate an "asterisk" domain, by
prefixing the common ancestor with "*.". So far we did that with a simple
strappenda() which is fine for most domains, but doesn't work if the
common ancestor is the root domain as we usually write that as "." in
normalized form, and "*." joined with "." is "*.." and not "*." as it
should be.

Hence, use the clean way out, let's just use dns_name_concat() which
only exists precisely for this reason, to properly concatenate labels.

There's a good chance this actually fixes #5029, as this NSEC proof is
triggered by lookups in the TLD "example", which doesn't exist in the
Internet.
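
The difference between string concatenation and label-aware concatenation can be shown in a few lines. This is a Python sketch, not dns_name_concat() itself; the helper names are hypothetical, label escaping is ignored, and the result is always written with a trailing dot.

```python
def naive_wildcard(ancestor: str) -> str:
    # Plain string concatenation, as with strappenda().
    return "*." + ancestor


def concat_names(prefix: str, suffix: str) -> str:
    # Join on labels rather than characters, so the root name "." adds no label.
    labels = [label for label in prefix.split(".") if label]
    labels += [label for label in suffix.split(".") if label]
    return ".".join(labels) + "."


def wildcard_name(ancestor: str) -> str:
    return concat_names("*", ancestor)
```

For an ordinary ancestor both approaches describe the same name, but for the root domain `naive_wildcard(".")` produces "*.." while `wildcard_name(".")` produces "*.".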
2017-02-17 10:25:15 +01:00
Lennart Poettering c775838ad7 resolved: make sure configured NTAs affect subdomains too
This ensures that configured NTAs exclude not only the listed domain but
also all domains below it from DNSSEC validation -- except if a positive
trust anchor is defined below (as suggested by RFC 7646, section 1.1).
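
One way to read this rule is sketched below in Python. The function names are hypothetical and the matching strategy is an interpretation of the sentence above, not a description of resolved's trust anchor code.

```python
from typing import Iterable, List


def labels(name: str) -> List[str]:
    return [label for label in name.lower().split(".") if label]


def is_at_or_below(name: str, ancestor: str) -> bool:
    # True if name equals ancestor or is a subdomain of it (label-wise).
    n, a = labels(name), labels(ancestor)
    return len(a) <= len(n) and n[len(n) - len(a):] == a


def validation_disabled(name: str, negative: Iterable[str], positive: Iterable[str]) -> bool:
    covering = [nta for nta in negative if is_at_or_below(name, nta)]
    if not covering:
        return False
    nta_depth = max(len(labels(nta)) for nta in covering)
    # A positive anchor below the NTA that covers the name switches validation back on.
    for anchor in positive:
        if is_at_or_below(name, anchor) and len(labels(anchor)) > nta_depth:
            return False
    return True
```

With an NTA for "corp.example" and a positive anchor for "secure.corp.example", "host.corp.example" is excluded from validation while "a.secure.corp.example" is still validated.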

Fixes: #5048
2017-02-17 10:25:15 +01:00
Lennart Poettering 7f43928ba6 machined: refuse bind mounts on containers that have user namespaces applied
As the kernel won't map the UIDs this is simply not safe, and hence we
should generate a clean error and refuse it.

We can restore this feature later should a "shiftfs" become available in
the kernel.
2017-02-17 10:22:28 +01:00
Lennart Poettering 3aca8326bd machined: properly propagate long-running operation errors
Actually initialize the "error" structure with the error we got.
2017-02-17 10:22:28 +01:00
Lennart Poettering d01cd40196 machined: when copying files from/to userns containers chown to root
This changes the file copy logic of machined to set the UID/GID of all
copied files to 0 if the host and container do not share the same user
namespace.

Fixes: #4078
2017-02-17 10:22:28 +01:00
Lennart Poettering 1c876927e4 copy: change the various copy_xyz() calls to take a unified flags parameter
This adds a unified "copy_flags" parameter to all copy_xyz() function
calls, replacing the various boolean flags so far used. This should make
many invocations more readable as it is clear what behaviour is
precisely requested. This also prepares ground for adding support for
more modes later on.
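
The readability argument can be illustrated with a flags type replacing positional booleans. This is a Python sketch; the flag names and the copy_file() helper are hypothetical and do not mirror the C copy_xyz() signatures.

```python
import enum
import os
import shutil


class CopyFlags(enum.Flag):
    NONE = 0
    REPLACE = enum.auto()         # overwrite an existing destination
    PRESERVE_TIMES = enum.auto()  # keep timestamps and mode bits


def copy_file(src: str, dst: str, flags: CopyFlags = CopyFlags.NONE) -> None:
    if os.path.exists(dst) and not (flags & CopyFlags.REPLACE):
        raise FileExistsError(dst)
    if flags & CopyFlags.PRESERVE_TIMES:
        shutil.copy2(src, dst)     # copies data plus metadata
    else:
        shutil.copyfile(src, dst)  # copies data only
```

A call such as `copy_file(a, b, CopyFlags.REPLACE | CopyFlags.PRESERVE_TIMES)` states its intent at the call site, where `copy_file(a, b, True, True)` would not, and a new mode becomes a new flag rather than another parameter.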
2017-02-17 10:22:28 +01:00
Lennart Poettering 7026a775e6 machinectl: tweak address output in "machinectl status"
With this change we'll not show an "Addresses" field for machines that
we don't know any addresses for.

This changes print_addresses() to never suffix its output with a
newline, leaving that to the caller. That's a good idea since depending
on who the caller is, different rules apply: if no addresses are found,
then the list view still wants a newline, but the status view does not.

This also changes the function to return the number of found addresses,
which can be used to decide when to add a newline or not.
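
The division of labour between print_addresses() and its callers can be sketched as follows. This is a Python illustration; the two caller functions and the exact field layout are hypothetical.

```python
import io
from typing import List, TextIO


def print_addresses(addresses: List[str], out: TextIO, prefix: str = "") -> int:
    # Write the addresses without a trailing newline and return how many there were.
    for index, address in enumerate(addresses):
        out.write(prefix if index == 0 else "\n" + " " * len(prefix))
        out.write(address)
    return len(addresses)


def status_view(addresses: List[str]) -> str:
    out = io.StringIO()
    # The status view prints nothing at all when no address is known.
    if print_addresses(addresses, out, prefix="Addresses: ") > 0:
        out.write("\n")
    return out.getvalue()


def list_row(name: str, addresses: List[str]) -> str:
    out = io.StringIO()
    out.write(name + " ")
    print_addresses(addresses, out)
    # The list view terminates the row even if no address was found.
    out.write("\n")
    return out.getvalue()
```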
2017-02-17 10:22:28 +01:00
Lennart Poettering 3401419bb8 machined: expose "UID shift" concept for containers
UID/GID mapping with userns can be arbitrarily complex. Let's break this
down to a single admin-friendly parameter: let's expose the UID/GID
shift of a container via a new bus call for each container, and let's
show this as part of "machinectl status" if it is not 0.

This should work for pretty much all real-life full OS container setups
(i.e. the stuff machined is supposed to be useful for).  For everything
else we generate a clean error, clarifying that we can't expose the
mapping.
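
Reducing a userns mapping to a single shift can be sketched as follows, based on the /proc/PID/uid_map line format (inside ID, outside ID, length). This is a Python illustration with hypothetical names; the exact conditions machined checks before reporting a shift are an assumption.

```python
from typing import List, Tuple


def parse_id_map(text: str) -> List[Tuple[int, int, int]]:
    # Each non-empty line reads: <first ID inside> <first ID outside> <length>
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, length = (int(field) for field in line.split())
        entries.append((inside, outside, length))
    return entries


def uid_shift(uid_map_text: str) -> int:
    entries = parse_id_map(uid_map_text)
    # Assumption: only a single range starting at container UID 0 counts as "simple".
    if len(entries) != 1 or entries[0][0] != 0:
        raise ValueError("mapping is too complex to express as a single shift")
    return entries[0][1]
```

An identity mapping yields a shift of 0, which corresponds to the case where "machinectl status" shows nothing extra.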
2017-02-17 10:22:28 +01:00
Lennart Poettering a25b0dc82d resolved: default to the compile-time fallback hostname
This changes resolved to use the compile-time fallback hostname if the
configured one is not set. Note that if the local hostname is set to
"localhost" then we'll instead default to "linux" here, as for
mDNS/LLMNR exposing "localhost" is actively dangerous.
2017-02-17 10:19:26 +01:00
Lennart Poettering 8341d4fa04 core: when booting up, initialize hostname to compile-time fallback hostname
When /etc/hostname isn't set, default to the configured compile-time
fallback hostname instead of "localhost" for the kernel hostname.
2017-02-17 10:19:26 +01:00
Lennart Poettering d91e8e1b69 hostname-util: default to the compile time default hostname in gethostname_malloc()
Currently, if the hostname is not set gethostname_malloc() defaults to
the "sysname", which is "linux" on Linux. Let's change that to also
honour the compile-time fallback hostname as specified on the configure
command line.
2017-02-17 10:19:26 +01:00
Evgeny Vereshchagin f73e6ee687 Merge pull request #5338 from mbiebl/fix-install-tests-target
Fix "make install-tests" when srcdir != builddir, fix valgrind-tests
2017-02-17 11:38:23 +03:00
Keith Busch 5c1be4f730 Export NVMe WWID udev attribute (#5348)
We need this for multipath support without relying on NVMe to SCSI
translations.

Signed-off-by: Keith Busch <keith.busch@intel.com>
2017-02-17 08:46:06 +01:00
Benjamin Robin 2f8e375d17 virt: Update cache if the detected vm is virtualbox (#5364) 2017-02-17 08:45:30 +01:00