Let's replace usage of fputc_unlocked() and friends by __fsetlocking(f,
FSETLOCKING_BYCALLER). This turns off locking for the entire FILE*,
instead of doing individual per-call decision whether to use normal
calls or _unlocked() calls.
This has various benefits:
1. It's easier to read and easier not to forget
2. It's more comprehensive, as fprintf() and friends are covered too
(as these functions have no _unlocked() counterpart)
3. Philosophically, it's a bit more correct, because it's more a
property of the file handle really whether we ever pass it on to another
thread, not of the operations we then apply to it.
This patch reworks all pieces of codes that so far used fxyz_unlocked()
calls to use __fsetlocking() instead. It also reworks all places that
use open_memstream(), i.e. use stdio FILE* for string manipulations.
Note that this in some way a revert of 4b61c87511.
I opted to completely generate a unit for both mount points and swaps. For
swaps, it would be possible to use fixed template unit like systemd-mkswap@.service,
because there's no information passed except the device name. For mount points,
that's not possible because both the device name and file system type need to
be passed. Nevertheless, I expect that options will need to passed to both mkfs
and mkswap, in which case it'll be necessary to create units of both types
anyway.
Let's always escape strings we receive from the user before writing them
out to unit file settings that suppor specifier expansion, so that user
strings are transported as-is.
As a follow-up for db3f45e2d2 let's do the
same for all other cases where we create a FILE* with local scope and
know that no other threads hence can have access to it.
For most cases this shouldn't change much really, but this should speed
dbus introspection and calender time formatting up a bit.
In case fstab-generator is called in the initrd, chase_symlinks()
returns with a canonical path "/sysroot/sysroot/<mountpoint>", if the
"/sysroot" prefix is present in the path.
This patch skips the "/sysroot" prefix for the chase_symlinks() call,
because "/sysroot" is already the root directory and chase_symlinks()
prepends the root directory in the canonical path returned.
This has a long history; see see 5261ba9018
which originally introduced the behavior. Unfortunately that commit
doesn't include any rationale, but IIRC the basic issue is that
systemd wants to model the real mount state as units, and symlinks
make canonicalization much more difficult.
At the same time, on a RHEL6 system (upstart), one can make e.g. `/home` a
symlink, and things work as well as they always did; but one doesn't have
access to the sophistication of mount units (dependencies, introspection, etc.)
Supporting symlinks here will hence make it easier for people to do upgrades to
RHEL7 and beyond.
The `/home` as symlink case also appears prominently for OSTree; see
https://ostree.readthedocs.io/en/latest/manual/adapting-existing/
Further work has landed in the nspawn case for this; see e.g.
d944dc9553
A basic limitation with doing this in the fstab generator (and that I hit while
doing some testing) is that we obviously can't chase symlinks into mounts,
since the generator runs early before mounts. Or at least - doing so would
require multiple passes over the fstab data (as well as looking at existing
mount units), and potentially doing multi-phase generation. I'm not sure it's
worth doing that without a real world use case. For now, this will fix at least
the OSTree + `/home` <https://bugzilla.redhat.com/show_bug.cgi?id=1382873> case
mentioned above, and in general anyone who for whatever reason has symlinks in
their `/etc/fstab`.
When "bg" is specified for NFS mounts, and if the server is
not accessible, two behaviors are possible depending on networking
details.
If a definitive error is received, such a EHOSTUNREACH or ECONNREFUSED,
mount.nfs will fork and continue in the background, while /bin/mount
will report success.
If no definitive error is reported but the connection times out
instead, then the mount.nfs timeout will normally be longer than the
systemd.mount timeout, so mount.nfs will be killed by systemd.
In the first case the mount has appeared to succeed even though
it hasn't. This can be confusing. Also the background mount.nfs
will never get cleaned up, even if the mount unit is stopped.
In the second case, mount.nfs is killed early and so the mount will
not complete when the server comes back.
Neither of these are ideal.
This patch modifies the options when an NFS bg mount is detected to
force an "fg" mount, but retain the default "retry" time of 10000
minutes that applies to "bg" mounts.
It also imposes "nofail" behaviour and sets the TimeoutSec for the
mount to "infinity" so the retry= time is allowed to complete.
This provides near-identical behaviour to an NFS bg mount started directly
by "mount -a". The only difference is that systemd will not wait for
the first mount attempt, while "mount -a" will.
Fixes#6046
This extends 2d79a0bbb9 to the kernel
command line parsing.
The parsing is changed a bit to only understand "0" as infinity. If units are
specified, parse normally, e.g. "0s" is just 0. This makes it possible to
provide a zero timeout if necessary.
Simple test is added.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1462378.
In case the device field of fstab record is an actual device (not an address)
apply same dependencies to the device unit as to the mount unit, i.e.
> After=network-online.target network.target
> Wants=network-online.targe
It makes sense to start the device expecting job only when network is actually
ready (consider e.g. iSCSI devices) since it is device's implicit dependency.
The eventual implementation should better obtain network flag from udev
database and would also take into account device hierarchy (see [1]).
This patch approximates that by taking the `_netdev` option as a hint from the
user both about the filesystem and underlying device. (For local devices with
network filesystems (e.g. ocfs2), this hint leads to unused dependencies.)
[1] https://lists.freedesktop.org/archives/systemd-devel/2014-October/024718.html
Commit ae3251851 changed the fprintf() format argument into a variable
which triggers a gcc 6.3 warning/error:
src/fstab-generator/fstab-generator.c:243:17: error: format not a string literal,
argument types not checked [-Werror=format-nonliteral]
fprintf(f, format, res);
This is a false positive, as the function is only being called with
constant (not user-definable) arguments which are valid format strings.
Currently fstab entries with 'nofail' option are mounted
asynchronously and there is no way how to specify dependencies
between such fstab entry and another units. It means that
users are forced to write additional dependency units manually.
The patch introduces new systemd fstab options:
x-systemd.before=<PATH>
x-systemd.after=<PATH>
- to specify another mount dependency (PATH is translated to unit name)
x-systemd.before=<UNIT>
x-systemd.after=<UNIT>
- to specify arbitrary UNIT dependency
For example mount where A should be mounted before local-fs.target unit:
/dev/sdb1 /mnt/test/A none nofail,x-systemd.before=local-fs.target
This adds a generator and a small service that will look for "roothash="
on the kernel command line and use it for setting up a very partition
for the root device.
This provides similar functionality to nspawn's existing --roothash=
switch.
This adds support for a new kernel command line option "systemd.volatile=" that
provides the same functionality that systemd-nspawn's --volatile= switch
provides, but for host systems (i.e. systems booting with a kernel).
It takes the same parameter and has the same effect.
In order to implement systemd.volatile=yes a new service
systemd-volatile-root.service is introduced that only runs in the initrd and
rearranges the root directory as needed to become a tmpfs instance. Note that
systemd.volatile=state is implemented different: it simply generates a
var.mount unit file that is part of the normal boot and has no effect on the
initrd execution.
The way this is implemented ensures that other explicit configuration for /var
can always override the effect of these options. Specifically, the var.mount
unit is generated in the "late" generator directory, so that it only is in
effect if nothing else overrides it.
This improves kernel command line parsing in a number of ways:
a) An kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar-xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_". With
this change, which was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's systemd.debug-shell,
not systemd.debug_shell). Considering both as equivalent is inspired how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"systemd.unit" will no result in a complain that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
This adds a new systemd fstab option x-systemd.mount-timeout. The option
adds a timeout value that specifies how long systemd waits for the mount
command to finish. It allows to mount huge btrfs volumes without issues.
This is equivalent to adding option TimeoutSec= to [Mount] section in a
mount unit file.
fixes#4055
This stripping is contolled by a new boolean parameter. When the parameter
is true, it means that the caller does not care about the distinction between
initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed
parameters in the initramfs, and only on the unprefixed parameters in real
root. If the parameter is false, behaviour is the same as before.
Changes by caller:
log.c (systemd.log_*): changed to accept rd-dot-prefix params
pid1: no change, custom logic
cryptsetup-generator: no change, still accepts rd-dot-prefix params
debug-generator: no change, does not accept rd-dot-prefix params
fsck: changed to accept rd-dot-prefix params
fstab-generator: no change, custom logic
gpt-auto-generator: no change, custom logic
hibernate-resume-generator: no change, does not accept rd-dot-prefix params
journald: changed to accept rd-dot-prefix params
modules-load: no change, still accepts rd-dot-prefix params
quote-check: no change, does not accept rd-dot-prefix params
udevd: no change, still accepts rd-dot-prefix params
I added support for "rd." params in the three cases where I think it's
useful: logging, fsck options, journald forwarding options.
Let's follow the same logic for all mounts here: log errors, and exit the
process uncleanly ultimately, but do not skip further mounts if we encounter a
problem with an earlier one.
Fixes: #2344
We have to check for OOM here, let's add that. There's really no point in
checking for path_is_absolute() on the result however, as there's no particular
reason why that should be refused. Also, we don't have similar checks for the
other mount devices the generator deals with, hence don't bother with it here
either. Let's remove that check.
(And it shouldn't return made-up errors like "-1" in this case anyway.)
root=/dev/nfs is a legacy option for the kernel to handle root on NFS.
Documentation for this kernel command line option
can be found in the kernel source tree:
Documentation/filesystems/nfs/nfsroot.txt
Add a synchronization point so that custom initramfs units can run
after the root device becomes available, before it is fsck'd and
mounted.
This is useful for custom initramfs units that may modify the
root disk partition table, where the root device is not known in
advance (it's dynamically selected by the generators).
Without this patch applied the mount unit with 'automount' option was still
pulled by local-fs.target and thus was activated during the boot process which
defeats the purpose of the 'automount' option:
$ grep /mnt /etc/fstab
/dev/vdb1 /mnt ext2 defaults,x-systemd.automount 0 0
$ reboot
...
$ mount | grep mnt
systemd-1 on /mnt type autofs (rw,relatime,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
/dev/vdb1 on /mnt type ext2 (rw,relatime)
$ systemctl status mnt.mount | grep Active
Active: active (mounted) since Thu 2016-03-03 21:36:22 CET; 42s ago
With the patch applied:
$ reboot
...
$ mount | grep mnt
systemd-1 on /mnt type autofs (rw,relatime,fd=22,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
$ systemctl status mnt.mount | grep Active
Active: inactive (dead)
$ ls /mnt
lost+found
$ systemctl status mnt.mount | grep Active
Active: active (mounted) since Thu 2016-03-03 21:47:32 CET; 4s ago
According to bootup(7) and the behavior when /usr is specified in /etc/fstab, the /sysroot/usr mount should be before initrd-fs.target, not before initrd-root-fs.target.
The sysroot mount is already taken care of by the add_sysroot_mount function. With this condition left in, we can get something like this:
initrd-root-fs.target.requires
`-- usr.mount -> /run/systemd/generator/usr.mount
in the main system (i.e., not in the initramfs). In the initramfs, the previous condition already kicks in.
There are more than enough calls doing string manipulations to deserve
its own files, hence do something about it.
This patch also sorts the #include blocks of all files that needed to be
updated, according to the sorting suggestions from CODING_STYLE. Since
pretty much every file needs our string manipulation functions this
effectively means that most files have sorted #include blocks now.
Also touches a few unrelated include files.
Introduce a proper enum, and don't pass around string ids anymore. This
simplifies things quite a bit, and makes virtualization detection more
similar to architecture detection.
This patch simplify swapon usage in systemd. The command swapon(8)
since util-linux v2.26 supports "-o <list>". The idea is exactly the
same like for mount(8). The -o specifies options in fstab-compatible
way. For systemd it means that it does not have to care about things
like "discard" or another swapon specific options.
swapon -o <options-from-fstab>
For backward compatibility the code cares about "Priority:" swap unit
field (for a case when Priority: is set, but pri= in the Options: is
missing).
References: http://lists.freedesktop.org/archives/systemd-devel/2014-October/023576.html
Currently we have no way how to specify dependencies between fstab
entries (or another units) in the /etc/fstab. It means that users are
forced to bypass fstab and write .mount units manually.
The patch introduces new systemd fstab options:
x-systemd.requires=<PATH>
- to specify dependence an another mount (PATH is translated to unit name)
x-systemd.requires=<UNIT>
- to specify dependence on arbitrary UNIT
x-systemd.requires-mounts-for=<PATH ...>
- to specify dependence on another paths, implemented by
RequiresMountsFor=. The option may be specified more than once.
For example two bind mounts where B depends on A:
/mnt/test/A /mnt/test/A none bind,defaults
/mnt/test/A /mnt/test/B none bind,x-systemd.requires=/mnt/test/A
More complex example with overlay FS where one mount point depends on
"low" and "upper" directories:
/dev/sdc1 /mnt/low ext4 defaults
/dev/sdc2 /mnt/high ext4 defaults
overlay /mnt/merged overlay lowerdir=/mnt/low,upperdir=/mnt/high/data,workdir=/mnt/high/work,x-systemd.requires-mounts-for=/mnt/low,x-systemd.requires-mounts-for=mnt/high
https://bugzilla.redhat.com/show_bug.cgi?id=812826https://bugzilla.redhat.com/show_bug.cgi?id=1164334
A variety of changes:
- Make sure all our calls distuingish OOM from other errors if OOM is
not the only error possible.
- Be much stricter when parsing escaped paths, do not accept trailing or
leading escaped slashes.
- Change unit validation to take a bit mask for allowing plain names,
instance names or template names or an combination thereof.
- Refuse manipulating invalid unit name
This makes it obvious that those functions are only usable in the
initramfs.
Also, add a warning when noauto, nofail, or automount is used for the
root fs, instead of silently ignoring. Using those options would be a
sign of significant misconfiguration, and if we bother to check for
them, than let's go all the way and complain.
Other various small cleanups and reformattings elsewhere.
filtered was used to store an allocated string twice. The first allocation was
thus lost. The string is not needed for anything, so simply skip the allocation.
Fixup for deb0a77cf0.
And other non-device entries (like fstab does).
Mount whatever the user asked to be mounted on / on the kernel
command line. Do less sanity check and do *not* bail out
when the mount device looks strange or does not exist.
This basically makes the changes for deviceless filesystems
from yesterday unnecessary and is in line with what we do for
filesystems set up in fstab.
Remove some code that is now dead (reverting fb02a2775a and
b0438462).
[tomegun:
- change patch title/description a bit.
- don't touch the /usr logic, that would be a separate change and
we don't currently have a convincing use-case for that.
- don't bail out on /sys ro. This only makes sense in containers,
where we would not be doing this anyway. If there is a use-case
we could consider that as a separate patch.]
A failed priority is not something worth stopping boot over. Most people
have only one swap device, in which case priority is irrelevant, and even
if there is more than one swap device, they are all usable, and ignoring the
priority field should only result in some loss of performance.
The kernel will report the priority as -1 if not set, so it's easy for
people to make this mistake.
https://bugzilla.redhat.com/show_bug.cgi?id=1204336
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
We would ignore options like "fail" and "auto", and for any option
which takes a value the first assignment would win. Repeated and
options equivalent to the default are rarely used, but they have been
documented forever, and people might use them. Especially on the
kernel command line it is easier to append a repeated or negated
option at the end.
This fixes parsing of options in shared/generator.c. Existing code
had some issues:
- it would treate whitespace and semicolons as seperators. fstab(5)
is pretty clear that only commas matter. And the syntax does
not allow for spaces to be inserted in the field in fstab.
Whitespace might be escaped, but then it should not seperate
options. Treat whitespace and semicolons as any other character.
- it assumed that x-systemd.device-timeout would always be followed
by "=". But this is not guaranteed, hasmntopt will return this
option even if there's no value. Uninitialized memory could be read.
- some error paths would log, and inconsistently, some would just
return an error code.
Filtering is split out to a separate function and tests are added.
Similar code paths in other places are adjusted to use the new function.
We always should use the same checks when deciding whether swap support
and mounting of devices is supported. Hence, let's make
fstab-generator's logic more similar to the usual logic we follow:
a) Look for /proc/swaps and no container support before activating
swaps.
b) Look for /sys being writable befire supporting device mounts.
There is no need to require mount.usrflags. The original implementation
assumed that a btrfs subvolume would always be needed but that is not
applicable to systems that do not use btrfs for /usr.
Similar to using rootflags= for the default of mount.usrflags=, append
the classic 'ro' and 'rw' flags to the mount options.
If the format string contains %m, clearly errno must have a meaningful
value, so we might as well use log_*_errno to have ERRNO= logged.
Using:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\((".*%m.*")/log_\1_errno(errno, \2/'
Plus some whitespace, linewrap, and indent adjustments.
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
systemd stops adding automatic dependencies on swap.target to swap
units. If a dependency is required, it has to be added by unit
configuration. fstab-generator did that already, except that now it is
modified to create a Requires or Wants type dependency, depending on
whether nofail is specified in /etc/fstab. This makes .swap units
obey the nofail/noauto options more or less the same as .mount units.
Documentation is extended to clarify that, and to make
systemd.mount(5) and system.swap(5) more similar. The gist is not
changed, because current behaviour actually matches existing
documentation.
https://bugs.freedesktop.org/show_bug.cgi?id=86488
For now, it's systemd itself that parses the options string, but as soon
as util-linux' swapon can take the option string directly with -o we
should pass it on unmodified.
Previous code would only return correct results when discard
was the last option.
While at it, avoid incorrect behaviour for (invalid) 'pri' option
not followed by '=...', and also do not return -1 as the error code.
Instead of adjusting job timeouts in the core, let fstab-generator
write out a dropin snippet with the appropriate JobTimeout.
x-systemd-device.timeout option is removed from Options= line
in the generated unit.
The functions to write dropins are moved from core/unit.c to
shared/dropin.c, to make them available outside of core.
generator.c is moved to libsystemd-label, because it now uses
functions defined in dropin.c, which are in libsystemd-label.
Since these device nodes will never appear in the container anyway
there's no point in waiting for them.
This makes it easier to boot images generated with general purpose
installers like Anaconda which unconditionally populate /etc/fstab to
boot in containers.
- Add support for finding and mounting /srv based on GPT data, similar
to how we already handly /home.
- Share the fsck logic between GPT, EFI and fstab generators
- Make sure we never run the EFI generator inside containers
- Drop DefaultDependencies=no from EFI mount units
- Other fixes