This just sets up some variables the loaded configuration will then
modify. Let's invoke it hence right before loading the configuration.
This moves the initialization just a tiny bit later, but that shouldn't
matter, since we never access it in-between.
Let's make sure that if we are PID 1 we are invoked in ACTION_RUN mode,
and in arg_system mode, as well as the opposite.
Everything else is untested and probably not worth supporting hence
let's bail out early if people try anyway.
Let's shorten main() a bit, and split out everything that loads our
configuration and runtime parameters into a function of its own.
No changes in behaviour.
Dec 14 14:10:54 krowka systemd[1]: System is tainted: overflowgid-not-65534
-- Subject: The system is configured in a way that might cause problems
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The following "tags" are possible:
-- - "split-usr" — /usr is a separate file system and was not mounted when systemd
-- was booted
-- - "cgroups-missing" — the kernel was compiled without cgroup support or access
-- to expected interface files is resticted
-- - "var-run-bad" — /var/run is not a symlink to /run
-- - "overflowuid-not-65534" — the kernel user ID used for "unknown" users (with
-- NFS or user namespaces) is not 65534
-- - "overflowgid-not-65534" — the kernel group ID used for "unknown" users (with
-- NFS or user namespaces) is not 65534
-- Current system is tagged as overflowgid-not-65534.
Our CODING_STYLE suggests not comparing with NULL, but relying on C's
downgrade-to-bool feature for that. Fix up some code to match these
guidelines. (This is not comprehensive, the coccinelle output for this
is unfortunately kinda borked)
This option allows a device path to be specified for the systemd
watchdog (both runtime and shutdown).
If a system requires a watchdog other than /dev/watchdog (pointing to
/dev/watchdog0) to be used to reboot the system, this setting should be
changed to the relevant watchdog device path (e.g. /dev/watchdog1).
The tainting logic existed for a long time, but was hidden inside the
bus interfaces. Let's give it a small bit more coverage, by logging its
value early at boot during initialization.
This moves the invocation a bit later, but that shoudln't matter. By
moving it we gain two things: first of all, its closer to other code
where it belongs, secondly its naturally conditioned properly, as we no
longer will rewrite the container ID file on every reexecution again,
and not in test mode either.
No real functional changes, just some rearranging to shorten the overly
long main() function a bit.
This gets rid of the arm_reboot_watchdog variable, as it can be directly
derived from shutdown_verb, and we need it only one time. By dropping it
we can reduce the number of arguments we need to pass around.
read_one_line_file() always returns <= 0, so the code was OK, but let's write
the check a bit differently to make it obvious that min_max is always set.
This makes things quite a bit more systematic I think, as we can
systematically operate on all timestamps, for example for the purpose of
serialization/deserialization.
This rework doesn't necessarily make things shorter in the individual
lines, but it does reduce the line count a bit.
(This is useful particularly when we want to add additional timestamps,
for example to solve #7023)
Presets are useful to initialize uninitialized /etc, but that doesn't
apply to the initrd.
Also, let's rename etc_empty → first_boot. After all, the variable
doesn't actually reflect whether /etc is really empty, it just reflects
whether /etc/machine-id existed originally or not. Moreover, we later on
directly initialize manager_set_first_boot() from it, hence let's just
name it the same way all through the codepath, to make this all less
confusing.
See: #7100
This function is really not a method of the Manager object (implemented
in manager.c), but just a helper in main.c. Hence let's not confusingly
name it the way methods are called.
These three settings only make sense within the context of actual unit
files, hence filter this out when applied to the per-manager default,
and generate a log message about it.
We neglected to set error_message for errors which occur _after_ the
`finish` label. These fatal errors only happen in paths where `finish`
was reached successfully, i.e. error_message has not already been set
(and this analysis is simple enough that this need not cause too much
headaches. Also our new assignments to error_message come immediately
after execve() calls, which would have lost the error_message if it had
been set).
Also print a status message when we fail to exec init, otherwise the only
sign the user will see is `# ` :).
This addresses the lack of error messages pointed out in issue #6827.
The advantage is that is the name is mispellt, cpp will warn us.
$ git grep -Ee "conf.set\('(HAVE|ENABLE)_" -l|xargs sed -r -i "s/conf.set\('(HAVE|ENABLE)_/conf.set10('\1_/"
$ git grep -Ee '#ifn?def (HAVE|ENABLE)' -l|xargs sed -r -i 's/#ifdef (HAVE|ENABLE)/#if \1/; s/#ifndef (HAVE|ENABLE)/#if ! \1/;'
$ git grep -Ee 'if.*defined\(HAVE' -l|xargs sed -i -r 's/defined\((HAVE_[A-Z0-9_]*)\)/\1/g'
$ git grep -Ee 'if.*defined\(ENABLE' -l|xargs sed -i -r 's/defined\((ENABLE_[A-Z0-9_]*)\)/\1/g'
+ manual changes to meson.build
squash! build-sys: use #if Y instead of #ifdef Y everywhere
v2:
- fix incorrect setting of HAVE_LIBIDN2
On current kernels BPF_MAP_TYPE_LPM_TRIE bpf maps are charged against
RLIMIT_MEMLOCK even for privileged users that have CAP_IPC_LOCK. Given
that mlock() generally ignores RLIMIT_MEMLOCK if CAP_IPC_LOCK is set
this appears to be an oversight in the kernel. Either way, until that's
fixed, let's just bump RLIMIT_MEMLOCK for the root user considerably, as
the default is quite limiting, and doesn't permit us to create more than
a few TRIE maps.
Now generators are only run in systemd --test mode, where this makes
most sense (how are you going to test what would happen otherwise?).
Fixes#6842.
v2:
- rename test_run to test_run_flags
Most importantly, don't collect open socket activation fds when in
--test mode. This specifically created a problem because we invoke
pager_open() beforehand (which these days makes copies of the original
stdout/stderr in order to be able to restore them when the pager goes
away) and we might mistakenly the fd copies it creates as socket
activation fds.
Fixes: #6383
This commit moves the first-boot system preset-settings evaluation out
of main and into the manager startup logic itself. Notably, it reverses
the order between generators and presets evaluation, so that any changes
performed by first-boot generators are taken into the account by presets
logic.
After this change, units created by a generator can be enabled as part
of a preset.
Some kdbus_flag and memfd related parts are left behind, because they
are entangled with the "legacy" dbus support.
test-bus-benchmark is switched to "manual". It was already broken before
(in the non-kdbus mode) but apparently nobody noticed. Hopefully it can
be fixed later.
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
The CapabilityBoundingSet option only makes sense if we are running as
PID1.
The system.conf.d(5) manpage, already states that the CapabilityBoundingSet
option:
Controls which capabilities to include in the capability bounding set
for PID 1 and its children.
https://github.com/systemd/systemd/issues/6080
All those uses were correct, but I think it's better to be explicit.
Using implicit errno is too error prone, and with this change we can require
(in the sense of a style guideline) that the code is always specified.
Helpful query: git grep -n -P 'log_[^s][a-z]+\(.*%m'
This has systemd look at /proc/sys/fs/nr_open to find the current maximum of
open files compiled into the kernel and tries to set the RLIMIT_NOFILE max to
it. This has the advantage the value chosen as limit is less arbitrary and also
improves the behavior of systemd in containers that have an rlimit set: When
systemd currently starts in a container that has RLIMIT_NOFILE set to e.g.
100000 systemd will lower it to 65536. With this patch systemd will try to set
the nofile limit to the allowed kernel maximum. If this fails, it will compute
the minimum of the current set value (the limit that is set on the container)
and the maximum value as soft limit and the currently set maximum value as the
maximum value. This way it retains the limit set on the container.
This substantially reworks the seccomp code, to ensure better
compatibility with some architectures, including i386.
So far we relied on libseccomp's internal handling of the multiple
syscall ABIs supported on Linux. This is problematic however, as it does
not define clear semantics if an ABI is not able to support specific
seccomp rules we install.
This rework hence changes a couple of things:
- We no longer use seccomp_rule_add(), but only
seccomp_rule_add_exact(), and fail the installation of a filter if the
architecture doesn't support it.
- We no longer rely on adding multiple syscall architectures to a single filter,
but instead install a separate filter for each syscall architecture
supported. This way, we can install a strict filter for x86-64, while
permitting a less strict filter for i386.
- All high-level filter additions are now moved from execute.c to
seccomp-util.c, so that we can test them independently of the service
execution logic.
- Tests have been added for all types of our seccomp filters.
- SystemCallFilters= and SystemCallArchitectures= are now implemented in
independent filters and installation logic, as they semantically are
very much independent of each other.
Fixes: #4575
This improves kernel command line parsing in a number of ways:
a) An kernel option "foo_bar=xyz" is now considered equivalent to
"foo-bar-xyz", i.e. when comparing kernel command line option names "-" and
"_" are now considered equivalent (this only applies to the option names
though, not the option values!). Most of our kernel options used "-" as word
separator in kernel command line options so far, but some used "_". With
this change, which was a source of confusion for users (well, at least of
one user: myself, I just couldn't remember that it's systemd.debug-shell,
not systemd.debug_shell). Considering both as equivalent is inspired how
modern kernel module loading normalizes all kernel module names to use
underscores now too.
b) All options previously using a dash for separating words in kernel command
line options now use an underscore instead, in all documentation and in
code. Since a) has been implemented this should not create any compatibility
problems, but normalizes our documentation and our code.
c) All kernel command line options which take booleans (or are boolean-like)
have been reworked so that "foobar" (without argument) is now equivalent to
"foobar=1" (but not "foobar=0"), thus normalizing the handling of our
boolean arguments. Specifically this means systemd.debug-shell and
systemd_debug_shell=1 are now entirely equivalent.
d) All kernel command line options which take an argument, and where no
argument is specified will now result in a log message. e.g. passing just
"systemd.unit" will no result in a complain that it needs an argument. This
is implemented in the proc_cmdline_missing_value() function.
e) There's now a call proc_cmdline_get_bool() similar to proc_cmdline_get_key()
that parses booleans (following the logic explained in c).
f) The proc_cmdline_parse() call's boolean argument has been replaced by a new
flags argument that takes a common set of bits with proc_cmdline_get_key().
g) All kernel command line APIs now begin with the same "proc_cmdline_" prefix.
h) There are now tests for much of this. Yay!
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).
This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
This stripping is contolled by a new boolean parameter. When the parameter
is true, it means that the caller does not care about the distinction between
initrd and real root, and wants to act on both rd-dot-prefixed and unprefixed
parameters in the initramfs, and only on the unprefixed parameters in real
root. If the parameter is false, behaviour is the same as before.
Changes by caller:
log.c (systemd.log_*): changed to accept rd-dot-prefix params
pid1: no change, custom logic
cryptsetup-generator: no change, still accepts rd-dot-prefix params
debug-generator: no change, does not accept rd-dot-prefix params
fsck: changed to accept rd-dot-prefix params
fstab-generator: no change, custom logic
gpt-auto-generator: no change, custom logic
hibernate-resume-generator: no change, does not accept rd-dot-prefix params
journald: changed to accept rd-dot-prefix params
modules-load: no change, still accepts rd-dot-prefix params
quote-check: no change, does not accept rd-dot-prefix params
udevd: no change, still accepts rd-dot-prefix params
I added support for "rd." params in the three cases where I think it's
useful: logging, fsck options, journald forwarding options.
Since we ignore the result anyway, downgrade errors to warning.
log_oom() will still emit an error, but that's mostly theoretical, so it
is not worth complicating the code to avoid the small inconsistency
If `--test` command line option was passed, the systemd set skip_setup
to true during bootup. But after this we check again that arg_action is
test or help and opens pager depends on result.
We should skip setup in a case when `--test` is passed, but it is also
safe to set skip_setup in a case of `--help`. So let's remove first
check and move skip_setup = true to the second check.
When we print information about PID 1's crashdump subprocess failing. In this
case we *know* that we do not generate LSB exit codes, as it's basically PID 1
itself that exited there.
Previously we've used free_and_strdup() to fill arg_default_unit with unit
name, If we didn't pass default unit name through a kernel command line or
command line arguments. But we can use just strdup() instead of
free_and_strdup() for this, because we will start fill arg_default_unit
only if it wasn't set before.
systemd fills arg_default_unit during startup with default.target
value. But arg_default_unit may be overwritten in parse_argv() or
parse_proc_cmdline_item().
Let's check value of arg_default_unit after calls of parse_argv()
and parse_proc_cmdline_item() and fill it with default.target if
it wasn't filled before. In this way we will not spend unnecessary
time to for filling arg_default_unit with default.target.
For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough.
Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
The parsing functions for [User]TasksMax were inconsistent. Empty string and
"infinity" were interpreted as no limit for TasksMax but not accepted for
UserTasksMax. Update them so that they're consistent with other knobs.
* Empty string indicates the default value.
* "infinity" indicates no limit.
While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax
handling.
v2: Update empty string to indicate the default value as suggested by Zbigniew
Jędrzejewski-Szmek.
v3: Fixed empty UserTasksMax handling.
This way, invoking nspawn from a shell in the best case inherits the TERM
setting all the way down into the login shell spawned in the container.
Fixes: #3697
IMA wiki says: "If the IMA policy contains LSM labels, then the LSM
policy must be loaded prior to the IMA policy." Right now, in case of
Smack, the IMA policy is loaded before the Smack policy. Move the order
around to allow Smack labels to be used in IMA policy.
the ACTION_DONE was introduced in the 4288f61921 (dbus: automatically
generate and install introspection files ) commit and was used in
systemd --introspect command.
Later 'introspect' command was removed in the ca2871d9b (bus: remove
static introspection file export) commit and have no users anymore.
So we can remove it.
As it turns out 512 is max number of tasks per service is hit by too many
applications, hence let's bump it a bit, and make it relative to the system's
maximum number of PIDs. With this change the new default is 15%. At the
kernel's default pids_max value of 32768 this translates to 4915. At machined's
default TasksMax= setting of 16384 this translates to 2457.
Why 15%? Because it sounds like a round number and is close enough to 4096
which I was going for, i.e. an eight-fold increase over the old 512
Summary:
| on the host | in a container
old default | 512 | 512
new default | 4915 | 2457
When systemd is started by the kernel, the kernel set the TERM
environment variable unconditionnally to "linux" no matter the console
device used. This might be an issue for dumb devices with no colors
support.
This patch uses default_term_for_tty() for getting a more accurate
value. But it makes sure to keep the user preferences (if any) which
might be passed via the kernel command line. For that purpose /proc
should be mounted.
The current raw_clone function takes two arguments, the cloning flags and
a pointer to the stack for the cloned child. The raw cloning without
passing a "thread main" function does not make sense if a new stack is
specified, as it returns in both the parent and the child, which will fail
in the child as the stack is virgin. All uses of raw_clone indeed pass NULL
for the stack pointer which indicates that both processes should share the
stack address (so you better don't pass CLONE_VM).
This commit refactors the code to not require the caller to pass the stack
address, as NULL is the only sensible option. It also adds the magic code
needed to make raw_clone work on sparc64, which does not return 0 in %o0
for the child, but indicates the child process by setting %o1 to non-zero.
This refactoring is not plain aesthetic, because non-NULL stack addresses
need to get mangled before being passed to the clone syscall (you have to
apply STACK_BIAS), whereas NULL must not be mangled. Implementing the
conditional mangling of the stack address would needlessly complicate the
code.
raw_clone is moved to a separete header, because the burden of including
the assert machinery and sched.h shouldn't be applied to every user of
missing_syscalls.h
On the unified hierarchy, blkio controller is renamed to io and the interface
is changed significantly.
* blkio.weight and blkio.weight_device are consolidated into io.weight which
uses the standardized weight range [1, 10000] with 100 as the default value.
* blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max.
Expansion of throttling features is being worked on to support
work-conserving absolute limits (io.low and io.high).
* All stats are consolidated into io.stats.
This patchset adds support for the new interface. As the interface has been
revamped and new features are expected to be added, it seems best to treat it
as a separate controller rather than trying to expand the blkio settings
although we might add automatic translation if only blkio settings are
specified.
* io.weight handling is mostly identical to blkio.weight[_device] handling
except that the weight range is different.
* Both read and write bandwidth settings are consolidated into
CGroupIODeviceLimit which describes all limits applicable to the device.
This makes it less painful to add new limits.
* "max" can be used to specify the maximum limit which is equivalent to no
config for max limits and treated as such. If a given CGroupIODeviceLimit
doesn't contain any non-default configs, the config struct is discarded once
the no limit config is applied to cgroup.
* lookup_blkio_device() is renamed to lookup_block_device().
Signed-off-by: Tejun Heo <htejun@fb.com>
We generally follow the rule that for time settings we suffix the setting name
with "Sec" to indicate the default unit if none is specified. The only
exception was the rate limiting interval settings. Fix this, and keep the old
names for compatibility.
Do the same for journald's RateLimitInterval= setting
This adds two new settings TriggerLimitIntervalSec= and TriggerLimitBurst= that
define a rate limit for activation of socket units. When the limit is hit, the
socket is is put into a failure mode. This is an alternative fix for #2467,
since the original fix resulted in issue #2684.
In a later commit the StartLimitInterval=/StartLimitBurst= rate limiter will be
changed to be applied after any start conditions checks are made. This way,
there are two separate rate limiters enforced: one at triggering time, before
any jobs are queued with this patch, as well as the start limit that is moved
again to be run immediately before the unit is activated. Condition checks are
done in between the two, and thus no longer affect the start limit.
This changes the behaviour of pid1 in the following ways:
- obviously $TERM is now checked,
- $SYSTEMD_COLORS is now honoured too, before only SYSTEMD_LOG_COLORS was checked,
- isatty() is run on stdout not stderr.
As requested in #3025.
Previously, we had two enums ManagerRunningAs and UnitFileScope, that were
mostly identical and converted from one to the other all the time. The latter
had one more value UNIT_FILE_GLOBAL however.
Let's simplify things, and remove ManagerRunningAs and replace it by
UnitFileScope everywhere, thus making the translation unnecessary. Introduce
two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking
if we are running in one or the user context.
Add path argument to clock_is_localtime() and default to "/etc/adjtime" if it's
NULL. This makes the function testable.
Add test-clock: initial test cases for some scenarios, using a temporary file.
This also checks the behaviour with a NULL (i. e. the system's /etc/adjtime)
file.
Many subsystems define own pager_open_if_enabled() function which
checks '--no-pager' command line argument and open pager depends
on its value. All implementations of pager_open_if_enabled() are
the same. Let's merger this function with pager_open() from the
shared/pager.c and remove pager_open_if_enabled() from all subsytems
to prevent code duplication.
Throughout the tree there's spurious use of spaces separating ++ and --
operators from their respective operands. Make ++ and -- operator
consistent with the majority of existing uses; discard the spaces.
If we are not PID 1 and started as init, we executing systemctl
with execv(). Here no need to set errno manually, because in a
failure case, because the execv() anyway will set errno depends
on a error.
The kernel sets RLIMIT_CORE to 0 by default. Let's bump this to unlimited by
default (for systemd itself and all processes we fork off), so that the
coredump hooks have an effect if they honour it.
Bumping RLIMIT_CORE of course would have the effect that "core" files will end
up on the system at various places, if no coredump hook is used. To avoid this,
make sure PID1 sets the core pattern to the empty string by default, so that
this logic is disabled.
This change in defaults should be useful for all systems where coredump hooks
are used, as it allows useful usage of RLIMIT_CORE from these hooks again. OTOH
systems that expect that coredumps are placed under the name "core" in the
current directory will break with this change. Given how questionnable this
behaviour is, and given that no common distro makes use of this by default it
shouldn't be too much of a loss. Also, the old behaviour may be restored by
explicitly configuring a "core_pattern" of "core", and setting the default
system RLIMIT_CORE to 0 again via system.conf.
For use in timesyncd we already defined a compile-time "epoch" value, which is based on the mtime of the NEWS file, and
specifies a point in time we know lies in the past at runtime. timesyncd uses this to filter out nonsensical timestamp
file data, and bump the system clock to a time that is after the build time of systemd. This patch adds similar bumping
code to earliest PID 1 initialization, so that the system never continues operation with a clock that is in the 1970ies
or even 1930s.
This clean-ups timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as
indication for a disabled timeout, use USEC_INFINITY which is in-line with how we do this in the rest of our code
(following the logic that 0 means "no", and USEC_INFINITY means "never").
This also replace all usec_t additions with invocations to usec_add(), so that USEC_INFINITY is properly propagated,
and sd-event considers it has indication for turning off the event source.
This also alters the deserialization of the units to restart timeouts from the time they were originally started from.
Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to
artificially prolonged timeouts if a daemon reload took place.
Finally, a new RuntimeMaxSec= setting is introduced for service units, that specifies a maximum runtime after which a
specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs.
This also simplifies the various xyz_spawn() calls of the various types in that explicit distruction of the timers is
removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn()
calls fail.
Fixes: #2249
Allow for overriding all other machine-ids which may be present on
the system using a kernel command line systemd.machine_id or
--machine-id= option.
This is especially useful for network booted systems where the
machine-id needs to be static, or for containers where a specific
machine-id is wanted.
Change the capability bounding set parser and logic so that the bounding
set is kept as a positive set internally. This means that the set
reflects those capabilities that we want to keep instead of drop.
GLIB has recently started to officially support the gcc cleanup
attribute in its public API, hence let's do the same for our APIs.
With this patch we'll define an xyz_unrefp() call for each public
xyz_unref() call, to make it easy to use inside a
__attribute__((cleanup())) expression. Then, all code is ported over to
make use of this.
The new calls are also documented in the man pages, with examples how to
use them (well, I only added docs where the _unref() call itself already
had docs, and the examples, only cover sd_bus_unrefp() and
sd_event_unrefp()).
This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we
tend to call our destructors these days.
Note that this defines no public macro that wraps gcc's attribute and
makes it easier to use. While I think it's our duty in the library to
make our stuff easy to use, I figure it's not our duty to make gcc's own
features easy to use on its own. Most likely, client code which wants to
make use of this should define its own:
#define _cleanup_(function) __attribute__((cleanup(function)))
Or similar, to make the gcc feature easier to use.
Making this logic public has the benefit that we can remove three header
files whose only purpose was to define these functions internally.
See #2008.
Now that we don't have RequiresOverridable= and RequisiteOverridable=
dependencies anymore, we can get rid of tracking the "override" boolean
for jobs in the job engine, as it serves no purpose anymore.
While we are at it, fix some error messages we print when invoking
functions that take the override parameter.
The macro is generically useful for putting together search paths, hence
let's make it truly generic, by dropping the implicit ".d" appending it
does, and leave that to the caller. Also rename it from
CONF_DIRS_NULSTR() to CONF_PATHS_NULSTR(), since it's not strictly about
dirs that way, but any kind of file system path.
Also, mark CONF_DIR_SPLIT_USR() as internal macro by renaming it to
_CONF_PATHS_SPLIT_USR() so that the leading underscore indicates that
it's internal.
Let's make things more user-friendly and support for example
LimitAS=16G
rather than force users to always use LimitAS=16106127360.
The change is relevant for options:
[Default]Limit{FSIZE,DATA,STACK,CORE,RSS,AS,MEMLOCK,MSGQUEUE}
The patch introduces config_parse_bytes_limit(), it's the same as
config_parse_limit() but uses parse_size() tu support the suffixes.
Addresses: https://github.com/systemd/systemd/issues/1772
util-linux 2.27.1 now entirely stops looking at /etc/mtab, so we don't need to
verify /etc/mtab during early boot any more. Later on, tmpfiles.d/etc.conf will
fix /etc/mtab anyway, so there's not even a point in warning about it.
Drop test_mtab() and bump the util-linux dependency to >= 2.17.1.
Fixes#1495
The files are named too generically, so that they might conflict with
the upstream project headers. Hence, let's add a "-util" suffix, to
clarify that this are just our utility headers and not any official
upstream headers.
Instead of freezing in PID1 and letting the forked child freeze or
reboot when exec("/bin/sh") fails, just wait for the child's
exit and then do the freeze_or_reboot in PID1 as usual.
This means that when both crash_shell and crash_reboot are enabled, the
system will reboot after the shell exits.
This adds support for caching harddisk passwords in the kernel keyring
if it is available, thus supporting caching without Plymouth being
around.
This is also useful for hooking up "gdm-auto-login" with the collected
boot-time harddisk password, in order to support gnome keyring
passphrase unlocking via the HDD password, if it is the same.
Any passwords added to the kernel keyring this way have a timeout of
2.5min at which time they are purged from the kernel.
- Rely everywhere that we use abs() on the error code passed in anyway,
thus don't need to explicitly negate what we pass in
- Never attach synthetic error number information to log messages. Only
log about errors we *receive* with the error number we got there,
don't log any synthetic error, that don#t even propagate, but just eat
up.
- Be more careful with attaching exactly the error we get, instead of
errno or unrelated errors randomly.
- Fix one occasion where the error number and line number got swapped.
- Make sure we never tape over OOM issues, or inability to resolve
specifiers
This introduces a new systemd.crash_reboot=1 kernel command line option
that triggers a reboot after crashing.
This also cleans up crash VT handling. Specifically, it cleans up the
configuration setting, to be between 1..63 or a boolean. This is to
replace the previous logic where "-1" meant disabled. We continue to
accept that setting, but only document the boolean syntax instead.
This also brings the documentation of the default settings in sync with
what actually happens.
The CrashChVT= configuration file setting is renamed to CrashChangeVT=,
following our usual logic of not abbreviating unnecessarily. The old
setting stays support for compat reasons.
Fixes#1300
This also allows us to drop build.h from a ton of files, hence do so.
Since we touched the #includes of those files, let's order them properly
according to CODING_STYLE.
Add (void) casting for a couple of functions where we knowingly ignore
the returning error code.
Use EXIT_FAILURE where appropriate.
Try to initialize structures at declaration time, or at once.
Use the new code in config_parse_cpu_affinity2.
Tested by modifying CPUAffinity=... setting in /etc/systemd/system.conf
and reloading the daemon, then checking ^Cpus_allowed in /proc/1/status
to confirm the correct CPU mask is in place.
Shutting down a user session currently fails with:
Sep 22 22:35:38 david-t2 systemd[640]: Reached target Shutdown.
Sep 22 22:35:38 david-t2 systemd[640]: Starting Exit the Session...
Sep 22 22:35:38 david-t2 systemd[640]: Received SIGRTMIN+24 from PID 659 (kill).
Sep 22 22:35:38 david-t2 systemd[640]: Shutting down.
Sep 22 22:35:38 david-t2 systemd[640]: Not executed by init (PID 1).
Sep 22 22:35:38 david-t2 systemd[640]: Critical error while doing system shutdown: Operation not permitted
This is a regression from:
commit 287419c119
Author: Alban Crequy <alban.crequy@gmail.com>
Date: Fri Sep 18 13:37:34 2015 +0200
containers: systemd exits with non-zero code
Make sure we never ever execute systemd-shutdown from within a
user-manager. Restore the previous behavior by partially reverting given
commit.
Let's underline the header line of the table shown by cgtop, how it is
customary for tables. In order to do this, let's introduce new ANSI
underline macros, and clean up the existing ones as side effect.
When a systemd service running in a container exits with a non-zero
code, it can be useful to terminate the container immediately and get
the exit code back to the host, when systemd-nspawn returns. This was
not possible to do. This patch adds the following to make it possible:
- Add a read-only "ExitCode" property on PID 1's "Manager" bus object.
By default, it is 0 so the behaviour stays the same as previously.
- Add a method "SetExitCode" on the same object. The method fails when
called on baremetal: it is only allowed in containers or in user
session.
- Add support in systemctl to call "systemctl exit 42". It reuses the
existing code for user session.
- Add exit.target and systemd-exit.service to the system instance.
- Change main() to actually call systemd-shutdown to exit() with the
correct value.
- Add verb 'exit' in systemd-shutdown with parameter --exit-code
- Update systemctl manpage.
I used the following to test it:
| $ sudo rkt --debug --insecure-skip-verify run \
| --mds-register=false --local docker://busybox \
| --exec=/bin/chroot -- /proc/1/root \
| systemctl --force exit 42
| ...
| Container rkt-895a0cba-5c66-4fa5-831c-e3f8ddc5810d failed with error code 42.
| $ echo $?
| 42
Fixes https://github.com/systemd/systemd/issues/1290
This adds support for the new "pids" cgroup controller of 4.3 kernels.
It allows accounting the number of tasks in a cgroup and enforcing
limits on it.
This adds two new setting TasksAccounting= and TasksMax= to each unit,
as well as a gloabl option DefaultTasksAccounting=.
This also updated "cgtop" to optionally make use of the new
kernel-provided accounting.
systemctl has been updated to show the number of tasks for each service
if it is available.
This patch also adds correct support for undoing memory limits for units
using a MemoryLimit=infinity syntax. We do the same for TasksMax= now
and hence keep things in sync here.
The mount monitor that was added to libmount v2.27 requires /etc/mtab to be
non-existant. As systemd now uses that functionality, we cannot monitor any
mounts anymore, and hence not support .mount units.
Systems that have /etc/mtab around as regular file are unsupported by
systemd since a long time.
This patch makes that condition fatal, so we do not boot up with
non-working mount monitor support.
Introduce a proper enum, and don't pass around string ids anymore. This
simplifies things quite a bit, and makes virtualization detection more
similar to architecture detection.