This patch renames Read{Write,Only}Directories= and InaccessibleDirectories=
to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept
as aliases but they are not advertised in the documentation.
Renamed variables:
`read_write_dirs` --> `read_write_paths`
`read_only_dirs` --> `read_only_paths`
`inaccessible_dirs` --> `inaccessible_paths`
Despite the name, `Read{Write,Only}Directories=` already allows for
regular file paths to be masked. This commit adds the same behavior
to `InaccessibleDirectories=` and makes it explicit in the doc.
This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}`
{dile,device}nodes and mounts on the appropriate one the paths specified
in `InacessibleDirectories=`.
Based on Luca's patch from https://github.com/systemd/systemd/pull/3327
With this change, binary record data is formatted as string if --all is
specified when using json output. This is inline with the effect of --all on
the other available output modes.
Fixes: #3416
When converting log messages from human readable text into binary records to
send off to journald in sd_journal_print(), strip trailing whitespace in the
log message. This way, handling of logs made via syslog(), stdout/stderr and
sd_journal_print() are treated the same way: trailing (but not leading)
whitespace is automatically removed, in particular \n and \r. Note that in case
of syslog() and stdout/stderr based logging the stripping takes place
server-side though, while for the native protocol based transport this takes
place client-side. This is because in the former cases conversion from
free-form human-readable strings into structured, binary log records takes
place on the server-side while for journal-native logging it happens on the
client side, and after conversion into binary records we probably shouldn't
alter the data anymore.
See: #3416
WaitForKeyEx may never return on some UEFI systems depending
on firmware, hardware configuration and the phase of the moon.
Use ConIn->WaitForKey unconditionally instead.
Fixes#3632
Such mkdir errors happen for example when trying to mkdir /sys/fs/selinux.
/sys is documented to be readonly in the container, so mkdir errors below /sys
can be expected.
They shouldn't be logged as warnings since they lead users to think that
there is something wrong.
strv_make_nulstr was creating a nulstr which was not a valid nulstr,
because it was missing the terminating NUL. This didn't cause any issues,
because strv_parse_nulstr correctly parsed the result, using the
separately specified length.
But it's confusing to have something called nulstr which really isn't.
It is likely that somebody will try to use strv_make_nulstr() in
some other place, incorrectly.
This patch changes strv_parse_nulstr() to produce a valid nulstr, and
changes the output length parameter to be the minimum number of bytes
which can be later on parsed by strv_parse_nulstr(). This allows the
only user in ask-password-api to be slightly simplified.
Based-on-patch-by: Jean-Sébastien Bour <jean-sebastien@bour.name>
Fixes#3689.
During stop when service has one "regular" pid one main pid and one
control pid and the sighld for the regular one is processed first the
unit_tidy_watch_pids will skip the main and control pid and does not
remove them from u->pids(). But then we skip the sigchld event because we
already did one in the iteration and there are two pids in u->pids.
v2: Use general unit_main_pid() and unit_control_pid() instead of
reaching directly to service structure.
Callers of the 'udev monitor' tool expect to see output when
an event occurs. The stdio buffering defeats that. This patch
switches it to line buffering.
Do not allocate objects of dynamic and potentially large size on the stack
to avoid both clang compilation errors and unpredictable runtime behavior
on exotic platforms. Use the heap for that instead.
While at it, refactor the code a bit. Access 's->domain' via
NDISC_DNSSL_DOMAIN(), and refrain from allocating 'x' independently, but
rather reuse 's' if we're dealing with a new entry to the set.
Fixes#3717
There's really no reason to use 10s here, let's instead default to 90s like we
do for everything else.
The SIGKILL during the final killing spree is in most regards the fourth level
of a safety net, after all: any normal service should have already been stopped
during the normal service shutdown logic, first via SIGTERM and then SIGKILL,
and then also via SIGTERM during the finall killing spree before we send
SIGKILL. And as a fourth level safety net it should only be required in
exceptional cases, which means it's safe to rais the default timeout, as normal
shutdowns should never be delayed by it.
Note that journald excludes itself from the normal service shutdown, and relies
on the final killing spree to terminate it (this is because it wants to cover
the normal shutdown phase's complete logging). If the system's IO is
excessively slow, then the 10s might not be enough for journald to sync
everything to disk and logs might get lost during shutdown.
seccomp_syscall_resolve_name() can return a mix of positive and negative
(pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR.
This commit lets the syscall filter parser only abort on real parsing
failures, letting libseccomp handle pseudo-syscall number on its own
and allowing proper multiplexed syscalls filtering.
The unit load queue can be processed in the middle of setting the
unit's properties, so its load_state would no longer be UNIT_STUB
for the check in bus_unit_set_properties(), which would cause it to
incorrectly return an error.
Commit da4d897e ("core: add cgroup memory controller support on the unified
hierarchy (#3315)") changed the code in src/core/cgroup.c to always write
the real numeric value from the cgroup parameters to the
"memory.limit_in_bytes" attribute file.
For parameters set to CGROUP_LIMIT_MAX, this results in the string
"18446744073709551615" being written into that file, which is UINT64_MAX.
Before that commit, CGROUP_LIMIT_MAX was special-cased to the string "-1".
This causes a regression on CentOS 7, which is based on kernel 3.10, as the
value is interpreted as *signed* 64 bit, and clamped to 0:
[root@n54 ~]# echo 18446744073709551615 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
[root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
0
[root@n54 ~]# echo -1 >/sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
[root@n54 ~]# cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
9223372036854775807
Hence, all units that are subject to the limits enforced by the memory
controller will crash immediately, even though they have no actual limit
set. This happens to for the user.slice, for instance:
[ 453.577153] Hardware name: SeaMicro SM15000-64-CC-AA-1Ox1/AMD Server CRB, BIOS Estoc.3.72.19.0018 08/19/2014
[ 453.587024] ffff880810c56780 00000000aae9501f ffff880813d7fcd0 ffffffff816360fc
[ 453.594544] ffff880813d7fd60 ffffffff8163109c ffff88080ffc5000 ffff880813d7fd28
[ 453.602120] ffffffff00000202 fffeefff00000000 0000000000000001 ffff880810c56c03
[ 453.609680] Call Trace:
[ 453.612156] [<ffffffff816360fc>] dump_stack+0x19/0x1b
[ 453.617324] [<ffffffff8163109c>] dump_header+0x8e/0x214
[ 453.622671] [<ffffffff8116d20e>] oom_kill_process+0x24e/0x3b0
[ 453.628559] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30
[ 453.634969] [<ffffffff811d4155>] mem_cgroup_oom_synchronize+0x575/0x5a0
[ 453.641721] [<ffffffff811d3520>] ? mem_cgroup_charge_common+0xc0/0xc0
[ 453.648299] [<ffffffff8116da84>] pagefault_out_of_memory+0x14/0x90
[ 453.654621] [<ffffffff8162f4cc>] mm_fault_error+0x68/0x12b
[ 453.660233] [<ffffffff81642012>] __do_page_fault+0x3e2/0x450
[ 453.666017] [<ffffffff816420a3>] do_page_fault+0x23/0x80
[ 453.671467] [<ffffffff8163e308>] page_fault+0x28/0x30
[ 453.676656] Task in /user.slice/user-0.slice/user@0.service killed as a result of limit of /user.slice/user-0.slice/user@0.service
[ 453.688477] memory: usage 0kB, limit 0kB, failcnt 7
[ 453.693391] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[ 453.700039] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
[ 453.706076] Memory cgroup stats for /user.slice/user-0.slice/user@0.service: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 453.725702] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 453.733614] [ 2837] 0 2837 11950 899 23 0 0 (systemd)
[ 453.741919] Memory cgroup out of memory: Kill process 2837 ((systemd)) score 1 or sacrifice child
[ 453.750831] Killed process 2837 ((systemd)) total-vm:47800kB, anon-rss:3188kB, file-rss:408kB
Fix this issue by special-casing the UINT64_MAX case again.
By cleaning up before setting up PAM we maintain control of overriding
behavior in setting variables. Otherwise, pam_putenv is in control.
This also makes sure we use a cleaned up environment in replacing
variables in argv.
Commit d054f0a4 ("tree-wide: use xsprintf() where applicable") used a
semantic patch approach to change a number of locations from
snprintf(buf, sizeof(buf), FMT, ...)
to
xsprintf(buf, FMT, ...)
The problem is that xsprintf() wraps the snprintf() in an
assert_message_se(), so if snprintf() reports an overflow of the
destination buffer, the binary will now terminate.
This hit a user running a version of systemd that was built from a
deeply nested system path.
Fix this by
a) Switching back to snprintf() for this particular case. We should really
rather truncate the location string than crash in such situations.
b) Increasing the size of that static string buffer, to make the event more
unlikely.
==1447== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1447== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==1447== by 0x5350F19: strdup (in /usr/lib64/libc-2.23.so)
==1447== by 0x4E9D435: strv_new_ap (strv.c:166)
==1447== by 0x4E9D5FA: strv_new (strv.c:199)
==1447== by 0x10E665: test_strv_fnmatch (test-strv.c:693)
==1447== by 0x10EAD5: main (test-strv.c:763)
==1447==
There are some places in the systemd which are use the same pattern:
fd_cloexec(STDIN_FILENO, false);
fd_cloexec(STDOUT_FILENO, false);
fd_cloexec(STDERR_FILENO, false);
to unset CLOEXEC for standard file descriptors. This patch introduces
the stdio_unset_cloexec() function to hide this and make code cleaner.