TRUNCATE_FILE is now handled by a new dedicated function
truncate_file(). Indeed, we have to take special care when truncating an existing
file, since the behavior is only specified for regular files.
Well, that's not entirely true for FIFOs and terminal devices, since O_TRUNC is
ignored in that case, but even for these types of files truncating is
probably not the right thing to do.
It is worth noting that both truncate_file() and create_file() have been
modified so they use fstat(2) instead of stat(2), since neither function is
supposed to follow symlinks.
write_one_file() now only deals with the 'w' command, and 'f'/'F' are handled by
a new function create_file().
This is primarily done because 'w' is allowed to operate on any kind of file,
not just regular ones.
This is a slight simplification, since all callers of item_do()
(glob_item_recursively() and item_do() itself) stat the file descriptor only
for passing it to item_do().
When a nested struct is initialized by a structured initializer, the
padding bytes are not cleared to zero. So, before setting the values,
this explicitly zeroes the whole struct, including padding.
This fixes the following false-positive warning from valgrind:
```
==492== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==492== at 0x56D0CF7: sendmsg (in /usr/lib64/libpthread-2.27.so)
==492== by 0x4FDD3C5: sd_resolve_getaddrinfo (sd-resolve.c:975)
==492== by 0x110B9E: manager_connect (timesyncd-manager.c:879)
==492== by 0x10B729: main (timesyncd.c:165)
==492== Address 0x1fff0008f1 is on thread 1's stack
==492== in frame #1, created by sd_resolve_getaddrinfo (sd-resolve.c:928)
==492==
```
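Here is a minimal sketch of the zero-then-assign pattern, using a made-up struct that has padding after its first member on common ABIs. Strictly speaking, the C standard leaves padding unspecified after a member store. In practice, compilers leave the memset() bytes untouched, so no uninitialized bytes reach sendmsg().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: 'kind' is followed by padding before 'length'. */
struct request {
        uint8_t kind;
        uint32_t length;
};

/* Zero the whole object, padding included, then assign the members,
 * instead of relying on a designated initializer to clear padding. */
void request_init(struct request *r, uint8_t kind, uint32_t length) {
        memset(r, 0, sizeof(*r));
        r->kind = kind;
        r->length = length;
}
```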
To decrease latency, this adds support for TFO (TCP Fast Open) and TLS Session Tickets. As OpenSSL wouldn't let you easily set a different function, all written data is temporarily cached and therefore needs to be flushed after each SSL function which can write data.
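The cache-and-flush idea can be sketched with a simple buffer. This is a hypothetical structure, and how it is wired into OpenSSL is not shown and is assumed. Output is appended to the buffer and must be flushed explicitly after each library call that may have produced data.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical write cache for data produced by the TLS library. */
struct write_cache {
        char buf[4096];
        size_t len;
};

int write_cache_append(struct write_cache *c, const void *data, size_t n) {
        if (n > sizeof(c->buf) - c->len)
                return -ENOBUFS;
        memcpy(c->buf + c->len, data, n);
        c->len += n;
        return 0;
}

/* Returns 0 once everything was written, -EAGAIN if data is still pending. */
int write_cache_flush(struct write_cache *c, int fd) {
        size_t off = 0;

        while (off < c->len) {
                ssize_t k = write(fd, c->buf + off, c->len - off);
                if (k < 0) {
                        if (errno == EINTR)
                                continue;
                        int r = (errno == EAGAIN || errno == EWOULDBLOCK) ? -EAGAIN : -errno;
                        /* Keep the unwritten tail for the next flush. */
                        memmove(c->buf, c->buf + off, c->len - off);
                        c->len -= off;
                        return r;
                }
                off += (size_t) k;
        }

        c->len = 0;
        return 0;
}
```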
This provides basic OpenSSL support without optimizations like TCP Fast Open and TLS Session Tickets.
Notice only a single SSL library can be enabled at a time and therefore journald functions provided by GnuTLS will be disabled when using OpenSSL.
Fixes #9531
During the handshake and when closing a TLS session, messages need to be exchanged. Therefore, this patch overrides the requested IO events for the TCP stream while TLS is waiting to send or receive messages during these periods. This fixes issues with correctly closing the TLS stream and prevents the handshake from hanging in rare cases (not seen yet).
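The override can be pictured as a small mapping from TLS state to poll events. The enum and function names here are hypothetical. While the TLS layer waits for a handshake or close message, its need takes precedence over what the caller requested.

```c
#include <poll.h>
#include <stdint.h>

/* Hypothetical view of what the TLS library is currently waiting for. */
enum tls_wait {
        TLS_WAIT_NONE,
        TLS_WAIT_READ,   /* must receive a handshake/close message */
        TLS_WAIT_WRITE,  /* must send a handshake/close message */
};

uint32_t stream_effective_events(uint32_t requested, enum tls_wait wait) {
        switch (wait) {
        case TLS_WAIT_READ:
                return POLLIN;
        case TLS_WAIT_WRITE:
                return POLLOUT;
        default:
                return requested;   /* normal operation: honor the caller */
        }
}
```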
This makes hibernation unavailable if the kernel image we are currently
running was removed. This is supposed to be superficial protection
against hibernating a system we can never return from because the kernel
has been updated and the kernel we currently run is not available
anymore.
We look at a couple of places for the kernel, which should cover all
distributions I know of. Should I have missed a path, I am sure people
will quickly notice and we can add more places to check (or maybe
convince those distros to put their kernels in a standard place).
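Probing several candidate locations for the running kernel might look like the sketch below. The helper is hypothetical, and the caller supplies the path patterns (each containing exactly one "%s"). The release string would typically come from uname(2), and the actual list of paths checked is not reproduced here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical: true if any pattern, with "%s" replaced by 'release',
 * names an existing file. */
bool kernel_image_present(const char *release, const char *const patterns[], size_t n) {
        for (size_t i = 0; i < n; i++) {
                char path[4096];
                int k = snprintf(path, sizeof(path), patterns[i], release);
                if (k < 0 || (size_t) k >= sizeof(path))
                        continue;   /* skip paths that do not fit */
                if (access(path, F_OK) == 0)
                        return true;
        }
        return false;
}
```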
On a host with sufficiently large zram but with no actual swap, logind will
respond to CanHibernate() with yes. With this patch, it will correctly respond
no, unless there are other swap devices to consider.
When DynamicUser=yes and a static User= are set, and the user has a
different UID and GID, we need to obtain the GID separately, as the
storage socket for the dynamic user does not contain it.
Follow-up for 9ec655cbbd.
Fixes #9702.
Before this, the property changed signal was emitted immediately after
the StartUnit/StopUnit method was called. So, the running state of the NTP
client service might not have been updated yet.
This defers emitting the property changed signal until the job of
starting/stopping the NTP client service has completed.
Fixes #9672.
Commit 5d280742b6 introduced a
barrier to suppress calling context_update_ntp_status() multiple times.
However, it only stores the address of the sd_bus_message object. So,
when that address is reused by a subsequent message, the status
of the NTP clients is not updated.
This makes the context object take a reference on the stored message
object, so a subsequent message is certain to have a different address.
Users are often surprised that "systemd-run" command lines like
"systemd-run -p User=idontexist /bin/true" will return successfully,
even though the logs show that the process couldn't be invoked, as the
user "idontexist" doesn't exist. This is because Type=simple will only
wait until fork() has succeeded before returning start-up success.
This patch adds a new service type Type=exec, which is very similar to
Type=simple, but waits until the child process has completed the execve()
before returning success. It uses a pipe that has O_CLOEXEC set for this
logic, so that the kernel automatically sends POLLHUP on it when the
execve() succeeds but leaves the pipe open if not. This means PID 1
waits exactly until the execve() has succeeded in the child, no longer
and no shorter, which is the desired functionality.
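Below is a minimal, standalone sketch of the close-on-exec pipe technique. It is a hypothetical helper, not PID 1's code, which watches for POLLHUP in its event loop. Here, a blocking read() returning 0 (EOF) plays the same role: the write end is closed by a successful execve(). If execve() fails, the child writes its errno into the pipe instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 once execve() has succeeded in the child, or the negative errno
 * it failed with. The child's PID is stored in *ret_pid in both cases. */
int spawn_and_wait_for_exec(const char *path, char *const argv[],
                            char *const envp[], pid_t *ret_pid) {
        int p[2];
        if (pipe(p) < 0)
                return -errno;

        /* pipe2(O_CLOEXEC) would do this atomically but needs _GNU_SOURCE. */
        if (fcntl(p[0], F_SETFD, FD_CLOEXEC) < 0 ||
            fcntl(p[1], F_SETFD, FD_CLOEXEC) < 0) {
                int r = -errno;
                close(p[0]);
                close(p[1]);
                return r;
        }

        pid_t pid = fork();
        if (pid < 0) {
                int r = -errno;
                close(p[0]);
                close(p[1]);
                return r;
        }

        if (pid == 0) {
                close(p[0]);
                execve(path, argv, envp);
                /* Only reached if execve() failed: report errno to the parent. */
                int e = errno;
                if (write(p[1], &e, sizeof(e)) < 0) {
                        /* If even this fails, the parent sees EOF and misreports success. */
                }
                _exit(127);
        }

        close(p[1]);
        *ret_pid = pid;

        int e;
        ssize_t k;
        do
                k = read(p[0], &e, sizeof(e));
        while (k < 0 && errno == EINTR);
        int saved_errno = errno;
        close(p[0]);

        if (k < 0)
                return -saved_errno;
        if (k == 0)
                return 0;   /* EOF: write end closed by a successful execve() */
        if (k == (ssize_t) sizeof(e))
                return -e;  /* execve() failed with this errno */
        return -EIO;
}
```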
Making use of this new functionality, the command line
"systemd-run -p User=idontexist -p Type=exec /bin/true" will now fail,
as expected.
In the fd lists we pass to activated programs, we always place the
socket activation fds first and the storage fds last. Irritatingly, in
almost all calls the "n_storage_fds" parameter (i.e. the number of
storage fds to pass) came first so far, and the "n_socket_fds" parameter
second. Let's clean this up, and specify the number of fds in the order
the fds themselves are passed.
(Also, let's fix one more case where "unsigned" was used to size an
array, while we should use "size_t" instead.)
machined exposes the pseudo-container ".host" as a reference to the host
system, and this means "machinectl login .host" and "machinectl shell
.host" get you a login/shell on the host. systemd-run currently doesn't
allow that. Let's fix that, and make sd-bus understand ".host" as an
alias for connecting to the host system.
So far we had various places where we'd parse percentages with
parse_percent(). Let's make them use parse_permille() instead, which is
backward compatible (as it also parses percent values) and increases
the granularity a bit. Given that on the wire we usually normalize
relative specifications to something like UINT32_MAX anyway, changing
from base-100 to base-1000 calculations can be done easily without
breaking compat.
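To see why the switch is transparent on the wire, consider mapping a permille value onto a 0..UINT32_MAX scale. This is a sketch that assumes exactly UINT32_MAX as the full-scale value. A percent value p is simply 10 * p permille, so old inputs produce the same wire values as before.

```c
#include <stdint.h>

/* Hypothetical helper: scale 0..1000 permille to 0..UINT32_MAX,
 * rounding to nearest. */
uint32_t permille_to_uint32_scale(unsigned permille) {
        if (permille >= 1000)
                return UINT32_MAX;
        /* 999 * UINT32_MAX still fits comfortably in 64 bits. */
        return (uint32_t) (((uint64_t) permille * UINT32_MAX + 500) / 1000);
}
```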
This commit doesn't document this change in the man pages. While it
allows more precise specifications, permille is not as commonly
understood as percent I guess, hence let's keep this out of the docs for
now.
If 'v' is negative, it's wrong to add the decimal to it, as we'd
actually need to subtract it in this case. But given that we don't want
to allow negative values anyway, simply check earlier whether what we
have parsed so far is negative, and react to that before adding the
decimal to it.
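The ordering fix can be shown with a hypothetical parser for "N%" or "N.D%" that returns permille. This is not systemd's parse_permille(), and it accepts only the "%" suffix. The sign is rejected before the decimal digit is added. A plain `v < 0` test would miss "-0.5%", because strtol() returns 0 for "-0".

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: parse "N%" or "N.D%" (one decimal digit) into permille. */
int parse_percent_as_permille(const char *s) {
        char *end;

        errno = 0;
        long v = strtol(s, &end, 10);
        if (errno != 0 || end == s)
                return -EINVAL;

        /* Reject negatives before the decimal is added; the '-' check also
         * catches "-0.5%", where strtol() returns 0. */
        if (v < 0 || memchr(s, '-', (size_t) (end - s)))
                return -ERANGE;
        if (v > 100)
                return -ERANGE;

        int d = 0;
        if (*end == '.') {
                if (!isdigit((unsigned char) end[1]))
                        return -EINVAL;
                d = end[1] - '0';
                end += 2;
        }

        if (strcmp(end, "%") != 0)
                return -EINVAL;
        if (v * 10 + d > 1000)
                return -ERANGE;

        return (int) (v * 10 + d);
}
```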
We likely get the data from the env block, but we might also determine
it from elsewhere (such as PAM module parameters). Let's set the env
vars on the env block explicitly, so that they are always available and
apps can rely on them.
Let's make this symmetric with XDG_SESSION_CLASS and XDG_SESSION_TYPE,
so that PAM stacks can configure this easily without involving env vars,
in case there are PAM session managers which only support a single
desktop anyway.
Since D-Bus 1.9.14 (2015-03-02) dbus looks in $XDG_RUNTIME_DIR/bus for
the session bus on its own, hence we can finally drop setting this
environment variable. gdbus since glib 2.45.3 (June 2015) also supports
it.
When networkd has not connected and setting the hostname/timezone is
requested, the operation is delayed, not canceled. So, logging at
debug level is sufficient for the corresponding log message.
Closes #9699.
This adds -Dnss-resolve= and -Dnss-mymachines= meson options.
By using these options, e.g., resolved can be built without nss-resolve.
When no nss modules are built, test-nss is not built either.
Also, this renames the option -Dmyhostname= to -Dnss-myhostname=
for consistency with the other nss-related options.
Closes #9596.
The use case is to allow changing the final kill from SIGKILL to SIGQUIT, which
should create a core dump useful for debugging why the service didn't stop
with SIGTERM.
We often open the parent directory of a path. Let's add a common helper
for that, which shortens our code a bit and adds some extra safety
checks; for example, it will fail if used on the root directory (which
doesn't really have a parent).
The helper is actually generalized from a function in btrfs-util.[ch]
which already existed for this purpose.
The service manager serializes ExecStop= execution data after
ExecStart=, as makes sense and as one would expect. However,
systemctl previously reversed them when deserializing them locally,
and thus showed ExecStop= results before ExecStart= results, which is
confusing. Let's fix that.
Whenever a unit is started fresh we should flush out any runtime data
from the previous cycle. We are pretty good at that already, but what we
have missed so far is the ExecStart=/ExecStop=/… command exit status data.
Let's fix that, and properly flush out that stuff too.
Consider this service:
[Service]
ExecStart=/bin/sleep infinity
ExecStop=/bin/false
When this service is started, then stopped and then started again,
"systemctl status" would show the ExecStop= results of the previous run
along with the ExecStart= results of the current one, which is very
confusing. With this patch this is corrected: the data is kept right
until the moment the new service cycle starts, and then flushed out.
Hence "systemctl status" in that case will only show the ExecStart=
data, but no ExecStop= data, like it should be.
This should fix part of the confusion in #9588.
We always initialize it from the same field in ExecCommand, hence
there's no point in passing it separately to exec_spawn(): we already
pass the ExecCommand structure itself anyway.
No change in behaviour.
That call to mount was added as a safeguard against a kernel bug which was fixed in
torvalds/linux@bbd5192.
In principle, the error could be ignored because
* normally everything mounted on /proc/PID should disappear as soon as the PID has gone away
* test-mount-util that had been confused by those phantom entries in /proc/self/mountinfo was
taught to ignore them in 112cc3b.
On the other hand, in practice, if the mount fails, then the next one is extremely unlikely to
succeed, so it seems to be reasonable to just skip the rest of `test_get_process_cmdline_harder`
if that happens.
Closes https://github.com/systemd/systemd/issues/9649.
Currently, setting the flag to reboot into the firmware setup requires
authentication by an administrative user. Since we already allow
active users to reboot the system, it is advisable to let the user
decide whether to boot into the firmware setup without any more
hassle.
Currently, mount_sysfs() only creates /sys/fs/cgroup if cg_ns_supported().
The comment explains that we need to "Create mountpoint for
cgroups. Otherwise we are not allowed since we remount /sys read-only.";
that is: that we need to do it now, rather than later. However, the
comment doesn't do anything to explain why we only need to do this if
cg_ns_supported(); shouldn't we _always_ need to do it?
The answer is that if !use_cgns, then this was already done by the outer
child, so mount_sysfs() only needs to do it if use_cgns. Now,
mount_sysfs() doesn't know whether use_cgns, but !cg_ns_supported() implies
!use_cgns, so we can "optimize" the case where we _know_ !use_cgns, and deal
with a no-op mkdir_p() in the false-positive case where cg_ns_supported() but
!use_cgns.
But is it really much of an optimization? We're potentially spending an
access(2) (cg_ns_supported() could be cached from a previous call) to
potentially save an lstat(2) and mkdir(2); and all of them are on virtual
filesystems, so they should all be pretty cheap.
So, simplify and drop the conditional. It's a dubious optimization that
requires more text to explain than it's worth.
Remove "arbitrary named hierarchies" from the list of things that
cg_kernel_controllers() might return, and clarify that "name="
pseudo-controllers are not included in the returned list.
/proc/cgroups does not contain "name=" pseudo-controllers, and
cg_kernel_controllers() makes no effort to enumerate them via a different
mechanism.
One of the things that tmpfs_patch_options does is take an (optional) UID,
and insert "uid=${UID},gid=${UID}" into the options string. So we need a
uid_t argument, and a way of telling if we should use it. Fortunately,
that is built into the uid_t type by having UID_INVALID as a possible
value.
So this is really a feature that requires one argument. Yet, it is somehow
taking 4! That is absurd. Simplify it to only take one argument, and have
that trickle all the way up to mount_all()'s usage.
Now, in many of the uses, the argument becomes
uid_shift == 0 ? UID_INVALID : uid_shift
because it used to treat uid_shift=0 as invalid unless the patch_ids flag
was also set. This keeps the behavior the same. Note that in all cases
where it is invoked, if !use_userns (sometimes called !userns), then
uid_shift is 0; we don't have to add any checks for that.
That said, I'm pretty sure that "uid=0" and not setting "uid=" are the
same, but Christian Brauner seemed to not think so when implementing the
cgns support. https://github.com/systemd/systemd/pull/3589
One of the things that mkdir_userns{,_p}() does is take an (optional) UID,
and chown the directory to that. So we need a uid_t argument, and a way of
telling if we should use that uid_t argument. Fortunately, that is built
into the uid_t type by having UID_INVALID as a possible value.
However, currently mkdir_userns() also takes a MountSettingsMask and checks
a couple of bits in it to decide if it should perform the chown.
Drop the mask argument, and instead have the caller pass UID_INVALID if it
shouldn't chown.
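Here is the caller-decides version in sketch form. The function is hypothetical, and using the same numeric value for the GID is assumed by analogy with the tmpfs "uid=/gid=" options. Passing UID_INVALID simply skips the chown.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define UID_INVALID ((uid_t) -1)

/* Hypothetical: mkdir, then chown only if a valid UID was passed. */
int mkdir_maybe_chown(const char *path, mode_t mode, uid_t uid) {
        if (mkdir(path, mode) < 0)
                return -errno;
        if (uid == UID_INVALID)
                return 0;
        if (lchown(path, uid, (gid_t) uid) < 0)
                return -errno;
        return 0;
}
```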
When we open our own little namespace for running our tests in, let's
turn off mount propagation only one way, rather than both ways. This is
better, as it means we don't pin host mounts in our namespace
unnecessarily long after the host has already gotten rid of them. This
is because MS_SLAVE, in contrast to MS_PRIVATE, allows umount events to
propagate from the host into our environment.
Looking at a recent Bad Day, my log contains over 100 lines of
systemd[23895]: Failed to connect to API bus: Connection refused
It is due to "systemd --user" retrying to connect to an API bus.[*] I
would prefer to avoid spamming the logs. I don't think it is good for us
to retry so much like this.
systemd was misled by something setting DBUS_SESSION_BUS_ADDRESS. My best
guess is that an unfortunate series of events caused gdm to set this. gdm has
code to start a session dbus if there is not a bus available already (and
in this case it exports the environment variable). I believe it does not
normally do this when running under systemd, because "systemd --user" and
hence "dbus.service" would already have been started by pam_systemd.
I see two possibilities:
1. Rip out the check for DBUS_SESSION_BUS_ADDRESS entirely.
2. Only check for DBUS_SESSION_BUS_ADDRESS on startup. Not in the
"recheck" logic.
The justification for 2) is that the recheck is called from unit_notify(),
where it is used to check whether the service that just started (or stopped)
was "dbus.service". This reason for rechecking does not apply if we think
the session bus was started outside our logic.
But I think we can justify 1). dbus-daemon ships a statically-enabled
/usr/lib/systemd/user/dbus.service, which would conflict with an attempt to
use an external dbus. Also "systemd --user" is started from user@.service;
if you try to start it manually so that it inherits an environment
variable, it will conflict if user@.service was started by pam_systemd
(or loginctl enable-linger).
This allows aliases to be used for the basic modules we load from pid1 before
udev is started. In #9501 the kernel renamed autofs4 to autofs, with "autofs4"
as alias, but we wouldn't load the module, because we didn't follow aliases.
The kernel change was reverted, but it's probably better to support aliases.
These custom macros make the expression go through a function, in order
to prevent ASSERT_SIDE_EFFECT false positives on our macros such as
assert_se() and assert_return() that cannot be disabled and will always
evaluate their expressions.
This technique has been described and recommended in:
https://community.synopsys.com/s/question/0D534000046Yuzb/suppressing-assertsideeffect-for-functions-that-allow-for-sideeffects
Tested by doing a local cov-build and uploading the resulting tarball to
scan.coverity.com, confirmed that the ASSERT_SIDE_EFFECT false positives
were gone.
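The pass-through technique can be sketched as below. The helper and macro names are made up and are not the actual systemd macros. The condition is routed through an inline function call, and the macro is never compiled out, so side effects in the expression always happen.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Routing the condition through a call hides side effects inside 'expr'
 * from the analyzer's ASSERT_SIDE_EFFECT check. */
static inline bool check_condition(bool condition) {
        return condition;
}

/* Always evaluated, unlike assert() under NDEBUG. */
#define assert_se_sketch(expr)                          \
        do {                                            \
                if (!check_condition(expr))             \
                        abort();                        \
        } while (false)
```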
This makes bus_slot_disconnect() unref the slot object from bus when
`unref == true` and it is floating, as the function removes the
reference from the relevant bus object.
This reverts 20d4ee2cbc, as it
introduces #9604.
Fixes #9604.
key_serial_t is defined in keyutils.h, which wasn't included in the header list
in the test, so the test always failed. We were always compiling stuff with
!HAVE_KEY_SERIAL_T.
We could try to add keyutils.h to the test, but then we'd have to first check if
it is available, which just doesn't seem worth the trouble.
key_serial_t should always be defined as int32_t. Let's keep the unconditional
define, since repeated compatible typedefs are not a problem, and it allows us
to compile even if the header file is missing. If there's ever a change in the
definition, we'll have to adjust the code for the different type anyway, and
our compiler will tell us.
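The reasoning relies on C11 allowing a typedef to be repeated with the same type. The short sketch below assumes a C11 or later compiler and simulates the header's definition with a second identical line. An incompatible redefinition, such as `typedef int64_t key_serial_t;`, would instead be rejected at compile time.

```c
#include <stdint.h>

/* Unconditional fallback definition. */
typedef int32_t key_serial_t;

/* Same typedef again, as if a system header also provided it: valid in
 * C11 because both name the same type. */
typedef int32_t key_serial_t;
```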
Using _GNU_SOURCE is better because that's how we include the headers in the
actual build, and some headers define different stuff when it is defined.
sys/stat.h for example defines 'struct statx' conditionally.
The switch to memory_startswith() changed the logic to only look for a space or
NUL byte after the matched word, but matching the full size should also be
acceptable.
This changed how "AUTH\r\n" is parsed: m will be set to 4, and even though
the word matches, the check for it being followed by ' ' or NUL makes
line_begins() return false.
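The corrected check can be sketched as a hypothetical re-creation, not the actual line_begins(). The line is a buffer of m bytes that need not be NUL-terminated. A word match is accepted if it fills the whole line, or if the next byte is ' ' or NUL. Without the `m == l` clause, "AUTH\r\n" with m = 4 would fail on the '\r'.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: does the m-byte line 's' begin with 'word' as a whole word? */
bool line_begins_sketch(const char *s, size_t m, const char *word) {
        size_t l = strlen(word);

        if (m < l || memcmp(s, word, l) != 0)
                return false;

        return m == l || s[l] == ' ' || s[l] == '\0';
}
```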
Tested:
- Using netcat to connect to the private socket directly:
$ echo -ne '\0AUTH\r\n' | sudo nc -U /run/systemd/private
REJECTED EXTERNAL ANONYMOUS
- Running the Ignition blackbox test:
$ sudo sh -c 'PATH=$PWD/bin/amd64:$PATH ./tests.test'
PASS
Fixes: d27b725abf