Macro returns -1, 0, 1 depending on whether a < b, a == b or a > b.
It's safe to use on unsigned types.
Add tests to confirm corner cases are properly covered.
Drop __extension__, since we don't use gcc -Wpedantic or -ansi.
Reformat code for spacing. Add spaces after commas almost everywhere.
Reindent code blocks in macro definitions, for consistency.
--machine has been missing for a while in systemd-stdio-bridge
this syntax can be switched to be more standard.
v2: Support the old syntax too.
timedatectl -H server1.myhostingcompany.com:5555/container1
Closes: #8071
Previously, we'd only ever read 512 byte from the random seed file,
under the assumption we won't need more. With this change we'll read the
full file, even if it is larger.
The idea behind htis change is that people can dump additional data into the
random seed file offline if they like, and it can be low quality, and
we'll seed the pool with it anyway. Moreover, if people are paranoid and
want us to save/restore a bigger seed, it's easy to do: just truncate
the file to the right size and we'll save/restore as much in the future.
This also reworks the file a bit, introducing two clear if blocks that
load and that save the random seed, and that each are conditionalized
more carefully.
On target boards without RTC, `t->kernel_time` is 0 or 1 usec.
`systemd-analyze` reads this value over D-Bus from
`org.freedesktop.systemd1.Manager`, property `KernelTimestamp`.
The issue is: if `t->kernel_time` is 0, `systemd-analyze` does not print
the kernel time:
~~~~
$ systemd-analyze
Startup finished in 1.860s (userspace) = 5.957s
~~~~
This commit fixes the misbehaviour:
~~~~
$ systemd-analyze
Startup finished in 3.866s (kernel) + 2.015s (userspace) = 5.881s
~~~~
Fixes#7721.
v2: fixes one more condition (by Yu Watanabe <watanabe.yu+github@gmail.com>)
v3: fixes one more condition (by Kirill Marinushkin <kmarinushkin@de.adit-jv.com>)
RootImage= may require the following settings
```
DeviceAllow=/dev/loop-control rw
DeviceAllow=block-loop rwm
DeviceAllow=block-blkext rwm
```
This adds the following settings implicitly when RootImage= is
specified.
Fixes#9737.
When clients don't follow protocol and use the same object from
different threads, then we previously would silently corrupt memory.
With this assert we'll fail with an assert(). This doesn't fix anything
but certainly makes mis-uses easier to detect and debug.
Triggered by https://bugzilla.redhat.com/show_bug.cgi?id=1609349
As the comments already say it might be quite likely that
$XDG_RUNTIME_DIR is not set up as mount, and we shouldn't complain about
that.
Moreover, let's make this idempotent, so that a runtime dir that is
already gone and is removed again doesn't cause failure.
Test it when sending an FD without any contents, or an FD and some contents,
or only contents and no FD (using a bare send().)
Also fix the previous test which forked but was missing an _exit() at the
end of the child execution code.
These take a struct iovec to send data together with the passed FD.
The receive function returns the FD through an output argument. In case data is
received, but no FD is passed, the receive function will set the output
argument to -1 explicitly.
Update code in dynamic-user to use the new helpers.
We would verify destination e.g. in sd_bus_message_new_call, but allow setting
any value later on with sd_bus_message_set_destination. I assume this check was
omitted not on purpose.
The choice what errors to ignore is left to the caller, and the caller is
changed to ignore all errors.
On error, previously read data is kept. So if e.g. an oom error happens, we
will continue to return slightly stale data instead of pretending we have no
entries for the given address. I think that's better, for example when
/etc/hosts contains some important overrides that external DNS should not be
queried for.
We'd store every 0.0.0.0 and ::0 entry as a structure without any addresses
allocated. This is a somewhat common use case, let's optimize it a bit.
This gives some memory savings and a bit faster response time too:
'time build/test-resolved-etc-hosts hosts' goes from 7.7s to 5.6s, and
memory use as reported by valgrind for ~10000 hosts is reduced
==18097== total heap usage: 29,902 allocs, 29,902 frees, 2,136,437 bytes allocated
==18240== total heap usage: 19,955 allocs, 19,955 frees, 1,556,021 bytes allocated
Also rename 'suppress' to 'found' (with reverse meaning). I think this makes
the intent clearer.
This hides the details of juggling the two hashmaps from the callers a bit.
It also makes memory management a bit easier, because those two hashmaps share
some strings, so we can only free them together.
etc_hosts_parse() is made responsible to free the half-filled data structures
on error, which makes the caller a bit simpler.
No functional change. A refactoring to prepare for later changes.
- drop compatibility with autotools (/.libs/ directory)
- don't special-case "libnss_dns", just try build/libnss_foo.so.2 and libnss_foo.so.2.
This makes it possible to call e.g. build/test-nss files google.com.
Meson does not care either way, so let's use the simpler syntax. And files()
already gives a list, so nesting this in a list wouldn't be necessary even
if meson did not flatten everything.
Since all path_set_*() helpers don't follow symlinks, it's possible to use
chase_symlinks(CHASE_NOFOLLOW) flag to both open the files specified by the
passed paths and check their validity (unlike their counterpart fd_set_*()
helpers).
This flag mimics what "O_NOFOLLOW|O_PATH" does for open(2) that is
chase_symlinks() will not resolve the final pathname component if it's a
symlink and instead will return a file descriptor referring to the symlink
itself.
Note: if CHASE_SAFE is also passed, no safety checking is performed on the
transition done if the symlink would have been followed.
TRUNCATE_FILE is now handled by a new dedicated function
truncate_file(). Indeed we have to take special care when truncating existing
file since the behavior is only specified for regular files.
Well that's not entirely true for fifo and terminal devices since O_TRUNC is
ignored in this case but even in for these types of file, truncating is
probably not the right thing to do.
It is worth noting that both truncate_file() and create_file() have been
modified so they use fstat(2) instead of stat(2) since both functions are not
supposed to follow symlinks.
write_one_file() only deals with the 'w' command and 'f'/'F' are now handled by
a new function create_file().
This is primarly done because 'w' is allowed to operate on any kind of files,
not just regular ones.
This a slight simplification since all callers of item_do()
(glob_item_recursively() and item_do() itself) stat the file descriptor only
for passing it to item_do().
When a nested struct is initialized by structured initializer, then
padding space is not cleared by zero. So, before setting values,
this makes explicitly set zero including padding.
This fixes the following false positive warning by valgrind:
```
==492== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==492== at 0x56D0CF7: sendmsg (in /usr/lib64/libpthread-2.27.so)
==492== by 0x4FDD3C5: sd_resolve_getaddrinfo (sd-resolve.c:975)
==492== by 0x110B9E: manager_connect (timesyncd-manager.c:879)
==492== by 0x10B729: main (timesyncd.c:165)
==492== Address 0x1fff0008f1 is on thread 1's stack
==492== in frame #1, created by sd_resolve_getaddrinfo (sd-resolve.c:928)
==492==
```
To decreae latency this add support for TFO and TLS Session Tickets. As OpenSSL wouldn't let you easily set a different function all written data is temporarily cached and therefore needs to be flushed after each SSL function which can write data.
This provides basic OpenSSL support without optimizations like TCP Fast Open and TLS Session Tickets.
Notice only a single SSL library can be enabled at a time and therefore journald functions provided by GnuTLS will be disabled when using OpenSSL.
Fixes#9531
During handshake and TLS session closing, messages needs to be exchanged. Therefore this patch overrides the requested IO events for the TCP stream when the TLS is waiting for sending or receiving of messages during theses periods. This fixes issues with correctly closing the TLS stream and prevents the handshake from hanging in rare cases (not seen yet).
This makes hibernation unavailable if the kernel image we are currently
running was removed. This is supposed to be superficial protection
against hibernating a system we can never return from because the kernel
has been updated and the kernel we currently run is not available
anymore.
We look at a couple of places for the kernel, which should cover all
distributions I know off. Should I have missed a path I am sure people
will quickly notice and we can add more places to check. (or maybe
convince those distros to stick their kernels at a standard place)
On a host with sufficiently large zram but with no actual swap, logind will
respond to CanHibernate() with yes. With this patch, it will correctly respond
no, unless there are other swap devices to consider.
When DynamicUser=yes and static User= are set, and the user has
different uid and gid, then as the storage socket for the dynamic
user does not contains gid, we need to obtain gid.
Follow-up for 9ec655cbbd.
Fixes#9702.
Before this, the property changed signal is emitted immediately after
StartUnit/StopUnit method is called. So, the running state of the NTP
client service may not updated.
This makes the timing of emitting property changed signal is deferred
until job of starting/stopping NTP client service is completed.
Fixes#9672.
The commit 5d280742b6 introduces a
barrier to suppress calling context_update_ntp_status() multiple times.
However, it just stores the address of sd_bus_message object. So,
when an address is reused on the subsequent message, then the status
of NTP clients are not updated.
This makes the stored message object is referenced by the context
object. So, the subsequent message is on cirtainly different address.
Users are often surprised that "systemd-run" command lines like
"systemd-run -p User=idontexist /bin/true" will return successfully,
even though the logs show that the process couldn't be invoked, as the
user "idontexist" doesn't exist. This is because Type=simple will only
wait until fork() succeeded before returning start-up success.
This patch adds a new service type Type=exec, which is very similar to
Type=simple, but waits until the child process completed the execve()
before returning success. It uses a pipe that has O_CLOEXEC set for this
logic, so that the kernel automatically sends POLLHUP on it when the
execve() succeeded but leaves the pipe open if not. This means PID 1
waits exactly until the execve() succeeded in the child, and not longer
and not shorter, which is the desired functionality.
Making use of this new functionality, the command line
"systemd-run -p User=idontexist -p Type=exec /bin/true" will now fail,
as expected.
When process fd lists to pass to activated programs we always place the
socket activation fds first, and the storage fds last. Irritatingly in
almost all calls the "n_storage_fds" parameter (i.e. the number of
storage fds to pass) came first so far, and the "n_socket_fds" parameter
second. Let's clean this up, and specify the number of fds in the order
the fds themselves are passed.
(Also, let's fix one more case where "unsigned" was used to size an
array, while we should use "size_t" instead.)
machined exposes the pseudo-container ".host" as a reference to the host
system, and this means "machinectl login .host" and "machinectl shell
.host" get your a login/shell on the host. systemd-run currently doesn't
allow that. Let's fix that, and make sd-bus understand ".host" as an
alias for connecting to the host system.
We so far had various placed we'd parse percentages with
parse_percent(). Let's make them use parse_permille() instead, which is
downward compatible (as it also parses percent values), and increases
the granularity a bit. Given that on the wire we usually normalize
relative specifications to something like UINT32_MAX anyway changing
from base-100 to base-1000 calculations can be done easily without
breaking compat.
This commit doesn't document this change in the man pages. While
allowing more precise specifcations permille is not as commonly
understood as perent I guess, hence let's keep this out of the docs for
now.
If 'v' is negative, it's wrong to add the decimal to it, as we'd
actually need to subtract it in this case. But given that we don't want
to allow negative vaues anyway, simply check earlier whether what we
have parsed so far was negative, and react to that before adding the
decimal to it.
We likely get the data from the env block, but we might also determine
it from elsewhere (such as PAM module parameters). Let's set the env
vars on the env block explicitly, so that they are available always, and
apps can rely on it.
Let's make this symmetric with XDG_SESSION_CLASS and XDG_SESSION_TYPE,
so that PAM stacks can configure this easily without involving env vars,
in case there are PAM session managers which only support a single
desktop anyway.
Since D-Bus 1.9.14 (2015-03-02) dbus looks in $XDG_RUNTIME_DIR/bus for
the system bus on its own, hence we can finally drop setting this
environment variable. gdbus since glib 2.45.3 (June 2015) also supports
it.
When networkd has not connected and setting hostname/timezone is
requested, the operation is delayed, not canceled. So, logging in
debug level is sufficient for the corresponding log message.
Closes#9699.
This adds -Dnss-resolve= and -Dnss-mymachines= meson options.
By using this option, e.g., resolved can be built without nss-resolve.
When no nss modules are built, then test-nss is neither built.
Also, This changes the option name -Dmyhostname= to -Dnss-myhostname=
for consistency to other nss related options.
Closes#9596.
Usecase is to allow changing the final kill from SIGKILL to SIGQUIT which
should create a core dump useful for debugging why the service didn't stop
with the SIGTERM
We often open the parent directory of a path. Let's add a common helper
for that, that shortens our code a bit and adds some extra safety
checks, for example it will fail if used on the root directory (which
doesn't really have a parent).
The helper is actually generalized from a function in btrfs-util.[ch]
which already existed for this purpose.
the service manager serializes ExecStop= execution data after
ExecStart=, like it makes sense and how it should be expected. However,
systemctl previously would reverse them when deserializing them locally,
and thus show ExecStop= results before ExecStart= results. And that's
confusing. Let's fix that.
Whenever a unit is started fresh we should flush out any runtime data
from the previous cycle. We are pretty good at that already, but what so
far we missed was the ExecStart=/ExecStop=/… command exit status data.
Let's fix that, and properly flush out that stuff too.
Consider this service:
[Service]
ExecStart=/bin/sleep infinity
ExecStop=/bin/false
When this service is started, then stopped and then started again
"systemctl status" would show the ExecStop= results of the previous run
along with the ExecStart= results of the current one, which is very
confusing. With this patch this is corrected: the data is kept right
until the moment the new service cycle starts, and then flushed out.
Hence "systemctl status" in that case will only show the ExecStart=
data, but no ExecStop= data, like it should be.
This should fix part of the confusion of #9588
We always initialize it from the same field in ExecCommand anyway, hence
there's no point in passing it separately to exec_spawn(), after all we
already pass the ExecCommand structure itself anyway.
No change in behaviour.
That call to mount was added as a safeguard against a kernel bug which was fixed in
torvalds/linux@bbd5192.
In principle, the error could be ignored because
* normally everything mounted on /proc/PID should disappear as soon as the PID has gone away
* test-mount-util that had been confused by those phantom entries in /proc/self/mountinfo was
taught to ignore them in 112cc3b.
On the other hand, in practice, if the mount fails, then the next one is extremely unlikely to
succeed, so it seems to be reasonable to just skip the rest of `test_get_process_cmdline_harder`
if that happens.
Closes https://github.com/systemd/systemd/issues/9649.
Currently to set the flag to reboot into the firmware setup an
authentication by an administrative user is required. Since we are
already enabling active users to reboot the system, it is advisable to
let the user decide if he wants to boot into the firmware setup without
any more hassle.
Currently, mount_sysfs() only creates /sys/fs/cgroup if cg_ns_supported().
The comment explains that we need to "Create mountpoint for
cgroups. Otherwise we are not allowed since we remount /sys read-only.";
that is: that we need to do it now, rather than later. However, the
comment doesn't do anything to explain why we only need to do this if
cg_ns_supported(); shouldn't we _always_ need to do it?
The answer is that if !use_cgns, then this was already done by the outer
child, so mount_sysfs() only needs to do it if use_cgns. Now,
mount_sysfs() doesn't know whether use_cgns, but !cg_ns_supported() implies
!use_cgns, so we can optimize" the case where we _know_ !use_cgns, and deal
with a no-op mkdir_p() in the false-positive where cgns_supported() but
!use_cgns.
But is it really much of an optimization? We're potentially spending an
access(2) (cg_ns_supported() could be cached from a previous call) to
potentially save an lstat(2) and mkdir(2); and all of them are on virtual
fileystems, so they should all be pretty cheap.
So, simplify and drop the conditional. It's a dubious optimization that
requires more text to explain than it's worth.
Remove "arbitrary named hierarchies" from the list of things that
cg_kernel_controllers() might return, and clarify that "name="
pseudo-controllers are not included in the returned list.
/proc/cgroups does not contain "name=" pseudo-controllers, and
cg_kernel_controllers() makes no effort to enumerate them via a different
mechanism.
One of the things that tmpfs_patch_options does is take an (optional) UID,
and insert "uid=${UID},gid=${UID}" into the options string. So we need a
uid_t argument, and a way of telling if we should use it. Fortunately,
that is built in to the uid_t type by having UID_INVALID as a possible
value.
So this is really a feature that requires one argument. Yet, it is somehow
taking 4! That is absurd. Simplify it to only take one argument, and have
that trickle all the way up to mount_all()'s usage.
Now, in may of the uses, the argument becomes
uid_shift == 0 ? UID_INVALID : uid_shift
because it used to treat uid_shift=0 as invalid unless the patch_ids flag
was also set. This keeps the behavior the same. Note that in all cases
where it is invoked, if !use_userns (sometimes called !userns), then
uid_shift is 0; we don't have to add any checks for that.
That said, I'm pretty sure that "uid=0" and not setting "uid=" are the
same, but Christian Brauner seemed to not think so when implementing the
cgns support. https://github.com/systemd/systemd/pull/3589
One of the things that mkdir_userns{,_p}() does is take an (optional) UID,
and chown the directory to that. So we need a uid_t argument, and a way of
telling if we should use that uid_t argument. Fortunately, that is built
in to the uid_t type by having UID_INVALID as a possible value.
However, currently mkdir_userns() also takes a MountSettingsMask and checks
a couple of bits in it to decide if it should perform the chown.
Drop the mask argument, and instead have the caller pass UID_INVALID if it
shouldn't chown.
When we open our own little namespace for running our tests in, let's
turn off mount propagation only one way, rather than both ways. This is
better as this means we don't pin host mounts unnecessarily long in our
namespace, even though the host already got rid of them. This is because
MS_SLAVE in contrast to MS_PRIVATE allows umount events to propagate
from the host into our environment.
Looking at a recent Bad Day, my log contains over 100 lines of
systemd[23895]: Failed to connect to API bus: Connection refused
It is due to "systemd --user" retrying to connect to an API bus.[*] I
would prefer to avoid spamming the logs. I don't think it is good for us
to retry so much like this.
systemd was mislead by something setting DBUS_SESSION_BUS_ADDRESS. My best
guess is an unfortunate series of events caused gdm to set this. gdm has
code to start a session dbus if there is not a bus available already (and
in this case it exports the environment variable). I believe it does not
normally do this when running under systemd, because "systemd --user" and
hence "dbus.service" would already have been started by pam_systemd.
I see two possibilities
1. Rip out the check for DBUS_SESSION_BUS_ADDRESS entirely.
2. Only check for DBUS_SESSION_BUS_ADDRESS on startup. Not in the
"recheck" logic.
The justification for 2), is that the recheck is called from unit_notify(),
this is used to check whether the service just started (or stopped) was
"dbus.service". This reason for rechecking does not apply if we think
the session bus was started outside our logic.
But I think we can justify 1). dbus-daemon ships a statically-enabled
/usr/lib/systemd/user/dbus.service, which would conflict with an attempt to
use an external dbus. Also "systemd --user" is started from user@.service;
if you try to start it manually so that it inherits an environment
variable, it will conflict if user@.service was started by pam_systemd
(or loginctl enable-linger).