Some additional files related to single socket may appear in the
filesystem and they should be opened and passed to related service.
This commit adds optional list of file descriptors, which are
dynamically discovered and opened.
Let's add a way to get the type-specific D-Bus interface of a unit from
either its type or name to src/basic/unit-name.[ch]. That way we can
share it with the client side, where it is useful in tools like cgls or
machinectl.
Also ports over machinectl to make use of this.
This reverts commit d4d00020d6. The idea of
the commit is broken and needs to be reworked. We really cannot reduce
the bus-addresses to a single address. We always will have systemd with
native clients and legacy clients at the same time, so we also need both
addresses at the same time.
We should not fall back to dbus-1 and connect to the proxy when kdbus
returns an error that indicates that kdbus is running but just does not
accept new connections because of quota limits or something similar.
Using is_kdbus_available() in libsystemd/ requires it to move from
shared/ to libsystemd/.
Based on a patch from David Herrmann:
https://github.com/systemd/systemd/pull/886
- Add smack xattr lookup table
- Unify all of mac_smack_apply_xxx{_fd}() to mac_smack_apply() and
mac_smack_apply_fd().
- Add smack xattr read apis similar with apply apis as
mac_smack_read{_fd}().
bind() fails if it is called before setting SO_REUSEPORT and another
process is already binded to the same addess.
A new reuse_port option has been introduced to socket_address_listen()
to set the option as part of socket initialization.
Also, when the child is potentially long-running make sure to set a
death signal.
Also, ignore the result of the reset operations explicitly by casting
them to (void).
It's primarily just a property of the Manager object after all, and we
try to refer to PID 1 as "manager" instead of "systemd", hence let's to
stick to this here too.
This changes log_unit_info() (and friends) to take a real Unit* object
insted of just a unit name as parameter. The call will now prefix all
logged messages with the unit name, thus allowing the unit name to be
dropped from the various passed romat strings, simplifying invocations
drastically, and unifying log output across messages. Also, UNIT= vs.
USER_UNIT= is now derived from the Manager object attached to the Unit
object, instead of getpid(). This has the benefit of correcting the
field for --test runs.
Also contains a couple of other logging improvements:
- Drops a couple of strerror() invocations in favour of using %m.
- Not only .mount units now warn if a symlinks exist for the mount
point already, .automount units do that too, now.
- A few invocations of log_struct() that didn't actually pass any
additional structured data have been replaced by simpler invocations
of log_unit_info() and friends.
- For structured data a new LOG_UNIT_MESSAGE() macro has been added,
that works like LOG_MESSAGE() but prefixes the message with the unit
name. Similar, there's now LOG_LINK_MESSAGE() and
LOG_NETDEV_MESSAGE().
- For structured data new LOG_UNIT_ID(), LOG_LINK_INTERFACE(),
LOG_NETDEV_INTERFACE() macros have been added that generate the
necessary per object fields. The old log_unit_struct() call has been
removed in favour of these new macros used in raw log_struct()
invocations. In addition to removing one more function call this
allows generated structured log messages that contain two object
fields, as necessary for example for network interfaces that are
joined into another network interface, and whose messages shall be
indexed by both.
- The LOG_ERRNO() macro has been removed, in favour of
log_struct_errno(). The latter has the benefit of ensuring that %m in
format strings is properly resolved to the specified error number.
- A number of logging messages have been converted to use
log_unit_info() instead of log_info()
- The client code in sysv-generator no longer #includes core code from
src/core/.
- log_unit_full_errno() has been removed, log_unit_full() instead takes
an errno now, too.
- log_unit_info(), log_link_info(), log_netdev_info() and friends, now
avoid double evaluation of their parameters
A variety of changes:
- Make sure all our calls distuingish OOM from other errors if OOM is
not the only error possible.
- Be much stricter when parsing escaped paths, do not accept trailing or
leading escaped slashes.
- Change unit validation to take a bit mask for allowing plain names,
instance names or template names or an combination thereof.
- Refuse manipulating invalid unit name
Because the order of coldplugging is not defined, we can reference a
not-yet-coldplugged unit and read its state while it has not yet been
set to a meaningful value.
This way, already active units may get started again.
We fix this by deferring such actions until all units have been at
least somehow coldplugged.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=88401
This patch removes includes that are not used. The removals were found with
include-what-you-use which checks if any of the symbols from a header is
in use.
When dbus.socket is updated like this:
-ListenStream=/var/run/dbus/system_bus_socket
+ListenStream=/run/dbus/system_bus_socket
... and daemon-reload is performed, bad things happen.
During deserialization systemd does not recognize that the two paths
refer to the same named socket and replaces the socket file with a new
one. As a result, applications hang when they try talking to dbus.
Fix this by finding a match not only when the path names are equal, but
also when they point to the same inode.
In socket_address_equal() it is necessary to move the address size
comparison into the abstract sockets branch. For path name sockets the
comparison must not be done and for other families it is redundant
(their sizes are constant and checked by socket_address_verify()).
FIFOs and special files can also have multiple pathnames, so compare the
inodes for them as well. Note that previously the pathname checks used
streq_ptr(), but the paths cannot be NULL.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1186018
After all it is now much more like strjoin() than strappend(). At the
same time, add support for NULL sentinels, even if they are normally not
necessary.
Unit _start() and _stop() implementations can fail with -EAGAIN to delay
execution temporarily. Thus, we should not output status messages before
invoking these calls, but after, and only when we know that the
invocation actually made a change.
As a followup to 086891e5c1 "log: add an "error" parameter to all
low-level logging calls and intrdouce log_error_errno() as log calls
that take error numbers", use sed to convert the simple cases to use
the new macros:
find . -name '*.[ch]' | xargs sed -r -i -e \
's/log_(debug|info|notice|warning|error|emergency)\("(.*)%s"(.*), strerror\(-([a-zA-Z_]+)\)\);/log_\1_errno(-\4, "\2%m"\3);/'
Multi-line log_*() invocations are not covered.
And we also should add log_unit_*_errno().
- Rename log_meta() → log_internal(), to follow naming scheme of most
other log functions that are usually invoked through macros, but never
directly.
- Rename log_info_object() to log_object_info(), simply because the
object should be before any other parameters, to follow OO-style
programming style.
For priviliged units this resource control property ensures that the
processes have all controllers systemd manages enabled.
For unpriviliged services (those with User= set) this ensures that
access rights to the service cgroup is granted to the user in question,
to create further subgroups. Note that this only applies to the
name=systemd hierarchy though, as access to other controllers is not
safe for unpriviliged processes.
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
Delegate=yes should also be set for user@.service, so that systemd
--user can run, controlling its own cgroup tree.
This commit changes machined, systemd-nspawn@.service and user@.service
to set this boolean, in order to ensure that container management will
just work, and the user systemd instance can run fine.
APIs that query and return something cannot silently fail, they must
either return something useful, or an error. Fix that.
Also, properly rollback socket unit fd creation when something goes
wrong with the security framework.
This makes possible to spawn service instances triggered by socket with
MLS/MCS SELinux labels which are created based on information provided by
connected peer.
Implementation of label_get_child_mls_label derived from xinetd.
Reviewed-by: Paul Moore <pmoore@redhat.com>
This way, the list of arguments to that function gets more comprehensive,
and we can get around passing lots of NULL and 0 arguments from socket.c,
swap.c and mount.c.
It also allows for splitting up the code in exec_spawn().
While at it, make ExecContext const in execute.c.
This is what we have done so far for all other time values, and hence we
should do this here. This indicates the default unit of time values
specified here, if they don't contain a unit.
This makes possible to spawn service instances triggered by socket with
MLS/MCS SELinux labels which are created based on information provided by
connected peer.
Implementation of label_get_child_label derived from xinetd.
Reviewed-by: Paul Moore <pmoore@redhat.com>
TCP_DEFER_ACCEPT Allow a listener to be awakened only when data
arrives on the socket. If TCP_DEFER_ACCEPT set on a server-side
listening socket, the TCP/IP stack will not to wait for the final
ACK packet and not to initiate the process until the first packet
of real data has arrived. After sending the SYN/ACK, the server will
then wait for a data packet from a client. Now, only three packets
will be sent over the network, and the connection establishment delay
will be significantly reduced.
The tcp keep alive variables now can be configured via conf
parameter. Follwing variables are now supported by this patch.
tcp_keepalive_intvl: The number of seconds between TCP keep-alive probes
tcp_keepalive_probes: The maximum number of TCP keep-alive probes to
send before giving up and killing the connection if no response is
obtained from the other end.
tcp_keepalive_time: The number of seconds a connection needs to be
idle before TCP begins sending out keep-alive probes.
This reverts commit 9528592ff8.
Apparently TFO is actually the default at least for the server side now.
Also the setsockopt doesn't actually take a bool, but a qlen integer.
TCP Fast Open (TFO) speeds up the opening of successiveTCP)
connections between two endpoints.It works by using a TFO cookie
in the initial SYN packet to authenticate a previously connected
client. It starts sending data to the client before the receipt
of the final ACK packet of the three way handshake is received,
skipping a round trip and lowering the latency in the start of
transmission of data.
This patch adds support for TCP TCP_NODELAY socket option. This can be
configured via NoDelay conf parameter. TCP Nagle's algorithm works by
combining a number of small outgoing messages, and sending them all at
once. This controls the TCP_NODELAY socket option.
This tool will warn about misspelt directives, unknown sections, and
non-executable commands. It will also catch the common mistake of
using Accept=yes with a non-template unit and vice versa.
https://bugs.freedesktop.org/show_bug.cgi?id=56607
Parsing sysv files was moved to the sysv-generator in the previous commit.
This patch removes the sysv parsing from serivce.c.
Note that this patch drops the following now unused sysv-specific info
from service dump:
"SysV Init Script has LSB Header: (yes/no)"
"SysVEnabled: (yes/no)"
"SysVRunLevels: (levels)"
With Symlinks= we can manage one or more symlinks to AF_UNIX or FIFO
nodes in the file system, with the same lifecycle as the socket itself.
This has two benefits: first, this allows us to remove /dev/log and
/dev/initctl from /dev, thus leaving only symlinks, device nodes and
directories in the /dev tree. More importantly however, this allows us
to move /dev/log out of /dev, while still making it accessible there, so
that PrivateDevices= can provide /dev/log too.
This reverts commit 9754d56e9b.
It causes a crash in PID1:
Apr 19 13:49:32 lon systemd[1]: Code should not be reached 'Unhandled socket type.'
at src/core/socket.c:684, function instance_from_socket(). Aborting.
Apr 19 13:49:32 lon systemd[1]: Caught <ABRT>, dumped core as pid 336.
Apr 19 13:49:32 lon systemd[1]: Freezing execution.
NixOS uses Unix domain sockets for certain host <-> container
interaction; i.e. the host connects to a socket visible in the
container's directory tree, where the container uses a .socket unit to
spawn the handler program on demand. This worked in systemd 203, but
in 212 fails with "foo.socket failed to queue service startup job
(Maybe the service file is missing or not a template unit?): No data
available".
The reason is that getpeercred() now returns ENODATA if it can't get
the PID of the client, which happens in this case because the client
is not in the same PID namespace. Since getpeercred() is only used to
generate the instance name, this patch simply handles ENODATA by
creating an instance name "<nr>-unknown".
[zj: reorder clauses and remove (unsigned long) casts.]
Let's automatically initialize the kill, exec and cgroup contexts of the
various unit types when the object is constructed, instead of
invididually in type-specific code.
Also, when PrivateDevices= is set, set DevicePolicy= to closed.
safe_close() automatically becomes a NOP when a negative fd is passed,
and returns -1 unconditionally. This makes it easy to write lines like
this:
fd = safe_close(fd);
Which will close an fd if it is open, and reset the fd variable
correctly.
By making use of this new scheme we can drop a > 200 lines of code that
was required to test for non-negative fds or to reset the closed fd
variable afterwards.
Previously the returned object of constructor functions where sometimes
returned as last, sometimes as first and sometimes as second parameter.
Let's clean this up a bit. Here are the new rules:
1. The object the new object is derived from is put first, if there is any
2. The object we are creating will be returned in the next arguments
3. This is followed by any additional arguments
Rationale:
For functions that operate on an object we always put that object first.
Constructors should probably not be too different in this regard. Also,
if the additional parameters might want to use varargs which suggests to
put them last.
Note that this new scheme only applies to constructor functions, not to
all other functions. We do give a lot of freedom for those.
Note that this commit only changes the order of the new functions we
added, for old ones we accept the wrong order and leave it like that.
Given that we now have KillMode=mixed where SIGTERM might kill a smaller
set than SIGKILL we need to make sure to always go explicitly throught
the SIGKILL state to get the right end result.
Also, introduce a new environment variable named $WATCHDOG_PID which
cotnains the PID of the process that is supposed to send the keep-alive
events. This is similar how $LISTEN_FDS and $LISTEN_PID work together,
and protects against confusing processes further down the process tree
due to inherited environment.
This patch converts PID 1 to libsystemd-bus and thus drops the
dependency on libdbus. The only remaining code using libdbus is a test
case that validates our bus marshalling against libdbus' marshalling,
and this dependency can be turned off.
This patch also adds a couple of things to libsystem-bus, that are
necessary to make the port work:
- Synthesizing of "Disconnected" messages when bus connections are
severed.
- Support for attaching multiple vtables for the same interface on the
same path.
This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus
calls which used an inappropriate signature.
As a side effect we will now generate PropertiesChanged messages which
carry property contents, rather than just invalidation information.
Previously to automatically create dependencies between mount units we
matched every mount unit agains all others resulting in O(n^2)
complexity. On setups with large amounts of mount units this might make
things slow.
This change replaces the matching code to use a hashtable that is keyed
by a path prefix, and points to a set of units that require that path to
be around. When a new mount unit is installed it is hence sufficient to
simply look up this set of units via its own file system paths to know
which units to order after itself.
This patch also changes all unit types to only create automatic mount
dependencies via the RequiresMountsFor= logic, and this is exposed to
the outside to make things more transparent.
With this change we still have some O(n) complexities in place when
handling mounts, but that's currently unavoidable due to kernel APIs,
and still substantially better than O(n^2) as before.
https://bugs.freedesktop.org/show_bug.cgi?id=69740
Previously we did operations like attach, trim or migrate only on the
controllers that were enabled for a specific unit. With this changes we
will now do them for all supproted controllers, and fall back to all
possible prefix paths if the specified paths do not exist.
This fixes issues if a controller is being disabled for a unit where it
was previously enabled, and makes sure that all processes stay as "far
down" the tree as groups exist.
Previously the specifier calls could only indicate OOM by returning
NULL. With this change they will return negative errno-style error codes
like everything else.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
- This changes all logind cgroup objects to use slice objects rather
than fixed croup locations.
- logind can now collect minimal information about running
VMs/containers. As fixed cgroup locations can no longer be used we
need an entity that keeps track of machine cgroups in whatever slice
they might be located. Since logind already keeps track of users,
sessions and seats this is a trivial addition.
- nspawn will now register with logind and pass various bits of metadata
along. A new option "--slice=" has been added to place the container
in a specific slice.
- loginctl gained commands to list, introspect and terminate machines.
- user.slice and machine.slice will now be pulled in by logind.service,
since only logind.service requires this slice.
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchally by the user.
Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.
Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
I'm assuming that it's fine if a _const_ or _pure_ function
calls assert. It is assumed that the assert won't trigger,
and even if it does, it can only trigger on the first call
with a given set of parameters, and we don't care if the
compiler moves the order of calls.
When a trigger unit wants to know if a stop is queued for it, we should
just check precisely that and do not check whether it is actually
stopped already. This is because we use these checks usually from state
change calls where the state variables are not updated yet.
This change splits unit_pending_inactive() into two calls
unit_inactive_or_pending() and unit_stop_pending(). The former checks
state and pending jobs, the latter only pending jobs.
When showing an error like 'Socket service not loaded', the
error won't show up in the status for the socket, unless it is
marked as SYSTEMD_UNIT=*.socket. Marking it as SYSTEMD_UNIT=*.service,
when the service is non-existent, is not useful.
All Execs within the service, will get mounted the same
/tmp and /var/tmp directories, if service is configured with
PrivateTmp=yes. Temporary directories are cleaned up by service
itself in addition to systemd-tmpfiles. Directory which is mounted
as inaccessible is created at runtime in /run/systemd.
There are very few differences in the implementations of the kill method in the
unit types that have one. Let's unify them.
This does not yet unify unit_kill() with unit_kill_context().
Hello list,
some socket activated service gave me the error message you can see on
the subject, maybe systemd should be more verbose in that case.
Thanks,
Dimitris
This adds #ifdef HAVE_ATTR_XATTR_H guards around all usage of xattr.
This unbreaks building with --disable-xattr when <attr/xattr.h> doesn't exist.
<attr/xattr.h> and usage of fsetxattr() without
Since we already allow defining the mode of AF_UNIX sockets and FIFO, it
makes sense to also allow specific user/group ownership of the socket
file for restricting access.
This adds SMACK label configuration options to socket units.
SMACK labels should be applied to most objects on disk well before
execution time, but two items remain that are generated dynamically
at run time that require SMACK labels to be set in order to enforce
MAC on all objects.
Files on disk can be labelled using package management.
For device nodes, simple udev rules are sufficient to add SMACK labels
at boot/insertion time.
Sockets can be created at run time and systemd does just that for
several services. In order to protect FIFO's and UNIX domain sockets,
we must instruct systemd to apply SMACK labels at runtime.
This patch adds the following options:
Smack - applicable to FIFO's.
SmackIpIn/SmackIpOut - applicable to sockets.
No external dependencies are required to support SMACK, as setting
the labels is done using fsetxattr(). The labels can be set on a
kernel that does not have SMACK enabled either, so there is no need
to #ifdef any of this code out.
For more information about SMACK, please see Documentation/Smack.txt
in the kernel source code.
v3 of this patch changes the config options to be CamelCased.
Under some circumstances this could lead to a segfault since we we
half-initialized a mount unit, then tried to hook it into the network of
things and while doing that recursively ended up looking at our
half-initialized mount unit again assuming it was fully initialized.
Note: I did s/MANAGER/SYSTEMD/ everywhere, even though it makes the
patch quite verbose. Nevertheless, keeping MANAGER prefix in some
places, and SYSTEMD prefix in others would just lead to confusion down
the road. Better to rip off the band-aid now.
In some cases, like wrong configuration, restarting after error
does not help, so administrator can specify statuses by RestartPreventExitStatus
which will not cause restart of a service.
Sometimes you have non-standart exit status, so this can be specified
by SuccessfulExitStatus.
It made no sense, and since we are documenting the bus calls now and
want to include them in our stability promise we really should get it
cleaned up sooner, not later.
As described in
https://bugs.freedesktop.org/show_bug.cgi?id=50184
the journal currently doesn't set fields such as _SYSTEMD_UNIT
properly for messages coming from processes that have already
terminated. This means among other things that "systemctl status" may
not show some of the output of services that wrote messages just
before they exited.
This patch fixes this by having processes that log to the journal
write their unit identifier to journald when the connection to
/run/systemd/journal/stdout is opened. Journald stores the unit ID
and uses it to fill in _SYSTEMD_UNIT when it cannot be obtained
normally (i.e. from the cgroup). To prevent impersonating another
unit, this information is only used when the caller is root.
This doesn't fix the general problem of getting metadata about
messages from terminated processes (which requires some kernel
support), but it allows "systemctl status" and similar queries to do
the Right Thing for units that log via stdout/stderr.
UnitPath= is also writable via native units and may be used by generators
to clarify from which file a unit is generated. This patch also hooks up
the cryptsetup and fstab generators to set UnitPath= accordingly.
Instead of generic "Starting..." and "Started" messages for all unit use
type-dependent messages. For example, mounts will announce "Mounting..."
and "Mounted".
Add status messages to units of types that used to be entirely silent
(automounts, sockets, targets, devices). For unit types whose jobs are
instantaneous, report only the job completion, not the starting event.
Socket units with non-instantaneous jobs are rare (Exec*= is not used
often in socket units), so I chose not to print the starting messages
for them either.
This will hopefully give people better understanding of the boot.
The kernel will only notify us of cgroups running empty if no subcgroups
exist anymore. Hence make sure we don't leave our own control/ subcgroup
around longer than necessary.
https://bugzilla.redhat.com/show_bug.cgi?id=818381
Type=idle is much like Type=simple, however between the fork() and the
exec() in the child we wait until PID 1 informs us that no jobs are
left.
This is mostly a cosmetic fix to make gettys appear only after all boot
output is finished and complete.
Note that this does not impact the normal job logic as we do not delay
the completion of any jobs. We just delay the invocation of the actual
binary, and only for services that otherwise would be of Type=simple.
Previously, we were brutally and onconditionally killing all processes
in a service's cgroup before starting the service anew, in order to
ensure that StartPre lines cannot be misused to spawn long-running
processes.
On logind-less systems this has the effect that restarting sshd
necessarily calls all active ssh sessions, which is usually not
desirable.
With this patch control processes for a service are placed in a
sub-cgroup called "control/". When starting a service anew we simply
kill this cgroup, but not the main cgroup, in order to avoid killing any
long-running non-control processes from previous runs.
https://bugzilla.redhat.com/show_bug.cgi?id=805942
We finally got the OK from all contributors with non-trivial commits to
relicense systemd from GPL2+ to LGPL2.1+.
Some udev bits continue to be GPL2+ for now, but we are looking into
relicensing them too, to allow free copy/paste of all code within
systemd.
The bits that used to be MIT continue to be MIT.
The big benefit of the relicensing is that closed source code may now
link against libsystemd-login.so and friends.