Let's protect ourselves against the recently reported docker security
issue. Our man page makes clear that we do not make any security
promises anyway, but well, this one is easy to mitigate, so let's do it.
While we are at it block a couple of more syscalls that are no good in
containers, too.
The logic otherwise is that we leave anything preconfigured alone, but in the case of DHCP
we actually need to update it whenever the lease is renewed.
Even if we cannot renew the lease at T1, we will likely succeed at T2, so warn and ignore the failure.
This could happen if for whatever reason the received address is not yet configured, or it has
been lost.
This adds support for DHCP options 33 and 121: Static Route and
Classless Static Route. To enable this feature, set UseRoutes=true
in .network file. Returned routes are added to the routing table.
If there are v4 or v6 specific options we can keep those in separate sections,
but for the common options, we will use only one.
Moreovere only use DHCP=[yes/both|no/none|v4|v6] to enable or disable the clients.
Let's move things closer to journald's configuration settings, which
knows Compress= already, as a boolean. This makes things more uniform,
but also gives us more freedom to possibly swap out the used compression
algorithm one day.
This sounds overly low-level and implementation-detaily. Let's just
use the default level XZ suggests. This gives us more room to possibly
swap out the compression algorithm used, as the compression level range
will not leak into user configuration.
When disk space taken up by coredumps grows beyond a configured limit
start removing the oldest coredump of the user with the most coredumps,
until we get below the limit again.
Add a Rapid Commit option to Solicit messages and expect a Reply to
be received instead of an Advertise. When receiving a DHCPv6 message
from the server in state Solicit, continue testing whether the
message is a Reply. Ease up the message type checking, it's not fatal
if the message is of a wrong type.
Add helper functions to set/get the rapid commit of a lease. See
RFC 3315, sections 17., 17.1.2., 17.1.4. and 18.1.8.
Start sending Renew and Rebind DHCPv6 messages when respective timers T1
and T2 expire. Rebind messages do not include a Server ID option and the
Rebind procedure ends when the last IPv6 address valid lifetime expires,
whereafter the client restarts the address acquisition procedure by
Soliciting for available servers.
See RFC 3315, sections 18.1.3. and 18.1.4. for details.
Create a helper function to compute the remaining time in seconds from
time T2 to the IPv6 address with the longest lifetime. The computed
time is used as the Maximum Retransmission Duration in Rebinding state.
See RFC 3315, section 18.1.4. for details.
Provide a function to request more options from the DHCPv6 server.
Provide a sensible default set at startup and add test basic test
cases for the intended usage.
Define DNS and NTP related option codes and add comments for the
unassigned codes.
Patch fixes some incorrect-looking code in transaction.c.
It could fix cases where Debian users with bad package configurations
had systemd go into an infinite loop printing messages about breaking an
ordering cycle, though I have not reproduced that problem myself.
transaction_verify_order_one() considers jobs/units outside current
transaction when checking whether ordering dependencies cause cycles.
It would also incorrectly try to break cycles at these jobs; this
cannot work, as the break action is to remove the job from the
transaction, which is a no-op if the job isn't part of the transaction
to begin with. The unit_matters_to_anchor() test also looks like it
would not work correctly for non-transaction jobs. Add a check to
verify that the unit is part of the transaction before considering a
job a candidate for deletion.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=752259
Add Compression={none,xz} and CompressionLevel=0-9 settings. Defaults
are xz/6.
Compression=filesystem may be added later.
I picked "xz" for the compression "type", since we might want to add
different compressors later on. XZ is fairly memory and CPU intensive, and
embedded users will likely want to use LZO or some other lightweight compression
mechanism.
Unfortunately the core is first written uncompressed, then compressed
by reading from disk and writing to the output file. This is ugly and
slow, but I don't see a way around, if we want to get the backtrace
without keeping everything in memory.
When running in 'quiet' mode, the only message printed from shutdown
binary would be 'Cannot finalize remaining file systems and devices,
giving up.', the only log line at error level before switch back to
initramfs. This is misleading, because in initramfs everything will
be cleaned up properly.
Avoid printing anything at error level before the attempt to switch
back to initramfs. Rework the messages to contain a bit more
information what is still remaining, to help people diagnose shutdown
issues.
Let's keep this behavior consistent across our libraries.
In order to keep the refcounting working, a DONT_DESTROY macro similar
to the one in sd-bus was introduced.
Kernel mangles comm information in an irreversible way when comm
constains repeated spaces. Retrieve comm information from /proc, and
only fallback to the information provided on the commandline when
retrieving information from /proc fails.
Add exe information to the list of saved xattr.
https://bugs.freedesktop.org/show_bug.cgi?id=62043
As special magic, don't create device dependencies for /dev/null. Of
course, there might be similar devices we might want to include, but
given that none of them really make sense to specify as password source
there's really no point in checking for anything else here.
https://bugs.freedesktop.org/show_bug.cgi?id=75816
src/readahead/readahead-common.c:55:17: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘__off64_t’ [-Wformat=]
log_debug("Not preloading file %s with size out of bounds %zu", fn, st->st_size);
^
shifting from a non fixed number of bits >= to the size of the type
leads to weird results, handle the special case of << 32 to fix it.
This was causing a test failure from test-socket-util:
Assertion 'in_addr_prefix_intersect(f, &ua, apl, &ub, bpl) == result' failed at
/var/tmp/paludis/build/sys-apps-systemd-scm/work/systemd-scm/src/test/test-socket-util.c:147, function
test_in_addr_prefix_intersect_one(). Aborting.
Minimal reproducer:
paludisbuild@Lou /tmp $ cat test.c
static void test(unsigned m) {
unsigned nm = 0xFFFFFFFFUL << (32-m);
printf("%u: %x\n", m, nm);
}
int main (void) {
test(1);
test(0);
return 0;
}
paludisbuild@Lou /tmp $ gcc -m32 -std=gnu99 test.c -o test32
paludisbuild@Lou /tmp $ ./test32
1: 80000000
0: ffffffff
paludisbuild@Lou /tmp $ gcc -std=gnu99 test.c -o test64
paludisbuild@Lou /tmp $ ./test64
1: 80000000
0: 0
The short lease was useful for testing, but in real-world usage it is pointless to keep leases
this short. That said, the cost of lease renewal is really low, so we keep the lease still
relatively short at one hour to not get into hard-to-hit problems with lease exhaustion etc.
We used to check if e.g. IFLA_BOND_MAX is defined and provide fallback
values in missing.h is it wasn't. But over time, various kernel
versions added IFLA_* defines, so checking for IFLA_BOND_MAX is not
enough if the kernel is new enough to have some of them but too old to
have all. In case we detect that the latest known enum value is
missing, #define most of them.
https://bugs.freedesktop.org/show_bug.cgi?id=80095
There's no need to save the old sigmask, if we are going to die. Let's
simplify this. Also, reset all the signal handlers, so that we don't
leave SIG_IGN set for some of them across reexec.
This restores the original root handling logic that was present prior to
112cfb18 when path expansion moved to path_strv_canonicalize_absolute.
That behavior partially went away in 12ed81d9.
Alternatively all users of conf_files_list* could be updated to
concatenate the paths themselves as unit_file_query_preset did but since
no user needs the un-concatenated form that is pointless duplication.
Since 12ed81d9 path_strv_canonicalize_absolute leaves the search list
relative to the given root directory instead of resolving paths to their
true location as the name implies. To better reflect this behavior
rename to the less strongly worded path_strv_resolve.
Otherwise the add_symlink() function tries to make directories for
each slash even for the slash after the @ symbol in the final link
name, failing for /dev/3270/tty1.
Based on a patch by Werner Fink <werner@suse.de>.
Previously it would recursively copy the entire tree in, and descend
into subdirectories even if the destination already exists. Let's do
what the documentation says and not do that.
If files down the tree shall be copied too, they should get their own
"C" lines.
As generators and other components started to maintain their own kernel
command line options this help text needed more and more exceptions and
wasn't complete anyway. Fixing that would leak more information about
specific generators into PID 1, which we should avoid.
Given that kernel cmdline handling traditionally doesn't generate errors
or show help texts, let's just remove the logic for it for systemd too.
debug-generator can mask specific units if they are specified on the
kernel command line with systemd.mask=.
debug-generator can pull in debug-shell.service is systemd.debug-shell
is passed on the kernel command line.
client_initialize name is misleading, since the function is actually
useful at the *end*, to reinitialize the object. But reset is shorter,
so rename it to client_reset.
Enhance the test case by generating a Reply. With a properly formed
Reply the callback function will be called and the additional
earlier event loop exit can now be removed.
Enhance the test case by replying with an Advertise message to the
client. Copy the transaction id, IAID and DUID from the Solicit
message. Verify the Request message created by the DHCPv6 client
implementation and move the main loop exit to the end of the Request
message verification.
As described in RFC 3315, Section 17.1.2, a client has to wait until the
first timeout has elapsed before it is allowed to request IPv6 addresses
from the DHCPv6 server. This is indicated by a non-NULL lease and a
non-zero resend count. Should the Advertisement contain a preference
value of 255 or be received after the first timeout, IPv6 address
requesting is started immediately.
In response to these events, create a Request message and set up proper
resend timers to send the message to the server.
Update the start function so that the client state can be conveniently
changed with the previous message resend timers cleared. On initial
startup also create and bind to the UDP socket.
Add a basic test case excersising once more option parsing function
in addition to lease handling. Check that the address iteration
functions return the correct IPv6 address and lifetimes and that
only one address is returned. Also verify that the server ID and
preference values are read correctly.
When receiving DHCPv6 messages, discard the ones that are not meant
for DHCPv6 clients and verify the transaction id. Once that is done,
process the Advertise message and select the Advertise with the
highest preference.
Create a separate function for lease information parsing so that it
can be reused in other parts of the protocol. Verify both DUID and
IAID in the received message and store other necessary information
with the lease structure.
Add functionality to parse DHCPv6 Identity Association for
Non-temporary (IA_NA) and Temporary Addresses (IA_TA) options.
Both of them contain one or more IA Address (IAADDR) options
and optinally a status code option. Only the IA_NA option
contains lease lifetimes. See RFC 3315, sections 22.4., 22.5.,
22.6., 22.13. and appendix B. for details. If the lease
timeouts are not set, use the ones recommended for servers in
section 22.4.
Factor out common code in the form of an option header parsing
helper function.
Create a structure describing a DHCPv6 lease. Add internal functions
for creating a new lease and accessing the server ID, preference and
IAID. Provide functions for clearing addresses and associated timers.
External users are initially given only the capabilities of
referencing and unreferencing the lease structure.
Verify the Solicit message created by the DHCPv6 client code.
Provide local variants for detect_vm(), detect_container() and
detect_virtualization() defined in virt.h. This makes the DHCPv6
library believe it is run in a container and does not try to request
interface information from udev for the non-existing interface index
used by the test case code.
Implement the initial functionality used for creating a DHCPv6 Solicit
message containing the needed options and send it to the DHCPv6
broadcast address. Increase the sent message count and ensure that
the Solicit Initial Retransmission Time is strictly greater than
the Solicitation IRT as described in RFC 3315, section 17.1.2.
Add a function that creates a UDP socket bound to the given interface
and optionally to an IPv6 address. Add another function that will
send the DHCPv6 UDP packet to its destination.
Using IPV6_PKTINFO in setsockopt to bind the IPv6 socket to an
interface is documented in section 4. of RFC 3542, "Advanced Sockets
Application Program Interface (API) for IPv6"
Add a define for DHCPv6 Relay Agents and Servers multicast address as
its not available elsewhere.
Add option appending and parsing. DHCPv6 options are not aligned, thus
the option handling code must be able to handle options starting at
any byte boundary.
Add a test case for the basic option handling.
Add the core of DHCPv6 client message retransmission and upper bound
timer and message count handling according to RFC 3315 Secions 7.1.2
and 14. Omit the DHCPv6 initial delay; for now it is assumed that
systemd-networkd will provide decent startup randomization that will
desynchronize the clients.
When reinitializing the client, clear all timers.
Create structures describing Identity Association IDentifiers and
IPv6 lease addresses.
[tomegun: initialize the IAID when client is started. Base this off of the
predictable udev names, if available, as these satisfy the requirement of
the IAID, and base it off the mac addres otherwise, as that is the best we
have.]
Initialize DHCP Unique Identifier when creating the client. The
DUID is generated based on the machine-id, which satisfies all the
requirements of what an DUID should be. The DUID type is DUID-EN.
Based on patch by Patrik Flykt.
Feed a Router Advertisement to the code and expect proper events
each time. The sending part is ignored, as all of it is static code
in the real dhcp_network_icmp6_send_rs() function.
Provide functions to bind the ICMPv6 socket to the approriate interface
and set multicast sending and receiving according to RFC 3493, section
5.2. and RFC 3542, sections 3. and 3.3. Filter out all ICMPv6 messages
except Router Advertisements for the socket in question according to
RFC 3542, section 3.2.
Send Router Solicitations to the all routers multicast group as
described in RFC 4861, section 6. and act on the received Router
Advertisments according to section 6.3.7.
Implement a similar API for ICMPv6 handling as is done for DHCPv4 and
DHCPv6.
Now that we actually can distuingish system and normal users there's no
point in taking session information into account anymore when splitting
up logs.
This has the beenfit with that coredump information will actually end up
in each user's own journal.
since 376cd3b89c LIST_FIND_TAIL accepts
an empty list. That removed an assert in LIST_FIND_TAIL and we now
theoretically risk a null pointer deref. This adds the assert directly
to protect against that.
Introduce a new configuration file /etc/systemd/coredump.conf to
configure when to place coredumps in the journal and when on disk.
Since the coredumps are quite large, default to storing them only on
disk.
When an address is configured to be all zeroes, networkd will now
automatically find a locally unused network of the right size from a
list of pre-configured pools. Currently those pools are 10.0.0.0/8,
172.16.0.0/12, 192.168.0.0/16 and fc00::/7, i.e. the network ranges for
private networks. They are compiled in, but should be configurable
eventually.
This allows applying the same configuration to a large number of
interfaces with each time a different IP range block, and management of
these IP ranges is fully automatic.
When allocating an address range from the pool it is made sure the range
is not used otherwise.
There is no need to explicitly check version of L3 protocol in the
ethernet header because we bind socket with .sll_protocol set to
ETH_P_IP, thus we only receive IPv4 packets on the socket.
For efficiency, we group bytes together before adding them up. This
is guaranteed to always work (regardless of the byte order) as long
as the i-th byte in each group lign up with the i-th byte in each
other group.
On big-endian machines this broke when handling the trailing few bytes
which did not make up a full group of 4 bytes. This patch fixes the
problem by explicitly creating a 4 byte zero-padded group out of the
trailing bytes.
Reported and tested by Thomas Ritter <th.ritter@gmx.at>.
The DefaultInstance= name is used when enabling template units when only
specifying the template name, but no instance.
Add DefaultInstance=tty1 to getty@.service, so that when the template
itself is enabled an instance for tty1 is created.
This is useful so that we "systemctl preset-all" can work properly,
because we can operate on getty@.service after finding it, and the right
instance is created.
The new "systemctl preset-all" command may now be used to put all
installed units back into the enable/disable state the vendor/admin
encoded in preset files.
Also, introduce "systemctl --preset-mode=enable-only" and "systemctl
--preset-mode=disable-only" to only apply the enable or only the disable
operations of a "systemctl preset" or "systemctl preset-all" operation.
"systemctl preset-all" implements this RFE:
https://bugzilla.redhat.com/show_bug.cgi?id=630174
Process 1 (aka init) needs to be started with an empty signal mask.
That includes the process 1 that's started after the initrd is finished.
When the initrd is using systemd (as it does with dracut based initrds)
then it is systemd that calls the real init. Normally this is systemd
again, except when the user uses for instance "init=/bin/bash" on the
kernel command line.
Passing 0 to malloc() is not required to return NULL. Therefore, don't
bail out if "b" is 0. This is not of importance to the existing helpers,
but the upcoming realloc_multiply() requires this. To keep consistence, we
keep the same behavior for the other helpers.
As it turns out, we cannot use _Pragma in compound-statements. Therefore,
constructs like MIN(MAX(a, b), x) will warn due to shadowed variable
declarations. The DISABLE_WARNING_SHADOW macro can be used to suppress
these.
Note that using UNIQUE(_var) does not work either as GCC uses the last
line of a macro-expansion for __LINE__, therefore, still causing both
macros to have the same variables. We could use different variable-names
for MIN and MAX, but that just hides the problem and still fails for
MIN(something(MIN(a, b)), c).
The only working solution is to use __COUNTER__ and pass it pre-evaluated
as extra argument to a macro to use as name-prefix. This, however, makes
all these macros much more complicated so I'll go with manual
DISABLE_WARNING_SHADOW so far.
/etc/mtab should die die die. It's sad enough util-linux still contains
support for it, but we don't have to partake in that charade, so let's
turn this off.
This is in-line with the fact that since years we already have been
"tainting" systemd if we detect /etc/mtab not being a symlink...
Of course, util-linux is currently broken, and still touches /etc/mtab,
weven if we pass "--no-mtab" to it:
https://bugzilla.redhat.com/show_bug.cgi?id=1109367
But hey, let's hope that gets fixed quickly, even if total removal of
/etc/mtab support from util-linux might not happen so quickly...
We could still have an old interface name and/or mac address when libudev
tells us that the device is initialized, as the up-to-date info could still
be on its way from the kernel.
This reverts (and rewrites) commit 7d95c772cb.
The issue blocking this feature has now been fixed in the kernel, and backported
to the various stable kernels.
Our netdevs will now have stable MAC addresses, even if one is not specified.
It may sometimes be necessary to specify the MAC address of a netdev.
Let us set the correct one from the get-go, rather than having the
kernel generate a random one, and then change it after.
It should not be possible to have a DHCP lease on a link without also having
an associated network. Add assert() to avoid compiler warnings.
Reported by Thomas H. P. Andersen
The file should have been in /usr/lib/ in the first place, since it
describes the OS container in /usr (and not the configuration in /etc),
hence, let's support os-release files in /usr/lib as fallback if no
version in /etc exists, following the usual override logic.
A prior commit already enabled tmpfiles to create /etc/os-release as a
symlink to /usr/lib/os-release should it be missing, thus providing nice
compatibility with applications only checking in /etc.
While it's probably a good idea if all apps check both locations via a
fallback logic, it is only necessary in the early boot process, as long
as the /etc/os-release symlink has not been restored, in case we boot
with an empty /etc.
For most NSS calls it is documented that they return NULL + errno=0 when
an entry is not found. However, in reality it appears to be common to
return NULL + errno=ENOENT, instead. Handle that correctly, and don't
consider ENOENT a systematic error.
With this in place RPMs can make sure that whatever they drop in is
immeidately applied, and not delayed until next reboot.
This also moves systemd-sysusers back to /usr/bin, since hardcoding the
path to /usr/lib in the macros would mean compatibility breaks in
future, should we turn sysusers into a command that is actually OK for
people to call directly. And given that that is quite likely to happen
(since it is useful to prepare images with its --root= switch), let's
just prepare for it.
this gives us a little bit more freedom to move things around later on,
as we don't hardcode the systemd paths in old RPMs that shall work with
new systemds.
int unit_file_mask(...) in ./src/shared/install.c calls
get_config_path(...) which can in 4 error cases return without setting
"ret", and thus "prefix" can be uninitialized when unit_file_mask(...)
finishes (which it does directly after the error is returned from
get_config_path(...)).
static bool enable_name_policy(...) in ./src/udev/net/link-config.c
calls proc_cmdline(...) to get "line" initialized, but
proc_cmdline(...) does not guarantee that atleast when both
conditions (detect_container(NULL) > 0) and
read_full_file(...) returned < 0.
static int killall(....) in ./src/core/killall.c tries to get "s"
initialized by calling get_process_comm(...) which calls
read_one_line_file(...) which if it fails will mean it is left
uninitialized.
It is then used in argument to strna(s) call where it is
dereferenced(!), in addition to nothing else initializing it before
the scope it is in finishes.
static int client_send_request(...) in
./src/libsystemd-network/sd-dhcp-client.c tries to initialize
"request" by calling client_message_init(...), which has atleast
5 error cases where it can return without that happening.
This leads to the function finishing without "request" being initialized.
When enabled in [Network] it will set up a dhcp server on the interface, listening
on one of its statically configured IPv4 addresses and with a fixed size pool of
leases determined from it.
Example:
[Match]
Name=ve-arch-tree
[Network]
Address=192.168.12.5/24
DHCPServer=yes
[Route]
Gateway=192.168.12.5
Destination=192.168.12.0/24
In this case we will configure ve-arch-tree with the address 192.168.12.5 and
hand out addresses in the range 192.168.12.6 - 192.168.12.38.
In the future, we should (as suggested by Lennart) introduce a syntax to pick the
server address automatically.