Systemd

Author	SHA1	Message	Date
Lennart Poettering	078ba556da	core: warn loudly if IP firewalling is configured but not in effect	2017-09-22 15:24:55 +02:00
Lennart Poettering	1274b6c687	ip-address-access: minimize IP address lists Let's drop redundant items from the IP address list after parsing. Let's also mask out redundant bits hidden by the prefixlength.	2017-09-22 15:24:55 +02:00
Lennart Poettering	3dc5ca9787	core: support IP firewalling to be configured for transient units	2017-09-22 15:24:55 +02:00
Lennart Poettering	27458ed629	tree-wide: use path_startswith() rather than startswith() wherever that's appropriate When checking path prefixes we really should use the right APIs, just in case people add multiple slashes to their paths...	2017-08-09 19:03:39 +02:00
Lennart Poettering	4b61c87511	tree-wide: fput[cs]() → fput[cs]_unlocked() wherever that makes sense (#6396) As a follow-up for `db3f45e2d2` let's do the same for all other cases where we create a FILE* with local scope and know that no other threads hence can have access to it. For most cases this shouldn't change much really, but this should speed dbus introspection and calendar time formatting up a bit.	2017-07-21 10:35:45 +02:00
Zbigniew Jędrzejewski-Szmek	d4bf82fcac	pid1: properly encode infinity when writing CPUQuota snippet (#6141) We would write [Slice] CPUQuota=1844674407370955% which is (numerically) correct, but it seems better to just write [Slice] CPUQuota= which is interpreted as USEC_INFINITY by the parser in config_parse_cpu_quota(). Fixes #5965.	2017-06-18 11:18:41 +02:00
WaLyong Cho	96e131ea09	core: introduce MemorySwapMax= Similar to MemoryMax=, MemorySwapMax= limits swap usage. This controls the "memory.swap.max" attribute in unified cgroup.	2016-08-30 11:11:45 +09:00
Tejun Heo	66ebf6c0a1	core: add cgroup CPU controller support on the unified hierarchy Unfortunately, due to the disagreements in the kernel development community, CPU controller cgroup v2 support has not been merged and enabling it requires applying two small out-of-tree kernel patches. The situation is explained in the following documentation. https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu While it isn't clear what will happen with CPU controller cgroup v2 support, there are critical features which are possible only on cgroup v2 such as buffered write control making cgroup v2 essential for a lot of workloads. This commit implements systemd CPU controller support on the unified hierarchy so that users who choose to deploy CPU controller cgroup v2 support can easily take advantage of it. On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max" replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config options are added with the usual compat translation. CPU quota settings remain unchanged and apply to both legacy and unified hierarchies. v2: - Error in man page corrected. - CPU config application in cgroup_context_apply() refactored. - CPU accounting now works on unified hierarchy.	2016-08-07 09:45:39 -04:00
Lennart Poettering	f7903e8db6	core: rename MemoryLimitByPhysicalMemory transient property to MemoryLimitScale That way, we can neatly keep this in line with the new TasksMaxScale= option. Note that we didn't release a version with MemoryLimitByPhysicalMemory= yet, hence this change should be unproblematic without breaking API.	2016-07-22 15:33:12 +02:00
Lennart Poettering	83f8e80857	core: support percentage specifications on TasksMax= This adds support for a TasksMax=40% syntax for specifying values relative to the system's configured maximum number of processes. This is useful in order to neatly subdivide the available room for tasks within containers.	2016-07-22 15:33:12 +02:00
Alessandro Puccetti	31d28eabc1	nspawn: enable major=0/minor=0 devices inside the container (#3773) https://github.com/systemd/systemd/pull/3685 introduced /run/systemd/inaccessible/{chr,blk} to map inaccessible devices, this patch allows systemd running inside a nspawn container to create /run/systemd/inaccessible/{chr,blk}.	2016-07-21 17:39:38 +02:00
Lennart Poettering	d58d600efd	systemctl: allow percent-based MemoryLimit= settings via systemctl set-property The unit files already accept relative, percent-based memory limit specification, let's make sure "systemctl set-property" supports this too. Since we want the physical memory size of the destination machine to apply we pass the percentage in a new set of properties that only exist for this purpose, and can only be set.	2016-06-14 20:01:45 +02:00
Lennart Poettering	799ec13412	core: make sure to use "infinity" in unit files, not "max" The latter is a kernelism, we only understand "infinity".	2016-06-14 19:50:38 +02:00
Lennart Poettering	cd0a7a8e58	core: when receiving a memory limit via the bus, refuse 0 When parsing unit files we already refuse unit memory limits of zero, let's also refuse it when the value is set via the bus.	2016-06-14 19:50:38 +02:00
Zbigniew Jędrzejewski-Szmek	b27b4b51c6	tree-wide: remove newlines from unit_write_drop_in This reverts part of #3329, but all for a good cause.	2016-05-28 16:29:42 -04:00
Tejun Heo	da4d897e75	core: add cgroup memory controller support on the unified hierarchy (#3315) On the unified hierarchy, memory controller implements three control knobs - low, high and max which enable more usable and versatile control over memory usage. This patch implements support for the three control knobs. * MemoryLow, MemoryHigh and MemoryMax are added for memory.low, memory.high and memory.max, respectively. * As all absolute limits on the unified hierarchy use "max" for no limit, make memory limit parse functions accept "max" in addition to "infinity" and document "max" for the new knobs. * Implement compatibility translation between MemoryMax and MemoryLimit. v2: - Fixed missing else's in config_parse_memory_limit(). - Fixed missing newline when writing out drop-ins. - Coding style updates to use "val > 0" instead of "val". - Minor updates to documentation.	2016-05-27 18:10:18 +02:00
Tejun Heo	0c2d96f5f5	core: fix missing newlines when writing out drop-ins for cgroup settings Except for per-device BlockIO, IO and DeviceAllow/Deny settings, all were missing newline causing the next drop-in to be concatenated at the end of the line. Fix it.	2016-05-23 16:48:46 -04:00
Tejun Heo	6fb0926976	core: fix the reversed sanity check when setting StartupBlockIOWeight over dbus bus_cgroup_set_property() was rejecting if the input value was in range. Reverse it.	2016-05-23 16:48:46 -04:00
Tejun Heo	979d03117f	core: update CGroupBlockIODeviceBandwidth to record both rbps and wbps CGroupBlockIODeviceBandwidth is used to keep track of IO bandwidth limits for legacy cgroup hierarchies. Unlike the unified hierarchy counterpart CGroupIODeviceLimit, a CGroupBlockIODeviceBandwidth records either a read or write limit and has a couple issues. * There's no way to clear specific config entry. * When configs are cleared for an IO direction of a unit, the kernel settings aren't cleared accordingly creating discrepancies. This patch updates CGroupBlockIODeviceBandwidth so that it behaves similarly to CGroupIODeviceLimit - each entry records both rbps and wbps limits and is cleared if both are at default values after kernel settings are updated.	2016-05-18 13:51:46 -07:00
Tejun Heo	ac06a0cf8a	core: add support for IOReadIOPSMax and IOWriteIOPSMax cgroup IO controller supports maximum limits for both bandwidth and IOPS but systemd resource control currently only supports bandwidth limits. This patch adds support for IOReadIOPSMax and IOWriteIOPSMax when unified cgroup hierarchy is in use. It isn't difficult to also add BlockIOReadIOPS and BlockIOWriteIOPS for legacy hierarchies but IO control on legacy hierarchies is half-broken anyway, so let's leave it alone for now.	2016-05-18 13:50:56 -07:00
Tejun Heo	9be572497d	core: introduce CGroupIOLimitType enums Currently, there are two cgroup IO limits, bandwidth max for read and write, and they are hard-coded in various places. This is fine for two limits but IO is expected to grow more limits - low, high and max limits for bandwidth and IOPS - and hard-coding each limit won't make sense. This patch replaces hard-coded limits with an array indexed by CGroupIOLimitType and accompanying string and default value tables so that new limits can be added trivially.	2016-05-18 13:50:56 -07:00
Tejun Heo	13c31542cc	core: add io controller support on the unified hierarchy On the unified hierarchy, blkio controller is renamed to io and the interface is changed significantly. * blkio.weight and blkio.weight_device are consolidated into io.weight which uses the standardized weight range [1, 10000] with 100 as the default value. * blkio.throttle.{read|write}_{bps|iops}_device are consolidated into io.max. Expansion of throttling features is being worked on to support work-conserving absolute limits (io.low and io.high). * All stats are consolidated into io.stats. This patchset adds support for the new interface. As the interface has been revamped and new features are expected to be added, it seems best to treat it as a separate controller rather than trying to expand the blkio settings although we might add automatic translation if only blkio settings are specified. * io.weight handling is mostly identical to blkio.weight[_device] handling except that the weight range is different. * Both read and write bandwidth settings are consolidated into CGroupIODeviceLimit which describes all limits applicable to the device. This makes it less painful to add new limits. * "max" can be used to specify the maximum limit which is equivalent to no config for max limits and treated as such. If a given CGroupIODeviceLimit doesn't contain any non-default configs, the config struct is discarded once the no limit config is applied to cgroup. * lookup_blkio_device() is renamed to lookup_block_device(). Signed-off-by: Tejun Heo <htejun@fb.com>	2016-05-05 16:43:06 -04:00
Daniel Mack	b26fa1a2fb	tree-wide: remove Emacs lines from all files This should be handled fine now by .dir-locals.el, so no need to carry that stuff in every file.	2016-02-10 13:41:57 +01:00
Lennart Poettering	b5efdb8af4	util-lib: split out allocation calls into alloc-util.[ch]	2015-10-27 13:45:53 +01:00
Lennart Poettering	0d39fa9c69	util-lib: move more file I/O related calls into fileio.[ch]	2015-10-27 13:25:55 +01:00
Lennart Poettering	3ffd4af220	util-lib: split out fd-related operations into fd-util.[ch] There are more than enough to deserve their own .c file, hence move them over.	2015-10-25 13:19:18 +01:00
Nicolas Cornu	1f2f874c3c	core dbus: Check that flush works with memstream	2015-10-21 18:17:12 +02:00
Lennart Poettering	e7ab4d1ac9	cgroup: unify how we invalidate cgroup controller settings Let's make sure that we follow the same codepaths when adjusting a cgroup property via the dbus SetProperty() call, and when we execute the StartupCPUShares= effect.	2015-09-11 18:31:50 +02:00
Lennart Poettering	d53d94743c	core: refactor cpu shares/blockio weight cgroup logic Let's stop using the "unsigned long" type for weights/shares, and let's just use uint64_t for this, as that's what we expose on the bus. Unify parsers, and always validate the range for these fields. Correct the default blockio weight to 500, since that's what the kernel actually uses. When parsing the weight/shares settings from unit files accept the empty string as a way to reset the weight/shares value. When getting it via the bus, uniformly map (uint64_t) -1 to unset. Open up StartupCPUShares= and StartupBlockIOWeight= to transient units.	2015-09-11 18:31:49 +02:00
Lennart Poettering	03a7b521e3	core: add support for the "pids" cgroup controller This adds support for the new "pids" cgroup controller of 4.3 kernels. It allows accounting the number of tasks in a cgroup and enforcing limits on it. This adds two new settings TasksAccounting= and TasksMax= to each unit, as well as a global option DefaultTasksAccounting=. This also updated "cgtop" to optionally make use of the new kernel-provided accounting. systemctl has been updated to show the number of tasks for each service if it is available. This patch also adds correct support for undoing memory limits for units using a MemoryLimit=infinity syntax. We do the same for TasksMax= now and hence keep things in sync here.	2015-09-10 18:41:06 +02:00
Lennart Poettering	3905f12713	cgroups: make sure the "devices" controller's enum is named the same way as the controller in the kernel Follow-up to `5bf8002a3a`.	2015-09-08 18:15:50 +02:00
Lennart Poettering	efdb02375b	core: unified cgroup hierarchy support This patch set adds full support for the new unified cgroup hierarchy logic of modern kernels. A new kernel command line option "systemd.unified_cgroup_hierarchy=1" is added. If specified the unified hierarchy is mounted to /sys/fs/cgroup instead of a tmpfs. No further hierarchies are mounted. The kernel command line option defaults to off. We can turn it on by default as soon as the kernel's APIs regarding this are stabilized (but even then downstream distros might want to turn this off, as this will break any tools that access cgroupfs directly). It is possible to choose for each boot individually whether the unified or the legacy hierarchy is used. nspawn will by default provide the legacy hierarchy to containers if the host is using it, and the unified otherwise. However it is possible to run containers with the unified hierarchy on a legacy host and vice versa, by setting the $UNIFIED_CGROUP_HIERARCHY environment variable for nspawn to 1 or 0, respectively. The unified hierarchy provides reliable cgroup empty notifications for the first time, via inotify. To make use of this we maintain one manager-wide inotify fd, and add each cgroup to it. This patch also removes cg_delete() which is unused now. On kernel 4.2 only the "memory" controller is compatible with the unified hierarchy, hence that's the only controller systemd exposes when booted in unified hierarchy mode. This introduces a new enum for enumerating supported controllers, plus a related enum for the mask bits mapping to it. The core is changed to make use of this everywhere. This moves PID 1 into a new "init.scope" implicit scope unit in the root slice. This is necessary since on the unified hierarchy cgroups may either contain subgroups or processes but not both. PID 1 hence has to move out of the root cgroup (strictly speaking the root cgroup is the only one where processes and subgroups are still allowed, but in order to support containers nicely, we move PID 1 into the new scope in all cases.) This new unit is also used on legacy hierarchy setups. It's actually pretty useful on all systems, as it can then be used to filter journal messages coming from PID 1, and so on. The root slice ("-.slice") is now implicitly created and started (and does not require a unit file on disk anymore), since that's where "init.scope" is located and the slice needs to be started before the scope can. To check whether we are in unified or legacy hierarchy mode we use statfs() on /sys/fs/cgroup. If the .f_type field reports tmpfs we are in legacy mode, if it reports cgroupfs we are in unified mode. This patch set carefully makes sure that cgls and cgtop continue to work as desired. When invoking nspawn as a service it will implicitly create two subcgroups in the cgroup it is using, one to move the nspawn process into, the other to move the actual container processes into. This is done because of the requirement that cgroups may either contain processes or other subgroups.	2015-09-01 23:52:27 +02:00
Thomas Hindoe Paaboel Andersen	7d6884b65e	tree-wide: fix indentation	2015-08-06 00:44:19 +02:00
Lennart Poettering	63c372cb9d	util: rework strappenda(), and rename it strjoina() After all it is now much more like strjoin() than strappend(). At the same time, add support for NULL sentinels, even if they are normally not necessary.	2015-02-03 02:05:59 +01:00
Lennart Poettering	a931ad47a8	core: introduce new Delegate=yes/no property controlling creation of cgroup subhierarchies For privileged units this resource control property ensures that the processes have all controllers systemd manages enabled. For unprivileged services (those with User= set) this ensures that access rights to the service cgroup are granted to the user in question, to create further subgroups. Note that this only applies to the name=systemd hierarchy though, as access to other controllers is not safe for unprivileged processes. Delegate=yes should be set for container scopes where a systemd instance inside the container shall manage the hierarchies below its own cgroup and have access to all controllers. Delegate=yes should also be set for user@.service, so that systemd --user can run, controlling its own cgroup tree. This commit changes machined, systemd-nspawn@.service and user@.service to set this boolean, in order to ensure that container management will just work, and the user systemd instance can run fine.	2014-11-05 18:49:14 +01:00
Zbigniew Jędrzejewski-Szmek	ee26bcc038	core/dbus: simplify handling of CPUQuotaPerSecUSec No functional change intended.	2014-09-29 11:08:12 -04:00
Lennart Poettering	9a05490933	cgroups: simplify CPUQuota= logic Only accept cpu quota values in percentages, get rid of period definition. It's not clear whether the CFS period controllable per-cgroup even has a future in the kernel, hence let's simplify all this, hardcode the period to 100ms and only accept percentage based quota values.	2014-05-22 11:53:12 +09:00
Lennart Poettering	db785129c9	cgroup: rework startup logic Introduce a (unsigned long) -1 as "unset" state for cpu shares/block io weights, and keep the startup unit set around all the time.	2014-05-22 07:13:56 +09:00
WaLyong Cho	95ae05c0e7	core: add startup resource control option Similar to CPUShares= and BlockIOWeight= respectively, but the specified weight is only assigned during startup. After startup, each control group attribute is re-assigned to the weight given by CPUShares=weight and BlockIOWeight=weight. If CPUShares= or BlockIOWeight= is not specified, the attribute is re-assigned to its default value (default cpu.shares=1024, blkio.weight=1000). If only CPUShares=weight or BlockIOWeight=weight is specified, that implies StartupCPUShares=weight and StartupBlockIOWeight=weight.	2014-05-22 07:13:56 +09:00
Lennart Poettering	b2f8b02ec2	core: expose CFS CPU time quota as high-level unit properties	2014-04-25 13:27:25 +02:00
Lennart Poettering	3051f1871e	core: make sure we always write changed cgroup attributes to the cgroupfs	2014-04-25 13:27:01 +02:00
Lennart Poettering	9c96019d31	cgroup: parse array cgroup properties correctly when they aren't at the end of the message	2014-02-24 03:38:58 +01:00
Lennart Poettering	90060676c4	cgroup: Extend DeviceAllow= syntax to whitelist groups of devices, not just particular device nodes	2014-02-22 03:05:34 +01:00
Zbigniew Jędrzejewski-Szmek	ccd06097c7	Use format patterns for usec_t, pid_t, nsec_t, usec_t It is nicer to predefine patterns using configure time check instead of using casts everywhere. Since we do not need to use any flags, include "%" in the format instead of excluding it like PRI* macros.	2014-01-02 19:45:47 -05:00
Lennart Poettering	43a99a7afe	build-sys: minor fixes found with cppcheck	2013-12-25 19:00:38 +01:00
Lennart Poettering	610f780cd6	core: the cgroup properties are not actually const	2013-12-22 21:11:03 +01:00
Lennart Poettering	556089dc57	bus: decorate the various object vtables with SD_BUS_VTABLE_PROPERTY_CONST where appropriate	2013-12-22 03:50:52 +01:00
Lennart Poettering	ebcf1f97de	bus: rework message handlers to always take an error argument Message handler callbacks can be simplified drastically if the dispatcher automatically replies to method calls if errors are returned. Thus: add an sd_bus_error argument to all message handlers. When we dispatch a message handler and it returns negative or a set sd_bus_error we send this as message error back to the client. This means errors returned by handlers by default are given back to clients instead of rippling all the way up to the event loop, which is desirable to make things robust. As a side-effect we can now easily turn the SELinux checks into normal function calls, since the method call dispatcher will generate the right error replies automatically now. Also, make sure we always pass the error structure to all property and method handlers as last argument to follow the usual style of passing variables for return values as last argument.	2013-11-21 21:12:36 +01:00
Lennart Poettering	718db96199	core: convert PID 1 to libsystemd-bus This patch converts PID 1 to libsystemd-bus and thus drops the dependency on libdbus. The only remaining code using libdbus is a test case that validates our bus marshalling against libdbus' marshalling, and this dependency can be turned off. This patch also adds a couple of things to libsystemd-bus, that are necessary to make the port work: - Synthesizing of "Disconnected" messages when bus connections are severed. - Support for attaching multiple vtables for the same interface on the same path. This patch also fixes the SetDefaultTarget() and GetDefaultTarget() bus calls which used an inappropriate signature. As a side effect we will now generate PropertiesChanged messages which carry property contents, rather than just invalidation information.	2013-11-20 20:52:36 +01:00
Kay Sievers	7759ecb21e	silence a few more gcc warnings	2013-10-21 18:40:33 +02:00


64 commits