Ninjatrappeur's systemd working tree
Michal Schmidt 89439d4fc0 hashmap: rewrite the implementation
This is a rewrite of the hashmap implementation. Its advantage is lower
memory usage.

It uses open addressing (entries are stored in an array, as opposed to
linked lists). Hash collisions are resolved with linear probing and
Robin Hood displacement policy. See the references in hashmap.c.
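
To illustrate the technique named above, here is a minimal sketch of
open addressing with linear probing and Robin Hood displacement. It is
not the code from hashmap.c: the names (struct table, table_put,
table_find), the fixed capacity and the toy hash function are all
assumptions made for this example, and resizing and removal are left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16u /* hypothetical fixed capacity, must be a power of two */
#define TABLE_MASK (TABLE_SIZE - 1u)

struct slot {
        bool used;
        uint32_t key;
        uint32_t value;
};

struct table {
        struct slot slots[TABLE_SIZE]; /* entries live in the array itself */
        size_t n_used;
};

/* Toy hash for illustration only; not the hash used by systemd. */
static size_t home_of(uint32_t key) {
        return (size_t) ((key * 2654435761u) & TABLE_MASK);
}

/* How far the entry stored at idx sits from its home slot. */
static size_t probe_distance(size_t idx, uint32_t key) {
        return (idx + TABLE_SIZE - home_of(key)) & TABLE_MASK;
}

static struct slot *table_find(struct table *t, uint32_t key) {
        size_t idx = home_of(key);

        for (size_t dist = 0; dist < TABLE_SIZE; dist++) {
                struct slot *s = &t->slots[idx];

                if (!s->used)
                        return NULL;
                if (s->key == key)
                        return s;
                /* A resident closer to its home than we are to ours means
                 * the key would have displaced it, so it is not present. */
                if (probe_distance(idx, s->key) < dist)
                        return NULL;
                idx = (idx + 1) & TABLE_MASK;
        }
        return NULL;
}

/* Inserts or updates; returns false only when the table is full. */
static bool table_put(struct table *t, uint32_t key, uint32_t value) {
        struct slot cur = { true, key, value };
        struct slot *found = table_find(t, key);
        size_t idx = home_of(key), dist = 0;

        if (found) {
                found->value = value;
                return true;
        }
        if (t->n_used == TABLE_SIZE)
                return false;

        for (;;) { /* terminates: at least one slot is free */
                struct slot *s = &t->slots[idx];
                size_t d;

                if (!s->used) {
                        *s = cur;
                        t->n_used++;
                        return true;
                }
                d = probe_distance(idx, s->key);
                if (d < dist) { /* resident is "richer": take its slot */
                        struct slot tmp = *s;
                        *s = cur;
                        cur = tmp;
                        dist = d;
                }
                idx = (idx + 1) & TABLE_MASK;
                dist++;
        }
}
```

The displacement step keeps probe distances evenly spread, which is
what lets table_find stop early instead of scanning to the next empty
slot.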

Some fun empirical findings about hashmap usage in systemd on my laptop:
  - 98 % of allocated hashmaps are Sets.
  - Sets contain 78 % of all entries, plain Hashmaps 17 %, and
    OrderedHashmaps 5 %.
  - 60 % of allocated hashmaps contain only 1 entry.
  - 90 % of allocated hashmaps contain 5 or fewer entries.
  - 75 % of all entries are in hashmaps that use trivial_hash_ops.

Clearly it makes sense to:
  - store entries in distinct entry types. Especially for Sets - their
    entries are the most numerous and they require the least information
    to store an entry.
  - have a way to store small numbers of entries directly in the hashmap
    structs, and only allocate the usual entry arrays when the direct
    storage is full.
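
The direct-storage idea can be sketched as follows. This is an
illustration under stated assumptions, not the layout used in
hashmap.c: struct small_set, DIRECT_ENTRIES and the helper names are
hypothetical, entries are plain ints, and lookup is a linear scan
instead of hashing so that only the switch from direct to allocated
storage is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DIRECT_ENTRIES 4u /* hypothetical; chosen only for this example */

struct small_set {
        size_t n_entries;
        size_t capacity;
        bool has_indirect; /* false while entries live inside the struct */
        union {
                int direct[DIRECT_ENTRIES];
                int *indirect;
        } storage;
};

static void small_set_init(struct small_set *s) {
        memset(s, 0, sizeof(*s));
        s->capacity = DIRECT_ENTRIES;
}

static int *small_set_entries(struct small_set *s) {
        return s->has_indirect ? s->storage.indirect : s->storage.direct;
}

static bool small_set_contains(struct small_set *s, int v) {
        int *e = small_set_entries(s);

        for (size_t i = 0; i < s->n_entries; i++)
                if (e[i] == v)
                        return true;
        return false;
}

/* Returns 1 if added, 0 if already present, -1 on allocation failure. */
static int small_set_add(struct small_set *s, int v) {
        if (small_set_contains(s, v))
                return 0;

        if (s->n_entries == s->capacity) {
                size_t new_cap = s->capacity * 2;
                int *p;

                if (s->has_indirect) {
                        p = realloc(s->storage.indirect, new_cap * sizeof(int));
                        if (!p)
                                return -1;
                } else {
                        /* Direct storage is full: allocate the usual
                         * array and copy the direct entries over before
                         * the union member is overwritten. */
                        p = malloc(new_cap * sizeof(int));
                        if (!p)
                                return -1;
                        memcpy(p, s->storage.direct, s->n_entries * sizeof(int));
                        s->has_indirect = true;
                }
                s->storage.indirect = p;
                s->capacity = new_cap;
        }

        small_set_entries(s)[s->n_entries++] = v;
        return 1;
}

static void small_set_done(struct small_set *s) {
        if (s->has_indirect)
                free(s->storage.indirect);
        small_set_init(s);
}
```

With the numbers quoted above (90 % of hashmaps holding 5 or fewer
entries), most instances of such a structure would never reach the
malloc() path.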

The implementation has an optional debugging feature (enabled by
defining the ENABLE_HASHMAP_DEBUG macro), where it:
  - tracks all allocated hashmaps in a linked list so that one can
    easily find them in gdb,
  - tracks which function/line allocated a given hashmap, and
  - checks for invalid mixing of hashmap iteration and modification.
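
The first two debugging aids can be sketched with a constructor macro
that captures the call site. This is only an illustration of the
general pattern: struct tracked_map, tracked_map_new() and
debug_list_head are hypothetical names, not the identifiers used when
ENABLE_HASHMAP_DEBUG is defined, and the iteration check is not shown.

```c
#include <assert.h>
#include <stdlib.h>

struct tracked_map {
        /* Call site that allocated this map. */
        const char *func;
        const char *file;
        int line;
        /* Links for the global list of all live maps. */
        struct tracked_map *debug_prev, *debug_next;
};

/* Head of the list; easy to walk from a debugger. */
static struct tracked_map *debug_list_head = NULL;

static struct tracked_map *tracked_map_new_at(const char *func, const char *file, int line) {
        struct tracked_map *m = calloc(1, sizeof(*m));

        if (!m)
                return NULL;

        m->func = func;
        m->file = file;
        m->line = line;

        /* Push onto the front of the global list. */
        m->debug_next = debug_list_head;
        if (debug_list_head)
                debug_list_head->debug_prev = m;
        debug_list_head = m;
        return m;
}

/* The macro expands at the caller, so __func__ and __LINE__ describe
 * the place that requested the map, not the allocator itself. */
#define tracked_map_new() tracked_map_new_at(__func__, __FILE__, __LINE__)

static void tracked_map_free(struct tracked_map *m) {
        if (!m)
                return;

        if (m->debug_prev)
                m->debug_prev->debug_next = m->debug_next;
        else
                debug_list_head = m->debug_next;
        if (m->debug_next)
                m->debug_next->debug_prev = m->debug_prev;
        free(m);
}
```

In gdb, a list like this could be walked starting from
debug_list_head and following debug_next.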

Since entries are not allocated one-by-one anymore, mempools are not
used for entries. Originally I meant to drop mempools entirely, but it's
still worth it to use them for the hashmap structs. My testing indicates
that it makes loading of units about 5 % faster (a test with 10000 units
where more than 200000 hashmaps are allocated - pure malloc: 449±4 ms,
mempools: 427±7 ms).

Here are some memory usage numbers, taken on my laptop with a more or
less normal Fedora setup after booting with SELinux disabled (SELinux
increases systemd's memory usage significantly):

systemd (PID 1)                            Original   New    Change
dirty memory (from pmap -x 1) [KiB]            2152  1264     -41 %
total heap allocations (from gdb-heap) [KiB]   1623   756     -53 %
2014-10-30 19:50:51 +01:00
catalog catalog: add Polish translation 2014-09-27 19:14:18 -04:00
docs libudev: queue provide file descriptor to watch busy event queue 2014-06-27 17:56:41 +02:00
factory/etc factory: remove broken pam_limits 2014-07-30 15:21:54 +02:00
hwdb keymap: Ignore brightness keys on Dell Inspiron 1520 to avoid double events 2014-10-30 11:20:34 +01:00
m4 build-sys: fix conftest.c to work on arm 2014-08-03 00:28:12 -04:00
man busctl: add new "capture" verb to record bus messages in libpcap compatible files, for dissection with wireshark 2014-10-30 01:13:54 +01:00
network udev: link_setup - respect kernel name assign policy 2014-08-08 13:30:15 +02:00
po readahead: wipe out readahead 2014-09-25 16:39:18 +02:00
rules /proc/sys prefixes are not necessary for sysctl anymore 2014-10-07 09:19:51 -04:00
shell-completion zsh-completion: update start/restart completions 2014-10-29 23:48:10 -04:00
src hashmap: rewrite the implementation 2014-10-30 19:50:51 +01:00
sysctl.d sysctl.d: default to fq_codel, fight bufferbloat 2014-10-20 18:19:00 +02:00
system-preset readahead: wipe out readahead 2014-09-25 16:39:18 +02:00
sysusers.d sysusers: realign sysusers snippets 2014-08-19 16:47:52 +02:00
test bus-proxyd: assorted cleanups and fixes 2014-10-07 18:02:38 +02:00
tmpfiles.d tmpfiles: make resolv.conf entry conditional on resolved support 2014-08-27 18:17:16 +02:00
tools terminal: add unifont font-handling 2014-07-18 17:45:33 +02:00
units udev hwdb: Support shipping pre-compiled database in system images 2014-10-28 14:28:18 +01:00
.dir-locals.el Keep emacs configuration in one configuration file. 2011-03-08 01:53:46 +01:00
.gitattributes git: indicate that tabs are never OK in the systemd tree 2013-10-30 02:25:38 +01:00
.gitignore login: remove multi-seat-x 2014-10-28 02:24:46 +01:00
.mailmap prepare NEWS 2014-02-18 02:51:47 +01:00
.travis.yml test: Make testing work on systems without or old systemd 2013-08-22 00:52:14 -04:00
.vimrc vimrc: disable -fdiagnostics-color output 2013-10-20 04:29:39 +02:00
.ycm_extra_conf.py ycm: update flag blacklist 2014-06-04 15:41:10 -04:00
autogen.sh autogen: add "t" switch with --enable-terminal 2014-07-18 13:00:30 +02:00
CODING_STYLE CODING_STYLE: clarify that we really should use O_CLOEXEC everywhere 2014-10-30 17:05:25 +01:00
configure.ac util: make use of the new getrandom() syscall if it is available when needing entropy 2014-10-29 17:06:32 +01:00
DISTRO_PORTING man: wording and grammar updates 2013-10-21 20:50:46 -04:00
LICENSE.GPL2 relicense to LGPLv2.1 (with exceptions) 2012-04-12 00:24:39 +02:00
LICENSE.LGPL2.1 licence: remove references to old FSF address 2012-12-17 11:41:31 +01:00
LICENSE.MIT relicense to LGPLv2.1 (with exceptions) 2012-04-12 00:24:39 +02:00
Makefile-man.am man: document sd_bus_creds_get_connection_name() 2014-10-20 19:23:13 +02:00
Makefile.am hashmap: rewrite the implementation 2014-10-30 19:50:51 +01:00
NEWS NEWS: well, it's Options= now, not Discard= 2014-10-28 20:36:32 +01:00
README README: simplify documented dependency on util-linux 2014-10-22 12:37:08 +02:00
TODO update TODO 2014-10-30 17:39:29 +01:00

systemd System and Service Manager

DETAILS:
        http://0pointer.de/blog/projects/systemd.html

WEB SITE:
        http://www.freedesktop.org/wiki/Software/systemd

GIT:
        git://anongit.freedesktop.org/systemd/systemd
        ssh://git.freedesktop.org/git/systemd/systemd

GITWEB:
        http://cgit.freedesktop.org/systemd/systemd

MAILING LIST:
        http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        http://lists.freedesktop.org/mailman/listinfo/systemd-commits

IRC:
        #systemd on irc.freenode.org

BUG REPORTS:
        https://bugs.freedesktop.org/enter_bug.cgi?product=systemd

AUTHOR:
        Lennart Poettering
        Kay Sievers
        ...and many others

LICENSE:
        LGPLv2.1+ for all code
        - except sd-readahead.[ch] which is MIT
        - except src/shared/MurmurHash2.c which is Public Domain
        - except src/shared/siphash24.c which is CC0 Public Domain
        - except src/journal/lookup3.c which is Public Domain
        - except src/udev/* which is (currently still) GPLv2, GPLv2+

REQUIREMENTS:
        Linux kernel >= 3.7
        Linux kernel >= 3.8 for Smack support
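
        A packager could check the kernel version requirement at
        runtime with uname(2). The following is a hedged sketch;
        release_at_least() and running_kernel_at_least() are
        hypothetical helpers, not part of systemd, and only the
        leading "major.minor" of the release string is compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Compares the leading "major.minor" of a release string such as
 * "3.7.0" against the wanted version. */
static bool release_at_least(const char *release, int want_major, int want_minor) {
        int major, minor;

        if (sscanf(release, "%d.%d", &major, &minor) != 2)
                return false; /* unparsable: treat as too old */

        return major > want_major ||
               (major == want_major && minor >= want_minor);
}

/* Applies the same check to the running kernel. */
static bool running_kernel_at_least(int want_major, int want_minor) {
        struct utsname u;

        if (uname(&u) < 0)
                return false;

        return release_at_least(u.release, want_major, want_minor);
}
```

        For example, running_kernel_at_least(3, 7) covers the base
        requirement and running_kernel_at_least(3, 8) the Smack one.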

        Kernel Config Options:
          CONFIG_DEVTMPFS
          CONFIG_CGROUPS (it is OK to disable all controllers)
          CONFIG_INOTIFY_USER
          CONFIG_SIGNALFD
          CONFIG_TIMERFD
          CONFIG_EPOLL
          CONFIG_NET
          CONFIG_SYSFS
          CONFIG_PROC_FS
          CONFIG_FHANDLE (libudev, mount and bind mount handling)

        udev will fail to work with the legacy sysfs layout:
          CONFIG_SYSFS_DEPRECATED=n

        Legacy hotplug slows down the system and confuses udev:
          CONFIG_UEVENT_HELPER_PATH=""

        Userspace firmware loading is not supported and should
        be disabled in the kernel:
          CONFIG_FW_LOADER_USER_HELPER=n

        Some udev rules and virtualization detection rely on it:
          CONFIG_DMIID

        Serial number retrieval for some SCSI devices, used to
        create additional symlinks in /dev/disk/ and /dev/tape:
          CONFIG_BLK_DEV_BSG

        Required for PrivateNetwork in service units:
          CONFIG_NET_NS

        Optional but strongly recommended:
          CONFIG_IPV6
          CONFIG_AUTOFS4_FS
          CONFIG_TMPFS_POSIX_ACL
          CONFIG_TMPFS_XATTR
          CONFIG_SECCOMP

        Required for CPUShares in resource control unit settings:
          CONFIG_CGROUP_SCHED
          CONFIG_FAIR_GROUP_SCHED

        For systemd-bootchart, several proc debug interfaces are required:
          CONFIG_SCHEDSTATS
          CONFIG_SCHED_DEBUG

        For UEFI systems:
          CONFIG_EFIVAR_FS
          CONFIG_EFI_PARTITION

        Note that kernel auditing is broken when used with systemd's
        container code. When using systemd in conjunction with
        containers, please make sure to either turn off auditing at
        runtime using the kernel command line option "audit=0", or
        turn it off at kernel compile time using:
          CONFIG_AUDIT=n
        If systemd is compiled with libseccomp support on
        architectures which do not use socketcall() and where seccomp
        is supported (this effectively means x86-64 and ARM, but
        excludes 32-bit x86!), then nspawn will now install a
        work-around seccomp filter that makes containers boot even
        with audit being enabled. This works correctly only on kernels
        3.14 and newer though. TL;DR: turn audit off, still.

        glibc >= 2.14
        libcap
        libseccomp >= 1.0.0 (optional)
        libblkid >= 2.20 (from util-linux) (optional)
        libkmod >= 15 (optional)
        PAM >= 1.1.2 (optional)
        libcryptsetup (optional)
        libaudit (optional)
        libacl (optional)
        libselinux (optional)
        liblzma (optional)
        liblz4 >= 119 (optional)
        libgcrypt (optional)
        libqrencode (optional)
        libmicrohttpd (optional)
        libpython (optional)
        libidn (optional)
        gobject-introspection > 1.40.0 (optional)
        elfutils >= 158 (optional)
        make, gcc, and similar tools

        During runtime, you need the following additional
        dependencies:

        util-linux >= v2.25 required
        dbus >= 1.4.0 (strictly speaking optional, but recommended)
        dracut (optional)
        PolicyKit (optional)

        When building from git, you need the following additional
        dependencies:

        docbook-xsl
        xsltproc
        automake
        autoconf
        libtool
        intltool
        gperf
        gtkdocize (optional)
        python (optional)
        python-lxml (optional, but required to build the indices)
        sphinx (optional)

        When systemd-hostnamed is used, it is strongly recommended to
        install nss-myhostname to ensure that, in a world of
        dynamically changing hostnames, the hostname stays resolvable
        under all circumstances. In fact, systemd-hostnamed will warn
        if nss-myhostname is not installed.

        To build HTML documentation for python-systemd using sphinx,
        please first install systemd (using 'make install'), and then
        invoke sphinx-build with 'make sphinx-<target>', with <target>
        being 'html' or 'latexpdf'. If using DESTDIR for installation,
        pass the same DESTDIR to 'make sphinx-html' invocation.

USERS AND GROUPS:
        Default udev rules use the following standard system group
        names, which need to be resolvable by getgrnam() at any time,
        even in the very early boot stages, where no other databases
        and network are available:

        audio, cdrom, dialout, disk, input, kmem, lp, tape, tty, video
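
        Whether these names resolve can be checked with getgrnam()
        itself. The sketch below is an illustration only:
        count_unresolvable_groups() is a hypothetical helper, and a
        check run on a fully booted system says nothing about
        resolvability in the very early boot stages.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <grp.h>
#include <stddef.h>

/* Group names taken from the list above. */
static const char *const udev_groups[] = {
        "audio", "cdrom", "dialout", "disk", "input",
        "kmem", "lp", "tape", "tty", "video",
};

#define N_UDEV_GROUPS (sizeof(udev_groups) / sizeof(udev_groups[0]))

/* Returns how many of the given group names getgrnam() cannot resolve. */
static size_t count_unresolvable_groups(const char *const *names, size_t n) {
        size_t missing = 0;

        for (size_t i = 0; i < n; i++)
                if (!getgrnam(names[i]))
                        missing++;

        return missing;
}
```

        A result of 0 from
        count_unresolvable_groups(udev_groups, N_UDEV_GROUPS) means
        all ten names are known to the configured group databases.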

        During runtime, the journal daemon requires the
        "systemd-journal" system group to exist. New journal files will
        be readable by this group (but not writable), which may be used
        to grant specific users read access.

        It is also recommended to grant read access to all journal
        files to the system groups "wheel" and "adm" with a command
        like the following in the post installation script of the
        package:

        # setfacl -nm g:wheel:rx,d:g:wheel:rx,g:adm:rx,d:g:adm:rx /var/log/journal/

        The journal gateway daemon requires the
        "systemd-journal-gateway" system user and group to
        exist. During execution this network facing service will drop
        privileges and assume this uid/gid for security reasons.

        Similarly, the NTP daemon requires the "systemd-timesync" system
        user and group to exist.

        Similarly, the network management daemon requires the
        "systemd-network" system user and group to exist.

        Similarly, the name resolution daemon requires the
        "systemd-resolve" system user and group to exist.

        Similarly, the kdbus dbus1 proxy daemon requires the
        "systemd-bus-proxy" system user and group to exist.

NSS:
        systemd ships with three NSS modules:

        nss-myhostname resolves the local hostname to locally
        configured IP addresses, as well as "localhost" to
        127.0.0.1/::1.

        nss-resolve enables DNS resolution via the DNS/LLMNR caching
        stub resolver "systemd-resolved".

        nss-mymachines enables resolution of all local containers
        registered with machined to their respective IP addresses.

        To make use of these NSS modules, please add them to the
        "hosts: " line in /etc/nsswitch.conf. The "resolve" module
        should replace the glibc "dns" module in this file.

        The three modules should be used in the following order:

                hosts: files mymachines resolve myhostname

WARNINGS:
        systemd will warn you during boot if /etc/mtab is not a
        symlink to /proc/mounts. Please ensure that /etc/mtab is a
        proper symlink.
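
        The condition can be tested with readlink(2). This is a
        minimal sketch, not the check systemd performs at boot:
        is_symlink_to() is a hypothetical helper that compares the
        literal link target and does not canonicalize relative
        targets.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Returns true if path is a symlink whose target string equals
 * expected exactly. */
static bool is_symlink_to(const char *path, const char *expected) {
        char buf[PATH_MAX];
        ssize_t n;

        n = readlink(path, buf, sizeof(buf) - 1);
        if (n < 0)
                return false; /* not a symlink, or not readable */

        buf[n] = '\0'; /* readlink() does not terminate the string */
        return strcmp(buf, expected) == 0;
}
```

        A caller would use is_symlink_to("/etc/mtab", "/proc/mounts")
        and treat a false result as the case that triggers the
        warning.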

        systemd will warn you during boot if /usr is on a different
        file system than /. While in systemd itself very little will
        break if /usr is on a separate partition, many of its
        dependencies very likely will break sooner or later in one
        form or another. For example, udev rules tend to refer to
        binaries in /usr, binaries that link to libraries in /usr or
        binaries that refer to data files in /usr. Since these
        breakages are not always directly visible, systemd will warn
        about this, since this kind of file system setup is not really
        supported anymore by the basic set of Linux OS components.

        systemd requires that the /run mount point exists. systemd also
        requires that /var/run is a symlink to /run.

        For more information on the separate /usr issue consult
        http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken

        To run systemd under valgrind, compile with VALGRIND defined
        (e.g. ./configure CPPFLAGS='... -DVALGRIND=1'). Otherwise,
        false positives will be triggered by code which violates
        some rules but is actually safe.