From ecb1a44cc93d9c280bf8230a6aa8644c9a5e2794 Mon Sep 17 00:00:00 2001 From: Lennart Poettering Date: Wed, 28 Nov 2018 21:26:36 +0100 Subject: [PATCH] docs: add brief docs explaing udev's flock() block device node synchronization --- docs/BLOCK_DEVICE_LOCKING.md | 63 ++++++++++++++++++++++++++++++++++++ docs/index.md | 1 + 2 files changed, 64 insertions(+) create mode 100644 docs/BLOCK_DEVICE_LOCKING.md diff --git a/docs/BLOCK_DEVICE_LOCKING.md b/docs/BLOCK_DEVICE_LOCKING.md new file mode 100644 index 0000000000..bf84484255 --- /dev/null +++ b/docs/BLOCK_DEVICE_LOCKING.md @@ -0,0 +1,63 @@ +# Locking Block Device Access + +*TL;DR: Use BSD file locks +[(`flock(2)`)](http://man7.org/linux/man-pages/man2/flock.2.html) on block +device nodes to synchronize access for partitioning and file system formatting +tools.* + +`systemd-udevd` probes all block devices showing up for file system superblock +and partition table information (utilizing `libblkid`). If another program +concurrently modifies a superblock or partition table this probing might be +affected, which is bad in itself, but also might in turn result in undesired +effects in programs subscribing to `udev` events. + +Applications manipulating a block device can temporarily stop `systemd-udevd` +from processing rules on it — and thus bar it from probing the device — by +taking a BSD file lock on the block device node. Specifically, whenever +`systemd-udevd` starts processing a block device it takes a `LOCK_SH|LOCK_NB` +lock using [`flock(2)`](http://man7.org/linux/man-pages/man2/flock.2.html) on +the main block device (i.e. never on any partition block device, but on the +device the partition belongs to). If this lock cannot be taken (i.e. `flock()` +returns `EBUSY`), it refrains from processing the device. If it manages to take +the lock it is kept for the entire time the device is processed. + +Note that `systemd-udevd` also watches all block device nodes it manages for +`inotify()` `IN_CLOSE` events: whenever such an event is seen, this is used as +trigger to re-run the rule-set for the device. + +These two concepts allow tools such as disk partitioners or file system +formatting tools to safely and easily take exclusive ownership of a block +device while operating: before starting work on the block device, they should +take an `LOCK_EX` lock on it. This has two effects: first of all, in case +`systemd-udevd` is still processing the device the tool will wait for it to +finish. Second, after the lock is taken, it can be sure that that +`systemd-udevd` will refrain from processing the block device, and thus all +other client applications subscribed to it won't get device notifications from +potentially half-written data either. After the operation is complete the +partitioner/formatter can simply close the device node. This has two effects: +it implicitly releases the lock, so that `systemd-udevd` can process events on +the device node again. Secondly, it results an `IN_CLOSE` event, which causes +`systemd-udevd` to immediately re-process the device — seeing all changes the +tool made — and notify subscribed clients about it. + +Besides synchronizing block device access between `systemd-udevd` and such +tools this scheme may also be used to synchronize access between those tools +themselves. However, do note that `flock()` locks are advisory only. This means +if one tool honours this scheme and another tool does not, they will of course +not be synchronized properly, and might interfere with each other's work. + +Note that the file locks follow the usual access semantics of BSD locks: since +`systemd-udevd` never writes to such block devices it only takes a `LOCK_SH` +*shared* lock. A program intending to make changes to the block device should +take a `LOCK_EX` *exclusive* lock instead. For further details, see the +`flock(2)` man page. + +And please keep in mind: BSD file locks (`flock()`) and POSIX file locks +(`lockf()`, `F_SETLK`, …) are different concepts, and in their effect +orthogonal. The scheme discussed above uses the former and not the latter, +because the these types of locks more closely match the required semantics. + +Summarizing: it is recommended to take `LOCK_EX` BSD file locks when +manipulating block devices in all tools that change file system block devices +(`mkfs`, `fsck`, …) or partition tables (`fdisk`, `parted`, …), right after +opening the node. diff --git a/docs/index.md b/docs/index.md index e851f1b7ce..03600691fc 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,6 +1,7 @@ # systemd Documentation * [Automatic Boot Assessment](https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT) +* [Locking Block Device Access](https://systemd.io/BLOCK_DEVICE_LOCKING) * [The Boot Loader Interface](https://systemd.io/BOOT_LOADER_INTERFACE) * [The Boot Loader Specification](https://systemd.io/BOOT_LOADER_SPECIFICATION) * [Control Group APIs and Delegation](https://systemd.io/CGROUP_DELEGATION)