import: drop logic of setting up /var/lib/machines as btrfs loopback mount

Let's simplify things and drop the logic that sets up /var/lib/machines
as an auto-growing btrfs loopback file, /var/lib/machines.raw.

This was done in order to make quota available for machine management,
but quite frankly it never really worked properly, as we couldn't grow the
file system in sync with its use. Moreover, philosophically it's
problematic to override the admin's choice of file system like this.

Let's hence drop this, and simplify things. Deleting code is a good
feeling.

Now that regular file systems provide project quota, we could probably
add per-machine quota support based on that, hence the btrfs quota
argument is not that interesting anymore (though btrfs quota is a bit
more powerful as it allows recursive quota, i.e. that the machine pool
gets an overall quota in addition to per-machine quota).
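
As far as the diff below shows, what remains after this change is keyed on one question: is /var/lib/machines already on btrfs? The following is a minimal sketch of such a check, assuming statfs(2) and BTRFS_SUPER_MAGIC from <linux/magic.h>. The helper name path_is_on_btrfs is hypothetical and does not appear in the commit, which keeps its own static check_btrfs().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/vfs.h>
#include <linux/magic.h>

/* Hypothetical helper, not part of the commit: returns 1 if `path` is on
 * btrfs, 0 if it is on another file system, or a negative errno on failure. */
int path_is_on_btrfs(const char *path) {
        struct statfs sfs;

        assert(path);

        if (statfs(path, &sfs) < 0)
                return -errno;

        /* Compare as 32-bit values, since the signedness and width of f_type
         * differ between architectures. */
        return (uint32_t) sfs.f_type == (uint32_t) BTRFS_SUPER_MAGIC;
}
```

A caller would enable btrfs quota only when this returns 1 and otherwise leave the admin's file system alone.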
master
Lennart Poettering 4 years ago
parent e21b7229ff
commit 5f7ecd610c
 TODO                         |   6
 man/machinectl.xml           |  38
 src/basic/btrfs-util.c       |  90
 src/basic/btrfs-util.h       |   3
 src/import/import-raw.c      |  13
 src/import/import-tar.c      |  13
 src/import/importd.c         |   6
 src/import/pull-job.c        |  10
 src/import/pull-job.h        |   3
 src/import/pull-raw.c        |   6
 src/import/pull-tar.c        |   6
 src/machine/machined-dbus.c  |  31
 src/shared/machine-pool.c    | 365
 src/shared/machine-pool.h    |   6
 units/var-lib-machines.mount |   7

@ -235,8 +235,7 @@ Features:
the runtime dir as we maintain for the fdstore: i.e. keep it around as long
as the unit is running or has a job queued.
* support projid-based quota in machinectl for containers, and then drop
implicit btrfs loopback magic in machined
* support projid-based quota in machinectl for containers
* Add NetworkNamespacePath= to specify a path to a network namespace
@ -883,9 +882,6 @@ Features:
- "machinectl commit" that takes a writable snapshot of a tree, invokes a
shell in it, and marks it read-only after use
* importd:
- generate a nice warning if mkfs.btrfs is missing
* cryptsetup:
- cryptsetup-generator: allow specification of passwords in crypttab itself
- support rd.luks.allow-discards= kernel cmdline params in cryptsetup generator

@ -650,22 +650,7 @@
units. If the size limit shall be disabled, specify
<literal>-</literal> as size.</para>
<para>Note that per-container size limits are only supported
on btrfs file systems. Also note that, if
<command>set-limit</command> is invoked without an image
parameter, and <filename>/var/lib/machines</filename> is
empty, and the directory is not located on btrfs, a btrfs
loopback file is implicitly created as
<filename>/var/lib/machines.raw</filename> with the given
size, and mounted to
<filename>/var/lib/machines</filename>. The size of the
loopback may later be readjusted with
<command>set-limit</command>, as well. If such a
loopback-mounted <filename>/var/lib/machines</filename>
directory is used, <command>set-limit</command> without an image
name alters both the quota setting within the file system as
well as the loopback file and file system size
itself.</para></listitem>
<para>Note that per-container size limits are only supported on btrfs file systems.</para></listitem>
</varlistentry>
<varlistentry>
@ -803,12 +788,8 @@
image is read from standard input, in which case the second
argument is mandatory.</para>
<para>Both <command>pull-tar</command> and <command>pull-raw</command>
will resize <filename>/var/lib/machines.raw</filename> and the
filesystem therein as necessary. Optionally, the
<option>--read-only</option> switch may be used to create a
read-only container or VM image. No cryptographic validation
is done when importing the images.</para>
<para>Optionally, the <option>--read-only</option> switch may be used to create a read-only container or VM
image. No cryptographic validation is done when importing the images.</para>
<para>Much like image downloads, ongoing imports may be listed
with <command>list-transfers</command> and aborted with
@ -920,18 +901,7 @@
<filename>/var/lib/machines/</filename> to make them available for
control with <command>machinectl</command>.</para>
<para>Note that some image operations are only supported,
efficient or atomic on btrfs file systems. Due to this, if the
<command>pull-tar</command>, <command>pull-raw</command>,
<command>import-tar</command>, <command>import-raw</command> and
<command>set-limit</command> commands notice that
<filename>/var/lib/machines</filename> is empty and not located on
btrfs, they will implicitly set up a loopback file
<filename>/var/lib/machines.raw</filename> containing a btrfs file
system that is mounted to
<filename>/var/lib/machines</filename>. The size of this loopback
file may be controlled dynamically with
<command>set-limit</command>.</para>
<para>Note that some image operations are only supported, efficient or atomic on btrfs file systems.</para>
<para>Disk images are understood by
<citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry>

@ -870,96 +870,6 @@ int btrfs_subvol_set_subtree_quota_limit(const char *path, uint64_t subvol_id, u
return btrfs_subvol_set_subtree_quota_limit_fd(fd, subvol_id, referenced_max);
}
int btrfs_resize_loopback_fd(int fd, uint64_t new_size, bool grow_only) {
struct btrfs_ioctl_vol_args args = {};
char p[SYS_BLOCK_PATH_MAX("/loop/backing_file")], q[DEV_NUM_PATH_MAX];
_cleanup_free_ char *backing = NULL;
_cleanup_close_ int loop_fd = -1, backing_fd = -1;
struct stat st;
dev_t dev = 0;
int r;
/* In contrast to btrfs quota ioctls ftruncate() cannot make sense of "infinity" or file sizes > 2^31 */
if (!FILE_SIZE_VALID(new_size))
return -EINVAL;
/* btrfs cannot handle file systems < 16M, hence use this as minimum */
if (new_size < 16*1024*1024)
new_size = 16*1024*1024;
r = btrfs_get_block_device_fd(fd, &dev);
if (r < 0)
return r;
if (r == 0)
return -ENODEV;
xsprintf_sys_block_path(p, "/loop/backing_file", dev);
r = read_one_line_file(p, &backing);
if (r == -ENOENT)
return -ENODEV;
if (r < 0)
return r;
if (isempty(backing) || !path_is_absolute(backing))
return -ENODEV;
backing_fd = open(backing, O_RDWR|O_CLOEXEC|O_NOCTTY);
if (backing_fd < 0)
return -errno;
if (fstat(backing_fd, &st) < 0)
return -errno;
if (!S_ISREG(st.st_mode))
return -ENODEV;
if (new_size == (uint64_t) st.st_size)
return 0;
if (grow_only && new_size < (uint64_t) st.st_size)
return -EINVAL;
xsprintf_dev_num_path(q, "block", dev);
loop_fd = open(q, O_RDWR|O_CLOEXEC|O_NOCTTY);
if (loop_fd < 0)
return -errno;
if (snprintf(args.name, sizeof(args.name), "%" PRIu64, new_size) >= (int) sizeof(args.name))
return -EINVAL;
if (new_size < (uint64_t) st.st_size) {
/* Decrease size: first decrease btrfs size, then shorten loopback */
if (ioctl(fd, BTRFS_IOC_RESIZE, &args) < 0)
return -errno;
}
if (ftruncate(backing_fd, new_size) < 0)
return -errno;
if (ioctl(loop_fd, LOOP_SET_CAPACITY, 0) < 0)
return -errno;
if (new_size > (uint64_t) st.st_size) {
/* Increase size: first enlarge loopback, then increase btrfs size */
if (ioctl(fd, BTRFS_IOC_RESIZE, &args) < 0)
return -errno;
}
/* Make sure the free disk space is correctly updated for both file systems */
(void) fsync(fd);
(void) fsync(backing_fd);
return 1;
}
int btrfs_resize_loopback(const char *p, uint64_t new_size, bool grow_only) {
_cleanup_close_ int fd = -1;
fd = open(p, O_RDONLY|O_NOCTTY|O_CLOEXEC);
if (fd < 0)
return -errno;
return btrfs_resize_loopback_fd(fd, new_size, grow_only);
}
int btrfs_qgroupid_make(uint64_t level, uint64_t id, uint64_t *ret) {
assert(ret);

@ -62,9 +62,6 @@ int btrfs_quota_scan_start(int fd);
int btrfs_quota_scan_wait(int fd);
int btrfs_quota_scan_ongoing(int fd);
int btrfs_resize_loopback_fd(int fd, uint64_t size, bool grow_only);
int btrfs_resize_loopback(const char *path, uint64_t size, bool grow_only);
int btrfs_subvol_make(const char *path);
int btrfs_subvol_make_fd(int fd, const char *subvolume);

@ -37,7 +37,6 @@ struct RawImport {
char *local;
bool force_local;
bool read_only;
bool grow_machine_directory;
char *temp_path;
char *final_path;
@ -47,8 +46,6 @@ struct RawImport {
ImportCompress compress;
uint64_t written_since_last_grow;
sd_event_source *input_event_source;
uint8_t buffer[16*1024];
@ -95,7 +92,6 @@ int raw_import_new(
_cleanup_(raw_import_unrefp) RawImport *i = NULL;
_cleanup_free_ char *root = NULL;
bool grow;
int r;
assert(ret);
@ -104,8 +100,6 @@ int raw_import_new(
if (!root)
return -ENOMEM;
grow = path_startswith(root, "/var/lib/machines");
i = new(RawImport, 1);
if (!i)
return -ENOMEM;
@ -117,7 +111,6 @@ int raw_import_new(
.userdata = userdata,
.last_percent = (unsigned) -1,
.image_root = TAKE_PTR(root),
.grow_machine_directory = grow,
};
RATELIMIT_INIT(i->progress_rate_limit, 100 * USEC_PER_MSEC, 1);
@ -307,11 +300,6 @@ static int raw_import_write(const void *p, size_t sz, void *userdata) {
RawImport *i = userdata;
ssize_t n;
if (i->grow_machine_directory && i->written_since_last_grow >= GROW_INTERVAL_BYTES) {
i->written_since_last_grow = 0;
grow_machine_directory();
}
n = sparse_write(i->output_fd, p, sz, 64);
if (n < 0)
return (int) n;
@ -319,7 +307,6 @@ static int raw_import_write(const void *p, size_t sz, void *userdata) {
return -EIO;
i->written_uncompressed += sz;
i->written_since_last_grow += sz;
return 0;
}

@ -37,7 +37,6 @@ struct TarImport {
char *local;
bool force_local;
bool read_only;
bool grow_machine_directory;
char *temp_path;
char *final_path;
@ -47,8 +46,6 @@ struct TarImport {
ImportCompress compress;
uint64_t written_since_last_grow;
sd_event_source *input_event_source;
uint8_t buffer[16*1024];
@ -102,7 +99,6 @@ int tar_import_new(
_cleanup_(tar_import_unrefp) TarImport *i = NULL;
_cleanup_free_ char *root = NULL;
bool grow;
int r;
assert(ret);
@ -111,8 +107,6 @@ int tar_import_new(
if (!root)
return -ENOMEM;
grow = path_startswith(root, "/var/lib/machines");
i = new(TarImport, 1);
if (!i)
return -ENOMEM;
@ -124,7 +118,6 @@ int tar_import_new(
.userdata = userdata,
.last_percent = (unsigned) -1,
.image_root = TAKE_PTR(root),
.grow_machine_directory = grow,
};
RATELIMIT_INIT(i->progress_rate_limit, 100 * USEC_PER_MSEC, 1);
@ -245,17 +238,11 @@ static int tar_import_write(const void *p, size_t sz, void *userdata) {
TarImport *i = userdata;
int r;
if (i->grow_machine_directory && i->written_since_last_grow >= GROW_INTERVAL_BYTES) {
i->written_since_last_grow = 0;
grow_machine_directory();
}
r = loop_write(i->tar_fd, p, sz, false);
if (r < 0)
return r;
i->written_uncompressed += sz;
i->written_since_last_grow += sz;
return 0;
}

@ -719,7 +719,7 @@ static int method_import_tar_or_raw(sd_bus_message *msg, void *userdata, sd_bus_
if (!machine_name_is_valid(local))
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, "Local name %s is invalid", local);
r = setup_machine_directory((uint64_t) -1, error);
r = setup_machine_directory(error);
if (r < 0)
return r;
@ -783,7 +783,7 @@ static int method_import_fs(sd_bus_message *msg, void *userdata, sd_bus_error *e
if (!machine_name_is_valid(local))
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, "Local name %s is invalid", local);
r = setup_machine_directory((uint64_t) -1, error);
r = setup_machine_directory(error);
if (r < 0)
return r;
@ -924,7 +924,7 @@ static int method_pull_tar_or_raw(sd_bus_message *msg, void *userdata, sd_bus_er
if (v < 0)
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, "Unknown verification mode %s", verify);
r = setup_machine_directory((uint64_t) -1, error);
r = setup_machine_directory(error);
if (r < 0)
return r;

@ -74,7 +74,6 @@ static int pull_job_restart(PullJob *j) {
j->payload_allocated = 0;
j->written_compressed = 0;
j->written_uncompressed = 0;
j->written_since_last_grow = 0;
r = pull_job_begin(j);
if (r < 0)
@ -224,11 +223,6 @@ static int pull_job_write_uncompressed(const void *p, size_t sz, void *userdata)
if (j->disk_fd >= 0) {
if (j->grow_machine_directory && j->written_since_last_grow >= GROW_INTERVAL_BYTES) {
j->written_since_last_grow = 0;
grow_machine_directory();
}
if (j->allow_sparse)
n = sparse_write(j->disk_fd, p, sz, 64);
else {
@ -250,7 +244,6 @@ static int pull_job_write_uncompressed(const void *p, size_t sz, void *userdata)
}
j->written_uncompressed += sz;
j->written_since_last_grow += sz;
return 0;
}
@ -577,9 +570,6 @@ int pull_job_begin(PullJob *j) {
if (j->state != PULL_JOB_INIT)
return -EBUSY;
if (j->grow_machine_directory)
grow_machine_directory();
r = curl_glue_make(&j->curl, j->url, j);
if (r < 0)
return r;

@ -80,9 +80,6 @@ struct PullJob {
char *checksum;
bool grow_machine_directory;
uint64_t written_since_last_grow;
VerificationStyle style;
};

@ -56,7 +56,6 @@ struct RawPull {
char *local;
bool force_local;
bool grow_machine_directory;
bool settings;
bool roothash;
@ -119,7 +118,6 @@ int raw_pull_new(
_cleanup_(sd_event_unrefp) sd_event *e = NULL;
_cleanup_(raw_pull_unrefp) RawPull *i = NULL;
_cleanup_free_ char *root = NULL;
bool grow;
int r;
assert(ret);
@ -128,8 +126,6 @@ int raw_pull_new(
if (!root)
return -ENOMEM;
grow = path_startswith(root, "/var/lib/machines");
if (event)
e = sd_event_ref(event);
else {
@ -150,7 +146,6 @@ int raw_pull_new(
.on_finished = on_finished,
.userdata = userdata,
.image_root = TAKE_PTR(root),
.grow_machine_directory = grow,
.event = TAKE_PTR(e),
.glue = TAKE_PTR(g),
};
@ -689,7 +684,6 @@ int raw_pull_start(
i->raw_job->on_open_disk = raw_pull_job_on_open_disk_raw;
i->raw_job->on_progress = raw_pull_job_on_progress;
i->raw_job->calc_checksum = verify != IMPORT_VERIFY_NO;
i->raw_job->grow_machine_directory = i->grow_machine_directory;
r = pull_find_old_etags(url, i->image_root, DT_REG, ".raw-", ".raw", &i->raw_job->old_etags);
if (r < 0)

@ -52,7 +52,6 @@ struct TarPull {
char *local;
bool force_local;
bool grow_machine_directory;
bool settings;
pid_t tar_pid;
@ -112,7 +111,6 @@ int tar_pull_new(
_cleanup_(sd_event_unrefp) sd_event *e = NULL;
_cleanup_(tar_pull_unrefp) TarPull *i = NULL;
_cleanup_free_ char *root = NULL;
bool grow;
int r;
assert(ret);
@ -121,8 +119,6 @@ int tar_pull_new(
if (!root)
return -ENOMEM;
grow = path_startswith(root, "/var/lib/machines");
if (event)
e = sd_event_ref(event);
else {
@ -143,7 +139,6 @@ int tar_pull_new(
.on_finished = on_finished,
.userdata = userdata,
.image_root = TAKE_PTR(root),
.grow_machine_directory = grow,
.event = TAKE_PTR(e),
.glue = TAKE_PTR(g),
};
@ -512,7 +507,6 @@ int tar_pull_start(
i->tar_job->on_open_disk = tar_pull_job_on_open_disk_tar;
i->tar_job->on_progress = tar_pull_job_on_progress;
i->tar_job->calc_checksum = verify != IMPORT_VERIFY_NO;
i->tar_job->grow_machine_directory = i->grow_machine_directory;
r = pull_find_old_etags(url, i->image_root, DT_DIR, ".tar-", NULL, &i->tar_job->old_etags);
if (r < 0)

@ -41,15 +41,10 @@ static int property_get_pool_usage(
_cleanup_close_ int fd = -1;
uint64_t usage = (uint64_t) -1;
struct stat st;
assert(bus);
assert(reply);
/* We try to read the quota info from /var/lib/machines, as
* well as the usage of the loopback file
* /var/lib/machines.raw, and pick the larger value. */
fd = open("/var/lib/machines", O_RDONLY|O_CLOEXEC|O_DIRECTORY);
if (fd >= 0) {
BtrfsQuotaInfo q;
@ -58,11 +53,6 @@ static int property_get_pool_usage(
usage = q.referenced;
}
if (stat("/var/lib/machines.raw", &st) >= 0) {
if (usage == (uint64_t) -1 || st.st_blocks * 512ULL > usage)
usage = st.st_blocks * 512ULL;
}
return sd_bus_message_append(reply, "t", usage);
}
@ -77,15 +67,10 @@ static int property_get_pool_limit(
_cleanup_close_ int fd = -1;
uint64_t size = (uint64_t) -1;
struct stat st;
assert(bus);
assert(reply);
/* We try to read the quota limit from /var/lib/machines, as
* well as the size of the loopback file
* /var/lib/machines.raw, and pick the smaller value. */
fd = open("/var/lib/machines", O_RDONLY|O_CLOEXEC|O_DIRECTORY);
if (fd >= 0) {
BtrfsQuotaInfo q;
@ -94,11 +79,6 @@ static int property_get_pool_limit(
size = q.referenced_max;
}
if (stat("/var/lib/machines.raw", &st) >= 0) {
if (size == (uint64_t) -1 || (uint64_t) st.st_size < size)
size = st.st_size;
}
return sd_bus_message_append(reply, "t", size);
}
@ -877,19 +857,10 @@ static int method_set_pool_limit(sd_bus_message *message, void *userdata, sd_bus
return 1; /* Will call us back */
/* Set up the machine directory if necessary */
r = setup_machine_directory(limit, error);
r = setup_machine_directory(error);
if (r < 0)
return r;
/* Resize the backing loopback device, if there is one, except if we asked to drop any limit */
if (limit != (uint64_t) -1) {
r = btrfs_resize_loopback("/var/lib/machines", limit, false);
if (r == -ENOTTY)
return sd_bus_error_setf(error, SD_BUS_ERROR_NOT_SUPPORTED, "Quota is only supported on btrfs.");
if (r < 0 && r != -ENODEV) /* ignore ENODEV, as that's what is returned if the file system is not on loopback */
return sd_bus_error_set_errnof(error, r, "Failed to adjust loopback limit: %m");
}
(void) btrfs_qgroup_set_limit("/var/lib/machines", 0, limit);
r = btrfs_subvol_set_subtree_quota_limit("/var/lib/machines", 0, limit);

@ -1,46 +1,13 @@
/* SPDX-License-Identifier: LGPL-2.1+ */
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/statfs.h>
#include <sys/statvfs.h>
#include <unistd.h>
#include "sd-bus-protocol.h"
#include "sd-bus.h"
#include "alloc-util.h"
#include "btrfs-util.h"
#include "fd-util.h"
#include "fileio.h"
#include "fs-util.h"
#include "label.h"
#include "lockfile-util.h"
#include "log.h"
#include "machine-pool.h"
#include "macro.h"
#include "missing.h"
#include "mkdir.h"
#include "mount-util.h"
#include "parse-util.h"
#include "path-util.h"
#include "process-util.h"
#include "signal-util.h"
#include "stat-util.h"
#include "string-util.h"
#define VAR_LIB_MACHINES_SIZE_START (1024UL*1024UL*500UL)
#define VAR_LIB_MACHINES_FREE_MIN (1024UL*1024UL*750UL)
static int check_btrfs(void) {
struct statfs sfs;
@ -56,344 +23,24 @@ static int check_btrfs(void) {
return F_TYPE_EQUAL(sfs.f_type, BTRFS_SUPER_MAGIC);
}
static int setup_machine_raw(uint64_t size, sd_bus_error *error) {
_cleanup_free_ char *tmp = NULL;
_cleanup_close_ int fd = -1;
struct statvfs ss;
pid_t pid = 0;
int setup_machine_directory(sd_bus_error *error) {
int r;
/* We want to be able to make use of btrfs-specific file
* system features, in particular subvolumes, reflinks and
* quota. Hence, if we detect that /var/lib/machines.raw is
* not located on btrfs, let's create a loopback file, place a
* btrfs file system into it, and mount it to
* /var/lib/machines. */
fd = open("/var/lib/machines.raw", O_RDWR|O_CLOEXEC|O_NONBLOCK|O_NOCTTY);
if (fd >= 0)
return TAKE_FD(fd);
if (errno != ENOENT)
return sd_bus_error_set_errnof(error, errno, "Failed to open /var/lib/machines.raw: %m");
r = tempfn_xxxxxx("/var/lib/machines.raw", NULL, &tmp);
if (r < 0)
return r;
(void) mkdir_p_label("/var/lib", 0755);
fd = open(tmp, O_RDWR|O_CREAT|O_EXCL|O_NOCTTY|O_CLOEXEC, 0600);
if (fd < 0)
return sd_bus_error_set_errnof(error, errno, "Failed to create /var/lib/machines.raw: %m");
if (fstatvfs(fd, &ss) < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to determine free space on /var/lib/machines.raw: %m");
goto fail;
}
if (ss.f_bsize * ss.f_bavail < VAR_LIB_MACHINES_FREE_MIN) {
r = sd_bus_error_setf(error, SD_BUS_ERROR_FAILED, "Not enough free disk space to set up /var/lib/machines.");
goto fail;
}
if (ftruncate(fd, size) < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to enlarge /var/lib/machines.raw: %m");
goto fail;
}
r = safe_fork("(mkfs)", FORK_RESET_SIGNALS|FORK_DEATHSIG, &pid);
if (r < 0) {
sd_bus_error_set_errnof(error, r, "Failed to fork mkfs.btrfs: %m");
goto fail;
}
if (r == 0) {
/* Child */
fd = safe_close(fd);
execlp("mkfs.btrfs", "-Lvar-lib-machines", tmp, NULL);
if (errno == ENOENT)
_exit(99);
_exit(EXIT_FAILURE);
}
r = wait_for_terminate_and_check("mkfs", pid, 0);
pid = 0;
if (r < 0) {
sd_bus_error_set_errnof(error, r, "Failed to wait for mkfs.btrfs: %m");
goto fail;
}
if (r == 99) {
r = sd_bus_error_set_errnof(error, ENOENT, "Cannot set up /var/lib/machines, mkfs.btrfs is missing");
goto fail;
}
if (r != EXIT_SUCCESS) {
r = sd_bus_error_setf(error, SD_BUS_ERROR_FAILED, "mkfs.btrfs failed with error code %i", r);
goto fail;
}
r = rename_noreplace(AT_FDCWD, tmp, AT_FDCWD, "/var/lib/machines.raw");
if (r < 0) {
sd_bus_error_set_errnof(error, r, "Failed to move /var/lib/machines.raw into place: %m");
goto fail;
}
return TAKE_FD(fd);
fail:
unlink_noerrno(tmp);
if (pid > 1)
kill_and_sigcont(pid, SIGKILL);
return r;
}
int setup_machine_directory(uint64_t size, sd_bus_error *error) {
_cleanup_(release_lock_file) LockFile lock_file = LOCK_FILE_INIT;
struct loop_info64 info = {
.lo_flags = LO_FLAGS_AUTOCLEAR,
};
_cleanup_close_ int fd = -1, control = -1, loop = -1;
_cleanup_free_ char* loopdev = NULL;
char tmpdir[] = "/tmp/machine-pool.XXXXXX", *mntdir = NULL;
bool tmpdir_made = false, mntdir_made = false, mntdir_mounted = false;
char buf[FORMAT_BYTES_MAX];
int r, nr = -1;
/* btrfs cannot handle file systems < 16M, hence use this as minimum */
if (size == (uint64_t) -1)
size = VAR_LIB_MACHINES_SIZE_START;
else if (size < 16*1024*1024)
size = 16*1024*1024;
/* Make sure we only set the directory up once at a time */
r = make_lock_file("/run/systemd/machines.lock", LOCK_EX, &lock_file);
if (r < 0)
return r;
r = check_btrfs();
if (r < 0)
return sd_bus_error_set_errnof(error, r, "Failed to determine whether /var/lib/machines is located on btrfs: %m");
if (r > 0) {
(void) btrfs_subvol_make_label("/var/lib/machines");
r = btrfs_quota_enable("/var/lib/machines", true);
if (r < 0)
log_warning_errno(r, "Failed to enable quota for /var/lib/machines, ignoring: %m");
r = btrfs_subvol_auto_qgroup("/var/lib/machines", 0, true);
if (r < 0)
log_warning_errno(r, "Failed to set up default quota hierarchy for /var/lib/machines, ignoring: %m");
return 1;
}
if (path_is_mount_point("/var/lib/machines", NULL, AT_SYMLINK_FOLLOW) > 0) {
log_debug("/var/lib/machines is already a mount point, not creating loopback file for it.");
return 0;
}
r = dir_is_populated("/var/lib/machines");
if (r < 0 && r != -ENOENT)
return r;
if (r > 0) {
log_debug("/var/log/machines is already populated, not creating loopback file for it.");
return 0;
}
r = mkfs_exists("btrfs");
if (r == 0)
return sd_bus_error_set_errnof(error, ENOENT, "Cannot set up /var/lib/machines, mkfs.btrfs is missing");
if (r < 0)
return r;
fd = setup_machine_raw(size, error);
if (fd < 0)
return fd;
control = open("/dev/loop-control", O_RDWR|O_CLOEXEC|O_NOCTTY|O_NONBLOCK);
if (control < 0)
return sd_bus_error_set_errnof(error, errno, "Failed to open /dev/loop-control: %m");
nr = ioctl(control, LOOP_CTL_GET_FREE);
if (nr < 0)
return sd_bus_error_set_errnof(error, errno, "Failed to allocate loop device: %m");
if (asprintf(&loopdev, "/dev/loop%i", nr) < 0) {
r = -ENOMEM;
goto fail;
}
loop = open(loopdev, O_CLOEXEC|O_RDWR|O_NOCTTY|O_NONBLOCK);
if (loop < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to open loopback device: %m");
goto fail;
}
if (ioctl(loop, LOOP_SET_FD, fd) < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to bind loopback device: %m");
goto fail;
}
if (ioctl(loop, LOOP_SET_STATUS64, &info) < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to enable auto-clear for loopback device: %m");
goto fail;
}
/* We need to make sure the new /var/lib/machines directory
* has an access mode of 0700 at the time it is first made
* available. mkfs will create it with 0755 however. Hence,
* let's mount the directory into an inaccessible directory
* below /tmp first, fix the access mode, and move it to the
* public place then. */
if (!mkdtemp(tmpdir)) {
r = sd_bus_error_set_errnof(error, errno, "Failed to create temporary mount parent directory: %m");
goto fail;
}
tmpdir_made = true;
mntdir = strjoina(tmpdir, "/mnt");
if (mkdir(mntdir, 0700) < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to create temporary mount directory: %m");
goto fail;
}
mntdir_made = true;
if (mount(loopdev, mntdir, "btrfs", 0, NULL) < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to mount loopback device: %m");
goto fail;
}
mntdir_mounted = true;
r = btrfs_quota_enable(mntdir, true);
if (r < 0)
log_warning_errno(r, "Failed to enable quota, ignoring: %m");
r = btrfs_subvol_auto_qgroup(mntdir, 0, true);
if (r < 0)
log_warning_errno(r, "Failed to set up default quota hierarchy, ignoring: %m");
if (chmod(mntdir, 0700) < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to fix owner: %m");
goto fail;
}
(void) mkdir_p_label("/var/lib/machines", 0700);
if (mount(mntdir, "/var/lib/machines", NULL, MS_BIND, NULL) < 0) {
r = sd_bus_error_set_errnof(error, errno, "Failed to mount directory into right place: %m");
goto fail;
}
(void) syncfs(fd);
log_info("Set up /var/lib/machines as btrfs loopback file system of size %s mounted on /var/lib/machines.raw.", format_bytes(buf, sizeof(buf), size));
(void) umount2(mntdir, MNT_DETACH);
(void) rmdir(mntdir);
(void) rmdir(tmpdir);
return 1;
fail:
if (mntdir_mounted)
(void) umount2(mntdir, MNT_DETACH);
if (mntdir_made)
(void) rmdir(mntdir);
if (tmpdir_made)
(void) rmdir(tmpdir);
if (loop >= 0) {
(void) ioctl(loop, LOOP_CLR_FD);
loop = safe_close(loop);
}
(void) ioctl(control, LOOP_CTL_REMOVE, nr);
return r;
}
static int sync_path(const char *p) {
_cleanup_close_ int fd = -1;
fd = open(p, O_RDONLY|O_CLOEXEC|O_NOCTTY);
if (fd < 0)
return -errno;
if (syncfs(fd) < 0)
return -errno;
return 0;
}
int grow_machine_directory(void) {
char buf[FORMAT_BYTES_MAX];
struct statvfs a, b;
uint64_t old_size, new_size, max_add;
int r;
/* Ensure the disk space data is accurate */
sync_path("/var/lib/machines");
sync_path("/var/lib/machines.raw");
if (statvfs("/var/lib/machines.raw", &a) < 0)
return -errno;
if (statvfs("/var/lib/machines", &b) < 0)
return -errno;
/* Don't grow if not enough disk space is available on the host */
if (((uint64_t) a.f_bavail * (uint64_t) a.f_bsize) <= VAR_LIB_MACHINES_FREE_MIN)
return 0;
/* Don't grow if at least 1/3th of the fs is still free */
if (b.f_bavail > b.f_blocks / 3)
return 0;
/* Calculate how much we are willing to add at most */
max_add = ((uint64_t) a.f_bavail * (uint64_t) a.f_bsize) - VAR_LIB_MACHINES_FREE_MIN;
/* Calculate the old size */
old_size = (uint64_t) b.f_blocks * (uint64_t) b.f_bsize;
/* Calculate the new size as three times the size of what is used right now */
new_size = ((uint64_t) b.f_blocks - (uint64_t) b.f_bavail) * (uint64_t) b.f_bsize * 3;
/* Always, grow at least to the start size */
if (new_size < VAR_LIB_MACHINES_SIZE_START)
new_size = VAR_LIB_MACHINES_SIZE_START;
/* If the new size is smaller than the old size, don't grow */
if (new_size < old_size)
return 0;
/* Ensure we never add more than the maximum */
if (new_size > old_size + max_add)
new_size = old_size + max_add;
r = btrfs_resize_loopback("/var/lib/machines", new_size, true);
if (r < 0)
return log_debug_errno(r, "Failed to resize loopback: %m");
if (r == 0)
return 0;
(void) btrfs_subvol_make_label("/var/lib/machines");
/* Also bump the quota, of both the subvolume leaf qgroup, as
* well as of any subtree quota group by the same id but a
* higher level, if it exists. */
r = btrfs_qgroup_set_limit("/var/lib/machines", 0, new_size);
r = btrfs_quota_enable("/var/lib/machines", true);
if (r < 0)
log_debug_errno(r, "Failed to set btrfs limit: %m");
log_warning_errno(r, "Failed to enable quota for /var/lib/machines, ignoring: %m");
r = btrfs_subvol_set_subtree_quota_limit("/var/lib/machines", 0, new_size);
r = btrfs_subvol_auto_qgroup("/var/lib/machines", 0, true);
if (r < 0)
log_debug_errno(r, "Failed to set btrfs subtree limit: %m");
log_warning_errno(r, "Failed to set up default quota hierarchy for /var/lib/machines, ignoring: %m");
log_info("Grew /var/lib/machines btrfs loopback file system to %s.", format_bytes(buf, sizeof(buf), new_size));
return 1;
}
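
The growth heuristic in the removed grow_machine_directory() can be restated as a pure function, which makes the policy easier to follow. It grows only when a third or less of the pool is free, aims for three times the space in use, and always leaves 750 MiB free on the host. This is a sketch under stated assumptions: it works on byte counts rather than the statvfs block counts used above, and the name pool_grow_target, its parameters and the shortened macro names are hypothetical.

```c
#include <stdint.h>

#define POOL_SIZE_START (UINT64_C(1024) * 1024 * 500) /* VAR_LIB_MACHINES_SIZE_START */
#define HOST_FREE_MIN   (UINT64_C(1024) * 1024 * 750) /* VAR_LIB_MACHINES_FREE_MIN */

/* Hypothetical restatement of the removed policy. Returns the size in bytes
 * the pool should grow to, or 0 if it should not grow. */
uint64_t pool_grow_target(uint64_t host_avail, uint64_t pool_size, uint64_t pool_avail) {
        uint64_t max_add, new_size;

        /* Don't grow if the host itself is short on disk space. */
        if (host_avail <= HOST_FREE_MIN)
                return 0;

        /* Don't grow while more than a third of the pool is still free. */
        if (pool_avail > pool_size / 3)
                return 0;

        max_add = host_avail - HOST_FREE_MIN;

        /* Aim for three times what is in use, but at least the start size. */
        new_size = (pool_size - pool_avail) * 3;
        if (new_size < POOL_SIZE_START)
                new_size = POOL_SIZE_START;

        /* Never shrink. An unchanged size is folded in here as a no-op. */
        if (new_size <= pool_size)
                return 0;

        /* Never take more from the host than it can spare. */
        if (new_size > pool_size + max_add)
                new_size = pool_size + max_add;

        return new_size;
}
```

Because the importers only called this every GROW_INTERVAL_BYTES written, the pool size followed actual use in coarse steps, which fits the commit message's remark that the file system could not be grown in sync with its use.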

@ -5,8 +5,4 @@
#include "sd-bus.h"
/* Grow the /var/lib/machines directory after each 10MiB written */
#define GROW_INTERVAL_BYTES (UINT64_C(10) * UINT64_C(1024) * UINT64_C(1024))
int setup_machine_directory(uint64_t size, sd_bus_error *error);
int grow_machine_directory(void);
int setup_machine_directory(sd_bus_error *error);

@ -7,8 +7,13 @@
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
# This unit is required for pre-240 versions of systemd that automatically set
# up /var/lib/machines.raw as loopback-mounted btrfs file system. Later
# versions don't do that anymore, but let's keep minimal compatibility by
# mounting the image still, if it exists.
[Unit]
Description=Virtual Machine and Container Storage
Description=Virtual Machine and Container Storage (Compatibility)
ConditionPathExists=/var/lib/machines.raw
[Mount]
