Quoting https://github.com/systemd/systemd/issues/10074:
> detect_vm_uml() reads /proc/cpuinfo with read_full_file()
> read_full_file() has a file max limit size of READ_FULL_BYTES_MAX=(4U*1024U*1024U)
> Unfortunately, the size of my /proc/cpuinfo is bigger, approximately:
> echo $(( 4* $(cat /proc/cpuinfo | wc -c)))
> 9918072
> This causes read_full_file() to fail and the Condition test fallout.
Let's just read line by line until we find an intersting line. This also
helps if not running under UML, because we avoid reading as much data.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
They are not needed, because anything that is non-zero is converted
to true.
C11:
> 6.3.1.2: When any scalar value is converted to _Bool, the result is 0 if the
> value compares equal to 0; otherwise, the result is 1.
https://stackoverflow.com/questions/31551888/casting-int-to-bool-in-c-c
Let's simplify the code a bit. Let's reduce the number of redundant if
checks a bit, (i.e. if we want to check for equality with
VIRTUALIZATION_VM_OTHER there's no need to check for non-equality with
VIRTUALIZATION_NONE first). As a very welcome side-effect this means we
lose some lines of code and our level of indentation is reduced.
No changes in behaviour.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
$ sudo systemd-run -p RootDirectory=/usr -E LD_LIBRARY_PATH=/lib/systemd/ -E SYSTEMD_LOG_LEVEL=debug /bin/systemd-detect-virt
Before
systemd-detect-virt[18498]: No virtualization found in DMI
systemd-detect-virt[18498]: No virtualization found in CPUID
systemd-detect-virt[18498]: Virtualization XEN not found, /proc/xen does not exist
systemd-detect-virt[18498]: This platform does not support /proc/device-tree
systemd-detect-virt[18498]: Failed to check for virtualization: No such file or directory
The first four lines are at debug level, so the user would only see that last
one usually, which is not very enlightening.
This now becomes:
systemd-detect-virt[21172]: No virtualization found in DMI
systemd-detect-virt[21172]: No virtualization found in CPUID
systemd-detect-virt[21172]: Virtualization XEN not found, /proc/xen does not exist
systemd-detect-virt[21172]: This platform does not support /proc/device-tree
systemd-detect-virt[21172]: /proc/cpuinfo not found, assuming no UML virtualization.
systemd-detect-virt[21172]: This platform does not support /proc/sysinfo
systemd-detect-virt[21172]: Found VM virtualization none
systemd-detect-virt[21172]: none
We do more checks, which is good too.
Use sscanf instead of the built-in safe_atolu because the scanned string
lacks the leading "0x", it is generated with snprintf(b, "%08x", val).
As a result strtoull handles it as octal, and parsing fails.
The initial submission already used sscanf, then parsing was replaced by
safe_atolu without retesting the updated PR.
Fixes 575e6588d ("virt: use XENFEAT_dom0 to detect the hardware domain
(#6442, #6662) (#7581)")
The __get_cpuid() function only calls __cpuid() if __get_cpuid_max()
returns a value that is less than or equal to the leaf value.
In QEMU/KVM, I found that the special hypervisor leaf value (0x40000000U)
is always larger than the value retured by __get_cpuid_max().
Avoid this problem by calling the __cpuid() macro directly once we have
checked the hypervisor bit from leaf 1.
Fixes: d31b0033b7
The detection of ConditionVirtualisation= relies on the presence of
/proc/xen/capabilities. If the file exists and contains the string
"control_d", the running system is a dom0 and VIRTUALIZATION_NONE should
be set. In case /proc/xen exists, or some sysfs files indicate "xen",
VIRTUALIZATION_XEN should be set to indicate the system is a domU.
With an (old) xenlinux based kernel, /proc/xen/capabilities is always
available and the detection described above works always. But with a
pvops based kernel, xenfs must be mounted on /proc/xen to get
"capabilities". This is done by a proc-xen.mount unit, which is part of
xen.git. Since the mounting happens "late", other units may be scheduled
before "proc-xen.mount". If these other units make use of
"ConditionVirtualisation=", the virtualization detection returns
incorect results. detect_vm() will set VIRTUALIZATION_XEN because "xen"
is found in sysfs. This value will be cached. Once xenfs is mounted, the
next process that runs detect_vm() will get VIRTUALIZATION_NONE.
This misdetection can be fixed by using
/sys/hypervisor/properties/features, which exports the value returned by
the "XENVER_get_features" hypercall. If the bit XENFEAT_dom0 is set, the
domain is the "hardware domain". It is supposed to have permissions to
access all hardware. The used sysfs file is available since v2.6.31.
The commonly used term "dom0" refers to the control domain which runs
the toolstack and has access to all hardware. But the virtualization
host may be configured such that one dedicated domain becomes the
"hardware domain", and another one the "toolstack domain".
Update detect_vm_xen_dom0 to propagate errors in case reading
/proc/xen/capabilites fails. This does not fix any bugs, it just makes
it consistent with other functions called by detect_vm.
The file /proc/xen/capabilities is only available if xenfs is mounted.
With a classic xenlinux based kernel that file is available
unconditionally. But with a modern pvops based kernel, xenfs must be
mounted before the "capabilities" may appear. xenfs is mounted very late
via .services files provided by the Xen toolstack. Other units may be
scheduled before xenfs is mounted, which will confuse the detection of
VIRTUALIZATION_XEN.
In all Xen enabled kernels, and if that kernel is actually running on
the Xen hypervisor, the "/proc/xen" directory is the reliable indicator
that this instance runs in a "Xen guest".
Adjust the code to check for /proc/xen instead of
/proc/xen/capabilities.
Fixes commit 3f61278b5 ("basic: Bugfix Detect XEN Dom0 as no virtualization")
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
QEMU >= 2.10 will include a CPUID leaf with value "TCGTCGTCGTCG"
on x86 when running with the TCG CPU emulator:
https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg05231.html
Existing methods of detecting QEMU are left unchanged for sake of
backcompatibility.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This reverts commit 295ee9845c.
Let'd revert this for now, see #5446 for discussions.
We want systemd-detect-virt --chroot to return true for all chroot-like stuff, for
example mock environments which have use a mount namespace. The downside
of this revert that systemctl will not work from our own namespaced services, anything
with RootDirectory=/RootImage= set.
This breaks again, this time for setups where Qemu is not reported via DMI for whatever
reason. So swap order of cpuid and dmi again, but properly detect oracle.
See issue #5318.
In commit 050e65a we swapped order of detect_vm_{cpuid,dmi}(). That
fixed Virtualbox but broke qemu with kvm, which is expected to return
'kvm'. So check for qemu/kvm first, then DMI, CPUID last.
This fixes#5318.
Signed-off-by: Christian Hesse <mail@eworm.de>
Previously, systemd-detect-virt was unable to detect "systemd-nspawn -a"
container environments, i.e. where PID 1 is a stub process running in host
context, as in that case /proc/1/environ was inherited from the host. Let's
improve that, and add an additional check for container environments where
/proc/1/environ is not cleaned up and does not contain the $container
environment variable:
The /proc/1/sched file shows the host PID in the first line. if this is not
1, we know we are running in a PID namespace (but not which implementation).
With these changes we should be able to detect container environments that
don't set $container at all.
Let's be a bit more careful when detecting chroot() environments, so that we
can discern them from namespaced environments.
Previously this would simply check if the root directory of PID 1 matches our
own root directory. With this commit, we also check whether the namespaces of
PID 1 and ourselves are the same. If not we assume we are running inside of a
namespaced environment instead of a chroot() environment.
This has the benefit that systemctl (which uses running_in_chroot()) will work
as usual when invoked in a namespaced service.
ENOENT should be treated as "false", but because of the broken errno check it
was treated as an error. So ConditionVirtualization=user-namespaces probably
returned the correct answer, but only by accident.
Fixes#4608.
Various things don't work when we're running in a user namespace, but it's
pretty hard to reliably detect if that is true.
A function is added which looks at /proc/self/uid_map and returns false
if the default "0 0 UINT32_MAX" is found, and true if it finds anything else.
This misses the case where an 1:1 mapping with the full range was used, but
I don't know how to distinguish this case.
'systemd-detect-virt --private-users' is very similar to
'systemd-detect-virt --chroot', but we check for a user namespace instead.
When running in XEN Dom0 the virtualization check:
1) detect_xen returns HYPERVISOR_NONE so next checks are executed
2) /proc/sys/hypervisor detects a XEN hypervisor
it is lacking the special Dom0 detection as in detect_xen
With this patch, at the end of all virtualization checks we double-check if running in XEN Dom0 or DomU.
Virtualbox should be detected as 'oracle'. This used to work but broke
with commit:
commit 75f86906c5
Author: Lennart Poettering <lennart@poettering.net>
Date: Mon Sep 7 13:42:47 2015 +0200
basic: rework virtualization detection API
We swap detection for dmi and cpuid, this fixes Virtualbox with KVM.
Hopefully it does not break anything else.
src/basic/virt.c: In function 'detect_vm_device_tree':
src/basic/virt.c:117:17: error: unknown type name '_cleanup_closedir_'
_cleanup_closedir_ DIR *dir = NULL;
src/basic/virt.c:128:17: error: implicit declaration of function 'FOREACH_DIRENT' [-Werror=implicit-function-declaration]
FOREACH_DIRENT(dent, dir, return -errno)
If we don't know a container manager, we should consider it as "other"
rather than as no container manager at all, to provide a somwhat useful
upgrade path.
Some guests (ARM, AArch64, x86-RHEL) have 'KVM' in the product name.
Look for that first in order to more precisely report "kvm" when
detecting a QEMU/KVM guest. Without this patch we report "qemu",
even if KVM acceleration is in use on ARM/AArch64 guests.
I've only tested a backported version of this and the previous
patch on an AArch64 guest (which worked). Of course it would be
nice to get regression testing on all guest types that depend on
dmi done.