On really old kernels (< 4.14+) a bug in /proc/1/sched handling in the
kernel could be used to determine whether we are running in a PID
namespace. This hasn't worked for a long time, and there's little point
in making things work on old kernels we can't make work on current
kernels, hence let's drop that old cruft.
See: #8153
UML runs as a user-process so it can quite easily be ran inside of
another hypervisor, for instance inside a KVM instance. UML passes
through the CPUID from the host machine so in this case detect_vm
incorrectly identifies as running under KVM. So check we are running
a UML kernel first, before we check any other hypervisors.
Resolves: #17754
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
Currently systemd-detect-virt fails to detect running under PowerVM.
Add code to detect PowerVM based on code in util-linux.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
This has the major benefit that the entire payload of the container can
access these files there. Previously, we'd set them only as env vars,
but that meant only PID 1 could read them directly or other privileged
payload code with access to /run/1/environ.
WSL2 will soon (TM) include the "WSL2" string in /proc/sys/kernel/osrelease
so the workaround will no longer be necessary.
We have several different cloud images which do include the "microsoft"
string already, which would break this detection. They are for internal
usage at the moment, but the userspace side can come from all over the
place so it would be quite hard to track and downstream-patch to avoid
breakages.
This reverts commit a2f838d590.
proot provides userspace-powered emulation of chroot and mount --bind,
lending it to be used on environments without unprivileged user
namespaces, or in otherwise restricted environments like Android.
In order to achieve this, proot makes use of the kernel's ptrace()
facility, which we can use in order to detect its presence. Since it
doesn't use any kind of namespacing, including PID namespacing, we don't
need to do any tricks when trying to get the tracer's metadata.
For our purposes, proot is listed as a "container", since we mostly use
this also as the bucket for non-container-but-container-like
technologies like WSL. As such, it seems like a good fit for this
section as well.
From https://github.com/microsoft/WSL/issues/423#issuecomment-221627364:
> it's unlikely we'll change it to something that doesn't contain "Microsoft"
> or "WSL".
... but well, it happened. If they change it incompatibly w/o adding an stable
detection mechanism, I think we should not add yet another detection method.
But adding a different casing of "microsoft" is not a very big step, so let's
do that.
Follow-up for #11932.
In a current VirtualBox installation the board_vendor is set to "Oracle
Corporation". So we need to add this to the dmi_vendor_table for a
relieable detection.
This fixes#13429
Signed-off-by: Jan Losinski <losinski@wh2.tu-dresden.de>
If a container manager does not set $container, we could end up
in a strange situation when detect-virt returns container-other when
run as non-pid-1 and none when run as pid-1.
All users of the macro (except for one, in serialize.c), use the macro in
connection with read_line(), so they must include fileio.h. Let's not play
libc games and require multiple header file to be included for the most common
use of a function.
The removal of def.h includes is not exact. I mostly went over the commits that
switch over to use read_line() and add def.h at the same time and reverted the
addition of def.h in those files.
Quoting https://github.com/systemd/systemd/issues/10074:
> detect_vm_uml() reads /proc/cpuinfo with read_full_file()
> read_full_file() has a file max limit size of READ_FULL_BYTES_MAX=(4U*1024U*1024U)
> Unfortunately, the size of my /proc/cpuinfo is bigger, approximately:
> echo $(( 4* $(cat /proc/cpuinfo | wc -c)))
> 9918072
> This causes read_full_file() to fail and the Condition test fallout.
Let's just read line by line until we find an intersting line. This also
helps if not running under UML, because we avoid reading as much data.
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
They are not needed, because anything that is non-zero is converted
to true.
C11:
> 6.3.1.2: When any scalar value is converted to _Bool, the result is 0 if the
> value compares equal to 0; otherwise, the result is 1.
https://stackoverflow.com/questions/31551888/casting-int-to-bool-in-c-c
Let's simplify the code a bit. Let's reduce the number of redundant if
checks a bit, (i.e. if we want to check for equality with
VIRTUALIZATION_VM_OTHER there's no need to check for non-equality with
VIRTUALIZATION_NONE first). As a very welcome side-effect this means we
lose some lines of code and our level of indentation is reduced.
No changes in behaviour.
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
$ sudo systemd-run -p RootDirectory=/usr -E LD_LIBRARY_PATH=/lib/systemd/ -E SYSTEMD_LOG_LEVEL=debug /bin/systemd-detect-virt
Before
systemd-detect-virt[18498]: No virtualization found in DMI
systemd-detect-virt[18498]: No virtualization found in CPUID
systemd-detect-virt[18498]: Virtualization XEN not found, /proc/xen does not exist
systemd-detect-virt[18498]: This platform does not support /proc/device-tree
systemd-detect-virt[18498]: Failed to check for virtualization: No such file or directory
The first four lines are at debug level, so the user would only see that last
one usually, which is not very enlightening.
This now becomes:
systemd-detect-virt[21172]: No virtualization found in DMI
systemd-detect-virt[21172]: No virtualization found in CPUID
systemd-detect-virt[21172]: Virtualization XEN not found, /proc/xen does not exist
systemd-detect-virt[21172]: This platform does not support /proc/device-tree
systemd-detect-virt[21172]: /proc/cpuinfo not found, assuming no UML virtualization.
systemd-detect-virt[21172]: This platform does not support /proc/sysinfo
systemd-detect-virt[21172]: Found VM virtualization none
systemd-detect-virt[21172]: none
We do more checks, which is good too.
Use sscanf instead of the built-in safe_atolu because the scanned string
lacks the leading "0x", it is generated with snprintf(b, "%08x", val).
As a result strtoull handles it as octal, and parsing fails.
The initial submission already used sscanf, then parsing was replaced by
safe_atolu without retesting the updated PR.
Fixes 575e6588d ("virt: use XENFEAT_dom0 to detect the hardware domain
(#6442, #6662) (#7581)")
The __get_cpuid() function only calls __cpuid() if __get_cpuid_max()
returns a value that is less than or equal to the leaf value.
In QEMU/KVM, I found that the special hypervisor leaf value (0x40000000U)
is always larger than the value retured by __get_cpuid_max().
Avoid this problem by calling the __cpuid() macro directly once we have
checked the hypervisor bit from leaf 1.
Fixes: d31b0033b7
The detection of ConditionVirtualisation= relies on the presence of
/proc/xen/capabilities. If the file exists and contains the string
"control_d", the running system is a dom0 and VIRTUALIZATION_NONE should
be set. In case /proc/xen exists, or some sysfs files indicate "xen",
VIRTUALIZATION_XEN should be set to indicate the system is a domU.
With an (old) xenlinux based kernel, /proc/xen/capabilities is always
available and the detection described above works always. But with a
pvops based kernel, xenfs must be mounted on /proc/xen to get
"capabilities". This is done by a proc-xen.mount unit, which is part of
xen.git. Since the mounting happens "late", other units may be scheduled
before "proc-xen.mount". If these other units make use of
"ConditionVirtualisation=", the virtualization detection returns
incorect results. detect_vm() will set VIRTUALIZATION_XEN because "xen"
is found in sysfs. This value will be cached. Once xenfs is mounted, the
next process that runs detect_vm() will get VIRTUALIZATION_NONE.
This misdetection can be fixed by using
/sys/hypervisor/properties/features, which exports the value returned by
the "XENVER_get_features" hypercall. If the bit XENFEAT_dom0 is set, the
domain is the "hardware domain". It is supposed to have permissions to
access all hardware. The used sysfs file is available since v2.6.31.
The commonly used term "dom0" refers to the control domain which runs
the toolstack and has access to all hardware. But the virtualization
host may be configured such that one dedicated domain becomes the
"hardware domain", and another one the "toolstack domain".
Update detect_vm_xen_dom0 to propagate errors in case reading
/proc/xen/capabilites fails. This does not fix any bugs, it just makes
it consistent with other functions called by detect_vm.
The file /proc/xen/capabilities is only available if xenfs is mounted.
With a classic xenlinux based kernel that file is available
unconditionally. But with a modern pvops based kernel, xenfs must be
mounted before the "capabilities" may appear. xenfs is mounted very late
via .services files provided by the Xen toolstack. Other units may be
scheduled before xenfs is mounted, which will confuse the detection of
VIRTUALIZATION_XEN.
In all Xen enabled kernels, and if that kernel is actually running on
the Xen hypervisor, the "/proc/xen" directory is the reliable indicator
that this instance runs in a "Xen guest".
Adjust the code to check for /proc/xen instead of
/proc/xen/capabilities.
Fixes commit 3f61278b5 ("basic: Bugfix Detect XEN Dom0 as no virtualization")
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.