We managed to cut the cold cache generation time from 98s to ~30s on my
desktop.
Two things have been done to improve that performance:
1. This one was stupid. I left a debug tracing routine in the code
that should have been removed. This tracing routine was forcing us
to cache the libraries… twice. Massive facepalm. Addressing this
reduced the cold runtime by 50%.
2. Instead of spinning up a patchelf subprocess for each library, we
batch these operations as much as possible in a single subprocess.
This trick shaves about 30% off the remaining runtime.
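A minimal sketch of the batching idea from point 2. It assumes the installed patchelf accepts several file names per invocation and that the batched operation is --set-rpath. The helper name and the batch size are hypothetical, not taken from the actual code.

```python
from typing import List


def batched_patchelf_commands(
    rpath: str, dsos: List[str], batch_size: int = 64
) -> List[List[str]]:
    """Build one patchelf argv per batch of DSOs instead of one per DSO.

    batch_size is an arbitrary illustrative value; it only exists to keep
    each command line below the OS argument-length limit.
    """
    commands = []
    for start in range(0, len(dsos), batch_size):
        batch = dsos[start:start + batch_size]
        # Assumption: patchelf applies the option to every file listed.
        commands.append(["patchelf", "--set-rpath", rpath, *batch])
    return commands
```

Each returned list can then be handed to subprocess.run(cmd, check=True), so the number of spawned processes depends on the number of batches rather than the number of libraries.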
After profiling a nixglhost hot run, it turns out that we were
spending more than 98% of the run time reading and sha256-hashing
files.
Let's give up on content hashing the files and assume that using their
name, size and last write time is good enough.
On a hot run, we reduce the run time from about 3s to 0.3s on a
nvme-powered ryzen 7 desktop.
I guess this 10x speedup is probably worth the little cache correctness we
lose on the way.
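A rough sketch of what such a metadata-only fingerprint could look like. The function name and the tuple layout are assumptions made for illustration; only the choice of name, size and last write time comes from the text above.

```python
import os
from typing import Tuple


def cheap_fingerprint(path: str) -> Tuple[str, int, int]:
    """Fingerprint a file from its metadata only, without reading it.

    Returns (file name, size in bytes, last write time in nanoseconds).
    A file rewritten with the same size and mtime goes undetected: that
    is the bit of cache correctness we give up.
    """
    st = os.stat(path)
    return (os.path.basename(path), st.st_size, st.st_mtime_ns)
```

A single stat call per file replaces reading and hashing its whole content, which is where the hot run was spending its time.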
We made the incorrect assumption that the first DSO we'd stumble upon
in the load path would be the most appropriate one for the host
system, i.e. that it'd be the x86_64-linux-gnu one on such a system.
That turned out not to be the case, meaning we can't take such a
shortcut and have to handle the case where we end up with several
copies of the same library built for different archs.
This forced us to do two things:
1. We had to rethink the cache directory structure. We now have
to cache each library directory separately. This means we now have to
inject the GLX/EGL/Cuda subdirectories of *all* these cached library
directories into LD_LIBRARY_PATH. The dynamic linker
will then try to open these DSOs (and fail) until it stumbles upon
the appropriate one.
2. We had to give up on the way we injected EGL libraries using
absolute paths. We don't know which DSO matches the wrapped
program's arch. Instead of injecting the absolute paths through the
JSON configuration files, we only specify the library names in
them. We then inject the various EGL DSOs we find through
LD_LIBRARY_PATH, similarly to what we already do for GLX and Cuda.
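The two points above can be sketched as follows. The JSON layout is the libglvnd EGL vendor file format. The function names and the "glx"/"egl"/"cuda" subdirectory names are hypothetical and only illustrate the per-directory cache layout.

```python
import json
import os
from typing import List, Sequence


def egl_vendor_config(library_name: str) -> str:
    """Render an EGL vendor file that names the library, not its path.

    With a bare name, the dynamic linker resolves the DSO through
    LD_LIBRARY_PATH and ends up with the one matching the program's arch.
    """
    return json.dumps(
        {"file_format_version": "1.0.0", "ICD": {"library_path": library_name}},
        indent=2,
    )


def build_ld_library_path(
    cache_dirs: List[str],
    subdirs: Sequence[str] = ("glx", "egl", "cuda"),  # hypothetical names
) -> str:
    """Join the GLX/EGL/Cuda subdirectory of every cached library dir."""
    entries = [os.path.join(d, s) for d in cache_dirs for s in subdirs]
    return os.pathsep.join(entries)
```

With two cached library directories, this yields six LD_LIBRARY_PATH entries. The linker skips the DSOs built for the wrong arch and keeps going until it finds one it can load.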
We add a -d/--driver-directory flag allowing the user to circumvent
the automatic DSO lookup and instead force nix-gl-host to load its
dynamic libraries from a specific directory.
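A minimal argparse sketch of such a flag. Only -d/--driver-directory comes from the text above. The positional arguments, the prog name and the help strings are assumptions made to keep the example self-contained.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="nixglhost")
    parser.add_argument(
        "-d",
        "--driver-directory",
        default=None,
        help="load the driver DSOs from this directory instead of "
        "looking them up automatically",
    )
    # Hypothetical positionals: the wrapped program and its arguments.
    parser.add_argument("program", help="program to wrap")
    parser.add_argument("args", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)
```

When the flag is absent, driver_directory stays None and the automatic lookup would run as usual.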
Copying & patching all the DSOs is a time-consuming process (~10s on a
computer with a slow hard drive). We definitely don't want to go through
it for each process start, so we need to introduce a cache.
For this cache, we go the conservative way. We're going to "resolve" a
DSO name (i.e. find the DSO absolute path) and sha256-hash each DSO.
We're then going to compare the fingerprints to determine whether or
not we need to nuke and rebuild the DSO cache.
The cache state is persisted through a JSON file saved in the cache dir.
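A minimal sketch of that check, assuming the state file simply maps each resolved DSO path to its sha256 digest. The file name and function names are hypothetical.

```python
import hashlib
import json
import os
from typing import Dict, List

STATE_FILE = "cache-state.json"  # hypothetical file name


def sha256_file(path: str) -> str:
    """Hash a file in chunks so large DSOs are never fully held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprints(dso_paths: List[str]) -> Dict[str, str]:
    return {path: sha256_file(path) for path in dso_paths}


def cache_is_fresh(cache_dir: str, dso_paths: List[str]) -> bool:
    """True when the persisted fingerprints match the host DSOs."""
    state_path = os.path.join(cache_dir, STATE_FILE)
    try:
        with open(state_path, "r", encoding="utf-8") as f:
            saved = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return False  # no usable state: the cache has to be rebuilt
    return saved == fingerprints(dso_paths)


def save_state(cache_dir: str, dso_paths: List[str]) -> None:
    state_path = os.path.join(cache_dir, STATE_FILE)
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump(fingerprints(dso_paths), f, indent=2)
```

The caller would nuke and rebuild the cache whenever cache_is_fresh returns False, then call save_state once the rebuild is done.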
We create a new function in charge of finding the CUDA DSOs. We also
modify the find_nvidia_dsos function and remove the cuda-related
libraries from its output.
We take advantage of this new feature to factor out the file searching
logic into its own function.
Good news: we did not have to patch libglvnd for the EGL support. All
the low-level machinery was already here (but sadly undocumented).
Bad news: properly supporting the EGL stack turned out to be more
involved than its GLX counterpart on the wrapper side. Not only do you
need the main EGL lib in charge of implementing the
primitives (libEGL_nvidia.so), but also two other libraries
implementing the wayland and gbm bindings.
These DSOs in turn depend on some non-glibc but open source shared
libraries. I decided to use the ones coming from the host system
rather than the ones provided by Nixpkgs: it's best to assume that the
host system did its homework to determine which version of these
libraries the Nvidia driver is expecting to work with.
First take.
For now, we only try to support the Nvidia proprietary driver. We
also cut quite a few corners :)
We hardcode a list of DSOs we're looking for in the code. That's
obviously not the best long-term decision, we'll have to revise this
particular approach later on.
We're looking for these listed DSOs in the GL_VENDOR_PATH provided by
the user. We'll need to patch these DSOs and we obviously don't want
to alter the host OS configuration. So we have to first copy them to
the user XDG cache directory.
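A minimal sketch of the lookup-and-copy step. It assumes the cache lives in a "nix-gl-host" subdirectory of the XDG cache directory, and the function names are hypothetical. The fallback to ~/.cache is the default from the XDG Base Directory specification.

```python
import os
import shutil
from typing import List, Mapping, Optional


def xdg_cache_dir(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return our cache directory under $XDG_CACHE_HOME (or ~/.cache)."""
    env = os.environ if environ is None else environ
    base = env.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )
    return os.path.join(base, "nix-gl-host")  # hypothetical directory name


def copy_dsos(vendor_dir: str, dso_names: List[str], cache_dir: str) -> List[str]:
    """Copy the listed DSOs found in vendor_dir; the host files stay untouched."""
    os.makedirs(cache_dir, exist_ok=True)
    copied = []
    for name in dso_names:
        src = os.path.join(vendor_dir, name)
        if os.path.isfile(src):  # the hardcoded list may name absent DSOs
            dst = os.path.join(cache_dir, name)
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

The returned paths are the copies that are safe to patch.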
Once copied, we alter their runpath to point to the user cache dir,
since these DSOs can depend on each other.
Finally, we point the patched libglvnd GLX implementation to the cache
dir and replace the current process with the target one.