We had a race condition on cold cache builds: when several
nix-gl-host instances were started at the same time, they all tried
to build the cache concurrently. This led to some weird errors and
busted caches.
We introduce a file lock preventing any concurrent access to the
cache.
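
A minimal sketch of that locking scheme, assuming a POSIX host and
fcntl-based advisory locks; the function name and lock path are
illustrative, not the actual nix-gl-host code:

    import fcntl
    import os
    from contextlib import contextmanager

    @contextmanager
    def cache_lock(lock_path: str):
        """Serialize cache builds across nix-gl-host instances."""
        lock_dir = os.path.dirname(lock_path)
        if lock_dir:
            os.makedirs(lock_dir, exist_ok=True)
        with open(lock_path, "w") as lock_file:
            # Blocks until the exclusive lock is free.
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)

Every instance takes the exclusive lock before touching the cache:
the first one builds it while the others block, then simply find a
finished cache once they acquire the lock in turn.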
We took this bug as an opportunity to rethink the way we build the
cache and make it more robust. Instead of building it in place, we
now build it first in a temporary directory (while making sure to
patchelf the DSOs according to their final destination). We then
move this directory to the actual cache destination iff the cache
has been built successfully.
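
A sketch of that build-then-promote dance, assuming a cold cache
(the final directory does not exist yet) and a staging directory on
the same filesystem as the destination, so the closing rename is
atomic; names and the RUNPATH handling are illustrative:

    import os
    import shutil
    import subprocess
    import tempfile

    def build_cache(dsos: list[str], final_dir: str) -> None:
        """Build the cache in a scratch dir, then promote it in one move."""
        parent = os.path.dirname(os.path.abspath(final_dir))
        os.makedirs(parent, exist_ok=True)
        # Staging next to the destination keeps os.rename on one
        # filesystem, hence atomic.
        staging = tempfile.mkdtemp(dir=parent)
        try:
            for dso in dsos:
                copied = shutil.copy2(dso, staging)
                # Point the RUNPATH at the *final* directory: that is
                # where the DSO will live once the staging dir gets
                # promoted.
                subprocess.run(
                    ["patchelf", "--set-rpath", final_dir, copied],
                    check=True,
                )
            os.rename(staging, final_dir)  # only reached on success
        except BaseException:
            # Never leave a half-built cache behind.
            shutil.rmtree(staging, ignore_errors=True)
            raise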
It seems like some driver installations are missing some NVIDIA
subsystems: we stumbled upon the case of somebody missing the CUDA
libraries entirely. This ended up making the patchelf call fail.
We now prevent the copy/patch routine from running when we do not
have any DSO to copy/patch.
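
A sketch of that guard, with illustrative names: the glob pattern is
a stand-in for the real DSO lookup, and copy_and_patch is a
hypothetical placeholder for the existing copy/patch routine:

    import glob
    import os

    def cache_cuda(driver_dir: str, staging_dir: str) -> None:
        # May well come back empty on installs without CUDA.
        dsos = glob.glob(os.path.join(driver_dir, "libcuda*.so*"))
        if not dsos:
            # Some driver installs ship without the CUDA subsystem at
            # all; bailing out early is what keeps the copy/patch
            # routine (and its patchelf call) from failing.
            return
        copy_and_patch(dsos, staging_dir)  # hypothetical routine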
We managed to improve the cold cache generation from 98s to ~30s on
my desktop.
Two things have been done to improve that performance:
1. This one was stupid. I forgot a debug tracing routine that should
have been removed from the code… This tracing routine was forcing us
to cache the libraries… twice. Massive facepalm. Addressing this
reduced the cold runtime by 50%.
2. Instead of spinning up a patchelf subprocess for each library, we
batch these operations as much as possible into a single subprocess
(see the sketch after this list). This trick shaves about 30% off
the remaining runtime.
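
A sketch of the batching trick. It assumes a patchelf recent enough
to accept several files in one invocation; the function name and the
chunk size (kept modest to stay clear of ARG_MAX) are illustrative:

    import subprocess

    def batch_patch_rpath(rpath: str, dso_paths: list[str],
                          chunk_size: int = 100) -> None:
        """Rewrite the RUNPATH of many DSOs with few patchelf spawns."""
        for start in range(0, len(dso_paths), chunk_size):
            chunk = dso_paths[start:start + chunk_size]
            # One subprocess per chunk instead of one per library.
            subprocess.run(
                ["patchelf", "--set-rpath", rpath, *chunk],
                check=True,
            )

Spawning a process has a fixed cost (fork/exec plus loading patchelf
itself); amortizing that cost over many libraries is where the ~30%
win comes from.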