It seems that some driver installations are missing some NVIDIA
subsystems. We stumbled upon the case of somebody missing the CUDA
libraries.
This ended up making the patchelf call fail.
We now prevent the copy/patch routine from running when we do not
have any DSO to copy/patch.
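The guard described above can be sketched as follows. This is only an
illustration, not the actual code: the function name `copy_and_patch`,
its parameters and the `patch` callable are hypothetical names made up
for this example.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def copy_and_patch(
    dsos: List[Path],
    dest_dir: Path,
    patch: Callable[[List[Path]], None],
) -> List[Path]:
    """Copy the DSOs into dest_dir, then patch the copies.

    Hypothetical helper: `patch` stands in for whatever wraps the
    patchelf call. Returns the list of copied paths.
    """
    # Nothing to copy/patch (e.g. no CUDA libraries on this host):
    # return early so patchelf is never called without a file argument.
    if not dsos:
        return []
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for dso in dsos:
        target = dest_dir / dso.name
        shutil.copy2(dso, target)
        copied.append(target)
    patch(copied)
    return copied
```

With an empty list, the function returns right away: no directory is
created and the patch step is not reached.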
We managed to improve the cold cache generation time from 98s to ~30s
on my desktop.
Two things have been done to improve that performance:
1. This one was stupid. I forgot to remove a debug tracing routine
from the code… This tracing routine was forcing us to cache the
libraries… …twice. Massive facepalm. Fixing this reduced the cold
runtime by 50%.
2. Instead of spinning up a patchelf subprocess for each library, we
batch these operations as much as possible in a single subprocess.
This trick shaves about 30% off the remaining runtime.
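A minimal sketch of the batching idea from point 2, assuming a
patchelf version that accepts several file names in one invocation and
that every library gets the same rpath. The function names, the
`--set-rpath` operation and the batch size of 256 are assumptions
picked for this example, not taken from the actual code.

```python
import subprocess
from typing import Callable, List, Sequence


def batched_patchelf_cmds(
    rpath: str, dsos: Sequence[str], batch_size: int = 256
) -> List[List[str]]:
    """Build one patchelf command line per batch of DSOs.

    Hypothetical helper. The batch size is arbitrary; it only keeps
    each command line well below the OS argument length limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [
        ["patchelf", "--set-rpath", rpath, *dsos[i : i + batch_size]]
        for i in range(0, len(dsos), batch_size)
    ]


def patch_dsos(
    rpath: str,
    dsos: Sequence[str],
    run: Callable = subprocess.run,
) -> None:
    """Run one subprocess per batch instead of one per library."""
    for cmd in batched_patchelf_cmds(rpath, dsos):
        # check=True raises CalledProcessError if patchelf fails.
        run(cmd, check=True)
```

An empty list of libraries yields no command at all, so nothing is
spawned in that case. The `run` parameter is only there to make the
sketch easy to exercise without patchelf installed.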