We managed to cut the cold cache generation time from 98s to ~30s on my
desktop.
Two things have been done to improve that performance:
1. This one was stupid. I forgot to remove a debug tracing routine
from the code… That tracing routine was forcing us to cache the
libraries… twice. Massive facepalm. Fixing this reduced the cold
runtime by 50%.
2. Instead of spinning up a patchelf subprocess for each library, we
batch these operations as much as possible in a single subprocess.
This trick shaves about 30% off the remaining runtime.
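
For illustration, the batching idea in point 2 could look roughly like
the sketch below. It is not the actual nixglhost code: the function
names, the batch size and the choice of `--set-rpath` as the patchelf
operation are all assumptions made for the example. It also assumes a
patchelf version that accepts several files in one invocation.

```python
import subprocess
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def patchelf_commands(
    rpath: str, libs: List[str], batch_size: int = 256
) -> List[List[str]]:
    """Build one patchelf command line per batch of libraries.

    Hypothetical helper: `--set-rpath` stands in for whatever patchelf
    operation is really applied. Batches are capped so a very long
    library list does not exceed the OS argument-length limit.
    """
    return [
        ["patchelf", "--set-rpath", rpath, *chunk]
        for chunk in batched(libs, batch_size)
    ]


def patch_all(rpath: str, libs: List[str]) -> None:
    """Run the batched commands: one subprocess per batch, not per file."""
    for cmd in patchelf_commands(rpath, libs):
        subprocess.run(cmd, check=True)
```

With N libraries this spawns about N / batch_size processes instead of
N, which is where the saving would come from.
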
After profiling a nixglhost hot run, it turns out that we were
spending more than 98% of the run time reading and sha256-hashing
files.
Let's give up on content-hashing the files and assume that using their
name, size and last write time is good enough.
On a hot run, this reduces the run time from about 3s to 0.3s on an
NVMe-powered Ryzen 7 desktop.
I guess this 10x speedup is probably worth the little cache
correctness we lose on the way.
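
As a rough sketch of the metadata-based check (the function name and
the tuple layout are made up for the example, not taken from the
nixglhost source), the fingerprint can be built from a single `stat`
call instead of reading the whole file:

```python
import os
from typing import Tuple


def file_fingerprint(path: str) -> Tuple[str, int, int]:
    """Cheap stand-in for a content hash: (name, size, last write time).

    Hypothetical helper. It only calls stat(), so the cost does not
    depend on the file size, unlike sha256-hashing the content.
    """
    st = os.stat(path)
    return (os.path.basename(path), st.st_size, st.st_mtime_ns)
```

The cache entry is considered fresh when the stored fingerprint equals
the current one. The correctness we give up: a file rewritten with
different content but the same name, size and mtime would not be
detected.
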