glibc/sysdeps
Noah Goldstein 9a421348cd elf: Optimize _dl_new_hash in dl-new-hash.h
Unroll slightly and enforce good instruction scheduling. This improves
performance on out-of-order machines. The unrolling allows for
pipelined multiplies.

In addition, as an optional sysdep, reorder the operations and prevent
reassociation for better scheduling and higher ILP. This commit
only adds the barrier for x86, although it should be either no
change or a win for any architecture.

Unrolling further started to induce slowdowns for sizes [0, 4],
but it can help the loop, so if larger sizes are the target, further
unrolling can be beneficial.
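
The unrolling described above can be sketched as follows. This is an
illustrative sketch, not the code from dl-new-hash.h: the function names
and the SKETCH_REASSOC_BARRIER macro are hypothetical, and the empty-asm
barrier is only an assumption about how reassociation might be blocked
on GNU-compatible compilers. It relies on the GNU hash being the usual
h = h * 33 + c recurrence seeded with 5381, so two steps fold into
h * 33 * 33 + c0 * 33 + c1, where the multiply on h no longer waits on
the per-character additions.

```c
#include <stdint.h>

/* Hypothetical barrier (an assumption, not the glibc macro): an empty asm
   that makes h depend on c0, so the compiler cannot merge the additions
   around it back into a single h * 33 + c chain.  Expands to nothing on
   compilers without GNU-style inline asm.  */
#if defined(__GNUC__)
# define SKETCH_REASSOC_BARRIER(c0, h) __asm__ ("" : "+r" (h) : "r" (c0))
#else
# define SKETCH_REASSOC_BARRIER(c0, h) ((void) 0)
#endif

/* Reference version: one character per iteration.  */
static uint32_t
sketch_hash_simple (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  uint32_t h = 5381;
  for (; *s != 0; s++)
    h = h * 33 + *s;
  return h;
}

/* Unrolled version: two characters per iteration.  */
static uint32_t
sketch_hash_unrolled (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  uint32_t h = 5381;
  for (;;)
    {
      uint32_t c0 = s[0];
      if (c0 == 0)
        return h;

      uint32_t c1 = s[1];
      if (c1 == 0)
        {
          /* Odd tail: h * 33 + c0 computed as h * 32 + (c0 + h).  */
          c0 += h;
          SKETCH_REASSOC_BARRIER (c0, h);
          return h * 32 + c0;
        }

      /* Two steps at once: h * 33 * 33 + c0 * 33 + c1.  The additions on
         c0 and c1 are independent of the multiply on h.  */
      c1 += c0;
      h *= 33 * 33;
      c1 += c0 * 32;
      h += c1;
      s += 2;
    }
}
```

Both functions should return the same value for any NUL-terminated
string; for example, both give 5381 for the empty string.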

Results for _dl_new_hash
Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

Time as Geometric Mean of N=30 runs
Geometric mean of all benchmarks, New / Old: 0.674
  type, length, New Time, Old Time, New Time / Old Time
 fixed,      0,    2.865,     2.72,               1.053
 fixed,      1,    3.567,    2.489,               1.433
 fixed,      2,    2.577,    3.649,               0.706
 fixed,      3,    3.644,    5.983,               0.609
 fixed,      4,    4.211,    6.833,               0.616
 fixed,      5,    4.741,    9.372,               0.506
 fixed,      6,    5.415,    9.561,               0.566
 fixed,      7,    6.649,   10.789,               0.616
 fixed,      8,    8.081,   11.808,               0.684
 fixed,      9,    8.427,   12.935,               0.651
 fixed,     10,    8.673,   14.134,               0.614
 fixed,     11,    10.69,   15.408,               0.694
 fixed,     12,   10.789,   16.982,               0.635
 fixed,     13,   12.169,   18.411,               0.661
 fixed,     14,   12.659,   19.914,               0.636
 fixed,     15,   13.526,   21.541,               0.628
 fixed,     16,   14.211,   23.088,               0.616
 fixed,     32,   29.412,   52.722,               0.558
 fixed,     64,    65.41,  142.351,               0.459
 fixed,    128,  138.505,  295.625,               0.469
 fixed,    256,  291.707,  601.983,               0.485
random,      2,   12.698,   12.849,               0.988
random,      4,   16.065,   15.857,               1.013
random,      8,   19.564,   21.105,               0.927
random,     16,   23.919,   26.823,               0.892
random,     32,   31.987,   39.591,               0.808
random,     64,   49.282,   71.487,               0.689
random,    128,    82.23,  145.364,               0.566
random,    256,  152.209,  298.434,                0.51

Co-authored-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-05-23 10:38:40 -05:00
aarch64 aarch64: Move ld.so _start to separate file and drop _dl_skip_args 2022-05-17 10:14:03 +01:00
alpha rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args const 2022-05-17 10:14:03 +01:00
arc rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args const 2022-05-17 10:14:03 +01:00
arm rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args const 2022-05-17 10:14:03 +01:00
csky rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args const 2022-05-17 10:14:03 +01:00
generic elf: Optimize _dl_new_hash in dl-new-hash.h 2022-05-23 10:38:40 -05:00
gnu Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
hppa elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC 2022-04-26 09:26:22 -07:00
htl htl: Fix initializing the key lock 2022-02-14 19:29:02 +01:00
hurd hurd: Fix pthread_kill on exiting/ted thread 2022-01-15 15:11:54 +01:00
i386 i386: Regenerate ulps 2022-04-26 10:52:41 -04:00
ia64 rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args const 2022-05-17 10:14:03 +01:00
ieee754 math: Use builtin for ldbl-96 copysign 2022-04-07 14:54:14 -03:00
m68k m68k: Use an autoconf template to produce `preconfigure' 2022-05-13 17:07:23 +01:00
mach linux: Add P_PIDFD 2022-05-17 10:34:36 -03:00
microblaze elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC 2022-04-26 09:26:22 -07:00
mips MIPS: Use an autoconf template to produce `preconfigure' 2022-05-13 17:07:23 +01:00
nios2 rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args const 2022-05-17 10:14:03 +01:00
nptl nptl: Add backoff mechanism to spinlock loop 2022-05-09 14:38:40 -07:00
or1k elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC 2022-04-26 09:26:22 -07:00
posix gmon: Remove unused sprofil.c functions 2022-03-23 14:29:25 -03:00
powerpc powerpc32: Remove unused HAVE_PPC_SECURE_PLT 2022-05-02 08:55:36 -07:00
pthread nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) 2022-04-14 12:48:31 -03:00
riscv RISC-V: Use an autoconf template to produce `preconfigure' 2022-05-13 17:07:23 +01:00
s390 S390: Enable static PIE 2022-05-18 14:31:26 +02:00
sh elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC 2022-04-26 09:26:22 -07:00
sparc rtld: Remove DL_ARGV_NOT_RELRO and make _dl_skip_args const 2022-05-17 10:14:03 +01:00
unix linux: Add tst-pidfd.c 2022-05-17 10:36:59 -03:00
wordsize-32 Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
wordsize-64 Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
x86 elf: Optimize _dl_new_hash in dl-new-hash.h 2022-05-23 10:38:40 -05:00
x86_64 x86_64: Remove bzero optimization 2022-05-16 09:36:06 -03:00