glibc/sysdeps
H.J. Lu c15f8eb50c x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258]
On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
the first 8 vector registers.  The code layout is

  if only %xmm0 - %xmm7 registers are used
     preserve %xmm0 - %xmm7 registers
  if only %ymm0 - %ymm7 registers are used
     preserve %ymm0 - %ymm7 registers
  preserve %zmm0 - %zmm7 registers

Branch prediction always speculatively executes the fallthrough code
path, which preserves the %zmm0 - %zmm7 registers, even when only the
%xmm0 - %xmm7 registers are used.  This speculative use of %zmm
registers lowers the CPU frequency on Skylake server.  This patch
changes the fallthrough code path to preserve the %xmm0 - %xmm7
registers instead:

  if whole %zmm0 - %zmm7 registers are used
     preserve %zmm0 - %zmm7 registers
  if only %ymm0 - %ymm7 registers are used
     preserve %ymm0 - %ymm7 registers
  preserve %xmm0 - %xmm7 registers
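
The reordered checks can be sketched in C as follows.  This is only an
illustration of the decision order, not the trampoline itself: the
STATE_* bits, the save_width enum and choose_save_width are hypothetical
names invented for this sketch, and a C compiler is free to lay out
branches differently from the hand-written assembly.

```c
/* Hypothetical state bits (assumption, not glibc's real constants):
   they say which upper register parts currently hold live data.  */
enum
{
  STATE_YMM = 1 << 0,	/* upper halves of %ymm0 - %ymm7 are in use */
  STATE_ZMM = 1 << 1	/* upper halves of %zmm0 - %zmm7 are in use */
};

/* Number of bytes that must be preserved per vector register.  */
enum save_width
{
  SAVE_XMM = 16,
  SAVE_YMM = 32,
  SAVE_ZMM = 64
};

/* Pick the save width in the order used by the new layout: test for
   the widest registers first, so that the code reached when no test
   matches is the cheapest (%xmm) save.  */
enum save_width
choose_save_width (unsigned int state)
{
  if (state & STATE_ZMM)
    return SAVE_ZMM;		/* whole %zmm0 - %zmm7 are used */
  if (state & STATE_YMM)
    return SAVE_YMM;		/* only %ymm0 - %ymm7 are used */
  return SAVE_XMM;		/* fallthrough: only %xmm0 - %xmm7 */
}
```

With this ordering, a caller that passes 0 (no upper state in use) takes
neither branch and reaches the %xmm case by falling through.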

Tested on Skylake server.

	[BZ #21258]
	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt):
	Define only if _dl_runtime_resolve is defined to
	_dl_runtime_resolve_sse_vex.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
	Fallthrough to _dl_runtime_resolve_sse_vex.
2017-03-21 11:00:12 -07:00
..
aarch64 Add ifunc support for aarch64. 2017-03-15 16:46:26 -07:00
alpha Remove _dl_platform_string 2017-03-14 17:18:52 +01:00
arm Update arm, mips, powerpc-nofpu libm-test-ulps. 2017-02-17 23:10:01 +00:00
generic Remove _dl_platform_string 2017-03-14 17:18:52 +01:00
gnu Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
hppa hppa: Fix setting of __libc_stack_end 2017-03-15 13:37:16 -07:00
i386 Remove _dl_platform_string 2017-03-14 17:18:52 +01:00
ia64 Allow direct use of math_ldbl.h in testsuite. 2017-02-25 10:40:48 -05:00
ieee754 Improve float range reduction accuracy near pi/2 (bug 21094). 2017-03-15 22:00:54 +00:00
init_array Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
m68k m68k: fix 64bit atomic ops 2017-02-01 01:32:31 +01:00
mach hurd: Make send/recv more posixish 2017-03-13 20:41:12 +01:00
microblaze Remove very old libm-test-ulps entries. 2017-01-20 23:58:49 +00:00
mips Remove _dl_platform_string 2017-03-14 17:18:52 +01:00
nacl Narrowing the visibility of libc-internal.h even further. 2017-03-01 20:33:46 -05:00
nios2 New pthread rwlock that is more scalable. 2017-01-10 11:50:17 +01:00
nptl Narrowing the visibility of libc-internal.h even further. 2017-03-01 20:33:46 -05:00
posix Remove the str(n)dup inlines from string/bits/string2.h. Although inlining 2017-03-13 18:45:42 +00:00
powerpc Miscellaneous low-risk changes preparing for _ISOMAC testsuite. 2017-03-01 20:32:50 -05:00
pthread Refer to <signal.h> instead of <pthread.h> in <bits/sigthread.h> 2017-02-28 10:34:15 +01:00
s390 Remove _dl_platform_string 2017-03-14 17:18:52 +01:00
sh sh: Fix building with gcc5/6 2017-03-12 17:29:32 -03:00
sparc Fix sparc64 bits/setjmp.h namespace (bug 21261). 2017-03-18 00:17:25 +00:00
tile tile: Check for pointer add overflow in memchr 2017-01-16 15:44:48 -05:00
unix conformtest: Add x32 XFAILs for mq_attr element types (bug 21279). 2017-03-20 21:30:28 +00:00
wordsize-32 Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wordsize-64 Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
x86 Use CPU_FEATURES_CPU_P to check if AVX is available 2017-03-17 11:38:13 -07:00
x86_64 x86-64: Improve branch predication in _dl_runtime_resolve_avx512_opt [BZ #21258] 2017-03-21 11:00:12 -07:00