Commit graph

38939 commits

Author SHA1 Message Date
Paul E. Murphy aa13fd1618 nptl_db: disable DT_RELR on libthread_db.so
Some nptl tests inadvertently use the host's gdb to verify
libthread_db.so, which is loaded with the host's runtime.  This causes
a couple of test failures when the host glibc does not support DT_RELR.

The not correct, but simple, workaround is to build without DT_RELR
as this library is otherwise likely to load on glibc 2.17 and newer
today.

This allows tst-pthread-gdb-attach{,-static} to continue working
when testing on a gdb loaded with an older glibc.

This avoids a failure in tst-pthread-gdb-attach similar to:

  Trying host libthread_db library: .../build/glibc/nptl_db/libthread_db.so.1.
  dlopen failed: /lib64/libc.so.6: version `GLIBC_ABI_DT_RELR' not found (required by .../build/glibc/nptl_db/libthread_db.so.1).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-06-08 11:17:47 -05:00
Andreas Schwab c2f39be490 elf: add missing newlines in lateglobal test 2022-06-08 15:28:41 +02:00
Adhemerval Zanella c7d36dcecc nptl: Fix __libc_cleanup_pop_restore asynchronous restore (BZ#29214)
This was due a wrong revert done on 404656009b.

Checked on x86_64-linux-gnu.
2022-06-08 09:23:02 -03:00
Noah Goldstein c28db9cb29 x86: ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST expect no transactions
Give fall-through path to `vzeroupper` and taken-path to `vzeroall`.

Generally even on machines with RTM the expectation is the
string-library functions will not be called in transactions.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:10:32 -07:00
Noah Goldstein 56da3fe1dd x86: Shrink code size of memchr-evex.S
This is not meant as a performance optimization. The previous code was
far to liberal in aligning targets and wasted code size unnecissarily.

The total code size saving is: 64 bytes

There are no non-negligible changes in the benchmarks.
Geometric Mean of all benchmarks New / Old: 1.000

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:10:32 -07:00
Noah Goldstein 6dcbb7d95d x86: Shrink code size of memchr-avx2.S
This is not meant as a performance optimization. The previous code was
far to liberal in aligning targets and wasted code size unnecissarily.

The total code size saving is: 59 bytes

There are no major changes in the benchmarks.
Geometric Mean of all benchmarks New / Old: 0.967

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:10:31 -07:00
Noah Goldstein af5306a735 x86: Optimize memrchr-avx2.S
The new code:
    1. prioritizes smaller user-arg lengths more.
    2. optimizes target placement more carefully
    3. reuses logic more
    4. fixes up various inefficiencies in the logic. The biggest
       case here is the `lzcnt` logic for checking returns which
       saves either a branch or multiple instructions.

The total code size saving is: 306 bytes
Geometric Mean of all benchmarks New / Old: 0.760

Regressions:
There are some regressions. Particularly where the length (user arg
length) is large but the position of the match char is near the
beginning of the string (in first VEC). This case has roughly a
10-20% regression.

This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). This case has roughly
a 15-45% speedup.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:10:27 -07:00
Noah Goldstein b4209615a0 x86: Optimize memrchr-evex.S
The new code:
    1. prioritizes smaller user-arg lengths more.
    2. optimizes target placement more carefully
    3. reuses logic more
    4. fixes up various inefficiencies in the logic. The biggest
       case here is the `lzcnt` logic for checking returns which
       saves either a branch or multiple instructions.

The total code size saving is: 263 bytes
Geometric Mean of all benchmarks New / Old: 0.755

Regressions:
There are some regressions. Particularly where the length (user arg
length) is large but the position of the match char is near the
beginning of the string (in first VEC). This case has roughly a
20% regression.

This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). This case has roughly
a 35% speedup.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:10:24 -07:00
Noah Goldstein 731feee386 x86: Optimize memrchr-sse2.S
The new code:
    1. prioritizes smaller lengths more.
    2. optimizes target placement more carefully.
    3. reuses logic more.
    4. fixes up various inefficiencies in the logic.

The total code size saving is: 394 bytes
Geometric Mean of all benchmarks New / Old: 0.874

Regressions:
    1. The page cross case is now colder, especially re-entry from the
       page cross case if a match is not found in the first VEC
       (roughly 50%). My general opinion with this patch is this is
       acceptable given the "coldness" of this case (less than 4%) and
       generally performance improvement in the other far more common
       cases.

    2. There are some regressions 5-15% for medium/large user-arg
       lengths that have a match in the first VEC. This is because the
       logic was rewritten to optimize finds in the first VEC if the
       user-arg length is shorter (where we see roughly 20-50%
       performance improvements). It is not always the case this is a
       regression. My intuition is some frontend quirk is partially
       explaining the data although I haven't been able to find the
       root cause.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:09:36 -07:00
Noah Goldstein d0370d992e Benchtests: Improve memrchr benchmarks
Add a second iteration for memrchr to set `pos` starting from the end
of the buffer.

Previously `pos` was only set relative to the beginning of the
buffer. This isn't really useful for memrchr because the beginning
of the search space is (buf + len).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:09:16 -07:00
Noah Goldstein dd5c483b25 x86: Add COND_VZEROUPPER that can replace vzeroupper if no ret
The RTM vzeroupper mitigation has no way of replacing inline
vzeroupper not before a return.

This can be useful when hoisting a vzeroupper to save code size
for example:

```
L(foo):
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	VZEROUPPER_RETURN

L(bar):
	xorl	%eax, %eax
	VZEROUPPER_RETURN
```

Can become:

```
L(foo):
	COND_VZEROUPPER
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	ret

L(bar):
	xorl	%eax, %eax
	ret
```

This code does not change any existing functionality.

There is no difference in the objdump of libc.so before and after this
patch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:08:28 -07:00
Noah Goldstein 8a780a6b91 x86: Create header for VEC classes in x86 strings library
This patch does not touch any existing code and is only meant to be a
tool for future patches so that simple source files can more easily be
maintained to target multiple VEC classes.

There is no difference in the objdump of libc.so before and after this
patch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-07 13:08:28 -07:00
Matheus Castanho 0218463dd8 powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197]
__strncpy_power9 initializes VR 18 with zeroes to be used throughout the
code, including when zero-padding the destination string. However, the
v18 reference was mistakenly being used for stxv and stxvl, which take a
VSX vector as operand. The code ended up using the uninitialized VSR 18
register by mistake.

Both occurrences have been changed to use the proper VSX number for VR 18
(i.e. VSR 50).

Tested on powerpc, powerpc64 and powerpc64le.

Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>
2022-06-07 15:07:25 -03:00
Wilco Dijkstra eea282d9c6 AArch64: Sort makefile entries
Sort makefile entries to reduce conflicts.
2022-06-07 16:58:15 +01:00
Wilco Dijkstra 9f298bfe1f AArch64: Add SVE memcpy
Add an initial SVE memcpy implementation.  Copies up to 32 bytes use SVE
vectors which improves the random memcpy benchmark significantly.
Cleanup the memcpy and memmove ifunc selectors.
2022-06-07 16:58:03 +01:00
Raghuveer Devulapalli 5082a287d5 x86_64: Add strstr function with 512-bit EVEX
Adding a 512-bit EVEX version of strstr. The algorithm works as follows:

(1) We spend a few cycles at the begining to peek into the needle. We
locate an edge in the needle (first occurance of 2 consequent distinct
characters) and also store the first 64-bytes into a zmm register.

(2) We search for the edge in the haystack by looking into one cache
line of the haystack at a time. This avoids having to read past a page
boundary which can cause a seg fault.

(3) If an edge is found in the haystack we first compare the first
64-bytes of the needle (already stored in a zmm register) before we
proceed with a full string compare performed byte by byte.

Benchmarking results: (old = strstr_sse2_unaligned, new = strstr_avx512)

Geometric mean of all benchmarks: new / old =  0.66

Difficult skiptable(0) : new / old =  0.02
Difficult skiptable(1) : new / old =  0.01
Difficult 2-way : new / old =  0.25
Difficult testing first 2 : new / old =  1.26
Difficult skiptable(0) : new / old =  0.05
Difficult skiptable(1) : new / old =  0.06
Difficult 2-way : new / old =  0.26
Difficult testing first 2 : new / old =  1.05
Difficult skiptable(0) : new / old =  0.42
Difficult skiptable(1) : new / old =  0.24
Difficult 2-way : new / old =  0.21
Difficult testing first 2 : new / old =  1.04
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-06 19:46:55 -07:00
Adhemerval Zanella 8521001731 scripts/glibcelf.py: Add PT_AARCH64_MEMTAG_MTE constant
It was added in commit 603e5c8ba7.
This caused the elf/tst-glibcelf consistency check to fail.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-06-06 15:56:48 -03:00
Dmitriy Fedchenko 999835533b socket: Fix mistyped define statement in socket/sys/socket.h (BZ #29225) 2022-06-06 12:46:14 -03:00
Joseph Myers 828c72519f Declare timegm for ISO C2X
The next revision of the ISO C standard has added the timegm function
(that was already supported in glibc).  Update the feature test
conditionals on its declaration in <time.h> accordingly.

Tested for x86_64.
2022-06-06 14:47:03 +00:00
Joseph Myers 603e5c8ba7 Add PT_AARCH64_MEMTAG_MTE from Linux 5.18 to elf.h
Linux 5.18 defines a new AArch64 ELF segment type
PT_AARCH64_MEMTAG_MTE; add it to elf.h.

Tested with build-many-glibcs.py for aarch64-linux-gnu.
2022-06-06 14:45:34 +00:00
Sam James 7df596a58c grep: egrep -> grep -E, fgrep -> grep -F
Newer versions of GNU grep (after grep 3.7, not inclusive) will warn on
'egrep' and 'fgrep' invocations.

Convert usages within the tree to their expanded non-aliased counterparts
to avoid irritating warnings during ./configure and the test suite.

Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Fangrui Song <maskray@google.com>
2022-06-05 12:09:02 -07:00
H.J. Lu 3c23fa9f44 string.h: Fix boolean spelling in comments 2022-06-03 10:22:38 -07:00
Carlos O'Donell 48f4b30780 elf: Add #include <errno.h> for use of E* constants.
In __strerror_r we use errno constants and must include errno.h.

Tested on x86_64 and i686 without regression.
2022-06-02 15:20:36 -04:00
Carlos O'Donell 62c888b337 elf: Add #include <sys/param.h> for MAX usage.
In _dl_audit_pltenter we use MAX and so need to include param.h.

Tested on x86_64 and i686 without regression.
2022-06-02 15:20:36 -04:00
Adhemerval Zanella 1002f1af1c linux: Add process_mrelease
Added in Linux 5.15 (884a7e5964e06ed93c7771c0d7cf19c09a8946f1), the new
syscalls allows a caller to free the memory of a dying target process.

Checked on x86_64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-06-02 15:43:28 -03:00
Adhemerval Zanella d19ee3473d linux: Add process_madvise
It was added on Linux 5.10 (ecb8ac8b1f146915aa6b96449b66dd48984caacc)
with the same functionality as madvise but using a pidfd of the target
process.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-06-02 15:43:28 -03:00
Adhemerval Zanella 7d3e91ba19 linux: Set tst-pidfd-consts unsupported for kernels headers older than 5.10
Instead of fail trying to build the compare source file.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
2022-06-02 15:43:25 -03:00
Florian Weimer bb8887379f testrun.sh: Support passing strace and valgrind arguments
This is a bit of a hack, but it works quite well in practice.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-06-02 18:37:30 +02:00
Florian Weimer 4b527650e0 Linux: Adjust struct rseq definition to current kernel version
This definition is only used as a fallback with old kernel headers.
The change follows kernel commit bfdf4e6208051ed7165b2e92035b4bf11
("rseq: Remove broken uapi field layout on 32-bit little endian").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-06-02 16:29:59 +02:00
Adhemerval Zanella c789e6e409 iconv: Use 64 bit stat for gconv_parseconfdir (BZ# 29213)
The issue is only when used within libc.so (iconvconfig already builds
with _TIME_SIZE=64).

This is a missing spot initially from 52a5fe70a2.

Checked on i686-linux-gnu.
2022-06-01 13:23:16 -03:00
Adhemerval Zanella 634f566c3e catgets: Use 64 bit stat for __open_catalog (BZ# 29211)
This is a missing spot initially from 52a5fe70a2.

Checked on i686-linux-gnu.
2022-06-01 13:23:16 -03:00
Adhemerval Zanella 3cd4785ea0 inet: Use 64 bit stat for ruserpass (BZ# 29210)
This is a missing spot initially from 52a5fe70a2.

Checked on i686-linux-gnu.
2022-06-01 13:23:16 -03:00
Adhemerval Zanella 87f1ec12e7 socket: Use 64 bit stat for isfdtype (BZ# 29209)
This is a missing spot initially from 52a5fe70a2.

Checked on i686-linux-gnu.
2022-06-01 13:23:16 -03:00
Adhemerval Zanella 6e7137f28c posix: Use 64 bit stat for fpathconf (_PC_ASYNC_IO) (BZ# 29208)
This is a missing spot initially from 52a5fe70a2.

Checked on i686-linux-gnu.
2022-06-01 13:23:16 -03:00
Adhemerval Zanella 574ba60fc8 posix: Use 64 bit stat for posix_fallocate fallback (BZ# 29207)
This is a missing spot initially from 52a5fe70a2.

Checked on i686-linux-gnu.
2022-06-01 13:23:16 -03:00
Adhemerval Zanella ec995fb215 misc: Use 64 bit stat for getusershell (BZ# 29203)
This is a missing spot initially from 52a5fe70a2.

Checked on i686-linux-gnu.
2022-06-01 13:23:16 -03:00
Adhemerval Zanella 3fbc33010c misc: Use 64 bit stat for daemon (BZ# 29203)
This is a missing spot initially from 52a5fe70a2.

Checked on i686-linux-gnu.
2022-06-01 13:23:13 -03:00
WANG Xuerui e6547d635b linux: use statx for fstat if neither newfstatat nor fstatat64 is present
LoongArch is going to be the first architecture supported by Linux that
has neither fstat* nor newfstatat [1], instead exclusively relying on
statx. So in fstatat64's implementation, we need to also enable statx
usage if neither fstatat64 nor newfstatat is present, to prepare for
this new case of kernel ABI.

[1]: https://lore.kernel.org/all/20220518092619.1269111-1-chenhuacai@loongson.cn/

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-06-01 12:29:01 -03:00
Joseph Myers de3501d60f Add MADV_DONTNEED_LOCKED from Linux 5.18 to bits/mman-linux.h
Linux 5.18 adds a constant MADV_DONTNEED_LOCKED (defined in multiple
header files, but with the same value on all architectures).  Add this
constant to bits/mman-linux.h.

Tested for x86_64.
2022-06-01 14:45:48 +00:00
Joseph Myers 9d03bac7e7 Add HWCAP2_MTE3 from Linux 5.18 to AArch64 bits/hwcap.h
Linux 5.18 defines a new AArch64 HWCAP value HWCAP2_MTE3; add it to
glibc's sysdeps/unix/sysv/linux/aarch64/bits/hwcap.h.

Tested with build-many-glibcs.py for aarch64-linux-gnu.
2022-06-01 14:43:06 +00:00
Adhemerval Zanella 5a6f2cabb6 i686: Use generic sincosf implementation for SSE2 version
The generic implementation shows slight better performance
(gcc 11.2.1 on a Ryzen 9 5900X):

* s_sincosf-sse2.S:
  "sincosf": {
   "workload-random": {
    "duration": 3.89961e+09,
    "iterations": 9.5472e+07,
    "reciprocal-throughput": 40.8429,
    "latency": 40.8483,
    "max-throughput": 2.4484e+07,
    "min-throughput": 2.44808e+07
   }
  }

* generic s_cossinf.c:
  "sincosf": {
   "workload-random": {
    "duration": 3.71953e+09,
    "iterations": 1.48512e+08,
    "reciprocal-throughput": 25.0515,
    "latency": 25.0391,
    "max-throughput": 3.99177e+07,
    "min-throughput": 3.99375e+07
   }
  }

Checked on i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-01 10:47:44 -03:00
Adhemerval Zanella dc208f4a53 benchtests: Add workload name for sincosf
So it can show both reciprocal-throughput and latency.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-01 10:47:44 -03:00
Adhemerval Zanella 3323476641 i686: Use generic sinf implementation for SSE2 version
Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X),
the generic algorithm shows slight better performance for
the 'workload-huge.wrf' input set.

* s_sinf-sse2.S:
  "sinf": {
   "": {
    "duration": 3.72405e+09,
    "iterations": 2.38374e+08,
    "max": 63.973,
    "min": 11.211,
    "mean": 15.6227
   },
   "workload-random.wrf": {
    "duration": 3.76923e+09,
    "iterations": 8.4e+07,
    "reciprocal-throughput": 17.6355,
    "latency": 72.108,
    "max-throughput": 5.67037e+07,
    "min-throughput": 1.38681e+07
   },
   "workload-huge.wrf": {
    "duration": 3.76943e+09,
    "iterations": 6e+07,
    "reciprocal-throughput": 29.3493,
    "latency": 96.2985,
    "max-throughput": 3.40724e+07,
    "min-throughput": 1.03844e+07
   }
  }

* generic s_sinf.c:
  "sinf": {
   "": {
    "duration": 3.70989e+09,
    "iterations": 2.18025e+08,
    "max": 69.782,
    "min": 11.1,
    "mean": 17.0159
   },
   "workload-random.wrf": {
    "duration": 3.77213e+09,
    "iterations": 9.6e+07,
    "reciprocal-throughput": 17.5402,
    "latency": 61.0459,
    "max-throughput": 5.70119e+07,
    "min-throughput": 1.63811e+07
   },
   "workload-huge.wrf": {
    "duration": 3.81576e+09,
    "iterations": 5.6e+07,
    "reciprocal-throughput": 38.2111,
    "latency": 98.0659,
    "max-throughput": 2.61704e+07,
    "min-throughput": 1.01972e+07
   }
  }

Checked on i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-01 10:47:44 -03:00
Adhemerval Zanella da39afa4ff i686: Use generic cosf implementation for SSE2 version
Performance seems to be similar (gcc 11.2.1 on a Ryzen 9 5900X):

* s_cosf-sse2.S:
  "cosf": {
   "workload-random": {
    "duration": 3.74987e+09,
    "iterations": 9.616e+07,
    "reciprocal-throughput": 15.8141,
    "latency": 62.1782,
    "max-throughput": 6.32346e+07,
    "min-throughput": 1.60828e+07
   }
  }

* generic s_cosf.c:
  "cosf": {
   "workload-random": {
    "duration": 3.87298e+09,
    "iterations": 1.00968e+08,
    "reciprocal-throughput": 18.3448,
    "latency": 58.3722,
    "max-throughput": 5.45113e+07,
    "min-throughput": 1.71314e+07
   }
  }

Checked on i686-linux-gnu.
2022-06-01 10:47:44 -03:00
Adhemerval Zanella c1176b62a9 benchtests: Add workload name for cosf
So it can show both reciprocal-throughput and latency.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-01 10:47:44 -03:00
Andreas Schwab dc1e5eeb25 x86_64: Optimize sincos where sin/cos is optimized (bug 29193)
The compiler may substitute calls to sin or cos with calls to sincos, thus
we should have the same optimized implementations for sincos.  The
optimized implementations may produce results that differ, that also makes
sure that the sincos call aggrees with the sin and cos calls.
2022-06-01 10:29:52 +02:00
Andreas Schwab d976d44a89 manual: fix reference to source file 2022-05-31 16:22:49 +02:00
Joseph Myers 6488f4d006 Add SOL_SMC from Linux 5.18 to bits/socket.h
Linux 5.18 adds a constant SOL_SMC to the getsockopt / setsockopt
levels; add this constant to bits/socket.h.

Tested for x86_64.
2022-05-31 13:49:53 +00:00
Adhemerval Zanella 81e7fdd7cc elf: Remove _dl_skip_args
Now that no architecture uses it anymore.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-05-30 16:33:54 -03:00
Adhemerval Zanella ec7bc492b6 x86_64: Remove _dl_skip_args usage
Since ad43cac44a the generic code already shuffles the argv/envp/auxv
on the stack to remove the ld.so own arguments and thus _dl_skip_args
is always 0.   So there is no need to adjust the argc or argv.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-05-30 16:33:34 -03:00