# UTF-8 Benchmark
The idea is to benchmark several UTF-8 aspects of the Haskell `text` package, namely UTF-8 encoding/decoding and various Unicode casing operations. Hopefully, this will reveal ways to improve `text`'s performance.
For the time being, the `text` implementation is compared against the ICU C library. In the future I also plan to test it against C++ Boost and the Rust standard library.
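
To make the decoding benchmark concrete, here is a minimal sketch of what a "decode a UTF-8 file with `text`" program could look like. It is an illustration only, not the repository's actual `haskell-read-utf8` source: the function name `decodedLength` and the command-line path argument are assumptions made for this example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import System.Environment (getArgs)

-- | Hypothetical helper: read a file as raw bytes, decode it as UTF-8,
-- and return the number of code points in the result.
decodedLength :: FilePath -> IO Int
decodedLength path = do
  bytes <- BS.readFile path
  -- decodeUtf8 throws on invalid input, which is acceptable here because
  -- the benchmark corpus is assumed to be valid UTF-8.
  -- ($!) forces the length, so the decode has run before we return.
  pure $! T.length (TE.decodeUtf8 bytes)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> decodedLength path >>= print
    _      -> putStrLn "usage: read-utf8 FILE"
```

Printing the length gives the program an observable result that depends on the decoded text, so the decode cannot be skipped as unused work.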
# Implemented so far
- [x] UTF-8 decoding from file.
  - [x] English
  - [ ] Chinese
  - [ ] French
  - [ ] Russian
- [ ] UTF-8 encoding to file.
  - [ ] English
  - [ ] Chinese
  - [ ] French
  - [ ] Russian
# Usage
```bash
nix-shell
make all
```
# Findings
## UTF-8 Decoding
```
hyperfine ./haskell-read-utf8 ./icu-read-utf8
Benchmark #1: ./haskell-read-utf8
  Time (mean ± σ):      23.3 ms ±   0.9 ms    [User: 14.9 ms, System: 8.3 ms]
  Range (min … max):    22.0 ms …  26.1 ms    111 runs

Benchmark #2: ./icu-read-utf8
  Time (mean ± σ):      12.5 ms ±   0.8 ms    [User: 7.6 ms, System: 4.9 ms]
  Range (min … max):    11.5 ms …  16.1 ms    176 runs

Summary
  './icu-read-utf8' ran
    1.85 ± 0.14 times faster than './haskell-read-utf8'
```
```
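
The summary line can be roughly reproduced from the two reported means and standard deviations. The sketch below assumes the usual first-order error propagation for a ratio of two independent measurements; the function name `relativeSpeed` is made up for this example, and this is not a claim about how hyperfine is implemented internally.

```haskell
-- | Ratio of two mean timings, with first-order error propagation,
-- treating the two measurements as independent.
-- Each argument is a (mean, standard deviation) pair in the same unit.
relativeSpeed :: (Double, Double) -> (Double, Double) -> (Double, Double)
relativeSpeed (slowMean, slowSd) (fastMean, fastSd) = (ratio, ratioSd)
  where
    ratio   = slowMean / fastMean
    relSlow = slowSd / slowMean   -- relative error of the slower run
    relFast = fastSd / fastMean   -- relative error of the faster run
    ratioSd = ratio * sqrt (relSlow * relSlow + relFast * relFast)

main :: IO ()
main =
  -- Rounded figures taken from the hyperfine output above (milliseconds).
  print (relativeSpeed (23.3, 0.9) (12.5, 0.8))
```

With the rounded inputs this gives roughly 1.86 ± 0.14. The small difference from the reported 1.85 is presumably because hyperfine works from the unrounded means.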