Félix Baylac-Jacqué d4488c5096 Initial benchmark: comparing haskell's Text with icu on EN corpus 3 months ago

UTF-8 Benchmark

The idea is to benchmark several UTF-8 aspects of the Haskell text package, namely UTF-8 encoding/decoding and various Unicode casing operations. Hopefully, we'll find ways to improve text's performance.

For the time being, we're comparing the text implementation with the one in the ICU C library. In the future, I also plan to test it against C++ Boost and the Rust standard library.
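To make the benchmarked operations concrete, here is a minimal sketch of the Haskell side: UTF-8 encoding to a file, decoding it back, and a few casing operations. It only uses functions from the text and bytestring packages. The helper names (`decodeBytes`, `encodeText`, `roundTripFile`) and the file name `utf8-sample.tmp` are made up for illustration and are not taken from the benchmark sources in this repository.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- | Decode raw bytes as UTF-8, returning an error value on invalid
-- input instead of throwing an exception.
decodeBytes :: BS.ByteString -> Either UnicodeException T.Text
decodeBytes = TE.decodeUtf8'

-- | Encode a Text value to UTF-8 bytes.
encodeText :: T.Text -> BS.ByteString
encodeText = TE.encodeUtf8

-- | Hypothetical helper: write a Text to a file as UTF-8, then read the
-- bytes back and decode them again.
roundTripFile :: FilePath -> T.Text -> IO (Either UnicodeException T.Text)
roundTripFile path txt = do
  BS.writeFile path (encodeText txt)
  decodeBytes <$> BS.readFile path

main :: IO ()
main = do
  let sample = "Straße, 中文, français, русский" :: T.Text
  result <- roundTripFile "utf8-sample.tmp" sample
  case result of
    Left err -> putStrLn ("invalid UTF-8: " ++ show err)
    Right txt -> do
      -- The decoded text should match what was written.
      print (txt == sample)
      -- Character count versus encoded byte count.
      print (T.length txt, BS.length (encodeText txt))
      -- Unicode casing operations provided by text.
      mapM_ (print . ($ txt)) [T.toUpper, T.toLower, T.toCaseFold]
```

Going through a strict `ByteString` keeps file I/O and decoding as separate steps, which makes it easier to time the decoder on its own. Whether the actual benchmark programs are structured this way is an assumption.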

Implemented so far

  • UTF-8 decoding from file.
    • English
    • Chinese
    • French
    • Russian
  • UTF-8 encoding to file.
    • English
    • Chinese
    • French
    • Russian

Usage

nix-shell
make all

Findings

UTF-8 Decoding

hyperfine ./haskell-read-utf8 ./icu-read-utf8
Benchmark #1: ./haskell-read-utf8
  Time (mean ± σ):      23.3 ms ±   0.9 ms    [User: 14.9 ms, System: 8.3 ms]
  Range (min … max):    22.0 ms …  26.1 ms    111 runs

Benchmark #2: ./icu-read-utf8
  Time (mean ± σ):      12.5 ms ±   0.8 ms    [User: 7.6 ms, System: 4.9 ms]
  Range (min … max):    11.5 ms …  16.1 ms    176 runs

Summary
  './icu-read-utf8' ran
    1.85 ± 0.14 times faster than './haskell-read-utf8'
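
The summary figure can be roughly reproduced from the two means and standard deviations above. The sketch below assumes the ratio is the quotient of the mean times and that its uncertainty follows standard first-order error propagation for a quotient; the `Timing` type and `speedup` function are hypothetical names introduced for this example.

```haskell
module Main (main) where

import Text.Printf (printf)

-- | Mean and standard deviation of one benchmark, in milliseconds.
data Timing = Timing { mean :: Double, stddev :: Double }

-- | Ratio of the slower mean to the faster mean, with the uncertainty
-- obtained by adding the relative errors in quadrature.
speedup :: Timing -> Timing -> (Double, Double)
speedup slow fast = (r, dr)
  where
    r  = mean slow / mean fast
    dr = r * sqrt (relErr slow ^ (2 :: Int) + relErr fast ^ (2 :: Int))
    relErr t = stddev t / mean t

main :: IO ()
main = do
  -- Values copied from the hyperfine output above.
  let haskellRead = Timing 23.3 0.9
      icuRead     = Timing 12.5 0.8
      (r, dr)     = speedup haskellRead icuRead
  printf "%.2f +/- %.2f times faster\n" r dr
```

With the rounded values shown in the report this gives about 1.86 ± 0.14, close to the 1.85 ± 0.14 in the summary. The small difference is presumably because the report rounds the means to one decimal place before printing them.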