User:Gso321/Benchmarks

These are custom benchmarks that I run on my Gentoo Linux system.

LLVM and Clang with BOLT-PGO.cmake

Note
Building LLVM took more than 4 hours with 8 jobs on my machine. Output of uname -a: Linux tux 6.6.67-gentoo-gentoo-dist #1 SMP PREEMPT_DYNAMIC Wed Jan 8 20:53:16 EST 2025 x86_64 Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz GenuineIntel GNU/Linux.

Create a build directory inside the llvm-project checkout:

user $cd llvm-project
user $mkdir build
user $cd build

Run the commands to build LLVM:

user $cmake -S ../llvm -G Ninja -C ../clang/cmake/caches/BOLT-PGO.cmake \
    -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
    -DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
    -DPGO_INSTRUMENT_LTO=Thin \
    -DCMAKE_INSTALL_PREFIX="/home/kael/llvm-project/bin" \
    -DLLVM_TARGETS_TO_BUILD=X86 \
    -DLLVM_ENABLE_ASSERTIONS=OFF \
    -DLLVM_ENABLE_PROJECTS="bolt;clang;lld;polly" \
    -DCMAKE_C_FLAGS="-O3 -march=native -pipe -fmerge-all-constants -fpointer-tbaa" \
    -DCMAKE_CXX_FLAGS="-O3 -march=native -pipe -fmerge-all-constants -fpointer-tbaa" \
    -DCMAKE_C_COMPILER="clang" \
    -DCMAKE_CXX_COMPILER="clang++" \
    -DLLVM_ENABLE_LTO="Thin" \
    -DLLVM_ENABLE_LLD="true"
user $ninja -j8

Both kernel builds started from a clean tree and were timed with the commands:

user $make clean
user $time make LLVM=1

The original LLVM is llvm-core/llvm 19.1.4, built with -O2 -march=native -pipe. When building the 6.6.67-gentoo kernel with it, the time command showed:

real 53m3.855s
user 55m38.382s
sys 2m34.281s

Building the kernel using LLVM 20.0.0git 14b44179cb61dd551c911dea54de57b588621005 showed:

real 48m18.362s
user 50m38.903s

The new LLVM cuts real (wall-clock) time by about 9%, roughly 4m45s.
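
The 9% figure can be re-derived from the two real times above. The following is a minimal sketch using POSIX sh and awk; the function names to_seconds and percent_saved are made up for this illustration and assume durations in the XmY.YYYs form that time prints.

```shell
#!/bin/sh
# Convert a `time`-style duration such as 53m3.855s into seconds.
# (Helper name is hypothetical; it only handles the XmY.YYYs form.)
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Percentage of wall-clock time saved going from $1 (baseline) to $2 (new).
percent_saved() {
    awk -v old="$(to_seconds "$1")" -v new="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

# Real times from the two kernel builds above.
percent_saved 53m3.855s 48m18.362s   # prints 9.0
```

The same function can be pointed at the user times to compare CPU time rather than wall-clock time.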

BOLT Zig

user $mkdir build
user $cd build
user $cmake .. -DZIG_NO_LIB=ON -GNinja -DCMAKE_BUILD_TYPE=Debug
user $ninja install


Remove .zig-cache before every zig run, so that each run starts with a cold cache.
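
The contents of zig-normal.sh and zig-bolt.sh are not shown on this page. As a rough sketch of how a wrapper could enforce the cold-cache rule, the hypothetical function run_clean below deletes .zig-cache in the current directory and then runs whatever command it is given; the function name and the way the command is passed in are assumptions for illustration, not the scripts actually used.

```shell
#!/bin/sh
# Hypothetical helper: start every measured run from a cold cache.
# Removes .zig-cache from the current directory, then runs the
# command passed as arguments and returns its exit status.
run_clean() {
    rm -rf .zig-cache
    "$@"
}

# Example use inside a wrapper script (the command is a placeholder):
#   run_clean /path/to/zig build
```

Keeping the cache removal inside the wrapper means the deletion time is included in each measured run, which affects both variants equally.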

Results:

Benchmark 1 (3 runs): ./zig-normal.sh

 measurement          mean ± σ            min … max           outliers         delta
 wall_time           399s  ± 13.8s      385s  …  412s           0 ( 0%)        0%
 peak_rss           5.67GB ±  590MB    4.99GB … 6.01GB          0 ( 0%)        0%
 cpu_cycles         1.38T  ± 4.15G     1.37T  … 1.38T           0 ( 0%)        0%
 instructions       2.42T  ±  591M     2.42T  … 2.42T           0 ( 0%)        0%
 cache_references   24.7G  ±  166M     24.6G  … 24.9G           0 ( 0%)        0%
 cache_misses       2.05G  ± 5.98M     2.05G  … 2.06G           0 ( 0%)        0%
 branch_misses      3.57G  ± 5.76M     3.56G  … 3.57G           0 ( 0%)        0%

Benchmark 2 (3 runs): ./zig-bolt.sh

 measurement          mean ± σ            min … max           outliers         delta
 wall_time           344s  ± 3.62s      341s  …  348s           0 ( 0%)        ⚡- 13.8% ±  5.7%
 peak_rss           6.02GB ± 13.4MB    6.01GB … 6.04GB          0 ( 0%)          +  6.2% ± 16.7%
 cpu_cycles         1.21T  ±  309M     1.21T  … 1.21T           0 ( 0%)        ⚡- 12.0% ±  0.5%
 instructions       2.32T  ±  249M     2.32T  … 2.32T           0 ( 0%)        ⚡-  4.1% ±  0.0%
 cache_references   16.7G  ± 50.6M     16.6G  … 16.7G           0 ( 0%)        ⚡- 32.6% ±  1.1%
 cache_misses       2.05G  ± 15.4M     2.03G  … 2.06G           0 ( 0%)          -  0.2% ±  1.3%
 branch_misses      3.36G  ± 8.38M     3.35G  … 3.37G           0 ( 0%)        ⚡-  5.9% ±  0.5%