Tensile, part of the ROCm stack, is a development toolkit for tuning GEMM operations on GPUs via benchmarks and then creating backend libraries for GEMM applications (rocBLAS).
dev-util/Tensile installs the Python scripts for running benchmarks, analyzing data, and building backend libraries. It also ships various common benchmark configurations and shell scripts, as well as the C++ source code for building tensile_client.
emerge --ask dev-util/Tensile
Running benchmarks to stretch GPU GEMM performance
Please refer to the official Tensile wiki for how to write benchmark configurations and run benchmarks. Gentoo already installs the Tensile command, so no extra installation is needed; just execute:
Tensile [-v] --global-parameters=Architecture=<your GPU arch> <benchmark_config.yaml> <benchmark_directory>
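As an illustration of how the placeholders above are filled in, the following is a minimal sketch of a shell helper that only prints the resulting command line so it can be inspected before running. The function name tensile_cmd, the architecture gfx906, the file my_gemm.yaml, and the directory ./tensile_bench are all hypothetical example values, not part of Tensile itself.

```shell
# Hypothetical helper (not shipped by dev-util/Tensile): prints the Tensile
# command line instead of executing it, so the arguments can be checked first.
# Usage: tensile_cmd <gpu_arch> <benchmark_config.yaml> <benchmark_directory>
tensile_cmd() {
    arch=$1      # GPU architecture string, e.g. gfx906 (example value)
    config=$2    # path to the benchmark configuration file
    outdir=$3    # directory that will hold the benchmark output
    printf 'Tensile --global-parameters=Architecture=%s %s %s\n' \
        "$arch" "$config" "$outdir"
}

# Example values are assumptions; substitute your own architecture and paths.
tensile_cmd gfx906 my_gemm.yaml ./tensile_bench
```

Once the printed line looks right, it can be run directly. The architecture string for an installed GPU is usually reported by rocminfo, if that tool is available on the system.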