Tensile, part of the ROCm stack, is a development toolkit for tuning GEMM operations on GPUs via benchmarks and then creating backend libraries for GEMM applications (such as rocBLAS).


dev-util/Tensile installs the Python scripts for running benchmarks, analyzing data, and building backend libraries. It also ships various common benchmark configurations and shell scripts, as well as the C++ source code for building tensile_client.


Install dev-util/Tensile:

root #emerge --ask dev-util/Tensile


Running benchmarks to stretch GPU GEMM performance

Please refer to the official Tensile wiki for how to write benchmark configurations and run benchmarks. Because the Gentoo package already installs the Tensile command, no extra installation is needed; just execute:

user $Tensile [-v] --global-parameters=Architecture=<your GPU arch> <benchmark_config.yaml> <benchmark_directory>
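
If the GPU architecture name is not known offhand, it can usually be read from the output of rocminfo. The following is a minimal sketch, not part of the package: it assumes rocminfo is installed and reports architecture names of the form gfx followed by hexadecimal digits, and the helper names first_gfx_arch and tensile_cmd are hypothetical. The second helper only prints the resulting command line so it can be inspected before being run.

```shell
#!/bin/sh
# Hypothetical helper: extract the first gfx architecture name
# (e.g. gfx906, gfx90a) from rocminfo-style text on standard input.
first_gfx_arch() {
    grep -oE 'gfx[0-9a-f]+' | head -n1
}

# Hypothetical helper: print (not run) the Tensile command line for
# a given architecture, benchmark config, and output directory.
tensile_cmd() {
    printf 'Tensile --global-parameters=Architecture=%s %s %s\n' "$1" "$2" "$3"
}

# Example usage (requires rocminfo; file names are placeholders):
#   arch=$(rocminfo | first_gfx_arch)
#   tensile_cmd "$arch" benchmark_config.yaml benchmark_directory
```

On systems with more than one GPU, only the first reported architecture is picked up, so check the printed command before executing it.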