According to the official ROCm documentation (v5.4.3), "ROCm is a brand name for ROCm open software platform (for software) or the ROCm™ open platform ecosystem (includes hardware like FPGAs or other CPU architectures)."
Within the Gentoo distribution, "ROCm" refers to the ROCm open software platform, which currently supports AMDGPU as its hardware.
Note that ROCm itself aims to be an environment for heterogeneous computing and is not limited to AMDGPU. It is Gentoo's current packaging strategy that ROCm only supports AMDGPU; if ROCm is needed for other vendors (typically the CUDA backend of sci-libs/hip-* packages), please file a bug at Gentoo Bugzilla.
Note what ROCm is not:
- ROCm is not merely "CUDA for AMD GPUs". Although it provides HIP, whose API and syntax are similar to CUDA's, it also provides the OpenCL and OpenMP programming models.
- ROCm is not the only way to run (compute) tasks on AMD GPUs. The ROCm kernel driver is part of the amdgpu Linux driver, and OpenGL, Vulkan, etc. work independently of ROCm.
Components of ROCm
ROCm components can be classified into five categories:
- Drivers and runtimes, provided by the amdgpu kernel module and
- Programming models. See ROCm#Programming_models for details.
- Compilers and tools. Gentoo uses vanilla clang (
- Libraries. Gentoo has packaged most libraries prefixed by
sci-libs/roc* packages are written in HIP and use hipamd as the backend, while
sci-libs/hip* are simple wrappers.
- Deployment tools. For Gentoo users, the best way to deploy common ROCm components is via Portage.
A recent Linux kernel is recommended for a wider range of supported devices, better performance, and proper error handling.
See the amdgpu kernel documentation for detailed information.
The following kernel config options are required:
CONFIG_DRM_AMDGPU
CONFIG_DRM_AMDGPU_USERPTR
CONFIG_HMM_MIRROR
CONFIG_HSA_AMD
CONFIG_ZONE_DEVICE
It will also be checked when emerging
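The required options listed above can also be checked by hand against a kernel config file. The following is a minimal sketch, not part of any Gentoo tool: the function name check_rocm_kconfig is hypothetical, and the config path in the usage comment is an assumption about where the kernel sources live.

```shell
# Report which of the required options are enabled in a kernel config file.
# Hypothetical helper; usage (path is an assumption):
#   check_rocm_kconfig /usr/src/linux/.config
check_rocm_kconfig() {
    local config_file=$1 opt missing=0
    for opt in CONFIG_DRM_AMDGPU CONFIG_DRM_AMDGPU_USERPTR CONFIG_HMM_MIRROR \
               CONFIG_HSA_AMD CONFIG_ZONE_DEVICE; do
        # An option counts as enabled when set to y (built in) or m (module).
        if grep -Eq "^${opt}=(y|m)$" "$config_file"; then
            echo "enabled: ${opt}"
        else
            echo "missing: ${opt}"
            missing=1
        fi
    done
    # Non-zero exit status when at least one option is missing.
    return "$missing"
}
```

The function only reads the file it is given, so it can be pointed at a config that has not been built or booted yet.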
It is recommended to turn on the following to achieve unified memory and managed memory in HIP:
Kernel command line parameters
See the amdgpu kernel parameters documentation for detailed information.
For example, setting
amdgpu.ppfeaturemask=0xffffffff enables all AMDGPU PowerPlay features, which may be useful when adjusting GPU power profiles via
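Whether the parameter actually reached the running kernel can be checked from the kernel command line. The sketch below assumes the parameter was passed on the command line (rather than through a modprobe configuration file); the function name get_ppfeaturemask is hypothetical.

```shell
# Print the value of amdgpu.ppfeaturemask found in a kernel command line
# string, or return non-zero if the parameter is absent.
# Hypothetical helper; usage on a running system:
#   get_ppfeaturemask "$(cat /proc/cmdline)"
get_ppfeaturemask() {
    local rest
    case " $1 " in
        *" amdgpu.ppfeaturemask="*)
            # Keep the text after the last occurrence, then cut at the next space.
            rest=${1##*amdgpu.ppfeaturemask=}
            echo "${rest%% *}"
            ;;
        *)
            return 1
            ;;
    esac
}
```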
System monitoring tools
emerge --ask dev-util/rocm-smi
See OpenCL#AMD for details.
See HIP for details.
To enable OpenMP offloading on AMDGPU, install
sys-libs/libomp with AMDGPU offload enabled.
Set USE flags for the package:
sys-libs/libomp offload LLVM_TARGETS: AMDGPU
emerge --ask sys-libs/libomp
Clang cannot detect the GPU architecture automatically (and when cross-compiling, the target GPU is not present on the build machine), so it needs a script that prints the GPU architecture, saved here as /tmp/print_gpu_arch.sh:
#!/bin/bash
echo "gfx90a" # Change to the target to compile for, but do not append target features such as :xnack-
Make the script executable:
chmod +x /tmp/print_gpu_arch.sh
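The two steps above can be combined into one helper that writes the script for a given architecture and strips any target feature suffix such as :xnack- before doing so. This is a sketch only; the function name write_gpu_arch_script is hypothetical.

```shell
# Write an executable script that prints a GPU architecture name.
# Hypothetical helper; usage:
#   write_gpu_arch_script /tmp/print_gpu_arch.sh gfx90a
write_gpu_arch_script() {
    local script_path=$1
    # Drop target features after the first colon, e.g. "gfx90a:xnack-" -> "gfx90a".
    local arch=${2%%:*}
    printf '#!/bin/bash\necho "%s"\n' "$arch" > "$script_path"
    chmod +x "$script_path"
}
```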
Then compile the OpenMP source:
clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa --libomptarget-amdgcn-bc-path=/usr/lib64/ --amdgpu-arch-tool=/tmp/print_gpu_arch.sh <openmp source code> -o <executable>
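Because the command is long, it may be convenient to assemble it in a small function. The sketch below only prints the command, with the flags copied from the line above in the same order; the function name omp_offload_cmd and the file names in the usage comment are hypothetical.

```shell
# Print the clang command for OpenMP offloading to AMDGPU.
# Hypothetical helper; usage:
#   omp_offload_cmd saxpy.c saxpy
#   omp_offload_cmd saxpy.c saxpy /path/to/other_arch_script.sh
omp_offload_cmd() {
    local src=$1 out=$2 arch_tool=${3:-/tmp/print_gpu_arch.sh}
    # Rebuild the positional parameters as the full command line.
    set -- clang -fopenmp \
        -fopenmp-targets=amdgcn-amd-amdhsa \
        -Xopenmp-target=amdgcn-amd-amdhsa \
        --libomptarget-amdgcn-bc-path=/usr/lib64/ \
        --amdgpu-arch-tool="$arch_tool" \
        "$src" -o "$out"
    # Print the words joined by spaces; the command is not executed here.
    echo "$*"
}
```

Printing rather than executing keeps the helper safe to try on a machine without an AMD GPU; the printed line can be reviewed and then run by hand.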
ROCm currently uses LLVM/Clang as its backend, so any programming model that can generate LLVM IR for AMDGPU can use ROCm. Numba is a JIT compiler for Python code and can offload to ROCm. Gentoo does not yet package Numba with ROCm support.
Currently, Gentoo has packages
sci-libs category. These are math and deep learning libraries written in HIP that run on AMD GPUs.
Wrapper packages, are
hipBLAS (wrapper of
hipCUB (wrapper of
hipDNN is currently not packaged. It's a wrapper of
nccl) provides collective communication routines for AMD GPUs. Its tests can also be run, but they are only meaningful on multi-GPU systems.
sci-libs/rocALUTION (targeting paralution) is currently in development.
Specifying architectures to compile
rocm.eclass (ROCm version >=5.1.3), Gentoo handles the
USE_EXPAND. The mapping between GPUs and architecture names can be viewed by checking the USE flags of a ROCm library:
equery uses rocBLAS
 * Found these USE flags for sci-libs/rocBLAS-5.4.2-r1:
 U I
 - - amdgpu_targets_gfx1010 : RDNA GPU, codename navi10, including Radeon RX 5700XT/5700/5700M/5700B/5700XTB/5600XT/5600/5600M, Radeon Pro 5700XT/5700, Radeon Pro W5700X/W5700
 - - amdgpu_targets_gfx1011 : RDNA GPU, codename navi12, including Radeon Pro 5600M/V520
 - - amdgpu_targets_gfx1012 : RDNA GPU, codename navi14, including Radeon RX 5500XT/5500/5500M/5500XTB/5300/5300M, Radeon Pro 5500XT/5500M/5300/5300M, Radeon Pro W5500X/W5500/W5500M/W5300M
 + - amdgpu_targets_gfx1030 : RDNA2 GPU, codename navi21/sienna cichlid, including Radeon RX 6950XT/6900XT/6800XT/6800, Radeon Pro W6800
 - - amdgpu_targets_gfx1031 : RDNA2 GPU, codename navi22/navy flounder, including Radeon RX 6750XT/6700XT/6800M/6700M
 - - amdgpu_targets_gfx1100 : RDNA3 GPU, codename navi31/plum bonito, including Radeon RX 7900XTX/7900XT
 - - amdgpu_targets_gfx1101 : RDNA3 GPU, codename navi32
 - - amdgpu_targets_gfx1102 : RDNA3 GPU, codename navi33
 - - amdgpu_targets_gfx803  : Fiji GPU, codename fiji, including Radeon R9 Nano/Fury/FuryX, Radeon Pro Duo, FirePro S9300x2, Radeon Instinct MI8
 - - amdgpu_targets_gfx900  : Vega GPU, codename vega10, including Radeon Vega Frontier Edition, Radeon RX Vega 56/64, Radeon RX Vega 64 Liquid, Radeon Pro Vega 48/56/64/64X, Radeon Pro WX 8200/9100, Radeon Pro V320/V340/SSG, Radeon Instinct MI25
 + - amdgpu_targets_gfx906  : Vega GPU, codename vega20, including Radeon (Pro) VII, Radeon Instinct MI50/MI60
 + - amdgpu_targets_gfx908  : CDNA Accelerator, codename arcturus, including AMD Instinct MI100 Accelerator
 + - amdgpu_targets_gfx90a  : CDNA2 Accelerator, codename aldebaran, including AMD Instinct MI200 series Accelerators
 - - benchmark              : Build and install rocblas-bench.
 - - doc                    : Add extra documentation (API, Javadoc, etc). It is recommended to enable per package instead of globally
 - - test                   : Perform rocblas-test to compare the result between rocBLAS and system BLAS.
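To list only the architectures that are currently switched on, the equery output can be filtered. The sketch below assumes equery prints one flag per line, with the U column (final setting for installation) first and the I column (setting of the installed package) second, and that the output is not colorized when piped. The function name enabled_amdgpu_targets is hypothetical.

```shell
# Read `equery uses` output on stdin and print the enabled GPU architectures.
# Hypothetical helper; usage:
#   equery uses rocBLAS | enabled_amdgpu_targets
enabled_amdgpu_targets() {
    # Field 1 is the U column, field 3 is the flag name.
    awk '$1 == "+" && $3 ~ /^amdgpu_targets_/ {
        sub(/^amdgpu_targets_/, "", $3)   # strip the USE_EXPAND prefix
        print $3
    }'
}
```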
By default, the officially supported architectures (gfx906 gfx908 gfx90a gfx1030) are turned on. For example, on a system with a Radeon VII and an RX 6700XT, specify the GPU architectures for all packages:
# disable gfx908, gfx90a, gfx1030; turn on gfx1031; gfx906 remains on
*/* AMDGPU_TARGETS: -gfx908 -gfx90a -gfx1030 gfx1031
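The line can also be derived mechanically from the list of architectures actually present in the system. The sketch below assumes the default-on set is exactly the four architectures named above (it may differ between ROCm versions); the function name amdgpu_targets_line is hypothetical.

```shell
# Print a package.use line that enables exactly the given architectures.
# Hypothetical helper; usage:
#   amdgpu_targets_line gfx906 gfx1031
amdgpu_targets_line() {
    # Assumption: architectures enabled by default, as listed in this section.
    local defaults="gfx906 gfx908 gfx90a gfx1030"
    local wanted="$*" flags="" arch
    # Disable every default architecture that was not requested.
    for arch in $defaults; do
        case " $wanted " in
            *" $arch "*) ;;
            *) flags="$flags -$arch" ;;
        esac
    done
    # Enable every requested architecture that is not on by default.
    for arch in "$@"; do
        case " $defaults " in
            *" $arch "*) ;;
            *) flags="$flags $arch" ;;
        esac
    done
    echo "*/* AMDGPU_TARGETS:$flags"
}
```

For the Radeon VII plus RX 6700XT example, calling the function with gfx906 and gfx1031 reproduces the package.use line shown above.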
Adjusting USE flags for individual packages is also supported. Portage will take care of the dependencies: if
sci-libs/rocBLAS should turn on
gfx1031, or when portage will try to add it to
Upgrade to 5.1.3 or above from the legacy way
rocm.eclass (ROCm version <5.1.3), architectures are specified via environment variable
For users installing ROCm libraries using the legacy method (specifying
/etc/portage/make.conf), upgrading to 5.1.3 or above takes two steps:
/etc/portage/package.use/00-amdgpu-targets mentioned in ROCm#Specifying_architectures_to_compile
Contributing and developing guide
Testing ROCm libraries is not easy: it requires recent AMD discrete GPUs and days of compilation and testing. If you use ROCm libraries and mathematical correctness is important, please test on your hardware by enabling tests:
Then emerge the desired ROCm package. Test failures are usually caused by small inconsistencies between the ROCm libraries and the CPU reference implementations; they can also be caused by upstream bugs or by Gentoo's deployment strategy. In either case, bug reports at Gentoo Bugzilla are welcome, and mathematical errors are best reported upstream as well.