ROCm
According to the official ROCm documentation (v5.4.3), "ROCm is a brand name for ROCm open software platform (for software) or the ROCm™ open platform ecosystem (includes hardware like FPGAs or other CPU architectures)."
Within the Gentoo distribution, "ROCm" refers to the ROCm open software platform, which currently supports AMDGPU as its hardware.
ROCm itself aims to be an environment for heterogeneous computing and is not limited to AMDGPU. Supporting only AMDGPU is Gentoo's current packaging strategy; if ROCm is needed for other vendors (typically the CUDA backend of sci-libs/hip-* packages), please file a bug on Gentoo Bugzilla.
Note what ROCm is not:
- ROCm is not merely "CUDA for AMD GPUs". Although it provides HIP, whose API and syntax are similar to CUDA's, it also provides the OpenCL and OpenMP programming models.
- ROCm is not the only way to run (compute) tasks on AMD GPUs. The ROCm kernel driver is part of the amdgpu Linux driver. OpenGL, Vulkan, and other APIs work independently of ROCm.
Components of ROCm
ROCm components can be classified into five categories:
- Drivers and runtimes, provided by the amdgpu kernel module, dev-libs/roct-thunk-interface, and dev-libs/rocr-runtime.
- Programming models. See ROCm#Programming_models for details.
- Compilers and tools. Gentoo uses vanilla clang (>=sys-devel/clang-14.0.6-r1).
- Libraries. Gentoo has packaged most libraries prefixed by roc and hip in the sci-libs category, with src_test enabled. All sci-libs/roc* packages are written in HIP and use hipamd as the backend, while sci-libs/hip* packages are simple wrappers.
- Deployment tools. For Gentoo users, the best way to deploy common ROCm components is via Portage.
Installation guide
Kernel driver
A recent Linux kernel is recommended for a wider range of supported devices, better performance, and proper error handling.
Kernel configurations
See the amdgpu kernel documentation for detailed information.
The following kernel options are required:
CONFIG_DRM_AMDGPU
CONFIG_DRM_AMDGPU_USERPTR
CONFIG_HMM_MIRROR
CONFIG_HSA_AMD
CONFIG_ZONE_DEVICE
These options are also checked when emerging dev-libs/roct-thunk-interface.
Enabling the following option is also recommended, as it provides unified memory and managed memory in HIP:
CONFIG_HSA_AMD_SVM
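The required options above can be checked against a kernel configuration file before emerging anything. The following is a minimal sketch, not the check that the ebuild itself performs; the function name missing_kernel_options is hypothetical, and it assumes the configuration is available as a plain-text file such as /usr/src/linux/.config.

```shell
# Options listed as required in the text above.
REQUIRED_OPTIONS="CONFIG_DRM_AMDGPU CONFIG_DRM_AMDGPU_USERPTR CONFIG_HMM_MIRROR CONFIG_HSA_AMD CONFIG_ZONE_DEVICE"

# missing_kernel_options CONFIG_FILE  (hypothetical helper)
# Prints every required option that is not set to y or m in CONFIG_FILE.
missing_kernel_options() {
    for opt in $REQUIRED_OPTIONS; do
        grep -Eq "^${opt}=(y|m)\$" "$1" || echo "$opt"
    done
}
```

For example, `missing_kernel_options /usr/src/linux/.config` prints nothing when all required options are enabled in the configured kernel sources.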
Kernel command line parameters
See the kernel documentation on amdgpu parameters for detailed information.
For example, setting amdgpu.ppfeaturemask=0xffffffff enables all AMDGPU PowerPlay features, which may be useful when adjusting GPU power profiles via rocm-smi.
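To confirm that a parameter was actually passed to the running kernel, the command line can be inspected. This is a minimal sketch; the helper name cmdline_param is hypothetical and it only parses a command line string given as its first argument.

```shell
# cmdline_param CMDLINE NAME  (hypothetical helper)
# Prints the value of NAME=value found in the kernel command line string
# CMDLINE; returns non-zero if NAME is not present.
# The body runs in a subshell so that `set -f` (no globbing) does not leak.
cmdline_param() (
    set -f
    for word in $1; do
        case $word in
            "$2="*)
                printf '%s\n' "${word#"$2="}"
                exit 0
                ;;
        esac
    done
    exit 1
)
```

For example, `cmdline_param "$(cat /proc/cmdline)" amdgpu.ppfeaturemask` prints the mask if it was set at boot.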
System monitoring tools
Install dev-util/rocm-smi:
root #
emerge --ask dev-util/rocm-smi
Programming models
OpenCL
See OpenCL#AMD for details.
HIP
See HIP for details.
OpenMP
To enable OpenMP offloading on AMDGPU, install sys-libs/libomp with AMDGPU offload enabled.
Set USE flags for the package:
/etc/portage/package.use/99-rocm
sys-libs/libomp offload LLVM_TARGETS: AMDGPU
Install sys-libs/libomp:
root #
emerge --ask sys-libs/libomp
Clang cannot detect the GPU architecture automatically (and when cross-compiling, the target GPU is not present on the build machine), so it needs a script that prints the GPU architecture:
/tmp/print_gpu_arch.sh
GPU arch specifier
#!/bin/bash
echo "gfx90a" # Change this to the target to compile for, but do not append target features such as :xnack-
Make the script executable:
user $
chmod +x /tmp/print_gpu_arch.sh
Then compile the OpenMP program:
user $
clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa --libomptarget-amdgcn-bc-path=/usr/lib64/ --amdgpu-arch-tool=/tmp/print_gpu_arch.sh <openmp source code> -o <executable>
Others
The backend of ROCm is currently LLVM/clang, so any programming model that can generate LLVM IR for AMDGPU can use ROCm. For example, Numba is a JIT compiler for Python code that can offload to ROCm. Gentoo does not yet package Numba with ROCm support.
ROCm libraries
Currently, Gentoo packages rocBLAS, rocFFT, rocPRIM, rocRAND, rocSOLVER, rocSPARSE, rocThrust, and miopen in the sci-libs category. These are math and deep learning libraries written in HIP that run on AMD GPUs.
The wrapper packages are hipBLAS (wrapper of rocBLAS + rocSOLVER vs. cuBLAS + cuSOLVER), hipCUB (wrapper of rocPRIM vs. CUB), hipFFT (rocFFT vs. cuFFT), and hipSPARSE (rocSPARSE vs. cuSPARSE).
hipDNN is currently not packaged. It is a wrapper of miopen vs. cudnn.
dev-libs/rccl (targeting nccl) provides collective communication routines for AMD GPUs. It can also run tests, but the tests are only meaningful on multi-GPU systems.
An ebuild for sci-libs/rocALUTION (targeting paralution) is currently in development.
Specifying architectures to compile
With rocm.eclass (ROCm version >=5.1.3), Gentoo handles GPU architectures through the AMDGPU_TARGETS USE_EXPAND variable. The mapping between GPUs and architecture names can be viewed by checking the USE flags of a ROCm library:
example $
equery uses rocBLAS
 * Found these USE flags for sci-libs/rocBLAS-5.4.2-r1:
 U I
 - - amdgpu_targets_gfx1010 : RDNA GPU, codename navi10, including Radeon RX 5700XT/5700/5700M/5700B/5700XTB/5600XT/5600/5600M, Radeon Pro 5700XT/5700, Radeon Pro W5700X/W5700
 - - amdgpu_targets_gfx1011 : RDNA GPU, codename navi12, including Radeon Pro 5600M/V520
 - - amdgpu_targets_gfx1012 : RDNA GPU, codename navi14, including Radeon RX 5500XT/5500/5500M/5500XTB/5300/5300M, Radeon Pro 5500XT/5500M/5300/5300M, Radeon Pro W5500X/W5500/W5500M/W5300M
 + - amdgpu_targets_gfx1030 : RDNA2 GPU, codename navi21/sienna cichlid, including Radeon RX 6950XT/6900XT/6800XT/6800, Radeon Pro W6800
 - - amdgpu_targets_gfx1031 : RDNA2 GPU, codename navi22/navy flounder, including Radeon RX 6750XT/6700XT/6800M/6700M
 - - amdgpu_targets_gfx1100 : RDNA3 GPU, codename navi31/plum bonito, including Radeon RX 7900XTX/7900XT
 - - amdgpu_targets_gfx1101 : RDNA3 GPU, codename navi32
 - - amdgpu_targets_gfx1102 : RDNA3 GPU, codename navi33
 - - amdgpu_targets_gfx803  : Fiji GPU, codename fiji, including Radeon R9 Nano/Fury/FuryX, Radeon Pro Duo, FirePro S9300x2, Radeon Instinct MI8
 - - amdgpu_targets_gfx900  : Vega GPU, codename vega10, including Radeon Vega Frontier Edition, Radeon RX Vega 56/64, Radeon RX Vega 64 Liquid, Radeon Pro Vega 48/56/64/64X, Radeon Pro WX 8200/9100, Radeon Pro V320/V340/SSG, Radeon Instinct MI25
 + - amdgpu_targets_gfx906  : Vega GPU, codename vega20, including Radeon (Pro) VII, Radeon Instinct MI50/MI60
 + - amdgpu_targets_gfx908  : CDNA Accelerator, codename arcturus, including AMD Instinct MI100 Accelerator
 + - amdgpu_targets_gfx90a  : CDNA2 Accelerator, codename aldebaran, including AMD Instinct MI200 series Accelerators
 - - benchmark              : Build and install rocblas-bench.
 - - doc                    : Add extra documentation (API, Javadoc, etc). It is recommended to enable per package instead of globally
 - - test                   : Perform rocblas-test to compare the result between rocBLAS and system BLAS.
By default, the officially supported architectures (gfx906 gfx908 gfx90a gfx1030) are enabled. For example, on a system with a Radeon VII and an RX 6700XT, specify the GPU architectures for all packages:
/etc/portage/package.use/00-amdgpu-targets
Example for AMDGPU_TARGETS USE flag
# disable gfx908, gfx90a, gfx1030; turn on gfx1031; gfx906 remains on
*/* AMDGPU_TARGETS: -gfx908 -gfx90a -gfx1030 gfx1031
Adjusting USE flags for individual packages is also supported. Portage takes care of the dependencies: if sci-libs/miopen enables gfx1031, then sci-libs/rocBLAS must also enable gfx1031; otherwise Portage will try to add it to /etc/portage/package.use/zz-autounmask.
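The package.use line in the example above follows a simple rule: disable every default target that is not wanted and enable every wanted target that is not a default. A minimal sketch of that rule follows; the function name make_amdgpu_targets_line is hypothetical, and the default list is copied from the text above, so it may differ for other ROCm versions.

```shell
# Default-enabled targets as stated above (assumption: unchanged for the
# ROCm version installed).
DEFAULT_TARGETS="gfx906 gfx908 gfx90a gfx1030"

# make_amdgpu_targets_line TARGET...  (hypothetical helper)
# Prints a package.use line that leaves exactly the given targets enabled.
# The body runs in a subshell so its variables do not leak.
make_amdgpu_targets_line() (
    line='*/* AMDGPU_TARGETS:'
    # Disable defaults that were not requested.
    for t in $DEFAULT_TARGETS; do
        case " $* " in
            *" $t "*) ;;
            *) line="$line -$t" ;;
        esac
    done
    # Enable requested targets that are not defaults.
    for t in "$@"; do
        case " $DEFAULT_TARGETS " in
            *" $t "*) ;;
            *) line="$line $t" ;;
        esac
    done
    printf '%s\n' "$line"
)
```

Calling `make_amdgpu_targets_line gfx906 gfx1031` reproduces the line used for the Radeon VII and RX 6700XT system; redirect the output into /etc/portage/package.use/00-amdgpu-targets as root to apply it.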
Upgrade to 5.1.3 or above from the legacy way
Before rocm.eclass was introduced (ROCm version <5.1.3), architectures were specified via the environment variable AMDGPU_TARGETS.
For users who installed ROCm libraries using the legacy method (setting AMDGPU_TARGETS in /etc/portage/make.conf), upgrading to 5.1.3 takes two steps:
1. Remove the AMDGPU_TARGETS entry from /etc/portage/make.conf.
2. Add /etc/portage/package.use/00-amdgpu-targets as described in ROCm#Specifying_architectures_to_compile.
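Before editing, it can help to confirm whether a legacy entry is present at all. This is a minimal sketch; the function name legacy_amdgpu_targets is hypothetical, and it assumes make.conf is a single file rather than a directory.

```shell
# legacy_amdgpu_targets FILE  (hypothetical helper)
# Prints uncommented AMDGPU_TARGETS assignments found in FILE and
# returns non-zero if there are none.
legacy_amdgpu_targets() {
    grep -E '^[[:space:]]*AMDGPU_TARGETS=' "$1"
}
```

For example, `legacy_amdgpu_targets /etc/portage/make.conf` prints the line to remove in step 1, or nothing if the file is already clean.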
Contributing and developing guide
Testing ROCm libraries is not easy: it requires recent AMD discrete GPUs and days of compilation and testing. If you use ROCm libraries and mathematical correctness is important, please test them on your hardware by enabling tests:
/etc/portage/make.conf
FEATURES="test"
Then emerge the desired ROCm package. Test failures are usually caused by small inconsistencies between the ROCm libraries and the CPU reference implementations; they may also be caused by upstream bugs or by Gentoo's deployment strategy. In any of these situations, filing a bug report on Gentoo Bugzilla is welcome, and mathematical errors are best also reported upstream.