ROCm

From Gentoo Wiki
This article is a stub. Please help out by expanding it.

According to the official ROCm documentation (v5.4.3), "ROCm is a brand name for ROCm open software platform (for software) or the ROCm™ open platform ecosystem (includes hardware like FPGAs or other CPU architectures)."

In the scope of the Gentoo distribution, "ROCm" refers to the ROCm open software platform, which currently supports AMDGPU hardware.

Note
ROCm itself aims to be an environment for heterogeneous computing and is not limited to AMDGPU. However, Gentoo's current packaging strategy is that ROCm only supports AMDGPU; if ROCm is needed for other vendors (typically the CUDA backend of sci-libs/hip-* packages), please file a bug on Gentoo Bugzilla.

Note that ROCm is not:

  1. Only "the CUDA" for AMD GPUs. Although it provides HIP, whose API and syntax are similar to CUDA, it also provides the OpenCL and OpenMP programming models.
  2. The only way to run (compute) tasks on AMD GPUs. The ROCm kernel driver is part of the amdgpu Linux driver, and there are other APIs, such as OpenGL and Vulkan, that are independent of ROCm.

Components of ROCm

ROCm can be classified into five categories:

  1. Drivers and runtimes, provided by the amdgpu kernel module, dev-libs/roct-thunk-interface, and dev-libs/rocr-runtime.
  2. Programming models. See ROCm#Programming_models for details.
  3. Compilers and tools. Gentoo uses vanilla clang (>=sys-devel/clang-14.0.6-r1).
  4. Libraries. Gentoo has packaged most libraries prefixed with roc and hip in the sci-libs category, with src_test enabled. All sci-libs/roc* packages are written in HIP and use hipamd as the backend, while sci-libs/hip* packages are simple wrappers.
  5. Deployment tools. For Gentoo users, the best way to deploy common ROCm components is via Portage.

Installation guide

Kernel driver

Using a recent Linux kernel is recommended for a wider range of supported devices, better performance, and proper error handling.

Kernel configurations

See the amdgpu kernel documentation for detailed information.

The following kernel options are required:

KERNEL
CONFIG_DRM_AMDGPU
CONFIG_DRM_AMDGPU_USERPTR
CONFIG_HMM_MIRROR
CONFIG_HSA_AMD
CONFIG_ZONE_DEVICE

These options are also checked when emerging dev-libs/roct-thunk-interface.
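As a quick self-check, the options above can be looked up in a kernel configuration file. The following is a minimal sketch, not part of any Gentoo tool: the function name check_rocm_kconfig and the default path /usr/src/linux/.config are assumptions, and an option counts as enabled if it is set to y or m.

```shell
#!/bin/sh
# Hypothetical helper (not shipped by Gentoo): report which required ROCm
# kernel options are enabled in a kernel .config file.
REQUIRED_OPTIONS="CONFIG_DRM_AMDGPU CONFIG_DRM_AMDGPU_USERPTR CONFIG_HMM_MIRROR CONFIG_HSA_AMD CONFIG_ZONE_DEVICE"

# Usage: check_rocm_kconfig [config-file]
# Exits 0 if every option is built in (=y) or modular (=m), 1 if any is
# missing, and 2 if the file cannot be read. The body runs in a subshell
# so its variables do not leak into the caller.
check_rocm_kconfig() (
    config="${1:-/usr/src/linux/.config}"   # assumed default location
    [ -r "$config" ] || { echo "cannot read $config" >&2; exit 2; }
    missing=0
    for opt in $REQUIRED_OPTIONS; do
        if grep -Eq "^${opt}=(y|m)\$" "$config"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    exit "$missing"
)
```

After loading the snippet, check_rocm_kconfig /usr/src/linux/.config checks the configured sources. If the running kernel was built with CONFIG_IKCONFIG_PROC, its configuration can be extracted first with zcat /proc/config.gz > /tmp/running.config and checked the same way. CONFIG_HSA_AMD_SVM can be appended to REQUIRED_OPTIONS to cover the recommended option as well.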

Enabling the following option is also recommended to support unified memory and managed memory in HIP:

KERNEL
CONFIG_HSA_AMD_SVM

Kernel command line parameters

See the amdgpu module parameters kernel documentation for detailed information.

For example, setting amdgpu.ppfeaturemask=0xffffffff enables all AMDGPU PowerPlay features, which may be useful when adjusting GPU power profiles via rocm-smi.
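The parameter has to be passed on the kernel command line by the bootloader. The sketch below assumes GRUB for setting it (other bootloaders differ) and adds a hypothetical helper, ppfeaturemask_from_cmdline, that reads /proc/cmdline after a reboot to confirm which value the kernel was booted with.

```shell
#!/bin/sh
# Assuming GRUB: append the parameter to GRUB_CMDLINE_LINUX in
# /etc/default/grub, e.g.
#   GRUB_CMDLINE_LINUX="amdgpu.ppfeaturemask=0xffffffff"
# then regenerate the configuration with
#   grub-mkconfig -o /boot/grub/grub.cfg
# and reboot.

# Hypothetical helper: print the amdgpu.ppfeaturemask value(s) found in a
# kernel command line (default: the running kernel's /proc/cmdline).
# Prints nothing if the parameter is absent.
ppfeaturemask_from_cmdline() (
    cmdline="${1:-$(cat /proc/cmdline)}"
    set -f                      # split on whitespace without glob expansion
    for arg in $cmdline; do
        case "$arg" in
            amdgpu.ppfeaturemask=*) echo "${arg#amdgpu.ppfeaturemask=}" ;;
        esac
    done
)
```

Running ppfeaturemask_from_cmdline without arguments after the reboot should print 0xffffffff if the bootloader change took effect.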

System monitoring tools

Install dev-util/rocm-smi:

root #emerge --ask dev-util/rocm-smi

Programming models

OpenCL

See OpenCL#AMD for detailed information.

HIP

See HIP for detailed information.

OpenMP

To enable OpenMP offloading on AMDGPU, install sys-libs/libomp with AMDGPU offloading enabled.

Set USE flags for the package:

FILE /etc/portage/package.use/99-rocm
sys-libs/libomp offload LLVM_TARGETS: AMDGPU

Install sys-libs/libomp:

root #emerge --ask sys-libs/libomp

Clang cannot detect the GPU architecture automatically (and when cross compiling, the target GPU is not present on the build machine), so it needs a script that specifies the GPU architecture:

FILE /tmp/print_gpu_arch.sh (GPU arch specifier)
#!/bin/bash
echo "gfx90a"  # Change this to the target architecture to compile for, without target features such as :xnack-
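If several targets are built on the same machine, editing the script each time gets tedious. One possible variant reads the target from an environment variable, falls back to gfx90a, and strips any target-feature suffix such as :xnack-. This is a sketch under the assumption that a variable named GPU_ARCH (a name chosen here, not something clang defines) is set when clang runs.

```shell
#!/bin/bash
# Variant of /tmp/print_gpu_arch.sh. GPU_ARCH is a hypothetical variable name
# chosen for this sketch; clang itself does not read it.
print_gpu_arch() {
    local arch="${GPU_ARCH:-gfx90a}"   # fallback target when GPU_ARCH is unset
    # Keep only the processor name: "gfx908:sramecc+:xnack-" -> "gfx908"
    echo "${arch%%:*}"
}

print_gpu_arch
```

The compile command then stays the same apart from a prefix such as GPU_ARCH=gfx1031 clang -fopenmp ...; this relies on the tool inheriting clang's environment, as child processes normally do.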

Make the script executable:

user $chmod +x /tmp/print_gpu_arch.sh

Then compile the OpenMP program:

user $clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa --libomptarget-amdgcn-bc-path=/usr/lib64/ --amdgpu-arch-tool=/tmp/print_gpu_arch.sh <openmp source code> -o <executable>

Others

The backend of ROCm is currently LLVM/Clang, so any programming model that can generate LLVM IR for AMDGPU can use ROCm. For example, Numba, a JIT compiler for Python code, can offload to ROCm; however, Gentoo has not yet packaged Numba with ROCm support.

ROCm libraries

Currently, Gentoo packages rocBLAS, rocFFT, rocPRIM, rocRAND, rocSOLVER, rocSPARSE, rocThrust, and miopen in the sci-libs category. These are math and deep learning libraries written in HIP that run on AMD GPUs.

The wrapper packages are hipBLAS (wrapping rocBLAS+rocSOLVER or cuBLAS+cuSOLVER), hipCUB (rocPRIM or CUB), hipFFT (rocFFT or cuFFT), and hipSPARSE (rocSPARSE or cuSPARSE).

hipDNN, a wrapper of miopen or cuDNN, is currently not packaged.

dev-libs/rccl (the counterpart of NCCL) provides collective communication routines for AMD GPUs. Its tests can also be run, but they are only meaningful on multi-GPU systems.

An ebuild for sci-libs/rocALUTION (the counterpart of PARALUTION) is currently in development.

Specifying architectures to compile

With rocm.eclass (ROCm version >=5.1.3), Gentoo handles GPU architectures through the AMDGPU_TARGETS USE_EXPAND. The mapping between GPUs and architecture names can be viewed by checking the USE flags of ROCm libraries:

example $equery uses rocBLAS
 * Found these USE flags for sci-libs/rocBLAS-5.4.2-r1:
 U I
 - - amdgpu_targets_gfx1010 : RDNA GPU, codename navi10, including Radeon RX 5700XT/5700/5700M/5700B/5700XTB/5600XT/5600/5600M, Radeon
                              Pro 5700XT/5700, Radeon Pro W5700X/W5700
 - - amdgpu_targets_gfx1011 : RDNA GPU, codename navi12, including Radeon Pro 5600M/V520
 - - amdgpu_targets_gfx1012 : RDNA GPU, codename navi14, including Radeon RX 5500XT/5500/5500M/5500XTB/5300/5300M, Radeon Pro
                              5500XT/5500M/5300/5300M, Radeon Pro W5500X/W5500/W5500M/W5300M
 + - amdgpu_targets_gfx1030 : RDNA2 GPU, codename navi21/sienna cichlid, including Radeon RX 6950XT/6900XT/6800XT/6800, Radeon Pro
                              W6800
 - - amdgpu_targets_gfx1031 : RDNA2 GPU, codename navi22/navy flounder, including Radeon RX 6750XT/6700XT/6800M/6700M
 - - amdgpu_targets_gfx1100 : RDNA3 GPU, codename navi31/plum bonito, including Radeon RX 7900XTX/7900XT
 - - amdgpu_targets_gfx1101 : RDNA3 GPU, codename navi32
 - - amdgpu_targets_gfx1102 : RDNA3 GPU, codename navi33
 - - amdgpu_targets_gfx803  : Fiji GPU, codename fiji, including Radeon R9 Nano/Fury/FuryX, Radeon Pro Duo, FirePro S9300x2, Radeon
                              Instinct MI8
 - - amdgpu_targets_gfx900  : Vega GPU, codename vega10, including Radeon Vega Frontier Edition, Radeon RX Vega 56/64, Radeon RX Vega
                              64 Liquid, Radeon Pro Vega 48/56/64/64X, Radeon Pro WX 8200/9100, Radeon Pro V320/V340/SSG, Radeon
                              Instinct MI25
 + - amdgpu_targets_gfx906  : Vega GPU, codename vega20, including Radeon (Pro) VII, Radeon Instinct MI50/MI60
 + - amdgpu_targets_gfx908  : CDNA Accelerator, codename arcturus, including AMD Instinct MI100 Accelerator
 + - amdgpu_targets_gfx90a  : CDNA2 Accelerator, codename aldebaran, including AMD Instinct MI200 series Accelerators
 - - benchmark              : Build and install rocblas-bench.
 - - doc                    : Add extra documentation (API, Javadoc, etc). It is recommended to enable per package instead of globally
 - - test                   : Perform rocblas-test to compare the result between rocBLAS and system BLAS. 

By default, the officially supported architectures (gfx906, gfx908, gfx90a, gfx1030) are enabled. For example, on a system with a Radeon VII (gfx906) and an RX 6700XT (gfx1031), specify the GPU architectures for all packages as follows:

FILE /etc/portage/package.use/00-amdgpu-targets (Example for AMDGPU_TARGETS USE flag)
# disable gfx908, gfx90a, gfx1030; turn on gfx1031; gfx906 remains on
*/* AMDGPU_TARGETS: -gfx908 -gfx90a -gfx1030 gfx1031
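To find out which targets the installed GPUs need and turn them into a line like the one above, a small script can help. This is a sketch, not a Gentoo tool: it assumes dev-util/rocminfo is installed and that rocminfo prints names such as gfx1031 for each GPU agent. DEFAULT_TARGETS is copied from the defaults listed above, so it may need updating for other ROCm versions.

```shell
#!/bin/sh
# Hypothetical helpers for building an AMDGPU_TARGETS line for package.use.
# DEFAULT_TARGETS mirrors the default-enabled targets described above.
DEFAULT_TARGETS="gfx906 gfx908 gfx90a gfx1030"

# Read rocminfo output on stdin and print each distinct gfx name once
gfx_targets_from_rocminfo() {
    grep -o 'gfx[0-9a-f]\{3,\}' | LC_ALL=C sort -u
}

# Print a package.use line that turns off unwanted defaults and turns on
# wanted non-default targets. Arguments: the wanted gfx names.
amdgpu_targets_line() (
    line="*/* AMDGPU_TARGETS:"
    for t in $DEFAULT_TARGETS; do
        case " $* " in
            *" $t "*) ;;              # wanted and already on by default
            *) line="$line -$t" ;;    # on by default but not wanted
        esac
    done
    for t in "$@"; do
        case " $DEFAULT_TARGETS " in
            *" $t "*) ;;              # already on by default
            *) line="$line $t" ;;     # wanted but off by default
        esac
    done
    echo "$line"
)
```

With the functions loaded, amdgpu_targets_line $(rocminfo | gfx_targets_from_rocminfo) prints a line for /etc/portage/package.use/00-amdgpu-targets. For the Radeon VII plus RX 6700XT system described above (gfx906 and gfx1031), it produces exactly the example line.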

Adjusting USE flags for individual packages is also supported. Portage takes care of the dependencies: if sci-libs/miopen enables gfx1031, then sci-libs/rocBLAS must also enable gfx1031; otherwise, Portage will try to add it to /etc/portage/package.use/zz-autounmask.
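For instance, a per-package setup that enables gfx1031 only for sci-libs/miopen and its sci-libs/rocBLAS dependency might look like the following. The file name 10-miopen-gfx1031 is arbitrary, and other packages in the dependency chain may need the same flag, which Portage will report.

```text
# /etc/portage/package.use/10-miopen-gfx1031 (hypothetical file name)
sci-libs/miopen AMDGPU_TARGETS: gfx1031
sci-libs/rocBLAS AMDGPU_TARGETS: gfx1031
```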

Upgrading to 5.1.3 or above from the legacy method

Before rocm.eclass was introduced (ROCm version <5.1.3), architectures were specified via the AMDGPU_TARGETS environment variable.

For users who installed ROCm libraries using this legacy method (setting AMDGPU_TARGETS in /etc/portage/make.conf), upgrading to 5.1.3 takes two steps:

1. Remove the AMDGPU_TARGETS entry from /etc/portage/make.conf.
2. Add the /etc/portage/package.use/00-amdgpu-targets file described in ROCm#Specifying_architectures_to_compile.
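The old value can be carried over with a small script that reads it from make.conf and prints an equivalent line for the new package.use file. This is a sketch only: the function name is hypothetical, the separators it accepts (spaces, semicolons, commas) are an assumption about how the legacy value was written, and it relies on Portage's -* syntax to clear the default targets before enabling the listed ones.

```shell
#!/bin/sh
# Hypothetical helper: convert a legacy AMDGPU_TARGETS assignment from
# make.conf into a package.use line. Accepting spaces, ';' and ',' as
# separators is an assumption about how the legacy value was written.
legacy_targets_to_package_use() (
    conf="${1:-/etc/portage/make.conf}"
    # Use the last plain AMDGPU_TARGETS= assignment in the file
    value=$(grep '^AMDGPU_TARGETS=' "$conf" | tail -n 1 | cut -d= -f2-)
    [ -n "$value" ] || { echo "no AMDGPU_TARGETS in $conf" >&2; exit 1; }
    # Drop quotes and turn ';' and ',' into spaces
    targets=$(printf '%s\n' "$value" | tr -d "\"'" | tr ';,' '  ')
    set -f                  # word-split the list without glob expansion
    set -- $targets
    # "-*" clears the default targets before enabling the listed ones
    echo "*/* AMDGPU_TARGETS: -* $*"
)
```

Its output can be saved as /etc/portage/package.use/00-amdgpu-targets before the AMDGPU_TARGETS line is removed from make.conf.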

Contributing and development guide

Testing ROCm libraries is not easy: it requires a recent discrete AMD GPU and days of compilation and testing. If you use ROCm libraries and consider mathematical correctness important, please test them on your hardware by enabling tests:

FILE /etc/portage/make.conf
FEATURES="test"
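Setting FEATURES="test" in make.conf affects every package. If tests are only wanted for ROCm libraries, Portage's package.env mechanism can scope the setting instead; the env file name rocm-test.conf and the package selection below are only examples.

```text
# /etc/portage/env/rocm-test.conf
FEATURES="test"

# /etc/portage/package.env
sci-libs/rocBLAS rocm-test.conf
sci-libs/rocFFT rocm-test.conf
```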

Then emerge the desired ROCm package. Test failures are usually caused by small inconsistencies between ROCm libraries and the CPU reference implementations, but they can also be caused by upstream bugs or by Gentoo's deployment strategy. In any of these cases, bug reports on Gentoo Bugzilla are welcome; mathematical errors are better reported upstream as well.
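A bug report is most useful with the failing build log attached, together with the output of emerge --info. When a build fails, Portage keeps its work directory, by default under /var/tmp/portage. The hypothetical helper below is a sketch that lists the kept build logs for a package, assuming the default layout PORTAGE_TMPDIR/portage/<category>/<package>-<version>/temp/build.log.

```shell
#!/bin/sh
# Hypothetical helper: list the build logs Portage kept for a package, assuming
# the default layout $PORTAGE_TMPDIR/portage/<category>/<package>-<version>/.
# Usage: find_build_log sci-libs/rocBLAS
find_build_log() (
    pkg="$1"
    tmpdir="${PORTAGE_TMPDIR:-/var/tmp}"   # Portage's default PORTAGE_TMPDIR
    for log in "$tmpdir"/portage/"$pkg"-*/temp/build.log; do
        # An unmatched glob stays literal, so check that the file exists
        if [ -f "$log" ]; then
            echo "$log"
        fi
    done
)
```

If PORTAGE_TMPDIR was changed in make.conf, pass the same value when calling the helper, for example PORTAGE_TMPDIR=/path/to/tmp find_build_log sci-libs/rocBLAS.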

Hardware support
