ROCm

From Gentoo Wiki

According to the official ROCm documentation (v5.2.1), "ROCm is a brand name for ROCm open software platform (for software) or the ROCm™ open platform ecosystem (includes hardware like FPGAs or other CPU architectures)."

Within the scope of the Gentoo distribution, "ROCm" refers to the ROCm open software platform, which currently supports AMDGPU as its hardware.

Note
ROCm itself aims to be an environment for heterogeneous computing and is not limited to AMDGPU. Under Gentoo's current packaging strategy, ROCm supports only AMDGPU. If ROCm is needed for other vendors (typically the CUDA backend of sci-libs/hip-* packages), please file a bug at Gentoo Bugzilla.

Note that ROCm is not:

  1. Only "the CUDA" for AMD GPUs. Although it provides HIP, whose API and syntax are similar to CUDA's, it also provides the OpenCL and OpenMP programming models.
  2. The only way to run (compute) tasks on AMD GPUs. The ROCm kernel driver is part of the amdgpu Linux driver, and APIs such as OpenGL and Vulkan work independently of ROCm.

Components of ROCm

ROCm components can be classified into five categories:

  1. Drivers and runtimes, provided by the amdgpu kernel module, dev-libs/roct-thunk-interface, and dev-libs/rocr-runtime.
  2. Programming models. See ROCm#Programming_models for details.
  3. Compilers and tools. For ROCm versions <=5.0.2, Gentoo provided sys-devel/llvm-roc as the compiler, a fork of LLVM/Clang maintained by ROCm upstream. Since 5.1.3, Gentoo uses vanilla Clang (>=sys-devel/clang-14.0.6-r1).
  4. Libraries. Gentoo packages most of the libraries prefixed by roc and hip in the sci-libs category, with src_test enabled. All sci-libs/roc* packages are written in HIP and use hipamd as the backend, while the sci-libs/hip* packages are thin wrappers.
  5. Deployment tools. For Gentoo users, the recommended way to deploy common ROCm components is via Portage.

Installation guide

Kernel driver

A recent Linux kernel is recommended for a wider range of supported devices, better performance, and proper error handling.

Kernel configurations

See the amdgpu kernel documentation for detailed information.

The following kernel configuration options are required:

KERNEL
CONFIG_DRM_AMDGPU
CONFIG_DRM_AMDGPU_USERPTR
CONFIG_HMM_MIRROR
CONFIG_HSA_AMD
CONFIG_ZONE_DEVICE

These options are also checked when emerging dev-libs/roct-thunk-interface.

Enabling the following option is recommended to support unified memory and managed memory in HIP:

KERNEL
CONFIG_HSA_AMD_SVM
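The options above can be checked against a kernel configuration with a small script. This is a hypothetical helper, not something shipped by Gentoo. It assumes the configuration is either a plain .config file or a gzip-compressed /proc/config.gz, which exists only when the kernel was built with CONFIG_IKCONFIG_PROC. An option counts as enabled when it is set to y or m. The script exits non-zero only if a required option is missing.

```shell
#!/bin/bash
# Hypothetical helper (not shipped by any Gentoo package): check that the
# ROCm-related kernel options are set to y or m in a kernel configuration.

required_opts=(
    CONFIG_DRM_AMDGPU
    CONFIG_DRM_AMDGPU_USERPTR
    CONFIG_HMM_MIRROR
    CONFIG_HSA_AMD
    CONFIG_ZONE_DEVICE
)
# Recommended for unified/managed memory in HIP
recommended_opts=(
    CONFIG_HSA_AMD_SVM
)

# Print a kernel config file, decompressing it if it is gzipped (/proc/config.gz).
read_config() {
    case "$1" in
        *.gz) zcat "$1" ;;
        *)    cat "$1" ;;
    esac
}

# check_opts CONFIG_TEXT LABEL OPTION...
# Print "LABEL missing: OPTION" for every OPTION not set to y or m;
# return 1 if at least one option is missing.
check_opts() {
    local config="$1" label="$2" opt status=0
    shift 2
    for opt in "$@"; do
        if ! grep -qE "^${opt}=(y|m)$" <<<"$config"; then
            echo "${label} missing: ${opt}"
            status=1
        fi
    done
    return "$status"
}

# main CONFIG_FILE: report missing options; the status reflects required ones only.
main() {
    local config req
    config="$(read_config "$1")" || return 1
    check_opts "$config" required "${required_opts[@]}"
    req=$?
    check_opts "$config" recommended "${recommended_opts[@]}"
    return "$req"
}

# Run only when a config file is given, e.g. /proc/config.gz or /usr/src/linux/.config
if [[ $# -gt 0 ]]; then
    main "$1"
fi
```

For example, assuming the script was saved as check_rocm_kconfig.sh (a hypothetical name), a configuration can be checked with:

user $bash check_rocm_kconfig.sh /usr/src/linux/.config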

Kernel command line parameters

See the amdgpu module parameters in the kernel documentation for detailed information.

For example, setting amdgpu.ppfeaturemask=0xffffffff enables all AMDGPU PowerPlay features, which may be useful when adjusting GPU power profiles via rocm-smi.
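After rebooting, you can confirm that the parameter reached the kernel by looking at /proc/cmdline. This sketch assumes the parameter is passed on the kernel command line; values set through modprobe configuration files would not appear there. cmdline_param is a hypothetical helper that prints a parameter's value from a command-line string. If the parameter is repeated, it prints the last occurrence.

```shell
#!/bin/bash
# Hypothetical helper: print the value of a kernel parameter such as
# amdgpu.ppfeaturemask from a kernel command line string.
# Returns 1 (and prints nothing) if the parameter is absent.
cmdline_param() {
    local name="$1" word value found=1
    local -a words
    # Split on whitespace without glob expansion
    read -r -a words <<<"$2"
    for word in "${words[@]}"; do
        if [[ "$word" == "$name="* ]]; then
            value="${word#"$name"=}"   # strip the "name=" prefix
            found=0                    # keep scanning: the last occurrence wins
        fi
    done
    [[ $found -eq 0 ]] && echo "$value"
    return "$found"
}
```

For example:

user $source ./cmdline_param.sh
user $cmdline_param amdgpu.ppfeaturemask "$(cat /proc/cmdline)"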

System monitoring tools

Install dev-util/rocm-smi:

root #emerge --ask dev-util/rocm-smi

Programming models

OpenCL

For details, see OpenCL#AMD.

HIP

For details, see HIP.

OpenMP

To enable OpenMP offloading to AMDGPU, install sys-libs/libomp with AMDGPU offloading enabled.

Set USE flags for the package:

FILE /etc/portage/package.use/99-rocm
sys-libs/libomp offload LLVM_TARGETS: AMDGPU

Install sys-libs/libomp:

root #emerge --ask sys-libs/libomp

When Clang cannot detect the GPU architecture automatically (for example, when cross-compiling and the target GPU is not present on the build machine), it needs a script that prints the GPU architecture:

FILE /tmp/print_gpu_arch.sh (GPU arch specifier)
#!/bin/bash
echo "gfx90a"  # Change this to the target architecture, without target features such as :xnack-

Make script executable:

user $chmod +x /tmp/print_gpu_arch.sh
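The script above hardcodes gfx90a. On a build machine that has the target GPU and dev-util/rocminfo installed, the script could instead read the architecture from rocminfo output. This hypothetical variant makes a few assumptions. It assumes rocminfo lists GPU agents under names such as gfx90a. GPU_ARCH is a made-up override variable, for example for cross-compiling. With several GPUs, the script simply takes the first gfx name it finds.

```shell
#!/bin/bash
# Hypothetical variant of /tmp/print_gpu_arch.sh: print the GPU architecture
# without target features, e.g. "gfx90a" rather than "gfx90a:xnack-".
# GPU_ARCH is a made-up override variable, useful when cross-compiling.

# Print the first gfx* name found on stdin; return 1 if there is none.
first_gfx_arch() {
    local arch
    arch="$(grep -oE -m 1 'gfx[0-9a-f]+' | head -n 1)"
    [[ -n "$arch" ]] && echo "$arch"
}

print_gpu_arch() {
    if [[ -n "${GPU_ARCH:-}" ]]; then
        # Drop anything after the first ':' (target features such as :xnack-)
        echo "${GPU_ARCH%%:*}"
    elif command -v rocminfo >/dev/null 2>&1; then
        # Assumes rocminfo names GPU agents like gfx90a
        rocminfo | first_gfx_arch
    else
        echo "print_gpu_arch: set GPU_ARCH or install dev-util/rocminfo" >&2
        return 1
    fi
}

print_gpu_arch
```

The script's exit status is that of print_gpu_arch, so the architecture tool fails visibly when no architecture can be determined.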

Then compile the OpenMP program:

user $clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa --libomptarget-amdgcn-bc-path=/usr/lib64/ --amdgpu-arch-tool=/tmp/print_gpu_arch.sh <openmp source code> -o <executable>

Others

The backend of ROCm is currently LLVM/Clang, so any programming model that can generate LLVM IR for AMDGPU can use ROCm. For example, Numba, a JIT compiler for Python code, can offload to ROCm. Gentoo does not yet package Numba with ROCm support.

ROCm libraries

Gentoo currently packages rocBLAS, rocFFT, rocPRIM, rocRAND, rocSOLVER, rocSPARSE, rocThrust, and miopen in the sci-libs category. These are math and deep learning libraries written in HIP that run on AMD GPUs.

The wrapper packages are hipBLAS (wrapping rocBLAS+rocSOLVER or cuBLAS+cuSOLVER), hipCUB (rocPRIM or CUB), hipFFT (rocFFT or cuFFT), and hipSPARSE (rocSPARSE or cuSPARSE).

hipDNN, a wrapper of miopen or cuDNN, is currently not packaged.

dev-libs/rccl (the AMD counterpart of NCCL) provides collective communication routines for AMD GPUs. Its tests can also be run, but they are only meaningful on multi-GPU systems.

An ebuild for sci-libs/rocALUTION (the counterpart of PARALUTION) is currently in development.

Specifying architectures to compile

Before rocm.eclass was introduced, architectures could be specified via the AMDGPU_TARGETS environment variable:

FILE /etc/portage/make.conf (Example for AMDGPU_TARGETS)
AMDGPU_TARGETS="gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx1030"

AMDGPU_TARGETS is a semicolon-separated list whose entries follow the <target>:<feature> syntax. For details, refer to HIP#Usage.
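To illustrate the <target>:<feature> syntax, the hypothetical helper below splits an AMDGPU_TARGETS value into one line per target, followed by that target's features (if any) separated by spaces. It only splits the string. It does not check whether the targets or features are valid.

```shell
#!/bin/bash
# Hypothetical helper: split an AMDGPU_TARGETS value into one line per target,
# printed as "<target> <feature>...", or just "<target>" when it has no features.
split_amdgpu_targets() {
    local -a entries parts
    local entry
    IFS=';' read -r -a entries <<<"$1"       # targets are separated by ';'
    for entry in "${entries[@]}"; do
        [[ -z "$entry" ]] && continue        # tolerate empty items such as ';;'
        IFS=':' read -r -a parts <<<"$entry" # features follow the target after ':'
        echo "${parts[*]}"                   # joined with spaces (default IFS)
    done
}
```

For example, split_amdgpu_targets "gfx906:xnack-;gfx90a:xnack+;gfx1030" prints:

gfx906 xnack-
gfx90a xnack+
gfx1030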

Contributing and developing guide

Testing ROCm libraries is not easy: it requires recent AMD discrete GPUs and days of compilation and testing. If you use ROCm libraries and their mathematical correctness matters to you, please test them on your hardware by enabling tests:

FILE /etc/portage/make.conf
FEATURES="test"

Then emerge the desired ROCm package. Test failures are usually caused by small inconsistencies between ROCm libraries and the CPU reference implementations. They can also be caused by upstream bugs or by Gentoo's deployment strategy. In any of these cases, bug reports to Gentoo Bugzilla are welcome, and mathematical errors are best also reported upstream.
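Setting FEATURES="test" in make.conf enables tests for every package. To restrict tests to selected ROCm packages, Portage's per-package environment files (/etc/portage/env/ together with /etc/portage/package.env) can be used instead. The helper below is a hypothetical sketch of that setup. The file name test.conf and the function name are illustrative choices. It handles package.env being either a single file or a directory.

```shell
#!/bin/bash
# Hypothetical helper: enable FEATURES="test" only for the given packages via
# Portage's per-package environment instead of globally in make.conf.
# Usage: enable_rocm_tests ROOT ATOM...   (ROOT is "/" on a live system)
enable_rocm_tests() {
    local root="${1%/}" atom pkgenv
    shift
    mkdir -p "$root/etc/portage/env" || return 1
    echo 'FEATURES="test"' > "$root/etc/portage/env/test.conf"

    # package.env may be a single file or a directory of files
    if [[ -d "$root/etc/portage/package.env" ]]; then
        pkgenv="$root/etc/portage/package.env/rocm-test"
    else
        pkgenv="$root/etc/portage/package.env"
    fi
    touch "$pkgenv" || return 1

    for atom in "$@"; do
        # Append "ATOM test.conf" unless that exact line is already present
        grep -qxF "$atom test.conf" "$pkgenv" || echo "$atom test.conf" >> "$pkgenv"
    done
}
```

For example, as root:

root #source ./enable_rocm_tests.sh && enable_rocm_tests / sci-libs/rocBLAS sci-libs/rocFFT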

Hardware support
