Google Summer of Code/2022/Ideas/Refine and complete ROCm: eclass, more packages and downstream softwares

From Gentoo Wiki

Refine and complete ROCm: eclass, more packages and downstream software

ROCm™ is an open-source software platform developed by AMD for HPC- and hyperscale-class GPU computing[1]. It currently supports various AMD GPUs (and NVIDIA GPUs, by wrapping CUDA), and may cover other hardware such as FPGAs in the future. Its packages fall into four categories: low-level drivers, runtime libraries, developer toolkits, and high-level libraries and frameworks. Thanks to the work done in the ROCm overlay[2], Gentoo already packages the most important of them.

However, there is still a lot to be done:

1. Enable ROCm support in packages such as TensorFlow, JAX and CuPy;
2. Write a rocm.eclass to make ROCm-related packages more maintainable[3], and consider USE flags for different GPU architectures;
3. Enable more testing;
4. Hold a discussion about open-source heterogeneous computing platforms and GNU/Linux distros. Because ROCm is open source, its packages can be carefully adjusted to follow the FHS. However, ROCm ships binary GPU kernels, a case that has not been well considered in packaging so far[4], and testing GPU libraries requires specific hardware[5]. These are challenges that distros must face if they package heterogeneous computing software;
5. Package more software that is still missing from ::gentoo, such as ROCgdb and rocWMMA;
6. Maintain the current ebuilds, including bug fixes and stabilization;
7. Write a wiki page on ROCm usage and development.
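Item 2 asks for USE flags that select GPU architectures. One possible way such flags could feed a CMake build is sketched below in Python. This is a hypothetical illustration, not the design of any existing rocm.eclass, which would be written in Bash. It assumes two things:

- Architectures are exposed as USE_EXPAND flags with a hypothetical prefix `amdgpu_targets_` (e.g. `amdgpu_targets_gfx906`).
- The package's CMake build accepts a semicolon-separated list in a variable named `AMDGPU_TARGETS`.

Check the variable name per package before relying on it.

```python
"""Hypothetical sketch: turn per-architecture USE flags into a CMake target list.

Assumptions (not taken from any existing eclass):
  * GPU architectures are exposed as USE_EXPAND flags named
    "amdgpu_targets_<arch>", e.g. "amdgpu_targets_gfx906".
  * The package's CMake build accepts a semicolon-separated list
    in a variable called AMDGPU_TARGETS.
"""

USE_EXPAND_PREFIX = "amdgpu_targets_"  # hypothetical flag prefix


def amdgpu_targets(enabled_use: set[str]) -> list[str]:
    """Return the architectures selected via USE flags, sorted for stable output."""
    return sorted(
        flag[len(USE_EXPAND_PREFIX):]
        for flag in enabled_use
        if flag.startswith(USE_EXPAND_PREFIX)
    )


def cmake_target_arg(enabled_use: set[str]) -> str:
    """Build the -D argument, refusing to continue if no architecture is selected."""
    targets = amdgpu_targets(enabled_use)
    if not targets:
        raise ValueError("no amdgpu_targets_* USE flag enabled")
    return "-DAMDGPU_TARGETS=" + ";".join(targets)


if __name__ == "__main__":
    use = {"rocm", "test", "amdgpu_targets_gfx906", "amdgpu_targets_gfx1030"}
    print(cmake_target_arg(use))
```

Sorting keeps the generated command line identical regardless of set ordering. Raising on an empty selection avoids silently falling back to whatever default the upstream build picks.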
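Items 3 and 4 note that testing GPU libraries requires real hardware. A test phase might therefore first check whether the host can run GPU tests at all. The Python sketch below assumes that ROCm needs the KFD device node `/dev/kfd` plus at least one DRM render node `/dev/dri/renderD*`, and that the build user must be able to open both. The function name `gpu_test_blockers` and the exact checks are illustrative only. A real ebuild would also have to handle Portage's sandbox and the user's group membership.

```python
"""Hypothetical sketch: decide whether GPU-dependent tests can run on this host.

Assumption: a usable ROCm setup exposes /dev/kfd and at least one DRM render
node /dev/dri/renderD*, and the build user needs read/write access to them.
"""

import glob
import os


def gpu_test_blockers(dev_root: str = "/dev") -> list[str]:
    """Return human-readable reasons why GPU tests cannot run (empty list if none)."""
    reasons = []

    # The KFD node is how ROCm user space talks to the amdgpu kernel driver.
    kfd = os.path.join(dev_root, "kfd")
    if not os.path.exists(kfd):
        reasons.append(f"{kfd} not found (amdgpu/KFD driver not loaded?)")
    elif not os.access(kfd, os.R_OK | os.W_OK):
        reasons.append(f"no read/write access to {kfd}")

    # At least one render node must exist and be accessible.
    render_nodes = glob.glob(os.path.join(dev_root, "dri", "renderD*"))
    if not render_nodes:
        reasons.append("no DRM render node found")
    elif not any(os.access(n, os.R_OK | os.W_OK) for n in render_nodes):
        reasons.append("no read/write access to any DRM render node")

    return reasons


if __name__ == "__main__":
    blockers = gpu_test_blockers()
    if blockers:
        print("skipping GPU tests:")
        for reason in blockers:
            print(" -", reason)
    else:
        print("GPU tests can run")
```

Returning a list of reasons, rather than a bare boolean, lets a test phase print why GPU tests were skipped. That helps when triaging reports like [5].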

References:

[1] https://rocmdocs.amd.com/en/latest/index.html
[2] https://github.com/justxi/rocm
[3] https://bugs.gentoo.org/810619
[4] https://bugs.gentoo.org/795825
[5] https://bugs.gentoo.org/817440



Contacts:
  • Benda Xu

Required Skills:
  • CMake build system
  • Bash and command-line utilities, especially sed
  • Ebuild and eclass writing
  • Git

Expected Project Size:
200 to 300 hours, depending on the actual plan.

Expected Outcomes:
  • New ebuilds or a "rocm" USE flag for important packages in the ROCm GitHub repositories, and for other ROCm-enabled packages such as TensorFlow and Blender.
  • A new eclass: rocm.eclass

Project Difficulty:
  • Package maintenance: easy or medium.
  • New packages: medium for small packages; hard for large frameworks and for package testing.
  • New eclass: medium or hard.