Google Summer of Code/2021/Ideas/Big Data Infrastructure by Gentoo
Big data infrastructure is mostly built on the Java virtual machine (JVM) ecosystem, most notably in Java and Scala.
Nevertheless, Java has not been adopted smoothly into GNU/Linux distributions. Packaging Java software is considered difficult by the GNU/Linux community (e.g. Debian, Arch Linux, Fedora). At the same time, the Java community has its own set of repositories, such as Maven, that are functionally similar to the package repositories of GNU/Linux distributions.
The Gentoo Java Project has done a good job laying out the framework of the Java ecosystem in Gentoo. At the same time, there are still thousands of useful Java packages to be packaged and maintained. Last year, Zongyu Zhang developed the Maven ebuild generator and published the spark overlay, making Spark available to Gentoo users. This project will build upon that work: it will design and set up a test framework for the generated ebuilds, handle Kotlin and Scala packages, and add the H2O big data analysis platform to the overlay.