Writing Rust ebuilds

From Gentoo Wiki

This is a short reference, intended to be read alongside Basic guide to write Gentoo Ebuilds and the cargo.eclass documentation.

cargo.eclass

cargo.eclass is an eclass for utilizing Rust's own package manager, Cargo. Documentation for the eclass is available in the relevant section of the Development Guide and in the cargo.eclass(5) man page provided by the app-doc/eclass-manpages package.

Cargo is very convenient for software developers. However, it comes with some caveats for maintainers.

Rust programs are usually statically linked. This makes runtime dependencies easier to figure out than for C programs, but build dependencies are harder to get right. Fortunately, upstream has already had to work these out. Packagers only need to translate what upstream has declared into the way cargo.eclass handles dependencies.

Rust's dependency mechanism

Rust dependencies come in the form of 'crates'. Crates can be libraries or runnable programs. Like the term 'package', 'crate' is not strictly defined, so nearly every Rust program that is provided in binary or source form can be considered a crate. In the context of packaging, crates are most often the library dependencies of a package. Rust developers upload their crates, along with documentation, to crates.io. Other Rust developers can use the uploaded crates in their programs by stating each crate's name and version in a configuration file for Rust's build system, Cargo. If a dependency is declared using only the crate name, Cargo assumes it should download that dependency from crates.io.

However, developers can also declare dependencies with additional information, like a URL to a Git Repository and a Git tag, to get crates from elsewhere.

Making a versioned ebuild

Gentoo has two tools for generating ebuilds from Rust projects automatically: app-portage/pycargoebuild and dev-util/cargo-ebuild. Both need a local copy of the sources of the Rust software being packaged, obtained for example by cloning the upstream repository.

Once the repository has been obtained, change to the directory containing the sources, and generate an ebuild by running either:

user $ pycargoebuild ./

or

user $ cargo ebuild

as appropriate.

cargo ebuild automatically checks dependencies for vulnerabilities, which takes a while. To speed up this process, one can instead run:

user $ cargo ebuild --noaudit


If everything goes well, there will now be an ebuild in the repository, which can then be moved into an ebuild repository like ::gentoo or ::guru. However, note that the HOMEPAGE and DESCRIPTION variables in the ebuild will need to be added, and if the software is not on crates.io, the SRC_URI will need to be added as well.

Licenses

Both cargo ebuild and pycargoebuild generate a LICENSE variable in the ebuild, but the value of this variable is not guaranteed to be correct. To generate a list of a Rust project's licenses, run:

user $ cargo license

in the Rust repository.

Packagers must be aware that cargo license will give the licenses in SPDX format, which Gentoo does not always use. So some effort needs to be put into 'translating' them.

Because the code of the dependencies is compiled into the finished binary (statically linked), all licenses of every crate used in a package must be stated in the LICENSE variable.
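The 'translation' step can be sketched as a small shell helper. The function name spdx_to_gentoo is hypothetical, and the handful of mappings listed are only examples of identifiers that differ between SPDX and Gentoo; each result should still be checked against the licenses directory of the Gentoo repository.

```shell
# Hypothetical helper: map an SPDX identifier to a Gentoo license name.
# Only a few differing names are listed; anything else is passed through
# unchanged and still has to be verified by hand.
spdx_to_gentoo() {
    case "$1" in
        BSD-3-Clause) printf '%s\n' "BSD" ;;
        BSD-2-Clause) printf '%s\n' "BSD-2" ;;
        Zlib)         printf '%s\n' "ZLIB" ;;
        BSL-1.0)      printf '%s\n' "Boost-1.0" ;;
        *)            printf '%s\n' "$1" ;;
    esac
}

# Example: translate a few identifiers as they might be reported for a project.
for id in MIT Apache-2.0 BSD-3-Clause Zlib; do
    spdx_to_gentoo "$id"
done
```

The translated names would then be collected, without duplicates, into the LICENSE variable of the ebuild.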

Common problems with dependencies in versioned ebuilds

crates.io dependencies

If all dependencies are fetched from crates.io, the ebuilds generated by cargo ebuild or pycargoebuild often work out of the box. Both tools add crates.io dependencies to the CRATES variable in the ebuild, joining each crate's name and version number with the '@' symbol, e.g.:

CRATES="
       addr2line@0.21.0
       adler@1.0.2
       adler32@1.2.0
       ...
"

and also add ${CARGO_CRATE_URIS} to the SRC_URI variable, e.g.:

SRC_URI="${CARGO_CRATE_URIS}"

These dependencies will automatically be fetched and unpacked into the right place via cargo_src_unpack, from cargo.eclass.
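To illustrate what happens to each CRATES entry, here is a minimal sketch of turning one name@version pair into a download URI. The helper name crate_uri is hypothetical, and the URL pattern is an assumption about how crates.io serves crate files; the authoritative logic lives in cargo.eclass.

```shell
# Hypothetical helper: print a SRC_URI-style entry for one CRATES item.
# Assumption: crates.io serves crates under /api/v1/crates/<name>/<version>/download.
crate_uri() (
    name=${1%@*}      # everything before the last '@'
    version=${1##*@}  # everything after the last '@'
    printf '%s\n' "https://crates.io/api/v1/crates/${name}/${version}/download -> ${name}-${version}.crate"
)

crate_uri addr2line@0.21.0
```

Splitting on the last '@' keeps the sketch working even for crate names that themselves contain unusual characters.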

GitHub/GitLab etc. dependencies

Sometimes upstream developers are not happy with the version of a crate on crates.io, or a crate is not available there and must instead be obtained from a git repository. In these situations, such a dependency is specified in Cargo.toml like this:

FILE Cargo.toml
tree-sitter-haxe = { git = "https://github.com/vantreeseba/tree-sitter-haxe", version = "0.2.2", optional = true }

The above specifies that release 0.2.2 from the tree-sitter-haxe GitHub repository should be checked out.

We can't clone a git repository without git-r3.eclass, but we can simulate this in the ebuild. In this example, we would add the lines:

declare -A GIT_CRATES=(
       [tree-sitter-haxe]="https://github.com/vantreeseba/tree-sitter-haxe;32f6bda9b568ae47c89678096de9b4d0cbd450b8"
)

to the ebuild.

The commit hash is obtained by browsing the files of that release (v0.2.2) on GitHub and copying it from the URL. It can also often be found in the Cargo.toml and Cargo.lock files of the upstream repository, by grepping the repository for the crate's name.

If the Cargo.toml of a Rust program has a line like:

tree-sitter-c-sharp = { git = "https://github.com/tree-sitter/tree-sitter-c-sharp", branch = "master" }

in its dependencies, it wants to fetch the latest commit on the master branch of that GitHub repository. This essentially means packagers must make a live ebuild, rather than a versioned ebuild.

Usually the cargo.eclass looks for Cargo.toml in the folder $WORKDIR/$crate_name-$commit-hash. Thus, in the above example, the folder would be $WORKDIR/tree-sitter-haxe-32f6bda9b568ae47c89678096de9b4d0cbd450b8.

However, some crates keep Cargo.toml in a different place, or use a different name in Cargo.toml than the name of their GitHub repository. In such cases, the path to Cargo.toml can be defined using:

declare -A GIT_CRATES=(
       [tree-sitter-haxe]="https://github.com/vantreeseba/tree-sitter-haxe;32f6bda9b568ae47c89678096de9b4d0cbd450b8;tree-sitter-haxe-%commit%/mypath/to/cargotoml"
)

where %commit% is automatically replaced with the commit hash specified.

If the last part of that line is not provided, the cargo.eclass composes that path out of the crate name (the part in []) and the commit hash.

The path is a relative path that starts from $WORKDIR. The part in [] usually needs to be the name of the package in the Cargo.toml of that package.
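The path rules described above can be sketched in a few lines of shell. The helper name git_crate_dir is hypothetical and the code only mirrors the behaviour as described in this section; it is not taken from cargo.eclass.

```shell
# Hypothetical helper: print the directory (relative to WORKDIR) in which
# Cargo.toml is expected for one GIT_CRATES entry.
#   $1 = crate name (the key in [])
#   $2 = the "uri;commit[;path]" value
git_crate_dir() (
    # Split the value on ';' into its up to three fields.
    IFS=';' read -r uri commit dir <<EOF
$2
EOF
    # Without a third field, fall back to <crate name>-%commit%.
    : "${dir:=$1-%commit%}"
    # Replace the %commit% placeholder with the actual hash.
    printf '%s\n' "$dir" | sed "s/%commit%/${commit}/g"
)

git_crate_dir tree-sitter-haxe \
    "https://github.com/vantreeseba/tree-sitter-haxe;32f6bda9b568ae47c89678096de9b4d0cbd450b8"
```

Passing a third field such as tree-sitter-haxe-%commit%/mypath/to/cargotoml yields that path with the hash filled in.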

Writing live ebuilds

Create the skeleton ebuild via cargo ebuild or pycargoebuild, as described above. Then change the version in the ebuild's file name to 9999.

In the ebuild itself:

  • Remove the unnecessary CRATES variable.
  • Remove the unnecessary SRC_URI variable; sources are typically fetched via git-r3.eclass or similar.
  • Create a src_unpack() function, containing cargo_live_src_unpack:
src_unpack() {
    cargo_live_src_unpack
}

This will fetch the needed dependency crates using Cargo.
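When src_unpack() is overridden like this, the phase function exported by git-r3.eclass no longer runs on its own, so the repository checkout has to be called explicitly. A minimal sketch, assuming the ebuild inherits both cargo and git-r3; the repository URL is a placeholder, not a real project:

```shell
# Placeholder URL; replace with the real upstream repository.
EGIT_REPO_URI="https://example.org/upstream/project.git"

src_unpack() {
    # Check out the upstream sources first ...
    git-r3_src_unpack
    # ... then let Cargo fetch the crates the checkout depends on.
    cargo_live_src_unpack
}
```

The order matters: Cargo needs the checked-out Cargo.toml and Cargo.lock before it can resolve dependencies.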

Problems with live ebuilds

"can't update a git repository in the offline mode"

When a Rust project uses unpinned dependencies in its Cargo.toml, e.g.:

CODE
tree-sitter-c-sharp = { git = "https://github.com/tree-sitter/tree-sitter-c-sharp", branch = "master" }

Cargo will always want to check if that dependency is up-to-date in later stages of the ebuild. This will result in the error "can't update a git repository in the offline mode".

To fix this, add cargo_src_configure --frozen to src_configure(). This will stop Cargo from checking whether what it just fetched is up-to-date.
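In an ebuild, the fix described above would look roughly like this (a sketch; an ebuild that also sets features would declare them in the same function):

```shell
src_configure() {
    # --frozen stops Cargo from trying to refresh the git dependencies
    # that were already fetched during src_unpack.
    cargo_src_configure --frozen
}
```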

USE flags

USE flags for Rust are usually added via cargo.eclass:

CODE
src_configure() {
    local myfeatures=(
        barfeature
        $(usev foo)
    )
    cargo_src_configure
}

Switching features off

Often Rust programs do not allow switching individual default features off. So in order to only enable certain default features, disable all default features with cargo_src_configure --no-default-features, then enable the desired features via myfeatures, as described above.
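Combined with the myfeatures array, this could look like the following sketch. The feature names barfeature and foo are placeholders carried over from the example above, not features of any real crate.

```shell
src_configure() {
    # Features to enable explicitly; the names are placeholders.
    local myfeatures=(
        barfeature
        $(usev foo)
    )
    # Switch off upstream's default feature set so that only the
    # features listed in myfeatures are enabled.
    cargo_src_configure --no-default-features
}
```

Any default feature that should stay enabled then has to be listed in myfeatures as well.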

Unbundling C libraries

Start with constructing a dependency tree:

user $ cargo tree --all-features | less

There's a naming convention[1] that crates linking with native libraries have a -sys suffix. Your job as a maintainer is to find out which features pull these crates in, and to add the corresponding native libraries to the package dependencies.

However, most crates use vendored C libraries by default (see bug #709568), which is discouraged by Gentoo policies.

pkg-config crate

If this crate is pulled in, you need to add virtual/pkgconfig to the ebuild's BDEPEND and explicitly allow cross-compilation:

CODE
export PKG_CONFIG_ALLOW_CROSS=1

Common -sys crates

Crate        | Dependency        | Unbundling method | Notes
jemalloc-sys | dev-libs/jemalloc | export JEMALLOC_OVERRIDE="${ESYSROOT}/usr/$(get_libdir)/libjemalloc.so" | N/A
libgit2-sys  | dev-libs/libgit2  | export LIBGIT2_NO_VENDOR=1 | Since 0.16.0
libz-sys     | sys-libs/zlib     | N/A | Uses system libs unless revdeps enable static or zlib-ng features
openssl-sys  | dev-libs/openssl  | export OPENSSL_NO_VENDOR=1 | Since 0.9.55
zstd-sys     | app-arch/zstd     | export ZSTD_SYS_USE_PKG_CONFIG=1 | Since 0.12.2
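As an illustration, an ebuild whose dependency tree contains openssl-sys, libgit2-sys, zstd-sys and the pkg-config crate might combine these settings as sketched below. The dependency list is an assumption for this hypothetical package; real ebuilds usually also need slot operators or version constraints.

```shell
# Native libraries that replace the bundled copies (illustrative list).
DEPEND="
    app-arch/zstd
    dev-libs/libgit2
    dev-libs/openssl
"
RDEPEND="${DEPEND}"
BDEPEND="virtual/pkgconfig"

src_configure() {
    # Ask the -sys crates to link against the system libraries.
    export OPENSSL_NO_VENDOR=1
    export LIBGIT2_NO_VENDOR=1
    export ZSTD_SYS_USE_PKG_CONFIG=1
    # Required when the pkg-config crate is part of the dependency tree.
    export PKG_CONFIG_ALLOW_CROSS=1

    cargo_src_configure
}
```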

Other -sys crates

First, inspect the crate's Cargo.toml for features that force static linking. If any reverse dependency enables them, the library cannot be unbundled without patching.

Then, inspect the crate's build.rs script for environment variables that control the build flow. Set them in the ebuild's src_configure() or in global scope.
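A quick way to get a first overview of such variables is to search the build script for env::var calls. The helper below is a heuristic sketch with a hypothetical name; it misses variables whose names are built dynamically or read through other helper crates, so reading build.rs by hand is still necessary.

```shell
# Hypothetical helper: list environment variable names that a build
# script reads through env::var("...") or env::var_os("...").
list_build_env_vars() {
    grep -oE 'env::var(_os)?\("[A-Za-z0-9_]+"\)' "$1" \
        | cut -d'"' -f2 \
        | sort -u
}
```

It would be called with the path to a crate's build script, for example list_build_env_vars build.rs.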

Using a vendor tarball like in Go ebuilds

The go-module.eclass and the cargo.eclass are very similar in how they load statically linked dependencies. In a Go ebuild, a packager defines the variable EGO_SUM with a list of dependencies from upstream; in a Rust ebuild, the CRATES variable serves the same purpose. The eclasses then download those dependencies and put them in the correct places in the working directory.

While it is an ongoing and controversial discussion whether the EGO_SUM functionality in go-module.eclass should be deprecated, the cargo.eclass is built with the CRATES variable in mind. The go-module.eclass offers the possibility of using a vendor tarball instead of EGO_SUM. A vendor tarball is a tar archive that contains all the statically linked dependencies. Cargo can create such a vendor tarball as easily as Go can, but the cargo.eclass cannot use one without some extra steps.

Using a vendor tarball instead of the CRATES variable has some benefits. For example, there is no need to regenerate the list of dependencies (e.g. by running pycargoebuild in the upstream repository) and copy it into the new ebuild when bumping the version. If the vendor tarball is hosted appropriately, bumping a Rust ebuild is as easy as renaming the ebuild file to the new version.

Creating a vendor tarball

Creating a vendor tarball can be done by running cargo vendor in the upstream repository. It will create a vendor folder, which can be put into a tar archive with XZ_OPT='-T0 -9' tar -acf vendor.tar.xz vendor.

Using a vendor tarball in an ebuild

Because the cargo.eclass is built with the CRATES variable in mind, it will complain if this variable is not set when the eclass is inherited. To work around this, define the variable as CRATES=" ". The cargo.eclass looks for dependencies in the $ECARGO_VENDOR directory, which defaults to $CARGO_HOME/gentoo. $CARGO_HOME itself is set to $WORKDIR/cargo_home. A packager can either:

  1. change the default by setting $ECARGO_VENDOR to another location before src_unpack(), or
  2. unpack the vendor tarball into the default $ECARGO_VENDOR directory, by writing a custom src_unpack() that calls neither default nor cargo_src_unpack, or
  3. let Portage unpack the vendor tarball into $WORKDIR as usual, then run ln -s "${WORKDIR}/<extracted directory>/"* "${CARGO_HOME}/gentoo/".

If a package uses crates from git instead of crates.io, Cargo needs additional configuration. Luckily, the vendor tarball already includes that configuration in a file called vendor-config.toml. This file expects the directory with the vendored sources to be in the working directory of the package itself. So when the vendor tarball is unpacked to $WORKDIR/vendor, link it into the package's source directory, for example: ln -s "${WORKDIR}/vendor/" "${WORKDIR}/lapce-${PV}/vendor" || die (adjust the path accordingly).

The contents of vendor-config.toml collide with some settings that the cargo.eclass writes to ${ECARGO_HOME}/config during cargo_gen_config. Those settings need to be deleted:

sed -i "${ECARGO_HOME}/config" -e '/source.crates-io/d'  || die
sed -i "${ECARGO_HOME}/config" -e '/replace-with = "gentoo"/d'  || die
sed -i "${ECARGO_HOME}/config" -e '/local-registry = "\/nonexistent"/d'  || die

Then the contents of vendor-config.toml can be appended to ${ECARGO_HOME}/config: cat "${WORKDIR}/vendor/vendor-config.toml" >> "${ECARGO_HOME}/config" || die
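Put together, the steps for git crates could be wrapped in one function, as in the sketch below. The function name use_vendor_config is hypothetical. The sketch assumes that the vendor tarball was unpacked to ${WORKDIR}/vendor, that ${S} points at the package's source directory, and that cargo_gen_config has already written ${ECARGO_HOME}/config; the file name and its contents may differ between eclass versions.

```shell
# Hypothetical helper, to be called after cargo_gen_config has run.
use_vendor_config() {
    # Make the vendored sources visible inside the package's source directory.
    ln -s "${WORKDIR}/vendor" "${S}/vendor" || die

    # Remove the eclass-generated settings that collide with vendor-config.toml.
    sed -i \
        -e '/source.crates-io/d' \
        -e '/replace-with = "gentoo"/d' \
        -e '/local-registry = "\/nonexistent"/d' \
        "${ECARGO_HOME}/config" || die

    # Append the configuration shipped with the vendor tarball.
    cat "${WORKDIR}/vendor/vendor-config.toml" >> "${ECARGO_HOME}/config" || die
}
```

Using ${S} instead of a hard-coded directory name keeps the function independent of the package name and version.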

See also

External resources