GLEP 64: Export Package Manager cached information
Anthony G. Basile <firstname.lastname@example.org>
During build time, the package manager (PM) generates important information about the package(s) being built. When Portage is used as the PM, this information is cached on a per-package basis in directories under /var/db/pkg/<cat>/<pkg> (the VDB). While this information can be regenerated on the fly, doing so may be expensive or impractical. Examples include a complete list of all files belonging to a particular installed package, or the dynamic linking information for a package's executables and/or shared objects. To avoid the unnecessary cost of regeneration, and to facilitate interoperability between all PMs and other tools that could use this information, all PMs should cache a standard set of information and provide a common API for exporting it. In this GLEP, we specify what information should be cached and exported.
Information generated by the PM at build time spans the spectrum from easy to difficult to regenerate. Some information, like a package's HOMEPAGE, may be trivially regenerated by simply grepping the package's ebuild in the Portage tree. Despite this ease, even this information needs to be cached in case the ebuild is removed from the tree while the package is still installed on the system. But even if the installed package and the ebuild in the tree are not "out of sync", there is another reason to cache information generated by the PM at build time: some information, like the list of all installed files belonging to a particular package, cannot be trivially regenerated. If such a list were not cached, the PM would have to rebuild the package in order to regenerate it, and even then the regenerated list would not be guaranteed to represent the actual state of the installed package, because the environment of the rest of the system may have changed between builds. Apart from the fact that the PM itself needs this list when uninstalling, and so must cache it for itself, a package's file list is useful to other utilities. For example, at the time of this writing, sys-apps/elfix, app-portage/gentoolkit, app-portage/portage-utils and app-portage/eix all make use of Portage's VDB to obtain this cached list.
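For illustration, the sketch below parses a file list in the style of the CONTENTS file that Portage keeps in each VDB directory (/var/db/pkg/<cat>/<pkg>/CONTENTS). The line formats it accepts (`dir`, `obj` with a checksum and mtime, `sym` with a target and mtime, `fif`, `dev`) reflect our reading of Portage's current format and are an assumption, not part of this specification; `Entry` and `parse_contents` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Entry:
    kind: str                      # "dir", "obj", "sym", "fif" or "dev"
    path: str
    md5: Optional[str] = None      # only for "obj" entries
    mtime: Optional[int] = None    # for "obj" and "sym" entries
    target: Optional[str] = None   # only for "sym" entries


def parse_contents(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per non-empty CONTENTS-style line (assumed format)."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        kind, _, rest = line.partition(" ")
        if kind == "obj":
            # Split from the right so that paths containing spaces stay intact.
            path, md5, mtime = rest.rsplit(" ", 2)
            yield Entry(kind, path, md5=md5, mtime=int(mtime))
        elif kind == "sym":
            # "<path> -> <target> <mtime>": peel off the mtime, then split the link.
            link, mtime = rest.rsplit(" ", 1)
            path, _, target = link.partition(" -> ")
            yield Entry(kind, path, mtime=int(mtime), target=target)
        elif kind in ("dir", "fif", "dev"):
            yield Entry(kind, rest)
        else:
            raise ValueError(f"unknown CONTENTS entry type: {kind!r}")
```

A utility could then list a package's files by passing an open CONTENTS file to `parse_contents` and collecting each entry's `path`, without rebuilding anything.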
Another example of information which is useful and expensive to regenerate, though perhaps less obvious than the previous one, is linking information such as that reported by running `readelf` or `scanelf` on ELF objects, or by similar utilities for other executable formats like Mach-O or COFF. On a "rolling release" distribution such as Gentoo, tracing forward and reverse dependencies between executable objects and their libraries is critical to avoid breakage during upgrades. The need to trace these dependencies is evident in PMS features like sub-slotting, which aim to ensure that executables are always consistently built against their libraries: upgrading a library in a way that breaks backwards compatibility automatically triggers a rebuild of its dependent executable(s). While sufficient within their own scope, these PMS features have limitations: 1) this information is calculated to ensure consistency at build time, but is not cached and exported afterwards for use by other tools, such as `revdep-pax`, which uses the same information to consistently apply PaX markings between executable objects and libraries; and 2) such information is not sufficiently fine-grained for tools which must discriminate on the basis of ABI, SONAME, library path name, etc. By caching and exporting this information, an entire "linkage graph" of the executable objects and libraries on a system can be constructed, allowing quick traversal of both forward and backward dependencies. Questions like "what are the path names of all the executables on this system which link against libssl.so.1.0.0 for ABI=x32?" can then be answered quickly, without rereading the dynamic section of every object on the system in search of those which are x32 and need libssl.so.
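The following sketch shows how cached linking records could be turned into a reverse index and traversed backwards through libraries that themselves depend on other libraries. The `LinkRecord` type, its field names and the sample data are hypothetical; a real tool would populate them from whatever API the PM exports. The index returns every object that needs a library, shared libraries included; answering the "executables only" question exactly would need an additional file-type field.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class LinkRecord:
    path: str                      # full path of the executable or shared object
    abi: str                       # e.g. "amd64" or "x32"
    soname: str = ""               # empty when the object has no SONAME
    needed: Tuple[str, ...] = ()   # SONAMEs listed in its NEEDED entries


def reverse_index(records: Iterable[LinkRecord]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (ABI, needed SONAME) to the paths of the objects that need it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for rec in records:
        for lib in rec.needed:
            index[(rec.abi, lib)].add(rec.path)
    return dict(index)  # plain dict, so lookups of unknown keys do not insert


def transitive_dependents(records: Iterable[LinkRecord], abi: str, soname: str) -> Set[str]:
    """Breadth-first walk from a library to everything that depends on it,
    directly or through other libraries, within a single ABI."""
    records = list(records)
    index = reverse_index(records)
    # SONAME of each shared library, so the walk can continue upwards from it.
    soname_of = {r.path: r.soname for r in records if r.soname}
    seen: Set[str] = set()
    queue = deque([soname])
    while queue:
        lib = queue.popleft()
        for path in index.get((abi, lib), set()):
            if path not in seen:
                seen.add(path)
                if path in soname_of:
                    queue.append(soname_of[path])
    return seen
```

With such an index, the question above reduces to a single lookup, `reverse_index(records).get(("x32", "libssl.so.1.0.0"), set())`, instead of a scan of every object on the system.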
The above examples motivate us to create a uniform standard for any utility that would like to make use of this generated information. Below, we specify a standard minimum set of information that should be generated by any PM at build time, cached, and then exported through a common API.
For each package installed, the following information should be generated at build time, cached, and later exported:
- All Portage variables that are part of the Metadata Cache, as defined in PMS 13.2. Note that, as with the Metadata Cache, these variables should be stored with all conditionals evaluated.
- A list of all files belonging to the package, along with a designation of each file's type (regular, directory, symlink, pipe, etc.), its MD5SUM or other checksum, and its mtime.
- A list of all executable or shared objects for each package and the corresponding linking information, including the full path to the object, its architecture and ABI, SONAME, RPATH and any NEEDED objects it links against, as reported by `readelf` on ELF systems or similar tools for other executable formats. Portage currently caches this information in NEEDED.ELF.2, NEEDED.MACHO.3, NEEDED.XCOFF, NEEDED.PECOFF, etc.
- Flags affecting the package's build system behavior, including at least CHOST, CBUILD, CTARGET, CFLAGS, CXXFLAGS, CPPFLAGS, and LDFLAGS. If a Fortran compiler is used, FFLAGS should also be included. These may be empty for packages where compiling/linking is unnecessary.
- Flags affecting the PM's behavior which are not already specified in PMS 13.2, including at least USE and KEYWORDS.
- Dependencies between packages as calculated by the PM, including at least DEPEND, RDEPEND, and PDEPEND.
- Miscellaneous information, including the time the package was built, the repository name, DEFINED_PHASES, EAPI, INHERITED eclasses and SLOT.
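As one concrete encoding of the linking information listed above, the sketch below parses a line in the style of Portage's NEEDED.ELF.2. The field order assumed here (architecture, object path, SONAME, ':'-separated RPATH, ','-separated NEEDED list, and an optional trailing multilib category) reflects our understanding of current Portage and should be checked against Portage itself; `NeededRecord` and `parse_needed_line` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NeededRecord:
    arch: str                           # e.g. "X86_64"
    path: str                           # full path to the object
    soname: str                         # empty if the object has no SONAME
    rpath: Tuple[str, ...]              # RPATH/RUNPATH entries
    needed: Tuple[str, ...]             # SONAMEs the object links against
    multilib_category: Optional[str] = None


def parse_needed_line(line: str) -> NeededRecord:
    """Parse 'arch;path;soname;rpath;needed[;category]' (assumed layout)."""
    fields = line.rstrip("\n").split(";")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 ';'-separated fields: {line!r}")
    arch, path, soname, rpath, needed = fields[:5]
    category = fields[5] if len(fields) > 5 and fields[5] else None
    return NeededRecord(
        arch=arch,
        path=path,
        soname=soname,
        # Drop empty components so that an empty field yields an empty tuple.
        rpath=tuple(p for p in rpath.split(":") if p),
        needed=tuple(n for n in needed.split(",") if n),
        multilib_category=category,
    )
```

Records parsed this way carry exactly the fields (path, ABI or architecture, SONAME, RPATH, NEEDED) that the specification asks PMs to cache and export.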
It is not the purpose of this GLEP to specify the details of a common API for exporting the above information, still less to delineate the implementation details for each PM. However, a common API for exporting the above information should be developed and specified by the PM teams and included in future PMS documentation. Any changes to the API should be versioned to maintain consistency as it develops over time.
As a guide, we recommend a plain CLI API which answers questions as follows:
- What is the SLOT number of a particular version of webkit-gtk?
query-installed metadata =net-libs/webkit-gtk-2.4.4-r200 SLOT
- What is the ABI of a particular file, and which libraries does it link against?
query-installed file /usr/bin/timeout ABI NEEDED
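A minimal sketch of the `metadata` half of such a CLI follows, assuming Portage's VDB layout in which each cached key is a plain file named after the key (for example /var/db/pkg/net-libs/webkit-gtk-2.4.4-r200/SLOT). `query-installed` is the hypothetical command from the examples above, not an existing tool; the `--root` option, the `KEY=value` output and the empty value for uncached keys are our own choices. The `file` query would be built on cached NEEDED data in the same way and is omitted.

```python
import argparse
import os
import sys
from typing import Dict, List


def query_metadata(vdb_root: str, atom: str, keys: List[str]) -> Dict[str, str]:
    """Return the cached value of each requested key for one installed package."""
    # Only exact "=category/package-version" atoms are handled in this sketch.
    cpv = atom[1:] if atom.startswith("=") else atom
    pkg_dir = os.path.join(vdb_root, cpv)
    if not os.path.isdir(pkg_dir):
        raise LookupError(f"{atom}: not installed")
    values: Dict[str, str] = {}
    for key in keys:
        try:
            with open(os.path.join(pkg_dir, key), encoding="utf-8") as f:
                values[key] = f.read().strip()
        except FileNotFoundError:
            values[key] = ""  # key not cached for this package
    return values


def main(argv: List[str]) -> int:
    """Hypothetical entry point: query-installed [--root DIR] metadata ATOM KEY..."""
    parser = argparse.ArgumentParser(prog="query-installed")
    parser.add_argument("--root", default="/var/db/pkg", help="VDB directory to read")
    sub = parser.add_subparsers(dest="command", required=True)
    meta = sub.add_parser("metadata", help="print cached metadata keys")
    meta.add_argument("atom", help="exact atom, e.g. =cat/pkg-1.0")
    meta.add_argument("keys", nargs="+", help="keys such as SLOT or EAPI")
    args = parser.parse_args(argv)

    try:
        values = query_metadata(args.root, args.atom, args.keys)
    except LookupError as err:
        print(err, file=sys.stderr)
        return 1
    for key in args.keys:
        print(f"{key}={values[key]}")
    return 0
```

An installed script would call `sys.exit(main(sys.argv[1:]))`. Printing one `KEY=value` line per requested key keeps the output easy to consume from shell scripts.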
- Portage has cached all of the above information since v2.2_pre7 (2008-05-21); however, it is not exported via a consistent API. Versions of Portage that implement the API specified above can make use of caches built as far back as 2008.
- For PMs that do not cache any of the above, a migration scheme should be implemented to generate the cache without having to rebuild world.
- This has been ratified by the Council. See http://www.gentoo.org/proj/en/council/meeting-logs/20130910-summary.txt
- This is specified in PMS. See http://dev.gentoo.org/~zmedico/portage/doc/portage.html#package-ebuild-eapi-4-slot-abi-metadata-slot-sub-slot-abi
- http://git.overlays.gentoo.org/gitweb/?p=proj/elfix.git;a=blob;f=scripts/revdep-pax. The man page can be viewed at http://www.linuxhowtos.org/manpages/1/revdep-pax.htm.