User:Blueness/GLEP64

Abstract
At build time, the package management system (PMS) generates important information about the package(s) being built. When Portage is used as the PMS, this information is cached on a per-package basis in directories under /var/db/pkg/ (the VDB). While this information can be regenerated on the fly, doing so may be expensive, as is the case with the linking information cached in NEEDED.ELF.2. To avoid the unnecessary cost of regeneration, and to facilitate interoperability between the PMS and other tools which could use this information, every PMS should cache its VDB information and provide an API for exporting it. In this GLEP, we specify what information should be cached and exported.

Motivation
Most information generated at build time and cached in the VDB by Portage is useful and should be exported; however, linking information, such as that reported by `readelf -d` on ELF objects, is exceptionally useful and expensive to regenerate. On a "rolling release" such as Gentoo, tracing forward and reverse dependencies between executable objects and their libraries is critical to avoid breakage during upgrades. The need to trace these dependencies is evident in PMS features like sub-slotting, which aim to make sure that executables are always consistently built against libraries: upgrading a library in a way that breaks backwards compatibility automatically triggers a rebuild of its dependent executable(s).

While sufficient in their own scope, the current standardized PMS features for tracing these dependencies have limitations: 1) this information is calculated to ensure consistency at build time, but is not cached and exported afterwards for use by other tools, such as `revdep-pax`, which use the same information to consistently apply PaX markings between executables and libraries; and 2) such information is not sufficiently fine-grained for tools which need to discriminate on the basis of ABI, soname, library path name, etc.

Using the VDB information currently cached by Portage in NEEDED.ELF.2, an entire "linkage map" of the executables and libraries on a system can be constructed to facilitate quick traversal of both forward and reverse dependencies. Questions like "what are the path names of all the executables on this system which link against libssl.so.1.0.0 for ABI=x32?" can be answered quickly without having to reread the dynamic section of many executable objects.
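As an illustration of the linkage map idea, the following minimal sketch builds forward and reverse maps from cached linking records. It assumes each record is a semicolon-separated line of the form `arch;path;soname;rpath;needed`, with NEEDED sonames separated by commas; that layout, the architecture labels, the sample data and all function names are assumptions made for this example and are not defined by this GLEP.

```python
from collections import defaultdict

# Made-up sample records in the layout assumed for this sketch:
# arch;object path;soname;rpath;comma-separated NEEDED sonames
SAMPLE_NEEDED = """\
X86_64;/usr/bin/openssl;;;libssl.so.1.0.0,libcrypto.so.1.0.0,libc.so.6
X86_64;/usr/lib64/libssl.so.1.0.0;libssl.so.1.0.0;;libcrypto.so.1.0.0,libc.so.6
X86_32;/usr/lib32/libssl.so.1.0.0;libssl.so.1.0.0;;libc.so.6
X86_64;/usr/bin/timeout;;;libc.so.6
"""


def parse_needed(text):
    """Turn cached linking records into a list of dictionaries."""
    entries = []
    for line in text.splitlines():
        fields = line.strip().split(";")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        arch, path, soname, rpath, needed = fields[:5]
        entries.append({
            "arch": arch,
            "path": path,
            "soname": soname,
            "rpath": [p for p in rpath.split(":") if p],
            "needed": [n for n in needed.split(",") if n],
        })
    return entries


def build_linkage_map(entries):
    """Return (forward, reverse) maps.

    forward: object path -> (arch, tuple of NEEDED sonames)
    reverse: (arch, soname) -> set of object paths that need it
    """
    forward = {}
    reverse = defaultdict(set)
    for entry in entries:
        forward[entry["path"]] = (entry["arch"], tuple(entry["needed"]))
        for soname in entry["needed"]:
            reverse[(entry["arch"], soname)].add(entry["path"])
    return forward, reverse


def consumers(reverse, arch, soname):
    """List the objects of one architecture that link against soname."""
    return sorted(reverse.get((arch, soname), ()))


forward, reverse = build_linkage_map(parse_needed(SAMPLE_NEEDED))
print(consumers(reverse, "X86_64", "libssl.so.1.0.0"))
```

Keying the reverse map on the (architecture, soname) pair is what allows a query to discriminate between ABIs without rereading any ELF file.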

Specifications
The following information should be gathered or calculated at build time, cached, and later exported for all packages installed on a system. Each PMS must document its API for exporting this information to other tools. For each package, the following must be cached:


 * The package category, name, version, as well as its DESCRIPTION, HOMEPAGE and LICENSE.
 * A list of all files belonging to each package, along with a designation of the file type (regular, directory, symlink, pipe, etc.), MD5SUM or other checksum, and creation time.
 * A list of all executable or shared objects for each package and the corresponding linking information, including the full path to each object, its architecture and ABI, SONAME, RPATH and any NEEDED objects it links against, as reported by `readelf -d` or similar tools.
 * Flags affecting the package's build system behavior, including at least CHOST, CBUILD, CTARGET, CFLAGS, CXXFLAGS, CPPFLAGS, and LDFLAGS. These may be empty in the case of packages where compiling/linking is unnecessary.
 * Flags affecting the PMS's behavior, including at least USE, IUSE, FEATURES and KEYWORDS.
 * Dependencies between packages calculated by the PMS, including at least DEPEND, RDEPEND and PDEPEND.
 * Miscellaneous information gathered or calculated by the PMS, including at least the repository name, BUILD_TIME, DEFINED_PHASES, EAPI, INHERITED eclasses, and SLOT.
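To make the per-file item in the list above more concrete, here is a minimal sketch of parsing a cached file list into typed records. It assumes one entry per line in the forms `dir <path>`, `obj <path> <md5> <timestamp>` and `sym <path> -> <target> <timestamp>`; this layout, the field names and the sample data are assumptions for illustration, and a PMS is free to store the same information differently.

```python
# Made-up sample in the line layout assumed for this sketch.
SAMPLE_CONTENTS = """\
dir /usr/bin
obj /usr/bin/my tool d41d8cd98f00b204e9800998ecf8427e 1370000000
sym /usr/lib64/libexample.so -> libexample.so.1 1370000000
"""


def parse_contents(text):
    """Parse a cached file list into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, _, rest = line.partition(" ")
        if kind == "obj":
            # Split from the right so that paths containing spaces survive.
            path, checksum, stamp = rest.rsplit(" ", 2)
            records.append({"type": kind, "path": path,
                            "md5": checksum, "mtime": int(stamp)})
        elif kind == "sym":
            link, stamp = rest.rsplit(" ", 1)
            path, target = link.split(" -> ", 1)
            records.append({"type": kind, "path": path,
                            "target": target, "mtime": int(stamp)})
        else:
            # Directories and other types carry only a path in this sketch.
            records.append({"type": kind, "path": rest})
    return records


for record in parse_contents(SAMPLE_CONTENTS):
    print(record["type"], record["path"])
```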

Implementation notes
Each PMS is free to implement the exporting of the above information as it sees fit, and documentation must be provided which references the above. However, given the state of current PMSs, the following recommendations are made:


 * 1) Exporting should be done via a Python module, e.g. `import portage`. Bindings for other languages can be developed on an as-needed basis.
 * 2) Portage does cache and export the above information via

However, further awkward unpacking of vardb is required to obtain specific information on each package. It is recommended that an abstraction layer be added which simplifies access. For example, a list of the SONAMEs of the libraries that /usr/bin/timeout from sys-apps/coreutils links against can be obtained by

Similarly its MD5SUM can be obtained by
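A minimal sketch of what such an abstraction layer could look like is given below. The class names, method names and sample values are hypothetical and chosen only for this illustration; they are not part of Portage's API, and a real implementation would populate the records from the VDB rather than from literals.

```python
from dataclasses import dataclass, field


@dataclass
class ElfInfo:
    """Linking information for one ELF object (hypothetical record)."""
    arch: str
    soname: str = ""
    rpath: tuple = ()
    needed: tuple = ()


@dataclass
class FileInfo:
    """File-list information for one installed file (hypothetical record)."""
    kind: str
    md5sum: str = ""


@dataclass
class PackageExport:
    """Hypothetical per-package view over the cached information."""
    cpv: str
    elf: dict = field(default_factory=dict)    # path -> ElfInfo
    files: dict = field(default_factory=dict)  # path -> FileInfo

    def needed(self, path):
        """SONAMEs of the libraries that the object at path links against."""
        return list(self.elf[path].needed)

    def md5sum(self, path):
        """Cached checksum of the file at path."""
        return self.files[path].md5sum


# Placeholder data: the version and checksum are made up for the example.
coreutils = PackageExport(
    cpv="sys-apps/coreutils-0",
    elf={"/usr/bin/timeout": ElfInfo(arch="X86_64", needed=("libc.so.6",))},
    files={"/usr/bin/timeout": FileInfo(
        kind="obj", md5sum="d41d8cd98f00b204e9800998ecf8427e")},
)

print(coreutils.needed("/usr/bin/timeout"))
print(coreutils.md5sum("/usr/bin/timeout"))
```

The point of the design is that callers ask for one fact about one path in a single call, instead of unpacking the whole per-package cache themselves.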

Backwards compatibility

 * 1) Portage has cached all the above information since v2.2_pre7 (2008-05-21); however, it is not exported via a consistent API. Versions of Portage that implement the API specified above can make use of caches built as far back as 2008.
 * 2) For PMSs that do not cache the above, a migration scheme can be implemented to recalculate the VDB information without having to rebuild world to generate the cache.