Project:Toolchain/Corrupt VDB ELF files

Background
The key facts are as follows:

VDB

 * The VDB is a set of files with metadata about installed packages, typically under . At emerge time, Portage records each package's contents, dependencies, build environment, etc. there.

Problem point

 * If Portage was at some point unable to run (from ) during a package install, it may have generated incomplete ELF metadata inside the VDB, such as missing ,  ,  , or   files.
 * These are the files that Portage consults when determining which old libraries must be preserved during an upgrade because existing consumers still need the old versions.


 * Only packages installed while was unavailable are broken.
 * (So, one could grep for packages upgraded between major versions and  in )


 * Typically, an affected system will have some packages that include shared libraries (.so files listed in their  file) but have no   and/or   files.


 * This corruption may not reveal itself until the corrupted packages change, e.g. with a new major version that has an ABI incompatibility and a new SONAME.
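
The symptom described above (shared libraries listed in a package's contents, but no ELF metadata files next to them) can be checked with a short script. The following is a minimal sketch rather than the detection script provided by the recovery tool: it assumes the standard Portage file names CONTENTS, NEEDED.ELF.2 and PROVIDES, which are not named in the text above, and it takes the VDB location as an argument. It is a heuristic, so a package that ships non-ELF files with a .so name may be reported as a false positive.

```python
import os
from pathlib import Path

# Assumption: standard Portage VDB file names, not taken from the text above.
ELF_METADATA_FILES = ("NEEDED.ELF.2", "PROVIDES")


def installs_shared_library(contents_path):
    """Return True if CONTENTS lists a regular file that looks like a .so."""
    with open(contents_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not line.startswith("obj "):
                continue
            # Entry layout: "obj <path> <md5> <mtime>"; the path may contain
            # spaces, so split the two trailing fields off from the right.
            fields = line.rstrip("\n")[4:].rsplit(" ", 2)
            name = os.path.basename(fields[0])
            if name.endswith(".so") or ".so." in name:
                return True
    return False


def find_suspect_packages(vdb_root):
    """Yield (package, missing_files) for packages that install .so files
    but lack one or more of the ELF metadata files."""
    for contents in sorted(Path(vdb_root).glob("*/*/CONTENTS")):
        pkg_dir = contents.parent
        if not installs_shared_library(contents):
            continue
        missing = [n for n in ELF_METADATA_FILES if not (pkg_dir / n).is_file()]
        if missing:
            yield f"{pkg_dir.parent.name}/{pkg_dir.name}", missing
```

Calling `list(find_suspect_packages(path_to_vdb))` only reads files, so it is safe to run against a live system; the result should be treated as a list of candidates to inspect, not as a definitive diagnosis.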

Impact

 * If a change occurs in a package with corrupted metadata, the corrupt database will prevent Portage from detecting that the older version of the library is still required.
 * It may then remove older versions of the library that other packages still depend on (until they are e.g. rebuilt), which could leave those packages in a broken state.


 * Most of the time, if an old library is removed when it was still needed by some package, this will have only a fairly local effect: said package can't run until it is rebuilt against the newer library.
 * However, the worst-case scenario is when a library critical to Portage has been removed, e.g. a upgrade.
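
To make the failure mode concrete, here is a toy model of the preservation decision. It is not Portage's actual implementation: the function name, the data layout and the consumer package name are all hypothetical, and the sonames are taken from the bug title in the Links section. The point it illustrates is that an old library is only kept if the recorded metadata says that something still needs it.

```python
def libraries_to_preserve(old_sonames, new_sonames, consumers):
    """Return the old sonames that must be kept after an upgrade.

    old_sonames / new_sonames: sonames recorded as provided by the old and
    new versions of the upgraded package.
    consumers: maps a package name to the sonames it is recorded as needing
    (the role played by the ELF metadata in the VDB).
    """
    # Sonames that disappear with the upgrade.
    dropped = set(old_sonames) - set(new_sonames)
    still_needed = set()
    for needed in consumers.values():
        still_needed |= dropped & set(needed)
    return still_needed


# Hypothetical consumer with intact metadata: the old library is preserved.
intact = {"app-misc/consumer": {"libffi.so.7", "libc.so.6"}}
print(libraries_to_preserve({"libffi.so.7"}, {"libffi.so.8"}, intact))

# Same consumer, but its metadata was never written: nothing is preserved,
# so the old library is removed while the consumer still links against it.
corrupt = {"app-misc/consumer": set()}
print(libraries_to_preserve({"libffi.so.7"}, {"libffi.so.8"}, corrupt))
```

The same empty result occurs if the provider's own record is missing, since an empty `old_sonames` leaves nothing to compare against.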

Install recovery tool
It provides two user-facing scripts:
 * 1)  - to find broken packages (detection)
 * 2)  - to fix the VDB (mitigation/fix)

Check for broken files
Run the detection shell script provided by and place a list of broken packages into :

This only needs to be run once, since fixes have been made to Portage and the Gentoo repository.

Fix up VDB (recommended, but optional)
Instruct the tool to generate corrections to the VDB; by default, it writes its output to a temporary location:

The tool must be run again to actually make changes.
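
Before applying anything, the temporary output can be compared against the live VDB. The sketch below is one possible way to do that and is not part of the recovery tool. It assumes the temporary directory mirrors the VDB layout (category/package/file), which should be verified first; both paths are supplied by the caller and nothing is written.

```python
import filecmp
from pathlib import Path


def preview_merge(fix_dir, vdb_dir):
    """Classify every file under fix_dir relative to vdb_dir.

    Returns (new_files, changed_files, identical_files) as sorted lists of
    relative POSIX paths. This function only reads; it never modifies
    either tree.
    """
    fix_root, vdb_root = Path(fix_dir), Path(vdb_dir)
    new, changed, same = [], [], []
    for path in sorted(fix_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(fix_root)
        target = vdb_root / rel
        if not target.exists():
            new.append(rel.as_posix())
        elif filecmp.cmp(path, target, shallow=False):
            # Byte-for-byte comparison, not just a stat() check.
            same.append(rel.as_posix())
        else:
            changed.append(rel.as_posix())
    return new, changed, same
```

For the corruption described here, most entries would be expected to land in the first list (metadata files that were never written). Entries in the second list mean existing VDB files would be overwritten and deserve a closer look.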

If the output looks correct, either manually merge the temporary directory it creates with the VDB at or run the tool again as:

Rebuild affected packages
Upgrade to ensure no future corruption occurs:

Then rebuild the broken packages, as the system should now be in a safe enough state to do so:
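
A minimal sketch of assembling such a rebuild invocation from the list written by the detection step. It assumes the list holds one package atom per line with optional `#` comments; that format is an assumption and should be checked against the actual file. `--ask` and `--oneshot` are standard emerge options, and the function only builds the argument list without running anything.

```python
def build_rebuild_command(list_lines):
    """Turn lines from a broken-package list into an emerge argument list.

    Assumption: one atom per line, '#' starts a comment, blank lines are
    ignored. Atoms are passed through unchanged.
    """
    atoms = []
    for line in list_lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            atoms.append(entry)
    if not atoms:
        return []  # nothing to rebuild
    # --oneshot avoids adding the rebuilt packages to the world set.
    return ["emerge", "--ask", "--oneshot", *atoms]
```

The returned list can be inspected first and then handed to `subprocess.run()` as root.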

Rebuild all packages
Because the corruption/bug may have other side effects, it is strongly recommended that, if any corruption is detected, all packages on the system be rebuilt after following the above steps:

Post-mortem

 * glibc packaging changes
   * glibc now depends on a newer version of pax-utils
   * glibc now has an explicit comment about checking the pax-utils lower bound
   * pax-utils now has an explicit comment about checking glibc when bumping pax-utils
   * Created a checklist for bumping
   * ebuild now references the new checklist.


 * Portage
   * Portage seemingly wasn't handling error-checking correctly when failed.
     * See PR (and commit).
     * Future work: We may invoke scanelf without seccomp within Portage, or attempt it once with seccomp on a known binary and fall back if it fails.
     * Further-into-the-future work: Possibly replace usage within Portage with a native Python solution, e.g. pyelftools.
   * Portage did not warn when installing a package with inconsistent metadata.
     * See PR.
   * Portage installed binpkgs with corrupt VDB metadata without any warnings.
     * See same PR.
     * Future work: Portage could try to fix the VDB if it notices corruption?

Links

 * ('sys-apps/portage: fails to sometimes preserve library (libffi.so.7) (ImportError: libffi.so.7: cannot open shared object file: No such file or directory)')