Project:Toolchain/Corrupt VDB ELF files
Background
The key facts are as follows:
VDB
- The VDB is a set of files with metadata about installed packages, typically under /var/db/pkg/. Information about each package is recorded there at emerge-time about the contents of the package, its dependencies, the build environment, etc.
Problem point
- If Portage was at some point unable to run scanelf (from app-misc/pax-utils) during a package install, it may have generated incomplete ELF metadata inside the VDB, such as missing NEEDED, NEEDED.ELF.2, PROVIDES, or REQUIRES files.
- These are the files that Portage refers to when determining which old libraries need to be preserved during an upgrade because of existing consumers of the old versions.
- Only packages installed while scanelf was unavailable are broken.
- (So, one could search /var/log/emerge.log for packages installed between the sys-libs/glibc major version upgrade and the subsequent app-misc/pax-utils upgrade.)
- Typically an affected system will have some packages that include shared libraries (.so files listed in their CONTENTS file), but have no PROVIDES and/or NEEDED* files.
- This corruption may not reveal itself until a change occurs in the corrupted packages, e.g. new major version with ABI incompatibility and a new SONAME.
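The symptom described above can be checked by hand. The following is a minimal sketch, not the logic used by the recovery tool: the function name find_suspect_vdb_entries is hypothetical, and it assumes the usual CONTENTS layout in which regular files appear as lines starting with "obj ". It flags packages that install a file named like a shared library but lack a PROVIDES or NEEDED file. Treat the result as a heuristic only, since some packages (for example ones that ship only plugins without a SONAME) may legitimately lack PROVIDES.

```shell
# Hypothetical helper: list VDB entries that install *.so files
# but have no PROVIDES or no NEEDED file next to CONTENTS.
find_suspect_vdb_entries() {
    vdb_root=${1:-/var/db/pkg}
    for contents in "$vdb_root"/*/*/CONTENTS; do
        # Skip the unexpanded glob when nothing matches.
        [ -f "$contents" ] || continue
        pkgdir=${contents%/CONTENTS}
        # Assumption: regular files are recorded as "obj <path> <md5> <mtime>".
        if grep -Eq '^obj .*\.so(\.[0-9]+)* ' "$contents"; then
            if [ ! -e "$pkgdir/PROVIDES" ] || [ ! -e "$pkgdir/NEEDED" ]; then
                # Print category/package-version relative to the VDB root.
                echo "${pkgdir#"$vdb_root"/}"
            fi
        fi
    done
}
```

Calling find_suspect_vdb_entries with no argument scans /var/db/pkg; passing a directory scans a copy of the VDB instead.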
Impact
- If a change occurs in a package with corrupted metadata, Portage will, because of the corrupt database, fail to detect that the older version of the library is still required.
- It may then remove older versions of the library that other packages still depend on, which could leave those packages broken until they are e.g. rebuilt.
- Most of the time, if an old library is removed when it was still needed by some package, this will have only a fairly local effect: said package can't run until it is rebuilt against the newer library.
- However, the worst-case scenario is when a library critical to Portage has been removed, e.g. during a dev-libs/libffi upgrade.
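To see which installed packages the VDB currently records as consumers of a given library, the NEEDED.ELF.2 files can be searched directly. This is a rough sketch under stated assumptions: the function name vdb_consumers_of is hypothetical, and it assumes each NEEDED.ELF.2 line is semicolon-separated with the fifth field holding a comma-separated list of required SONAMEs. It is not how Portage itself performs the lookup.

```shell
# Hypothetical helper: print VDB entries whose NEEDED.ELF.2 lists a SONAME.
vdb_consumers_of() {
    soname=$1
    vdb_root=${2:-/var/db/pkg}
    for needed in "$vdb_root"/*/*/NEEDED.ELF.2; do
        # Skip the unexpanded glob when nothing matches.
        [ -f "$needed" ] || continue
        # Assumption: field 5 is the comma-separated list of needed SONAMEs.
        if awk -F';' -v want="$soname" '
            {
                n = split($5, libs, ",")
                for (i = 1; i <= n; i++)
                    if (libs[i] == want) found = 1
            }
            END { exit !found }
        ' "$needed"; then
            pkgdir=${needed%/NEEDED.ELF.2}
            echo "${pkgdir#"$vdb_root"/}"
        fi
    done
}
```

For example, vdb_consumers_of libffi.so.7 lists the recorded consumers of that SONAME. A package whose NEEDED.ELF.2 file is missing can never appear in this output, which is why a real consumer can be overlooked when the old library is removed.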
Solution
Install recovery tool
root #
emerge --ask app-portage/recover-broken-vdb
It provides two user-facing scripts:
- recover-broken-vdb-find-broken.sh - to find broken packages (detection)
- recover-broken-vdb - to fix the VDB (mitigation/fix)
Check for broken files
Run the detection shell script provided by app-portage/recover-broken-vdb and place a list of broken packages into broken_vdb_packages:
root #
recover-broken-vdb-find-broken.sh | tee broken_vdb_packages
It is only necessary to run this as a one-off, given fixes have been made to Portage and the Gentoo repository.
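Before continuing, it can help to know how many entries were reported. The sketch below counts them; the function name count_broken_atoms is hypothetical, and it assumes (based on the rebuild commands later on this page) that lines containing '#' are comments and every other non-empty line is one package atom.

```shell
# Hypothetical helper: count the package atoms in the list written by tee.
count_broken_atoms() {
    list=${1:-broken_vdb_packages}
    # Drop comment lines, then count the non-empty lines that remain.
    # grep -c exits 1 when the count is zero; that is not an error here.
    grep -v '#' "$list" | grep -c . || :
}
```

A result of 0 suggests that the detection script found nothing to repair.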
Fix up VDB (recommended, but optional)
This step may take several minutes (possibly with no output) depending on machine speed and the number of installed packages.
Back up the VDB first:
root #
cp -r /var/db/pkg /var/db/pkg.orig
Run the tool to generate corrected VDB entries. By default it writes them to a temporary location rather than modifying the VDB:
root #
recover-broken-vdb
The tool must be run again to actually make changes.
If the output looks correct, either manually merge the temporary directory it creates with the VDB at /var/db/pkg or run the tool again as:
root #
recover-broken-vdb --output /var/db/pkg
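One way to judge whether the generated output looks correct is to compare it with the live VDB file by file. The following is a sketch only: the function name preview_vdb_changes is hypothetical, the path of the temporary directory must be supplied by the reader, and it assumes that directory mirrors the category/package layout of /var/db/pkg.

```shell
# Hypothetical helper: show how files under a generated directory differ
# from the matching files in the VDB.
preview_vdb_changes() {
    new_root=$1
    vdb_root=${2:-/var/db/pkg}
    # List generated files relative to new_root, then compare each one.
    ( cd "$new_root" && find . -type f ) | while IFS= read -r rel; do
        rel=${rel#./}
        if [ -f "$vdb_root/$rel" ]; then
            # diff exits 1 when the files differ; that is expected here.
            diff -u "$vdb_root/$rel" "$new_root/$rel" || :
        else
            echo "new file: $rel"
        fi
    done
}
```

Invoke it with the temporary directory as the first argument. Files that exist only in the generated directory are reported as "new file", which is what one would expect to see for previously missing PROVIDES or NEEDED* files.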
Rebuild affected packages
Upgrade app-misc/pax-utils to ensure no future corruption occurs:
root #
emerge --ask --verbose --oneshot ">=app-misc/pax-utils-1.3.3"
Then rebuild the broken packages, as the system should now be in a safe enough state to do so:
root #
emerge --ask --verbose --oneshot --usepkg=n $(grep -v '#' broken_vdb_packages)
If the exact package versions listed are no longer available (the previous command fails), relax the version constraints instead:
root #
emerge --ask --verbose --oneshot --usepkg=n $(grep -v '#' broken_vdb_packages | sed -e "s:^=:>=:")
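The effect of the grep and sed pipeline in that command can be seen on a small sample. In the sketch below, the function name relax_atoms and the sample atom are illustrative only; the pipeline itself is the one used above.

```shell
# Hypothetical wrapper around the pipeline used in the command above.
relax_atoms() {
    # Drop comment lines, then turn a leading "=" into ">=" so that any
    # equal or newer version satisfies the atom.
    grep -v '#' | sed -e 's:^=:>=:'
}

# Sample input: one comment line and one exact-version atom (made up).
printf '%s\n' '# comment line' '=dev-libs/libffi-3.3-r2' | relax_atoms
# prints: >=dev-libs/libffi-3.3-r2
```

Only a leading "=" is rewritten, so the rest of each atom is left untouched.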
Rebuild all packages
Note that binary packages may need to be discarded given they may contain corrupt metadata. It may be possible to skip discarding them if a system with all of them installed has no detected corruption using the above tools.
Given that there are possible other side-effects of the corruption/bug, it is strongly recommended that if any corruption is detected, all packages on the system should be rebuilt, after following the above steps:
root #
emerge --ask --emptytree --usepkg=n @world
Post-mortem
- glibc packaging changes
- glibc now depends on a newer version of pax-utils
- glibc now has an explicit comment about checking the pax-utils lower bound
- pax-utils now has an explicit comment about checking glibc when bumping pax-utils
- Created a checklist for bumping sys-libs/glibc
- ebuild now references the new checklist.
- Portage
- Portage seemingly wasn't checking for errors correctly when scanelf failed.
- Portage did not warn when installing a package with inconsistent metadata.
- See PR.
- Portage installed binpkgs with corrupt VDB metadata without any warnings.
- See same PR.
- Future work: Portage could try to fix the VDB if it notices corruption?
Links
- bug #811462 ('sys-apps/portage: fails to sometimes preserve library (libffi.so.7) (ImportError: libffi.so.7: cannot open shared object file: No such file or directory)')
- Fix my Gentoo — rescuing an installation when a chroot is not possible