Difference between revisions of "Project:Toolchain/Corrupt VDB ELF files"

Latest revision as of 05:26, 18 December 2021

Background

The key facts are as follows:

VDB

  • The VDB is a set of files with metadata about installed packages, typically under /var/db/pkg/. At emerge time, Portage records information about each package there: its contents, its dependencies, the build environment, etc.

Problem point

  • If Portage was at some point unable to run scanelf (from app-misc/pax-utils) during a package install, it may have generated incomplete ELF metadata inside the VDB, such as missing NEEDED, NEEDED.ELF.2, PROVIDES, or REQUIRES files.
    • These are the files that Portage refers to when determining which old libraries need to be preserved during an upgrade because consumers of the old versions still exist.
  • Only packages installed while scanelf was unavailable are broken.
  • Typically, an affected system will have some packages that install shared libraries (.so files listed in their CONTENTS files) but have no PROVIDES and/or NEEDED* files.
  • This corruption may not reveal itself until a change occurs in the corrupted packages, e.g. a new major version with an ABI incompatibility and a new SONAME.
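The symptom described above can be approximated with a short shell sketch. This is only a rough heuristic for illustration, not the logic used by the actual detection script: `find_suspect_vdb_entries` is a hypothetical helper name, and the sketch assumes that CONTENTS lines for regular files start with `obj` followed by the path. It may report false positives, for example for packages whose .so files legitimately provide no SONAME.

```shell
# find_suspect_vdb_entries is a hypothetical helper, not part of any Gentoo tool.
# It prints category/package-version for every VDB entry whose CONTENTS file
# lists a shared object but which lacks a PROVIDES or NEEDED file.
find_suspect_vdb_entries() {
    vdb_root="${1:-/var/db/pkg}"
    for pkg_dir in "$vdb_root"/*/*/; do
        # Skip anything that is not a package entry (also covers an unmatched glob).
        [ -f "${pkg_dir}CONTENTS" ] || continue
        # Assumed line format: obj <path> <md5> <mtime>
        if grep -Eq '^obj .*\.so(\.[0-9]+)* ' "${pkg_dir}CONTENTS"; then
            if [ ! -e "${pkg_dir}PROVIDES" ] || [ ! -e "${pkg_dir}NEEDED" ]; then
                pkg_dir="${pkg_dir%/}"          # strip the trailing slash
                category_dir="${pkg_dir%/*}"    # path up to the category
                echo "${category_dir##*/}/${pkg_dir##*/}"
            fi
        fi
    done
}
```

Calling `find_suspect_vdb_entries /var/db/pkg` would list candidates for manual inspection. The detection script shipped with the recovery tool described below remains the authoritative check.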

Impact

  • If a change occurs in a package with corrupted metadata, the corrupt database prevents Portage from detecting that the older version of the library is still required.
    • It may then remove older versions of the library that other packages still depend on (until they are rebuilt, for example), which could leave those packages in a broken state.
  • Most of the time, if an old library is removed when it was still needed by some package, this will have only a fairly local effect: said package can't run until it is rebuilt against the newer library.
    • However, the worst-case scenario is when a library critical to Portage has been removed, e.g. during a dev-libs/libffi upgrade.

Solution

Install recovery tool

root #emerge --ask app-portage/recover-broken-vdb

It provides two user-facing scripts:

  1. recover-broken-vdb-find-broken.sh - to find broken packages (detection)
  2. recover-broken-vdb - to fix the VDB (mitigation/fix)

Check for broken files

Run the detection shell script provided by app-portage/recover-broken-vdb and place a list of broken packages into broken_vdb_packages:

root #recover-broken-vdb-find-broken.sh | tee broken_vdb_packages

This only needs to be run once, because fixes have since been made to Portage and the Gentoo repository.

Fix up VDB (recommended, but optional)

Tip
This step may take several minutes (possibly with no output) depending on machine speed and the number of installed packages.
Warning
Back up the VDB first:
root #cp -r /var/db/pkg /var/db/pkg.orig

Run the tool to generate corrections to the VDB. By default, it writes its output to a temporary location rather than modifying the VDB itself:

root #recover-broken-vdb

The tool must be run again to actually make changes.

If the output looks correct, either manually merge the temporary directory it creates with the VDB at /var/db/pkg or run the tool again as:

root #recover-broken-vdb --output /var/db/pkg
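Before merging, it can help to see which files the tool would add or change. The following is a minimal sketch under stated assumptions: `list_vdb_changes` is a hypothetical helper, and it assumes the temporary output directory mirrors the category/package layout of the VDB. Pass it whatever directory the tool reported writing to.

```shell
# list_vdb_changes is a hypothetical helper, not part of recover-broken-vdb.
# Usage: list_vdb_changes <tool-output-dir> [vdb-root]
# Assumes <tool-output-dir> mirrors the VDB layout (category/package/file).
list_vdb_changes() {
    fixed_dir="${1%/}"
    live_vdb="${2:-/var/db/pkg}"
    find "$fixed_dir" -type f | sort | while IFS= read -r fixed_file; do
        # Path of the file relative to the output directory.
        rel_path="${fixed_file#"$fixed_dir"/}"
        if [ ! -e "$live_vdb/$rel_path" ]; then
            echo "new: $rel_path"
        elif ! cmp -s "$fixed_file" "$live_vdb/$rel_path"; then
            echo "changed: $rel_path"
        fi
    done
}
```

Files that are identical on both sides produce no output, so an empty result suggests there is nothing to merge.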

Rebuild affected packages

Upgrade app-misc/pax-utils to ensure no future corruption occurs:

root #emerge --ask --verbose --oneshot ">=app-misc/pax-utils-1.3.3"

Then rebuild the broken packages, as the system should now be in a safe enough state to do so:

root #emerge --ask --verbose --oneshot --usepkg=n $(grep -v '#' broken_vdb_packages)

If the relevant packages are no longer on your system (previous command fails), you can do this instead:

root #emerge --ask --verbose --oneshot --usepkg=n $(grep -v '#' broken_vdb_packages | sed -e "s:^=:>=:")
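To illustrate what the grep and sed stages of the command above do, the sketch below runs them over a few made-up lines. The package atoms and the comment line are illustrative only, and `relax_atoms` is a hypothetical wrapper name. The sed expression implies that entries in broken_vdb_packages start with `=` (an exact-version atom); it rewrites that to `>=`, so any installed or available version at least as new is accepted.

```shell
# relax_atoms is a hypothetical wrapper around the two filters used above.
relax_atoms() {
    # Drop lines containing '#', then turn a leading "=" into ">=".
    grep -v '#' | sed -e 's:^=:>=:'
}

# Illustrative input only: these atoms and the comment line are made up.
printf '%s\n' \
    '# example comment line' \
    '=dev-libs/libffi-3.3-r2' \
    '=sys-libs/zlib-1.2.11-r4' | relax_atoms
# prints:
# >=dev-libs/libffi-3.3-r2
# >=sys-libs/zlib-1.2.11-r4
```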

Rebuild all packages

Warning
Binary packages may need to be discarded because they may contain corrupt metadata. Discarding them may be unnecessary if the above tools detect no corruption on a system that has all of them installed.

Because the corruption/bug may have other side effects, it is strongly recommended that, if any corruption is detected, all packages on the system be rebuilt after following the above steps:

root #emerge --ask --emptytree --usepkg=n @world

Post-mortem

  • glibc packaging changes
    • glibc now depends on a newer version of pax-utils
    • glibc now has an explicit comment about checking the pax-utils lower bound
    • pax-utils now has an explicit comment about checking glibc when bumping pax-utils
    • Created a checklist for bumping sys-libs/glibc
  • Portage
    • Portage seemingly wasn't handling error-checking correctly when scanelf failed.
      • See PR (and commit).
      • Future work: We may invoke scanelf without seccomp within Portage, or attempt it once w/ seccomp with a known binary, and fall back if it fails.
      • Further-into-the-future work: Possibly replace scanelf usage within Portage with a native Python solution, e.g. pyelftools.
    • Portage did not warn when installing a package with inconsistent metadata.
    • Portage installed binpkgs with corrupt VDB metadata without any warnings.
      • See same PR.
    • Future work: Portage could try to fix the VDB if it notices corruption?

Links

  • bug #811462 ("sys-apps/portage: fails to sometimes preserve library (libffi.so.7) (ImportError: libffi.so.7: cannot open shared object file: No such file or directory)")
  • Fix my Gentoo — rescuing an installation when a chroot is not possible