Project:Infrastructure/Mirrors/Distfile Mirroring System
This guide describes how our distfile process flow works, including how a new tarball gets added to the distfile mirrors.
Placing files on the Gentoo Mirror system
The mirror system automatically fetches any distfile that is referenced in the ebuild tree. Developers don't have to do anything unless an error occurs. The mirror system is designed to propagate files to all nodes within 4 hours of their arrival on the Master Private Dist Mirror, using cron jobs that pull from the Master. Due to various issues, it may take as long as 24 hours for your file to propagate to all nodes. If you suspect that your file is not being fetched, simply check the failure report.
If your ebuild contains RESTRICT="mirror", the file will not be mirrored. The only exception is a file referenced via mirror://gentoo/. The mirror system handles this automatically; no manual intervention is required.
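The rule above can be sketched as a small predicate. This is only an illustration of the rule as stated in this guide, not code taken from the mirror scripts; the function name `should_mirror` and its parameters are hypothetical.

```python
def should_mirror(src_uri: str, restrict: str) -> bool:
    """Decide whether a distfile would be mirrored under the rule above.

    src_uri  -- one SRC_URI entry from an ebuild
    restrict -- the ebuild's RESTRICT value as a whitespace-separated string

    Hypothetical helper: the real mirror scripts may decide differently.
    """
    # Files referenced via mirror://gentoo/ are the documented exception.
    if src_uri.startswith("mirror://gentoo/"):
        return True
    # Otherwise a "mirror" token in RESTRICT keeps the file off the mirrors.
    return "mirror" not in restrict.split()
```

For example, `should_mirror("mirror://gentoo/foo-1.0.tar.gz", "mirror")` is true under this sketch, while the same restriction with an ordinary upstream URI yields false.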
Files placed in dev.gentoo.org:/space/distfiles-whitelist/current will be retained for six months unless manually deleted earlier. Place here any file that you want to be retained on the mirror system, even if no ebuild refers to it. Keep in mind that the mirror system retains a file for two weeks after it is last referred to in an ebuild, so only use distfiles-whitelist if absolutely necessary.
All entries in dev.gentoo.org:/space/distfiles-whitelist/current MUST come with a comment in the same format as profiles/package.mask. If you wish to whitelist a lot of files, you should create a separate file in the same directory instead.
Placing files in the distfiles-whitelist takes them out of the control of the Mirror System. If you remove a file from the whitelist, the system automatically takes back control and cleans the file up as normal. Files are automatically removed from the whitelist after six months.
Automatic fetch failure
When the automatic fetch fails, it is the responsibility of the package maintainer to manually retrieve the file from the original location and place it in /space/distfiles-local on dev.gentoo.org. This directory is published over rsync, and the private master distfile mirror connects to it and retrieves any files it contains. These files are synchronized to /home/distfiles/distfiles-local on the private master distfile mirror. From there, /home/distfiles/scripts/distsync.sh runs every 30 minutes to synchronize /home/distfiles/distfiles-local and /home/distfiles/distfiles on the private master distfile mirror. Files placed in distfiles-local are automatically removed after two weeks, at which point the Mirror System takes control of the file.
Files placed in distfiles-local will override existing files of the same name, taking them out of the control of the Mirror System for the full two weeks that the file resides in distfiles-local. If you place a file here, make sure that it does not already exist on the mirrors, or breakage could occur.
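To make the override behaviour concrete, here is a minimal model of an overlay copy from a distfiles-local directory into a distfiles directory. It is not the actual distsync.sh, whose contents are not shown in this guide; the function name `overlay_sync` is hypothetical and the copy semantics are an assumption based on the description above.

```python
import shutil
from pathlib import Path
from typing import List


def overlay_sync(local_dir: Path, dist_dir: Path) -> List[str]:
    """Copy every regular file from local_dir into dist_dir.

    A same-named file in dist_dir is replaced, which is the override
    described above. Returns the names of the files that were overwritten
    so a caller can spot accidental clobbering.
    """
    overwritten = []
    for src in sorted(local_dir.iterdir()):
        if not src.is_file():
            continue  # this sketch ignores subdirectories
        dst = dist_dir / src.name
        if dst.exists():
            overwritten.append(src.name)
        shutil.copy2(src, dst)
    return overwritten
```

A non-empty return value corresponds to the case the warning is about: a manually placed file has replaced one that the Mirror System was already serving.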
The mirror system only downloads the first instance of a file name. If subsequent ebuilds reference the same file name, the checksums recorded for the two URIs are compared. If they do not match, the second file will not be fetched; the mirror system produces an error and human intervention is required. Please check file names carefully.
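The "first instance wins" rule can be illustrated as follows. This is a sketch under the assumption that each reference is available as a (file name, checksum, ebuild) triple; the function name and data shapes are hypothetical and do not come from mirror-dist.sh.

```python
from typing import Dict, Iterable, List, Tuple


def find_checksum_conflicts(
    entries: Iterable[Tuple[str, str, str]],
) -> Tuple[Dict[str, Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Apply the first-instance rule to (filename, checksum, ebuild) triples.

    Returns:
      accepted  -- filename -> (checksum, ebuild) of the first reference seen
      conflicts -- (filename, first_ebuild, conflicting_ebuild) for every
                   later reference whose checksum differs from the first
    """
    accepted: Dict[str, Tuple[str, str]] = {}
    conflicts: List[Tuple[str, str, str]] = []
    for filename, checksum, ebuild in entries:
        first = accepted.get(filename)
        if first is None:
            # First time this file name is seen: it is the one fetched.
            accepted[filename] = (checksum, ebuild)
        elif first[0] != checksum:
            # Same name, different content: not fetched, reported instead.
            conflicts.append((filename, first[1], ebuild))
    return accepted, conflicts
```

Each entry in `conflicts` would correspond to a "checksum conflict" line in the failure report, and is resolved by renaming one of the distfiles.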
Common fetch errors:
- URI port must be 80, 443, or 23
- URI is malformed (mirrors:// is a common mistake, mirror:// is proper)
- Mirror target isn't valid (doesn't specify a valid tier)
- Checksum conflict with another ebuild in the tree - check your file name
- Upstream host timeout while attempting to connect - Mirror System will reattempt at next pass
- Upstream host isn't valid - check your URL name.
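Two of the errors above (a disallowed port and the mirrors:// typo) can be caught before committing an ebuild. The following is a rough pre-flight check, not the validation performed by the mirror scripts; the function name `check_uri` is hypothetical, the port list is copied verbatim from this guide, and only explicitly specified ports are checked.

```python
from typing import List
from urllib.parse import urlsplit

# Allowed ports exactly as listed in this guide.
ALLOWED_PORTS = {80, 443, 23}


def check_uri(uri: str) -> List[str]:
    """Return a list of problems found in a SRC_URI entry (empty if none)."""
    if uri.startswith("mirrors://"):
        return ["malformed URI: use mirror:// rather than mirrors://"]
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError if the port is not numeric
    except ValueError:
        return ["malformed URI: could not parse host or port"]
    if not parts.scheme or not parts.netloc:
        return ["malformed URI: missing scheme or host"]
    if port is not None and port not in ALLOWED_PORTS:
        return ["URI port %d is not one of %s" % (port, sorted(ALLOWED_PORTS))]
    return []
```

Checksum conflicts, timeouts and invalid upstream hosts cannot be detected this way, since they depend on the state of the tree and the network.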
Technical details and requirements
master private distfile mirror
Three related scripts automatically fetch source tarballs, place them on and remove them from the mirror system, and generate an exception report: update_distfiles.sh, mirror-dist.sh and gen-report-xml.py. These scripts run on osprey.gentoo.org and are all currently maintained by zmedico and ferringb.
The master script is /home/distfiles/scripts/update_distfiles.sh, which runs once every four hours via a cron job. The /home/distfiles/scripts/mirror-dist.sh script maintains a database of the death watch and purgatory lists. The /home/distfiles/scripts/gen-report-xml.py script generates an XML file (/home/distfiles/reports/failure.xml) based on /home/distfiles/logs/failure.log.
The master private distfile mirror needs a distfiles user account. This account should be configured to run /home/distfiles/distfiles/scripts/update_distfiles.sh every four hours. Files are placed in /mnt/distfiles/distfiles, which is configured in /etc/rsync/rsyncd.conf to be available as an rsync module. From there, gentoo.oregonstate.edu runs an hourly cron job that syncs this directory. gentoo.oregonstate.edu provides a password-protected rsync module; its access details are distributed only to official Gentoo distfile mirrors. Each mirror should synchronize with this module once every four hours.
- A distfiles user account on the private master distfile mirror
- The update_distfiles.sh , mirror-dist.sh and gen-report-xml.py scripts
- /mnt/distfiles/distfiles configured as an rsync module
- The necessary cron job set up to run the master script, update_distfiles.sh, every four hours
Step by Step
- update_distfiles.sh calls mirror-dist.sh
- mirror-dist.sh calls ebuild which scans the tree and collects file/digest pairs.
- If the URI is a mirror, verify the mirror URI. If invalid, fail and write an error in the fail log.
- If an existing file is found on the mirror system, its checksum is verified. If it matches, the file is used. If it fails, the file is deleted.
- Files that don't exist on the mirror system yet are downloaded from the source URIs until the file is complete or all source URIs are exhausted.
- Once all files are complete the death-watch database is updated by recursing the tree and looking for any files that exist on the mirror system but do not appear in any ebuild.
- Any file that doesn't exist in an ebuild is added to death watch.
- Any file that has been on death watch for more than two weeks is moved to purgatory.
- Files in purgatory are removed after two weeks.
- Exceptions to the death watch list can be added in /space/distfiles-whitelist
- Files removed from the whitelist are deleted from the mirror system as normal.
- Dump stats
- update_distfiles.sh calls gen-report-xml.py
- gen-report-xml.py creates a report from the stats.
- The report is copied to http://dev.gentoo.org/~zmedico/infra/distfiles/failure.xml via cronjob.
- /space/distfiles-local configured as an rsync module on dev.gentoo.org
- An rsync command to synchronize dev.gentoo.org::distfiles-local with /home/distfiles/distfiles-local on the private master distfile mirror.
- The distsync.sh script
- The necessary cron jobs set up to run the above scripts and commands at the right times.
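The death watch and purgatory steps listed under Step by Step can be read as a small state machine per file. The sketch below models one pass of that lifecycle. It is an illustration only: the state names, the function `next_state`, and the assumption that a file referenced or whitelisted again returns to normal from either list are not taken from mirror-dist.sh.

```python
from datetime import datetime, timedelta
from typing import Tuple

DEATH_WATCH_PERIOD = timedelta(weeks=2)  # time on death watch before purgatory
PURGATORY_PERIOD = timedelta(weeks=2)    # time in purgatory before removal


def next_state(
    state: str,
    since: datetime,
    referenced: bool,
    whitelisted: bool,
    now: datetime,
) -> Tuple[str, datetime]:
    """Advance one file by one pass; returns (new_state, state_entered_at).

    state is one of "live", "death-watch", "purgatory" or "removed".
    """
    if state == "removed":
        return state, since  # terminal in this sketch
    if referenced or whitelisted:
        # Assumption: a referenced or whitelisted file leaves both lists.
        return ("live", since if state == "live" else now)
    if state == "live":
        return "death-watch", now  # no ebuild refers to the file any more
    if state == "death-watch" and now - since > DEATH_WATCH_PERIOD:
        return "purgatory", now
    if state == "purgatory" and now - since > PURGATORY_PERIOD:
        return "removed", now
    return state, since
```

Under this model an unreferenced, non-whitelisted file disappears roughly four weeks after the last ebuild referring to it is removed from the tree.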
We would like to thank the following authors and editors for their contributions to this guide:
- Kurt Lieber
- Curtis Napier
- Zach Medico
- Brian Harring
- Robin H. Johnson