Project:Infrastructure/Redirector

Objective
Replace distfiles.g.o and bouncer.g.o with a new service, redirector.g.o, that solves the problems of both old services, which overlap heavily.

Non-objectives

 * Replace distfiles.g.o with a CDN service
 * This may happen in parallel in future, but very high bandwidth costs should be considered
 * Replace with a caching proxy that serves distfiles from an LRU cache
 * High bandwidth cost and hosting costs

Problems

 * bouncer.g.o needs many things fixed:
 * Bouncer is very obsolete and not maintained upstream
 * Bouncer did do a good job of checking, via HTTP HEAD requests, that files existed before redirecting users to them.
 * Bouncer itself has a very high latency from some parts of the world.
 * Prior Bouncer replacement attempts were a failure
 * Mirrorbrain is tightly coupled with Apache and has specific requirements for mirrors to run checks locally
 * Configuration for bouncer does not scale:
 * Mirror configuration is manual & painful
 * File configuration painful


 * distfiles.g.o needs many things fixed:
 * How it works: distfiles.g.o is run as a DNS round-robin across a small subset of Gentoo mirrors that agreed to respond to HTTP vhost requests for Host: distfiles.gentoo.org.
 * Despite the name, it serves distfiles, releases, experimental, snapshots
 * Performance is terrible: very small set of mirrors involved
 * SSL:
 * HTTPS is becoming mandatory in browsers, breaking access to http://distfiles.gentoo.org
 * Unless we provide & maintain SSL certificates for each participating mirror, the service will break in the near future. Mirrors have indicated an unwillingness to raise their maintenance cost.

User Story
A user should be able to use a single service that redirects to a local mirror on which the requested distfiles, releases, or snapshots exist.

Redirector

 * Lightweight on-demand HTTP redirection
 * Scale-to-zero required

Checker

 * checking objects:
 * validate existence of objects: most important
 * validate size of objects: important
 * validate content of objects: not important, other validation paths exist
 * validate non-existence of old objects: least important, mirrors sometimes delete at a slower rate
 * re-validate prior objects


 * Able to efficiently check large numbers of objects on every mirror
 * distfiles: ~73000 files, ~73000 symlinks-to-files, ~300 directories
 * releases, experimental, snapshots: ~2400 files, ~350 directories, ~70 symlinks-to-files, ~80 symlinks-to-directory


 * Able to handle a large number of mirrors
 * ~60 mirrors (excluding protocol differences)
 * ~155 different mirror access points (~40 ftp, ~60 http, ~30 https, ~25 rsync)
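
The priorities listed under "checking objects" can be read as an ordering of findings for one object on one mirror. The following is a minimal sketch of that ordering, not a committed design: the names `Expected`, `Observed`, `classify`, the `retired` flag, and the result strings are all hypothetical. Content validation is left out because the list above marks it as not important.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expected:
    path: str
    size: int
    retired: bool = False  # hypothetical flag: an old object mirrors should delete


@dataclass(frozen=True)
class Observed:
    exists: bool
    size: Optional[int] = None  # None when the mirror did not report a size


def classify(expected: Expected, observed: Observed) -> str:
    """Return the most important finding for one object on one mirror."""
    if expected.retired:
        # Least important: mirrors sometimes delete at a slower rate.
        return "stale" if observed.exists else "ok"
    if not observed.exists:
        # Most important: a redirect to a missing object fails outright.
        return "missing"
    if observed.size is not None and observed.size != expected.size:
        # Important: the object is there but truncated or outdated.
        return "size_mismatch"
    return "ok"
```

Only an "ok" result for a non-retired object would make that mirror a candidate for redirects under this sketch.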

Proposal
TL;DR: Run a service at CDN-edge-like locations that generates HTTP temporary redirects to objects, with a backing service that checks existence.


 * Leader:
 * Populate storage w/ expected state of all objects
 * Fields:
 * Name/Path
 * Type (file/symlink**/directory)
 * Size
 * Mtime
 * Optional: Checksums (**etag format for some mirrors, expensive to check)
 * Retain knowledge of old objects, do not delete
 * Probably runs on the master distfiles central node w/ emirrordist
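
As an illustration of the Leader's job, the sketch below walks a local tree and records the fields listed above (name/path, type, size, mtime), then merges the result into the previous state without deleting anything. It assumes the Leader can read the master tree from a local filesystem; `ObjectRecord`, `scan_tree`, and `merge_state` are hypothetical names, and checksums are omitted.

```python
import os
import stat
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ObjectRecord:
    path: str   # relative to the tree root, "/"-separated
    type: str   # "file", "symlink" or "directory"
    size: int
    mtime: int


def scan_tree(root: str) -> Dict[str, ObjectRecord]:
    """Record the expected state of every object below root."""
    records = {}
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them, so they are recorded as symlinks.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe the link, not its target
            if stat.S_ISLNK(st.st_mode):
                kind = "symlink"
            elif stat.S_ISDIR(st.st_mode):
                kind = "directory"
            else:
                kind = "file"
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            records[rel] = ObjectRecord(rel, kind, st.st_size, int(st.st_mtime))
    return records


def merge_state(previous: Dict[str, ObjectRecord],
                current: Dict[str, ObjectRecord]) -> Dict[str, ObjectRecord]:
    """Retain knowledge of old objects: update records, never drop them."""
    merged = dict(previous)
    merged.update(current)
    return merged
```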


 * Checker:
 * Concept: fetch metadata for every object and compare to expected state
 * Run HTTP or rsync requests for every object.
 * HTTP HEAD must be done for every object. HTTP pipelining will have large benefits here.
 * rsync -n: should be able to run once for the entire mirror and parse the output.
 * Must: Prioritize checking new objects on all mirrors
 * Should: Run checkers regionally/close to each mirror, because checks are latency-bound
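
A rough sketch of the HTTP side of the Checker follows. It sends one HEAD request per object over a single reused connection and returns whether each object exists and what size the mirror reports. Note that Python's `http.client` does not pipeline; this sketch relies on connection reuse only, so it shows the shape of the check rather than the performance the proposal hopes for. The names `head_sizes` and `newest_first` are hypothetical, and treating only status 200 as "exists" is an assumption.

```python
import http.client
from typing import Dict, Iterable, List, Optional, Tuple


def head_sizes(conn, paths: Iterable[str]) -> Dict[str, Tuple[bool, Optional[int]]]:
    """Issue one HEAD per path on one connection.

    Returns path -> (exists, size); size is None when the object is
    missing or the mirror sent no Content-Length header.
    """
    results = {}
    for path in paths:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # finish the response so the connection can be reused
        exists = resp.status == 200  # assumption: anything else counts as missing
        length = resp.getheader("Content-Length")
        size = int(length) if exists and length is not None else None
        results[path] = (exists, size)
    return results


def newest_first(expected_mtimes: Dict[str, int]) -> List[str]:
    """Order paths so the most recently modified objects are checked first."""
    return sorted(expected_mtimes, key=expected_mtimes.get, reverse=True)
```

With a real mirror, `conn` would be something like `http.client.HTTPSConnection("mirror.example.org", timeout=10)`, where the hostname is a placeholder.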


 * Redirection Data Builder:
 * For each region, build a redirection map, object -> nearest mirror(s)
 * Rebuild maps on some regular cadence (hourly?)
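
One possible shape for the per-region build step is sketched below. It assumes check results arrive as `(mirror, object path, verified)` tuples and that each region has a list of mirrors ordered nearest first; both are illustrative assumptions, as is the function name `build_region_map`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_region_map(
    check_results: Iterable[Tuple[str, str, bool]],  # (mirror, path, verified)
    region_mirrors: List[str],                       # nearest first
    online: Set[str],                                # mirrors currently reachable
) -> Dict[str, List[str]]:
    """Map each object to the region's mirrors that hold it, nearest first."""
    holders = defaultdict(set)
    for mirror, path, verified in check_results:
        if verified:
            holders[path].add(mirror)

    region_map = {}
    for path, mirrors in holders.items():
        valid = [m for m in region_mirrors if m in mirrors and m in online]
        if valid:
            # Objects with no valid mirror in this region are left out,
            # so the redirector falls back for them.
            region_map[path] = valid
    return region_map
```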


 * Storage:
 * Store check data
 * Store redirection maps


 * Redirector:
 * Run using AWS Lambda@Edge OR OpenFaaS at many points (prefer Lambda@Edge for better scaling & locality)
 * Load redirection map for that region
 * If object is known in map, pick one of the valid mirrors in that region to send traffic to
 * Valid: object exists on that mirror && mirror online
 * If object is not known in map, redirect to special fallback host
 * Maybe: rank mirrors?
 * Maybe: set headers for caching proxies to not cache the redirect?
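
The redirect decision itself is small. The sketch below assumes the region map stores mirror base URLs per object path, picks one at random, and falls back to a placeholder host for unknown objects. `FALLBACK_BASE`, `redirect`, and the choice of status 302 with `Cache-Control: no-store` are assumptions for illustration; wiring this into Lambda@Edge or OpenFaaS is not shown.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical placeholder for the "special fallback host".
FALLBACK_BASE = "https://fallback.example.org"


def redirect(path: str, region_map: Dict[str, List[str]],
             rng=random) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a temporary redirect to a mirror."""
    mirrors = region_map.get(path)
    # Known object: pick one valid mirror. Unknown object: use the fallback.
    base = rng.choice(mirrors) if mirrors else FALLBACK_BASE
    headers = {
        "Location": base.rstrip("/") + "/" + path.lstrip("/"),
        # Ask caching proxies not to store the redirect (the "Maybe" item above).
        "Cache-Control": "no-store",
    }
    return 302, headers
```

Ranking mirrors could replace `rng.choice` with a weighted choice, but that is left open here as it is in the list above.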