Project:Infrastructure/Redirector

DRAFT STATUS!

Objective

Replace distfiles.g.o and bouncer.g.o with a single new service, redirector.g.o, that solves the problems of both old services, since the two overlap heavily.

Non-objectives

Replace distfiles.g.o with a CDN service
- This may happen in parallel in the future, but the very high bandwidth costs should be considered
Replace with a caching proxy that serves distfiles from an LRU cache
- High bandwidth and hosting costs

Problems

bouncer.g.o needs many things fixed:
- Bouncer is very obsolete and not maintained upstream
- Bouncer DID do a good job of checking, via HTTP HEAD requests, that files existed before redirecting users to them.
- Bouncer itself has very high latency from some parts of the world.
- Prior attempts to replace Bouncer failed
  - Mirrorbrain is tightly coupled with Apache and has specific requirements for mirrors to run checks locally
- Configuration for bouncer does not scale:
  - Mirror configuration is manual & painful
  - File configuration is painful

distfiles.g.o needs many things fixed:
- How it works: distfiles.g.o is run as a DNS round-robin across a small subset of Gentoo mirrors that agreed to respond to HTTP vhost requests for Host: distfiles.gentoo.org.
- Despite the name, it serves distfiles, releases, experimental, snapshots
- Performance is terrible: very small set of mirrors involved
- SSL:
  - HTTPS is becoming mandatory in browsers, breaking access to http://distfiles.gentoo.org
  - Unless we provide & maintain SSL certificates for each participating mirror, the service will break in the near future. Mirrors have indicated an unwillingness to raise their maintenance cost.

User story

A user should be able to use a single service that redirects them to a local mirror on which the requested distfiles, releases, or snapshots exist.

Requirements

Redirector

Lightweight on-demand HTTP redirection
Scale-to-zero required

Checker

Checking objects:
- validate existence of objects: most important
- validate size of objects: important
- validate content of objects: not important, other validation paths exist
- validate non-existence of old objects: least important, mirrors sometimes delete at a slower rate
- re-validate prior objects

Able to efficiently check large numbers of objects on every mirror
- distfiles: ~73000 files, ~73000 symlinks-to-files, ~300 directories
- releases, experimental, snapshots: ~2400 files, 350 directories, ~70 symlinks-to-files, ~80 symlinks-to-directories

Able to handle a large number of mirrors
- ~60 mirrors (excluding protocol differences)
- ~155 different mirror access points (~40 ftp, ~60 http, ~30 https, ~25 rsync)
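
To give a feel for the scale implied by the figures above, the following back-of-the-envelope calculation multiplies the object counts by the number of access points. It is only an upper bound, assuming every object is checked at every access point in one full pass; the variable names are illustrative.

```python
# Approximate counts taken from the lists above.
distfiles = 73_000 + 73_000 + 300   # files + symlinks-to-files + directories
other = 2_400 + 350 + 70 + 80       # releases, experimental, snapshots
access_points = 40 + 60 + 30 + 25   # ftp + http + https + rsync

objects = distfiles + other
checks_per_full_pass = objects * access_points

print(objects)               # 149200
print(access_points)         # 155
print(checks_per_full_pass)  # 23126000
```

At roughly 23 million metadata checks per full pass, per-object round trips dominate, which is why batching and locality matter for the checker.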

Proposal

TL;DR: Run a service at CDN-edge-like locations that generates HTTP temporary redirects to objects, with a backing service that checks existence.

Leader:
- Populate storage w/ expected state of all objects
- Fields:
  - Name/Path
  - Type (file/symlink**/directory)
  - Size
  - Mtime
  - Optional: Checksums (**etag format for some mirrors, expensive to check)
- Retain knowledge of old objects, do not delete
- Probably runs on the master distfiles central node w/ emirrordist
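
The fields listed above could be captured in a record like the one below. This is a minimal sketch, not a settled schema: the names `ExpectedObject`, `ObjectType`, `retired`, and `describe` are hypothetical, and the `retired` flag is just one assumed way to retain knowledge of old objects without deleting them.

```python
import os
import stat
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ObjectType(Enum):
    FILE = "file"
    SYMLINK = "symlink"
    DIRECTORY = "directory"


@dataclass(frozen=True)
class ExpectedObject:
    path: str                       # name/path relative to the mirror root
    type: ObjectType
    size: int                       # bytes
    mtime: int                      # seconds since the epoch
    checksum: Optional[str] = None  # optional, expensive to verify
    retired: bool = False           # assumption: old objects are flagged, never deleted


def describe(root: str, rel_path: str) -> ExpectedObject:
    """Build the expected state of one object from the leader's local tree."""
    st = os.lstat(os.path.join(root, rel_path))  # lstat: do not follow symlinks
    if stat.S_ISLNK(st.st_mode):
        kind = ObjectType.SYMLINK
    elif stat.S_ISDIR(st.st_mode):
        kind = ObjectType.DIRECTORY
    else:
        kind = ObjectType.FILE
    return ExpectedObject(rel_path, kind, st.st_size, int(st.st_mtime))
```

Using `os.lstat` rather than `os.stat` keeps symlinks distinguishable from their targets, which matters given how many symlinks the distfiles tree contains.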

Checker:
- Concept: fetch metadata for every object and compare to expected state
- Run HTTP or rsync requests for every object.
  - HTTP HEAD must be done for every object. HTTP Pipelining will have large benefits here.
  - rsync -n: should be able to just run for the entire mirror in one pass and parse output.
- Must: Prioritize checking new objects on all mirrors
- Should: Run checkers regionally/close to each mirror, because checks are latency-bound
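
As an illustration of the HTTP path described above, the sketch below issues a HEAD request for one object and compares the response with the expected size. It is deliberately simple: it opens one connection per object and does no pipelining or connection reuse, which a real checker would want. The function names and result labels are hypothetical, and redirects or other unexpected statuses are reported as "error".

```python
import http.client
from typing import Mapping
from urllib.parse import urlsplit


def classify(status: int, headers: Mapping[str, str], expected_size: int) -> str:
    """Compare one HEAD response (lower-cased header names) to the expected state."""
    if status in (404, 410):
        return "missing"
    if status != 200:
        return "error"
    length = headers.get("content-length", "")
    if not length.isdigit():
        return "exists-size-unknown"
    return "ok" if int(length) == expected_size else "size-mismatch"


def head_check(base_url: str, path: str, expected_size: int,
               timeout: float = 10.0) -> str:
    """HEAD a single object on one mirror and classify the result."""
    parts = urlsplit(base_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path.rstrip("/") + "/" + path.lstrip("/"))
        resp = conn.getresponse()
        headers = {k.lower(): v for k, v in resp.getheaders()}
        return classify(resp.status, headers, expected_size)
    finally:
        conn.close()
```

Keeping `classify` separate from the network call means the comparison logic can be exercised without touching a mirror, and the same function could later be fed results parsed from rsync output.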

Redirection Data Builder:
- For each region, build a redirection map, object -> nearest mirror(s)
- Rebuild maps on some regular cadence (hourly?)
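
One possible shape for the builder is sketched below. It assumes that each mirror has already been assigned to a region (so "nearest" reduces to "same region") and that check results arrive as `(mirror, path, ok)` tuples; both are assumptions for illustration, as are the names used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (mirror base URL, object path, whether the last check passed)
CheckResult = Tuple[str, str, bool]


def build_maps(results: Iterable[CheckResult],
               mirror_region: Dict[str, str]) -> Dict[str, Dict[str, List[str]]]:
    """Return region -> object path -> sorted list of mirrors holding the object."""
    maps: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for mirror, path, ok in results:
        region = mirror_region.get(mirror)
        if ok and region is not None:
            maps[region][path].append(mirror)
    # Convert to plain dicts so the result serializes cleanly.
    return {region: {path: sorted(mirrors) for path, mirrors in objects.items()}
            for region, objects in maps.items()}


example = build_maps(
    [("https://m1.example.org", "distfiles/a", True),
     ("https://m2.example.org", "distfiles/a", False),
     ("https://m3.example.org", "distfiles/a", True)],
    {"https://m1.example.org": "eu",
     "https://m2.example.org": "eu",
     "https://m3.example.org": "us"},
)
print(example)
# {'eu': {'distfiles/a': ['https://m1.example.org']}, 'us': {'distfiles/a': ['https://m3.example.org']}}
```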

Storage:
- Store check data
- Store redirection maps

Redirector:
- Run using AWS Lambda@Edge OR OpenFaaS at many points (prefer Lambda@Edge for better scaling & locality)
- Load redirection map for that region
- If object is known in map, pick one of the valid mirrors in that region to send traffic to
  - Valid: object exists on that mirror && mirror online
- If object is not known in map, redirect to special fallback host
- Maybe: rank mirrors?
- Maybe: set headers for caching proxies to not cache the redirect?
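
The redirector logic above can be sketched independently of the hosting platform. The example below is platform-neutral and does not use any Lambda@Edge or OpenFaaS API; `FALLBACK_HOST`, the function name, and the `Cache-Control: no-store` header are assumptions for illustration (the header corresponds to the "Maybe" item on caching proxies). In this sketch an object that is known but has no valid mirror is also sent to the fallback host.

```python
import random
from typing import Dict, List, Set, Tuple

FALLBACK_HOST = "https://fallback.example.org"  # hypothetical fallback host


def redirect(path: str,
             region_map: Dict[str, List[str]],
             online: Set[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a temporary redirect to a valid mirror."""
    # Valid: the object exists on that mirror (it is in the map) and the mirror is online.
    candidates = [m for m in region_map.get(path, []) if m in online]
    base = random.choice(candidates) if candidates else FALLBACK_HOST
    location = base.rstrip("/") + "/" + path.lstrip("/")
    return 302, {"Location": location, "Cache-Control": "no-store"}
```

For example, `redirect("distfiles/a", {"distfiles/a": ["https://m1.example.org"]}, {"https://m1.example.org"})` yields a 302 pointing at `https://m1.example.org/distfiles/a`, while an unknown path is sent to the fallback host. Picking uniformly at random is the simplest choice; ranking mirrors would replace `random.choice`.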