Project:Infrastructure/Redirector

From Gentoo Wiki

DRAFT STATUS!

Objective

Replace distfiles.g.o and bouncer.g.o with a single new service, redirector.g.o, that solves the problems of both old services, which overlap heavily.

Non-objectives

  • Replace distfiles.g.o with a CDN service
    • This may happen in parallel in future, but very high bandwidth costs should be considered
  • Replace with a caching proxy that serves distfiles from an LRU cache
    • High bandwidth cost and hosting costs

Problems

  • bouncer.g.o needs many things fixed:
    • Bouncer is very obsolete and not maintained upstream
    • Bouncer did do a good job of checking, via HTTP HEAD requests, that files existed before redirecting users to them.
    • Bouncer itself has a very high latency from some parts of the world.
    • Prior attempts to replace Bouncer failed
      • Mirrorbrain is tightly coupled with Apache and has specific requirements for mirrors to run checks locally
    • Configuration for bouncer does not scale:
      • Mirror configuration is manual and painful
      • File configuration is painful
  • distfiles.g.o needs many things fixed:
    • How it works: distfiles.g.o is run as a DNS round-robin across a small subset of Gentoo mirrors that agreed to respond to HTTP vhost requests for Host: distfiles.gentoo.org.
    • Despite the name, it serves distfiles, releases, experimental, snapshots
    • Performance is terrible: very small set of mirrors involved
    • SSL:
      • HTTPS is becoming mandatory in browsers, breaking access to http://distfiles.gentoo.org
      • Unless we provide and maintain SSL certificates for each participating mirror, the service will break in the near future. Mirrors have indicated an unwillingness to raise their maintenance cost.

User story

A user should be able to use a single service that redirects to a local mirror on which the requested distfiles, releases, or snapshots exist.

Requirements

Redirector

  • Lightweight on-demand HTTP redirection
  • Scale-to-zero required

Checker

  • checking objects:
    • validate existence of objects: most important
    • validate size of objects: important
    • validate content of objects: not important, other validation paths exist
    • validate non-existence of old objects: least important, mirrors sometimes delete at a slower rate
    • re-validate prior objects
  • Able to efficiently check large numbers of objects on every mirror
    • distfiles: ~73000 files, ~73000 symlinks-to-files, ~300 directories
    • releases, experimental, snapshots: ~2400 files, 350 directories, ~70 symlinks-to-files, ~80 symlinks-to-directories
  • Able to handle a large number of mirrors
    • ~60 mirrors (excluding protocol differences)
    • ~155 different mirror access points (~40 ftp, ~60 http, ~30 https, ~25 rsync)
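
To give a rough sense of scale, the figures above can be multiplied out. The sketch below is illustrative only. It assumes that every file and symlink-to-file is checked with one HTTP HEAD request on every http and https access point, and that directories and the ftp and rsync access points are left out. None of these assumptions are part of the requirements.

```python
# Approximate object counts taken from the requirements above.
DISTFILES = {"files": 73_000, "symlinks_to_files": 73_000}
RELEASES = {"files": 2_400, "symlinks_to_files": 70}

# Approximate number of access points per protocol.
ACCESS_POINTS = {"ftp": 40, "http": 60, "https": 30, "rsync": 25}


def head_requests_per_full_pass(trees, access_points):
    """Estimate HEAD requests for one full pass over all HTTP(S) endpoints.

    Assumption: one HEAD per file or symlink-to-file per http/https endpoint.
    """
    objects = sum(sum(tree.values()) for tree in trees)
    endpoints = access_points["http"] + access_points["https"]
    return objects * endpoints


total = head_requests_per_full_pass([DISTFILES, RELEASES], ACCESS_POINTS)
print(f"{total:,} HEAD requests per full pass")
```

Under these assumptions a full pass comes to roughly 13 million requests, which suggests why per-request latency dominates the cost of checking.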

Proposal

TL;DR: Run a service at CDN-edge-like locations that generates HTTP temporary redirects to objects, with a backing service that checks existence.

  • Leader:
    • Populate storage w/ expected state of all objects
    • Fields:
      • Name/Path
      • Type (file/symlink**/directory)
      • Size
      • Mtime
      • Optional: Checksums (**etag format for some mirrors, expensive to check)
    • Retain knowledge of old objects, do not delete
    • Probably runs on the master distfiles central node w/ emirrordist
  • Checker:
    • Concept: fetch metadata for every object and compare to expected state
    • Run HTTP or rsync requests for every object.
      • HTTP HEAD must be done for every object. HTTP Pipelining will have large benefits here.
      • rsync -n: should be able to just run for the entire mirror in one pass and parse output.
    • Must: Prioritize checking new objects on all mirrors
    • Should: Run checkers regionally/close to each mirror, because checks are latency-bound
  • Redirection Data Builder:
    • For each region, build a redirection map, object -> nearest mirror(s)
    • Rebuild maps on some regular cadence (hourly?)
  • Storage:
    • Store check data
    • Store redirection maps
  • Redirector:
    • Run using AWS Lambda@Edge OR OpenFaaS at many points (prefer Lambda@Edge for better scaling & locality)
    • Load redirection map for that region
    • If object is known in map, pick one of the valid mirrors in that region to send traffic to
      • Valid: object exists on that mirror && mirror online
    • If object is not known in map, redirect to special fallback host
    • Maybe: rank mirrors?
    • Maybe: set headers for caching proxies to not cache the redirect?
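
The path from check results to a redirect can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed implementation. The names `ExpectedObject`, `build_region_map`, `redirect`, and `FALLBACK_HOST` are hypothetical, as are the example hosts. A mirror copy is treated as valid when it exists with the expected size, the mirror is picked at random among the valid ones, status 302 stands in for "temporary redirect", and objects with no valid mirror in the region are left out of the map so that they go to the fallback host. A real redirector would wrap this logic in a Lambda@Edge or OpenFaaS handler.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical fallback host for objects that are not in the map.
FALLBACK_HOST = "https://fallback.example.org"


@dataclass(frozen=True)
class ExpectedObject:
    """Expected state of one object, as populated by the leader."""
    path: str
    kind: str  # "file", "symlink", or "directory"
    size: int
    mtime: int


def build_region_map(
    expected: List[ExpectedObject],
    observed_sizes: Dict[Tuple[str, str], Optional[int]],
    region_mirrors: List[str],
    online: Set[str],
) -> Dict[str, List[str]]:
    """Map each path to the online regional mirrors holding a valid copy.

    observed_sizes maps (mirror, path) to the size the checker saw,
    or None if the object was missing on that mirror.
    """
    region_map: Dict[str, List[str]] = {}
    for obj in expected:
        valid = [
            m for m in region_mirrors
            if m in online and observed_sizes.get((m, obj.path)) == obj.size
        ]
        if valid:
            region_map[obj.path] = valid
    return region_map


def redirect(path: str, region_map: Dict[str, List[str]]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a temporary redirect."""
    mirrors = region_map.get(path)
    base = random.choice(mirrors) if mirrors else FALLBACK_HOST
    headers = {
        "Location": f"{base}/{path.lstrip('/')}",
        # Optional, per the "Maybe" item: ask caches not to store the redirect.
        "Cache-Control": "no-store",
    }
    return 302, headers
```

For example, if only one of two online mirrors reports the expected size for an object, `build_region_map` lists just that mirror, and `redirect` sends requests for the object there while sending unknown paths to `FALLBACK_HOST`.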