Project:Infrastructure/Backups v3

= Planning new backups =

Infra used to have good backups, but all past backups were lost through multiple sponsor losses: we had two copies of the data, one in Europe and one in the USA, and both were lost for different reasons within a short time frame.

We should get backups going again, cover EVERYTHING in future, and keep a redundant copy of the backups in the cloud, since we have lost multiple hosts before :-(.

== Requirements ==

 * MUST: True incremental
   * with garbage collection
   * Potentially forever storage
 * MUST: Encryption
 * MUST: Unattended backup
 * SHOULD: encryption should be optional, as we have some public data
 * MUST: Compression
   * SHOULD: be optional, as some repos already hold pre-compressed data (e.g. distfiles, releases)
 * MUST: provide validation of backups
 * MUST: External storage
 * MUST: Off-host
 * MUST: Support cloud storage (AWS-S3, Ceph-S3, other).
   * Maybe free Ceph-S3 from DreamHost...
   * AWS S3-IA/Glacier cost $0.0125 USD/GB/month or less.
   * rsync.net has a Borg/Attic option at $0.03 USD/GB/month (http://rsync.net/products/attic.html)
 * MUST: Scale to >2TB single repos (releases+distfiles historical)
   * Attic had a corruption issue at scale: http://librelist.com/browser/attic/2015/3/31/comparison-of-attic-vs-bup-vs-obnam/#cbbe599389a20c787a74b137dc78fb1a
 * MUST: be open-source
 * SHOULD: de-dupe
 * MUST: provide a CLI
 * SHOULD: bundle small files into blobs
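
To get a rough feel for what the listed prices mean for the >2TB repos, here is a small sketch of the arithmetic. It only uses the two per-GB prices quoted above. The 2000 GB figure is an assumption (2 TB counted in decimal GB, the lower bound of the ">2TB" requirement), and it covers storage only, not transfer or request fees. The AWS price is quoted as "or less", so that line is an upper estimate.

```python
# Prices in USD per GB per month, as quoted in the requirements list.
PRICES_USD_PER_GB_MONTH = {
    "AWS S3-IA/Glacier": 0.0125,   # quoted as "or less", so an upper estimate
    "rsync.net Borg/Attic": 0.03,
}

# Assumption: 2 TB counted as 2000 GB (decimal units); real repos are larger.
REPO_SIZE_GB = 2000


def monthly_cost_usd(size_gb: float, price_per_gb_month: float) -> float:
    """Return the monthly storage cost in USD for size_gb of stored data."""
    return size_gb * price_per_gb_month


if __name__ == "__main__":
    for provider, price in PRICES_USD_PER_GB_MONTH.items():
        cost = monthly_cost_usd(REPO_SIZE_GB, price)
        print(f"{provider}: {cost:.2f} USD/month, {cost * 12:.2f} USD/year")
```

Under these assumptions a single 2 TB repo comes to about 25 USD/month on S3-IA/Glacier and about 60 USD/month on rsync.net. De-duplication and compression would reduce the stored size, while keeping history "forever" would increase it.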

== Known backup software ==
[Arch Linux Sync & Backup programs] contains a good feature comparison list.


 * Amanda
 * Attic
 * Arq (closed source, included for comparison)
 * BackupPC
 * Bacula
 * Backupninja
 * BorgBackup (fork of Attic)
   * See also borgmatic, Atticmatic
 * bup
 * Burp
 * btar
 * DAR
 * ddar
 * Dirvish
 * Deja-dup
 * Duplicati
 * duplicity
 * git-annex
 * obnam
 * rdiff-backup
 * rsnapshot
 * SafeKeep
 * SyncThing
 * tarsnap (closed source, included for comparison)
 * Unison
 * ZBackup
   * See also https://github.com/davidbartonau/zbackup-tar