Project:Infrastructure/Git migration

Status

Final hosting is ready. Launch is planned for the weekend of August 8/9, 2015.

Blockers

  • Infra Manpower
  • Ensure availability of final history conversion host
    • Needs lots of RAM, parallel CPU and some SSD backing
    • Consider a 1-month Hetzner server bidding option
    • Consider a RackSpace OnMetal I/O node (if available by the hour/day)
    • Consider a large AWS instance by hour
      • r3.2xlarge, m4.4xlarge, c4.8xlarge; maybe even larger?

Launch plan

Steps

Top-level items in bold are considered critical path to service migration.

  1. Freeze
    • No more CVS commits to gentoo-x86 ever again
    • CVS->rsync conversion frozen
  2. Take backups
    • Final tree snapshot
    • Final CVS history backup
    • Publish both
  3. Perform cleanups on final snapshot
    • Remove ChangeLog files
    • Convert to thin manifests
  4. Publish cleaned snapshot as reference
  5. Commit fixed snapshot as initial signed commit on new history
  6. Allow developers to clone new repo and commit to it
  7. Turn on git->rsync
    • Manifests: Converts thin->thick
    • Changelogs: (temporary) we explicitly copy the changelog-as-is from the final
  8. Review/fix all scripts for further breakages
  9. Perform history conversion
  10. Re-introduce cleanups in history
    • The state of (history conversion + cleanups) MUST match the state of (initial commit) at this point
  11. Make converted history available as graft point
  12. Adjust git->rsync
    • Re-enable true ChangeLog generation
    • (maybe) Implement ChangeLog expiry mechanisms
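
The cleanup in step 3 can be sketched in Python. This is only an illustration, not the script Infra actually ran: the function name `clean_snapshot` is hypothetical, ChangeLog files are assumed to be any file whose name starts with `ChangeLog`, and a thin Manifest is assumed to be one that keeps only its `DIST` lines.

```python
import os


def clean_snapshot(root):
    """Remove ChangeLog files and thin out Manifests below root.

    Returns a tuple (changelogs_removed, manifests_rewritten).
    """
    removed = 0
    rewritten = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.startswith("ChangeLog"):
                # Assumption: ChangeLog, ChangeLog-2014, ... are all dropped.
                os.remove(path)
                removed += 1
            elif name == "Manifest":
                with open(path, encoding="utf-8") as fh:
                    lines = fh.readlines()
                # Assumption: a thin Manifest keeps only distfile entries;
                # EBUILD/AUX/MISC lines and any signature armor are dropped.
                dist_only = [ln for ln in lines if ln.startswith("DIST ")]
                if dist_only != lines:
                    with open(path, "w", encoding="utf-8") as fh:
                        fh.writelines(dist_only)
                    rewritten += 1
    return removed, rewritten
```

Running it a second time on the same tree should report `(0, 0)`, which is a cheap way to check that the cleaned snapshot is stable before it is published as the reference.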

Tentative date and times

Date and time           Event
2015/08/08 15:00 UTC    Freeze
2015/08/08 19:00 UTC    Git commits open for developers
2015/08/09 01:00 UTC    Rsync live again (with delayed changelogs)
2015/08/11              History repo available to graft
2015/08/12              Rsync mirrors carry up-to-date changelogs again

Resources

People

The list is in roughly chronological order; apologies to anybody who was left out.

Contact

For Git migration discussions, subscribe to the gentoo-scm mailing list: gentoo-scm@lists.gentoo.org

Conversion process

Goals

  • Each Git commit should be mapped to one or more CVS commits
    • Portage two-phase commits (commit 1: ebuilds/files/Manifest, commit 2: Manifest regenerated from $Header$ changes, optionally GPG-signed) should be mapped to a single commit
    • Portage trailer data in CVS commit log should be converted to newline format Git logs
  • As the validation settles, it should become possible to have CVS commits generate known Git commit IDs
    • Start list of validated commit IDs
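
The two-phase mapping above could be approximated as follows. This is a rough sketch under stated assumptions, not the logic of the real conversion scripts: `CvsCommit`, `is_manifest_followup`, `fold_two_phase` and the 300-second window are all hypothetical, and a follow-up is recognised purely as a Manifest-only commit by the same author, shortly after a commit that touched the same package directory.

```python
from dataclasses import dataclass, field


@dataclass
class CvsCommit:
    author: str
    timestamp: int  # seconds since the epoch
    message: str
    files: list = field(default_factory=list)


def is_manifest_followup(commit, previous):
    """True if commit only touches Manifests of packages previous touched."""
    if not commit.files:
        return False
    for path in commit.files:
        pkg_dir, _, name = path.rpartition("/")
        if name != "Manifest":
            return False
        if not any(f.startswith(pkg_dir + "/") for f in previous.files):
            return False
    return True


def fold_two_phase(commits, window=300):
    """Fold Manifest-only follow-up commits into the commit they complete."""
    merged = []
    for c in sorted(commits, key=lambda c: c.timestamp):
        prev = merged[-1] if merged else None
        if (prev is not None
                and c.author == prev.author
                and c.timestamp - prev.timestamp <= window
                and is_manifest_followup(c, prev)):
            # Same logical change: keep the first commit's metadata and
            # just make sure the Manifest is listed among its files.
            for f in c.files:
                if f not in prev.files:
                    prev.files.append(f)
            continue
        # Copy so the caller's objects are never mutated.
        merged.append(CvsCommit(c.author, c.timestamp, c.message, list(c.files)))
    return merged
```

A real converter would also have to decide which of the two Manifest revisions ends up in the Git commit; the sketch only shows the grouping.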

Pseudocode

do {
  do {
    adjust conversion scripts
    run test conversion
    validate all newly converted commits
  } while (validation has not passed on all commits)
  switch CVS to read-only
  do final conversion
  do final validation
  if (final validation passed) {
    activate Git repo for public commits
    lock CVS permanently
  } else {
    unlock CVS
  }
} while (still using CVS)

Historical migration

Here is how to generate the historical migration in git:

Validation

Quick notes on how to test:

  • Source for the validation scripts at: https://github.com/rich0/gitvalidate.git
  • Clone the git bundle into a directory
  • Extract the cvs root into a directory
  • (uncertain - may need to set up local bind mounts or symlinks to match the path in the cvs keywords)
  • Checkout the cvs gentoo-x86 module into another directory
  • (uncertain: you may need to edit config files so that cvs checkouts hit the local root rather than Gentoo infra. Test before running the script, or watch it while it runs: if it isn't using close to 100% CPU, it is probably hammering the server, so stop it!)
  • Use git log to obtain the hash of the last git commit
  • Point TMPDIR at a location with ~10GB of space (/tmp on tmpfs may not cut it and sort will fail).
  • Run gitdump/gitprocesstree.sh <path to git tree root> <head commit hash> > g
  • Run cvsdump/cvsprocesstree.sh <path to gentoo-x86 in cvs root> <path to checkout of gentoo-x86> > c
  • Create a table in mysql to hold the cvs output:
CREATE TABLE `cvs` (
 `key` int(11) NOT NULL AUTO_INCREMENT,
 `filename` varchar(500) COLLATE utf8_bin NOT NULL,
 `type` varchar(5) COLLATE utf8_bin NOT NULL,
 `hash` varchar(50) COLLATE utf8_bin NOT NULL,
 `timestamp` int(11) NOT NULL,
 `author` varchar(200) COLLATE utf8_bin NOT NULL,
 `message` text COLLATE utf8_bin NOT NULL,
 `revision` varchar(10) COLLATE utf8_bin NOT NULL,
 PRIMARY KEY (`key`),
 KEY `filename` (`filename`(255),`hash`),
 KEY `hash` (`hash`)
) ENGINE=MyISAM AUTO_INCREMENT=3132434 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
  • Create a table in mysql to hold the git output:
CREATE TABLE `git` (
 `key` int(11) NOT NULL AUTO_INCREMENT,
 `filename` varchar(500) COLLATE utf8_bin NOT NULL,
 `type` varchar(5) COLLATE utf8_bin NOT NULL,
 `hash` varchar(50) COLLATE utf8_bin NOT NULL,
 `timestamp` int(11) NOT NULL,
 `author` varchar(200) COLLATE utf8_bin NOT NULL,
 `message` text COLLATE utf8_bin NOT NULL,
 `commit` varchar(50) COLLATE utf8_bin NOT NULL,
 PRIMARY KEY (`key`),
 KEY `filename` (`filename`(255),`hash`),
 KEY `hash` (`hash`)
) ENGINE=MyISAM AUTO_INCREMENT=3030211 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
  • Load both dumps into their tables:
load data local infile 'c' into table cvs
  fields terminated by ',' lines terminated by '\n'
  (filename,type,hash,timestamp,author,message,revision);
load data local infile 'g' into table git
  fields terminated by ',' lines terminated by '\n'
  (filename,type,hash,timestamp,author,message,commit);
  • Process the data into several tables:
create table onlycvs ENGINE = MYISAM
  select cvs.* from `cvs` left join `git` as g on cvs.hash=g.hash
  where g.hash is null;
create table onlygit ENGINE = MYISAM
  select g.* from `git` as g left join `cvs` on cvs.hash=g.hash
  where cvs.hash is null;
delete from onlycvs where revision="1.1.1.1";
delete from onlycvs where filename like "%Manifest%";
delete from onlygit where filename like "%Manifest%";
create table baddate ENGINE = MYISAM
  select c.*,g.commit from `cvs` as c join `git` as g
  on (g.hash=c.hash and g.filename=c.filename)
  where abs(c.timestamp - g.timestamp) > 60*60;
create table badmessage ENGINE = MYISAM
  select c.*, g.author as gauthor, g.commit, g.message as gmessage
  from `cvs` as c join `git` as g
  on (g.hash=c.hash and g.filename=c.filename)
  where c.message <> g.message and g.filename not like "%Manifest%"
  and abs(c.timestamp - g.timestamp) < 60*60;
UPDATE `badmessage` SET
  `author`=BASE64_DECODE(`author`), `gauthor`=BASE64_DECODE(`gauthor`),
  `message`=BASE64_DECODE(`message`), `gmessage`=BASE64_DECODE(`gmessage`);
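
For readers who want to see what the queries above compute without setting up MySQL, the same comparison can be sketched in plain Python. This is an illustrative re-statement, not part of gitvalidate: the `Row` tuple and the `compare` function are hypothetical names, `ref` stands in for the `revision` (CVS) or `commit` (Git) column, and the Base64 decoding step is left out.

```python
from collections import namedtuple

# One record per file revision, mirroring the columns of the cvs/git tables.
Row = namedtuple("Row", "filename type hash timestamp author message ref")


def compare(cvs_rows, git_rows, max_skew=60 * 60):
    """Return (only_cvs, only_git, bad_date, bad_message) like the SQL tables."""
    cvs_hashes = {r.hash for r in cvs_rows}
    git_hashes = {r.hash for r in git_rows}

    # Content present on one side only; Manifests and the CVS vendor-branch
    # revision 1.1.1.1 are ignored, as in the delete statements.
    only_cvs = [r for r in cvs_rows
                if r.hash not in git_hashes
                and r.ref != "1.1.1.1"
                and "Manifest" not in r.filename]
    only_git = [r for r in git_rows
                if r.hash not in cvs_hashes
                and "Manifest" not in r.filename]

    # Index git rows by the join key (filename, hash).
    git_by_key = {}
    for g in git_rows:
        git_by_key.setdefault((g.filename, g.hash), []).append(g)

    bad_date, bad_message = [], []
    for c in cvs_rows:
        for g in git_by_key.get((c.filename, c.hash), []):
            skew = abs(c.timestamp - g.timestamp)
            if skew > max_skew:
                bad_date.append((c, g))
            elif (skew < max_skew
                    and c.message != g.message
                    and "Manifest" not in g.filename):
                bad_message.append((c, g))
    return only_cvs, only_git, bad_date, bad_message
```

A conversion that validates cleanly would leave all four lists empty; anything that remains points at a file revision to inspect by hand.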

History

2006

  • The first major work in VCS Migration was done as a GSoC 2006 project by User:Antarus.
    • Git was mostly too resource intensive for serious consideration at this point, and was slower than CVS.
    • Conversion took more than 7 days.
    • Decision: stay on CVS.

2007

2008

2009

  • October: Gentoo meeting at the GSoC Mentor Summit
    • All Gentoo developers present held a meeting; one of the major topics was the blockers and plans for the Git migration.
    • Shawn Pearce, one of the major Git developers and author of the Repo tool, took part.
    • Decision between a monolithic repo, per-category repos, or per-package repos: the monolithic repo wins.

2010

2011

2012

2013

2014

See also