Project:Infrastructure/Git migration/GSoC2006

From Gentoo Wiki

This page was created to track progress on the migration of Gentoo's CVS repositories to another version control system. This is being done in conjunction with a Google Summer of Code project.

CVS Migration

Reasons for Migration

  • CVS does not support branching in a sane manner.
  • CVS commits are not atomic.
  • CVS has a lot of overhead when working with a remote checkout.


Milestones for success

Task | Status | Due Date | Date of Completion | Notes
Select VCS systems for testing. | We have selected Mercurial, Subversion, GIT, and our control, CVS. | May 22nd, 2006 | May 22nd, 2006 | Other VCS systems may be added provided a repository snapshot is provided by June 19th, 2006.
Convert gentoo-x86 to each VCS system. | Completed git, svn, cvs | June 5th, 2006 | July 14th, 2006 | Completed a migration to svn and git, total time ~30 days!
Design a set of stress tests and analysis tools for Version Control Systems. | Skipped, see notes | June 12th, 2006 | Completed June 19th, 2006 | I was hoping to have a design prior to starting, but work commitments forced me to accelerate my project a bit. This phase was more me scratching things out on a napkin ;)
Implement a set of stress tests and analysis tools for Version Control Systems. | Dropped | June 19th, 2006 | July 14th | I basically broke down and used dstat. I had spent about 12 hours on the code for this tool, and then figured: why spend more time when a superior tool exists? As such, I decided to use the better tool instead of writing a replacement.
Run the stress tests on each VCS system in order to generate a useful data set. | July 26th, 2006 | August 3rd, 2006 | Completed August 15th | This was completed around August 3rd.
Analyze the data and present it to the Gentoo community. Attempt to have the community choose a VCS system. In the event that the community takes too long to determine its future VCS system, discuss with Lance and pick a VCS to continue the project with. | Started Aug 10th, 2006 | September 4th | | The code was ported to all three systems instead of choosing just one.
Compose a migration plan to migrate to the new VCS system. | Started August 21st | August 30th | Sept 4th | GLEP XX is currently in the submittal process.
Update and author developer documentation related to the VCS system. This will include updating any tools that are focused on VCS systems, such as echangelog, repoman, and the cvs->rsync scripts. | Start July 14th | Aug 8th, 2006 | Pending Completion | Repoman and echangelog have been released but need thorough testing. Please see here.
Set up a test environment and give developers a chance to use the new VCS while it is not live. This is also a chance to make sure all tools work properly. | Not yet started | Sept 1st, 2006 | Pending Completion | GLEP XX must be approved before this can start.
Test in the testing environment for up to one month, ensure the hardware is sufficient, and also ensure that real-world data matches the data collected during the data mining. | Not yet started | Oct 1st, 2006 | Pending Completion | GLEP XX must be approved before this can start.
Set up the live system and migrate to it. | Not yet started | Nov 1st, 2006 | Pending Completion | GLEP XX must be approved before this can start.


Version Control Systems under consideration

Subversion

  • Pros: Atomic commits, merging, tagging, branching is a copy operation, versioned metadata, directory versioning, annotation.
  • Cons: Twice the disk space.
  • Migration: Migration complete (cvs2svn) (conversion stats).
  • Full checkout: 17 minutes, 3 seconds.
  • Space considerations: Server usage (7.3 GB), client usage (2.8 GB).
  • Bandwidth usage: 21.8mb/s
  • Memory usage: 20 MB per checkout.
  • Others: server stats, client stats.

GIT

  • Pros: Annotation; two interchangeable on-disk formats are used: an efficient, packed format that saves space and network bandwidth, and an unpacked format optimized for fast writes and incremental work; merging, tagging, branching, very fine-grained control.
  • Cons: Being a distributed VCS means it may be difficult for us to use; has high server spec requirements.
  • Migration: Migration complete, minus the Authors file.
  • Full checkout: Smart clone: 84 minutes, 58 seconds. Dumb clone: 121 minutes, 52 seconds (over http).
  • Space considerations: Packed (1.1 GB), unpacked (1.6 GB).
  • Bandwidth usage: 1.72mb/s (smart), 1.2mb/s (dumb)
  • Memory usage (smart clone): Up to 400 MB per clone on the server; the server process was causing a high server load (~1.0 load per checkout).
  • Memory usage (dumb clone): Up to 60 MB per clone on the server; the server process only occasionally caused a high load (spikes to 1.0, but usually around 0.2).
  • Others: Smart clone stats, dumb clone stats (server), dumb clone stats (client).

Note: A smart clone is a checkout over a smart protocol, one that generates the packs for you; this generally strains the server (lots of RAM and CPU usage), but the cloned repository is ready to use right away. A dumb clone is one over http or rsync, where the server just transfers files and the client does all the work to prepare the repository.

CVS (stats on the current usage)

  • Pros: Already converted, status quo, no migration, no training, does what we need 90% of the time.
  • Cons: Sucks at branching and at merging branches back in.
  • Migration: Migration unnecessary.
  • Full checkout: 8 minutes, 54 seconds.
  • Space considerations: Server (1.6 GB), checkout (~880 MB).
  • Bandwidth usage: 13.18mb/s
  • Memory usage: 15 MB per checkout.
  • Others: server stats, client stats.
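
The bandwidth figures in the comparison above can be cross-checked against the checkout sizes and full checkout times. The page does not say how bandwidth was measured, so the following is only a sketch under stated assumptions: that "mb/s" means megabits per second, that 1 GB is 1000 MB, that the client-side size is what crossed the wire (for GIT, the packed size), and that the rate is simply size divided by elapsed time. The function name `average_mbit_per_s` is made up for this illustration.

```python
def average_mbit_per_s(size_mb: float, minutes: int, seconds: int) -> float:
    """Average transfer rate in megabits per second.

    size_mb is the transferred size in megabytes; the checkout time is
    given as minutes plus seconds, as in the comparison.
    """
    elapsed_seconds = minutes * 60 + seconds
    return size_mb * 8 / elapsed_seconds  # 8 bits per byte


# Sizes and times copied from the comparison. Which size was actually
# transferred is an assumption (client checkout size; packed size for GIT).
checkouts = {
    "Subversion": (2800, 17, 3),
    "GIT (smart clone)": (1100, 84, 58),
    "GIT (dumb clone)": (1100, 121, 52),
    "CVS": (880, 8, 54),
}

for name, (size_mb, minutes, seconds) in checkouts.items():
    rate = average_mbit_per_s(size_mb, minutes, seconds)
    print(f"{name}: {rate:.2f} Mbit/s")
```

Under these assumptions the results come out at roughly 21.9, 1.73, 1.20 and 13.18 Mbit/s, close to the listed 21.8, 1.72, 1.2 and 13.18. That suggests the listed figures are averages over the whole checkout rather than peak rates, though the original measurements may have been taken differently.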

This page is based on a document formerly found on our main website gentoo.org.
The following people contributed to the original document: Alec Warner
They are listed here because wiki history does not allow for any external attribution. If you edit the wiki article, please do not add yourself here; your contributions are recorded on each article's associated history page.