Project:Infrastructure/Rsync

This document explains how to set up an official rsync mirror and your own local mirror.

Terms, names and all that
This guide is intended for people who would like to set up an rsync mirror of their own. It caters not only to those who want to run an official rsync mirror but also those wanting to run private mirrors.

There are three kinds of Gentoo rsync mirrors: main rotation mirrors, community mirrors and private mirrors. Main rotation mirrors are maintained by the Gentoo infrastructure team and handle the bulk of the Gentoo rsync traffic. Community mirrors are run by volunteers from the Gentoo community. Private mirrors are run by individuals or organizations, are closed to the public, and are meant to cut traffic costs and latency for their owners.

At this time, we have enough community mirrors and are actively seeking additional main rotation mirrors. Hardware specifications for main rotation servers include:


 * Minimum of a 2GHz Pentium 4 processor (64-bit with at least 2 cores preferred)
 * Minimum of 2GB RAM (3GB - 4GB is ideal)
 * 15GB of disk space (IDE/SATA is fine)

You would retain physical possession and ownership of the hardware, and keep it online in your own colocation space. Average bandwidth consumption for each main rotation mirror is currently ~10Mbit/sec (around 2.6 TiB per month). As the number of main rotation mirrors increases, this number should decrease accordingly.
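To relate the two bandwidth figures above, the conversion between a sustained rate and a monthly volume can be sketched as follows. This is only a back-of-the-envelope helper; the 30-day month and the decimal definition of a megabit are assumptions, and the function names are made up for illustration.

```python
# Convert between a sustained bit rate and a monthly transfer volume.
# Assumptions: 1 Mbit = 10**6 bits, 1 TiB = 2**40 bytes, 30-day month.
SECONDS_PER_DAY = 86_400
TIB = 2 ** 40


def monthly_tib(mbit_per_s, days=30):
    """Volume in TiB moved by a link that sustains mbit_per_s."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return bytes_per_s * SECONDS_PER_DAY * days / TIB


def sustained_mbit(tib_per_month, days=30):
    """Average rate in Mbit/s that adds up to tib_per_month."""
    total_bits = tib_per_month * TIB * 8
    return total_bits / (SECONDS_PER_DAY * days) / 1_000_000


print(f"{monthly_tib(10):.2f} TiB per month at 10 Mbit/s")
print(f"{sustained_mbit(2.6):.1f} Mbit/s average for 2.6 TiB per month")
```

Under these assumptions, 2.6 TiB per month corresponds to an average a little below 10 Mbit/s, which is consistent with the approximate figure quoted above.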

If you would like to donate your machine, please email the Mirror Admins with the pertinent information.

Introduction
Many users run Gentoo on several machines and need to sync the portage trees on all of them. Using public mirrors is simply a waste of bandwidth at both ends. Syncing only one machine against a public mirror and all others against that computer would save resources on Gentoo mirrors and save users' bandwidth.

The same holds true for organizations that would like to control which rsync mirror their servers and workstations sync against. Of course, they usually also want to save on bandwidth and traffic costs.

All you need to do is select which machine is going to be your own local rsync mirror and set it up. You should choose a computer that can handle the CPU and disk load that an rsync operation requires. Your local mirror also needs to be available whenever any of your other computers syncs its portage tree. Besides, it should have a static IP address or a name that always resolves to your server. Configuring a DHCP and/or a DNS server is beyond the scope of this guide.

Note that these instructions assume your private rsync mirror is a Gentoo machine. If you intend to run it on a different distribution, the guide for setting up a community mirror might be more helpful. Just don't sync the mirror every half hour but once or twice a day.

Setting up the server
There is no extra package to install as the required software is already on your computer. Setting up your own local rsync mirror is just a matter of configuring the rsync daemon to make your /usr/portage directory available for syncing. Create an rsyncd.conf configuration file that exports this directory as a module.

You do not need to use the 'hosts allow' and 'hosts deny' options. By default, all clients will be allowed to connect. The order in which you write the options is not relevant. The server will always check the 'hosts allow' option first and grant the connection if the connecting host matches any of the listed patterns. The server will then check the 'hosts deny' option and refuse the connection if any match is found. Any host that does not match anything will be granted a connection. Please read the rsyncd.conf man page for more information.
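The evaluation order described above can be modelled in a few lines. This is a simplified sketch, not rsync's own matching code: it only understands IP addresses, CIDR networks and shell-style hostname wildcards, and the function names are made up for illustration. The handling of an allow list without a deny list goes beyond the paragraph above and reflects our reading of the rsyncd.conf man page, so treat it as an assumption.

```python
import ipaddress
from fnmatch import fnmatch


def matches(pattern, host_ip, host_name=""):
    """True if a single pattern matches the connecting host."""
    try:
        # Address and network patterns are compared against the client IP.
        network = ipaddress.ip_network(pattern, strict=False)
        return ipaddress.ip_address(host_ip) in network
    except ValueError:
        # Anything else is treated as a shell-style hostname wildcard.
        return fnmatch(host_name, pattern)


def is_allowed(host_ip, host_name="", hosts_allow=(), hosts_deny=()):
    """Apply the allow list first, then the deny list, then the default."""
    if hosts_allow:
        if any(matches(p, host_ip, host_name) for p in hosts_allow):
            return True
        # Assumption: an allow list with no deny list rejects non-matches.
        if not hosts_deny:
            return False
    if any(matches(p, host_ip, host_name) for p in hosts_deny):
        return False
    # No match in either list: the connection is granted.
    return True


print(is_allowed("192.168.0.7", "ws7.lan", ["192.168.0.0/24"], ["*"]))
print(is_allowed("10.1.2.3", "x.example.org", ["192.168.0.0/24"], ["*"]))
```

A common private-mirror arrangement is the one in the usage lines: allow the local network and deny everything else with a catch-all pattern.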

Now start your rsync daemon as the root user.

Let's test your rsync mirror. You do not need to try from another machine but it would be a good idea to do so. If your server is not known by name from all your computers, you can use its IP address instead.

You should see the content of /usr/portage on your mirror.

Your rsync mirror is now set up. Keep syncing the server itself as you have done so far to keep it up-to-date. If you use cron or similar facilities to sync regularly, remember to keep it down to a sensible frequency like once or twice a day.

Configuring your clients
Now, make your other computers use your own local rsync mirror instead of a public one. Edit the Portage configuration on each client and make its sync variable point to your server.

You can check that your computer has been properly set up by syncing against your own local mirror for the first time.

That's it! All your computers will now use your local rsync mirror whenever you sync.

Introduction
Right now, mirroring our Portage tree requires around 600 MB, so it is not space-intensive; having at least 1 GB free should leave room to grow. Setting up a Portage tree mirror is simple. First, ensure that your mirror has rsync installed. Then set up your rsyncd.conf file.

You can pick your own locations for most of the files, of course. What's important is the section name. This is the location that rsync clients will try to sync from.

For security reasons, the use of a chrooted environment is required! This has implications for the logged timestamps -- see the FAQ below.
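As a rough illustration of the kind of configuration discussed above, the following sketch renders a minimal rsyncd.conf with chroot enabled. The module name `gentoo-portage`, every path, and the connection limit are assumptions chosen for the example, not values taken from this guide, and the helper function is hypothetical. Only standard rsyncd.conf parameters are used.

```python
def render_rsyncd_conf(module="gentoo-portage",
                       path="/srv/rsync/gentoo-portage",
                       max_connections=15):
    """Return the text of a minimal rsyncd.conf (all values are examples)."""
    lines = [
        "# Global parameters",
        "use chroot = yes",   # the chrooted environment required above
        "read only = yes",
        f"max connections = {max_connections}",
        "log file = /var/log/rsync.log",
        "",
        # The section name is what clients request when they sync.
        f"[{module}]",
        f"path = {path}",
        "comment = Gentoo Portage tree mirror",
    ]
    return "\n".join(lines) + "\n"


print(render_rsyncd_conf())
```

Global parameters come first and the module section follows, so clients would address this mirror by the section name.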

Now you need to mirror the Gentoo Linux Portage tree. You can use a sync script to do so. Again, you'll probably want to change some of the file locations to suit your needs; in particular, they should match those in your rsyncd.conf.

Your rsync banner (motd) file should contain your IP address and other relevant information about your mirror, such as information about the host providing the Portage mirror and an administrative contact. You can now test your server as outlined in the chapter above.

After you have been approved as an official rsync mirror, your host will be aliased with a name of the form:.

Q: Who should I contact regarding rsync issues and maintenance?
A: Visit Gentoo Bugzilla and file a bug on the product "Mirrors", component "Server Problem".

Q: How can I check the freshness of an official rsync server?
A: The Gentoo infrastructure team monitors all community rsync servers for freshness. You can see the results on the corresponding web page.

Q: I run a private rsync mirror for my company. Can I still access masterportage.gentoo.org?
A: Because our resources are limited, we need to ensure we allocate them in such a way as to provide the maximum amount of benefit to our users. As such, we limit connections to our master rsync and distfile mirrors to public mirrors only. Users are welcome to use our regular mirror system to establish a private rsync mirror, though they are asked to follow certain basic rsync etiquette guidelines.

Q: Is it important that I sync my community rsync mirror twice an hour?
A: Yes, it is important. You do not need to perform the syncs at exactly :00 and :30, but the syncs should take place in each of the following two windows:


 * :00 to :10
 * :30 to :40

Additionally, please make sure that your syncs are exactly 30 minutes apart. So, if you schedule the first sync of each hour for :08, please schedule the second sync of the hour for :38.
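The two rules above (each sync inside its window, and the two syncs exactly 30 minutes apart) are easy to check before committing a schedule to cron. The helper below is a small sketch with a made-up name; it only looks at the minute fields.

```python
def check_sync_minutes(first, second):
    """Return a list of problems with a twice-hourly sync schedule."""
    problems = []
    if not 0 <= first <= 10:
        problems.append(f"first sync at :{first:02d} is outside :00 to :10")
    if not 30 <= second <= 40:
        problems.append(f"second sync at :{second:02d} is outside :30 to :40")
    if second - first != 30:
        problems.append("syncs are not exactly 30 minutes apart")
    return problems


for schedule in [(8, 38), (8, 35), (12, 42)]:
    print(schedule, check_sync_minutes(*schedule) or "ok")
```

A schedule such as :08 and :38 passes all three checks, while :08 and :35 stays inside both windows but breaks the 30-minute spacing.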

Q: Where should I sync my rsync mirror before I become an official Gentoo mirror?

 * For European-based rsync mirrors: sync to rsync.de.gentoo.org
 * For US-based rsync mirrors: sync to rsync.us.gentoo.org
 * For all others: sync to rsync.us.gentoo.org

Q: How do I find the mirror nearest to me?
A: netselect was designed to do this for you. Install it if you have not already done so, then run it against the candidate mirrors. After a minute or so, netselect will print an IP address. Pass this address to rsync as its only parameter, with two colons appended to it. You should be able to find out which mirror that is from the banner message. Update your client configuration accordingly.

Q: Can I use compression when syncing against masterportage.gentoo.org?
A: No. Compression uses too many resources on the server, so we have forcibly disabled it on masterportage.gentoo.org. Please do not attempt to use compression when syncing against this server.

Q: I'm seeing a lot of old and probably dead rsync processes, how can I get rid of them?
A: Old rsync processes sometimes linger due to connection problems. It is important to kill them because they count as valid connections for the 'max connections' option. You can run a cleanup command from crontab every hour that searches for and kills rsync processes older than one hour.
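One way to implement such an hourly cleanup is sketched below in Python. It is not the original command from this guide. It assumes a procps-style `ps` that supports the `etimes` (elapsed seconds) field, and it assumes that transfer processes run as the user `nobody`, so that the daemon itself and your own sync jobs are left alone; adjust both to your setup.

```python
import os
import signal
import subprocess


def stale_rsync_pids(ps_output, max_age_s=3600, user="nobody"):
    """Pick PIDs of rsync processes owned by `user` older than max_age_s.

    Expects lines of "PID ELAPSED_SECONDS USER COMMAND" without a header.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) != 4:
            continue
        pid, age, owner, command = fields
        if not (pid.isdigit() and age.isdigit()):
            continue
        if command.strip() == "rsync" and owner == user and int(age) > max_age_s:
            pids.append(int(pid))
    return pids


def kill_stale_rsync(max_age_s=3600, user="nobody"):
    """Send SIGTERM to every stale rsync process (call this from cron)."""
    listing = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,user=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in stale_rsync_pids(listing, max_age_s, user):
        os.kill(pid, signal.SIGTERM)
```

Keeping the parsing separate from the killing makes it possible to try the selection logic on captured `ps` output before letting the script terminate anything.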

Q: There are many users who connect to my rsync server very frequently, sometimes even causing a DoS to my mirror, is there any way to prevent this?
A: A few inconsiderate users abuse the rsync mirror system by syncing more than 1-2 times per day. In the most extreme cases, users schedule cron jobs to sync every 15 minutes or so. This often amounts to a Denial of Service attack, because it continually occupies an rsync slot that could otherwise have gone to another user. To try to prevent this, you may use a Perl script that scans your rsync log files, picks out IP addresses that have already connected more than N times that day, and dynamically creates an rsyncd.conf file that lists the offending IP addresses in the 'hosts deny' directive. The following line controls what N equals (in this case 4):

Define maximum number of connections per IP

If you use this script, please remember to rotate your rsync log files daily and modify the script to match the location of your rsyncd.conf file. This script is tested on Gentoo Linux, but should work suitably on other arches that support both rsync and perl.
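The core of this approach can be sketched in Python as well; this is an illustration of the idea, not the Perl script referred to above. It assumes daemon log lines of the form `rsync on MODULE from HOST (IP)` and a log file that holds only the current day's entries, which is why daily rotation matters. The names are made up for the example.

```python
import re
from collections import Counter

# Maximum number of connections per IP and day (4, as in the text above).
MAX_CONNECTIONS = 4

# Assumed log line shape: "... rsync on MODULE from HOST (IP)".
CONNECT_RE = re.compile(r"rsync on \S+ from \S+ \(([0-9A-Fa-f.:]+)\)")


def offending_ips(log_lines, max_connections=MAX_CONNECTIONS):
    """Return the sorted IPs that connected more than max_connections times."""
    counts = Counter()
    for line in log_lines:
        match = CONNECT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n > max_connections)


def hosts_deny_line(ips):
    """Format a 'hosts deny' directive, or an empty string if nobody offends."""
    return "hosts deny = " + " ".join(ips) if ips else ""


log = (
    ["2024/01/01 10:00:00 [11] rsync on gentoo-portage from a.example (192.0.2.10)"] * 5
    + ["2024/01/01 11:00:00 [12] rsync on gentoo-portage from b.example (192.0.2.20)"] * 2
)
print(hosts_deny_line(offending_ips(log)))
```

The resulting line would then be merged into the generated rsyncd.conf; how that file is assembled and reloaded is left out of this sketch.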

Acknowledgements
We would like to thank the following authors and editors for their contributions to this guide:


 * Gentoo Mirror Administrators
 * Tobias Klausmann
 * Xavier Neys