
Project:Infrastructure/Rsync

From Gentoo Wiki
Revision as of 13:29, 11 December 2013 by SwifT (Talk | contribs)


This document explains how to set up an official rsync mirror and your own local mirror.

Preliminaries

Terms, names and all that

This guide is intended for people who would like to set up an rsync mirror of their own. It caters not only to those who want to run an official rsync mirror but also those wanting to run private mirrors.

There are three kinds of Gentoo rsync mirrors: main rotation mirrors, community mirrors and private mirrors. Main rotation mirrors are maintained by the Gentoo infrastructure team. They handle the bulk of the Gentoo rsync traffic. The community mirrors are run by volunteers from the Gentoo community. Private mirrors are mirrors run by individuals which are closed off to the public and meant to cut traffic costs and latency for an organization or individual.

At this time, we have enough community mirrors and are actively seeking additional main rotation mirrors. Hardware specifications for main rotation servers include:

  • Minimum of a 2GHz Pentium 4 processor (64-bit with at least 2 cores preferred)
  • Minimum of 2GB RAM (3GB - 4GB is ideal)
  • 15GB of disk space (IDE/SATA is fine)

You would maintain physical possession and ownership of the hardware, and keep it online in your own colocation space. Average bandwidth consumption for each main rotation mirror is currently ~10Mbit/sec (around 2.6 TiB per month). As the number of main rotation mirrors increases, this number should decrease accordingly.
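To relate the two bandwidth figures above, the following is a minimal sketch that converts a sustained average rate into a monthly transfer volume. The function name mbit_to_tib, the 30-day month and the use of decimal megabits are assumptions made for illustration, not part of the original guide.

```shell
#!/bin/bash
# Convert a sustained average rate in Mbit/s into TiB transferred over a
# number of days (default: 30). awk is used because bash arithmetic is
# integer-only.
mbit_to_tib() {
    local mbit="$1" days="${2:-30}"
    awk -v r="$mbit" -v d="$days" 'BEGIN {
        bytes = r * 1000000 / 8 * 86400 * d   # bits/s -> bytes over d days
        printf "%.2f\n", bytes / (1024 ^ 4)   # bytes -> TiB
    }'
}

mbit_to_tib 10    # a full 10 Mbit/s held for 30 days
mbit_to_tib 8.8   # an average slightly below 10 Mbit/s
```

Under these assumptions, a constant 10 Mbit/s comes out a little under 3 TiB per month, and the quoted 2.6 TiB corresponds to an average just below 9 Mbit/s, which is consistent with the "~10Mbit/sec" approximation.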

If you would like to donate your machine, please email the Mirror Admins with the pertinent information.

Setting up your own local rsync mirror

Introduction

Many users run Gentoo on several machines and need to sync the portage trees on all of them. Syncing every machine against public mirrors wastes bandwidth at both ends. Syncing only one machine against a public mirror and all others against that computer saves resources on the Gentoo mirrors and saves the users' bandwidth.

The same holds true for organizations who would like to control the rsync mirror their servers and workstations sync against. Of course, they usually also want to save on bandwidth and traffic costs.

All you need to do is select which machine is going to be your own local rsync mirror and set it up. You should choose a computer that can handle the CPU and disk load that an rsync operation requires. Your local mirror also needs to be available whenever any of your other computers syncs its portage tree. Besides, it should have a static IP address or a name that always resolves to your server. Configuring a DHCP and/or a DNS server is beyond the scope of this guide.

Note that these instructions assume your private rsync mirror is a Gentoo machine. If you intend to run it on a different distribution, the guide for setting up a community mirror might be more helpful. In that case, sync the mirror only once or twice a day rather than every half hour.

Setting up the server

There is no extra package to install as the required software is already on your computer. Setting up your own local rsync mirror is just a matter of configuring the rsyncd daemon to make your /usr/portage directory available for syncing. Create the following /etc/rsyncd.conf configuration file:

File: /etc/rsyncd.conf

pid file = /var/run/rsyncd.pid
max connections = 5
use chroot = yes
uid = nobody
gid = nobody
# Optional: restrict access to your Gentoo boxes
hosts allow = 192.168.0.1 192.168.0.2 192.168.1.0/24
hosts deny  = *
  
[gentoo-portage]
path=/usr/portage
comment=Gentoo Portage
exclude=distfiles/ packages/

You do not need to use the hosts allow and hosts deny options; by default, all clients are allowed to connect. The order in which you write the options is not relevant. The server always checks the hosts allow option first and grants the connection if the connecting host matches any of the listed patterns. It then checks the hosts deny option and refuses the connection if any pattern matches. A host that matches neither list is granted a connection. Please read the man page (man rsyncd.conf) for more information.
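The evaluation order described above can be modelled with a short sketch. This is not rsyncd's implementation: the function name rsync_access is hypothetical, and every pattern is treated as a plain shell glob compared with the host string, so address/mask entries such as 192.168.1.0/24 are not handled here.

```shell
#!/bin/bash
# Illustrative model of the access decision: allow list first, then deny
# list, and access granted when nothing matches.
# Arguments: host, space-separated allow patterns, space-separated deny patterns.
rsync_access() {
    local host="$1" allow="$2" deny="$3" pattern
    set -f                       # keep "*" from expanding to file names
    for pattern in $allow; do    # 1. any "hosts allow" match grants access
        [[ $host == $pattern ]] && { set +f; echo allow; return; }
    done
    for pattern in $deny; do     # 2. otherwise any "hosts deny" match refuses
        [[ $host == $pattern ]] && { set +f; echo deny; return; }
    done
    set +f
    echo allow                   # 3. no match in either list: access granted
}

rsync_access 192.168.0.1 "192.168.0.1 192.168.0.2" "*"
rsync_access 10.0.0.5 "192.168.0.1 192.168.0.2" "*"
```

The first call matches the allow list and is granted; the second matches only the catch-all deny pattern and is refused. With both lists empty, every host is allowed, which mirrors the default behaviour described in the text.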

Now, start your rsync daemon and add it to the default runlevel by running the following commands as the root user:

root # /etc/init.d/rsyncd start
root # rc-update add rsyncd default

Let's test your rsync mirror. You do not need to try from another machine but it would be a good idea to do so. If your server is not known by name from all your computers, you can use its IP address instead.

root # rsync 192.168.0.1::
gentoo-portage     Gentoo Portage
root # rsync your_server_name::gentoo-portage

You should see the content of /usr/portage on your mirror.
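If you want to script this check, for example from a monitoring job, a minimal sketch could look like the following. The function name has_module is hypothetical, and the listing format (module name, whitespace, comment) is assumed from the sample output above.

```shell
#!/bin/bash
# Succeed if the module listing read from stdin contains the named module
# in its first column; fail otherwise.
has_module() {
    local module="$1"
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }'
}

# Usage against a live server (the address is only an example):
#   rsync 192.168.0.1:: | has_module gentoo-portage && echo "module exported"
```

Only the first column is compared, so banner lines or comments that merely mention the module name do not count as a match.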

Your rsync mirror is now set up. Keep running emerge --sync as you have done so far to keep your server up-to-date. If you use cron or similar facilities to sync regularly, remember to keep it down to a sensible frequency like once or twice a day.

Note
Please note that most public mirror administrators consider syncing more than once or twice a day an abuse. Some if not most of them will ban your IP from their server if you start abusing their machines.
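One way to stay within this etiquette when syncing from cron is to guard the sync with a minimum interval. The sketch below is only an illustration: the function name sync_due, the 12-hour default and the stamp file path in the usage comment are assumptions.

```shell
#!/bin/bash
# Succeed when at least min_hours (default: 12) lie between two epoch
# timestamps, i.e. when another sync would still be considered polite.
sync_due() {
    local last="$1" now="$2" min_hours="${3:-12}"
    (( now - last >= min_hours * 3600 ))
}

# Hypothetical cron wrapper built on top of it:
#   stamp=/var/tmp/last-portage-sync
#   last=$(cat "$stamp" 2>/dev/null || echo 0)
#   if sync_due "$last" "$(date +%s)" 12; then
#       emerge --sync && date +%s > "$stamp"
#   fi
```

Keeping the timestamp in a separate file means the guard keeps working even when the cron job is triggered more often than intended.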

Configuring your clients

Now, make your other computers use your own local rsync mirror instead of a public one. Edit your /etc/portage/make.conf and make the SYNC variable point to your server.

File: /etc/portage/make.conf (Set SYNC)

# (Use your server IP address)
SYNC="rsync://192.168.0.1/gentoo-portage"
# (Or use your server name)
SYNC="rsync://your_server_name/gentoo-portage"

You can check that your computer has been properly set up by confirming that Portage picks up the new SYNC value:

root # emerge --info | grep SYNC
SYNC="rsync://your_server_name/gentoo-portage"

Then sync against your local mirror for the first time.

root # emerge --sync

That's it! All your computers will now use your local rsync mirror whenever you run emerge --sync.
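If you have many clients to reconfigure, the make.conf change can be scripted. The following is a minimal sketch, assuming GNU sed and a SYNC line that is either absent or written as a single uncommented SYNC=... line; the function name set_sync is hypothetical.

```shell
#!/bin/bash
# Point SYNC at the given rsync URI in a make.conf-style file.
# Replaces an existing uncommented SYNC= line, or appends one if none exists.
# The URI must not contain "|" or "&", which are special to the sed expression.
set_sync() {
    local conf="$1" uri="$2"
    if grep -q '^SYNC=' "$conf"; then
        sed -i "s|^SYNC=.*|SYNC=\"${uri}\"|" "$conf"
    else
        printf 'SYNC="%s"\n' "$uri" >> "$conf"
    fi
}

# Usage (as root, on each client):
#   set_sync /etc/portage/make.conf rsync://192.168.0.1/gentoo-portage
```

Commented-out SYNC lines are left untouched, so earlier settings remain in the file for reference.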

Setting up a community rsync server

Introduction

Note
You can find sample configuration and script files in the gentoo-rsync-mirror package. Just do emerge gentoo-rsync-mirror.

Right now, mirroring our Portage tree requires around 600 MB, so it isn't space intensive; having at least 1 GB free should leave room to grow. Setting up a Portage tree mirror is simple: first, ensure that your mirror has rsync installed. Then, set up your rsyncd.conf file to look something like this:

File: rsyncd.conf

uid = nobody
gid = nobody
use chroot = yes
max connections = 15
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
log file = /var/log/rsync.log
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300
  
[gentoo-portage]
#modern versions of portage use this entry
path = /gentoo/rsync
comment = Gentoo Linux Portage tree mirror
exclude = distfiles

You can pick your own locations for most of the files, of course. What's important is the section name ([gentoo-portage]). This is the location that rsync clients will try to sync from.

For security reasons, the use of a chrooted environment is required! This has implications for the logged timestamps -- see the FAQ below.

Now, you need to mirror the Gentoo Linux Portage tree. You can use the script below to do so. Again, you'll probably want to change some of the file locations to suit your needs -- in particular, they should match those of your rsyncd.conf.

File: rsync-gentoo-portage.sh

#!/bin/bash
# Mirror the Gentoo Portage tree from an upstream rsync server.

RSYNC="/usr/bin/rsync"
# Options are kept in an array so that each one is passed as a separate argument
OPTS=(--quiet --recursive --links --perms --times -D --delete --timeout=300)
# Uncomment the following line only if you have been granted access to masterportage.gentoo.org
#SRC="rsync://masterportage.gentoo.org/gentoo-portage"
# If you are waiting for access to our master mirror, select one of our mirrors to mirror from:
SRC="rsync://rsync.de.gentoo.org/gentoo-portage"
# Destination directory; adjust it so that it matches the "path" in your rsyncd.conf
DST="/space/gentoo/rsync/"
# Log file, written next to this script
LOG="$0.log"

echo "Started update at $(date)" >> "${LOG}" 2>&1
logger -t rsync "re-rsyncing the gentoo-portage tree"
"${RSYNC}" "${OPTS[@]}" "${SRC}" "${DST}" >> "${LOG}" 2>&1

echo "End: $(date)" >> "${LOG}" 2>&1

Your rsyncd.motd should contain your IP address and other relevant information about your mirror, such as information about the host providing the Portage mirror and an administrative contact. You can now test your server as outlined in the "Setting up your own local rsync mirror" chapter above.

After you have been approved as an official rsync mirror, your host will be aliased with a name of the form: rsync[num].[country code].gentoo.org.

Short FAQ

Q: Who should I contact regarding rsync issues and maintenance?

A: Visit Gentoo Bugzilla and file a bug on the product "Mirrors", component "Server Problem".

Q: How can I check the freshness of an official rsync server?

A: The Gentoo infrastructure team monitors all community rsync servers for freshness. You can see the results on the corresponding web page.

Q: I run a private rsync mirror for my company. Can I still access masterportage.gentoo.org?

A: Because our resources are limited, we need to ensure we allocate them in such a way as to provide the maximum amount of benefit to our users. As such, we limit connections to our master rsync and distfile mirrors to public mirrors only. Users are welcome to use our regular mirror system to establish a private rsync mirror, though they are asked to follow certain basic rsync etiquette guidelines.

Q: Is it important that I sync my community rsync mirror twice an hour?

A: Yes, it is important. You do not need to perform the syncs at exactly :00 and :30, but the syncs should take place in each of the following two windows:

  1.  :00 to :10
  2.  :30 to :40

Additionally, please make sure that your syncs are exactly 30 minutes apart. So, if you schedule the first sync of each hour for :08, please schedule the second sync of the hour for :38.
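The scheduling rule above can be turned into a crontab entry with a small helper. This is a sketch under stated assumptions: the function name mirror_cron_line and the script path in the example are hypothetical, and the first minute must be given without a leading zero because bash would otherwise read it as octal.

```shell
#!/bin/bash
# Print a crontab line that runs a command twice an hour, exactly 30 minutes
# apart, given the minute of the first run. Fails if that minute lies outside
# the :00 to :10 window; the second run then falls into :30 to :40.
mirror_cron_line() {
    local first="$1" cmd="$2"
    if (( first < 0 || first > 10 )); then
        echo "first sync minute must be between 0 and 10" >&2
        return 1
    fi
    printf '%d,%d * * * * %s\n' "$first" "$(( first + 30 ))" "$cmd"
}

mirror_cron_line 8 /usr/local/bin/rsync-gentoo-portage.sh
# prints: 8,38 * * * * /usr/local/bin/rsync-gentoo-portage.sh
```

Deriving the second minute from the first keeps the two runs 30 minutes apart by construction, so only one value has to be chosen per mirror.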

Q: Where should I sync my rsync mirror before I become an official Gentoo mirror?

  • For European-based rsync mirrors: sync to rsync.de.gentoo.org
  • For US-based rsync mirrors: sync to rsync.us.gentoo.org
  • For all others: sync to rsync.us.gentoo.org

Q: How do I find the mirror nearest to me?

A: netselect was designed to do this for you. If you have not installed it yet, run emerge netselect. Then run netselect rsync.gentoo.org. After a minute or so, netselect will print an IP address. Pass this address to rsync as its only parameter, with two colons appended, e.g. rsync 1.2.3.4::. You should be able to find out which mirror that is from the banner message. Update your /etc/portage/make.conf accordingly.
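The two steps can be chained with a small filter. This sketch assumes that netselect reports its result as a line of the form "<score> <address>"; the exact output may differ between versions, and the function name banner_command is hypothetical.

```shell
#!/bin/bash
# Read netselect-style output from stdin, take the last field of the last
# line that has at least two fields, and print the rsync command that shows
# that mirror's banner.
banner_command() {
    awk 'NF >= 2 { host = $NF }
         END { if (host != "") print "rsync " host "::" }'
}

# Usage:
#   netselect rsync.gentoo.org | banner_command
```

The filter only prints the command instead of running it, so you can check the selected address before contacting the mirror.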

Q: Can I use compression when syncing against masterportage.gentoo.org?

A: No. Compression utilizes too many resources on the server, so we have forcibly disabled it on masterportage.gentoo.org. Please do not attempt to use compression when syncing against this server.

Q: I'm seeing a lot of old and probably dead rsync processes, how can I get rid of them?

A: The following command kills old rsync processes that sometimes linger because of connection problems. It is important to kill them because they count as valid connections for the 'max connections' option. You may run this command from cron every hour; it finds and kills rsync processes older than one hour.

root # /bin/kill -9 `/bin/ps --no-headers -Crsync -o etime,user,pid,command | /bin/grep nobody | \
/bin/grep "[0-9]\{2\}:[0-9]\{2\}:" | /bin/awk '{print $3}'`
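The filtering part of the one-liner above can be written as a separate function, which makes it easier to test and avoids calling kill with an empty argument list. This is a sketch rather than a drop-in replacement: the function name stale_pids is hypothetical, and unlike the grep in the original it compares the user column exactly instead of matching "nobody" anywhere on the line.

```shell
#!/bin/bash
# Read "etime user pid command" lines (the ps format used above) and print
# the PIDs of processes owned by the given user whose elapsed time contains
# an hours field, i.e. [dd-]hh:mm:ss rather than plain mm:ss.
stale_pids() {
    local user="$1"
    awk -v u="$user" '$2 == u && $1 ~ /[0-9][0-9]:[0-9][0-9]:/ { print $3 }'
}

# Usage from an hourly cron job:
#   pids=$(ps --no-headers -C rsync -o etime,user,pid,command | stale_pids nobody)
#   [ -n "$pids" ] && kill -9 $pids
```

An elapsed time of mm:ss has only one colon and never matches the pattern, so processes younger than one hour are left alone.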

Q: There are many users who connect to my rsync server very frequently, sometimes even causing a DoS to my mirror, is there any way to prevent this?

A: A few inconsiderate users abuse the rsync mirror system by syncing more than once or twice a day. In the most extreme cases, users schedule cron jobs to sync every 15 minutes or so. This often amounts to a denial of service, because such a client continually occupies an rsync slot that could otherwise have gone to another user. To try to prevent this, you may use this Perl script, which scans your rsync log files, picks out IP addresses that have already connected more than N times that day and dynamically creates an rsyncd.conf file that includes the offending IP addresses in the 'hosts deny' directive. The following line controls what N equals (in this case 4):

Code: Define maximum number of connections per IP

@badhosts=grep {$hash{$_}>4} keys %hash;

If you use this script, please remember to rotate your rsync log files daily and modify the script to match the location of your rsyncd.conf file. The script has been tested on Gentoo Linux, but should work on other systems that support both rsync and Perl.
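The counting idea behind the script can be illustrated in a few lines of shell. This is not the Perl script itself: the function name abusive_hosts is hypothetical, and the input is assumed to be one client IP address per line, one line per connection. Extracting those addresses from your log is left out because it depends on your log format.

```shell
#!/bin/bash
# Read one client IP address per line and print, on a single line, the
# addresses that occur more than max times. The result can be used as the
# value of a "hosts deny" directive.
abusive_hosts() {
    local max="$1"
    sort | uniq -c | awk -v n="$max" '
        $1 > n { printf "%s%s", sep, $2; sep = " " }
        END    { if (sep) print "" }'
}

# Usage with a prepared list of addresses (the file name is only an example):
#   abusive_hosts 4 < todays-connections.txt
```

As in the Perl line above, an address is reported only once it has connected more than N times, so a host with exactly N connections is still allowed.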

Acknowledgements

We would like to thank the following authors and editors for their contributions to this guide:

  • Gentoo Mirror Administrators
  • Tobias Klausmann
  • Xavier Neys