Corosync

Corosync is the currently preferred cluster messaging layer in the Linux cluster community. On Gentoo, it is typically paired with Pacemaker to build clusters.

Installing
There have recently been a fair number of standardization-oriented changes within the Linux cluster community. Perhaps as a result, the version of Corosync currently available in the Portage tree is out of date.

Download
To install the very latest version of Corosync (usually a good idea), take the following steps.

First, download the 'git HEAD' (latest development version, not a release) corosync ebuild from here and temporarily install it into your local overlay.
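As a rough sketch of this step, the helper below copies the downloaded ebuild into a local overlay and regenerates its Manifest. It assumes the overlay already exists and is registered with Portage (for example via PORTDIR_OVERLAY in make.conf); the helper name add_to_overlay and the default overlay path /usr/local/portage are illustrative assumptions, not part of the original instructions.

```bash
# Hypothetical helper: copy a downloaded corosync ebuild into a local overlay.
# Assumes the overlay is already registered with Portage; that is not done here.
add_to_overlay() {
    local ebuild_file=$1                      # e.g. ./corosync-9999.ebuild
    local overlay=${2:-/usr/local/portage}    # assumed overlay location
    local pkg_dir="$overlay/sys-cluster/corosync"

    [ -f "$ebuild_file" ] || { echo "no such file: $ebuild_file" >&2; return 1; }
    mkdir -p -- "$pkg_dir" || return 1
    cp -- "$ebuild_file" "$pkg_dir/" || return 1

    # Portage normally ignores ebuilds without a Manifest; regenerate it when
    # the ebuild(1) tool is available (i.e. on the Gentoo host itself).
    if command -v ebuild >/dev/null 2>&1; then
        ebuild "$pkg_dir/$(basename -- "$ebuild_file")" manifest
    else
        echo "ebuild(1) not found; skipping Manifest generation" >&2
    fi
}
```

Run it as root, for example add_to_overlay ./corosync-9999.ebuild, and remove the directory again once the Portage tree catches up.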

Unmask
Next, unmask the package.

If you are on amd64 or another untested architecture, you may also need to un-archmask it, i.e. accept its keywords for your architecture.
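Both steps amount to adding lines to Portage's configuration files. The helpers below are a hypothetical sketch of that: the names unmask_live and keyword_live, the PORTAGE_CONFIG override (default /etc/portage) and the assumption that the live ebuild is versioned -9999 are all illustrative. The default keyword "**" accepts an ebuild whatever its keywords (live ebuilds often have none); narrow it to e.g. ~amd64 if you prefer. Older Portage versions read package.keywords rather than package.accept_keywords.

```bash
# Hypothetical helpers; PORTAGE_CONFIG defaults to /etc/portage.

# Print the file to write: Portage accepts each of these config entries
# either as a single file or as a directory of files.
portage_conf_file() {
    local path="${PORTAGE_CONFIG:-/etc/portage}/$1"   # e.g. package.unmask
    local name=$2                                      # per-package file name
    if [ -d "$path" ]; then
        printf '%s/%s\n' "$path" "$name"
    else
        printf '%s\n' "$path"
    fi
}

# Append a line to a file unless an identical line is already there.
append_once() {
    local line=$1 file=$2
    mkdir -p -- "$(dirname -- "$file")" || return 1
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Unmask the live (-9999) version of a package, e.g. sys-cluster/corosync.
unmask_live() {
    append_once "=$1-9999" "$(portage_conf_file package.unmask "${1##*/}")"
}

# Accept the live version on this architecture. "**" accepts any keyword
# (including none); pass e.g. "~amd64" to be more specific.
keyword_live() {
    append_once "=$1-9999 ${2:-**}" "$(portage_conf_file package.accept_keywords "${1##*/}")"
}
```

As root, run unmask_live sys-cluster/corosync and, if needed, keyword_live sys-cluster/corosync. The same calls with sys-cluster/pacemaker cover the pacemaker unmask and un-archmask steps further down. Writing each line only once keeps the files tidy if you repeat the procedure.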

Install
Now emerge the git HEAD version of corosync, for example with emerge -av =sys-cluster/corosync-9999.

Next, unmask the git HEAD version (i.e. -9999) of pacemaker.

If you are on amd64 or another untested architecture, you may also need to un-archmask it as follows (replace amd64 with your architecture).

Then install pacemaker, for example with emerge -av =sys-cluster/pacemaker-9999.

Configuring
Gentoo installs the example corosync configuration into /etc/corosync/corosync.conf.example. First, copy it to /etc/corosync/corosync.conf.

Then edit the file in your favourite editor to match your cluster's configuration.
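A cautious version of the copy step, as a hypothetical helper: init_corosync_conf is an illustrative name, and its directory argument defaults to /etc/corosync. It refuses to overwrite a configuration you have already edited.

```bash
# Hypothetical helper: create corosync.conf from the example shipped by
# Gentoo, refusing to clobber an existing configuration.
init_corosync_conf() {
    local dir=${1:-/etc/corosync}
    if [ -e "$dir/corosync.conf" ]; then
        echo "$dir/corosync.conf already exists; not overwriting it" >&2
        return 1
    fi
    cp -- "$dir/corosync.conf.example" "$dir/corosync.conf"
}
```

For example, as root: init_corosync_conf && ${EDITOR:-nano} /etc/corosync/corosync.conf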

The main resources for configuration are the man pages, in particular corosync.conf(5), accessible via man corosync.conf.

For the quorum section, you can also review the votequorum(5) man page (man votequorum).

Note for two-node clusters
If you only have two nodes, you will need to enable the two_node directive in the quorum {} section, i.e.: quorum { provider: corosync_votequorum two_node: 1 }
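Laid out as it would appear in /etc/corosync/corosync.conf, the quorum section might look like the fragment below. The expected_votes: 2 line is not part of the note above; it is an assumption modelled on the two-node example in the votequorum(5) man page, so check it against the man page for your installed version.

```
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
```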

Note on hostnames
When building clusters with Corosync and Pacemaker, tools such as crm_mon identify nodes by their hostname, so each node should have a hostname that is guaranteed to be unique. There are two easy ways to achieve this: set up hostname entries on your DHCP server (if the nodes are configured via DHCP), or derive the hostname from a unique identifier such as the eth0 MAC address. Here is my hack for the latter, which I run from a custom /init (passed as a kernel option to diskless nodes with an NFS root):

    # 1) set hostname from the eth0 MAC address, with all colons removed
    hostname "$(sed 's/://g' /sys/class/net/eth0/address)"
    # 2) record it in /etc/conf.d/hostname so OpenRC keeps it
    echo "hostname=\"$(hostname)\"" > /etc/conf.d/hostname

If the cluster keeps remembering old or wrong hostnames for nodes and you are still in the testing phase, you can resolve the issue by shutting down all cluster nodes, removing their /var/lib/corosync/ring* cache files, and restarting them. This is probably not a good idea on live clusters.
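A per-node sketch of the cache cleanup, assuming the default /var/lib/corosync state directory; clear_ring_cache is a hypothetical name. It refuses to run while a corosync process is still alive, since the cache should only be cleared once every node has been stopped.

```bash
# Hypothetical helper: remove corosync's cached ring state on ONE node.
clear_ring_cache() {
    local state_dir=${1:-/var/lib/corosync}
    # Refuse while corosync is still running (check skipped if pgrep is missing).
    if pgrep -x corosync >/dev/null 2>&1; then
        echo "corosync is still running; stop it first" >&2
        return 1
    fi
    # Remove only the ring* files; other state in the directory is kept.
    rm -f -- "$state_dir"/ring*
}
```

Stop corosync on every node first (/etc/init.d/corosync stop), run clear_ring_cache as root on each node, then start corosync again on all of them.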

Running
Corosync is managed as a standard Gentoo OpenRC service, i.e. you can start it with /etc/init.d/corosync start and stop it with /etc/init.d/corosync stop.

Debugging
Corosync logs to /var/log/cluster/corosync.log by default. To watch the log as it is written, run tail -f /var/log/cluster/corosync.log.

If you have trouble even starting Corosync successfully (for example, /etc/init.d/corosync status reports "Status: crashed"), you can start the daemon manually in the foreground with corosync -f. You might also consider first enabling the to_stderr directive in the logging {} section of /etc/corosync/corosync.conf.
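For a debugging session, a logging section along these lines could accompany corosync -f. It is a sketch, not a recommended production setting: to_stderr, to_logfile, logfile and debug are corosync.conf(5) logging options, but verify them against the man page for your version; the logfile path simply repeats the default mentioned above.

```
logging {
    # Also print log output to the terminal (useful with corosync -f)
    to_stderr: yes
    # Keep writing the usual log file
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    # More verbose output while debugging; turn off again afterwards
    debug: on
}
```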

Next steps
Once you have corosync installed and talking between a couple of machines, you may wish to move on to installing Pacemaker.

External resources

 * For help with configurations, try #linux-cluster (Corosync-oriented) or #linux-ha (Pacemaker-oriented) on Freenode IRC.