Difference between revisions of "Distcc"

From Gentoo Wiki
(added note about risk of having a right gcc version as a slot)
(→‎Testing: Fixed example)

Revision as of 04:02, 19 February 2016


Distcc is a program designed to distribute compiling tasks across a network to participating hosts. It consists of a server, distccd, and a client program, distcc. Distcc can work transparently with ccache, Portage, and Automake with a little setup.

When planning on using distcc to help bootstrap a Gentoo installation, make sure to read Using distcc to bootstrap.

Installation

Before configuring distcc, let's first look into the installation of the sys-devel/distcc package on all hosts.

Requirements across all hosts

In order to use distcc, all of the computers on the network need to have the same GCC versions. For example, mixing 3.3.x (where the x varies) is okay, but mixing 3.3.x with 3.2.x may result in compilation errors or runtime errors.
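
As a rough illustration of the rule above, the following shell sketch compares the major.minor series of two GCC version strings. The function name same_gcc_series is hypothetical, and the sketch assumes three-component version strings such as 3.3.6; how the version string is collected from each host is left open.

```shell
# Hypothetical helper: succeed when two GCC versions share the same
# major.minor series (3.3.1 and 3.3.6 do, 3.3.1 and 3.2.3 do not).
same_gcc_series() {
    # Strip the last ".component" from each version string and compare.
    [ "${1%.*}" = "${2%.*}" ]
}

if same_gcc_series 3.3.1 3.3.6; then
    echo "3.3.1 and 3.3.6: same series, fine to mix"
fi
if ! same_gcc_series 3.3.1 3.2.3; then
    echo "3.3.1 and 3.2.3: different series, do not mix"
fi
```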

Installing the software

Distcc ships with a graphical monitor to monitor tasks that a computer is sending away for compilation. This monitor is enabled when the gtk USE flag is set.

After configuring the USE setting, install the sys-devel/distcc package:

root #emerge --ask sys-devel/distcc
Important
Remember to install sys-devel/distcc on all of the participating machines.

Auto-starting the distcc daemon

In order to have distccd started automatically, follow the next set of instructions, depending on the init system used.

Using OpenRC

Edit /etc/conf.d/distccd and make sure to set the --allow directive to allow only trusted clients. For added security, use the --listen directive to tell the distccd daemon what IP to listen on (for multi-homed systems). More information on distcc security can be found at Distcc security notes.

The following example allows the distcc clients running at 192.168.0.4 and 192.168.0.5 to connect to the distccd server running locally:

FILE /etc/conf.d/distccd Allowing specific clients to connect to distccd
DISTCCD_OPTS="--port 3632 --log-level notice --log-file /var/log/distccd.log -N 15 --allow 192.168.0.4 --allow 192.168.0.5"
Important
It is important to use --allow and --listen. Please read the distccd man page or the above security document for more information.
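
When the list of trusted clients is long, the option string can be assembled rather than typed by hand. The following is a minimal sketch, not part of the distcc package: the function name distccd_opts_for is hypothetical, and the fixed options are simply copied from the example above.

```shell
# Hypothetical helper: print a DISTCCD_OPTS line that allows only the
# client addresses given as arguments.
distccd_opts_for() {
    opts="--port 3632 --log-level notice --log-file /var/log/distccd.log -N 15"
    for client in "$@"; do
        # One --allow option per trusted client.
        opts="${opts} --allow ${client}"
    done
    printf 'DISTCCD_OPTS="%s"\n' "${opts}"
}

distccd_opts_for 192.168.0.4 192.168.0.5
```

The printed line can then be pasted into /etc/conf.d/distccd after review.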

Now start the distccd daemon on all the participating computers:

root #rc-update add distccd default
root #rc-service distccd start

Using systemd

Edit /etc/systemd/system/distccd.service.d/00gentoo.conf and add the allowed clients in CIDR format. Here is an example:

FILE /etc/systemd/system/distccd.service.d/00gentoo.conf Setting ALLOWED_SERVERS
Environment="ALLOWED_SERVERS=192.168.1.0/24"
Note
The name "ALLOWED_SERVERS" here is rather confusing as it refers to the clients that are allowed to connect to the local distccd server. Nevertheless, it is this variable which is used in the distccd service as value for the --allow option – see /usr/lib/systemd/system/distccd.service.

Reload the unit files after making such changes:

root #systemctl daemon-reload

Enable auto-starting distccd and then start the service:

root #systemctl enable distccd
root #systemctl start distccd

Configuration

Let's now look into the configuration of distcc.

Specifying participating hosts

Use the distcc-config command to set the list of hosts.

The following is an example list of host definitions. In most cases, variants of lines 1 and 2 suffice. The latter uses the /limit syntax to inform distcc about the maximum number of jobs to be launched on this node. More information about the syntax used in lines 3 and 4 can be found in the distcc manual page.

CODE Examples of host definitions
192.168.0.1          192.168.0.2                       192.168.0.3
192.168.0.1/2        192.168.0.2                       192.168.0.3/10
192.168.0.1:4000/2   192.168.0.2/1                     192.168.0.3:3632/4
@192.168.0.1         @192.168.0.2:/usr/bin/distccd     192.168.0.3

There are also several other methods of setting up hosts. See the distcc man page (man distcc) for more details.
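
To make the HOST[:PORT][/LIMIT] form used in the first three example lines more concrete, the sketch below splits such an entry into its parts. The function name describe_host is hypothetical, and the sketch deliberately ignores SSH entries (those starting with @) and any further per-host options described in the man page.

```shell
# Hypothetical helper: split a simple TCP host definition of the form
# HOST[:PORT][/LIMIT] into its parts. Parts that are not given are
# reported as "unset"; distcc then applies its own defaults.
describe_host() {
    spec=$1
    limit="unset"
    port="unset"
    case "${spec}" in
        */*) limit=${spec##*/}; spec=${spec%/*} ;;
    esac
    case "${spec}" in
        *:*) port=${spec##*:}; spec=${spec%:*} ;;
    esac
    echo "host=${spec} port=${port} limit=${limit}"
}

for entry in 192.168.0.1:4000/2 192.168.0.2/1 192.168.0.3; do
    describe_host "${entry}"
done
```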

If compilations should also occur on the local machine, put localhost in the hosts list. Conversely, if the local machine is not to be used to compile, omit it from the hosts list. On a slow machine, using localhost may actually slow things down. Make sure to test the settings for performance.

Let's configure distcc to use the hosts mentioned on the first line in the example:

root #/usr/bin/distcc-config --set-hosts "192.168.0.1 192.168.0.2 192.168.0.3"

Using distcc with Portage

Setting up Portage to use distcc is easy. It is a matter of enabling the distcc feature, and setting a decent value for the number of simultaneous build jobs (as distcc increases the amount of build resources).

Set the MAKEOPTS variable and FEATURES variable as shown below.

A common strategy is to

  • set the value of N to twice the number of total (local + remote) CPU cores + 1, and
  • set the value of M to the number of local CPU cores

The use of -lM in the MAKEOPTS variable will prevent spawning too many tasks when some of the distcc cluster hosts are unavailable (increasing the number of simultaneous jobs on the other systems) or when an ebuild is configured to disallow remote builds (such as with gcc). This is accomplished by refusing to start additional jobs when the system load is at or above the value of M.

FILE /etc/portage/make.conf Setting MAKEOPTS and FEATURES
# Replace N and M with the right value as calculated previously
MAKEOPTS="-jN -lM"
FEATURES="distcc distcc-pump"
Note
distcc's pump mode may significantly decrease build time for big packages. It caches preprocessed headers on the server side and, as a result, avoids repeatedly uploading and preprocessing header files.

For instance, when there are two quad-core host PCs running distccd and the local PC has a dual core CPU, then the MAKEOPTS variable could look like this:

FILE /etc/portage/make.conf MAKEOPTS example for 2 quad-core (remote) and one dual core (local) PC
# 2 remote hosts with 4 cores each = 8 cores remote
# 1 local host with 2 cores = 2 cores local
# total number of cores is 10, so N = 2*10+1 and M=2
MAKEOPTS="-j21 -l2"
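
The same arithmetic can be scripted. This is a minimal sketch of the rule of thumb described above (N = 2 × total cores + 1, M = local cores); the function name distcc_makeopts and its two arguments are hypothetical, and the core counts still have to be determined by hand.

```shell
# Hypothetical helper: derive MAKEOPTS from core counts.
#   $1 = number of local CPU cores
#   $2 = number of remote CPU cores (sum over all distccd hosts)
distcc_makeopts() {
    local_cores=$1
    remote_cores=$2
    # N: twice the total number of cores, plus one.
    jobs=$(( 2 * (local_cores + remote_cores) + 1 ))
    # M: the number of local cores, used as the load limit.
    echo "MAKEOPTS=\"-j${jobs} -l${local_cores}\""
}

# Two quad-core remote hosts and one dual-core local machine:
distcc_makeopts 2 8
```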

While editing the make.conf file, make sure that it does not have -march=native in the CFLAGS or CXXFLAGS variables. distcc will not distribute work to other machines if -march is set to native. The appropriate -march= value can be obtained by running the following command:

user $gcc -v -E -x c -march=native -mtune=native - < /dev/null 2>&1 | grep cc1 | perl -pe 's/ -mno-\S+//g; s/^.* - //g;'

See Inlining -march=native for distcc for more information.

Using distcc with automake

This is, in some cases, easier than the Portage setup. All that is needed is to update the PATH variable to include /usr/lib/distcc/bin/ in front of the directory that contains gcc (/usr/bin/). However, there is a caveat. If ccache is used, then put the distcc location after the ccache one:

root #export PATH="/usr/lib/ccache/bin:/usr/lib/distcc/bin:${PATH}"

Put this in the user's ~/.bashrc or equivalent file to have the PATH set every time the user logs in, or set it globally through an /etc/env.d/ file.
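
If the line is sourced more than once (for example from nested shells), the wrapper directories get prepended repeatedly. The sketch below avoids that; the function name path_with_wrappers is hypothetical, and the two directories are the ones used in the command above.

```shell
# Hypothetical helper: print a PATH value with the ccache and distcc
# wrapper directories in front, ccache first. Directories that are
# already present are not added a second time.
path_with_wrappers() {
    path=$1
    # Prepend distcc first so that ccache ends up ahead of it.
    for dir in /usr/lib/distcc/bin /usr/lib/ccache/bin; do
        case ":${path}:" in
            *":${dir}:"*) ;;                 # already there, skip
            *) path="${dir}:${path}" ;;
        esac
    done
    echo "${path}"
}

path_with_wrappers "/usr/bin:/bin"
```

In a ~/.bashrc this could be used as export PATH="$(path_with_wrappers "${PATH}")". Note that the sketch does not reorder directories that are already present in the wrong order.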

Instead of calling make alone, add in -jN (where N is an integer). The value of N depends on the network and the types of computers that are used to compile. A heuristic approach to the right value is given earlier in this article.

Using distcc to bootstrap

Using distcc to bootstrap (i.e. build a working toolchain before installing the remainder of the system) requires some additional steps.

Step 1: configure Portage

Boot the new box with a Gentoo Linux LiveCD and follow the installation instructions, while keeping track of the instructions in the Gentoo FAQ for information about bootstrapping. Then configure Portage to use distcc:

FILE /etc/portage/make.conf Configure Portage to use distcc
FEATURES="distcc"
MAKEOPTS="-jN"

Update the PATH variable in the installation session as well:

root #export PATH="/usr/lib/ccache/bin:/usr/lib/distcc/bin:${PATH}"

Step 2: getting distcc

Install sys-devel/distcc:

root #USE='-*' emerge --nodeps sys-devel/distcc

Step 3: setting up distcc

Run distcc-config --install to set up distcc; substitute the host1, host2, ... placeholders in the example with the IP addresses or hostnames of the participating nodes.

root #/usr/bin/distcc-config --set-hosts "localhost host1 host2 host3 ..."

Distcc is now set up to bootstrap! Continue with the proper installation instructions and do not forget to run emerge distcc after running emerge @system. This is to make sure that all of the necessary dependencies are installed.

Note
During bootstrap and emerge @system distcc may not appear to be used. This is expected as some ebuilds do not work well with distcc, so they intentionally disable it.

Distcc extras

The distcc application has additional features and applications to support working in a distcc environment.

Distcc monitors

Distcc ships with two monitoring utilities. The text-based monitoring utility is always built and is called distccmon-text. Running it for the first time can be a bit confusing, but it is really quite easy to use. If the program is run with no parameter it will run just once. However, if it is passed a number it will update every N seconds, where N is the argument that was passed.

user $distccmon-text 10

The other monitoring utility is only enabled when the gtk USE flag is set. This one is GTK+ based, runs in an X environment, and it is quite lovely. For Gentoo, the GUI monitor has been renamed to distccmon-gui to make it less confusing (it is originally called distccmon-gnome).

user $distccmon-gui

To monitor Portage's distcc usage:

root #DISTCC_DIR="/var/tmp/portage/.distcc/" distccmon-text 10
root #DISTCC_DIR="/var/tmp/portage/.distcc/" distccmon-gui
Important
If the distcc directory is elsewhere, change the DISTCC_DIR variable accordingly.

A trick is to set DISTCC_DIR in environment variables:

root #echo 'DISTCC_DIR="/var/tmp/portage/.distcc/"' >> /etc/env.d/02distcc

Now update the environment:

root #env-update
root #source /etc/profile

Finally, start the GUI application:

root #distccmon-gui

Using SSH for distcc communication

Setting up distcc via SSH includes some pitfalls. First, generate an SSH key pair without a passphrase. Be aware that Portage compiles programs as the Portage user (or as root if FEATURES="userpriv" is not set). The home directory of the Portage user is /var/tmp/portage/, which means the keys need to be stored in /var/tmp/portage/.ssh/.

root #ssh-keygen -b 2048 -t rsa -f /var/tmp/portage/.ssh/id_rsa

Second, create a section for each host in the SSH configuration file:

FILE /var/tmp/portage/.ssh/config Add per-host sections
Host test1
    HostName 123.456.789.1
    Port 1234
    User UserName
 
Host test2
    HostName 123.456.789.2
    Port 1234
    User UserName
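
With more than a couple of compile nodes, the sections can be generated instead of written by hand. The following is only a sketch: the function name ssh_sections, the alias=address argument format, and the example.org hostnames are hypothetical, and the port and user are the same placeholders as in the file above.

```shell
# Hypothetical helper: print one SSH config section per compile node.
# Each argument is an alias=address pair.
ssh_sections() {
    for pair in "$@"; do
        # ${pair%%=*} is the alias, ${pair#*=} is the address.
        printf 'Host %s\n    HostName %s\n    Port 1234\n    User UserName\n\n' \
            "${pair%%=*}" "${pair#*=}"
    done
}

ssh_sections test1=node1.example.org test2=node2.example.org
```

The output can be redirected into /var/tmp/portage/.ssh/config once the placeholders have been replaced with real values.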

Send the public key to each compilation node:

root #ssh-copy-id -i /var/tmp/portage/.ssh/id_rsa.pub UserName@CompilationNode

Also make sure that each host is available in the known_hosts file:

root #ssh-keyscan -t rsa <compilation-node-1> <compilation-node-2> [...] > /var/tmp/portage/.ssh/known_hosts

Fix the file ownership as follows:

root #chown -R portage:portage /var/tmp/portage/.ssh/

To set up the hosts test1 and test2, run:

root #/usr/bin/distcc-config --set-hosts "@test1 @test2"

Note the @ prefix, which marks a host as an SSH host for distcc.

Finally, tell distcc which SSH binary to use:

FILE /etc/portage/make.conf
DISTCC_SSH="ssh"

It is not necessary to run the distccd initscript on the hosts when distcc communicates via SSH.

Testing

To test distcc, write a simple Hello distcc program and run distcc in verbose mode to see if it communicates properly.

FILE main.c
#include <stdio.h>

int main() {
    printf("Hello distcc!\n");
    return 0;
}

Next, turn on verbose mode, compile the program using distcc, and link the generated object file into an executable:

user $export DISTCC_VERBOSE=1
user $distcc gcc -c main.c -o main.o
user $gcc main.o -o main

There should be a bunch of output about distcc finding its configuration, selecting the host to connect to, starting to connect to it, and ultimately compiling main.c. If the output does not list the desired distcc hosts, check the configuration.

Finally, ensure the compiled program works properly. To test each host, enumerate each compile host in the hosts file.

user $./main
Hello distcc!
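
Testing each host in turn can be scripted by overriding the host list per run. The sketch below assumes the DISTCC_HOSTS and DISTCC_FALLBACK environment variables behave as described in the distcc man page; the function name test_each_host is hypothetical, and main.c is the test program from above.

```shell
# Hypothetical helper: compile main.c once per host, with only that
# host in the list, and report which hosts accepted the job.
test_each_host() {
    for host in "$@"; do
        # DISTCC_FALLBACK=0 is meant to make distcc fail instead of
        # silently compiling locally when the host cannot be used.
        if DISTCC_HOSTS="${host}" DISTCC_FALLBACK=0 \
               distcc gcc -c main.c -o main.o; then
            echo "${host}: ok"
        else
            echo "${host}: FAILED"
        fi
    done
}

# Example call (uncomment on a machine with distcc set up):
# test_each_host 192.168.0.1 192.168.0.2 192.168.0.3
```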

Troubleshooting

If a problem occurs while using distcc, then this section might help in resolving the problem.

ERROR: failed to open /var/log/distccd.log

As of January 22nd, 2015 emerging fails to create the proper distccd.log file in /var/log/. This apparently only affects version 3.1-r8 of distcc. This bug is in the process of being corrected (see bug #477630). It is possible to work around this by manually creating the log file, giving it proper ownership, and restarting the distccd daemon:

root #mkdir -p /var/log/distcc
root #touch /var/log/distcc/distccd.log
root #chown distcc:daemon /var/log/distcc/distccd.log

Next update the /var/log path of the distccd configuration file in /etc/conf.d/distccd to the distcc directory created in the step before:

FILE /etc/conf.d/distccd Updating log path
DISTCCD_OPTS="--port 3632 --log-level notice --log-file /var/log/distcc/distccd.log -N 15"

Finally, restart the distccd service:

root #/etc/init.d/distccd restart

Some packages do not use distcc

As various packages are installed, users will notice that some of them aren't being distributed (and aren't being built in parallel). This may happen because the package's Makefile doesn't support parallel operations, or the maintainer of the ebuild has explicitly disabled parallel operations due to a known problem.

Sometimes distcc might cause a package to fail to compile. If this happens, please report it.

Mixed GCC versions

If the environment hosts different GCC versions, there will likely be very weird problems. The solution is to make certain all hosts have the same GCC version.

Recent Portage updates have made Portage use ${CHOST}-gcc instead of plain gcc. This means that if i686 machines are mixed with other types (i386, i586), the builds will run into trouble. A workaround for this may be to run:

root #export CC='gcc' CXX='c++'

It is also possible to set the CC and CXX variables in /etc/portage/make.conf to the values listed in the command above.

Important
Doing this explicitly redefines some behavior of Portage and may have some weird results in the future. Only do this if mixing CHOSTs is unavoidable.

Note that having the right version of gcc installed in a slot on a server is not enough. Portage uses distcc as a replacement for the compiler named after the CHOST variable (e.g. x86_64-pc-linux-gnu), and distccd invokes it by exactly the same name. The right version of gcc must be the default system compiler on all participating compilation hosts.

-march=native

Starting with GCC 4.3.0, the compiler supports the -march=native option, which turns on CPU auto-detection and enables the optimizations suited to the processor on which GCC is running. This creates a problem when using distcc because it allows the mixing of code optimized for different processors. For example, running distcc with -march=native on a system that has an AMD Athlon processor and doing the same on another system that has an Intel Pentium processor will mix code compiled for both processors.

Heed the following warning:

Warning
Do not use -march=native or -mtune=native in the CFLAGS or CXXFLAGS variables of make.conf when compiling with distcc.

To know the flags that GCC would enable when called with -march=native, execute the following:

user $gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.3/cc1 -E -quiet -v - -march=corei7-avx \
  -mcx16 -msahf -mno-movbe -mno-aes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma \
  -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 \
  -mno-lzcnt -mno-rdrnd -mno-f16c -mno-fsgsbase --param l1-cache-size=32 \
  --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=corei7-avx
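
To turn such a line into flags that are safe to put in make.conf, the disabled (-mno-*) options and the leading cc1 invocation can be stripped, as the perl filter shown earlier in this article does. The sketch below does the same with sed on the sample output above, so it runs without gcc; the function name explicit_native_flags is hypothetical.

```shell
# Sample cc1 line taken from the output above, joined onto one line.
cc1_line='/usr/libexec/gcc/x86_64-pc-linux-gnu/4.7.3/cc1 -E -quiet -v - -march=corei7-avx -mcx16 -msahf -mno-movbe -mno-aes -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rdrnd -mno-f16c -mno-fsgsbase --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=corei7-avx'

# Hypothetical helper: drop the disabled (-mno-*) flags, then drop
# everything up to and including the lone " - " argument.
explicit_native_flags() {
    sed -e 's/ -mno-[^ ]*//g' -e 's/^.* - //'
}

printf '%s\n' "${cc1_line}" | explicit_native_flags
```

For this sample the result starts with -march=corei7-avx and ends with -mtune=corei7-avx; on a real system, pipe the gcc command above into the filter instead of the sample line.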

See also

  • The DistCC Cross-compiling guide explains how using one architecture to build programs for another architecture is done through distcc. This can be as simple as using an Athlon (i686) to build a program for a K6-2 (i586), or using a SPARC to build a program for a PowerPC.

External resources


This page is based on a document formerly found on our main website gentoo.org.
The following people contributed to the original document: Lisa Seelye, Mike Frysinger, Erwin, Sven Vermeulen, Lars Weiler, Tiemo Kieft and nightmorph
They are listed here because wiki history does not allow for any external attribution. If you edit the wiki article, please do not add yourself here; your contributions are recorded on each article's associated history page.