User:Sakaki/Sakaki's EFI Install Guide/Installing the Gentoo Stage 3 Files

Our main goals in this section, which shadows Chapter 5 of the Gentoo handbook, will be to download and unpack the Gentoo 'stage 3' tarball, and to set up Portage's main configuration file.

We'll also present a brief backgrounder covering some key Portage topics, which may be skipped if desired.

Double-Checking the Date and Time
Before downloading anything, double-check that the date and time are correct on the target machine. They should be, as you set them up earlier, but an incorrect date can cause problems later, so it is best to make sure now. Check it with:

Per the handbook, you should stick with UTC for now (the real timezone specification will come later in the install). If necessary, set the date and time, in MMDDhhmmYYYY format (Month, Day, hour, minute, year):
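The MMDDhhmmYYYY argument is easy to get wrong, so it can help to sanity-check it first. The following is an optional sketch, not a replacement for the `date` command itself. The function names `stamp_to_iso` and `current_utc_stamp` are hypothetical. The first expands such a string into a readable form, and the second prints the current UTC time in the same layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand an MMDDhhmmYYYY string (the layout `date`
# accepts for setting the clock) into "YYYY-MM-DD hh:mm" for eyeballing.
stamp_to_iso() {
    local s="$1"
    # Reject anything that is not exactly 12 digits.
    if [[ ! "$s" =~ ^[0-9]{12}$ ]]; then
        echo "expected 12 digits in MMDDhhmmYYYY format, got: $s" >&2
        return 1
    fi
    # Substring offsets: MM=0, DD=2, hh=4, mm=6, YYYY=8.
    printf '%s-%s-%s %s:%s\n' "${s:8:4}" "${s:0:2}" "${s:2:2}" "${s:4:2}" "${s:6:2}"
}

# Hypothetical helper: print the current UTC time as MMDDhhmmYYYY.
current_utc_stamp() {
    date -u +%m%d%H%M%Y
}
```

For example, `stamp_to_iso 100313292014` prints `2014-10-03 13:29`. If that matches what you intended, you can pass the same string to `date` (as root) to set the clock.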

Downloading, Verifying and Unpacking the Gentoo Stage 3 Tarball
The 'Stage 3' file is a tarball containing a populated directory structure from a basic Gentoo system. Unlike the minimal installation image, however, the stage 3 system contains no kernel, only the binaries and libraries essential for bootstrapping. As such, we will have to host it within a chroot from our existing (minimal install image) system until we have recompiled all system files and libraries and built a fresh kernel, at which point the new system becomes self-hosting.

The stage 3 tarball is generally released on the same date as the minimal install image, and may be found (together with the usual contents and digest files) in the same autobuilds directory. As the amount of data involved here is small, we'll depart from the handbook at this stage: rather than using the slightly awkward links browser / mirror selection process, we'll just grab the files directly with wget.

Change to the Gentoo filesystem root mountpoint:

Now download the files. As before, substitute the current release file date for YYYYMMDD in the commands below (open the link http://distfiles.gentoo.org/releases/amd64/autobuilds/latest-stage3-amd64.txt in a browser to determine the current name).
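If you would rather not open a browser, the current file name can also be pulled out of that text file with a small function. This is a sketch only. The function names are hypothetical, and the layout of latest-stage3-amd64.txt is an assumption. The sketch assumes that '#' lines are comments and that the first other line begins with a relative path such as `YYYYMMDD/stage3-amd64-YYYYMMDD.tar.bz2`, possibly followed by further fields.

```bash
#!/usr/bin/env bash
# Sketch: print the stage 3 file name listed in latest-stage3-amd64.txt
# (contents read on stdin). ASSUMPTION: lines starting with '#' are comments,
# and the first other line begins with a relative path such as
# "YYYYMMDD/stage3-amd64-YYYYMMDD.tar.bz2", optionally followed by more fields.
latest_stage3_name() {
    local line path skip_re='^[[:space:]]*(#|$)'
    while IFS= read -r line || [[ -n "$line" ]]; do
        [[ "$line" =~ $skip_re ]] && continue   # comment or blank line
        path="${line%%[[:space:]]*}"            # keep the first field only
        printf '%s\n' "${path##*/}"             # strip the directory part
        return 0
    done
    return 1                                    # no usable line found
}

# Print the 8-digit YYYYMMDD release date embedded in a stage 3 file name.
stage3_date() {
    if [[ "$1" =~ ([0-9]{8}) ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}
```

A possible usage is `wget -qO- http://distfiles.gentoo.org/releases/amd64/autobuilds/latest-stage3-amd64.txt | latest_stage3_name`. If the file's layout differs from the one assumed above, fall back to reading it by eye.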

As before (when we downloaded the minimal install image), we have to go through a two-stage verification: first check the signature in the DIGESTS.asc file, and then check the digests in that file themselves. As this is the target machine (not the helper), we don't yet have the necessary Gentoo automated weekly release public key, whose fingerprint may be found on the Gentoo release engineering page. So let's fetch it now. Issue:

Once you have the public key, verify the digest file:

And assuming that worked (output reports 'Good signature'), next check the digests themselves; we'll use the SHA512 variants here:

If this outputs:

then continue.
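The digest check amounts to extracting only the SHA512 lines from the digest file and handing them to `sha512sum --check`. A minimal sketch of that idea follows. The function name is hypothetical. The sketch assumes that each SHA512 digest line directly follows a `# SHA512 HASH` header, in the `<digest>  <file name>` layout that `sha512sum` expects.

```bash
#!/usr/bin/env bash
# Sketch: check only the SHA512 digests listed in a Gentoo .DIGESTS(.asc) file.
# ASSUMPTION: each digest line directly follows a "# SHA512 HASH" header line
# and uses the "<hex digest>  <file name>" layout that sha512sum understands.
# Run it from the directory that holds the downloaded files.
check_sha512_digests() {
    local digests_file="$1"
    # For every "SHA512 HASH" header, print the line that follows it, then let
    # sha512sum verify each listed file (non-zero exit status on any mismatch).
    awk '/SHA512 HASH/ { getline; print }' "$digests_file" | sha512sum --check
}
```

Run it only after gpg has reported a good signature on the digest file. A healthy download should then produce an `: OK` line for each file listed.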

The last step in this stage is to unpack the tarball. Double check you are in the directory, then issue:

As per the handbook, the options passed to tar tell it to extract (x), provide verbose output (v), decompress with bzip2 (j), preserve permissions (p), and read from a file rather than from standard input (f).
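The effect of these options can be seen safely on a throwaway archive before you touch the real stage 3 file. The sketch below is illustrative only. The function name is hypothetical, and it assumes GNU tar, bzip2 and GNU stat are available. It packs a single file with an unusual mode (0777), extracts it with the same option letters, and prints the resulting mode. Without `p`, an ordinary user's umask would normally trim that mode.

```bash
#!/usr/bin/env bash
# Sketch: demonstrate tar's x, v, j, p and f options on a tiny, throwaway
# bzip2-compressed archive (safe to run as an ordinary user). Needs GNU tar,
# bzip2 and GNU stat. Prints the mode of the extracted file.
demo_tar_xvjpf() {
    local work
    work="$(mktemp -d)" || return 1
    mkdir -p "$work/src/etc" "$work/root"
    printf 'demo\n' > "$work/src/etc/demo.conf"
    chmod 0777 "$work/src/etc/demo.conf"      # a mode the umask would normally trim
    # Create the archive: c=create, j=bzip2, f=archive file, -C=change dir first.
    tar cjf "$work/demo.tar.bz2" -C "$work/src" etc
    # Extract it: x=extract, v=verbose (listing sent to stderr here),
    # j=bzip2, p=preserve permissions, f=read from the named file.
    tar xvjpf "$work/demo.tar.bz2" -C "$work/root" >&2
    stat -c '%a' "$work/root/etc/demo.conf"  # 777 is expected, thanks to 'p'
    rm -rf "$work"
}
```

For the real stage 3 file, the same letters are applied to the downloaded `stage3-amd64-*.tar.bz2`, as root and from within the mountpoint directory.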

Check that the base system has unpacked OK:

If you see a file structure similar to the above, then you can proceed. As we no longer need the stage files, they can be deleted now to save space:

This structure looks, as it should, a lot like a normal Linux root directory, but it is positioned at. Later, we'll bind in a few additional special directories from our running (minimal) system, and then <tt>chroot</tt> into this new base to continue with the installation.
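If you prefer to script the "does this look like a root filesystem?" check rather than eyeball the listing, a sketch along the following lines would do. The function name is hypothetical. The directory list is an assumption about what a typical Linux root contains, not an official manifest of the stage 3 contents.

```bash
#!/usr/bin/env bash
# Sketch: confirm that an unpacked stage 3 tree has the usual top-level
# directories. ASSUMPTION: the list below is typical of a Linux root
# filesystem; it is not an official manifest of the stage 3 contents.
# (-d follows symlinks, so e.g. a relative "lib -> lib64" link also passes.)
check_stage3_tree() {
    local root="$1" d missing=0
    for d in bin boot dev etc home lib proc root sbin sys tmp usr var; do
        if [[ ! -d "$root/$d" ]]; then
            echo "missing: $root/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns 0 only if every listed directory is present, and otherwise reports each missing one on stderr.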

Lastly, return back to root's home:

Before we continue, we'll take a brief detour to discuss some essential Gentoo / Portage background information and terminology. If you are an old hand with this, feel free to skip this material.

<span id="gentoo_overview">Gentoo, Portage, Ebuilds and <tt>emerge</tt> (Background Reading)
Gentoo is a source-based distribution, the heart of which is a powerful package manager called Portage. Portage itself has two main components:
 * the ebuild system, which performs the work of fetching, configuring, building and installing packages, and
 * the <tt>emerge</tt> tool, which provides a command line interface to control ebuilds, and also allows you to update the Portage tree (discussed below).

Package ebuilds are Bash shell scripts, or more accurately shell script fragments, that are sourced into a larger build system 'host' script. This host script provides a package management control flow that invokes a set of default 'hook' functions, which a particular package's ebuild may override if it needs to (these are covered in detail in the Gentoo Development Guide). The ebuild must also define a minimum set of variables to allow the whole process to operate successfully (for example, the URI from which a package's source tarball may be downloaded must be assigned to the <tt>SRC_URI</tt> variable).
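The "host script with overridable hooks" arrangement can be illustrated with a toy model. To be clear, this is not Portage's real implementation, and every function here is hypothetical. The host defines default phase functions, sources a fragment that may redefine some of them, checks a required variable, and then calls the phases in order.

```bash
#!/usr/bin/env bash
# Toy model (NOT Portage's real code) of a host script that supplies default
# "hook" functions which a sourced ebuild-like fragment may override.

# Defaults provided by the hypothetical host script.
src_prepare()   { echo "default src_prepare: nothing to do"; }
src_configure() { echo "default src_configure: ./configure"; }
src_compile()   { echo "default src_compile: make"; }

run_phases() {
    local fragment="$1"
    (
        # Source the fragment in a subshell, so that its overrides and
        # variables do not leak into the caller.
        source "$fragment"
        # A real package manager checks far more than this one variable.
        : "${SRC_URI:?fragment must set SRC_URI}"
        src_prepare
        src_configure
        src_compile
    )
}
```

Here, a fragment that sets `SRC_URI` and redefines only `src_prepare` gets its own preparation step and the default configure and compile steps.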

Now, when you invoke an ebuild to install a particular (as yet uninstalled) package on your system (via <tt>emerge</tt>, for example, as described below), it will typically carry out the following tasks (inter alia):
 * check that the specified package can be installed (that is, that it isn't masked and doesn't have an incompatible license requirement);
 * download the package's tarball (or other format source archive) from an upstream repository (or Gentoo mirror);
 * unpack the tarball in a temporary working area;
 * patch (and otherwise modify) the unpacked source if need be;
 * configure the source to prepare it for compilation on your machine;
 * compile / build the source, as a non-privileged user in the temporary work area;
 * run tests (if provided and required);
 * install the built package to a dummy filesystem root; and
 * copy ('merge') the package installation files from the dummy filesystem root to the real filesystem root (keeping a record of what gets done).

Up until the final file copy-over step (the 'merge' in <tt>emerge</tt>), all operations (even where the package's <tt>make install</tt> is invoked, for example) take place in a temporary staging area. This enables Portage to keep track of all the files installed by a particular package, limit the damage caused by failed compiles or installs, and facilitate simple removal of installed packages. Furthermore, for most of these tasks, Portage operates in a 'sandbox' mode, where attempts to write directly to the real root filesystem (rather than the temporary work area) are detected, and cause an error to be thrown (NB this is not intended as a security system per se, but it does help prevent accidental filesystem corruption).
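The "install to a dummy root, then merge while keeping a record" idea can be sketched with plain directories. This greatly simplifies what Portage actually does, which also involves collision checks and configuration file protection, among much else. The function names and arguments are hypothetical: a staging directory stands in for the dummy root, a second directory for the real root, and a text file for the record.

```bash
#!/usr/bin/env bash
# Sketch of "install to a dummy root, then merge, keeping a record".
# ASSUMPTIONS: $1 is a staging directory standing in for the dummy root,
# $2 the real target root, $3 a record file; real Portage merging (collision
# checks, config protection, etc.) is far more involved.
merge_image() {
    local image="$1" target="$2" record="$3" rel
    : > "$record"
    # Walk the staged files in sorted order, copy each across, record its path.
    while IFS= read -r -d '' rel; do
        rel="${rel#./}"
        mkdir -p "$target/$(dirname "$rel")"
        cp -p "$image/$rel" "$target/$rel"
        printf '/%s\n' "$rel" >> "$record"
    done < <(cd "$image" && find . -type f -print0 | sort -z)
}

# Remove exactly the files listed in a record, as an uninstall would.
unmerge_record() {
    local target="$1" record="$2" path
    while IFS= read -r path; do
        rm -f "$target$path"
    done < "$record"
}
```

Because every merged file is listed in the record, removal does not have to guess which files belong to the package.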

<span id="portage_tree">Portage stores ebuilds in a hierarchical folder structure - the Portage tree (or repository), which by default is located under. The first tree level is the package category, which is used to organize packages into groups which have broadly similar functionality. So, for example, non-core development utilities are typically placed in the <tt>dev-util</tt> category (in folder). The next tree level is the package name itself. To take a concrete example, the small utility <tt>diffstat</tt> (which, as its name suggests, displays a histogram of changes implied by a patch file, or other <tt>diff</tt> output), is located in the folder. Within that subdirectory we have the actual per-package content, specifically:
 * The <tt>ebuild</tt> files. Each supported version has a file of format <tt> - .ebuild</tt>. At the time of writing, there are two supported versions (1.58 and 1.59) of <tt>diffstat</tt> in the Portage tree, so the ebuilds are located at and . Portage supports a complex version numbering taxonomy which, for the most part, reflects upstream versioning (discussed further below), and most packages, like <tt>diffstat</tt>, will have multiple ebuild versions available at any given time.
 * Package metadata. This is stored in an xml-format text file (one per package), named <tt>metadata.xml</tt>. Its contents are described here, and can contain detailed package descriptions, email addresses for upstream maintainers, documentation about USE flags etc. <tt>diffstat</tt>'s metadata file is at.
 * A change log for the package's ebuild(s). This is a text file documenting what changes have been checked in to source control over time. The filename is <tt>ChangeLog</tt>, so <tt>diffstat</tt>'s may be found at.
 * A manifest file, which contains digests (SHA256, SHA512 and Whirlpool) and file sizes for the contents of the package directory and any referenced tarballs (and patches, if present). It is used to detect corruption and possible tampering during package download / installation. This manifest, which may optionally be digitally signed, is stored in the <tt>Manifest</tt> file; <tt>diffstat</tt>'s therefore resides at.
 * An optional files directory. This is used to hold patches and other small files that are supplementary to the main source tarball but referenced by one or more of the package's ebuilds. The directory may be absent if unused. As (at the time of writing) <tt>diffstat</tt> does not require patches, it has no <tt>files</tt> subdirectory either.
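Because of this layout, an ebuild's location in the tree fully determines the package it describes. The sketch below maps such a path to the equivalent fully qualified atom (atoms are covered later in this section). The function name is hypothetical, and the tree root in the examples is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: map an ebuild's location in a Portage-style tree,
#   <tree>/<category>/<package>/<package>-<version>.ebuild
# to the equivalent fully qualified "=<category>/<package>-<version>" atom.
ebuild_path_to_atom() {
    local path="$1" file pkgdir pn category ver
    file="${path##*/}"                 # e.g. diffstat-1.58.ebuild
    pkgdir="${path%/*}"                # e.g. <tree>/dev-util/diffstat
    pn="${pkgdir##*/}"                 # e.g. diffstat
    category="${pkgdir%/*}"
    category="${category##*/}"         # e.g. dev-util
    # The file name must be "<package>-<version>.ebuild" for this to work.
    [[ "$file" == "$pn"-*.ebuild ]] || return 1
    ver="${file#"$pn"-}"               # e.g. 1.58.ebuild
    ver="${ver%.ebuild}"               # e.g. 1.58
    printf '=%s/%s-%s\n' "$category" "$pn" "$ver"
}
```

Given a placeholder tree root of `/tmp/tree`, the path `/tmp/tree/dev-util/diffstat/diffstat-1.58.ebuild` maps to `=dev-util/diffstat-1.58`. Non-ebuild files such as `metadata.xml` are rejected.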

<span id="diffstat_ebuild">A Simple <tt>ebuild</tt> (<tt>diffstat</tt>)
So what does an <tt>ebuild</tt> file actually look like, then? <tt>diffstat</tt> happens to be a good minimal example; here (at the time of writing) is what contains:

Not a lot to see, is there? That's because <tt>diffstat</tt> uses a standard 'Autotools'-style build, without patches, so the default <tt>ebuild</tt> control flow (and invoked 'hook' functions) can do almost everything for us. Therefore, all that has to be done is:
 * to specify (via the <tt>EAPI</tt> variable) that the <tt>ebuild</tt> makes use of the most modern package manager functionality, including built-in default behaviours (version <tt>5</tt>, at the time of writing).
 * to specify a brief <tt>DESCRIPTION</tt>, <tt>HOMEPAGE</tt> (both self-explanatory) and most importantly, <tt>SRC_URI</tt>; this last variable tells Portage the location from which to download the package tarball, if it cannot find it in the Portage mirrors (the <tt>${P}</tt> expands out to be the package name and version);
 * to specify the <tt>LICENSE</tt> (the relevant text may be found at );
 * to specify that SLOTTING is not used by this ebuild (this is an advanced feature; see below for a brief overview); and
 * finally to list the architectures (<tt>KEYWORDS</tt>) for which this <tt>ebuild</tt> applies. Here, we can see for example that it is stable (no tilde) for <tt>amd64</tt>, but only in 'testing' (has a tilde) for <tt>mips</tt>.

That's all that is needed in this case, because the default <tt>ebuild</tt> functions will automatically pull down the tarball, unpack it, issue a <tt>./configure</tt>, issue a <tt>make</tt>, followed by a <tt>make install</tt> (to a dummy root), after which, the program file (plus manpage etc.) will be copied over ('merged') to the real filesystem (and any prior version's files safely unmerged immediately thereafter).
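To make the list above concrete, here is a hypothetical ebuild of the same shape, for an imaginary Autotools-based package `app-misc/example`. It is not the real `diffstat` file: every name, URL and value below is a placeholder. Portage sets `${P}` itself (here it would be `example-1.0`) before sourcing the file.

```bash
# Hypothetical minimal ebuild, e.g. <tree>/app-misc/example/example-1.0.ebuild.
# All names, URLs and values are placeholders; Portage sets ${P} before sourcing.

EAPI=5

DESCRIPTION="Hypothetical example package"
HOMEPAGE="https://example.org/"
SRC_URI="https://example.org/releases/${P}.tar.gz"

LICENSE="GPL-2"
SLOT="0"
# Stable on amd64, 'testing' (tilde) on mips.
KEYWORDS="amd64 ~mips"
```

No phase functions are defined, so the EAPI's defaults would perform the unpack, `./configure`, `make` and `make install` (to the dummy root) steps described above.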

There are then two main ways to invoke the <tt>diffstat</tt> ebuild. The <span id="use_emerge">first (and more common) way is via <tt>emerge</tt>: typically, you would issue:

The second (lower-level) way is to invoke the <tt>ebuild</tt> directly; for example, you could issue:

which will clean Portage's temporary build directories, and then perform all the steps of the ebuild workflow, providing detailed output as it does so (you can also use the <tt>ebuild</tt> command to perform only certain steps, if you wish, and it can also create <tt>Manifest</tt> files; see the <tt>ebuild</tt> manpage for details).

<span id="nwipe_ebuild">A More Complex <tt>ebuild</tt> (<tt>nwipe</tt>)
The <tt>diffstat</tt> example above is about as simple as a real-world ebuild gets!

However, one common additional requirement is the need to apply patches. To do this, an ebuild will typically override the <tt>src_prepare</tt> ebuild 'hook' function (invoked by the standard <tt>ebuild</tt> flow after the source tarball has been successfully unpacked), and then use the epatch utility function to apply patches held in the <tt>files</tt> directory.

For example, consider the <tt>nwipe</tt> package, which provides tools to securely wipe disks. It lives in the <tt>app-crypt</tt> category. Looking in its corresponding directory we notice a number of interesting differences from <tt>diffstat</tt>:
 * there are quite a number of ebuilds (at the time of writing, <tt>nwipe-0.12.ebuild</tt>, <tt>nwipe-0.12-r1.ebuild</tt>, <tt>nwipe-0.13.ebuild</tt> and <tt>nwipe-0.14.ebuild</tt>);
 * there is a <tt>files</tt> subdirectory, containing a single patch (<tt>nwipe-0.12-ncurses.patch</tt>).

Now let's look at version 0.12-r1 of the <tt>ebuild</tt>:

Most of this should be familiar enough from the <tt>diffstat</tt> example, but there are some new elements too. Specifically:
 * the <tt>inherit</tt> command is used to pull in two useful 'eclasses': eutils (which supplies the <tt>epatch</tt> function discussed shortly) and autotools (which supplies <tt>eautoreconf</tt>);
 * the <tt>SRC_URI</tt> makes use of the <tt>${PN}</tt> variable, which expands out to the package name, without version (a full list of these convenience variables may be found here);
 * the blank <tt>IUSE</tt> definition specifies that there are no ebuild-specific USE flags (see below for a brief introduction to USE flags);
 * the <tt>RDEPEND</tt> variable specifies a set of runtime dependencies, and the <tt>DEPEND</tt> a set of build/install time dependencies, for the package. This is used by Portage to ensure that all prerequisites are also installed, when you ask to <tt>emerge app-crypt/nwipe</tt>;
 * the <tt>DOCS</tt> variable is used to notify the default <tt>src_install</tt> 'hook' function of additional documentation files (in addition to those the <tt>make</tt> itself may copy over, such as manpages) which it should install;
 * the <tt>src_prepare</tt> 'hook' function (which by default is a no-op) is overridden to perform two custom tasks:
    * to patch the source with the <tt>epatch</tt> utility, using a patch file in (the <tt>${P}</tt> expansion excludes revision tags). As described here, <tt>epatch</tt> intelligently attempts to apply patches using different <tt>-p</tt> levels etc.
    * to invoke <tt>eautoreconf</tt>, which regenerates the 'Autotools' <tt>configure</tt> scripts after the patch has been applied.
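The difference between `${P}` and its revision-carrying sibling `${PF}` is easy to see by recomputing Portage's standard name variables (`PN`, `PV`, `PR`, `P`, `PVR`, `PF`). The function below is a hypothetical illustration, not Portage code. It follows the usual conventions: `PR` defaults to `r0`, and `r0` is omitted from `PVR` and `PF`.

```bash
#!/usr/bin/env bash
# Sketch: recompute Portage's standard name variables for a package, to show
# why ${P} omits the revision while ${PF} keeps it. PR defaults to "r0",
# which is left out of PVR and PF.
show_name_vars() {
    local PN="$1" PV="$2" PR="${3:-r0}"
    local P="${PN}-${PV}" PVR="$PV"
    if [[ "$PR" != "r0" ]]; then
        PVR="${PV}-${PR}"
    fi
    local PF="${PN}-${PVR}"
    printf 'P=%s PF=%s\n' "$P" "$PF"
}
```

For `nwipe` 0.12-r1, this prints `P=nwipe-0.12 PF=nwipe-0.12-r1`. A patch path written as `${FILESDIR}/${P}-ncurses.patch` would therefore resolve to `nwipe-0.12-ncurses.patch` for both 0.12 and 0.12-r1, matching the single patch in the `files` directory.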

Overlays
What if you want to modify an <tt>ebuild</tt> yourself, or add a new one? You could of course submit the <tt>ebuild</tt> to Gentoo using Bugzilla, but that only really applies to completed work you want to share. For work in progress, or private ebuilds, a different approach is required. You can't simply insert new entries into the tree, as they'll get overwritten next time you synchronize the Gentoo repository.

Portage supports the concept of overlays to address just this issue. An overlay is an additional repository, similar in layout to the main Portage tree, which Portage (by default, and as the name suggests) 'overlays' on the file structure. To illustrate, suppose you created a directory at, say,, created the subfolders and , then created an ebuild  (and manifest, ), and then set PORTDIR_OVERLAY="/tmp/myoverlay" (in ). Then, when referring to (or installing) <tt>diffstat</tt>, Portage would use your version, rather than the 'official' <tt>ebuild</tt> (however, if you had created an ebuild with a lower version number, say 1.57, then by default Portage would still use the higher numbered version, from the official 'underlay').
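Creating such an overlay by hand mostly means creating directories. The sketch below adds the two small metadata files overlays conventionally carry: a repository name in `profiles/repo_name`, and `masters = gentoo` in `metadata/layout.conf`. The function name, overlay name and path are hypothetical. After copying an ebuild in, you would still need to regenerate its Manifest (for example with `ebuild <file> manifest`) before Portage will accept it.

```bash
#!/usr/bin/env bash
# Sketch: create the skeleton of a local overlay with one package directory.
# ASSUMPTIONS: overlay path, name and category/package are hypothetical; an
# ebuild copied into the package directory still needs its Manifest
# regenerated (e.g. `ebuild <file> manifest`) before Portage will use it.
make_overlay_skeleton() {
    local overlay="$1" name="$2" catpkg="$3"
    mkdir -p "$overlay/profiles" "$overlay/metadata" "$overlay/$catpkg"
    # Name the repository, and declare that it builds on the main Gentoo tree.
    printf '%s\n' "$name" > "$overlay/profiles/repo_name"
    printf 'masters = gentoo\n' > "$overlay/metadata/layout.conf"
}
```

For example, `make_overlay_skeleton /tmp/myoverlay myoverlay dev-util/diffstat` would prepare the directory that `PORTDIR_OVERLAY` points at in the illustration above.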

Now, while overlays can be created and managed manually in this manner, it is generally easier to use Portage's plug-in sync system to do the job. We'll exploit this ability shortly, when we add the <tt>sakaki-tools</tt> overlay (which will contain a number of useful tools used in this installation walk-through).

<span id="portage_config_files">Portage's Configuration Files
Portage provides you, the user, with a great deal of flexibility. As such, it has many configuration options, specified via a set of files held in the directory (and subdirectories thereof). As our installation process is going to involve using Portage (via the command-line tool <tt>emerge</tt>) to download, then build and install up-to-date versions of all core system software, we first need to set up these configuration files appropriately.

The most important Portage configuration files you'll need to know about now are as follows (this is not complete - see this list for more information, and also the Portage manpage ):

<span id="atoms_and_friends">Atoms, Packages, Categories, Versions, Sets and SLOTs
Finally for this background overview, there are a few Portage <span id="atoms_etc">package management terms that are worth a brief recap (for more information on atom naming, see the <tt>ebuild</tt>(5) manpage):
 * As mentioned, a package refers to a homogeneous block of software managed by a single ebuild, whether third-party (e.g., <tt>openvpn</tt>) or internal to Gentoo itself (e.g., <tt>gentoolkit</tt>).
 * Packages are grouped (as leaves of a tree) into categories, which describe broad classes of functionality. For example, <tt>openvpn</tt> is in the <tt>net-misc</tt> category (along with other network tools like <tt>wget</tt> and <tt>rsync</tt>); <tt>gentoolkit</tt> is in the <tt>app-portage</tt> category (along with other Portage applications, like <tt>mirrorselect</tt> and <tt>elogviewer</tt>).
 * A package base atom simply refers to the name made up of the full category, followed by the package, without version information or other qualifiers. So for example, etc. You can find all the ebuilds in the currently sync'd tree for a given / base atom in the directory  (so, for example, ), and find more information about that base atom online at <tt>https://packages.gentoo.org/package/ / </tt> (so, for example, https://packages.gentoo.org/package/app-portage/gentoolkit). While it is often possible to drop the category name and simply use the package itself, it's generally safer to use the base atom, since two different packages of the same name may exist in different categories (e.g. <tt>axiom</tt> could refer to either , an object database over SQLite, or , a computer algebra system).
 * It is generally possible to specify that a specific repository should be used to supply a package, by appending <tt>:: </tt> to its atom. For example, <tt>emerge --ask --verbose dev-util/diffstat::myrepo</tt> would force Portage to install the <tt>diffstat</tt> package from the <tt>myrepo</tt> repository (and would fail if either that overlay was unknown, or if the package was not present in it).
 * Any given package will normally be supported at multiple versions within Portage (one ebuild per version). Not every upstream version is necessarily available as an ebuild; only certain selected versions are. The online package data referred to above will show what versions are available, on which architectures, and which are marked as 'stable', which are 'testing' (shown with a tilde ('~')), and which are masked (will not be installed by Portage, generally due to known problems with the ebuild, or incompatibilities with other packages). You can fully qualify an atom by specifying its version as a suffix - generally, you take the base atom, then add a hyphen ('-'), then add a period-separated list of numbers (possibly finishing with a letter, and/or a revision suffix). So, for example, version 2.3.2 of <tt>openvpn</tt> would be written as ; version 1.14 (r1) of <tt>wget</tt> as . Revisions are Gentoo <tt>ebuild</tt> specific; they do not relate to upstream versioning.
 * When specifying atoms to Portage in certain places (such as configuration files, like ), you can either specify base atoms (meaning apply the action to all ebuild versions), or a <span id="qualified_version_atom">qualified version atom. You can qualify a versioned atom with:
    * A <span id="atom_prefix">prefix ('>', '>=', '=', '<=', '<'), to restrict the action to particular versions relative to the stated variant (for example, if you appended "<tt>>=net-misc/openvpn-2.3.1 passwordsave</tt>" to, you'd be telling Portage to apply the <tt>passwordsave</tt> USE flag to any version of <tt>openvpn</tt> at or above 2.3.1).
    * An extended prefix: there are a number of these, but the most important is '~', which is used to specify any revision of the base version specified. So, for example, <tt>~app-portage/gentoolkit-0.3.0.8</tt> would refer to, , etc.
    * A wildcard suffix ('*'), used together with the '=' prefix. This can be used to match any version with the same string prefix. So for example, <tt>=net-misc/openvpn-2.3*</tt> would match, , etc.
 * A number of atoms may be <span id="about_sets">grouped together into a set, so that operations (e.g. reinstallation) can be easily targeted at the whole group. Sets are special names and are prefixed by '@'. Some of these are pre-defined in Portage: for example, the <tt>@system</tt> set (containing vital system software packages: the contents of the stage 3 tarball plus other components dictated by your profile), or the dynamically populated <tt>@preserved-rebuild</tt> set (which holds a list of packages using libraries whose sonames have changed during an upgrade or downgrade, but whose rebuild has not been triggered automatically). The <tt>@world</tt> set refers to all packages you explicitly requested be installed (plus the <tt>@system</tt> set), and is contained in a file . You can even define your own sets if you like.
 * Portage <span id="slot_intro">also allows (subject to certain limitations) different versions of the same package to exist on a machine at the same time: we speak of them being installed in different SLOTs. We won't need to refer to the SLOT technology explicitly in this tutorial, but should you see a versioned atom with a colon ':' followed by some numbers and possibly other characters at the end, that's a SLOT reference. For example, with the library, it is possible to have version 2.24.24 and 3.10.7 installed in parallel, should you desire it (in SLOTs 2 and 3). You might then see a reference to <tt>x11-libs/gtk+:3</tt>, which would refer to any version of <tt>gtk+</tt> in SLOT 3 (which would, for example, cover version 3.4.4 as well).
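The version operators above can be approximated in a few lines of shell, which may help to build intuition. This is a rough sketch with hypothetical function names. It delegates ordering to GNU `sort -V`, which handles plain dotted versions like those used here but does not implement Portage's full version rules (letter suffixes, `_alpha`/`_rc`/`_p` and so on).

```bash
#!/usr/bin/env bash
# Rough sketch of how some atom operators select versions. ASSUMPTION: version
# ordering is delegated to GNU `sort -V`, which copes with plain dotted versions
# but does NOT implement Portage's full rules (letter and _alpha/_beta/_rc/_p
# suffixes, revisions, ...).
version_le() {   # succeeds if version $1 <= version $2
    [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$1" ]]
}

# atom_matches OPERATOR BASE_VERSION CANDIDATE_VERSION
atom_matches() {
    local op="$1" base="$2" cand="$3"
    case "$op" in
        '>=') version_le "$base" "$cand" ;;
        '>')  ! version_le "$cand" "$base" ;;
        '<=') version_le "$cand" "$base" ;;
        '<')  ! version_le "$base" "$cand" ;;
        '=')  [[ "$cand" == "$base" ]] ;;
        '~')  [[ "${cand%-r[0-9]*}" == "$base" ]] ;;  # any revision of base
        '*')  [[ "$cand" == "$base"* ]] ;;            # string-prefix match
        *)    return 2 ;;                             # unknown operator
    esac
}
```

For instance, `atom_matches '>=' 2.3.1 2.3.2` succeeds, mirroring the `>=net-misc/openvpn-2.3.1` example, while `atom_matches '~' 0.3.0.8 0.3.0.8-r2` succeeds as in the `~app-portage/gentoolkit-0.3.0.8` case.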

That's about it for this sidebar on atoms and versioning, apart from one last point: unlike other Linux distributions, you'll see no reference to 'releases' of Gentoo itself - there's nothing similar to Ubuntu's "Quantal Quetzal" or "Trusty Tahr", Debian's "squeeze" or "wheezy", Fedora's "Spherical Cow" or "Heisenbug" etc. That's because, once installed, Gentoo itself is essentially versionless - when you update your system (more on which later), all installed software updates to the latest supported versions (subject to restrictions imposed by the Gentoo developers and you yourself, through settings in, etc.).

The upside of this is that you can get access to the latest and (often) greatest versions of software as soon as new ebuilds get released into the tree. The downside is that, particularly on the 'testing' (rather than the 'stable') branch, updates sometimes fail to complete successfully, something which is very rare indeed when using binary, release-based distributions such as Ubuntu.

Time to get back to the install!

<span id="configure_compile_opts">Configuring
Our first Portage configuration task is to ensure that the download / unpack / compile / install cycle (aka 'emerging') - which you'll see rather a lot of when running Gentoo - is as <span id="parallel_emerge">efficient as possible. That primarily means taking advantage of whatever parallelism your system can offer.

There are two main dimensions to this - the maximum number of concurrent Portage jobs that will be run at any one time, and the maximum number of parallel threads executed by the <tt>make</tt> process invoked by each ebuild itself.

As has been recommended, we'll set both the number of concurrent jobs and the number of parallel make threads to the number of CPUs on the system, plus one. We'll also prevent new jobs or compilations from starting when the system load average reaches or exceeds the number of CPUs.

The two variables we'll need to set here are EMERGE_DEFAULT_OPTS (for Portage job control) and MAKEOPTS (to pass options on to <tt>make</tt>). These are often defined in the file, but we want to allow the values to be set programmatically. Since Portage doesn't support fancy bash features like command substitution, we'll set and export these variables in root's instead (these will then override any conflicting values in the  or profile, as explained earlier).
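A minimal sketch of such a fragment follows. It implements the rule above (jobs = CPUs + 1, with no new work started once the load average reaches the CPU count). It assumes GNU coreutils' `nproc` is available to count processing units, and that the fragment lives in a startup file read by root's login shell (for example `~/.bashrc`; the exact file is an assumption here). The helper variable names `NUMCPUS` and `NUMCPUSPLUSONE` are illustrative.

```bash
# Sketch of a fragment for a root shell startup file (e.g. ~/.bashrc, assumed).
# nproc (GNU coreutils) reports the number of available processing units.
NUMCPUS="$(nproc)"
NUMCPUSPLUSONE=$(( NUMCPUS + 1 ))
# make: -j = parallel jobs, -l = don't start new jobs at/above this load.
export MAKEOPTS="-j${NUMCPUSPLUSONE} -l${NUMCPUS}"
# emerge: --jobs = concurrent package builds, --load-average = same load limit.
export EMERGE_DEFAULT_OPTS="--jobs=${NUMCPUSPLUSONE} --load-average=${NUMCPUS}"
```

On a 4-CPU machine, this would yield `MAKEOPTS="-j5 -l4"` and `EMERGE_DEFAULT_OPTS="--jobs=5 --load-average=4"`.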

Start up your favourite editor: in this tutorial we'll be assuming <tt>nano</tt>:

<tt>nano</tt> is a pretty simple editor to use: move around using the arrow keys, type to edit as you would in any text processing program, and exit with when done: you'll be prompted whether to save changes if you have modified the file. At this point, enter and  to exit, saving changes, or  to exit without making changes. For some more information on the <tt>nano</tt> editor, see this Wiki entry.

Add the following text to the file:

Save and exit the <tt>nano</tt> editor.

Next, we need to make sure that the file is picked up by root's login shell, so copy across the default :

Next, <span id="setup_make_conf">on to the configuration file itself. The stage 3 tarball we extracted already contains a skeleton configuration. We'll open this file with <tt>nano</tt> (feel free to substitute your favourite alternative editor), delete the existing lines (in <tt>nano</tt>, can be used to quickly cut the current line), and enter our alternative configuration instead (a line-by-line explanation follows). Issue:

Edit the file so it reads:

Save the file and exit <tt>nano</tt>.

Here is a <span id="make_conf_summary">brief summary of what the shipped ('stage 3') values are, and what our version achieves:

<span id="next_steps">Next Steps
Now we have these options configured, we're ready to <tt>chroot</tt> into our 'stage 3' environment and start building! Click here to go to the next chapter, "Building the Gentoo Base System Minus Kernel".