LXC

Introduction
LXC (Linux Containers) was initially developed by IBM and relies on features available in the mainline Linux kernel. It uses control groups (cgroups) and is similar in concept to Solaris Zones and FreeBSD Jails. Like those technologies, it aims to provide a higher level of segregation than a simple chroot.

Virtualization concepts
This section is a basic overview of how lxc fits into the virtualization world, the type of approach it uses, and the benefits and limitations of that approach. If you are trying to figure out whether lxc is for you, or if it is your first time setting up virtualization under Linux, you should at least skim this section.

Roughly speaking, there are two types of virtualization in use today: container-based virtualization and full virtualization.

Container-based Virtualization (lxc)
Container-based virtualization is very fast and efficient. It is based on the premise that an OS kernel can provide different views of the system to different running processes. This sort of segregation or compartmentalisation (sometimes called "thick sandboxing") can be useful for guaranteeing access to hardware resources such as CPU and IO bandwidth, whilst maintaining security and efficiency.

On the unix family of operating systems, container-based virtualization is said to have its roots in the 1982 release of the chroot tool, a container-based virtualization tool limited to the filesystem, which was written by Sun Microsystems co-founder Bill Joy and published as part of 4.2BSD.

Since this early tool, which has become a mainstay of the unix world, a large number of unix developers have worked on more powerful and mature container-based virtualization solutions. Some examples:
 * Solaris Zones
 * FreeBSD Jails
 * Linux VServer
 * OpenVZ

On Linux, the two major techniques have historically been Linux-VServer (open source / community driven) and OpenVZ (a free spinoff of a commercial product).

However, neither of these will be accepted into the Linux kernel. Instead, Linus has opted for a more flexible, longer-term approach to achieving similar goals, using various new kernel features. lxc is the next-generation container-based virtualization solution that uses these new features.

Conceptually, lxc can be seen as a further development of the existing 'chroot' technique with extra dimensions added. Where 'chroot'-ing only offers isolation at the filesystem level, lxc offers complete logical isolation between a container and both the host and all other containers. In fact, installing a new Gentoo container from scratch is much the same as any normal Gentoo installation.

Some of the most notable differences are:
 * each container shares the kernel with the host (and other containers), so no kernel needs to be present in, or mounted on, the container's /boot directory;
 * devices and filesystems are (more or less) 'inherited' from the host, and need not be configured as they would be for a normal installation;
 * if the host is using the openrc system for bootstrapping, such configuration items will "automagically" be omitted (e.g. filesystem mounts from fstab).

The last point is important: it keeps an lxc-based installation as simple as possible and the same as a normal installation (no exceptions).

Full Virtualization (not lxc)
Full virtualization and paravirtualization solutions aim to simulate the underlying hardware. This type of solution, unlike lxc and other container-based solutions, usually allows you to run any operating system. Whilst this may be useful for the purposes of security and server consolidation, it is far less efficient than container-based solutions. The most popular solutions in this area right now are probably VMware, KVM/qemu and Xen.

Limitations of lxc
With lxc, you can efficiently manage resource allocation in real time. In addition, you should be able to run different Linux distributions on the same host kernel in different containers. There may be teething issues with the startup and shutdown 'run control' (rc) scripts, which may need slight modification to make some guests work. That said, maintainers of tools such as openrc are increasingly implementing lxc detection to ensure correct behaviour when their code runs within containers.

Unlike full virtualization solutions, lxc will not let you run other operating systems (such as proprietary operating systems, or other types of unix).

However, in theory there is no reason why you can't install a full or paravirtualization solution on the same kernel as your lxc host system and run full or paravirtualised guests alongside lxc guests at the same time.

Should you elect to do this, there are powerful abstracted virtualization management APIs under development, such as libvirt and ganeti, that you may wish to check out.

In short:
 * One kernel
 * One operating system
 * Many instances

...but lxc can co-exist with other virtualization solutions if required.

MAJOR Temporary Problems with LXC - READ THIS
As has been documented elsewhere, containers are at present not functional as security containers: if you have root in a container, you have root on the whole box.
 * root in a container has all capabilities. Workaround: do not treat root privileges in the container any more lightly than on the host itself.
 * legacy UID/GID comparisons in many parts of the kernel code are naive and do not respect containers. Workarounds: do not mount parts of external filesystems within a container except read-only (ro), and do not re-use UIDs/GIDs between the container and the host.
 * running shutdown or halt inside a container can shut down or halt the host system itself. Workaround: restrict or replace these commands in the container.

Containers are still useful for isolating applications, including their networking interfaces, and applying resource limits and accounting to those applications. As the above issues are resolved, they will also become functional security containers.

If you are designing a virtualisation solution for the long term and want a timeframe then, with appropriate disclaimers and judging from various comments and experience, an extremely rough estimate might be 'circa end of 2012'. But no guarantees.

See also CAP_SYS_ADMIN: the new root.

lxc Components
lxc uses two new or lesser-known kernel features: 'control groups' and 'POSIX file capabilities'. It also includes 'template scripts' to set up different guest environments.

Control Groups
Control Groups are a multi-hierarchy, multi-subsystem resource management / control framework for the Linux kernel.

In simpler language, this means that, unlike the old chroot tool, which was limited to the filesystem, control groups let you define a 'group' encompassing one or more processes (e.g. sshd, Apache) and then specify a variety of resource control and accounting options for that group across multiple subsystems, such as:
 * filesystem access
 * general device access
 * memory resources
 * network device resources
 * CPU bandwidth
 * block device IO bandwidth
 * various other aspects of a control group's view of the system

User-space access to these kernel features is through a special filesystem, provided by the kernel, known as 'cgroup'. It has traditionally been mounted at /cgroup and provides files, similar to those under /proc and /sys, that represent the running environment and various kernel configuration options.
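As a rough illustration of the cgroup (version 1) interface described above: creating a directory inside a mounted hierarchy creates a child control group, and writing a task ID to that group's tasks file moves the task into it. The following is a minimal Python sketch; the helper names are hypothetical, and on a real system it needs root privileges and a mounted hierarchy (for example at /cgroup, as assumed in the usage note).

```python
from pathlib import Path


def create_cgroup(hierarchy_root: str, name: str) -> Path:
    """Create a child control group by making a directory inside a
    mounted cgroup (v1) hierarchy. On a real cgroup filesystem the
    kernel fills the new directory with control files automatically."""
    group = Path(hierarchy_root) / name
    group.mkdir(exist_ok=True)
    return group


def attach_task(group: Path, task_id: int) -> None:
    """Move one task (thread) into the group by writing its ID to the
    group's 'tasks' file, like `echo ID > tasks` in a shell."""
    # One ID per write; the kernel parses each write separately.
    (group / "tasks").write_text(f"{task_id}\n")
```

For example, `attach_task(create_cgroup("/cgroup", "webserver"), os.getpid())` would place the calling process's main thread in a hypothetical "webserver" group, after which that group's resource limits and accounting apply to it.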

POSIX File Capabilities
POSIX file capabilities are a way to allocate privileges to a process that allow for more specific security controls than the traditional 'root' vs. 'user' privilege separation on unix family operating systems.
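Capabilities split the traditional all-powerful root privilege into individually numbered bits. A process's effective set is reported as a hexadecimal mask on the CapEff line of /proc/<pid>/status, which makes it easy to check, for example, whether root inside a container still holds CAP_SYS_ADMIN or CAP_SYS_BOOT. Below is a minimal Python sketch that decodes such a mask; the helper names are hypothetical, and only a handful of the bit numbers defined in <linux/capability.h> are listed.

```python
# A small subset of capability bit numbers from <linux/capability.h>.
CAPABILITY_BITS = {
    0: "CAP_CHOWN",
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    21: "CAP_SYS_ADMIN",
    22: "CAP_SYS_BOOT",
    25: "CAP_SYS_TIME",
    27: "CAP_MKNOD",
}


def decode_caps(hex_mask: str) -> list[str]:
    """Return the names of the listed capabilities set in a hex mask,
    in ascending bit order."""
    mask = int(hex_mask, 16)
    return [name for bit, name in sorted(CAPABILITY_BITS.items())
            if mask & (1 << bit)]


def read_effective_caps(status_path: str = "/proc/self/status") -> str:
    """Extract the CapEff hex mask from a /proc/<pid>/status file."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise ValueError("no CapEff line found")
```

Running `decode_caps(read_effective_caps())` as root inside a container shows which of these privileged capabilities the container's root actually retains.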

Host Setup
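To get an lxc-capable host system working you will need the following components:
 * Kernel with the appropriate LXC related options enabled
 * lxc userspace utilities
 * Mounted cgroup filesystem
 * Networking: Ethernet bridge (optional, but you probably want it)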

Kernel with the appropriate LXC options enabled
If you are unfamiliar with recompiling kernels, see the copious documentation available on that subject in addition to the notes below.

Kernel options required
The complete list of relevant kernel options (tested on 3.2.1-gentoo-r2) is as follows. You can check your running kernel with the lxc-checkconfig script.

Freezer Support
Freezer support allows you to 'freeze' and 'thaw' a running guest, something like 'suspend' in VMware products. As of October 2010 it appeared to be under heavy development (LXC mailing list), but was apparently mostly functional. Please add additional notes on this page if you explore further.
 * CONFIG_CGROUP_FREEZER / "Freeze/thaw support" ('General Setup -> Control Group support -> Freezer cgroup subsystem')
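With CONFIG_CGROUP_FREEZER enabled, each group in a cgroup (v1) hierarchy that includes the freezer subsystem has a freezer.state file: writing FROZEN suspends every task in the group, writing THAWED resumes them, and reading it may return FREEZING while a freeze is still in progress. The lxc tools wrap this as lxc-freeze and lxc-unfreeze; the Python sketch below only shows the underlying file interface, and the function names and group path are hypothetical.

```python
from pathlib import Path

# Only these two states may be requested; FREEZING is reported, not set.
REQUESTABLE_STATES = {"FROZEN", "THAWED"}


def set_freezer_state(group_dir: str, state: str) -> None:
    """Ask the kernel to freeze or thaw every task in a cgroup."""
    if state not in REQUESTABLE_STATES:
        raise ValueError(f"can only request FROZEN or THAWED, not {state!r}")
    (Path(group_dir) / "freezer.state").write_text(state + "\n")


def get_freezer_state(group_dir: str) -> str:
    """Read the current state: FROZEN, THAWED or FREEZING."""
    return (Path(group_dir) / "freezer.state").read_text().strip()
```

Because a freeze may not complete immediately, a caller would typically poll `get_freezer_state()` until it reports FROZEN rather than assume the write took effect at once.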

Scheduling Options
Scheduling allows you to specify how much hardware access (CPU bandwidth, block device bandwidth, etc.) control groups have.
 * CONFIG_CGROUP_SCHED / "Cgroup sched" ('General Setup -> Control Group support -> Group CPU scheduler')
 * CONFIG_FAIR_GROUP_SCHED / "Group scheduling for SCHED_OTHER" ('General Setup -> Control Group support -> Group CPU scheduler -> Group scheduling for SCHED_OTHER')
 * CONFIG_BLK_CGROUP / "Block IO controller" ('General Setup -> Control Group support -> Block IO controller')
 * CONFIG_CFQ_GROUP_IOSCHED / "CFQ Group Scheduling support" ('Enable the block layer -> IO Schedulers -> CFQ I/O scheduler -> CFQ Group Scheduling support')
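With the group CPU scheduler, each cgroup (v1) has a cpu.shares value (1024 by default) that acts as a relative weight rather than a hard limit: when CPU-bound groups compete for the same CPUs, each receives roughly its shares divided by the sum of the competing groups' shares, and time left unused by idle groups goes to the others. The arithmetic is sketched below in Python; the function and group names are made up, and real results vary with load.

```python
def cpu_fractions(shares: dict[str, int]) -> dict[str, float]:
    """Approximate fraction of CPU time each group receives when every
    listed group is runnable and CPU-bound (proportional-share model)."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one group must have positive shares")
    return {name: value / total for name, value in shares.items()}
```

For example, `cpu_fractions({"web": 1024, "db": 2048, "batch": 1024})` gives db half of the contended CPU time and the other two a quarter each; if "batch" goes idle, web and db split its time in the same 1:2 ratio.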

Resource Counters (Memory/Swap Accounting)
Resource counters are an 'accounting' feature: they allow you to measure resource utilisation in your guest. They also appear to be a prerequisite for limiting memory and swap utilisation.
 * CONFIG_RESOURCE_COUNTERS / "Resource counters" ('General Setup -> Control Group support -> Resource counters')

For memory resources, select:
 * CONFIG_CGROUP_MEM_RES_CTLR / "Cgroup memory controller" ('General Setup -> Control Group support -> Resource counters -> Memory Resource Controller for Control Groups')

To also account for swap utilisation, select:
 * CONFIG_CGROUP_MEM_RES_CTLR_SWAP / "Memory Resource Controller Swap Extension(EXPERIMENTAL)" ('General Setup -> Control Group support -> Resource counters -> Memory Resource Controller for Control Groups -> Memory Resource Controller Swap Extension')
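The cgroup (v1) memory controller takes a limit through memory.limit_in_bytes and, with the swap extension, a combined memory-plus-swap limit through memory.memsw.limit_in_bytes, which must not be set below the plain memory limit. Both files accept sizes with K, M or G suffixes. The Python sketch below, with hypothetical helper names, converts such sizes to bytes and checks that ordering before anything is written; it is a pre-flight check only.

```python
# Binary multipliers for the size suffixes accepted by the memory files.
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '512M' or '1g' to a number of bytes."""
    text = text.strip().upper()
    suffix = text[-1] if text and text[-1] in "KMG" else ""
    number = text[: len(text) - len(suffix)]
    if not number.isdigit():
        raise ValueError(f"invalid size: {text!r}")
    return int(number) * _MULTIPLIERS[suffix]


def check_limits(memory: str, memory_plus_swap: str) -> tuple[int, int]:
    """Validate a (memory, memory+swap) limit pair and return it in bytes."""
    mem, memsw = parse_size(memory), parse_size(memory_plus_swap)
    if memsw < mem:
        raise ValueError("memory+swap limit is lower than the memory limit")
    return mem, memsw
```

For instance, `check_limits("256M", "512M")` accepts a guest limited to 256 MiB of RAM plus up to 256 MiB of swap, while swapping the two arguments is rejected.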

CPU Accounting
This allows you to measure the CPU utilisation of your control groups.
 * CONFIG_CGROUP_CPUACCT / "Cgroup cpu account" ('General Setup -> Control Group support -> Simple CPU accounting cgroup subsystem')
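The cpuacct subsystem exposes each group's cumulative CPU time, in nanoseconds, in its cpuacct.usage file, so utilisation over an interval comes from two readings taken some seconds apart. Here is a minimal Python sketch of that calculation; the function name is hypothetical, and reading the file itself is left out so the arithmetic stays visible.

```python
def cpu_utilisation(usage_start_ns: int, usage_end_ns: int,
                    interval_s: float, n_cpus: int = 1) -> float:
    """Percentage of total CPU capacity used by a cgroup between two
    readings of its cumulative cpuacct.usage counter (nanoseconds)."""
    if interval_s <= 0 or n_cpus <= 0:
        raise ValueError("interval and CPU count must be positive")
    used_ns = usage_end_ns - usage_start_ns
    # Capacity is interval * 1e9 ns per CPU.
    return 100.0 * used_ns / (interval_s * 1e9 * n_cpus)
```

For example, a group whose counter grew by 2,000,000,000 ns over 2 seconds on a 4-CPU host used 25% of the machine's total CPU capacity (one full CPU's worth).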

Networking Options
Ethernet bridging, veth, macvlan and vlan (802.1q) support are optional, but you probably want them.
 * CONFIG_BRIDGE / "802.1d Ethernet Bridging" ('Networking support -> Networking options -> 802.1d Ethernet Bridging')
 * CONFIG_VETH / "Veth pair device"
 * CONFIG_MACVLAN / "Macvlan"
 * CONFIG_VLAN_8021Q / "Vlan"
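The options listed above can also be checked directly against a kernel .config file. This Python sketch is only a rough stand-in for what lxc-checkconfig does, not a replacement for it; the helper names are hypothetical, and the option names are the 3.2-era ones used on this page (later kernels renamed some of them, e.g. the memory controller options became CONFIG_MEMCG*).

```python
import re

# Options named in the kernel sections above (3.2-era names).
WANTED = [
    "CONFIG_CGROUP_FREEZER", "CONFIG_CGROUP_SCHED", "CONFIG_FAIR_GROUP_SCHED",
    "CONFIG_BLK_CGROUP", "CONFIG_CFQ_GROUP_IOSCHED", "CONFIG_RESOURCE_COUNTERS",
    "CONFIG_CGROUP_MEM_RES_CTLR", "CONFIG_CGROUP_MEM_RES_CTLR_SWAP",
    "CONFIG_CGROUP_CPUACCT", "CONFIG_BRIDGE", "CONFIG_VETH",
    "CONFIG_MACVLAN", "CONFIG_VLAN_8021Q",
]

# Matches lines such as "CONFIG_VETH=m"; "# CONFIG_X is not set" is ignored.
_ENABLED = re.compile(r"^(CONFIG_\w+)=(y|m)$")


def enabled_options(config_text: str) -> set[str]:
    """Return options that are built in (=y) or modular (=m)."""
    enabled = set()
    for line in config_text.splitlines():
        match = _ENABLED.match(line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_options(config_text: str, wanted: list[str] = WANTED) -> list[str]:
    """List the wanted options that the configuration does not enable."""
    have = enabled_options(config_text)
    return [opt for opt in wanted if opt not in have]
```

Typical usage would be `missing_options(open("/usr/src/linux/.config").read())`, or `missing_options(gzip.open("/proc/config.gz", "rt").read())` for the running kernel when it was built with CONFIG_IKCONFIG_PROC.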

Reconfig Gentoo kernel
You can use the lxc-checkconfig tool to list the kernel options you need to enable to make your existing kernel configuration lxc-compatible (tested on 3.2.1-gentoo-r2). The process would be something like...

Then copy your kernel to your boot partition, reconfigure your boot loader, and reboot.

lxc userspace utilities
Because lxc is currently very new, it is probably worth making sure that you have the absolute latest version. Therefore, before we begin, you should ensure that your portage tree is up to date with the following command.

Next, figure out which version of lxc is available with:

Now go ahead and install with...

Mounted cgroup filesystem
The 'cgroup' filesystem provides user-space access to the required kernel control group features, and is required by the lxc userspace utilities.

Recent kernels introduced /sys/fs/cgroup as the default location.

openrc already mounts the 'cgroup' filesystem during bootstrap, so there is no need for users to mount it manually.
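To confirm that openrc has mounted the cgroup hierarchies, look for filesystems of type cgroup in /proc/mounts (`grep cgroup /proc/mounts` does the same job interactively). A minimal Python sketch follows; the helper names are hypothetical, and it only recognises the filesystem type 'cgroup' described on this page.

```python
def cgroup_mounts(mounts_text: str) -> dict[str, list[str]]:
    """Return {mount point: mount options} for every filesystem of type
    'cgroup' in /proc/mounts-style text. Each line has the fields:
    device, mount point, type, options, dump, pass."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "cgroup":
            result[fields[1]] = fields[3].split(",")
    return result


def read_cgroup_mounts(path: str = "/proc/mounts") -> dict[str, list[str]]:
    """Read the live mount table and return its cgroup mounts."""
    with open(path) as f:
        return cgroup_mounts(f.read())
```

An empty result from `read_cgroup_mounts()` suggests the hierarchies are not mounted; otherwise the option lists show which subsystems (cpu, memory, freezer, ...) are attached at each mount point.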

Networking: Ethernet bridge
You probably want to set up an ethernet bridge. Note that this requires the CONFIG_BRIDGE symbol to be enabled in your kernel.