LXC

LXC (Linux Containers) is a virtualization system that makes use of the kernel's cgroups feature. It is conceptually similar to Solaris's Zones and FreeBSD's Jails: it provides stronger segregation than a simple chroot, without incurring the penalties of a full virtualisation solution.

Concepts

Virtualization concepts

This section is a basic overview of how lxc fits into the virtualization world, the type of approach it uses, and the benefits and limitations thereof. If you are trying to figure out whether lxc is for you, or it is your first time setting up virtualization under Linux, then you should at least skim this section.

Roughly speaking, there are two types of virtualization in use today: container-based virtualization and full virtualization.

Container-based Virtualization (lxc)

Container-based virtualization is very fast and efficient. It is based on the premise that an OS kernel provides different views of the system to different running processes. This sort of segregation or compartmentalisation (sometimes called "thick sandboxing") can be useful for ensuring guaranteed access to hardware resources such as CPU and IO bandwidth, whilst maintaining security and efficiency.

On the unix family of operating systems, container-based virtualization is said to have its roots in the 1982 release of the chroot tool, a container-based virtualization tool specific to the filesystem subsystem, written by Sun Microsystems co-founder Bill Joy and published as part of 4.2BSD.

Since this early tool, which has become a mainstay of the unix world, a large number of unix developers have worked to mature more powerful container-based virtualization solutions. Some examples:

  • Solaris Zones
  • FreeBSD Jails
  • Linux VServer
  • OpenVZ

On Linux, historically the two major techniques have been Linux-VServer (open source / community driven) and OpenVZ (a free spinoff of a commercial product).

However, neither of these will be accepted into the Linux kernel. Instead, Linus has opted for a more flexible, longer-term approach to achieving similar goals, using various new kernel features. lxc is the next-generation container-based virtualization solution that uses these new features.

Conceptually, lxc can be seen as a further development of the existing 'chroot' technique with extra dimensions added. Where 'chroot'-ing only offers isolation at the filesystem level, lxc offers complete logical isolation between a container, the host and all other containers. In fact, installing a new Gentoo container from scratch is pretty much the same as any normal Gentoo installation.

Some of the most notable differences are:

  • each container shares the kernel with the host (and other containers); no kernel needs to be present and/or mounted in the container's /boot directory;
  • devices and filesystems are (more or less) 'inherited' from the host, and need not be configured as they would be for a normal installation;
  • if the host is using the OpenRC system for bootstrapping, such configuration items (e.g. filesystem mounts from fstab) will "automagically" be omitted.

The last point is important for keeping an lxc-based installation as simple as possible and the same as a normal installation (no exceptions).

Full Virtualization (not lxc)

Full virtualization and paravirtualization solutions aim to simulate the underlying hardware. This type of solution, unlike lxc and other container-based solutions, usually allows you to run any operating system. Whilst this may be useful for the purposes of security and server consolidation, it is hugely inefficient compared to container-based solutions. The most popular solutions in this area right now are probably VMware, KVM, Xen and VirtualBox.

Limitations of lxc

With lxc, you can efficiently manage resource allocation in real time. In addition, you should be able to run different Linux distributions on the same host kernel in different containers, though there may be teething issues with startup and shutdown 'run control' (rc) scripts, which may need to be modified slightly to make some guests work. That said, maintainers of tools such as OpenRC are increasingly implementing lxc detection to ensure correct behaviour when their code runs within containers.

Unlike full virtualization solutions, lxc will not let you run other operating systems (such as proprietary operating systems, or other types of unix).

However, in theory there is no reason why you can't install a full or paravirtualization solution on the same kernel as your lxc host system and run full/paravirtualised guests alongside lxc guests at the same time.

Should you elect to do this, there are powerful abstracted virtualization management APIs under development, such as libvirt and ganeti, that you may wish to check out.

In short:

  • One kernel
  • One operating system
  • Many instances

... but can co-exist with other virtualization solutions if required.

MAJOR Temporary Problems with LXC - READ THIS

As documented here, containers are basically not functional as security containers at present: if you have root in a container, you have root on the whole box.

  • root in a container has all capabilities
    • Workaround:
      • Do not treat root privileges in the container any more lightly than on the host itself.
  • legacy UID/GID comparisons in many parts of the kernel code are dumb and will not respect containers
    • Workaround:
      • Do not mount parts of external filesystems within a container, except ro (read only).
      • Do not re-use UIDs/GIDs between the container and the host
  • shutdown and halt issued inside a container will act on the host system.
    • Workaround:
      • Restrict/Replace them in the container (see the sketch after this list).
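
One way to implement the last workaround (a sketch only; the guest path is an example, adjust it to your layout) is to replace the binaries in the guest's root filesystem with harmless stubs:

root # cd /var/lxc/guestname/rootfs/sbin
root # mv halt halt.real
root # printf '#!/bin/sh\necho "halt disabled in this container"\n' > halt
root # chmod 755 halt

Repeat for shutdown and reboot as needed.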

Containers are still useful for isolating applications, including their networking interfaces, and applying resource limits and accounting to those applications. As the above issues are resolved, they will also become functional security containers.

If you are designing a virtualisation solution for the long term and want a timeframe then, with appropriate disclaimers and judging from various comments and experience, an extremely rough estimate might be 'circa end of 2012'. But no guarantees.

See also CAP_SYS_ADMIN: the new root.

lxc Components

lxc uses two new / lesser-known kernel features, 'control groups' and 'POSIX file capabilities'. It also includes 'template scripts' to set up different guest environments.

Control Groups

Control Groups are a multi-hierarchy, multi-subsystem resource management / control framework for the Linux kernel.

In simpler language, what this means is that, unlike the old chroot tool which was limited to the file subsystem, control groups let you define a 'group' encompassing one or more processes (e.g. sshd, Apache) and then specify a variety of resource control and accounting options for that control group across multiple subsystems, such as:

  • filesystem access
  • general device access
  • memory resources
  • network device resources
  • CPU bandwidth
  • block device IO bandwidth
  • various other aspects of a control group's view of the system

The user-space access to these new kernel features is a kernel-provided filesystem, known as 'cgroup'. It is typically mounted under /sys/fs/cgroup (see below) and provides files, similar to /proc and /sys, representing the running environment and various kernel configuration options.
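
As a quick illustration of this interface (a sketch only; it assumes the memory controller is mounted at /sys/fs/cgroup/memory, matching the mount output shown later in this guide), a new group can be created and capped purely through file operations:

root # mkdir /sys/fs/cgroup/memory/demo
root # echo 268435456 > /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
root # echo $$ > /sys/fs/cgroup/memory/demo/tasks

The current shell (and everything it subsequently spawns) is now limited to 256 MiB of memory.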

POSIX File Capabilities

POSIX file capabilities are a way to allocate privileges to a process that allow for more specific security controls than the traditional 'root' vs. 'user' privilege separation on unix family operating systems.
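
A classic illustration (assuming the setcap/getcap utilities from sys-libs/libcap are installed): granting /bin/ping only the single capability it needs, instead of making it setuid root:

root # setcap cap_net_raw+ep /bin/ping
root # getcap /bin/ping
/bin/ping = cap_net_raw+ep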

Host Setup

To get an lxc-capable host system working, you will need to complete the following steps.

Kernel with the appropriate LXC options enabled

If you are unfamiliar with recompiling kernels, see the copious documentation available on that subject in addition to the notes below.

Kernel options required

The app-emulation/lxc ebuild checks for the most important kernel options required to set up an LXC host. This check is not fatal, though, which means you have to make sure yourself that the options are correctly enabled. The package also comes with an upstream-provided lxc-checkconfig script that reports on the proper options.

root # cd /usr/src/linux
root # lxc-checkconfig

Note
Write down the missing kernel options from the output, or leave it open and switch to a new terminal. If lxc-checkconfig complains about missing "File capabilities", ignore it; the feature is now enabled by default and the setting has been removed.

root # make menuconfig

Note
Search for each CONFIG_SOMETHING using '/SOMETHING', enable them one by one, quit & save.

root # make && make install modules_install
General Options
Kernel configuration

General setup  --->
 [*] Control Group support  --->
  [*]   Freezer cgroup subsystem 
  [*]   Device controller for cgroups
  [*]   Cpuset support
  [*]     Include legacy /proc/<pid>/cpuset file
  [*]   Simple CPU accounting cgroup subsystem 
  [*]   Resource counters 
  [*]     Memory Resource Controller for Control Groups
  [*]       Memory Resource Controller Swap Extension
  [*]         Memory Resource Controller Swap Extension enabled by default
  [*]   Enable perf_event per-cpu per-container group (cgroup) monitoring
  [*]   Group CPU scheduler  --->
   [*]   Group scheduling for SCHED_OTHER 
   [*]   Group scheduling for SCHED_RR/FIFO   
  <*>   Block IO controller
 -*- Namespaces support
  [*]   UTS namespace
  [*]   IPC namespace
  [*]   User namespace (EXPERIMENTAL)
  [*]   PID Namespaces
  [*]   Network namespace 
[*] Networking support  --->
      Networking options  --->
      <M> 802.1d Ethernet Bridging
      <M> 802.1Q VLAN Support 
Device Drivers  --->
      [*] Network device support  --->
       <M>   MAC-VLAN support (EXPERIMENTAL)
       <M>   Virtual ethernet pair device
      Character devices  --->
       -*- Unix98 PTY support
       [*]   Support multiple instances of devpts
Kernel configuration: namespaces

CONFIG_NAMESPACES / "Namespaces" ('General Setup -> Namespaces support')
CONFIG_UTS_NS / "Utsname namespace" ('General Setup -> Namespaces support -> UTS namespace')
CONFIG_IPC_NS / "Ipc namespace" ('General Setup -> Namespaces support -> IPC namespace')
CONFIG_USER_NS / "User namespace" ('General Setup -> Namespaces support -> User namespace (EXPERIMENTAL)')
CONFIG_PID_NS / "Pid namespace" ('General Setup -> Namespaces support -> PID Namespaces')
CONFIG_NET_NS / "Network namespace" ('General Setup -> Namespaces support -> Network namespace')
CONFIG_DEVPTS_MULTIPLE_INSTANCES / "Multiple /dev/pts instances" ('Device Drivers -> Character devices -> Unix98 PTY support -> Support multiple instances of devpts')
Kernel configuration: control groups

CONFIG_CGROUPS / "Cgroup" ('General Setup -> Control Group support')
CONFIG_CGROUP_DEVICE / "Cgroup device" ('General Setup -> Control Group support -> Device controller for cgroups')
CONFIG_CPUSETS / "Cgroup cpuset" ('General Setup -> Control Group support -> Cpuset support')
Freezer Support

Freezer support allows you to 'freeze' and 'thaw' a running guest, something like 'suspend' under VMware products. It appears to be under heavy development as of October 2010 (LXC list) but is apparently mostly functional. Please add additional notes on this page if you explore further.

CONFIG_CGROUP_FREEZER / "Freeze/thaw support" ('General Setup -> Control Group support -> Freezer cgroup subsystem')
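
Once this option is enabled, a running guest can be frozen and thawed with the lxc userspace utilities (installed later in this guide):

root # lxc-freeze -n guestname
root # lxc-unfreeze -n guestname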
Scheduling Options

Scheduling allows you to specify how much hardware access (CPU bandwidth, block device bandwidth, etc.) control groups have.

CONFIG_CGROUP_SCHED / "Cgroup sched" ('General Setup -> Control Group support -> Group CPU scheduler')
CONFIG_FAIR_GROUP_SCHED / "Group scheduling for SCHED_OTHER" ('General Setup -> Control Group support -> Group CPU scheduler -> Group scheduling for SCHED_OTHER')
CONFIG_BLK_CGROUP / "Block IO controller" ('General Setup -> Control Group support -> Block IO controller')
CONFIG_CFQ_GROUP_IOSCHED / "CFQ Group Scheduling support" ('Enable the block layer -> IO Schedulers -> CFQ I/O scheduler -> CFQ Group Scheduling support')
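
These knobs can then be driven per guest from its lxc configuration file via lxc.cgroup.* entries, for example (a sketch; 512 is half the default weight of 1024):

lxc.cgroup.cpu.shares = 512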
Resource Counters (Memory/Swap Accounting)

Resource counters are an 'accounting' feature - they allow you to measure resource utilisation in your guest. They are also an apparent prerequisite for limiting memory and swap utilisation.

CONFIG_RESOURCE_COUNTERS / "Resource counters" ('General Setup -> Control Group support -> Resource counters')

For memory resources...

CONFIG_CGROUP_MEM_RES_CTLR / "Cgroup memory controller" ('General Setup -> Control Group support -> Resource counters -> Memory Resource Controller for Control Groups')

If you want to also count swap utilisation, also select...

CONFIG_CGROUP_MEM_RES_CTLR_SWAP / "Memory Resource Controller Swap Extension (EXPERIMENTAL)" ('General Setup -> Control Group support -> Resource counters -> Memory Resource Controller for Control Groups -> Memory Resource Controller Swap Extension')
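
With these options enabled, a guest's memory and memory+swap usage can be capped from its lxc configuration file, for example (a sketch; the values are arbitrary):

lxc.cgroup.memory.limit_in_bytes = 256M
lxc.cgroup.memory.memsw.limit_in_bytes = 512M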
CPU Accounting

This allows you to measure the CPU utilisation of your control groups.

CONFIG_CGROUP_CPUACCT / "Cgroup cpu account" ('General Setup -> Control Group support -> Simple CPU accounting cgroup subsystem')
Networking Options

Ethernet bridging, veth, macvlan and vlan (802.1q) support are optional, but you probably want at least one of these:

CONFIG_BRIDGE / "802.1d Ethernet Bridging" ('Networking support -> Networking options -> 802.1d Ethernet Bridging')
CONFIG_VETH / "Veth pair device"
CONFIG_MACVLAN / "Macvlan"
CONFIG_VLAN_8021Q / "Vlan"

Further details about LXC networking options are available on Flameeyes's Weblog

lxc userspace utilities

Due to LXC's still unstable nature, Gentoo provides ebuilds only for the most recent version available; therefore, make sure to update your Portage tree before proceeding:

root # emerge --sync
root # emerge --ask app-emulation/lxc

Mounted cgroup filesystem

The 'cgroup' filesystem provides user-space access to the required kernel control group features, and is required by the lxc userspace utilities. Up to kernel 3.1, the filesystem's mountpoint wasn't well defined; nowadays it's defined to be mounted (split) within /sys/fs/cgroup. Recent OpenRC versions already mount it during boot, and the app-emulation/lxc ebuild already depends on a new enough version.

You can check it using:

root # mount | grep cgroup
cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755)
openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib64/rc/sh/cgroup-release-agent.sh,name=openrc)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
memory on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)

Network configuration

There are three main options for giving network access to containers:

  • Network bridges (the simplest solution in most cases; see the host-side sketch after this list);
  • isolated network;
  • MACVLAN.
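
For the bridge option, the host needs a bridge interface with the physical interface enslaved to it. A minimal sketch using Gentoo's netifrc (the names eth0 and br0 are assumptions, and net-misc/bridge-utils must be installed):

File /etc/conf.d/net

config_eth0="null"
bridge_br0="eth0"
config_br0="dhcp"

root # cd /etc/init.d
root # ln -s net.lo net.br0
root # rc-update add net.br0 default
root # /etc/init.d/net.br0 start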

Guest configuration for network bridges

Your guest network configuration resides in the guest's lxc.conf file.  Documentation for this file is accessible with:

man lxc.conf

If you have used a template script to create your guest, this will typically reside in the parent directory of the guest's root filesystem.  However, using /etc/lxc/ to store guest configurations is also common.

Your guest configuration should include the following network-related lines:

File /etc/lxc/guest.conf

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
#lxc.network.ipv4 = 192.168.0.88
#lxc.network.hwaddr = b6:65:81:93:cb:a0
Note
If you are not using dhcp inside the container to get an IP address, then just delete the 'lxc.network.hwaddr' line, and manually specify the IP you want to use next to lxc.network.ipv4.

If, like me, you are using dhcp inside the container to get an IP address, then run it once as shown. LXC will generate a random MAC address for the interface. To keep your DHCP server from getting confused, you will want to use that MAC address all the time. So find out what it is, and then uncomment the 'lxc.network.hwaddr' line and specify it there.
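
One way to find the generated MAC address (assuming the interface shows up as eth0 inside the guest) is to run the following inside the running container and read the link/ether field:

root # ip link show eth0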

Guest Setup

Template scripts

A number of 'template scripts' are distributed with the lxc package. These scripts assist with generating various guest environments.

Template scripts live in /usr/share/lxc/templates but should be executed via the lxc-create tool as follows:

root # lxc-create -n guestname -t template-name -f configuration-file-path

The root filesystem of the Linux container is stored in /etc/lxc/guestname/ (but see the Filesystem layout section below).

Configuration files (the -f configuration-file option) are usually used to specify the network settings for the initial guest configuration. For example:

lxc.network.type=veth
lxc.network.link=br0
lxc.network.flags=up
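
Putting these together, a hypothetical invocation (the guest name and config path are examples) using the busybox template described below would be:

root # lxc-create -n myguest -t busybox -f /etc/lxc/myguest.conf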

Gentoo

Automatic setup: lxc-gentoo

The lxc-gentoo tool can download, extract and configure a Gentoo guest for you. It fixes a lot of little issues that you may otherwise find tedious and that are not yet outlined in the manual guest configuration section below.

You can download it here: lxc-gentoo page

Additional developers, bug fixes, comments, etc. are welcome.

Manual Guest Configuration

LXC uses a configuration file for each guest container, specifying its name, IP address, etc. You can use the lxc-gentoo script to create a Gentoo guest:

root # /path/of/lxc-gentoo create -i 192.168.0.6/24 -g 192.168.0.1 -n <guestname> -u <utsname>

After creating the guest you can manage it as usual.

Option: Shared /usr/portage/distfiles

If you want to share distfiles from your host, you can set the PORTAGE_RO_DISTDIRS variable to a space-separated list of directories to search. Portage will create a symlink in DISTDIR to the first matching file found in PORTAGE_RO_DISTDIRS if the file does not already exist in DISTDIR.

In the latest lxc-gentoo script (commit 269ea1735503cd932421d7c63d729f849279690d), the fstab is hardcoded. Therefore, if you want to mount the shared distfiles, you should add an lxc.mount option to the <utsname>.conf file. You can get more information from Flameeyes's blog.
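
As a sketch of such a setup (all paths and the guest name are examples, not fixed conventions), the fstab file referenced by lxc.mount can bind-mount the host's distfiles read-only into the guest, assuming the mountpoint exists in the guest's root filesystem:

File /etc/lxc/guestname/fstab

/usr/portage/distfiles /var/lxc/guestname/rootfs/mnt/distfiles-ro none ro,bind 0 0

Inside the guest, point Portage at the mountpoint:

File /etc/portage/make.conf (in the guest)

PORTAGE_RO_DISTDIRS="/mnt/distfiles-ro"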

Other distributions

Alt Linux

Fixme: this template script cannot be executed on Gentoo Linux directly, because it uses the apt-get command to download the ALT Linux guest.

Arch Linux

The lxc-archlinux template assists with setting up Arch Linux guests (see Archlinux Chroot in Gentoo). Note that in order to use lxc-archlinux, you must:

root # emerge sys-apps/pacman

Fixme: It seems that pacman-4.0.1 cannot work correctly on Gentoo Linux:

root # pacman
error: failed to initialize alpm library (Could not find or read directory)

Fixme: The archlinux template does not create a working container (app-emulation/lxc-0.8.0_rc2-r1), giving an error about not being able to find /sbin/init (the file /usr/lib/systemd/systemd does not exist). Chrooting into the Linux container (the rootfs directory) and issuing:

root # pacman -S systemd systemd-sysvcompat initscripts

solves this issue. Also, you need CONFIG_DEVTMPFS activated in the kernel configuration if you configure the container as stated in the Arch Linux wiki.

Note
Further steps are needed to set up a working Arch Linux container on Gentoo.

Busybox

lxc contains a minimal template script for busybox. Busybox is basically a base system oriented towards embedded use, where many base utilities exist in optimized form within one stripped binary to save memory. Busybox is installed as part of the base Gentoo system, so the script works right away. Example:

root # lxc-create -t busybox -n guest-name -f config-file

Debian

You will need to install the dev-util/debootstrap package:

root # emerge dev-util/debootstrap

You can then use the lxc supplied debian template script to download all required files, generate a configuration file and a root filesystem for your guest.

root # lxc-create -t debian -n guest-name -f configuration-file

Fedora

The lxc-fedora template assists with setting up Fedora guests. Note that in order to use lxc-fedora, you must:

root # emerge sys-apps/yum

You will also need to install the febootstrap tool from http://people.redhat.com/~rjones/febootstrap/. An ebuild has been created but is not yet in portage: bug #309805

Opensuse

The lxc-opensuse template assists with setting up openSUSE guests. Fixme: the zypper command-line package manager tool is missing from the Gentoo Portage tree.

sshd

lxc contains a minimal template script for sshd guests. You can create an sshd guest with:

root # lxc-create -t sshd -n guest-name -f configuration-file

Ubuntu

lxc contains a minimal template script for Ubuntu guests (see ubuntu.com). Note that in order to use lxc-ubuntu, you must:

root # emerge dev-util/debootstrap

Usage is as follows...

root # lxc-create -t ubuntu -n ubuntu-guest -f network-configuration-file

Or, in versions < app-emulation/lxc-0.8.0_rc2-r1:

root # lxc-create -t ubuntu -n ubuntu-guest -f network-configuration-file -- -r <existing system user>

This will create the folder ubuntu-guest. Inside the folder there will be a file called config. Creating an Ubuntu guest takes a very long time, so please be patient. To start the container, issue the following command:

root # lxc-start -n ubuntu-guest -f ubuntu-guest/config

You should log in with the username and password of the existing system user specified when creating the container. To set the root password, enter the ubuntu-guest directory and you will see the rootfs directory. Issue:

root # chroot rootfs /bin/bash

Set the password with the command:

root # passwd

Using the Guest Container

Manual use

To start and stop the guest, simply run:

root # lxc-start -n guestname
root # lxc-stop -n guestname

Please be aware that when you have daemonized the booting process (-d), you will not get any output on screen. This can happen when you conveniently use an alias which daemonizes by default, and forget about it. You may be puzzled later by this if there is a problem while booting a new container that has not been configured properly (e.g. its network).

Use from gentoo init system

Gentoo's ebuild (without the vanilla USE flag enabled) provides an init script to manage containers and start them at boot time. To make use of the init script, you just have to create a symlink in the /etc/init.d/ folder:

root # ln -s lxc /etc/init.d/lxc.guestname
root # /etc/init.d/lxc.guestname start
root # /etc/init.d/lxc.guestname stop

Of course, the use of such scripts is primarily intended for booting and stopping the system. To add a guest to the rc chain, run:

root # rc-update add lxc.guestname default

To enter an (already started) guest directly from the host machine, see the lxc-console section below.

Use from gentoo systemd

Currently there is no script provided to start an LXC container using systemd. Add the following content to a file named lxc@.service in /etc/systemd/system:

[Unit]
Description=Linux Container %I
After=network.target

[Service]
Type=simple
Restart=always
ExecStart=/usr/sbin/lxc-start -n %i
ExecReload=/usr/sbin/lxc-restart -n %i
ExecStop=/usr/sbin/lxc-stop -n %i

[Install]
WantedBy=multi-user.target

To start the system in the container, call

root # systemctl start lxc@guestname.service

and

root # systemctl stop lxc@guestname.service

to stop it again.

To start it automatically at (host) system boot up, use

root # systemctl enable lxc@guestname.service

Accessing the guest

lxc-console

Using lxc-console provides console access to the guest. To use it, type:

root # lxc-console -n guestname

If you get a message saying lxc-console: console denied by guestname, then you need to add the following to your container config:

lxc.tty = 1

To exit the console, use:

root # Ctrl+a q

Note that unless you log out inside the guest, you will remain logged on, so the next time you run lxc-console, you will return to the same session.

Usage of lxc-console should be restricted to root. It is primarily a tool for system administrators to enter a (newly created) container after it is first created, e.g. when the network connection is not yet properly configured. Using multiple instances of lxc-console on distinct guests works fine, but starting a second instance for a guest that is already governed by another lxc-console session leads to redirection of keyboard input and terminal output. It is therefore best to avoid overlapping use of lxc-console. (Perhaps the lxc developers should enhance the tool so that only singleton use per guest is possible. ;-)

Accessing the container with sshd

A common technique to allow users direct access into a system container is to run a separate sshd inside the container. Users then connect to that sshd directly. In this way, you can treat the container just like you treat a full virtual machine where you grant external access. If you give the container a routable address, then users can reach it without using ssh tunneling.

If you set up the container with a virtual ethernet interface connected to a bridge on the host, then it can have its own ethernet address on the LAN, and you should be able to connect directly to it without logically involving the host (the host will transparently relay all traffic destined for the container, without the need for any special considerations). You should be able to simply 'ssh <container_ip>'.

Note
The above comments of Hu and BoneKracker have been taken from the Gentoo Forums.
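
As a minimal sketch for a Gentoo guest (commands run inside the guest, e.g. via lxc-console), enabling the stock sshd service is enough:

root # rc-update add sshd default
root # /etc/init.d/sshd start

Users can then connect directly with ssh <container_ip>.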

Filesystem layout

Some of the lxc tools apparently assume that /etc/lxc/<guestname>/ exists. However, you should keep the guests' root filesystems out of /etc since it's not a path that's supposed to store large volumes of binary data.

The lxc templates use the following locations:

  • /etc/lxc/<guestname>/config = guest configuration file
  • /etc/lxc/<guestname>/fstab = optional guest fstab file
  • /var/lxc/<guestname>/rootfs = root filesystem image
  • /var/log/lxc/<guestname> = lxc-start logfile