LVM

LVM (Logical Volume Manager) is software that abstracts physical devices as PVs (Physical Volumes) and groups them into storage pools called VGs (Volume Groups). A physical volume can be a partition, a whole SATA hard drive, drives grouped as JBOD (Just a Bunch Of Disks), a RAID system, or iSCSI, Fibre Channel or eSATA storage.

Kernel
You need to activate the following kernel options:

Software
Install :

Configuration
The configuration file is /etc/lvm/lvm.conf.

Boot service
You can now start LVM:

To start LVM at boot time, add it to your boot runlevel:

Usage
LVM organizes storage in three different levels as follows:
 * hard drives, partitions, RAID systems or other means of storage are initialized as PV (Physical Volume)
 * Physical Volumes (PV) are grouped together in Volume Groups (VG)
 * Logical Volumes (LV) are managed in Volume Groups (VG)

PV (Physical Volume)
Physical Volumes are the actual hardware or storage systems that LVM builds upon.

Partitioning
The partition type for LVM is 8e (Linux LVM):

In fdisk, you can create MBR partitions using the n key and then change the partition type with the t key to 8e. We will end up with one primary partition /dev/sdX1 of partition type 8e (Linux LVM).

Create PV
The following command creates Physical Volumes (PV) on the first primary partitions of /dev/sdX and /dev/sdY:

List PV
The following command lists all active Physical Volumes (PV) in the system:

You can scan for PVs in the system to troubleshoot storage devices that were lost or not properly initialized:

Remove PV
LVM automatically distributes data across all available PVs unless told otherwise. To make sure there is no data left on our device before we remove it, use the following command:

This might take a long time and once finished, there should be no data left on /dev/sdX1. We first remove the PV from our Volume Group (VG) and then the actual PV:
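The whole removal sequence can be sketched as below. This is a hedged reconstruction, not the original listing: the VG name vg0 is assumed from the later sections, and the run helper with its LVM_RUN switch is a hypothetical addition that only prints each command unless LVM_RUN=1 is set.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

remove_pv() {
    pv=$1; vg=$2
    run pvmove "$pv"          # move all allocated extents off the PV
    run vgreduce "$vg" "$pv"  # drop the now-empty PV from the VG
    run pvremove "$pv"        # wipe the LVM label from the device
}

remove_pv /dev/sdX1 vg0
```

Keeping the three steps in this order matters: vgreduce refuses to drop a PV that still holds allocated extents.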

VG (Volume Group)
Volume Groups (VG) consist of one or more Physical Volumes (PV) and show up as /dev// in the device file system.

Create VG
The following command creates a Volume Group (VG) named vg0 on two previously initialized Physical Volumes (PV) named /dev/sdX1 and /dev/sdY1:
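A minimal sketch of that step, assuming both partitions were already initialized with pvcreate. The run helper and the LVM_RUN variable are hypothetical additions for a safe dry run; they are not LVM features.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

create_vg() {
    vg=$1; shift
    run vgcreate "$vg" "$@"   # remaining arguments are the member PVs
}

create_vg vg0 /dev/sdX1 /dev/sdY1
```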

List VG
The following command lists all active Volume Groups (VG) in the system:

You can scan for VGs in the system to troubleshoot VGs that were lost or not properly created:

Extend VG
With the following command, we extend the existing Volume Group (VG) vg0 onto the Physical Volume (PV) /dev/sdZ1:

Reduce VG
Before we can remove a Physical Volume (PV), we need to make sure that LVM has no data left on the device. To move all data off that PV and distribute it onto the other available PVs, use the following command:

This might take a while and once finished, we can remove the PV from our VG:

Remove VG
Before we can remove a Volume Group (VG), we have to remove all existing Snapshots, all Logical Volumes (LV) and all Physical Volumes (PV) but one. The following command removes the VG named vg0:

LV (Logical Volume)
Logical Volumes (LV) are created and managed in Volume Groups (VG). Once created, they show up as /dev// and can be used like normal partitions.

Create LV
With the following command, we create a Logical Volume (LV) named lvol1 in Volume Group (VG) vg0 with a size of 150MB:

There are other useful options to set the size of a new LV like:
 * -l 100%FREE = maximum size of the LV within the VG
 * -l 50%VG = 50% size of the whole VG
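The size options above can be compared side by side in a short sketch. The create_lv function, the run helper and the LVM_RUN variable are hypothetical names introduced for illustration; only lvcreate and its -L, -l and -n options come from LVM itself.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# -L takes an absolute size, -l a number of extents or a percentage.
create_lv() {
    size_opt=$1; size=$2; name=$3; vg=$4
    run lvcreate "$size_opt" "$size" -n "$name" "$vg"
}

create_lv -L 150M lvol1 vg0          # fixed size of 150MB
# Alternatives (use instead of the line above, not in addition):
# create_lv -l 100%FREE lvol1 vg0    # all space still free in the VG
# create_lv -l 50%VG lvol1 vg0       # half of the whole VG
```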

List LV
The following command lists all Logical Volumes (LV) in the system:

You can scan for LVs in the system to troubleshoot LVs that were lost or not properly created:

Extend LV
With the following command, we can extend the Logical Volume (LV) named lvol1 in Volume Group (VG) vg0 to 500MB:

Once the LV is extended, we need to grow the file system as well (in this example we used ext4 and the LV is mounted to /mnt/data):
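Both steps together might look like the sketch below. It assumes ext4, which resize2fs can grow while the file system is mounted; grow_lv, run and LVM_RUN are hypothetical names used only for this dry-run illustration.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

grow_lv() {
    lv=$1; size=$2
    run lvextend -L "$size" "$lv"  # grow the block device first
    run resize2fs "$lv"            # then grow ext4 to fill the LV
}

grow_lv /dev/vg0/lvol1 500M
```

The order is the reverse of shrinking: the LV is enlarged first so the file system has room to grow into.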

Reduce LV
Before we can reduce the size of our Logical Volume (LV) without corrupting existing data, we have to shrink the file system on it. In this example we used ext4, and the LV needs to be unmounted before the file system can be shrunk:

Now we are ready to reduce the size of our LV:
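The complete shrink sequence can be sketched as follows. The target size of 150M is an assumption chosen for illustration, and shrink_lv, run and LVM_RUN are hypothetical helper names; the file system is shrunk to the same size that the LV is then reduced to.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

shrink_lv() {
    lv=$1; size=$2
    run umount "$lv"               # ext4 cannot be shrunk while mounted
    run e2fsck -f "$lv"            # resize2fs wants a freshly checked file system
    run resize2fs "$lv" "$size"    # shrink the file system first
    run lvreduce -L "$size" "$lv"  # then shrink the LV to the same size
}

shrink_lv /dev/vg0/lvol1 150M
```

Reducing the LV below the file system size would cut off data, so the file system step always comes first.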

LV Permissions
Logical Volumes (LV) can be set to be read-only storage devices.

The LV needs to be remounted for the changes to take effect:

To set the LV to be read/write again:

Remove LV
Before we remove a Logical Volume (LV) we should unmount and deactivate it, so that no further write activity can take place:

The following command removes the LV named lvol1 from VG named vg0:
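Unmounting, deactivating and removing can be sketched in one go. The remove_lv function, the run helper and the LVM_RUN variable are hypothetical names for this dry-run illustration.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

remove_lv() {
    lv=$1
    run umount "$lv"         # stop file system access
    run lvchange -a n "$lv"  # deactivate the LV
    run lvremove "$lv"       # delete it from the VG
}

remove_lv /dev/vg0/lvol1
```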

Thin metadata, pool, and LV
Recent versions of LVM2 (2.02.89) support "thin" volumes. Thin volumes are to block devices what sparse files are to filesystems. Thus, a thin LV within a pool can be "overcommitted" - it can even be larger than the pool itself. Just like a sparse file, the "holes" are filled as the block device gets populated. If the filesystem has "discard" support, the "holes" can be recreated as files are deleted, reducing utilization of the thin pool.

Create thin pool
Each thin pool has some metadata associated with it, which is added to the thin pool size. You can specify the metadata size explicitly; otherwise lvm2 computes it from the size of the thin pool as pool_chunks * 64 bytes or 2MiB, whichever is larger.
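The rule can be worked through with a little shell arithmetic. This is only an estimate based on the formula stated above; the 64KiB default chunk size is an assumption of this sketch, and thin_meta_bytes is a hypothetical helper, not an LVM command.

```shell
# Estimate the default thin pool metadata size in bytes, following the
# rule in the text: pool_chunks * 64 bytes, but never less than 2MiB.
# The 64KiB default chunk size is an assumption of this sketch.
thin_meta_bytes() {
    pool_kib=$1
    chunk_kib=${2:-64}
    chunks=$(( pool_kib / chunk_kib ))
    bytes=$(( chunks * 64 ))
    min=$(( 2 * 1024 * 1024 ))
    if [ "$bytes" -lt "$min" ]; then bytes=$min; fi
    printf '%s\n' "$bytes"
}

thin_meta_bytes $(( 150 * 1024 ))          # 150MiB pool: prints 2097152 (the 2MiB floor)
thin_meta_bytes $(( 1024 * 1024 * 1024 ))  # 1TiB pool: prints 1073741824 (1GiB)
```

For small pools such as the 150MB examples in this article, the 2MiB floor always applies.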

This creates a thin pool named "thin_pool" with a size of 150MB (actually, it is slightly bigger than 150MB because of the metadata).

This creates a thin pool named "thin_pool" with a size of 150MB and an explicit metadata size of 2MiB.

Unfortunately, because the metadata size is added to the thin pool size, the intuitive way of filling a VG with a thin pool doesn't work:

Note that the thin pool does not have an associated device node like other LVs.

Create a thin LV
A thin LV is somewhat unusual in LVM - the thin pool itself is an LV, so a thin LV is an "LV-within-an-LV". Since the volumes are sparse, a virtual size instead of a physical size is specified:

Note how the LV is larger than the pool it is created in. It is also possible to create the thin metadata, pool and LV in the same command:
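A hedged sketch of the two-step variant, creating the pool first and then an overcommitted thin LV inside it. The 300M virtual size and the LV name lvol1 are assumptions chosen so that the LV is larger than the 150MB pool; create_thin, run and LVM_RUN are hypothetical helper names.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

create_thin() {
    vg=$1
    # The pool gets a physical size (-L) ...
    run lvcreate --type thin-pool -L 150M -n thin_pool "$vg"
    # ... while the thin LV only gets a virtual size (-V).
    run lvcreate --type thin -V 300M -n lvol1 --thinpool thin_pool "$vg"
}

create_thin vg0
```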

List thin pool and thin LV
Thin LVs are just like any other LV: they are displayed using lvdisplay and scanned using lvscan.

Extend thin pool
The thin pool is expanded like a non-thin LV:

or

Extend thin LV
A Thin LV is expanded just like a regular LV:

or

Note this is asymmetric with creation, where the virtual size was specified with -V instead of -L/-l. The filesystem can then be expanded using that filesystem's tools.

Reduce thin pool
Currently, LVM cannot reduce the size of the thin pool.

Reduce thin LV
Before shrinking an LV, shrink the filesystem first using that filesystem's tools. Some filesystems do not support shrinking. A Thin LV is reduced just like a regular LV:

or

Note this is asymmetric with creation, where the virtual size was specified with -V instead of -L/-l.

Thin pool Permissions
It is not possible to change the permission on the thin pool (nor would it make any sense to).

Thin LV Permissions
A thin LV can be set read-only/read-write the same way a regular LV is.

Thin pool Removal
The thin pool cannot be removed until all the thin LV within it are removed. Once that is done, it can be removed:

Thin LV Removal
A thin LV is removed like a regular LV.

Examples
We can create some scenarios using loopback devices, so no real storage devices are used.

Preparation
First we need to make sure the loopback module is loaded. If you want to play around with partitions, use the following option:

Now we need to either tell LVM not to use udev to scan for devices or change the filters in /etc/lvm/lvm.conf. In this case we just temporarily do not use udev:

We create some image files, that will become our storage devices (uses ~10GB of real hard drive space):

Check which loopback devices are available:

We assume all loopback devices are available and create our hard drives:

Now we can use /dev/loop[0-4] as we would use any other hard drive in the system.

LVM2 linear volumes
In this example, we will initialize two hard drives as PVs and then create the VG vg0:

Now let's create the LV lvol1 in our VG vg0 and take the maximum space available:

Create the file system and mount it to /mnt/data:
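The steps of this example can be sketched end to end. The sketch assumes /dev/loop0 and /dev/loop1 are the two loopback drives and that ext4 is used, as elsewhere in this article; linear_demo, run and LVM_RUN are hypothetical helper names for a dry run.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

linear_demo() {
    run pvcreate /dev/loop0 /dev/loop1       # initialize both drives as PVs
    run vgcreate vg0 /dev/loop0 /dev/loop1   # group them into vg0
    run lvcreate -l 100%FREE -n lvol1 vg0    # one LV spanning all free space
    run mkfs.ext4 /dev/vg0/lvol1             # create the file system
    run mkdir -p /mnt/data
    run mount /dev/vg0/lvol1 /mnt/data
}

linear_demo
```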

Now we have the capacity of 2GB from each hard drive available in /mnt/data as one 4GB device.

/etc/fstab
Here is an example of an entry in fstab (using ext4):

For thin volumes, add the discard option:
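As a rough illustration of both entries, the fragment below assumes the LV and mount point from the linear example. The thin LV name thin_lvol1, the mount point /mnt/thin and the noatime option are hypothetical choices, not values taken from the original listing.

```
# <device>            <mount point>  <type>  <options>        <dump> <pass>
/dev/vg0/lvol1        /mnt/data      ext4    noatime          0      2
# Thin volume: discard lets freed blocks be returned to the thin pool.
/dev/vg0/thin_lvol1   /mnt/thin      ext4    noatime,discard  0      2
```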

LVM2 Snapshots and LVM2 Thin Snapshots
A snapshot is an LV that acts as a copy of another LV. It keeps track of all the changes made to the original LV in order to show the content of that LV in a different state. We once again use our two hard drives and create LV lvol1, this time with 60% of VG vg0:

LVM2 Snapshots
Now we create a snapshot of lvol1 named 08092011_lvol1 and give it 10% of VG vg0:
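A minimal sketch of the snapshot step, using the names and the 10% size from the sentence above. The snapshot_lv function, the run helper and the LVM_RUN variable are hypothetical additions for a dry run.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

snapshot_lv() {
    origin=$1; name=$2; size=$3
    # -s marks the new LV as a snapshot of the origin; the size only
    # has to hold the blocks that change after the snapshot is taken.
    run lvcreate -s -l "$size" -n "$name" "$origin"
}

snapshot_lv /dev/vg0/lvol1 08092011_lvol1 10%VG
```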

Mount our snapshot somewhere else:

We could now access data in lvol1 from a previous state.

LVM2 snapshots are writable LVs, so we could use them to take a project in two different directions:

Now we have three different versions of LV lvol1: the original and two snapshots, which can be used in parallel. Changes are written to the snapshots.

LVM2 Thin Snapshots
Creating a thin snapshot is simple:

Note how a size is not specified with -l/-L - nor the virtual size with -V. Snapshots have the same virtual size as their origin, and a physical size of 0 like all new thin volumes. This also means it is not possible to limit the physical size of the snapshot. Thin snapshots are writable just like regular snapshots.

Recursive snapshots can be created:

Thin snapshots have several advantages over regular snapshots. First, thin snapshots are independent of their origins once created. The origin can be shrunk or deleted without affecting the snapshot. Second, thin snapshots can be efficiently created recursively (snapshots of snapshots) without the "chaining" overhead of regular recursive LVM snapshots.

LVM2 Rollback Snapshots
To roll back the logical volume to the version of the snapshot, use the following command:
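A hedged sketch of the rollback, assuming the snapshot 08092011_lvol1 from the previous section. The rollback_lv function, the run helper and the LVM_RUN variable are hypothetical names for a dry run.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

rollback_lv() {
    # Merging writes the snapshot's state back into its origin;
    # the snapshot LV itself disappears once the merge completes.
    run lvconvert --merge "$1"
}

rollback_lv /dev/vg0/08092011_lvol1
```

If the origin is still in use, LVM defers the merge until the origin is next activated, so unmounting it beforehand is the safer route.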

This might take a couple of minutes, depending on the size of the volume.

LVM2 Thin Rollback Snapshots
For thin volumes, lvconvert --merge does not work. Instead, delete the origin and rename the snapshot:

LVM2 Mirrors
LVM supports mirrored volumes, which provide fault tolerance in the event of drive failure. Unlike RAID1, there is no performance benefit - all reads and writes are delivered to a single "leg" of the mirror. One additional PV is required for each mirror.

Mirrors support 3 kinds of logs:


 * Disk mirror logs record the state of the mirror on disk in extra metadata extents. LVM keeps track of what has been mirrored and can pick up where it left off if synchronization was incomplete. This is the default.
 * Mirrored mirror logs are disk logs that are themselves mirrored.
 * Core mirror logs record the state of the mirror in memory only. LVM will have to rebuild the mirror every time it is activated. Useful for temporary mirrors.

Creating a mirror LV
To create an LV with a single mirror:

The -m 1 indicates we want to create 1 (additional) mirror, requiring 2 PVs. The --nosync option is an optimization - without it LVM will try to synchronize the mirror by copying empty sectors from one LV to another.
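The options just described might be combined as in this sketch. The 40%VG size and the names lvol1 and vg0 are assumptions, and --type mirror is added because current LVM releases otherwise build raid1 volumes for -m; create_mirror, run and LVM_RUN are hypothetical helper names.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

create_mirror() {
    vg=$1; name=$2; size=$3
    # -m 1: one additional copy; --nosync: skip the initial copy of
    # empty sectors, which is safe only for a brand-new LV.
    run lvcreate --type mirror -m 1 --nosync -l "$size" -n "$name" "$vg"
}

create_mirror vg0 lvol1 40%VG
```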

Creating a mirror of an existing LV
It is possible to create a mirror of an existing LV:

This mirrors an existing LV onto a different PV. The -b option puts the operation into the background, as mirroring an LV takes a long time.

Removing a mirror of an existing LV
To remove a mirror, set the number of mirrors to 0:

Failed mirrors
To simulate a failure:

If part of the mirror is unavailable (usually because the disk containing the PV has failed), the VG will need to be brought up in degraded mode:

On the first write, LVM will notice the mirror is broken. The default policy ("remove") is to automatically reduce/break the mirror according to the number of pieces available. A 3-way mirror with a missing PV will be reduced to a 2-way mirror; a 2-way mirror will be reduced to a regular linear volume.

To recover the mirror, we need to kick out the failed PV from the VG, add a new PV, extend the VG to the new PV, then use lvconvert to recreate the mirror:

It is possible to have LVM recreate the mirror with free extents on a different PV if a "leg" fails. To do that, set mirror_image_fault_policy to "allocate" in lvm.conf.

Thin mirrors
It is not (yet) possible to create or add a mirror to a thin pool or its volumes. It is possible to create a mirrored thin pool by creating a normal mirrored LV and then converting it to a thin pool with lvconvert. 2 LVs are required: one for the thin pool and one for the thin metadata. The conversion process will merge them into a single LV.

LVM2 RAID 0/Stripeset
Instead of a linear volume, where multiple contiguous volumes are appended, it is possible to create a striped or RAID 0 volume for better performance.

To create a 3-PV striped volume:

The -i option indicates how many PVs to stripe over, in this case, 3.

On each PV, 400MB is reserved for LV lvm_stripe in VG vg00.

It is not possible to stripe an existing non-striped volume, to reshape a stripe set by adding or removing PVs, or to convert it to RAID5 or RAID6.

It is possible to mirror a stripe set. The -i and -m options can be combined to create a striped mirror:

This creates a 2 PV stripe set and mirrors it on 2 different PVs, for a total of 4 PVs. An existing stripe set can be mirrored with lvconvert.

A thin pool can be striped like any other LV. All the thin volumes created from the pool inherit that setting - do not specify it manually when creating a thin volume.

LVM2 Stripeset with Parity (RAID4 and RAID5)
RAID 0 is not fault-tolerant - if any of the PVs fail the LV is unusable. By adding a parity stripe to RAID 0 the LV can still function with a single missing PV (performance may be degraded though). A new PV can then be added to restore fault tolerance.

Stripe sets with parity come in 2 flavors: RAID 4 and RAID 5. Under RAID 4, all the parity stripes are stored on the same PV. That PV can become a bottleneck because all writes hit it, and the problem gets worse the more PVs are in the array. With RAID 5, the parity data is distributed evenly across the PVs, so no single PV is a bottleneck. For that reason, RAID 4 is rare and considered obsolete/historical; in practice all stripe sets with parity are RAID 5.

Creating a RAID5 LV
Like RAID0/stripe sets without parity, the -i option is used to specify the number of PVs to stripe over. However, only the data PVs are counted with -i - LVM adds the parity one automatically. Thus for a 3 PV RAID5, it is -i 2 and not -i 3.
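The stripe arithmetic can be made explicit in a small sketch. The 20%VG size is an assumption, the names lvm_raid5 and vg00 follow this section, and data_stripes, create_raid, run and LVM_RUN are hypothetical helper names. The raid6 branch anticipates the RAID 6 section, where two PVs hold parity.

```shell
# Illustrative dry-run wrapper (not part of LVM): prints the command
# unless LVM_RUN=1 is set, in which case it is executed (needs root).
run() { if [ "${LVM_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# -i counts data stripes only; LVM adds the parity PVs itself.
data_stripes() {
    level=$1; total_pvs=$2
    case $level in
        raid5) parity=1 ;;
        raid6) parity=2 ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    echo $(( total_pvs - parity ))
}

# Arguments: RAID level, total number of PVs, size, LV name, VG name.
create_raid() {
    stripes=$(data_stripes "$1" "$2") || return 1
    run lvcreate --type "$1" -i "$stripes" -l "$3" -n "$4" "$5"
}

create_raid raid5 3 20%VG lvm_raid5 vg00
```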

On each PV, about 600MB is reserved for LV lvm_raid5 in VG vg00.

Recovering from a failed RAID5
To simulate a failure:

The VG will need to be brought up in degraded mode:

The volume will work normally at this point; however, this degrades the array to RAID 0 until a replacement PV is added.

To repair the RAID5:

It is possible to replace a still working PV in RAID5 as well:

The same restrictions on reshaping stripe sets also apply to stripe sets with parity: existing LVs cannot be converted to stripe sets with parity, and RAID5 LVs cannot be reshaped to add or remove PVs, upgraded to RAID6 or downgraded to RAID0. RAID5 volumes can be mirrored, however.

Thin RAID5 LV
It is not (yet) possible to create or extend a stripe set with parity (RAID5) on a thin pool or its volumes. It is possible to create a RAID5 thin pool by creating a normal RAID5 LV and then converting the LV into a thin pool with lvconvert.

LVM2 RAID 6
RAID 6 is similar to RAID 5; however, RAID 6 can survive up to two PV failures, thus offering more fault tolerance than RAID5 at the expense of an extra PV.

Creating a RAID6 LV
Like RAID5, the -i option is used to specify the number of PVs to stripe over, excluding the 2 PVs for parity. Thus for a 5 PV RAID6, it is -i 3 and not -i 5.

On each PV, about 680MB is reserved for LV lvm_raid6 in VG vg00.

Recovering from a failed RAID6
Recovery for RAID6 is the same as RAID5. A RAID6 LV with a single failure reduces to RAID5. A RAID6 LV with 2 failures reduces to RAID0. It is left as an exercise to the reader to simulate a 2 PV failure.

RAID6 stripe sets share the same restrictions as all stripe sets: it is not possible to stripe an existing volume, convert to a different stripe type (RAID5 or RAID0), or reshape an existing LV onto more or fewer PVs.

Thin RAID6 LV
It is not (yet) possible to create or extend a RAID6 LV on a thin pool or its volumes. It is possible to create a RAID6 thin pool by creating a normal RAID6 LV and then converting the LV into a thin pool with lvconvert.

Troubleshooting
LVM has only mirrors and snapshots to provide some level of redundancy. However, there are certain situations where one might be able to restore a lost PV or LV.

vgcfgrestore
In /etc/lvm/archive and /etc/lvm/backup are files which contain logs about metadata changes in LVM. To see what states of the VG are available to be restored:

In this example we removed the LV lvol1 by accident and want it back in our VG vg0:

Replace PV
We want to replace a PV and then restore the metadata to a new one, so that we reach the same state as before the device stopped working. To display all PV in a VG (even lost ones) use the following command:

In this example I let /dev/loop1 (unknown device) fail:

Using the UUID of the failed PV, we can tell LVM to initialize the new hardware so that it takes the place of the old PV within the VG.

Then we restore the VG to the state before the PV failed:

Now you can replay your file backup if you haven't already restored the PV itself.

Deactivate LV
You can deactivate an LV with the following command:

You will not be able to mount the LV anywhere until it is reactivated:

External resources

 * LVM2 sourceware.org
 * LVM tldp.org
 * LVM2 Wiki redhat.com