LVM


LVM (Logical Volume Manager) is software that abstracts physical devices as PVs (Physical Volumes) and groups them into storage pools called VGs (Volume Groups). A physical volume can be a partition, a whole SATA hard drive, drives grouped as JBOD (Just a Bunch Of Disks), a RAID system, iSCSI, Fibre Channel, eSATA, etc.


Installation

Kernel

You need to activate the following kernel options:

Kernel configuration

Device Drivers  --->
   Multiple devices driver support (RAID and LVM)  --->
       <*> Device mapper support
           <*> Crypt target support
           <*> Snapshot target
           <*> Mirror target
       <*> Multipath target
           <*> I/O Path Selector based on the number of in-flight I/Os
           <*> I/O Path Selector based on the service time
Note
You probably don't need everything enabled, but some of the options are needed for LVM2 snapshots, LVM2 mirrors, LVM2 stripesets and encryption.

Software

Install sys-fs/lvm2:

→ Information about USE flags
USE flag      Default  Recommended  Description
clvm          No       -            Allow users to build clustered lvm2
cman          No       -            Cman support for clustered lvm
lvm1          Yes      -            Allow users to build lvm2 with lvm1 support
readline      Yes      -            Enables support for libreadline, a GNU line-editing library that almost everyone wants
selinux       No       No           !!internal use only!! Security Enhanced Linux support, this must be set by the selinux profile or breakage will occur
static        No       -            !!do not set this during bootstrap!! Causes binaries to be statically linked instead of dynamically
static-libs   No       -            Build static libraries
thin          Yes      -            Support for thin volumes
udev          Yes      -            Enable sys-fs/udev integration (device discovery, power and storage device support, etc)
root # emerge --ask lvm2

Configuration

The configuration file is /etc/lvm/lvm.conf
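The defaults are usually fine. One setting that is sometimes adjusted is the device filter in the devices section, which controls which block devices LVM scans. A minimal sketch (the device patterns are only examples; adjust them to the actual hardware):

File/etc/lvm/lvm.conf

devices {
    # "a" entries accept devices, "r" entries reject them; the first match wins
    filter = [ "a|^/dev/sd.*|", "r|.*|" ]
}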

Boot service

openrc

To start LVM manually:

root # /etc/init.d/lvm start

To start LVM at boot time:

root # rc-update add lvm boot

systemd

To start LVM manually:

root # systemctl start lvm.service

To start LVM at boot time:

root # systemctl enable lvm.service

LVM on root

Most boot loaders cannot boot from LVM directly: neither GRUB legacy nor LILO can. GRUB 2 can boot from an LVM linear LV, a mirrored LV and possibly some kinds of RAID LVs. No boot loader currently supports thin LVs.

For that reason, it is recommended to use a non-LVM /boot partition and mount the LVM root from an initramfs. Most users will want to use a generated one: genkernel, genkernel-next and dracut can all produce an initramfs suitable for most LV types.

  • genkernel can boot from all types except thin volumes (it neither builds nor copies the thin-provisioning-tools binaries from the build host) and possibly RAID10 (RAID10 support requires LVM2 2.02.98, but genkernel builds 2.02.89; if static binaries are available on the build host it can copy those instead)
  • genkernel-next can boot from all volume types, but needs a new enough app-misc/pax-utils or the resulting thin binaries will be broken. [1]
  • dracut should boot all types, but only includes thin support in the initramfs if the host it is run on has a thin root. In that case it should copy the thin-provisioning-tools binaries from the build host, but it currently fails to do so, and thus does not work.
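
As an illustration (assuming genkernel is installed and a kernel is already built), genkernel produces an LVM-aware initramfs when passed its --lvm option:

root # genkernel --lvm initramfs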

Usage

LVM organizes storage in three different levels as follows:

  • hard drives, partitions, RAID systems or other means of storage are initialized as PV (Physical Volume)
  • Physical Volumes (PV) are grouped together in Volume Groups (VG)
  • Logical Volumes (LV) are managed in Volume Groups (VG)
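
Putting the three levels together, a minimal end-to-end session (device names and sizes are placeholders; each step is explained in detail below) looks like this:

root # pvcreate /dev/sdX1                # initialize the partition as a PV
root # vgcreate vg0 /dev/sdX1            # create VG vg0 on that PV
root # lvcreate -L 1G -n lvol1 vg0       # carve a 1GB LV out of the VG
root # mkfs.ext4 /dev/vg0/lvol1          # create a file system on the LV
root # mount /dev/vg0/lvol1 /mnt/data    # mount it like any other partition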

PV (Physical Volume)

Physical Volumes are the actual hardware or storage systems LVM is built upon.

Partitioning

The partition type for LVM is 8e (Linux LVM):

root # fdisk /dev/sdX

In fdisk, you can create MBR partitions using the n key and then change the partition type with the t key to 8e. We will end up with one primary partition /dev/sdX1 of partition type 8e (Linux LVM).
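
For illustration, the keystrokes inside fdisk would be roughly the following (the exact prompts vary between util-linux versions; accept the defaults for the start and end of the partition):

root # fdisk /dev/sdX
n       (create a new primary partition, accepting the defaults)
t       (change the partition type)
8e      (Linux LVM)
w       (write the partition table and exit)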

Note
This step is not required, since LVM can also initialize whole hard drives as PVs. Skipping partitioning also keeps LVM from being restricted by the limits of MBR or GPT partition tables.

Create PV

The following command creates Physical Volumes (PV) on the first primary partitions of /dev/sdX and /dev/sdY:

root # pvcreate /dev/sd[X-Y]1

List PV

The following command lists all active Physical Volumes (PV) in the system:

root # pvdisplay

You can scan the system for PVs, to troubleshoot storage devices that were not properly initialized or got lost:

root # pvscan

Remove PV

LVM automatically distributes data onto all available PVs, if not told otherwise. To make sure no data is left on a device before we remove it, use the following command to move the data off it:

root # pvmove -v /dev/sdX1

This might take a long time; once finished, there should be no data left on /dev/sdX1. We first remove the PV from our Volume Group (VG) and then wipe the PV itself:

root # vgreduce vg0 /dev/sdX1 && pvremove /dev/sdX1
Note
If a whole hard drive was once initialized as a PV, the PV has to be removed before the drive can be properly partitioned again, because such a PV has no valid MBR table.

VG (Volume Group)

Volume Groups (VG) consist of one or more Physical Volumes (PV) and show up as /dev/<VG name>/ in the device file system.

Create VG

The following command creates a Volume Group (VG) named vg0 on two previously initialized Physical Volumes (PV) named /dev/sdX1 and /dev/sdY1:

root # vgcreate vg0 /dev/sd[X-Y]1

List VG

The following command lists all active Volume Groups (VG) in the system:

root # vgdisplay

You can scan the system for VGs, to troubleshoot VGs that were not properly created or got lost:

root # vgscan

Extend VG

With the following command, we extend the existing Volume Group (VG) vg0 onto the Physical Volume (PV) /dev/sdZ1:

root # vgextend vg0 /dev/sdZ1

Reduce VG

Before we can remove a Physical Volume (PV), we need to make sure that LVM has no data left on the device. To move all data off that PV and distribute it onto the other available PVs, use the following command:

root # pvmove -v /dev/sdX1

This might take a while and once finished, we can remove the PV from our VG:

root # vgreduce vg0 /dev/sdX1

Remove VG

Before we can remove a Volume Group (VG), we have to remove all existing Snapshots, all Logical Volumes (LV) and all Physical Volumes (PV) but one. The following command removes the VG named vg0:

root # vgremove vg0

LV (Logical Volume)

Logical Volumes (LV) are created and managed in Volume Groups (VG); once created, they show up as /dev/<VG name>/<LV name> and can be used like normal partitions.

Create LV

With the following command, we create a Logical Volume (LV) named lvol1 in Volume Group (VG) vg0 with a size of 150MB:

root # lvcreate -L 150M -n lvol1 vg0

There are other useful options to set the size of a new LV like:

  • -l 100%FREE = maximum size of the LV within the VG
  • -l 50%VG = 50% size of the whole VG

List LV

The following command lists all Logical Volumes (LV) in the system:

root # lvdisplay

You can scan the system for LVs, to troubleshoot LVs that were not properly created or got lost:

root # lvscan

Extend LV

With the following command, we can extend the Logical Volume (LV) named lvol1 in Volume Group (VG) vg0 to 500MB:

root # lvextend -L500M /dev/vg0/lvol1
Note
use -L+350M to increase the current size of a LV by 350MB

Once the LV is extended, we need to grow the file system as well (in this example we used ext4 and the LV is mounted to /mnt/data):

Note
Some file systems, like ext4, support online resizing; otherwise you have to unmount the file system first
root # resize2fs /dev/vg0/lvol1 500M

Reduce LV

Before we can reduce the size of our Logical Volume (LV) without corrupting existing data, we have to shrink the file system on it. In this example we use ext4; the LV needs to be unmounted to shrink the file system:

root # umount /mnt/data
root # e2fsck -f /dev/vg0/lvol1
root # resize2fs /dev/vg0/lvol1 150M

Now we are ready to reduce the size of our LV:

root # lvreduce -L150M /dev/vg0/lvol1
Note
use -L-350M to reduce the current size of a LV by 350MB

LV Permissions

Logical Volumes (LV) can be set to be read only storage devices.

root # lvchange -p r /dev/vg0/lvol1

The LV needs to be remounted for the changes to take effect:

root # mount -o remount /dev/vg0/lvol1

To set the LV to be read/write again:

root # lvchange -p rw /dev/vg0/lvol1 && mount -o remount /dev/vg0/lvol1

Remove LV

Before we remove a Logical Volume (LV), we should unmount and deactivate it, so no further write activity can take place:

root # umount /dev/vg0/lvol1 && lvchange -a n /dev/vg0/lvol1

The following command removes the LV named lvol1 from VG named vg0:

root # lvremove /dev/vg0/lvol1

Thin metadata, pool, and LV

Recent versions of LVM2 (2.02.89) support "thin" volumes. Thin volumes are to block devices what sparse files are to filesystems. Thus, a thin LV within a pool can be "overcommitted" - it can even be larger than the pool itself. Just like a sparse file, the "holes" are filled as the block device gets populated. If the filesystem has "discard" support, as files are deleted, the "holes" can be recreated, reducing utilization of the thin pool.
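
If the file system is not mounted with the discard option, unused blocks can still be returned to the thin pool periodically with fstrim from sys-apps/util-linux (the mount point is just an example):

root # fstrim -v /mnt/data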

Create thin pool

Warning
If the thin pool metadata overflows, the pool will be corrupted. LVM cannot recover from this.
Note
If the thin pool gets exhausted, any process that would cause the thin pool to overrun will be stuck in a "killable sleep" state until either the thin pool is extended or the process receives SIGKILL.

Each thin pool has some metadata associated with it, which is added to the thin pool size. You can specify the metadata size explicitly; otherwise lvm2 will compute one from the size of the thin pool, as pool_chunks * 64 bytes or 2 MiB, whichever is larger.

root # lvcreate -L 150M --type thin-pool --thinpool thin_pool vg0

This creates a thin pool named "thin_pool" with a size of 150MB (actually slightly bigger than 150MB because of the metadata).

root # lvcreate -L 150M --metadatasize 2M --type thin-pool --thinpool thin_pool vg0

This creates a thin pool named "thin_pool" with a size of 150MB and an explicit metadata size of 2MiB.

Unfortunately, because the metadata size is added to the thin pool size, the intuitive way of filling a VG with a thin pool does not work:[2]

root # lvcreate -l 100%FREE --type thin-pool --thinpool thin_pool vg0
Insufficient suitable allocatable extents for logical volume thin_pool: 549 more required

Note that the thin pool does not have an associated device node like other LVs do.

Create a thin LV

A thin LV is somewhat unusual in LVM: the thin pool itself is an LV, so a thin LV is an "LV within an LV". Since the volumes are sparse, a virtual size instead of a physical size is specified:

root # lvcreate -T vg0/thin_pool -V 300M -n lvol1

Note how the LV is larger than the pool it is created in. It is also possible to create the thin metadata, pool and LV with a single command:

root # lvcreate -T vg0/thin_pool -V 300M -L150M -n lvol1

List thin pool and thin LV

Thin LVs are just like any other LV: they are displayed using lvdisplay and scanned using lvscan.

Extend thin pool

Warning
As of LVM2 2.02.89, the metadata size of the thin pool cannot be expanded; it is fixed at creation.

The thin pool is expanded like a non-thin LV:

root # lvextend -L500M vg0/thin_pool

or

root # lvextend -L+350M vg0/thin_pool

Extend thin LV

A Thin LV is expanded just like a regular LV:

root # lvextend -L1G vg0/lvol1

or

root # lvextend -L+700M vg0/lvol1

Note this is asymmetric with creation, where the virtual size was specified with -V instead of -L/-l. The filesystem can then be expanded using that filesystem's tools.
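
For example, with ext4 on the thin LV from above (with no size argument, resize2fs grows the file system to fill the LV; online growing is assumed to be supported by the kernel in use):

root # resize2fs /dev/vg0/lvol1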

Reduce thin pool

Currently, LVM cannot reduce the size of the thin pool[3].

Reduce thin LV

Before shrinking an LV, shrink the filesystem first using that filesystem's tools. Some filesystems do not support shrinking. A Thin LV is reduced just like a regular LV:

root # lvreduce -L300M vg0/lvol1

or

root # lvreduce -L-700M vg0/lvol1

Note this is asymmetric with creation, where the virtual size was specified with -V instead of -L/-l.

Thin pool Permissions

It is not possible to change the permission on the thin pool (nor would it make any sense to).

Thin LV Permissions

A thin LV can be set read-only/read-write the same way a regular LV is.

Thin pool Removal

The thin pool cannot be removed until all the thin LVs within it are removed. Once that is done, it can be removed:

root # lvremove vg0/thin_pool

Thin LV Removal

A thin LV is removed like a regular LV.

Examples

We can create some scenarios using loopback devices, so no real storage devices are used.

Preparation

First we need to make sure the loopback module is loaded. If you want to play around with partitions, use the following option:

root # modprobe -r loop && modprobe loop max_part=63
Note
You cannot reload the module if it is built into the kernel.

Now we need to either tell LVM to not use udev to scan for devices or change the filters in /etc/lvm/lvm.conf. In this case we just temporarily do not use udev:

File/etc/lvm/lvm.conf

obtain_device_list_from_udev = 0
Important
This is for testing only; change the setting back when dealing with real devices, since using udev is much faster.

We create some image files that will become our storage devices (the files are created sparse, so they consume up to ~10GB of real hard drive space only as data is written to them):

root # mkdir /var/lib/lvm_img
root # dd if=/dev/null of=/var/lib/lvm_img/lvm0.img bs=1024 seek=2097152
root # dd if=/dev/null of=/var/lib/lvm_img/lvm1.img bs=1024 seek=2097152
root # dd if=/dev/null of=/var/lib/lvm_img/lvm2.img bs=1024 seek=2097152
root # dd if=/dev/null of=/var/lib/lvm_img/lvm3.img bs=1024 seek=2097152
root # dd if=/dev/null of=/var/lib/lvm_img/lvm4.img bs=1024 seek=2097152

Check which loopback devices are available:

root # losetup -a

We assume all loopback devices are available and create our hard drives:

root # losetup /dev/loop0 /var/lib/lvm_img/lvm0.img
root # losetup /dev/loop1 /var/lib/lvm_img/lvm1.img
root # losetup /dev/loop2 /var/lib/lvm_img/lvm2.img
root # losetup /dev/loop3 /var/lib/lvm_img/lvm3.img
root # losetup /dev/loop4 /var/lib/lvm_img/lvm4.img

Now we can use /dev/loop[0-4] as we would use any other hard drive in the system.

Note
On the next reboot, all the loopback devices will be released and the folder /var/lib/lvm_img can be deleted

LVM2 Linear volumes

Linear volumes are the most common kind of LVM volume. A linear volume can consume all or part of a VG. LVM attempts to allocate the LV as physically contiguously as possible: if there is a PV large enough to hold the entire LV, LVM will allocate it there, otherwise it will split the LV into as few pieces as possible.

A linear volume is actually implemented as a degenerate stripe set (containing a single stripe).

Creating a linear volume

To create a linear volume:

root # pvcreate /dev/loop[0-2]
root # vgcreate vg00 /dev/loop[0-2]
root # lvcreate -L3G -n lvm_stripe1 vg00
root # lvcreate -L2G -n lvm_stripe2 vg00

The linear volume is the default type.

root # pvscan
  PV /dev/loop0   VG vg00            lvm2 [2.00 GiB / 0    free]
  PV /dev/loop1   VG vg00            lvm2 [2.00 GiB / 1012.00 MiB free]
  PV /dev/loop2   VG vg00            lvm2 [2.00 GiB / 0    free]

LVM allocated the first LV to use all of the first PV and part of the second, and the second LV to use all of the third PV.

Because linear volumes have no special requirements, they are the easiest to manipulate and can be resized and relocated at will. If an LV is allocated across multiple PVs and any of the PVs becomes unavailable, that LV cannot be started and will be unusable.

/etc/fstab

Here is an example of an entry in fstab (using ext4):

File/etc/fstab

/dev/vg0/lvol1  /mnt/data  ext4  noatime  0 2

For thin volumes, add the discard option:

File/etc/fstab

/dev/vg0/lvol1  /mnt/data  ext4  noatime,discard  0 2

LVM2 Snapshots and LVM2 Thin Snapshots

A snapshot is an LV that acts as a copy of another LV: it records the changes made to the original LV, so the content of that LV at the time of the snapshot remains accessible. We once again use our two hard drives and create LV lvol1, this time with 60% of VG vg0:

root # pvcreate /dev/loop[0-1]
root # vgcreate vg0 /dev/loop[0-1]
root # lvcreate -l 60%VG -n lvol1 vg0
root # mkfs.ext4 /dev/vg0/lvol1
root # mount /dev/vg0/lvol1 /mnt/data

LVM2 Snapshots

Now we create a snapshot of lvol1 named 08092011_lvol1 and give it 10% of VG vg0:

root # lvcreate -l 10%VG -s -n 08092011_lvol1 /dev/vg0/lvol1
Important
If a snapshot exceeds its maximum size, it disappears

Mount our snapshot somewhere else:

root # mkdir /mnt/08092011_data
root # mount /dev/vg0/08092011_lvol1 /mnt/08092011_data

We could now access data in lvol1 from a previous state.

LVM2 snapshots are writable LVs; we could use them to let a project branch off into two different directions:

root # lvcreate -l 10%VG -s -n project1_lvol1 /dev/vg0/lvol1
root # lvcreate -l 10%VG -s -n project2_lvol1 /dev/vg0/lvol1
root # mkdir /mnt/project1 /mnt/project2
root # mount /dev/vg0/project1_lvol1 /mnt/project1
root # mount /dev/vg0/project2_lvol1 /mnt/project2

Now we have three different versions of LV lvol1: the original and two snapshots, which can be used in parallel; changes are written to the snapshots.

Note
the original LV lvol1 cannot be reduced in size or removed if snapshots of it exist. Snapshots can be increased in size without growing the file system on them, but they cannot exceed the size of the original LV
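
For example, to give a snapshot more room before it fills up:

root # lvextend -L+100M /dev/vg0/08092011_lvol1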

LVM2 Thin Snapshots

Note
A thin snapshot can only be taken of a thin origin. The thin device mapper target DOES support a thin snapshot of a read-only non-thin origin, but LVM2 does not. It is, however, possible to create a non-thin snapshot of a thin origin.

Creating a thin snapshot is simple:

root # lvcreate -s -n 08092011_lvol1 /dev/vg0/lvol1

Note how no size is specified with -l/-L, nor a virtual size with -V. Snapshots have the same virtual size as their origin and a physical size of 0, like all new thin volumes. This also means it is not possible to limit the physical size of the snapshot. Thin snapshots are writable just like regular snapshots.

Important
If -l/-L is specified, a snapshot will still be created, but the resulting snapshot will be a regular snapshot, not a thin snapshot

Recursive snapshots can be created:

root # lvcreate -s -n 08092012_lvol1 /dev/vg0/08092011_lvol1

Thin snapshots have several advantages over regular snapshots. First, thin snapshots are independent of their origins once created. The origin can be shrunk or deleted without affecting the snapshot. Second, thin snapshots can be efficiently created recursively (snapshots of snapshots) without the "chaining" overhead of regular recursive LVM snapshots.

LVM2 Rollback Snapshots

To roll back the logical volume to the version of the snapshot, use the following command:

root # lvconvert --merge /dev/vg0/08092011_lvol1

This might take a couple of minutes, depending on the size of the volume.

Important
the snapshot will disappear and this operation is not reversible

LVM2 Thin Rollback Snapshots

For thin volumes, lvconvert --merge does not work. Instead, delete the origin and rename the snapshot:

root # umount /dev/vg0/lvol1
root # lvremove /dev/vg0/lvol1
root # lvrename vg0/08092011_lvol1 lvol1

LVM2 Mirrors

LVM supports mirrored volumes, which provide fault tolerance in the event of a drive failure. Unlike RAID 1, there is no performance benefit: all reads and writes are delivered to a single "leg" of the mirror. One additional PV is required for each mirror.

Mirrors support 3 kinds of logs:

Note
For all mirror log types except core, LVM prefers (and sometimes insists) that the mirror log be kept on a PV that does not contain the mirrored LVs. If it is desired to have the mirror log on the same PV as the mirrored LVs themselves, and LVM insists on a separate PV for the log, add the --alloc anywhere parameter.
  • Disk mirror logs store the state of the mirror on disk in extra metadata extents. LVM keeps track of what has been mirrored and can pick up where it left off if the sync is incomplete. This is the default.
  • Mirrored logs are disk logs that are themselves mirrored.
  • Core mirror logs record the state of the mirror in memory only. LVM has to rebuild the mirror every time it is activated. Useful for temporary mirrors.

Creating a mirror LV

To create an LV with a single mirror:

root # pvcreate /dev/loop[0-1]
root # vgcreate vg00 /dev/loop[0-1]
root # lvcreate -m 1 --mirrorlog core -l 40%VG --nosync vg00
WARNING: New mirror won't be synchronised. Don't read what you didn't write!

The -m 1 indicates we want to create 1 (additional) mirror, requiring 2 PVs. The --nosync option is an optimization: without it, LVM will try to synchronize the mirror by copying empty sectors from one LV to the other.

Creating a mirror of an existing LV

It is possible to create a mirror of an existing LV:

root # pvcreate /dev/loop[0-1]
root # vgcreate vg00 /dev/loop[0-1]
root # lvcreate -l 40%VG vg00
root # lvconvert -m 1 --mirrorlog core -b vg00/lvol0

This mirrors the existing LV onto a different PV. The -b option puts the operation into the background, as mirroring an LV can take a long time.

Removing a mirror of an existing LV

To remove the mirror, set the number of mirrors to 0:

root # lvconvert -m0 vg00/lvol0

Failed mirrors

To simulate a failure:

Warning
Mirror failures can cause the device mapper to deadlock, requiring a reboot
root # vgchange -an vg00
root # losetup -d /dev/loop1
root # rm /var/lib/lvm_img/lvm1.img
root # dd if=/dev/null of=/var/lib/lvm_img/lvm1.img bs=1024 seek=2097152
root # losetup /dev/loop1 /var/lib/lvm_img/lvm1.img

If part of the mirror is unavailable (usually because the disk containing the PV has failed), the VG will need to be brought up in degraded mode:

root # vgchange -ay --partial vg00

On the first write, LVM will notice the mirror is broken. The default policy ("remove") is to automatically reduce/break the mirror according to the number of pieces available. A 3-way mirror with a missing PV will be reduced to 2-way mirror; a 2-way mirror will be reduced to a regular linear volume. If the failure is only transient, and the missing PV returns after LVM has broken the mirror, the mirrored LV will need to be recreated on it.

To recover the mirror, the failed PV needs to be removed from the VG and a replacement one added (or, if the VG has a free PV, the mirror can be recreated on a different PV); the mirror is then recreated with lvconvert and the old PV removed from the VG:

root # pvcreate /dev/loop1
root # vgextend vg00 /dev/loop1
root # lvconvert -b -m 1 --mirrorlog disk vg00/lvol0
root # vgreduce --removemissing vg00

It is possible to have LVM recreate the mirror with free extents on a different PV if a "leg" fails; to do that, set mirror_image_fault_policy to "allocate" in lvm.conf.
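
A minimal sketch of that setting (it lives in the activation section of /etc/lvm/lvm.conf; the rest of the section is omitted):

File/etc/lvm/lvm.conf

activation {
    # automatically allocate a new mirror leg from free extents on another PV
    mirror_image_fault_policy = "allocate"
}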

Thin mirrors

It is not (yet) possible to create a mirrored thin pool or thin volume directly. A mirrored thin pool can still be created by making a normal mirrored LV and then converting it to a thin pool with lvconvert. Two LVs are required: one for the thin pool and one for the thin metadata; the conversion process merges them into a single LV.

Warning
LVM 2.02.98 or above is required for this to work properly. Prior versions are either not capable or will segfault and corrupt the VG. Also, conversion of a mirror into a thin pool destroys all existing data in the mirror!
root # lvcreate -m 1 --mirrorlog mirrored -l40%VG -n thin_pool vg00
root # lvcreate -m 1 --mirrorlog mirrored -L4MB -n thin_meta vg00
root # lvconvert --thinpool vg00/thin_pool --poolmetadata vg00/thin_meta

LVM2 RAID 0/Stripeset

Important
If a linear volume suffers a disk failure, a giant, contiguous "hole" is created, and it may be possible to recover data from outside that hole. If a striped volume suffers a disk failure, then instead of a contiguous hole the damage is closer to Swiss cheese; the chances of recovering anything are slim to none.

Instead of a linear volume, where multiple contiguous areas are appended, it is possible to create a striped or RAID 0 volume for better performance.

Creating a stripe set

To create a 3-PV striped volume:

root # pvcreate /dev/loop[0-2]
root # vgcreate vg00 /dev/loop[0-2]
root # lvcreate -i 3 -l 20%VG -n lvm_stripe vg00
Using default stripesize 64.00 KiB

The -i option indicates how many PVs to stripe over, in this case 3.

root # pvscan
  PV /dev/loop0   VG vg00            lvm2 [2.00 GiB / 1.60 GiB free]
  PV /dev/loop1   VG vg00            lvm2 [2.00 GiB / 1.60 GiB free]
  PV /dev/loop2   VG vg00            lvm2 [2.00 GiB / 1.60 GiB free]

On each PV 400MB got reserved for LV lvm_stripe in VG vg00

It is possible to mirror a stripe set. The -i and -m options can be combined to create a striped mirror:

root # lvcreate -i 2 -m 1 -l 10%VG vg00

This creates a 2 PV stripe set and mirrors it on 2 different PVs, for a total of 4 PVs. An existing stripe set can be mirrored with lvconvert.

A thin pool can be striped like any other LV. All the thin volumes created from the pool inherit that setting; do not specify it manually when creating a thin volume.

It is not possible to stripe an existing volume, nor reshape the stripes across more/less PVs, nor to convert to a different RAID level/linear volume. A stripe set can be mirrored. It is possible to extend a stripe set across additional PVs, but they must be added in multiples of the original stripe set (which will effectively linearly append a new stripe set), or --alloc anywhere must be specified (which can hurt performance). In the above example, 3 additional PVs would be required without --alloc anywhere.

LVM2 RAID 1

Unlike RAID 0, which is striping, RAID 1 is mirroring, but it is implemented differently from the original LVM mirror. Under RAID 1, reads are spread out across the PVs, improving performance. RAID 1 mirror failures do not cause I/O to block, because LVM does not need to break the mirror on write.

Anywhere an LVM mirror could be used, a RAID 1 mirror can be used in its place. It is possible to have LVM create RAID 1 mirrors instead of regular mirrors implicitly by setting mirror_segtype_default in lvm.conf to raid1.
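
A minimal sketch of that setting (it lives in the global section of /etc/lvm/lvm.conf):

File/etc/lvm/lvm.conf

global {
    # create RAID 1 LVs whenever a mirrored LV is requested
    mirror_segtype_default = "raid1"
}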

Creating RAID 1 LV

To create an LV with a single mirror:

root # pvcreate /dev/loop[0-1]
root # vgcreate vg00 /dev/loop[0-1]
root # lvcreate -m 1 --type raid1 -l 40%VG --nosync -n lvm_raid1 vg00
WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
root # pvscan
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/loop0 vg00   lvm2 a--    2.00g 408.00m
  /dev/loop1 vg00   lvm2 a--    2.00g 408.00m

On each PV, about 1.6G got reserved for LV lvm_raid1 in VG vg00

Note the differences from creating an LVM mirror: there is no mirror log specified, because RAID 1 LVs do not use an explicit mirror log - it is built into the LV. Second, --type raid1 is added; it was not needed for the LVM mirror before. Also note the similarities: -m 1 for a single mirror (-i 1 works too for RAID 1, unlike an LVM mirror), and --nosync to skip the initial sync.

Converting existing LV to RAID 1

It is possible to convert an existing LV to RAID 1:

root # pvcreate /dev/loop[0-1]
root # vgcreate vg00 /dev/loop[0-1]
root # lvcreate -n lvm_raid1 -l20%VG vg00
root # lvconvert -m 1 --type raid1 -b vg00/lvm_raid1

Conversion is similar to creating a mirror from an existing LV.

Removing a RAID 1 mirror

To remove a RAID 1 mirror, set the number of mirrors to 0:

root # lvconvert -m0 vg00/lvm_raid1

Same as an LVM mirror

Failed RAID 1

Simulating a failure is the same as an LVM mirror

If part of the RAID1 is unavailable (usually because the disk containing the PV has failed), the VG will need to be brought up in degraded mode:

root # vgchange -ay --partial vg00

Unlike an LVM mirror, writing to a RAID 1 with a missing part does NOT break the mirroring. If the failure is only transient and the missing PV returns, LVM will resync the mirror by copying over only the out-of-date segments instead of the entire LV.

To recover the RAID 1, the failed PV needs to be removed from the VG and a replacement one added (or, if the VG has a free PV, the repair can use a different PV); the mirror is then repaired with lvconvert and the old PV removed from the VG:

root # pvcreate /dev/loop1
root # vgextend vg00 /dev/loop1
root # lvconvert --repair -b vg00/lvm_raid1
root # vgreduce --removemissing vg00

Thin RAID1

It is not (yet) possible to create a RAID 1 thin pool or thin volume directly. A RAID 1 thin pool can still be created by making a normal RAID 1 LV and then converting it to a thin pool with lvconvert. Two LVs are required: one for the thin pool and one for the thin metadata; the conversion process merges them into a single LV.

Warning
LVM 2.02.98 or above is required for this to work properly. Prior versions are either not capable or will segfault and corrupt the VG. Also, conversion of a RAID 1 into a thin pool destroys all existing data in the mirror!
root # lvcreate -m 1 --type raid1 -l40%VG -n thin_pool vg00
root # lvcreate -m 1 --type raid1 -L4MB -n thin_meta vg00
root # lvconvert --thinpool vg00/thin_pool --poolmetadata vg00/thin_meta

LVM2 Stripeset with Parity (RAID4 and RAID5)

Note
A stripeset with parity requires at least 3 PVs

RAID 0 is not fault tolerant: if any of the PVs fails, the LV is unusable. By adding a parity stripe to RAID 0, the LV can still function with a single missing PV. A new PV can then be added to restore fault tolerance.

Stripesets with parity come in 2 flavors: RAID 4 and RAID 5. Under RAID 4, all the parity stripes are stored on the same PV. That PV can become a bottleneck because all writes hit it, and this gets worse the more PVs are in the array. With RAID 5, the parity data is distributed evenly across the PVs, so no single PV is a bottleneck. For that reason RAID 4 is rare and considered obsolete/historical, and in practice all stripesets with parity are RAID 5.

Creating a RAID5 LV

root # pvcreate /dev/loop[0-2]
root # vgcreate vg00 /dev/loop[0-2]
root # lvcreate --type raid5 -l 20%VG -i 2 -n lvm_raid5 vg00

As with a RAID 0/stripe set without parity, the -i option is used to specify the number of PVs to stripe over. However, only the data PVs are specified with -i; LVM adds the parity one automatically. Thus for a 3 PV RAID 5 it is -i 2, not -i 3.

root # pvscan
  PV /dev/loop0   VG vg00            lvm2 [2.00 GiB / 1.39 GiB free]
  PV /dev/loop1   VG vg00            lvm2 [2.00 GiB / 1.39 GiB free]
  PV /dev/loop2   VG vg00            lvm2 [2.00 GiB / 1.39 GiB free]

On each PV about 600MB got reserved for LV lvm_raid5 in VG vg00

Recovering from a failed RAID5

To simulate a failure:

root # vgchange -an vg00
root # losetup -d /dev/loop1
root # rm /var/lib/lvm_img/lvm1.img
root # dd if=/dev/null of=/var/lib/lvm_img/lvm1.img bs=1024 seek=2097152
root # losetup /dev/loop1 /var/lib/lvm_img/lvm1.img

The VG will need to be brought up in degraded mode

root # vgchange -ay --partial vg00

The volume will work normally at this point; however, the array is degraded to RAID 0 until a replacement PV is added. Performance is unlikely to be affected while the array is degraded: while the missing data does need to be recomputed via parity, that only requires a simple XOR of the parity block with the remaining data. The overhead is negligible compared to the disk I/O.

To repair the RAID5:

root # pvcreate /dev/loop1
root # vgextend vg00 /dev/loop1
root # lvconvert --repair vg00/lvm_raid5
root # vgreduce --removemissing vg00

It is possible to replace a still working PV in a RAID 5 as well:

root # pvcreate /dev/loop3
root # vgextend vg00 /dev/loop3
root # lvconvert --replace /dev/loop1 vg00/lvm_raid5
root # vgreduce vg00 /dev/loop1

The same restrictions of stripe sets apply to stripe sets with parity as well: It is not possible to stripe with parity an existing volume, nor reshape the stripes with parity across more/less PVs, nor to convert to a different RAID level/linear volume. A stripe set with parity can be mirrored. It is possible to extend a stripe set with parity across additional PVs, but they must be added in multiples of the original stripe set with parity (which will effectively linearly append a new stripe set with parity), or --alloc anywhere must be specified (which can hurt performance). In the above example, 3 additional PVs would be required without --alloc anywhere.

Thin RAID5 LV

It is not (yet) possible to create a stripe set with parity (RAID 5) thin pool or thin volume directly. A RAID 5 thin pool can still be created by making a normal RAID 5 LV and then converting it into a thin pool with lvconvert. Two LVs are required: one for the thin pool and one for the thin metadata; the conversion process merges them into a single LV.

Warning
LVM 2.02.98 or above is required for this to work properly. Prior versions are either not capable or will segfault and corrupt the VG. Also, conversion of a RAID5 LV into a thin pool destroys all existing data in the LV!
root # lvcreate --type raid5 -i 2 -l20%VG -n thin_pool vg00
root # lvcreate --type raid5 -i 2 -L4MB -n thin_meta vg00
root # lvconvert --thinpool vg00/thin_pool --poolmetadata vg00/thin_meta

LVM2 RAID 6

Note
RAID6 requires at least 5 PVs

RAID 6 is similar to RAID 5, however RAID 6 can survive up to TWO PV failures, thus offering more fault tolerance than RAID5 at the expense of an extra PV.

Creating a RAID6 LV

root # pvcreate /dev/loop[0-4]
root # vgcreate vg00 /dev/loop[0-4]
root # lvcreate --type raid6 -l 20%VG -i 3 -n lvm_raid6 vg00

Like RAID5, the -i option is used to specify the number of PVs to stripe over, excluding the 2 PVs used for parity. Thus for a 5 PV RAID6 it is -i 3, not -i 5.

root # pvscan
  PV /dev/loop0   VG vg00     lvm2 [2.00 GiB / 1.32 GiB free]
  PV /dev/loop1   VG vg00     lvm2 [2.00 GiB / 1.32 GiB free]
  PV /dev/loop2   VG vg00     lvm2 [2.00 GiB / 1.32 GiB free]
  PV /dev/loop3   VG vg00     lvm2 [2.00 GiB / 1.32 GiB free]
  PV /dev/loop4   VG vg00     lvm2 [2.00 GiB / 1.32 GiB free]

On each PV about 680MB got reserved for LV lvm_raid6 in VG vg00

Recovering from a failed RAID6

Recovery for RAID6 is the same as RAID5. A RAID6 LV with a single failure reduces to RAID5. A RAID6 LV with 2 failures reduces to RAID0. It is left as an exercise to the reader to simulate a 2 PV failure.

Unlike RAID5, where the parity block is cheap to recompute compared to the disk I/O, this is only half true for RAID6. RAID6 uses 2 parity stripes: one is computed the same way as in RAID5 (a simple XOR), while the second parity stripe is much harder to compute[4].

The same restrictions of stripe sets with parity apply to RAID6 as well: It is not possible to RAID6 an existing volume, nor reshape a RAID6 across more/less PVs, nor to convert to a different RAID level/linear volume. A RAID6 can be mirrored. It is possible to extend a RAID6 across additional PVs, but they must be added in multiples of the original RAID6 (which will effectively linearly append a new RAID6), or --alloc anywhere must be specified (which can hurt performance). In the above example, 5 additional PVs would be required without --alloc anywhere.

Thin RAID6 LV

It is not (yet) possible to create a RAID6 thin pool or thin volume directly. A RAID6 thin pool can still be created by making a normal RAID6 LV and then converting it into a thin pool with lvconvert. Two LVs are required: one for the thin pool and one for the thin metadata; the conversion process merges them into a single LV.

Warning
LVM 2.02.98 or above is required for this to work properly. Prior versions are either not capable or will segfault and corrupt the VG. Also, conversion of a RAID6 LV into a thin pool destroys all existing data in the LV!
root # lvcreate --type raid6 -i 3 -l20%VG -n thin_pool vg00
root # lvcreate --type raid6 -i 3 -L4MB -n thin_meta vg00
root # lvconvert --thinpool vg00/thin_pool --poolmetadata vg00/thin_meta

LVM RAID10

Note
RAID10 requires at least 4 PVs. Also, LVM syntax requires that the number of PVs be a multiple of the number of stripes and mirrors, even though the RAID10 format itself does not.

RAID10 is a combination of RAID0 and RAID1. It is more powerful than RAID 0+RAID 1 because the mirroring is done at the stripe level instead of the LV level, and therefore the layout need not be symmetric. A RAID10 volume can tolerate at least one missing PV, and possibly more.

Creating a RAID10 LV

Note
LVM currently limits RAID10 to a single mirror.
root # pvcreate /dev/loop[0-3]
root # vgcreate vg00 /dev/loop[0-3]
root # lvcreate --type raid10 -l 1020 -i 2 -m 1 --nosync -n lvm_raid10 vg00
  Using default stripesize 64.00 KiB
  WARNING: New raid10 won't be synchronised. Don't read what you didn't write!

Both the -i AND -m options are specified: -i is the number of stripes and -m is the number of mirrors. 2 stripes and 1 mirror require 4 PVs. --nosync is an optimization to skip the initial copy.

root # pvscan
  PV         VG     Fmt  Attr PSize   PFree 
  /dev/loop0 vg00   lvm2 a--    2.00g     0 
  /dev/loop1 vg00   lvm2 a--    2.00g     0 
  /dev/loop2 vg00   lvm2 a--    2.00g     0 
  /dev/loop3 vg00   lvm2 a--    2.00g     0 

On each PV 2G got reserved for LV lvm_raid10 in VG vg00

Recovering from a failed RAID10

For a single failed PV, recovery for RAID10 is the same as for RAID5. In the example above, LVM chose to stripe over PVs loop0 and loop2 and mirror onto loop1 and loop3. The resulting array can tolerate the loss of any one PV, or of 2 PVs if they are in different mirror pairs (0/2, 0/3, 1/2, 1/3, but not 0/1 or 2/3).

The same restrictions of stripe sets apply to RAID10 as well: it is not possible to convert an existing volume to RAID10, nor to reshape the RAID10 across more/fewer PVs, nor to convert it to a different RAID level/linear volume. It is possible to extend a RAID10 across additional PVs, but they must be added in multiples of the original RAID10 (which will effectively linearly append a new RAID10), or --alloc anywhere must be specified (which can hurt performance). In the above example, 4 additional PVs would be required without --alloc anywhere.

Thin RAID 10

It is not (yet) possible to create a RAID10 thin pool or thin volume directly. A RAID10 thin pool can still be created by making a normal RAID10 LV and then converting it into a thin pool with lvconvert. Two LVs are required: one for the thin pool and one for the thin metadata; the conversion process merges them into a single LV.

Warning
Conversion of a RAID10 LV into a thin pool destroys all existing data in the LV!
root # lvcreate -i 2 -m 1 --type raid10 -l 1012 -n thin_pool vg00
root # lvcreate -i 2 -m 1 --type raid10 -l 6 -n thin_meta vg00
root # lvconvert --thinpool vg00/thin_pool --poolmetadata vg00/thin_meta

Troubleshooting

LVM has only mirroring and snapshots to provide some level of redundancy. However, there are certain situations where it might be possible to restore a lost PV or LV.

vgcfgrestore utility

By default, on any change to an LVM PV, VG or LV, LVM2 creates a backup file of the metadata in /etc/lvm/archive. These files can be used to recover from an accidental change (like deleting the wrong LV). LVM also keeps a backup copy of the most recent metadata in /etc/lvm/backup; these can be used to restore metadata to a replacement disk, or to repair corrupted metadata.

To see which states of the VG are available to be restored (this is just partial output):

root # vgcfgrestore --list vg00
  File:		/etc/lvm/archive/vg00_00042-302371184.vg
  VG name:    	vg00
  Description:	Created *before* executing 'lvremove vg00/lvm_raid1'
  Backup Time:	Sat Jul 13 01:41:32 201

Recovering an accidentally deleted LV

Suppose LV lvm_raid1 was accidentally removed from VG vg00. It is possible to recover it:

root # vgcfgrestore -f /etc/lvm/archive/vg00_00042-302371184.vg vg00
Important
vgcfgrestore only restores LVM metadata, NOT the data inside the LV. However, pvremove, vgremove and lvremove only wipe metadata, leaving any data intact. Note that if issue_discards is set in /etc/lvm/lvm.conf, these commands ARE destructive to data.
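
issue_discards lives in the devices section of /etc/lvm/lvm.conf and defaults to 0 (off); a minimal sketch:

File/etc/lvm/lvm.conf

devices {
    # when set to 1, discards are sent to the underlying device when an LV is
    # removed or shrunk, which makes this kind of recovery impossible
    issue_discards = 0
}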

Replacing a failed PV

In the above examples, when a disk containing a PV failed, an "add/remove" technique was used: a new PV was created on a new disk, the VG extended onto it, the LV repaired, and the old PV removed from the VG. However, it is also possible to do a true "replace" and recreate the metadata on the new disk to be the same as on the old disk.

Following the above example for a failed RAID 1:

root # vgdisplay --partial --verbose
  --- Physical volumes ---
  PV Name               /dev/loop0     
  PV UUID               iLdp2U-GX3X-W2PY-aSlX-AVE9-7zVC-Cjr5VU
  PV Status             allocatable
  Total PE / Free PE    511 / 102
   
  PV Name               unknown device     
  PV UUID               T7bUjc-PYoO-bMqI-53vh-uxOV-xHYv-0VejBY
  PV Status             allocatable
  Total PE / Free PE    511 / 102

The important information here is the UUID of the PV shown as "unknown device".

root # pvcreate --uuid T7bUjc-PYoO-bMqI-53vh-uxOV-xHYv-0VejBY --restorefile /etc/lvm/backup/vg00 /dev/loop1
  Couldn't find device with uuid T7bUjc-PYoO-bMqI-53vh-uxOV-xHYv-0VejBY.
  Physical volume "/dev/loop1" successfully created

This recreates the PV metadata, but not the missing LV or VG data on the PV.

root # vgcfgrestore -f /etc/lvm/backup/vg00 vg00
  Restored volume group vg00

This now reconstructs all the missing metadata on the PV, including the LV and VG data. However, it does not restore the data itself, so the mirror is out of sync.

root # vgchange -ay vg00
  device-mapper: reload ioctl on  failed: Invalid argument
  1 logical volume(s) in volume group "vg00" now active
root # lvchange --resync vg00/lvm_raid1
Do you really want to deactivate logical volume lvm_raid1 to resync it? [y/n]: y

This will resync the mirror. This works with RAID 4,5 and 6 as well.

Deactivate LV

You can deactivate an LV with the following command:

root # umount /dev/vg0/lvol1
root # lvchange -a n /dev/vg0/lvol1

You will not be able to mount the LV anywhere until it is reactivated:

root # lvchange -a y /dev/vg0/lvol1

External resources