Device-mapper

Normally, users rarely use dmsetup directly; it is a very low-level and difficult tool to use. LVM or mdtool is generally the preferred way to do things, as those tools take care of saving the metadata and issuing the dmsetup commands for you. However, sometimes one wants to deal with it directly: for recovery purposes, or because LVM doesn't yet support what you want.

Create
The create command activates a new device mapper device, which appears in /dev/mapper. In addition, if the target has metadata, it reads it; or, if this is its first use, it initializes the metadata devices. Note that prior device mapper devices can be passed as parameters (if the target takes a device), thus it is possible to "stack" them. The syntax is:
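  dmsetup create <name> --table "<logical_start_sector> <num_sectors> <target> <target args>"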

Remove
The remove command deactivates a device mapper device and removes it from /dev/mapper. Note it is not possible to remove a device that's in use; the -f option may be passed to replace the target with one that fails all I/O, hopefully allowing the reference count to drop to 0. Syntax is:
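  dmsetup remove <name>
  dmsetup remove -f <name>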

Message
The message command sends a message to the device. Which messages are supported depends on the target. The sector parameter tends not to be used and is almost always 0. Syntax is:
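  dmsetup message <device> <sector> <message>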

Suspend
The suspend command stops any NEW I/O. Existing I/O will still be completed. This can be used to quiesce a device. Syntax is:
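  dmsetup suspend <name>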

Resume
The resume command allows I/O to be submitted to a previously suspended device. Syntax is:
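  dmsetup resume <name>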

Reload
The reload command replaces the table of an existing device, possibly with new targets and/or parameters. The new table does not become live until the device has been suspended and resumed. Syntax is the same as create; for example:
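  dmsetup reload <name> --table "<logical_start_sector> <num_sectors> <target> <target args>"
  dmsetup suspend <name>      # the new table becomes live when the device is resumed
  dmsetup resume <name>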

Zero
See Documentation/device-mapper/zero.txt. This target has no target-specific parameters.

The "zero" target creates a device that functions similarly to /dev/zero: all reads return binary zero, and all writes are discarded. It is normally used in tests, but it is also useful in recovering linear and RAID-type targets when combined with the snapshot target: a "zero" target of the same size as the missing piece(s) is created, and a (writable) snapshot is created on top of it (usually backed by a loop device over a large sparse file, which can be far smaller than the missing piece since it only has to hold the changes). The snapshot can then be mounted, fsck'd, or have recovery tools run against it.

This creates a 1GB (1953125-sector) zero target:
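  # the device name test-zero is arbitrary
  dmsetup create test-zero --table "0 1953125 zero"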

Linear
See Documentation/device-mapper/linear.txt for parameters and usage. This target is the basic building block of the device mapper - it is used to both join and split (and often both at once) block devices. For a simple identity mapping:
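  # assuming /dev/loop0 is one of the 1 GB (1953125-sector) disks
  dmsetup create test-linear --table "0 1953125 linear /dev/loop0 0"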

The 4 disks can be joined together as one:
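  # assuming /dev/loop0 through /dev/loop3 are the four 1 GB disks; the table is fed via stdin
  echo "0 1953125 linear /dev/loop0 0
  1953125 1953125 linear /dev/loop1 0
  3906250 1953125 linear /dev/loop2 0
  5859375 1953125 linear /dev/loop3 0" | dmsetup create test-joined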

Note the peculiar syntax on the join: the --table argument only allows single-line tables, so multi-line tables must be read from stdin. Also notice that the logical_start_sector is not 0 in this case, as each device being appended needs to start where the previous one ends. It's possible to split a disk, in this case into a 4 MiB (8192-sector) "small" disk and a 1 GB (1953125-sector) "large" disk:
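  # assuming /dev/loop0 here is a disk large enough to hold both pieces
  dmsetup create test-small --table "0 8192 linear /dev/loop0 0"
  dmsetup create test-large --table "0 1953125 linear /dev/loop0 8192"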

Note that in the second device, the offset is not 0, since it is desired to start 4 MiB (8192 sectors) in. Both joining and splitting can be combined:
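  # assuming each of /dev/loop0 through /dev/loop3 is 2 GB (3906250 sectors), so its last 1 GB starts at sector 1953125
  echo "0 1953125 linear /dev/loop0 1953125
  1953125 1953125 linear /dev/loop1 1953125
  3906250 1953125 linear /dev/loop2 1953125
  5859375 1953125 linear /dev/loop3 1953125" | dmsetup create test-joined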

This creates a 4 GB device using the last 1 GB of each disk.

Mirror
There is no kernel documentation for the mirror target. The parameters below were obtained from the Linux sources (drivers/md/dm-log.c and drivers/md/dm-raid1.c):

  mirror <log_type> <#log_args> <log_arg 1>...<log_arg N> <#devs> <device name 1> <offset 1>...<device name N> <offset N> <#features> <feature_1>...<feature_N>

For log_type, the two commonly used values, with different arguments, are:
 * core region_size [[no]sync]
 * disk logdevice region_size [[no]sync]

And the values of each argument:
 * region_size is the region size of the mirror in sectors. It must be a power of 2 and at least the size of a kernel page (for Intel x86/x64 processors, this is 4 KiB, or 8 sectors). This is the granularity at which the mirror is kept up to date; it's a tradeoff between increased metadata and wasted I/O. LVM uses a value of 512 KiB (1024 sectors).
 * logdevice is the device on which to store the metadata, for the disk log type
 * [no]sync is an optional argument; the default is sync. nosync skips the initial sync step, but reads of regions not written to since the mirror was established are undefined. This is appropriate when the initial devices are empty.

And there is only 1 feature:
 * handle_errors causes the mirror to respond to I/O errors. The default is to ignore all errors. LVM enables this feature.

To create a mirror with an in-memory log:
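  # a 512 KiB (1024-sector) region size, with two legs on /dev/loop0 and /dev/loop1
  dmsetup create test-mirror --table "0 1953125 mirror core 1 1024 2 /dev/loop0 0 /dev/loop1 0 1 handle_errors"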

Without a persistent log, the mirror will have to be re-synced every time it is assembled, by copying the entire block device to the other "legs". To avoid this, the log may be stored on disk:
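  # assuming /dev/loop2 is a small spare device to hold the log
  dmsetup create test-mirror --table "0 1953125 mirror disk 2 /dev/loop2 1024 2 /dev/loop0 0 /dev/loop1 0 1 handle_errors"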

It's possible to emulate LVM's "--mirrorlog mirror" by creating 2 mirrors: a core-logged mirror for the log device, and a disk-logged mirror for the data devices:
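  # a sketch only: /dev/loop2 and /dev/loop3 are assumed to hold the mirrored log, /dev/loop0 and /dev/loop1 the data
  dmsetup create test-mirror-log --table "0 8192 mirror core 1 1024 2 /dev/loop2 0 /dev/loop3 0"
  dmsetup create test-mirror --table "0 1953125 mirror disk 2 /dev/mapper/test-mirror-log 1024 2 /dev/loop0 0 /dev/loop1 0 1 handle_errors"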

RAID1
See Documentation/device-mapper/dm-raid.txt. Note that <chunk_size> is unused for RAID1, but a value is still required, therefore its value should be set to 0. There are 2 other important, though optional, parameters: region_size and [no]sync.


 * region_size has the same meaning as it does in the mirror target. Unlike the mirror target, it has a default of 4 MiB (8192 sectors). LVM uses a region size of 512 KiB (1024 sectors).
 * [no]sync has the same meaning as it does in the mirror target.

To create a simple 1 GB RAID1 with no metadata devices:
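  # 1953024 sectors is 1 GB rounded down to a multiple of 128 sectors; "-" means "no metadata device"
  dmsetup create test-raid1 --table "0 1953024 raid raid1 1 0 2 - /dev/loop0 - /dev/loop1"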

Note that because there's no metadata device, the array must be re-mirrored each time it is created, so normally a metadata device is desired. Each "leg" needs its own metadata device. If /dev/loop2 and /dev/loop3 are small metadata devices (4 MiB), then creating a 1 GB RAID1 would be:
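  # each metadata device is listed immediately before its data device
  dmsetup create test-raid1 --table "0 1953024 raid raid1 1 0 2 /dev/loop2 /dev/loop0 /dev/loop3 /dev/loop1"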

Striped (RAID 0) and RAID 4/5/6/10
See Documentation/device-mapper/striped.txt and Documentation/device-mapper/dm-raid.txt for the parameters of these targets. Three in particular are important (a worked size calculation follows the list):
 * chunk_size is the amount of I/O (in sectors) written to one device before it is "split" across to the next device in the array. It must be both a power of two and at least as large as a kernel memory page (for x86/x64 processors, pages are 4 KiB, so it must be at least 8). LVM uses a default value of 64 KiB (128 sectors). Using the LVM default, a 1 MiB (2048-sector) write will be split into 16 chunks, distributed as evenly as possible across the array. The size of the array MUST be a multiple of this value, otherwise the target will give the error "Array size does not match requested target length".
 * region_size has the same meaning and defaults as it does for the RAID1 target.
 * [no]sync has the same meaning as it does for the RAID1 target. It is usually not appropriate for RAID 4, 5 and 6, as parity must still be computed even for blank devices, unless creating a degraded array.

Because the number of sectors (1953125) is not a multiple of 128, it must be rounded down to the nearest multiple of 128 sectors, which can be done with bc. So in this case the usable size per disk is 1953024 sectors:
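  echo "(1953125 / 128) * 128" | bc     # bc truncates the integer division
  1953024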

Striped (RAID0)
Stripe sets allow multiple disks to be combined into one with improved performance. The striped target's parameters are laid out differently from the raid target's: the number of devices comes first, not the chunk size, and one must specify the offset (usually 0) of each device that makes up the stripe set. Because there are 4 disks of 1953024 sectors each, the total array size will be 7812096 sectors. To create a stripe set (RAID0):
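  # 4 stripes, 128-sector (64 KiB) chunks, each disk used from offset 0
  dmsetup create test-raid0 --table "0 7812096 striped 4 128 /dev/loop0 0 /dev/loop1 0 /dev/loop2 0 /dev/loop3 0"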

RAID4
RAID4 is a striped set that can tolerate the failure of a single disk. Because RAID4 uses a dedicated parity disk, one disk's worth of space is "unusable"; therefore the total space is 3 disks * 1953024 sectors, or 5859072 sectors total. To create a RAID4 set with no metadata devices:
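  # chunk_size 128, no metadata devices ("-")
  dmsetup create test-raid4 --table "0 5859072 raid raid4 1 128 4 - /dev/loop0 - /dev/loop1 - /dev/loop2 - /dev/loop3"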

As with RAID1, because there are no metadata devices, the parity disk will have to be rebuilt every time the array is assembled. To create a RAID4 set WITH metadata devices:
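  # assuming /dev/loop4 through /dev/loop7 are small (4 MiB) metadata devices
  dmsetup create test-raid4 --table "0 5859072 raid raid4 1 128 4 /dev/loop4 /dev/loop0 /dev/loop5 /dev/loop1 /dev/loop6 /dev/loop2 /dev/loop7 /dev/loop3"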

It is possible to create a RAID4 array in degraded mode initially: no metadata devices are specified, the missing data device is given as "-", and "nosync" must be added:
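  # the missing fourth data device is given as "-"
  dmsetup create test-raid4 --table "0 5859072 raid raid4 2 128 nosync 4 - /dev/loop0 - /dev/loop1 - /dev/loop2 - -"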

The reason for doing this is that it is faster to create a degraded array, populate it, tear it down, and then reassemble the array with the missing metadata devices and data device, so that the parity is only computed once, not twice.

RAID5
RAID5 is similar to RAID4, except in RAID5 the parity data is distributed across the stripe set. There are 4 "flavors" of RAID5. For LVM, the default is raid5_ls. The amount of parity used is the same as RAID4, so the total space is 5859072 sectors. To create a RAID5 set with no metadata devices:
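  dmsetup create test-raid5 --table "0 5859072 raid raid5_ls 1 128 4 - /dev/loop0 - /dev/loop1 - /dev/loop2 - /dev/loop3"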

To create a RAID5 with metadata:
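  # again assuming /dev/loop4 through /dev/loop7 are the metadata devices
  dmsetup create test-raid5 --table "0 5859072 raid raid5_ls 1 128 4 /dev/loop4 /dev/loop0 /dev/loop5 /dev/loop1 /dev/loop6 /dev/loop2 /dev/loop7 /dev/loop3"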

To create a degraded RAID5:
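  dmsetup create test-raid5 --table "0 5859072 raid raid5_ls 2 128 nosync 4 - /dev/loop0 - /dev/loop1 - /dev/loop2 - -"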

RAID6
RAID6 is a stripe set that can tolerate the failure of up to 2 disks. Like RAID5, parity is distributed across the stripe set. There are 3 "flavors" of RAID 6. For LVM, the default is "raid6_zr". The total available space is 3906048 sectors. To create a RAID6 set with no metadata devices:
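  dmsetup create test-raid6 --table "0 3906048 raid raid6_zr 1 128 4 - /dev/loop0 - /dev/loop1 - /dev/loop2 - /dev/loop3"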

To create a RAID6 with metadata:
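  dmsetup create test-raid6 --table "0 3906048 raid raid6_zr 1 128 4 /dev/loop4 /dev/loop0 /dev/loop5 /dev/loop1 /dev/loop6 /dev/loop2 /dev/loop7 /dev/loop3"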

To create a degraded RAID6:
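  # two device slots are given as "-"
  dmsetup create test-raid6 --table "0 3906048 raid raid6_zr 2 128 nosync 4 - /dev/loop0 - /dev/loop1 - - - -"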

Note that 2 devices are left out instead of 1.

RAID10
RAID10 combines mirroring (RAID 1) and striping (RAID 0). Note this is better than stacking a RAID1 on top of a RAID0 (or vice versa): for example, it is possible to do RAID10 on an odd number of disks. Half the disks are lost to the mirror, so the total available space is 3906048 sectors. To create a RAID10 set with no metadata devices:
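  dmsetup create test-raid10 --table "0 3906048 raid raid10 1 128 4 - /dev/loop0 - /dev/loop1 - /dev/loop2 - /dev/loop3"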

To create a RAID10 set with metadata:
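  dmsetup create test-raid10 --table "0 3906048 raid raid10 1 128 4 /dev/loop4 /dev/loop0 /dev/loop5 /dev/loop1 /dev/loop6 /dev/loop2 /dev/loop7 /dev/loop3"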

If all the devices are empty, nosync may be used to skip the initial sync, with the same caveats as for the mirror target.

Thin
See Documentation/device-mapper/thin-provisioning.txt for the parameters of this target. Thin pools are to block devices what sparse files are to filesystems. It is possible to create large, empty volumes, even larger than the pool itself (or with a combined size greater than the pool size); space isn't allocated until something is actually written to those areas. Furthermore, blocks can be returned to the thin pool via the trim/discard operation. Thin pools have a cheap snapshot operation (different from the snapshot target) that remains cheap even across multiple layers of indirection (snapshots of snapshots of snapshots...).

The thin-pool target has 3 important parameters:
 * metadata_dev is where to store the metadata for the thin pool. The recommended size is 3*(data_dev_size/(32*data_block_size)) sectors, but at least 2 MiB (4096 sectors). The thin-provisioning-tools package has a program, thin_metadata_size, that will compute a suitable metadata device size given the data_block_size, data_dev_size, and number of volumes in the pool. The maximum supported size of the metadata device is 15.9375 GiB (33423360 sectors).
 * data_block_size controls the granularity of the thin pool. Data is allocated in blocks of this size. It must be at least 64 KiB (128 sectors) and a multiple of 64 KiB (128 sectors).
 * low_water_mark is a lower boundary of free space within the pool. If the free space drops below this, a message is sent. Setting it to 0 disables it.

Using the thin_metadata_size utility of thin-provisioning-tools:
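  # 128-sector blocks, a 1953024-sector pool, up to 100 volumes; prints the estimated metadata size in sectors
  thin_metadata_size -b 128s -s 1953024s -m 100 -u s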

Even anticipating 100 volumes within the pool, it is still less than the minimum recommended 2 MiB (4096 sectors). To create the pool:
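  # assuming /dev/loop0 is the 1 GB (1953024-sector) data device and /dev/loop1 a small (4 MiB) metadata device
  dmsetup create test-pool --table "0 1953024 thin-pool /dev/loop1 /dev/loop0 128 0"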

Creating thin volumes
The thin-pool target is unusual among targets in that it does not produce a usable disk by itself. Instead, by sending messages to it, it produces more device-mapper devices which can be used for storage via the thin target. Volumes within a pool are referred to by a 24-bit ordinal. Note that there isn't a way to ask the pool which ordinals are in use. To create a new thin volume with ordinal 17:
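  # "test-pool" is the pool device created above
  dmsetup message test-pool 0 "create_thin 17"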

This allocates an ordinal but no storage. However, it's possible to use the ordinal with the thin target to create a 200 MB (390625-sector) thin volume:
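  dmsetup create test-thin --table "0 390625 thin /dev/mapper/test-pool 17"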

Creating internal thin snapshots
Thin snapshots can be created of a thin volume. First, the volume must be quiesced:
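  dmsetup suspend test-thin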

Then the snapshot is taken. An ordinal needs to be allocated for it; for this example 6 will be chosen:
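  # create_snap <new ordinal> <origin ordinal>
  dmsetup message test-pool 0 "create_snap 6 17"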

The volume can be resumed after the snapshot is taken:
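  dmsetup resume test-thin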

The snapshot can now be activated like any other thin volume:
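  dmsetup create test-snap --table "0 390625 thin /dev/mapper/test-pool 6"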

Creating external thin snapshots
Thin snapshots can be taken of read-only external volumes. First, an ordinal is allocated just as when creating a thin volume:
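  dmsetup message test-pool 0 "create_thin 18"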

Then a new thin volume is created like before, but an extra parameter is added to indicate the origin:
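  # assuming /dev/mapper/external is a read-only, 390625-sector origin volume
  dmsetup create test-ext-snap --table "0 390625 thin /dev/mapper/test-pool 18 /dev/mapper/external"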

Deleting thin volumes
A snapshot can be deleted by unmapping it and sending the pool a delete message with the ordinal of the volume to delete:
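  dmsetup remove test-snap
  dmsetup message test-pool 0 "delete 6"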

Cache
See Documentation/device-mapper/cache.txt and Documentation/device-mapper/cache-policies.txt for parameters and usage. This target is intended to speed up access to a slow but large rotational disk by using a faster but smaller SSD as a cache. There is one important parameter:
 * block_size is the granularity of the cache. Data is promoted/demoted to/from the cache in blocks. It must be a multiple of 32k (64 sectors). LVM uses 64k (128 sectors) by default.

The recommended metadata device size is 8192 sectors + (nr_blocks/32) sectors, where nr_blocks is the number of sectors on the "fast" device divided by the block_size. For this device:
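  # assuming the "fast" device is 1953024 sectors with 128-sector blocks: 1953024/128 = 15258 blocks
  echo "8192 + (1953024/128)/32" | bc
  8668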

We will round up to 8 MiB (16384 sectors) for safety; however, 4 MiB (8192 sectors) would likely be more than enough anyway.

To create a cache device:
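  # assuming /dev/loop0 is the slow origin (1953024 sectors here), /dev/loop1 the fast cache device,
  # and /dev/loop2 the metadata device; 128-sector blocks, writeback mode, default policy
  dmsetup create test-cache --table "0 1953024 cache /dev/loop2 /dev/loop1 /dev/loop0 128 1 writeback default 0"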

It's recommended to mirror the metadata device across the origin and cache devices. To do so:
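  # a sketch only: carve a small area out of each device (the offsets below are hypothetical and
  # must not overlap the cached data; both devices need spare sectors beyond the area used above)
  dmsetup create cache-meta-a --table "0 16384 linear /dev/loop0 1953024"
  dmsetup create cache-meta-b --table "0 16384 linear /dev/loop1 1953024"
  dmsetup create cache-meta --table "0 16384 mirror core 1 1024 2 /dev/mapper/cache-meta-a 0 /dev/mapper/cache-meta-b 0 1 handle_errors"
  # /dev/mapper/cache-meta can then be used as the metadata device in the cache table above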