ZFS

From Gentoo Wiki


ZFS is a next-generation filesystem created by Matthew Ahrens and Jeff Bonwick. It was designed around a few key ideas:

  • Administration of storage should be simple.
  • Redundancy should be handled by the filesystem.
  • File systems should never be taken offline for repair.
  • Automated simulations of worst-case scenarios before shipping code are important.
  • Data integrity is paramount.

Development of ZFS started in 2001 at Sun Microsystems. It was released under the CDDL in 2005 as part of OpenSolaris. Pawel Jakub Dawidek ported ZFS to FreeBSD in 2007. Brian Behlendorf at LLNL started the ZFSOnLinux project in 2008 to port ZFS to Linux for High Performance Computing. Oracle purchased Sun Microsystems in 2010 and discontinued OpenSolaris later that year.

The Illumos project was started to replace OpenSolaris, and roughly 2/3 of the core ZFS team resigned, including Matthew Ahrens and Jeff Bonwick. Most of them took jobs at companies which continue to develop OpenZFS, initially as part of the Illumos project. The 1/3 of the ZFS core team that stayed at Oracle continues development of an incompatible proprietary branch of ZFS in Oracle Solaris.

The first release of Solaris included a few innovative changes that were under development prior to the mass resignation. Subsequent releases of Solaris have included fewer and less ambitious changes. Today, a growing community continues development of OpenZFS across multiple platforms, including FreeBSD, Illumos, Linux and Mac OS X.

Features

A detailed list of features can be found in a separate article.

Installation

Modules

There are out-of-tree Linux kernel modules available from the ZFSOnLinux Project.

Since version 0.6.1, the OpenZFS Project has considered ZFS "ready for wide scale deployment on everything from desktops to super computers".

Note
All changes to the git repository are subject to regression tests by LLNL.

USE flags

USE flags for sys-fs/zfs (Userland utilities for ZFS Linux kernel module)

custom-cflags   Build with user-specified CFLAGS (unsupported)
debug           Enable extra debug codepaths, like asserts and extra output. If you want to get meaningful backtraces see https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
dist-kernel     Enable subslot rebuilds on Distribution Kernel upgrades
kernel-builtin  Disable dependency on sys-fs/zfs-kmod under the assumption that ZFS is part of the kernel source tree
minimal         Don't install python scripts (arcstat, dbufstat etc) and avoid dependency on dev-lang/python
nls             Add Native Language Support (using gettext - GNU locale utilities)
pam             Install zfs_key pam module, for automatically loading zfs encryption keys for home datasets
python          Add optional support/bindings for the Python language
rootfs          Enable dependencies required for booting off a pool containing a rootfs
split-usr       Enable behavior to support maintaining /bin, /lib*, /sbin and /usr/sbin separately from /usr/bin and /usr/lib*
test-suite      Install regression test suite

Emerge

To install ZFS, run:

root #emerge --ask sys-fs/zfs
Important
Remerge sys-fs/zfs-kmod after every kernel compile, even if the kernel changes are trivial. If you recompile the kernel after merging the kernel modules, you may encounter problems with zpool entering uninterruptible sleep (unkillable process) or crashing on execute. Alternatively, set USE=dist-kernel with the Distribution Kernel.
root #emerge -va @module-rebuild

OpenRC

Add the zfs scripts to runlevels for initialization at boot:

root #rc-update add zfs-import boot
root #rc-update add zfs-mount boot
root #rc-update add zfs-share default
root #rc-update add zfs-zed default
Note
Only the first two are necessary for most setups. zfs-share is for people using NFS shares while zfs-zed is for the ZFS Event Daemon that handles disk replacement via hotspares and email notification of failures.
Note
Users who have ZFS as the root file system, or who put swap on ZFS, may want to add zfs-import and zfs-mount to the sysinit runlevel instead, so that the file system is accessible during the boot and shutdown process.

systemd

Enable the service so it is automatically started at boot time:

root #systemctl enable zfs.target

To manually start the daemon:

root #systemctl start zfs.target

In order to mount zfs pools automatically on boot you need to enable the following services and targets:

root #systemctl enable zfs-import-cache
root #systemctl enable zfs-mount
root #systemctl enable zfs-import.target

Kernel

sys-fs/zfs requires Zlib kernel support (module or builtin).

KERNEL
Cryptographic API --->
  <*> Deflate compression algorithm

Module

Note
The kernel module must be rebuilt whenever the kernel is.

Install the kernel module:

root #emerge --ask sys-fs/zfs-kmod

If using an initramfs, please (re)generate it after (re)compiling the module.

Advanced

Installing into the kernel directory (for static installs)

This example uses 0.8.4; substitute the latest testing (~) version, or the stable version once one is available. The one issue you may run into is having zfs and zfs-kmod out of sync with each other, so always keep both at the same version.

This will generate the needed files, and copy them into the kernel sources directory.

root #env EXTRA_ECONF='--enable-linux-builtin' ebuild /var/db/repos/gentoo/sys-fs/zfs-kmod/zfs-kmod-0.8.4.ebuild clean configure
root #(cd /var/tmp/portage/sys-fs/zfs-kmod-0.8.4/work/zfs-0.8.4/ && ./copy-builtin /usr/src/linux)

After this, you just need to edit the kernel config to enable CONFIG_SPL and CONFIG_ZFS and emerge the zfs binaries.

root #mkdir -p /etc/portage/profile
root #echo 'sys-fs/zfs -kernel-builtin' >> /etc/portage/profile/package.use.mask
root #echo 'sys-fs/zfs kernel-builtin' >> /etc/portage/package.use/zfs.conf
root #emerge --oneshot --verbose sys-fs/zfs

The echo commands only need to be run once, but the emerge needs to be run every time you install a new version of zfs.
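
One way to check that the userland and the kernel module have not drifted apart is to compare the two lines printed by zfs version. The following is a minimal sketch, assuming the usual two-line output (zfs-<version> followed by zfs-kmod-<version>); the function name versions_in_sync is purely illustrative and not part of ZFS.

```shell
# versions_in_sync: succeed only if userland and kernel module report the
# same version. Assumes "zfs version" prints two lines of the form
#   zfs-<version>
#   zfs-kmod-<version>
versions_in_sync() {
    out=$(zfs version) || return 2
    # Userland line: "zfs-" followed directly by a digit.
    user=$(printf '%s\n' "$out" | sed -n 's/^zfs-\([0-9].*\)$/\1/p')
    # Kernel module line: "zfs-kmod-" prefix.
    kmod=$(printf '%s\n' "$out" | sed -n 's/^zfs-kmod-\(.*\)$/\1/p')
    [ -n "$user" ] && [ "$user" = "$kmod" ]
}

# Example:
#   versions_in_sync || echo "zfs and zfs-kmod differ, re-emerge both" >&2
```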

Alternative steps

Be sure to read through the steps above; the following steps only replace some of them. The following was done on an amd64 Gentoo install with llvm-12.0.1/clang-12.0.1/musl-1.2.2-r3 and without binutils/gcc/glibc. config/kernel.m4 needs to be lightly patched: on the line defining ccflags-y, add -Wno-address-of-packed-member after -Werror, and on the line calling make modules ..., add LLVM=1 LLVM_IAS=1.

root #ln -s x86 /usr/src/linux/arch/amd64
root #ebuild /var/db/repos/gentoo/sys-fs/zfs/zfs-2.0.5.ebuild clean unpack
root #cp /var/tmp/portage/sys-fs/zfs-2.0.5/work/zfs-2.0.5/config/kernel.m4{,.orig}
root ## edit /var/tmp/portage/sys-fs/zfs-2.0.5/work/zfs-2.0.5/config/kernel.m4
root #mkdir -p /etc/portage/patches/sys-fs/zfs
root #(cd /var/tmp/portage/sys-fs/zfs-2.0.5/work/ && diff -u zfs-2.0.5/config/kernel.m4{.orig,} > /etc/portage/patches/sys-fs/zfs/llvm.patch)

The patch should look something like this:

FILE /etc/portage/patches/sys-fs/zfs/llvm.patch
--- zfs-2.0.5/config/kernel.m4.orig     2021-09-11 21:32:30.967155385 +0000
+++ zfs-2.0.5/config/kernel.m4  2021-09-11 21:37:10.331820894 +0000
@@ -527,7 +527,7 @@
 # Example command line to manually build source
 # make modules -C $LINUX_OBJ $ARCH_UM M=$PWD/build/$1
 
-ccflags-y := -Werror $FRAME_LARGER_THAN
+ccflags-y := -Werror -Wno-address-of-packed-member $FRAME_LARGER_THAN
 _ACEOF
 
        dnl # Additional custom CFLAGS as requested.
@@ -585,7 +585,7 @@
 AC_DEFUN([ZFS_LINUX_COMPILE], [
        AC_TRY_COMMAND([
            KBUILD_MODPOST_NOFINAL="$5" KBUILD_MODPOST_WARN="$6"
-           make modules -k -j$TEST_JOBS -C $LINUX_OBJ $ARCH_UM
+           make LLVM=1 LLVM_IAS=1 modules -k -j$TEST_JOBS -C $LINUX_OBJ $ARCH_UM
            M=$PWD/$1 >$1/build.log 2>&1])
        AS_IF([AC_TRY_COMMAND([$2])], [$3], [$4])
 ])

You only have to go through the patching steps again if the patch stops working. Now proceed as usual:

root #env EXTRA_ECONF='--enable-linux-builtin --with-config=kernel' ebuild /var/db/repos/gentoo/sys-fs/zfs/zfs-2.0.5.ebuild clean configure
root #(cd /var/tmp/portage/sys-fs/zfs-2.0.5/work/zfs-2.0.5/ && ./copy-builtin /usr/src/linux)

Usage

ZFS already includes all the programs needed to manage the hardware and the file systems; no additional tools are required.

Preparation

ZFS supports the use of either block devices or files. Administration is the same in both cases, but for production use, the ZFS developers recommend the use of block devices (preferably whole disks). To take full advantage of block devices on Advanced Format disks, it is highly recommended to read the ZFS on Linux FAQ before creating your pool. To go through the different commands and scenarios we can use files in place of block devices.

The following commands create 2GB sparse image files in /var/lib/zfs_img/ that we use as our hard drives. This uses at most 8GB disk space, but in practice will use very little because only written areas are allocated:

root #mkdir /var/lib/zfs_img
root #truncate -s 2G /var/lib/zfs_img/zfs0.img
root #truncate -s 2G /var/lib/zfs_img/zfs1.img
root #truncate -s 2G /var/lib/zfs_img/zfs2.img
root #truncate -s 2G /var/lib/zfs_img/zfs3.img
Note
On pool export, all of the files will be released and the folder /var/lib/zfs_img can be deleted.
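
The four truncate calls above can also be written as a small loop. The sketch below is only a convenience wrapper; the function name make_sparse_images is hypothetical, and the file names follow the zfsN.img pattern used in this article.

```shell
# make_sparse_images DIR COUNT SIZE
# Create COUNT sparse files named zfs0.img, zfs1.img, ... of SIZE each in DIR.
make_sparse_images() {
    dir=$1 count=$2 size=$3
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        # truncate only sets the file length; no data blocks are written
        truncate -s "$size" "$dir/zfs$i.img" || return 1
        i=$((i + 1))
    done
}

# Example (as root):
#   make_sparse_images /var/lib/zfs_img 4 2G
# Compare the apparent size with the space actually allocated:
#   du -h --apparent-size /var/lib/zfs_img/zfs0.img
#   du -h /var/lib/zfs_img/zfs0.img
```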

Zpools

The program /usr/sbin/zpool is used with any operation on zpools.

Creating a zpool

One hard drive

Create a new zpool named zfs_test with one hard drive:

root #zpool create zfs_test /var/lib/zfs_img/zfs0.img

The zpool will be mounted automatically; by default the mountpoint is directly under the root file system, i.e. /zfs_test.

root #zpool status

To delete a zpool use this command:

root #zpool destroy zfs_test
Important
ZFS will not ask if you are sure.

Two hard drives (MIRROR)

In ZFS you can have several hard drives in a MIRROR vdev, where identical copies of the data exist on each disk. This increases read performance and redundancy. To create a new zpool named zfs_test with two hard drives as a MIRROR:

root #zpool create zfs_test mirror /var/lib/zfs_img/zfs0.img /var/lib/zfs_img/zfs1.img
Note
With a mirror vdev of two 2GB disks, only 2GB are effectively usable: total_space * 1/n, where n is the number of disks in the mirror.
root #zpool status

To delete the zpool:

root #zpool destroy zfs_test

Three hard drives (RAIDZ1)

RAIDZ1 is the redundancy equivalent of RAID5, where (roughly) data is written to two drives and parity onto the third. You need at least three hard drives. One drive can fail and the zpool is still functional but DEGRADED; the faulty drive should be replaced as soon as possible.

To create a pool with a RAIDZ1 vdev on three hard drives:

root #zpool create zfs_test raidz1 /var/lib/zfs_img/zfs0.img /var/lib/zfs_img/zfs1.img /var/lib/zfs_img/zfs2.img
Note
With a raidz1 vdev of three 2GB disks, only 4GB are effectively usable: total_space * (1-1/n), where n is the number of disks in the vdev.
root #zpool status

To delete the zpool:

root #zpool destroy zfs_test

Four hard drives (RAIDZ2)

RAIDZ2 is the redundancy equivalent of RAID6, where (roughly) data is written to the first two drives and parity onto the other two. You need at least four hard drives. Two drives can fail and the zpool is still functional but DEGRADED; the faulty drives should be replaced as soon as possible.

To create a pool with a RAIDZ2 vdev on four hard drives:

root #zpool create zfs_test raidz2 /var/lib/zfs_img/zfs0.img /var/lib/zfs_img/zfs1.img /var/lib/zfs_img/zfs2.img /var/lib/zfs_img/zfs3.img
Note
With a raidz2 vdev of four 2GB disks, only 4GB are effectively usable: total_space * (1-2/n), where n is the number of disks in the vdev.
root #zpool status

To delete the zpool:

root #zpool destroy zfs_test

Four hard drives (STRIPED MIRROR)

STRIPED MIRRORs are the redundancy equivalent of RAID10, where data is striped across several mirrored sets of disks. You need at least four hard drives; this configuration provides redundancy and an increase in read speed. In each mirror, all disks but one can be lost.

To create a STRIPED MIRRORED pool with four hard drives:

root #zpool create zfs_test mirror /var/lib/zfs_img/zfs0.img /var/lib/zfs_img/zfs1.img mirror /var/lib/zfs_img/zfs2.img /var/lib/zfs_img/zfs3.img
Note
In total, with two mirror vdevs of two 2GB disks each, only 4GB are usable: total_space * 1/2, because each two-way mirror stores one disk's worth of data.
root #zpool status
 pool: zfs_test
state: ONLINE
scan: none requested
config:
 
	NAME                          STATE     READ WRITE CKSUM
	zfs_test                      ONLINE       0     0     0
	  mirror-0                    ONLINE       0     0     0
	    /var/lib/zfs_img/zfs0.img ONLINE       0     0     0
	    /var/lib/zfs_img/zfs1.img ONLINE       0     0     0
	  mirror-1                    ONLINE       0     0     0
	    /var/lib/zfs_img/zfs2.img ONLINE       0     0     0
	    /var/lib/zfs_img/zfs3.img ONLINE       0     0     0
 
errors: No known data errors

To delete the zpool:

root #zpool destroy zfs_test
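
The capacity notes for the layouts above can be checked with a little shell arithmetic. This is a rough sketch that ignores metadata and padding overhead; usable_gb is a hypothetical helper, not a ZFS command.

```shell
# usable_gb LAYOUT DISKS SIZE_GB
# Approximate usable capacity of one vdev built from DISKS disks of SIZE_GB each.
usable_gb() {
    layout=$1 n=$2 size=$3
    case $layout in
        stripe) echo $(( n * size )) ;;          # no redundancy
        mirror) echo "$size" ;;                  # total_space * 1/n
        raidz1) echo $(( (n - 1) * size )) ;;    # total_space * (1-1/n)
        raidz2) echo $(( (n - 2) * size )) ;;    # total_space * (1-2/n)
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

usable_gb mirror 2 2    # prints 2
usable_gb raidz1 3 2    # prints 4
usable_gb raidz2 4 2    # prints 4
# Striped mirror: capacities of the individual mirror vdevs add up.
echo $(( $(usable_gb mirror 2 2) + $(usable_gb mirror 2 2) ))    # prints 4
```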

Import/Export zpool

To import (mount) the zpool named zfs_test use this command:

root #zpool import zfs_test

The root mountpoint of zfs_test is a property and can be changed the same way as for datasets. To import (mount) the zpool named zfs_test root on /mnt/gentoo, use this command:

root #zpool import -R /mnt/gentoo zfs_test
Note
ZFS will automatically search all attached hard drives for a zpool named zfs_test.

To search for and list all zpools available in the system issue the command:

root #zpool import

To export (unmount) an existing zpool named zfs_test, use the following command:

root #zpool export zfs_test

Spares/Replace vdev

You can add hot spares to your zpool. In case of a failure, they are already installed and available to replace faulty disks.

In this example, we use a RAIDZ1 with three hard drives in a zpool named zfs_test:

root #zpool add zfs_test spare /var/lib/zfs_img/zfs3.img
root #zpool status

The status of /var/lib/zfs_img/zfs3.img will stay AVAIL until it is brought into use. Now we let /var/lib/zfs_img/zfs0.img fail:

root #zpool offline zfs_test /var/lib/zfs_img/zfs0.img
root #zpool status
  pool: zfs_test
 state: ONLINE
  scan: none requested
config:
 
        NAME                           STATE     READ WRITE CKSUM
        zfs_test                       ONLINE       0     0     0
          raidz1-0                     ONLINE       0     0     0
            /var/lib/zfs_img/zfs0.img  ONLINE       0     0     0
            /var/lib/zfs_img/zfs1.img  ONLINE       0     0     0
            /var/lib/zfs_img/zfs2.img  ONLINE       0     0     0
        spares
          /var/lib/zfs_img/zfs3.img
 
errors: No known data errors

We replace /var/lib/zfs_img/zfs0.img with our spare /var/lib/zfs_img/zfs3.img:

root #zpool replace zfs_test /var/lib/zfs_img/zfs0.img /var/lib/zfs_img/zfs3.img
root #zpool status
  pool: zfs_test
 state: ONLINE
  scan: resilvered 62K in 0h0m with 0 errors on Sun Sep  1 15:41:41 2013
config:
 
        NAME                             STATE     READ WRITE CKSUM
        zfs_test                         ONLINE       0     0     0
          raidz1-0                       ONLINE       0     0     0
            spare-0                      ONLINE       0     0     0
              /var/lib/zfs_img/zfs0.img  ONLINE       0     0     0
              /var/lib/zfs_img/zfs3.img  ONLINE       0     0     0
            /var/lib/zfs_img/zfs1.img    ONLINE       0     0     0
            /var/lib/zfs_img/zfs2.img    ONLINE       0     0     0
        spares
          /var/lib/zfs_img/zfs3.img      INUSE     currently in use
 
errors: No known data errors

The original disk will automatically get removed asynchronously. If this is not the case, the old disk may need to be detached with the "zpool detach" command. Later you will see it leave the zpool status output:

root #zpool status
   pool: zfs_test
 state: ONLINE
  scan: resilvered 62K in 0h0m with 0 errors on Sun Sep  1 15:41:41 2013
config:
 
        NAME                           STATE     READ WRITE CKSUM
        zfs_test                       ONLINE       0     0     0
          raidz1-0                     ONLINE       0     0     0
            /var/lib/zfs_img/zfs3.img  ONLINE       0     0     0
            /var/lib/zfs_img/zfs1.img  ONLINE       0     0     0
            /var/lib/zfs_img/zfs2.img  ONLINE       0     0     0
 
errors: No known data errors
Note
ZFS automatically resilvered onto /var/lib/zfs_img/zfs3.img and the zpool had no downtime.

Now start a manual scrub:

root #zpool scrub zfs_test
root #zpool status
  pool: zfs_test
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Sep  1 15:57:31 2013
config:
 
        NAME                           STATE     READ WRITE CKSUM
        zfs_test                       ONLINE       0     0     0
          raidz1-0                     ONLINE       0     0     0
            /var/lib/zfs_img/zfs3.img  ONLINE       0     0     0
            /var/lib/zfs_img/zfs1.img  ONLINE       0     0     0
            /var/lib/zfs_img/zfs2.img  ONLINE       0     0     0
 
errors: No known data errors

Zpool version update

With every update of sys-fs/zfs, you are likely to also get a more recent ZFS version, and the status of your zpools will show a notice that the zpools can be upgraded. To display the feature flags and legacy versions supported by the installed ZFS:

root #zpool upgrade -v
This system supports ZFS pool feature flags.
 
The following features are supported:
 
FEAT DESCRIPTION
-------------------------------------------------------------
async_destroy                         (read-only compatible)
     Destroy filesystems asynchronously.
empty_bpobj                           (read-only compatible)
     Snapshots use less space.
lz4_compress
     LZ4 compression algorithm support.
multi_vdev_crash_dump
     Crash dumps to multiple vdev pools.
spacemap_histogram                    (read-only compatible)
     Spacemaps maintain space histograms.
enabled_txg                           (read-only compatible)
     Record txg at which a feature is enabled
hole_birth
     Retain hole birth txg for more precise zfs send
extensible_dataset
     Enhanced dataset functionality, used by other features.
embedded_data
     Blocks which compress very well use even less space.
bookmarks                             (read-only compatible)
     "zfs bookmark" command
filesystem_limits                     (read-only compatible)
     Filesystem and snapshot limits.
large_blocks
     Support for blocks larger than 128KB.
large_dnode
     Variable on-disk size of dnodes.
sha512
     SHA-512/256 hash algorithm.
skein
     Skein hash algorithm.
edonr
     Edon-R hash algorithm.
userobj_accounting                    (read-only compatible)
     User/Group object accounting.
encryption
     Support for dataset level encryption
project_quota                         (read-only compatible)
     space/object accounting based on project ID.
device_removal
     Top-level vdevs can be removed, reducing logical pool size.
obsolete_counts                       (read-only compatible)
     Reduce memory used by removed devices when their blocks are freed or remapped.
zpool_checkpoint                      (read-only compatible)
     Pool state can be checkpointed, allowing rewind later.
spacemap_v2                           (read-only compatible)
     Space maps representing large segments are more efficient.
allocation_classes                    (read-only compatible)
     Support for separate allocation classes.
resilver_defer                        (read-only compatible)
     Support for deferring new resilvers when one is already running.
bookmark_v2
     Support for larger bookmarks
redaction_bookmarks
     Support for bookmarks which store redaction lists for zfs redacted send/recv.
redacted_datasets
     Support for redacted datasets, produced by receiving a redacted zfs send stream.
bookmark_written
     Additional accounting, enabling the written#<bookmark> property(space written since a bookmark), and estimates of send stream sizes for incrementals from bookmarks.
log_spacemap                          (read-only compatible)
     Log metaslab changes on a single spacemap and flush them periodically.
livelist                              (read-only compatible)
     Improved clone deletion performance.
device_rebuild                        (read-only compatible)
     Support for sequential device rebuilds
zstd_compress
     zstd compression algorithm support.
draid
     Support for distributed spare RAID
 
The following legacy versions are also supported:
 
VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties
 23  Slim ZIL
 24  System attributes
 25  Improved scrub stats
 26  Improved snapshot deletion performance
 27  Improved snapshot creation performance
 28  Multiple vdev replacements
 
For more information on a particular version, including supported releases,
see the ZFS Administration Guide.
Warning
Systems with a lower pre-feature flags version installed will not be able to import a zpool of a higher version. Pools with unsupported feature flags enabled may be importable read-only or not at all. See the Feature Flags documentation for a detailed breakdown.
Warning
Feature flags have three states: disabled, enabled, and active. While a flag is only disabled or enabled, it does not matter for pool import purposes. Once it is active, the read-only/not at all rules from the aforementioned Feature Flags chart apply. zpool upgrade sets all features it knows about to enabled, but some feature flags immediately become active once enabled. Feature flags never go back to disabled once enabled, and some never go back to enabled once active. See zpool-features.7 for more details. If backward compatibility matters to you, you may find it better to simply enable new features selectively as you intend to use them.

To upgrade the version of zpool zfs_test (and enable all feature flags):

root #zpool upgrade zfs_test

To upgrade the version of all zpools in the system:

root #zpool upgrade -a
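
As an alternative to zpool upgrade, individual features can be switched on by setting the corresponding feature@ pool property to enabled. The following is a small sketch of that approach; the helper name enable_features is illustrative, and the features listed in the example are just two entries from the output above.

```shell
# enable_features POOL FEATURE...
# Enable only the named feature flags on POOL instead of everything at once.
enable_features() {
    pool=$1
    shift
    for feat in "$@"; do
        zpool set "feature@$feat=enabled" "$pool" || return 1
    done
}

# Example (as root):
#   enable_features zfs_test lz4_compress zstd_compress
# Inspect the resulting feature states:
#   zpool get all zfs_test | grep feature@
```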

Zpool tips/tricks

  • You sometimes cannot shrink a zpool after initial creation - if the pool has no raidz vdevs and the pool has all vdevs of the same ashift, the "device removal" feature in 0.8 and above can be used. There are performance implications to doing this, however, so always be careful when creating pools or adding vdevs/disks!
  • It is possible to add more disks to a MIRROR after its initial creation. Use the following command (/dev/loop0 is the first drive in the MIRROR):
root #zpool attach zfs_test /dev/loop0 /dev/loop2
  • Sometimes, a wider RAIDZ vdev can be less suitable than two (or more) smaller RAIDZ vdevs. Try testing your intended use before settling on one and moving all your data onto it.
  • RAIDZ vdevs cannot (currently) be resized after initial creation (you may only add additional hot spares). You can, however, replace the hard drives with bigger ones (one at a time), e.g. replace 1T drives with 2T drives to double the available space in the zpool.
  • It is possible to mix MIRROR and RAIDZ vdevs in a zpool. For example to add two more disks as a MIRROR vdev in a zpool with a RAIDZ1 vdev named zfs_test, use:
root #zpool add -f zfs_test mirror /dev/loop4 /dev/loop5
Warning
You probably don't have good reason to do this. Perhaps a special vdev or log vdev would be a reasonable time, but in general, you'd be winding up with the worst performance characteristics of both.
Note
This needs the -f option, because the vdev you're adding does not match the existing vdevs.
  • It is possible to restore a destroyed zpool, by reimporting it straight after the accident happened:
root #zpool import -D
  pool: zfs_test
    id: 12744221975042547640
 state: ONLINE (DESTROYED)
action: The pool can be imported using its name or numeric identifier.
Note
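The option -D searches all hard drives for destroyed zpools. To actually restore the pool, pass its name or numeric identifier as well, e.g. zpool import -D zfs_test.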

File systems and datasets

The program /usr/sbin/zfs is used for any operation regarding datasets (which encompasses filesystems, volumes, snapshots, and bookmarks).

  • Filesystems are a way of logically grouping data with shared properties on a pool - data you might want to set the same compression type, or recordsize, or snapshot all together, would be good examples of a use case for a separate filesystem.
Note
Moving files with mv between filesystems on ZFS is a copy followed by a delete, not a near-instant rename.
  • Volumes are a way of exposing some space from a pool as a block device, which can be useful e.g. for VM storage, or iSCSI export to some other host.
  • Snapshots are read-only point-in-time representations of a filesystem/volume - which implies that if you take snapshots of a filesystem or volume, space that was used at the point of the snapshot is not actually freed when later deleted/overwritten until all snapshots referencing that data are destroyed. Snapshot names are formatted like pool/fs1@snapname. For filesystems, you can commonly access them at [FS mountpoint]/.zfs/snapshot/[snapshot name]/; for volumes, /dev/ nodes for snapshots default to hidden (since, for example, if you're doing an FS mount by UUID, and you see 30+ copies of the same FS, it may end poorly), but you can adjust the snapdev property of the volume to change that, or clone the volume snapshot you want to examine. Snapshots are very useful both for later reference of earlier states, and for use in zfs send+receive for backup/restore/transfer. (See also bookmarks and clones later.)
Note
Snapshots are approximately free to take - they are not by default recursive, so snapshotting pool@now does not imply you snapshotted pool/fs1@now unless you use zfs snapshot -r. The first snapshot of a volume may incur unexpected space implications if a reservation is set on the volume, as it will then reserve enough space to overwrite the entire volume once on top of any space allocated already for the volume.
  • Bookmarks are a very minimal kind of dataset - you create them with zfs bookmark [snapshot name] [bookmark name], and their purpose in life is to be used as the source of an incremental zfs send without having to keep the snapshot around - that is, if you have pool/fs1@snap3, @snap4, and @snap5, and you already used zfs send|recv to copy pool/fs1@snap3 somewhere, you could make pool/fs1#snap3 and destroy pool/fs1@snap3, and later be able to do zfs send -i pool/fs1#snap3 pool/fs1@snap4.
  • Clones are not really a separate type of dataset, but merit mentioning here. Whenever you have a snapshot of a filesystem or volume, but want to make a read-write version of it, you could clone it with zfs clone pool/fs1@snap1 pool/clonefs1, and you'll have a filesystem at pool/clonefs1 that is read-write and starts out identical to the snapshot state at pool/fs1@snap1.
Warning
Clones are not independent copies of the data. For example, if you have a clone of pool/fs1@snap1, you cannot destroy pool/fs1@snap1 while the clone exists. If you want a truly independent copy, use zfs send and zfs receive.
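
The bookmark workflow described above could be scripted roughly as follows. This is a sketch under the assumption that the snapshot has already been sent to the receiving side; snapshot_to_bookmark is a hypothetical helper, and backup/fs1 in the example is a made-up target dataset.

```shell
# snapshot_to_bookmark DATASET SNAP
# Turn DATASET@SNAP into the bookmark DATASET#SNAP, then drop the snapshot.
# The bookmark remains usable as the source of an incremental zfs send.
snapshot_to_bookmark() {
    ds=$1 snap=$2
    zfs bookmark "$ds@$snap" "$ds#$snap" || return 1
    zfs destroy "$ds@$snap"
}

# Example (as root), after pool/fs1@snap3 has been sent elsewhere:
#   snapshot_to_bookmark pool/fs1 snap3
#   zfs send -i pool/fs1#snap3 pool/fs1@snap4 | zfs receive backup/fs1
```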

To control the size of a filesystem/volume you can set a quota as a maximum, and/or you can reserve a certain amount of storage within a zpool to avoid any other dataset on the pool being able to use the free space before that dataset can. Filesystems default to being able to use all unreserved space on the pool, and have no reservation - volumes have a size (which can be adjusted) at creation time, which is an implicit quota, and unless created sparse, set a reservation for their whole size, to avoid the situation of running out of space when trying to overwrite a block.

Create a filesystem

We use our zpool zfs_test to create a new filesystem called dataset1:

root #zfs create zfs_test/dataset1

The filesystem will be mounted automatically as /zfs_test/dataset1/

root #zfs list

Mount/umount filesystem

Datasets can be mounted with the following command; the mountpoint is defined by the mountpoint property of the dataset:

root #zfs mount zfs_test/dataset1

To unmount the dataset:

root #zfs unmount zfs_test/dataset1

The folder /zfs_test/dataset1 stays without the dataset behind it. If you write data to it and then try to mount the dataset again (and have the overlay property set to off, which is not the default), you will see the following error message:

CODE
cannot mount '/zfs_test/dataset1': directory is not empty

Remove datasets

To remove the filesystem dataset1 from zpool zfs_test:

root #zfs destroy zfs_test/dataset1
root #zfs list
Note
You cannot destroy a dataset if any snapshots of it exist.
Warning
zfs destroy -r or -R could be used for this, but always use -nv to see what zfs destroy is going to destroy before running it, especially when using -R or -r.
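
To make the dry run the default, the destroy call can be wrapped so that nothing is removed unless explicitly confirmed. This is only a sketch; destroy_recursive and its --yes argument are inventions of this example, while -r, -n and -v are the zfs destroy options mentioned above.

```shell
# destroy_recursive DATASET [--yes]
# Without --yes, only print what would be destroyed (dry run via -n).
destroy_recursive() {
    ds=$1
    if [ "${2:-}" = "--yes" ]; then
        zfs destroy -r -v "$ds"
    else
        zfs destroy -r -n -v "$ds"
    fi
}

# Example (as root):
#   destroy_recursive zfs_test/dataset1          # preview only
#   destroy_recursive zfs_test/dataset1 --yes    # really destroy
```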

Properties

Properties for datasets are inherited from their parent dataset, all the way up to the "root" dataset with the same name as the pool. So you can change a property on a dataset itself, on the parent it inherits from, and so on up to the "root", depending on how widely you want the change to apply.

To set a property for a dataset:

root #zfs set <property>=<value> zfs_test/dataset1

To show the setting for a particular property on a dataset:

root #zfs get <property> zfs_test/dataset1

You can get a list of all properties set on every dataset with the following command:

root #zfs get all
Note
You almost certainly don't want to do this. You might find zfs get all -t filesystem or zfs get all [specific dataset] far more useful.

This is a partial list of properties that can be set on either zpools or datasets, for a full list see zfsprops.7:

Property      Value                                             Function
quota=        20m,none                                          Sets a quota of 20MB for the dataset.
reservation=  20m,none                                          Reserves 20MB for the dataset within its zpool.
compression=  lzjb,lz4,zle,gzip{,-N},zstd{,-fast}{,-N},on,off   Uses the given compression method, or with "on" the default method, which is lz4 on pools with feature@lz4_compress=enabled and lzjb otherwise.
sharenfs=     on,off,ro,nfsoptions                              Shares the dataset via NFS.
exec=         on,off                                            Controls whether programs can be executed on the dataset.
setuid=       on,off                                            Controls whether SUID or SGID bits are honored on the dataset.
readonly=     on,off                                            Controls whether a filesystem is mounted read/write or a volume allows writes.
atime=        on,off                                            Controls whether atime is updated on the filesystem.
relatime=     on,off                                            If atime=on, controls whether atime is only updated sometimes.
mountpoint=   none,path                                         Sets the mountpoint for the dataset below the zpool or elsewhere in the file system; a mountpoint set to none prevents the dataset from being mounted.
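
Several of these properties are often set together on a new dataset. The sketch below wraps zfs set in a loop and rejects arguments that are not of the form property=value; apply_props is a hypothetical helper name, and the values in the example are arbitrary.

```shell
# apply_props DATASET PROPERTY=VALUE...
# Set each given property on DATASET, stopping at the first error.
apply_props() {
    ds=$1
    shift
    for kv in "$@"; do
        case $kv in
            *=*) zfs set "$kv" "$ds" || return 1 ;;
            *)   echo "expected property=value, got '$kv'" >&2; return 1 ;;
        esac
    done
}

# Example (as root):
#   apply_props zfs_test/dataset1 quota=20m compression=lz4 atime=off
# Read the values back:
#   zfs get quota,compression,atime zfs_test/dataset1
```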

Set mountpoint

To set the mountpoint for a filesystem, use the following command:

root #zfs set mountpoint=/mnt/data zfs_test/dataset1

The dataset1 mount will be automatically moved to /mnt/data.

NFS filesystem share

Activate NFS share on a filesystem:

Note
You can export ZFS filesystems via NFS using /etc/exports perfectly fine if you prefer.
root #zfs set sharenfs=on zfs_test/dataset2
root #exportfs

By default, the filesystem is shared using the exportfs command in the following manner. See exportfs(8) and exports(5) for more information.

CODE sharenfs default options
/usr/sbin/exportfs -i -o sec=sys,rw,no_subtree_check,no_root_squash,mountpoint *:<mountpoint of dataset>

Otherwise, the command is invoked with options equivalent to the contents of this property:

root #zfs set sharenfs="no_root_squash,rw=@192.168.11.0/24" zfs_test/dataset2
root #exportfs

To stop sharing the filesystem:

root #zfs set sharenfs=off zfs_test/dataset2
root #exportfs
Creating a snapshot

To create a snapshot of a dataset, use the following command:

root #zfs snapshot zfs_test/dataset1@22082011
Note
zfs_test/dataset1@22082011 is the full name of the snapshot. The part after the @ symbol can be any alphanumeric combination.

When data on the filesystem is overwritten or deleted, any blocks still referenced by a snapshot are kept, and their space is charged to the snapshots instead of the live filesystem. Space that is referenced by only one snapshot shows up in the USED property of that snapshot.
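
The date-based snapshot name from the example above can be generated instead of typed by hand. This is a minimal sketch: the helper name snapshot_name is hypothetical, and the DDMMYYYY format simply mirrors the example name on this page.

```shell
# Hypothetical helper: build a snapshot name of the form dataset@DDMMYYYY.
# The date defaults to today; pass a second argument to override it.
snapshot_name() {
    dataset=$1
    stamp=${2:-$(date +%d%m%Y)}
    echo "${dataset}@${stamp}"
}

snapshot_name zfs_test/dataset1 22082011   # prints zfs_test/dataset1@22082011

# To take the snapshot (requires root and an existing dataset):
#   zfs snapshot "$(snapshot_name zfs_test/dataset1)"
```

A YYYYMMDD format (date +%Y%m%d) sorts chronologically in listings, which may be preferable when many snapshots accumulate.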

Listing

List all available snapshots:

root #zfs list -t snapshot -o name,creation
Rollback

To roll back a full dataset to a previous state:

root #zfs rollback zfs_test/dataset1@21082011
Note
If there are other snapshots in between, then you have to use the -r option, which will destroy all snapshots taken after the one you are attempting to roll back to.
Removal

Remove a snapshot of dataset1 with the following command:

root #zfs destroy zfs_test/dataset1@21082011

Maintenance

Scrubbing

To start a scrub for the zpool zfs_test:

root #zpool scrub zfs_test
Note
This might take some time and is quite I/O intensive. ZFS attempts to minimize the impact on the wider system, but some systems do not handle even lowest-priority I/O well.
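
Because a scrub runs in the background, its progress has to be checked separately with zpool status. The sketch below filters that output down to the scan line. The helper name scan_line is hypothetical, and the sample input only approximates the shape of real zpool status output.

```shell
# Hypothetical helper: extract the "scan:" line from `zpool status` output
# supplied on stdin, with leading whitespace removed.
scan_line() {
    awk '$1 == "scan:" { sub(/^[ \t]+/, ""); print }'
}

# Illustrative input, not captured from a real pool.
printf '  pool: zfs_test\n state: ONLINE\n  scan: scrub in progress\n' | scan_line
# prints: scan: scrub in progress

# On a live system:
#   zpool status zfs_test | scan_line
```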

Log files

To check the history of commands that were executed:

root #zpool history

Monitor I/O

Monitor I/O activity on all zpools (refreshes every 6 seconds):

root #zpool iostat 6

Technical details

ARC

OpenZFS uses the ARC (Adaptive Replacement Cache) page replacement algorithm instead of the Least Recently Used page replacement algorithm used by other filesystems. ARC has a better hit rate and therefore provides better performance. The implementation of ARC in ZFS differs from the original paper in that the amount of memory used as cache can vary. This permits memory used by ARC to be reclaimed when the system is under memory pressure (via the kernel's shrinker mechanism) and to grow when the system has memory to spare. The minimum and maximum amount of memory allocated to ARC vary based on your system memory. The default minimum is the larger of 1/32 of system memory or 64MB. The default maximum is the larger of 1/2 of system memory or 64MB.
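
The default limits described above can be worked out for a given amount of RAM. This sketch only reproduces the arithmetic stated in this section; the function names are hypothetical, and the actual defaults may differ between OpenZFS releases.

```shell
# 64 MiB floor, in bytes.
MIB64=$((64 * 1024 * 1024))

# Default ARC minimum as described above: the larger of RAM/32 and 64 MiB.
arc_default_min() {
    v=$(( $1 / 32 ))
    if [ "$v" -lt "$MIB64" ]; then v=$MIB64; fi
    echo "$v"
}

# Default ARC maximum as described above: the larger of RAM/2 and 64 MiB.
arc_default_max() {
    v=$(( $1 / 2 ))
    if [ "$v" -lt "$MIB64" ]; then v=$MIB64; fi
    echo "$v"
}

# Example: a machine with 8 GiB of RAM (8589934592 bytes).
arc_default_min 8589934592   # prints 268435456  (256 MiB)
arc_default_max 8589934592   # prints 4294967296 (4 GiB)
```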

The manner in which Linux accounts for memory used by ARC differs from memory used by the page cache. Specifically, memory used by ARC is included under "used" rather than "cached" in the output of the free program. This in no way prevents the memory from being released when the system is low on memory. However, it can give the impression that ARC (and by extension ZFS) will use all of system memory if given the opportunity.

Adjusting ARC memory usage

The minimum and maximum memory usage of ARC is tunable via zfs_arc_min and zfs_arc_max respectively. These parameters can be set in any of three ways. The first is at runtime (new in 0.6.2):

root #echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max
Note
This sysfs value became writable in ZFSOnLinux 0.6.2. Changes through sysfs do not persist across boots. Also, the value in sysfs will be 0 when this value has not been manually configured. The current setting can be viewed by looking at c_max in /proc/spl/kstat/zfs/arcstats.
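
To check the effective limit mentioned in the note, the c_max row can be pulled out of arcstats. This is a minimal sketch: the helper name arcstat is hypothetical, and it assumes data rows of the form "name type data", as in the file's column header.

```shell
# Hypothetical helper: print the data column of one arcstats field.
# Usage: arcstat FIELD [FILE]; FILE defaults to the live kstat file.
arcstat() {
    awk -v field="$1" '$1 == field { print $3 }' \
        "${2:-/proc/spl/kstat/zfs/arcstats}"
}

# Only query the live file when the zfs module is loaded.
if [ -r /proc/spl/kstat/zfs/arcstats ]; then
    echo "c_max: $(arcstat c_max) bytes"
    echo "size:  $(arcstat size) bytes"
fi
```

The size field is the amount of memory ARC currently holds, which is useful to compare against c_max.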

The second is via /etc/modprobe.d/zfs.conf:

root #echo "options zfs zfs_arc_max=536870912" >> /etc/modprobe.d/zfs.conf
Note
If using genkernel to load ZFS, this value must be set before genkernel is run to ensure that the file is copied into the initramfs.

The third is on the kernel command line by specifying "zfs.zfs_arc_max=536870912" (for 512MB). Similarly, the same can be done to adjust zfs_arc_min.
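
All three methods take the limit in bytes, so a size given in megabytes has to be converted first. The following sketch shows the conversion and builds the matching modprobe.d line. The helper names are hypothetical.

```shell
# Convert a size in MiB to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Build the modprobe.d line that caps ARC at the given number of MiB.
arc_max_option() {
    echo "options zfs zfs_arc_max=$(mib_to_bytes "$1")"
}

mib_to_bytes 512     # prints 536870912
arc_max_option 512   # prints options zfs zfs_arc_max=536870912

# To make the setting persistent (as root):
#   arc_max_option 512 >> /etc/modprobe.d/zfs.conf
```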

ZFS root

Booting from a ZFS filesystem as the root filesystem requires a ZFS-capable kernel and an initial ramdisk (initramfs) which has the ZFS userspace utilities. The easiest way to set this up is as follows.

First, make sure to have compiled a kernel with ZFS support and to have installed it.

Distribution Kernel

The simplest way of getting started is to use a Distribution Kernel.

FILE /etc/portage/package.use
# Install an initramfs
sys-kernel/gentoo-kernel initramfs
# Ensure ZFS gets rebuilt when the kernel is upgraded
sys-fs/zfs-kmod dist-kernel

Install the kernel:

root #emerge --ask sys-kernel/gentoo-kernel

Manual install

Build a kernel and run make install to copy it to /boot/ and make modules_install to make the modules available at boot time.

Install and configure genkernel.

root #emerge --ask sys-kernel/genkernel
root #genkernel initramfs --zfs

Once the kernel is set up, it is time to set up the bootloader.

Install a bootloader, for example GRUB2.

root #emerge --ask sys-boot/grub:2

Configure GRUB to use ZFS and specify which dataset to boot from.

FILE /etc/default/grub Specify root device
GRUB_CMDLINE_LINUX="dozfs root=ZFS=mypool/mydataset"

Finally, install grub to your boot device and create the grub configuration.

root #grub-mkconfig -o /boot/grub/grub.cfg
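
After rebooting, it can be useful to confirm which dataset the kernel was told to use as root. The sketch below extracts the dataset from a kernel command line string of the form configured above. The helper name root_dataset is hypothetical.

```shell
# Hypothetical helper: print the dataset named by root=ZFS=... in a
# kernel command line string. $1 is deliberately left unquoted in the
# for loop so that it is split into words on whitespace.
root_dataset() {
    for word in $1; do
        case $word in
            root=ZFS=*) echo "${word#root=ZFS=}" ;;
        esac
    done
}

root_dataset "dozfs root=ZFS=mypool/mydataset"   # prints mypool/mydataset

# On a running system:
#   root_dataset "$(cat /proc/cmdline)"
```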

Caveats

  • Swap: On systems with extremely high memory pressure, using a zvol for swap can result in lockup, regardless of how much swap is still available. This issue is currently being investigated in [1]. Please check the current OpenZFS documentation on swap.

See also

  • Btrfs — a copy-on-write (CoW) filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, self-healing properties, and easy administration.
  • User:Bugalo/Dell_XPS_15_7590 - for an example of configuring a new system with Gentoo to have ZFS root with native compression and encryption, through the use of an Ubuntu live DVD as the install medium

External resources