Duperemove

From Gentoo Wiki

Duperemove is a btrfs and XFS tool for finding duplicated extents and submitting them to the kernel for deduplication.

Installation

Emerge

root #emerge --ask sys-fs/duperemove

Usage

For detailed information, see man duperemove.

Invocation

root #duperemove --help

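The following command deduplicates the /home filesystem: it recurses into directories (-r), submits the duplicates it finds for deduplication (-d), and prints sizes in human-readable form (-h). The hash file is stored under the /root directory: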

root #duperemove -rdh --hashfile=/root/home.hash /home
Note
The previous command may be interrupted at any time with Ctrl+C and resumed later without risk of corrupting any data.
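
Resuming amounts to running the same command again with the same hash file. The following is a minimal sketch of a wrapper for that, not a part of duperemove: the function name dedupe_home and the DRY_RUN variable are hypothetical, and the default paths are simply those from the example above. With DRY_RUN=1 the command is only printed so it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with duperemove): re-issue the same
# invocation so an interrupted run can continue with the same hash file.
dedupe_home() {
    hashfile="${1:-/root/home.hash}"   # default taken from the example above
    target="${2:-/home}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be executed.
        echo "duperemove -rdh --hashfile=$hashfile $target"
    else
        duperemove -rdh --hashfile="$hashfile" "$target"
    fi
}
```

Calling dedupe_home with no arguments uses the defaults; a different hash file and target can be passed as the first and second argument.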

Reading a file list created with fdupes

With the --fdupes option, duperemove works together with fdupes to deduplicate a pre-calculated list of duplicate files. In this mode, the list is read from stdin:

root #cat fdupes_list.txt | duperemove --fdupes

This is handy when a list of duplicates has already been created, so that the disk-intensive deduplication job can be run at a time when the system is not under heavy load.
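
The split between building the list and deduplicating from it could be sketched as two small shell functions. This is an illustration under assumptions: the function names build_list and dedupe_from_list and the list path in LIST are hypothetical, and only the fdupes -r and duperemove --fdupes invocations come from this article.

```shell
#!/bin/sh
# Hypothetical two-phase helpers; names and the list path are illustrative.
LIST="${LIST:-/root/fdupes_list.txt}"

# Phase 1: find duplicate files below the given directory and save the list.
build_list() {
    fdupes -r "$1" > "$LIST"
}

# Phase 2: feed the saved list to duperemove on stdin.
dedupe_from_list() {
    # Refuse to run when the list is missing or empty.
    if [ ! -s "$LIST" ]; then
        echo "no duplicate list at $LIST" >&2
        return 1
    fi
    duperemove --fdupes < "$LIST"
}
```

For example, build_list /home can be run at any time, and dedupe_from_list can be started later, when the system is idle.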

It is also possible to deduplicate directly from fdupes (without creating a file list):

root #fdupes -r /path/to/filesystem/directory | duperemove --fdupes

See also

  • Deduplication — uses the clone mechanism of a copy-on-write (CoW) capable filesystem, a feature that lets copied but identical files share their data
  • fdupes — a tool for identifying duplicate files across a set of directories.
  • btrfs — a copy-on-write (CoW) filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair, and easy administration.
  • XFS — a high-performance journaling filesystem