Duperemove
From Gentoo Wiki
Duperemove is a btrfs and XFS tool for finding duplicated extents and submitting them to the kernel for deduplication.
Installation
Emerge
root #
emerge --ask sys-fs/duperemove
Usage
Detailed information can be seen by running man duperemove.
Invocation
root #
duperemove --help
duperemove v0.11
Find duplicate extents and optionally dedupe them.

Basic usage: duperemove [-r] [-d] [-h] [-v] [-A] [--hashfile=hashfile] OBJECTS

"OBJECTS" is a list of files (or directories) which we want to find duplicate extents in. If a directory is specified, all regular files inside of it will be scanned.

    <switches>
    -r                  Enable recursive dir traversal.
    -d                  De-dupe the results (must run on a supported fs).
    --hashfile=FILE     Store hashes in this file.
    -A                  Open files for dedupe in read-only mode.
    -h                  Print numbers in human-readable format.
    -v                  Print extra information (verbose).
    --help              Prints this help text.

Please see the duperemove(8) manpage for a complete list of options.
The following command shows how to deduplicate the /home filesystem; the hash file will be stored under the /root directory:
root #
duperemove -rdh --hashfile=/root/home.hash /home
Note
The previous command may be interrupted at any time with Ctrl+c and resumed later without risk of corrupting any data.
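According to the help text above, -d is the switch that submits duplicates for deduplication, so leaving it out should only scan and report. The following is a minimal sketch of a scan-then-dedupe sequence built on that assumption. The HASHFILE and TARGET variables and both function names are illustrative and not part of duperemove:

```shell
#!/bin/sh
# Illustrative names (assumptions, not part of duperemove):
# HASHFILE, TARGET, scan_only, scan_and_dedupe.
HASHFILE="${HASHFILE:-/root/home.hash}"
TARGET="${TARGET:-/home}"

# Scan recursively and report duplicate extents; without -d nothing
# is submitted to the kernel for deduplication.
scan_only() {
    duperemove -rh --hashfile="$HASHFILE" "$TARGET"
}

# Same scan with -d added, so the duplicates found are deduplicated.
scan_and_dedupe() {
    duperemove -rdh --hashfile="$HASHFILE" "$TARGET"
}
```

After sourcing the snippet, scan_only can be run first to review the report, followed by scan_and_dedupe. Pointing both at the same hash file should let the second pass reuse the stored hashes instead of recomputing them.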
Reading a file list created with fdupes
By passing the --fdupes option, duperemove can work in conjunction with fdupes in order to deduplicate a pre-calculated list of files. When in this mode, input will be accepted on stdin:
root #
cat fdupes_list.txt | duperemove --fdupes
This is handy when a list of duplicates has already been created, so that the disk-intensive deduplication job can be run at a time when the system is not under heavy load.
It is also possible to deduplicate directly from fdupes (without creating a file list):
root #
fdupes -r /path/to/filesystem/directory | duperemove --fdupes
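For the deferred case, where the list is created ahead of time and the deduplication runs later, the two steps can be sketched as separate shell functions. The LIST variable and the function names are hypothetical. The fdupes and duperemove invocations are the ones shown above, with the pipe replaced by a file:

```shell
#!/bin/sh
# Illustrative names (assumptions): LIST, build_list, dedupe_from_list.
LIST="${LIST:-fdupes_list.txt}"

# Step 1: record the duplicate files found under the directory given
# as the first argument.
build_list() {
    fdupes -r "$1" > "$LIST"
}

# Step 2 (later, when the system is not under heavy load): feed the
# saved list to duperemove on stdin.
dedupe_from_list() {
    duperemove --fdupes < "$LIST"
}
```

A possible sequence would be build_list /path/to/filesystem/directory during the day and dedupe_from_list overnight. Files that change between the two steps may no longer match the saved list, so keeping the gap short is a reasonable precaution.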
See also
- Deduplication — uses the clone mechanism of a copy-on-write (CoW) capable filesystem, a feature that allows copied but identical files to share data.
- fdupes — a tool for identifying duplicate files across a set of directories.
- btrfs — a copy-on-write (CoW) filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, self-healing properties, and easy administration.
- XFS — a high-performance journaling filesystem.