Duperemove is a btrfs tool for finding duplicated extents and submitting them to the kernel for deduplication.
emerge --ask sys-fs/duperemove
Detailed information can be seen by running man duperemove.
duperemove v0.11 Find duplicate extents and optionally dedupe them. Basic usage: duperemove [-r] [-d] [-h] [-v] [-A] [--hashfile=hashfile] OBJECTS "OBJECTS" is a list of files (or directories) which we want to find duplicate extents in. If a directory is specified, all regular files inside of it will be scanned. <switches> -r Enable recursive dir traversal. -d De-dupe the results (must run on a supported fs). --hashfile=FILE Store hashes in this file. -A Open files for dedupe in read-only mode. -h Print numbers in human-readable format. -v Print extra information (verbose). --help Prints this help text. Please see the duperemove(8) manpage for a complete list of options.
The following command shows how to deduplicate the /home filesystem; the hash file will be stored under the /root directory:
duperemove -rdh --hashfile=/root/home.hash /home
The previous command may be interrupted at any time with Ctrl+c and resumed later without risk of corrupting any data.
Reading a file list created with fdupes
By passing the
--fdupes option, duperemove can work in conjunction with fdupes in order to deduplicate a pre-calculated list of files. When in this mode, input will be accepted on stdin:
cat fdupes_list.txt | duperemove --fdupes
This is handy when a list of duplicates has already been created so that disk-intensive deduplication job can be ran at a time when the system is not under heavy load.
It is also possible to deduplicate directly from fdupes (without creating a file list):
fdupes -r /path/to/filesystem/directory | duperemove --fdupes