Duperemove

From Gentoo Wiki
Jump to: navigation, search
Resources

Duperemove is a btrfs tool for finding duplicated extents and submitting them to the kernel for deduplication.

Installation

Emerge

root #emerge --ask sys-fs/duperemove

Usage

Detailed information can be seen by running man duperemove.

Invocation

root #duperemove --help
duperemove v0.11
Find duplicate extents and optionally dedupe them.

Basic usage: duperemove [-r] [-d] [-h] [-v] [-A] [--hashfile=hashfile] OBJECTS

"OBJECTS" is a list of files (or directories) which we
want to find duplicate extents in. If a directory is 
specified, all regular files inside of it will be scanned.

	<switches>
	-r		Enable recursive dir traversal.
	-d		De-dupe the results (must run on a supported fs).
	--hashfile=FILE	Store hashes in this file.
	-A		Open files for dedupe in read-only mode.
	-h		Print numbers in human-readable format.
	-v		Print extra information (verbose).
	--help		Prints this help text.


Please see the duperemove(8) manpage for a complete list of options.

The following command shows how to deduplicate the /home filesystem; the hash file will be stored under the /root directory:

root #duperemove -rdh --hashfile=/root/home.hash /home
Note
The previous command may be interrupted at any time with Ctrl+c and resumed later without risk of corrupting any data.

Reading a file list created with fdupes

By passing the --fdupes option, duperemove can work in conjunction with fdupes in order to deduplicate a pre-calculated list of files. When in this mode, input will be accepted on stdin:

root #cat fdupes_list.txt | duperemove --fdupes

This is handy when a list of duplicates has already been created so that disk-intensive deduplication job can be ran at a time when the system is not under heavy load.

It is also possible to deduplicate directly from fdupes (without creating a file list):

root #fdupes -r -S /path/to/filesystem/directory | duperemove --fdupes

See also

  • Duperemove/Manual installation — This article will focus on building and using the duperemove utility from the newest ('live') source code directly out of the git repository.
  • fdupes — a tool for identifying duplicate files across a set of directories.
  • btrfs — a copy-on-write (CoW) filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair, and easy administration.