fdupes

From Gentoo Wiki
Jump to: navigation, search

Resources

fdupes is a tool for identifying duplicate files across a set of directories. It works by scanning the specified directories for files, running md5sum on those files, then running a byte-by-btye comparison on the files. It can work in tandem with duperemove, a deduplication tool for btrfs.

Installation

Emerge

root #emerge --ask app-misc/fdupes

Configuration

fdupes has no configuration options other than optional command-line parameters.

Usage

Invocation

user $fdupes --help
Usage: fdupes [options] DIRECTORY...

 -r --recurse     	for every directory given follow subdirectories
                  	encountered within
 -R --recurse:    	for each directory given after this option follow
                  	subdirectories encountered within
 -s --symlinks    	follow symlinks
 -H --hardlinks   	normally, when two or more files point to the same
                  	disk area they are treated as non-duplicates; this
                  	option will change this behavior
 -n --noempty     	exclude zero-length files from consideration
 -f --omitfirst   	omit the first file in each set of matches
 -1 --sameline    	list each set of matches on a single line
 -S --size        	show size of duplicate files
 -m --summarize   	summarize dupe information
 -q --quiet       	hide progress indicator
 -d --delete      	prompt user for files to preserve and delete all
                  	others; important: under particular circumstances,
                  	data may be lost when using this option together
                  	with -s or --symlinks, or when specifying a
                  	particular directory more than once; refer to the
                  	fdupes documentation for additional information
 -N --noprompt    	together with --delete, preserve the first file in
                  	each set of duplicates and delete the rest without
                  	prompting the user
 -v --version     	display fdupes version
 -h --help        	display this help message

Find duplicate files recursively

To find duplicate files in target directories recursively the following command could be used:

user $fdupes --recurse --size /path/to/dir/one /path/to/dir/two
Note
If it is known in advance that some files down the tree will result in permissions conflicts (I.E. if some files will be owned by root or another user) be sure to run the fdupes command with appropriate privileges.

Most of the time, however, it is wise to redirect the output of the fdupes command to a file:

user $fdupes --recurse --size /path/to/dir/one /path/to/dir/two >> /tmp/fdupes_file_list.txt

Creating a file is a wise and efficient idea, especially when a large amount of files are being compared. It is much easier to look through a large file list with a text editor rather than attempting to parse the list via scroll back in a terminal buffer.

Find and delete files recursively

Users are strongly cautioned to run one of the above command(s) before running one of the next commands. This is done in order to verify the output is as expected. Do the operation right the first time; the fewer mistakes the better! After output is satisfactory, the following command can be used to delete all but the first occurrence of the file. Be sure to list the directories in the order of precedence so that the correct files are preserved. fdupes (IE, to keep all the files in home directory, listing the home directory last will make it show up first in the list)

The following command uses the --noprompt (-N) and --delete (-d) options to delete all but the first duplicate found in a file list (created using a previous command) without prompting the user:

user $fdupes --noprompt --delete --recurse /path/to/dir/one /path/to/dir/two

Removal

Unmerge

No special files need to removed. Uninstall fdupes via:

root #emerge --ask --unmerge app-misc/fdupes

See also

  • Duperemove - A tool that can submit duplicated extents to the kernel for deduplication (can read information directly from fdupes!).