Duplicates

NAME

File::Find::Duplicates - get a list of duplicate files in directories

SYNOPSIS

use File::Find::Duplicates;

my @dupes = find_duplicates( dirs => ["~/Pictures", "/camera/import"],
                             recursive => True, ignore_empty => True );
say "First set: {@dupes[0]».path.join(', ')}";
# Produces (as an example)
# "First set: ~/Pictures/IMG0001.jpg, /camera/import/IMG0001.JPG"

my @moredupes = "/copiedfiles".path.duplicates;

DESCRIPTION

File::Find::Duplicates finds files which are duplicates of each other by comparing file sizes and MD5 checksums. A hash collision between two different files of the same size is possible, but it is unlikely enough that most applications will never notice the difference. Symbolic links can still get you into trouble, though.
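The size comparison can be illustrated with a minimal sketch that uses only core Raku. The helper name same-size-groups is hypothetical and is not part of this module; whether the module groups files exactly this way is an assumption. The sketch only narrows the candidates: files sharing a size would still need a checksum or byte comparison before being called duplicates.

```raku
# Hypothetical helper, not part of File::Find::Duplicates.
# Groups paths by file size and keeps only sizes shared by two or more files.
sub same-size-groups(@paths) {
    my @files   = @paths.map(*.IO).grep(*.f);   # plain files only
    my %by-size = @files.classify(*.s);         # key: size in bytes
    return %by-size.values.grep(*.elems > 1);   # a unique size cannot have a duplicate
}

# Assumption: the current directory is just a convenient example.
for same-size-groups(dir('.')) -> @group {
    say @group».basename.join(', ');
}
```

Any group printed here is only a set of candidates; two files of equal size may still differ in content.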

The find_duplicates function is the main way to use this module, though a duplicates method for IO::Path objects is also provided. Both take the same arguments, with the exception of dirs. Both return an array of arrays, listing each set of duplicate files as IO::Path objects.

dirs

A required option, dirs specifies which directories to look in. Requires an array of paths (as ordinary strings), though it's okay if it only contains one item. In the method form, the invocant IO::Path object serves as the directory to search through, and this option is not required.

recursive

Specifies whether to descend through directories encountered; default is False. If set to a value like True, this module uses File::Find to descend the directory tree.

ignore_empty

Specifies whether or not we should bother to report empty files back as duplicates. Defaults to False, but any value that evaluates to true will omit results with file size = 0 bytes.

method

Takes "md5" (default) or "compare". The "md5" method uses Digest::MD5 to compare the content of files, which may cause some rare false positives. The other method, "compare", uses File::Compare to look at the individual bytes of files.
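As a rough sketch of how the options above might be combined, assuming the module is installed and behaves as documented here: the directory is the placeholder from the SYNOPSIS, and the .path call on a string follows the SYNOPSIS rather than any particular Perl 6 release.

```raku
use File::Find::Duplicates;

# Function form: dirs is required. "/copiedfiles" is the SYNOPSIS placeholder.
my @sets = find_duplicates(
    dirs         => ["/copiedfiles"],
    recursive    => True,        # descend into subdirectories
    ignore_empty => True,        # leave zero-byte files out of the results
    method       => "compare",   # byte-by-byte comparison instead of MD5
);

# Method form: the invocant is the directory, so dirs is omitted.
my @same = "/copiedfiles".path.duplicates( recursive => True );

# Each result is one set of duplicates, held as IO::Path objects.
for @sets -> $set {
    say $set.join(', ');
}
```

Choosing "compare" trades speed for certainty: it avoids the rare MD5 false positives mentioned above, at the cost of reading the files byte by byte.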

CLI Usage

This module can be called directly from the command line, where it emulates some of the functionality of fdupes. Due to a bug, some Perl 6 implementations might not call MAIN in a module, and you might have to comment out the module line to get it to work.

$ perl Duplicates.pm6 [options] directories

CLI Options

-r --recursive   Go through directories recursively
-S --size        Print size of duplicate files
-n --noempty     Don't include empty files in the results
-l --sameline    Print results on a single line (careful: fdupes uses -1 instead of -l)
-c --compare     Compare byte-by-byte rather than via MD5 hash

TODO

Probably optimize the code. Add options for ordering and file deletion.

SEE ALSO

* File::Find
* Digest::MD5

AUTHOR

Brent "Labster" Laabs, 2012-2013.

Released under the same terms as Perl 6; see the LICENSE file for details.

The Camelia image is copyright 2009 by Larry Wall. "Raku" is trademark of the Yet Another Society. All rights reserved.