Duplicates
NAME
File::Find::Duplicates - get a list of duplicate files in directories
SYNOPSIS
use File::Find::Duplicates;
my @dupes = find_duplicates( dirs => ["~/Pictures", "/camera/import"],
                             recursive => True, ignore_empty => True );
say "First set: {@dupes[0]».path.join(', ')}";
# Produces (as an example)
# "First set: ~/Pictures/IMG0001.jpg, /camera/import/IMG0001.JPG"
my @moredupes = "/copiedfiles".path.duplicates;
DESCRIPTION
File::Find::Duplicates finds files that are duplicates of each other by comparing their sizes and MD5 checksums. Two different files of the same size can, in principle, produce the same MD5 checksum, but this is unlikely enough that most applications won't notice the difference. Symbolic links can still get you into trouble, though.
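The size-then-checksum idea can be sketched with core Perl 6 alone. This is a minimal illustration, not the module's own code: duplicate-sets is a hypothetical helper, and it uses the decoded file content as the grouping key where the module is documented as using an MD5 digest, so it reads whole files into memory.

```raku
# Hypothetical helper, not part of File::Find::Duplicates.
# Groups files by size first, so only files that share a size are read.
sub duplicate-sets(@paths, Bool :$ignore-empty = False) {
    my @files = @paths.map(*.IO).grep(*.f);
    @files .= grep(*.s > 0) if $ignore-empty;

    my @sets;
    for @files.classify(*.s).values -> @same-size {
        next if @same-size < 2;    # a unique size cannot have a duplicate

        # Stand-in for the MD5 step (an assumption of this sketch):
        # decoding raw bytes as ISO-8859-1 gives a string key that is
        # equal exactly when the bytes are equal.
        my %by-content =
            @same-size.classify(*.slurp(:bin).decode('iso-8859-1'));

        # Keep only groups that contain more than one file.
        @sets.append: %by-content.values.grep(* > 1);
    }
    return @sets;
}
```

Grouping by size first is cheap because it needs only file metadata; the content comparison then runs on the few files whose sizes collide.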
The find_duplicates function is the main interface to the module, though a duplicates method for IO::Path objects is also provided. Both take the same arguments, with the exception of dirs. Both return an array of arrays, listing each set of duplicate files as IO::Path objects.
dirs
A required option, dirs specifies which directories to look in. It takes an array of paths (as ordinary strings), though it's okay if the array contains only one item. In the method form, the invocant IO::Path object serves as the directory to search through, and this option is not required.
recursive
Specifies whether to descend into any directories encountered; the default is False. If set to a true value, this module uses File::Find to descend the directory tree.
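As a rough picture of what the recursive switch changes, here is a core-only sketch of a directory walk. The files-under helper is hypothetical; the module itself is documented as using File::Find, so this is a stand-in rather than its implementation. Skipping symbolic links is a choice made for this sketch, to avoid the link trouble mentioned in the description.

```raku
# Hypothetical helper; a stand-in for the File::Find-based traversal.
sub files-under(IO::Path $dir, Bool :$recursive = False) {
    gather for $dir.dir -> $entry {
        if $entry.l {
            next;                  # skip symbolic links entirely
        }
        elsif $entry.f {
            take $entry;           # plain file: report it
        }
        elsif $recursive and $entry.d {
            # Descend and re-emit everything found below this directory.
            take $_ for files-under($entry, :recursive);
        }
    }
}
```

Called without :recursive, the helper lists only the plain files directly inside the directory, which mirrors the documented default of False.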
ignore_empty
Specifies whether empty files are left out of the results. Defaults to False, so empty files are reported as duplicates of each other; any value that evaluates to true omits results whose file size is 0 bytes.
method
Takes "md5" (default) or "compare". The "md5" method uses Digest::MD5 to compare the content of files, which may cause some rare false positives. The other method, "compare", uses File::Compare to look at the individual bytes of files.
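To show why a byte-level check cannot produce false positives, here is a minimal core-only sketch of such a comparison. The same-bytes helper is hypothetical and is not how File::Compare is implemented; for simplicity it reads both files fully into memory.

```raku
# Hypothetical helper, not the module's or File::Compare's implementation.
sub same-bytes(IO::Path $a, IO::Path $b --> Bool) {
    # Files of different sizes can never be identical.
    return False unless $a.s == $b.s;

    # Compare the raw bytes of both files.
    return $a.slurp(:bin) eqv $b.slurp(:bin);
}
```

The trade-off is cost: every candidate pair is read and compared in full, whereas a checksum is computed once per file and then compared as a short string.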
CLI Usage
This module can be called directly from the command line, where it emulates some of the functionality of fdupes. Due to a bug, some perl6 implementations might not call MAIN in a module, and you might have to comment out the module line to get it to work.
$ perl6 Duplicates.pm6 [options] directories
CLI Options
-r --recursive   Go through directories recursively
-S --size        Print size of duplicate files
-n --noempty     Don't include empty files in the results
-l --sameline    Print results on a single line (careful: fdupes uses -1 instead of -l)
-c --compare     Compare byte-by-byte rather than via MD5 hash
TODO
Probably optimize the code. Add options for ordering and file deletion.
SEE ALSO
AUTHOR
Brent "Labster" Laabs, 2012-2013.
Released under the same terms as Perl 6; see the LICENSE file for details.