I haven't found a better duplicate file finder than fdupes, but its output is not in a format I find useful. The two scripts here are meant to be run on the output of fdupes: they read it in and put it into a more useful form.
The scripts assume the output was created with a command similar to
fdupes -rnS Dir/To/Check > dir.dups
The -r (recurse), -n (ignore empty files), and -S (show size) flags all matter; in particular, the scripts parse the sizes that -S prints. Since fdupes is pretty resource intensive, I save its output to a file for processing later.
summarizeDupes.pl takes the output of fdupes and simply summarizes it, letting you know how much space duplicates are wasting on the system. Use this first to find out if you really need to do anything more.
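The script itself is Perl, but the summarizing idea can be sketched in Python. This sketch assumes the fdupes -S group format, where each duplicate set begins with a line like "1048576 bytes each:" and sets are separated by blank lines; it is an illustration of the approach, not the script itself:

```python
import re

def summarize(fdupes_output):
    """Summarize fdupes -S output: (duplicate sets, duplicate files, wasted bytes).

    A minimal sketch. Assumes each set starts with 'N bytes each:' (the -S
    format) and that sets are separated by blank lines.
    """
    sets = files = wasted = 0
    size, count = None, 0
    for line in fdupes_output.splitlines():
        m = re.match(r'^(\d+) bytes each:$', line)
        if m:
            # Start of a new duplicate set: remember the per-file size.
            size, count = int(m.group(1)), 0
        elif line.strip() == '':
            # Blank line ends the current set; count all but one copy as waste.
            if size is not None and count > 1:
                sets += 1
                files += count
                wasted += size * (count - 1)
            size, count = None, 0
        elif size is not None:
            count += 1  # a file path belonging to the current set
    # Handle a trailing set with no final blank line.
    if size is not None and count > 1:
        sets += 1
        files += count
        wasted += size * (count - 1)
    return sets, files, wasted
```

Feeding it the saved dir.dups file would then be a matter of `summarize(open("dir.dups").read())`.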
optimizeDupes.pl assumes you do have a problem and want to clean it up. The output is sorted by which cleanups give you back the most space: 5 copies of a 10G file (40G reclaimable) will show up before 10 copies of a 1G file (9G reclaimable). The largest savings are at the top, with the 500 zero-byte files all the way at the bottom.
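That sort order amounts to ranking each duplicate set by its reclaimable space, size times (copies - 1). A minimal Python sketch, using an assumed (size, paths) intermediate form rather than the script's actual data structures:

```python
def rank_by_savings(groups):
    """Order duplicate sets so the biggest space savings come first.

    `groups` is a list of (size_in_bytes, [paths]) tuples -- an assumed
    intermediate form. Keeping one copy of each file, the reclaimable
    space for a set is size * (copies - 1).
    """
    return sorted(groups,
                  key=lambda g: g[0] * (len(g[1]) - 1),
                  reverse=True)
```

Note how a set of 500 zero-byte files scores 0 and sinks to the bottom, while a single pair of large files can outrank many small duplicate sets.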
1. Make sure fdupes is installed
2. Copy the two scripts to some local directory
3. Run fdupes -rnS against the directory to check, saving the output
4. Pipe the result into one or both of the scripts:
fdupes -rnS /home/joe > joe.fdupes
./summarizeDupes.pl < joe.fdupes | less
./optimizeDupes.pl < joe.fdupes > cleanUp.todo
Download the scripts here, or use wget http://unixservertech.com/scripts/duplicate_files.tgz