Simple Linux BASH script to locate and delete duplicate photos

This is a quick bash script I wrote to walk through directories and delete duplicate photos based on their MD5 checksums.

It was written for a specific scenario and I highly advise against using it as-is. Instead, read through it and tweak it to your own situation. I would appreciate any feedback.

#!/bin/bash
# HF must be set before running: a scratch file that collects the
# hashes seen so far, e.g. HF=/tmp/photo-hashes.txt
echo "HF=$HF"
rm -f "$HF"
touch "$HF"
for x in */; do
    echo "x=$x"
    cd "$x"
    for y in *; do
        m=`md5sum "$y" | awk '{print $1}'`
        echo "$y: $m"
        g=`grep "$m" "$HF"`
        echo "g=$g"
        if [[ "$g" != "" ]]; then
            echo "MATCH!!"
            echo rm "$y"    # drop the leading "echo" to actually delete
        else
            echo "no match"
            echo "$m" >> "$HF"
        fi
    done
    cd ..
done
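The hash-extraction step on its own can be sanity-checked in isolation (the file name here is just a throwaway example):

```shell
# md5sum prints "<hash>  <name>"; awk '{print $1}' keeps only the hash.
printf 'hello\n' > /tmp/demo_photo.jpg
m=$(md5sum /tmp/demo_photo.jpg | awk '{print $1}')
echo "$m"    # prints the 32-character hex digest
```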


  1. #1 by John on September 16, 2012 - 9:03 am

    Interesting, I always enjoy looking at the different ways to attack a problem. I tend to do things more via command line w/ pipes – simply because I hardly ever remember the correct bash syntax.

    A different approach (incomplete):

    find . -type f -name "*" -exec md5sum {} \; | sort | uniq -d -w 32

    This produces a two-column output of the hash and filename of one of the duplicates. I assumed "-d" would give me multiple duplicate hits - but it appears to only output one of the dups if there are three or more matches for a single file.

    I'd be interested in which file was deleted - possibly relying on the file metadata to determine which one was the original (by date) if applicable. But that would add some more complexity. Your approach seems simple and straightforward to me.

    NOTE: Initially I thought rsync would have a feature to prune duplicates, but I couldn’t find one.
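Following up on the question of which copy to delete: here is a hedged sketch, not from the original script, of one way to keep the oldest file (by mtime) per hash and print the newer copies as deletion candidates. It assumes bash 4+, GNU coreutils (`stat -c '%Y'`), and filenames without embedded newlines; `list_dup_candidates` is a made-up name.

```shell
#!/bin/bash
# Sketch: print every duplicate file except the oldest copy of each hash.
# Assumes GNU stat/md5sum; the directory to scan is the only argument.
list_dup_candidates() {
  local dir=$1
  local hash file mtime
  declare -A kept_mtime kept_file   # hash -> oldest mtime / file seen so far
  while read -r hash file; do
    mtime=$(stat -c '%Y' "$file")
    if [[ -z "${kept_mtime[$hash]}" ]]; then
      # First time we see this hash: provisionally keep this file.
      kept_mtime[$hash]=$mtime
      kept_file[$hash]=$file
    elif (( mtime < kept_mtime[$hash] )); then
      # This copy is older, so the one kept so far becomes the candidate.
      echo "${kept_file[$hash]}"
      kept_mtime[$hash]=$mtime
      kept_file[$hash]=$file
    else
      echo "$file"
    fi
  done < <(find "$dir" -type f -exec md5sum {} \;)
}
```

As for listing all members of a duplicate group rather than one representative: GNU `uniq` also supports `-D` (`--all-repeated`), which prints every repeated line instead of `-d`'s one line per group.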