<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Simple Linux BASH script to locate and delete duplicate photos</title>
	<atom:link href="http://mikebeach.org/2012/09/15/simple-linux-bash-script-to-locate-and-delete-duplicate-photos/feed/" rel="self" type="application/rss+xml" />
	<link>http://mikebeach.org/2012/09/15/simple-linux-bash-script-to-locate-and-delete-duplicate-photos/</link>
	<description></description>
	<lastBuildDate>Wed, 19 Jun 2013 17:45:14 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: John</title>
		<link>http://mikebeach.org/2012/09/15/simple-linux-bash-script-to-locate-and-delete-duplicate-photos/#comment-704</link>
		<dc:creator>John</dc:creator>
		<pubDate>Sun, 16 Sep 2012 14:03:04 +0000</pubDate>
		<guid isPermaLink="false">http://mikebeach.org/?p=5173#comment-704</guid>
		<description><![CDATA[Interesting. I always enjoy looking at the different ways to attack a problem. I tend to do things on the command line with pipes, simply because I can hardly ever remember the correct bash syntax.

A different approach (incomplete):

find . -type f -name &quot;*&quot; -exec md5sum {} \; &#124; sort &#124; uniq -d -w 32

This produces two-column output: the hash and the filename of one file from each set of duplicates. I assumed &quot;-d&quot; would list every duplicate, but it prints only one line per set, even when three or more files match. GNU uniq&#039;s &quot;-D&quot; (--all-repeated) option prints them all.
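
A rough sketch of one way to list every file in each set of duplicates rather than a single representative. It assumes GNU coreutils (md5sum, and uniq with -w and --all-repeated), and the function name list_dupes is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every file under a directory whose MD5 hash
# matches that of at least one other file. Assumes GNU coreutils.
# Not handled: filenames containing newlines or backslashes, which md5sum
# escapes in a way that shifts the hash column.
list_dupes() {
    local dir="${1:-.}"
    # md5sum prints a 32-character hex hash, two spaces, then the path.
    # sort groups identical hashes together; uniq -w 32 compares only the hash.
    # --all-repeated=separate prints every member of each duplicate group,
    # with a blank line between groups.
    find "$dir" -type f -exec md5sum {} + | sort | uniq -w 32 --all-repeated=separate
}
```

Calling it on a photo directory, for example list_dupes ~/Pictures, would print each group of identical files one per line, with a blank line between groups. Nothing is deleted; choosing which copy to keep is still a separate step.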

I&#039;d be interested in which file was deleted - possibly relying on the file metadata to determine which one was the original (by date) if applicable.  But that would add some more complexity.  Your approach seems simple and straightforward to me.

NOTE: Initially I thought rsync would have a feature to prune duplicates, but I couldn&#039;t find one.]]></description>
		<content:encoded><![CDATA[<p>Interesting. I always enjoy looking at the different ways to attack a problem. I tend to do things on the command line with pipes, simply because I can hardly ever remember the correct bash syntax.</p>
<p>A different approach (incomplete):</p>
<p>find . -type f -name "*" -exec md5sum {} \; | sort | uniq -d -w 32</p>
<p>This produces two-column output: the hash and the filename of one file from each set of duplicates. I assumed &#8220;-d&#8221; would list every duplicate, but it prints only one line per set, even when three or more files match. GNU uniq&#8217;s &#8220;-D&#8221; (--all-repeated) option prints them all.</p>
<p>I&#8217;d be interested in which file was deleted &#8211; possibly relying on the file metadata to determine which one was the original (by date) if applicable.  But that would add some more complexity.  Your approach seems simple and straightforward to me.</p>
<p>NOTE: Initially I thought rsync would have a feature to prune duplicates, but I couldn&#8217;t find one.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
