Cleaning up after PhotoRec – Linux Edition

A friend of mine accidentally formatted her 2TB backup drive and brought it to me to see what I could do. Working from my Linux-based laptop, PhotoRec was the only tool that would give more than a hint of recoverable data. After letting it run for several hours, I had a mess: PhotoRec recovered over 230,000 files and spread them out over 430 directories!

Finding and sorting them by hand was obviously out of the question; it would take far too long. I took a hint from another recovery program I had [unsuccessfully] tried and decided to sort the files into directories based on their extensions: .jpg into ‘jpg’, .doc into ‘doc’, and so on.

However, since I was working with such a large number of files and directories, I had to take this a bit further. The script sorts files into numbered directory names like ‘jpg.1’, ‘jpg.2’, etc., to avoid putting too many files in a single directory and killing filesystem performance :)
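Before sorting anything, it can help to see what you're about to deal with. A rough pipeline like the following counts recovered files per lowercased extension (this is my own sketch, and it assumes GNU find for the -printf option):

```shell
# Count recovered files per extension (lowercased) before sorting.
# Run from the parent of the recup_dir.* directories.
# Files without a dot in the name are counted under "no_ext".
find recup_dir.* -type f -printf '%f\n' \
  | awk -F. 'NF > 1 { print tolower($NF) } NF == 1 { print "no_ext" }' \
  | sort | uniq -c | sort -rn | head
```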

I came up with the following bash script for use on Linux systems. A word of caution is due: this script does very little error checking, though the commands are quite sensible and I ran it as-is with fine results. I’m not responsible for unanticipated results.

Here is the script:

#!/bin/bash
#
# PhotoRec Cleanup Script for Linux - Written by Mike Beach http://mikebeach.org
# Last update: 2013-06-03
#
# This script is designed to clean up the mix of files stored in recup_dir directories
# created after a PhotoRec recovery of a large number of files. This script traverses the
# directories and sorts the files into new locations based on the file's extension,
# creating the new directories where needed. CAUTION: This script doesn't do nearly as
# much error checking as it should, so naturally, use at your own risk.
#
# This script should be run from the parent of any recup_dir.* directories
#
# Behavior:
# By default, this is set to 'cp' (copy) for safety.
# Change to 'mv' (move) if you want that behavior.
# Valid values are ONLY 'cp' or 'mv'.
# Anything else will generate a fatal error and a non-zero exit code.
BEHAVIOR='cp'
#
# Only edit anything below this line if you know what you are doing. If you do feel the need
# to edit something, please drop me a line to explain why so that I can improve the script.
# http://mikebeach.org/2011/08/21/cleaning-up-after-photorec/
#
echo
echo "PhotoRec Cleanup Script for Linux - Written by Mike Beach http://mikebeach.org"
echo
B=$BEHAVIOR
if [[ $B != 'cp' && $B != 'mv' ]]; then
echo "[E] Invalid behavior set. This can cause undesirable/unpredictable behavior.";
echo "[E] Please edit the script, check the behavior setting, and re-run.";
exit 1;
fi
P=$PWD
# Check for any recup_dir directory
GO=0
for z in recup_dir.*; do
# Without nullglob, an unmatched glob stays literal, so verify the match is a real directory
if [[ -d "$z" ]]; then GO=1; fi
done
if [[ $GO == 1 ]];
then echo "[I] $P/recup_dir.* exists, proceeding as planned.";
else echo "[E] $P/recup_dir.* not found.";
echo "Are you in the right directory? Stopping.";
exit 1;
fi
for x in recup_dir.*; do
echo "[I] Entering $x...";
cd "$x";
for y in *; do
# E is the extension of the file we're working on.
# Convert all extensions to lowercase
E=`echo "$y" | awk -F. '{print $2}' | tr '[:upper:]' '[:lower:]'`;
# fix for recovered files without extensions
if [[ "$E" == "" ]]; then E="no_ext"; fi
# C is the counter number; the 'number' of the directory that we're on
C=`echo "$x" | awk -F. '{print $2}'`;
# D is the destination pathspec. $P/$E/$E.$C -> (jpg) $PWD/jpg/jpg.1
D="$P/$E/$E.$C";
if [[ ! -d $D ]];
then 
mkdir -p "$D";
if [[ ! -d $D ]]; then
echo "[E] Creating directory $D failed. Aborting.";
exit 255;
fi
fi
$B -f "$y" "$D"; echo -n "."
done
cd ..;
if [[ $B == 'mv' ]]; then
echo "[I] Attempting to remove now-empty directory $x."
rmdir "$x";
fi
done
echo "[I] Complete."
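To use it, save the script under any name (peterh in the comments calls his copy prcleanup.sh) and run it from the parent of the recup_dir.* directories. Invoking bash explicitly matters on systems where /bin/sh is dash, which doesn't support the [[ ]] tests the script uses. The path here is only an example:

```shell
# Example invocation only -- /mnt/recovered is a placeholder path.
cd /mnt/recovered    # the directory containing recup_dir.1, recup_dir.2, ...
bash prcleanup.sh    # call bash explicitly; plain 'sh' may be dash
```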

Questions, comments, and feedback are welcome. Please leave them in the comments section below. Thank you!


  1. #1 by Frank Grant on March 24, 2012 - 8:27 am

    Thank you very much. Just what I needed.

  2. #2 by Frank Grant on March 29, 2012 - 1:24 pm

    There is an error at line 42: “elseecho” should be “else echo”.

    Also, it does not correctly handle the directory where the data is kept if it has spaces in the name. That then screws up the destination pathway by creating a directory for the first part of the name.

    Thanks

  3. #3 by Brian Carroll on October 5, 2013 - 10:44 am

    Thank you – your script indeed looks like what I need, or will need in a little while.

    I have PhotoRec running on a 2T drive at the moment. The destination is a share on a network connected NAS. The connection on the machine that is running the process is 100M.

    It has been running for 26+ hours now and it tells me that there is 147+ hours to go.

    So far I have recovered over a million files. Of these about 97% are .txt.

    As far as I know a process on the host machine “went rogue” when the target 2T drive was the main storage. In any event it filled up, and in trying to fix that problem I screwed up. I have 81k+ files of type *.dv and strongly suspect that a failed digital video capture session led to my initial problem.

    But there are digital video capture files there that I wish to recover, along with media files derived from them but there should be no more than a couple of dozen of them, at best.

    My main reason for doing what I am doing is to recover a small number of SVG, Scribus (*.sla), some PDF, edited photos, other graphics etc. that were on the machine. There was also an extensive fonts library being hosted there. If possible I would like to be able to recover the contents of some MySQL tables – but can live without that if I have to.

    When can I safely start running your script – do I have to wait another six days?!

    I have another faster machine with 1T+ space available to it on the network that can connect to the NAS at perhaps 1000M.

    Starting to weed the 200+ recup_dir.* folders that exist now would seem like a good idea, but as you have probably gathered I have only a vague amateur idea of what I am doing.

    Any help, advice or pointers would be appreciated.

    • #4 by Mike Beach on October 5, 2013 - 4:57 pm

      You can run the sort script now as long as you select the ‘cp’ (copy) instead of move behavior, and as long as you are copying the files somewhere outside of where you are recovering them to. You shouldn’t run into any unusual behavior, but mind that you may have to run it again as more files are recovered.

  4. #5 by peterh on December 27, 2013 - 6:32 pm

    Brilliant solution to doing something with PhotoRec’s output. I ended up with 1.25 million files from someone’s ‘dead’ backup drive. Most were web page fragments recovered as .txt files that I didn’t need, so I deleted them. But I still have 355,000 files in 16,000 folders.
    But I am getting an error – what am I doing wrong?
    prcleanup.sh: 29: prcleanup.sh: [[: not found
    prcleanup.sh: 40: prcleanup.sh: [[: not found
    [E] /home/peter/test/recup_dir.* not found.
    Are you in the right directory? Stopping.

    • #6 by Mike on December 27, 2013 - 8:00 pm

      Hey Peter,

      This looks like it could be an issue with the command interpreter. What OS are you using?

      • #7 by peterh on December 29, 2013 - 7:08 pm

        hi Mike
        FIXED
        I am running this: Mint 16 64-bit,
        and found that I have DASH by default, which doesn’t support [[.
        Running sudo bash “cleanupscript”.sh works a treat.

        thanks
        peter (an old newbie)

        • #8 by peterh on December 29, 2013 - 7:28 pm

          Sorry about my spelling.
          I had a further question on the subfolders, but worked out what your script was doing.
          It would be handy to dump 500 files into a folder, then start a new one.

          thanks again
          peter

          • #9 by Mike on December 29, 2013 - 8:03 pm

            Thanks for all the feedback, Peter!
