Cleaning up after PhotoRec – Linux Edition

A friend of mine accidentally formatted her 2TB backup drive and brought it to me to see what I could do. PhotoRec was the only tool that would give more than a hint of recoverable data from my Linux-based laptop. After letting it run for several hours, I had a mess: PhotoRec recovered over 230,000 files and spread them out over 430 directories!

Finding and sorting them by hand was obviously out of the question, as it would take far too long. I took a hint from another recovery program I had [unsuccessfully] tried, and thought of sorting them into directories based on their file extensions: .jpg into ‘jpg’, .doc into ‘doc’, and so on.

However, since I was working with such a large number of files and directories, I had to take this a bit further. My script sorts files into directories with names like ‘jpg.1’, ‘jpg.2’, etc. These are numbered to avoid putting too many files in a single directory and killing system performance :)

I came up with the following bash script for use on Linux systems. A word of caution is due: This script does very little error checking, though the commands are quite sensible and I ran it as-is with fine results. I’m not responsible for unanticipated results.

UPDATE: This has moved to GitHub, here.
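
For anyone who just wants the idea, here is a minimal sketch of the approach described above. It is not the original script: the function name sort_by_ext, the default of 500 files per directory, and the ‘noext’ bucket for files without an extension are assumptions made for illustration. It needs bash 4 or newer (for associative arrays), quotes every path so that directory names containing spaces survive, and copies by default so the recovered files stay untouched. It does not guard against two files with the same name landing in the same target directory.

```bash
#!/usr/bin/env bash
# Sketch only, not the original script.
# Sorts files from SRC/recup_dir.*/ into DEST/<ext>.<n>/, at most LIMIT files each.
# sort_by_ext, LIMIT and the "noext" bucket are hypothetical names for illustration.

sort_by_ext() {
    local src=$1 dest=$2 limit=${3:-500} op=${4:-cp}
    local -A count=()      # files handled so far, keyed by extension
    local file base ext n dir

    # -print0 with read -d '' keeps names containing spaces or newlines intact
    while IFS= read -r -d '' file; do
        base=${file##*/}
        ext=${base##*.}
        if [[ $base != ?*.* || -z $ext ]]; then
            ext=noext      # no usable extension (e.g. "README", ".hidden", "a.")
        else
            ext=${ext,,}   # lower-case so JPG and jpg share a bucket
        fi

        n=${count[$ext]:-0}
        dir="$dest/$ext.$(( n / limit + 1 ))"   # jpg.1, jpg.2, ...
        mkdir -p -- "$dir"
        "$op" -- "$file" "$dir/"
        count[$ext]=$(( n + 1 ))
    done < <(find "$src" -type f -path '*/recup_dir.*/*' -print0)
}
```

A call might look like `sort_by_ext "/path/to/recovery" "/path/to/sorted" 500 cp`, where both paths are placeholders; passing `mv` as the fourth argument moves the files instead of copying them.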

Questions, comments, and feedback are welcome. Please leave them in the comments section below. Thank you!

  1. #1 by Frank Grant on March 24, 2012 - 8:27 am

    Thank you very much. Just what I needed.

  2. #2 by Frank Grant on March 29, 2012 - 1:24 pm

    There is an error at line 42: ‘elseecho’ should be ‘else echo’.

    Also, it does not correctly handle the directory where the data is kept if it has spaces in the name. That then screws up the destination path by creating a directory for the first part of the name.


  3. #3 by Brian Carroll on October 5, 2013 - 10:44 am

    Thank you – your script indeed looks like what I need or will need in a little while

    I have PhotoRec running on a 2T drive at the moment. The destination is a share on a network connected NAS. The connection on the machine that is running the process is 100M.

    It has been running for 26+ hours now and it tells me that there are 147+ hours to go.

    So far I have recovered over a million files. Of these about 97% are .txt.

    As far as I know, a process on the host machine “went rogue” when the target 2T drive was the main storage. In any event it filled up, and in trying to fix that problem I screwed up. I have 81k+ files of type *.dv and strongly suspect that a failed digital video capture session led to my initial problem.

    But there are digital video capture files there that I wish to recover, along with media files derived from them, though there should be no more than a couple of dozen of them, at best.

    My main reason for doing what I am doing is to recover a small number of SVG, Scribus (*.sla), some PDF, edited photos, other graphics etc. that were on the machine. There was also an extensive fonts library being hosted there. If possible I would like to be able to recover the contents of some MySQL tables, but can live without that if I have to.

    When can I safely start running your script? Do I have to wait another six days?!

    I have another faster machine with 1T+ space available to it on the network that can connect to the NAS at perhaps 1000M.

    Starting to weed the 200+ recup_dir.* folders that exist now would seem like a good idea, but as you have probably gathered, I have only a vague amateur idea of what I am doing.

    Any help, advice or pointers would be appreciated.

    • #4 by Mike Beach on October 5, 2013 - 4:57 pm

      You can run the sort script now as long as you select the ‘cp’ (copy) behavior instead of move, and as long as you are copying the files somewhere outside of where you are recovering them to. You shouldn’t run into any unusual behavior, but mind that you may have to run it again as more files are recovered.

  4. #5 by peterh on December 27, 2013 - 6:32 pm

    Brilliant solution to doing something with PhotoRec’s output. I ended up with 1.25 million files from someone’s ‘dead’ backup drive. Most were web page fragments saved as .txt files that I did not need, so I deleted them. But I still have 355,000 files in 16,000 folders.
    But I am getting an error. What am I doing wrong?
    29: [[: not found
    40: [[: not found
    [E] /home/peter/test/recup_dir.* not found.
    Are you in the right directory? Stopping.

    • #6 by Mike on December 27, 2013 - 8:00 pm

      Hey Peter,

      This looks like it could be an issue with the command interpreter. What OS are you using?

      • #7 by peterh on December 29, 2013 - 7:08 pm

        hi Mike
        I am running this: Mint 16 64bit.
        and found that I have by default DASH which doesn’t support [[.
        running sudo bash “cleanupscript”.sh wirk a treat.

        peter (an old newbie)

        • #8 by peterh on December 29, 2013 - 7:28 pm

          sorry about my spelling.
          I had a further question on the subfolders but worked out what your script was doing.
          would be handy to dump 500 files into a folder then start a new one.

          thanks again

          • #9 by Mike on December 29, 2013 - 8:03 pm

            Thanks for all the feedback, Peter!
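
The ‘[[: not found’ errors in the thread above appear when a bash script is run by a plain sh such as dash. As a hedged illustration (not part of the original script), a guard like the following could sit at the top of a bash script so it stops with a clear message instead. The function name require_bash is hypothetical; BASH_VERSION is a variable that bash sets itself and that dash leaves unset.

```bash
#!/usr/bin/env bash
# Guard sketch: refuse to continue when the interpreter is not bash.
# require_bash is a hypothetical helper name used for illustration.

require_bash() {
    # Uses only POSIX sh syntax so the check itself also runs under dash.
    if [ -z "${BASH_VERSION:-}" ]; then
        echo "This script needs bash; try: bash $0" >&2
        return 1
    fi
}

require_bash || exit 1

# From here on, bash-only syntax such as [[ ... ]] is safe to use.
```

Starting the script explicitly with `bash script.sh`, as Peter did, sidesteps the problem as well.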