Posts Tagged Amazon

Restoring and downloading S3 Glacier objects using s3cmd

I currently keep a portion of my backups on S3, with a life-cycle policy that moves objects to Glacier after a period of time. This makes the storage much cheaper ($0.01/GB/mo versus $0.03/GB/mo – Source), but has the downside that objects require a roughly 4-hour restore period before they become available for download. I have needed some objects quickly, and for those the 4-hour restore time isn’t worth the savings. Unfortunately, once an object has had this life-cycle applied, it can only be restored temporarily. To make it a standard S3 object again, you have to download it, delete the Glacier object, and then re-upload it. Doing all of that wasn’t quite as straightforward as I expected, but (I think) I figured out a way to get it done rather painlessly.

I’m going to be using s3cmd and a few cron jobs to automate this.

First, get s3cmd version 1.5. This version supports initiating restores of Glacier objects. You can recursively initiate a restore on every object in a bucket, but when it hits a non-Glacier object it stops. Likewise, you can use s3cmd to download all the objects in a bucket, but when it hits a Glacier object the download stops, and you end up with a zero-byte file. (Hey s3cmd developers, would you mind fixing this behavior, or at least adding a way to force progression on a failure, so we can walk through the entire bucket in one go?)
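For reference, a one-off recursive restore and fetch with s3cmd 1.5 looks something like this (the bucket name and destination folder are placeholders, and the commands assume configured AWS credentials, so treat this as a sketch):

```
# Initiate a temporary restore of every Glacier object in the bucket,
# keeping the restored copies available for 30 days (-D / --restore-days).
s3cmd -r -D 30 restore s3://bucketname/

# Once the restores have completed, fetch everything locally:
s3cmd -r sync s3://bucketname/ /destination_folder/
```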

The solution had to involve initiating restores, waiting at least 4 hours for the restores to complete, going back to download the restored data and delete it from the bucket, deleting any zero-byte files, and then doing it all over again later.

Ain’t nobody got time for that. Except cron. Cron has plenty of time for that.

First of all, make sure you have s3cmd installed and configured (with s3cmd --configure). Then you can configure the following script to run every 4 hours. I’m not going to go into much detail on this. If you’re familiar with s3cmd and Amazon S3/Glacier, you can probably figure out how it works. I wrote it as a short-term fix, but it’s worth sharing.
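If you save the script somewhere like /usr/local/bin/glacier-restore.sh (a path I’m making up for illustration), a crontab entry along these lines fires it every 4 hours and keeps a log of what happened:

```
# m h dom mon dow  command
0 */4 * * * /usr/local/bin/glacier-restore.sh >> /var/log/glacier-restore.log 2>&1
```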

#!/bin/bash

# This script should be fired every 4 hours from a cron job until all
# data from the desired bucket is restored.
# Requires s3cmd 1.5 or newer

# Temp file
TEMPFILE=~/.s3cmd.restore.tmp

# Bucket to restore data from. Use trailing slash.
BUCKET="s3://bucketname/"

# Folder to restore data to. Use trailing slash.
FOLDER="/destination_folder/"

# Because of the way s3cmd handles errors, the work has to happen in a
# particular order:
# 1: download/delete files from the bucket
# 2: run restore on the remaining (Glacier) objects
# 3: do housekeeping on the downloaded data

if [ ! -f "$TEMPFILE" ]
then
    touch "$TEMPFILE"
    echo "=== Starting download phase"
    s3cmd -r --delete-after-fetch --rexclude "/$" sync "$BUCKET" "$FOLDER"
    echo "=== Starting restore phase"
    s3cmd -r -D 30 restore "$BUCKET"
    echo "=== Starting cleanup"
    # s3cmd doesn't delete empty folders, and can create empty files. Clean this up.
    find "$FOLDER" -empty -delete
    # ...but that might also delete the target directory itself if nothing was
    # downloaded, so recreate it now
    mkdir -p "$FOLDER"
    rm "$TEMPFILE"
fi
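A note on the cleanup step: find -empty -delete removes both zero-byte files and empty directories, possibly including the destination folder itself, which is why the script recreates it afterward. Here is a self-contained demonstration in a scratch directory (no AWS involved):

```shell
# Build a scratch directory with a real file, a zero-byte file, and an empty dir.
tmp=$(mktemp -d)
mkdir "$tmp/emptydir"
touch "$tmp/zero-byte-file"
printf 'data' > "$tmp/real-file"

# The same cleanup the script runs: deletes empty files and empty directories.
find "$tmp" -empty -delete

ls "$tmp"   # only real-file remains
```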

Note that restore, download, and delete operations can incur extra costs. Be aware of that before proceeding.

So that’s it. I *should* have my entire S3 bucket downloaded completely within the next few days, and then I can migrate to what I hope is a simpler archiving plan.



Amazon S3 s3cmd s3tools on Ubuntu Server

If you want tools for Ubuntu Server to automatically sync files to Amazon S3, here’s how to set it up using S3tools.

First, add the Ubuntu repository and GPG key, as described here:

wget -O- -q http://s3tools.org/repo/deb-all/stable/s3tools.key | sudo apt-key add -

Add the repository lines automatically to a sources.list.d file:

sudo wget -O/etc/apt/sources.list.d/s3tools.list http://s3tools.org/repo/deb-all/stable/s3tools.list

Get s3tools:

sudo apt-get update && sudo apt-get install s3cmd

Configure s3tools:

s3cmd --configure

Supply your Access Key and Secret Key, and turn on optional encryption and HTTPS.
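Your answers end up in ~/.s3cfg. A trimmed example of what the relevant keys look like (the values here are placeholders):

```
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
use_https = True
gpg_passphrase = your-encryption-passphrase
```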

If you generated a new AWS Access Key just for s3tools (like I did), and the program reports the following…

ERROR: Test failed: 403 (InvalidAccessKeyId): The AWS Access Key Id you provided does not exist in our records.

… go here, log in, and then try s3tools again.

The rest of the documentation is here.

