Crazy Backup Solution

by Steven Noonan

I am thinking about attempting to install FreeBSD on my Mac again. The last time I tried, though, it wiped my GUID Partition Table (GPT) and my Master Boot Record (MBR). So basically, it trashed the system and I had to reformat again. It wasn’t lovely. So this time, I decided to back up important things. My source code is already backed up, thanks to Git. My preferences, bookmarks, etc. are all backed up by .Mac (oh, excuse me, “MobileMe”), so all that’s really left is my Applications folder and a few miscellaneous documents.

Normally, when backing up my Applications folder, the process is fairly straightforward:

  • Run AppFresh to get the latest versions of my apps.
  • Run Xslimmer to strip out localizations and PowerPC parts.
  • Hand-pick the applications I want to back up and copy them to a network share.

Unfortunately, this is prone to error, especially in the third step. It’s possible to click too quickly when multi-selecting, and open dozens of applications at once. I don’t much like that. So I hacked together a fairly simple (but quite nifty) solution to deal with backing up my applications.

Let’s start with the problem: there are several applications that I shouldn’t back up, because they’re provided by the Mac OS X installer (iTunes, Front Row, etc.). In fact, I can’t think of a single application made by Apple that doesn’t either come with the OS or need an installer in order to operate correctly (e.g. Aperture, Logic, Final Cut, iLife, iWork). So these applications need to be excluded while backing up.

So I started with a simple script that filters through the applications I’ve got and finds ones that need to be backed up:

filter.sh

#!/bin/bash

BANLIST="$(cat filter-blacklist)"
WHITELIST="$(cat filter-whitelist)"

# We don't want spaces to muck up the paths.
export IFS=$'\n'

# Run this from /Applications: the two lists, the queue and the
# app paths below are all relative to the current directory.
rm -f backup-queue
for a in $(find /Applications -mindepth 1 -maxdepth 1 -type d | sed 's/^\/Applications\///' | sort -f); do

	# Simple exclusion rule for Apple-provided apps.
	EXCLUDE=$(grep -A 1 CFBundleIdentifier "$a/Contents/Info.plist" 2> /dev/null | grep 'com\.apple')

	# We also run this through a blacklist and a whitelist, just
	# in case there's something that we _did_ want to be banned
	# or vice versa.
	if [ "$EXCLUDE" == "" ]; then
		BANNED=0
		for b in $BANLIST; do
			if [ "$a" == "$b" ]; then
				BANNED=1
				break
			fi
		done
	else
		BANNED=1
		for b in $WHITELIST; do
			if [ "$a" == "$b" ]; then
				BANNED=0
				break
			fi
		done
	fi

	# JUDGEMENT TIME
	if [ $BANNED -eq 0 ]; then
		echo "$a OK"
		echo "$a" >> backup-queue
	else
		echo "$a BANNED"
	fi
done
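
The grep in filter.sh matches ‘com.apple’ anywhere on the two lines it looks at. As a rough sketch of a stricter check, the following pulls out the identifier itself and tests whether it starts with ‘com.apple.’. The function names bundle_id and is_apple_app are made up for this example, and it assumes Info.plist is plain XML with the <string> value on the line right after the key; a binary plist would need a different tool.

```shell
#!/bin/bash

# Print the CFBundleIdentifier of the app bundle in $1.
# Assumption: Info.plist is XML text, with the <string> value
# on the line directly after the CFBundleIdentifier key.
bundle_id() {
	local plist="$1/Contents/Info.plist"
	[ -f "$plist" ] || return 1
	grep -A 1 '<key>CFBundleIdentifier</key>' "$plist" \
		| sed -n 's/.*<string>\(.*\)<\/string>.*/\1/p' \
		| head -n 1
}

# Succeed only if the identifier starts with "com.apple.".
# A bundle with no readable identifier counts as non-Apple.
is_apple_app() {
	case "$(bundle_id "$1")" in
		com.apple.*) return 0 ;;
		*) return 1 ;;
	esac
}
```

Plasma Pong would still be flagged by a check like this, so the whitelist stays necessary either way.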

The script requires two files: a whitelist (applications that should be backed up, but are excluded by the filter) and a blacklist (applications that shouldn’t be backed up, but pass the filter). Here are mine:

filter-blacklist

Adobe Bridge CS4
Adobe Device Central CS4
Adobe Dreamweaver CS4
Adobe Drive CS4
Adobe Extension Manager CS4
Adobe Flash CS4
Adobe Illustrator CS4
Adobe Media Encoder CS4
Adobe Media Player.app
Adobe Photoshop CS4
AppleScript
Microsoft Office 2008
NetBeans.app
Utilities
VirtualBox.app
VMware Fusion.app

 

filter-whitelist

Plasma Pong.app

A bit of explanation on my blacklist and whitelist… Plasma Pong.app is on the whitelist because the author specified that the CFBundleIdentifier is ‘com.apple.plasmapong’, which gets caught by the anti-Apple app filter (a CFBundleIdentifier of ‘com.plasmapong.plasmapong’ would be better). The folders/applications listed on the blacklist are ones which are either provided with the OS (‘AppleScript’, ‘Utilities’), or require an installer to function (‘NetBeans’, ‘VirtualBox’, ‘VMware Fusion’). Entries in both files have to match the folder name in /Applications exactly, ‘.app’ suffix included, since the script compares the two strings as-is.
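
The list lookups in filter.sh are plain exact-match comparisons, so they could also be sketched as a single grep call instead of a loop. The helper name in_list is hypothetical and this isn’t what the script above does; it’s just a shorter way to get the same whole-line match.

```shell
#!/bin/bash

# Succeed if the name in $1 appears as a complete line in the
# list file $2.
#   -F  treat the name as a fixed string, not a regex
#   -x  only match whole lines
#   -q  print nothing, just set the exit status
in_list() {
	[ -f "$2" ] && grep -Fxq -- "$1" "$2"
}
```

With that in place, a check like `if in_list "$a" filter-blacklist; then BANNED=1; fi` would do the work of the inner for loop. Names with parentheses or dots, such as ‘Geekbench (64-bit).app’, are safe because -F turns off regex matching.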

So anyway, the script uses these two files and its filter to figure out what apps should be included or excluded. Let’s look at what the script outputs in my case:

Alcarin:Applications steven$ ./filter.sh
0xED.app OK
Address Book.app BANNED
Adium.app OK
Adobe Bridge CS4 BANNED
Adobe Device Central CS4 BANNED
Adobe Dreamweaver CS4 BANNED
Adobe Extension Manager CS4 BANNED
Adobe Flash CS4 BANNED
Adobe Illustrator CS4 BANNED
Adobe Media Encoder CS4 BANNED
Adobe Media Player.app BANNED
Adobe Photoshop CS4 BANNED
Angband.app OK
Anxiety.app OK
Aperture.app BANNED
AppFresh.app OK
AppleScript BANNED
AppZapper.app OK
Arora.app OK
Audacity.app OK
Automator.app BANNED
BetterZip.app OK
blender 2.48a OK
Braid.app OK
Calculator.app BANNED
Canary.app OK
CandyBar.app OK
Chess.app BANNED
Chmox.app OK
coconutBattery.app OK
coconutIdentityCard.app OK
Coda OK
Colloquy.app OK
CrossOver Games.app OK
CrossOver.app OK
Cyberduck.app OK
DAA Converter.app OK
Darwinia.app OK
Dashboard.app BANNED
Dictionary.app BANNED
Disco.app OK
Dock Library.app OK
DOSBox.app OK
Doukutsu.app OK
DropCopy.app OK
DVD Player.app BANNED
Dwarf Fortress 0.28.181.40d OK
Expose.app BANNED
Firefox.app OK
Flip4Mac OK
Flock.app OK
Font Book.app BANNED
Freeciv.app OK
Front Row.app BANNED
FrostWire.app OK
Geekbench (64-bit).app OK
Geekbench (Rosetta).app OK
Geekbench.app OK
Gimp.app OK
GrandPerspective.app OK
GridWarsOSX OK
Hacker Evolution Untold.app OK
HandBrake.app OK
iCal.app BANNED
iChat.app BANNED
Image Capture.app BANNED
Inkscape.app OK
iPodDisk.app OK
iStumbler.app OK
iSync.app BANNED
iTunes.app BANNED
Jaikoz.app OK
Leopard Cache Cleaner.app OK
LiquidMac.app OK
Little Snitch Configuration.app OK
MacHeist Chat.app OK
MacPorts OK
Mactracker.app OK
Mail.app BANNED
Microsoft Office 2008 BANNED
Multiwinia.app OK
NetBeans OK
NetNewsWire.app OK
Nocturne.app OK
Opera 10 beta.app OK
Opera.app OK
Photo Booth.app BANNED
Picasa.app OK
Picturesque.app OK
Pixelmator.app OK
Plasma Pong.app OK
Preview.app BANNED
Privateer - Ascii Sector OK
Quicksilver.app OK
QuickTime Player.app BANNED
RealPlayer.app OK
Safari.app BANNED
Scribus.app OK
SeaMonkey.app OK
Senuti.app OK
Sequel Pro.app OK
Shimo.app OK
SiteSucker.app OK
SketchFighter 4000 Alpha OK
Skitch.app OK
Skype.app OK
smcFanControl.app OK
Smultron.app OK
Spaces.app BANNED
Speed Download 5 OK
Stickies.app BANNED
SubEthaEdit.app OK
System Preferences.app BANNED
TextEdit.app BANNED
The Unarchiver.app OK
Time Machine.app BANNED
Transmission.app OK
TrueCrypt.app OK
Twitterrific.app OK
Unangband.app OK
Utilities BANNED
Ventrilo.app OK
VLC.app OK
VMware Fusion.app BANNED
Vuze.app OK
WebKit.app OK
World of Goo.app OK
X-Chat Aqua.app OK
Xbench.app OK
Xslimmer.app OK
Yep.app OK
ZAngband.app OK
Alcarin:Applications steven$

Alright, looks good. filter.sh also generates a file called ‘backup-queue’ which contains all the apps which passed the filter. So now, step two is a two-part process:

  • Archive the apps (gzipped tarball works for this)
  • Copy the tarballs to a network share

So I created a couple of scripts for this (they can be run in parallel, which is fantastic for a couple of reasons that I’ll outline below).

backup.sh

#!/bin/bash

cd /Applications

QUEUE="$(cat backup-queue 2> /dev/null)"

# We don't want spaces to muck up the paths.
export IFS=$'\n'

echo "Running backup queue..."

rm -iv *.tar.gz.lock *.tar.gz

for a in $QUEUE; do

	echo -n "Creating $a.tar.gz... "

	# Create a lock so that the move script won't touch
	# the file until we're done with it here.
	touch "$a.tar.gz.lock"

	# Do the actual work.
	tar -czpf "$a.tar.gz" "$a"

	# Mark this tarball complete.
	rm -f "$a.tar.gz.lock"

	echo "OK"

done

echo "All done."
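
The lock file exists so that a half-written tarball is never picked up by the move script. A different way to get the same guarantee, sketched here as an alternative rather than what backup.sh does, is to write the archive under a temporary name and rename it once tar has finished. The function name archive_atomically and the ‘.part’ suffix are assumptions made for this example; the approach relies on a rename within one filesystem being atomic.

```shell
#!/bin/bash

# Archive the directory $1 into $1.tar.gz without ever exposing
# a partially written file under its final name.
archive_atomically() {
	local name="$1"
	local partial="$name.tar.gz.part"   # hypothetical temporary suffix

	# Write the archive under the temporary name first.
	if tar -czpf "$partial" "$name"; then
		# Renaming within one filesystem is atomic, so a watcher
		# sees either no tarball or a complete one.
		mv -f "$partial" "$name.tar.gz"
	else
		# Don't leave a broken archive behind.
		rm -f "$partial"
		return 1
	fi
}
```

A mover that only looks for names ending in ‘.tar.gz’ would skip the ‘.part’ files on its own, so no lock check would be needed.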

 

move.sh

#!/bin/bash

DESTINATION="$1"

# The user doesn't know what he's doing, clearly.
if [ "$DESTINATION" == "" ]; then
	echo "Please provide a destination directory."
	exit 1
fi

# I spoke too soon. _NOW_ the user has no idea what they're doing.
if [[ ( ! -d "$DESTINATION" ) || ( ! -w "$DESTINATION" ) ]]; then
	echo "Destination specified is not a writable directory."
	exit 1
fi

# We don't want spaces to muck up the paths.
export IFS=$'\n'

cd /Applications

while true; do

	# Stays zero unless something actually gets moved.
	DIDWORK=0

	QUEUE="$(ls | grep '\.tar\.gz$')"
	for a in $QUEUE; do

		# If the lock file for the tarball doesn't exist,
		# we assume that backup.sh has finished its work.
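		if [ ! -f "$a.lock" ]; then
			echo "Moving $a to $DESTINATION/..."

			# Finally move it to the backup storage
			# directory. Only count it as work if the
			# move succeeded, so a failing destination
			# doesn't turn this into a busy loop.
			if mv "$a" "$DESTINATION/" &> /dev/null; then
				DIDWORK=1
			fi
		fi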
	done

	# We sleep if no work is done because otherwise
	# this infinite loop would waste quite a few CPU
	# cycles.
	if [ $DIDWORK -eq 0 ]; then
		echo "Nothing to do. Sleeping..."
		sleep 2.5s
	fi

done
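
The ‘is this tarball finished?’ test in move.sh can also be pulled out into a small function that uses a shell glob instead of parsing ls output. This is only a sketch under the same lock-file convention as above; the name ready_tarballs is made up for the example.

```shell
#!/bin/bash

# Print, one per line, every tarball in directory $1 that has
# no matching .lock file next to it.
ready_tarballs() {
	local dir="$1" f
	for f in "$dir"/*.tar.gz; do
		# With no matches the pattern stays literal; skip it.
		[ -f "$f" ] || continue
		# A lock means backup.sh is still writing this one.
		if [ -e "$f.lock" ]; then
			continue
		fi
		# Strip the directory part, leaving just the file name.
		printf '%s\n' "${f##*/}"
	done
}
```

The main loop could then iterate over `$(ready_tarballs /Applications)` with IFS set to a newline, just like the existing scripts do.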

You might wonder why I didn’t just have it tarball directly to the remote server. The reason is fairly simple. If the network is slow enough, tar can’t push data over it as fast as it produces it, and the archiving process grinds to a halt while waiting for the network to catch up. And if the bottleneck happens to be the archiving speed instead, the separate move step doesn’t slow it down, either. Running the two scripts in parallel keeps both the CPU and the network busy, so neither sits idle waiting for the other.
