Backups on a Budget
Volume Number: 23 (2007)
Issue Number: 08
Column Tag: Programming
Build your own backup utility with AppleScript Studio
by José R.C. Cruz
Introduction
Today, we will look into three backup tools that ship with Mac OS X. We will learn how to use these tools through the Terminal window. We will then use AppleScript Studio to give one of these tools a user-friendly interface.
Backups, A Quick Primer
The reason for backups
The value of a computer lies not in its hardware or software features. It lies in the data stored on its hard drive, data that are critical to a user's daily activities. Some examples of critical data include active files such as documents and web pages. Others include static ones such as fonts, plug-ins, and preference files.
The potential for data loss is a harsh reality in any computing situation. Losing some, if not all, of these files can be disruptive and costly in terms of time and money. If the affected volume is largely intact, users can use specialized software to try to recover the lost files. But this type of recovery becomes ineffective when the volume itself is lost or destroyed. A more effective solution is to use a backup method.
Types of backups
The basic idea of a backup is to maintain copies of critical data from the source media. These copies are stored in a separate backup volume. That volume is then moved to a secure site after the backup. In the event of data loss, the backup media is retrieved from its site. Copies of affected data are then transferred from the volume back onto the target.
Backup systems fall into two categories. The first category groups these systems in terms of how the data is stored. The following are three basic types of systems in this category.
a. unstructured An unstructured backup (Figure 1) stores the data as a simple collection. It does not organize the data in any meaningful form. It also does not keep track of which data was revised or removed in the computer later on. Instead, users have to do the extra steps needed to manage the backed up data.
Figure 1. An unstructured backup.
b. full+incremental In a full+incremental backup (Figure 2), the first backup consists of all the data chosen by the user. This is also known as a full store. Then later backups consist only of data that were either revised or removed. All affected data are stored in a special store known as an incremental. The backup process then places each incremental in front of the full store. Restoring the latest data set consists of two steps. First, the process restores the full store back onto the computer. It then updates the restored data with each incremental, starting from the oldest to the newest.
Figure 2. A full+incremental backup.
c. mirror+incremental The mirror+incremental works in the same fashion as the full+incremental. But they differ in how they manage their incrementals. The mirror+incremental (Figure 3) updates its full store with the latest data revisions. It moves older revisions of the data into an incremental. It also takes the same approach for those data that were deleted. Then, it places each incremental after the full store. Now, restoring the latest data state is a simple process. All that is required is to copy the full store back onto the computer.
Figure 3. A mirror+incremental backup.
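The restore order described for the full+incremental backup can be tried out with a few throwaway tar files. The following is only a rough sketch: the directory layout and the file names (0-full.tar, 1-incr.tar, and so on) are hypothetical, and it covers revised files only, not files that were removed.

```shell
#!/bin/sh
# Sketch: restore a full+incremental set, oldest store first.
# All paths and file names here are made up for illustration.
set -eu
work=$(mktemp -d)
mkdir -p "$work/src" "$work/store" "$work/restored"

# Full store: every file chosen by the user.
echo "v1" > "$work/src/index.html"
echo "v1" > "$work/src/style.css"
tar -c -f "$work/store/0-full.tar" -C "$work/src" index.html style.css

# Two incrementals: each holds only the file revised since the last backup.
echo "v2" > "$work/src/index.html"
tar -c -f "$work/store/1-incr.tar" -C "$work/src" index.html
echo "v3" > "$work/src/index.html"
tar -c -f "$work/store/2-incr.tar" -C "$work/src" index.html

# Restore: the glob sorts by name, so the full store is extracted first,
# followed by each incremental from the oldest to the newest.
for store in "$work"/store/*.tar; do
    tar -x -f "$store" -C "$work/restored"
done

restored_index=$(cat "$work/restored/index.html")
restored_style=$(cat "$work/restored/style.css")
echo "index.html: $restored_index, style.css: $restored_style"
```

With a mirror+incremental set, the loop would not be needed for the latest state, since the full store alone already holds the newest revisions.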
The second category groups backup systems in terms of the type of data being backed up. Again, there are three basic types in this category.
a. file-directory copy The backup store consists of files and directories from the target (Figure 4). Most backup systems use native system calls to copy the data onto the backup volume. Some use their own routines to create the copies.
Figure 4. A file-directory copy backup.
b. filesystem dump The backup store consists of the filesystem data of the target (Figure 5). This data can either be the entire filesystem or just the parts that have changed. Some backup systems use native system calls to copy the filesystem data onto the backup volume. Some use specialized tools, others a server process.
Figure 5. A filesystem dump backup.
c. block-level incremental The backup store consists of the actual drive blocks from the target (Figure 6). Backup systems use low-level drivers to copy the blocks containing critical data onto the backup volume. This method requires close integration between the backup system and the target.
Figure 6. A block-level incremental backup.
The Backup Tools
Mac OS X comes with a number of shell tools that can be used to create backups. Readers need only to start a Terminal session in order to use these tools. So, let's take a look at three of these tools.
The dump and restore tools
The dump tool creates a backup of the target filesystem. This tool examines the target media and decides which filesystem structures should be backed up. The tool is located in the /sbin directory, and it is available to all users.
The basic syntax of the dump tool is as follows.
dump backup_level -f backup_media target_data
The tool supports other command options as well. To view these options, type info dump at the Terminal prompt.
The backup_level option sets the type of backup to be done. To do a full backup, set this option to 0. To do an incremental backup, set it to a value in the range of 1 to 9. If this option is not set, the tool will assume a value of 1.
The backup_media parameter sets the location of the backup volume. The location can be a device node, a separate file, or stdout. For example, if the backup volume is /dev/disk5, then pass the node as follows.
dump -f /dev/disk5 target_data
Finally, the target_data parameter sets the location of the data on the target volume. This can be either a directory or the entire volume itself.
The opposite of the dump tool is the restore tool. This tool retrieves the backed up filesystem data and puts it back onto the target volume. It is also located in the /sbin directory, and available to all users.
The basic syntax for the restore tool is shown below.
restore <-r | -t> -f backup_media
The -r option restores the filesystem data from backup_media onto the current directory. The -t option lists the files contained in the backup. The tool supports other options as well. To view these options, type info restore at the Terminal prompt.
The dump and restore tools do have one shortcoming: they lack support for the HFS+ filesystem. Any attempt to back up an HFS+ volume will generate the following error message.
DUMP: bad sblock magic number
Nevertheless, these same tools will work if the target volume is formatted as UFS.
The tar tool
An alternative to the dump and restore tools is the tar tool. The basic syntax of the tool is shown below.
tar backup_options --file=backup_media target_data
Once again, the backup_media parameter sets the location of the backup volume. It can either be a device node or an actual file. The target_data parameter sets the location of the data on the target volume. It can either be a file, a list of files or a directory.
The backup_options parameter controls the backup and recovery process. It can consist of one or more command options. The tar tool supports a wide range of command options. However, we will focus only on those options that are relevant to the topic.
Assume that we want to back up our web pages in the directory Sites, which is located in our home directory. To create a basic full backup, type the following statement at the Terminal prompt.
tar --create --verify --label=full_backup_200704 \
--file=/Volumes/Backup/foobar.tar Sites/*
The tool first creates the backup store, foobar.tar, on the Backup volume. Next, it adds each web page to the store, and verifies that the page is added correctly. Then, the tool assigns the volume label full_backup_200704 to the backup store. This label helps identify this backup from other backups on the same volume.
To list the contents of foobar.tar, use the --list command option.
tar --list --file=/Volumes/Backup/foobar.tar
To remove a file from foobar.tar, e.g. Sites/styles/foobar.css, use the --delete option.
tar --delete --file=/Volumes/Backup/foobar.tar Sites/styles/foobar.css
To retrieve a file from foobar.tar, e.g. Sites/pages/links.htm, use the --get option.
tar --get --file=/Volumes/Backup/foobar.tar Sites/pages/links.htm
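Before trusting a backup store, it helps to rehearse the create, list, and retrieve steps on scratch data. The sketch below does this in a temporary directory. The sample files are invented for the test, and it uses --extract (the option that --get stands for in GNU tar) and leaves out --label and --delete, since those may not be present in other tar implementations.

```shell
#!/bin/sh
# Sketch: create a store, list it, and pull one file back out.
# The Sites tree below is sample data built just for this test.
set -eu
work=$(mktemp -d)
mkdir -p "$work/Sites/pages" "$work/Sites/styles" "$work/scratch"
echo "<a href='#'>links</a>" > "$work/Sites/pages/links.htm"
echo "body { margin: 0; }" > "$work/Sites/styles/foobar.css"

archive="$work/foobar.tar"

# Create the backup store from inside the parent directory,
# so that the stored paths are relative (Sites/...).
( cd "$work" && tar --create --file="$archive" Sites )

# List the contents of the store.
listing=$(tar --list --file="$archive")
printf '%s\n' "$listing"

# Retrieve a single file into a scratch directory.
( cd "$work/scratch" && tar --extract --file="$archive" Sites/pages/links.htm )

# Compare the retrieved copy against the original.
if cmp -s "$work/Sites/pages/links.htm" "$work/scratch/Sites/pages/links.htm"; then
    status="match"
else
    status="differ"
fi
echo "retrieved copy: $status"
```

Extracting into a scratch directory, rather than over the original files, keeps a bad backup store from damaging the data it was meant to protect.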
To create a full+incremental backup, the tar tool needs to use a different set of command options. First, create a full backup with the following statement.
tar --create --file=/Volumes/Backup/foobar200704a.tar \
--listed-incremental=/Volumes/Backup/log/backup.log Sites/*
The tar tool creates the backup store foobar200704a.tar in /Volumes/Backup. Next, it creates a backup.log file in the directory /Volumes/Backup/log. But make sure to create the log directory before using the above statement. The tool itself will not create the log subdirectory; instead, it will generate an error if that directory is absent.
Now, assume that we made changes to the foobar.html and foobar.css files. To create an incremental backup, type the following statement at the Terminal prompt.
tar --create --file=/Volumes/Backup/foobar200704b.tar \
--listed-incremental=/Volumes/Backup/log/backup.log Sites/*
The tar tool now creates a second backup store named foobar200704b.tar. Next, it copies the two changed files, foobar.html and foobar.css, into this store, ignoring the other unchanged files. Finally, the tool updates the backup.log file with the new tracking information.
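The full and incremental steps above can be rehearsed on scratch data before pointing them at real files. The sketch below assumes that the tar on the command path is GNU tar, because --listed-incremental is a GNU tar option. It skips the run otherwise. The directory names and file contents are made up, and the one-second pauses are only there so that file timestamps fall clearly before or after each backup.

```shell
#!/bin/sh
# Sketch: a full backup followed by an incremental, using a snapshot log.
# Assumes GNU tar; every path below is sample data for the test.
set -eu
work=$(mktemp -d)
mkdir -p "$work/Sites" "$work/Backup/log"   # tar will not create the log directory
echo "<html>v1</html>" > "$work/Sites/foobar.html"
echo "body {}" > "$work/Sites/foobar.css"
echo "<html>index</html>" > "$work/Sites/index.html"

snapshot="$work/Backup/log/backup.log"
second_listing=""

if tar --version 2>/dev/null | grep -q "GNU tar"; then
    sleep 1
    # Full backup: the snapshot log does not exist yet, so everything is stored.
    ( cd "$work" && tar --create --file="$work/Backup/a.tar" \
        --listed-incremental="$snapshot" Sites )

    sleep 1
    # Revise one file, then run the same command against a new store.
    echo "<html>v2</html>" > "$work/Sites/foobar.html"
    ( cd "$work" && tar --create --file="$work/Backup/b.tar" \
        --listed-incremental="$snapshot" Sites )

    # The second store should list the directory and the revised file only.
    second_listing=$(tar --list --file="$work/Backup/b.tar")
    printf '%s\n' "$second_listing"
else
    echo "GNU tar not found; skipping the incremental sketch" >&2
fi
```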
The df tool
The df tool plays a special part in the backup and restore process. The tool displays a list of mounted volumes that can be accessed from the command-line. It also displays the amount of space available on those volumes. The tool also displays the device nodes assigned to each volume.
The basic syntax of the tool is df display_options, where display_options is one or more command options. To view a list of these options, type info df at the Terminal prompt.
Using the tool is quite straightforward. For instance, to display a list of all mounted volumes, type df at the Terminal prompt. The tool will display a list similar to that shown by Listing 1.
Listing 1. Sample listing of the df tool.
Filesystem            512-blocks      Used     Avail  Capacity  Mounted on
/dev/disk0s3            31195136  15484680  15398512       50%  /
devfs                        202       202         0      100%  /dev
fdesc                          2         2         0      100%  /dev
<volfs>                     1024      1024         0      100%  /.vol
/dev/disk0s5             8126464   4511032   3615432       56%  /Volumes/Backups
/dev/disk0s9            10223616   2551800   7671816       25%  /Volumes/Projects
/dev/disk0s11            3932160   2822352   1109808       72%  /Volumes/Darwin
/dev/disk0s17           47742000  27840584  19901416       58%  /Volumes/Users
automount -nsl [199]           0         0         0      100%  /Network
To display only those volumes that are mounted locally, type df -l at the prompt. The output list will now resemble that shown by Listing 2.
Listing 2. Sample listing of df -l.
Filesystem            512-blocks      Used     Avail  Capacity  Mounted on
/dev/disk0s3            31195136  15484680  15398512       50%  /
/dev/disk0s5             8126464   4511032   3615432       56%  /Volumes/Backups
/dev/disk0s9            10223616   2551800   7671816       25%  /Volumes/Projects
/dev/disk0s11            3932160   2822352   1109808       72%  /Volumes/Darwin
/dev/disk0s17           47742000  27840584  19901416       58%  /Volumes/Users
To display the volume capacities using the K, M, G notations, type df -l -h at the prompt. The output will now change to the one shown by Listing 3.
Listing 3. Sample listing of df -l -h.
Filesystem      Size   Used  Avail  Capacity  Mounted on
/dev/disk0s3     15G   7.4G   7.3G       50%  /
/dev/disk0s5    3.9G   2.2G   1.7G       56%  /Volumes/Backups
/dev/disk0s9    4.9G   1.2G   3.7G       25%  /Volumes/Projects
/dev/disk0s11   1.9G   1.3G   542M       72%  /Volumes/Darwin
/dev/disk0s17    23G    13G   9.5G       58%  /Volumes/Users
The Backup Utility
We will now use AppleScript Studio to give a graphical interface to the df and tar tools. We will also write a number of scripts to manage the backup and restore process. Most Mac OS X users prefer the comfort and convenience of a user-friendly interface. They find that a graphical interface is much easier to use and remember than a line of command options.
To keep things simple, the utility will only do a full backup. It will store the backed up data in a single tar file named backup.tar. Users can add files, but not directories, to the backup process. Users can also add aliases, but the utility will not back up these types of files. Instead, it will back up the actual files referred to by the aliases.
Also, for reasons of length, this article will show only the principal scripts used by the utility. Readers can see the rest of the scripts by downloading the Xcode project, Conserve, from the MacTech website: ftp://ftp.mactech.com/src
The user-interface layout
Our backup utility has a single window, which then contains three tab panel views. The first panel view is the Volumes panel (Figure 7). This panel displays a list of mounted local volumes using the df tool. Selecting a volume enables the Next button. Also, the state of the checkbox Restore data from the volume will decide which panel view will be displayed next.
Figure 7. The Volumes panel view.
Suppose that the checkbox is left unchecked. Then the next panel view to be displayed is the Backup panel (Figure 8). To add file entries to the table, click on the upper right button marked with a <+>. To delete an entry, select the entry and click on the button marked with a <->. To clear all the entries, click on the button marked with a <0>. The Backup button (lower right) is enabled only when there is at least one entry on the table; otherwise, it remains disabled.
Figure 8. The Backup panel view.
Now suppose that same checkbox is checked. This time the next panel view to be displayed is the Restore panel (Figure 9). This panel shows all the files contained in the backup.tar file. Selecting one or more entries will enable the Restore button.
Figure 9. The Restore panel view.
Retrieving a list of volumes
Listing 4 shows the getVolumes handler. This handler first retrieves a path to the user's temporary items folder. It then executes the command statement df -l and stores the result in the temporary file vol_list. The contents of the file should resemble those shown in Listing 2.
Listing 4. The getVolumes handler (Conserve.applescript).
to getVolumes()
    local tCMD, tTmp
    -- retrieve a path to the temporary folder
    set tTmp to path to temporary items from user domain as string
    set tTmp to POSIX path of tTmp
    -- set the temporary file
    set tTmp to tTmp & "vol_list"
    -- prepare the script statement
    -- (the path is quoted in case it contains spaces)
    set tCMD to "df -l > " & quoted form of tTmp
    -- execute the script statement
    do shell script tCMD with altering line endings
    -- return the path to the temporary file
    return (tTmp)
end getVolumes
The next step is to extract the device node and volume path from each line of text in the vol_list file. The getVolumeNode handler (Listing 5) extracts the device node of each mounted volume. The getVolumePath handler (Listing 6) extracts the volume path for that same volume. Both handlers use similar parsing techniques to extract their data. They differ only in the direction taken by each technique.
Listing 5. The getVolumeNode handler (Conserve.applescript).
to getVolumeNode from theDat
    local tVol, tLen, tPos
    local tChr
    try
        -- initialize the following locals
        set tVol to ""
        set tLen to length of theDat
        repeat with tPos from 1 to tLen
            set tChr to character tPos of theDat
            if (tChr is equal to " ") then
                exit repeat
            else
                set tVol to tVol & tChr
            end if -- (tChr is equal to " ")
        end repeat -- with tPos from 1 to tLen
        -- return the string results
        return (tVol)
    on error tErr
        -- return the default string
        return ("")
    end try
end getVolumeNode -- from theDat
Listing 6. The getVolumePath handler (Conserve.applescript).
to getVolumePath from theDat
    local tVol, tLen, tPos
    local tChr, tEnd
    try
        -- initialize the following locals
        set tVol to ""
        set tLen to length of theDat
        -- scan backwards from the end of the line to the last space
        repeat with tPos from tLen to 1 by -1
            set tChr to character tPos of theDat
            if (tChr is equal to " ") then
                exit repeat
            else
                set tVol to (tChr & tVol) as string
            end if -- (tChr is equal to " ")
        end repeat -- with tPos from tLen to 1 by -1
        -- test the extracted string
        -- (a volume name containing spaces is cut at its last space,
        -- so it fails this test and is skipped)
        set tChr to character 1 of tVol
        if (tChr is equal to "/") then
            -- return the string results
            return (tVol)
        else
            -- return a null string
            return ("")
        end if -- (tChr is equal to "/")
    on error tErr
        -- return the default string
        return ("")
    end try
end getVolumePath -- from theDat
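For comparison, the same two values can be pulled out of the df output on the shell side. The sketch below is one possible approach, not the method used by the utility. It reads a sample in the format of Listing 2, in which the third volume has been renamed "My Projects" (a hypothetical name) to show a mount point that contains a space. It also assumes that the mount point starts at the sixth column, as in Listing 2; a df with a different column layout would need a different field number.

```shell
#!/bin/sh
# Sketch: extract the device node and mount point from df -l style output.
# The sample text is hypothetical and follows the layout of Listing 2.
set -eu
df_sample='Filesystem 512-blocks Used Avail Capacity Mounted on
/dev/disk0s3 31195136 15484680 15398512 50% /
/dev/disk0s5 8126464 4511032 3615432 56% /Volumes/Backups
/dev/disk0s9 10223616 2551800 7671816 25% /Volumes/My Projects'

# Skip the header line. The node is the first field; the mount point is
# the sixth field plus any fields after it, rejoined with single spaces.
volumes=$(printf '%s\n' "$df_sample" | awk 'NR > 1 {
    node = $1
    path = $6
    for (i = 7; i <= NF; i++) path = path " " $i
    print node "\t" path
}')

printf '%s\n' "$volumes"
```

Reading the mount point forward from a fixed column keeps names with spaces intact, although a run of several spaces inside a name would still be collapsed into one.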
Performing the backup process
Through the Backup panel (Figure 8), users select the files they want to add to the backup store. Once done, they then click on the Backup button to start the process. This button invokes the saveBackup handler shown in Listing 7. It also passes the path to the selected backup volume as the input argument.
First, the method constructs the path to the backup.tar file. Next, it prepares a date/time stamp, which will serve as the volume label. Then the method sets the following command statement.
tar --file=<path to the backup.tar file> --label=<date/time stamp>
When done, the method starts parsing the file paths that are stored in the list property pData. For the first file path, it executes the following tar statement.
tar --file=<path to the backup.tar file> --label=<date/time stamp> ¬
--create --verify <target file path>
For the rest of the file paths, it executes a different tar statement.
tar --file=<path to the backup.tar file> --label=<date/time stamp> ¬
--append <target file path>
Once the method finishes the backup process, it displays a modal dialog to inform the users of its success.
Listing 7. The saveBackup handler (Backup.applescript).
to saveBackup into theVol
    local tVol, tTag, tClk, tTar
    local tBck, tItm, tPth, tNew
    -- parameter check
    if (theVol is not equal to null) then
        -- prepare the backup tar file
        set tVol to theVol & "/backup.tar"
        -- prepare the time tag
        set tClk to current date
        set tTag to (year of tClk as integer) as string
        set tTag to tTag & (month of tClk as integer) as string
        set tTag to tTag & (day of tClk as integer) as string
        set tTag to tTag & (time of tClk as integer) as string
        -- prepare the shell command
        -- (paths are quoted in case they contain spaces)
        set tTar to "tar --file=" & quoted form of tVol
        set tTar to tTar & " --label=" & tTag
        -- start backing up the files
        set tNew to true
        repeat with tItm in pData
            -- retrieve a path to the file
            set tPth to fpth of tItm
            -- is this a new backup?
            if (tNew) then
                -- backup:file:create
                set tNew to false
                set tBck to tTar & " --create --verify "
                set tBck to tBck & quoted form of tPth
            else
                -- backup:file:update
                set tBck to tTar & " --append "
                set tBck to tBck & quoted form of tPth
            end if -- (tNew)
            -- execute the script
            try
                do shell script tBck
            on error tErr
                log tErr
            end try
        end repeat -- with tItm in pData
        -- tell the user that the backup is successful
        display dialog "Backup is successful" buttons {"OK"} ¬
            default button "OK" attached to window "demo"
    end if -- (theVol is not equal to null)
end saveBackup -- into theVol
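The create-then-append pattern used by the handler can also be tried directly from the shell. The sketch below is a simplified stand-in, not the utility's own code: the file names are invented, the --label option is left out because it is specific to GNU tar, and the date/time stamp is built with the date tool so that every field is zero-padded.

```shell
#!/bin/sh
# Sketch: create the store with the first file, append the rest.
# File names (with spaces, on purpose) are sample data for the test.
set -eu
work=$(mktemp -d)
mkdir -p "$work/docs"
echo "one" > "$work/docs/first file.txt"
echo "two" > "$work/docs/second file.txt"

# Date/time stamp with fixed-width fields, e.g. 20070412093005.
stamp=$(date +%Y%m%d%H%M%S)
archive="$work/backup-$stamp.tar"

is_new=true
for name in "first file.txt" "second file.txt"; do
    if [ "$is_new" = true ]; then
        # first file: create the backup store
        tar --create --file="$archive" -C "$work/docs" "$name"
        is_new=false
    else
        # remaining files: append to the existing store
        tar --append --file="$archive" -C "$work/docs" "$name"
    fi
done

entries=$(tar --list --file="$archive")
printf '%s\n' "$entries"
```

Quoting "$name" and "$archive" plays the same role here that quoting a path plays when a command string is assembled for do shell script: a file name with a space stays a single argument.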
Performing the recovery process
To perform the recovery process, users first select the files they want from the Restore panel (Figure 9). Then they click on the Restore button to start the process. The button invokes the restoreFiles handler shown in Listing 8. It then passes the path to the backup volume as the input argument.
The restoreFiles method first prompts users to select a destination for the restored files. Then it prepares the following command line statement.
cd <path to the recovery directory>; tar --extract ¬
--file=<path to the backup.tar file>
Next, the method retrieves the list of files selected from the Restore panel. It then parses through each file, and updates the command line statement as follows.
cd <path to the recovery directory>; tar --extract ¬
--file=<path to the backup.tar file> <file to be recovered>
The method then uses the do shell script command to execute the above statement. Once the entire recovery process is finished, the method displays a modal dialog to inform users of its success.
Listing 8. The restoreFiles handler (Restore.applescript).
to restoreFiles from theVol
    local tDst, tPth, tSel, tRow
    local tBsh, tTar, t
    if (theVol is equal to null) then
        return (false)
    else
        -- prompt the user for a destination volume
        try
            choose folder with prompt "Restore the backed up files into this directory "
            set tDst to result as string
            set tDst to the POSIX path of tDst
        on error tErr
            return (false)
        end try
        -- initialize the tar command
        -- (paths are quoted in case they contain spaces)
        set tTar to "tar --extract --file="
        set tTar to tTar & quoted form of (theVol & "/backup.tar") & " "
        -- initialize the shell command
        set tDst to "cd " & quoted form of tDst & "; "
        set tDst to tDst & tTar
        -- retrieve the selected rows
        set tSel to selected rows of pTable
        if (length of tSel > 0) then
            repeat with tRow in tSel
                -- retrieve the file path
                set tPth to item tRow of pData
                set tPth to fpth of tPth
                -- extract the file
                set tBsh to tDst & quoted form of tPth
                try
                    do shell script tBsh
                on error tErr
                    log tErr
                end try
            end repeat -- with tRow in tSel
            -- tell the user that restoration is successful
            display dialog "Restoration is successful" buttons {"OK"} ¬
                default button "OK" attached to window "demo"
        end if -- (length of tSel > 0)
    end if -- (theVol is equal to null)
end restoreFiles -- from theVol
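One thing to note about the listing above is that it logs each tar error but still reports success at the end. The sketch below shows, in shell form, one way the recovery loop could keep count of failed extractions and adjust its final message. The file names are invented, and missing.txt is deliberately absent from the store to trigger an error.

```shell
#!/bin/sh
# Sketch: restore selected files and count the ones that fail.
# All file names are sample data; missing.txt is not in the store.
set -eu
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "a" > "$work/src/a.txt"
echo "b" > "$work/src/b.txt"
tar --create --file="$work/backup.tar" -C "$work/src" a.txt b.txt

failures=0
for member in a.txt missing.txt b.txt; do
    # tar returns a non-zero status when the member is not in the store
    if ! tar --extract --file="$work/backup.tar" -C "$work/dest" "$member" 2>/dev/null; then
        failures=$((failures + 1))
    fi
done

if [ "$failures" -eq 0 ]; then
    message="Restoration is successful"
else
    message="Restoration finished with $failures error(s)"
fi
echo "$message"
```

The same idea carries over to the AppleScript handlers: a counter incremented inside the on error block would let the closing dialog tell users when some files were not backed up or restored.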
Final Thoughts
Mac OS X already comes with a number of tools for backing up critical files. You can access these tools through the Terminal window, or through AppleScript. You can also use AppleScript Studio to give these tools a nice user-friendly interface.
Backups are an important part of any computing process. They are an effective solution against data loss. Keep in mind, however, that backups are only effective if you do them periodically. Skipping a scheduled backup is tantamount to adding a crack in the proverbial armor. Perhaps this is why Mac OS X 10.5 comes with a continuous backup system known as Time Machine.
Bibliography and References
Free Software Foundation. "Perform Backups and Restoring Files". GNU Tar 1.16.1 (2006 Dec 7). Copyright 1992-2006. Free Software Foundation, Inc. Online: http://www.gnu.org/software/tar/manual/tar.html.
Wikipedia. "Backup" (2007 Apr 12). Copyright 2007. Wikimedia Foundation, Inc. Online: http://en.wikipedia.org/wiki/Backup.
Wiebe, James. "Types of Backup Systems". Backups for Everyone. Copyright 2007. WiebeTech LLC. Online: http://www.wiebetech.com/whitepapers.php.