Archiving with Tar
Volume Number: 23 (2007)
Issue Number: 10
Column Tag: Programming
Learn how to use the tar tool
by José R.C. Cruz
Introduction
Today, we will look at the command-line tool tar. We will learn how to use this tool to create a basic archive. We will also learn how to use the tool to manage the contents of the archive. Finally, we will learn how to integrate the tar tool with the Xcode environment.
Version 10.4.2 of OS X ships with version 1.14 of the tar tool. Later versions of OS X may ship with newer versions of that tool. [Ed. note: 10.4.10 also contains tar version 1.14.]
So, start up a Terminal session and learn how to work with this flexible tool.
Tarballs and the Tar Tool
The tar tool is found in most Unix-based systems such as Mac OS X. It is located in the /usr/bin directory of the boot volume. The original function of the tool was to copy a group of files and store the copies on an external tape unit. Now, the tool can also combine those copies into an archive file called a tarball.
The tarball is a cross-platform archive. It can be shared across multiple platforms without any loss of data. A tarball is also safe since it does not allow any self-executing code. It is also easily compressed to reduce its overall size.
Both the tar tool and the tarball format are open-source. Users can download the project files and develop custom solutions for their archive needs. Also, the tar tool itself is a free tool.
Alternatives to tar
Mac OS X, however, provides other ways to archive files. One way is to use the DMG file. This file is created using either the Disk Utility or the hdiutil tool. Double-clicking on the file causes it to be mounted on the desktop. Files are then copied or moved to the mounted volume.
Most OS X developers use DMG files to distribute their Cocoa applications. Some use them to distribute installer packages. Users can also use a DMG file to back up data onto optical media such as CD-Rs. DMG files, however, are supported only on Mac OS X.
Another way to archive files is to use either a SIT or a ZIP file. A SIT file is created using either DropStuff or StuffIt Deluxe. A ZIP file is created using either DropZip or the zip command-line tool. These applications compress the files before adding them to the archive. To get the contents of the archive, a separate tool such as StuffIt Expander is often used.
Both SIT and ZIP files are cross-platform formats. Also, the SIT file preserves any resource forks found in the files. It even has the extra option of supporting passwords to encrypt its contents. [Ed. note - OS X created zip files will also preserve resource forks by using the AppleDouble format.]
But the SIT file is a closed format. The tools needed to create the file are not free; they must be purchased from a reputable vendor. The ZIP file, on the other hand, is an open format. But it lacks some of the features offered by the tar tool.
Basic Tar Usage
Using the tar tool is easy and straightforward. Most tar statements follow the same basic syntax shown below.
tar --file=path_to_tarball --subcommand [--subcommand] \
[path_to_payload [path_to_payload]]
The --file argument is the path of the tarball. The --subcommand argument is the action to be taken by the tool. A tar statement can have more than one subcommand. But in most cases, one subcommand is enough.
The last argument is the path to the file payload. There can be more than one payload path for a given tar statement. But there are subcommands that do not need this argument. These subcommands are often used to manage the tarball itself. We will learn more about them later in the article.
Creating a tarball
Assume that we have two files: foo.txt and bar.log. To create the tarball, foobar.tar, use the --create subcommand.
tar --file=foobar.tar --create foo.txt bar.log
First, the tar tool copies the two file payloads. It then combines the copies into a single tarball (Figure 1). Next, the tool adds header data (olive) to each archived payload. This data identifies the location and size of each payload in the archive.
Figure 1. Creating a tarball.
The example will create the tarball in the same directory as the payloads. To create the tarball in a separate directory, e.g. /Volumes/Downloads, pass the full path to the --file argument.
tar --file=/Volumes/Downloads/foobar.tar --create foo.txt bar.log
Also, make sure to specify at least one payload when creating the tarball. Otherwise, the tool refuses to create an empty tarball and returns an error.
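The steps above can be tried safely in a scratch directory. The following is only a sketch: it assumes a tar that accepts the long subcommand names used in this article (such as GNU tar), and the directory and file contents are made up for the demonstration.

```shell
# Work in a throwaway directory (hypothetical sample data)
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "a log entry" > bar.log

# Create the tarball from the two payloads
tar --file=foobar.tar --create foo.txt bar.log

# Read the member names back to confirm both payloads were archived
MEMBERS=$(tar --file=foobar.tar --list)
echo "$MEMBERS"

# Clean up the scratch directory
cd / && rm -rf "$WORK"
```

The payloads appear in the tarball in the same order in which they were named on the tar statement.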
Adding another file
Now assume that we have another file named nue.xml. To add this file to foobar.tar, use the --append subcommand.
tar --file=foobar.tar --append nue.xml
Figure 2. Adding a file to the tarball.
The tar tool copies the new payload to the end of the tarball (Figure 2). It then updates the header information to reflect the new addition.
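As a rough illustration of the append step, the sketch below builds a small tarball and then adds a third file to it. The scratch directory and file contents are assumptions made for the example, and a tar that accepts the long subcommand names (such as GNU tar) is assumed.

```shell
# Scratch directory with hypothetical sample files
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "a log entry" > bar.log
echo "<nue/>" > nue.xml

# Create the tarball, then append the new payload to its end
tar --file=foobar.tar --create foo.txt bar.log
tar --file=foobar.tar --append nue.xml

# The appended file should be listed last
MEMBERS=$(tar --file=foobar.tar --list)
echo "$MEMBERS"

cd / && rm -rf "$WORK"
```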
Retrieving a file
To retrieve a file, e.g. nue.xml, from foobar.tar, use the --extract subcommand.
tar --file=foobar.tar --extract nue.xml
The tar tool first searches the tarball for the specified file. If the file exists, the tool extracts a copy and saves it in the current directory (Figure 3). Otherwise, the tool returns an error message.
Figure 3. Retrieving a file from the tarball.
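To see both outcomes described above, the following sketch extracts one file that is in the tarball and then asks for one that is not. The file names, contents, and scratch directory are hypothetical; missing.txt is deliberately absent from the tarball.

```shell
# Scratch directory with hypothetical sample files
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "<nue/>" > nue.xml
tar --file=foobar.tar --create foo.txt nue.xml

# Remove the original so that only the archived copy remains
rm nue.xml

# Retrieve the file from the tarball into the current directory
tar --file=foobar.tar --extract nue.xml
RESTORED=$(cat nue.xml)

# Asking for a file that was never archived should fail
STATUS=0
tar --file=foobar.tar --extract missing.txt 2>/dev/null || STATUS=$?
echo "restored: $RESTORED, status for missing file: $STATUS"

cd / && rm -rf "$WORK"
```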
Removing a file
To remove a file, e.g. nue.xml, from the tarball, use the --delete subcommand.
tar --file=foobar.tar --delete nue.xml
Again, the tar tool searches the tarball for the specified file. If the file exists, the tool removes the file from the tarball (Figure 4). It returns an error message if the file does not exist.
Figure 4. Removing a file from the tarball.
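If the file to be deleted was archived from a different directory, make sure to use the path as it is stored in the tarball. By default, the tar tool strips the leading / from absolute paths when it archives a file, so check the stored name with the --list subcommand first. For instance, if bar.log was originally archived from /Library/Logs, type the tar statement as follows.
tar --file=foobar.tar --delete Library/Logs/bar.log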
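The removal step can be checked by listing the tarball afterwards. This sketch assumes GNU tar, since other tar implementations may not offer the --delete subcommand; the files and scratch directory are made up for the example.

```shell
# Scratch directory with hypothetical sample files
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "a log entry" > bar.log
echo "<nue/>" > nue.xml
tar --file=foobar.tar --create foo.txt bar.log nue.xml

# Remove one member from the tarball (GNU tar assumed)
tar --file=foobar.tar --delete nue.xml

# Only the remaining members should be listed
MEMBERS=$(tar --file=foobar.tar --list)
echo "$MEMBERS"

cd / && rm -rf "$WORK"
```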
Listing the contents
To list the contents of foobar.tar, use the --list subcommand.
tar --file=foobar.tar --list
The tar tool reads the header information of the tarball, and displays it as a simple list (Figure 5). For a more detailed list, add a --verbose subcommand to the statement.
tar --file=foobar.tar --list --verbose
The tar tool then displays the contents of the tarball, and the metadata of each archived file (Figure 6).
Figure 5. Listing the contents of the tarball.
Figure 6. Displaying a more detailed list.
Compressing the tarball
The tar tool stores the files into the tarball uncompressed. But it can use other tools to compress the tarball itself. The current version of the tool uses one of three compression tools: compress, bzip2, and gzip. To compress the tarball, select which compression tool to use when creating the tarball. Make sure to use the same tool when working with the compressed tarball.
For example, to create foobar.tar and compress it using the gzip tool, add the --gzip subcommand.
tar --file=foobar.tar.gz --create --gzip foo.txt bar.log
As usual, the tar tool combines the files foo.txt and bar.log into foobar.tar. Then, it uses gzip to compress the entire file (Figure 7).
Figure 7. Compressing with gzip.
If you want to use the compress tool, use the --compress subcommand. For the bzip2 tool, use the --bzip2 subcommand.
Make sure to add the correct file extension to the tarball's filename. The extension specifies which tool was used. For the compress tool, use the extension .Z. For the bzip2 tool, use .bz2; for the gzip tool, use .gz.
As stated earlier, use the same compression tool when managing the tarball. Otherwise, the tar tool will return a message stating that the file is not valid. For instance, to list the contents of foobar.tar.gz, type the tar statement as follows.
tar --file=foobar.tar.gz --list --gzip
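The tar tool cannot, however, append files to a compressed tarball; GNU tar reports that it cannot update compressed archives. To add nue.xml, first decompress the tarball, append the file, and then compress the tarball again.
gunzip foobar.tar.gz
tar --file=foobar.tar --append nue.xml
gzip foobar.tar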
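The gzip round trip can be sketched as follows. The example assumes that the gzip tool is installed and that tar accepts the --gzip subcommand; the sample files and scratch directory are hypothetical.

```shell
# Scratch directory with hypothetical sample files
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "a log entry" > bar.log

# Create the tarball and compress it with gzip in one statement
tar --file=foobar.tar.gz --create --gzip foo.txt bar.log

# gzip -t exits with status 0 only if the file holds valid gzip data
GZIP_OK=no
gzip -t foobar.tar.gz && GZIP_OK=yes

# List the contents, naming the same compression tool
MEMBERS=$(tar --file=foobar.tar.gz --list --gzip)
echo "valid gzip: $GZIP_OK"
echo "$MEMBERS"

cd / && rm -rf "$WORK"
```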
Advanced Tar Usage
The tar tool also has a number of subcommands to do various tasks. Some subcommands control how files are added to the tarball. Some control how files are extracted from the tarball. Others control how the tarball is managed.
For reasons of length, we will cover only those subcommands useful for daily tasks. You can, however, view the rest by typing info tar at the Terminal prompt.
Modifying the archived metadata
When you add a file to the tarball, the tar tool preserves the metadata assigned to the file. But you can use the right subcommand to override this behavior. Also, the subcommand affects only the archived file; it does not affect the original.
Assume you want to add the bar.log file to foobar.tar. To change the permissions of bar.log, use the --mode subcommand to pass the new settings.
tar --file=foobar.tar --append bar.log --mode=new_permissions
The tar tool makes a copy of bar.log and sets its permissions to new_permissions. The tool then adds the modified copy to the end of foobar.tar. For instance, Figure 8 shows the tool changing the permissions of bar.log from 444 to 747.
Figure 8. Changing the file permissions.
In the above example, the new permissions are passed as octal values. But you can also pass a formatted string to the --mode subcommand. This is handy when you find it difficult to think in octal terms. For instance, instead of passing the octal value 747, pass the string value of uo=rwx,g=r. To learn how to write permissions as formatted strings, type info chmod at the Terminal prompt.
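A quick way to confirm that only the archived copy changes is to compare the verbose listing against the original file. The sketch below assumes GNU tar (for the --mode subcommand and the layout of its verbose listing) and uses a made-up bar.log with permissions 444.

```shell
# Scratch directory with a hypothetical read-only log file
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "a log entry" > bar.log
chmod 444 bar.log

# Archive the file, overriding its permissions in the tarball only
tar --file=foobar.tar --create --mode=747 bar.log

# Permissions recorded in the tarball (first field of the verbose listing)
ARCHIVED_MODE=$(tar --file=foobar.tar --list --verbose | cut -d' ' -f1)

# Permissions of the original file on disk (first ten characters of ls -l)
ORIGINAL_MODE=$(ls -l bar.log | cut -c1-10)

echo "archived: $ARCHIVED_MODE  original: $ORIGINAL_MODE"

cd / && rm -rf "$WORK"
```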
You can also use the tool to change the uid and gid of the archived file. The uid (user id) specifies the user who owns the file. By default, this is either your username or root. The gid (group id) specifies the group that owns the file. This is often either admin or wheel.
Assume again that you are adding bar.log to foobar.tar. To change the uid of bar.log, use the --owner option.
tar --file=foobar.tar --append bar.log --owner=new_uid
The tar tool copies bar.log and sets its uid to new_uid. It then adds the modified copy to the end of foobar.tar. For instance, Figure 9 shows the tool changing the uid of bar.log from john to smith.
Figure 9. Changing the file uid.
Now to change the gid of bar.log, use the --group option.
tar --file=foobar.tar --append bar.log --group=new_gid
Again, the tar tool copies bar.log and sets its gid to new_gid. As usual, it adds the modified file to the end of foobar.tar.
You can also change both uid and gid at the same time. To do so, use the --owner and --group options on the same statement.
tar --file=foobar.tar --append bar.log --owner=new_uid --group=new_gid
Make sure that new_uid and new_gid exist. Otherwise, the tar tool will return an error message stating that these values are not valid. Use the NetInfo Manager tool to check if these values exist on your OS X system.
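The ownership override can be sketched in the same way. To avoid depending on account names that may not exist on a given system, this example uses the numeric ids 0 and 0, which is an assumption made for the demonstration. GNU tar is assumed for the --owner, --group, and --numeric-owner subcommands.

```shell
# Scratch directory with a hypothetical log file
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "a log entry" > bar.log

# Archive the file with uid 0 and gid 0 recorded in the tarball
tar --file=foobar.tar --create --owner=0 --group=0 bar.log

# Show the recorded ids as numbers (second field of the verbose listing)
OWNERSHIP=$(tar --file=foobar.tar --list --verbose --numeric-owner | awk '{print $2}')
echo "archived uid/gid: $OWNERSHIP"

cd / && rm -rf "$WORK"
```

The original bar.log on disk keeps whatever uid and gid it had before.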
Selecting files for the tarball
When you select files for the tarball, you list their names or paths at the end of the tar statement. This becomes unwieldy when selecting large sets of files. Naturally, the tar tool supports other ways of adding files to the tarball.
One way is to copy the files into a separate directory. Then use the directory itself as the input argument. For example, assume the files are in the directory sample. To create foobar.tar using files from that directory, type the tar statement as follows.
tar --file=foobar.tar --create sample/*
Again, this statement assumes foobar.tar will be in the same directory as sample.
The wildcard character '*' at the end of sample is expanded by the shell, which passes every file in that directory to the tar tool for addition to foobar.tar. You can, however, use the --exclude subcommand to filter out specific files. Place this subcommand before the payload paths, since recent versions of GNU tar apply it only to the paths that follow it. For example, to exclude all XML files from foobar.tar, type the tar statement as follows.
tar --file=foobar.tar --create --exclude="*.xml" sample/*
The tar tool examines the contents of the directory sample. It then creates foobar.tar and adds the files from sample, except those with an .xml extension (Figure 10).
Figure 10. Adding and excluding files from a directory.
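The filtering behavior can be verified with a small directory of mixed files. In this sketch, the directory contents are hypothetical, and the --exclude subcommand is placed ahead of the payload paths so that it also works with recent versions of GNU tar.

```shell
# Scratch directory containing a hypothetical "sample" directory
WORK=$(mktemp -d)
cd "$WORK" || exit 1
mkdir sample
echo "some text" > sample/a.txt
echo "a log entry" > sample/b.log
echo "<c/>" > sample/c.xml

# Archive everything in sample except the XML files
tar --file=foobar.tar --create --exclude="*.xml" sample/*

# The XML file should be missing from the listing
MEMBERS=$(tar --file=foobar.tar --list)
echo "$MEMBERS"

cd / && rm -rf "$WORK"
```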
Make sure to enclose the filter pattern in double quotes. Avoid using regex patterns as they are not currently supported.
A second way of adding large sets of files is to use a text file containing a list of those files. Then pass the text file to the tar tool using the --files-from subcommand. For example, to use the file sample.list, type the following statement at the prompt.
tar --file=foobar.tar --create --files-from=sample.list
The tar tool retrieves the list of files from sample.list. It then creates foobar.tar using the files specified by the list (Figure 11). You can still use the --exclude subcommand to filter out unwanted files.
Figure 11. Creating the tarball using a list of files.
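As an illustration of the list-driven approach, the sketch below writes a two-line sample.list and hands it to the tar tool. The file names are made up; the list holds one path per line.

```shell
# Scratch directory with hypothetical sample files
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "a log entry" > bar.log
echo "<nue/>" > nue.xml

# The list names only the files we want, one path per line
printf 'foo.txt\nbar.log\n' > sample.list

# Create the tarball from the files named in the list
tar --file=foobar.tar --create --files-from=sample.list

# nue.xml is not in the list, so it should not be archived
MEMBERS=$(tar --file=foobar.tar --list)
echo "$MEMBERS"

cd / && rm -rf "$WORK"
```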
Selecting files from the tarball
When selecting files from the tarball, you list their names or paths at the end of the tar statement. The tool retrieves the specified files and saves them in the current directory. If the directory has a file with the same name as the extracted one, the tool will replace that file.
Of course, you can use a pattern to select which files to retrieve. For example, to retrieve only those files with a .log extension, use the *.log pattern at the end of the tar statement. Enclose the pattern in double quotes so that the shell does not expand it first. Newer versions of GNU tar also need the --wildcards subcommand before they match patterns during extraction.
tar --file=foobar.tar --extract "*.log"
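Pattern-based retrieval can be sketched as shown below. The example assumes a recent GNU tar, where the --wildcards subcommand is needed for patterns to be matched during extraction. The file names and the out directory are hypothetical.

```shell
# Scratch directory with hypothetical sample files
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "a log entry" > bar.log
echo "another entry" > sys.log
tar --file=foobar.tar --create foo.txt bar.log sys.log

# Extract only the .log files into a separate directory.
# The pattern is quoted so that the shell passes it to tar unchanged.
mkdir out
tar --file=foobar.tar --extract --directory=out --wildcards "*.log"

EXTRACTED=$(ls out)
echo "$EXTRACTED"

cd / && rm -rf "$WORK"
```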
You can also control how the tar tool replaces existing files. To replace only those files older than the ones in the tarball, use the --keep-newer-files subcommand.
tar --file=foobar.tar --extract --keep-newer-files
To stop the tool from replacing any file, use the --keep-old-files subcommand.
tar --file=foobar.tar --extract --keep-old-files
You can also save the retrieved files into a different directory. Use the --directory subcommand to pass the directory path to the tool. Make sure the directory exists, or the tool will return an error message.
For example, assume you want to save all retrieved files into the directory output. To use that directory, type the tar statement as follows.
tar --file=foobar.tar --extract --directory=output
The tar tool extracts all the files from foobar.tar, and stores them into the output directory (Figure 12).
Figure 12. Saving files into a separate directory.
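Extraction into a separate directory can be tried with the sketch below. The output directory is created first, since the tar tool expects it to exist; the sample files are made up for the example.

```shell
# Scratch directory with hypothetical sample files
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "a log entry" > bar.log
tar --file=foobar.tar --create foo.txt bar.log

# The target directory must exist before the extraction
mkdir output
tar --file=foobar.tar --extract --directory=output

# Both files should now be inside output
EXTRACTED=$(ls output)
echo "$EXTRACTED"

cd / && rm -rf "$WORK"
```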
Modifying the retrieved metadata
When the tar tool retrieves a file from the tarball, it sets the uid of the file to that of the current user. The tool also filters the archived permissions through the user's umask. But the tool leaves the modification date of the file unchanged.
You can change these behaviors with the right subcommand. To demonstrate, assume the archived file bar.log has a uid of john and a permissions flag of 767. Also, assume that its modification date is 20070501.
To keep the same uid assigned to the file in the tarball, use the --same-owner subcommand.
tar --file=foobar.tar --extract bar.log --same-owner
The tar tool extracts the file bar.log from foobar.tar, and leaves the file's uid unchanged (Figure 13).
Figure 13. Keeping the same uid.
To keep the same permissions for the file, use the --same-permissions subcommand.
tar --file=foobar.tar --extract bar.log --same-permissions
Again, the tar tool extracts bar.log from foobar.tar, and leaves the file's permissions unchanged (Figure 14).
Figure 14. Keeping the same permissions.
Finally, to change the modification date of the extracted file, use the --touch option.
tar --file=foobar.tar --extract bar.log --touch
First, the tool extracts bar.log from foobar.tar. It then changes the file's modification date from 2007-05-01 to the retrieval date. For instance, if the tool retrieved bar.log on 2007 May 12, it sets the modification date to that date (Figure 15).
Figure 15. Changing the modification date.
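The effect of --touch is easiest to see by extracting the same file twice, once with and once without the subcommand. This sketch backdates a hypothetical bar.log to 2007 May 1 before archiving it; the directory names plain and touched are made up, and GNU tar is assumed.

```shell
# Scratch directory with a hypothetical, backdated log file
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "a log entry" > bar.log
touch -t 200705010000 bar.log
tar --file=foobar.tar --create bar.log

# Extract once normally and once with --touch
mkdir plain touched
tar --file=foobar.tar --extract --directory=plain bar.log
tar --file=foobar.tar --extract --directory=touched --touch bar.log

# find prints the touched copy only if it is newer than the plain copy
NEWER=$(find touched -name bar.log -newer plain/bar.log)
echo "newer copy: $NEWER"

cd / && rm -rf "$WORK"
```

The copy in plain keeps the archived 2007 date, while the copy in touched carries the time of retrieval.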
Managing the tarball
You can use the tar tool to compare the contents of the tarball against the files on the drive. You can use it to merge the contents of one tarball into another. You can also use the tool to change the format used by the tarball.
Assume that the tarball foobar.tar contains the files foo.txt and bar.log. To compare the archived files against those on the drive, use the --compare subcommand.
tar --file=foobar.tar --compare
The tar tool then determines which file has changed in terms of size or modification date (Figure 16). It displays its comparison results in the Terminal window. To save the results into a separate file, e.g. compare.log, use the > redirection operator.
tar --file=foobar.tar --compare > compare.log
Figure 16. Comparing the contents of the tarball.
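The comparison can also be used from a script by checking the exit status of the tar tool. The following sketch assumes GNU tar, which exits with a non-zero status when a file differs from its archived copy; the sample file is hypothetical.

```shell
# Scratch directory with a hypothetical sample file
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
tar --file=foobar.tar --create foo.txt

# Compare while the file is still identical to the archived copy
SAME=0
tar --file=foobar.tar --compare || SAME=$?

# Change the file on disk, then compare again and save the report
echo "more text" >> foo.txt
CHANGED=0
tar --file=foobar.tar --compare > compare.log 2>&1 || CHANGED=$?
REPORT=$(cat compare.log)

echo "status before change: $SAME, after change: $CHANGED"
echo "$REPORT"

cd / && rm -rf "$WORK"
```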
Now assume you have a second tarball fubar.tar. This tarball contains two files: nue.xml and neu.htm. To merge this tarball with foobar.tar, use the --concatenate subcommand.
tar --file=foobar.tar --concatenate fubar.tar
First, the tar tool retrieves the contents of fubar.tar. It then adds the retrieved files into foobar.tar. Finally, the tool updates the header data of foobar.tar to reflect the new additions (Figure 17).
Figure 17. Merging two tarballs.
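Merging can be sketched with two small tarballs. The example assumes GNU tar for the --concatenate subcommand, and the file contents are made up for the demonstration.

```shell
# Scratch directory with hypothetical sample files
WORK=$(mktemp -d)
cd "$WORK" || exit 1
echo "some text" > foo.txt
echo "a log entry" > bar.log
echo "<nue/>" > nue.xml
echo "<html></html>" > neu.htm

# Build the two tarballs
tar --file=foobar.tar --create foo.txt bar.log
tar --file=fubar.tar --create nue.xml neu.htm

# Merge the contents of fubar.tar into foobar.tar
tar --file=foobar.tar --concatenate fubar.tar

# foobar.tar should now list all four files
MEMBERS=$(tar --file=foobar.tar --list)
echo "$MEMBERS"

cd / && rm -rf "$WORK"
```

The second tarball, fubar.tar, is left unchanged by the merge.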
Finally, you can change the format of the tarball with the --format subcommand. This subcommand allows you to create tarballs that can be opened by other versions of the tool.
The latest version of the tar tool supports five basic formats (Table 1). By default, the tool uses the gnu format to create its tarball. Future versions of the tool will start using posix as the default format. To find out the default format of your tool, type the following statement at the prompt.
tar --help | tail -n 5
Table 1. List of supported tarball formats.
Format   Description
gnu      Format used by tar tool versions 1.12 and newer. Has support for sparse files and incremental archives.
oldgnu   Format used by tar tool versions older than 1.12.
v7       Format used by tar v7. It is used by the Automake utility when producing makefiles.
ustar    Format defined by POSIX.1 1988 specification. Has support for symbolic ownership information and special files.
posix    Format defined by POSIX.1 2001 specification. Designed as the most flexible and feature-rich of all five formats.
Assume you want to create foobar.tar using the files foo.txt and bar.log. To create the tarball using a posix format, type the tar statement as follows.
tar --file=foobar.tar --format=posix --create foo.txt bar.log
To add the file nue.xml to foobar.tar, type the statement as follows.
tar --file=foobar.tar --format=posix --append nue.xml
Remember, once you have selected a tarball format, make sure to use the same format for all other tar tasks.
Tar and Xcode
So far, we have learned how to use the tar tool from the command line. Now, we will learn how to use the tool from Xcode. You can access the tool in one of two ways. The first way is with a run script phase; the second is with a menu script.
Feel free to modify these scripts to suit your needs.
Using the run script phase
Listing 1 shows one example of using the tar tool through the Xcode run script phase. First, the script makes a list of files in the project directory. Then it prepares the tarball's filename from the script variable PROJECT_NAME. To ensure portability, the script strips out all spaces from the prepared name.
Next, the script checks if a tarball already exists with the same name. If one does exist, the script deletes that tarball. Then the script parses its list of files, and checks if each one exists. If the file exists, the script adds the file to the tarball. Also, the script skips over the build directory to avoid adding the tarball to itself.
Listing 1. The Xcode run script phase.
# Retrieve a list of files in the current directory
# (this simple list does not support names containing spaces)
TAR_LST=`ls`
# Prepare the path to the tarball
TAR_NOM="${PROJECT_NAME// /}"
TAR_NOM="${TAR_NOM}.tar"
TAR_NOM="${TARGET_BUILD_DIR}/${TAR_NOM}"
# Does the old tarball exist?
if [ -e "$TAR_NOM" ]
then
    # delete the old tarball
    rm -f "$TAR_NOM"
fi
# Parse through the list of files
for TAR_ITM in $TAR_LST
do
    # does the file exist?
    if [ -e "$TAR_ITM" ]
    then
        # is the file really the build directory?
        if [ "$TAR_ITM" != "build" ]
        then
            # does the tarball exist?
            if [ -e "$TAR_NOM" ]
            then
                # update the tarball
                tar --file="$TAR_NOM" --append "$TAR_ITM"
            else
                # create a new tarball
                tar --file="$TAR_NOM" --create "$TAR_ITM"
            fi
        fi
    fi
done
Using the menu script
Listing 2 shows how to use an Xcode menu script to access the tar tool. The script uses the same code as the run script phase. But it improves over the latter by providing user interaction.
First, the script creates a default name for the tarball with the date tool. It passes the format string +%y%m%d%H%M to the tool to format the tool's output. The tool returns the results as a string.
Next, the script prompts the user for a location where it will store the tarball. It also prompts for a name for the tarball, offering the default name as a possible value. The user enters the information on the dialog and clicks on the OK button.
The script then checks if a tarball exists with the same name at the chosen location. If one does exist, the script deletes the tarball. Next, the script gets a list of files in the project directory. It parses the list, and adds each named file to the tarball. The script also skips over the build directory as a preventive measure.
Listing 2. The Xcode menu script.
#! /bin/bash
#
# -- PB User Script Info --
# %%%{PBXName=Archive the Project}%%%
# %%%{PBXInput=None}%%%
# %%%{PBXOutput=SeparateWindow}%%%
# %%%{PBXKeyEquivalent=}%%%
# %%%{PBXArgument=}%%%
# %%%{PBXIncrementalDisplay=YES}%%%
#
# initialize the following shell variable
TAR_MSG="Save the tarball as"
# prepare a default archive name
TAR_DEF=`date "+%y%m%d%H%M"`
TAR_DEF="${TAR_DEF}.tar"
# select a backup filename
TAR_PTH=`%%%{PBXUtilityScriptsPath}%%%/AskUserForNewFileDialog \
"$TAR_MSG" "$TAR_DEF"`
# stop if the dialog returned no path
if [ -z "$TAR_PTH" ]
then
    echo "No archive path was selected."
    exit 1
fi
# does the tarball already exist?
if [ -e "$TAR_PTH" ]
then
    # delete the old tarball
    rm -f "$TAR_PTH"
fi
# retrieve a list of project files and subdirectories
# (this simple list does not support names containing spaces)
TAR_LST=`ls`
# Parse through the list of files
for TAR_ITM in $TAR_LST
do
    # does the file exist?
    if [ -e "$TAR_ITM" ]
    then
        # is the file really the build directory?
        if [ "$TAR_ITM" != "build" ]
        then
            # does the tarball exist?
            if [ -e "$TAR_PTH" ]
            then
                # update the tarball
                tar --file="$TAR_PTH" --append "$TAR_ITM"
            else
                # create a new tarball
                tar --file="$TAR_PTH" --create "$TAR_ITM"
            fi
        fi
    fi
done
echo "Finished creating the archive at: $TAR_PTH"
Concluding Remarks
The tar tool makes it easy to combine multiple files into a single tarball. The tarball format is supported by different platforms, and is both simple and open. Various tools can also compress the tarball to reduce its size.
In most cases, you will need a Terminal session to use the tar tool, but you can also use the tool in your Xcode sessions. All you need to do is to write a run script phase or a menu script to store your project files into a tarball.
The tar tool is reliable, full-featured, and free. Perhaps that is why it is a popular tool amongst open-source developers.
Bibliography and References
Bonney, Laurence. "Tarball advantages". Reduce compile time with distcc. 2004 Jun 22. Copyright 2004. DeveloperWorks, IBM. Online: http://www-128.ibm.com/developerworks/linux/library/l-distcc.html.
Free Software Foundation. GNU tar: An Archival Tool. 2006 December 7. Copyright 1992, 1994-1997, 1999, 2000, 2001, 2004-2006. Free Software Foundation, Inc. Online: http://www.gnu.org/software/tar/manual/tar.html
Wikipedia. Tar (file format) (2007 May 04). Copyright 2007. Wikimedia Foundation, Inc. Online: http://en.wikipedia.org/wiki/Tar_file_format.
JC is a freelance engineering writer who lives happily in North Vancouver, British Columbia. He divides his time between writing technical articles, and teaching origami at his district's public library. He can be reached at anarakisware@gmail.com.