Sharing with Git
Volume Number: 23
Issue Number: 03
Column Tag: Programming
Learn how to use Git to share your Xcode projects
by José R.C. Cruz
Introduction
CVS, Subversion, and Git let several users work on the same project. Users can contribute their changes without getting in each other's way. They can retrace their steps if they make a mistake, and they can review each other's changes with little effort.
But CVS and Subversion use a central repository, which is at risk of data loss without a good backup plan. Git, on the other hand, supports distributed repositories. Each user keeps a separate repository for the same project, and the users update each other's repositories at specific times in the project schedule.
This article shows how to use Git to set up a basic distributed repository and how to move data between repositories. It also shows other ways to send updates when those repositories are not available.
Planning Your Repository
Git lets you set up your project repository in many ways. The simplest is to use a central repository (Figure 1). User A checks out a copy of the project and makes his changes to it. Then he commits his changes back into the repository, which allows user B to update her copy as well.
Figure 1. The centralized setup
But this setup, as stated earlier, has several problems. First, losing the repository means losing all project data. Second, users need constant access to the repository in order to commit their changes. Third, the repository can get large and unwieldy over time.
A different way is to use a two-tier setup (Figure 2). Here, a public repository contains all the files needed by the project. Users A and B each have a private repository that starts as a direct duplicate of the public one. User A makes his changes to the project and commits them to his repository, and user B does the same with hers. Then, at an agreed time, users A and B update the public repository with data from their private ones. They also work with the project head to resolve any conflicts that occur.
Figure 2. The two-tier setup
Since users A and B have their own repositories, they access the public one only when they need to. They are also free to try out new ideas, knowing that they will send only those changes that worked to the public repository. And if user A loses his repository, he can make a new one using the public repository.
A more radical way is to use a round-robin setup (Figure 3). First, user A creates a private repository for his project. Then he creates a public copy of that same repository. User B, who just joined in, gets a copy of user A's public repository. She then makes another copy to serve as her public repository. During the course of the project, user A sends updates from his private to his public repository. But he receives updates from user B's public one. The same steps also hold for user B.
Figure 3. The round-robin setup
The round-robin setup removes the need for a central repository. It does, however, need more storage resources. It can also be more difficult to maintain when a lot of users are involved.
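The round-robin flow can be sketched on one machine, with local directories standing in for each user's volumes. This is a minimal sketch under stated assumptions. The names A-private, A-public.git, B-private, and B-public.git are hypothetical. The public copies are bare repositories made with git clone --bare, a choice of this sketch so that pushes to them are accepted. Each user pushes only to his or her own public repository and pulls from the other user's public one.

```shell
#!/bin/sh
# Round-robin setup on one machine; all directory names are hypothetical.
set -eu
# Shared commit identity so the sketch runs without a global Git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

# User A: a private repository with one commit...
git init -q "$WORK/A-private"
cd "$WORK/A-private"
git symbolic-ref HEAD refs/heads/master
echo "foobar" > README
git add README
git commit -q -m "A: initial commit"
# ...and a public copy of it.
git clone -q --bare "$WORK/A-private" "$WORK/A-public.git"

# User B joins: she copies A's public repository, then makes her own public copy.
git clone -q "$WORK/A-public.git" "$WORK/B-private"
git clone -q --bare "$WORK/B-private" "$WORK/B-public.git"

# B works privately, then sends the update to her own public repository.
cd "$WORK/B-private"
echo "B's notes" > NOTES
git add NOTES
git commit -q -m "B: add NOTES"
git push -q "$WORK/B-public.git" master:master

# A receives B's work from B's public repository, then updates his own public one.
cd "$WORK/A-private"
git pull -q --no-rebase "$WORK/B-public.git" master
git push -q "$WORK/A-public.git" master:master
git log --oneline
```

User A never touches B-private directly. Only the two public repositories need to be reachable by the other user.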
Enter The Daemon
You normally store the public repository on a network volume. How you access it depends on which server daemon you use, because that daemon dictates the URL scheme you pass to Git commands such as git clone, git pull, and git push.
For example, assume you place the repository for project foobar on a network volume served as foobar.net. If the server daemon is Apache, use an http:// URL in your Git commands.
git clone http://foobar.net/foobar/ ...
If it is rsync, older Git releases accept an rsync:// URL instead.
git clone rsync://foobar.net/foobar ...
Git also supports access over HTTPS and SSH. But the one we will look into is the git daemon.
The git daemon
The git daemon provides basic TCP/IP services for the Git repository (Figure 4). It listens on port 9418 and handles any SCM requests that arrive on that port. It is easier to set up and use than most network servers. It does not, however, support authentication.
Below is the basic syntax of the git daemon command.
git daemon options --base-path=volume_path white_list
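The options argument consists of one or more flags to control the daemon. The --base-path option sets the path to the network volume holding the repositories. The daemon resolves each requested path relative to it, so a request for git://foobar.net/foobar maps to volume_path/foobar. The white_list argument is a list of paths to each repository served by the daemon.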
Figure 4. The git daemon
Launch the daemon
Assume that your project repository foobar is in the network volume /Volumes/Server/Projects. Also, assume that the server hosting the volume uses the domain name foobar.net. To start the daemon, enter the following statement at the Terminal prompt.
git daemon --base-path=/Volumes/Server/Projects \
/Volumes/Server/Projects/foobar &
The '&' token tells the shell to run the daemon in the background. Otherwise, you cannot enter more commands until you stop the daemon with a Ctrl-C keystroke. To check that the daemon is running correctly, type the following statement at the prompt.
git ls-remote git://foobar.net/foobar
Git should then display a list of references and their SHA-1 keys, as shown below.
b491b9426501f78575632588d2a86dbbb242df1d HEAD
b491b9426501f78575632588d2a86dbbb242df1d refs/heads/master
Now assume you want to add the repository barfoo to the daemon's white list. To do so, first stop the daemon with the kill command. Then launch the daemon again as follows.
git daemon --base-path=/Volumes/Server/Projects \
/Volumes/Server/Projects/foobar /Volumes/Server/Projects/barfoo &
When you enter the directory paths, type them in full. Avoid shortcut tokens such as '~', and do not add a trailing '/' to any directory path. Either token can prevent the daemon from finding the repository.
Starting the daemon from a Terminal session has one problem. If you end the session, for example by closing the window, you stop the daemon as well. A better approach is to start the daemon with either inetd or launchd. For instructions, see the references listed at the end of this article.
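The launch, check, and stop steps can be tried on a single machine. This is a minimal sketch under assumptions that are not from the article:

- a scratch directory stands in for /Volumes/Server/Projects;
- the daemon listens only on 127.0.0.1;
- port 19418 replaces 9418, so it does not clash with a daemon that is already running.

The script uses --detach, which runs the daemon in its own session so that closing the Terminal window does not stop it. It also uses --pid-file, which records the process ID for the kill command. inetd or launchd remain the sturdier choice for a long-running server.

```shell
#!/bin/sh
# Serve one repository with git daemon, check it with ls-remote, then stop it.
set -eu
export GIT_AUTHOR_NAME="User A" GIT_AUTHOR_EMAIL="a@example.com"
export GIT_COMMITTER_NAME="User A" GIT_COMMITTER_EMAIL="a@example.com"

WORK=$(cd "$(mktemp -d)" && pwd -P)   # full path, no symlinks or '~'
PROJECTS="$WORK/Projects"             # stands in for /Volumes/Server/Projects
PORT=19418                            # assumption: a free test port

# Create the repository foobar with one commit.
git init -q "$PROJECTS/foobar"
cd "$PROJECTS/foobar"
git symbolic-ref HEAD refs/heads/master
echo "Hello" > README
git add README
git commit -q -m "Initial commit"

# Mark the repository as exportable (the alternative to --export-all).
touch "$PROJECTS/foobar/.git/git-daemon-export-ok"

# Launch the daemon detached from this session; record its PID for kill.
git daemon --detach --pid-file="$WORK/daemon.pid" --reuseaddr \
    --listen=127.0.0.1 --port="$PORT" \
    --base-path="$PROJECTS" "$PROJECTS/foobar"
trap 'kill "$(cat "$WORK/daemon.pid")" 2>/dev/null || true' EXIT

# Poll until the daemon answers, keeping the reference listing.
for i in 1 2 3 4 5 6 7 8 9 10; do
    if git ls-remote "git://127.0.0.1:$PORT/foobar" > "$WORK/refs.txt" 2>/dev/null; then
        break
    fi
    sleep 1
done
cat "$WORK/refs.txt"
```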
Configuring the daemon
You can change how the daemon behaves with one or more options. For instance, the daemon sends its error messages to stderr, which is the Terminal window. To send those messages to the system log (system.log) instead, add the --syslog option.
git daemon --syslog ...
Also, the daemon listens on port 9418 for incoming requests. To use a different port, e.g. 9500, pass the new number with the --port option.
git daemon --port=9500 ...
Make sure that no other network service uses the new port number. To see whether the number is already assigned to a known service, check the /etc/services file as follows.
grep 9500 /etc/services
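A plain grep can also match 9500 inside a longer number such as 19500, or inside a comment. The shell function port_listed below is a hypothetical helper. It matches only an uncommented entry whose port field is exactly the given number. It only reads the file, so a miss means the number is not registered there, not that nothing is listening on it.

```shell
#!/bin/sh
set -eu

# port_listed PORT [FILE]: hypothetical helper. Succeeds if FILE (default
# /etc/services) has an uncommented "name port/protocol" entry whose port
# field is exactly PORT, for either tcp or udp.
port_listed() {
    grep -Eq "^[^#[:space:]]+[[:space:]]+$1/(tcp|udp)([[:space:]]|\$)" "${2:-/etc/services}"
}

# A small sample file, so the result does not depend on this machine's copy.
SAMPLE=$(mktemp)
printf '%s\n' \
    'git             9418/tcp    # Git version control' \
    '#myservice      9500/tcp    # commented out, so it does not count' \
    > "$SAMPLE"

for port in 9418 9500; do
    if port_listed "$port" "$SAMPLE"; then
        echo "$port: already assigned"
    else
        echo "$port: not listed"
    fi
done
```

On a real system, call it without the second argument, e.g. port_listed 9500, to read /etc/services itself.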
By default, the daemon serves only those repositories in its white list that contain the zero-byte file git-daemon-export-ok.
To remove this limitation, add the --export-all option.
git daemon --export-all ...
The daemon now serves all the repositories in its white list, as long as they have the subdirectories .git/objects and .git/refs. If these two subdirectories are missing, the daemon returns an error message. Alternatively, leave out --export-all and add the zero-byte file to each repository you want to share, as follows.
touch /Volumes/Server/Projects/foobar/.git/git-daemon-export-ok
For a detailed list of other daemon options, read the following online document.
http://www.kernel.org/pub/software/scm/git/docs/git-daemon.html
Facing The Public
Before you let other users access your repository, make sure that its data is ready for public use. Check that the project compiles without any problems. See if the project has a minimum set of support documents, such as a README and a version history file. Use git fsck to check the repository for damaged or missing data, and git gc to clean out unneeded files. Also, back up the repository before making it public.
These extra checks will help ensure a reliable and error-free repository.
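The repository checks above can be collected into a short script. This is a sketch only:

- the HISTORY file name is a hypothetical stand-in for your version history file;
- the compile check is left out, because it depends on your project;
- tar stands in for whatever backup tool you prefer.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME="User A" GIT_AUTHOR_EMAIL="a@example.com"
export GIT_COMMITTER_NAME="User A" GIT_COMMITTER_EMAIL="a@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

# A stand-in project repository with its support documents.
git init -q "$WORK/foobar"
cd "$WORK/foobar"
git symbolic-ref HEAD refs/heads/master
echo "foobar project" > README
echo "1.0  First public release" > HISTORY   # hypothetical history file
git add README HISTORY
git commit -q -m "Initial commit"

# Make sure the support documents are present.
for doc in README HISTORY; do
    [ -f "$doc" ] || { echo "missing $doc" >&2; exit 1; }
done

# Check object integrity, then pack the repository and prune unneeded data.
git fsck --full
git gc --quiet

# Back up the whole repository before making it public.
cd "$WORK"
tar -cf foobar-backup.tar foobar
```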
Making clones
Use the git clone command to make a copy of the project repository. The command stores the copy in a location of your choosing and records data that links it back to the original.
Figure 5. The git clone command
The basic syntax of the command is as follows.
git clone options repository_url destination_path
The repository_url argument is the URL of the repository being copied. The destination_path argument is the location of the copy. The options argument consists of one or more settings to control the copy process.
Assume, for example, you want a copy of the project repository foobar. To make the copy, enter the git clone command as follows.
git clone git://foobar.net/foobar /Volumes/Projects/private
First, Git sends a copy request to the daemon, and the daemon responds by sending back the repository data. Next, Git stores the copy in the directory /Volumes/Projects/private. It then records the original as a remote named origin and checks out the master branch in the copy.
Which repository to copy depends on the project's state and setup. In an ongoing project, make a copy of the public repository; the copy then serves as your private repository. In a new project, however, make a copy of your private repository; the copy itself becomes the public one.
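Both cases can be tried with local directories, because Git accepts a plain path wherever it accepts a git:// URL. In the sketch below, the paths private, public.git, and userB are hypothetical stand-ins for the repositories described above. The public copy is made with the --bare option, which leaves out the working copy. That is an assumption of this sketch, chosen because a bare repository can receive pushes to any branch.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME="User A" GIT_AUTHOR_EMAIL="a@example.com"
export GIT_COMMITTER_NAME="User A" GIT_COMMITTER_EMAIL="a@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

# New project: user A starts with a private repository.
git init -q "$WORK/private"
cd "$WORK/private"
git symbolic-ref HEAD refs/heads/master
echo "foobar project" > README
git add README
git commit -q -m "Initial commit"

# The copy of the private repository becomes the public one.
git clone -q --bare "$WORK/private" "$WORK/public.git"

# Ongoing project: user B copies the public repository as her private one.
git clone -q "$WORK/public.git" "$WORK/userB"

# Git records where the copy came from under the remote name "origin".
git -C "$WORK/userB" config remote.origin.url
git -C "$WORK/userB" branch -r
```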
Exchanging data
Use the git pull command to move data from the public repository to your private one. This command uses the public data to update your repository (Figure 6). It also updates your working copy, if present. It then reports any conflicts found during the update.
Figure 6. The git pull command
Below is the basic syntax of the git pull command.
git pull options repository_url references
The options argument consists of one or more settings to control the pull process. The repository_url argument is the location of the public repository. The references argument names the branches or tags to pull from. Leave it blank to retrieve the latest updates on the public repository's default branch.
Assume you want to update your private copy of the foobar repository. To start the update, enter the git pull command as follows.
git pull git://foobar.net/foobar
The above command gets the latest updates from the public repository, then merges them into the current branch (HEAD) of the private copy. To get updates from a specific branch or tag, e.g. version_1, enter the command as follows.
git pull git://foobar.net/foobar version_1
To display a diffstat report after the data is merged, add the --stat option. Older Git releases also accepted --summary for this.
git pull --stat git://foobar.net/foobar
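To merge the data without committing the result, so that you can review it first, add the --no-commit option. A fast-forward update creates no merge commit, so the option has no effect in that case.
git pull --no-commit git://foobar.net/foobar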
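The sketch below walks through a pull on local directories, which stand in for git://foobar.net/foobar. Users A and B both change the project, so their histories diverge and the pull creates a real merge. The --no-commit option then stops before the merge commit, so user A can review the result. The --no-rebase option is an addition of this sketch: recent Git releases ask how to reconcile divergent branches unless you choose merge or rebase.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME="User A" GIT_AUTHOR_EMAIL="a@example.com"
export GIT_COMMITTER_NAME="User A" GIT_COMMITTER_EMAIL="a@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

# A public repository with one commit, plus private clones for users A and B.
git init -q "$WORK/seed"
cd "$WORK/seed"
git symbolic-ref HEAD refs/heads/master
echo "v1" > README
git add README
git commit -q -m "Initial commit"
git clone -q --bare "$WORK/seed" "$WORK/public.git"
git clone -q "$WORK/public.git" "$WORK/userA"
git clone -q "$WORK/public.git" "$WORK/userB"

# User B adds a file and sends it to the public repository.
cd "$WORK/userB"
echo "notes" > NOTES
git add NOTES
git commit -q -m "Add NOTES"
git push -q "$WORK/public.git" master:master

# Meanwhile user A commits a change of his own, so the histories diverge.
cd "$WORK/userA"
echo "v2" > README
git commit -q -a -m "Update README"

# Pull B's work with a diffstat, stopping before the merge commit.
git pull --no-rebase --no-commit --stat "$WORK/public.git" master

# The merged files wait in the working copy; run git commit after review.
git status --short
```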
Use the git push command to move data from your private repository to the public one (Figure 7). If you are using the git daemon, you must tell the daemon to accept push requests. To do so, run the following git config command inside the public repository on the server. Because the daemon does not authenticate users, anyone who can reach it can then push to that repository.
git config daemon.receivepack true
Figure 7. The git push command
The git push command has the same syntax as the pull command, but a much smaller set of options. In practice, it is mostly used to send the latest updates to the public repository. For instance, to update the public repository foobar, enter the command as follows.
git push git://foobar.net/foobar master:master
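Make sure to update your private repository before updating the public one. It is much easier and safer to resolve conflicts on the private copy than on the public one. Also note that Git 1.7.0 and later refuse, by default, a push that would change the branch checked out in a non-bare repository. A public repository that receives pushes is therefore best created with git clone --bare, which leaves out the working copy.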
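A push can be tried the same way with local directories. In this sketch, public.git is a hypothetical bare repository standing in for git://foobar.net/foobar, so the daemon.receivepack setting does not come into play. The last step pushes to a hypothetical non-bare copy to show the default refusal when its checked-out branch would change.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME="User A" GIT_AUTHOR_EMAIL="a@example.com"
export GIT_COMMITTER_NAME="User A" GIT_COMMITTER_EMAIL="a@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

git init -q "$WORK/private"
cd "$WORK/private"
git symbolic-ref HEAD refs/heads/master
echo "v1" > README
git add README
git commit -q -m "Initial commit"

# Bare public repository: no working copy, so pushes to master are accepted.
git clone -q --bare "$WORK/private" "$WORK/public.git"
# Non-bare copy for comparison: master is checked out there.
git clone -q "$WORK/private" "$WORK/nonbare"

echo "v2" > README
git commit -q -a -m "Update README"

# Update the private copy first, then send the latest updates.
git pull -q --no-rebase "$WORK/public.git" master
git push "$WORK/public.git" master:master

# The same push to the non-bare copy is refused (receive.denyCurrentBranch).
if git push -q "$WORK/nonbare" master:master 2>/dev/null; then
    NONBARE=accepted
else
    NONBARE=refused
fi
echo "push to non-bare copy: $NONBARE"
```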
The Daemon Is Down
There can be times when you will be unable to connect to the public repository. The server daemon may have crashed or terminated. The network may be unreachable. The repository itself may be corrupted. Or the server that hosts the repository may be down for an upgrade.
For these cases, Git gives you two ways to still share your project changes with other users. You can use Git to make patches from your private repository. You can also use it to archive specific revisions in your repository. Then, you can send these patches or archives to those users via other means such as e-mail or portable media.
Making patches
Use the git diff command to make a patch for specific revisions of your project files (Figure 8). You can make a patch that compares the working copy of a project file with the version Git has on record. You can also make one that compares two revisions of the project in the repository.
Figure 8. The git diff command
For example, to make a patch for the project file foobar.html, enter the command as follows.
git diff foobar.html > foobar.diff
Git compares the working copy of foobar.html with the copy in its index, which matches the latest committed revision unless you have staged changes with git add. Then it stores any differences in the patch file foobar.diff. To make a patch against the second latest revision of foobar.html, enter the command as follows.
git diff HEAD^ foobar.html > foobar.diff
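To make a patch of the changes between the second latest and latest revisions of project foobar, enter the command as follows. List the older revision first; reversing the order produces a patch that undoes the changes.
git diff HEAD^ HEAD > foobar.diff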
To make a patch for a binary file such as foobar.jpg, add the --binary option.
git diff --binary foobar.jpg > foobar.diff
To make a patch for more than one file, enter their names as a space-delimited list.
git diff --binary foobar.html foobar.jpg > foobar.diff
Notice that the above example uses the --binary option because at least one of the files is binary. You can omit this option if all the files are text files. Also, patch files normally use a .diff or .patch suffix, but you can use your own suffix as long as you tell your fellow users.
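The following sketch builds a small repository and makes each kind of patch described above. The files foobar.html and foobar.jpg are hypothetical. A few arbitrary bytes, including a NUL byte, stand in for JPEG data so that Git treats the file as binary. Note the order HEAD^ HEAD in the third patch: git diff shows how to get from the first revision to the second.

```shell
#!/bin/sh
# Making patches with git diff; file names and contents are hypothetical.
set -eu
export GIT_AUTHOR_NAME="User A" GIT_AUTHOR_EMAIL="a@example.com"
export GIT_COMMITTER_NAME="User A" GIT_COMMITTER_EMAIL="a@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

git init -q "$WORK/foobar"
cd "$WORK/foobar"
git symbolic-ref HEAD refs/heads/master

# Revision 1: a page and a "binary" image (the NUL byte marks it as binary).
printf '<p>Hello</p>\n' > foobar.html
printf 'JPEG\000\001\002\003' > foobar.jpg
git add foobar.html foobar.jpg
git commit -q -m "Add page and image"

# Revision 2 (HEAD): edit the page.
printf '<p>Hello, world</p>\n' > foobar.html
git commit -q -a -m "Update greeting"

# Uncommitted edits in the working copy.
printf '<p>Hello, world!</p>\n' > foobar.html
printf 'JPEG\000\004\005\006' > foobar.jpg

# Working copy of foobar.html against the index (here, the latest commit).
git diff foobar.html > "$WORK/working.diff"

# Working copy of foobar.html against the second latest revision.
git diff HEAD^ foobar.html > "$WORK/since-parent.diff"

# Changes made by the latest commit: older revision first, newer second.
git diff HEAD^ HEAD > "$WORK/latest.diff"

# Text and binary files together; --binary lets the image change be applied.
git diff --binary foobar.html foobar.jpg > "$WORK/foobar.diff"

cat "$WORK/latest.diff"
```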
Use the git apply command to update your working copy with a patch. If the patch covers text files only, enter the command as follows.
git apply foobar.diff
If it covers binary files, or a mix of text and binary files, older Git releases required the --binary option. Current releases apply binary patches automatically and treat the option as a no-op, so it is harmless either way.
git apply --binary foobar.diff
Either way, Git uses the patch to update the right files. It does, however, leave the updated files uncommitted. You must verify and commit them manually, which protects your repository from unwanted changes.
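The sketch below plays both sides of the exchange with local directories. User A makes a patch with --binary, and user B checks it with git apply --check before applying it. The repository names and file contents are hypothetical. The final status listing shows the changes waiting, uncommitted, in user B's working copy.

```shell
#!/bin/sh
# User A mails a patch to user B; repository names are hypothetical.
set -eu
export GIT_AUTHOR_NAME="User A" GIT_AUTHOR_EMAIL="a@example.com"
export GIT_COMMITTER_NAME="User A" GIT_COMMITTER_EMAIL="a@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

# User A's repository with a text file and a binary file, cloned by user B.
git init -q "$WORK/userA"
cd "$WORK/userA"
git symbolic-ref HEAD refs/heads/master
printf '<p>Hello</p>\n' > foobar.html
printf 'JPEG\000\001\002\003' > foobar.jpg
git add foobar.html foobar.jpg
git commit -q -m "Add page and image"
git clone -q "$WORK/userA" "$WORK/userB"

# User A edits both files and makes a patch of his working copy.
printf '<p>Hello, world</p>\n' > foobar.html
printf 'JPEG\000\004\005\006' > foobar.jpg
git diff --binary > "$WORK/foobar.diff"

# User B first checks that the patch applies cleanly, then applies it.
cd "$WORK/userB"
git apply --check "$WORK/foobar.diff"
git apply "$WORK/foobar.diff"

# The changes sit uncommitted in the working copy, ready for review.
git status --short
```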
Making archives
Use the git archive command to archive revisions of a project in the repository (Figure 9). This command can make two types of archives: tarballs and zip files. Choose the one most suited to your needs.
Figure 9. The git archive command
The basic syntax of the git archive command is as follows.
git archive options --format=type revision > archive_path
The options argument consists of one or more flags to control the archival process. The --format option selects the archive format. The revision argument is the project revision to be archived. And the archive_path argument is the name and path of the archive.
Assume you want to archive the latest revision of the project repository foobar. To make a tarball archive, enter the command as follows.
git archive --format=tar HEAD > foobar.tar
To make a zip archive, change the --format option as follows.
git archive --format=zip HEAD > foobar.zip
Notice that both examples set the archive's suffix explicitly. Make sure to do so, as the command itself will not add the right suffix for you.
To display a list of the files added to the archive, add the --verbose option. Git prints the list to stderr, so it does not end up inside the archive.
git archive --format=tar --verbose HEAD > foobar.tar
To add a prefix, e.g. foo_, to each archived file, use the --prefix option.
git archive --format=tar --prefix=foo_ HEAD > foobar.tar
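The archive commands can be checked by listing what ends up inside each archive. In this sketch, the README and docs/HISTORY files are hypothetical. It also shows a prefix ending in '/', which places the files in a directory rather than joining the prefix to each file name. The zip check only looks for the 'PK' signature, since an unzip tool may not be installed.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME="User A" GIT_AUTHOR_EMAIL="a@example.com"
export GIT_COMMITTER_NAME="User A" GIT_COMMITTER_EMAIL="a@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

git init -q "$WORK/foobar"
cd "$WORK/foobar"
git symbolic-ref HEAD refs/heads/master
echo "foobar project" > README
mkdir docs
echo "Version 1" > docs/HISTORY
git add README docs/HISTORY
git commit -q -m "Initial commit"

# Latest revision as a tarball and as a zip file; the suffix is set by hand.
git archive --format=tar HEAD > "$WORK/foobar.tar"
git archive --format=zip HEAD > "$WORK/foobar.zip"

# --verbose lists each archived file on stderr, leaving the archive intact.
git archive --format=tar --verbose HEAD > "$WORK/foobar-verbose.tar"

# --prefix is prepended to every path as-is; end it with '/' to get a directory.
git archive --format=tar --prefix=foo_ HEAD > "$WORK/foo_.tar"
git archive --format=tar --prefix=foobar-1.0/ HEAD > "$WORK/foobar-1.0.tar"

tar -tf "$WORK/foo_.tar"
tar -tf "$WORK/foobar-1.0.tar"
```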
Finally, if you get an archive from another user, use either the tar or unzip tool to open the archive. But make sure to do so in a directory separate from your working copy. That way you will avoid overwriting your project files by accident.
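Here is a minimal sketch of that advice. It assumes the archive arrived as foobar.tar and that your working copy lives in a directory called mine; both names are hypothetical. The script unpacks into a fresh review directory, then compares before copying anything across. For a zip archive, unzip -d review foobar.zip does the same job.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME="User B" GIT_AUTHOR_EMAIL="b@example.com"
export GIT_COMMITTER_NAME="User B" GIT_COMMITTER_EMAIL="b@example.com"
WORK=$(cd "$(mktemp -d)" && pwd -P)

# Stand-in for the archive received from another user.
git init -q "$WORK/theirs"
cd "$WORK/theirs"
git symbolic-ref HEAD refs/heads/master
echo "their version" > README
git add README
git commit -q -m "Their revision"
git archive --format=tar HEAD > "$WORK/foobar.tar"

# Your working copy, which must not be overwritten.
mkdir "$WORK/mine"
echo "my version" > "$WORK/mine/README"

# Unpack into a separate review directory, never into the working copy.
mkdir "$WORK/review"
tar -xf "$WORK/foobar.tar" -C "$WORK/review"

# Compare before copying anything across (diff exits 1 when files differ).
diff -u "$WORK/mine/README" "$WORK/review/README" || true
```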
Concluding Remarks
Git makes it easy for several users to share their project data with each other. It supports most server protocols and provides its own server daemon. It can move data from one repository to another, or copy a specific repository. It can make patches or archives for specific project revisions.
These sharing features make Git well suited to distributed setups. But Git can also handle a centralized setup, which may suit certain projects. This flexibility is one reason why Git is getting a lot of attention from developers.
Bibliography and References
Marczak, Edward. "launchd: Judge, Jury, Executioner". MacTech Magazine, XXI:6. Xplain Corporation. Online: http://www.mactech.com/articles/mactech/Vol.21/21.06/launchd/index.html
Fields, J. Bruce. "Sharing development with others". Git User's Manual. 2007 Aug 11. From the Linux Kernel Archives. Online: http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#sharing-development
Hamano, Junio C. "Repository Administration". Everyday Git with 20 Commands or So. 2007 Nov 03. From the Linux Kernel Archives. Online: http://www.kernel.org/pub/software/scm/git/docs/everyday.html
JC is a freelance engineering writer who lives happily in North Vancouver, British Columbia. He is a regular contributor to MacTech and REALbasic Developer. He also spends quality time with his nephew. He can be reached at anarakisware@gmail.com.