Focus Review: Perforce
Volume Number: 20 (2004)
Issue Number: 04
Column Tag: Review
By Paul Pharr
Powerful version control for the Mac (and all those other platforms)
Figure 1 - P4V, the MacOS X client GUI
Introduction
When I was in engineering school, my software engineering professor made a point of saying to the
undergraduates, "Use the tools you have." This bit of real-world advice was useful both because the
tools we use are rarely the enabling factor for the success of software projects, and also because
individual engineers are rarely in a position to dictate the tools to be used on a project. Mac
developers are especially aware of limitations imposed on their choices by factors beyond their
control. With this in mind, Mac developers are fortunate to have Perforce as an available and
well-supported option when shopping for a revision control system.
My Perspective
I work for Nemetschek North America - formerly Diehl Graphsoft - makers of MiniCAD & VectorWorks
CAD software. We have a development environment with about 40 engineers interacting with our source
code base through Perforce - a tool that we selected three years ago to replace Microsoft's Visual
SourceSafe. Before that (a long time before) we used MPW's Projector from Apple for version control.
I will use our SourceSafe experience as a point of comparison, and I'll mention CVS comparisons where I
know enough to make them, although I have never used CVS in a production environment. I will try to use Perforce
terminology where possible and clarify it where necessary. One exception to this is that I will use
the term "check-out" in some cases to indicate the functionality known in Perforce terms as "Open
for Edit" because it makes the description more accessible for non-Perforce users.
What is version control?
I think it's a safe bet that anyone who works with other developers on a team of any size has a
pretty good idea of the basic elements of version control, but to get everyone on the same page,
I'll outline some of the basic features. All version control systems allow multiple engineers to
work within the same code base by serving as a file librarian and tracking individual users' work on
the files that are being modified. They maintain a history of the changes made to files within the
system so that earlier versions of the files can be retrieved and differences between versions can
be displayed. Version control systems also provide a mechanism to track some additional data about
the changes such as who made a change, a description of what was done, and when it happened. Beyond
this very basic set of functionality, the capabilities of various systems diverge.
System Capabilities
Perforce, produced by Perforce Software in Alameda, California, is a modern and full-featured
version control system intended to be the main repository of all of a software project's files,
structure, and history. It has a feature set which can look similar to that of CVS, PVCS,
SourceSafe, or ClearCase. With such products, however, a high-level outline of the feature set often
leaves out a lot about how the product will actually work in a given development environment.
Perforce differentiates itself in the following ways:
- Robust
- Fast & Efficient
- Automated merging
- Inter-File Branching Model
- Atomic change submission
- Low administration overhead
I'll cover each in some detail, and then describe some of the Mac-specific tools and
functionality. But first, I'll introduce some basic concepts and terminology.
Perforce basics
Understanding Perforce involves a few concepts that I'll cover from a user's point of view so
they can serve as a point of reference for the examples that follow. Perforce is a client-server
system in which the main database runs on a single central server and clients interact with the
server over a TCP/IP-based protocol. The server can run on MacOS X, Windows NT/XP, or various
flavors of Linux and Unix. Command line client software is available for almost any conceivable
platform, whereas more mature GUI clients are available for fewer platforms. Windows has the most
mature GUI client, P4Win, implemented as native Windows code. Mac, Windows, and Linux share a more
recently introduced but very full-featured client called P4V (short for Visual), which is about a
year old. It is implemented using the Qt cross-platform toolkit and is fast and reliable, with a
native-looking GUI on the platforms it supports. The second major release of P4V is in late beta
and is what I used on the Mac while working on this review.
Perforce is set up with user accounts for those who will be accessing the system. The "depot" -
Perforce's term for the main hierarchy of files under version control - is populated with the files
that make up the development projects of the company or department using Perforce. Each user can
have one or more client workspaces, each of which is associated, through settings maintained on the
server, with a particular root path on the user's development machine. Users can have as many
workspaces on one or more machines as they find useful. In normal use, a user keeps a copy of some
part of the overall depot on the local file system, updates those workspace files with changes made
to the depot by others (which Perforce calls "syncing"), edits files within the local workspace, and
submits changes back to the depot.
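To make the depot/workspace relationship concrete, here is a minimal Python sketch of how a depot path might map onto a file under a workspace root. It assumes a single, hypothetical mapping of one depot directory onto the workspace root; real Perforce client views can contain several mapping lines, exclusions, and wildcards, none of which this sketch models. The function name and the example paths are illustrative only.

```python
from pathlib import PurePosixPath
from typing import Optional


def depot_to_local(depot_path: str, depot_prefix: str, client_root: str) -> Optional[str]:
    """Map a depot path to a local path under the workspace root.

    Hypothetical, simplified model: one depot directory is mapped onto the
    workspace root. Returns None for files outside the mapped directory.
    """
    prefix = depot_prefix.rstrip("/") + "/"
    if not depot_path.startswith(prefix):
        return None  # not part of this workspace's view
    relative = depot_path[len(prefix):]
    return str(PurePosixPath(client_root) / relative)


# A workspace rooted in a home directory (illustrative paths).
print(depot_to_local(
    "//depot/Engineering/VectorWorks/AppSource/Main.cpp",
    "//depot/Engineering/VectorWorks",
    "/Users/dev/work",
))  # /Users/dev/work/AppSource/Main.cpp
```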
From this description, Perforce is similar to other version control tools available. Now we'll
look at the details that differentiate Perforce.
Perforce is Robust
Keeping source code safe is one of the highest priorities of a software developer, and it's good
to know that the tool that is most responsible for the safety of your code places a high priority on
maintaining a robust repository. Perforce is architected to facilitate recovery if disaster strikes,
but is implemented so well that recovery is rarely if ever necessary.
Perforce is a client-server system in which all client interaction takes place via a TCP/IP
connection to a single centralized server. This eliminates a plethora of potential problems compared
to systems such as SourceSafe where multiple clients access database files through a shared file
system. This architecture gives the server responsibility for recording changes to system data in a
way that allows full recovery should disaster ever strike. Perforce uses an industrial-strength
database for its metadata and provides checkpointing and journaling, allowing full recovery from
most disaster scenarios. Source code files in Perforce are stored using industry-standard formats
for reverse-delta storage, compression, and Mac resource file encoding, so their content can be
recovered even if Perforce's databases were completely deleted.
The system also has ample tools to assure you that everything is working as expected. Every
revision of every file is given an MD5 hash which is stored in the database and it is
straightforward to ask the server to verify that every checksum matches for all files and revisions
stored in Perforce. It's an easy and common practice for Perforce sites to regularly verify that
every revision in the system is corruption-free.
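The idea behind this check (exposed in Perforce through the `p4 verify` command) is simple: recompute each stored revision's digest and compare it with the one recorded when the revision was submitted. The Python sketch below models only that comparison, using in-memory data; the `stored_digests` and `archive` dictionaries are hypothetical stand-ins for the server's metadata and file archive, not Perforce's actual storage layout.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Return the MD5 digest of a revision's content as a hex string."""
    return hashlib.md5(content).hexdigest()


def find_corrupt_revisions(stored_digests: dict, archive: dict) -> list:
    """Compare recorded digests with digests recomputed from archived content.

    stored_digests maps "path#rev" -> digest recorded at submit time.
    archive maps "path#rev" -> the bytes currently in the archive.
    Returns the revisions whose content is missing or no longer matches.
    """
    bad = []
    for rev, expected in sorted(stored_digests.items()):
        content = archive.get(rev)
        if content is None or md5_hex(content) != expected:
            bad.append(rev)
    return bad


# Illustrative data: one revision has been silently altered on disk.
good = b"int main() { return 0; }\n"
stored = {
    "//depot/main/a.c#1": md5_hex(good),
    "//depot/main/a.c#2": md5_hex(good + b"// fix\n"),
}
archive = {
    "//depot/main/a.c#1": good,
    "//depot/main/a.c#2": good + b"// fxi\n",  # corrupted copy
}
print(find_corrupt_revisions(stored, archive))  # ['//depot/main/a.c#2']
```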
Knowing that Perforce is built with recovery in mind lets you sleep well at night, but
productivity still suffers if you experience frequent problems. Perforce has an outstanding
reputation in this regard among its customers, and our site's experience bears out that
reliability. In our three years of using Perforce heavily, we have seen only one server
crash - related to differencing revisions of a very large file with very long lines. Perforce
support worked with us to carefully but quickly identify the problem and a workaround. They had
isolated and fixed the cause and issued a public update to their entire product line within a week
of the problem report.
Our use of the product has also confirmed for us that it is virtually impossible for any kind of
client failure to cause a database or file corruption on the server. We have occasionally seen bugs
in the client software which affect specific features, but they have never had material impact on
the overall robustness of Perforce, and have, for the most part, been resolved by a subsequent
release of the software. Perforce is, as a whole, at least as robust as any other software we use.
A final factor in robust system performance is that Perforce provides outstanding support -
especially in case of emergency. I have heard of a handful of cases where a Perforce server was
compromised by hardware failure, but have never heard of a significant loss of data. The consensus
in the Perforce user community is that the company will do anything in its power to maintain the
robust reputation of its software.
Every one of these points stands in stark contrast to the experiences we had with SourceSafe,
where we would suffer from file corruptions on a weekly basis, and more significant system
corruptions every few months. There were no mechanisms to prevent this or aid recovery. The analysis
tool always complained of dire corruption, but provided no means of fixing it. Support was
non-existent. We felt like we could lose our entire database at any moment. My impression is that
CVS is much better in this respect, but support can still be very hard to come by.
Perforce is Fast & Efficient
Speed is not often considered a feature of software, but in the world of revision control where
individual operations can involve the inspection or transfer of tens of thousands of files, you will
soon come to realize which operations are doing more work than they should - and Perforce prides
itself on having built efficiency into the system from the ground up.
The server database that I described as a key element of the system's reliability also has a
dramatic impact on the speed of most normal operations. I'll use updating, or syncing, a client's
source code as an example. In Perforce, the central database keeps a record of every file revision
held by every workspace. If you ask to sync to the latest revision of a set of files you don't have
in your local workspace, then the server is forced to send you everything, which can be
time-consuming. During normal work, however, you will usually already have the current revision of
most of the files you're working on. In that case, Perforce will only need to send you the files
that have changed since your prior sync operation, and it can determine this with a very fast query
on an indexed database without inspecting anything on the client machine's file system. A sync
operation on a project with 10,000 files typically takes a few seconds, unless it is very
out-of-date. The longest sync operations I routinely see are a couple of minutes. SourceSafe and CVS
have no provision for optimizing this common operation and will typically exhibit performance
corresponding to the total number of files in the hierarchy being updated. We would often wait 10-15
minutes using SourceSafe, whereas Perforce is almost always done in seconds.
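The reason a sync can be so quick is that deciding what to send is a comparison of two server-side records: the revision each workspace has and the head revision of each file. The Python sketch below shows that comparison with plain dictionaries as hypothetical stand-ins for Perforce's internal tables. It simply loops, where a real server would use indexed queries; the point is only that the client's file system never has to be scanned.

```python
def files_to_sync(have: dict, head: dict) -> dict:
    """Decide which files a sync must transfer, using only server-side records.

    have: depot path -> revision this workspace already holds
    head: depot path -> latest submitted revision in the depot
    Returns depot path -> revision to send. Files the workspace already has
    at the head revision are skipped without touching the client's disk.
    (Deleted files are ignored in this simplified model.)
    """
    return {
        path: rev
        for path, rev in head.items()
        if have.get(path) != rev
    }


head = {f"//depot/main/src/file{i}.c": 3 for i in range(10_000)}
have = dict(head)                      # workspace is up to date...
have["//depot/main/src/file7.c"] = 2   # ...except for one stale file
head["//depot/main/src/new.c"] = 1     # and one file added by a colleague

print(files_to_sync(have, head))
# {'//depot/main/src/file7.c': 3, '//depot/main/src/new.c': 1}
```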
As an example, after a week of vacation, I synced our main development branch in just over 2
minutes and got 3,500 new files of 18,600 total files in the branch. I synced a maintenance branch
and got 5 new files of 10,000 total in about 3 seconds. Most Perforce operations are similarly
efficient, including merging changes between branches. As another example, it took about 15 seconds
to list all 243,000 files in our depot to a text file using "p4 files //depot/... > filelist.txt".
An obvious benefit of this efficiency is that off-site work becomes feasible. Three of our users
are on another continent, connected by a 128 kbps Internet connection. They certainly need to adjust
their work habits to account for the lower bandwidth, but not by much. We have never sent them a
code snapshot, and they have never been at our site. Nevertheless, they have the same level of
project interaction as our local users. Better still, local users connecting from home over a
cable-modem connection rarely find it much slower than the 100 Mb/s switched Ethernet at the
office. Perforce also has a recently introduced remote caching server called Perforce Proxy,
intended to speed up access for an entire remote site. (Our remote site performance without
Perforce Proxy has been good enough that we have decided not to use this tool yet.)
Perforce Automates Merging
One of the primary benefits of version control is that it enables concurrent development among
engineers. How well this works in production depends on how refined the tools are, and Perforce's
tools are very refined. Merging, or resolving differences, is also an area that has seen marked
improvement in the MacOS clients in recent releases - especially P4V, the new MacOS X GUI client.
There are two situations where you can be exposed to the need to resolve differences. The first
is during normal development when you are working on the same set of files as someone else. The
second is during multi-branch development where one branch has changes which need to be moved or
propagated to another branch. For the sake of this discussion, I'll keep it simple and talk about
the first case, but bear in mind that all resolve functionality is pretty much the same regardless
of whether you are checking in a small file change to the project you are working on or doing a
large inter-branch merge.
For example, you and another developer both check-out and begin editing a file at the same time,
but your changes are more extensive, and take longer. You attempt to submit your changes and find
that the latest revision of the file you've been editing is newer than the one you started with
because the other engineer submitted her changes first. Perforce gives you a great deal of control
over this process through its resolve functionality. Whenever you ask Perforce to update a file
using another version of that file, it uses the always-present database to determine what kinds of
changes might be coming across and what action may be necessary. It knows whether you have opened a
file for edit, so if you ask to sync to a newer revision of that file, you may need to resolve
potential conflicts between the two sets of changes. Similarly, if you try to submit changes to a
file that someone else has changed in the depot, you may need to resolve differences. This is the
case described above, and a number of resolve options are available to the user.
After any operation that can encounter conflicting differences, the Perforce GUI indicates files
that need to be resolved with a special icon. First, you might try an "Automatic resolve" in which
changes made by either engineer will be merged to the result file as long as the changes do not
overlap. This typically works and the file is merged, but if it fails because of overlapping
changes, you will want to interactively merge the changes. When you merge interactively, you are
given the opportunity to review all of the changes made by yourself and the other engineer, and
choose which to use. The conflicting changes are the most interesting, as they are the ones that
require some modification to the code to preserve the original intent of each change in the final
merged file. Perforce will typically handle all but the conflicts automatically, leaving only the
work that benefits most from direct user interaction.
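The automatic resolve described above is, in essence, a three-way merge: each side's edits are computed relative to the common base revision, and they can be combined safely only where they touch different regions. The Python sketch below shows that idea using the standard library's `difflib`. It is a simplified diff3-style merge written for illustration, not Perforce's own merge algorithm, and it conservatively treats adjacent or overlapping edits as conflicts.

```python
import difflib
from typing import List, Optional, Tuple

Edit = Tuple[int, int, List[str]]  # (base_start, base_end, replacement lines)


def edits(base: List[str], other: List[str]) -> List[Edit]:
    """Describe how `other` differs from `base` as replacements of base ranges."""
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return [
        (i1, i2, other[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def three_way_merge(base: List[str], ours: List[str], theirs: List[str]) -> Optional[List[str]]:
    """Merge two edited copies of `base`; return None if any edits conflict."""
    ours_edits, theirs_edits = edits(base, ours), edits(base, theirs)
    combined = list(ours_edits)
    for t in theirs_edits:
        # Ranges that overlap or touch count as a clash (conservative).
        clash = [o for o in ours_edits if o[0] <= t[1] and t[0] <= o[1]]
        if not clash:
            combined.append(t)
        elif clash != [t]:
            return None  # different edits to the same region: merge by hand
        # else: both sides made the identical edit; keep a single copy
    merged, pos = [], 0
    for start, end, lines in sorted(combined, key=lambda e: (e[0], e[1])):
        merged.extend(base[pos:start])  # unchanged base lines before the edit
        merged.extend(lines)            # the edit itself
        pos = end
    merged.extend(base[pos:])
    return merged


base = ["a", "b", "c", "d", "e"]
ours = ["a", "B", "c", "d", "e"]    # you changed line 2
theirs = ["a", "b", "c", "D", "e"]  # a colleague changed line 4
print(three_way_merge(base, ours, theirs))  # ['a', 'B', 'c', 'D', 'e']
```

If both engineers had changed line 3 differently, the function would return None, which corresponds to the case where the file keeps its "needs resolve" icon and an interactive merge is required.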
If necessary, Perforce will apply all of these operations to very large sets of files at once. My
company routinely uses feature branches to accomplish large development projects, wherein a complete
set of features is developed outside of our main development branch to avoid disrupting other
engineers. This development effort may go on for months, and affect hundreds or thousands of files.
At the end of all of this, we need to get the changes back into our main branch intact. Without
going into too much detail, I'll outline the process you would use with Perforce in this situation.
First, you'd tell Perforce to "integrate" changes from one branch to another. In other words, you
are specifying what set of changes to merge without telling it specifically how you want that
accomplished. You can limit the scope of the integration based on the path to the files, or specific
versions of the files, but you can just as easily tell it to do the whole set of changes at once.
Perforce will then "check-out" all of the files in the destination branch that were changed in the
source branch. All of these files now have the special icon that indicates Perforce is waiting for
you to specify how to resolve differences.
You would then tell Perforce to automatically resolve all the files that had no conflicting
changes. This operation usually eliminates about 90%-95% of all changes with no manual work on the
part of the engineer doing the merge. It can be done in a single step no matter how many files are
involved.
Finally, you are left with a much smaller set of files that still have the special icon
indicating differences that have not yet been resolved. For these, you resort to interactively
merging the conflicting differences using the visual three-way merge tool provided by Perforce.
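On the command line, the three steps above correspond to `p4 integrate`, `p4 resolve -am` (accept the automatic merge only for files without conflicts, skipping the rest), and a plain interactive `p4 resolve` for whatever remains, followed by `p4 submit`. The Python sketch below only assembles those command lines rather than running them, so it can be read without a Perforce server. The flags are as found in current Perforce command line clients, the branch paths and function name are hypothetical, and a real script would pass each list to `subprocess.run` and check the result before moving on.

```python
from typing import List


def branch_merge_commands(source: str, target: str, description: str) -> List[List[str]]:
    """Build the p4 command lines for merging one branch into another.

    source/target are depot paths ending in "/..." (hypothetical layout).
    Commands are returned, not executed.
    """
    return [
        # 1. Open every target file affected by changes in the source branch.
        ["p4", "integrate", source, target],
        # 2. Automatically accept merges that have no conflicting changes;
        #    files with conflicts are skipped and remain unresolved.
        ["p4", "resolve", "-am"],
        # 3. Interactively resolve whatever is left (prompts the user).
        ["p4", "resolve"],
        # 4. Submit the whole merge as one changelist.
        ["p4", "submit", "-d", description],
    ]


for cmd in branch_merge_commands(
    "//depot/Engineering/VectorWorks/TaskBranches/VW10/3DDevelopment/...",
    "//depot/Engineering/VectorWorks/Main/...",
    "Merge 3D development branch into main",
):
    print(" ".join(cmd))
```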
The above discussion may be too detailed for some, but the overall concept is that even when
manipulating large sets of files, Perforce always tries to avoid involving you if it's not
necessary, but if there are situations that need your attention, you will be involved, and given the
detailed information you need to proceed efficiently. Perforce provides a consistent set of
integrate and resolve functionality that is applied uniformly to all merging operations. It stands
far ahead of CVS or SourceSafe in this respect, and ahead of most non-MacOS version control tools as
well.
Inter-File Branching
Since we have been talking about merging between branches, I'll discuss the branching model used
by Perforce. They call it Inter-File Branching, but in essence it is the use of the depot directory
hierarchy to represent different branches of your development projects. The path to each individual
file in the depot includes a full, human-readable representation of the intended purpose of that
file. For example, it's easy to differentiate the intent of these two files:
//depot/Engineering/VectorWorks/ReleaseBranches/VectorWorks10.0.0/AppSource/Project Setup.txt
//depot/Engineering/VectorWorks/TaskBranches/VW10/3DDevelopment/AppSource/Project Setup.txt
Behind the scenes, supporting the seemingly simple Inter-File Branching concept, is the Perforce
database, which is aware of all branching relationships between any two files in the system. For
every pair of files that has a branching relationship, it tracks the specific revisions that have
been integrated. For large hierarchies of branched files, it's quite easy to display all changes
made to a particular branch chronologically, or even to display all revisions in one branch that
have yet to be merged to another.
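The bookkeeping this implies can be sketched in a few lines of Python. The model below is hypothetical and far simpler than Perforce's database: it records, for each source/target file pair, which source revisions have already been integrated, and it derives each target path by swapping the branch prefix, which the path-based naming of Inter-File Branching makes possible. The class name, method names, and paths are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IntegrationRecords:
    """Hypothetical, simplified model of branch-tracking metadata."""
    heads: dict[str, int] = field(default_factory=dict)  # path -> head revision
    done: dict[tuple[str, str], set[int]] = field(default_factory=dict)

    def integrate(self, source: str, target: str, revs: range) -> None:
        """Record that these source revisions were merged into target."""
        self.done.setdefault((source, target), set()).update(revs)

    def unmerged(self, source_branch: str, target_branch: str) -> dict[str, list[int]]:
        """List source revisions not yet integrated, file by file."""
        pending = {}
        for path, head in sorted(self.heads.items()):
            if not path.startswith(source_branch):
                continue
            # The branch is encoded in the path, so swap the prefix.
            target = target_branch + path[len(source_branch):]
            merged = self.done.get((path, target), set())
            missing = [r for r in range(1, head + 1) if r not in merged]
            if missing:
                pending[path] = missing
        return pending


db = IntegrationRecords(heads={
    "//depot/Task/3D/AppSource/Render.cpp": 4,
    "//depot/Task/3D/AppSource/Mesh.cpp": 2,
})
db.integrate("//depot/Task/3D/AppSource/Render.cpp",
             "//depot/Main/AppSource/Render.cpp", range(1, 3))
print(db.unmerged("//depot/Task/3D/", "//depot/Main/"))
# {'//depot/Task/3D/AppSource/Mesh.cpp': [1, 2], '//depot/Task/3D/AppSource/Render.cpp': [3, 4]}
```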
To a large extent, the Inter-File Branching model works in concert with the automated merging
capabilities described above. It allows independent branches within the codebase (and their
independent change histories) to exist in a logical environment where every engineer cannot help but
know how they are differentiated from each other simply by virtue of the path to the files.
Inter-File Branching provides the conceptual foundation that can keep large teams of developers
efficiently working on parallel development branches with very little management overhead.
Other tools such as CVS have a file hierarchy for your files, but then each file has a tree of
numeric versions with no immediately apparent meaning. Thus every important event in CVS needs to be
represented by a label that pulls together an arbitrary set of files and versions into a meaningful
package. What an individual engineer needs to do in CVS to accomplish a simple task such as "Merge
all of your version 4.1 changes into the main development branch" becomes so complex and error-prone
that it prevents projects from even attempting to do large-scale parallel development. (Perforce
also has labels, but they are rarely used because other Perforce functionality makes them much less
necessary.)
SourceSafe has no meaningful tools to assist merging changes between branches and cannot
effectively be used for any significant parallel development efforts.
Atomic Change Submission
Figure 2 - Revision history of a single file with changelist comments
Figure 3 - Pending changelists before submission
Atomic change submission is another refinement that stems from the rigorous database architecture
developed by Perforce at the outset. The concept is simple to grasp, and prevents a host of ugly
side-effects which afflict competing products without this feature. Simply stated, if I try to
submit changes to a set of files, and for whatever reason I am unable to change one or more files,
then the entire submission is rejected. This dramatically improves the chances that the project as
submitted to the version control system will be in a consistent, and buildable state, and improves
the ability to analyze complex changes that have gone into the project after the fact.
In SourceSafe or CVS, if I submit ten files, and the last one has a conflict with a change that
was already submitted by another engineer, I won't know it until the first nine have already been
submitted. At that point, I may have a big problem, and will need to scramble to come up with a fix.
If other engineers step into that trap, and check-in more files before the problems are solved, then
the mess keeps getting bigger. In Perforce, if I submit ten files, and one of them has a conflict,
then the entire submit fails before the central database is modified at all. I can then resolve the
conflict without the pressure of having just checked-in a partial set of files which do not build.
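The all-or-nothing rule is easy to state in code. In the hypothetical Python model below, a submission is checked in full before anything is written: if any file in the changelist was opened at a revision older than the current head (meaning someone else submitted first), the whole changelist is rejected and the depot is left untouched. This illustrates the behavior described above; it is not how Perforce implements it internally, and the names are made up for the example.

```python
class SubmitRejected(Exception):
    """Raised when any file in a changelist cannot be submitted."""


def submit(depot: dict, changelist: dict) -> None:
    """Atomically submit a changelist to a toy depot.

    depot:      path -> head revision number
    changelist: path -> revision the user opened the file at (0 = new file)
    Either every file gets a new revision, or nothing changes.
    """
    # Phase 1: validate the entire changelist without modifying the depot.
    stale = [p for p, base in changelist.items() if depot.get(p, 0) != base]
    if stale:
        raise SubmitRejected(f"must resolve first: {sorted(stale)}")
    # Phase 2: only now apply every change.
    for path in changelist:
        depot[path] = depot.get(path, 0) + 1


depot = {f"src/f{i}.c": 5 for i in range(10)}
mine = {f"src/f{i}.c": 5 for i in range(10)}
mine["src/f9.c"] = 4  # opened before a colleague submitted revision 5
try:
    submit(depot, mine)
except SubmitRejected as err:
    print(err)            # must resolve first: ['src/f9.c']
print(depot["src/f0.c"])  # 5: no partial submission happened
```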
In normal use, the organization of work allowed by Perforce changelists is also very beneficial.
All of an engineer's open files are assigned to one or more pending changelists visible to all users
of the system (see Figure 3). The choice of which files to include, as well as the description of
the changes can all be prepared in advance to eliminate last-minute errors when submitting changes
to the depot.
Unlike some other revision control systems, where atomic change submission is tacked on as an
afterthought, it is core to the implementation of Perforce. Every change that has been successfully
submitted to the system is represented by the set of files that changed and a high-level description
of the significance of that change. The description applies to the entire change (and all files
that make up the change) rather than each individual file being given a duplicate of the
description. The history of a hierarchy of files is therefore a list of the high-level changes and
their descriptions, not the less useful list of every file that changed between two dates. One can
even implement server-side trigger scripts that examine a proposed submission and programmatically
accept or reject it in its entirety based on a centrally maintained submission policy.
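As an illustration of the kind of policy such a trigger might enforce, here is a minimal Python sketch. The rules (a description that cites a bug number, and no changes under a frozen release branch path) are invented for the example, and the steps of registering the script with the server and fetching the pending changelist's description and file list are not shown. Perforce treats a non-zero exit status from a pre-submit trigger as a rejection, so a real trigger built on this would exit with status 1 whenever `check_submission` returns any problems.

```python
import re
from typing import List

# Hypothetical site policy, for illustration only.
BUG_REF = re.compile(r"\bbug\s*#?\d+", re.IGNORECASE)
FROZEN_PREFIX = "//depot/Engineering/VectorWorks/ReleaseBranches/"


def check_submission(description: str, files: List[str]) -> List[str]:
    """Return a list of policy violations; an empty list means accept."""
    problems = []
    if not description.strip():
        problems.append("changelist description is empty")
    elif not BUG_REF.search(description):
        problems.append("description must reference a bug number")
    frozen = [f for f in files if f.startswith(FROZEN_PREFIX)]
    if frozen:
        problems.append(f"release branches are frozen: {frozen}")
    return problems


print(check_submission("Fix crash in DXF import (bug #1234)",
                       ["//depot/Engineering/VectorWorks/Main/AppSource/DXF.cpp"]))  # []
print(check_submission("", []))  # ['changelist description is empty']
```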
Low Administration Overhead
Perforce requires very, very little administration attention. If you install the server properly,
set up your backups, and maintain the server hardware with adequate RAM and drive space, the only
administrative attention required is upgrading the software as frequently or infrequently as you
like, and a small amount of overhead when adding new user accounts or cleaning up after users who
depart. Perforce has never once created an emergency for us, and I can't see a site needing a
full-time administrator for this system until it has many hundreds of users. Even then, I think that
person would have a lot of free time.
SourceSafe will quickly burden an administrator with unpleasant tasks such as the investigation
and patching of file and database corruptions. Based on our experience, and that of others I've
talked to, this is virtually certain over the long term with more than a handful of engineers using
the system.
CVS will probably be less troublesome with regular maintenance, but the lack of thorough
documentation and support and the need to assemble client and server pieces from various open source
projects can certainly add to the initial outlay of effort.
Other info
There are some other features that do not warrant extensive discussion, but are important to
mention. All of the client functionality is fully accessible through Perforce's command line tools.
This is the universal Perforce interface: it's available everywhere and can do anything that can be
done with Perforce. This means that there is a backup plan if you ever run into something that can't
easily be done using the GUI. It also means that the system is highly scriptable and extensible, as
you'd expect any mission-critical developer tool to be. It works well with almost any scripting
environment (Python, Perl, and possibly Ruby being the most commonly used).
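As a small example of that scriptability, the command line client's global `-G` option makes it write each result record as a marshalled Python dictionary, which a script can read directly instead of parsing text. The sketch below lists depot files much like the `p4 files //depot/...` command mentioned earlier; it assumes a `p4` client on the PATH that is already configured for your server. Under Python 3 the keys and values normally arrive as bytes, so the helper decodes them. The function names are illustrative.

```python
import io
import marshal
import subprocess
from typing import Dict, Iterator, List


def read_records(stream: io.BufferedIOBase) -> Iterator[Dict[str, object]]:
    """Yield each marshalled dictionary from `p4 -G` output until EOF."""
    while True:
        try:
            record = marshal.load(stream)
        except EOFError:
            return
        # Decode bytes keys/values (the usual case under Python 3).
        yield {
            (k.decode() if isinstance(k, bytes) else k):
            (v.decode() if isinstance(v, bytes) else v)
            for k, v in record.items()
        }


def depot_files(path: str = "//depot/...") -> List[str]:
    """Return depot paths reported by `p4 -G files <path>` (needs a p4 client)."""
    proc = subprocess.Popen(["p4", "-G", "files", path], stdout=subprocess.PIPE)
    try:
        # Data records carry code "stat"; errors carry code "error".
        return [r["depotFile"] for r in read_records(proc.stdout)
                if r.get("code") == "stat"]
    finally:
        proc.stdout.close()
        proc.wait()
```

Because `read_records` works on any binary stream, it can be exercised without a server by feeding it marshalled dictionaries from an in-memory buffer.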
Mac Software Support
Having talked at some length about the core system functionality, I'll spend some time talking
about Mac-specific support. As far as the server goes, MacOS X is a supported platform as well as
Linux & Windows. We use a Windows 2000 server platform, and there are no issues I'm aware of related
to mixing Mac & Windows clients and servers.
The client software on the Mac includes MacOS X native command line tools as well as P4V - the
visual GUI client, P4Web - a web browser based client, and a CodeWarrior plug-in. There are also
legacy clients for MPW, older versions of CodeWarrior, and a MacOS 9 based version of P4Web.
Finally, Apple has integrated native Perforce support into the Xcode development environment.
GUI Interface - P4Web & P4V
Figure 4 - Graphical text differences display
P4Web used to be the only "GUI" interface to Perforce on the Mac, and we used it successfully for
a couple of years. It took a little getting used to, but became quite easy to use, if a little
slow. From my point of view, there is no reason for most users to continue using P4Web, as the
current version of P4V is faster, easier, more capable, and nicer looking. It introduces much better
file differencing and merging capabilities, which were sorely lacking on the Mac under P4Web. Figure
4 is an example of the kind of text difference display P4V produces. You can easily display the
difference between your local copy and any version in Perforce - or between any two versions of any
file in the depot.
P4V is being developed by Perforce as the next-generation GUI for Mac and Linux. With only a few
exceptions, P4V has the full functionality of P4Win - the native Windows GUI client. In some
respects, such as the graphical diff viewer, it's better. I think Perforce would ultimately like for
P4V to be the only GUI client - even on Windows. They may have a way to go to achieve that, but they
are rapidly improving P4V, and it's already at a state of good usability.
CodeWarrior Integration
The Perforce CodeWarrior plug-in is most useful for environments where CodeWarrior is the
development platform, and you want quick access to syncing, checking out, and differencing files
from within the IDE. You can submit changes from within the IDE, but its ability to resolve
conflicts, if they occur, is weaker than that of either P4Web or P4V, and you'll probably gravitate
to those more capable tools.
Xcode Integration
Apple has integrated Perforce support directly into Xcode, and it provides the same feature set
as Xcode's CVS integration - namely sync, check-out, check-in, and diff. As I don't use Xcode for
production work, I'm not sure whether this integration works better than the CodeWarrior integration
when submitting conflicting files. Either way, we don't find it difficult to use the GUI tools for
the more demanding tasks.
References
There is a long list of major software companies and projects with hundreds or thousands of
developers who are happy Perforce users. See the customer spotlight page at http://www.perforce.com/perforce/customers.html
for a bunch of interesting reading. Companies like Palm, Symantec, Macromedia, and TiVo, among many
other household names, have standardized on Perforce as a best-of-breed solution for Mac development
as well as virtually any other platform.
Pricing
Current pricing for Perforce is $750 per user, which includes a year of upgrades and support.
Continuing the upgrade & support contract costs $150 per user per year. Perforce has special site
licensing available for educational institutions, free licenses available for open source projects,
and an evaluation version with no time limit that supports two users and two workspaces. If you'd
like to evaluate Perforce in more depth in a production environment, Perforce will supply a
time-limited license for a larger number of seats, depending on your environment.
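To put those list prices in perspective, here is a small Python calculation of the license cost for a team over several years. It assumes the $750 first-year price and the $150 annual renewal quoted above, with no volume or site discounts; the 40-seat figure is illustrative, chosen to match the size of the team described earlier, and actual quotes may differ.

```python
def perforce_cost(users: int, years: int,
                  first_year: int = 750, renewal: int = 150) -> int:
    """Total list-price cost in dollars: purchase plus (years - 1) renewals."""
    if users < 0 or years < 1:
        raise ValueError("need a non-negative user count and at least one year")
    return users * (first_year + renewal * (years - 1))


print(perforce_cost(40, 1))  # 30000
print(perforce_cost(40, 3))  # 42000
```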
Final Word
Perforce is a full-featured revision control system that differentiates itself from the
competition by the uncompromising quality of its implementation. Mac support was acceptable three
years ago, but due to recent improvements such as P4V for MacOS X, it is now very good, and still
improving rapidly. Perforce is developed by a team that sets priorities early, and sticks to them.
Producing a top-notch Mac development product is clearly one of their priorities. Oh - and if you're
interested - Perforce is clearly one of the top players in Windows and Unix version control as well.
Paul Pharr manages the ongoing software development of the VectorWorks family of CAD
applications at Nemetschek North America. You can reach him at
pharr@nemetschek.net