Iterating for Profit and Pleasure

Volume Number: 16 (2000)
Issue Number: 5
Column Tag: Programming

Iterating for Profit and Pleasure

By Neil Mayhew, Calgary, AB

Using iterators to tap into the power of the Standard Template Library

A Brave New world

Iterators are not a new idea. These lightweight objects provide a convenient way to step through the elements of a collection. Likewise, the Standard Template Library has been around for a while now, although many people have still to explore its depths. Iterators are a fundamental part of STL, and an understanding of their role is vital for effective use of the Library. Many of the data structures that we work with every day are 'virtual' collections, and by writing our own STL-compatible iterators for them we can make use of many powerful algorithms from STL.

As an example of custom iterators, this third and final article presents a set of classes that iterate through the items in a MacOS folder. These mesh with algorithms from STL to produce an application that checks the System Folder for additions and changes to its extensions and control panels since the last time the application was run.

Developers already familiar with STL and its use of iterators may wish to skip to the first implementation section, "Walking a Folder".

In search of elegance

In the most general sense, an iterator is an object that records an index into a collection of other objects, and allows those objects to be retrieved for processing. Of course, it must also be possible to advance the index forwards, and possibly backwards, to reference other members of the collection. These actions are invoked by calling methods of the iterator object.

An example of the effective use of iterators is in the reading of run-length encoded pixel data. It is possible to define an iterator that presents the illusion of a rectangular array of pixels, whereas this array is a virtual collection that is computed on the fly from the run-length-encoded data. The iterator's internal storage (instance variables) record a pointer to the current run and a count of the number of pixels still to be read from that run.

There are of course many ways to design a set of methods for an iterator. For example, next() and previous() methods can be used for advancing forwards and backwards through the associated collection, and atEnd() can be used to test for the availability of further elements. Unfortunately, a lack of standards in this area is a great hindrance to code reuse. Code that uses iterators from a given source cannot easily be adapted to use iterators from a different source. Likewise, programmers often find it easier to develop their own, simple-minded solutions to a problem than learn how to use yet another set of iterator conventions.

Yet C has some very powerful iterators of its own that many other languages lack: pointers. These are iterators in the sense that they step through a collection (an array or linked-list) without additional support. There is no need for a separate reference to a collection that must be used with a subscripting operation when retrieving elements. Countless algorithms owe their elegance to the use of pointers instead of arrays and subscripts. Learning to use pointers effectively is therefore one of the key requirements for becoming a skilled C programmer.

A standardized approach

Since C already has a well-established paradigm for the use of iterators, it makes sense to standardize on this model in C++. We discussed smart pointers in an earlier article. These are objects that have the syntax and semantics of a pointer, through overloading the *, -> and ++ operators. A standardized iterator is therefore a special case of a smart pointer.

The advantage of this approach is that code may be written in a style that is very similar to the one that would be used with raw pointers. This makes the code easy to write and easy to understand, and yet considerable sophistication may be present 'under the surface' within the iterators themselves. There need not be any hidden pitfalls, however: if the semantics of the iterator are clearly defined, and consistent with the usual behavior of pointers, then client code will work as expected.

In fact, code using smart pointers for iterators may often be safer than equivalent code using raw pointers. For example, a smart pointer is usually capable of performing bounds checking, in a debug build at least. Or, awkward and error-prone pointer manipulations can be taken care of in one place, inside the iterator, and client code is free from having to worry about such details.

Mastering new idioms

A subtle characteristic of the smart-pointer approach to iterators is that they will be passed around by value, since this is how regular pointers are treated. These iterators therefore need to be small, usually just 4 or 8 bytes; any bigger than this and client code would have to be aware of the need for efficient handling.

Sophisticated behavior, however, doesn't necessarily require large amounts of storage. An iterator for a linked list need only occupy four bytes, and yet the dereference operator can return a reference to the list element rather than to the node that contains it. Likewise, the increment operators can follow the link to the next node.

This standardization allows a linked list to be processed just as if it were an array. For example, the following construction is idiomatic for all standardized iterators:

	for (Iterator p = begin; p != end; ++p)
	{
		[Expressions using *p …]
	}

Note that the loop increment uses ++p and not p++. It is important to understand the reason for this, and to remember to do likewise in your code. Recall that the post-increment operator returns a copy of the iterator or pointer as it was before the increment. This behavior is not needed in this context, since the value is not used anyway. However, when the item being incremented is a user-defined object rather than a pointer, the compiler is not entitled to substitute a pre-increment rather than a post-increment, since the two methods may have different side effects. There is therefore no way to avoid creating a redundant copy of the iterator object each time around the loop.

Note too that the loop test uses != and not <. Again, there is a good reason for this. In the case of a linked list, the < operator is very difficult to implement, and not very useful either. Since there is no compelling reason to use <, even for an array, it is better to write all loops using !=.

These conventions enable client code to remain unchanged even when the collection's underlying representation is changed, for example from an array to a list. As well as easing the maintenance burden, this has the much more general advantage that template functions can be written that work with a wide range of representation types. The code can be written once, as a template, and reused with any collection that uses standardized iterators as an interface. These generic functions are usually referred to as algorithms.

Algorithms + Containers = Programs

Naturally, many of these algorithms are well-understood examples from computer science, such as sorting and searching. Now that we have a mechanism for writing these in a completely generic way, it makes sense to have a standard library of templates that can be re-used in almost any situation. Hence the name Standard Template Library or STL.

Just as the algorithms are classic examples, so are many of the collection types, such as linked lists and dynamically growing arrays. The Library therefore also includes a representative selection of collection types. These are usually referred to as containers, and they avoid the need for endless re-implementations of common container types. The template mechanism is still flexible enough to allow a wide variation in the behavior that can result from use of the template.

The magic ingredient that enables algorithms and containers to work well together is the standardized iterator. All of the containers have begin() and end() methods that return iterator values for the start and end of the collection, and all of the algorithms expect collections to be represented by two such values.

An algorithm typically has the following form:

	template
	some_algorithm(It begin, It end)
	{
		for (It p = begin; p != end; ++p)
		{
			…
		}
	}

It is called like this:

		some_algorithm(container.begin(), container.end());

Neither the algorithms nor the containers are closed sets: the user can very easily add new algorithms and containers, and provided standardized iterators are involved these will integrate seamlessly with the rest of the Library. We will see an example of this shortly.

Smart glue

Iterators are the glue that holds algorithms and containers together to make programs. The Standard Template Library elevates the use of iterators almost to the level of an art form.

For example, many algorithms produce a new collection as their output. They expect to write this output using an iterator. Such iterators are called output iterators, to distinguish them from the read-only input iterators that provide the input data.

However, it may not be known in advance exactly how many elements will be produced in the output. This shouldn't be a problem with a dynamically sized collection, except that the elements are being appended to the collection using an iterator. The iterator has to be smart enough to call the container's append method whenever the algorithm writes a value through the dereferenced iterator. Such an iterator is called a back inserter, and it works by some rather clever sleight of hand that includes overloading the = operator. Needless to say, this is implemented by yet another template in the Library.

Another example of the creative use of iterators is the set of stream iterators that is provided by the Library. These iterators enable the successive reading or writing of objects via iostreams, so that intermediate storage is not required when passing the data to or from an algorithm.

Not all iterators have to be so smart, however. Every algorithm in the library will work happily with raw pointers as the iterators, provided the underlying raw array is large enough to hold any output that may be generated. It is a fundamental principle of the library that iterators and pointers must be completely interchangeable.

Committing to the Standard

Many libraries have been based on excellent theoretical principles, and some have been promoted as universal standards. However, that doesn't mean they have always been universally accepted.

STL is exceptionally well designed, and has been thoroughly tested over a number of years. It is efficient, comprehensive, and portable. Nothing in STL precludes it from being a good citizen within the MacOS. Although iostreams are supported as an option in several areas, they are by no means required, and most of the functionality is purely computational.

Code that makes extensive use of STL over more homegrown solutions usually has much greater clarity, especially if the reader has a general familiarity with STL. A lot of the clarity results from the fact that loops are largely eliminated from user-code, since most of the looping takes place inside the algorithms. User-code is more parallel in nature, being expressed in terms of operations on entire collections. Also, groups of calls to standard algorithms are usually much easier to read than a series of custom loops.

As more and more developers become aware of the benefits of STL, and commit to using it in their work, the value of STL can only increase. It is increasingly likely that another developer studying one's code will be comfortable with its style and structure. The quality of STL implementations continues to improve, and new algorithms and collections are becoming available all the time, often for free via the web.

Just as the functions from the standard C library have become an integral part of the language over the years, the algorithms and containers from STL are becoming an integral part of C++. This is not surprising, given that Bjarne Stroustrup, the creator of C++, has been very active in the development of the Library. For a while, the language and the Library developed in parallel, as the template capabilities of the language were pushed to the limit and beyond.

Fortunately, the language and the Library are now static, having at last been adopted as an ISO standard. It is safe to assume that any modern C++ implementation will have a solid implementation of the Standard Library, which is a superset of STL. Code using STL should therefore be portable to any platform that has a modern C++ compiler.

Walking A Folder

Theory is of little value unless it is used. The combination of standardized iterators and STL algorithms can be very effective in a MacOS setting. Although there is always another way to skin any cat, an STL-based approach is usually rapid in its development and efficient in its implementation.

All operating systems have facilities for iterating through every file in a folder. The MacOS is no exception, although the sole method for doing it, using PBGetCatInfo, can be rather tedious. It would be nice to develop an iterator facility that gives this operation a simple, high-level interface. The result would be a virtual collection, since the elements need not all exist at once as objects in memory, although they do exist as a collection of 'objects' on disk.

Simple interface, subtle design

An interesting design issue concerns the exact nature of the information that is returned. The obvious answer might seem to be an FSSpec, but unfortunately this doesn't allow one to distinguish between files and folders. It would be possible to skip the folders, but this would prevent a recursive descent of an entire folder hierarchy. At the other extreme, the entire contents of PBGetCatInfo's parameter block could be returned, but this would be cumbersome and inefficient in most cases. The best compromise seems to be to return just the most frequently-used file and directory attributes, representing this using a subclass of FSSpec so that it can be used as a parameter to any of the MacOS APIs that expect this datatype.

Another design issue concerns storage management. This is almost always an issue in C++, and many people feel that this is an unwelcome complication of the language. However, it also allows great flexibility in the implementation tradeoffs that can be made. Other languages force one to store all non-scalar data in dynamically allocated heap storage, which introduces significant computational overhead. In contrast, C++ allows objects to be returned by value, rather than by a pointer or a reference. This eliminates the problem of object lifetimes, which would otherwise have to be handled by reference counting or garbage collection.

It is often thought that return-by-value is also expensive computationally, because of the amount of data that has to be copied, but this need not be so. When client code calls a function that returns an object, a hidden argument is passed that contains a pointer to a block of uninitialized storage that will hold the returned object. In effect, the called function copies the return value into this storage using the appropriate copy-constructor, and the client code uses a copy-constructor or assignment operator to move the object to another location. Two destructor calls are also involved, of course, to clean up the intermediate values.

However, if the return statement in the called function is itself a constructor call, most compilers know to optimize this as a direct construction into the space that was reserved by the caller and passed as the hidden argument. If in addition the client code uses the function call as the initializer for a named variable of the same type as the return value, the compiler will optimize this by passing the address of the uninitialized variable as the hidden argument. The net result is that just one object is involved, which is constructed in the called function's return statement.

For example:

	// Caller
	Foo x = GetAFoo();

	// Callee
	Foo GetAFoo() {
		return Foo(arg1, arg2);
	}

This is functionally equivalent to:

	// Caller
	Foo x = Foo(arg1, arg2);

Note that this is not dependent on the callee being an inline function. If the assignment operator is significantly more expensive than the constructor that is being used, it is better to use a fresh variable for every function call than to try to re-use the same variable several times, especially inside a loop.

Hand over your data!

Using this model, the iterator’s dereference operator needs to construct and return an object containing a subset of the information returned by PBGetCatInfo. We’ll call this a FolderItem. Since this will have quite a number of instance variables, we want to avoid having to pass every value as a separate constructor argument. A neat solution is to have FolderItem’s constructor perform the call to PBGetCatInfo. This means we only have to pass three arguments to the constructor: volume, directory, and index.

We also want to ensure that copying and assignment is as efficient as possible. Since the FolderItem will need to contain space for a variable-length name, it makes sense not to use the compiler’s automatically generated methods, and instead to copy just the bytes that are used by the name. On the other hand, it is quite handy not to have to include an explicit copy of every instance variable—it is easy to forget to add an extra line if a new instance variable is introduced. We can have the best of both worlds if we define a base class that contains the name, and a subclass that contains the other information. One uses explicit copy/assignment methods, and the other uses automatic ones. As already explained, we also want to have FSSpec as the ultimate base class.

This leads to the implementation shown in Listing 1.

Listing 1: FolderItem.h

// Class declaration for FolderItem and its bases
class FileSpec : public FSSpec
{
	// Constructors
protected:
	FileSpec() {}	// Used when subclass initializes everything
public:
	FileSpec(short vol, long dir, ConstStringPtr pName)
	{
		vRefNum = vol;
		parID = dir;
		BlockMoveData(pName, name, pName[0] + 1);
	}

	// Copy constructor	
	FileSpec(const FileSpec& other)
		{ *this = other; } // Uses assignment

	// Assignment operator	
	FileSpec& operator= (const FileSpec& other)
	{
		vRefNum = other.vRefNum;
		parID = other.parID;
		BlockMoveData(other.name, name, other.name[0] + 1);
		return *this;
	}
	
	// Comparison operators
	bool operator<  (const FileSpec& other) const
		{ return RelString(name, other.name, false, false) < 0; }
	bool operator== (const FileSpec& other) const
		{ return EqualString(name, other.name, false, false); }
	bool operator!= (const FileSpec& other) const
		{ return !operator== (other); }
};

class FolderItem : public FileSpec
{
public:
	// File attributes
	typedef unsigned long Date;
	Date			created, modified;
	Size			data, resource;
	OSType		type, creator;
	Boolean	folder;

	// Constructors
	FolderItem() {}		// Needed when creating arrays
	FolderItem(short vol, long dir, int index);
		
	// Comparison operators
	bool operator<  (const FolderItem& other) const;
	bool operator== (const FolderItem& other) const;
	bool operator!= (const FolderItem& other) const
		{ return !operator== (other); }
};

There are only three non-trivial methods, all in FolderItem: the constructor that calls PBGetCatInfo, and the two comparison operators. These have fairly obvious implementations as shown in Listing 2.

Listing 2: FolderItem.cp

// Implementation of FolderItem
FolderItem::FolderItem(short vol, long dir, int index)
{
	vRefNum  = vol;
	parID    = dir;

	CInfoPBRec pb;

	pb.hFileInfo.ioCompletion = 0;
	pb.hFileInfo.ioVRefNum = vol;
	pb.hFileInfo.ioDirID = dir;
	pb.hFileInfo.ioFDirIndex = index;
	pb.hFileInfo.ioNamePtr = name;

	PBGetCatInfoSync(&pb);

	folder = pb.hFileInfo.ioFlAttrib & ioDirMask;

	if (folder)
	{
		created  = pb.dirInfo.ioDrCrDat;
		modified = pb.dirInfo.ioDrMdDat;
		data     = 0;
		resource = 0;
		type     = 0;
		creator  = 0;
	}
	else
	{
		created  = pb.hFileInfo.ioFlCrDat;
		modified = pb.hFileInfo.ioFlMdDat;
		data     = pb.hFileInfo.ioFlLgLen;
		resource = pb.hFileInfo.ioFlRLgLen;
		type     = pb.hFileInfo.ioFlFndrInfo.fdType;
		creator  = pb.hFileInfo.ioFlFndrInfo.fdCreator;
	}
}

bool FolderItem::operator== (const FolderItem& other) const
{
	return FileSpec::operator== (other)
		&& created  == other.created
		&& modified == other.modified
		&& data     == other.data
		&& resource == other.resource
		&& type     == other.type
		&& creator  == other.creator
		&& folder   == other.folder;
}	

bool FolderItem::operator<  (const FolderItem& other) const
{
	if (FileSpec::operator< (other)) return true;
	if (FileSpec::operator!=(other)) return false;
	if (created   < other.created)   return true;
	if (created  != other.created)   return false;
	if (modified  < other.modified)  return true;
	if (modified != other.modified)  return false;
	if (data      < other.data)      return true;
	if (data     != other.data)      return false;
	if (resource  < other.resource)  return true;
	if (resource != other.resource)  return false;
	if (type      < other.type)      return true;
	if (type     != other.type)      return false;
	if (creator   < other.creator)   return true;
	if (creator  != other.creator)   return false;
	if (folder    < other.folder)    return true;
	return false;
}

This takes care of most of the low-level grunt work. The rest is finesse.

Asking for it to happen

Typically, we don’t want to create FolderItems directly. It is much more convenient to define a Folder class that represents the entire collection, and obtain appropriately constructed boundary iterators from it. The Folder needs to store the volume and directory parameters, and it also needs to record the number of items in the folder so that it can supply an end() iterator when required. This item count can be initialized in the constructor, using PBGetCatInfo again.

It is helpful to be able to construct a Folder either from volume and directory numbers, such as returned from FindFolder, or from an FSSpec, such as returned from iterating another Folder. This enables recursive iteration of folder hierarchies. In fact, it is possible to write a ‘super-iterator’ that traverses a hierarchy as if it was a flat list of files. This involves storing a stack of folders and indexes, which is relatively easy with STL although it is still beyond the scope of this article.

Since FolderItem requires the Folder’s volume and directory information to be passed to its constructor, it makes sense to have Folder do the work of constructing FolderItems. This is done very conveniently by defining an array subscripting operator, operator[]. Implementation of the FolderIterator is then fairly trivial, since it is based on integer indexes, and delegates dereferencing to Folder::operator[].

There is some circularity in the definitions of Folder and its iterator: the iterator calls the subscript operator from Folder, and Folder has methods that return a FolderIterator. It is necessary to define FolderIterator first, since Folder passes FolderIterators by value. Then, although the implementation of FolderIterator::operator* is inline, this has to be postponed until after Folder has been declared. The results are shown in Listing 3.

Listing 3: Folder.h

class Folder;	// Forward declaration

// Class declaration for FolderIterator
class FolderIterator
{
	// Instance variables – 8 bytes
	const Folder&	folder;
	int						index;
public:
	// Constructor – zero-based index
	FolderIterator(const Folder& f, int i)
		: folder(f), index(i) {}

	// Dereference operator
	FolderItem operator* () const;

	// Increment and decrement
	FolderIterator& operator++ ()
		{ ++index; return *this; }
	FolderIterator  operator++ (int)	// Post-increment
		{ return FolderIterator(folder, index++); }
	FolderIterator& operator— ()
		{ —index; return *this; }
	FolderIterator  operator— (int) // Post-increment
		{ return FolderIterator(folder, index—); }
	
	// Comparison operators
	bool operator<  (const FolderIterator& other) const
		{ return index <  other.index; }
	bool operator== (const FolderIterator& other) const
		{ return index == other.index; }
	bool operator!= (const FolderIterator& other) const
		{ return index != other.index; }
};

// Class declaration for Folder
class Folder
{
	short	vol;
	long		dir;
	int		count;
public:
	// Constructors - initialize count using PBGetCatInfo
	Folder(short v, long d);
	Folder(const FSSpec&);

	// Subscript operator
	FolderItem operator[] (int i) const
		{ return FolderItem(vol, dir, i + 1); }

	// Bounds
	FolderIterator begin() const
		{ return FolderIterator(*this, 0); }
	FolderIterator end()   const
		{ return FolderIterator(*this, count); }

	// Number of elements
	int size() const { return count; }
	
	// Information
	short volume()    const { return vol; }
	long  directory() const { return dir; }
};

// Deferred implementation
inline FolderItem FolderIterator::operator* () const
	{ return folder[index]; }

The only non-trivial methods are the two constructors for Folder. These have straightforward implementations as shown in Listing 4.

Listing 4: Folder.cp

// Implementation of Folder class

Folder::Folder(short v, long d)
	: vol(v), dir(d)
{
	CInfoPBRec pb;
	Str255 name;

	pb.hFileInfo.ioCompletion = 0;
	pb.hFileInfo.ioVRefNum = vol;
	pb.hFileInfo.ioDirID = dir;
	pb.hFileInfo.ioFDirIndex = -1;
	pb.hFileInfo.ioNamePtr = name;

	PBGetCatInfoSync(&pb);

	count = pb.dirInfo.ioDrNmFls;
}

Folder::Folder(const FSSpec& spec)
	: vol(spec.vRefNum), dir(spec.parID)
{
	CInfoPBRec pb;

	pb.hFileInfo.ioCompletion = 0;
	pb.hFileInfo.ioVRefNum = vol;
	pb.hFileInfo.ioDirID = dir;
	pb.hFileInfo.ioFDirIndex = 0;
	pb.hFileInfo.ioNamePtr = const_cast(spec.name);

	PBGetCatInfoSync(&pb);

	count = pb.dirInfo.ioDrNmFls;
	dir   = pb.dirInfo.ioDrDirID;
}

Seeing the wood instead of the trees

It would be easy to lose sight of the fact that, despite the implementation details, these classes are designed to provide a clean, reusable interface for folder iteration. A typical use might be as follows:

Folder theFolder(anFSS);
long sum = 0;

FolderIterator p;
for (p = theFolder.begin(); p != theFolder.end(); ++p)
{
	sum += (*p).data;
}

Note that it is not possible to use p->data instead of (*p).data, since the iterator returns a copy of the value and not a reference to an object with an independent existence. If multiple uses of *p were needed, it would be much more efficient to create a named variable initialized with *p. This is a slightly dangerous aspect of the interface, since it is not obvious that each use of *p makes a fresh call to PBGetCatInfo. However, the convenience does seem to be worth the risk.

Healthy criticism

A natural question to ask at this point is whether using integer indexes wouldn’t have been simpler. Could this hype about iterators simply be another case of The Emperor’s New Clothes?

In the example above, where the folder is a local variable in the same routine as the loop, an integer approach would have worked well. However, we haven’t looked at any algorithms yet. An algorithm using integers would require the container to be passed as an additional argument. Some algorithms take two input sequences and produce a third, output sequence. This would be very cumbersome with integers, especially if the algorithm needed to operate on subsections of a collection. Also, an algorithm needs to be independent of any container type and so it can’t make any assumptions about begin() and end() methods. Raw arrays, which are a type of collection, don’t even have any methods!

FolderIterator uses an integer implementation internally, but it carries around within itself a reference to the associated collection. Tying essential information together in this way makes it algorithm-friendly, and yet retains the simplicity and robustness of implementation associated with array subscripting.

Checking Extensions

How might the folder-iterating facilities we have developed be put to use in a real-life application? As mentioned in the introduction, it is possible to combine standard algorithms with our own iterators to develop a program that checks the System Folder for additions and changes to its extensions. As well as being a good illustration of STL principles, this could be a useful tool in the continual battle to keep wayward installer programs under control. It seems almost an everyday occurrence that obsolete extensions get installed on one’s system, or shared libraries are overwritten with older versions, both of which can lead to inexplicable crashes at some random point in the future. Help stamp out installation abuse!

Planning the strategy

It is too much to expect that one will remember to run the checking program every time some new software is installed. A safer approach is to create a program fast enough to be run on every startup, that compares the currently installed extensions with their last known state, and records a new state if there have been any changes. This is combined with an interactive program that can generate an attractive display of the differences between any two states chosen from a list box.

The startup behavior breaks down into a number of discrete steps:

Read the current list of extensions into memory
Find and read the previous list from a preference file
Check if the lists are equal
Write a new preference file if they differ

The interactive behavior involves similar steps, but a more complex algorithm at step 3:

Read the current list of previous preference files
Ask the user to choose two of these
Analyze the differences between them
Display the results

We won’t cover the details of finding and creating preference files here, or the GUI mechanisms used to display differences between sets. We will however look at all the algorithms in detail.

Creating a suitable framework

As a framework for our code, we define a class Extensions that inherits from one of the STL containers. Creating, reading and comparing sets are then methods of this class. The class definition is as follows:

class Extensions : public std::vector
{
public:
	Extensions() {}
	Extensions(short vRefNum);

	OSErr Read (const FSSpec&);
	OSErr Write(const FSSpec&) const;

	bool operator== (const Extensions& other) const;
	bool operator!= (const Extensions& other) const
		{ return !operator== (other); }
};

The STL container class vector implements a dynamically sized array. The template argument FolderItem specifies that this is to be an array of FolderItems.

All Standard Library classes and functions are placed in the namespace std. It is possible to avoid having to specify this explicitly by means of a using declaration in .cp files, but it is not a good idea to do this in a header file. Any client that includes such a file will have the whole of the Standard Library dumped into its global namespace.

Reading the current extensions

We don’t want switches between different Extensions Manager sets to be recorded as a change. We therefore read the contents of both enabled and disabled extensions folders and merge them into a single list. This is accomplished very easily with STL. The code for this is in the Extensions constructor:

Extensions::Extensions(short vRefNum)
{
	long extensionsDirID, disabledDirID;
	
	FindFolder(vRefNum, kExtensionFolderType,
		kDontCreateFolder, &vRefNum, &extensionsDirID);
	FindFolder(vRefNum, kExtensionDisabledFolderType,
		kDontCreateFolder, &vRefNum, &disabledDirID);

	Folder extensionsFolder(vRefNum, extensionsDirID);
	Folder   disabledFolder(vRefNum,   disabledDirID);

	std::remove_copy_if(
		extensionsFolder.begin(), extensionsFolder.end(),
			std::back_inserter(*this), isFolder);
	std::remove_copy_if(
		disabledFolder.begin(), disabledFolder.end(),
			std::back_inserter(*this), isFolder);
	
	std::sort(begin(), end());
}

The STL algorithm remove_copy_if copies its input sequence—here corresponding to one of the two folders — to its output sequence, removing any elements that satisfy the given predicate. The predicate used here is a small user-defined function, isFolder, defined like this:

inline bool isFolder(const FolderItem& item)
	{ return item.folder; }

The output sequence is the collection stored in the Extensions object, which is a container derived from vector. However, we don’t know in advance how many elements will be needed, so we append them one by one using a back inserter. The compiler is smart enough to figure out what template arguments to use based on the data that is passed to back_inserter.

The two copying algorithms generate a list of extensions that is a straight concatenation of the contents of the two folders. We then sort the entire list in order to merge the two sections, and to make comparison with other sets easier. The arguments to the sort algorithm are very simple, because the compiler is able to deduce much of the necessary information from the data types being used. The comparison function reverts to the less-than operator unless a specific predicate is specified as a third argument.

Saving and restoring sets

The i/o functions are mostly plain MacOS code, except for the computations of the length and address to read or write. The resize() method (from std::vector) changes the number of elements in the vector, and the size() method returns the current number. The begin() method returns an iterator for the first element of the vector, but since vector is an implementation of an array this iterator is in fact a raw pointer to the beginning of a contiguous block containing all the elements.

OSErr Extensions::Read(const FSSpec& file)
{
	resize(FSpDataSize(&file) / sizeof(FolderItem));
	short fileRef;
	OSErr err = FSpOpenDF(&file, fsRdPerm, &fileRef);
	if (err != noErr)
		return err;
	long count = size() * sizeof(FolderItem);
	err = FSRead(fileRef, &count, begin());
	FSClose(fileRef);
	return err;
}

OSErr Extensions::Write(const FSSpec& file) const
{
	FSpCreate(&file, ‘7INI’, ‘ESET’, smSystemScript);
	short fileRef;
	OSErr err = FSpOpenDF(&file, fsWrPerm, &fileRef);
	if (err != noErr)
		return err;
	long count = size() * sizeof(FolderItem);
	err = FSWrite(fileRef, &count, begin());
	FSClose(fileRef);
	return err;
}

These functions, as written, are rather wasteful of disk space and i/o bandwidth, as most of the 256 bytes in FSSpec’s name field are unused. A better solution would be to compact the items into a character buffer, using variable-length space for the names, and write that buffer to and from disk.

Comparing sets

Checking for equality between Extensions collections is a very simple application of STL calls:

bool Extensions::operator== (const Extensions& other) const
{
	return size() == other.size()
		&& std::equal(begin(), end(), other.begin());
}

The equal method expects its input sequences to be of the same length, so it is necessary to check the lengths for equality first. If the lengths differ the sequences are obviously different.

Determining the differences between collections is a little more interesting. This is implemented as an algorithm using iterators, to allow it to be used with any container type and not just vector. For the same reason, we have not made it a method of Extensions.

The algorithm determines which items have been added, removed or changed between two sequences of FolderItems, and appends the results to its three output sequences. STL has a group of set algorithms that work on sorted sequences, and these are ideal for the implementation:

template<class In, class Out>
void FolderDifferences(
	In begin1, In end1,
	In begin2, In end2,
	Out unique1,
	Out unique2,
	Out changed)
{
	using namespace std;

	vector<FolderItem> common1, common2;

	set_difference(begin1, end1, begin2, end2,
		unique1, less<FileSpec>());
	set_difference(begin2, end2, begin1, end1, 
		unique2, less<FileSpec>());

	set_intersection(begin1, end1, begin2, end2, 
		back_inserter(common1), less<FileSpec>());
	set_intersection(begin2, end2, begin1, end1, 
		back_inserter(common2), less<FileSpec>());
	
	set_difference(
		common2.begin(), common2.end(),
		common1.begin(), common1.end(),
		changed);
}

The set_difference algorithm produces a sequence of elements that are members of its first, but not its second, input sequence. This is used to determine which items are unique to the first set, and then those unique to the second set. The set_intersection algorithm produces a sequence of elements from the first sequence that are also found in the second sequence. This is used to determine which items are found in both sets. It is used twice, to generate separate lists of common items from each of the sets, because the common items will later need to be checked for attribute differences.

In each of the first four set operations, the items are compared as FileSpecs rather than FolderItems, because at this point we are only interested in the names. Other attributes may differ, but if the names are the same we regard two items as one item that has been changed.

We arrange for the algorithm to make comparisons on this basis by passing a predicate as an additional argument. This predicate, less<>, is actually a template class from STL and not a function. Such classes are referred to as functors. The name is followed by parentheses in the argument list because we are constructing an unnamed temporary object and passing a reference to it. The algorithm calls the object's function-call operator, operator(), to perform the comparisons. The function-call operator of less in turn uses the < operator on its two arguments. By specifying FileSpec as the template argument, we ensure that the function call will take arguments of type const FileSpec&, producing the kind of comparison we want. Fortunately, it is not necessary to understand why it works in order to use it!

The final use of set_difference picks out the items from the second set that differ from their namesake in the first set. This time native FolderItem comparison is used, because we want to take all attributes into account. No special predicate is required. The reason set_difference produces the result we want is that unchanged items will be members of both common sequences, whereas changed items will not.

It would have been slightly more efficient to write a custom loop to implement FolderDifferences. This would reduce the number of element comparisons that have to be performed. However, the compute time is small compared with the time taken to scan the extensions folders in the first place. The clarity and ease of implementation easily outweigh the cost.

PowerPlant, anyone?

The algorithm is used as follows:

vector<FolderItem> unique1, unique2;
vector<FolderItem> changed;

FolderDifferences(
		previous.begin(), previous.end(),
		current.begin(), current.end(),
		back_inserter(unique1),
		back_inserter(unique2),
		back_inserter(changed));

These three vectors can then form the inputs to a three-pane PowerPlant window that displays the differences in three columns. A double-click in the Changed column could produce a dialog box that compares the item's attributes in each of the two sets.

A subtle point that is often overlooked is the fact that the elements of the output sequences do not need to be of the same type as the elements of the input sequences. Provided there is an appropriate assignment operator, elements will be converted as they are written to the output. The three output vectors above could have been vector<std::string> if there is an operator that can assign FolderItems to strings.

For testing, it may be appropriate to send the output to a text window. STL can help with this too. First a << operator must be written to output a FolderItem onto an ostream. Then, an STL ostream_iterator can be used to copy an entire vector to an ostringstream. The code looks like this:

	ostringstream text;
	if (unique1.size())
	{
		text << "Previous only:\r";
		copy(unique1.begin(), unique1.end(),
			ostream_iterator<FolderItem>(text, "\r"));
	}
	if (unique2.size())
	{
		text << "Current only:\r";
		copy(unique2.begin(), unique2.end(),
			ostream_iterator<FolderItem>(text, "\r"));
	}
	if (changed.size())
	{
		text << "Changed:\r";
		copy(changed.begin(), changed.end(),
			ostream_iterator<FolderItem>(text, "\r"));
	}

The string argument of the ostream_iterator will be output after every item. As used here, this will put each item onto a separate line. The generated text can then be assigned to a PowerPlant text window in one step:

mTextView->SetTextPtr(text.str().begin(), text.str().size());

Parting Shots

This article explores the principles behind the Standard Template Library in some depth, especially with regard to the role of iterators. It also introduces a representative slice of STL's capabilities in terms of algorithms, containers and iterators.

Unfortunately it is not possible to do proper justice to the Library in such a brief presentation, and the reader is encouraged to continue learning about STL through the many books and papers that have been written on the subject. Most good C++ books nowadays devote a significant proportion of their length to teaching the Standard Library, and indeed any C++ book that does not should be treated with some skepticism. Stroustrup (see Bibliography) devotes almost half the book to the Library, and uses library features in examples right from the first chapter.

STL in retrospect

STL encourages a very different style of programming. Very few loops are seen, and a variety of powerful library functions are used instead. Code is written in terms of logical operations on entire collections.

Although STL's capabilities are well thought-out and remarkably comprehensive, there are inevitably a few clumsy areas, and limitations that can be frustrating at times. However, as the standard begins to receive wide acceptance throughout the C++ industry, it is worth living with all of these limitations in order to reap the benefits of having a much-needed common standard. One has only to look at the ground that has been lost to other languages to realize how much the lack of a standard library has hurt the language. This is a pity, as the power and flexibility of the Standard Template Library are now far, far ahead of any equivalent capability available in other languages that don't implement templates.

Where there are limitations in STL, it is usually possible to overcome these relatively easily, due to the Library's highly extensible design. New containers can be added that will work seamlessly with all of the standard algorithms, and new algorithms that work with all of the standard containers. New iterators can be developed that allow existing algorithms to perform in ways not previously possible. All of these additions can be made by the application developer as required, either on an ad-hoc basis for a single application, or as part of a private library of reusable capabilities. Either way, code quality and clarity is greatly enhanced.

C++ in retrospect

Much has been said about this remarkable language, both good and bad. Whatever else is said, there is no denying that it is lean, elegant, expressive and advanced. These are characteristics to die for, amongst serious programmers. Hard work and discipline are needed in reaping the benefits, but the results are admirably worth it.

It is often said that C++ is a 'large' language, in that there is a lot to learn and remember in order to become proficient in using it. It certainly didn't start out as a large language-in the early days it was considered a big improvement on Ada in that respect! C++ is actually based around a relatively small number of features thoughtfully and consistently applied. If these concepts are discovered and learned, the rest of the language will inevitably fall into place.

C++ has some unique features. Well-applied, these can make a huge difference to our software development. Learn them, and use them!

Bibliography

I have refrained from including a bibliography with every article. Everything I have written about may be found in any good C++ book, and there is no shortage of those. However, I cannot let this series pass without acknowledging the incredible debt that we owe to Bjarne Stroustrup, the creator and refiner of C++, and I can think of no better C++ reference than his own. It is comprehensive, and provides unique insight into how the language is designed and the best way to use it.

Stroustrup, Bjarne: The C++ Programming Language. 3rd edn. Addison-Wesley. Reading, Mass. 1997.

Neil Mayhew works for Wycliffe Bible Translators, a non-profit organization dedicated to translating the Bible for the world's 400 million people that do not have it in their own language. Neil started programming in C in 1983, and graduated to the Mac in 1989. When he's not at his Mac or trying to beat his kids at video games you might find him flying a stunt kite if it's windy, or throwing a boomerang if it's not.