Foundation's Tool
Volume Number: 20 (2004)
Issue Number: 5
Column Tag: Programming
Getting Started
by Dave Mark
Foundation's Tool
When I was in college (long ago, boys and girls), one of my all-time favorite classes was Professor Forest's data structures course. We used Knuth's Sorting and Searching Algorithms text as our bible and explored all kinds of data structures. My favorite exercise was building a binary tree dictionary implementation using tree balancing techniques to keep the dictionary from leaning too far to one side.
Before we dig into this month's program, I wanted to take a few minutes to explore the thinking behind a simple binary tree dictionary. Even though this doesn't apply directly to our program, I think this is some valuable food for thought, especially if you've never played with binary trees before.
The Binary Tree Dictionary
Figure 1 shows a simple binary tree diagram. The structure that is common to all nodes of the tree features a text string and two pointers, one to the left child and one to the right child. In Figure 1, the node holds the text string cat and both child pointers are set to null. Note that there is a root pointer that points to the first node in the tree.
Figure 1. A binary tree dictionary containing the word cat and two null pointers.
One way to implement a dictionary is to build a node every time the user enters a new word, then place the node in the tree. A simple method is to insert the first word at the root of the tree, as we did with cat in Figure 1. Then, place the next word entered using the left pointer if the word is before cat in the dictionary and using the right pointer if the word is after cat in the dictionary.
As an example, Figure 2 shows what happens when I insert the word apple into the dictionary structure. Note that the cat node's left child points to the apple node and its right node remains null, while both of the apple nodes are set to null.
Figure 2. Adding apple to the dictionary.
Figure 3 shows what happens when you add the word egg to the tree, followed by the word dog and then zebra. Note that dog occurs after cat, but egg is already there. So dog is placed as the left child of egg. When zebra is added to the tree, it goes to the right of cat, then to the right of egg.
Figure 3. More words in the dictionary.
You can see how you'd add words to the dictionary. To look up a word, you'd start at the root node, then work your way down into the tree. To look up dog, you'd start by comparing dog to cat. Since it is not a match, you'd see that dog comes after cat and follow the right child pointer. When you get to egg, you'd follow the left child pointer (cause dog comes before egg) and you'd find the dog node.
So what happens when you find the dog node? Well, for this to be useful, the dog node would likely have more info in it than just the key value used for lookup. You'd either have an extra pointer to a data block or the data you are looking up would be embedded in the node along with the left and right child pointers.
For example, you might have a pointer to an object containing the definition of the node's word, along with a list of pointers to synonyms of the word. This gives you a structure that serves as both a dictionary and a thesaurus.
One pitfall of this approach is the lack of balance that can occur. For example, imagine what would happen if the very first word entered into the dictionary was aardvark. Chances are very good that the left child node would never get used and that the entire dictionary would hang off the right child of the root node. Can you see this?
One solution to this problem is called tree-balancing. After the dictionary is built, it is passed through a tree balancing routine. This function walks the entire tree looking for the key value that belongs exactly in the middle. The tree is then rebuilt with that value as the root node. Again, this is a simple example, but you can see how this would create a more balanced tree. And this turns out to speed up the average search time for the tree. Why? When the tree is more balanced, each level is filled in and the average number or comparisons needed to find a match in the tree goes down. One way to convince yourself of this is to imagine a completely unbalanced, right-leaning tree where every single entry hangs off of a right child. If there are 200 entries in the dictionary, the average search depth will be 100, since the tree will be 200 nodes deep.
In a perfectly balanced tree. You'll have 1 root node, then 2 level 2 nodes, 4 level 3 nodes, 8 level 4 nodes, etc. In that balanced tree, you'll be able to fit 200 nodes quite comfortably in 8 levels of the tree. The worst case search of that tree will require 8 comparisons. Much better than the worst case of 200 comparisons in the extremely unbalanced tree.
I love this stuff. Do you find this interesting? If so, let me know and I'd be more than happy to do more in future columns. For now, you might check out these pages:
http://www.semaphorecorp.com/btp/algo.html
http://whatis.techtarget.com/definition/0,289893,sid9_gci508442,00.html
for a nice discussion of b-trees. Note that the second URL has three commas interspersed towards the end.
In the meantime, let's get back to our regularly scheduled Cocoa program.
The Lottery
For this month's program, we're going to continue on our journey through Aaron Hillegass' excellent Cocoa Programming for Mac OS X. I am fortunate to have in my hot little hands a draft copy of the second edition, which should be hitting bookstores about the time you are reading this.
We'll be building a Foundation Tool project, basically an Objective C program that uses Cocoa but confines its user interface to the command line. The program uses the NSMutableArray class to create a list of lottery entries. Aaron claims this will make us fabulously wealthy. We shall see.
Create the Project
Start up Xcode and create a new Foundation Tool project. Call it lottery. One of the first things you'll notice is Xcode indexing your project, making searches much faster.
In the Groups & Files pane, open the lottery group and the Source subgroup. Now select the file main.m. When you click on the file, the contents of main.m should appear in the editing pane of the project window. If the editing pane does not appear, click on the Show Editor icon in the toolbar. It's the fourth icon, just to the right of the stop sign icon.
Here's the default code in main.m:
#import <Foundation/Foundation.h>
int main (int argc, const char * argv[]) {
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
// insert code here...
NSLog(@"Hello, World!");
[pool release];
return 0;
}
Replace the existing code with the following:
#import <Foundation/Foundation.h>
int main (int argc, const char * argv[])
{
NSMutableArray *array;
int i;
NSNumber *newNumber;
NSNumber *numberToPrint;
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
array = [[NSMutableArray alloc] init];
for ( i=0; i<10;i++) {
newNumber = [[NSNumber alloc] initWithInt:(i*3)];
[array addObject:newNumber];
}
for ( i=0; i<10; i++ ) {
numberToPrint = [array objectAtIndex:i];
NSLog( @"The number at index %d is %@", i,
numberToPrint );
}
[array release];
[pool release];
return 0;
}
Danger, Will Robinson!!! There is a memory leak in the program you've just entered. If you have a bit of Cocoa experience, see if you can figure out what we did wrong here. Not to worry, though. We'll definitely fix the leak later on in the column.
Build and run the program. Something similar to the following should appear in your Run window:
2004-03-29 20:26:52.491 lottery[553] The number at index 0 is 0
2004-03-29 20:26:52.520 lottery[553] The number at index 1 is 3
2004-03-29 20:26:52.532 lottery[553] The number at index 2 is 6
2004-03-29 20:26:52.554 lottery[553] The number at index 3 is 9
2004-03-29 20:26:52.561 lottery[553] The number at index 4 is 12
2004-03-29 20:26:52.567 lottery[553] The number at index 5 is 15
2004-03-29 20:26:52.574 lottery[553] The number at index 6 is 18
2004-03-29 20:26:52.581 lottery[553] The number at index 7 is 21
2004-03-29 20:26:52.589 lottery[553] The number at index 8 is 24
2004-03-29 20:26:52.596 lottery[553] The number at index 9 is 27
lottery has exited with status 0.
In a nutshell, this code builds a modifiable (i.e., mutable) array of objects, then stores a pointer to an NSNumber object in each array element.
Whenever I encounter a Cocoa class I haven't seen before, I like to use Xcode to look the class up in the documentation. Hold down the option key and double-click on the word NSMutableArray in the source code. After a bit of searching, Xcode will bring up the Developer Documentation window (see Figure 4) containing a detailed description of the NSMutableArray class. You'll see that it inherits from the NSArray class, that it conforms to a variety of protocols (guarantees to offer a specific set of methods laid out in each protocol declaration), and find out where its headers are declared.
Figure 4. The Developer Documentation window that appears when you OPTION double-click on NSMutableArray.
Here are a few lines from the doc that are definitely worth their weight in code:
The NSMutableArray class declares the programmatic interface to objects that manage a modifiable array of objects. This class adds insertion and deletion operations to the basic array-handling behavior inherited from NSArray.
NSMutableArray methods are conceptually based on these primitive methods:
addObject:
insertObject:atIndex:
removeLastObject
removeObjectAtIndex:
replaceObjectAtIndex:withObject:
The other methods in its interface provide convenient ways of inserting an object into a specific slot in the array and removing an object based on its identity or position in the array.
When an object is removed from a mutable array, it receives a release message. If there are no further references to the object, the object is deallocated. Note that if your program keeps a reference to such an object, the reference will become invalid unless you remember to send the object a retain message before it's removed from the array. For example, if anObject isn't retained before removing it from the array, the third statement below could result in a runtime error:
id anObject = [[anArray objectAtIndex:0] retain];
[anArray removeObjectAtIndex:0];
[anObject someMessage];
Note that the primitive methods listed are links to the detailed description of those methods found later in the same window. Think of NSMutableArray as a linked list of objects where you can add objects to the list at a specific index and replace an object in the list with a different object that you specify.
The term mutable means changeable. An NSMutableArray is something you can change. An NSArray is immutable, meaning once you specify it, you are stuck with it. Why would you ever want an NSArray? One classic use is when you want to read in the contents of a file (perhaps a series of records) and store it into a data structure for reference. You don't want to change the data, you just need to access it. NSArray will do the trick.
Let's take a look at the code...
Walking Through the Lottery Code.
Though this source code is quite short, it touches on a number of important concepts. You've already learned a bit about the NSMutableArray. Be sure to read the NSMutableArray doc and look ever the set of methods available to create and modify them.
The source code starts off in typical fashion, with the #import of the Foundation.h header file and the definition of main().
#import <Foundation/Foundation.h>
int main (int argc, const char * argv[])
{
Next up are declarations of an NSMutableArray object pointer, a counter, and a pair of NSNumber pointers.
NSMutableArray *array;
int i;
NSNumber *newNumber;
NSNumber *numberToPrint;
Any idea what an NSAutoReleasePool does? Option-double-click on the class name and Xcode will bring up a doc window that describes the class. Note the link to Memory Management toward the middle of the page. Clicking on this link will bring up an intro to memory management as well as an index to a terrific series of articles on Objective C memory management. Take some time to read the intro and, at the very least, scan the other articles to get a sense of the available information. To truly master Cocoa, you must understand Cocoa's unusual memory management techniques.
Here's the short version. Every object has a retain count. When an object is created, its retain count is set to 1. If the retain count goes to 0, the object is deallocated. When a chunk of code wants an object to stick around, it sends the object a retain message. This increases the retain count by 1. When the chunk of code is done with the object, it sends the object a release message. This decrements the retain count by 1. This scheme is well worth understanding and enables many objects to share a common object. When the last object is done using the shared object, the last release message is sent to the shared object and it is deallocated.
The NSAutoReleasePool helps solve a knotty problem common to many frameworks. It is easy to allocate an object at the beginning of a function, then release it at the end of the function. The same object allocs the object and deallocs the object, making it easy to manage. But what if you write a function that creates an object then returns that object to another method? Who deallocates the object?
Java solves this problem by automated garbage collection. Cocoa uses the NSAutoReleasePool. When you send an autorelease message to an object, it is added to the autorelease pool. When the pool is deallocated, it sends a release message to all objects in the pool.
We'll talk more about NSAutoreleasePool in future columns. For now, just know that it exists.
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
The real key to this column is the NSMutableArray named array. We start by creating array, sending it the alloc message and then the init message. As we've discussed, sending a message to an object is similar to calling an object's method, but without the binding. The actual method called to process the alloc method for the NSMutableArray array is not decided until runtime.
Also worth noting is that messages can be nested. This code sends the alloc message first, then sends the init message. This is a typical way of creating an object.
array = [[NSMutableArray alloc] init];
Remember, an NSMutableArray is an array of pointers to objects. So if we want to store objects in the array, we need to create them first. Here's a loop that creates a series of 10 NSNumber objects. For each one, we send the message initWithInt: instead of just plain init: when we create the object.
Once the object is created, we add the object to array by sending it the addObject: message. Hmmm. When we add the object to the array, the array no doubt sends a retain message increasing the retain count for each NSNumber to 2. And when we release the array down below, the objects in the array are sent a release message, giving each NSNumber a retain count of 1. But the retain count never gets to 0. Hmmm.
for ( i=0; i<10;i++) {
newNumber = [[NSNumber alloc] initWithInt:(i*3)];
[array addObject:newNumber];
}
What we should have done was add this line of code to the bottom of the for loop, just below the addObject message. Once the NSNumber is added to the array, its retain count is bumped to 2. The release would drop it back down to 1. Then, when array is released and all its NSNumber objects are sent their own release messages, they will be deallocated. See how this works?
[newNumber release];
Next, we step through the array, using NSLog() to print each of the objects in the array. NSLog() behaves much like printf(), using format descriptors embedded in the string to print the objects that follow the string. While %d should be familiar to you C programmers, %@ is a new beast. Basically, it tells NSLog() to send a description message to the object and the object returns a string describing itself. In this case, NSNumber sends a string consisting of its number.
for ( i=0; i<10; i++ ) {
numberToPrint = [array objectAtIndex:i];
NSLog( @"The number at index %d is %@", i,
numberToPrint );
}
Finally, we release the array and then the autorelease pool.
[array release];
[pool release];
return 0;
}
Till Next Month...
Cocoa is an incredibly rich, incredibly powerful framework. There are some things that take getting used to but, once you do, I think you'll find your coding time to be much more productive. I strongly recommend spending some time curled up with your favorite Cocoa book. Then go write some code.
Be sure to check out http://spiderworks.com and I'll see you next month...
Dave Mark is a long-time Mac developer and author and has written a number of books on Macintosh development, including Learn C on the Macintosh, Learn C++ on the Macintosh, and The Macintosh Programming Primer series. Dave's been busy lately cooking up his next concoction. Want a peek? http://www.spiderworks.com.