The Road to Code: Building on a Solid Foundation
Volume Number: 24 (2008)
Issue Number: 03
Column Tag: The Road to Code
Exploring the Foundation Framework
by Dave Dribin
The Foundation Framework
Now that we've gone over the basics of the Objective-C language, including classes, inheritance, and memory management, we can start to write some real code. One of the great things about developing for Mac OS X is that in addition to getting a nice language, you also get a lot of reusable code for your own applications. Reusing what's already present in the system can save you lots of time.
Reusable code on Mac OS X is often grouped together in a package called a framework. A framework is similar to a dynamically linked library (DLL) on Windows or a shared object (.so) on most other Unix platforms in that it contains shared code that may be used by multiple applications. Frameworks have one important difference: they may contain more than just code. Typically they will contain the header files needed to use the shared code, but they can also include other resources, such as images or sounds. Mac OS X ships with many frameworks for a number of purposes, including text manipulation, graphics, sound, and networking. All the system-supplied frameworks are located in the /System/Library/Frameworks directory. Some of these are C-based while some are Objective-C. The main Objective-C framework that provides classes used in all Objective-C applications is called the Foundation framework, or just Foundation for short. You've already been using part of the code in Foundation: NSObject. In this article, we'll be exploring some of the other popular Foundation classes.
Strings
A string is a collection of characters used to represent human text. We've been using strings when we use the printf function:
printf("Hello world\n");
The text between, and including, the double quotes is called a string, i.e. "Hello world\n". We haven't really gone over how strings are implemented in C, so let's dive down a bit deeper.
Strings in C are arrays of characters. The built-in type for text characters in C is char. On Mac OS X, the char type holds a signed integer between -128 and 127 (the C standard allows other platforms to treat char as unsigned). You can assign a single character to a variable of type char using single quotes, and you can print out an individual character in printf with the %c formatting specification:
char letterA = 'A';
printf("Character: %c\n", letterA);
The output for this would be:
Character: A
So, if a string is an array of characters, you may think you would declare the array for "hello" as follows:
char hello[5];
hello[0] = 'h';
hello[1] = 'e';
hello[2] = 'l';
hello[3] = 'l';
hello[4] = 'o';
This is close, but there's one major issue. Once an array is passed to a function such as printf, there's no way to determine its length, thus there's no way to determine the end of a string. Those clever C designers thought up a way around this by using what's called the null character. The null character has the integer value of zero and may be entered directly with single quotes using backslash zero: '\0'. In C, strings must end with a null character, thus the correct way to define an array for "hello" is:
char hello[6];
hello[0] = 'h';
hello[1] = 'e';
hello[2] = 'l';
hello[3] = 'l';
hello[4] = 'o';
hello[5] = '\0';
Because the last character in the array is the null character, strings in C are called null-terminated strings. The standard C library has many string functions. One such function is strlen, which returns the length of the string. There are many more, but we won't be covering them here because, as we will see, Objective-C handles strings differently than C.
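To make the role of the null character concrete, here is a minimal sketch of how a function like strlen could be written by hand. The name my_strlen is made up for illustration, and real C libraries ship far more optimized versions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters that come before the
   terminating null character, the same job strlen does. */
size_t my_strlen(const char *s)
{
    size_t length = 0;

    assert(s != NULL);
    while (s[length] != '\0')
    {
        length++;
    }
    return length;
}
```

Calling my_strlen("hello") returns 5. The null character marks the end of the string but is not counted as part of its length.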
C provides a shorthand notation for initializing arrays in one step, so we can alternatively initialize the array:
char hello[] = {'h', 'e', 'l', 'l', 'o', '\0'};
This syntax allows us to drop the array size, as the compiler can figure it out for you. Even though this syntax is better than before, it's still cumbersome. That's where the double quotes come in. We can use double quotes to represent the exact same thing:
char hello[] = "hello";
The double quote syntax is called a C string literal. Just keep in mind that under the hood, string literals are still just null-terminated char arrays. Remember, too, that arrays are nearly the same as pointers in C, so yet another way to write this would be:
char * hello = "hello";
This char * type is the way you will see most strings declared in C. Often you will see const char *, too:
const char * hello = "hello";
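The const keyword means that the contents of the string are constant and may not be modified. Since modifying a string literal is never safe in C, const char * is the better choice when pointing at one. To print null-terminated strings using printf, use the %s formatting specification: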
printf("Say: %s\n", hello);
While the null character and double quote syntax solve many issues of strings in C, they have one serious weakness that's not so easy to overcome: Unicode.
Unicode
Above, I mentioned that the char type holds an integer between -128 and 127. This means that every character needs an equivalent number value. The translation between number and text character is called an encoding. Back when C was invented, the most popular encoding was called American Standard Code for Information Interchange, or ASCII for short. ASCII encoded the English alphabet in upper and lower case, the digits 0 through 9, punctuation, and a few other characters used for controlling teletype terminals. ASCII specifies only 128 characters, numbered 0 through 127, which fits comfortably in the char type. The problem with ASCII is that it only works for the English alphabet. It doesn't provide a way to represent accented characters or non-English alphabets, such as Russian, Greek, or any of the Asian languages, including Chinese or Japanese.
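Because an encoding is just a mapping between numbers and characters, you can test for ASCII by looking at the numbers. The following is a small sketch, not a library routine. The helper name is_ascii is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every character in the string falls
   in the ASCII range 0 through 127, and 0 otherwise. The cast to
   unsigned char keeps values above 127 from showing up as negative. */
int is_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        if ((unsigned char) *s > 127)
        {
            return 0;
        }
    }
    return 1;
}
```

On an ASCII-based system such as Mac OS X, the character 'A' has the value 65, so is_ascii("A") returns 1. A string containing any byte above 127, such as one holding an accented letter, returns 0.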
To solve this problem, computer scientists from around the world came together and created a master list of all human characters on Earth. The result is called Unicode, specifically the Universal Character Set. As you may imagine, the number of characters far exceeds the 256 values available to the char type. But these computer scientists were really smart. They created multiple Unicode encodings, each of which maps Unicode code points (the Unicode term for the number assigned to a character) to bytes in a different way. One such encoding is called UTF-8, and it's specifically designed for ASCII-based null-terminated systems, like C strings: ASCII characters keep their single-byte values, and every other character is stored as a sequence of two to four bytes, none of which is zero.
While using UTF-8 in C strings makes it possible to use Unicode in C, it's far from ideal. Many standard C functions don't work quite right with UTF-8, and dealing with UTF-8 for lots of string data can be slow. Because of this, Objective-C strings are not based on C strings.
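As one illustration of where standard C functions fall short with UTF-8, strlen counts bytes, not characters. The sketch below compares the two. The helper utf8_code_points is a hypothetical name, and it assumes its input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts Unicode code points in a UTF-8 string.
   Assumes valid UTF-8. Continuation bytes have the bit pattern 10xxxxxx,
   so every byte that does not match that pattern starts a new code point. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        if (((unsigned char) *s & 0xC0) != 0x80)
        {
            count++;
        }
    }
    return count;
}

/* The word "cafe" with an accented e, written with explicit UTF-8 bytes:
   the accented e is the two bytes C3 A9. */
const char cafe[] = "caf\xC3\xA9";
```

For this string, strlen(cafe) returns 5 because it counts bytes, while utf8_code_points(cafe) returns 4, the number a person reading the word would expect.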
By the way, fully covering Unicode and the different Unicode encodings would be an article in its own right. I'm only covering the basics needed to understand strings in C and Objective-C. If you want to learn more about Unicode, there are plenty of good resources on the Internet.
Objective-C Strings
To get around many of the limitations of C strings, Objective-C includes its own string class called NSString as part of the Foundation framework. You can create a new NSString instance from a UTF-8 encoded C string using the stringWithUTF8String: method:
NSString * hello =
    [NSString stringWithUTF8String: "hello"];
Remember from our previous article on memory management that this class method creates an autoreleased instance of NSString. This means you don't have to worry about retaining or releasing it, so long as you are finished using it before the autorelease pool is released. There's a corresponding instance method constructor you can use, if you don't want an autoreleased object:
NSString * hello =
    [[NSString alloc] initWithUTF8String: "hello"];
// Must call [hello release] when finished
While these are very handy for using the C string literals we've already been using, it's a bit long-winded for regular use. Thankfully, Objective-C also has its own syntax for string literals. It uses double quotes like C strings, but it uses an at sign ('@') before the first double quote:
NSString * hello = @"hello";
Notice that using the Objective-C syntax results in an instance of the NSString class. String literal instances are not autoreleased, but you shouldn't release them, either. They are allocated automatically before the main function is called and are automatically released when your application exits.
The nice thing about Objective-C strings is that they are full-blown objects. This means you can call methods on string instances. For example, to get the length of the string, you would use the length method:
printf("NSString length: %lu\n", (unsigned long) [hello length]);
If you want to get a UTF-8 string for use with C functions, you can use the UTF8String method:
printf("Say: %s\n", [hello UTF8String]);
Keep in mind the memory returned from UTF8String is also autoreleased. If you need it to stick around longer than the current autorelease pool cycle, you'll need to copy it into a new C string.
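Copying the characters into memory you own is plain C. Here is a minimal sketch. The name copy_c_string is a hypothetical helper, and the caller is responsible for passing the result to free when finished with it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of a null-terminated
   string, or NULL if memory cannot be allocated. The caller must call
   free on the result. */
char *copy_c_string(const char *source)
{
    size_t size;
    char *copy;

    assert(source != NULL);
    size = strlen(source) + 1;   /* +1 leaves room for the null character */
    copy = malloc(size);
    if (copy != NULL)
    {
        memcpy(copy, source, size);
    }
    return copy;
}
```

You would hand it the pointer returned by UTF8String, for example copy_c_string([hello UTF8String]), and the copy then stays valid until you free it, no matter what happens to the autorelease pool.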
This is just the tip of the iceberg on what you can do with NSString. It's a very powerful class and works well with Unicode. As such, there are many more methods available. Since strings are so heavily used, we will no doubt be using more of these methods. I will explain them as we encounter them, but consult the documentation for a list of all available methods.
NSLog
As you saw above, to print out an NSString with printf, we had to convert it into a UTF-8 string. Not only is this a bit of a pain, but it can also be inefficient to convert the string into a new encoding. Unfortunately, since printf was designed only for C code, it cannot natively handle NSStrings. The Foundation framework comes with its own printing function called NSLog. It works very similarly to printf in that you can give it a string to print; however, you must pass it an NSString instead of a const char * C string:
NSLog(@"Hello world");
The output is a little different than printf. First, it automatically includes a newline on the end, so you do not need to use \n as the last character. It also includes extra information, such as the date and time, before the message. Here's the output when I ran the code above:
2008-01-07 15:30:35.539 objcstrings[13448:10b] Hello world
You can also use all the printf-style percent formatting specifications, like %d and %s. But it also comes with a new formatting specifier for Objective-C strings: %@. You would use it like this:
NSString * hello = @"hello";
NSLog(@"Say: %@", hello);
The resulting output should be similar to:
2008-01-07 15:39:46.147 objcstrings[13486:10b] Say: hello
Even though NSLog prints extra stuff, you will often see it used in Objective-C code rather than printf. This is mainly because the extra stuff printed is helpful for debugging GUI applications. If you want more control over the output of your text, you'll have to use printf with the UTF8String method. Since we are still writing command line applications, for now, I will use printf as the output has a lot less clutter.
Modifying Objective-C Strings
An NSString instance is not modifiable, meaning you cannot change the text of the string. You can create a new string, but you cannot modify its contents directly. A fancier way to say "not modifiable" is immutable. Thus NSString instances are said to be immutable. There is a class called NSMutableString that creates a string whose contents may be modified directly. There is no shortcut way to create them, so you must use one of the constructor methods. One of the methods to change the string is the appendString: method, which adds text to the existing text:
NSMutableString * helloWorld =
    [NSMutableString stringWithString: @"hello"];
[helloWorld appendString: @" world"];
This results in the string @"hello world". NSMutableString is a subclass of NSString. This means you can call any method of an NSString such as UTF8String:
printf("Say: %s\n", [helloWorld UTF8String]);
Remember from our article on inheritance that you can use a subclass instance anywhere a superclass is used. Thus, you can pass in an NSMutableString anywhere an NSString is required.
Collections
While the string classes of the Foundation framework are extremely popular, the next most popular classes are called collections. Collections are classes whose sole purpose is to hold other objects. There are different collection classes depending on your needs, and we'll be looking at arrays and dictionaries.
Arrays
In previous articles, we've covered arrays in C. Arrays hold multiple values of the same type. C arrays are very limited in their functionality, though, and it's easy to use them incorrectly. The Foundation framework has a class for arrays named NSArray. It holds zero or more Objective-C objects and remembers their order. It is similar to an array in C, but it is a lot more flexible. To create an array you can use the arrayWithObjects: constructor:
NSArray * colors = [NSArray arrayWithObjects:
    @"red", @"green", @"blue", nil];
This method creates an autoreleased array with three string elements. Note that the list of elements in the constructor is terminated with nil. It is important to not forget the terminating nil. If you do, you will most likely crash your program.
You can access individual elements of the array with the objectAtIndex: method. Just like C arrays, the index of the first element is 0, thus to access the second element, you'd use an index of 1:
NSString * green = [colors objectAtIndex: 1];
You can find out how many elements are in the array with the count method. Combining these two methods with a for loop, we print all the elements:
int i;
for (i = 0; i < [colors count]; i++)
{
    NSString * color = [colors objectAtIndex: i];
    printf("Color %d: %s\n", i, [color UTF8String]);
}
This should give you the following output:
Color 0: red
Color 1: green
Color 2: blue
NSArray objects are immutable. Just like NSString, there is a mutable subclass: NSMutableArray. A common method of mutable arrays is addObject:, which adds an object to the end of the array:
NSMutableArray * animals = [NSMutableArray array];
[animals addObject: @"cat"];
[animals addObject: @"dog"];
Again, there are far more methods in NSArray and NSMutableArray than we can cover here, but you've learned enough to get started.
Dictionaries
Another popular collection is the dictionary. Dictionaries manage pairs of keys and values. Dictionaries can also efficiently look up values by their keys. In other languages, dictionaries are known as hash tables or associative arrays. You can think of a dictionary as a lookup table. For example, let's say we have a table of countries and their capital cities:
Table 1: Countries and their capitals
Country          Capital
United States    Washington, D.C.
England          London
France           Paris
Let's say we wanted to use the information in Table 1 to create a lookup table so we could quickly find the capital of a country. We could use a dictionary to do this. Each row in the table is a key/value pair. The country is the key, since that is what we are using as the lookup, and the capital is the value.
The Foundation framework has a class called NSDictionary that is a dictionary implementation, with one minor change: it calls values "objects". We can create a new dictionary with the dictionaryWithObjectsAndKeys: class method, set the countries to be the keys, and set the capitals to be the values, or objects, as NSDictionary likes to call them:
NSDictionary * capitals =
    [NSDictionary dictionaryWithObjectsAndKeys:
        @"Washington, D.C.", @"United States",
        @"London", @"England",
        @"Paris", @"France",
        nil];
Note again the key/value pair list is terminated with a nil. With this dictionary, we can now look up a capital (object) given a country (key) using the objectForKey: method:
NSString * capital = [capitals objectForKey: @"England"];
printf("Capital of England is %s\n", [capital UTF8String]);
If the key has no corresponding object, then it returns nil.
As with arrays and strings, NSDictionary is immutable. If you want an updateable dictionary, use the NSMutableDictionary subclass.
Loose Ends
All collection classes retain their objects. This means that after you add an object to an array or dictionary, you may release your instance of it, if you no longer need it. The collection classes will release an object when it is removed from the collection, or they will release all their objects when they themselves get deallocated.
One downside to the collection classes in Foundation is that they can only hold Objective-C objects. This means that you cannot put standard C types, or primitive types, such as int and float, directly into a collection. Luckily, Foundation has wrapper classes for primitive types, one of them being NSNumber. NSNumber is an immutable class that holds any primitive number type. Here's how you would put the integer 42 into an NSNumber, and then get it back out again:
NSNumber * theAnswer = [NSNumber numberWithInt: 42];
printf("The answer is %d\n", [theAnswer intValue]);
Since NSNumber is a full-blown Objective-C class, you can use it to put primitive numbers into arrays and dictionaries. I will demonstrate this shortly.
Election Counting
Putting together everything we've learned in this article, we're going to write a small application that tallies votes for an election. This seems rather fitting with 2008 being an election year in the United States. However, instead of voting for president, let's take a poll that asks people to vote on their favorite fruit. Since counting up votes can be tedious, we'd like to write an application to tally up the results. Using the classes in Foundation, this is actually pretty easy! Listing 1 shows the entire program. Read it over quickly, and then we'll walk through it.
Listing 1: tally.m: A vote tallying program
#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    NSString * results =
        @"apple,orange,apple,cherry,banana,apple,banana,"
        @"orange,apple,banana,cherry,banana,apple,orange";
    NSArray * votes = [results componentsSeparatedByString: @","];

    // Tally up the votes
    NSMutableDictionary * tallies = [NSMutableDictionary dictionary];
    int i;
    for (i = 0; i < [votes count]; i++)
    {
        NSString * vote = [votes objectAtIndex: i];
        NSNumber * currentTally = [tallies objectForKey: vote];
        int newTally;
        if (currentTally == nil)
        {
            // This is the first vote for this candidate
            newTally = 1;
        }
        else
        {
            newTally = [currentTally intValue] + 1;
        }
        [tallies setObject: [NSNumber numberWithInt: newTally]
                    forKey: vote];
    }

    // Print out the results
    int winningTally = 0;
    // Start with no winner so the variable never holds garbage
    NSString * winner = nil;
    NSArray * voteKeys = [tallies allKeys];
    for (i = 0; i < [voteKeys count]; i++)
    {
        NSString * vote = [voteKeys objectAtIndex: i];
        NSNumber * tally = [tallies objectForKey: vote];
        printf("%10s: %d\n", [vote UTF8String],
               [tally intValue]);
        if ([tally intValue] > winningTally)
        {
            winner = vote;
            winningTally = [tally intValue];
        }
    }
    printf("\nAnd the winner is: %s!\n",
           [winner UTF8String]);

    [pool release];
    return 0;
}
The general idea of this program is to take a comma-separated list of votes and tally them up. Once we have the tallies, we print the results and then the winner.
The first new syntax you'll see is how the results string is created. This shows how you can break up long string literals onto multiple lines. When two string literals appear next to each other with nothing but whitespace between them (no comma or semicolon), the compiler joins them into one. The result is one big NSString.
The next new bit is how we split the results into individual votes. We use the componentsSeparatedByString: method to split the long, comma-separated string into an array of strings. It also removes the commas, so we have a nice, clean array of votes.
With each vote now in an array, we can proceed to calculating the tallies. We use a mutable dictionary to tally up the votes, where the key is the vote and the value is the current tally. The only complication is that we have to use the immutable NSNumber class to store the tally, since primitive types cannot be stored in a dictionary.
We loop through each vote in the votes array and look up its current tally in the tallies dictionary. Since initially the dictionary is empty, we may not get any tally back. If this is the case, then objectForKey: will return nil. We use this condition to set the next tally to 1. Otherwise, we add 1 to the current tally. We then package up the new tally as an NSNumber and store it back in the dictionary with the setObject:forKey: method. This method replaces any existing value, so our dictionary will only contain the current tally.
After we are done looping through all the votes, the tallies dictionary contains our voting results. Now we need to report the final results. The easiest way to do this is to loop through every key/value pair in the dictionary, and the easiest way to do that is to use the allKeys method of NSDictionary. It gives us every key in an array. We can then loop through each key, get its corresponding value, and print the tally. In order to make the results line up in columns, we use the %10s format specification for vote. This tells printf to pad the string on the left with spaces to a width of 10 characters, if the string is shorter than 10 characters.
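The effect of the width in %10s is easy to check in isolation. This small sketch uses snprintf, the standard C function that formats into a buffer instead of printing. The helper format_tally is hypothetical and is not part of tally.m.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats one result line into a caller-supplied
   buffer. %10s right-aligns the name in a 10-character field, padding
   with spaces on the left. Returns the number of characters that the
   full line needs, as snprintf does. */
int format_tally(char *buffer, size_t size, const char *name, int tally)
{
    assert(buffer != NULL && name != NULL);
    return snprintf(buffer, size, "%10s: %d", name, tally);
}
```

Given the name "apple" and a tally of 5, the buffer ends up holding five spaces followed by "apple: 5", 13 characters in all. Names longer than 10 characters are not truncated; they simply push the colon to the right.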
As we are looping through all the tallies to print them out, we also keep track of the maximum number of votes so we can find our winner. After reporting the final results, we print out our winner.
Okay, so what happens when we run this application? I get the following output:
    banana: 4
    cherry: 2
    orange: 3
     apple: 5

And the winner is: apple!
It was a close race, but ultimately apple (or is that Apple?) prevails. Well, as cool as this program is, it does have some limitations. First, the results string is stored directly in the program. This means if we need to update our results, we also have to recompile our program. Ideally, we would store our results in an external text file, but that would complicate this simple example a bit too much. If you want to try this yourself, though, look into the stringWithContentsOfFile:encoding:error: method of NSString. Here's a little hint to get you started. It reads a file named results.txt on your desktop into a string:
NSString * file = @"~/Desktop/results.txt";
file = [file stringByExpandingTildeInPath];
NSString * results =
    [NSString stringWithContentsOfFile: file
                              encoding: NSUTF8StringEncoding
                                 error: NULL];
Be careful, though. Even this code is not complete, as it ignores any errors that may occur. Production code should always handle errors. We will talk more about properly handling NSError later. If you want to try this out anyway, an added challenge would be to support votes on separate lines, instead of separated by commas.
Another limitation is that the results are not printed in any order. Ideally, we would print the results in ascending or descending order, by votes. The allKeys method does not guarantee what order the keys are in. In fact the order could be different every time we run it. Getting the keys back in a specific order requires some topics we haven't yet covered, such as selectors and comparators. If you want to learn more about this on your own, look into the keysSortedByValueUsingSelector: method of NSDictionary.
The final limitation is that we don't handle ties. We could use an array of winners, instead of a single winner, to fix this. This would be another fun modification to try on your own.
Conclusion
Well, we are making good progress! We've learned about the string, array, and dictionary classes that come as part of the Foundation framework. There's much more to Foundation, but with even this basic knowledge, we can do a lot. In fact, we can even begin to write GUI applications. Thus finally, next month, we will step away from the dark world of text-only command line applications, and start writing GUI applications.
Dave Dribin has been writing professional software for over eleven years. After five years programming embedded C in the telecom industry and a brief stint riding the Internet bubble, he decided to venture out on his own. Since 2001, he has been providing independent consulting services, and in 2006, he founded Bit Maki, Inc. Find out more at http://www.bitmaki.com/ and http://www.dribin.org/dave/.