Cocoa's New Tree-Based XML Parser
Volume Number: 21 (2005)
Issue Number: 6
Column Tag: Programming
Using NSXML To Parse, Edit, And Write XML
by Jeff LaMarche
Introduction
This is the first part of a two-part article showing how to use a few of the great new pieces of Cocoa
functionality available to developers with the release of Tiger. The first--the focus of this article--is NSXML,
a set of Cocoa classes that implement a robust tree-based XML parser that provides the ability to create and
edit XML from scratch as well as from data contained in files and vended through web services.
The second new technology, which we'll concentrate on in the next installment, is Core Data, one of the
most astoundingly wonderful bits of technology Apple has ever given to the Mac development community.
In this month's article, we'll build a small Cocoa application that will retrieve book information from
Amazon's XML-based web service, parse the vended data into Cocoa objects, then let the user edit the data
before saving it to an XML file on their local disk.
Next month, we'll use Core Data and Cocoa Bindings to make a version of the same application with more
functionality that's easier to maintain and support, takes less time to build, and requires writing much less
code.
It's important to note here that you must have Tiger and the Tiger development tools installed, since
neither NSXML nor Core Data is available in earlier releases. So, fire up Xcode 2.0, create a new Cocoa
Application project titled Amazon-XML, and let's dive in.
The Data Model
Our first step is to build our data model the traditional Cocoa way. We have to create Cocoa classes to
hold our data. We'll also need to write code that will allow instances of our data model objects to be copied,
compared, and persisted. Go ahead and create a new Objective-C class called MTBook in your newly created
project. Now give MTBook instance variables to hold our book data.
MTBook Instance Variable Declaration
We're going to store a subset of the data that Amazon's web service provides. The bibliography provides a
reference to Amazon's web services documentation, where you can find out more about their SDK and XML format. The last two instance
variables do not come from Amazon, but rather are there to hold additional, user-provided information.
@interface MTBook : NSObject
{
NSString *isbn;
NSString *title;
NSString *publisher;
NSCalendarDate *dateReleased;
NSData *coverImage;
NSMutableArray *authors;
int salesRank;
NSURL *url;
NSCalendarDate *dateRead;
NSString *comment;
}
// Accessor, mutator, and other method declarations are omitted from this listing
@end
Of course, we need to add accessor and mutator methods for each of these variables, as well as code to
initialize the object, copy it, persist it, restore it, compare it, and release its memory when it's
deallocated. These aspects of building a data model are some of the very few tedious tasks in programming
Cocoa. Next month you'll see that these tedious steps are no longer necessary thanks to Core Data. But let's
not get ahead of ourselves.
These various methods are not shown in code listings; they are standard Cocoa tasks that have been written
about extensively elsewhere. I used Kevin Callahan's handy Accessorizer program to automatically generate the
accessors and mutators, then hand-coded the initializers as well as the other methods a data object needs,
such as code to copy an instance, compare one instance to another, or to encode and decode an instance. The
process of creating the MTBook class took a considerable amount of time, even though it's all very
straightforward. If you don't feel like doing all those tedious tasks (and who would blame you?), feel free to
copy MTBook from the issue CD.
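To give a feel for the kind of boilerplate involved, here is a minimal sketch of those methods for a cut-down class with only two fields. MTBookSketch is a hypothetical stand-in, not the MTBook class from the issue CD, and it is written in the manual retain/release style used throughout this article. It declares hash as NSUInteger to match current headers; on Tiger the return type was unsigned.

```objc
#import <Foundation/Foundation.h>

// Hypothetical, cut-down stand-in for MTBook with only two of its fields.
@interface MTBookSketch : NSObject <NSCopying, NSCoding>
{
    NSString *title;
    int salesRank;
}
- (NSString *)title;
- (void)setTitle:(NSString *)newTitle;
- (int)salesRank;
- (void)setSalesRank:(int)newRank;
@end

@implementation MTBookSketch
// Accessor and mutator for an object value: the mutator copies the new
// string and releases the old one.
- (NSString *)title { return title; }
- (void)setTitle:(NSString *)newTitle
{
    if (title != newTitle) {
        [title release];
        title = [newTitle copy];
    }
}
// Scalar values need no memory management.
- (int)salesRank { return salesRank; }
- (void)setSalesRank:(int)newRank { salesRank = newRank; }

// Copying: build a new instance and set each field through its mutator.
- (id)copyWithZone:(NSZone *)zone
{
    MTBookSketch *copy = [[[self class] allocWithZone:zone] init];
    [copy setTitle:title];
    [copy setSalesRank:salesRank];
    return copy;
}
// Comparing: two books are equal when every field matches.
- (BOOL)isEqual:(id)other
{
    if (other == self) return YES;
    if (![other isKindOfClass:[MTBookSketch class]]) return NO;
    NSString *otherTitle = [(MTBookSketch *)other title];
    BOOL sameTitle = (title == otherTitle) || [title isEqualToString:otherTitle];
    return sameTitle && salesRank == [(MTBookSketch *)other salesRank];
}
- (NSUInteger)hash { return [title hash] ^ (NSUInteger)salesRank; }

// Persisting and restoring with keyed coding.
- (void)encodeWithCoder:(NSCoder *)coder
{
    [coder encodeObject:title forKey:@"title"];
    [coder encodeInt:salesRank forKey:@"salesRank"];
}
- (id)initWithCoder:(NSCoder *)coder
{
    if ((self = [super init])) {
        title = [[coder decodeObjectForKey:@"title"] copy];
        salesRank = [coder decodeIntForKey:@"salesRank"];
    }
    return self;
}
- (void)dealloc
{
    [title release];
    [super dealloc];
}
@end
```

Multiply this by ten instance variables and you can see where the time goes.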
You may be wondering why we would choose to store the cover image as an NSData instead of an NSImage. The
simple reason is that NSImage will not necessarily preserve the original image data at the same size and
resolution. To avoid re-compression and possible loss of image quality or resolution, we'll store the original
JPEG data as retrieved.
Architecture of NSXML
NSXML's functionality is provided by a small collection of classes, only a few of which you will need to use.
Figure 1. NSXML Class Diagram
Most developers will only ever need to interact with NSXMLNode, NSXMLElement, and NSXMLDocument, though
some may use NSXMLDTD and NSXMLDTDNode in their travels. Since our application is not concerned with creating
DTDs and we are going to assume that Amazon's XML validates correctly, we only need to look at the first
three:
NSXMLDocument represents an entire XML document as a Cocoa object.
NSXMLElement represents the various nested elements of an NSXMLDocument.
NSXMLNode represents XML "nodes". Nodes can be just about any item in an XML document (including the
document itself), and as a result, NSXMLNode is the parent class of both NSXMLDocument and NSXMLElement. Most
methods you'll interact with will return an NSXMLNode.
Nodes can contain attributes and namespaces. An attribute is data stored within the node's opening XML tag
as a key-value pair and is generally used to store metadata about the node. A namespace is a special kind
of attribute used to avoid name conflicts among elements and attributes. Most XML you'll encounter probably
won't use namespaces or will use a single, standard namespace. As a practical matter, you can often ignore
namespaces when parsing XML. Amazon's XML uses a single, standard W3C namespace, so we're going to ignore them
from here on in.
In addition to attributes and namespaces, NSXMLNodes can contain child nodes. Their children can, themselves,
contain child nodes (and so on). This nesting behavior gives XML a lot of its flexibility and power and is
what makes a tree-based parser such a good choice for many situations.
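A minimal sketch of how attributes and nested children look from code may help here. The XML string, its URL, and the function name MTDescribeSampleTree are made up for illustration and are not Amazon's format; the code is written in this article's manual retain/release style.

```objc
#import <Foundation/Foundation.h>

// Parses a tiny, made-up document and summarizes its root element:
// its name, one attribute, and how many children it and its second child have.
static NSString *MTDescribeSampleTree(void)
{
    NSString *xml = @"<Book url=\"http://example.com/b\">"
                    @"<Title>Sample</Title>"
                    @"<Authors><Name>A</Name><Name>B</Name></Authors>"
                    @"</Book>";
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSError *err = nil;
    NSXMLDocument *doc = [[[NSXMLDocument alloc] initWithData:data
                                                      options:0
                                                        error:&err] autorelease];
    if (doc == nil)
        return nil;

    NSXMLElement *root = [doc rootElement];
    // An attribute lives in the opening tag and is fetched by name
    NSString *url = [[root attributeForName:@"url"] stringValue];
    // Children are ordered; the second child here is the Authors element,
    // which has children of its own
    NSXMLNode *authors = [root childAtIndex:1];
    return [NSString stringWithFormat:@"%@ url=%@ children=%lu authors=%lu",
            [root name], url,
            (unsigned long)[root childCount],
            (unsigned long)[authors childCount]];
}
```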
We need to take note of two other boxes on the diagram above. One is the light green box on the far right
that is titled (utils). We'll talk about that one shortly. The other is on the left side of the diagram and is
titled NSXMLParser. The downside of a tree-based XML parser is that it uses a fair chunk of memory and is
overkill for many situations (including ours, if it were a real application). NSXMLParser is a standalone
class that implements an event-driven XML parser. It's easy to use and has a much smaller footprint than
NSXML. NSXMLParser is beyond the scope of this article, but you should know it's there and use it when
appropriate.
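For comparison only, here is a minimal sketch of the event-driven style. The class MTElementCounter and the function MTCountElements are hypothetical names; the sketch does nothing but count opening tags, and it adopts the formal NSXMLParserDelegate protocol found in current headers (on Tiger the delegate methods were informal).

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that counts every opening tag the parser reports.
@interface MTElementCounter : NSObject <NSXMLParserDelegate>
{
    unsigned count;
}
- (unsigned)count;
@end

@implementation MTElementCounter
// Called by NSXMLParser once for each element start tag it encounters
- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    count++;
}
- (unsigned)count { return count; }
@end

// Feeds the data through the parser and returns the number of elements seen.
// No tree is ever built; the parser just fires delegate callbacks as it reads.
static unsigned MTCountElements(NSData *xmlData)
{
    MTElementCounter *counter = [[[MTElementCounter alloc] init] autorelease];
    NSXMLParser *parser = [[[NSXMLParser alloc] initWithData:xmlData] autorelease];
    [parser setDelegate:counter];
    [parser parse];
    return [counter count];
}
```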
Handy Additions to NSXML
There is one thing that NSXMLNode lacks that many other tree-based XML parsers offer: the ability to retrieve a
single child of a given node by its name. (NSXMLElement's elementsForName: comes close, but it returns an array
of every matching child element.) Since referring to children by name makes code much more readable, I've
created a category on NSXMLNode that adds this functionality to NSXMLNode objects.
Listing 1: NSXMLNode-utils.m
NSXMLNode utilities
This category on NSXMLNode adds two bits of useful functionality to NSXMLNode. The first method allows
retrieving a child node of a given name. The second converts all the children of a given node to their string
values, allowing you to retrieve an array of the string values of a node's children with one call. You'll see how both of these are
used later in the article.
@implementation NSXMLNode(utils)
- (NSXMLNode *)childNamed:(NSString *)name
{
NSEnumerator *e = [[self children] objectEnumerator];
NSXMLNode *node;
while (node = [e nextObject])
if ([[node name] isEqualToString:name])
return node;
return nil;
}
- (NSArray *)childrenAsStrings
{
// arrayWithCapacity: already returns an autoreleased array, so no
// retain/autorelease pair is needed
NSMutableArray *ret = [NSMutableArray arrayWithCapacity:
[[self children] count]];
NSEnumerator *e = [[self children] objectEnumerator];
NSXMLNode *node;
while (node = [e nextObject])
[ret addObject:[node stringValue]];
return ret;
}
@end
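If you would rather lean on what ships with NSXML, NSXMLElement's elementsForName: can do a similar job when several children share a name. The following is a minimal sketch of that alternative, not part of the category above; the function name MTStringValuesOfChildren is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Collects the string values of every child element of `parent` whose
// name matches `name`, using NSXMLElement's built-in elementsForName:.
static NSArray *MTStringValuesOfChildren(NSXMLElement *parent, NSString *name)
{
    NSMutableArray *values = [NSMutableArray array];
    NSEnumerator *e = [[parent elementsForName:name] objectEnumerator];
    NSXMLElement *child;
    while ((child = [e nextObject]) != nil)
        [values addObject:[child stringValue]];
    return values;
}
```

Unlike childrenAsStrings, this version skips children with other names, which may or may not be what you want.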
Building the Interface
The next step in building our application is to design the interface in Interface Builder. If you're
comfortable with the basics of using Interface Builder and how to link interface elements to outlets and actions in your
code, feel free to skip ahead to the next section: Coding with NSXML.
We're going to build a relatively simple interface here, just enough to work with. First, we'll lay out our
interface so that it has fields corresponding to each instance variable in our data model, along with a button
to initiate the search, and another to save the revised XML to a file. Mine looks like this:
Figure 2. Our Window in Interface Builder
Next, we'll go to the Classes tab, right-click NSObject and create a subclass of NSObject which we'll call
MTXMLAppDelegate. This new class will act as both our application delegate and as our main window's controller
class. Control-drag from the File's Owner icon to the instance of MTXMLAppDelegate and connect it to the
delegate outlet. While we're here, control-drag from the Authors table to MTXMLAppDelegate and link to the
dataSource outlet.
Double-click on the MTXMLAppDelegate instance icon, which will open the object inspector where you can add
outlets and actions. Add actions called doSearch: and saveToXML:. Also add the outlets shown in Figure 3,
which correspond to each of the user interface elements that we'll need access to in our code.
Figure 3. Outlets
Now control-drag from the Lookup button to the doSearch: action of MTXMLAppDelegate and repeat with the
ISBN text field. Now control-drag from the Save button and connect it to the saveToXML: action. If we were creating
a real application where it was important to keep the data model synchronized with the interface in real-time,
we would also have to have an additional method that was called when any of the other field values changed. I
mention this only because real-time synchronization of the data model is yet another thing you get for free
using Core Data. For our limited purposes, we'll simply update the model when the Save button is pressed.
We also need to link our application delegate's outlets to the various fields in our window, so
control-drag from the MTXMLAppDelegate instance icon to each of the fields and connect to the corresponding
outlet. Once that is done, we're ready to write some code. Go ahead and generate the source files for
MTXMLAppDelegate, then go back to Xcode, open up MTXMLAppDelegate.h, and immediately following the
automatically generated IBOutlet variables, add an MTBook instance variable called book, and don't
forget to include MTBook.h. Our doSearch: method is where we'll actually retrieve the XML data from Amazon and
work with it. A stub implementation of this method has been generated for us, so we'll add our code there.
Coding with NSXML
As with most Cocoa classes, the NSXML classes have been designed to be easy to use and to insulate you from
having to know too much about the inner workings of the class. To instantiate an NSXMLDocument based on data
retrieved from a URL, you simply allocate as normal and then initialize the object using
initWithContentsOfURL:options:error:. If you already have the XML in memory, you can initialize with
initWithData:options:error: instead.
The root node is available from an NSXMLDocument by using the rootElement method. An array of the child
nodes of any element can be retrieved using the children method of NSXMLNode, or you can request a specific
child by name using the childNamed: method from our utils category. To get the string contents of a node--the
value between the begin and end tags--simply call the stringValue method.
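Before the full listing, here is a minimal sketch of the in-memory variant with the error check spelled out. The function name MTRootNameOrError is hypothetical, and the sketch follows this article's manual retain/release style.

```objc
#import <Foundation/Foundation.h>

// Returns the name of the root element, or a string beginning with
// "error:" when the data cannot be parsed as XML.
static NSString *MTRootNameOrError(NSData *xmlData)
{
    NSError *err = nil;
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithData:xmlData
                                                     options:0
                                                       error:&err];
    // A nil document means parsing failed; err then describes the problem
    if (doc == nil)
        return [NSString stringWithFormat:@"error: %@",
                [err localizedDescription]];

    // Copy the name so it outlives the document we are about to release
    NSString *name = [[[[doc rootElement] name] copy] autorelease];
    [doc release];
    return name;
}
```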
The easiest way to show how these various methods work is to show some code.
Listing 2: MTXMLAppDelegate.m
doSearch:
This method uses NSXML to parse the results of the XML data retrieved from an Amazon ISBN search then
stores that data as an MTBook object.
- (IBAction)doSearch:(id)sender
{
// More information about Amazon's XML web services is available from
// http://tinyurl.com/cb89g
// Here, we're just setting up the Amazon URL based on our assigned developer token
// (assigned to us by Amazon) and the ISBN number entered by the user
NSString *urlBase =
@"http://xml.amazon.com/onca/xml3?t=1&dev-t=%@&AsinSearch=%@&type=heavy&f=xml";
NSString *urlString = [NSString stringWithFormat:urlBase,
AMAZON_DEV_TOKEN,
[isbnTF stringValue]];
NSURL *theURL = [NSURL URLWithString:urlString];
// NSXML doesn't throw an exception on failure, nor does it simply return
// nil. NSXML acts old-school in some ways; it wants a pointer to a variable which it
// will populate with an NSError object if there's a problem
NSError *err=nil;
// Initialize our document with the XML data in our URL
NSXMLDocument *xmlDoc = [[NSXMLDocument alloc]
initWithContentsOfURL:theURL
options:NSXMLNodeOptionsNone
error:&err];
// If we were doing proper error checking, we'd check the value of err here. We'll
// skip it and simply check for xmlDoc == nil later. This method works fine if you
// don't intend to give the specifics of any problems encountered to the end user.
// Get a reference to the root node
NSXMLNode *rootNode = [xmlDoc rootElement];
// In case of an error, Amazon includes a node called ErrorMsg. Its presence tells us
// that an error happened not in parsing the XML, but rather on Amazon's side, so we
// check for it
NSXMLNode *errorNode = [rootNode childNamed:@"ErrorMsg"];
if (rootNode == nil || errorNode != nil)
{
// We'll provide an error message in the URL field, blank the others, then abort.
// If the document itself failed to load, errorNode is nil, and setStringValue:
// must not be handed nil, so we fall back to a generic message.
// We use stringValue here; objectValue is a related method that will give you the
// contents of a node as an NSObject, which will always be a string when first
// parsed from XML, but which might be another type of Cocoa object such as an
// NSCalendarDate if you created the NSXMLDocument programmatically as we'll do
// later. NSXMLDocument is smart enough to automatically convert commonly used
// objects to the correct W3C formatting when outputting them to XML
NSString *message = (errorNode != nil) ? [errorNode stringValue]
: @"Could not load XML from Amazon";
[urlTF setStringValue:message];
[commentsTF setStringValue:@""];
[coverImage setImage:nil];
[isbnTF setStringValue:@""];
[publisherTF setStringValue:@""];
[salesRankTF setStringValue:@""];
[titleTF setStringValue:@""];
// Release the document before the early return so it isn't leaked
[xmlDoc release];
return;
}
else
{
// If no error, the book information is contained in a node called Details, so we'll
// grab it. childNamed: is declared to return an NSXMLNode, but we know that a child
// element of the root element is actually an NSXMLElement, so we cast it once here
// to avoid casting it later
NSXMLElement *detailsNode = (NSXMLElement *)[rootNode
childNamed:@"Details"];
// Release memory from any prior book, and start with a fresh object
[book release];
book = [[MTBook alloc] init];
// Now set the fields of our data model from singleton child nodes
// NSStrings are easy - we just call the stringValue method on the node
[book setIsbn:
[[detailsNode childNamed:@"Asin"] stringValue]];
[book setTitle:
[[detailsNode childNamed:@"ProductName"]
stringValue]];
[book setPublisher:
[[detailsNode childNamed:@"Manufacturer"]
stringValue]];
// Since we created this NSXMLDocument from parsed XML, we're going to get
// NSStrings when we call objectValue, so dates, numbers, and others have
// to be manually converted from strings
[book setDateReleased:[NSCalendarDate
dateWithNaturalLanguageString:[[detailsNode
childNamed:@"ReleaseDate"] stringValue]]];
// Sales rank is a scalar value
[book setSalesRank:[[[detailsNode
childNamed:@"SalesRank"] stringValue] intValue]];
// URL is stored as an attribute of the Details node, not as child element. The
// syntax for getting attributes is slightly different than for child elements
[book setUrl:[NSURL URLWithString:[[detailsNode
attributeForName:@"url"] stringValue]]];
// The XML we retrieved doesn't have the cover image, but it does have a URL
NSURL *imageURL = [NSURL URLWithString:[[detailsNode
childNamed:@"ImageUrlLarge"] stringValue]];
[book setCoverImage:[NSData
dataWithContentsOfURL:imageURL]];
// Since there can be more than one author of a book, Authors are stored in a child
// list of nodes. Fortunately, we now have a handy method for reducing children to
// an array of their string values thanks to our category on NSXMLNode
NSXMLNode *authorsNode = [detailsNode
childNamed:@"Authors"];
[book setAuthors:[NSMutableArray
arrayWithArray:[authorsNode childrenAsStrings]]];
// We'll default the date last read to today, just to be user friendly
[book setDateRead:[NSCalendarDate date]];
// Okay, now our data model is populated... but how do we make the interface
// reflect these changes? We call a method (next listing) that does it the traditional
// Cocoa way
[self updateInterface];
}
// We allocated it, we're responsible for releasing it
[xmlDoc release];
}
Okay, so doSearch: gets the data, parses it, and populates our model. There's a missing link, however,
which is to make the user interface reflect our data model. That's handled in a method cryptically called
updateInterface, which simply copies the various values from our data model into the corresponding fields of
the user interface, another tedious but straightforward bit of code writing.
But wait a second. It's not as straightforward as it first appears; there is potentially more than one
author, and we've chosen an NSTableView for displaying them to the user. We can't just give an NSTableView an
array to display. We need a table data source! For our simple application, we'll just implement three
table data source methods (the two required for display, plus the one that accepts edits) right in
MTXMLAppDelegate, which, you may recall, we earlier specified as the table's data
source in Interface Builder. We mention these tasks here, without showing the code, to give a
point of comparison for next month's article where we'll use Core Data to eliminate most of these tedious,
time-consuming aspects of building a Cocoa application.
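For readers who have not written a data source before, a minimal sketch of those three methods follows. It uses a hypothetical standalone class, MTAuthorsDataSource, rather than MTXMLAppDelegate, and the NSInteger-based signatures and formal NSTableViewDataSource protocol from current headers (on Tiger the row parameters were plain int and the protocol was informal).

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical data source that serves a mutable array of author names
// to a single-column table.
@interface MTAuthorsDataSource : NSObject <NSTableViewDataSource>
{
    NSMutableArray *authors;
}
- (id)initWithAuthors:(NSArray *)names;
@end

@implementation MTAuthorsDataSource
- (id)initWithAuthors:(NSArray *)names
{
    if ((self = [super init]))
        authors = [names mutableCopy];
    return self;
}
// Required: how many rows should the table show?
- (NSInteger)numberOfRowsInTableView:(NSTableView *)tableView
{
    return [authors count];
}
// Required: what goes in a given row?
- (id)tableView:(NSTableView *)tableView
objectValueForTableColumn:(NSTableColumn *)column
            row:(NSInteger)row
{
    return [authors objectAtIndex:row];
}
// Needed only for editable tables: store the user's edit back in the array.
- (void)tableView:(NSTableView *)tableView
   setObjectValue:(id)value
   forTableColumn:(NSTableColumn *)column
              row:(NSInteger)row
{
    [authors replaceObjectAtIndex:row withObject:value];
}
- (void)dealloc
{
    [authors release];
    [super dealloc];
}
@end
```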
Once we've taken care of updating the interface and implementing the table data source methods, we can
compile and run our application. Type in the ISBN number from the back of a book you've got lying around,
click Lookup, and it should automatically populate all of the fields except for date read and comments. If
there's a problem with the ISBN, the error message will display in the lower left of the window.
Figure 4. The Compiled Application
Not much spit and polish, but it works! One task still remains, however, which is to write the code to
save the data, with changes, to an XML file. Since we didn't retain the original data we retrieved from
Amazon, we'll have to create XML from scratch.
Listing 3: MTXMLAppDelegate.m
saveToXML:
This method creates a new NSXMLDocument, populates it with the data from the user interface, then generates
an XML file on disk from that data.
- (IBAction)saveToXML:(id)sender
{
// Every XML document needs a root element, so we create a root element, then use it
// to create our new document
NSXMLElement *root = [[NSXMLElement alloc]
initWithName:@"Book"];
NSXMLDocument *doc = [[NSXMLDocument alloc]
initWithRootElement:root];
[root release];
// We'll store the source URL where metadata should be stored: as an attribute rather
// than as a child element
NSDictionary *attr = [NSDictionary
dictionaryWithObject:[urlTF stringValue]
forKey:@"url"];
[root setAttributesAsDictionary:attr];
// Now we'll store the strings
[root addChild:[NSXMLElement elementWithName:@"Comment"
stringValue:[commentsTF stringValue]]];
[root addChild:[NSXMLElement elementWithName:@"Title"
stringValue:[titleTF stringValue]]];
[root addChild:[NSXMLElement elementWithName:@"Publisher"
stringValue:[publisherTF stringValue]]];
[root addChild:[NSXMLElement elementWithName:@"ISBN"
stringValue:[isbnTF stringValue]]];
// We could add the sales rank as a string, but we'll do it as a number to show how
NSXMLElement *rank = [NSXMLElement
elementWithName:@"SalesRank"];
[rank setObjectValue:[NSNumber numberWithInt:
[salesRankTF intValue]]];
[root addChild:rank];
// Dates can go in either as string or as NSDate, we'll store them as NSDates
NSXMLElement *publishedDate = [NSXMLElement
elementWithName:@"DatePublished"];
[publishedDate setObjectValue:
[publishedDateDP dateValue]];
[root addChild:publishedDate];
NSXMLElement *readDate = [NSXMLElement
elementWithName:@"DateRead"];
[readDate setObjectValue:[lastReadDP dateValue]];
[root addChild:readDate];
// We'll store the image's binary data. Letting the user change the image is
// beyond the scope of this article, so we'll just pull the image from our data model
// object instead of the NSImageView.
NSXMLElement *cover = [[NSXMLElement alloc]
initWithName:@"Cover"];
[cover setObjectValue:[book coverImage]];
[root addChild:cover];
// Up to now, we've used convenience class methods that return an autoreleased
// NSXMLElement. This time we allocated it ourselves to show how. Since
// we allocated it, we have to release it. This way is more memory efficient because
// the object is released right away without going into the autorelease pool. The
// downside is that we have to write more lines of code.
[cover release];
// Finally, the data in the authors table automatically gets updated in our data model
// thanks to our table data source methods, so we'll pull this array right from the data
// model as well
NSXMLElement *authors = [NSXMLElement
elementWithName:@"Authors"];
int i;
for (i=0; i < [[book authors] count]; i++)
{
// Since we're in a loop of unknown size, we'll be kind and not use the
// autorelease pool.
NSXMLElement *author = [[NSXMLElement alloc]
initWithName:@"Name"];
[author setStringValue:
[[book authors] objectAtIndex:i]];
[authors addChild:author];
[author release];
}
[root addChild:authors];
// At this point, all of our data is now contained in our NSXMLDocument. Let's write
// it to a file. The option NSXMLNodePrettyPrint tells it to use tabs and other
// whitespace to make the output file easier on human eyes. You can pass
// NSXMLNodeOptionsNone (0) if you don't care how it looks
NSData *xmlData =
[doc XMLDataWithOptions:NSXMLNodePrettyPrint];
// If this were a real app, we would present a sheet and ask where to save. But this is a
// demo, so we're just going to dump the file onto the desktop.
[xmlData writeToFile:[@"~/Desktop/book.xml"
stringByExpandingTildeInPath] atomically:YES];
// Memory cleanup
[doc release];
}
Now, let's compile and run again, plug in an ISBN and press the Save button. If all went according to plan,
there will now be a file called book.xml on your desktop. Open this file up in a text editor or XML editor,
and you should see our edited data as well-formed XML. Since we specified NSXMLNodePrettyPrint, it should
appear in a human-readable format with indents and line breaks, as you see in Figure 5 (although I've removed
the binary image data from the XML in the screenshot).
Figure 5. Output XML
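The same build-then-serialize pattern can be boiled down to a few lines. The following is a minimal sketch that uses only the autoreleased convenience constructors; the function name MTBookXMLData and the two-field document layout are assumptions for illustration, not the full format written by saveToXML:.

```objc
#import <Foundation/Foundation.h>

// Builds <Book><Title>…</Title><Authors><Name>…</Name>…</Authors></Book>
// and returns it as pretty-printed XML data.
static NSData *MTBookXMLData(NSString *title, NSArray *authorNames)
{
    NSXMLElement *root = [NSXMLElement elementWithName:@"Book"];
    [root addChild:[NSXMLElement elementWithName:@"Title"
                                     stringValue:title]];

    // One Name element per author, nested under a single Authors element
    NSXMLElement *authors = [NSXMLElement elementWithName:@"Authors"];
    NSEnumerator *e = [authorNames objectEnumerator];
    NSString *name;
    while ((name = [e nextObject]) != nil)
        [authors addChild:[NSXMLElement elementWithName:@"Name"
                                            stringValue:name]];
    [root addChild:authors];

    // Wrap the root element in a document and serialize it
    NSXMLDocument *doc = [NSXMLNode documentWithRootElement:root];
    return [doc XMLDataWithOptions:NSXMLNodePrettyPrint];
}
```

The returned data can be written to disk with writeToFile:atomically:, just as in Listing 3.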
Bring It On Home
In this project, we've used NSXML to parse XML into Cocoa objects and to create XML from Cocoa objects.
There's even more you can do with NSXML: You can delete nodes and edit nodes in place. You can create DTDs
programmatically and validate existing XML against its DTD to make sure it conforms. From here, you should be
able to figure out any XML-related task you need from Apple's developer documentation: NSXML is quite easy to
use.
But speaking of "easy to use", you probably noticed that, in a few places, I mentioned certain aspects of
building this Cocoa application that were not quite as developer-friendly as one would hope, such as creating
data model classes and synchronizing the user interface with the data model. Well, have no fear! Next month,
we're going to dive into an even more amazing new technology: Core Data. We're going to recreate this
application with more functionality in less time by writing less code.
References
"Namespaces in XML". W3C. World Wide Web Consortium. 6 June 2005. <http://www.w3.org/TR/REC-xml-names/>
"Introduction to Tree-Based XML Programming Guide for Cocoa". Apple Developer Connection. Apple Computer, Inc. 6 June 2005. <http://developer.apple.com/documentation/Cocoa/Conceptual/NSXML_Concepts/index.html>
Accessorizer. Kevin Callahan. 6 June 2005. <http://www.kevincallahan.org/software/accessorizer.html>
"Amazon Web Services Documentation". Amazon.com. Amazon. 6 June 2005. <http://www.amazon.com/gp/browse.html/002-5739587-3639218?_encoding=UTF8&node=3487571>
Jeff LaMarche wrote his first line of code in Applesoft BASIC on a Bell & Howell Apple //e in
1980, and he's owned at least one Apple computer at all times since. Though he currently makes his living
consulting in the Mac-unfriendly world of "Enterprise" software, his Macs remain his first and greatest
computer love. You can reach him at jeff_lamarche@mac.com.