Formatting HTTP Messages
Volume Number: 19 (2003)
Issue Number: 3
Column Tag: Cocoa Development
Integrating web content with desktop applications - Part 2 of a 3-part series
by Fritz Anderson
Introduction
In last month's article, we saw how easy it is to incorporate web content into Cocoa applications using the NSURL and NSURLHandle classes. These classes issue HTTP GET requests, in which all the information that makes up the query is encoded in the URL. Browsers send GET requests in the form of an empty HTTP "envelope," a message with headers, but no body.
However, much of the information that is available through HTTP servers cannot be reached with GET queries; it is available only through POST queries. A POST request is sent to a relatively simple URL, with the HTTP envelope filled with a string of key-value pairs that specify the parameters for the query. The usual way a browser fills these in is from an HTML form on a web page.
Once again, we'll be using the Library of Congress's legislative-history database Thomas for our examples. Thomas provides basic bill-lookup services in response to GET requests, but more sophisticated queries, such as a request for all measures sponsored by a certain member, must be done through POSTs.
Unfortunately, Cocoa does not provide a POST facility to match the ease-of-use of its GET methods. This month's article, and its companion in next month's issue, will show how to use Core Foundation's CFNetwork package to add a class to Cocoa to perform POST queries. This article will concentrate on getting the query ready for transmittal, with a brief pause for a tourist's snapshot of Core Foundation. Next month, we'll cover how CFNetwork helps manage the query transaction itself, and wrap up what we've learned into an Objective-C class that hides as much of the dirty work as possible.
From Form to Substance
Listing 1 shows how the Library of Congress recommends a form be set up to provide a web-page button that lists all the bills sponsored by Representative Morella in the 104th Congress:
Listing 1: A simple POST setup in HTML
Name-search button
<FORM ACTION="http://thomas.loc.gov/cgi-bin/bdquery"
METHOD=POST>
<input name="Dbd104" type=hidden value="d104">
<input name="srch" type=hidden value="/bss/d104query.html">
<input name="TYPE1" type=hidden value="bimp">
<input name="HMEMB" type=hidden value="MORELLA">
<input name="Sponfld" type=hidden value="SPON">
<input type="submit"
value="Bills Sponsored by Rep. Morella (104th Congress)">
</FORM>
The <input> tags pair keys in their name attributes with values in their value attributes. Pressing the submit button in the form causes the query to be sent by the method specified in the <form> tag--POST, in this case. As with many web-based reference services, this query is available only through a POST.
Listing 2 shows what Navigator, the Chimera browser, sends when the Submit button is pressed.
Listing 2: A POST query
HTTP header
Note: a '-' character at the end of a line marks a line break added for readability; neither it nor the line break is in the message as actually transmitted.
POST /cgi-bin/bdquery HTTP/1.1
Host: thomas.loc.gov
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; -
en-US; rv:1.0.1) Gecko/20021104 Chimera/0.6
Accept: text/xml,application/xml,application/xhtml+xml,-
text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,-
image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
Accept-Language: en-us, en;q=0.50
Accept-Encoding: gzip, deflate, compress;q=0.9
Keep-Alive: 300
Connection: keep-alive
Referer: http://thomas.loc.gov/home/example.html
Content-Type: application/x-www-form-urlencoded
Content-Length: 78
Message body
Dbd104=d104&srch=%2Fbss%2Fd104query.html&TYPE1=bimp&-
HMEMB=MORELLA&Sponfld=SPON
HTTP separates all messages into two parts, a header and a body. The body contains the "meat" of the communication, and the header tells the recipient how to interpret the body and provides additional information that may be useful in framing the reply. The whole message is sent to the server, which responds with a web page embodying the result of the query.
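The header/body split can be made concrete with a small sketch in plain C. It relies on one fact about HTTP/1.1 that the listing does not show: header lines end in CR LF, and the header is closed by an empty line, so the body starts right after the first CR LF CR LF. The function names `http_body_start` and `http_header_length` are invented for this illustration, and the sketch assumes a NUL-terminated buffer, which a real message body does not guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first byte of the body of a raw HTTP message,
 * or NULL if the blank line that ends the header is not present.
 * Assumption: `message` is NUL-terminated. A real body may contain NUL
 * bytes, so production code would carry an explicit length instead. */
const char *http_body_start(const char *message)
{
    /* The header ends at the first empty line, i.e. CR LF CR LF. */
    const char *separator = strstr(message, "\r\n\r\n");
    return separator ? separator + 4 : NULL;
}

/* Length in bytes of the header block, not counting the blank line.
 * If no blank line is found, the whole buffer is treated as header. */
size_t http_header_length(const char *message)
{
    const char *separator = strstr(message, "\r\n\r\n");
    return separator ? (size_t)(separator - message) : strlen(message);
}
```

Applied to the message in Listing 2, `http_body_start` would point at the `Dbd104=...` line, and everything before it would be header.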
Making the Bed
The process of making a POST query, then, falls into three tasks: framing the query, sending it, and receiving the reply. This article covers the task of framing the query: Taking user input and getting it into the proper format for a POST message.
To provide a test-bed for this step, we can use Interface Builder to lay out a simple Cocoa application, similar to the one we used in last month's article. The application shows a window with a text field for a congressman's name, a field for the Congress number, an NSTextView to hold the formatted POST query we'll be building, and an NSButton to trigger the build process.
Figure 1. A testbed for building POST messages.
Taking Action
Everything our BuildPOSTQuery application does happens in the doBuild: method of the BuildPOSTController object. This is the method I designated in Interface Builder as the action method for the Build button in the application's window. Let's step through doBuild: piece by piece.
We are going to use Core Foundation to fill in an HTTP message buffer using information we supply. We start by calling CFHTTPMessageCreateRequest, which allocates the necessary memory and returns a reference to the message buffer and its associated data. Core Foundation does some initialization at this time, so we pass in the type of request (a CFString of "POST"), the full URL the message is going to (as a reference to a CFURL), and the version of the Hypertext Transfer Protocol the message is to claim to be in (1.1).
CFHTTPMessageRef message;
NSURL * thomas = [NSURL URLWithString:
@"http://thomas.loc.gov/cgi-bin/bdquery"];
message = CFHTTPMessageCreateRequest(
kCFAllocatorDefault,
CFSTR("POST"),
(CFURLRef) thomas,
kCFHTTPVersion1_1);
CFHTTPMessage is an opaque data type, meaning it has properties but you can reach them only through its API. Among its properties is a dictionary of header fields, the key-value pairs you see in the lines following the first in Listing 2. We set a few of these:
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("User-Agent"),
CFSTR("Generic/1.0 (Mac_PowerPC)"));
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("Content-Type"),
CFSTR("application/x-www-form-urlencoded"));
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("Host"),
(CFStringRef) [thomas host]);
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("Accept"),
CFSTR("text/html"));
Each call is addressed to the message, and passes the name of the desired field, and the desired value for the field.
A couple of things are worth noticing here. Though thomas is an NSURL pointer and [thomas host] is an NSString pointer, it is perfectly OK to pass them to Core Foundation routines that expect CFURLRef and CFStringRef respectively. Most classes in Cocoa's Foundation framework are "toll-free bridged" to their equivalents in Core Foundation. You can pass Core Foundation references in Cocoa messages and Cocoa Foundation object references to Core Foundation library routines. All that's necessary is a cast to silence the compiler warning about the unexpected pointer type.
The documentation for individual Cocoa classes will indicate whether they are toll-free bridged to Core Foundation, and to which types they are bridged. Use the global find panel in Project Builder to search the documentation for the phrase "toll-free" to get a complete list.
Another thing to notice is that we have specified that the User-Agent -- basically, the browser making the query -- is running on a PowerPC Macintosh. Server administrators may make decisions on whether to keep their sites Mac-friendly based on logs that capture User-Agent headers. Fly the flag when you can!
Core Foundation: Stepping Back
We're now at a point where we can step back and see some unifying principles that help in reading and writing Core Foundation programs. Core Foundation becomes much easier to understand and use if you see the underlying philosophy in the APIs.
- CF is built around references to data types--strings, streams, dates, HTTP messages--that are not objects, but observe the same object-oriented disciplines of opacity and structured access. So I'm going to call them "objects" anyway.
- Functions in a CF object's API take the object reference as the first parameter. Their names begin with the name of the object's type. Constants related to an object type also begin with the name of the object's type. This contributes to the fabulous length of names in the CF toolbox, but once you are used to the naming conventions, you can often guess the existence of a CF function from what the conventions say its name should be.
- CF objects are reference-counted for memory management, similarly to NSObjects, and you observe the same discipline: If you got an object reference from a function with Copy or Create in its name, or if you explicitly call CFRetain on an object reference, you're responsible for making a balancing call to CFRelease.
- There is no equivalent to autorelease in Core Foundation.
- CF Create functions are addressed to an "allocator" that is responsible for the memory being requested. The first parameter of these functions will therefore be a reference to an allocator. Almost always, you want to use the system-provided allocator, which you can specify by passing NULL, or its synonym kCFAllocatorDefault.
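To make these conventions tangible, here is a toy sketch in plain C of an opaque, reference-counted type that follows the same naming and ownership rules. It is not how Core Foundation is implemented: `ToyString` and all of its functions are hypothetical, the allocator parameter is left out for brevity, and the release function returns a flag (which CFRelease does not) only so the effect is observable.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque type: clients hold only the reference. In a real
 * library the struct definition would be hidden in the implementation. */
typedef struct ToyString *ToyStringRef;

struct ToyString {
    unsigned retainCount;
    char *bytes;
};

/* "Create" in the name: the caller owns the result (count starts at 1)
 * and must balance it with a release. */
ToyStringRef ToyStringCreate(const char *text)
{
    ToyStringRef s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->bytes = malloc(strlen(text) + 1);
    if (s->bytes == NULL) { free(s); return NULL; }
    strcpy(s->bytes, text);
    s->retainCount = 1;
    return s;
}

/* An explicit retain adds one more claim that must be released. */
ToyStringRef ToyStringRetain(ToyStringRef s)
{
    s->retainCount++;
    return s;
}

/* Returns 1 if this call deallocated the object, 0 otherwise. */
int ToyStringRelease(ToyStringRef s)
{
    if (--s->retainCount > 0) return 0;
    free(s->bytes);
    free(s);
    return 1;
}

/* Accessor: type name first, object reference as the first parameter. */
size_t ToyStringGetLength(ToyStringRef s)
{
    return strlen(s->bytes);
}
```

A caller that does Create, Retain, Release, Release sees the object survive the first release and disappear on the second, which is the balancing act the rules above describe.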
Onward to Content
There is one more header field we'd like to add: Content-Length. But we don't know how long the content of the message is, because we haven't formatted it yet.
Let's start by agreeing we won't do this the obvious way. We can look at the body of the HTTP message in Listing 2,
Dbd104=d104&srch=%2Fbss%2Fd104query.html&TYPE1=bimp
&HMEMB=MORELLA&Sponfld=SPON
... and instantly parameterize it with a single Objective-C statement:
// In a format string, "%%" stands for one literal '%', so "%%2F" yields "%2F".
query = [NSString stringWithFormat:
@"Dbd%d=d%d&srch=%%2Fbss%%2Fd%dquery.html"
@"&TYPE1=bimp&HMEMB=%@&Sponfld=SPON",
congress, congress, congress, member];
The resulting query string will be correct for the purpose. Presenting it to the server will yield the desired results. It's a good solution.
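One detail of the one-statement approach deserves care: in a printf-style format string a literal percent sign must be written as %%, or the escapes in the query would be read as format specifiers. The plain C sketch below uses snprintf, which follows the same convention, to show this. The function name `format_thomas_query` is made up for the illustration, and the member name is assumed to need no escaping of its own.

```c
#include <stdio.h>

/* Format the Thomas query body into `out`. Returns the number of bytes the
 * full body needs, not counting the terminating NUL, as snprintf does.
 * Assumption: `member` contains only characters that need no escaping. */
int format_thomas_query(char *out, size_t outSize,
                        int congress, const char *member)
{
    /* "%%" emits one literal '%', so "%%2F" becomes the escape "%2F". */
    return snprintf(out, outSize,
                    "Dbd%d=d%d&srch=%%2Fbss%%2Fd%dquery.html"
                    "&TYPE1=bimp&HMEMB=%s&Sponfld=SPON",
                    congress, congress, congress, member);
}
```

Called with congress 104 and member "MORELLA", the function reproduces the body in Listing 2 and returns 78, the value of that listing's Content-Length header.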
But it isn't the solution we'll be using. Remember that our goal in this series is not just to do this one class of POST queries, but eventually to write an Objective-C class that will cover any POST query. So let's approach the problem like API designers, even if that approach is excessive for this one application.
Our goal is to produce a message in the format corresponding to the MIME type application/x-www-form-urlencoded. The format is a chain of items, separated by ampersands; the items are pairs of strings joined by equal-signs; and the strings are UTF-8, encoded in a reduced character set, one that includes the letters, numerals, and the characters "@._-*" as-is, with + substituting for the space character, and % followed by a hexadecimal byte code substituting for any other character.
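The encoding rules in the paragraph above translate almost directly into code. What follows is a minimal sketch in plain C, working on raw UTF-8 bytes; it is separate from the Objective-C category in Listing 3, and the names `passes_through` and `form_urlencode` are invented here. It writes uppercase hexadecimal digits, matching the %2F seen in Listing 2.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if byte `c` may appear unescaped: letters, numerals, "@._-*". */
static int passes_through(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           (c != '\0' && strchr("@._-*", c) != NULL);
}

/* Encode `len` bytes of UTF-8 from `src` into `dst`, whose capacity
 * `dstSize` includes room for the terminating NUL. Returns the encoded
 * length, or (size_t)-1 if `dst` is too small. The worst case needs
 * 3 * len + 1 bytes. */
size_t form_urlencode(const unsigned char *src, size_t len,
                      char *dst, size_t dstSize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = src[i];
        size_t need = (passes_through(c) || c == ' ') ? 1 : 3;
        if (out + need + 1 > dstSize) return (size_t)-1;
        if (passes_through(c)) {
            dst[out++] = (char)c;          /* copied as-is */
        } else if (c == ' ') {
            dst[out++] = '+';              /* space becomes '+' */
        } else {
            dst[out++] = '%';              /* anything else: %XX */
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        }
    }
    if (out + 1 > dstSize) return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

Because the function works byte by byte, a non-ASCII character is escaped once per UTF-8 byte: the two bytes 0xC3 0xA9 that encode an accented e come out as %C3%A9.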
The outer two layers of the format are easy to represent in the API we are building, because we (and the users of our API) already have a way of building lists of key-value pairs: It's the NSMutableDictionary. We can use NSDictionary objects as initializing "constants" for POST messages, and use the NSMutableDictionary API to add elements to messages. All we have to do is provide a way to "flatten" the dictionary into a URL-encoded query when we're ready to send it.
Knowing that our POST-query class will use an NSMutableDictionary, let's practice by loading one up. We can start with the parts that never change:
NSMutableDictionary * postData =
[NSMutableDictionary dictionaryWithObjectsAndKeys:
@"bimp", @"TYPE1",
@"SPON", @"Sponfld",
nil];
Next, we harvest the member name and congress number from the UI, and incorporate them into the query-specific pairs:
int congress = [congressField intValue];
NSString * member = [memberName stringValue];
[postData setObject: [NSString stringWithFormat:
@"d%d", congress]
forKey: [NSString stringWithFormat:
@"Dbd%d", congress]];
[postData setObject: [NSString stringWithFormat:
@"/bss/d%dquery.html", congress]
forKey: @"srch"];
[postData setObject: member forKey: @"HMEMB"];
And with the dictionary complete, we flatten it to the proper encoding, reduce it to a block of ASCII characters (remember that NSStrings are Unicode), and make that data the body of the query message.
NSString * postString = [postData webFormEncoded];
NSData * postStringData = [postString
dataUsingEncoding: NSASCIIStringEncoding
allowLossyConversion: YES];
CFHTTPMessageSetBody(message,
(CFDataRef) postStringData);
Now that we know how big the body is, we can set the Content-Length header.
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("Content-Length"),
(CFStringRef) [NSString stringWithFormat: @"%lu",
(unsigned long) [postStringData length]]);
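The ordering matters because Content-Length is a count of the bytes in the body, so it can only be filled in once the body exists. As a rough illustration outside CFNetwork, this plain C sketch assembles a minimal POST request by hand. The function name `build_post_request` is hypothetical, and the sketch writes only a few of the headers the article sets.

```c
#include <stdio.h>
#include <string.h>

/* Write a minimal POST request into `out`. Returns the total length, or
 * -1 if `out` is too small. `path`, `host`, and `body` are assumed to be
 * NUL-terminated, with `body` already form-encoded. */
int build_post_request(char *out, size_t outSize,
                       const char *path, const char *host, const char *body)
{
    /* Content-Length counts the bytes of the body, not its characters. */
    size_t bodyLength = strlen(body);
    int n = snprintf(out, outSize,
                     "POST %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Content-Type: application/x-www-form-urlencoded\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n"                 /* blank line ends the header */
                     "%s",
                     path, host, (unsigned long)bodyLength, body);
    return (n < 0 || (size_t)n >= outSize) ? -1 : n;
}
```

Given the body from Listing 2, the header produced this way carries Content-Length: 78, the same value the browser sent.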
Flattening the Dictionary
The flattening of the parameter dictionary happens in the message [postData webFormEncoded]. Listing 3 shows two simple categories on NSString and NSDictionary that "flatten" an NSDictionary and encode its keys and values in URL-encoded query format. The methods are straightforward: webFormEncoded loops through each key-value pair, passes each string to escapedForQueryURL, and joins the half-pairs with equal-signs and the pairs with ampersands.
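The joining step on its own is small enough to sketch in plain C. The version below is only an illustration of the idea, not the category code from Listing 3: the name `join_form_pairs` is invented, the parameters arrive as two parallel arrays rather than a dictionary, and every string is assumed to have been escaped already.

```c
#include <stddef.h>
#include <string.h>

/* Join `count` already-escaped key/value pairs as key=value&key=value.
 * `dstSize` includes room for the terminating NUL. Returns the length
 * written, or (size_t)-1 if `dst` is too small. */
size_t join_form_pairs(const char *const keys[], const char *const values[],
                       size_t count, char *dst, size_t dstSize)
{
    size_t out = 0;
    size_t i;

    if (dstSize == 0) return (size_t)-1;
    dst[0] = '\0';
    for (i = 0; i < count; i++) {
        size_t keyLen = strlen(keys[i]);
        size_t valueLen = strlen(values[i]);
        /* '&' before every pair but the first, plus '=' and the NUL. */
        size_t need = (i > 0 ? 1 : 0) + keyLen + 1 + valueLen + 1;
        if (out + need > dstSize) return (size_t)-1;
        if (i > 0) dst[out++] = '&';
        memcpy(dst + out, keys[i], keyLen);
        out += keyLen;
        dst[out++] = '=';
        memcpy(dst + out, values[i], valueLen);
        out += valueLen;
        dst[out] = '\0';
    }
    return out;
}
```

One difference from the dictionary-based approach is worth keeping in mind: arrays preserve the order in which pairs were supplied, while an NSDictionary enumerates its keys in no guaranteed order, so the flattened pairs may come out in a different sequence than they were added.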
Character sets and old habits: An aside
Now is the time for a shameful admission. My first draft of escapedForQueryURL began this way:
unsigned char * source = [self lossyCString];
... and from there, I walked the C string to the terminating null character as I'd done for years. My guilty conscience showed through even in that one line of code--in the use of lossyCString and not cString. Macintosh is a Unicode system. Your code will see non-ASCII characters. I used lossyCString instead of cString because cString will throw an exception if the NSString contains characters that don't exactly map to ASCII. And then I'd have to know what to do about the exception. The lossyCString method prevents the exception at the expense of potential loss of data in the conversion. In this application, the Thomas server doesn't expect non-ASCII characters, so there is no harm done.
But it isn't the Right Thing. The RFC, fortunately, specifies what to do with non-ASCII characters--pass them as UTF-8 Unicode--and Apple makes correct string-encoding routines easily available. It's there. Use it. Even if it isn't needed in this application, it's easier to write code that works when written, than to make it work right later when you do need the correct behavior.
There's another reason not to use the cString method: It's deprecated. Apple has announced in the release notes for Mac OS X 10.2 that it expects to remove the method from the public API in a future release.
The Result
Now that all the information has been put into the CFHTTPMessage, we have a completed message in the buffer, ready for display. We ask for a copy of a CFData object (which is toll-free bridged to NSData, remember) embodying the "serialized" form of the message.
NSData * theRequest =
(NSData *) CFHTTPMessageCopySerializedMessage(message);
We can then turn the byte buffer into an NSString, the NSString into an NSAttributedString (which is the form NSTextStorage wants), put the results into our window's NSTextView, and throw out the garbage along the way:
NSString * reqAsString =
[NSString stringWithCString: [theRequest bytes]
length: [theRequest length]];
NSAttributedString * styledText =
[[NSAttributedString alloc]
initWithString: reqAsString];
[[resultText textStorage]
setAttributedString: styledText];
CFRelease(theRequest);
CFRelease(message);
[styledText release];
And with that, the perfectly-formatted POST request appears in the window's text area. It isn't going anywhere, but that's the subject of our next article. See you next month!
Listing 3a: httpFlattening.h
/*
* httpFlattening.h
* CocoaPOST
*/
#import <Foundation/Foundation.h>
NSString
@interface NSString (httpFlattening)
- (NSString *) escapedForQueryURL;
@end
NSDictionary
@interface NSDictionary (httpFlattening)
- (NSString *) webFormEncoded;
@end
Listing 3b: httpFlattening.m
/*
* httpFlattening.m
* CocoaPOST
*/
#import "httpFlattening.h"
@implementation NSString (httpFlattening)
InitializePassthrough
Fill a table of BOOLs with YES wherever the corresponding character code is legal for inclusion in a
query URL.
static void
InitializePassthrough(BOOL * table)
{
int i;
for (i = '0'; i <= '9'; i++) {
table[i] = YES;
}
for (i = '@'; i <= 'Z'; i++) {
table[i] = YES;
}
for (i = 'a'; i <= 'z'; i++) {
table[i] = YES;
}
table['*'] = YES;
table['-'] = YES;
table['.'] = YES;
table['_'] = YES;
}
- [NSString escapedForQueryURL]
Return the target string, encoded in UTF-8, with spaces replaced with +, and with characters outside
the set [-@*._0-9A-Za-z] encoded as % and the hexadecimal byte code. This is the standard encoding for
query parameters in a URL or a POST query.
- (NSString *) escapedForQueryURL
{
NSData * utfData = [self
dataUsingEncoding:
NSUTF8StringEncoding];
// -bytes returns a pointer to read-only data, so keep the pointers const.
const unsigned char * source = [utfData bytes];
const unsigned char * cursor = source;
const unsigned char * limit = source + [utfData length];
const unsigned char * startOfRun;
NSMutableString * workingString = [NSMutableString
stringWithCapacity:
2*[self length]];
static BOOL passThrough[256] = { NO };
if (! passThrough['A']) {
// First time through, initialize the pass-through table.
InitializePassthrough(passThrough);
}
startOfRun = source;
while (YES) {
// Ordinarily, do nothing in this loop but advance the cursor pointer.
if (cursor == limit || ! passThrough[*cursor]) {
// Do something special at end-of-data or at a special character:
NSString * escape;
int passThruLength = cursor - startOfRun;
// First, append the accumulated characters that just pass through.
if (passThruLength > 0) {
[workingString appendString:
[NSString stringWithCString: (const char *) startOfRun
length: passThruLength]];
}
// Then respond to the end of data...
if (cursor == limit)
break; // ... by stopping
// ... or to a special character...
if (*cursor == ' ')
escape = @"+"; // ... by replacing with '+'
else
escape = [NSString stringWithFormat:
@"%%%02X", *cursor]; // ... or by %-escaping (uppercase hex, as in Listing 2)
[workingString appendString: escape];
startOfRun = cursor+1;
}
cursor++;
}
return workingString;
}
@end
@implementation NSDictionary (httpFlattening)
- [NSDictionary webFormEncoded]
Return the key-value pairs in the dictionary, with the keys and values encoded as query parameters,
paired by =, and delimited with &. This is the format for a full set of named parameters in a
URL-coded query.
- (NSString *) webFormEncoded
{
NSEnumerator * keys = [self keyEnumerator];
NSString * currKey;
NSString * currObject;
NSMutableString * retval = [NSMutableString
stringWithCapacity: 256];
BOOL started = NO;
while ((currKey = [keys nextObject]) != nil) {
// Chain the key-value pairs, properly escaped, in one string.
if (started)
[retval appendString: @"&"];
else
started = YES;
currObject = [[self objectForKey: currKey]
escapedForQueryURL];
currKey = [currKey escapedForQueryURL];
[retval appendString: [NSString stringWithFormat:
@"%@=%@", currKey, currObject]];
}
return retval;
}
@end
Listing 4a. BuildPOSTController.h
//
// BuildPOSTController.h
// BuildPOSTQuery
// Copyright (c) 2002 Frederic F. Anderson
//
#import <Cocoa/Cocoa.h>
class BuildPOSTController
A simple controller class generated by Interface Builder.
@interface BuildPOSTController : NSObject
{
IBOutlet NSTextField *congressField;
IBOutlet NSButton *fetchButton;
IBOutlet NSTextField *memberName;
IBOutlet NSTextView *resultText;
}
- (IBAction) doBuild: (id) sender;
@end
Listing 4b. BuildPOSTController.m
//
// BuildPOSTController.m
// BuildPOSTQuery
// Copyright (c) 2002 Frederic F. Anderson
//
#import "BuildPOSTController.h"
#import "httpFlattening.h"
@implementation BuildPOSTController
doBuild:
The action method for the Build button. It formats the entries in the text fields into a POST query
and displays the query in the NSTextView.
- (IBAction) doBuild: (id) sender
{
CFHTTPMessageRef message;
NSURL * thomas = [NSURL URLWithString:
@"http://thomas.loc.gov/cgi-bin/bdquery"];
// Allocate and initialize a CFHTTPMessage
message = CFHTTPMessageCreateRequest(
kCFAllocatorDefault,
CFSTR("POST"),
(CFURLRef) thomas,
kCFHTTPVersion1_1);
// Set the message headers we know about
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("User-Agent"),
CFSTR("Generic/1.0 (Mac_PowerPC)"));
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("Content-Type"),
CFSTR("application/x-www-form-urlencoded"));
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("Host"),
(CFStringRef) [thomas host]);
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("Accept"),
CFSTR("text/html"));
// Allocate a mutable dictionary to hold query parameters.
// Initialize it with some invariant params.
NSMutableDictionary * postData =
[NSMutableDictionary dictionaryWithObjectsAndKeys:
@"bimp", @"TYPE1",
@"SPON", @"Sponfld",
nil];
// Harvest the user-entered parameters and put them into
// the parameter dictionary
int congress = [congressField intValue];
NSString * member = [memberName stringValue];
[postData setObject: [NSString stringWithFormat:
@"d%d", congress]
forKey: [NSString stringWithFormat:
@"Dbd%d", congress]];
[postData setObject: [NSString stringWithFormat:
@"/bss/d%dquery.html", congress]
forKey: @"srch"];
[postData setObject: member forKey: @"HMEMB"];
// Flatten the parameter dictionary into a web-form encoded
// string, and pack the Unicode string down to ASCII data.
NSString * postString = [postData webFormEncoded];
NSData * postStringData = [postString
dataUsingEncoding: NSASCIIStringEncoding
allowLossyConversion: YES];
// Put the parameter data into the message body.
CFHTTPMessageSetBody(message,
(CFDataRef) postStringData);
// Add the message length to the HTTP message header.
CFHTTPMessageSetHeaderFieldValue(message,
CFSTR("Content-Length"),
(CFStringRef) [NSString stringWithFormat: @"%lu",
(unsigned long) [postStringData length]]);
// Ask the CFHTTPMessage for a buffer containing the serialized POST message
NSData * theRequest =
(NSData *) CFHTTPMessageCopySerializedMessage(message);
// Make the message data into a string, and the string
// into an attributed string
NSString * reqAsString =
[NSString stringWithCString: [theRequest bytes]
length: [theRequest length]];
NSAttributedString * styledText =
[[NSAttributedString alloc]
initWithString: reqAsString];
// Put the attributed string with the POST message into the NSTextView
[[resultText textStorage]
setAttributedString: styledText];
CFRelease(theRequest);
CFRelease(message);
[styledText release];
}
@end
Fritz Anderson has been programming and writing about the Macintosh since 1984. He works (and seeks work) as a consultant in Chicago. You can reach him at fritza@manoverboard.org.