OPENSTEP Sockets
Volume Number: 13 (1997)
Issue Number: 7
Column Tag: Rhapsody
by Jason Proctor, BroadQuest, Inc.
Network programming in the brave new world of BSD Sockets, a networking interface under Rhapsody
Welcome to the Pleasure Dome
The Macintosh is about to take a large step into a brave new world. It's not quite the brave new world you or I expected or perhaps wanted, but it's going to happen nevertheless. Despite Apple's assurances to the contrary, I expect the first manifestation of the next-generation Macintosh OS to bear more resemblance to NEXTSTEP than to the current MacOS. Hence, it's a fair bet that we'll be getting unix-style networking. Indeed, a large clue has appeared with the news that Open Transport is on the list of projects that Apple is no longer continuing to develop.
I've been doing networking stuff for a good while, for MacOS and NEXTSTEP as well as other environments; so, it seemed a good idea (at the time) to write an article explaining how to use the new (actually rather old) APIs to do useful things.
Assumed Knowledge
The purpose of this article is to describe in detail the BSD sockets API, rather than the ins and outs of TCP/IP. Hence, I'll assume you've done a fair bit of network hacking before. You should be comfortable with IP addresses and port numbers, binding and connecting, and that kind of thing.
What's in this article
To start with, I'll describe a few major things that are different between the good old Mac APIs we're used to and the even older BSD sockets APIs you're probably going to have to get used to. I'll then provide a library of routines that make dealing with the socket library a touch easier, and finally I'll use the library to construct a simple server application.
Differences Between MacOS and BSD Sockets
Synchronicity
MacTCP, and to a lesser extent OT, were developed for the MacOS environment, which is essentially co-operatively scheduled. Hence, making synchronous I/O calls is a bad idea. Establishing a connection to a slow or distant site could take a good few seconds, during which time the machine cannot respond to any user action. For this reason, these calls should normally be made asynchronously, with one of a variety of strategies used to detect and respond to completion. It's beyond the scope of this article to go into those strategies, but suffice it to say that they boil down to completion routine chaining, checking during idle time, or using a Thread Manager thread to spin in a yielding loop.
In complete contrast however, the socket library was initially developed for Berkeley unix, a pre-emptively scheduled environment, and therefore synchronous calls are the norm. Indeed, there are no well-developed asynchronous facilities such as you would find under MacOS, because there is generally no need for them. Provided Rhapsody delivers on its pre-emptive multitasking promise, we can all use synchronous calls and get away with it.
Of course, that's not to say that there aren't any asynchronous facilities at all. Whilst being universally panned for having a dearth of industrial-grade real-time features, unix does provide signals as a means of being informed when something happens.
Portability
Hitherto, Mac developers have not had to consider portability issues when writing MacTCP or Open Transport code. Code written to these APIs is not going much further than MacOS, even though OT is loosely based on the XTI standard.
In contrast again, the socket library has been ported to many different environments, and one should always write socket code to be portable. One never knows when one might want to take one's precious code to unix, or anywhere else for that matter. Fortunately however, we only have one major portability concern to worry about -- the socket library expects all its IP addresses and port numbers in network (i.e., big-endian) format. By virtue of being initially hosted on the 68k processor, MacOS has its bytes the correct way round, which is why we haven't had to worry about it until now.
So, look out for the converter routines (htons() and ntohs()) in the code samples. I've illustrated their use with comments.
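To make the byte-order point concrete, here's a minimal sketch of what htons() does to a port number. PortToWireBytes is a hypothetical helper of my own, not part of the socket library; only htons() and ntohs() are real calls.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <assert.h>
#include <string.h>     /* memcpy() */

/* hypothetical helper, for illustration only: */
/* converts a host-format port to network format and copies */
/* the two resulting bytes, in memory order, into aBytes */
void
PortToWireBytes (unsigned short aHostPort, unsigned char aBytes [2])
{
  unsigned short wire;

  assert (aBytes);
  /* network format is big-endian: most significant byte first */
  wire = htons (aHostPort);
  memcpy (aBytes, &wire, sizeof (wire));
}
```

Port 8000 is 0x1F40, so the bytes come out as 0x1F then 0x40 whatever the host processor's byte order. On a big-endian host htons() changes nothing, which is why forgetting it goes unnoticed there.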
Protocol Independence
The socket library was designed with a certain degree of protocol independence in mind. This shows up in the API in two ways. First, the socket() call takes several parameters that select the protocol. Second, calls that deal with addresses take pointers to a generic address structure (struct sockaddr), so pointers to protocol-specific address structures (such as struct sockaddr_in) must be cast appropriately.
However, implementing new transport protocols (that is, protocols at the level of UDP and TCP) generally means writing kernel servers, which is not a trivial task. The competing XTI standard (on which OT is based) also provides a slightly better interface for alternative protocols. Hence, other protocols which use the sockets interface are rare.
One extra protocol is normally available under standard BSD sockets, that of unix domain sockets. Although certain unix systems use these for IPC, they're generally used by normal people about as often as PowerTalk, so I won't reference them further. Details are available in the unix manual pages.
Network API Concepts
The BSD socket library API calls are substantially different to MacTCP and, to a lesser extent, OT, but the actual concepts one deals with are similar, as at the end of the day it's all just TCP/IP. Hence, anyone used to sockets, ports, connections, and the difference between stream-based and datagram-based communication will be off to a flying start. Basically you just do this:
- Open a socket.
- Bind to a port.
- Establish a connection (optional for best-effort datagram-based service).
- Send and receive some data.
- Shut down the connection gracefully.
Making the Transition
We're all engineers, aren't we? On to the code.
Step 0: Dealing with IP Addresses
As previously indicated, the socket library requires its IP number and port values in network (that is, big-endian) format. It's easy to forget to convert backwards and forwards all the time. Also, the IP address structure is rather arcane, so a convenient way of accessing that structure is a big help. So, I use the following routines to make dealing with the socket address structure a touch more convenient.
Listing 1: SocketCalls.c
SetAddress
This routine configures an internet address structure from the passed in
IP number and port number parameters.
/* assumption: sizeof(long) == sizeof(IP address) */
/* this is not true for IPv6 */
void
SetAddress (struct sockaddr_in *aAddress,
  unsigned long aIPNumber, unsigned short aPort)
{
  /* clear the whole structure, including its padding */
  memset (aAddress, 0, sizeof (struct sockaddr_in));
  /* set address family */
  aAddress->sin_family = AF_INET;
  /* IP number is assumed to be in network format */
  /* this is because IP numbers are rarely constructed manually */
  /* and are normally obtained via name lookup calls, etc */
  aAddress->sin_addr.s_addr = aIPNumber;
  /* port number is assumed to be in host format */
  aAddress->sin_port = htons (aPort);
}
GetAddress
This routine extracts the IP number and port number from the passed in
internet address structure.
/* assumption: sizeof(long) == sizeof(IP address) */
/* this is not true for IPv6 */
void
GetAddress (struct sockaddr_in *aAddress,
  unsigned long *aIPNumber, unsigned short *aPortNumber)
{
  /* the IP number is returned in network format */
  *aIPNumber = aAddress->sin_addr.s_addr;
  /* the port number is returned in host format */
  *aPortNumber = ntohs (aAddress->sin_port);
}
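If you do need to build an address by hand, say from a dotted-quad string typed by the user, the following sketch shows one way to do it. MakeAddress and FormatAddress are hypothetical helpers of mine. They lean on inet_pton() and inet_ntop(), which are POSIX calls that postdate the original BSD library, so treat their availability on any given system as an assumption.

```c
#include <arpa/inet.h>   /* inet_pton(), inet_ntop(), htons() */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdio.h>       /* snprintf() */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* AF_INET */

/* hypothetical helper: fill aAddress from "a.b.c.d" and a */
/* host-format port; returns 0 on success, -1 on a bad string */
int
MakeAddress (const char *aDottedQuad, unsigned short aPort,
  struct sockaddr_in *aAddress)
{
  assert (aDottedQuad);
  assert (aAddress);
  memset (aAddress, 0, sizeof (*aAddress));
  aAddress->sin_family = AF_INET;
  aAddress->sin_port = htons (aPort);
  /* inet_pton() stores the IP number in network format */
  if (inet_pton (AF_INET, aDottedQuad, &aAddress->sin_addr) == 1)
    return 0;
  return -1;
}

/* hypothetical helper: write "a.b.c.d:port" into aText */
int
FormatAddress (const struct sockaddr_in *aAddress,
  char *aText, size_t aSize)
{
  char quad [INET_ADDRSTRLEN];

  assert (aAddress);
  assert (aText);
  if (inet_ntop (AF_INET, &aAddress->sin_addr,
    quad, sizeof (quad)) == NULL)
  {
    return -1;
  }
  /* the port is converted back to host format for display */
  snprintf (aText, aSize, "%s:%u", quad,
    (unsigned) ntohs (aAddress->sin_port));
  return 0;
}
```

Note that the port is converted in both directions while the IP number never is: it goes in and comes out in network format, the same convention the routines above use.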
Step 1: Opening A Socket
To do anything, you must open a socket and tell the socket library what kind of communication you're ultimately going to be doing. When writing to the socket library, you call socket() and supply the appropriate arguments.
The socket() call takes three arguments -- the protocol family, the type of communication to be used, and the protocol within the protocol family. If we're intending to communicate TCP/IP, the first argument will always be PF_INET, signifying the internet domain protocol family. The second argument will be SOCK_STREAM or SOCK_DGRAM, according to whether we will be communicating via streams or datagrams. The third argument will be IPPROTO_TCP or IPPROTO_UDP, according to whether we will be communicating with the TCP or UDP protocol.
Note that the latter two arguments are of course linked. If you want to communicate using TCP, you must specify SOCK_STREAM as the second and IPPROTO_TCP as the third argument. For UDP, you must use SOCK_DGRAM as the second and IPPROTO_UDP as the third.
Other protocol types are available from the PF_INET family, such as the ICMP and RAW types, but they are rarely used by clients so I won't go into them here.
Listing 2: SocketCalls.c
OpenTCPSocket
This routine opens a socket which is configured for stream-based
communication over TCP.
int
OpenTCPSocket (int *aNewSocket)
{
  /* internet address family, stream-based, TCP */
  *aNewSocket = socket (PF_INET, SOCK_STREAM, IPPROTO_TCP);
  if (*aNewSocket == -1)
    return errno;
  else
    return 0;
}
OpenUDPSocket
This routine opens a socket which is configured for datagram-based
communication over UDP.
int
OpenUDPSocket (int *aNewSocket)
{
  /* internet address family, datagram-based, UDP */
  *aNewSocket = socket (PF_INET, SOCK_DGRAM, IPPROTO_UDP);
  if (*aNewSocket == -1)
    return errno;
  else
    return 0;
}
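Because the second and third arguments to socket() are linked, you can also pass zero as the third and let the stack pick the default protocol for the socket type. The sketch below checks what actually comes back. DefaultProtocolType is a hypothetical helper of mine; getsockopt() with SO_TYPE is a standard call.

```c
#include <assert.h>
#include <sys/socket.h>  /* socket(), getsockopt() */
#include <unistd.h>      /* close() */

/* hypothetical helper: open a PF_INET socket of the given type */
/* with protocol 0 (stack's default) and return the socket type */
/* the stack reports for it, or -1 on failure */
int
DefaultProtocolType (int aType)
{
  int s;
  int type = -1;
  socklen_t length = sizeof (type);

  s = socket (PF_INET, aType, 0);
  if (s == -1)
    return -1;
  /* ask the stack what kind of socket we were given */
  if (getsockopt (s, SOL_SOCKET, SO_TYPE, &type, &length) != 0)
    type = -1;
  close (s);
  return type;
}
```

Asking for a mismatched pair, such as SOCK_STREAM with IPPROTO_UDP, should make socket() fail rather than quietly give you something else, although the exact error code varies between systems.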
Step 2: Binding A Port
Unlike AppleTalk and IPX, TCP/IP exposes port numbers, rather than raw socket numbers. Hence it is necessary to associate our socket with a port number so that we can see and be seen on the network. As is usual with TCP/IP interfaces, we can either specify a port number or ask the socket library to give us the next unused one. Generally, servers would specify a well-known port number so that clients can find them. Under unix, and probably Rhapsody, the program would have to be running with root permission to bind to a port in the server range (less than 1024). This is to prevent user processes from masquerading as servers and snarfing people's passwords, amongst other nastiness.
The bind() call takes three parameters -- the number of the socket to be bound, the address to which the socket is to be bound, and the size of the address parameter. Note that the address parameter usually has an address member of zero (INADDR_ANY), which tells the stack to accept traffic on any of the machine's local IP addresses. However, this need not be the case: a machine can have multiple IP addresses, and binding to a specific one is how web servers accomplish IP-based virtual hosting.
You'll notice that there's an extra call in the Bind() routine: the setsockopt() call. This call gives the TCP stack permission to bind to the same port each time. Otherwise, the stack imposes a 2 minute delay between binds to the same port. This delay is to guard against the following sequence of events:
- Local endpoint opens a socket and binds to port n.
- Remote endpoint establishes connection with port n.
- Local endpoint disconnects in a disorderly fashion; remote endpoint is not aware connection has been dropped.
- Local endpoint opens another socket and binds to port n.
- Remote endpoint sends packet to port n, believing previous connection is still up.
- Local endpoint receives packet from previous connection.
- This is bad.
Thankfully, this happens very rarely nowadays.
Listing 3: SocketCalls.c
Bind
This routine associates a port number with a socket which has been
opened via the socket() call. If the port number is zero, the next
available port number is allocated from dynamic port space (ie
outside the server port range).
int
Bind (int aSocket, struct sockaddr_in * aLocalAddress)
{
  int on = 1;
  assert (aLocalAddress);
  /* say it's OK to reuse our address */
  setsockopt (aSocket, SOL_SOCKET, SO_REUSEADDR, &on, sizeof (on));
  if (bind (aSocket,
    (struct sockaddr *) aLocalAddress,
    sizeof (struct sockaddr_in)) == 0)
  {
    return 0;
  }
  else
  {
    return errno;
  }
}
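If you bind with a port number of zero, you'll usually want to know which port you were given. Here's a minimal sketch using getsockname(). BindAnyPort is a hypothetical helper of mine, and I've bound to the loopback address purely to keep the example local to one machine.

```c
#include <arpa/inet.h>   /* htonl(), htons(), ntohs() */
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* bind(), getsockname() */
#include <unistd.h>      /* close(), for callers */

/* hypothetical helper: bind aSocket to any free port on the */
/* loopback address; the port comes back in host format */
int
BindAnyPort (int aSocket, unsigned short *aPort)
{
  struct sockaddr_in local;
  socklen_t length = sizeof (local);

  assert (aPort);
  memset (&local, 0, sizeof (local));
  local.sin_family = AF_INET;
  local.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  /* zero means "give me the next unused port" */
  local.sin_port = htons (0);
  if (bind (aSocket, (struct sockaddr *) &local,
    sizeof (local)) != 0)
  {
    return errno;
  }
  /* now ask the stack which port it chose */
  if (getsockname (aSocket, (struct sockaddr *) &local,
    &length) != 0)
  {
    return errno;
  }
  *aPort = ntohs (local.sin_port);
  return 0;
}
```

This is handy for test harnesses and for protocols where the server tells the client which port to call back on.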
Step 3: Establishing a Connection
Establishing a connection is mandatory only for reliable, stream-based communication. Within the TCP/IP family of protocols, at least at the network/transport layer, this means TCP. If only best-effort, datagram-based (that is UDP) communication is required, connecting is not required, although if connect() is called on a UDP socket, this fixes the source and destination for further communication. This can be handy if you want to use unicast UDP but don't want to validate that the remote address is correct each time you receive a packet.
Connections are initiated via the connect() call. This takes three parameters -- the local socket number, the address of the remote endpoint to connect to, and the size of the address parameter. The socket need not have been bound explicitly; if it hasn't, the stack binds it to an unused local port as part of the connect.
Listing 4: SocketCalls.c
Connect
This routine initiates a connection with the remote address. For
TCP sockets, this actually negotiates a connection. For UDP sockets,
it simply fixes the remote address.
int
Connect (int aSocket, struct sockaddr_in *aRemoteAddress)
{
  assert (aRemoteAddress);
  if (connect (aSocket, (struct sockaddr *) aRemoteAddress,
    sizeof (struct sockaddr_in)) == 0)
  {
    return 0;
  }
  else
  {
    return errno;
  }
}
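To illustrate what connect() does for a UDP socket, here's a sketch that sends one datagram between two sockets on the loopback address. ConnectedUDPDemo is a hypothetical routine of mine and the "ping" payload is arbitrary. The point is that once the sender is connected, plain send() works with no destination parameter.

```c
#include <arpa/inet.h>   /* htonl(), htons() */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <unistd.h>      /* close() */

/* hypothetical demo: returns the number of bytes that arrived */
/* at the receiving socket, or -1 if anything failed */
int
ConnectedUDPDemo (char *aBuffer, int aSize)
{
  struct sockaddr_in address;
  socklen_t length = sizeof (address);
  int receiver;
  int sender;
  int cc = -1;

  assert (aBuffer);
  receiver = socket (PF_INET, SOCK_DGRAM, 0);
  sender = socket (PF_INET, SOCK_DGRAM, 0);
  /* receiver gets any free port on the loopback address */
  memset (&address, 0, sizeof (address));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  address.sin_port = htons (0);
  if (receiver != -1 && sender != -1
    && bind (receiver, (struct sockaddr *) &address,
      sizeof (address)) == 0
    && getsockname (receiver, (struct sockaddr *) &address,
      &length) == 0
    /* no packets are exchanged here; this just fixes the peer */
    && connect (sender, (struct sockaddr *) &address,
      sizeof (address)) == 0
    /* send(), not sendto(): the destination is already known */
    && send (sender, "ping", 4, 0) == 4)
  {
    cc = (int) recv (receiver, aBuffer, aSize, 0);
  }
  if (receiver != -1)
    close (receiver);
  if (sender != -1)
    close (sender);
  return cc;
}
```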
Listening for incoming connections is accomplished with the listen() call, and accepting with the accept() call.
The listen() call takes two arguments -- the local socket number, which must be bound to a port, and a number signifying the "queue length". The latter is simply how many incoming connections can be held unaccepted -- any more are refused. Generally, you would want to tailor this number according to how many requests you are serving in a given time period.
The accept() call takes three arguments -- the local socket number, which must be bound to a port, a pointer to an address structure, and a pointer to an integer which is initialized to the size of the address structure. The address structure and size argument are filled in with the address of the endpoint initiating the connection.
Note that the accept() call returns a new socket, which is the connected socket over which all data transfer should be done. The original socket is dedicated to waiting for incoming connections, and should not be used for any data transfer. In this way, there is always a socket waiting for an incoming connection, and incoming connect requests cannot be lost.
Listing 5: SocketCalls.c
Accept
This routine listens for an incoming TCP connection.
int
Accept (int aSocket, struct sockaddr_in *aRemoteAddress,
  int aQueueLength, int *aNewSocket)
{
  int addressLength;
  assert (aRemoteAddress);
  assert (aNewSocket);
  /* listen() is included here for clarity */
  /* technically, one need only call listen() once */
  if (listen (aSocket, aQueueLength) == 0)
  {
    addressLength = sizeof (struct sockaddr_in);
    *aNewSocket = accept (aSocket,
      (struct sockaddr *) aRemoteAddress,
      &addressLength);
    /* accept() returns -1 on failure; zero is a valid socket */
    if (*aNewSocket >= 0)
      return 0;
  }
  return errno;
}
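The relationship between the listening socket and the socket returned by accept() is easiest to see in a self-contained test. The sketch below plays both client and server over the loopback address in a single process. LoopbackAcceptDemo is a hypothetical routine of mine. It relies on the stack completing the connection and queueing it before accept() is called, which is what the queue length given to listen() is for.

```c
#include <arpa/inet.h>   /* htonl(), htons() */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset() */
#include <sys/socket.h>
#include <unistd.h>      /* close() */

/* hypothetical demo: returns 0 if accept() handed back a new */
/* socket and reported the client's address correctly */
int
LoopbackAcceptDemo (void)
{
  struct sockaddr_in address;
  struct sockaddr_in peer;
  struct sockaddr_in self;
  socklen_t length = sizeof (address);
  int listener;
  int client;
  int data = -1;
  int result = -1;

  listener = socket (PF_INET, SOCK_STREAM, 0);
  client = socket (PF_INET, SOCK_STREAM, 0);
  memset (&address, 0, sizeof (address));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  address.sin_port = htons (0);
  if (listener != -1 && client != -1
    && bind (listener, (struct sockaddr *) &address,
      sizeof (address)) == 0
    && getsockname (listener, (struct sockaddr *) &address,
      &length) == 0
    /* listen() is called once, before any accept() */
    && listen (listener, 5) == 0
    /* the connection is queued until we accept it */
    && connect (client, (struct sockaddr *) &address,
      sizeof (address)) == 0)
  {
    length = sizeof (peer);
    data = accept (listener, (struct sockaddr *) &peer, &length);
    length = sizeof (self);
    /* the data socket is distinct from the listening socket, */
    /* and the peer's port is the client's own local port */
    if (data >= 0 && data != listener
      && getsockname (client, (struct sockaddr *) &self,
        &length) == 0
      && peer.sin_port == self.sin_port)
    {
      result = 0;
    }
  }
  if (data >= 0)
    close (data);
  if (client != -1)
    close (client);
  if (listener != -1)
    close (listener);
  return result;
}
```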
Step 4: Sending and Receiving Data
Sending and receiving data can be as straightforward as unix file I/O, providing the socket is connected and no special options have to be set on the data to be sent. Normal unix read() and write() work just fine. However, the socket library does of course provide primitives for sending and receiving data in all cases, and clients generally use those.
The calls are split into two groups, for connected and unconnected sockets. For connected sockets, the send() and recv() calls are essentially the same as write() and read(), except that an additional flags parameter provides a mechanism for associating special options with the data. It's beyond the scope of this article to go into the options, but unix manual pages can provide full details.
For unconnected (that is, UDP) sockets, the calls are similar, but with the addition of separate destination (for sendto()) and source (for recvfrom()) parameters, and associated address size parameters. You must specify the destination for a send on an unconnected socket, and provide space for the source of a packet received on an unconnected socket.
Listing 6: SocketCalls.c
Send
This routine sends the passed-in chunk of data to the other end of
the connection. For unconnected UDP sockets, see SendTo().
int
Send (int aSocket, void *aData, int aLength)
{
  assert (aData);
  assert (aLength);
  if (send (aSocket, aData, aLength, 0) == aLength)
    return 0;
  else
    return errno;
}
SendTo
This routine sends the passed-in chunk of data to the specified address
and port. The socket must be unconnected.
int
SendTo (int aSocket, void *aData, int aLength,
  struct sockaddr_in *aDestination)
{
  assert (aData);
  assert (aLength);
  assert (aDestination);
  /* sendto() returns the number of bytes sent, not zero */
  if (sendto (aSocket, aData, aLength, 0,
    (struct sockaddr *) aDestination,
    sizeof (struct sockaddr_in)) == aLength)
  {
    return 0;
  }
  else
  {
    return errno;
  }
}
Receive
This routine receives data from the other end of a connection, up to
the amount specified in the parameters. For reception on unconnected
UDP sockets, see ReceiveFrom().
int
Receive (int aSocket, void *aData, int *aLength)
{
  int cc;
  assert (aData);
  assert (aLength);
  cc = recv (aSocket, aData, *aLength, 0);
  if (cc >= 0)
  {
    /* note that zero receive generally means orderly shutdown at the other end */
    *aLength = cc;
    return 0;
  }
  else
  {
    *aLength = 0;
    return errno;
  }
}
ReceiveFrom
This routine receives data from an unconnected UDP port.
int
ReceiveFrom
  (int aSocket, void *aData, int *aLength,
  struct sockaddr_in *aSource)
{
  int cc;
  int addressLength;
  assert (aData);
  assert (aLength);
  assert (aSource);
  addressLength = sizeof (struct sockaddr_in);
  cc = recvfrom (aSocket, aData, *aLength, 0,
    (struct sockaddr *) aSource, &addressLength);
  if (cc >= 0)
  {
    *aLength = cc;
    return 0;
  }
  else
  {
    *aLength = 0;
    return errno;
  }
}
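One wrinkle worth knowing about: on a stream socket, send() is allowed to accept fewer bytes than you asked it to, in which case it returns the smaller count and does not set errno. A more defensive approach is to loop until everything has gone. SendAll below is a hypothetical helper of mine, sketched on the assumption of a blocking socket.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>  /* send() */
#include <sys/types.h>   /* ssize_t */

/* hypothetical helper: call send() repeatedly until all of */
/* aData has been handed to the stack; returns 0 or an errno */
int
SendAll (int aSocket, const void *aData, int aLength)
{
  const char *next = aData;
  int remaining = aLength;
  ssize_t cc;

  assert (aData);
  while (remaining > 0)
  {
    cc = send (aSocket, next, remaining, 0);
    if (cc == -1)
    {
      /* interrupted by a signal before sending: try again */
      if (errno == EINTR)
        continue;
      return errno;
    }
    /* a short count is not an error; send the rest next time */
    next += cc;
    remaining -= (int) cc;
  }
  return 0;
}
```

For datagram sockets this loop isn't appropriate, as each send() is a whole packet or nothing.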
Step 5: Shutting Down the Connection Gracefully
TCP clients are happiest if they notify one another when a connection is going down. Amongst other benefits, graceful disconnection means fewer packets transmitted on dead connections, which is a good thing for everyone.
Under MacTCP and OT, orderly disconnection is a rather messy process involving closing the local end and then waiting for the remote to close its end. There are a few different situations to survive and most hackers will probably say this is the gnarliest bit of code for these APIs.
Orderly disconnection under sockets however couldn't be easier, largely because the protocol stack handles all the nastiness for you. The shutdown() call takes two parameters -- the socket to be affected and an integer specifying which directions of the connection to close. If the second parameter is zero, further reception is prevented; if it is 1, further transmission is prevented (and, for TCP, the remote end is told that no more data is coming); if it is 2, both directions are closed. Note that shutdown() does not release the socket descriptor itself; call close() on the socket when you've finished with it.
Listing 7: SocketCalls.c
Disconnect
This routine disconnects both ends of a connected socket.
int
Disconnect (int aSocket)
{
  if (shutdown (aSocket, 2) == 0)
    return 0;
  else
    return errno;
}
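Closing just one direction is useful when you want to say "that's all from me" while still listening for a reply. Here's a sketch of that half-close. HalfCloseDemo is a hypothetical routine of mine. For brevity it uses socketpair() to get two connected stream sockets in one process, as a stand-in for a TCP connection, and it uses the symbolic name SHUT_WR that POSIX systems define for the value 1.

```c
#include <assert.h>
#include <sys/socket.h>  /* socketpair(), shutdown() */
#include <unistd.h>      /* close() */

/* hypothetical demo: returns 0 if, after we shut down our */
/* sending side, the peer sees end-of-stream but can still */
/* send data back to us */
int
HalfCloseDemo (void)
{
  int sv [2];
  char buffer [8];
  int result = -1;

  /* assumption: a connected pair stands in for a TCP connection */
  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) != 0)
    return -1;
  /* no more transmission from our end */
  if (shutdown (sv [0], SHUT_WR) == 0
    /* peer's receive returns zero: orderly end of stream */
    && recv (sv [1], buffer, sizeof (buffer), 0) == 0
    /* but the other direction is still open */
    && send (sv [1], "bye", 3, 0) == 3
    && recv (sv [0], buffer, sizeof (buffer), 0) == 3)
  {
    result = 0;
  }
  /* shutdown() doesn't free the descriptors; close() does */
  close (sv [0]);
  close (sv [1]);
  return result;
}
```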
Domain Name Lookups
Domain name lookups are generally performed by one routine -- gethostbyname(). This call takes one parameter -- a C string (that is, zero terminated) specifying the name to look up. It returns a pointer to a static structure describing the host, with IP addresses and other names that host might have, amongst other details.
The kicker with the structure returned by gethostbyname() is that it contains a list of pointers to IP addresses, rather than copies of them. This is to enable a certain amount of address size independence. Hence one must copy the addresses out of the structure. I've lost count of the number of times I've forgotten this detail and sat there wondering why the addresses I'm getting back are complete crap.
Listing 8: SocketCalls.c
LookupName
This routine looks up the passed name and places the first matching
address in the passed output parameter.
int
LookupName
  (char *aHostName, struct sockaddr_in *aAddress)
{
  struct hostent *hp;
  assert (aHostName);
  assert (aAddress);
  hp = gethostbyname (aHostName);
  if (hp)
  {
    /* size of address returned is protocol-dependent */
    memcpy (&aAddress->sin_addr, hp->h_addr_list [0],
      hp->h_length);
    return 0;
  }
  else
  {
    return h_errno;
  }
}
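Later POSIX systems also provide getaddrinfo(), which hands back ready-made socket address structures and so sidesteps the pointer-copying trap. It isn't part of the original BSD library, so whether your system has it is an assumption. LookupNameIPv4 below is a hypothetical helper of mine, sketched as an alternative to the routine above rather than a replacement for it.

```c
#include <assert.h>
#include <netdb.h>       /* getaddrinfo(), freeaddrinfo() */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdio.h>       /* snprintf() */
#include <string.h>      /* memset(), memcpy() */
#include <sys/socket.h>
#include <sys/types.h>

/* hypothetical helper: resolve aHostName to its first IPv4 */
/* address and fill in the whole of aAddress, port included; */
/* returns 0 or a getaddrinfo() error (see gai_strerror()) */
int
LookupNameIPv4 (const char *aHostName, unsigned short aPort,
  struct sockaddr_in *aAddress)
{
  struct addrinfo hints;
  struct addrinfo *results = NULL;
  char service [8];
  int error;

  assert (aHostName);
  assert (aAddress);
  /* restrict the answer to IPv4 stream addresses */
  memset (&hints, 0, sizeof (hints));
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;
  /* the port is passed as a string, in host format */
  snprintf (service, sizeof (service), "%u", (unsigned) aPort);
  error = getaddrinfo (aHostName, service, &hints, &results);
  if (error != 0)
    return error;
  /* copy the first address out before freeing the list */
  memcpy (aAddress, results->ai_addr, sizeof (*aAddress));
  freeaddrinfo (results);
  return 0;
}
```

The result comes back with the family, IP number and port already set, all in network format where appropriate, so it can be passed straight to connect().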
Building a Document Server
This is a simple HTTP-type document server that loops accepting TCP connections on a user-specified port number (defaulted to 8000). When the program successfully accepts an incoming connection, it expects a file name followed by a carriage return or line feed. The server then attempts to open the file and send the contents down the TCP connection. Server-side errors are signified by closing the connection before any content is sent.
Listing 9: DocumentServer.c
main
Mainline for the document server.
#include <errno.h> /* for errno extern */
#include <fcntl.h> /* for file open mode */
#include <libc.h> /* for generic ANSI stuff */
#include <netdb.h> /* for lookups */
#include <netinet/in.h> /* for PF_INET stuff */
#include <stdio.h> /* for printf() */
#include <sys/socket.h> /* for socket API */
#define kDefaultPortNumber 8000
#define assert(x) if(!(x)) \
printf("%s,%d: assertion failure\n", __FILE__, __LINE__);
int
main (int argc, char *argv [])
{
  char buffer [1024];
  int acceptingSocket;
  int amountRead;
  int amountReceived;
  int dataSocket;
  int fd;
  int i;
  int portNumber;
  struct sockaddr_in local;
  struct sockaddr_in remote;
  /* make a socket */
  if (OpenTCPSocket (&acceptingSocket) != 0)
  {
    perror ("socket");
    exit (1);
  }
  /* allow setting of port number by arguments */
  if (argc > 1)
  {
    portNumber = atoi (argv [1]);
    if (portNumber == 0)
      portNumber = kDefaultPortNumber;
  }
  else
  {
    portNumber = kDefaultPortNumber;
  }
  printf ("listening on port %d \n", portNumber);
  /* set up our bind address */
  /* note the address is zero, for the local address */
  /* the port number is expected in host format */
  SetAddress (&local, 0, portNumber);
  if (Bind (acceptingSocket, &local) != 0)
  {
    perror ("bind");
    exit (1);
  }
  /* loop accepting incoming connections */
  while (Accept(acceptingSocket, &remote, 5, &dataSocket)==0)
  {
    amountReceived = sizeof (buffer);
    if (Receive (dataSocket, buffer, &amountReceived) == 0)
    {
      /* assumption: we receive the file name in one chunk */
      for (i = 0; i < amountReceived; i++)
      {
        if (buffer [i] == '\r' || buffer [i] == '\n')
        {
          buffer [i] = 0;
          break;
        }
      }
      if (i == amountReceived)
      {
        /* we didn't find a CR/LF in the chunk */
      }
      else
      {
        /* open up the file */
        fd = open (buffer, O_RDONLY);
        if (fd == -1)
        {
          /* couldn't open the file */
          perror (buffer);
        }
        else
        {
          do
          {
            /* read a chunk from the file */
            amountRead = read (fd, buffer, sizeof (buffer));
            if (amountRead > 0)
            {
              /* send what we got down the connection */
              if (Send
                (dataSocket, buffer, amountRead) != 0)
              {
                /* the initiator has probably closed */
                perror ("send");
                break;
              }
            }
          }
          while (amountRead == sizeof (buffer));
          /* finished with the file */
          close (fd);
        }
      }
    }
    else
    {
      /* couldn't read from the connection */
      /* the remote end has probably closed */
      perror ("recv");
    }
    Disconnect (dataSocket);
    /* shutdown() doesn't release the socket; close() does */
    close (dataSocket);
  }
  perror ("accept");
  exit (1);
  /* NOTREACHED */
}
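The server assumes that the whole file name arrives in one chunk. TCP makes no such promise, so a sturdier version would keep receiving until it sees the terminator. ReceiveLine below is a hypothetical helper of mine showing one way to do that. It is a sketch only: it discards anything that arrives after the terminator, and it returns -1 if the peer closes or the buffer fills before a terminator turns up.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>  /* recv() */
#include <sys/types.h>   /* ssize_t */

/* hypothetical helper: receive until CR or LF is seen, leaving */
/* a zero-terminated line in aLine; returns 0 on success, an */
/* errno value on a receive error, or -1 if no terminator came */
int
ReceiveLine (int aSocket, char *aLine, int aSize)
{
  int used = 0;
  int i;
  ssize_t cc;

  assert (aLine);
  assert (aSize > 1);
  while (used < aSize - 1)
  {
    /* append whatever arrives next to what we already have */
    cc = recv (aSocket, aLine + used, aSize - 1 - used, 0);
    if (cc == -1)
    {
      if (errno == EINTR)
        continue;
      return errno;
    }
    if (cc == 0)
      return -1;  /* peer closed before finishing the line */
    /* only the newly arrived bytes need to be scanned */
    for (i = used; i < used + (int) cc; i++)
    {
      if (aLine [i] == '\r' || aLine [i] == '\n')
      {
        aLine [i] = 0;
        return 0;
      }
    }
    used += (int) cc;
  }
  return -1;  /* buffer full and still no terminator */
}
```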
Further Reading
Any unix manual pages will give you copious information on the socket library. Simply type "man socket" at any available shell prompt. Unfortunately however, the man pages have a tendency to provide amazing detail without actually telling you how to do anything. When I first got into networking, the unix manual pages misled me so much that I actually thought AppleTalk was easier to use than sockets. Nothing could be further from the truth.
There are several good books on the socket library in particular and TCP/IP in general. Lots of people learned their TCP/IP from Douglas Comer's seminal work "Internetworking with TCP/IP", in at least three volumes with lots of quality info.
Also worth a look is W. Richard Stevens' Addison-Wesley series "TCP/IP Illustrated", in three thicker volumes. These live up to the high standard maintained by A-W's Professional Computing Series.
Jason Proctor is Minister of Ideology for BroadQuest Inc, a startup doing cool things with the network. In past lives he has emulated PCs for Insignia, done the networking for the cool but ill-fated Software Ventures web browser, and worked on reliable multicasting for GlobalCast. Even though he finds himself writing Unix code more often than he'd like, he considers himself a Macintosh bigot.