CGIs and WebSTAR
Volume Number: 11
Issue Number: 7
Column Tag: Internet Development
CGI Applications and WebSTAR
Have some fun with your World Wide Web server.
By Jon Wiederspan, jonwd@first.com
Note: Source code files accompanying article are located on MacTech CD-ROM or source code disks.
[The World Wide Web is changing rapidly and announcements of new companies, mergers, licensing agreements, and new WWW software are arriving with increasing frequency. In light of this, there is a high probability that anything printed about the Web is out of date by the time you read it. Even as we printed our previous article introducing MacHTTP, it was being licensed to StarNine Technologies, Inc. and relabeled WebSTAR. More information on this is available at http://www.starnine.com/.
WebSTAR is now available in several flavors depending on your budget and needs. Every WebSTAR version, though, uses the same interface to CGI applications that was originally part of MacHTTP. So, whether you are using WebSTAR Pro or still hanging on to your MacHTTP 2.0.1 server, the code introduced in this article will work for you.
Welcome to the Web.
- Ed nst, editorial@xplain.com]
In my previous article (May 1995), I introduced you to WebSTAR (formerly MacHTTP), a software package that will turn your Macintosh into a World Wide Web server. WebSTAR is the perfect solution for anyone who wants to put documents and other information on the WWW, but if you're like me, that's not enough. You can see all those other sites with cool maps and forms and you want the same. Actually, you want better! Well, I'm here to show you how easy it is to add these functions to your WebSTAR site using AppleScript and a few items you probably have in your kitchen, like chocolate.
This article is intended for those who already have some programming or AppleScript background. It is not intended as a complete introduction to AppleScript. Judging by the mail I have already received about these scripts, it doesn't take much experience at all and even some complete novices have successfully used them, but you shouldn't expect miracles (at least not here). These lessons will make things easier for most people, but they still require work and study. Take your time, think about what is said here, challenge anything that seems to be wrong (it's not inconceivable that I get something wrong here) and do things in order as much as possible.
What does CGI mean?
CGI stands for Common Gateway Interface. There. Now you know. The CGI definition provides a standard for external gateway applications to interface with information servers such as WebSTAR. These gateway applications are used to provide new features to a WWW site either by acting as a gateway to another application on the server (such as searching a database or creating charts in a spreadsheet) or by processing the data themselves (such as a map).
The CGI standard controls what kind of information is passed from the information server to the external gateway application and how the information is formatted. It does not control how the information is passed, though. That is left up to each server to implement in the best way possible for that system. WebSTAR uses Apple events to communicate with CGI applications. This means that you can write your CGI application in any language that can handle Apple events, which includes not only all of the major languages (C, Pascal, LISP, SmallTalk, Fortran), but also scripting languages like AppleScript, Frontier, and MacPerl, and even applications with scripting languages that qualify such as HyperCard and 4D.
Although AppleScript CGI applications are not the fastest, this is what I'll be showing you because they are the easiest to understand. AppleScript relieves you from almost all of the tedium of making an interface, initializing various things, and registering Apple events. It is also a language that somewhat resembles the English language, unlike C, which most closely resembles a bowl of alphabet soup.
How CGI Applications Work - The Basics
When trying to understand how CGI applications work, the first thing to learn is the difference between the server and the client. WebSTAR is the server. Mosaic, MacWeb, and Netscape are all clients. The client software is smart. It knows how to interpret HTML, how to handle partial URLs and URLs with strange extra information like search strings, and how to find servers. The server software is stupid. It does fine as long as it is fed the right information, but if a client sends bad information, the server has no idea what to do. Therefore, in most transactions it is the client that's doing all the cool stuff and the server is just passing back a file or an error code.
Figure 1. Client-Server interaction in HTTP
First, let's look at a typical client-server transaction on the Web. Here's an example of the interaction that might occur when the user wants an HTML page:
Netscape: My user clicked on a link to URL http://www.uwtc.washington.edu/UWHome.html. I'll send a request for the document /UWHome.html to machine www.uwtc.washington.edu using HTTP.
WebSTAR: Ah! So you want one of my pages. Here is all the data from the URL you sent. I'm sending it as a text/html MIME type.
Netscape: Here it comes. The MIME type is text/html. I'll interpret this as an HTML document then and display it correctly for my user.
WebSTAR: I couldn't care less how you display it. I'm done dumping the data so goodbye.
This example is extremely simplified and ignores a lot of the communications that occur with HTTP, but it does show you that the client is responsible for properly placing the request to the correct server. All the server does is try to find the file and return it or return an error code. Once the server is done returning the file, it completely forgets that the client was ever there.
When the user requests a URL that involves a CGI application (like clicking on a map or submitting a form) things become a little more complicated. There is not only interaction between the client and server, but also between the server and the CGI application. Let's take a look now at the conversation that might occur between a client, server, and CGI when handling a map click (assuming they speak English, of course).
User: Hmmmm. A map of Washington state. There's a star in the upper right-hand corner. I wonder what that's for. I think I'll click on it.
Netscape: Let's see, that click was at 287,48. I'll add that to the URL that was given with the map and send it to the server.
WebSTAR: Hey, someone sent me a URL with some extra data. That URL is for a CGI application on this machine. Well, I'll just send an Apple event to that CGI application with the extra data enclosed. I'm glad I don't have to do anything to the data myself.
CGI app: Finally, an Apple event! Let's see, first I decode the extra data that was sent. Now I can use these map click coordinates to figure out what page to return to the client. Here it is - I'll send the server an Apple event reply, containing the URL for the new page and an HTTP header with the code to redirect the client to that page.
WebSTAR: Finally. I've been waiting for this Apple event reply. I'll just repackage this and send it back to the client. I sure hope the CGI application remembered to include a proper HTTP header. I don't check that sort of thing myself.
Netscape: Hmmm. This code tells me that I should get this other URL instead of the one I originally requested. I'll send a request to the server for this other page.
WebSTAR: Another request for one of my pages. Here's the file contents as text/html.
Netscape: Here comes another HTML page. Better display it nicely for my user.
User: Republic, Washington? I've never even heard of that place before!
Well, that was a bit long, but it should give you an idea of the complex interactions that go on when you are using a CGI application. The server still isn't being very bright. As you saw, the server didn't really do any processing of the data that was passed to it. It just acted as a liaison between the client and the CGI application, blindly passing whatever those two wanted to send to each other. For more information, see the interactions in Figure 2.
Figure 2. Complex interactions between client, server, and CGI.
How CGI Applications Really Work - The Less Basics
Now that you have an idea of where all of the information is going and who's passing it where, it is time to cover some of the more technical points.
Apple events and CGIs
WebSTAR uses the name extension of a file to determine whether it is a CGI application or not. Any file which ends in .cgi or .acgi will be treated as a CGI application, whether it actually is one or not. CGI applications can exist anywhere in the WebSTAR directory structure so be certain never to use that extension for a file unless you are certain it can handle the CGI Apple event.
The Apple event used by WebSTAR to send information to CGI applications is WWWΩ sdoc. There is another event that WebSTAR supports which is often referred to as the search event (WWWΩ srch). This is a remnant from the days before CGI support and there is no guarantee that support for it will remain in future versions, so avoid using it.
There are two methods for sending information to CGI applications, the GET and POST methods. The two serve very similar functions but the POST method has the advantage because it can send more information to the server (24K for POST method vs. 2K for GET method). Because of that I will cover only the POST method. If you feel a need to use GET at some time, it is very easy to convert the code.
There are two ways to run a CGI application: synchronously and asynchronously. When run synchronously, WebSTAR waits for information to return from the CGI. This makes most CGI applications run faster, but it also ties up your entire server while the CGI is running. When you run asynchronously, the server doesn't wait for the CGI application but instead goes on processing connections. This steals processing time so the CGI application will run more slowly, but you won't have connections piling up. Once again, WebSTAR uses the name extension to tell which method you want used. Any file ending in .cgi is run synchronously and any ending in .acgi is run asynchronously. It is a little more complex to write an asynchronous CGI application (ACGI) because there is the possibility that while the ACGI is running, WebSTAR will send it another event. These events queue up, waiting for the previous event to finish processing. You need to be certain to leave some code that will pause after finishing one event so the ACGI has time to check whether there is another event queued up. Because of this, any ACGI can also run as a CGI, but the reverse is not always true. All of the code provided here can be used to create both CGI and ACGI applications without modification.
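The extension rule can be restated as a few lines of AppleScript. This is only an illustration of the rule described above, not WebSTAR's own code; the handler name run_mode_for is invented for the example.

```applescript
-- Hypothetical helper: restates the extension rule from the text.
-- It is not part of WebSTAR.
on run_mode_for(file_name)
    if file_name ends with ".acgi" then
        return "asynchronous"
    else if file_name ends with ".cgi" then
        return "synchronous"
    else
        return "not a CGI"
    end if
end run_mode_for
```

For example, run_mode_for("Map.acgi") gives "asynchronous", while a plain page such as "Home.html" gives "not a CGI".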
WebSTAR does not do any processing of the actual data going to the CGI application or of pages being returned by the CGI application to the client. The only things WebSTAR does are package the data in an Apple event (to go to the CGI) and remove it from the Apple event reply (to return to the client). Therefore, any errors that occur are almost always the fault of either the CGI application or the client.
Return Codes
When WebSTAR sends an Apple event to a CGI application, it waits for an Apple event Reply in return. As I noted above, WebSTAR doesn't do any processing itself. Therefore, the Apple event Reply returned by the CGI application must contain instructions for WebSTAR on what to tell the client. There are two things the CGI must send back: an HTTP header and some data. The header tells the client what happened (success, error) and what to do with the data that is being returned. The data can be a block of text, an HTML document, or the URL for a file. To indicate what the client should do with the data, the header contains a special Return Code. These codes are part of the HTTP standard and there are many of them. Typically, though, there are only two that your CGI application should return, 200 OK or 302 FOUND. 200 OK is used to tell the client that the data being returned is a file and the client should try to display it. The 302 FOUND code tells the client that the data is a URL and the client will then try to connect to that URL. This 302 FOUND code is also called URL Redirection, because it redirects the client to a different URL.
In this lesson I will be using the 200 OK code in the header and returning a block of HTML text. At the end of the lesson, I will also include some sample code with a 302 FOUND header to show how that works a little differently.
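To make the two return codes concrete, here is a minimal sketch of a helper that builds either kind of header. The handler name build_header and its two parameters are invented for illustration and are not part of WebSTAR; the Server line is copied from the listings in this article, and crlf is the carriage return plus linefeed pair that ends each header line.

```applescript
property crlf : (ASCII character 13) & (ASCII character 10)

-- Hypothetical helper, not part of WebSTAR.
-- status_line is "200 OK" or "302 FOUND".
-- extra_line is "Content-type: text/html" for a page,
-- or "Location: " followed by a URL for a redirect.
on build_header(status_line, extra_line)
    return "HTTP/1.0 " & status_line & crlf ¬
        & "Server: WebSTAR/1.0 ID/ACGI" & crlf ¬
        & extra_line & crlf & crlf
end build_header
```

A page reply would start with build_header("200 OK", "Content-type: text/html") followed by the HTML text. A redirect would consist of build_header("302 FOUND", "Location: " & some_url) alone, where some_url is whatever URL you want the client to fetch.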
The Basic CGI Application
Now you're ready to dig into some code. This month we are going to make the "Hello, world." equivalent for CGI applications. It won't do anything very exciting, but it will be a CGI application. When run, this CGI application will return an HTML page listing all of the data passed to it by WebSTAR. This will be very useful later when you're debugging your other CGI applications.
With this lesson you will learn:
how to accept the Apple event information from WebSTAR when the POST method is used
what information is passed from WebSTAR to the CGI application
how to build an HTTP header and an HTML page to return to WebSTAR
how to return the header and page to WebSTAR
This code is written in AppleScript. If you have System 7.5, you probably got AppleScript free on your system installation disks. If not, you will need to purchase AppleScript from APDA or find an AppleScript book that includes the ScriptEditor on disk. The latter is preferable, since you will need an AppleScript book in order to create your own CGI applications after you are through here.
The AppleScript Code
Here is the entire code of this lesson. A text file of the code is available at the usual online sources and on the monthly disk.
Listing 1: BasicCGI.txt
property crlf : (ASCII character 13) & (ASCII character 10)
property http_10_header : "HTTP/1.0 200 OK" & crlf ¬
& "Server: WebSTAR/1.0 ID/ACGI" & crlf ¬
& "MIME-Version: 1.0" & crlf & "Content-type: text/html" ¬
& crlf & crlf
on «event WWWΩsdoc» path_args ¬
        given «class kfor»:http_search_args, «class post»:post_args, ¬
        «class meth»:method, «class addr»:client_address, ¬
        «class user»:username, «class pass»:password, ¬
        «class frmu»:from_user, «class svnm»:server_name, ¬
        «class svpt»:server_port, «class scnm»:script_name, ¬
        «class ctyp»:content_type, «class refr»:referer, ¬
        «class Agnt»:user_agent, «class Kact»:action, ¬
        «class Kapt»:action_path, «class Kcip»:client_ip, ¬
        «class Kfrq»:full_request
    set return_page to http_10_header ¬
        & "<HTML><HEAD><TITLE>Unprocessed Results</TITLE></HEAD>" ¬
        & "<BODY><H1>Unprocessed Results</H1>" & return ¬
        & "<H4>path_args</H4>" & return & path_args & return ¬
        & "<H4>http_search_args</H4>" & return & http_search_args
    set return_page to return_page & return ¬
        & "<H4>post_args</H4>" & return & post_args & return ¬
        & "<H4>method</H4>" & return & method & return ¬
        & "<H4>client_address</H4>" & return & client_address ¬
        & return ¬
        & "<H4>username</H4>" & return & username & return ¬
        & "<H4>password</H4>" & return & password & return
    set return_page to return_page & return ¬
        & "<H4>from_user</H4>" & return & from_user & return ¬
        & "<H4>server_name</H4>" & return & server_name & return ¬
        & "<H4>server_port</H4>" & return & server_port & return ¬
        & "<H4>script_name</H4>" & return & script_name & return ¬
        & "<H4>content_type</H4>" & return & content_type & return ¬
        & "<H4>referer</H4>" & return & referer & return
    set return_page to return_page & return ¬
        & "<H4>user_agent</H4>" & return & user_agent & return ¬
        & "<H4>action</H4>" & return & action & return ¬
        & "<H4>action_path</H4>" & return & action_path & return ¬
        & "<H4>client_ip</H4>" & return & client_ip & return ¬
        & "<H4>full_request</H4>" & return & full_request & return
    set return_page to return_page ¬
        & "<HR><I>Results generated at: " & (current date) ¬
        & "</I>" & "</BODY></HTML>"
    return return_page
end «event WWWΩsdoc»
If you have never used AppleScript before, there are several special characters used in this AppleScript that you should know about:
« THIS IS NOT TWO LEFT BRACKETS. It is a special character used to mark the beginning of data. It is typed as option-\.
» The same thing, only marking the end of data. It is typed as option-shift-\.
¬ The continuation marker. This indicates that the next line is to be considered part of the current line. You can use this to break up very long lines for easier reading. This is made in AppleScript by typing option-return.
Ω This is the capital omega character. It is part of the name of the event passed from WebSTAR to CGI applications. It is typed as option-z.
Step By Step
Let's walk through this code line by line.
property crlf : (ASCII character 13) & (ASCII character 10)
property http_10_header : "HTTP/1.0 200 OK" & crlf ¬
& "Server: WebSTAR/1.0 ID/ACGI" & crlf ¬
& "MIME-Version: 1.0" & crlf & "Content-type: text/html" ¬
& crlf & crlf
This is actually only two lines (see the continuation character mentioned above). These lines are used to create variables that are needed when a CGI application will be returning an HTML document. By creating and setting these variables outside of the Apple event handler, they persist as long as the CGI application stays open, thus saving some minuscule amount of time when successive forms are processed. For more information on script properties, see your favorite AppleScript book.
The first line creates a variable called crlf and sets it to be equivalent to a carriage return and a linefeed (thus the name, crlf). This combination is used to mark the end of lines in the header of an HTTP transaction. It is very important! The second line creates a variable called http_10_header. This text forms the standard header for HTTP version 1.0 that is returned in a transaction between an HTTP server and client when the transaction is successful. In this case you are returning code 200 OK, which means that the contents of a file will be returned. You are also telling the client the MIME type of the information that you are returning, so the client will know how to display it properly. If you don't build a proper header, the client won't know what to do with the text you are returning. Remember, WebSTAR doesn't do anything but pass the data back to the client. These two properties can be used with any CGI application you write that will be returning HTML text (as opposed to returning the URL to a file).
This next long line is the key to the entire script.
on «event WWWΩsdoc» path_args ¬
given «class kfor»:http_search_args, «class post»:post_args, ¬
«class meth»:method, «class addr»:client_address, ¬
«class user»:username, «class pass»:password, ¬
«class frmu»:from_user, «class svnm»:server_name, ¬
«class svpt»:server_port, «class scnm»:script_name, ¬
«class ctyp»:content_type, «class refr»:referer, ¬
«class Agnt»:user_agent, «class Kact»:action, ¬
«class Kapt»:action_path, «class Kcip»:client_ip, ¬
«class Kfrq»:full_request
This line defines a handler that is triggered when the CGI application receives an Apple event of class WWWΩ and type sdoc. The sdoc event is the one that WebSTAR sends when it is communicating with CGI applications. The items following all contain information passed to the CGI application by WebSTAR. There are lots of them and more are added every year. Most of these are not used in every script, but they're very handy when you need them. It doesn't cost your CGI any memory or processing speed to include all of these variables, so I recommend that you use this line in every script, just to be sure you don't miss any. The variables are:
path_args - This is the direct argument for this Apple event. This is the data in the URL following the $ character (we won't get into that here).
http_search_args - The data passed in when using the GET method or when ? is appended to the URL of the CGI application. This argument will be empty when you're using the POST method. You can actually use both at once, though.
post_args - The data passed in when using the POST method. Contains all of the information that was typed into the form.
method - Tells whether GET or POST was used so you know which argument to use.
client_address - IP address or domain name of the remote client's host, depending on the server's NO_DNS setting.
username - The username given by the client (if you're using the security features).
password - The password given by the client (if you're using the security features).
from_user - Non-standard. May contain the e-mail address of the remote user but it isn't supported by all browsers.
server_name - The name of the requesting server (which WebSTAR you're using).
server_port - TCP/IP port number being used by this server.
script_name - path and filename portion of the URL that was sent to the server for this CGI.
referer - the URL of the page referencing this CGI.
user_agent - the name and version of the WWW client software being used.
content_type - MIME content type of post_args
action - if the CGI is called by an ACTION, this will contain either PREPROCESSOR or POSTPROCESSOR. If it is called as a .cgi or .acgi it will contain CGI or ACGI.
action_path - for an ACTION, this will contain the Macintosh path to the file. For a CGI, it will be the same as script_name.
client_ip - contains the IP address of the client as a string, even if NO_DNS is false.
full_request - contains the unmodified text of the complete request as received from the WWW client.
Generally, the only variable you care about is post_args, since that contains all of the data that the user entered. All of the others (except path_args, http_search_args, and full_request) are generated automatically by WebSTAR. Some of them, like username and password, only have data under certain conditions, such as when the access to the CGI has been restricted and the user had to supply a name and password. Others, like referer and from_user, aren't supported by all clients and therefore might be empty. This script shows them all, even if they're empty, so you can get a feel for how the information is passed by different clients.
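As a rough sketch of how a script might pick up the form data, the two handlers below choose between post_args and http_search_args based on method, then split the result into fields. Both handler names are invented for this example. The sketch assumes the data arrives in the usual form-encoded shape (name=value pairs joined by &), and it leaves the values encoded rather than translating + and %xx sequences.

```applescript
-- Hypothetical helpers for illustration; not part of WebSTAR.

-- Return whichever argument actually carries the form data.
on form_data(method, http_search_args, post_args)
    if method is "POST" then
        return post_args
    else
        return http_search_args
    end if
end form_data

-- Split "name=Jon&city=Seattle" into {"name=Jon", "city=Seattle"}.
-- Values stay encoded: "+" and "%xx" are not translated here.
on split_fields(arg_text)
    set saved_delims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to "&"
    set field_list to text items of arg_text
    set AppleScript's text item delimiters to saved_delims
    return field_list
end split_fields
```

Inside the event handler you could then write split_fields(form_data(method, http_search_args, post_args)) and loop over the resulting list. Restoring the text item delimiters afterward matters, since they are shared by the whole script.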
The next lines are the actual processing part of this script.
set return_page to http_10_header ¬
& "<HTML><HEAD><TITLE>Unprocessed Results</TITLE></HEAD>" ¬
& "<BODY><H1>Unprocessed Results</H1>" & return ¬
& "<H4>path_args</H4>" & return & path_args & return ¬
& "<H4>http_search_args</H4>" & return & http_search_args
set return_page to return_page & return ¬
& "<H4>post_args</H4>" & return & post_args & return ¬
& "<H4>method</H4>" & return & method & return ¬
& "<H4>client_address</H4>" & return & client_address ¬
& return ¬
& "<H4>username</H4>" & return & username & return ¬
& "<H4>password</H4>" & return & password & return
set return_page to return_page & return ¬
& "<H4>from_user</H4>" & return & from_user & return ¬
& "<H4>server_name</H4>" & return & server_name & return ¬
& "<H4>server_port</H4>" & return & server_port & return ¬
& "<H4>script_name</H4>" & return & script_name & return ¬
& "<H4>content_type</H4>" & return & content_type & return ¬
& "<H4>referer</H4>" & return & referer & return
set return_page to return_page & return ¬
& "<H4>user_agent</H4>" & return & user_agent & return ¬
& "<H4>action</H4>" & return & action & return ¬
& "<H4>action_path</H4>" & return & action_path & return ¬
& "<H4>client_ip</H4>" & return & client_ip & return ¬
& "<H4>full_request</H4>" & return & full_request & return
set return_page to return_page ¬
& "<HR><I>Results generated at: " & (current date) ¬
& "</I>" & "</BODY></HTML>"
These lines create a variable called return_page which will hold the entire text that we plan to return to WebSTAR when we're finished. return_page is set to include:
the HTTP 1.0 Header that we made earlier (http_10_header)
a header for an HTML page (HTML, HEAD, and TITLE tags)
the body of an HTML page with the BODY tag, a nice H1 heading, and the contents of each of the arguments passed.
This page will provide the user with a nicely formatted page where each argument has its name and data on a separate line. I have used several assignment statements to put the text variable together for ease of reading and to avoid problems. There is a limit to the number of characters that can be assigned to a variable at one time in AppleScript and trying to do this as one assignment might come close to that limit.
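If the repetition in those assignments bothers you, one alternative is to move the heading-and-value pattern into a small handler. This is a sketch only; format_arg is a name invented for this example and does not appear in the listing.

```applescript
-- Hypothetical helper: one H4 heading plus one value,
-- laid out the same way as each argument in Listing 1.
on format_arg(arg_name, arg_value)
    return "<H4>" & arg_name & "</H4>" & return & arg_value & return
end format_arg
```

Each argument then takes one short statement, such as set return_page to return_page & format_arg("method", method), which also keeps every assignment well away from any length limit.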
return return_page
This line returns the contents of return_page to WebSTAR in an Apple event reply. As mentioned earlier, WebSTAR will not process this returned information at all, except to extract it from the Apple event reply structure and pass it along to the WWW client. The client will then display it properly as an HTML page. You can see now why we had to build the whole header ourselves (or the client wouldn't know what was being sent to it).
As soon as the return command is processed, the event handler is done. Any code that you put after the return will not be processed at all. This can make things a little difficult if you are doing a lot of processing because it means you have to do all of the processing before you return information to WebSTAR. There are ways to get around this, but none of them are straightforward. If you have a lot of processing to do, it is probably best to write the CGI in C instead to get the fastest response.
end «event WWWΩsdoc»
This is the last line of the handler. It tells AppleScript that the handler is finished and we're all done processing the Apple event that was sent. If no return statement has been encountered before this, WebSTAR will hang up waiting for information to be returned. This will usually result in an error returned to the client which says Error -1701, meaning that WebSTAR timed out waiting for return information or failed to get the proper information in what was returned.
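One way to make sure something always comes back is to wrap the processing in a try block, so that a script error produces an error page instead of no reply at all. The sketch below is an assumption about how you might structure this, not code from the listing: process_form stands in for whatever work your CGI does, and safe_reply is an invented name.

```applescript
property crlf : (ASCII character 13) & (ASCII character 10)
property http_10_header : "HTTP/1.0 200 OK" & crlf ¬
    & "Server: WebSTAR/1.0 ID/ACGI" & crlf ¬
    & "MIME-Version: 1.0" & crlf & "Content-type: text/html" ¬
    & crlf & crlf

-- Hypothetical stand-in for the real work of your CGI.
on process_form(post_args)
    if post_args is "" then error "No form data was received."
    return "<P>Received: " & post_args & "</P>"
end process_form

-- Hypothetical wrapper: always hands back a complete page,
-- even when process_form fails.
on safe_reply(post_args)
    try
        set page_body to process_form(post_args)
    on error err_text number err_num
        set page_body to "<H1>CGI Error</H1><P>" & err_text ¬
            & " (" & err_num & ")</P>"
    end try
    return http_10_header & "<HTML><BODY>" & page_body & "</BODY></HTML>"
end safe_reply
```

Inside the event handler the last statement would then be return safe_reply(post_args), so every path through the handler ends in a return.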
Saving As A CGI Application
In order to save the above code or whatever AppleScript code you write as a CGI application, you need to follow these steps:
1) Click on the Check Syntax button at the top right of the ScriptEditor window. This will check your AppleScript code for syntax errors and ask you to locate any applications that are specifically mentioned (there aren't any in this basic script).
2) Select Save As Run-Only from the File menu. Remember that this option strips the AppleScript text out of the final product, so don't throw away your original script.
3) Select a name for the CGI application. Give it the extensions .acgi or .cgi depending on how you want WebSTAR to run it.
4) Select Application from the popup menu above the file name (see Figure 3).
5) Check the two boxes for Stay Open and Never Show Startup Screen located below the file name. These are very important! Without these two options checked, the CGI application will not be able to launch properly or process the Apple events.
6) Select the folder you want to save the CGI application into and click on the Save button.
Figure 3. The Save As Run-Only dialog
Most of the problems that people report to me are solved by properly following the steps above.
URL Redirection
As promised, here is a short section of code that demonstrates URL Redirection. Unlike above, where we returned the contents of the page we wanted the client to display, with URL Redirection we return the location of the page (or any file). This can be useful in two ways. First, you can use it to return a file that is more than 32K in size (POST arguments are limited to 32K of information). Instead of returning the data, save it to a file then use URL Redirection to return the URL to the file. Second, you can use this to have a CGI redirect the client to an existing file based on the input. This is how map graphics typically work - the client is sent to different pages on the site depending on where the user clicks on the map graphic.
The major difference in this code example is that the header contains a different code, the 302 FOUND. 302 FOUND tells the client that a file was found and the URL to it is being returned. That causes the client to try to connect to that URL and to the user it looks like the client went directly there. There are some other small differences in the way the URL is added to the header, though, so be certain to read through it carefully. Also, I have removed all of the extraneous code so just the bare bones are left. In this case, what you have is a script to redirect clients from one server to the identical page on another server. If you want to move your server, you can leave this on the old server as Error.acgi. Now remove all of the files from the old server so that every request generates an error. Set the Error file in WebSTAR to be Error.acgi and then WebSTAR will launch this CGI application to handle the errors.
This script takes the name of the path and file requested (which is passed in script_name) and appends it to the name of the new server to create a new URL. It then adds this to a Redirect header to tell the client to retrieve the new page instead.
Listing 2: Redirect.txt
property crlf : (ASCII character 13) & (ASCII character 10)
property http_10_header : "HTTP/1.0 302 FOUND" & crlf & ¬
"Server: WebSTAR/1.0 ID/ACGI" & crlf ¬
& "Location: http://www.uwtc.washington.edu"
on «event WWWΩsdoc» path_args given «class scnm»:script_name
    -- adds the path and file name to the new server name to
    -- redirect the client to the new server.
    set return_page to http_10_header & script_name & crlf & crlf
    return return_page
end «event WWWΩsdoc»
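The same header works for the map case mentioned earlier. The sketch below assumes the click arrives as text of the form 287,48 (which argument carries it depends on how your map is set up). The handler names, the single rectangular region, and the example.com URLs are all invented for illustration.

```applescript
property crlf : (ASCII character 13) & (ASCII character 10)

-- Hypothetical: turn "287,48" into the list {287, 48}.
on parse_click(click_text)
    set saved_delims to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ","
    set parts to text items of click_text
    set AppleScript's text item delimiters to saved_delims
    return {(item 1 of parts) as integer, (item 2 of parts) as integer}
end parse_click

-- Hypothetical region and URLs, invented for this sketch.
on url_for_click(x, y)
    if x > 200 and y < 100 then
        return "http://www.example.com/northeast.html"
    else
        return "http://www.example.com/default.html"
    end if
end url_for_click

-- Build the complete 302 FOUND reply for one click.
on redirect_reply(click_text)
    set {x, y} to parse_click(click_text)
    return "HTTP/1.0 302 FOUND" & crlf ¬
        & "Server: WebSTAR/1.0 ID/ACGI" & crlf ¬
        & "Location: " & url_for_click(x, y) & crlf & crlf
end redirect_reply
```

A real map would test the click against several regions, but the shape stays the same: decode the coordinates, choose a URL, and return it in a Location line.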
Wrap It Up
Now you have everything you need to begin building your own CGI applications for use with WebSTAR forms, maps, or whatever. An awful lot of information was glossed over or left out entirely due to space limitations, but you should have everything you need to know to at least make a good beginning. Before we finish, though, there are a few important items to reiterate:
Always return an HTTP header to WebSTAR to pass on to the client. If you can also return an HTML document with useful feedback for the user, that is even better, but it isn't necessary. The header is.
Get a good AppleScript book! This article is meant to get you started, but you can't do anything really productive until you get a book.
CGI applications are the best way of personalizing WebSTAR to meet your site's needs. So, next time you're tempted to ask for a new feature to be added to WebSTAR, don't be surprised if someone shouts "Use a CGI!"
Next Month
Next month I will dig more deeply into the structure of a CGI application. I'll show you how to make an application that quits when no one is using it, how to get useful error reports back for debugging, and how to use OSAXen to extract information from form pages. I will also include some useful code for creating a Comment area on your WWW site and provide hints on how to accomplish other tasks like auto-mailing and passing information to databases. Until then, you might want to check out the URLs listed in this month's Uniform Resource Locator.