Macintosh Web Server
Volume Number: 11
Issue Number: 5
Column Tag: World Wide Web
Your Very Own Web Server - MacHTTP
Take Five Minutes and Join the Web
By Jon Wiederspan, jonwd@tjp.washington.edu
[Remember that excitement the first time you ever saw a Macintosh? How about that first week after you got your hands on one? I don't - I simply dove in and resurfaced days later, grinning from ear to ear. From time to time, we come across technology tools which evoke such reactions. Serving up pages on the World Wide Web seems to have this effect on some people.
In this article, Jon introduces us to MacHTTP, a Macintosh server. It couldn't be easier, nor could it be any more addictive. In addition to job-security warnings and tips on getting along with un*x admins, he also gives us a preview of the 3.0 version (coming soon) and shares tips on how to get the most out of your Web site. You don't have one yet? Odds are good that, after reading this article, you will! Enjoy! Let us hear from you about your experiences, and send us your URLs - Ed stb, editorial@xplain.com]
Like most people on the Internet, I have only recently been introduced to the World Wide Web. It was less than two years ago, that fateful day when I downloaded a copy of Mosaic from the UMich ftp site. One double-click and I was launched into hypermedia heaven, madly clicking my way around the globe, heedless of restrictions like server addresses and directory paths that I had accepted without question for years in my FTP client. The memory already fades, so I don't recall exactly whether I rushed right over to show it to my friends or whether I waited a few minutes first to get a drink of water. Either way, though, it was not long before we were engrossed with ideas of what we could do with our own World Wide Web server.
That is when reality slammed me rudely back to earth. Yes, I had a Sun workstation and the server software (NCSA httpd was my choice) was freely available, but my novice UNIX skills had been barely sufficient to get an anonymous FTP site started, and building and running the httpd software turned out to be completely beyond my capabilities. I didn't have the time to spend improving my skills and I couldn't justify the funds to hire someone who could do the job. It looked as if I was to be denied my part in the fastest growing service on the Internet.
Luckily for me, there was a solution in the form of MacHTTP, a software package from BIAP Systems in Houston, Texas. MacHTTP is a high-performance, feature-packed WWW server that runs right on your own Macintosh under the standard Macintosh OS. In this article I will show you how to install MacHTTP, how to fine tune its performance, and how to use some of its special features. I will also talk about why anyone would want to run a server in the first place and some issues to consider in site design and maintenance.
In order to keep everyone on the same page, there is some terminology everyone should know. If you aren't completely familiar with terms like WWW, URL, FTP, and HTTP, you should read the sidebar entitled "Vocabulary 301". They will keep popping up throughout this article and I'm not going to take time to explain them.
Before We Begin
Before we discuss how to start your own World Wide Web server we need to cover some other topics.
Vocabulary 301
Here are some terms you need to know when trying to set up a World Wide Web site (or read this article).
FTP - File Transfer Protocol. A very common and basic method of providing files for sharing with others.
Gopher - A menu-driven service for browsing and retrieving files, often used as a lighter-weight alternative to FTP. It also provides a way for one Gopher server to provide links to other Gopher servers.
HTTP - HyperText Transfer Protocol. A way of communicating that is based on HyperText files which contain links to other files.
NNTP - Network News Transfer Protocol. A way of transferring news messages around on the Internet.
SMTP - Simple Mail Transfer Protocol. A way of transferring mail around on the Internet.
Telnet - One of the earliest network services available. It allows you to connect to a remote computer and issue commands as if you were physically at that computer.
URL - Uniform Resource Locator. A way of specifying exactly where a file is on the internet and how to connect to it.
WWW - World Wide Web. A virtual network of servers which includes HTTP, FTP, Gopher, and other servers connected by URLs.
Why Run a Server
The first topic is, "Why do you want to start your own server?" I warn you right now that a World Wide Web server is a time sink that will gradually steal away more and more of your working day until you have time for nothing else but building your site. This is not because maintenance is so difficult. In fact, maintenance is an almost negligible task for most sites (but don't tell your boss that). The truth is that there are so many cool things you can do with a Web site that it's hard to resist adding just one more hack, and each hack takes a little bit longer than the last. So, if you plan to run a site, make sure you have the willpower to resist this temptation (or have a very secure job).
Now that I am legally covered, we can discuss how you could benefit from running a World Wide Web server. Here is a list of good reasons that I keep handy:
To advertise products or services for your company or yourself
To attract potential customers for your company, organization, or non-profit group
To publish research results
To provide online help resources for customers
To publish informational articles, newsletters, magazines, or anything else
To provide an easier front-end to existing network services like FTP or Gopher sites
To provide easy access to databases of all kinds
To conduct surveys or gather opinions or comments on any topic
To make personal information like your photo, resume, or opinions available to the general public
To put up pictures of your last vacation or your pet dog (cats are acceptable as well)
As you can see, there are a number of ways that a World Wide Web site can be useful.
Is putting up a WWW site a good idea for you or your company? If your company already offers services on the Internet (mailing lists, FTP sites, WAIS databases), then the decision is fairly easy. A WWW server can provide a much nicer interface to all of these and even help unify multiple services for your customers. However, if this is your first foray into Internet services you need to give this some thought. Think of the kind of people that inhabit the Internet. Now think of your product or service. Do they want it? If not, then this probably isn't for you. But don't let that stop you...
Uniform Resource Locators
The next topic to cover is Uniform Resource Locators or URLs. URLs are text strings that completely specify where a file or directory can be found on the Internet or a local computer. They form the connective filaments of the World Wide Web, connecting servers which run a variety of protocols from HTTP to Telnet.
A URL uses the following format:
protocol://server_address[:server_port]/[directory/][filename]
protocol can be any of http, ftp, gopher, telnet, nntp, mailto, or any other communications protocol that works on the network. This tells the client software how to communicate with the server. It is up to the client software to support the protocol, as in mailto, which is not supported by all clients.
server_address is either the IP address or DNS name of the server which has the file.
server_port is an optional number which designates what port the server communicates on. We'll talk more about that later, but if nothing is specified for an HTTP URL, the standard port of 80 is used by default.
/ indicates the root directory for the site. Only files in this directory tree can be accessed.
directory/ is a sub-directory or directory path that contains the file. files/reports/daily/, for example.
filename is an optional string telling which file should be returned. If no filename is specified, then a default action is done for the folder. In some cases (depending on the server) a listing of the directory is returned. This is not very safe, as it gives clients access to every single file on your site. MacHTTP uses a different method that will be discussed below.
In summary, the URL specifies what method to use to connect to the server, what server to connect to, what directory to look in, and what file to return in that directory. Since all of this information is provided in the link, the user doesn't need to know anything about such things and can click on any link without worrying about where it leads. This is a simplification, of course. More information about URLs can be found at http://www.w3.org/hypertext/WWW/Addressing/Addressing.html
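As a rough illustration of the format above, the following Python sketch splits a URL into the parts the article names. The function name describe_url and the example.com address are made up for this example, and the port 80 fallback is applied only to http URLs.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a URL into the parts described above."""
    parts = urlsplit(url)
    # Everything after the last "/" in the path is treated as the filename.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "server_address": parts.hostname,
        # No port given: fall back to 80 for http, otherwise leave it unset.
        "server_port": parts.port or (80 if parts.scheme == "http" else None),
        "directory": directory + "/",
        "filename": filename or None,
    }

info = describe_url("http://www.example.com:8080/files/reports/daily/today.html")
for name, value in info.items():
    print(name, "=", value)
```

A URL that ends in a slash, such as http://129.45.3.100/, comes back with no filename, which is the case where the server has to decide on a default action for the folder.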
Choosing a Server
The final topic to cover is that of selecting the proper server. If you have to deal with UNIX-loving network administrators, then you will need rational reasons for selecting a Macintosh as your server. Use the following guidelines to help you make your decision.
Performance - I'll talk more about this later, but suffice it to say here that a Macintosh server running MacHTTP will outperform an equivalently priced UNIX workstation running either the NCSA or CERN servers.
Installation - This is the real Macintosh strength. Even a fairly inexperienced Macintosh user can get MacHTTP running in very little time. On the other hand, even a very experienced UNIX person may have some problems installing the NCSA or CERN servers. This stems primarily from the fact that MacHTTP comes as a fat binary which is ready to run on any Macintosh, while the NCSA and CERN servers need to be compiled and there is a tremendous variety across the platforms they support.
Server Maintenance - MacHTTP 3.0 can be completely monitored and controlled remotely with an easy-to-understand graphic client. I have yet to see another server that offers this ease of maintenance.
Server Stability - Don't believe the rumors that Macintoshes crash more than other systems. If you have a lot of freeware system extensions installed (as I do) then that might be true. Without those extensions, though, your Macintosh will keep running for months or even years without crashing.
OS Design - Another vicious rumor says that the Macintosh can't be a server because it doesn't have pre-emptive multi-tasking. Actually, this is not necessary because the server software itself provides the multi-tasking using the Thread Manager.
Site Content Maintenance - If the people who will be adding content to the site are also Macintosh users, then this is a major consideration. By running a Macintosh server, they can mount the files directly on their computers for easy editing.
Starting A Site
So now you're ready to begin. There are really only three things you need to start your WWW site: a Macintosh computer, the MacHTTP software, and a network connection (not strictly required). The Macintosh must be running MacOS System 7.1 or later, but any Macintosh from a Plus on up can be a server. Either MacTCP or OpenTransport (if it is released by the time this is printed) is needed for network connectivity, and a minimum of 600K of free memory is required to run MacHTTP. If you want people to be able to reach your server, you need a dedicated connection to the Internet and an IP address. The documentation that comes with MacHTTP includes instructions for running on a standalone machine, though, if you need to do so for demonstration or development purposes.
See the end of the article for details on where to get the software.
Installation
The MacHTTP software distribution comes with a complete WWW site in a server folder that you can copy onto your hard disk. It doesn't matter where on your disk you copy it or what name you give the folder. This is because once you launch MacHTTP, whatever folder it is in becomes the root of your WWW site. The URLs you use on the site will all use paths relative to this root, so it doesn't matter to them whether your server folder is at the root of the hard disk or buried 10 folders deep.
MacHTTP is preconfigured with settings that are appropriate for the majority of sites. This means that installation amounts to four steps:
1) Copy the server folder over to your hard disk.
2) Give the folder a name that you like.
3) Open the server folder
4) Double-click on MacHTTP
Congratulations! You now have a fully-functional site. That's all there is to it. You can access your new server from your favorite WWW client by using the following URL: http://your_ip_address/ where your_ip_address is the IP address of the machine you are running MacHTTP on (e.g., 129.45.3.100).
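If you are curious what a WWW client actually sends when it opens that URL, here is a minimal Python sketch of an HTTP 1.0 request made over a plain socket. The function names build_request and fetch are invented for this example, and the commented-out call assumes a server is really listening at the article's sample address.

```python
import socket

def build_request(path="/"):
    """Return the bytes of a minimal HTTP/1.0 GET request."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")

def fetch(host, path="/", port=80, timeout=10):
    """Send the request and return the raw response (headers and body)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(path))
        chunks = []
        while True:
            # An HTTP/1.0 server closes the connection when it is done,
            # so an empty read marks the end of the response.
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

# Example (needs a running server at this address):
# print(fetch("129.45.3.100").decode("latin-1"))
```

Requesting "/" names only the root folder, so what comes back depends on how the server handles folder-only URLs.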
About MacHTTP
Now that you have MacHTTP running, it is probably time to learn a little more about it. MacHTTP is the product of Chuck Shotton of BIAP Systems, Inc. It was the first HTTP server available for the Macintosh and now offers all of the features you expect of professional HTTP server software including:
multiple, threaded connections *
HTTP 1.0 support (also an HTTP 0.9 compatibility mode)
CGI application support
site- and document-based security
secure connections †
remote site monitoring and administration
gateway for credit card payments (First Virtual)
support for database and text-searching systems (WAIS, AppleSearch, Verity)
* MacTCP limits this to 48 connections
† due Q2 1995
MacHTTP also offers several special features that make it easier to use than other servers, especially in a Macintosh-based installation:
CGI (Common Gateway Interface)
The Common Gateway Interface is a proposed standard for information services to communicate with external applications. So far, only HTTP servers take advantage of this standard, and MacHTTP follows the latest CGI standards. The advantage lies in the fact that MacHTTP uses AppleEvents to communicate with CGI applications and those applications can, in turn, use AppleEvents to communicate with other Macintosh applications. This means that WWW pages on your Macintosh can be used as a front-end to databases, spreadsheets, word processors, graphics applications, and anything else you can think of. Pre-built CGI applications are already available for doing maps, linking to FileMaker and Butler databases, and many other specialized tasks. You can also write your own CGI applications using almost any language, including C, Pascal, Prograph, or even AppleScript or HyperCard.
MacHTTP Manager
Beginning with version 3.0, MacHTTP will become a server with very little interface. All management, monitoring, and configuring will take place through the MacHTTP Manager. The Manager can communicate with the server via AppleTalk or TCP/IP and can communicate with multiple servers at one time. This has several advantages that are not found with other servers.
Update server settings remotely. Multiple servers can have their settings updated simultaneously. For large MacHTTP sites which are running multiple, mirrored servers, this is a great asset in keeping them in synch.
Monitor multiple servers. Log data from multiple servers can be displayed on one remote machine for easy monitoring.
Security. Servers can be run headless to prevent unauthorized access (and save on monitors). All other security settings for site and file access can be set remotely and can be shared across multiple servers.
Pre-Processing
MacHTTP 3.0 allows you to use CGI applications to process files before they are passed to the client. This could be used to implement server-side includes, on-the-fly document translation, or your own security scheme. There will also be new definable types in addition to the built-in TEXT, CGI, and SCRIPT types. MacHTTP will match files to these types based on filename extensions and pass all files of one type to a specified application for pre-processing. As an example, all files with the extension .sit might be passed to a pre-processor for binhex encoding before sending them to the client.
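The idea of matching files to types by extension and handing each type to a pre-processor can be pictured with a small Python sketch. This is not MacHTTP's configuration format or API: the table, the type names STUFFIT and BINARY, and the function names are all hypothetical, and the "encoder" is a stand-in that hex-encodes bytes rather than producing real BinHex output.

```python
import binascii
import os

def binhex_stub(data):
    # Stand-in for a real BinHex encoder: hex-encodes the bytes so the
    # example stays runnable. It does NOT produce BinHex 4.0 output.
    return binascii.hexlify(data)

# Hypothetical table: extension -> (type name, pre-processor or None).
TYPE_TABLE = {
    ".html": ("TEXT", None),
    ".sit": ("STUFFIT", binhex_stub),
}

def prepare_for_client(filename, data):
    """Look up the file's type by extension and run its pre-processor, if any."""
    ext = os.path.splitext(filename)[1].lower()
    # Unknown extensions fall back to a made-up BINARY type with no processing.
    type_name, processor = TYPE_TABLE.get(ext, ("BINARY", None))
    if processor is not None:
        data = processor(data)
    return type_name, data

print(prepare_for_client("Archive.sit", b"\x01\xff"))
```

Keeping the mapping in a table means a new type can be added without touching the dispatch logic, which is the appeal of the definable types described above.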
Configurable Log Files
Beginning with version 3.0, MacHTTP log files can be configured to store only the data needed for each site. The site manager can select both which items are logged and the order in which they are written on each line. The items include date, time, referer, client address, requested file, data transfer speed, and HTTP code.
The log information can be directed to external applications for processing (including to a MacHTTP Manager). This means that connection data can be dumped to your favorite database to track connection statistics or generate billing data for clients.
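To make the idea of choosing log items and their order concrete, here is a short Python sketch. The field names, the tab separator, and the sample record are assumptions made for illustration. They are not MacHTTP's actual log tokens or file format.

```python
def format_log_line(record, fields, separator="\t"):
    """Write only the chosen fields, in the chosen order."""
    # A field missing from the record is written as "-" to keep columns aligned.
    return separator.join(str(record.get(name, "-")) for name in fields)

# Hypothetical record for one connection.
record = {
    "date": "05/01/95",
    "time": "14:32:10",
    "referer": "http://www.example.com/",
    "client_address": "129.45.3.100",
    "requested_file": "/index.html",
    "transfer_speed": 5120,
    "http_code": 200,
}

line = format_log_line(record, ["time", "client_address", "http_code"])
print(line)
```

A line with a fixed separator and a known field order is also easy to import into a database, which is what makes the billing and statistics uses mentioned above practical.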
Aliasing
Version 3.0 of MacHTTP supports using aliases of folders or disks to extend your site. Place the alias inside the MacHTTP folder and you can then write links to files inside the folder or disk. This is very useful when your site outgrows the current disk or for helping multiple groups and users publish information on a single server.
Special Files
MacHTTP supports three special files: Error, Index, and NoAccess. The Error file is returned whenever the server is sent a URL that it can't resolve to a file or folder. The Error file can be anything from a simple HTML document informing the user that an error occurred to an advanced CGI that tries to figure out what the URL was meant to point to. The Index file is returned whenever (1) a URL is received that specifies only a folder, not a file, and (2) a file with the name given for Index files is found in that folder. If no Index file is found in the folder, then an error is returned to the user. Like the Error file, the Index file can be anything, including an HTML file listing the folder contents (or just the ones you want others to know about) or a CGI that allows users to search for contents. In addition, each folder can have its own Index file or no file at all, so every folder need not be treated the same. The NoAccess file is returned when access to a file is refused because the user lacks permission to access it. This file also can be anything the administrator wants and can be used to let the user know exactly why access was refused and what to do if there was a problem.
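The decision described above can be summarized in a few lines of Python. This is only a sketch of the logic as the paragraph states it, not MacHTTP's code: the function name resolve, the default file names, and the use of a set of paths to stand in for the disk are all assumptions.

```python
def resolve(url_path, files, allowed=True, index_name="index.html",
            error_file="error.html", no_access_file="noaccess.html"):
    """Return the file to send for url_path.

    files is a set of paths that exist under the site root.
    """
    if not allowed:
        # The user lacks permission: send the NoAccess file.
        return no_access_file
    if url_path.endswith("/"):
        # Folder-only URL: serve that folder's Index file if it has one.
        candidate = url_path + index_name
        return candidate if candidate in files else error_file
    # Ordinary file URL: serve it if it exists, otherwise the Error file.
    return url_path if url_path in files else error_file

site = {"/index.html", "/reports/daily.html"}
print(resolve("/", site))          # folder with an Index file
print(resolve("/reports/", site))  # folder without one
```

Note that a folder without an Index file produces the Error file rather than a listing of the folder's contents.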
MacHTTP Performance
Whether you're planning a large site for a corporation or a small site to list the results of your kid's softball team, one thing is certain: you want the site to be fast. Internet users are getting spoiled by near-instantaneous access to sites around the world, so many of them won't hang around if it takes too long to download one of your pages. So far, though, no one has discovered a metric that measures how fast a server is. Some people look at the number of simultaneous connections that a server can handle, but that is an increasingly useless statistic due to speed increases in both the server software and CPUs. Consider the fact that a server that processes every connection in at most three seconds (not unreasonable) can handle more than 200,000 connections every week and never process more than one connection at a time.
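The 200,000 figure is easy to check. The Python sketch below assumes each connection takes a full three seconds and that the server handles them strictly one after another, with no idle time.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60   # 604,800 seconds

def weekly_capacity(seconds_per_connection):
    """Connections per week when served strictly one at a time."""
    return SECONDS_PER_WEEK // seconds_per_connection

print(weekly_capacity(3))  # 201600
```

Any connection that finishes in less than three seconds only raises the total, so three seconds is the slowest case that still clears 200,000 a week.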
The real key to speed on a server is simply to get the data out fast. Toward that end there are several things you can do to improve the speed of your MacHTTP server.
Server Settings
The easiest way to improve server speed is to adjust the settings listed below:
Thread Manager - This is the single best performance improvement you can make to your site! Versions of MacHTTP from 2.0.2 on are able to use the Thread Manager, if it is installed. The Thread Manager is a system extension that allows applications to run multiple simultaneous threads. MacHTTP uses this to provide each connection with its own thread. Without Thread Manager, connections fight for attention in MacHTTP and slow connections eat up more time than fast ones, which slows your server down. With threading, each connection is somewhat isolated from the others so fast connections are processed quickly without waiting for slower ones. The Thread Manager extension is distributed with MacHTTP and is also part of System 7.5.
DNS Name lookups - MacHTTP logs the IP address of each connection. It also offers the option of logging the DNS name instead. This can be a lot of fun if you're the kind of person who would stare at the log for hours waiting for an important site to connect ("Look, someone from Apple just looked at my bio page!"). This is a significant performance hit, though, because every connection adds a delay while MacHTTP waits for a DNS server to return the DNS name. In addition, for sites that don't have DNS names you have to wait for the DNS server to give up looking for a name, which takes much longer. For best performance, turn DNS lookups off.
Dump_buf_size - This is the setting which controls how large a block of data is sent to the client at once. The larger you make the blocks, the faster the data can be pumped out. The size option ranges from 512 bytes to 10K. Your best performance will come by using the 10K setting if you have the Thread Manager installed. If you don't have Thread Manager (see above), then the 10K setting may actually slow down your machine when you have to deal with slow clients. There is another possible problem with having a large dump_buf_size setting. This problem is mainly with Windows clients and arises when the TCP/IP interface that the client uses is unable to properly handle large data chunks. If the interface tries to reassemble the entire 10K block before passing it on to the client and if the interface has an 8K or smaller limit on how much data it can handle, then the interface can crash. If you get a lot of complaints from Windows users, try reducing dump_buf_size to 8K or less.
Timeouts - MacHTTP allows you to set how long it will wait before timing out a connection. A timeout occurs when MacHTTP fails to get a response from a client in the time set in the Timeouts setting. MacHTTP has no way to know if the connection to the client was lost or is just slow, so it just hangs on as long as it can. Longer timeout settings will result in extra connection processing as MacHTTP keeps listening for a response from these dead connections. However, using a very short timeout setting can cause slow clients or those with poor network connections to be cut off prematurely. In addition, the Timeouts setting controls how long MacHTTP will wait for a response from a CGI application before giving up. If you are running CGI applications that do a lot of processing, you will probably want to use a longer Timeouts setting to give the CGI time to finish. Otherwise, I recommend a shorter setting for better performance.
Foreground Operation - For best performance, keep MacHTTP running in the foreground. Because the Macintosh uses cooperative multi-tasking, the foreground application is in charge of deciding when other applications get to run. Keeping MacHTTP in the foreground (meaning that its menubar is the one showing) gives it control of the CPU and the most processing cycles.
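To picture what the dump_buf_size setting changes, the following Python sketch cuts a file into blocks of a given size. It only illustrates the block counts involved. How MacHTTP actually schedules and sends its blocks is not shown here, and the function name and the 25,000-byte page are made up for the example.

```python
def split_into_blocks(data, dump_buf_size):
    """Yield successive blocks of at most dump_buf_size bytes."""
    # The article gives 512 bytes to 10K as the allowed range.
    if not 512 <= dump_buf_size <= 10 * 1024:
        raise ValueError("dump_buf_size must be between 512 bytes and 10K")
    for start in range(0, len(data), dump_buf_size):
        yield data[start:start + dump_buf_size]

page = b"x" * 25_000
small = list(split_into_blocks(page, 512))
large = list(split_into_blocks(page, 10 * 1024))
print(len(small), "blocks at 512 bytes;", len(large), "blocks at 10K")
```

Fewer, larger blocks mean fewer trips through the send loop for the same file, which is where the speed gain comes from. The cost is that each block ties up the server for longer when the client on the other end is slow.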
Hardware Improvements
This is where the largest performance improvements come from, but it's not cheap or even possible in some cases.
Network - It is likely that the single biggest improvement you can make is in getting a better network connection. Even a Quadra 610 can swamp most networks, so it's no surprise that even a relatively slow machine can swamp a 56K Internet connection. This is because the HTTP protocol is not processor-intensive - most of what MacHTTP does is pump data out over the network. Since your network speed comes nowhere near that of your disk drive interface, the network is the most likely bottleneck.
Computer platform - While CPU speed isn't the largest factor in performance, it does make a difference. My general feeling is that the difference between CPUs is much more pronounced than clock differences on the same CPU. That means that you will see a much bigger difference moving from a 68030 to 68040 or from 68K to PowerPC than you will moving from a 66MHz PPC to a 100MHz PPC. In addition, you can get much greater improvements by running two 6100s in parallel than using one 8100 for about the same price (see below on RAIC Design). Remember, though, that all of this is limited by your network. It's not much good moving from a PowerMac 6100 to a 9150 for speed improvements if you're running on a 56K connection. Using my own site as an example, a Power Macintosh 6100/60 (with Thread Manager installed) can easily handle 20,000 connections a day. It can handle even more than that if you're not running other software (mail, ftp, word processing) on the same machine and have a good network connection. For working examples of MacHTTP on various CPUs, check out the MacHTTP Registry (see this month's Universal Resource Locator for the URL).
Disk Drives - Adding a faster disk drive can provide some performance improvement, especially if you are serving a lot of large files or are using an older Macintosh with the original 80MB disk. The difference isn't that great, though, so I suggest adding a new disk only if you need the storage space.
Memory - Adding more memory will have almost no effect on performance. More memory is required to handle more connections, but you can handle the maximum number of connections for your Macintosh in 4MB of RAM, so any 8MB machine already has plenty of memory. If you are running a lot of CGI applications or linking to external applications, then you may need more memory to handle those (about 32MB extra if you're linking to Excel).
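A little arithmetic shows why the network dominates the list above. The Python sketch below computes the ideal time to move a 10K page over two links. It assumes a 56K connection means 56,000 bits per second, uses a 10 megabit Ethernet for comparison (my choice, not a figure from the article), and ignores all protocol overhead.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second

ten_k_page = 10 * 1024                                   # 10,240 bytes
on_56k = transfer_seconds(ten_k_page, 56_000)            # about 1.46 seconds
on_ethernet = transfer_seconds(ten_k_page, 10_000_000)   # about 0.008 seconds

print(f"56K line: {on_56k:.2f} s, 10 Mbit Ethernet: {on_ethernet:.4f} s")
```

At roughly a second and a half per 10K page, a 56K line tops out at well under one page per second no matter which Macintosh is feeding it.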
RAIC Design
If you plan to run only a small or moderately busy site, then you can probably skip this section. If you plan to put up something that can handle 100,000 connections a day, though, you need to consider using a RAIC design. RAIC (Redundant Array of Inexpensive Computers) is my obvious play on RAID, which has become a popular method of designing large, fast hard disks. A RAIC system uses multiple Macintosh computers (at least all of mine do) to share processing duties for a site, thus providing much better performance than could be achieved by any single machine purchased for the same price. There are a number of ways to divide processing among several servers and they can be mixed and matched as needed.
Server Mirroring - The latest versions of BIND allow you to have a single DNS name map to multiple IP addresses with connections being passed out in round-robin fashion to each machine in turn. This allows a cluster of computers to appear to be one computer to the outside world. This has obvious benefits in server design. By mirroring the contents of a WWW site across multiple servers you spread the connection load across all of the machines evenly so no single machine has to handle a very high load. This allows several cheaper machines to provide the performance of one more expensive machine. As an example, 10 Macintosh SE/30 computers, each capable of handling 10 simultaneous connections easily, can be combined to create a virtual server that can handle 100 simultaneous connections without coughing and up to 480 simultaneously before dying. All at less than the cost of a high-end workstation. Performance is not the only reason to go with a RAIC design, though. There is the additional benefit of being able to hot-swap the CPUs that make up your server. Since the DNS server takes care of connections, any single CPU can go down or be removed (for maintenance, update, or replacement) without shutting down the site. This gives your site a degree of fault-tolerance that can't be matched by a single CPU. In addition, there is no need for all of the CPUs to be identical. Any Macs that you have lying around and gathering dust can be used to create a RAIC server.
Server Distribution - Another way to improve performance with multiple CPUs is to divide your site up into discrete units that can each run on a separate machine. This is very useful for sections that have high traffic and cannot be easily mirrored, such as a Comments page where people use a form to leave their opinions about your site. This can also be used to give users the illusion of a faster site. A user's perception of your site speed is developed largely on the first connection to the site. By moving the site's home page or a single, very popular page to its own server, the user gets a very fast initial response and that impression will last even if other pages are not delivered quite so quickly.
Element Distribution - You can also speed up performance by spreading out the elements of your site, by which I mean the HTML pages, graphics, and other large files. Even an older Macintosh can serve up smaller (<10K) HTML pages with great speed. Graphics and other large files take a bit longer, though. In addition, newer clients like Netscape Navigator use multiple connections to retrieve the graphics in a page at the same time as the text, thus placing a larger load on the server. By putting graphics onto a separate server, clients may receive your pages faster. Moving large files to a separate server will also prevent slow connections from tying up connections on the main server.
Application Isolation - The final method to consider is moving external applications and/or CGI applications to a separate server. A CGI application must reside on the same machine as any HTML page that links to it.
You may not want to run another server simply to handle a single CGI application. However, you may want to set up a second machine to run applications (e.g. FileMaker) which CGI applications call. This second machine doesn't need to run a copy of MacHTTP.
As I mentioned above, if you are running a CGI that does database queries or that captures real-time images, or anything else that is processor intensive, you can greatly speed up both the CGI processing and MacHTTP performance by moving the external application to its own CPU.
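The round-robin hand-off described under Server Mirroring can be simulated in a few lines of Python. This is a toy model of the rotation only, not how BIND is configured, and the three addresses are hypothetical mirrors sharing one DNS name.

```python
from collections import Counter
from itertools import cycle

def round_robin(addresses):
    """Return a function that hands out addresses in rotating order."""
    rotation = cycle(addresses)
    return lambda: next(rotation)

# Hypothetical addresses for three mirrored servers behind one DNS name.
next_server = round_robin(["129.45.3.100", "129.45.3.101", "129.45.3.102"])

# Nine incoming connections are assigned in turn.
assigned = [next_server() for _ in range(9)]
load = Counter(assigned)
print(load)
```

Each mirror ends up with an equal share of the connections. Taking a machine out of service amounts to removing its address from the list, which is the hot-swap benefit mentioned above.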
Keep these techniques in mind the next time you wonder what to do with those older Macintoshes, or when some UNIX user spouts off that you need an SGI workstation to get a really fast site.
Security
MacHTTP provides several options for security on your site. These are in addition to the security that every Macintosh enjoys by having an operating system that can't be hacked into the way many UNIX systems can. There is no way for someone to launch software on your server, or erase the disk, or change your WWW files (unless you specifically write some software to do this, in which case you deserve what you get).
Directory Restrictions - The first way that MacHTTP provides security is by limiting connections to sub-directories under MacHTTP. Any file that is not in the MacHTTP sub-directories cannot be transferred unless the site manager makes an alias to it. In addition, MacHTTP does not return a directory listing when the URL specifies only a directory and not a file, as some servers do. This means that people only see files that you provide links to, unless they are good at guessing file names. Many people consider directory listings a feature on UNIX servers, but they are actually a breach of your site security. If, by some chance, you do want to offer this capability, you can do so by using a CGI application and you can limit it to specific directories.
Allow/Deny - MacHTTP provides the ability to allow or restrict access to your entire site based on either IP addresses or DNS names. It can handle partial addresses and names as well, and the rules can be layered so that a single element is allowed within a larger group that is denied. Clients whose addresses fall in the DENY list are sent the NoAccess page. There is no way for users to hack around this protection.
Realms - Realms are used to restrict access to specific portions of your site or specific pages. A Realm is a text string. If this text string is found in the name or path of the file being requested, then the client is asked to provide a username and password to access the page. Each Realm has a username and password assigned to it. If the user cannot provide the username and password then the NoAccess page is returned instead.
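The Allow/Deny and Realm checks can be sketched in Python as follows. This is a guess at the logic, not MacHTTP's implementation or configuration syntax. In particular, the rule that the longest matching pattern wins is my assumption about how layering works, and the function names, realm name, and addresses are hypothetical.

```python
def is_allowed(client, rules, default=True):
    """Apply ALLOW/DENY rules; the longest matching pattern wins.

    A pattern ending in "." is an IP address prefix ("129.45."); any other
    pattern is matched as a full address or as a DNS name suffix.
    """
    best = None
    for action, pattern in rules:
        if pattern.endswith("."):
            matched = client.startswith(pattern)
        else:
            matched = client == pattern or client.endswith(pattern)
        if matched and (best is None or len(pattern) > len(best[1])):
            best = (action, pattern)
    return default if best is None else best[0] == "ALLOW"

def realm_for(path, realms):
    """Return the first realm whose text string appears in the path, if any."""
    for name, match_string in realms:
        if match_string in path:
            return name
    return None

# One machine is allowed inside a larger group that is denied.
rules = [("DENY", "129.45."), ("ALLOW", "129.45.3.100")]
# Any path containing "private" belongs to the hypothetical "staff" realm.
realms = [("staff", "private")]

print(is_allowed("129.45.3.100", rules), is_allowed("129.45.7.9", rules))
print(realm_for("/private/budget.html", realms))
```

A request that fails the first check would get the NoAccess page immediately. A request whose path falls in a realm would trigger the username and password prompt before the file is sent.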
You're On Your Own Now
Well, that's it. You now have enough information to start your own WWW site on a Macintosh computer. I have completely ignored some important issues, of course, such as network connections, design issues for the content of your site, client software, and the future of the Web. There just isn't room to cover all of those topics, even if I took up the whole magazine. For more information you can check out the links provided in this month's Universal Resource Locator, or read the comp.www.* newsgroups, or watch your favorite bookstore for one of the dozens of books that are sure to appear on the topic.
Obtaining MacHTTP
MacHTTP is available from BIAP Systems, Inc., 16323 Hazy Pines Ct., Houston, TX 77059. On the Web: http://www.biap.com/. By e-mail: info@biap.com.