The Lust for Control
Volume Number: 19 (2003)
Issue Number: 6
Column Tag: Untangling the Web
by Kevin Hemenway
My love affair started before I knew love even existed.
Cue The Grandfatherly Blathering
I'm not sure when exactly it began, or when I realized it for what it was, but my strongest memory is drawing a login screen for the Iniquity BBS. This was probably my first attempt at a "server" - a collection of BBS door games, files, and ezines available under the name "The Darkened Closet". Accessible only from the draconian hours of "when I got home from school" to "when my mother got home from work", it was never the hit I imagined. Whenever I hear a Fear Factory album, this is what my mind returns to - the single-minded tweaking of an "animated" ANSI menu.
It certainly hadn't been my first attempt. After I survived learning Microphone LT and logging into local BBSes, I immediately wanted my own, and was anxious as heck to get Hermes, a Mac-only BBS package, operational. Frustrated that I couldn't get .EXE files working, I ran out, bought a Windows 3.1 and DOS laptop from a shady pawn shop, and descended into the joys of qEdit, QEMM, TSRs, batch files, and more.
My fond reminiscing of this time will become an important part of my biography.
Thinking back, even before I realized a network existed, I was interested in the aggregation of data. I was a big NES player at the time, so Nintendo Power regularly graced my bookshelves and helped empty my piggy bank. What was most annoying, though, was having to find that one tip I needed to survive, that one cheat code that strayed from the Konami norm. Rifling frantically while the game was paused just didn't satisfy my boyish impatience.
I needed a database, and thus was born my first pathetic attempt: a giant text file on a Brother word processor. I remember sitting in front of that 7,000-pound behemoth and transcribing, word for word, the tips, tricks, and code sections of all the back issues I could find. Copyright meant nothing to my excessively star-filled eyes... I was gonna become an even better player (through cheating), and gosh darn it, my electronic Brother was going to help me!
My desire for collected data and my own server only increased with time.
The lust for control continued with the creation of my first web site back in early 1997. Enhanced by that magical tilde, it was a stunning portrayal of all the sites we've come to hate today - an intro page with three glowing buttons (I too loved the Adobe filters) and very little content. It was also the first battle wound of attempted (and failed) self-imposed censorship: my ezine, Devil Shat, got me kicked off AOL shortly after the fourth issue. I needed a web site for my missives and I got it all wrong, much like the Brother datab... er, text file.
It didn't matter, of course. Perseverance just doesn't quit.
Months after my aborted /~morbus/ attempt, I decided I needed to start over fresh, and purchased disobey.com as my new beachhead of world domination. It did well as a cultural haven, and my continued and nervous exploration of Linux, combined with timing that only fate could produce, landed me a job at my local ISP. It was perfect - not only did I control my own web site, but I now had control of the very web server it resided on! My BBS and data dreams were inflamed, getting brighter as my love of Perl deepened - not only could I tweak what Apache did, but I could tweak what Apache served, dipping into MySQL databases via a CGI script of my own devising. I felt like I had been handed a gift - the realization that everything I had dreamt of doing could be done, and it wasn't that hard.
Enter Mac OS X, Dave Mark, And Unwelcome Innuendo
When Mac OS X was released, it blew my mind. Here was my beloved Macintosh GUI, on which I had coded countless Perl scripts and uploaded numerous buggy attempts, suddenly sitting on top of every tool I used at the ISP. Apache, Sendmail, Perl, a shell! I was in heaven, and steadily watched my productivity increase tenfold. No more did I have to debug my scripts from afar - duplicating disobey.com's Apache configuration under OS X was immediate, as was my satisfaction. It was one of those moments where part of the equation disappeared: no longer did I need two boxes, a Mac for getting work done and Linux for getting work public. Everything old was new again, a sentiment echoed by a very like-minded Scott Knaster (MacTech Magazine, April 2003).
Shortly after, my Apache articles for O'Reilly's MacDevCenter.com appeared. A soft introduction to serving web sites with OS X's Apache, they were very well received, enough so that I went on to co-author O'Reilly's Mac OS X Hacks. Those original articles, currently being re-edited and republished for Jaguar on http://macdevcenter.com/, caught the attention of MacTech editor Dave Mark.
His email had me at "My name is Dave Mark", really, but the rest of his cranium-enlarging fan gushing sent me over the edge of acceptance. Yes, I'd be interested in writing for MacTech. Yes, I'm really as handsome as my O'Reilly photo professes me to be. Yes, if you give me a column, I'll let you hear my impression of Barry White. Yes, yes. Yes.
With a license to "write about anything having to do with the web: opinion, programming, config, Perl, firewalls. Anything," I couldn't pass it up. Within a week of our initial contact, I settled on the following introduction: "Most spider webs contain elegant symmetry, layer upon layer of silk and polygonal beauty. Some, however, create chaotic weaves with no apparent rhyme, reason, or pattern. For those who've never delved into setting up their own web server or coding their own web scripts, this 'tangled Web' of confusing acronyms, daemons, and protocols is enough to make sucking blood a viable career change. With MacTech's 'Untangling the Web' series, we'll alleviate your stomach knots with a guide to turning your OS X machine into a web serving and web programming powerhouse. Free those butterflies!"
Entwined metaphors leave me in stitches.
In Which Mr. Earbrass Finishes Chapter VII
A web column is a natural progression for MacTech - the past four issues of our 19th year included Dan Wood's "Building the Internet-Connected Application" (MacTech, January 2003) and Fritz Anderson's three-part "The Web from Cocoa" (MacTech, February, March, April 2003). If your application isn't web-enabled somehow, you may think you're falling behind. While we won't be focusing on the Cocoa or Objective-C end of your app, we'll definitely cover the other half: making your application's web site smarter.
With a comically wide creative swath, I'll cover the "basics" first: enabling the Apache web server built into your OS X machine, turning on its features and learning how to use them, as well as how to play with technologies like SOAP, XML-RPC, and the hosts that provide public APIs.
You may have noticed that "basic" is quoted - there's a reason for this. Most Apache tutorials are just that - they tutor you in just enough of the basics to get you going and then send you out into the world. That'd be fine if my audience were someone who just wanted to host a weblog, but I'm of the mind that we're all a little bit smarter than that, and we want to do something "real" (not The Real Cancun real, but real nonetheless). I'll magnanimously attempt to give you the free lovin' you yearn for.
Once we have our own web server running smoothly, I'll meander a bit about pre-packaged software. Too many people teach Apache as a tool for the outside world and totally ignore its potential as a supplementary local tool. Much like we may use iPhoto to organize our pictures, there's much to be gained by using a web-based script like Gallery. And even though the recently released iTunes 4 can stream tracks over the web or Rendezvous, using Andromeda will allow your Windows and Linux friends to jam along with you.
Familiarity That Breeds An Uh-Oh!
Cool URLs never die.
Or, more accurately, they never change. And that's the first technically inclined concept I want to leave you with. Much like most accidents happen within a mile of the home (a statistic I've always had a chuckle over), URLs are so familiar that we don't really think of them as something that deserves a second thought. Unfortunately, that's an invitation to a little party wrecker called the 404 - the dreaded "Not Found" error that permeates our search results or carefully organized bookmarks.
The groundwork for good URL design can be found in the World Wide Web Consortium's (W3C) "Style Guide for online hypertext", which contains a 1998 document entitled "Cool URIs don't change" (the difference between "URI" and "URL" isn't important for now). You'd do yourself some good to familiarize yourself with this body of work, as well as the matching Alertbox column from Jakob Nielsen (linked from the "Cool URIs" article):
http://www.w3.org/Provider/Style/
The larger points can be boiled down, simply, to:
Your URLs Should Never Change: In other words, once you make a URL public, it should remain accessible for the rest of forever, which most people suspect is a very long time. If you absolutely MUST change the location of a resource on your web site, you should have the original URL point or redirect to the new location.
Your URLs Should Not Reveal Your Technical Capabilities: You may be using HTML today, but ten years from now, when you'll be fiddling with XHTML, any URL ending in .html will be visually and technically inaccurate. And what about that /cgi-bin/script.pl of yours? What happens when you switch to PHP, Ruby, or Python? Your URLs should never have an identifying extension (like .html, .cgi, or .php) nor should they contain an indication of their backend (/scripts/, /cgi-bin/, /database/, etc.).
Don't Be Specific: Try not to include titles, authors, status, or any other fluctuating bits of information in your URL: morbus_letter.html, draft-pr.php, and my_favorite_music.txt could all be improved (the great debate, of course, is how to do so). This is the trickiest part of creating a good URL.
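To make the second tenet a little more concrete, here's a small Python sketch that flags URLs giving away their technology. It's only a rough lint you might run over a list of your own public URLs: TECH_EXTENSIONS and BACKEND_SEGMENTS are hypothetical lists of my own choosing, not anything the W3C publishes, and the third tenet (avoiding titles, authors, and other fluctuating details) still needs a human eye.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical telltale extensions and backend directory names; adjust for your site.
TECH_EXTENSIONS = {".html", ".htm", ".shtml", ".cgi", ".pl", ".php", ".asp", ".jsp"}
BACKEND_SEGMENTS = {"cgi-bin", "scripts", "database"}


def url_warnings(url):
    """Return reasons a URL may reveal its implementation, in path order."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    warnings = []
    if segments:
        # Only the final path segment is checked for a file extension.
        ext = posixpath.splitext(segments[-1])[1].lower()
        if ext in TECH_EXTENSIONS:
            warnings.append("extension " + ext)
    for seg in segments:
        if seg.lower() in BACKEND_SEGMENTS:
            warnings.append("backend directory /" + seg + "/")
    return warnings
```

For example, url_warnings("http://www.disobey.com/cgi-bin/script.pl") returns ["extension .pl", "backend directory /cgi-bin/"], while http://www.disobey.com/about/morbus comes back clean.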
Linkrot is the Devil. Are you a friend of the Devil?
As the column progresses, we'll keep the above tenets in the back of our minds - I'll show you how to remove the need for URL file extensions, how to turn on spelling correction so that fumbling fingers follow the facts, and how to redirect unsightly 404s to their proper locations. We'll also examine a few tricks for being proactive about your own, and others', inevitable mistakes.
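Apache has its own modules for these jobs (mod_alias for redirects, mod_speling for spelling correction), but the decision they make can be sketched in a few lines of Python: check a table of known moves first, then look for a near-identical existing URL, and only then give up with a 404. This isn't how Apache implements it; EXISTING and MOVED are hypothetical site data, and the 0.8 similarity cutoff is an arbitrary assumption you'd tune.

```python
import difflib

# Hypothetical site data: URLs that exist today, and old URLs that have moved.
EXISTING = ["/about/morbus", "/about/contact", "/gamegrene/", "/amphetadesk/"]
MOVED = {"/about/morbus.shtml": "/about/morbus"}


def resolve(path, cutoff=0.8):
    """Return (status, location) for a requested path; location is None unless redirecting."""
    if path in EXISTING:
        return (200, None)
    if path in MOVED:
        # A known, deliberate move: a permanent redirect keeps old links alive.
        return (301, MOVED[path])
    # Unknown path: look for a near-identical existing URL (a typo or missing slash).
    guesses = difflib.get_close_matches(path, EXISTING, n=1, cutoff=cutoff)
    if guesses:
        # Only a guess, so use a temporary redirect rather than a permanent one.
        return (302, guesses[0])
    return (404, None)
```

With this data, resolve("/gamegrene") (missing its trailing slash) returns (302, "/gamegrene/"). Known moves get a 301 because they're permanent facts; guessed corrections get a 302 so browsers and search engines don't memorize a guess that might be wrong.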
Now, as a slight precursor to the inevitable Nelson email, some of my own existing sites don't follow all the suggestions I'll give. I can (and do) fix every 404 I've ever created, but other tricks require forethought that I just didn't have when these sites were originally created. Some suggestions, like extension-less files, can often have a detrimental effect on caching servers, especially when I've been using /about/morbus.shtml for the past four years, and then suddenly change to the stronger /about/morbus (likewise with the often interchangeable www.disobey.com or disobey.com).
If you're redesigning your web site, starting fresh, or planning on launching a new sub-section, give special consideration to the suggestions above and to the documents they come from - you have the chance to do things "right" (or, at the very least, "better"), and that's not an opportunity you should pass up.
But First, The Whor... ISPs!
As with most Macintosh technologies, you may be surprised by how easy it is to turn on your built-in Apache web server. A click of a button here, a browser Location there, and you should be serving with the best of 'em, right? Well, maybe. Sorta. Perhaps. Ok, look, it really depends on your Internet connection. If you've got a dial-up connection, then you've got nary a worry. For DSL or cable users, however, it's a different matter entirely.
See, with a dial-up connection, nothing fancy really happens. You dial into your ISP, get assigned a dynamic IP address (see Figure 1), and that's about it. There's nothing funny with routers, DHCP, NAT, or acceptable usage policies. I mean, who'd be crazy enough to seriously run a web server on a dial-up modem anyways?
Figure 1. Determining your dial-up IP address.
With always-on home connections, like DSL and cable, the issues start sprouting up left and right. Your first hurdle isn't even technical in nature: unbeknownst to you, you probably agreed, on sign-up, to never run a server of any kind, be it web, FTP, Direct Connect, or whatever. Legally, this puts an immediate dent in your web serving dreams, and thus, should be the first issue you get out of the way. Check your provider's web site for an "Acceptable Use Policy", "Usage Terms", "Subscriber Agreement", or some other freedom-limiting statement of fact. For Comcast, a New England cable provider, we see only frowns:
"Whether the cable modem is owned by you or us, we have the unrestricted right, but not the obligation, to upgrade or change the firmware in the cable modem at any time that we, in our sole discretion, determine is necessary or desirable." Section 1(a).
Section 1(b) allows Comcast to "enter your premises ... at a time agreed to with you" to maintain their equipment, but that I "agree to indemnify, defend and hold harmless Comcast and its affiliates and agents against all claims and expenses (including reasonable attorney fees) arising out of any breach of this [section]." I've a nagging paranoia this means they can break into my house without legal repercussions.
"By using the Service to publish, transmit or distribute material or content, you ... consent to and authorize Comcast, its agents and affiliates to reproduce, publish, distribute, and display the content worldwide..." In other words, the drone from Sector 7G is posting your love letters near the water cooler. Section 2(c).
Since these columns deal with the web, however, let's stay on topic:
"...The Service is for personal and non-commercial use only and you agree not to use the Service for operation as an Internet service provider, a server site for ftp, telnet, rlogin, e-mail hosting, "web hosting" or other similar applications, for any business enterprise, or as an end-point on a non-Comcast local area network or wide area network." Section 5(a).
"Comcast will provide you with dynamic Internet protocol ("IP") address(es) as a component of the Service and these IP address(es) can and do change over time. You will not alter, modify, or tamper with these dynamic IP address(es) or those of any other customer. You agree not to use a dynamic domain name server or DNS to associate a host name with the dynamic IP address(es) for any commercial purpose." Section 6(f).
You may find similar limits in your own provider's usage policies - they really want you to pay for hosting with them, not set out on your own. If restrictions exist, you're left with only three options: you could find a new and friendlier provider, beg your current one to loosen its belt, or hope they don't catch you when you innocently "forget".
Where Are You Located On The Cog?
If you're shopping around for new connectivity because of a restrictive policy, here's a quick run-down of what you'll need: a static IP (i.e., one that doesn't change, unlike a dynamic address), a fast connection (DSL or higher), and no restrictions on hosting your own servers, be they mail, web, FTP, etc. You may also want to find out about additional charges for exceeding allowed bandwidth - if your site gets Slashdotted, you don't want your provider turning you off midway through. Likewise, find out their uptime, support policies, how long they've been in business, who their upstream is, and so forth. I watched a friend change DSL providers three times in one year, two of which closed up shop with only weeks' notice; that alone shows how important this decision is.
If you're not changing providers, either due to happiness or memory loss, there are a few other technicalities to worry about. First and again, you really should have a static IP - if you don't, you may end up having to use a dynamic DNS service like dyndns.org. These services assign you a domain name that follows you regardless of your IP address, which is reported back to their servers by software you install and run continually. You can use one of their 30+ vanity addresses or your own custom domain if you already have one.
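The client software these services hand you boils down to one idea: notice when your address changes, and tell the service when it does. Here's a minimal Python sketch of that bookkeeping. The report callback is a hypothetical stand-in; each dynamic DNS provider documents its own update protocol, and in practice you'd simply run the client they give you.

```python
from typing import Callable, List, Optional


class DynamicDNSClient:
    """Remembers the last reported IP and only reports again when it changes.

    `report` is a hypothetical callback standing in for whatever update
    mechanism your dynamic DNS provider actually documents.
    """

    def __init__(self, report: Callable[[str], None]):
        self.report = report
        self.last_ip: Optional[str] = None

    def check(self, current_ip: str) -> bool:
        """Report current_ip if it differs from the last one; return True if reported."""
        if current_ip == self.last_ip:
            return False
        self.report(current_ip)
        self.last_ip = current_ip
        return True


if __name__ == "__main__":
    sent: List[str] = []
    client = DynamicDNSClient(sent.append)
    for ip in ["203.0.113.5", "203.0.113.5", "203.0.113.9"]:
        client.check(ip)
    print(sent)
```

A real client would call check() every few minutes (say, in a loop with time.sleep()), feeding it the external address from whatever lookup you trust. The example addresses come from a range reserved for documentation.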
Next, find your external IP address, that is, the one that's accessible to the outside world. The most reliable way to do this is to have someone else tell you, so run off to http://checkip.dyndns.org/ (Figure 2) and write down what it says. Don't take this at face value just yet, though - before you trust this setting 100%, shut off your cable or DSL modem and wait five minutes or so. Turn it on, go back to that page, and see if the IPs are the same. If they're not, then you've got a dynamic IP that changes upon each connection to your service (which, by nature, happens far less often than with a dial-up) - you'll have to keep this in mind during power outages, service interruptions, or whenever you reboot your equipment.
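If you'd rather script the before-and-after comparison, here's a sketch using Python's standard urllib. It assumes the checkip page contains text of the form "Current IP Address: 203.0.113.5"; that format, and the regular expression built on it, are my assumptions, so if the page changes, the pattern is the part to fix.

```python
import re
import urllib.request

CHECKIP_URL = "http://checkip.dyndns.org/"
# Assumption: the page body contains "Current IP Address: a.b.c.d".
IP_PATTERN = re.compile(r"Current IP Address:\s*(\d{1,3}(?:\.\d{1,3}){3})")


def parse_checkip(page: str) -> str:
    """Extract the dotted-quad address from the checkip page text."""
    match = IP_PATTERN.search(page)
    if match is None:
        raise ValueError("no IP address found in page")
    return match.group(1)


def external_ip(url: str = CHECKIP_URL, timeout: float = 10.0) -> str:
    """Fetch the checkip page and return the address it reports."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        page = response.read().decode("ascii", errors="replace")
    return parse_checkip(page)
```

Call external_ip() before powering the modem down and again once it's back up; two different answers mean you've got a dynamic IP.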
Figure 2. Your external IP address, as reported by dyndns.org.
In practice, we're still not entirely home free... your ISP or local network configuration may also contain a proverbial wrench. Paradoxically, some of these you may not know exist until we actually turn on the web server and start fiddling with serving pages. We'll get to that in the next column, but for now, run through the list below and find out as much as you can:
Does your ISP use DHCP? If so, you may have a DHCP lease on your IP address. Depending on your ISP, these leases could expire every half hour, every month, every year, or whenever, after which you may be handed a new IP. If this is the case, you may, depending on the duration, need some sort of dynamic DNS service to follow your IP address as it changes.
Do you, or someone you love, use NAT? If you've multiple machines in your local network, your router probably has an external and internal network interface - you'll need to read your router's instruction manual to learn how to forward incoming traffic on the external interface to the right internal machine. At the very least, HTTP traffic needs to be forwarded to the machine you'll be running the web server from.
Does your ISP filter incoming traffic? Some ISPs, either because they think they'll lose money from web hosting, or because they're trying to stop the proliferation of worms like Code Red, will filter all incoming traffic on the standard HTTP port, 80. If this is the case, you'll have to configure your Apache server to listen on a non-standard port (like 8080). Likewise, your URLs will need to include that port (as in http://example.com:8080/).
Do you or your ISP have a firewall? OS X is pre-configured to allow incoming HTTP traffic on port 80, but other firewalls (either installed on your web server, or part of your existing network) will block all but a small subset of incoming traffic. Be sure to check that port 80 is allowed with no restrictions.
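For the filtering, NAT, and firewall questions, one quick test is simply trying to open a TCP connection to your external IP on port 80 (or 8080, or whatever you've settled on). Here's a sketch using Python's socket module. Run it from a machine outside your network, like a friend's, since connecting to your own external address from inside can give misleading answers depending on your router. A True result only means something accepted the connection, not that Apache is the one answering.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises an OSError subclass on refusal, timeout, or unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From outside, port_accepts_connections("203.0.113.5", 80) returning False points at one of the suspects above: a filtering ISP, missing NAT forwarding, or a firewall in the way. (The address here is a documentation example; substitute the one checkip reported.)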
Some of the above problems can be solved internally with your Apache configuration, and we'll get to those relevant changes in future columns. Other solutions, however, depend on your network and ISP's configuration - these case-by-case scenarios exist solely in your own dreamland, and thus, this Nemo can't offer assistance.
Homework Malignments
Students may contact the teacher at morbus@disobey.com.
Read the W3C's "Style Guide" and the supplementary Alertbox column.
Up, Up, Down... complete the code!
Finding out who "Mr. Earbrass" is is easy, but why is Chapter VII significant?
"The inevitable Nelson email" - huh?
Throw another one-liner on the pile.
I partially quote from a TV show. Which show?
Kevin Hemenway, coauthor of Mac OS X Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine (like the popular open-sourced aggregator AmphetaDesk, the best-kept gaming secret Gamegrene.com, articles for Apple's Internet Developer and the O'Reilly Network, etc.), he's an ardent supporter of downloading his brain into the computer and living forever. Contact him at morbus@disobey.com.