Enabling the Embedded: PHP
Volume Number: 19 (2003)
Issue Number: 11
Column Tag: Programming
UNTANGLING THE WEB
by Kevin Hemenway
We've covered CGI scripts, but how is PHP any better?
In the past five months, we've turned on our built-in Apache web server, fiddled with the quick and dirty Server Side Includes (SSI), configured CGI to run scripts written in languages like Perl, Python, and Ruby (see also Jim Menard's article on Ruby, MacTech, March 2003) and, in general, found that this high-falutin' web serving stuff ain't all that difficult. However, we've yet to launch into a peaceful meandering of PHP, one of the easiest and most popular programming languages available for web development.
In the first of assuredly many self-congratulatory, pointless, and cliched "personal milestone" articles, our amazing, exciting, over-the-top, and action packed sixth entry will... will... welp, it'll teach you about configuring and customizing PHP with your Apache web server. Quick, over there! Bearded lady! Vegas pole dancer! Scarily painted clown!
But First, The Who... Oh, Look! Historical References!
As most developers know, it's hard to talk about a language without getting into religious wars ("my recursively-named is better than your yet-another!"), personal dislikes ("whitespace?! I can't believe Python considers it syntax!"), and programming theory ("procedural is to OOP as 'that's odd' is to 'eureka'!"). Regardless, there is one clear advantage to using PHP under Apache: "forking" (or rather, the lack thereof).
See, anytime a CGI script is requested, Apache forks (or spawns) a new process to handle this request, and then executes the code within this newly created environment. This happens pretty quickly and without fuss or muss, but with one downside: every time a process is spawned, the interpreter (be it Perl, Python, Ruby, etc.) has to be reloaded into memory--nothing is remembered or cached from previous runs. The more often this happens (due to heavy incoming traffic, for example), the slower the web server will get: Apache will spend more time waiting for processes to finish than actually serving their results.
This doesn't happen with Apache's Server Side Includes. Since SSIs are built into the web server via a default module called mod_include, everything is handled internally and no forking is required. This same approach is used with the mod_php module: by preloading PHP into each and every Apache process, you remove the need for the forking and interpreter loading overhead. The only downside is a slightly larger memory footprint per process, a trade-off worth taking.
Let's see how mod_php is configured within Apache.
Enabling The PHP Module Within Apache
As you'll see, configuring PHP under Apache is very similar to what we've seen in our previous articles--this stuff should be old hat to you by now. As we've been doing from the start, to find out more about a feature, we'll search for the keyword within the httpd.conf file. Our first matches for "PHP" are our familiar LoadModule and AddModule lines:
LoadModule php4_module libexec/httpd/libphp4.so
AddModule mod_php4.c
Similar to our previous articles, these two uncommented lines (i.e., not prefaced with a # character) load the module located at /usr/libexec/httpd/libphp4.so into our Apache web server. You may notice that the module name doesn't match what we've grown to expect (mod_include for Server Side Includes, mod_cgi for CGI scripts, etc.), as it's called libphp4.so instead. There's nothing special about this... everyone calls it mod_php regardless of what the actual file representation is.
Our next "PHP" search result should also look familiar:
#
# To use PHP files:
#
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .phps
Similar to Server Side Includes, these two lines associate files ending with a .php or .phps extension with the PHP module. You'll notice there's no AddHandler equivalent like with SSI or CGI; largely, that's because the "application" of the MIME-type instructs mod_php to become their "handler". It's sort of like Adobe Acrobat Reader handling .pdf files (which have a MIME-type of application/pdf). If you've already designed your entire site around .html files and don't feel like revamping your structure or redirecting old URLs, you can modify the first line like so:
AddType application/x-httpd-php .php .html
This will enable support for PHP code in both .php and .html files. Be careful not to go nuts with this: you don't want to add file extensions "just because". Whether you actually use PHP code within an .html file or not, mod_php will process it like you did, and that can unnecessarily slow down your server when your traffic starts getting heavier.
With that, we've run out of search results in our httpd.conf--PHP has already been configured for our use. But how do we know for sure, besides attempting to run some PHP code and introducing varying levels of user error? To find out which third-party modules are loaded into your web server, check Apache's error_log where, for every startup, the "server tokens" will be logged. These tokens reflect the server version number, what operating system it's running on, and information on which third-party modules have been loaded.
Since we've been searching for "PHP" throughout the httpd.conf, let's do the same with our error_log. Figure 1 shows the results of a grep PHP /var/log/httpd/error_log shell command, listing the server tokens each restart of our Apache server logged. grep is a great tool for quickly searching a file from the command line.
Figure 1: Grepping our error_log for Apache's server tokens.
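If you'd rather let PHP do the squinting, here's a minimal sketch of pulling the PHP version out of one of those server token lines. The function name php_version_from_tokens and the sample log line (including its Apache version) are made up for illustration; the line in your own error_log will differ.

```php
<?php
// Pull the PHP version out of an Apache "server tokens" log line.
// Returns null when no PHP token is present (mod_php isn't advertised).
function php_version_from_tokens($line)
{
    if (preg_match('#\bPHP/([0-9][0-9.]*)#', $line, $match)) {
        return $match[1];
    }
    return null;
}

// Illustrative line only, not copied from a real log.
$sample = '[notice] Apache/1.3.27 (Darwin) PHP/4.1.2 configured -- resuming normal operations';

echo php_version_from_tokens($sample), "\n"; // 4.1.2
```

A null result for every startup line would suggest the LoadModule and AddModule lines never took effect.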
Now that we know PHP is enabled, let's learn more by writing our first script.
Far More phpinfo() Than You've Ever Wanted
Since PHP has been enabled for any file that ends with .php, we're going to create a test.php file within our personal user directory (/Users/username/Sites/). The contents of this file will be simply:
<? phpinfo(); ?>
Any PHP code you write will need to be sandwiched within a starting and ending delimiter, <? and ?> in this example. You may also see and use starting delimiters like <?php (often recommended for greater portability over the shorter <?), and <?= (which can be used to quickly echo a variable or expression). With our delimiters in place, we'll use one of PHP's built-in functions, phpinfo(), to spit out gobs of information about our installation. Figure 2 shows partial output of http://127.0.0.1/~username/test.php:
Figure 2: The first page of phpinfo()'s many.
Obviously, there's a lot of stuff here, and a good portion of it won't be immediately (or even ever) useful, but there are a number of interesting things to discover. I'll touch briefly on a few of the more helpful entries below, but you can always find out more by searching through the online documentation at http://www.php.net/.
- The first section we come across gives us the version number of PHP (which we previously saw in the output of our error_log), the time the module was built, and more importantly, the configuration line used to build it. This becomes helpful if we ever build our own version of the module, adding a new feature or tweaking an existing one. Also helpful is the configuration file path, which tells us where the php.ini file lives (or should live). The .ini file is similar to Apache's httpd.conf and allows us to tweak the runtime settings of the module.
- Next up is a long list of configuration directives. asp_tags, off by default, allows you to use <% and %> as your delimiters, whereas short_open_tag controls whether you can use the <? we've already encountered. display_errors should ultimately be turned off when we're ready to use PHP on a production system, and any errors configured with error_reporting should be sent to an error_log instead (though we'll use log_errors to send them to our Apache error_log -- confused yet?). magic_quotes_gpc controls whether incoming data is automatically escaped for database use; since it's on by default, we'll have to be careful to stripslashes if we use the data elsewhere. Finally, register_globals, when turned off, ensures that visitors or users can't easily pollute your namespace with GET or POST parameters, though some programs (most notably, the excellent osCommerce, http://www.oscommerce.com/) require it to be turned on.
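Scrolling through phpinfo() for a handful of settings gets old fast. As a rough sketch, the script below asks for just the directives mentioned above using PHP's ini_get function. The helper name describe_directive is hypothetical, and the wording of its output is my own; note that some of these directives were dropped from later PHP releases, in which case ini_get returns false.

```php
<?php
// Directives discussed above. Some no longer exist in later PHP releases,
// in which case ini_get() returns false for them.
$directives = array(
    'short_open_tag',
    'asp_tags',
    'display_errors',
    'error_reporting',
    'log_errors',
    'magic_quotes_gpc',
    'register_globals',
);

// Hypothetical helper: describe one directive's current runtime value.
function describe_directive($name)
{
    $value = ini_get($name);
    if ($value === false) {
        return "$name: (not known to this PHP build)";
    }
    if ($value === '') {
        return "$name: (empty, i.e. off or unset)";
    }
    return "$name: $value";
}

foreach ($directives as $name) {
    echo describe_directive($name), "\n";
}
```

Saved as a .php file next to test.php, this should give a quick before-and-after comparison whenever the configuration changes.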
With PHP certifiably enabled, let's configure it to log errors into our Apache error_log. Knowing about errors is always a very good thing, and by enabling all of them, we'll be able to write better scripts (similar to the Perl equivalents strict and warnings; see last column). To do this, we'll have to modify the php.ini file which lives in /usr/lib/.
Tweaking PHP's Initialization
There's one problem with editing this file: it doesn't exist. Since PHP is currently configured with all the standard and expected defaults, Apple never shipped a dummy php.ini file for us to modify. This isn't cataclysmic... most of the settings we care about can be modified in the relevant PHP file itself. Take, for example, the following two scripts:
script example #1:
<?php humiliation; ?>
script example #2:
<?php error_reporting(E_ALL); humiliation; ?>
Their combined output can be seen in Figure 3. As is obvious, there's an error in the code, but only when reporting is enabled would we actually be informed--meanwhile our script would trudge on regardless, perhaps getting deeper and deeper into a well of cascading problems. Further error configuration is possible by using the ini_set function to set relevant values, but who wants to worry about doing that for every .php file?
Figure 3: Our PHP script, with and without error reporting.
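For the curious, here's roughly what that per-script approach could look like. This is only a sketch: the helper name log_all_errors is my own invention, and the three settings are the ones discussed earlier (error_reporting, display_errors, and log_errors).

```php
<?php
// Hypothetical helper: call once at the top of a script (or from a shared
// include) to report everything, show nothing in the browser, and log it all.
function log_all_errors()
{
    error_reporting(E_ALL);          // report every class of error
    ini_set('display_errors', '0');  // nothing to the browser
    ini_set('log_errors', '1');      // everything to the error log
}

log_all_errors();

// error_log() writes through the same logging path. Under mod_php, with no
// error_log directive set, the message lands in Apache's error_log.
error_log('test.php: error logging is configured');
```

It works, but every script has to remember to call it, which is exactly the chore a site-wide configuration file is meant to remove.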
Our first task, then, is to create a /usr/lib/php.ini file. Thankfully, we don't have to build one from scratch, as the latest and greatest default version is available from CVS: http://cvs.php.net/co.php/php-src/php.ini-dist. If you're new to PHP or security-conscious (rightfully so), it may be better to blindly trust the more secure version, available from http://cvs.php.net/co.php/php-src/php.ini-recommended. Both are heavily commented and should be read fully to understand their ramifications.
For our purposes, we're going to use the php.ini-recommended file. It's always best to start with the most secured installation you can, then slowly open it up when you know what you're doing. The side effect of using this version is, happily enough, logging exactly how we want: none to the browser, everything to the error_log. Save the php.ini-recommended file to your Desktop, open a Terminal, and type the following:
cd Desktop
sudo cp php.ini-recommended /usr/lib/php.ini
sudo apachectl restart
Since /usr/lib/ is a protected directory, you'll need temporary super-user privileges to copy the new configuration into place. That's where sudo comes in. Likewise, since we're making a change to PHP, which is loaded as part of the Apache web server, we'll need to restart Apache with apachectl, which also assumes super-user privileges. Any time you make a change to httpd.conf or your new php.ini file, you'll need to restart Apache before the changes will take effect.
You can ensure your configuration changes are enabled by checking the output of phpinfo, as well as running the first of our script examples. The browser output will remain unchanged, but the error will be logged into Apache's error_log--see Figure 4.
Figure 4: With our new php.ini file, errors are logged to Apache's error_log.
With our new php.ini file working correctly, you'll want to keep three things in mind. First, register_globals is off (and you may have to turn it on for some applications to work properly). Second, magic_quotes_gpc has been disabled, so you'll need to addslashes if you plan on putting content into a database. Third, included files and libraries can only exist in the current directory (although you can add more lookups via include_path, which previously included /usr/lib/php).
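Two of those changes are easy to see in a few lines of code. The following is a small sketch, not a recipe: the sample string is arbitrary, and appending /usr/lib/php with ini_set is just one way to restore the old lookup for a single script.

```php
<?php
// With magic_quotes_gpc off, escaping is your job. addslashes()
// backslash-escapes single quotes, double quotes, backslashes, and NUL bytes.
$raw     = "O'Reilly";
$escaped = addslashes($raw);

echo $escaped, "\n";               // O\'Reilly
echo stripslashes($escaped), "\n"; // O'Reilly

// Append a directory to the include search path for this request only.
// ':' separates entries on Unix-like systems such as OS X.
ini_set('include_path', ini_get('include_path') . ':/usr/lib/php');
echo ini_get('include_path'), "\n";
```

To make the include_path change permanent, edit the include_path line in php.ini instead and restart Apache.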
Homework Malignments
In our next column, we'll take a look into databases, specifically the MySQL server, and how to integrate our PHP scripts with them. Because it's so easy to use PHP to hook into any database, you'll often be hard-pressed to find a script or application that doesn't require the existence of one. Also, with Panther available by the time you read this, we'll cover any relevant differences in your web serving capabilities. For now, students may contact the teacher at morbus@disobey.com.
- I missed an opportunity to say "stick a fork in it; it's done".
- If you're looking to pick up a book on developing sites with PHP, check out PHP and MySQL Web Development by Luke Welling and Laura Thomson. The Second Edition is available from Sams Publishing and covers programming in PHP, fiddling with your first database and SQL statements, and then combining the two to create a number of different applications to learn practical techniques from.
- You may have noticed that the PHP shipped with OS X is only 4.1.2, even though, at the time of writing, 4.3.3 is available. If you absolutely must be using the latest version, be sure to check out Marc Liyanage's downloadable package available from http://www.entropy.ch/software/macosx/php/. It's full of features not enabled in Apple's version (compare http://www.entropy.ch/software/macosx/php/test.php to ours), and is easily removable for when you want to take a step back to the defaults.
Kevin Hemenway, coauthor of Mac OS X Hacks and Spidering Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine (like the popular open-sourced aggregator AmphetaDesk, the best-kept gaming secret Gamegrene.com, the ever ignorable Nonsense Network, etc.), he's trying desperately to find time to work on his next book outline. Soon, he says, soon. Contact him at morbus@disobey.com.