Perl and Apache


March 1998 / Web Project / Perl and Apache

Apache's embedded Perl interpreter, mod_perl, powerfully integrates Web services with scripting.

Jon Udell

Lots of people use Apache and Perl, but I've met only a few who use mod_perl, Doug MacEachern's extraordinary synthesis of these two technologies.

Why? Because mod_perl is, frankly, scarier than a typical Apache module. It doesn't just attach to one of Apache's module hooks, à la mod_auth or mod_rewrite. It can attach to all of them, so the Perl interpreter that it binds into the server can directly implement Apache extensions. It also exposes Apache's configuration and run-time data structures to Perl, so Perl code in the Apache configuration file or in conventional scripts can manipulate these structures. Plus, mod_perl isn't just "fast Perl for Apache" -- although it is surely that, too. It's a deeply integrated Apache/Perl hybrid.

To use mod_perl effectively, Perl and Apache users alike need to acquire some new skills. The learning curve's a little steep, but the rewards are substantial.

An Introduction

I'll assume that you're running some flavor of Unix because, although an NT port of Apache is in the works, it's not quite ready for prime time yet. Begin by retrieving the latest recommended Perl distribution (see http://www.perl.com/). Build, test, and install Perl -- that is: ./Configure, make, make test, make install.

Now do the same for Apache (see http://www.apache.org/). Configure Apache to suit your taste -- that is, edit the Configuration file so that it includes the modules you want -- and then build (./Configure, make) and test (adjust settings in httpd.conf, run httpd, and point a browser at it). Now retrieve mod_perl (see http://www.perl.com/CPAN/). Here's the drill: perl Makefile.PL, make, make test (this step requires the LWP module), make install.

When you run mod_perl's Makefile.PL, it will locate your Apache source tree and offer to build the new, Perl-ified httpd in that location. Don't forget to run make install because, in addition to the new httpd, mod_perl comprises a set of Perl modules that must be added to your Perl installation.

If all went well, your Apache httpd is now much fatter. Should you worry? Admittedly, it's a concern. Since Apache, like most Unix Web servers, uses the flock-of-daemons approach to scalability -- one master process and 10 or 20 or more children handling requests -- the extra bulk of mod_perl multiplies accordingly. Throw lots of memory at the problem, and it will go away.

Alternatively, you can partition your Web application into dynamic parts that require the services of mod_perl, and static parts that don't. So, for example, a handful of mod_perl daemons listening on port 81 might serve the computational needs of a large flock of standard Apache (or other) httpds listening on port 80.

Next you'll want to try the mod_perl version of "Hello, world." Standard CGI Perl is governed by a line like the following in httpd.conf:

ScriptAlias /cgi-bin/ /cgi-bin/

If the file /cgi-bin/hello.pl contains the following:

#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Hello from $ENV{'GATEWAY_INTERFACE'}";

then invocation of the URL /cgi-bin/hello.pl from a browser will produce the phrase "Hello from CGI/1.1" on-screen. Behind the scenes, the Web server spawns a Perl process to achieve this effect. Eliminating that process-creation overhead is one of the major benefits of mod_perl. Here's the standard recommended setup in httpd.conf:

Alias /perl/ /perl/

<Location /perl>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options ExecCGI
</Location>

This incantation names the directory /perl as a place where mod_perl scripts can live. And it establishes Registry.pm, a crucial Apache/Perl module, as the handler for Perl scripts that run from that directory. If you copy hello.pl to /perl and invoke it from a browser, the phrase "Hello from CGI-Perl/1.1" should appear.

Behind the scenes, things are quite different from the CGI example. The Web server does not need to spawn a Perl process to run this code because it already contains Perl. In this respect, mod_perl resembles Win32 ISAPI Perl. Both implementations are much quicker than conventional CGI Perl because the interpreter shares the Web server's process. However, mod_perl's performance edge goes beyond that of ISAPI Perl.

Two Kinds of Cached Compilations

An ISAPI Perl version of hello.pl is compiled once per invocation. So while Perl itself springs into action much more quickly than it would with conventional, out-of-process CGI, it must still do the work of compilation once per request. For a toy program like hello.pl, that work is negligible, but for real Perl programs with hundreds or thousands of lines of code, it becomes significant.

Consider CGI.pm, a very popular Perl module that offers a wealth of CGI-related services. You can use CGI.pm under ISAPI Perl, but you might not want to, because each time a client invokes a script that contains the statement use CGI; there is a perceptible delay as Perl compiles the module.

Can't CGI.pm's components be brought in individually? They can, and it's often a good idea to pull in only the ones you need. But the fact remains that each component you use is compiled once per request.

With mod_perl's Registry.pm, CGI.pm compiles only once per httpd, and thereafter is instantly available to calling scripts. How? Registry.pm conjures up its own Perl package, compiles your scripts into that package's namespace, time-stamps all the code, and recompiles a script only if its source file is newer than the compiled code.
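To make that mechanism concrete, here is a minimal sketch of the idea in plain Perl. It is not Registry.pm's actual code: the package-naming scheme, the %compiled cache, and the package_for and run_script names are assumptions for illustration. Each script's source is wrapped in an anonymous sub inside its own package, compiled with a string eval, and recompiled only when the file's modification time is newer than the cached copy.

```perl
use strict;
use warnings;

# Hypothetical cache: script path => { mtime => ..., handler => CODE ref }
my %compiled;

# Turn a file path into a unique package name (an assumed scheme, not
# the one Registry.pm actually uses).
sub package_for {
    my ($path) = @_;
    (my $suffix = $path) =~ s/[^A-Za-z0-9_]/_/g;
    return "Sketch::ROOT::_$suffix";
}

# Run a script, compiling it only on first use or when its file is newer
# than the cached compiled copy.
sub run_script {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    die "Cannot stat $path: $!\n" unless defined $mtime;

    my $entry = $compiled{$path};
    if (!$entry || $mtime > $entry->{mtime}) {
        open my $fh, '<', $path or die "Cannot open $path: $!\n";
        my $source = do { local $/; <$fh> };
        close $fh;

        # Wrap the script body in an anonymous sub inside its own package.
        my $package = package_for($path);
        my $handler = eval "package $package; sub {\n$source\n}";
        die "Compile error in $path: $@" unless $handler;

        $entry = $compiled{$path} = { mtime => $mtime, handler => $handler };
    }
    return $entry->{handler}->();
}
```

Calling run_script twice on an unchanged file reuses the cached sub; touching the file so its timestamp advances triggers one recompilation in that process, which mirrors the per-httpd behavior described here.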

Now for one of the interesting things about mod_perl that took me a while to get used to. Although you can use this compile-on-demand feature for a package like CGI.pm, you probably don't want to. Instead, you should use one of several httpd.conf directives to load CGI.pm when the server starts up. Here's one approach:

PerlScript /perl/startup.pl

This directive loads startup.pl when the server starts. If startup.pl contains use CGI;, it's compiled and made available to all subsequent scripts handled by mod_perl. Alternatively, you can do this:

PerlModule CGI

This directive names up to 10 Perl modules that should load at server startup.

Why not just let Registry.pm handle the caching of this code? Because Registry.pm compiles each script into a unique package namespace, and that namespace becomes cluttered and unwieldy if you pull lots of standard methods into it. So there are two different code caches. The "startup" cache, loaded by the PerlScript and PerlModule directives, is immutable: if you change CGI.pm or another module, you need to restart Apache to propagate those changes to the mod_perl environment.

The "runtime" cache, maintained by Registry.pm, is, on the other hand, mutable. If you alter /perl/hello.pl and rerun it, you will see the result of your change immediately. Registry.pm, noticing a newer source file, automatically updates the code cache. This occurs on demand once per Apache httpd -- that is, each instance of Apache pays a one-time cost to recompile that script, and it does so only when the changed script is first invoked.

These two strategies are complementary. Per recommended Perl practice, I've divided a mod_perl application that I'm currently developing into a set of modules that export core services and a set of scripts that use those services. Because the modules change infrequently, it makes sense to compile them once at server start-up. Because the scripts change often, it makes sense to compile them on demand using Registry.pm.
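The module/script split might look like the following single-file sketch. The MyApp::Core package and its format_greeting function are hypothetical: the package stands in for a module you would preload with PerlModule, and the code under package main stands in for a script that Registry.pm compiles on demand. In a real installation the package would live in its own MyApp/Core.pm file and the script would say use MyApp::Core qw(format_greeting);.

```perl
use strict;
use warnings;

# --- Would live in MyApp/Core.pm and be preloaded at server start-up ---
package MyApp::Core;
use Exporter 'import';
our @EXPORT_OK = qw(format_greeting);

sub format_greeting {
    my ($name) = @_;
    return "Hello, $name";
}

# --- Would live in a script under /perl, compiled on demand ---
package main;
# In one file we import at run time, after @EXPORT_OK has been set; a
# separate script would get the same effect from `use MyApp::Core`.
MyApp::Core->import('format_greeting');

my $greeting = format_greeting('world');
print "$greeting\n";
```

Because the module rarely changes, it pays its compilation cost once at start-up; the script on top of it can be edited freely and recompiled per httpd as needed.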

Avoiding Pitfalls

The cardinal rule of mod_perl is to preface every module and script with the statement use strict;. This oft-ignored tenet of good Perl practice will, among other things, prevent accidental use of undeclared global variables.

Consider the difference between standard CGI-based Perl and mod_perl. In CGI Perl, the interpreter starts up, loads modules, runs a script, and then goes away. The whole Perl environment is transient. Even here you can get into trouble with global variables. Suppose a module opens a global $DebugFile. Then a script, expecting its own $DebugFile, does the same thing. If the module and the script intend to open different files, there's going to be a problem: The global variable is the same, and so is the file it represents.
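Here is a minimal sketch of that collision. To keep it self-contained, the module and the script store file names rather than open handles, and the DebugLogger and SaferLogger packages and their subs are hypothetical. The first half shows both parties sharing $main::DebugFile; the second shows the strict-friendly fix of giving each its own lexical.

```perl
use strict;
use warnings;

# A hypothetical module that records its debug file in a global.
package DebugLogger;
sub init { $main::DebugFile = 'module-debug.log' }

# A script that expects a $DebugFile of its own, but gets the same global.
package main;
our $DebugFile;
sub script_setup { $DebugFile = 'script-debug.log' }

script_setup();
DebugLogger::init();               # silently overwrites the script's value
print "script's \$DebugFile: $DebugFile\n";    # module-debug.log

# The fix: each party keeps a file-scoped lexical that no one else can see.
{
    package SaferLogger;
    my $debug_file = 'module-debug.log';   # invisible outside this block
    sub debug_file { return $debug_file }
}
my $script_debug_file = 'script-debug.log';
print "script: $script_debug_file, module: ", SaferLogger::debug_file(), "\n";
```

Under use strict, an unqualified, undeclared $DebugFile would not even compile, which forces exactly this kind of explicit choice between a deliberate global and a private lexical.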

With mod_perl, there is far greater danger. Each child process inherits the parent's global Perl namespace and then handles many transient scripts that can all scribble on that copy of the namespace.

It's true that the mod_perl environment is not actually immortal. Apache's MaxRequestsPerChild directive (default: 30) sets an upper bound on the lifetime of every httpd. Child processes that reach this limit expire and are replaced. This cleansing mechanism, intended to limit Apache's vulnerability to memory leaks, also forces a periodic flushing of the mod_perl environment. Nevertheless, the potential havoc that can be caused by contamination of Perl's global namespace makes the discipline of strictness well worthwhile.
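The following sketch simulates that persistence. The handle_request sub stands in for a script that Registry.pm compiles once and then runs for every request a given child handles; the sub name and @seen_users are hypothetical. The lexical starts fresh on each call, while the package global keeps accumulating across "requests".

```perl
use strict;
use warnings;

# Stand-in for a script that Registry.pm compiles once and then runs for
# every request the same child process handles.
our @seen_users;                    # package global: outlives each request

sub handle_request {
    my ($user) = @_;
    my @this_request = ($user);     # lexical: created fresh on every call
    push @seen_users, $user;        # global: keeps growing
    return (scalar @this_request, scalar @seen_users);
}

for my $user (qw(alice bob carol)) {
    my ($lexical, $global) = handle_request($user);
    print "request for $user: lexical=$lexical global=$global\n";
}
# The lexical count stays at 1; the global climbs 1, 2, 3 and would keep
# climbing until MaxRequestsPerChild retires the process.
```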

Database Connection Caching, the Wrong Way

Inheritance of the master daemon's Perl namespace, though dangerous, has its uses. And mod_perl depends on this effect to make preloaded modules universally available to all children. It's tempting to try to cache handles to your own data this way. Here's a naive attempt to use a global variable as a persistent database handle:

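use strict;
use Fcntl;       # exports O_RDWR and O_CREAT
use SDBM_File;
our %myData;     # deliberately global: this is the naive approach
tie(%myData, 'SDBM_File', 'data',
    O_RDWR|O_CREAT, 0666) or die "Cannot tie data: $!";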

This fragment uses Perl's tie facility to associate a DBM file (disk-based hash table) with the Perl hash table %myData. After this fragment executes, the statement

$myData{'Jon'} = 41;

does two things. It inserts the key 'Jon' into an in-memory hash table, along with the value 41. And it synchronizes a permanent on-disk representation of the hash table with the transient in-memory table.
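The fragment above can be turned into a self-contained round trip that shows the value really lands on disk. This sketch assumes a temporary directory (via File::Temp) rather than the bare file name 'data', and it re-ties the file with a brand-new hash to read the value back.

```perl
use strict;
use warnings;
use Fcntl;                       # exports O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data";          # SDBM creates data.pag and data.dir

my %myData;
tie(%myData, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $file: $!";
$myData{'Jon'} = 41;             # written through to the DBM file
untie %myData;                   # close the DBM file

# A fresh hash tied to the same file sees the stored value.
my %reopened;
tie(%reopened, 'SDBM_File', $file, O_RDWR, 0666)
    or die "Cannot re-tie $file: $!";
my $value = $reopened{'Jon'};
untie %reopened;
print "Jon => $value\n";         # prints "Jon => 41"
```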

This is just standard Perl DBM practice. But suppose you include the tie construct in startup.pl and instruct mod_perl to run startup.pl when Apache starts:

PerlScript /perl/startup.pl

Now the global variable %myData is part of mod_perl's environment, available to scripts. For example, a script called lookup.pl could retrieve the value of 'Jon' like this:

print $main::myData{'Jon'};

The tied hash variable looks like a kind of persistent database handle. Scripts running in any Apache child process can read, and even write, the keys and values of this database.

This scheme is incredibly fast. Unfortunately, it's also completely unreliable. And it's more fatally flawed than you might suspect. I thought at first that it would be safe to read values from the table but that some record-locking protocol (which most DBM implementations lack) would be needed in order to write values safely.

Wrong! Even reading is unsafe, as I learned from Rob Hartill, who develops the Internet Movie Database and contributes frequently to the Apache/Perl mailing list (modperl@listproc.itribe.net, archived at http://outside.organic.com/mail-archives/modperl/). Under heavy multiuser load, Rob says, reading the same key twice can produce different results. I tried a test myself by running many concurrent instances of a script that exhaustively read a tied hash. Read errors appeared, and they multiplied in proportion to the number of concurrent scripts.

How can this be? It's a consequence of the way Unix's fork mechanism works. It literally clones a process. If the instance of Perl in the master process has a file descriptor that governs access to a DBM file, children inherit that same file descriptor and can interfere with one another's positions in the file.

To safely read the same file from multiple children, you have to open the file once per child so that each child has its own independent file descriptor. The same rule applies to database connections. You could cache a database handle at server start-up, but that wouldn't be very useful. What's needed is a way to cache a database handle once per child process. Happily, an indispensable module called Apache::DBI does exactly that.
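Applied to the DBM example, the open-once-per-child rule can be sketched by remembering which process opened the file and reopening whenever $$ (the current process ID) differs, as it does in a freshly forked child. The dbm_handle helper and its cache layout are hypothetical, not part of mod_perl.

```perl
use strict;
use warnings;
use Fcntl;                       # exports O_RDWR and O_CREAT
use SDBM_File;

# Hypothetical per-process cache: path => { pid => ..., db => \%tied_hash }
my %dbm_cache;

sub dbm_handle {
    my ($path) = @_;
    my $entry = $dbm_cache{$path};

    # Reuse the handle only if this very process opened it. After fork(),
    # $$ changes in the child, so the child opens its own descriptor
    # instead of sharing the parent's file position.
    return $entry->{db} if $entry && $entry->{pid} == $$;

    my %db;
    tie(%db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $path: $!";
    $dbm_cache{$path} = { pid => $$, db => \%db };
    return \%db;
}
```

A script would call dbm_handle('data')->{'Jon'} instead of touching a hash tied in startup.pl; the first call in each child pays the cost of opening the file, and later calls in that child reuse it.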

Database Connection Caching, the Right Way

Once you've eliminated the overhead of process start-up, by locating Perl inside Apache, the next key performance issue becomes fast database access. The holy grail of script-driven Web pages that fetch SQL data is to maintain a persistent connection between the script engine and the database. Here's how:

1 Install and test the DBI module (see http://www.perl.com/CPAN/ ).

2 Install and test the DBD driver for your database. Your test script should begin with use DBI; and then open a connection and read and write some data. When run from mod_perl, this test script issues a sequence of calls like this:

my $dbh = DBI->connect(...);
my $sth = $dbh->prepare(...);
$sth->execute(...);
$dbh->disconnect;

3 Install and test the Apache::DBI module. Now configure httpd.conf like so:

PerlModule Apache::DBI Apache::DebugDBI

Then remove use DBI from the test script, restart Apache, and repeat the test several times. Apache's error log should look like this:

new connect to...

already connected to...

Here's what's happening. Apache::DBI filters all DBI requests. Once per httpd, it honors DBI->connect. Subsequently, it hands callers a cached database handle. (Apache::DebugDBI produces the audit trail; you can turn it off as soon as you've proved that things are working.) Note that the persistent handle must be established in a run-time script, rather than a start-up script. Nothing prevents you from opening a handle in startup.pl, and that handle indeed persists and is visible to all child processes. But it's a per-server handle, which can't reliably be shared by multiple children.
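The connect-once-then-reuse behavior can be sketched in plain Perl. This is not Apache::DBI's source: cached_connect, the injected $connector code reference (a stand-in for DBI->connect, so the example runs without a database), and the log lines, loosely modeled on the ones above, are all hypothetical. The cache key includes the process ID, so every child gets its own handle, matching the per-child rule.

```perl
use strict;
use warnings;

my %handle_cache;    # key built from pid and connect args => handle

# $connector stands in for DBI->connect and runs only on a cache miss.
sub cached_connect {
    my ($connector, $dsn, $user, $password) = @_;
    # Keying on $$ gives every forked child its own handle.
    my $key = join "\0", $$,
        map { defined $_ ? $_ : '' } $dsn, $user, $password;

    if (my $cached = $handle_cache{$key}) {
        warn "already connected to $dsn\n";
        return $cached;
    }
    warn "new connect to $dsn\n";
    my $dbh = $connector->($dsn, $user, $password);
    $handle_cache{$key} = $dbh if $dbh;
    return $dbh;
}
```

The first call in a child logs a new connection; every later call with the same arguments gets the same handle back without reconnecting.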

Using mod_perl and Apache::DBI, I've prototyped a Perl application that does multiple database lookups behind each dynamically generated Web page without the slightest hint of delay. I've wondered whether it would really be possible to bring Perl's power and productivity to bear on major-league Web applications. The work of the Apache/Perl integration project has brightened the picture considerably.


TOOLWATCH

EventSLog................................shareware
Internet: http://www.adiscon.com/

Mixed Unix/NT installations can now centralize system logs using this nifty tool that pumps NT's System, Application, and Security logs out to a Unix syslog daemon.


BOOKNOTE

UML Toolkit.................................$49.99
by Hans-Erik Eriksson and Magnus Penker
Wiley Computer Publishing
Internet: http://www.wiley.com/compbooks/
ISBN 0-471-19161-2

UML, the Unified Modeling Language, defines a common approach to object-oriented modeling. This tutorial walks you through various UML design scenarios and includes a case study complete with a Java implementation.


Two Levels of Code Caching with mod_perl


Perl modules, such as Apache::DBI and CGI, can be compiled and cached at server start-up by means of the PerlModule directive.




Jon Udell is BYTE's executive editor for new media. You can reach him by sending e-mail to jon.udell@byte.com .
