

April 1994 / Hands On / Developing Applications in Perl

This public domain language now runs on every major operating system and has solved countless problems for developers

Tom Christiansen

Perl, an interpreted programming language originally designed for text processing and manipulation of files and processes, provides a rich environment for systems programming. While the language was originally written by Larry Wall, then a harried Unix system administrator, as an alternative to the Unix shell for high-level systems programming, it turns out that many general programming problems of short to medium degrees of complexity can be easily expressed in Perl.

By combining the high-level primitives of several Unix workhorses into one easy-to-use, highly efficient, interpreted language, Perl provides a versatile power tool for crafting a custom solution with a minimum of time and effort. It is an effective way to manipulate text, data, files, and even processes. While it was first developed for Unix and runs on virtually any Unix system, Perl now also runs on a multitude of operating systems, including VMS, MS-DOS, Windows NT, and the Amiga and Apple Macintosh operating systems.

Although users of Unix systems will be quick to pick up much of the philosophy and style of approach embodied by Perl due to its roots in Unix shell and C programming, users of other operating systems stand to gain even more. That's because non-Unix systems seldom come with a good tool set for crafting quick solutions to the myriad little text-related problems that crop up. Once you put Perl on your system, you've got everything you need for such tasks in just one application.

With Perl, those two hobgoblins of programming, data typing and memory allocation, disappear as issues. Data typing is trivial because everything in Perl is a string. You can, however, have lists and tables of strings to build up more complex data types. If you perform a numeric or Boolean operation on a Perl value, it gets converted for you automatically. No more remembering whether a variable is a string, a character, a byte, a short integer, or a double-precision floating-point number. For example,

print "How many days? ";
$days = <STDIN>;
$months = $days / 30;
print "That's around $months months\n";

Notice that you can operate directly on the $days variable without first converting the string just read into a numeric value.
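The sketch below makes those conversion rules concrete. It stands in for the line `<STDIN>` would return with a fixed value, "90\n", which is an assumption made only for this demo. It then uses that value in numeric, string, and Boolean contexts. It sticks to Perl 4 syntax and should also run under a current Perl 5.

```perl
$days = "90\n";                    # what <STDIN> might return, newline included
$months = $days / 30;              # numeric context: "90\n" is treated as 90
$report = "That's around $months months";   # string context: 3 becomes "3"
$empty_is_false = ("" ? 1 : 0);    # Boolean context: the empty string is false
print "$report\n";
```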

The interpreter takes care of all memory handling. You don't have to declare anything if you don't want to, since variables spring into existence when you first mention them--although, as you'll see, there may be times when you'll want to use local variables. You don't have to concern yourself with whether a string is long enough to hold a value or whether an array has enough elements in it. You just do whatever you want, and Perl automatically allocates (and later deallocates, if necessary) any memory needed.

That means it's perfectly fine to do something like this as the first line in your program:

$a[500] = "hello";

You never bothered to declare any array, but right away you assign to element 500 of it. That is the 501st element, since subscripts start at 0, and Perl quietly extends the array to make room. The procedure couldn't be easier.
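This quick check shows what that single assignment actually creates. The variable names other than @a are only for illustration.

```perl
$a[500] = "hello";        # first mention of @a: no declaration needed
$count = @a;              # an array in scalar context yields its length: 501
$last_index = $#a;        # highest subscript: 500
print "count=$count last=$last_index\n";
print "element 0 is ", (defined($a[0]) ? "defined" : "undefined"), "\n";
```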

Rapid Prototyping

One thing that Perl is great for is rapid prototyping. It provides an easy way to take care of your quick-and-dirty programming. This is an attractive aspect of interpreter programming that users of BASIC will recognize and appreciate. You simply think about what you need to do to solve the problem, and then you type it in using straightforward but high-level constructs. You needn't be a systems programming wizard to do pretty sophisticated systems programming.

A program may take a bit longer to run in Perl than in C--usually around e times longer (e = 2.71828...)--but it takes only one-tenth the time to write it. You're trading cheap machine cycles for expensive people cycles. The flexibility of interpreted languages makes them an easier medium than compiled languages for quickly developing application code.

When prototyping, don't get bogged down with little details, efficiency concerns, or aesthetic appeal. The most important thing about your prototype is that it should work. It doesn't have to be particularly efficient, nor particularly pretty. And it certainly doesn't need to be clever--that just gets in the way. After all, it's just a prototype. In writing, you often throw away the first draft or two; you should consider doing the same thing with most programs. By the second rewrite, you'll have code that's cleaner, more efficient, and more maintainable than your first stabs at sketching out the problem.

Once you're done with your prototype, you may choose to convert it into C (this has to be done by hand) and then compile it all the way into machine code. Or maybe not--it may well be that you'll decide it's plenty fast enough as it is, or that a bit of performance tuning in Perl will suffice to make it so.

Even if you do choose to convert your code into C, you'll find you've spared yourself most of the laborious effort of developing and debugging your original algorithm. Because Perl is not only an interpreter but also a forgiving one, it's easy to make small changes in your program and quickly find out what effect they have on its overall behavior.

In developing your prototype, there's no reason to stop using a reasonable amount of software engineering. By this I mean using a bit of structured programming: Break up your large problem into smaller, manageable problems, and put each of these into its own subroutine. Even when you aren't going to call a function more than once, you should still put it into its own routine to abstract out the low-level details; that's what prototyping is about. For example,

sub do_it_all {
    &do_this();
    &do_that();
    &do_the_other_thing();
}

sub do_this {}
sub do_that {}
sub do_the_other_thing {}

Notice that I haven't filled in what those other subroutines do. That's OK. When first sketching out how the program works, it's more important to figure out what happens when than to know the low-level details of precisely how something's happening. Those you can fill in later.

At the topmost level, it's perfectly all right to have functions without parameters; these might adjust some global variables and then call things further down. But at the lower-level functions, you really should pass each routine its own arguments and have those routines maintain their own local variables. Avoid even looking at global variables if you can help it, and if you can't, make sure they're clearly marked out. In small programs, this doesn't matter so much; in larger ones, it's essential.
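Here is a minimal sketch of a low-level routine written that way. The name &average and its data are hypothetical. The routine works only on its arguments (passed in @_) and its own local variables, so a global that happens to share a name with one of its locals comes through the call untouched.

```perl
# Hypothetical low-level routine: everything it needs arrives through @_
sub average {
    local(@numbers) = @_;    # copy the arguments into a local array
    local($sum) = 0;         # local: any global $sum is saved now and restored on return
    foreach $x (@numbers) {  # foreach makes $x local to the loop
        $sum += $x;
    }
    return @numbers ? $sum / @numbers : 0;   # array in numeric context = its length
}

$sum = "caller's value";     # a global that happens to share the name
$mean = &average(2, 4, 9);   # (2 + 4 + 9) / 3
print "mean=$mean sum=$sum\n";
```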

Perl has a notion of global versus local variables that may seem curious at first but actually makes things easier for the kind of programming you're most likely to use it for. All variables are global unless declared local, and global variables themselves aren't declared at all: Variables just spring into existence when first mentioned. This makes it much easier to sketch out your quick-and-dirty program than if you had to declare every possible variable. But it means it's easy to touch a global variable even if you don't mean to.

Another thing that may surprise you about Perl's local variables is that they are dynamically scoped, not lexically scoped. That means that a subroutine inherits all the local variables that were visible in its caller. In practice, this feature should get you into trouble only if you're intentionally modifying global variables while at the same time creating local variables named exactly the same as the global ones--hardly a good idea in anyone's book.
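This small sketch shows what dynamic scoping means in practice; the names are hypothetical. &show_level() declares no variable of its own, so it sees whichever $level is in effect when it is called. That includes a local() value set up by its caller, and the global value returns once that caller exits.

```perl
$level = "global";

sub show_level {
    return $level;                        # no local here: reads whatever is in effect
}

sub caller_with_local {
    local($level) = "local to caller";    # dynamically scoped, visible to callees
    return &show_level();
}

$inside  = &caller_with_local();          # "local to caller"
$outside = &show_level();                 # "global": the local() was undone on return
print "$inside / $outside\n";
```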

All in all, Perl is just trying to be helpful and convenient, letting you create and access variables without a lot of the rigmarole you have to go through in more exacting languages. But if this fast-and-loose sort of programming puts pitfalls in your path, there are some strategies you can use to get through it without mishap.

By far the most important way you can help yourself is by using Perl's -w flag. It catches semantic mistakes and error-prone constructs that you might otherwise miss, such as using a variable before you've assigned a value to it or trying to write to a file that isn't open. It gives both compile-time warnings when the program is first parsed and run-time warnings while it's executing. If you're a C programmer, think of it as a lint for Perl--except it's a lint that's resident during program execution as well as at compile time. This allows Perl to catch mistakes that lint never could. The number of bewildered programmers who come to me with Perl problems that the -w flag would have instantly alleviated is lamentably large.
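The sketch below shows the kind of mistake -w catches: using a variable before anything has been assigned to it. Two assumptions apply. First, it is written for Perl 5, because the $SIG{__WARN__} hook it uses to collect the warnings is not part of Perl 4. Second, it sets $^W in a BEGIN block, which has roughly the same effect as putting -w on the #! line.

```perl
BEGIN {
    $^W = 1;                                             # turn warnings on, as -w would
    $SIG{__WARN__} = sub { push(@warnings, $_[0]) };     # collect warnings (Perl 5 only)
}

$total = $count + 1;    # $count was never assigned: -w reports an uninitialized value

print "warning: $_" foreach @warnings;
```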

Another simple mechanism you can use to help you (and, more important, those who come after you) know which variables are doing what is to use the variables' case to provide a clue to their intended scope. This technique is sometimes used in large C programs (in C++, it's not really necessary). Since case is significant in identifiers, you might use all uppercase to indicate a constant, all lowercase to indicate a local variable, and mixed case to indicate a global variable. Thus, $START would be something that never changes in the program, $tempfile would be some local variable, and $Update_Time would be a global variable.

Which particular scheme (if any) you select for this is much less important than simply being consistent about it. While you shouldn't become overly complacent and assume that case always conveys scope as defined by the language, it can be a useful style for helping readers of your program understand its structure.
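As an illustration of that scheme, the sketch below uses the variable names from the text; the routine &note_update and its argument are hypothetical. Case alone tells the reader which value is a constant, which is local to the routine, and which global the routine deliberately changes.

```perl
$START = 1;                  # all uppercase: a constant, never reassigned
$Update_Time = 0;            # mixed case: a global shared by several routines

sub note_update {
    local($tempfile) = @_;   # all lowercase: local to this routine
    $Update_Time = time;     # mixed case flags that a global is being changed
    return "updated $tempfile";
}

$result = &note_update("/tmp/lst.out");
print "$result at $Update_Time\n";
```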

This strategy is probably less important in rapid prototypes (that's manager talk for quick hacks) than in larger programs. It makes the most sense in a program that's going to be sticking around for a while and needs to be maintained by other folks--and remember that, three years down the road, you yourself might as well be another person.

A more sophisticated technique for controlling access to identifiers is to employ packages. Perl packages provide for module initializations, variables and functions private to a function or set of functions, and static variables. This last group consists of variables whose values don't change between the function's invocations. You'll often see these used in robust library code. They help assure you that you aren't messing with someone else's variables and someone else isn't messing with yours. A package also lets you define code to be executed at run time before any routines in that package can be called--something you can't guarantee in C (although you can in C++).
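Here is a minimal sketch of a package that keeps its own state between calls. The package name Counter and the routine &next_id are hypothetical. In this style, the privacy comes from a separate namespace: $Counter::count is a different variable from main's $count. The language does not actually forbid outside access.

```perl
package Counter;

$count = 0;                   # initialization: runs when this code is reached, before any call

sub next_id {
    return ++$count;          # $Counter::count keeps its value between calls, like a C static
}

package main;

$count = 100;                 # main's $count: an unrelated variable
$id1 = &Counter::next_id();   # 1
$id2 = &Counter::next_id();   # 2
print "ids $id1 $id2, main count $count\n";
```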

Now that Perl has relieved you of worrying about nitty-gritty, low-level programming matters like typing and allocation, you can get down to the business at hand: coding up your problems. As you do this, though, you're likely to make some small but mysterious mistakes along the way whose nature won't be immediately obvious. When that happens, you'll want to debug your program.

If you're programming in the shell, that means inserting echo commands. If it's an awk program you're coding up, then you'll probably be using print statements. Unfortunately, neither of these methods helps much--at least, not when you compare them to a real debugger.

One of the tremendous advantages of using Perl over shell scripts for many programs is that Perl comes with a full-fledged, integrated symbolic debugger. It's so integrated into the language that it isn't even a separate process. It's simply a mode of the existing interpreter, enabled by the -d switch, together with a customizable library file.

Combine this with the way the Perl interpreter allows you to access much of its internal state (e.g., symbol tables) through special variables, and you can get at everything right from the debugger. You can set breakpoints, examine and change variables, search for source lines with regular expressions, get stack backtraces, and do pretty much everything that you're used to doing in a C-level debugger like dbx or gdb. Because you're running with the full Perl interpreter under your belt, you can type in any legal Perl code and have it executed on the fly for you--a convenient way to test out new constructs.

Perusing the Perl Library

While a rapid prototype is all well and good, there's no reason to rewrite everything from scratch every time you code up an application. Use existing wheels; don't reinvent them. As you become more experienced, you will want to extract your most useful subroutines and place them in your own private library. Later you can load those archived functions into a new application and use them as though they came from a system-supplied library.

But before you write your own library functions, you should know that the Perl distribution already comes with a fair allotment of standard libraries. These include functions for option processing, unlimited-precision numbers, screen manipulation, binary searches on sorted files, and recursive directory processing.

Just how do you get at these libraries from Perl? The basic statement to load a library from within your program is require, as in

require 'getopts.pl';

Once this is done, you're free to call any functions loaded by that library, although you do have to know the names of the functions you have just loaded. In the above case, the function is not called getopts. It is called Getopts (remember that, as in C, identifiers are case-sensitive). The listing "Using a Library Function" shows how to use &Getopts().

Like nearly everything in the language (including local variable "declarations"), require is a run-time event, not a compile-time one. Perl loads the required file only once, no matter how many times you ask for it. This is a feature, because it lets you write code that includes library routines willy-nilly. You don't have to worry that you're doing extra work if the routines you've required have themselves already required something you're about to load: It won't get loaded twice.

Requires don't always succeed. The require will fail by raising a trappable but otherwise fatal exception if any of these occur: The file can't be found in your include path (the @INC variable); the code in the required file has syntax errors in it; or the file doesn't return a true value. This last may need a little explaining. It's there so that you can try to run some routine-specific start-up code and have a clean way to indicate whether it has succeeded or failed. In practice, few library functions take advantage of this; they just finish off the file with a line containing a 1;, which is certainly a true value.
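The following sketch exercises these rules. It writes a tiny hypothetical library, ./mylib.pl, into the current directory; this assumes that directory is writable. It then requires the library twice, which loads it only once, and traps the exception raised by requiring a file that doesn't exist. Giving an explicit ./ path means Perl doesn't have to search @INC for the library.

```perl
$lib = "./mylib.pl";                     # hypothetical library file, created for the demo
open(LIB, ">$lib") || die "can't create $lib: $!";
print LIB <<'EOF';
sub greet { return "hello, $_[0]"; }
1;   # true value: tells require the file loaded successfully
EOF
close(LIB);

require $lib;                            # compiles and runs the file, records it in %INC
$msg = &greet("world");
require $lib;                            # already in %INC, so nothing is reloaded

# A failed require raises an exception; eval traps it and leaves the message in $@
$ok = eval "require 'no_such_library.pl'; 1";
$err = $@;
unlink($lib);
print "$msg\n";
```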

One standard routine that is worth special note is the find.pl library. Its entry point is the &find() function, as you might have predicted. This library is used by the standard Perl utility find2perl.

You invoke find2perl as you would the regular Unix find utility, just changing the name of the command, and it outputs Perl code to do exactly the same thing as the equivalent find command. It even knows about the special GNU find options. You can then inspect this output to learn how you might, from a Perl perspective, do the things that the find program does. On systems without a find program or with an inadequate one, find2perl and find.pl become even more useful.

Pass the &find() function a list of directories to traverse. For each file in those directories, &find() calls a user-defined function of yours named &wanted(). When it encounters a subdirectory, it recurses into it. Your routine gets called with two variables set: $name is the full path name, whereas $_ is just the filename component.
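To make that calling convention concrete, here is a hypothetical, much-simplified stand-in called &my_find. It is not the code in find.pl, which among other things changes into each directory as it goes. It only shows how a traversal can set $name and $_ and hand each entry to &wanted(). The sample &wanted() that collects .pl files into @found is also an assumption for illustration.

```perl
# Hypothetical stand-in for find.pl's &find (the real one also chdir()s)
sub my_find {
    local($_);                           # protect the caller's $_
    foreach $top (@_) {
        &walk($top);
    }
}

sub walk {
    local($dir) = @_;
    local(@entries);
    opendir(DIR, $dir) || return;        # quietly skip unreadable directories
    @entries = readdir(DIR);
    closedir(DIR);                       # close before recursing: DIR is one global handle
    foreach $entry (@entries) {
        next if $entry eq '.' || $entry eq '..';
        $name = "$dir/$entry";           # full path name
        $_ = $entry;                     # just the filename component
        &wanted();
        &walk($name) if -d $name && !-l $name;   # recurse, but don't follow symlinks
    }
}

# Sample &wanted(): remember the full path of every file ending in .pl
sub wanted {
    push(@found, $name) if /\.pl$/;
}
```

With this sample &wanted(), calling &my_find('.') leaves in @found the full path of every .pl file under the current directory.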

The program in the listing "Findbig.pl" goes through your whole file system and prints out the full name of any file larger than 100 KB. It is a simple example of how to use the &find() library function. If you're on a Unix system, the following is also an interesting &wanted() function. It prints the path names of any symbolic links that point to nonexistent files:

sub wanted {
    if (-l $_ && !-e $_) {    # a symlink whose target does not exist
        print "$name\n";
    }
}

Perl Jamming

OK, I think you're ready for this month's application. I call it the lst program. It's supposed to work something like the Unix command ls -Rt, which recursively lists out all files sorted by modification time. The problem is that ls sorts the files within each directory separately, whereas what you often really want is to have all files sorted against each other irrespective of which directory they occur in. That way you can tell what is the newest file in an entire subtree. So the goal here is to make something like a recursive ls but which does sorting on the whole subtree.

Instead of writing the whole thing from scratch, you'll use several well-known, standard Perl libraries that are included with every Perl distribution. This will shorten the code considerably.

Here's how the program works. First, require some standard Perl library files. Next, use one of the routines loaded from them to check what options were given. If you didn't get a good option, abort the program with a long usage message. Examine the set of options given to determine what kind of sorting the user wants. Then, you either process the files given on standard input, passing each file off to &wanted() for further processing, or else call the &find() function, which in turn calls &wanted indirectly.

So in either case it's &wanted that's doing the work (see the listing "The Wanted Subroutine"). What it does is stat each file that comes into it in $_, skipping it unless it's a plain file (as opposed to, for example, a directory). Inside the %time associative array (i.e., hash table), squirrel off the thing you're going to be sorting on, and save off all the stats you got into another table if you're going to be making a long listing. Both of these hash tables are indexed by the full path name of the file.
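One detail the listing leaves to the rest of the program is which stat() field $IDX selects. This sketch makes a few assumptions. The values given to the $ST_... names (which the listings use) are based on stat()'s standard 13-field order. The -u (access time) and -c (inode change time) options are borrowed from ls for illustration and are not taken from lst itself.

```perl
# Indexes into stat()'s 13-element list, in Perl's standard field order
$ST_INO  = 1;  $ST_MODE = 2;  $ST_LINK  = 3;  $ST_UID   = 4;
$ST_GID  = 5;  $ST_SIZE = 7;  $ST_ATIME = 8;  $ST_MTIME = 9;  $ST_CTIME = 10;

# Assumed ls-style options: -u sorts on access time, -c on inode change time
$IDX = $opt_u ? $ST_ATIME : $opt_c ? $ST_CTIME : $ST_MTIME;

sub wanted {
    @stats = stat($_);
    -f _ || return;                       # plain files only
    $time{$name} = $stats[$IDX];          # the sort key, indexed by full path name
    $stat{$name} = "@stats" if $opt_l;    # every field, joined with spaces
}
```

Joining the fields with spaces and later splitting them apart again, as the main loop does, works because every field used in the long listing is a plain number.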

Since the long output format (to be compatible with ls) prints the user and group ownerships of each file, you need to convert these from their internal numeric form to their more familiar text form; for example, uid 0 should print out as "root," not "0." To do this, call the C library routines getpwuid() and getgrgid(), which are available directly through Perl. But you don't want to call them every time you need that information; that would be far too inefficient. Instead, the first time you convert a given ID, store the result in a Perl array. On any subsequent call with the same numeric ID, just fetch the cached name.
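Here is a sketch of that caching for the &user() and &group() routines called in the listing; their bodies here are assumptions, not the lst source. The cache lives in associative arrays (hashes) keyed by numeric ID, so a very large uid doesn't force Perl to grow an ordinary array to match. When no name exists for an ID, the routines fall back to printing the number itself.

```perl
sub user {
    local($uid) = @_;
    if (!defined($user_cache{$uid})) {
        local($login) = getpwuid($uid);    # list context: first field is the login name
        $user_cache{$uid} = defined($login) ? $login : $uid;   # unknown uid: show the number
    }
    return $user_cache{$uid};              # later calls never touch the password file
}

sub group {
    local($gid) = @_;
    if (!defined($group_cache{$gid})) {
        local($grname) = getgrgid($gid);   # list context: first field is the group name
        $group_cache{$gid} = defined($grname) ? $grname : $gid;
    }
    return $group_cache{$gid};
}
```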

Back in the main routine, all that's left to do is sort and print. Sort the keys (i.e., indexes) of the %time table, which are the names of the files found, by their saved times. Reverse the resulting list of sorted keys if the user selected the -r option. If a long listing is wanted, retrieve the saved stat information and split it up again into a list. Convert the selected time into its standard printable form, and then print the whole line using printf(), as in C. If all that's wanted is a short listing, just print the filename directly, remembering to add the trailing newline.
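This reduced sketch of the sort step uses made-up file names and times in place of what &wanted() would collect. It also shows why the listing masks the mode with 07777 before printing it with %04o.

```perl
# Hypothetical sample of what &wanted() leaves behind: full path => timestamp
%time = ('a/old.c', 100, 'b/new.c', 300, 'a/mid.c', 200);

@sorted_names = sort { $time{$b} <=> $time{$a} } keys %time;   # newest first
@sorted_names = reverse @sorted_names if $opt_r;                # -r: oldest first

# Masking with 07777 drops the file-type bits, leaving only the permissions
$perm = sprintf("%04o", 0100644 & 07777);    # a regular file with mode 644

foreach (@sorted_names) {
    print "$_\n";                             # short listing: one name per line
}
```

Because the comparison spans every file collected, not one directory at a time, the newest file anywhere in the subtree comes out first.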

Editor's note: The lst program runs under version 4.036 of Perl, which is the current release and is available for many kinds of operating systems and hardware. The full text is available electronically. See page 5 for details.

For More About Perl

If you want to learn more about Perl and you have access to USENET, then you should check out the USENET comp.lang.perl newsgroup for discussions on the Perl language, bugs, features, history, humor, and trivia. It's the best place for the latest information on Perl.

The Frequently Asked Questions list (which I maintain) for that newsgroup contains a wealth of information, ranging from the mundane to the esoteric. This list is retrievable via anonymous FTP from the host rtfm.mit.edu (currently 18.70.0.209) in /pub/usenet/comp.lang.perl/*. It includes information on where to get Perl binaries for some non-Unix architectures.

The ports most likely to be of interest to you are those for MS-DOS, Windows NT, and the Mac. The DOS version is called "bigperl" (actually, BIGPERL4). It's Perl 4.036 compiled with the Watcom C/386 compiler (a 32-bit, flat-memory-model C compiler). It's packed with useful features, including support for up to 32 MB of virtual memory, debugger support, and support for both gdbm (the GNU database management routines) and the newer BSD 4.4 db package. A 386/486 with at least 4 MB of RAM is required, and a third-party memory manager is strongly recommended. This version passes those Perl regression tests that do not depend on Unixisms, and it comes complete with full source code, all freely distributable.

The NT version of Perl is also alleged to work well (I have no personal experience with it). It includes support for getting at sockets from Perl, so even on non-Unix systems you can use Perl for networking applications. The source code builds out of the box and contains some NT-specific tests.

Both of these ports, along with the Macintosh version, are available via FTP from ftp.cis.ufl.edu (128.227.100.252) in the /pub/perl directory. Here you'll find a veritable treasure trove of Perl tidbits. Inside that directory, look in the src/ subdirectory for other subdirectories called 4.0/, 5.0/, macperl/, msdos/, and ntperl/.


Listing: Using a Library Function



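require 'getopts.pl';    # loads &Getopts, which sets $opt_v, $opt_n, $opt_f

&Getopts("vnf:") ||      # "f:" means -f takes an argument
  die "usage: $0 [-v] [-n] [-f configfile] [files ...]\n";

if ($opt_v) { $verbose++;       }
if ($opt_n) { $fakeit++;        }
if ($opt_f) { $config = $opt_f; }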




Listing: Findbig.pl



require 'find.pl';

&find('/');

sub wanted {
    if ((-s $name) > (100 * 2**10)) {   # larger than 100 KB
        print "$name\n";
    }
}




Listing: The Wanted Subroutine



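sub wanted {
    @stats = stat($_);
    -f _ || return;                      # skip anything that isn't a plain file
    $time{$name} = $stats[$IDX];         # sort key; $IDX is chosen from the options
    $stat{$name} = "@stats" if $opt_l;   # save all fields for a long listing
}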




Listing: The Core of lst Program



if ($opt_i) {
    local(*name) = *_;   # $name is now an alias for $_
    warn "file args ignored due to -i" if @ARGV;
    while (<STDIN>) { chop; &wanted; }
} else {
    require 'find.pl';
    &find(@ARGV);
}

@sorted_names = sort { $time{$b} <=> $time{$a} } keys %time;
@sorted_names = reverse @sorted_names if $opt_r;

foreach (@sorted_names) {
    if ($opt_l) {
        @stats = split(' ', $stat{$_});
        chop($now = &ctime($stats[$TIME_IDX]));   # &ctime comes from ctime.pl
        printf "%6d %04o %6d %8s %8s %8d %s %s\n",
            $stats[$ST_INO],
            $stats[$ST_MODE] & 07777,
            $stats[$ST_LINK],
            &user($stats[$ST_UID]),
            &group($stats[$ST_GID]),
            $stats[$ST_SIZE],
            $now,
            $_;
    } else {
        print "$_\n";
    }
}


Tom Christiansen is a freelance consultant living in Boulder, Colorado. He serves on the board of directors for the USENIX Association. When he's not on the road lecturing on Perl, he's getting the libraries, utilities, and documentation for the 5.0 release of Perl into production shape. Tom also maintains the Frequently Asked Questions list for the USENET newsgroup comp.lang.perl. He can be reached on the Internet at tchrist@usenix.org or on BIX c/o "editors."
