;
$months = $days / 30;
print "That's around $months months\n";
Notice that you can operate directly on the $days variable without first converting the string just read into a numeric value.
The interpreter takes care of all memory handling. You don't have to declare anything if you don't want to, since variables spring into existence when you first mention them--although, as you'll see, there may be times when you'll want to use local variables. You don't have to concern yourself with whether a string is long enough to hold a value or whether an array has enough elements in it. You just do whatever you want, and Perl automatically allocates (and later deallocates, if necessary) any memory needed.
That means it's perfectly fine to do something like this as the first line in your program:
$a[500] = "hello";
You never bothered to declare any array, but right away you assign to the element at index 500 (the 501st element, since indexes start at 0). The procedure couldn't be easier.
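To see what that one assignment does, here is a minimal sketch that inspects the array afterward. The variable names $last_index and $length are made up for illustration.

```perl
# No declaration anywhere: the array @a comes into being on first use.
$a[500] = "hello";

$last_index = $#a;    # index of the last element
$length     = @a;     # an array in scalar context yields its element count

print "last index is $last_index, length is $length\n";
print "slot 0 is ", (defined($a[0]) ? "defined" : "undefined"), "\n";
```

The slots below index 500 exist but hold no value until you assign to them.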
Rapid Prototyping
One thing that Perl is great for is rapid prototyping. It provides an easy way to take care of your quick-and-dirty programming. This is an attractive aspect of interpreter programming that users of BASIC will recognize and appreciate. You simply think about what you need to do to solve the problem, and then you type it in using straightforward but high-level constructs. You needn't be a systems programming wizard to do pretty sophisticated systems programming.
A program may take a bit longer to run in Perl than in C--usually around e times longer (e = 2.71828...)--but it takes only one-tenth the time to write it. You're trading cheap machine cycles for expensive people cycles. The flexibility of interpreted languages makes them an easier medium than compiled languages for quickly developing application code.
When prototyping, don't get bogged down with little details, efficiency concerns, or aesthetic appeal. The most important thing about your prototype is that it should work. It doesn't have to be particularly efficient, nor particularly pretty. And it certainly doesn't need to be clever--that just gets in the way. After all, it's just a prototype. In writing, you often throw away the first draft or two; you should consider doing the same thing with most programs. By the second rewrite, you'll have code that's cleaner, more efficient, and more maintainable than your first stabs at sketching out the problem.
Once you're done with your prototype, you may choose to convert it into C (this has to be done by hand) and then compile it all the way into machine code. Or maybe not--it may well be that you'll decide it's plenty fast enough as it is, or that a bit of performance tuning in Perl will suffice to make it so.
Even if you do choose to convert your code into C, you'll find you've spared yourself most of the laborious effort of developing and debugging your original algorithm. Because Perl is not only an interpreter but also a forgiving one, it's easy to make small changes in your program and quickly find out what effect they have on its overall behavior.
In developing your prototype, there's no reason not to continue to use a reasonable amount of software engineering. By this I mean, to use a bit of structured programming: Break up your large problem into smaller, manageable problems and then put each of these into its own subroutine. Even when you aren't going to call a function more than once, you should still put it into its own routine to abstract out the low-level stuff; that's what prototyping is about. For example,
sub do_it_all {
    &do_this();
    &do_that();
    &do_the_other_thing();
}
sub do_this {}
sub do_that {}
sub do_the_other_thing {}
Notice that I haven't filled in what those other subroutines do. That's OK. When first sketching out how the program works, it's more important to figure out what happens when than to know the low-level details of precisely how something's happening. Those you can fill in later.
At the topmost level, it's perfectly all right to have functions without parameters; these might adjust some global variables and then call things further down. But at the lower-level functions, you really should pass each routine its own arguments and have those routines maintain their own local variables. Avoid even looking at global variables if you can help it, and if you can't, make sure they're clearly marked out. In small programs, this doesn't matter so much; in larger ones, it's essential.
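As a rough illustration of a lower-level routine that follows this advice, the sketch below takes everything it needs as arguments and keeps its working variables local. The routine name &average and its variables are hypothetical.

```perl
# Lower-level routine: input arrives as arguments, working variables
# are local, and no global variable is read or written.
sub average {
    local(@numbers) = @_;
    local($sum, $n) = (0);      # $sum starts at 0; $n is the loop variable

    return 0 unless @numbers;   # avoid dividing by zero on an empty list
    foreach $n (@numbers) {
        $sum += $n;
    }
    return $sum / @numbers;     # @numbers in numeric context is its count
}

print &average(2, 4, 9), "\n";  # prints 5
```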
Perl has a notion of global versus local variables that may seem curious at first but actually makes things easier for the kind of programming you're most likely to use it for. All variables are global unless declared local, and global variables themselves aren't declared at all: Variables just spring into existence when first mentioned. This makes it much easier to sketch out your quick-and-dirty program than if you had to declare every possible variable. But it means it's easy to touch a global variable even if you don't mean to.
Another thing that may surprise you about Perl's local variables is that they are dynamically scoped, not lexically scoped. That means that a subroutine inherits all the local variables that were visible in its caller. In practice, this feature should get you into trouble only if you're intentionally modifying global variables while at the same time creating local variables named exactly the same as the global ones--hardly a good idea in anyone's book.
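A small sketch may make dynamic scoping concrete. The names $level, &show_level, and &outer are invented for the example.

```perl
$level = "global";

# Reads whatever $level is currently in effect.
sub show_level {
    return $level;
}

# Gives $level a temporary value for the duration of this call,
# including any subroutines called from here.
sub outer {
    local($level) = "set by outer";
    return &show_level();
}

print &outer(), "\n";       # prints "set by outer"
print &show_level(), "\n";  # prints "global"
```

Note that &show_level never mentions local at all; it sees the caller's value simply because it was called while that value was in effect.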
All in all, Perl is just trying to be helpful and convenient, letting you create and access variables without a lot of the rigmarole you have to go through in more exacting languages. But if this fast-and-loose sort of programming puts pitfalls in your path, there are some strategies you can use to help you through it without mishap.
By far the most important way you can help yourself is by using Perl's -w flag. It catches semantic mistakes and error-prone constructs that you might otherwise miss, such as using a variable before you've assigned a value to it or trying to write to a file that isn't open. It gives both compile-time warnings when the program is first parsed and run-time warnings while it's executing. If you're a C programmer, think of it as a lint for Perl--except it's a lint that's resident during program execution as well as at compile time. This allows Perl to catch mistakes that lint never could. The number of bewildered programmers who come to me with Perl problems that the -w flag would have instantly alleviated is lamentably large.
Another simple mechanism you can use to help you (and more important, those who come after you) know which variables are doing what is to use the variables' case to provide a clue to their intended scope. This technique is sometimes used in large C programs (in C++, it's not really necessary). Since case is significant in identifiers, use all uppercase to indicate a constant and sometimes use all lowercase to indicate a local variable, with mixed case indicating a global variable. Thus, $START would be something that doesn't ever change in the program, $tempfile would be some local variable, and $Update_Time would be a global variable.
Which particular scheme (if any) you select for this is much less important than simply being consistent about it. While you shouldn't become overly complacent and assume that case always conveys scope as defined by the language, it can be a useful style for helping readers of your program understand its structure.
This strategy is probably less important in rapid prototypes (that's manager talk for quick hacks) than in larger programs. It may also make sense in a program that's going to be sticking around for a while and needs to be maintained by other folks--and remember that, three years down the road, you yourself might as well be another person.
A more sophisticated technique for controlling access to identifiers is to employ packages. Perl packages provide for module initializations, variables and functions private to a function or set of functions, and static variables. This last group consists of variables whose values persist from one invocation of a function to the next. You'll often see these used in robust library code. They help assure you that you aren't messing with someone else's variables and someone else isn't messing with yours. A package also lets you define code to be executed at run time before any routines in that package can be called--something you can't guarantee in C (although you can in C++).
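Here is a minimal sketch of a package holding a static variable. The package name Ticket and the routine &issue are hypothetical. The example uses the :: package separator of Perl 5; under Perl 4 the separator is a single quote instead, as in &Ticket'issue().

```perl
package Ticket;             # hypothetical package name

$last = 0;                  # really $Ticket::last; initialized once, at load time

# Hands out 1, 2, 3, ... The count survives between calls because
# $last lives in the package, not in the subroutine.
sub issue {
    return ++$last;
}

package main;

$last = "untouched";        # a different variable: this one belongs to main

print &Ticket::issue(), "\n";   # prints 1
print &Ticket::issue(), "\n";   # prints 2
print "$last\n";                # prints "untouched"
```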
Now that Perl has taken care of your need to worry about nitty-gritty, low-level programming matters like typing and allocation, you can get down to the business at hand: coding up your problems. As you do this, though, you're likely to make some small but mysterious mistakes along the way whose nature won't be immediately obvious. When that happens, you'll want to debug your program.
If you're programming in the shell, that means inserting echo commands. If it's an awk program you're coding up, then you'll probably be using print statements. Unfortunately, neither of these methods helps much--at least, not when you compare them to a real debugger.
One of the tremendous advantages of using Perl over shell scripts for many programs is that Perl comes with a full-fledged, integrated symbolic debugger. It's so integrated into the language that it isn't even a separate process: It's just a compilation mode and customizable library file (enabled by the -d switch) of the existing interpreter.
Combine this with the way the Perl interpreter allows you to access much of its internal state (e.g., symbol tables) through special variables, and you can get at everything right from the debugger. You can set breakpoints, examine and change variables, search for source lines with regular expressions, get stack backtraces, and do pretty much everything that you're used to doing in a C-level debugger like dbx or gdb. Because you're running with the full Perl interpreter under your belt, you can type in any legal Perl code and have it executed on the fly for you--a convenient way to test out new constructs.
Perusing the Perl Library
While a rapid prototype is all well and good, there's no reason to rewrite everything from scratch every time you code up an application. Use existing wheels, don't reinvent them. As you become more experienced, you will want to extract your most useful subroutines and place them in your own private library. Then later you can load your archived function into your new application to use as though it were from a system-supplied library.
But before you write your own library functions, you should know that the Perl distribution already comes with a fair allotment of standard libraries. These include functions for handling option processing, unlimited precision numbers, screen manipulations, binary searches on sorted files, and recursive directory processing.
Just how do you get at these libraries from Perl? The basic statement to load a library from within your program is require, as in
require 'getopts.pl';
Once this is done, you're free to call any functions loaded by that library, although you do have to know the name of the function or functions that you have just loaded. In the above case, the function will not automatically be called getopts--in fact, it will be called Getopts (remember that, as in C, identifiers are case-sensitive). The listing "Using a Library Function" shows how to use &Getopts().
Like nearly everything in the language (including local variable "declarations"), require is a run-time event, not a compile-time one. Perl loads the required file only once, no matter how many times you ask for it. This is a feature, because it lets you write code that includes library routines willy-nilly. You don't have to worry that you're doing extra work if the routines you've required have themselves already required something you're about to load: It won't get loaded twice.
Requires don't always succeed. The require will fail by raising a trappable but otherwise fatal exception if any of these occur: The file can't be found in your include path (the @INC variable); the code in the required file has syntax errors in it; or the file doesn't return a true value. This last may need a little explaining. It's there so that you can try to run some routine-specific start-up code and have a clean way to indicate whether it has succeeded or failed. In practice, few library functions take advantage of this; they just finish off the file with a line containing a 1;, which is certainly a true value.
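Because the failure is trappable, you can wrap a require in an eval and carry on if the library is missing. The sketch below assumes a deliberately nonexistent file name, no_such_library_for_demo.pl, so that the failure path is the one exercised; $loaded and $status are likewise made up.

```perl
$library = 'no_such_library_for_demo.pl';   # hypothetical; chosen not to exist

# The eval traps the exception. The trailing 1 gives the eval a true
# value to return when the require succeeds.
$loaded = eval "require '$library'; 1";

if ($loaded) {
    $status = "loaded $library";
} else {
    $status = "could not load $library: $@";   # $@ holds the error text
}

print $status;
print "\n" unless $status =~ /\n$/;
```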
One standard routine that is worth special note is the find.pl library. Its entry point is the &find() function, as you might have predicted. This library is used by the standard Perl utility find2perl.
You invoke find2perl as you would the regular Unix find utility, just changing the name of the command, and it outputs Perl code to do exactly the same thing as the equivalent find command. It even knows about the special GNU find options. You can then inspect this output to learn how you might, from a Perl perspective, do the things that the find program does. On systems without a find program or with an inadequate one, find2perl and find.pl become even more useful.
Pass the &find() function a list of directories to traverse. Then for each file in that directory, &find() calls a user-defined function of yours called &wanted(). If it encounters a directory, it recurses down the directory. Your routine gets called with two variables set: $name is the full path name, whereas $_ is just the filename component.
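To make the calling convention concrete, here is a rough sketch of a traversal that behaves the way the paragraph above describes. It is not the code in find.pl, and the real library may differ in details (for instance, this sketch never changes directory, so file tests inside &wanted() should use $name rather than $_). The routine name &walk, the @seen array, and the choice not to descend through symbolic links are all assumptions made for the example.

```perl
# Visit every entry below $dir, setting $name and $_ before each
# call to &wanted(), which the caller must define.
sub walk {
    local($dir) = @_;
    local(@entries, $entry);

    opendir(DIR, $dir) || return;
    @entries = sort readdir(DIR);   # read everything first so the
    closedir(DIR);                  # handle is free when we recurse

    foreach $entry (@entries) {
        next if $entry eq '.' || $entry eq '..';
        local($name) = "$dir/$entry";   # full path name
        local($_)    = $entry;          # filename component only
        &wanted();
        &walk($name) if -d $name && !-l $name;   # descend into real directories
    }
}

# A trivial &wanted() that just records each path it is shown.
sub wanted {
    push(@seen, $name);
}

# Walk the directory named on the command line, if there is one.
&walk($ARGV[0]) if @ARGV;
print join("\n", @seen), "\n" if @seen;
```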
The program in the listing "Findbig.pl" goes through your whole file system and prints out the full name of any file greater than 100 KB in length. It is a simple example of how to use the &find library function. If you're on a Unix system, the following is also an interesting &wanted() function; it prints out any path names of files that are symbolic links pointing to nonexistent files:
sub wanted {
    # a symbolic link (-l) whose target does not exist (!-e)
    if (-l && !-e) {
        print "$name\n";
    }
}
Perl Jamming
OK, I think you're ready for this month's application. I call it the lst program. It's supposed to work something like the Unix command ls -Rt, which recursively lists out all files sorted by modification time. The problem is that ls sorts the files within each directory separately, whereas what you often really want is to have all files sorted against each other irrespective of which directory they occur in. That way you can tell what is the newest file in an entire subtree. So the goal here is to make something like a recursive ls but which does sorting on the whole subtree.
Instead of writing the whole thing from scratch, you'll use several well-known, standard Perl libraries that are included with every Perl distribution. This will shorten the code considerably.
Here's how the program works. First, require some standard Perl library files. Next, use one of the routines loaded from them to check what options were given. If you didn't get a good option, abort the program with a long usage message. Examine the set of options given to determine what kind of sorting the user wants. Then, you either process the files given on standard input, passing each file off to &wanted() for further processing, or else call the &find() function, which in turn calls &wanted indirectly.
So in either case it's &wanted that's doing the work (see the listing "The Wanted Subroutine"). What it does is stat each file that comes into it in $_, skipping it unless it's a plain file (as opposed to, for example, a directory). Inside the %time associative array (i.e., hash table), squirrel off the thing you're going to be sorting on, and save off all the stats you got into another table if you're going to be making a long listing. Both of these hash tables are indexed by the full path name of the file.
Since the long output format (to be compatible with ls) is going to print out the user and group ownerships on the file, you need to convert these from their internal numeric form to their more frequently used text version; for example, uid 0 should print out as "root," not "0." To do this, call the C library routines getpwuid() and getgrgid(), which are available directly through Perl. But you don't want to call them every time you need that information; that would be far too inefficient. Instead, remember that you already did the conversion by storing the returned value in a Perl array and just fetch the cached name on any subsequent calls that use the same numeric ID.
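The caching idea might look something like the sketch below. This is a guess at the shape of such a routine, not the code from the lst program: the cache here is an associative array named %user_name (an assumption), and unknown IDs fall back to the number itself. It assumes a Unix-like system where getpwuid() is available.

```perl
# Map a numeric uid to a login name, asking the system only the
# first time each uid is seen.
sub user {
    local($uid) = @_;

    unless (defined $user_name{$uid}) {
        local($login) = getpwuid($uid);   # list context: first field is the name
        # Fall back to the number itself when the uid has no entry.
        $user_name{$uid} = defined($login) ? $login : $uid;
    }
    return $user_name{$uid};
}

print &user($<), "\n";   # first call looks the name up ($< is your real uid)
print &user($<), "\n";   # second call is answered from %user_name
```

A matching routine for groups would do the same thing with getgrgid() and a second cache.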
Back in the main routine, all that's left to do is sort and print. Sort the keys (i.e., indexes) of the %time table, which are the names of the files given. Reverse the resulting list of sorted keys if the user selected the -r option. If what's wanted is a long listing, then retrieve the saved stat information and split it up again into a list. Convert the correct time to print in standard form and then dump out the whole thing using a printf(), as in C. If all that's wanted is a short listing, just print the filename directly, remembering to add the trailing newline.
Editor's note: The lst program runs under version 4.036 of Perl, which is the current release and is available for many kinds of operating systems and hardware. The full text is available electronically. See page 5 for details.
For More About Perl
If you want to learn more about Perl and you have access to USENET, then you should check out the USENET comp.lang.perl newsgroup for discussions on the Perl language, bugs, features, history, humor, and trivia. It's the best place for the latest information on Perl.
The Frequently Asked Questions list (which I maintain) for that newsgroup contains a wealth of information, ranging from the mundane to the esoteric. This list is retrievable via anonymous FTP from the host rtfm.mit.edu (currently 18.70.0.209) in /pub/usenet/comp.lang.perl/*. It includes information on where to get Perl binaries for some non-Unix architectures.
The ports most likely to be of interest to you are those for MS-DOS, Windows NT, and the Mac. The DOS version is called "bigperl" (actually, BIGPERL4). It's Perl 4.036 that has been compiled using the Watcom C/386 compiler (a 32-bit, flat-memory-model C compiler). It's packed with useful features, including support for up to 32 MB of virtual memory, debugger support, and support for gdbm (the GNU database management routines) for the newer BSD 4.4 db package. A 386/486 with at least 4 MB of RAM is required, and a third-party memory manager is strongly recommended. This version passes those Perl regression tests that do not depend on Unixisms, and it comes complete with full source code, all freely distributable.
The NT version of Perl is also alleged to work well (I have no personal experience with it). It includes support for getting at sockets from Perl, so even on non-Unix systems you can use Perl for networking applications. The source code builds out of the box and contains some NT-specific tests.
Both of these ports, along with the Macintosh version, are available via FTP from ftp.cis.ufl.edu (128.227.100.252) in the /pub/perl directory. Here you'll find a veritable treasure trove of Perl tidbits. Inside that directory, look in the src/ subdirectory for other subdirectories called 4.0/, 5.0/, macperl/, msdos/, and ntperl/.
Listing: Using a Library Function
require 'getopts.pl';
&Getopts("vnf:") ||
    die "usage: $0 [-v] [-n] [-f configfile] [files ...] \n";
if ($opt_v) { $verbose++; }
if ($opt_n) { $fakeit++; }
if ($opt_f) { $config = $opt_f; }
Listing: Findbig.pl
require 'find.pl';
&find('/');
sub wanted {
    if ( (-s $name) > (100 * 2**10) ) {
        print "$name\n";
    }
}
Listing: The Wanted Subroutine
sub wanted {
    @stats = stat($_);
    -f _ || return;
    $time{$name} = $stats[$IDX];
    $stat{$name} = "@stats" if $opt_l;
}
Listing: The Core of lst Program
if ($opt_i) {
    local(*name) = *_; # $name is now an alias for $_
    warn "file args ignored due to -i" if @ARGV;
    while (<STDIN>) { chop; &wanted; }
} else {
    require 'find.pl';
    &find(@ARGV);
}
@sorted_names = sort { $time{$b} <=> $time{$a} } keys %time;
@sorted_names = reverse @sorted_names if $opt_r;
foreach (@sorted_names) {
    if ($opt_l) {
        @stats = split(' ',$stat{$_});
        chop($now = &ctime($stats[$TIME_IDX]));
        printf "%6d %04o %6d %8s %8s %8d %s %s\n",
            $stats[$ST_INO],
            $stats[$ST_MODE] & 07777,
            $stats[$ST_LINK],
            &user($stats[$ST_UID]),
            &group($stats[$ST_GID]),
            $stats[$ST_SIZE],
            $now,
            $_;
    } else {
        print "$_\n";
    }
}
Tom Christiansen is a freelance consultant living in Boulder, Colorado. He serves on the board of directors for the USENIX Association. When he's not on the road lecturing on Perl, he's getting the libraries, utilities, and documentation for the 5.0 release of Perl into production shape. Tom also maintains the Frequently Asked Questions list for the USENET newsgroup comp.lang.perl. He can be reached on the Internet at tchrist@usenix.org or on BIX c/o "editors."