Volume Number: 16 (2000)
Issue Number: 9
Column Tag: Tools of the Trade
Getting Started with Perl
By Larry Taylor
Edited by Steve & Patricia Sheets
Open Source power scripting for Macs
Introduction
Perl is a programming/scripting language developed under Unix. It is distributed under your choice of the GNU General Public License or the Artistic License and now runs on most platforms, including MacOS. It is the language of choice for Unix system administration, CGI scripts and other goodies. More relevantly, it can really expand your ability to accomplish things on the Mac. In this article I describe a frustrating problem I had and a step-by-step Perl solution. I hope this example will encourage you to learn Perl and use it. Perl scripts are just text files and so are fairly easily portable across platforms, which makes Perl even more useful if you need to solve the same problem on several platforms. Learning Perl is not difficult and it looks great on your resume, so why not give it a try?
Mac + Perl = MacPerl
Perl arose because many UNIX programmers wanted a quick alternative to C, with many of C's features. The result was a full-featured, easy to use, C-like programming language. Perl has been ported to the Mac where it can be used to create pseudo-applications called droplets. I call them pseudo because they do not have individual types and creators and so they must be opened either by double-clicking or by dragging a document onto them. They are interpreted and so need the Perl interpreter in order to run. No Mac interface is needed to get information in or out, so Perl is ideal for projects that involve reading some data, analyzing it, and outputting some conclusions, projects for which the event-loop paradigm is more of an annoyance than a help (although Cmd-period will stop runaway Perl droplets). One can construct compiled applications with a full Mac interface, but the files are large and the advantages over C largely evaporate. I use Perl for tasks as varied as extracting data from files and emailing students in a class their exam scores.
Perl is "open source" software. The interpreter is available to download for free at <http://www.iis.ee.ethz.ch/~neeri/macintosh/perl.html>. Alternatively, the book "MacPerl, Power and Ease" by Vicki Brown and Chris Nandor (ISBN 1-881957-32-2) from Prime Time Freeware <http://www.ptf.com> contains a CD with the interpreter and lots of other useful stuff. The book itself is a nice introduction to programming in general and Perl in particular. Additional Perl stuff can be gleaned from the net. Try starting at <http://www.perl.com>.
The Problem
Got one of those cool digital cameras that saves images to floppies? Then you know the files are labeled automatically, MVC-01L.JPG, MVC-02L.JPG, etc. Copy the images to your computer and you're in business. But suppose you went wild and filled up two disks? Or ten? Files on different floppies often have the same name, so you can't just copy them to the same folder. So you copy one floppy, change all the names of the files, copy the second, etc. - bummer. Even with just a few images, you tend to put them in a folder with a useful name since otherwise you won't remember what the pictures are about, can't search for them with Sherlock, etc. Wouldn't it be nice to have them named, whatever1.jpg, whatever2.jpg, etc.? This is a perfect job for a script.
The script should begin with a folder named whatever and look inside it for all the MVC files and rename them as whatever1.jpg, whatever2.jpg, etc. It should even be a bit smarter. If there are going to be ten or more, the first should be whatever01.jpg; if there are 100 or more, whatever001.jpg; if there are ... but you get the idea. Even more, if there are already some whatever files, it should number the new MVC files to fit into the pattern. Specifically, it should look at the creation time of the first MVC file and the first whatever file. If the MVC time is later, the MVC files should come after the whatever files, but otherwise the whatever files should be renamed and the MVC files should come first. If the user trashed a few of the whatever files so they are no longer in sequence, the whatever files should be renamed so as to be in sequence.
Using this script, you can copy one disk's worth of images into a folder, run the script, copy the next disk, run the script, etc. At any time during the process, the images can be viewed and those that are unwanted can be deleted. At the end, all of the "keepers" are named consecutively in the order in which they were taken, no matter the order in which they are copied or removed.
The Script
Open the MacPerl application and select New from the File menu and you're ready to start. Line 1 should be #!perl. This is a holdover from the Unix world where this line tells the operating system to feed this file to the Perl interpreter. You can also do things with it in MacPerl, but we don't here. Now save the file. Name it what you will. At the bottom of the dialog box is a pop up menu labeled "Type:" (reading "Plain Text"). Set the menu to "Droplet" and save.
The advantage of a droplet is that you can just drop items onto its icon and the information is passed on to the script. In this script we include no other way to input folder/file information, although Perl can do so, even through standard file. The folder/file information is passed to the script as $ARGV[0] for the first folder/file, $ARGV[1] for the second, etc. Droplets allow us to use the Mac GUI to mimic the command line paradigm. Dropping a collection of files on a droplet has the same effect as the command line: droplet_name file1 file2 ...
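To see what a droplet actually receives, a throwaway script along these lines can help. It is only a sketch: describe_args is a made-up helper name, not part of the renaming script, and it simply echoes each dropped path with its index.

```perl
#!perl
# Sketch: echo each item that was dropped on the droplet
# (or typed on a command line). describe_args is a hypothetical helper.
sub describe_args {
    my @items = @_;
    my @lines;
    foreach my $i (0 .. $#items) {
        push(@lines, "\$ARGV[$i] is $items[$i]");
    }
    return @lines;
}

print "$_\n" foreach describe_args(@ARGV);
```

Saved as a droplet, this prints one line per dropped item to the MacPerl window, which is a quick way to check the path names before trusting a script with real files.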
Before discussing the code, here is an outline for solving the problem.
- Step 1: Get the folder name. If a folder is dropped, use it; if a file is dropped, use the enclosing folder. If several items are dropped, process them all.
- Step 2: Collect the names of the MVC files and the whatever files.
- Step 3: Get the two creation times and figure out the starting numbers for the two sets of files.
- Step 4: Rename the files.
We do a certain amount of error checking and quit at the first sign of trouble - these may be your only photos of Aunt Rose. Perl borrows much from C, including the tendency to write short functions (subroutines in Perl). One immediate difference is the lack of variable typing (the same variable can be a number or a string, depending on context). Another is the ability to work with arrays whose size is unknown before execution. As a language, Perl is particularly adept at manipulating arrays and strings and it does file management rather well.
Now for the code. We write a sequence of subroutines most of which just do one of the steps outlined above and pass the relevant data on to the next. We try to introduce some interesting features of Perl in discussing each subroutine. More information can be gleaned from the code and its comments. Here is the first routine. The for loop works its way through the dropped items, passing each one in turn to the subroutine do_a_folder which returns false if anything goes wrong. Ordinary Perl variable names start with $; arrays start with @; $#foo is the last index of the array @foo. As with C, the first array element is $foo[0]. If this were C the braces would be optional, but in Perl they are required.
for($ii=0;$ii<=$#ARGV; $ii++) { # This is a Perl comment.
    if(!do_a_folder($ARGV[$ii])){exit;}
}
Perl handles file system objects via path names and the $ARGV variables are path names. The first line of the subroutine illustrates the way Perl passes variables to subroutines: the values are in a list/stack named @_ and we can shift them off in order. The rest of the routine is straightforward. Perl has a simple syntax for checking if strings are folders or files, using two simple "if" tests. One wrinkle here is that if you drop two MVC files on the droplet, by the time the second one is ready to be processed, it no longer exists since it was renamed on the first pass. The routine does nothing in this case except return true, which is what we want. In short, this subroutine handles Step 1 for each dropped object and passes the results to the next subroutine.
sub do_a_folder{
    $object=shift(@_);
    if( -f $object) { # -f checks if $object is a file,
                      # if it is, get enclosing folder.
        $x=rindex($object,':'); # find LAST occurrence of :
        $object=substr($object,0,$x); # remove last part of
                                      # path name
    }
    # $object now path name to folder
    $x=rindex($object,':'); # find LAST occurrence of :
    $fold_name=substr($object,$x+1); # get name of folder
    if( -d $object) { # it's a folder
        unlink("$object:MAVICA.HTM");
        # This deletes a junk file which often gets copied.
        return process_folder($object,$fold_name);
    }
    # else quietly do nothing.
    return 1;
}
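The rindex/substr pair in do_a_folder is easier to follow on a concrete path. Here is a small sketch that needs no real files; the path and the two helper names (enclosing_folder, last_component) are invented for illustration, and the colon is the MacOS path separator.

```perl
# Sketch: take apart a MacOS-style path name with rindex and substr.
# The path and the helper names are made up for this example.
sub enclosing_folder {
    my $path = shift(@_);
    my $x = rindex($path, ':');      # position of the LAST colon
    return substr($path, 0, $x);     # everything before it
}

sub last_component {
    my $path = shift(@_);
    my $x = rindex($path, ':');
    return substr($path, $x + 1);    # everything after it
}

my $file   = "Macintosh HD:Photos:Aunt Rose:MVC-01L.JPG";
my $folder = enclosing_folder($file);
print "$folder\n";                     # Macintosh HD:Photos:Aunt Rose
print last_component($folder), "\n";   # Aunt Rose
```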
Extract the relevant files into two arrays. There is no need to specify the size of these arrays in advance since Perl handles these details. The undef's make sure that these arrays are empty at the start. Explicitly initializing variables is usually a good idea. One outstanding feature of Perl is Unix regular expression matching and substitution. Look how easy it is to find the files we want:
if( $files[$i]=~m/^$fold_name\d*\.jpg$/)
This is true if the string on the left matches the expression between the /'s. That expression says the string must begin (^) with $fold_name, have any number of digits (\d*) and then end ($) with a .jpg. The dot is \. because . means match any character. When we find a file of the desired type, the push puts it at the end of the appropriate array. Note that the else if of C becomes elsif. Finally, the construction \@mvc_files is a way to pass a reference to the entire array to the next subroutine.
sub process_folder{
    $fold=shift(@_);
    $fold_name=shift(@_);
    # Make sure names can't be too long for the Finder.
    $fold_name=substr($fold_name,0,23);
    undef(@fold_name_files); # Clear old values
    undef(@mvc_files); # Clear old values
    if( opendir(DIR,$fold)) { # if we can read the directory
        chdir($fold); # change the working directory
        @files=readdir(DIR); # read all objects into an array
        closedir(DIR); # close the directory for reading
        for($i=0;$i<=$#files;$i++) {
            if( $files[$i]=~m/^$fold_name\d*\.jpg$/){
                # remember the folder_name files
                push(@fold_name_files,$files[$i]);
            }
            elsif( $files[$i]=~m/^MVC-\d*L\.JPG$/) {
                # remember the MVC files
                push(@mvc_files,$files[$i]);
            }
        }
        if($#mvc_files<0 && $#fold_name_files<0) {
            return 1; # Nothing to do.
        }
        else { # Go rename the files.
            return (
                setup_rename(\@mvc_files,\@fold_name_files,$fold_name));
        }
    }
    else { print"Failed to open $fold\n"; return 0;}
}
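The two patterns can be tried out without touching any files. The sketch below runs them against a made-up list of names; classify and the sample names are mine, while the patterns are the ones used in process_folder.

```perl
# Sketch: sort some sample names with the two patterns from process_folder.
# classify and the sample names are hypothetical.
sub classify {
    my ($name, $prefix) = @_;
    return "renamed" if $name =~ m/^$prefix\d*\.jpg$/;
    return "camera"  if $name =~ m/^MVC-\d*L\.JPG$/;
    return "other";
}

my @samples = ("whatever1.jpg", "whatever012.jpg", "MVC-01L.JPG",
               "whateverX.jpg", "MAVICA.HTM");
foreach my $name (@samples) {
    print "$name: ", classify($name, "whatever"), "\n";
}
```

One caveat: the folder name is interpolated into the pattern as is, so a name containing characters that are special in regular expressions (a dot or a plus sign, say) would be interpreted as pattern syntax. Wrapping the variable as \Q$prefix\E is the usual way to switch that off.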
In the first few lines of the next subroutine, we retrieve the references to the arrays. The syntax is straightforward: in the previous subroutine @mvc_files was an array; in this subroutine the same array is @$mvc_files. There is no need to use the same name.
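For readers new to references, here is a minimal sketch of the idea in isolation. The subroutine count_and_first is invented for this example; the notation ($#$list, $$list[0]) is the same one the script uses.

```perl
# Sketch: pass an array by reference and look inside it.
# count_and_first is a hypothetical helper.
sub count_and_first {
    my $list  = shift(@_);           # a reference, not a copy of the array
    my $count = $#$list + 1;         # last index plus one
    my $first = $count ? $$list[0] : undef;
    return ($count, $first);
}

my @mvc_files = ("MVC-01L.JPG", "MVC-02L.JPG");
my ($count, $first) = count_and_first(\@mvc_files);
print "$count files, first is $first\n";
```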
Now look at the phrase:
length($#$fold_name_files+$#$mvc_files+1+$startNumber)
This is an example of how variable type changes: $#$fold_name_files is one less than the number of files in the array @$fold_name_files so the sum is the biggest number in a file name. The function length treats the number as a string and returns its length. If we have more than 9,999 files, we quit since then the file names might be longer than the Finder limit of 31 characters.
Perl has built-in functions to easily extract file information. We have no trouble getting creation times: the function stat returns an array of data and the eleventh element in the array is the creation time. Remember, the first is [0]. We then use this information to determine the starting number for the two sets of file names. This completes Step 3 and we pass the needed information on to the next subroutine.
sub setup_rename{
    $mvc_files=shift(@_);
    $fold_name_files=shift(@_);
    $fold_name=shift(@_);
    $startNumber=1; # The first file is numbered 1.
    #
    $new_digit_size=length(
        $#$fold_name_files+$#$mvc_files+1+$startNumber);
    if($new_digit_size>4){
        print"More than 9,999 files? No way!\n";
        return 1; # Will process other folders
    }
    #
    # Get MVC creation time (if possible).
    if($#$mvc_files>=0) {
        $time_MVC=(stat($$mvc_files[0]))[10];
    }
    # Get folder_name creation time (if possible).
    if($#$fold_name_files>=0) {
        $time_FN=(stat($$fold_name_files[0]))[10];
    }
    # Calculate starting numbers.
    if($#$mvc_files<0) { $fold_name_startNumber=$startNumber;}
    elsif($#$fold_name_files<0) {$mvc_startNumber=$startNumber;}
    elsif($time_MVC<$time_FN) {
        $mvc_startNumber=$startNumber;
        $fold_name_startNumber=$#$mvc_files+1+$startNumber;
    }
    else {
        $mvc_startNumber=$#$fold_name_files+1+$startNumber;
        $fold_name_startNumber=$startNumber;
    }
    return rename_files($mvc_files,$mvc_startNumber,
        $fold_name_files,$fold_name_startNumber,
        $fold_name,$new_digit_size);
}
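The digit-count arithmetic is worth checking by hand once. In the sketch below, digit_size is a hypothetical stand-in that takes file counts rather than array references, so the count minus one plays the role of $#$fold_name_files and $#$mvc_files.

```perl
# Sketch: how many digits does the largest file number need?
# digit_size is a hypothetical helper working from counts.
sub digit_size {
    my ($n_named, $n_mvc, $start) = @_;
    my $largest = ($n_named - 1) + ($n_mvc - 1) + 1 + $start;
    return length($largest);         # the number is treated as a string
}

print digit_size(3, 5, 1), "\n";        # 8 files in all: prints 1
print digit_size(4, 6, 1), "\n";        # 10 files in all: prints 2
print digit_size(5000, 5000, 1), "\n";  # 10,000 files: prints 5, so the script gives up
```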
The rename routine (Step 4) is a bit more complicated. The Perl rename routine is a Unix style routine, so if there already is a file with the new name, the old file is destroyed without warning. The Mac solution is better, but annoying - put up a dialog box and let the user recover. But you don't want dialog boxes, you just want the files renamed. The solution we use is to create a temporary folder, move the files into this folder as we rename them, move them back when we are done, and finally, delete the temporary folder. We put this temporary folder in our enclosing folder so that in the event of an error it should be easy to find all your files.
Here we introduce another way to collect the information passed as the arguments: make a list on the left and set it equal to @_. The mkdir and rmdir functions betray their Unix heritage. Subroutines move the files into the temporary folder and out of it again.
sub rename_files{
    # Make temporary folder - the name will be a number
    $dir=0;
    while( -d $dir || -f $dir ) {$dir++;}
    # Possible infinite loop - but need thousands of
    # folders/files with numbers as names. Don't worry.
    if(!mkdir($dir,0777)) {
        print"Failed to make temporary folder.\n";
        return 0;
    }
    ($filesA,$startA,$filesB,$startB,$prefix,$digit_size)=@_;
    $dir_prefix=":$dir:$prefix";
    # Move the first batch of files, then the second.
    # Bail if error.
    if(!mv_tmp($startA,$filesA,$dir_prefix,$digit_size)){
        return 0;
    }
    if(!mv_tmp($startB,$filesB,$dir_prefix,$digit_size)){
        return 0;
    }
    # move the files back. Bail if error.
    if(!mv_back($dir)){return 0;}
    # Delete the temporary directory
    return rmdir($dir);
}
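The temporary-folder trick is not specific to the Mac. As a rough sketch of the same idea in portable Perl, the routine below leans on the standard File::Temp and File::Spec modules instead of colon path names. That is my assumption about the environment (a reasonably recent Perl), not what the droplet does, and rename_via_temp is a made-up name. It skips the zero padding and always numbers from 1.

```perl
use File::Temp qw(tempdir);
use File::Spec;

# Sketch: rename @names inside $folder to ${prefix}1.jpg, ${prefix}2.jpg, ...
# by way of a scratch folder, so that no existing file is overwritten.
# rename_via_temp is a hypothetical routine, not part of the droplet.
sub rename_via_temp {
    my ($folder, $prefix, @names) = @_;
    my $tmp = tempdir(DIR => $folder);    # fresh, unused name inside $folder
    my $n = 1;
    foreach my $old (@names) {
        rename(File::Spec->catfile($folder, $old),
               File::Spec->catfile($tmp, "$prefix$n.jpg")) or return 0;
        $n++;
    }
    opendir(my $dh, $tmp) or return 0;
    my @moved = grep { !/^\.\.?$/ } readdir($dh);   # skip . and ..
    closedir($dh);
    foreach my $name (@moved) {
        rename(File::Spec->catfile($tmp, $name),
               File::Spec->catfile($folder, $name)) or return 0;
    }
    return rmdir($tmp);                   # true if the scratch folder is gone
}
```

Calling rename_via_temp($folder, "whatever", "MVC-01L.JPG", "whatever1.jpg") turns the camera file into whatever1.jpg and the old whatever1.jpg into whatever2.jpg, a case where renaming in place would have destroyed a picture.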
Nothing much new in the next subroutine except the foreach loop. This works through the array, setting $h to each value of the array in order - no need for an index variable. This is not earthshaking, but it is elegant. The mv_back routine completes the script.
sub mv_tmp{
    ($first,$list,$dir_prefix,$digitSize)=@_;
    foreach $h (@$list) {
        $numStr=substr("00000",0,$digitSize-length($first)).$first;
        if(!rename($h,"$dir_prefix$numStr.jpg") ){
            print"Failed to move $h into $dir\n";
            return 0;
        }
        $first++;
    }
    return 1;
}
sub mv_back{
    $dir=shift(@_);
    if(opendir(DIR,$dir) ){
        @files=readdir(DIR); # read all objects into an array
        closedir(DIR); # close the directory for reading
        chdir($dir);
        foreach $h (@files) {
            if(!rename($h,"::$h") ){
                print"Failed to move $h out of $dir\n";
                return 0;
            }
        }
        chdir("::");
        return 1;
    }
    else {return 0;}
}
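The zero padding in mv_tmp is done with substr on a string of zeros. For comparison, here is that expression next to the built-in sprintf, which is the more common way to pad numbers in Perl. Both helper names are invented for the sketch.

```perl
# Sketch: two ways to pad a number with leading zeros.
# pad_substr mirrors the expression in mv_tmp; pad_sprintf is the usual idiom.
sub pad_substr {
    my ($n, $width) = @_;
    return substr("00000", 0, $width - length($n)) . $n;
}

sub pad_sprintf {
    my ($n, $width) = @_;
    return sprintf("%0${width}d", $n);
}

print pad_substr(7, 3), " ", pad_sprintf(7, 3), "\n";    # 007 007
print pad_substr(42, 2), " ", pad_sprintf(42, 2), "\n";  # 42 42
```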
Final Comments
The constructions, syntax and built-in functions discussed in this short article have barely scratched the surface of what is available. And more is coming every day. See <http://www.perl.com> and related links. I hope this example will spark your interest in using Perl for your own projects. Happy scripting.
Larry Taylor is a research mathematician and professor who spends too much time fooling around with this sort of thing. More stuff at http://www.nd.edu/~taylor.