Swiss-army chainsaw turns out pages fast


At top speed, how many web pages can you blam out in an hour? Five or six with a text editor? Maybe twice that using WYSIWYG tools such as FrontPage or DreamWeaver? How about 800 pages in under 15 seconds? Better still, how about tuning the "recipe" for creating them and making all 800 pages different 15 seconds later?

This is not a fair comparison, of course. The 15-second figure doesn't count the five or six hours it took to get the recipe together. It also relies on the information to be turned into pages already being held as structured data. This could be a club membership database, a stock-control system, a catalogue, or any list in which each item, such as a name, an address, a book title or a phone or membership number, is clearly identified.

What it comes down to is the choice between doing it the easy way or the hard way. The hard way is to work from the data source and cut and paste information into pages by hand. Much easier is to involve an extra step, a program which works as a sort of recipe to cook the raw information into pages.

The starting point for the 800-page example above was a project of the writer and map-maker Tim Robinson. In compiling his extraordinary maps of the Burren, Aran Islands and Connemara over the last 25 years, he has assembled a massive collection of place-lore. His field notes for the maps include history, sayings, folklore, observations and research relating to more than 6,000 places.


This information is currently being turned into a database to be published on CD-Rom by his company, Folding Landscapes. In helping to produce a website for Folding Landscapes (www.iol.ie/tandmfl) it was important to include an indication of the nature of the archive.

The data source for the web pages was 800 records relating to Roundstone parish, Connemara, coming from a FileMaker database on the Macintosh. For each of the places there were 18 data fields, including barony, parish, townland, Irish name, Anglicised name, location, description, history and oral lore. Not all fields were filled for all entries.

Exported from FileMaker in "merge" format, the data made a 464K file, 800 lines long, with each place (a "record" in data terms) on a single line, its information divided into 18 fields separated by semicolons. There was one obvious way to turn this into web pages - Perl.

The name stands for "practical extraction and report language" - or "pathologically eclectic rubbish lister" if you prefer. Either way, it has rightly been called a Swiss-army chainsaw in recognition of its versatility and power as a scripting language.

Another plus is that it is an easy language in which to build up a program line by line. That is because it is an interpreted language; the program being written is taken and run directly by the Perl interpreter. In contrast, a compiled language could require several steps to turn the code being written into an executable program - something that rapidly becomes a chore for someone who keeps making small changes and wants to see the results.

The way of working was simple. Three windows were open side-by-side on the screen: a text-editor for the program, a DOS box to run it from the command line and a browser to see the resulting web pages. For an hour at a time, or whenever there were a few spare minutes, the Perl recipe was tweaked, the changes saved and the 800 records run through it again to check the results.

What the program does is not very complicated. (The full program is available in the technology site at www.ireland.com with the online version of this article.) Its main elements consist of:

a loop to read in the 800 records, one at a time

a series of commands to change accented characters from Macintosh coding to their HTML equivalents

a section which breaks each line into its constituent parts
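The three elements listed above might be sketched along the following lines. This is a minimal illustration, not the program published with the article. The names mac_to_html and parse_record, the four-field sample records and the choice of accented characters are all assumptions made for the example, and it assumes that no field contains a semicolon of its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed subset of Mac OS Roman byte values and their HTML entities:
# the acute-accented vowels that Irish place-names need most often.
my %entity = (
    "\x87" => '&aacute;', "\x8E" => '&eacute;', "\x92" => '&iacute;',
    "\x97" => '&oacute;', "\x9C" => '&uacute;',
    "\xE7" => '&Aacute;', "\x83" => '&Eacute;', "\xEA" => '&Iacute;',
    "\xEE" => '&Oacute;', "\xF2" => '&Uacute;',
);

# Replace each high-bit byte that has an entry in the table;
# bytes without an entry are left untouched.
sub mac_to_html {
    my ($text) = @_;
    $text =~ s/([\x80-\xFF])/exists $entity{$1} ? $entity{$1} : $1/ge;
    return $text;
}

# Break one line of the export into its fields.
# The -1 limit keeps trailing empty fields, so every record
# yields the same number of fields even when the last ones are blank.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @fields = split /;/, mac_to_html($line), -1;
    return \@fields;
}

# Invented sample data: two records of four fields each (the real export had 18).
my $data = "1;Tr\x87 Bheag;Trabeg;beach\n2;Cnoc\x87n;Knockaun;\n";

# Read the records one at a time, here from an in-memory file
# so that the sketch runs without the real export.
open my $fh, '<', \$data or die "Cannot open sample data: $!";
my @records;
while (my $line = <$fh>) {
    push @records, parse_record($line);
}
close $fh;

print join(' | ', @$_), "\n" for @records;
printf "%d records read\n", scalar @records;
```

Pointing the open call at a file name instead of the in-memory string would run the same loop over a real export.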

For each record, the program then writes out a page of HTML, containing:

two navigation bars, one for the archive example and one for the website generally

the information for the record with headings only for the fields which contain data
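Writing out a page of this kind could look something like the sketch below. It is not taken from the published program: the field labels, the links in the two navigation bars, the file-naming scheme and the function names page_html and write_page are all invented for illustration. It also assumes the field text has already had its accented characters converted and contains no stray HTML characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed field labels, in export order (the real export had 18 fields).
my @labels = ('Irish name', 'Anglicised name', 'Location',
              'Description', 'Oral lore');

# Assumed navigation bars: one for the website, one for the archive sample.
my $site_nav    = '<p><a href="../index.html">Home</a></p>';
my $archive_nav = '<p><a href="index-az.html">A-Z index</a> | '
                . '<a href="index-num.html">Numerical index</a></p>';

# Build the HTML for one record. A heading is written only
# for a field that actually contains something.
sub page_html {
    my ($number, @fields) = @_;
    my $title = (defined $fields[0] && $fields[0] ne '')
              ? $fields[0] : "Record $number";
    my $html = "<html><head><title>$title</title></head>\n<body>\n";
    $html .= "$site_nav\n$archive_nav\n<h1>$title</h1>\n";
    for my $i (0 .. $#labels) {
        my $value = $fields[$i];
        next unless defined $value && $value =~ /\S/;   # skip empty fields
        $html .= "<h2>$labels[$i]</h2>\n<p>$value</p>\n";
    }
    $html .= "</body></html>\n";
    return $html;
}

# Write the page to a numbered file, e.g. place0042.html, and return its path.
sub write_page {
    my ($dir, $number, @fields) = @_;
    my $path = sprintf '%s/place%04d.html', $dir, $number;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} page_html($number, @fields);
    close $out or die "Cannot close $path: $!";
    return $path;
}

# Invented record with two empty fields; print the page rather than save it.
print page_html(3, 'Tr&aacute; Bheag', 'Trabeg', '', 'A small beach.', '');
```

Because the headings are generated inside the loop, changing the look of all 800 pages means editing one line and running the program again.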

Alongside the 800 files, the program also generates two indices, one alphabetical, the other by record number. It also counts the entries processed and reports that number.
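The two indices and the running count could be produced roughly as follows. Again this is a sketch under assumptions rather than the original code: the record structure, the accent-free sort_key used for alphabetical ordering, the file names and the sample entries are all made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample entries. Each holds the record number, the display name
# (accents already converted), a plain sort key and the page's file name.
my @records = (
    { number => 3, name => 'Inis',             sort_key => 'inis',
      file => 'place0003.html' },
    { number => 1, name => 'Tr&aacute; Bheag', sort_key => 'tra bheag',
      file => 'place0001.html' },
    { number => 2, name => 'An Cnoc',          sort_key => 'an cnoc',
      file => 'place0002.html' },
);

# Order by record number.
sub by_number {
    return sort { $a->{number} <=> $b->{number} } @_;
}

# Order alphabetically by sort key, falling back to the record number
# when two places share a name.
sub by_name {
    return sort {
        $a->{sort_key} cmp $b->{sort_key} || $a->{number} <=> $b->{number}
    } @_;
}

# Build one index page as a list of links to the individual pages.
sub index_html {
    my ($heading, @entries) = @_;
    my $html = "<html><head><title>$heading</title></head>\n<body>\n"
             . "<h1>$heading</h1>\n<ul>\n";
    for my $e (@entries) {
        $html .= qq{<li><a href="$e->{file}">$e->{number}. $e->{name}</a></li>\n};
    }
    $html .= "</ul>\n</body></html>\n";
    return $html;
}

print index_html('Places by name',   by_name(@records));
print index_html('Places by number', by_number(@records));

# Report how many entries went through.
printf "%d entries processed\n", scalar @records;
```

Sorting on a separate key without accents is one simple way to keep names such as "Trá" in their expected place; other orderings would only need a different comparison in by_name.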

There are other ways of turning structured data into web pages en masse. Some web-authoring tools will accept information from a database and some databases will export in HTML. Few of these options are as fast at the job as Perl and none offers the same flexibility. Perl is extremely widely used by websites to interpret user input and to serve up dynamic pages, but the example above shows that it is also extremely useful for bashing collections of static pages into shape.

Its syntax may appear daunting at first to anyone without a programming background, but it is not too difficult to get started. Once the time has been put in to learning the basics of Perl, it can be used to create customised tools for all sorts of publishing and site-management tasks.

And what does this amazing program cost? Nothing, apart from the time it takes to download it and learn how to use it. The free-software ethos also means that there are thousands of sample programs and extensions to Perl posted on the Web for downloading.

Previous DIY webmaster articles are archived at www.ireland.com/technology/webbuilder.htm. The series is a step-by-step guide to creating a website with free and easily available tools. For more information on Perl, see www.perl.com

fomarcaigh@irish-times.ie