Census team counting on computers

 

It will take months to process the two million census forms, despite the use of advanced computers

A NEW LEVEL of computer power is being used to tackle the 2011 census. Scanners will record 40 million images of census pages, and computers will interpret what we have written on our forms.

For the first time the Central Statistics Office (CSO) is producing its own “maps”. These will pinpoint every single dwelling in the State on a two-dimensional grid, providing information about population in a county, electoral area, townland or almost any other area that can be defined.

The move to let computer power do the heavy lifting came once the enabling technology had advanced far enough, explains Gerry Walker, a senior statistician at the CSO who works on the census.

“The big change happened in 2002. Prior to that we had to key in every single item,” he says. “In 1996 it took a total of 634 person-years to do this with a staff of 312 people.”

The 2011 census will be read and compiled by computer as much as possible, Walker says. This is much trickier than you might think, despite what you see in the movies. Some of us print clearly, others not so clearly. And while the computer readily reads such occupations as “teacher” or “journalist”, how does it understand what is written for the job of the person who maintains potted plants in your office?

For this reason Walker already knows the limits of computer “understanding”, which vary with the part of the census form involved. For example, the system can capture date of birth with very high accuracy and can get county of birth right at least 70 per cent of the time.

But it gets religion correct only 43 per cent of the time, ethnicity about 37 per cent of the time and occupation just 30 per cent of the time, Walker says. “The more complex the thing you are trying to code, the less the computer will get it right.”

Even before the computer scanning and processing begins there is a huge logistical effort required just to collect, move and stack the two million census forms that will be filled out next Sunday, according to Walker.

The 5,000 enumerators will collect these by hand, each of them dealing with about 420 homes, and then pack them into standardised black boxes for transport to a central location.

Thousands of boxes will be accumulated and then the files they contain will be taken out and stacked on a shelving system.

At this point the real processing will begin, and it will start with a chop. The spine of the 12-page forms will be cut with a guillotine to free the pages before these are sent through scanners.

“It reads bar codes on the forms, but basically it is just like a photocopier,” Walker says. “It is grabbing an image of the page and gets rid of the original green colour of the forms.”

Each form will have 12 pages, and so 24 computer images. The team working with the forms will scan 15,000 forms each day for six months before clearing the documents.
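The scanning workload can be checked with a little arithmetic. The sketch below uses only the rounded figures quoted in this article (two million forms, 15,000 forms a day, 24 images per form), so the results are rough; the variable names are illustrative and not anything the CSO uses.

```python
# Rounded figures taken from the article; treat the results as estimates.
FORMS_TOTAL = 2_000_000      # census forms to be processed
FORMS_PER_DAY = 15_000       # forms scanned each day
IMAGES_PER_FORM = 24         # computer images produced per form

images_per_day = FORMS_PER_DAY * IMAGES_PER_FORM   # daily image output
scanning_days = FORMS_TOTAL / FORMS_PER_DAY        # days of scanning needed
total_images = FORMS_TOTAL * IMAGES_PER_FORM       # images over the whole run

print(f"Images per day: {images_per_day:,}")
print(f"Scanning days:  {scanning_days:.0f}")
print(f"Total images:   {total_images:,}")
```

On these inputs the scanners turn out 360,000 images a day for about 133 scanning days, which fits the six-month schedule, and the total lands in the tens of millions.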

This will produce 360,000 computer images daily, which will then be converted into readable data, the object being to push the accuracy of this reading process as high as possible. There is a “repair code” to fix errors, such as a place of birth being read as “Doland” rather than “Poland”. When this doesn’t work, a team of 90 at the processing centre will check the details that the computer could not “understand”.
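The article does not say how the CSO's “repair code” works. As a rough illustration of the idea only, the sketch below uses fuzzy string matching from Python's standard `difflib` module to snap a misread place name to the closest entry in a known list, and hands anything too far off to a human operator. The list of places, the function name `repair_place` and the 0.8 similarity cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical reference list; a real system would use a full coding frame.
KNOWN_PLACES = ["Ireland", "Poland", "Lithuania", "Nigeria", "United Kingdom"]

def repair_place(raw, known=KNOWN_PLACES, cutoff=0.8):
    """Return the closest known place name, or None if nothing is close enough.

    A None result stands for "send this one to a human operator".
    """
    # Compare in lower case, but return the properly capitalised name.
    lookup = {name.lower(): name for name in known}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None

for text in ["Doland", "Xyzzy"]:
    fixed = repair_place(text)
    print(text, "->", fixed if fixed else "refer to operator")
```

Raising the cutoff makes the automatic repair more cautious and sends more entries to the operators; lowering it does the reverse, at the risk of wrong corrections.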

County of birth comes up correct seven times out of 10. “The operator has to code the remaining 30 per cent. We are trying to up the match rate all of the time,” Walker says.

Each “station” involves computer analysis of a different part of the census form. Occupation is the most difficult section for the computer to read, according to Walker.

There are also cross-checks made for all the various stations. For example, if someone is away from home on a business trip and is declared as staying in a hotel across the country, then that hotel’s census return is cross-checked to see if the person turns up on that form.
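The hotel cross-check can be pictured as a simple lookup between two sets of returns. The sketch below is an illustration under invented assumptions: the record layout, the identifiers such as `HOTEL-7` and the function name are hypothetical, and the real system presumably matches on more than a name.

```python
def find_unmatched_absentees(household_returns, hotel_returns):
    """List absent people who do not appear on the return they were said to be on.

    household_returns: list of dicts with a "household" id and an "absent" list,
        each absent entry holding a "name" and a "staying_at" establishment id.
    hotel_returns: dict mapping establishment id to the names recorded there.
    """
    unmatched = []
    for household in household_returns:
        for person in household["absent"]:
            # Names are compared case-insensitively in this simplified sketch.
            guests = {g.lower() for g in hotel_returns.get(person["staying_at"], [])}
            if person["name"].lower() not in guests:
                unmatched.append(
                    (household["household"], person["name"], person["staying_at"])
                )
    return unmatched

households = [
    {"household": "H1", "absent": [{"name": "Mary Byrne", "staying_at": "HOTEL-7"}]},
    {"household": "H2", "absent": [{"name": "Tom Walsh", "staying_at": "HOTEL-9"}]},
]
hotels = {"HOTEL-7": ["mary byrne", "Ann Kelly"], "HOTEL-9": ["Joe Daly"]}

print(find_unmatched_absentees(households, hotels))
```

In this made-up data, Mary Byrne is found on the hotel's return, while Tom Walsh is flagged for follow-up.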

One major departure for this census is the fact that the CSO will produce its own maps by blending the postal database with GPS systems so that every dwelling in the State is defined. Whether they are vacant, a holiday home, abandoned or inhabited, all will appear on the map.

The CSO worked with researchers at NUI Maynooth to subdivide the country into 19,000 “small areas” of 75 to 140 dwellings. This will allow highly detailed maps to be prepared, giving regional details about population mix, employment and anything else defined on the census form.

The census forms are due in from the enumerators on May 13th and will be in the warehouse by May 20th. All going well, the first details of population will be released by July this year and all of the processing for the 2011 census will be complete within 12 months.