Computerised essay evaluation replacing human markers

While computer grading of multiple choice tests has been used for years, computer grading of more subjective material like written exams is now moving into the academic world.

Is it possible that the Leaving Cert English papers could someday be graded by a computer instead of a teacher toiling away in the summer sun to earn some extra money?

While the technology for the evaluation of written work is still in its infancy, methods for automatic scoring of test essays - based on artificial intelligence, computational linguistics and cognitive science - have come into use in recent years.

Three companies offer software in this area: Educational Testing Service (ETS) of Princeton, New Jersey; Vantage Learning of Newtown, Pennsylvania; and Knowledge Analysis Technologies of Boulder, Colorado.

These companies gather sets of essays written by hundreds of students of varying levels and have the papers graded by trained teachers. The companies then train their software engines to mimic the standards - content, grammar, spelling - applied by the expert human scorers.

"We take the training papers, run them through the computer and assemble algorithms particular to each question," said Mr Richard Swartz, executive director of technology products and services at ETS, the largest of the three companies. "The computer learns the linguistic features used by the readers and so we develop our scoring model," he added. It only takes a couple of days to train software engines to score tests. "

One factor now driving the use of technology in this area is the United States' No Child Left Behind Act of 2002, which mandates tests in maths and reading for students in grades three to eight (ages eight to fourteen). This means more tests are being given on a regular basis to many more children.

Furthermore, 18 states now require students to pass a writing test for high school graduation. Starting next year, both the SAT and the ACT, national standardised tests which assess students' ability to cope with college-level work, will include writing sections.

Indiana is the first state to use a computer-scored English essay test in a statewide assessment and its experience could influence testing decisions in other states. The Indiana Department of Education has provided all ninth-grade English classrooms in the state with access to Criterion Online Writing Evaluation, an automated essay reading service developed by ETS.

Criterion evaluates a student's writing skills and provides instant score reporting and diagnostic feedback to both the instructor and the student.

Criterion is widely used in schools and colleges in the United States as well as in Britain, Canada, China, Mexico, Korea and Taiwan, said Mr Swartz. In the US Criterion is used in more than 1,200 schools by about 500,000 students at all grade levels, who submit essays for scoring every day.

"We administered over 80,000 tests online in Indiana this spring," said Mr Swartz. "Each test included one or more open-ended questions, all of which were scored by our automated scoring systems." While Indiana developed its programme in advance of the No Child Left Behind Act, Mr Swartz thinks "the law will increase attention on this area".

"The trend is clearly towards increased use of online testing primarily for the benefits of faster turnaround time and reduced expense," Mr Swartz said.

Criterion costs a school $15 per student a year for unlimited access. In its first year, the online version in Indiana cost a quarter of the price of the paper-and-pencil version, which teachers were paid to score. Most papers require scoring by two teachers, and there are the added costs of shipping and storing the papers. The margin of error of the online scoring system is said to be comparable to that of trained human readers.

Since 1999 an ETS product called e-rater has scored the essay portion of the Graduate Management Admission Test (GMAT) Analytical Writing Assessment in conjunction with a faculty reader. In that time e-rater has scored around two million GMAT essays, agreeing with faculty readers' scores on average 97 to 98 per cent of the time.

For high-stakes tests, such as the GMAT, at least one human is always in the scoring loop. For a low-stakes writing application, such as a practice essay system on the Web, a single reading by an automated system is often just as accurate and more economical.
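The article does not spell out how the machine and human scores are combined. A minimal sketch of one plausible arrangement - assuming a one-point adjudication threshold, which is an assumption here rather than ETS's documented rule - might look like this:

```python
# Hedged sketch of a high-stakes hybrid scoring loop. The one-point
# disagreement threshold and the averaging rule are assumptions.
def hybrid_score(machine_score: int, human_score: int,
                 request_second_human) -> float:
    """Combine an automated score with a human score.

    If the two agree within one point, average them; otherwise
    escalate to a second human reader, whose score is averaged
    with the first human's.
    """
    if abs(machine_score - human_score) <= 1:
        return (machine_score + human_score) / 2
    second = request_second_human()  # adjudication by another reader
    return (human_score + second) / 2

# Low-stakes path (e.g. a Web practice essay): a single automated
# reading is used as-is.
def low_stakes_score(machine_score: int) -> int:
    return machine_score

# Scores agree within a point: no adjudication needed.
print(hybrid_score(4, 5, request_second_human=lambda: 0))  # 4.5
# Scores disagree: a second human reader resolves.
print(hybrid_score(2, 5, request_second_human=lambda: 5))  # 5.0
```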

Vantage Learning is waiting to hear whether Mississippi will use its IntelliMetric program to score maths, reading, science and social studies tests in schools. About 33,000 middle school children (ages 11-14) will use IntelliMetric in Los Angeles County. The company also expects to deliver 760,000 Web assessments to students in Oregon.

Vantage Learning's vice-president of sales and marketing, Mr Harry Barfoot, estimates his company will deliver more than 18.7 million online test transactions directly to 15 million students this year.

"The challenge is to have more Internet access in schools so kids can use it," said Mr Barfoot.

Dr Darrell Laham, co-founder and chief technology officer of Knowledge Technologies, would agree: "From kindergarten to grade 12 the infrastructure is not there yet to have students take tests on computers at one time," he said. "Each state develops its own tests and most still administer them by pen and paper which doesn't work for us."

The No Child Left Behind Act is helping his company's business, which has already grown from three people from the University of Colorado in 1996 to 20 today.

"We are one of the three players that have the technology to do this," said Dr Laham. "But we're a smaller company and our techniques are different. We work for other test publishing companies and content providers. We don't go after the state contracts."

He estimates that about 10 companies and government agencies use the company's Intelligent Essay Assessor system, mostly as a back-end scoring service.

For example, someone could log on to a client's website to write an essay. Once the essay is submitted, it goes from the client's website to Knowledge Technologies' servers in Denver and back again. It can take from one to three seconds to score an essay. "We're as reliable as human readers, but it takes some getting used to, to know that a computer will score your essay," Dr Laham said. "People are accepting this technology."
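The article does not describe the mechanics of that round trip. A hypothetical client-side call - the endpoint URL, parameters and response fields below are placeholders, not Knowledge Technologies' actual interface - might look like this:

```python
# Hypothetical client-side round trip to a back-end scoring service.
# The URL and JSON fields are illustrative assumptions only.
import requests

def score_essay_remotely(essay_text: str, prompt_id: str) -> dict:
    response = requests.post(
        "https://scoring.example.com/api/score",  # placeholder endpoint
        json={"prompt_id": prompt_id, "essay": essay_text},
        timeout=5,  # the article cites one to three seconds per essay
    )
    response.raise_for_status()
    return response.json()  # e.g. {"score": 4.5, "feedback": "..."}

# Example usage (the placeholder endpoint will not actually resolve):
result = score_essay_remotely("My summer holidays were...", "prompt-042")
print(result["score"])
```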

While the technology is often offered free to the end-user, a client pays to have a particular test book adopted into the scoring system. The cost can range from $0.50 to $1.00 for every essay scored.

This week, when the Council of Chief State School Officers meets for its 34th annual conference in Boston, delegates will discuss new technologies in online testing.