What Big Data can tell you about your genome – and why it matters

Faster computing power means that decoding work on the human genome can now be done in hours

We've had a week of looking at the implications of "Big Data" here at The Irish Times, starting last Saturday and continuing into this week.

It’s such a huge topic that finding areas for focus, while doing justice to the whole sweep of the issue, posed challenges for all of us who wrote on it.

One area that I barely touched on in my introductory piece in last Saturday's Weekend section, but which has grabbed my interest, is what Big Data means for genome work, especially following excellent talks last summer at the huge Euroscience Open Forum (ESOF) conference in Dublin.

Faster and cheaper computing power and software have meant that decoding work on the human genome, which took months of painstaking study a decade ago, can now be done in hours – an extraordinary shift.


As of October last year, the going rate for sequencing a genome was $6,618, according to the US government's National Human Genome Research Institute. Hardly a snip, but compare that to $7 million five years ago, and a staggering $95.3 million in 2001, when the first genome was completed by Craig Venter's research team. Sequencing a single genome these days means analysing half a terabyte of data – very Big Data indeed.

Personal genomics is definitely a daunting new world of information for which there are as yet few meaningful applications and a minefield of ethical questions (what does it mean to have one of the genes that predispose someone to breast cancer, when we don’t fully understand the role of genetic versus environmental factors in disease? And might that affect someone’s ability to get health insurance?).

Pinpoint anomalies
But the broader study of genomics is an area in which advances are gradually taking off, bringing the promise of new insights into genetically linked conditions and, eventually, better treatments. Some of the most interesting potential lies in working with genetic data from national or regional populations. This is because it is easier to pinpoint anomalies linked to particular conditions when the genomes of the individuals being studied vary less from one another. In Ireland, that could make it possible to home in on conditions such as cystic fibrosis, which occurs more frequently in Irish people.

But understanding can also be gained of common conditions. Icelandic biopharmaceutical company deCODE Genetics – now a part of US company Amgen – has said it has identified genes involved in some cancers, heart disease and schizophrenia, thanks to access to genetic information for about 100,000 Icelanders.

Controversially, the company initially planned to create a national DNA database for the entire Icelandic population, forcing a national discussion in that country on who owns such genetic information and who may use it or profit from it. State plans for such a database were scrapped, but the company has consent from about a third of the population to use their genetic information.

deCODE was founded by Icelandic neurologist Kári Stefánsson – like Venter, a larger-than-life and controversial figure. Exciting new genetic research at deCODE has included the discovery of a rare gene that protects against Alzheimer’s, carried by 1 per cent of the Icelandic population, which could one day lead to preventative medication for the disease.

Stefánsson has also noted that a mother’s DNA, in general, tends to give protection against problem genes inherited through a father’s DNA, and explained that deCODE’s researchers have found the increasing age of fathers to be associated with a rise in the risk of autism.

The first Irish genome was sequenced in 2010, opening the door to more detailed Irish work in this new area. But ethical and privacy complications have arisen here too. One set of DNA samples that could be used for important genomic research is contained in a national collection of blood samples, the so-called “Guthrie cards”, obtained from Irish babies for decades, to screen for six genetic diseases.

Priceless resource
Under current Irish data protection laws, the entire collection was, until weeks ago, threatened with destruction, unless new legislation was enacted. Advocates such as Prof John Crowe, president of the Royal College of Physicians, argued that the archive was a priceless resource that could yield future treatments as well as identify those at risk of disease, as new Big Data analysis techniques mature.

At the last moment, the cards got a reprieve, with the Government vowing to find a legal way to guarantee their preservation.

The quality of life of someone you care about may one day depend on research that could come from those cards. Finding a way to protect and preserve them is important – and shouldn’t be long-fingered by the State.