New data protection regime could hamper vital research
Without access to proper data, the wrong policy levers might be applied
The quality of our understanding of economic and social issues has improved enormously since statistical offices, including the CSO, first made available to researchers anonymised individual data from surveys and administrative sources, albeit under the most stringent conditions.
Prior to this development, we had to content ourselves with simple summary tables that could not establish cause and effect, a key issue policymakers need to understand.
The CSO has a statutory mandate to protect the anonymity of data supplied by individuals and companies. This is essential to maintain the confidence of those who respond to statistical inquiries.
The CSO has been exemplary in how it does this. Outside researchers undergo a rigorous process to become officers of statistics, data can only be viewed under special conditions, and the resultant research product is reviewed prior to release to safeguard against any privacy violation.
Access to survey microdata has been essential to our understanding of pay inequalities between men and women. Using the microdata, it has been possible to separate the differences that are due to other characteristics that affect earnings, such as age, education and economic sector, from those that can be attributed to gender.
We have also seen how time out of the workforce or working part-time depresses lifetime pay rates for women. And we have been able to track progress over time. While the wage penalty for being a woman has fallen, it remains significant.
Without access to the microdata to understand the dynamics involved, the wrong policy levers might be applied to try to rectify the situation.
Before microdata became available, much of the analysis on poverty was based on looking at a small number of so-called “typical households”. Today, we know that these were not, in fact, representative, and we have been able to identify which households actually experience the deepest and most persistent poverty, and to model the impact of tax and benefit changes on the distribution of income and poverty rates among actual households. This has improved policy design.
Another example is research, using both Revenue and CSO microdata, which has looked at who lost jobs in the economic crisis and what happened to individual earnings.
Aggregate data on employment and earnings, with the picture clouded by retirements and new job entrants, masks a lot of the real story of the pain suffered, and who has been able to bounce back.
Ireland was rather behind the curve in developing research using anonymised individual data. Scandinavia has long made much more sophisticated statistical use of the data already available to government, avoiding the need to go out and collect data from individuals.
For example, Norway has linked different official datasets to provide valuable research results on the long-term impact on children of the age at which they start school.
While this economic and social research is very much in the public interest, there is now a danger that the data protection regime, or how these laws are interpreted, will serve to severely restrict access to microdata.
Data protection is, of course, vital to safeguard individuals’ privacy and protect against unwarranted use by commercial bodies, or the State, of data they hold on individuals.
While I’m no lawyer, it seems to me that the Statistics Act 1993 provides both strong protections for privacy and anonymity of statistical data, and a clear legal basis for data linking and for third-party research on anonymised microdata. This system has worked well for 25 years.
However, an over-zealous interpretation of the General Data Protection Regulation may risk the CSO’s ability to make available anonymised microdata files for the purpose of research that is in the public interest, and to link anonymised data from sources like Revenue and social welfare databases that can illuminate how the labour market works and which groups are most at risk of being left behind.
For researchers, it is a significant burden to have to justify to the CSO a request for access in respect of each individual question in a statistical survey. The nature of research is that you don’t know the answer beforehand, or which variables will prove useful.
Scandinavia, which is subject to the same EU data protection laws, continues to provide even more extensive access to such administrative datasets, while protecting individual anonymity.
I would urge the Data Protection Commissioner and the administration to work together to give us a Scandinavian standard of access to research data, so that the deep statistical analysis of key policy questions can continue in the public interest.