Wrong again: Why did most polls predict a Clinton win?
Michael Marsh: Turnout, herding and non-random samples may explain bias against Trump
So it was indeed a Brexit moment: against most predictions, the people voted for uncertainty and Donald Trump was elected as US president. And as in the UK general election in 2015, the Brexit referendum and our own election earlier this year, the polls got it wrong again.
The majority of polls predicted that the Democrats would win the popular vote and with it the presidency for Hillary Clinton and perhaps even the Senate for the Democrats.
This was not just a few polls. There are thousands, with most US states polled extensively as well as the more familiar national polls.
In the past, analysts such as Nate Silver on fivethirtyeight.com used these state polls to good effect, predicting outcomes that others, relying on a few national polls, failed to call, but this time even that source proved misleading. Even at national level there are far more polls than we would see here.
On average these gave Clinton a lead of around 3 per cent, and it looks as if Donald Trump will win with something around a 1 per cent lead in the national vote.
How could this happen?
Samples are not random
There will be extensive analysis of this event for many years, but there are a number of possible explanations for why the polls seemed to be biased against Trump. These are not peculiar to the US but are universal problems for pollsters.
The key problem in polling is that the underlying theory which justifies the enterprise assumes that the sample of people being surveyed is random, but the samples taken are not.
Most are phone polls or some kind of on-line sample. Face-to-face interviews are unusual. It is possible to adjust these samples so that they look like a random sample, but adjustments are limited by what is known about the sample and the underlying population.
It is easy to ensure the right balance of gender, age, occupation and even past voting patterns and so on, but this assumes that the subgroups actually in the sample accurately reflect those in the population.
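The effect of this kind of weighting can be sketched in a few lines of Python. All figures below are invented for illustration: the point is that weighting rebalances the groups, but still assumes the sampled members of each group resemble that group in the population.

```python
# Post-stratification sketch: weight respondents so the sample's
# demographic mix matches known population shares. Numbers invented.
sample_counts = {"college": 600, "non_college": 400}    # raw respondents
population_shares = {"college": 0.35, "non_college": 0.65}

n = sum(sample_counts.values())
weights = {g: population_shares[g] / (sample_counts[g] / n)
           for g in sample_counts}

# Suppose candidate support differs sharply by group (invented figures):
support = {"college": 0.55, "non_college": 0.40}

raw_estimate = sum(sample_counts[g] * support[g] for g in sample_counts) / n
weighted_estimate = sum(sample_counts[g] * weights[g] * support[g]
                        for g in sample_counts) / n

print(f"unweighted: {raw_estimate:.3f}, weighted: {weighted_estimate:.3f}")
# Weighting pulls the estimate down, because the over-sampled group
# (graduates here) was the more supportive one.
```

If the graduates who answered the phone differ from graduates in general, no amount of weighting will fix the estimate, which is the limitation described above.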
This was the big mistake in the UK in 2015. As it happens, the samples contained people more interested in politics than the population at large, and this factor led to a serious underestimate of the Tory vote. Something along these lines could be part of the explanation in the US.
The next possible explanation is to do with turnout. Pollsters ask people how they will vote, but not everyone votes. And many are unwilling to admit they will not vote, even in a country where only about half of the eligible population turns out.
Adjustments are made to polling data to predict likely turnout but these are far from perfect, and in an unusual election like this one the pattern of turnout may also have been less predictable.
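One common form of such an adjustment weights each respondent's stated choice by an estimated probability of actually voting. A minimal sketch, with wholly invented respondents and turnout probabilities, shows how sensitive the headline figure is to that model:

```python
# Likely-voter adjustment sketch: weight each stated preference by an
# estimated turnout probability. All figures are invented.
respondents = [
    # (candidate, estimated probability of voting)
    ("clinton", 0.9), ("clinton", 0.5), ("clinton", 0.6),
    ("trump",   0.9), ("trump",   0.8), ("trump",   0.85),
]

def vote_share(data, candidate):
    """Turnout-weighted share of the vote for one candidate."""
    total = sum(p for _, p in data)
    return sum(p for c, p in data if c == candidate) / total

# Unweighted, this sample splits 50/50; with turnout weights Trump leads,
# because his supporters here are modelled as likelier to vote.
unweighted = sum(1 for c, _ in respondents if c == "trump") / len(respondents)
weighted = vote_share(respondents, "trump")
print(f"raw share: {unweighted:.2f}, turnout-adjusted: {weighted:.3f}")
```

Get the turnout probabilities wrong, as may well have happened in an unusual election, and the adjusted estimate is wrong with them.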
A further explanation is that people lied to the interviewers, being unwilling to admit support for Trump. While this so-called ‘spiral of silence’ theory explains some outcomes in the past, it seems less credible in this one, as Trump supporters are hardly minorities in their communities and the US media is not - as a whole - anti-Trump.
Another explanation is that pollsters don’t want to be wrong. All estimates are the result of quite complex adjustments to the raw data, and those making these adjustments will know if the results of these calculations look ‘right’ or not.
Estimates giving predicted vote shares that are outside the mainstream may prompt a few more tweaks. This 'herding' behaviour can lead to pollsters ignoring the evidence of their own data. Certainly the polls converged, as they usually do, to a degree that sampling theory alone would make unlikely.
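The sampling-theory check behind that claim can be sketched briefly: independent random samples of a given size should scatter by at least the binomial standard error, so a tighter observed spread suggests herding. The poll figures below are invented for illustration.

```python
import math
import statistics

def expected_se(p, n):
    """Standard error of a vote-share estimate from a simple random sample."""
    return math.sqrt(p * (1 - p) / n)

# Suppose ten final polls of ~1,000 respondents each reported these
# Clinton shares (invented, deliberately clustered):
polls = [0.47, 0.48, 0.475, 0.48, 0.47, 0.485, 0.48, 0.475, 0.47, 0.48]

observed_sd = statistics.pstdev(polls)
theoretical = expected_se(0.48, 1000)

print(f"observed spread: {observed_sd:.4f}")
print(f"spread expected from sampling alone: {theoretical:.4f}")
# When the observed spread sits well below the sampling-theory floor,
# the polls agree more than chance allows: a signature of herding.
```

In this invented example the polls scatter by about a third of what pure sampling error would produce, which is the pattern the paragraph above describes.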
Michael Marsh is an emeritus professor of political science in TCD