How a tea-tasting test led to a breakthrough in statistics

One hundred years ago, an English lady, Dr Muriel Bristol, amazed some leading statisticians by proving that she could tell by taste the order in which the milk and the tea had been poured into a cup. One of them was Ronald Fisher. Another was William Roach, who was to marry Bristol shortly afterwards.

Many decisions in medicine, economics and other fields depend on carefully designed experiments. For example, before a new treatment is proposed, its efficacy must be established by a series of rigorous tests. Everyone is different, and no one course of treatment is necessarily best in all cases. Statistical analysis of data is an essential part of evaluating new drugs.

Fisher investigated the design of statistical tests and described many of his ideas in his book, The Design of Experiments. To exemplify some key ideas, he described a quirky experiment to investigate Bristol’s claim that she could tell whether the milk was poured before or after the tea.

Shortly after Fisher had moved to Rothamsted research station in 1919, he poured a cup of tea and offered it to Bristol. She declined, saying that she preferred the milk to be poured first. The arrogant young Fisher scoffed at this, insisting that it could not possibly make any difference, but Bristol maintained her stance, assuring him that she would always know the difference. Overhearing this exchange, another scientist, Roach, said, “Let’s test her.”

How should the test be arranged? How many cups of tea should be poured? In what order should they be presented? How about miscellaneous factors such as temperature, sweetness and so on? How was a conclusion to be drawn from the results? Fisher considered all these factors, and more, in his book.

Randomise

Fisher proposed to present eight cups of tea to the lady, four of each variety (milk first or tea first). Ideally, they should be identical in every respect except the order of pouring. He argued that the best way to present the drinks was to randomise the order.
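For readers who like to experiment, the randomisation step can be sketched in a few lines of Python. This is only an illustration: the function name `randomised_order`, the cup labels and the seed value are assumptions made for the example, not details from Fisher's book.

```python
import random

def randomised_order(n_each=4, seed=None):
    """Return a shuffled presentation order with n_each cups of each variety.

    Hypothetical helper for illustration only.
    """
    cups = ["milk first"] * n_each + ["tea first"] * n_each
    rng = random.Random(seed)  # a seed makes the shuffle repeatable
    rng.shuffle(cups)          # every ordering of the eight cups is equally likely
    return cups

# The seed is an arbitrary choice for the example.
order = randomised_order(seed=1919)
for position, variety in enumerate(order, start=1):
    print(position, variety)
```

Whatever the seed, the list always holds four cups of each variety; only the order in which they are presented changes.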

Assuming that there is no ability to discriminate, each choice is correct or incorrect with equal probability. Thus, if eight cups are presented, the four cups chosen as “milk first” are equally likely to be any four of the eight. How many ways are there of selecting four items out of eight? This is a standard problem in combinatorics, and the answer is written as Choose (8,4). There is a button on most scientific calculators to evaluate this, and the result is 70.
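The count of 70 can also be checked without a calculator. The short Python sketch below evaluates the formula and, as a cross-check, lists every possible selection of four cups from eight. The variable names are chosen for this example only.

```python
from itertools import combinations
from math import comb

CUPS = 8        # total cups presented
MILK_FIRST = 4  # cups to be picked out as "milk first"

# Closed form: 8! / (4! * 4!)
n_ways = comb(CUPS, MILK_FIRST)

# Brute-force check: enumerate every set of four cup positions.
selections = list(combinations(range(CUPS), MILK_FIRST))

print(n_ways, len(selections))
```

Both numbers agree, so the formula and the exhaustive listing give the same count.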

If Bristol had no ability to distinguish between the drinks, each of these 70 ways was equally probable. But her selection would be completely correct for only one of the 70 cases. So, if she had no ability to taste the difference, she had only a 1-in-70 chance of making no errors.

The assumption that Bristol had no skill is known as the null hypothesis. It was introduced by Fisher, who also proposed a threshold probability of 1-in-20 or 5 per cent as a limit for statistical significance.

The null hypothesis is assumed to be true unless evidence emerges that indicates that it is invalid. In the present case, it implies that there is no relationship between the choices Bristol made in each case and the actual variety of the drink.

Bristol successfully identified the correct category in each of the eight cases. Since the probability of this happening by chance was about 1.4 per cent, well below the threshold, Fisher was forced to reject the null hypothesis and concede that Dr Muriel Bristol was indeed gifted with the ability to distinguish by taste the order of pouring.
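The arithmetic behind this conclusion can be sketched in Python. Under the null hypothesis, the number of correctly chosen milk-first cups follows from simple counting, and the chance of a perfect score is compared with the 5 per cent threshold. The function name `prob_correct` and the extra calculation for three or more correct picks are additions for illustration, not part of the account above.

```python
from fractions import Fraction
from math import comb

def prob_correct(k, cups=8, milk_first=4):
    """Chance of picking exactly k of the true milk-first cups by pure guessing.

    Hypothetical helper: counts the selections with k hits and divides
    by the total number of ways to choose milk_first cups.
    """
    ways = comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
    return Fraction(ways, comb(cups, milk_first))

THRESHOLD = Fraction(1, 20)  # the 5 per cent significance level

p_all_correct = prob_correct(4)                       # 1/70
p_three_or_more = prob_correct(3) + prob_correct(4)   # 17/70

print(float(p_all_correct), p_all_correct < THRESHOLD)
print(float(p_three_or_more), p_three_or_more < THRESHOLD)
```

Under these assumptions, a perfect score falls below the threshold, while three or more correct picks out of four would not, since 17/70 is roughly 24 per cent. With only eight cups, nothing short of a flawless performance would have counted as significant.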

An evening course on recreational maths at UCD, "Awesums: Marvels and Mysteries of Mathematics", is now open for booking online (at www.ucd.ie/lifelonglearning) or by phone (01-7167123).

Peter Lynch is emeritus professor at UCD School of Mathematics & Statistics. He blogs at thatsmaths.com