Clinical trials: how oranges and lemons ensured we don’t compare apples with oranges
Randomised clinical trials help ensure healthcare decisions are based on the best available evidence
Your three-year-old son gets a small piece of Lego stuck up his nose. What do you do? Who do you call? You want to avoid a trip to the emergency department but, above all, you want to retrieve the piece. Someone suggests you try the “mother’s kiss” technique. So, you block the clear nostril, place your mouth over your son’s open mouth, and blow. Pop! Out comes the piece. You breathe a sigh of relief and your son goes back to building his fire station-aeroplane hybrid.
Our knowledge about whether a treatment works comes from various sources: trial and error, personal experience, and carefully controlled research studies reported in the medical literature.
When it comes to the mother’s kiss, there is good evidence from case series and individual case reports that it is an effective technique for removing foreign objects from children’s nostrils. It works almost all the time, the benefits are seen very quickly and no side effects have been reported.
However, it is unusual for evidence to be so clear-cut that there is no doubt about whether a particular treatment works. Many things in healthcare are not so dramatic, or so immediately obvious.
Interventions such as a drug or a lifestyle change – a particular diet or exercise programme – can have subtle effects, requiring more robust evaluation of the differences between using and not using the intervention. The best way to determine an intervention’s effectiveness is a fair test.
Ensuring like is compared with like, with the exception of the intervention being tested, is an important feature of fair tests.
Let’s say you want to compare the effects of a month-long exercise programme on the ability of people to walk up three flights of stairs without getting breathless. If you provided the programme to a group of people, many of whom were overweight, and compared them with a group of people of normal weight who did not do the programme, it would be reasonable to conclude that any difference between the groups after a month would have more to do with their initial characteristics than with the exercise. This would be an unfair test of exercise.
Instead, what you should do is use a random process, such as flipping a coin, to decide which half of the people receive the exercise programme, and then compare them with the other half, who do not. After a month, any difference between the groups should be down to the exercise programme.
The importance of comparing like with like has been understood for centuries. In 1747, James Lind, a Scottish naval surgeon, set about finding a way of treating scurvy, a potentially deadly disease.
He compared six methods in 12 sailors on board the HMS Salisbury, being careful to include sailors of similar clinical condition with similar diet and accommodation. Lind found that sailors given oranges and lemons did much better than those given other treatments, which included dilute sulphuric acid.
However, even though Lind found this out by studying just 12 people, we usually need many more for a reliable fair test. This is partly because the effect of an intervention is rarely big enough to become clear in a small number of patients, and partly because we need to make sure that the groups really are similar.
Going back to the exercise trial, if there were six overweight people in the study and we flipped a coin for each of them to see whether they would receive the exercise programme, it wouldn’t be that unusual for us to get five heads and one tail, or even six heads.
This would mean that, by chance, one group had a much higher proportion of overweight people than the other, and that could still bias the results. However, if there were 60 overweight people in the study, it is vanishingly unlikely that they would be split 50-10 by flipping a coin. They might be split 26-34, but that would be close enough not to matter.
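The arithmetic behind this can be checked with a quick simulation. The sketch below (an illustration, not part of the trials described in this article) flips a fair coin for 6 people and for 60 people, many times over, and counts how often the split is badly lopsided:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def randomise(n_people: int) -> int:
    """Flip a fair coin for each person; return how many land heads
    (i.e. are allocated to the exercise group)."""
    return sum(random.random() < 0.5 for _ in range(n_people))

TRIALS = 10_000

# With only 6 people, a lopsided split of 5-1 or worse is common.
small = [randomise(6) for _ in range(TRIALS)]
lopsided_small = sum(h >= 5 or h <= 1 for h in small) / TRIALS

# With 60 people, an extreme split like 50-10 essentially never happens.
large = [randomise(60) for _ in range(TRIALS)]
lopsided_large = sum(h >= 50 or h <= 10 for h in large) / TRIALS

print(f"6 people:  split 5-1 or worse in {lopsided_small:.0%} of simulations")
print(f"60 people: split 50-10 or worse in {lopsided_large:.0%} of simulations")
```

With six people, roughly a fifth of randomisations produce a 5-1 or 6-0 split; with sixty people, a 50-10 split is so improbable (odds of well under one in a million per randomisation) that it never appears in ten thousand simulated trials.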
We can get the big numbers we need by doing large trials, or by combining the results of similar trials.
Take the example of corticosteroids given to women at risk of giving birth prematurely, to help their babies survive. The first trial reporting the effects of these steroids appeared in 1972. A decade later, there was strong evidence from several trials that a short, inexpensive course of steroids given to women about to give birth too early was effective.
But the findings of these individual trials weren’t brought together systematically and convincingly until 1989. That delay meant that many babies are likely to have died unnecessarily because the evidence had not been made available.
This is why we need to consider the results of any new study in the context of the findings from previous studies, and why we need to keep this evidence base under constant review.
Although many decisions in healthcare are informed by high-quality evidence, too many are not, because such evidence is lacking.
Fair tests, or randomised trials, provide an effective means to help ensure that decisions are based on the best available evidence. They help patients and healthcare practitioners make well-informed choices. Why children put Lego or other foreign objects up their noses is a question for another day.

Prof Declan Devane is director of the Health Research Board – Trials Methodology Research Network (HRB-TMRN) and professor of midwifery at NUI Galway and the Saolta University Health Care Group.
Prof Mike Clarke is director of the All-Ireland Hub for Trials Methodology Research and chair of research methodology at Queen’s University Belfast.