The irregular distribution of the first digits of numbers in databases provides a valuable tool for fraud detection. A remarkable rule that applies to many datasets was accidentally discovered by an American physicist, Frank Benford, who described his discovery in a 1938 paper, The Law of Anomalous Numbers.
With nine possible choices, we might expect each leading digit to occur on average one time in nine. In fact, the digits are heavily skewed toward smaller values: the first digit is more likely to be 1 than 2, more likely to be 2 than 3, and so on.
The digit 9 occurs as the first digit less frequently than any other. Benford’s law has also led to surprising applications in forensic accounting and elsewhere.
The first-digit law
Imagine a large list of numbers, such as might be found in census returns or stock market reports. Suppose a number is picked at random from the list. We might expect that the first digit, which may be anything from 1 to 9, is equally likely to be any of these values. However, it is frequently found that the number 1 appears as the first digit about 30 per cent of the time, whereas 9 occurs less than 5 per cent of the time.
Benford tested his idea on numerous datasets: lengths of rivers, population sizes, street addresses, death rates and so on. He found the same curious pattern in all these cases. The law applies most accurately to data that span several orders of magnitude.
For datasets where all the entries are of similar size, the law does not hold. For example, the heights of most adults are in the range from 150 to 190cm, all beginning with 1. Changing units to inches, the range runs from about 59 to 75, so 6 and 7 dominate as leading digits. However, many real-world distributions have wider ranges and satisfy Benford’s law with remarkable accuracy.
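The contrast between narrow and wide ranges can be checked with a short Python sketch. The heights are simply the whole numbers from 150 to 190, as in the example above. The wide-ranging list, the first 1,000 powers of 2, is an illustrative choice made here and is not one of Benford's datasets. The function names are likewise hypothetical.

```python
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_shares(values):
    """Fraction of the values that start with each digit from 1 to 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Narrow range: heights in whole centimetres, every one starting with 1.
heights_cm = range(150, 191)

# Wide range (assumed for illustration): powers of 2 spanning about 300
# orders of magnitude.
powers_of_two = [2 ** k for k in range(1000)]

narrow = leading_digit_shares(heights_cm)
wide = leading_digit_shares(powers_of_two)

for d in range(1, 10):
    print(f"{d}: heights {narrow[d]:.1%}, powers of 2 {wide[d]:.1%}")
```

The heights put all their weight on the digit 1, while the powers of 2 should give 1 as the leading digit about 30 per cent of the time, with the shares falling away toward 9.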
Benford’s law can reveal highly unlikely frequencies of numbers in a dataset. It has been used to detect fraud in elections and tampering with digital images. Swindlers who fabricate figures tend to distribute their digits uniformly. Others, who choose amounts just below checking thresholds, leave tell-tale signals that show up as anomalies using the law.
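One common way to make such a check concrete is to compare the observed first-digit counts with the counts expected under the logarithmic form of the law, log10(1 + 1/d), using a Pearson chi-square statistic. This is a minimal sketch under stated assumptions, not the method of any particular auditor: the function name, both lists of amounts and the choice of test are hypothetical, and the threshold is the standard 5 per cent critical value for 8 degrees of freedom.

```python
import math
from collections import Counter


def benford_chi_square(amounts):
    """Pearson chi-square statistic of first-digit counts against Benford's law.

    Assumes every amount is a positive integer.
    """
    counts = Counter(int(str(a)[0]) for a in amounts)
    n = sum(counts.values())
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts.get(d, 0) - expected) ** 2 / expected
    return statistic


# Hypothetical data, for illustration only.
# "Fabricated" amounts whose first digits are spread evenly over 1 to 9.
fabricated = [d * 100 + k for d in range(1, 10) for k in range(100)]
# Amounts growing by 5 per cent per step, spanning many orders of magnitude.
geometric = [int(100 * 1.05 ** k) for k in range(900)]

CRITICAL_5_PERCENT = 15.507  # chi-square critical value, 8 degrees of freedom

for name, data in [("fabricated", fabricated), ("geometric", geometric)]:
    stat = benford_chi_square(data)
    flag = "anomalous" if stat > CRITICAL_5_PERCENT else "consistent with the law"
    print(f"{name}: chi-square = {stat:.1f} ({flag})")
```

The evenly spread digits should give a statistic far above the threshold, while the geometric amounts should fall well below it. A high value is only a signal that the figures deserve a closer look, not proof of fraud.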
Benford was not the first person to notice the peculiar distribution of leading digits. In 1881, the American astronomer Simon Newcomb, thumbing through a table of logarithms, noticed that the earlier pages, where the numbers start with 1, were more heavily worn than the later pages. Newcomb proposed that the probability for leading digits followed a logarithmic law. However, his finding was forgotten until Benford re-discovered it some 60 years later.
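The logarithmic law is usually written as P(d) = log10(1 + 1/d), the probability that the leading digit is d. The short Python sketch below tabulates it for the nine digits. The function name is chosen here for illustration.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading digit is d under the logarithmic law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"{d}: {p:.1%}")
```

The values come to about 30.1 per cent for the digit 1 and 4.6 per cent for the digit 9, and the nine probabilities sum to 1 because the logarithms telescope to log10(10).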
Numerous websites, papers and textbooks present simple proofs of Benford’s law, but fallacies in their arguments are common. The eminent logician and mathematician CS Peirce once observed that “in no other branch of mathematics is it so easy for experts to blunder as in probability theory”.
Although many aspects of Benford’s law have solid theoretical foundations, there is still no unified approach that accounts for its ubiquity. Most experts agree some mystery remains about the widespread occurrence of the law in real-life circumstances.
Peter Lynch is emeritus professor at UCD School of Mathematics & Statistics – he blogs at thatsmaths.com