That’s Maths: Andrey Markov’s brilliant ideas are still bearing fruit

Imagine examining the first 20,000 letters of a book, counting frequencies and studying patterns. This is precisely what Andrey Markov did when he analysed the text of Alexander Pushkin's verse novel Eugene Onegin. This work comprises almost 400 stanzas of iambic tetrameter and is a classic of Russian literature. Markov studied the way vowels and consonants alternate and deduced the probabilities of a vowel being followed by a another vowel, by a consonant, and so on. He was applying a statistical model that he had developed in 1906 and that we now call a Markov Process or Markov Chain.

Markov processes appear in physics and chemistry when probabilities are used for unknown quantities. In information processing, they have a role in pattern recognition, automatic speech analysis and synthesis and data compression. They are used by meteorologists, ecologists and biologists. Other applications include the control of driverless cars, machine translation, queuing patterns, and prediction of population growth, asset prices, currency exchange rates and market upheavals.

Memorylessness

Many systems have multiple states, with transitions between them occurring intermittently. The systems hop from one state to another with a probability that depends on both the current and previous states. For example, a good summer may be followed by another good one or by a bad one. The likelihood of sequences of good and bad summers can be calculated from climate records.

For a Markov process, only the current state determines the next state; the history of the system has no impact. For that reason we describe a Markov process as memoryless. What happens next is determined completely by the current state and the transition probabilities. In a Markov process we can predict future changes once we know the current state.

The Drunkard’s Walk

As an example, take a random walk along a line. This is often described as a “drunkard’s walk”: one step forward may be followed by another, or by a step backward, each equally likely. There is no telling with certainty where the subject may end up, but his next position may only be one step ahead or one behind his current position. Although the final destination of the drunkard remains unknown, a statistical pattern of possibilities may be derived using Markov’s theory. Surprisingly, if the process continues indefinitely the subject is certain to return sooner or later to his starting point.

A Markov chain can be depicted in a geometric or algebraic manner. The geometric picture is that of a graph, a collections of points (nodes) connected by lines (edges). Each node is a state and each edge has a value giving the transition probability for the two nodes that it links. This information can also be encoded algebraically as a matrix or array of numbers giving the transition probabilities. The matrix is very useful for analysing the properties of the system.

Although more than a century has passed since Markov introduced his ideas, they are still having profound impacts and widespread applications. The Markov chain Monte Carlo method, or MCMC for short, has revolutionized Bayesian statistical analysis, enabling statisticians to estimate probabilities for systems with thousands of unknown parameters. This has revolutionised probability theory and the MCMC technique has been described as “arguably the most powerful mechanism ever created for processing data and knowledge”.

Peter Lynch is emeritus professor at the school of mathematics & statistics, University College Dublin. He blogs at thatsmaths.com