Data mining uncovers the six basic plots of all stories

Scientists have used a “big data lens” to narrow down every story ever told to a list of just six plot types

If only there had been data mining around in Aristotle’s time. The Greek philosopher was one of the first to try to define how many different stories there are in the world. He believed that every story ever told could be divided into one of a number of types because of commonalities in their emotional arcs and plotlines.

Estimates of how many story types there are have ranged from three to 30, but seven is the figure most often cited.

Christopher Booker's The Seven Basic Plots: Why We Tell Stories listed these as: overcoming the monster (The Hunger Games); rags to riches (Aladdin); the quest (Lord of the Rings); voyage and return (The Time Machine); comedy (A Midsummer Night's Dream); tragedy (Romeo and Juliet); and rebirth (A Christmas Carol).

But Andrew Reagan and the chaps at the University of Vermont’s Computational Story Lab have news for Aristotle, Shakespeare, Booker and everyone else. They’ve crunched the numbers and have concluded that all the stories in the world can be categorised into one of six types.

Or, in Story Lab-speak: “We find a set of six core trajectories which form the building blocks of complex narratives.”

From Adam and Eve to Cinderella
It's not a new idea to use AI to get to the bottom of this – it is something author Kurt Vonnegut mused on many years ago. He maintained in his autobiography and various lectures that computers might be able to compare and contrast stories. He used a simple graph to plot the emotional arc of some stories and showed how Cinderella was a lot like the tale of Adam and Eve in the Garden of Eden.

Developments in technology mean that Vonnegut’s theory can now be tested in full. As the university researchers note, “advances in computing power, natural language processing and digitisation of text now make it possible to study . . . a culture’s evolution through its texts using a ‘big data’ lens”.

Their process involved mapping the emotional arcs of 1,737 stories from Project Gutenberg’s fiction collection using sentiment analysis, in which natural language processing is used to identify whether the author’s words have a positive, negative or neutral emotional impact.

By doing this, the researchers established how and when the emotional tone of a story changes within the plot and used data mining to find commonalities between the various stories.
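To give a rough feel for that kind of measurement, here is a minimal Python sketch that slides a window over a text and averages word scores to trace an emotional arc. It is an illustration only: the word list, its scores and the emotional_arc function are hypothetical and are not the lexicon or code the Vermont researchers used.

```python
# Toy sentiment lexicon: the words and scores are hypothetical,
# chosen for illustration and not taken from the study.
TOY_LEXICON = {
    "happy": 1.0, "love": 1.0, "joy": 1.0, "hope": 0.5,
    "sad": -1.0, "death": -1.0, "fear": -1.0, "lost": -0.5,
}


def emotional_arc(text, window=4, step=2):
    """Return one average sentiment score per sliding window of words.

    Words missing from the lexicon are ignored; a window with no
    scored words counts as neutral (0.0).
    """
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    if not words:
        return []
    arc = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = words[start:start + window]
        scores = [TOY_LEXICON[w] for w in chunk if w in TOY_LEXICON]
        arc.append(sum(scores) / len(scores) if scores else 0.0)
    return arc


story = "Sad fear, lost death. Hope joy love happy!"
print(emotional_arc(story, window=4, step=4))
```

A real analysis would need a far larger scored vocabulary and windows of thousands of words, but the principle is the same: the list of scores, read from start to finish, is the story’s emotional arc.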

When the numbers were crunched and comparisons made, the outcome was that six basic emotional arcs or plots form the basis for most stories. The six definitions they outlined were: rags to riches (where there’s a rise in the emotional trajectory of the main character); riches to rags (fall); man in a hole (fall then rise); Icarus (rise then fall); Cinderella (rise, fall and rise); and Oedipus (fall, rise and fall).
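One simple way to picture how an arc could be matched to one of those six shapes is to compare it with an idealised curve for each and pick the closest fit. The Python sketch below does this with cosine-shaped templates and a correlation score. This is a stand-in technique for illustration, not the data mining method the researchers used, and the template curves and the classify_arc function are assumptions.

```python
import math

# Idealised curves over story time t in [0, 1]. These cosine shapes
# are an assumption made for illustration, not taken from the study.
TEMPLATES = {
    "rags to riches": lambda t: -math.cos(math.pi * t),      # rise
    "riches to rags": lambda t: math.cos(math.pi * t),       # fall
    "man in a hole": lambda t: math.cos(2 * math.pi * t),    # fall, rise
    "Icarus": lambda t: -math.cos(2 * math.pi * t),          # rise, fall
    "Cinderella": lambda t: -math.cos(3 * math.pi * t),      # rise, fall, rise
    "Oedipus": lambda t: math.cos(3 * math.pi * t),          # fall, rise, fall
}


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def classify_arc(arc):
    """Return the name of the template that correlates best with the arc."""
    if len(arc) < 2:
        raise ValueError("an arc needs at least two points")
    times = [i / (len(arc) - 1) for i in range(len(arc))]

    def score(name):
        return correlation(arc, [TEMPLATES[name](t) for t in times])

    return max(TEMPLATES, key=score)


print(classify_arc([0.8, 0.1, -0.7, -0.2, 0.6]))  # a dip, then a recovery
```

Correlation ignores how high or low the scores are overall and looks only at the shape, which is why a gloomy story and a cheerful one can still share an arc.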

Oedipus rules
Armed with their results, they then set out to find which of these story types were the most popular. It turns out that it's not tales following the rags to riches arc that get the most readers, but rather those that fall under the Cinderella, Oedipus and man-in-a-hole headings.

You could imagine a lot of uses for this research. Reagan has suggested that one possibility might be to train machines to generate original works.

But a robot winning the Booker is still some way off. “There are a lot of hard problems yet to be solved,” Reagan says. “In addition to the plot, structure, and emotional arc, to write great stories, a computer will need to create characters and dialogue that are compelling and meaningful.”
