Trinity develops Irish-language speech synthesis system
Most of us use computer terminals, tablets and smartphones, absorbing information quickly and easily. How do the many thousands of Irish people who are blind or visually impaired manage to interact with computers? For them, entering data by keyboard or voice is easy, but special software is needed to convert the text on screen into a form for output to a loudspeaker or headphones, or to drive a refreshable braille display.
Computer processing of language enables us to speak to a machine and to carry out simple tasks such as web searches by voice. Satnav systems issue voice instructions based upon speech synthesis. Text-to-speech systems have reached a high level of development, and input by voice recognition is now practicable. On the other hand, despite great advances, machine translation is still primitive, and we have some way to go before a computer can match the quality of a human translator.
Screen-reading systems that convert output from text to speech have been available for some years. For example, NonVisual Desktop Access (NVDA) is a free, open-source screen reader that runs under the Windows operating system. Until recently there was no means of converting Irish text into speech. The Phonetics and Speech Laboratory at Trinity College Dublin has now developed an Irish-language speech synthesis system. You can try it out at abair.tcd.ie (it gives output in any of three regional dialects).
Computers analyse and model human speech using language-processing systems that can recognise words and sentences and synthesise voices. These systems involve informatics, linguistics, cognitive science, acoustics, engineering and, of course, mathematics. The key mathematical concept underlying computer language processing is Fourier analysis.
Jean-Baptiste Joseph Fourier was a remarkable French mathematician and physicist. He was scientific adviser to Napoleon Bonaparte, accompanying him on his Egyptian expedition in 1798. In 1822 Fourier published a book, The Analytical Theory of Heat, introducing his analysis method based on wave functions and what are now called Fourier series.
A segment of speech is essentially a pattern of pressure variations that travel from speaker to listener through the air. The pressure signal can be represented by a mathematical function that oscillates between positive and negative values many times every second. Fourier realised that signals like this could be broken down, or analysed, into a collection of simple components called sine waves.
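The idea can be illustrated in a few lines of code. The sketch below (a hypothetical example, not part of the Trinity system) builds a signal from two sine waves and then uses the discrete Fourier transform to recover the frequencies of its components:

```python
import numpy as np

# Sample one second of a signal made of two sine waves (440 Hz and 880 Hz),
# standing in for a pressure signal that oscillates many times per second.
rate = 8000                       # samples per second
t = np.arange(rate) / rate        # time points over one second
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Fourier analysis: the discrete Fourier transform breaks the signal
# into its simple sine-wave components.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / rate)

# The two strongest components match the frequencies we mixed in.
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(peaks))              # [440.0, 880.0]
```

Here the analysis is exact because the signal really is a sum of two sine waves; real speech contains many components whose strengths vary over time, but the principle is the same.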
This idea can also be turned upside-down: Fourier showed that any desired signal can be synthesised by combining a collection of sine waves. Thus, a segment of speech can be treated as a combination of simple waves. These waves are easy to generate and to add together to produce the desired signal.
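Running the process in reverse can also be sketched in code. The example below (again a hypothetical illustration) synthesises an approximation to a square wave, a signal with sharp jumps, purely by adding up sine waves with the right amplitudes, which is Fourier's synthesis idea in miniature:

```python
import numpy as np

# Fourier synthesis: build a square wave by adding sine waves.
# The square wave's Fourier series uses the odd harmonics k = 1, 3, 5, ...
# with amplitudes proportional to 1/k.
t = np.linspace(0, 1, 1000, endpoint=False)
f = 3                                    # fundamental frequency in Hz

approx = np.zeros_like(t)
for k in range(1, 40, 2):                # odd harmonics only
    approx += (4 / np.pi) * np.sin(2 * np.pi * k * f * t) / k

# With enough terms, the sum hugs the ideal square wave,
# apart from small ripples near the jumps (the Gibbs phenomenon).
target = np.sign(np.sin(2 * np.pi * f * t))
mean_error = np.mean(np.abs(approx - target))
print(mean_error)                        # small: the sum is close to the target
```

Adding more harmonics shrinks the average error further, which is the sense in which "any desired signal" can be built from sine waves.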
Fourier’s original ideas led to an explosion of activity. The modern field of digital signal processing has emerged from his work. An impressive range of mathematical techniques – such as digital filters, Laplace transforms and cepstral analysis – are used to analyse and model signals in computer-language processing systems.
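To give a flavour of one of these techniques, here is a minimal sketch of a digital filter (an invented example, not drawn from any particular system): a short moving average acts as a simple low-pass filter, attenuating rapid noise while leaving a slow oscillation largely intact.

```python
import numpy as np

# A slow 5 Hz sine wave buried in random noise.
rng = np.random.default_rng(0)
t = np.arange(2000) / 2000
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

# A moving-average FIR filter: each output sample is the mean
# of 25 neighbouring input samples.
taps = np.ones(25) / 25
smoothed = np.convolve(noisy, taps, mode="same")

# The filtered signal is closer to the clean one than the noisy input was.
print(np.mean((smoothed - clean) ** 2) < np.mean((noisy - clean) ** 2))
```

Practical speech systems use far more sophisticated filters, but they rest on the same principle of reshaping a signal's frequency content.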
Early computer speech was robotic and mechanical. Current systems produce more natural sound, but there is still room for improvement. The goal is to produce synthetic speech that is indistinguishable from a human voice. The availability of a screen-reading system for the Irish language should be of great benefit, not only in everyday use but also as an aid to education in the language for those unable to read from a terminal or phone display.