Automatic face recognition: how it works

That’s Maths: We are photographed numerous times every day and have little idea who is collecting the information


As you pass through an airport, you are photographed several times by security systems. Face recognition systems can identify you by comparing your digital image to faces stored in a database. This form of identification is gaining popularity, allowing you to access online banking without a PIN or password. Face recognition systems analyse many thousands of reference points on the face, making the chance of mistaken identity remote, although hackers must be busy developing tricks to fool these systems.

One of the earliest face recognition methods, developed decades ago, used a technique called principal component analysis to describe a face in terms of characteristic images called eigenfaces. This is analogous to breaking down a musical sequence into elementary harmonics or sine waves.
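The eigenface idea can be sketched in a few lines. This is a toy illustration, not a production system: the "faces" below are random stand-ins for real grayscale images, and the sizes (20 images, 64 pixels, 5 components) are arbitrary.

```python
import numpy as np

# Toy eigenface decomposition via principal component analysis (PCA).
rng = np.random.default_rng(0)
faces = rng.random((20, 64))          # 20 "face images", 64 pixels each

mean_face = faces.mean(axis=0)        # the "average face"
centred = faces - mean_face

# Eigenfaces are the principal components of the centred data:
# the right singular vectors of the data matrix.
_, _, components = np.linalg.svd(centred, full_matrices=False)
eigenfaces = components[:5]           # keep the 5 strongest components

# Each face is then described by a few weights on these eigenfaces,
# much as a sound is described by the strengths of its harmonics.
weights = centred @ eigenfaces.T      # shape (20, 5)
reconstruction = mean_face + weights @ eigenfaces
```

Keeping only the strongest components compresses each face into a handful of numbers, which is what made the method practical on the limited hardware of the time.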

It worked well in ideal circumstances, but struggled with variations in lighting, expression, hairstyle and glasses. Recent developments in computer power and “big data” have brought great advances. Reported accuracy levels of current systems vary widely, but they are improving steadily.

Face recognition is accomplished in four stages: detecting a face in an image; transforming it to a standard viewing angle; representing it by a string of numbers relating to features such as distance between the eyes; and comparing this string to a database of known people to identify it and produce the name of the person depicted.

You have probably noticed the little rectangles in the camera of your smartphone; this is face detection in action. The camera automatically identifies faces and ensures that they are in focus. One way of detecting faces is to examine gradients, that is, changes in brightness from pixel to pixel. The gradients of a human face have characteristic patterns that capture the basic features in a simple manner.
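The gradient idea can be shown with a tiny numerical example. This is a minimal sketch of the raw material used by gradient-based detectors (such as histogram-of-oriented-gradients methods); the 6-by-6 "image" is an invented brightness array with a single vertical edge, not a real camera frame.

```python
import numpy as np

# A made-up 6x6 image: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
], dtype=float)

# Change in brightness from pixel to pixel, in each direction.
gy, gx = np.gradient(image)

# Large magnitudes mark edges; the pattern of directions across the
# image is what a face detector looks for.
magnitude = np.hypot(gx, gy)
direction = np.arctan2(gy, gx)
```

The magnitude is largest along the dark-to-bright boundary and zero in the flat regions, so edges such as the outline of a head or the rim of glasses stand out clearly.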

Transforming images

Next, as faces look different when viewed from different directions and in different light, the image must be transformed. The idea is to ensure that specific points, called landmarks – the point of the chin, the outside edge of each eye, nose tip, etc – are in the same places in each image. This is done by an “affine transformation” involving translation, rotation and scaling.
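An affine transformation of landmark points can be written out directly. This is a minimal sketch assuming three invented landmark coordinates and a made-up 30-degree head tilt; a real system would estimate the transformation from detected landmarks.

```python
import numpy as np

# Three facial landmarks (x, y), invented for the demo.
landmarks = np.array([[1.0, 1.0],    # outer corner of left eye
                      [3.0, 1.0],    # outer corner of right eye
                      [2.0, 3.0]])   # point of the chin

angle = np.deg2rad(-30)              # undo a 30-degree head tilt
scale = 2.0                          # enlarge to the standard size
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
translation = np.array([10.0, 5.0])  # shift into the standard frame

# Affine transformation: scaled rotation followed by a translation.
aligned = scale * landmarks @ rotation.T + translation
```

After this step every face in the collection has its eyes, nose tip and chin in (approximately) the same places, so later measurements compare like with like.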

Next, a unique representation of the face is constructed. We need to recognise faces in milliseconds so a small set of basic measurements is extracted. For example, we might measure the spacing between the eyes or the length of the nose. The face is then represented by a string of numbers, perhaps 100 in all. This is actually done by a machine-learning process called a deep convolutional neural network. The set of measurements is called an embedding. The aim is that different pictures of the same person should give approximately the same embedding.
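The key property of an embedding can be demonstrated with a few made-up numbers. In a real system a deep convolutional network produces these vectors; here three short invented embeddings (and the names attached to them) stand in for network output.

```python
import numpy as np

anna_photo_1 = np.array([0.11, 0.52, 0.33, 0.90])
anna_photo_2 = np.array([0.12, 0.50, 0.35, 0.88])   # same person, new photo
brian_photo  = np.array([0.80, 0.10, 0.60, 0.20])   # a different person

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return float(np.linalg.norm(a - b))

# Two photos of the same face sit close together in embedding space,
# while photos of different faces sit far apart.
same = distance(anna_photo_1, anna_photo_2)
different = distance(anna_photo_1, brian_photo)
```

The network is trained precisely to achieve this separation: embeddings of the same person cluster tightly, those of different people are pushed apart.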

The last step is to find the closest fit between the database of known people and the test image. This can be done using a basic classification algorithm: all that is needed is a program that takes in the measurements from a new test image and finds which known person is the best match. This can be done in milliseconds to produce the name of the person.
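The matching step above can be sketched as a nearest-neighbour search. The names, the four-number embeddings and the distance threshold below are all invented for illustration; real systems use much longer embeddings and carefully tuned thresholds.

```python
import numpy as np

# A tiny "database" of known people and their embeddings.
database = {
    "Ada Lovelace": np.array([0.1, 0.9, 0.4, 0.2]),
    "Alan Turing":  np.array([0.7, 0.2, 0.8, 0.5]),
    "Grace Hopper": np.array([0.3, 0.4, 0.1, 0.9]),
}

def identify(test_embedding, database, threshold=0.6):
    """Return the closest known person, or None if nobody is close enough."""
    name, best = min(database.items(),
                     key=lambda item: np.linalg.norm(item[1] - test_embedding))
    if np.linalg.norm(best - test_embedding) > threshold:
        return None                  # an unknown face
    return name

# A slightly noisy new photo of a known person still matches.
print(identify(np.array([0.12, 0.88, 0.42, 0.19]), database))  # Ada Lovelace
```

The threshold matters: without it, every stranger would be "identified" as whichever known person happens to be least unlike them.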

With increasing surveillance levels, legitimate concerns about personal privacy arise. We are photographed numerous times every day and have little idea who is collecting the information. If we are tempted to misbehave, we should recall the words of journalist and satirist Henry Louis Mencken: “Conscience is the inner voice that warns us somebody may be looking.”

Peter Lynch is emeritus professor at UCD School of Mathematics & Statistics – He blogs at thatsmaths.com