AI-powered Facial Analysis is Pseudoscience: A Reflection on Physiognomy

I have recently joined the many people who have wondered what can be read from the face of another person, a practice known as physiognomy. Aristotle considered it possible to infer character from features, at least for passions and desires. Opinions have oscillated over time, with physiognomy enjoying a certain respectability in the 18th century before descending to the realm of pseudoscience in the 19th.

The advent of artificial neural networks in recent decades has revived the question of whether there might be a kernel of truth in the assertion that aspects of a person’s personality and character could be inferred from their appearance (see, e.g., the 2009 New Scientist article by Richard Wiseman, Roger Highfield, and Rob Jenkins, “How Your Looks Betray Your Personality”).

Interest in AI-powered facial analysis is evident from the growing number of commercial products touting myriad applications. The most recent example I’ve come across is the subject of the lead piece in The Download from MIT Technology Review, published March 5, 2021 (anyone can subscribe at https://forms.technologyreview.com/newsletters/briefing-the-download/): “I asked an algorithm to tell me my ‘beauty score.’” Another example is an experiment conducted by Munich-based Bayerischer Rundfunk on automatic candidate evaluation from video interviews, which will be discussed in detail below.

In the 19th century, physiognomy became associated with phrenology, the claim that psychological attributes could be inferred from the measurement of physical features of the skull. Phrenology has been thoroughly debunked: In “An Empirical 21st Century Evaluation of Phrenology,” O. Parker Jones, F. Alfaro-Almagro, and S. Jbabdi wrote, “The present study sought to test … the fundamental claim of phrenology … We found no evidence of this claim.”

I structure my reflection on AI-powered facial analysis in three parts. First, I propose a thought experiment that originally convinced me that physiognomy was at best a pseudoscience. Second, I report on a very thorough and carefully performed study from mid-2020 finding that a considerable amount of personal information can be inferred from an analysis of facial images using artificial neural networks. Third, I describe a similarly carefully conducted experiment published in February 2021 by Bayerischer Rundfunk (BR, or Bavarian Broadcasting), which casts some doubt on the stability of inferences drawn from AI-powered facial analysis.

A Thought Experiment

My thought experiment arises from having recently read the autobiography of Zahiruddin Muhammad Babur (1483–1530), The Baburnama (the 2002 edition translated by Wheeler M. Thackston). Babur created the Mughal Empire in Northern India in the early 16th century. Babur’s extraordinarily detailed autobiography gives an account of his life, in which he intended to capture, in as direct and honest a manner as possible, everything he experienced and did.

It emerges that he was bisexual, that he often participated in drunken orgies, and frequently consumed mild drugs. He had a deeply refined sense of aesthetics, and was a noted poet. He could be magnanimous and merciful. At the same time, Babur was a highly skilled military commander and consummate tactician. He often ordered his troops to massacre the enemy and heap their severed heads into great mounds. On occasion, having taken a town or district, he would have the males killed and the women and children taken as slaves.

As Salman Rushdie asks in his introduction to my edition of The Baburnama, was Babur a scholar or a barbarian, a nature-loving poet or a terror-inspiring warlord? The answer is he was all of these. The thought that Babur’s complex and apparently contradictory character could be determined from an examination of his facial features seemed to me absurd. This thought experiment is not meant as a counter-example to physiognomy, but as a stimulus to reflection. We possess no contemporary image of Babur’s face that could inform the argument. Images commissioned by his successors long after his death show a rather effete-looking aesthete.

My thought experiment is drawn from the distant past, but we are regularly confronted with the same difficulties in modern life. Who can read the impulses leading to violent acts from photographs in the news or in images posted on social media?

AI and the New Physiognomy

On reviewing the Wikipedia entry on physiognomy, I found a reference to carefully conducted and thoroughly analyzed studies by Yegor Tkachenko and Kamel Jedidi: “What Personal Information Can a Consumer Facial Image Reveal? Implications for Marketing ROI and Consumer Privacy.” Their research suggests there may be a kernel of truth in physiognomy’s revival as AI-powered facial analysis.

My thought experiment had convinced me that such an assertion would seem, on the face of it, impossible. Yet, the work of Tkachenko and Jedidi suggests that the assertion may yield to the application of artificial neural networks to the analysis of facial images.

Tkachenko and Jedidi sought to “understand the maximum achievable predictive ability of a consumer facial image … as a basis for prediction of consumers’ characteristics.” Without going into detail, and omitting their extensive statistical analyses and discussion of results, this remarkable paper reports two studies. The first employed data, volunteered by 2,646 individuals, consisting of paired facial images and responses to a detailed questionnaire. The second used supermarket video surveillance and matched receipt data.

I shall mention only the first study, as my concern is less marketing targeted at a particular population than physiognomy. The authors randomly divided the data into five buckets, trained a network on four of them, and predicted the results for the fifth. This procedure was repeated 20 times. Using a standard statistical measure, the strength of identified signals varied from about 0.6 to about 0.9 on a scale of 0.5 (pure chance) to 1 (certainty). The strongest signals were race (black or white) and gender (male or female). No characteristic was identified with certainty. Most of the remaining traits, which were surprisingly numerous, were identified with a relatively weak strength of about 0.6.
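To make the evaluation procedure concrete, here is a minimal sketch in Python. It rests on several assumptions of mine rather than on the paper itself: the "standard statistical measure" is taken to be the area under the ROC curve (AUC), which runs from 0.5 for pure chance to 1 for certainty; the data are synthetic; and a trivial one-feature rule stands in for the neural network. All function names and parameters are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def five_fold_auc(features, labels, rng, n_folds=5):
    """One round: shuffle, split into buckets, train on four, score the held-out fifth."""
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    fold_aucs = []
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # "Training" (a stand-in for the network): learn from the training
        # buckets only whether the feature tends to be higher for class 1.
        ones = [features[i] for i in train if labels[i] == 1]
        zeros = [features[i] for i in train if labels[i] == 0]
        direction = 1.0 if sum(ones) / len(ones) >= sum(zeros) / len(zeros) else -1.0
        scores = [direction * features[i] for i in test]
        fold_aucs.append(auc([labels[i] for i in test], scores))
    return sum(fold_aucs) / n_folds

def repeated_cv_auc(signal, n=1000, repeats=20, seed=0):
    """Average AUC over `repeats` random five-bucket splits of synthetic data."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]  # balanced classes, an assumption
    # `signal` shifts the feature for class 1; larger means easier to detect.
    features = [rng.gauss(0.0, 1.0) + signal * y for y in labels]
    rounds = [five_fold_auc(features, labels, rng) for _ in range(repeats)]
    return sum(rounds) / repeats

if __name__ == "__main__":
    print("weak signal:  ", round(repeated_cv_auc(signal=0.36), 2))
    print("strong signal:", round(repeated_cv_auc(signal=1.8), 2))
```

The two signal strengths are chosen so that the weak case lands near 0.6 and the strong case near 0.9, mirroring the range reported above. An AUC of 0.6 means that, given one person with a trait and one without, the model ranks them correctly about six times in ten.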

The authors frankly admit the relative weakness of most of the signals, but argue that such analysis adds another technique to the toolbox of targeted advertising.

An Experiment Casting Doubt on the New Physiognomy

I then discovered the results of an experiment on the use of AI-powered facial analysis in automated evaluation of job applicants via video interviews, performed by staff of the Bayerischer Rundfunk (BR) and published on February 16, 2021.

The product used in the BR experiment employed artificial neural networks to infer scores for openness, conscientiousness, extraversion, agreeableness, and neuroticism. The BR experiment showed that the scores for these attributes could be significantly altered by changing personal appearance or choosing a different background for the video. Indeed, the contribution of the background would seem to be about as important as that of the face. It follows from the experiment that video candidate evaluations can be readily gamed by the candidate by, for example, wearing spectacles or posing in front of a bookcase.

This experiment raised the question in my mind of whether the weaker signals identified in the work of Tkachenko and Jedidi might be washed out by slight modifications of personal appearance or image background. Following private discussions with the lead author, I understand that their model was trained to be robust against image manipulations such as rotations, crops, resizing, or mild color jitter, but would be susceptible to more significant image modifications.

Two further references that may be construed as casting further doubt on a successful resurrection of physiognomy as a science are the following:

So, is physiognomy a science? Can someone’s character be judged from facial characteristics? In order for such judgment to qualify as science, the analysis would have to be repeatedly tested and verified in accordance with accepted protocols of observation, measurement, and evaluation. The results would have to be independently replicated by others. Unless these criteria can be met, AI-powered facial analysis remains a pseudoscience.