How Dr. Joy Buolamwini’s PhD Thesis Redefines What It Means to Audit an Algorithm, and What Dr. Timnit Gebru’s Three Sentences Changed
A LinkedIn comment from Dr. Timnit Gebru, three sentences long, did something that a structured multi-AI review across months of production could not do: it pointed to a gap.
The comment appeared on a post about The Minds That Bend the Machine, the forthcoming book profiling 24 thought leaders in AI governance. Dr. Gebru directed readers to Dr. Joy Buolamwini’s 2022 MIT doctoral thesis, Facing the Coded Gaze with Evocative Audits and Algorithmic Audits, and noted that the thesis coined a term the book had not yet fully addressed: the evocative audit.
That intervention carries weight for a specific reason. Dr. Gebru watched this work develop from the inside. The thesis acknowledgments name her directly as one of Buolamwini’s “sister Face Queens” whose intellectual companionship made the work possible. When someone with that proximity to the research says a concept has been undertreated, the observation is not casual, and it arrived before the book went to press.
What follows is an examination of what the evocative audit actually is, why it matters for anyone building governance infrastructure around AI systems, and what this exchange reveals about where even rigorous review processes fall short.
The Thesis and Its Central Argument
In 2022, Dr. Buolamwini completed her PhD at MIT, and her thesis asked a simple question with a complicated answer: when AI systems cause harm, how do you make people care enough to do something about it?
The usual approach is to test the system, measure where it fails, and publish the results. Researchers have been doing this for years, producing reports full of numbers showing that certain AI systems work well for some people and badly for others. Those numbers get cited in policy documents, referenced in procurement decisions, and used as evidence in congressional hearings, but Buolamwini noticed that the numbers moved institutions without moving people.
What moved people was seeing the harm with their own eyes, like an AI system labeling a photo of Michelle Obama with the word “toupee,” or describing Ida B. Wells, one of the most important civil rights figures in American history, as “a young boy wearing a hat,” or calling Serena Williams a man. Those moments land differently than a chart showing a thirty percent error rate because they carry the weight of lived history and personal recognition in a way that a performance table never can.
Buolamwini’s thesis argues that both kinds of evidence are real, both are necessary, and treating the numbers as science while treating the human experience as just art or storytelling is a mistake that has kept a necessary form of accountability evidence locked outside the door of serious research. She gave that second kind of evidence a name: the evocative audit.
What the Evocative Audit Actually Is
The evocative audit is not a metaphor or a loose description. Buolamwini built it as a formal concept with specific parts, specific types, and a theoretical foundation that explains why it works differently from a standard technical audit.
At its core, an evocative audit evaluates a system by combining human experience with documented evidence to show harmful behavior like discrimination or denial in a way that people can feel, not just measure. The system being evaluated does not have to be a piece of software; it can be a bureaucracy, a school system, or any institution that makes decisions affecting people’s lives.
The key mechanism inside every evocative audit is what Buolamwini calls the counter-demo. A counter-demo is a piece of documented evidence that shows, in real time, that a system fails in ways its creators either did not notice or did not prioritize.
When Buolamwini showed IBM’s commercial AI system labeling a photo of Michelle Obama with the word “toupee” (a hairpiece), or Microsoft’s system describing Ida B. Wells as “a young boy wearing a hat,” those were counter-demos. They were not opinions or anecdotes; they were the system’s own outputs used against its own claims of accuracy and fairness. The counter-demo works because it lets people bear witness to the failure through evidence the system itself produced.
Buolamwini identifies four types.
A comparative evocative audit puts different groups through the same test under the same conditions, like the 2009 HP laptop video that showed face tracking working on a lighter-skinned person and failing entirely on a darker-skinned person.
A participatory evocative audit lets people test the system themselves, like ImageNet Roulette, where anyone could upload a photo and see what offensive label a neural network applied to their face.
A performative evocative audit uses art, poetry, dance, or drama to show algorithmic harm, like “AI, Ain’t I A Woman?”, where Buolamwini performed spoken word poetry alongside live demonstrations of facial recognition failing on iconic Black women.
A singular evocative audit uses one powerful example, like Buolamwini’s original White Mask Fail, where she put on a white mask to get a face detection system to finally see her after it could not detect her dark-skinned face.
The thesis also introduces the word “excoded” to describe people who are harmed by algorithmic systems, often without even knowing those systems exist. It works as both a verb (to be excoded) and a noun (the excoded), giving a name to a population that most governance frameworks have no language for.
Why Buolamwini Treats Experience as Evidence
Buolamwini does not just argue that human experience matters alongside data. She explains why, and the explanation runs deeper than most readers of her public work may realize.
She grounds the evocative audit in Black feminist epistemology, a school of thought developed by scholars like Patricia Hill Collins. Epistemology is the study of what counts as knowledge and who gets to decide. In traditional computer science research, the standard is objectivity: the researcher is supposed to be a detached observer, the results are supposed to be reproducible by anyone, and emotion or personal experience are treated as things that cloud the findings rather than strengthen them.
Black feminist epistemology works from a different starting point. It holds that the uniqueness of an individual’s experience has its own authority. Emotion is not a contaminant but an indication of conviction. Empathy is not a form of bias but an ethical requirement. Collins argued that Black women in academic institutions occupy an “outsider within” position. That position lets them see what others miss, because they are close enough to understand the system but far enough outside it to recognize where it fails.
This matters for governance because it explains the gap between two kinds of credibility. A technical audit that shows a thirty percent error rate on darker-skinned women is credible to engineers, regulators, and procurement officers who evaluate systems using performance metrics. A spoken word poem narrating what it feels like when a machine calls Serena Williams a man is credible to people who have been on the receiving end of systems that do not see them, and to anyone with enough empathy to imagine what that experience costs. Buolamwini’s argument is that both kinds of credibility are required because they reach different audiences, and no single audience holds all the power needed to change how AI systems are built and deployed.
The thesis traces this pattern back further than most people expect. Frederick Douglass became the most photographed person of the 19th century because he understood that photography, a technology assumed to be objective, could either dehumanize Black people or restore their dignity depending on who controlled the camera and what story the image told. Sojourner Truth sold photographic cards of herself posed in the style of middle-class white women, financing her activism while producing what Buolamwini would recognize as counter-demos against racist depictions. Both used the same technology that was being used against their communities to tell a different story. Buolamwini positions her own work in that tradition, using the same facial recognition systems that could not see her face to show the world what that failure means.
The Two Case Studies: Metrics and Human Cost Side by Side
The thesis presents two case studies that illustrate how algorithmic audits and evocative audits work as a pair.
Gender Shades, published in 2018 with Dr. Timnit Gebru, is the algorithmic audit. The study tested commercial facial recognition systems from IBM, Microsoft, and a Chinese company called Face++ using a dataset of 1,270 photos balanced by gender and skin tone. The results showed that the systems performed well on lighter-skinned men and failed on darker-skinned women at rates exceeding thirty percent. The methodology followed standard academic practice, the metrics were reproducible, and the findings were published in a peer-reviewed venue where other researchers could verify and challenge them.
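To make the “disaggregated” part of that methodology concrete, here is a minimal sketch of the idea in Python: score the system separately for each intersectional subgroup instead of producing one aggregate number. This is not Buolamwini and Gebru’s actual pipeline; the record fields, subgroup bins, and sample figures below are illustrative assumptions.

```python
# Minimal sketch of the disaggregated evaluation behind an algorithmic
# audit in the style of Gender Shades. Not the study's actual code or
# data; field names and sample numbers are illustrative assumptions.
from collections import defaultdict

def error_rates_by_subgroup(records):
    """Compute per-subgroup error rates for a gender classifier.

    Each record is a dict with hypothetical keys:
      'skin_type' - e.g. 'lighter' or 'darker' (a binned skin-type label)
      'gender'    - the ground-truth label
      'predicted' - the system's output label
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for r in records:
        group = (r["skin_type"], r["gender"])  # intersectional subgroup
        totals[group] += 1
        if r["predicted"] != r["gender"]:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# An aggregate accuracy score can hide a large disparity that only
# appears once results are split by subgroup.
sample = (
    [{"skin_type": "lighter", "gender": "male", "predicted": "male"}] * 99
    + [{"skin_type": "lighter", "gender": "male", "predicted": "female"}]
    + [{"skin_type": "darker", "gender": "female", "predicted": "female"}] * 66
    + [{"skin_type": "darker", "gender": "female", "predicted": "male"}] * 34
)
for group, rate in sorted(error_rates_by_subgroup(sample).items()):
    print(group, f"{rate:.1%}")  # 34.0% for ('darker','female'), 1.0% for ('lighter','male')
```

The methodological point of Gender Shades sits in exactly that split: the same balanced dataset, scored per subgroup rather than in aggregate, so the failure on darker-skinned women cannot disappear into an overall accuracy figure.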
The real-world impact was substantial. All three companies improved their systems after the paper came out. The National Institute of Standards and Technology (NIST) conducted a landmark study of 189 algorithms that confirmed demographic disparities across a broader set of facial recognition tasks. The American Civil Liberties Union ran its own test showing that Amazon’s facial recognition product falsely matched 28 members of Congress with mugshot images, with nearly 40 percent of the false matches being people of color despite people of color representing only 20 percent of Congress. Several cities banned government use of facial recognition, and Congress held hearings on the subject.
“AI, Ain’t I A Woman?” is the evocative audit. Buolamwini ran photos of iconic Black women, including Ida B. Wells, Shirley Chisholm, Michelle Obama, Oprah Winfrey, and Serena Williams, through commercial AI systems from Amazon, Microsoft, IBM, Google, and Face++. The systems misgendered them, applied offensive labels, and failed to recognize them. Buolamwini narrated these failures through spoken word poetry that connected the algorithmic errors to centuries of racialized gender discrimination, drawing on Sojourner Truth’s famous question from 1851: “Ain’t I a Woman?”
The counter-demos in the performance hit differently than the metrics in the paper. When IBM’s system labeled Ida B. Wells with “coonskin cap,” that label carried a weight that no error rate can convey. The word “coon” has been used as a dehumanizing slur against Black Americans since the 19th century, popularized through racist minstrel songs that treated Black people as subhuman entertainment. The AI system did not just produce a wrong answer; it produced an answer whose cultural meaning compounded the technical failure into something that feels like harm, because it echoes the same dehumanization that Wells, who documented lynching and fought for voting rights, spent her life fighting against.
The thesis documents that the combination of Gender Shades and “AI, Ain’t I A Woman?” reached over one million people worldwide. The metrics gave policy makers and procurement officers the institutional credibility they needed to act, while the evocative audit gave the general public, journalists, filmmakers, and museum curators an emotional and experiential reason to pay attention. Neither alone carried the full weight behind the corporate moratoria of 2020, when IBM, Amazon, and Microsoft each restricted or exited facial recognition technology sales. The evidence required both broad patterns and individual stories, both numbers and faces, both the algorithmic audit and the evocative audit.
The Concept Beyond the Thesis: Unmasking AI and Ongoing Public Advocacy
The evocative audit did not stay inside a doctoral thesis. Buolamwini has been actively carrying the concept into public conversation since 2022, most significantly through Unmasking AI: My Mission to Protect What Is Human in a World of Machines (Random House, 2023), her bestselling book. In a 2024 interview with Brené Brown, Buolamwini described the evocative audit as a declaration that lived experience “is also a valid form of knowledge,” directly challenging the academic standard that treats personal experience as less rigorous than statistical evidence. In a conversation with The Markup, she described the concept as an invitation to empathize with what it means to face algorithmic harm or to experience machine erasure. On LinkedIn in August 2023, she posted an evocative audit of Midjourney examining how the AI image generator represents autism, applying the framework from her thesis to a generative AI system that did not exist when the thesis was written.
That growing public presence makes the gap in the book chapter all the more significant. The evocative audit is not a buried concept waiting to be discovered; Buolamwini has been publicly advocating for it through media appearances, book events, and social media posts. The gap was not that the concept was hidden, but that a concept she has been actively promoting was still reduced to artistic background in a chapter profiling her career.
Where the Book Chapter Fell Short
The chapter treated art as communication. The thesis treats it as a way of knowing.
The book chapter on Dr. Buolamwini, titled “The Face of Truth,” covers the coded gaze, the Gender Shades findings, the congressional testimony, and the corporate moratoria. It acknowledges her use of art and spoken word as channels for reaching people who do not read research papers. What the chapter does not yet do is treat the evocative audit as its own scholarly contribution with its own definitions, types, theoretical foundations, and strategic considerations. Saying that someone uses poetry to communicate findings is very different from saying that someone created a formal method of producing evidence that standard audits cannot produce. The first treats the art as packaging. The second treats it as knowledge.
Dr. Gebru caught this gap because she was there when the concept developed. The thesis acknowledgments describe conversations with Dr. Kevin Hu and Olumakinde Ogunnaike about where the boundaries of evocative audits should be drawn. Dr. Gebru, listed as intellectual companion and co-author of the study that the thesis builds on, watched the evocative audit evolve from an artistic practice into a formal conceptual framework over years of collaboration. Her LinkedIn comment was not a suggestion to add a footnote; it was a signal that the book had treated a major scholarly contribution as a secondary detail.
Public AI platforms tend to surface what is already well known; a researcher with direct knowledge of the work surfaces what has been overlooked. That is the difference between what AI platforms returned when asked about Buolamwini and what Dr. Gebru provided in three sentences. The evocative audit travels through performance, witness, and lived experience rather than through the citation databases and search indexes that AI platforms rely on, which is why a researcher who knows the work catches what the platforms cannot.
The Governance Implication: What This Means for Anyone Building AI Oversight
For anyone designing how AI systems get checked and held accountable, the type of evidence that gets collected determines what problems can be seen and what problems stay invisible. The standard approach borrows from the algorithmic audit tradition: define what to measure, run benchmarks, document where the system fails, and produce a report. That approach generates the kind of evidence that satisfies institutional requirements, including accuracy statistics, demographic breakdowns, and performance tables that regulators and procurement officers know how to evaluate. What it does not generate is evidence of what those failures mean for the people affected by them.
A thirty percent error rate on darker-skinned women is a number that belongs in a compliance report. Michelle Obama being called a young man by a commercial AI system is a moment that belongs in a conversation about whether the system should exist. Both are needed, and governance frameworks that only collect the first kind risk building accountability systems that satisfy regulators while remaining invisible to the communities those systems are supposed to protect.
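As a sketch of what collecting both kinds of evidence could look like in practice, here is one possible audit-record schema. The class names, fields, and completeness rule are assumptions made for illustration; neither the thesis nor any named governance standard defines this structure.

```python
# One possible shape for audit evidence that carries both kinds of
# weight. This schema is an illustrative assumption, not a structure
# defined in Buolamwini's thesis or in any governance standard.
from dataclasses import dataclass, field

@dataclass
class CounterDemo:
    """A single documented failure a reviewer can witness directly."""
    input_description: str   # what went into the system
    system_output: str       # what the system actually returned
    why_it_matters: str      # the human and cultural meaning of the failure
    evidence_uri: str        # screenshot, video, or logged output

@dataclass
class AuditRecord:
    system_name: str
    subgroup_error_rates: dict[str, float]           # the compliance numbers
    counter_demos: list[CounterDemo] = field(default_factory=list)

    def ready_for_review(self) -> bool:
        # The record is incomplete if either kind of evidence is missing.
        return bool(self.subgroup_error_rates) and bool(self.counter_demos)
```

The point of pairing the two fields is procedural: under this sketch, a decision maker cannot sign off on the metrics without the record also containing at least one documented instance of what those metrics mean for a person.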
Buolamwini’s challenge to any governance process, including the one described in this book, is direct: good processes do not guarantee fair outcomes. The evocative audit offers a partial answer from inside her own framework. It does not guarantee fairness, but it changes what the person making the decision has seen before they make it. A decision maker who has reviewed a performance table operates in one information environment, and a decision maker who has also watched a counter-demo of a real person being misgendered, mislabeled, or erased by the system under review operates in a very different one.
What This Changes
The book chapter on Dr. Buolamwini needs to be revised before publication. The evocative audit must be named as a formal concept and defined in Buolamwini’s own terms, not treated as an artistic supplement to her better-known technical work. The counter-demo must be identified as the core mechanism. The four types should be acknowledged. The theoretical foundation in Black feminist epistemology must be noted because it is the reason the evocative audit works differently from a standard audit, not just the cultural context in which Buolamwini happens to work. And Unmasking AI (2023) must be referenced because the concept has a public life beyond the thesis that the chapter currently ignores.
The lesson extends beyond one chapter. When practitioners write about the work of researchers, scholars, and public intellectuals, they are interpreting that work through their own understanding, their own biases, and whatever tools they use to study it. Neither AI platforms nor the practitioners who use them can replace the person who created the original contribution. When that person engages, even briefly, the correction can be immediate and structural, as Dr. Gebru’s three sentences proved.
Dr. Buolamwini was contacted directly to review her chapter, acknowledged the outreach, and shared the LinkedIn post about the book. The correction that expanded the evocative audit from background to foreground came through Dr. Gebru’s public comment, which identified the gap and pointed to the source.
To Dr. Buolamwini and Dr. Gebru: thank you. Your published work gives practitioners the foundation, and your direct engagement is what brings our reading of that work into line with what you meant. The invitation to continued engagement is open, and it stays open.
Frequently Asked Questions
What is an evocative audit?
An evocative audit is a way of evaluating a system by combining human experience with documented evidence to show harm that people can feel, not just measure. Dr. Joy Buolamwini coined the term in her 2022 MIT PhD thesis. It works alongside standard algorithmic audits, which produce numbers, by adding the human cost those numbers cannot convey.
What is a counter-demo?
A counter-demo is documented evidence that shows a system failing in real time. When IBM’s AI labeled Michelle Obama with the word “toupee,” that was a counter-demo. It uses the system’s own output against its own claims of accuracy, letting people see the failure for themselves instead of reading about it in a report.
What are the four types of evocative audits?
Buolamwini identifies four types: comparative (testing different groups under the same conditions), participatory (letting people test the system themselves), performative (using art or poetry to show harm), and singular (using one powerful example to make the failure visible).
Why do numbers alone fail to drive change in AI accountability?
Numbers move institutions but often fail to move people. A thirty percent error rate is a statistic. An AI system calling Serena Williams a man is an experience. Buolamwini’s thesis argues that both kinds of evidence are needed because they reach different audiences, and no single audience holds all the power to change how AI systems are built.
What is the coded gaze?
The coded gaze is Buolamwini’s term for the biases that flow from the people who design AI systems into the systems themselves. When the teams building a system share a narrow demographic profile, the system inherits their blind spots. The bias need not be deliberate; it is inherited through the data, the teams, and the priorities of the people in power.
What did the Gender Shades study find?
Gender Shades, published in 2018 by Buolamwini and Dr. Timnit Gebru, tested facial recognition systems from IBM, Microsoft, and Face++. The systems performed well on lighter-skinned men and failed on darker-skinned women at rates exceeding thirty percent. All three companies improved their systems afterward, and the study influenced policy changes worldwide.
What does “excoded” mean?
Excoded is Buolamwini’s term for people who are harmed by algorithmic systems, often without knowing those systems exist. It works as a verb (to be excoded) and a noun (the excoded), giving a name to a population that most governance frameworks have no language for.
How did Dr. Timnit Gebru’s LinkedIn comment change this article?
Dr. Gebru pointed to Buolamwini’s thesis and noted the evocative audit had been undertreated as a formal concept. Because she was present when the concept developed, her observation carried primary-source authority. That three-sentence comment expanded the evocative audit from background detail to the central focus of a full book chapter revision.
Why does the evocative audit matter for AI governance?
Governance frameworks that rely only on metrics risk building accountability systems that satisfy regulators but remain invisible to the communities those systems affect. The evocative audit changes the information environment by showing decision makers what the numbers mean for real people before they make their call.
Where can I read Dr. Buolamwini’s thesis?
The full thesis, “Facing the Coded Gaze with Evocative Audits and Algorithmic Audits,” is freely available through MIT’s public thesis archive. Her bestselling book Unmasking AI (Random House, 2023) carries the concept into public discourse.
References
Buolamwini, J. (2022). Facing the coded gaze with evocative audits and algorithmic audits [Doctoral dissertation, Massachusetts Institute of Technology]. MIT DSpace.
Buolamwini, J. (2023). Unmasking AI: My mission to protect what is human in a world of machines. Random House.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 1–15.
Brown, B. (Host). (2024, May 8). Dr. Joy Buolamwini on Unmasking AI [Audio podcast episode]. In Dare to Lead. brenebrown.com.
Collins, P. H. (1986). Learning from the outsider within: The sociological significance of Black feminist thought. Social Problems, 33(6), S14–S32.
Grother, P., Ngan, M., & Hanaoka, K. (2019). Face recognition vendor test (FRVT) part 3: Demographic effects (NISTIR 8280). NIST. NIST.gov.
Snow, J. (2018, July 26). Amazon’s face recognition falsely matched 28 members of Congress with mugshots. American Civil Liberties Union. ACLU.org.
Syed, N. (2022, January 29). Is the face the final frontier of privacy? The Markup. themarkup.org.
Zuckerman, E. (2021, October 7). Hope and joy. ethanzuckerman.com.
This article was produced under HAIA-RECCLIN governance with HAIA-CAIPR multi-platform review. All content is #AIassisted under Checkpoint-Based Governance. Human arbiter: Basil C. Puglisi.