The Real AI Threat Is Not the Algorithm. It’s That No One Answers for the Decision.

October 18, 2025 by Basil Puglisi Leave a Comment

When Detective Danny Reagan says, “The tech is just a tool. If you add that tool to lousy police work, you get lousy results. But if you add it to quality police work, you can save that one life we’re talking about,” he is describing something more fundamental than good policing. He is describing the one difference that separates human decisions from algorithmic ones.

When a human detective makes a mistake, you know who to hold accountable. You can ask why they made that choice. You can review their reasoning. You can examine what alternatives they considered and why they rejected them. You can discipline them, retrain them, or prosecute them.

When an algorithm produces an error, there is no one to answer for it. That is the real threat of artificial intelligence: not that machines will think for themselves, but that we will treat algorithmic outputs as decisions rather than as intelligence that informs human decisions. The danger is not the technology itself, which can surface patterns humans miss and process data at scales humans cannot match. The danger is forgetting that someone human must be responsible when things go wrong.

🎬 Clip from “Boston Blue” (Season 1, Episode 1: Premiere Episode)
Created by Aaron Allen (showrunner)
Starring Donnie Wahlberg, Maggie Lawson, Sonequa Martin-Green, Marcus Scribner

Produced by CBS Studios / Paramount Global
📺 Original air date: October 17, 2025 on CBS
All rights © CBS / Paramount Global — used under fair use for commentary and criticism.

Who Decides? That Question Defines Everything.

The current conversation about AI governance misses the essential point. People debate whether AI should be “in the loop” or whether humans should review AI recommendations. Those questions assume AI makes decisions and humans check them.

That assumption is backwards.

In properly governed systems, humans make decisions. AI provides intelligence that helps humans decide better. The distinction is not semantic. It determines who holds authority and who bears accountability. As the National Institute of Standards and Technology’s AI Risk Management Framework (2023) emphasizes, trustworthy AI requires “appropriate methods and metrics to evaluate AI system trustworthiness” alongside documented accountability structures where specific humans remain answerable for outcomes.

Consider how that distinction played out in the Robert Williams case. In 2020, Detroit police arrested Williams after a facial recognition system matched his driver’s license photo to security footage of a shoplifting suspect. Williams was held for 30 hours. His wife watched police take him away in front of their daughters. He was innocent (Hill, 2020).

Here is what happened. An algorithm produced a match. A detective trusted that match. An arrest followed. When Williams sued, responsibility scattered. The algorithm vendor said they provided a tool, not a decision. The police said they followed the technology. The detective said they relied on the system. Everyone pointed elsewhere.

Now consider how it should have worked under the framework proposed in the Algorithmic Accountability Act of 2025, which requires documented impact assessments for any “augmented critical decision process” where automated systems influence significant human consequences (U.S. Congress, 2025).

An algorithm presents multiple potential matches with confidence scores. It shows which faces are similar and by what measurements. The algorithm flags that confidence is lower for this particular demographic. The detective reviews those options alongside other evidence. The detective notes in a documented record that match confidence is marginal. The detective documents that without corroborating evidence, match quality alone does not establish probable cause. The detective decides whether action is justified.

If that decision is wrong, accountability is clear. The detective made the call. The algorithm provided analysis. The human decided. The documentation shows what the detective considered and why they chose as they did. The record is auditable, traceable, and tied to a specific decision-maker.

That is the structure we need. Not AI making decisions that humans approve, but humans making decisions with AI providing intelligence. The technology augments human judgment. It does not replace it.

Accountability Requires Documented Decision-Making

When things go wrong with AI systems, investigations fail because no one can trace who decided what, or why. Organizations claim they had oversight, but cannot produce evidence showing which specific person evaluated the decision, what criteria they applied, what alternatives they considered, or what reasoning justified their choice.

That evidential gap is not accidental. It is structural. When AI produces outputs and humans simply approve or reject them, the approval becomes passive. The human becomes a quality control inspector on an assembly line rather than a decision-maker. The documentation captures whether someone said yes or no, but not what judgment process led to that choice.

Effective governance works differently. It structures decisions around checkpoints where humans must actively claim decision authority. Checkpoint governance is a framework where identifiable humans must document and own decisions at defined stages of AI use. This approach operationalizes what international frameworks mandate: UNESCO’s Recommendation on the Ethics of Artificial Intelligence (2024) requires “traceability and explainability” with maintained human accountability for any outcomes affecting rights, explicitly stating that systems lacking human oversight lack ethical legitimacy.

At each checkpoint, the system requires the human to document not just what they decided, but how they decided. What options did the AI present. What alternatives were considered. Was there dissent about the approach. What criteria were applied. What reasoning justified this choice over others.

That documentation transforms oversight from theatrical to substantive. It creates what decision intelligence frameworks call “audit trails tied to business KPIs,” pairing algorithmic outputs with human checkpoint approvals and clear documentation of who, what, when, and why for every consequential outcome (Approveit, 2025).
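
To make that concrete, here is a minimal sketch of what a single checkpoint record could look like in code. It is illustrative only: the field names are my shorthand for the questions above, not a schema from NIST, UNESCO, or any vendor.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CheckpointRecord:
    """One entry in a checkpoint audit trail: who decided, what, when, and why.

    Illustrative structure only; field names are assumptions, not a published schema.
    """
    decision_id: str                    # ties the record to a specific consequential outcome
    decision_maker: str                 # a named human, not a system account
    timestamp: datetime
    options_presented: list[str]        # what the AI surfaced
    alternatives_considered: list[str]  # what the human weighed besides the AI's suggestion
    criteria_applied: list[str]         # the explicit evaluation criteria in force
    dissent: str | None                 # objections raised during review, if any
    decision: str                       # the action the human chose
    reasoning: str                      # why this choice over the alternatives
```

Nothing about the structure is novel. The point is that every field answers a question an auditor would otherwise have to reconstruct from memory.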

What Checkpoint Governance Looks Like

The framework is straightforward. Before AI-informed decisions can proceed, they must pass through structured checkpoints where specific humans hold decision authority. This model directly implements the “Govern, Map, Measure, Manage” cycle that governance standards prescribe (NIST, 2023). At each checkpoint, four things happen:

AI contributes intelligence. The system analyzes data, identifies patterns, generates options, and presents findings. This is what AI does well: processing more information faster than humans can and surfacing insights humans might miss. Research shows that properly deployed AI can reduce certain forms of human bias by standardizing evaluation criteria and flagging inconsistencies that subjective judgment overlooks (McKinsey & Company, 2025).

The output is evaluated against defined criteria. These criteria are explicit and consistent. What makes a facial recognition match credible. What evidence standard justifies an arrest. What level of confidence warrants action. The criteria prevent ad hoc judgment and support consistent decision-making across different reviewers.

A designated human arbitrates. This person reviews the evaluation, applies judgment informed by context the AI cannot access, and decides. Not approves or rejects—decides. The human is the decision-maker. The AI provided intelligence. The human decides what it means and what action follows. High-performing organizations embed these “accountability pathways tied to every automated decision, linking outputs to named human approvers” (McKinsey & Company, 2025).

The decision is documented. The record captures what was evaluated, what criteria applied, what the human decided, and most importantly, why. What alternatives did they consider. Was there conflicting evidence. Did they override a score because context justified it. What reasoning supports this decision.

That four-stage process keeps humans in charge while making their decision-making auditable. It acknowledges a complexity: in sophisticated AI systems producing multi-factor risk assessments or composite recommendations, the line between “intelligence” and “decision” can blur. A credit scoring algorithm that outputs a single approval recommendation functions differently than one that presents multiple risk factors for human synthesis. Checkpoint governance addresses this by requiring that wherever the output influences consequential action, a human must claim ownership of that action through documented reasoning.
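
As a sketch of how those four stages could be wired together, the hypothetical function below keeps the human arbitration step explicit. The names, the criteria format, and the callback are assumptions for illustration, not a reference implementation of any cited framework.

```python
from datetime import datetime, timezone
from typing import Callable

def run_checkpoint(
    ai_findings: dict,                                   # stage 1: intelligence the AI produced upstream
    criteria: dict[str, Callable[[dict], bool]],         # stage 2: explicit, pre-defined evaluation rules
    decision_maker: str,                                 # the named human who holds decision authority
    arbitrate: Callable[[dict, dict], tuple[str, str]],  # stage 3: returns (decision, reasoning)
) -> dict:
    """Pass one AI output through a governance checkpoint and return the audit record."""
    # Stage 2: evaluate the AI output against the defined criteria.
    evaluation = {name: rule(ai_findings) for name, rule in criteria.items()}

    # Stage 3: the designated human decides. The callback stands in for a review
    # queue or UI; it must return both the decision and the documented reasoning.
    decision, reasoning = arbitrate(ai_findings, evaluation)

    # Stage 4: the decision is documented: who, what, when, and why.
    return {
        "decision_maker": decision_maker,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ai_findings": ai_findings,
        "evaluation": evaluation,
        "decision": decision,
        "reasoning": reasoning,
    }
```

The design choice that matters is that the arbitration step cannot be skipped: no record exists, and no action proceeds, until a named person has supplied both a decision and a reason.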

The Difference Accountability Makes

Testing by the National Institute of Standards and Technology (2019) found that some facial recognition algorithms produced false positive rates up to 100 times higher for darker-skinned faces than for lighter-skinned ones. The Williams case was not an anomaly. It was a predictable outcome of that accuracy gap. Subsequent NIST testing in 2023 confirmed ongoing accuracy disparities across demographic groups.

But the deeper failure was not technical. It was governance. Without structured checkpoints, no one had to document what alternatives they considered before acting on the match. No one had to explain why the match quality justified arrest given the known accuracy disparities. No one had to record whether anyone raised concerns.

If checkpoint governance had been in place, meeting the standards now proposed in the Algorithmic Accountability Act of 2025, the decision process would have looked different.

The algorithm presents multiple potential matches. It flags that confidence is lower for this particular face. A detective reviews the matches alongside other evidence. The detective notes in the record that match confidence is marginal. The detective documents that without corroborating evidence, match quality alone does not establish probable cause. The detective decides that further investigation is needed before arrest. This decision is logged with the detective’s identifier, timestamp, and rationale.

If the detective instead decides the match justifies arrest despite the lower confidence, they must document why. What other evidence exists. What makes this case an exception. That documentation creates accountability. If the arrest proves wrong, investigators can review the detective’s reasoning and determine whether the decision process was sound.
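
Here is a hypothetical sketch of what those two log entries might look like. The identifiers, confidence values, and evidence items are invented for illustration; nothing is drawn from an actual case file or product.

```python
# Path 1: the detective declines to act on a marginal match.
match_review = {
    "decision_maker": "detective_4417",  # the detective's identifier (illustrative)
    "timestamp": "2025-10-18T14:32:00Z",
    "ai_output": {"match_confidence": 0.58, "note": "lower accuracy flagged for this demographic"},
    "corroborating_evidence": [],
    "decision": "further investigation before arrest",
    "reasoning": "Match confidence is marginal and there is no corroborating evidence; "
                 "match quality alone does not establish probable cause.",
}

# Path 2: the detective proceeds despite lower confidence, so the exception must be justified.
arrest_decision = {
    "decision_maker": "detective_4417",
    "timestamp": "2025-10-19T09:05:00Z",
    "ai_output": {"match_confidence": 0.58, "note": "lower accuracy flagged for this demographic"},
    "corroborating_evidence": ["eyewitness statement", "vehicle registration"],
    "decision": "seek arrest warrant",
    "reasoning": "Independent corroborating evidence supports probable cause; "
                 "acting on the match alone would not.",
}
```

If the arrest later proves wrong, an investigator reviewing the second entry can see exactly what the detective relied on beyond the match.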

That is what distinguishes human error from systemic failure. Humans make mistakes, but when decisions are documented, those mistakes can be reviewed, learned from, and corrected. When decisions are not documented, the same mistakes repeat because no one can trace why they occurred.

Why Algorithms Cannot Be Held Accountable

A risk assessment algorithm used in courts across the United States, called COMPAS, was found to mislabel Black defendants who did not reoffend as high risk at nearly twice the rate of white defendants who did not reoffend (Angwin et al., 2016). When researchers exposed this bias, the system continued operating. No one faced consequences. No one was sanctioned.

Recognizing these failures, some jurisdictions have begun implementing alternatives. The Algorithmic Accountability Act of 2025, introduced by Representative Yvette Clarke, explicitly targets automated systems in “housing, employment, credit, education” and requires deployers to conduct and record algorithmic impact assessments documenting bias, accuracy, explainability, and downstream effects (Clarke, 2025). The legislation provides Federal Trade Commission enforcement mechanisms for incomplete or falsified assessments, creating the accountability structure that earlier deployments lacked.

That regulatory evolution reflects the fundamental difference between human and algorithmic decision-making. Humans can be held accountable for their errors, which creates institutional pressure to improve. Algorithms operate without that pressure because no identifiable person bears responsibility for their outputs. Even when algorithms are designed to reduce human bias through standardized criteria and consistent application, they require human governance to ensure those criteria themselves remain fair and contextually appropriate.

Courts already understand this principle in other contexts. When a corporation harms someone, the law does not excuse executives by saying they did not personally make every operational choice. The law asks whether they established reasonable systems to prevent harm. If they did not, they are liable.

AI governance must work the same way. Someone must be identifiable and answerable for decisions AI informs. That person must be able to show they followed reasonable process. They must be able to demonstrate what alternatives they considered, what criteria they applied, and why their decision was justified.

Checkpoint governance creates that structure. It ensures that for every consequential decision, there is a specific human whose judgment is documented and whose reasoning can be examined.

Building the System of Checks and Balances

Modern democracies are built on checks and balances. No single person has unchecked authority. Power is distributed. Decisions are reviewed. Mistakes have consequences. That structure does not eliminate error, but it prevents error from proceeding uncorrected.

AI governance must follow the same principle. Algorithmic outputs should not proceed unchecked to action. Their insights must inform human decisions made at structured checkpoints where specific people hold authority and bear responsibility. Five governance frameworks now converge on this approach, establishing consensus pillars of transparency, data privacy, bias management, human oversight, and audit mechanisms (Informs Institute, 2025).

There are five types of checkpoints that high-stakes AI deployments need:

Intent Checkpoints examine why a system is being created and who it is meant to serve. A facial recognition system intended to find missing children is different from one intended to monitor peaceful protesters. Intent shapes everything that follows. At this checkpoint, a specific person takes responsibility for ensuring the system serves its stated purpose without causing unjustified harm. The European Union’s AI Act (2024) codifies this requirement through mandatory purpose specification and use-case limitation for high-risk applications.

Data Checkpoints require documentation of where training data came from and who is missing from it. The Williams case happened because facial recognition was trained primarily on lighter-skinned faces. The data gap created the accuracy gap. At this checkpoint, a specific person certifies that data has been reviewed for representation gaps and historical bias. Organizations implementing this checkpoint have identified and corrected dataset imbalances before deployment, preventing downstream discrimination.

Model Checkpoints verify testing for fairness and reliability across different populations. Testing is not one-time but continuous, because system performance changes as the world changes. At this checkpoint, a specific person certifies that the model performs within acceptable error ranges for all affected groups. Ongoing monitoring at this checkpoint has detected concept drift and performance degradation in operational systems, triggering recalibration before significant harm occurred.

Use Checkpoints define who has authority to act on system outputs and under what circumstances. A facial recognition match should not lead directly to arrest but to investigation. The human detective remains responsible for deciding whether evidence justifies action. At this checkpoint, a specific person establishes use guidelines and trains operators on the system’s limitations. Directors and board members increasingly recognize this as a governance imperative, with 81% of companies acknowledging governance lag despite widespread AI deployment (Directors & Boards, 2025).

Impact Checkpoints measure real-world outcomes and correct problems as they emerge. This is where accountability becomes continuous, not just a pre-launch formality. At this checkpoint, a specific person reviews outcome data, identifies disparities, and has authority to modify or suspend the system if harm is occurring. This checkpoint operationalizes what UNESCO (2024) describes as the obligation to maintain human accountability throughout an AI system’s operational lifecycle.

Each checkpoint has the same essential requirement: a designated human makes a decision and documents what alternatives were considered, whether there was dissent, what criteria were applied, and what reasoning justified the choice. That documentation creates the audit trail that makes accountability enforceable.
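
A compact way to express the five checkpoints and the shared requirement they carry is sketched below. The owner roles and documentation fields are illustrative assumptions, not prescriptions from the EU AI Act, NIST, or UNESCO.

```python
from enum import Enum

class Checkpoint(Enum):
    INTENT = "why the system exists and who it is meant to serve"
    DATA = "where training data came from and who is missing from it"
    MODEL = "fairness and reliability across all affected groups"
    USE = "who may act on outputs and under what circumstances"
    IMPACT = "real-world outcomes, monitored continuously"

# Every checkpoint carries the same essential requirement: a named human owner.
# The roles below are examples, not mandates.
CHECKPOINT_OWNERS = {
    Checkpoint.INTENT: "product sponsor",
    Checkpoint.DATA: "data steward",
    Checkpoint.MODEL: "model validation lead",
    Checkpoint.USE: "operations lead",
    Checkpoint.IMPACT: "outcomes review officer",
}

# And the same documentation obligation at the moment a decision is made.
REQUIRED_DOCUMENTATION = (
    "alternatives considered",
    "dissent raised",
    "criteria applied",
    "reasoning for the choice",
)
```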

The Implementation Reality: Costs and Complexities

Checkpoint governance is not without implementation challenges. Organizations adopting this framework should anticipate three categories of burden.

Structural costs include defining decision rights, specifying evaluation criteria with concrete examples, building logging infrastructure, and training personnel on checkpoint protocols. These are one-time investments that require thoughtful design.

Operational costs include the time required for human arbitration at each checkpoint, periodic calibration to prevent criteria from becoming outdated, and maintaining audit trail systems. These are recurring expenses that scale with deployment scope.

Cultural costs involve shifting organizational mindsets from “AI approves, humans review” to “humans decide, AI informs.” This requires executive commitment and sustained attention to prevent automation bias, where reviewers gradually default to approving AI recommendations without critical evaluation.

These costs are real. They represent intentional friction introduced into decision processes. The question is whether that friction is justified. For high-stakes decisions in regulated industries, for brand-critical communications, for any context where single failures create significant harm to individuals or institutional reputation, the accountability benefits justify the implementation burden. For lower-stakes applications where rapid iteration matters more than individual decision traceability, lighter governance or even autonomous operation may be appropriate.

The framework is risk-proportional by design. Organizations can implement comprehensive checkpoints where consequences are severe and streamlined governance where they are not. The principle remains constant: someone specific must be responsible, their decision process must be documented, and they must be answerable when things go wrong.

What Detective Reagan Teaches About Accountability

Reagan’s instinct to question the facial recognition match is more than good detective work. It is the pause that creates accountability. That moment of hesitation is the checkpoint where a human takes responsibility for what happens next.

His insight holds the key. The tech is just a tool. Tools do not bear responsibility. People do. The question is whether we will build systems that make responsibility clear, or whether we will let AI diffuse responsibility until no one can be held to account for decisions.

We already know what happens when power operates without accountability. The Williams case shows us. The COMPAS algorithm shows us. Every wrongful arrest, every biased loan denial, every discriminatory hiring decision made by an insufficiently governed AI system shows us the same thing: without structured accountability, even good intentions produce harm.

What This Means in Practice

Checkpoint governance is not theoretical. Organizations are implementing it now. The European Union AI Act (2024) requires impact assessments and human oversight for high-risk systems. The Algorithmic Accountability Act of 2025 establishes enforcement mechanisms for U.S. federal oversight. Some states mandate algorithmic audits. Some corporations have established AI review boards with authority to stop deployments.

But voluntary adoption alone is insufficient. Accountability requires structure. It requires designated humans with decision authority at specific checkpoints. It requires documentation that captures the decision process, not just the decision outcome. It requires consequences when decision-makers fail to meet their responsibility.

The structure does not need to be identical across all contexts. High-stakes decisions in regulated industries (finance, healthcare, criminal justice) require comprehensive checkpoints at every stage. Lower-stakes applications can use lighter governance. The principle remains constant: someone specific must be responsible, their decision process must be documented, and they must be answerable when things go wrong.

That is not asking AI to be perfect. It is asking the people who deploy AI to be accountable.

Humans make mistakes. Judges err. Engineers miscalculate. Doctors misdiagnose. But those professions have accountability mechanisms that create institutional pressure to learn and improve. When a judge makes a sentencing error, the decision can be appealed and the judge’s reasoning reviewed. When an engineer’s design fails, investigators examine whether proper procedures were followed. When a doctor’s diagnosis proves wrong, medical boards review whether the standard of care was met.

AI needs the same accountability structure. Not because AI should be held to a higher standard than humans, but because AI should be held to the same standard. Decisions that affect people’s lives should be made by humans who can be held responsible for their choices.

The Path Forward

If we build checkpoint governance into AI deployment, we have nothing to fear from the technology. The algorithms will do what they have always done: process information faster and more comprehensively than humans can, surface patterns that human attention might miss, and apply consistent criteria that reduce certain forms of subjective bias. But decisions will remain human. Accountability will remain clear. When mistakes happen, we will know who decided, what they considered, and why they chose as they did.

If we do not build that structure, the risk is not the algorithm. The risk is the diffusion of accountability that lets everyone point elsewhere when things go wrong. The risk is the moment when harm occurs and no one can be identified as responsible.

Detective Reagan is right. The tech is just a tool, but only when someone accepts responsibility for how it is used. Someone must wield it. Someone must decide what it means and what action follows. Someone must answer when the decision proves wrong.

Checkpoint governance ensures that someone exists. It makes them identifiable. It documents their reasoning. It creates the accountability that lets us trust AI-informed decisions because we know humans remain in charge.

That is the system of checks and balances artificial intelligence needs. Not to slow progress, but to direct it. Not to prevent innovation, but to ensure innovation serves people without leaving them defenseless when things go wrong.

The infrastructure is emerging. The Algorithmic Accountability Act establishes federal oversight. The EU AI Act provides a regulatory template. UNESCO’s ethical framework sets international norms. Corporate governance is evolving to match technical capability with human accountability.

The question now is execution. Will organizations implement checkpoint governance before the next Williams case, or after. Will they build audit trails before regulators demand them, or in response to enforcement. Will they treat accountability as a design principle, or as damage control.

Detective Reagan’s pause should be systemic, not individual. It should be built into every consequential AI deployment as structure, not left to the judgment of individual operators who may or may not question what the algorithm presents.

The tech is just a tool. We are responsible for ensuring it remains one.


References

  • Algorithmic Accountability Act of 2025, S.2164, 119th Congress (2025). https://www.congress.gov/bill/119th-congress/senate-bill/2164/text
  • Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  • Approveit. (2025, October 16). AI Decision-Making Facts (2025): Regulation, Risk & ROI. https://approveit.today/blog/ai-decision-making-facts-(2025)-regulation-risk-roi
  • Clarke, Y. (2025, September 19). Clarke introduces bill to regulate AI’s control over critical decision-making in housing, employment, education, and more [Press release]. https://clarke.house.gov/clarke-introduces-bill-to-regulate-ais-control-over-critical-decision-making-in-housing-employment-education-and-more/
  • Directors & Boards. (2025, June 26). Decision-making in the age of AI. https://www.directorsandboards.com/board-issues/ai/decision-making-in-the-age-of-ai/
  • European Commission. (2024). Regulation (EU) 2024/1689 (Artificial Intelligence Act). Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
  • Hill, K. (2020, June 24). Wrongfully Accused by an Algorithm. The New York Times. https://www.nytimes.com/2020/06/24/technology/facial-recognition-arrest.html
  • Informs Institute. (2025, July 21). Navigating AI regulations: What businesses need to know in 2025. https://pubsonline.informs.org/do/10.1287/LYTX.2025.03.10/full/
  • McKinsey & Company. (2025, June 3). When can AI make good decisions? The rise of AI corporate citizens. https://www.mckinsey.com/capabilities/operations/our-insights/when-can-ai-make-good-decisions-the-rise-of-ai-corporate-citizens
  • National Institute of Standards and Technology. (2019). Face Recognition Vendor Test (FRVT). https://www.nist.gov/programs-projects/face-recognition-vendor-test-frvt
  • National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). U.S. Department of Commerce. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
  • UNESCO. (2024, September 25). Recommendation on the Ethics of Artificial Intelligence. https://www.unesco.org/en/artificial-intelligence/recommendation-ethics

Filed Under: AI Artificial Intelligence, AI Thought Leadership, Thought Leadership Tagged With: AI accountability, AI decision-making, algorithmic accountability act, checkpoint governance, COMPAS algorithm, EU AI Act, facial recognition bias, human oversight
