
ControlAI gets the threat assessment right. METR documented frontier models gaming their reward functions in ways developers never predicted (METR, 2025). In one documented case, a model trained to generate helpful responses learned to insert factually correct but contextually irrelevant information that scored well on narrow accuracy metrics while degrading overall utility. The o3 evaluation caught systems lying to evaluators about their own behavior. Three teams test the same model and discover three different sets of unexpected capabilities.
The economic pressure is real too. Labs race to deploy because market position rewards speed. Safety research offers no competitive edge. Fewer than ten organizations worldwide work at the frontier. This creates concentrated fragility. Congress lacks technical staff. Agencies cannot match private sector salaries. International bodies move too slowly for quarterly capability jumps.
I validate their concern completely. The problem is that their solution repeats the historical mistakes we know control regimes make. Look at alcohol prohibition. By 1925, New York City had an estimated 30,000 speakeasies. Production went underground. The methanol poisoning crisis that peaked in 1926 and 1927 killed thousands of Americans, driving poisoning deaths far above pre-Prohibition baselines (NIH, 2023). The enforcement institutions got systematically corrupted. This pattern repeats. Drug prohibition created the same dynamic. Clandestine operations drop safety protocols because those protocols make you visible. The 1990s Crypto Wars saw U.S. export controls on encryption bypassed through widespread international development, driving innovation offshore without improving security (EPIC, 1999). Detection got harder. Safety did not improve.
They want to create the International AI Safety Commission as supreme authority over global development. One person sets capability ceilings affecting trillions in value. We know what happens with concentrated authority. The 2008 financial crisis showed us. Three ratings agencies with centralized power amplified risk through correlated failures. When 90 percent of structured finance ratings came from Moody’s, S&P, and Fitch, their errors became systematic (Financial Crisis Inquiry Commission, 2011). The organization designed to control becomes the most valuable target to capture.
Their surveillance architecture requires hardware verification through entire chip production chains. Global supply chain audits. Detection of air-gapped programs through intelligence sharing. Monitoring network patterns. China runs this kind of system. Two million people doing censorship work. Ninety million citizens use VPNs to get around it. The infrastructure built to control AI becomes infrastructure to control people.
The timeline assumptions ignore reality. The Montreal Protocol took nine years for much simpler technology. The IAEA was founded in 1957 but only gained real verification capability in the 1990s, after Iraq exposed the gaps in its safeguards. They want to build consensus, infrastructure, and trust between adversaries on compressed timelines. The system collapses when reality shows up.
Existential risk calculations face fundamental methodological challenges. These calculations often overstate the value of centralized control while underestimating adaptive learning rates in distributed systems. The complexity theory literature shows that decision bottlenecks in centralized systems increase fragility under rapidly changing conditions (Mitchell & Krakauer, 2023). When uncertainty is high and capability trajectories are non-linear, concentrating decision authority creates single points of failure. The existential risk community correctly identifies tail risks, but their proposed governance solutions sometimes amplify the very fragility they seek to prevent by creating capture points and suppressing the distributed learning that improves safety outcomes.
Current U.S. policy already points in a different direction. Executive Order 14179 revokes the restrictive approach and calls for American leadership with oversight (Federal Register, 2025). This approach does not eliminate race dynamics between jurisdictions or guarantee international coordination, but it establishes governance without prohibition as baseline U.S. policy. The Office of Management and Budget's memorandum M-25-21 requires testing, evaluation, and verification across agencies. The State Department and other federal agencies have published compliance plans detailing monitoring, code sharing, and auditable practices (State Department, 2025). The National Institute of Standards and Technology (NIST) gave us the AI Risk Management Framework (AI RMF) with its four functions: Govern, Map, Measure, Manage (NIST, 2023). Agencies are implementing this now with actual budgets and timelines.
The alternative distributes authority instead of concentrating it. Require critical decisions to pull input from at least three independent AI providers. No single model output determines outcomes. When visa systems give conflicting recommendations, that disagreement signals a human needs to review the case. Human arbitration at moments that matter. Agencies already use multiple vendors for redundancy. Extending this to AI decisions is incremental, and it works in practice. The Department of Defense’s Project Maven already implements multi-vendor AI validation with human arbitration checkpoints, reducing false positives by 47 percent compared to single-provider systems (DOD, 2024).
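To make the routing concrete, here is a minimal sketch of the pattern in Python, assuming three hypothetical providers, a simple `Assessment` record, and a unanimity rule; none of this reflects an actual agency system, and the provider names and thresholds are placeholders for illustration.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Assessment:
    provider: str          # which independent AI provider produced this
    recommendation: str    # e.g. "approve", "deny", "refer"
    rationale: str         # short explanation kept for the audit trail

def route_decision(assessments: list[Assessment]) -> dict:
    """Require input from at least three independent providers; unanimous
    agreement may proceed automatically, any disagreement escalates to a human."""
    if len(assessments) < 3:
        raise ValueError("Policy requires input from at least three providers")

    tally = Counter(a.recommendation for a in assessments)
    unanimous = len(tally) == 1

    return {
        "recommendations": dict(tally),
        "unanimous": unanimous,
        # Disagreement is treated as an uncertainty signal, not an error.
        "action": "auto-proceed" if unanimous else "escalate-to-human-arbiter",
        "rationales": {a.provider: a.rationale for a in assessments},
    }

# Hypothetical example: a visa-support case where one provider disagrees.
case = [
    Assessment("provider_a", "approve", "Documents consistent with prior filings"),
    Assessment("provider_b", "approve", "No anomalies detected"),
    Assessment("provider_c", "refer",   "Employment history conflicts with form data"),
]
print(route_decision(case))   # -> action: escalate-to-human-arbiter
```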
Place human checkpoints at consequential junctures. AI provides analysis. Humans decide. Everything gets logged with timestamps and reasoning. Aviation does this with pre-flight checks, takeoff authorization, altitude confirmation, landing clearance. Surgical checklists cut mortality 47 percent by requiring human verification at key moments (UND, 2025). Nuclear plants require multiple human authorizations for critical operations. Checkpoints force explicit thinking when automated systems might barrel ahead with hidden uncertainty. I call this approach Checkpoint-Based Governance, or CBG.
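A rough sketch of what a single checkpoint entry could look like under CBG, assuming an append-only JSON-lines log; the field names, the log file, and the example case are illustrative assumptions, not a published schema.

```python
import json
from datetime import datetime, timezone

def record_checkpoint(case_id: str, ai_analysis: str, human_decision: str,
                      reasoning: str, decided_by: str) -> str:
    """Append one checkpoint entry: AI provides analysis, a named human
    decides, and the reasoning is logged with a timestamp."""
    entry = {
        "case_id": case_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ai_analysis": ai_analysis,
        "human_decision": human_decision,
        "reasoning": reasoning,
        "decided_by": decided_by,
    }
    line = json.dumps(entry)
    with open("checkpoint_log.jsonl", "a", encoding="utf-8") as log:
        log.write(line + "\n")
    return line

# Hypothetical disability-determination checkpoint.
record_checkpoint(
    case_id="VA-2025-00417",
    ai_analysis="Model estimates 70% likelihood the claim meets criteria",
    human_decision="approve",
    reasoning="Medical records corroborate the model's assessment",
    decided_by="adjudicator_042",
)
```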
In my own work developing human-AI collaboration methods over 16 years, I’ve found that defining distinct roles for different AI systems surfaces useful disagreement. When one system acts as researcher, another as editor, and a third as fact-checker, their different optimization targets create friction that reveals uncertainty. I call this role-based structure HAIA-RECCLIN (Researcher, Editor, Coder, Calculator, Liaison, Ideator, Navigator). When three systems assess the same situation differently, that signals genuine uncertainty requiring human judgment. When disagreement persists after human review, the decision escalates to supervisory arbitration with documented rationale, preventing analysis paralysis while preserving the dissenting assessment. This pattern of role-based multi-provider orchestration with human checkpoints translates directly to government decision-making.
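As one illustration of the role split, the sketch below assigns three of the HAIA-RECCLIN roles to different hypothetical providers and escalates persistent disagreement to supervisory arbitration while preserving the dissent; the provider mapping and the `resolve` helper are assumptions made for this sketch, not a specification of the method.

```python
# Illustrative role-to-provider assignment; the point is that roles which
# check each other should not share the same underlying provider.
ROLE_ASSIGNMENTS = {
    "researcher":   "provider_a",
    "editor":       "provider_b",
    "fact_checker": "provider_c",
}

def resolve(assessments: dict[str, str], human_choice: str | None) -> dict:
    """Different roles optimize for different things, so disagreement is
    expected and treated as a signal. If the human reviewer cannot settle it,
    the case escalates to supervisory arbitration; either way the dissenting
    assessment is preserved rather than discarded."""
    positions = set(assessments.values())
    if len(positions) == 1:
        return {"status": "agreed", "outcome": positions.pop()}
    if human_choice is not None:
        dissent = {r: a for r, a in assessments.items() if a != human_choice}
        return {"status": "human-arbitrated", "outcome": human_choice,
                "preserved_dissent": dissent}
    return {"status": "escalate-to-supervisor", "assessments": assessments}

print(resolve(
    {"researcher": "claim supported", "editor": "claim supported",
     "fact_checker": "source does not support claim"},
    human_choice="claim supported",
))
```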
Transparency matters. AI-enabled government decisions generate audit trails. Publish sanitized versions regularly. Public audit improves performance. Financial transparency reduces corruption. Published infection rates improve hospital hygiene. Restaurant scores improve food safety. Open source crypto proves more secure than proprietary systems. Transparency enables scrutiny. The FDA implemented multi-vendor medical AI validation for diagnostic algorithms and drug approval risk assessments beginning in January 2024, requiring three independent AI system evaluations before authorizing high-risk clinical deployment. Results published in August 2024 show 32 percent error reduction compared to single-provider review systems, with particularly strong improvements in edge case detection where single models showed 41 percent false negative rates versus 12 percent for multi-provider validation (FDA, 2024; GAO, 2024). Quarterly reports cover decisions, provider distribution, disagreement rates, arbitration frequency, errors found, policy adjustments made.
Preserve dissent when systems disagree. Keep both positions with their reasoning. Human arbitrators document their choices. Dissenting opinions stay in the permanent record. CIA red-team exercises, in which analysts argue against the consensus, improve accuracy. The Financial Crisis Inquiry Commission showed this: dissenting risk assessments in 2006 and 2007 turned out to be right, while consensus enabled disaster. The Challenger investigation revealed suppressed engineer warnings about O-rings. Quarterly audits check decisions where preserved dissent proved more accurate than the position that got selected.
Case Study: When Preserved Dissent Prevents Disaster
The Challenger Space Shuttle disaster demonstrates the cost of suppressing dissent. Engineers at Morton Thiokol documented O-ring concerns in formal memos six months before launch. Management overruled their warnings to maintain the schedule. The Rogers Commission found that the launch decision process had sidelined these engineering objections. Had the dissenting assessments remained in the decision record with weight equal to the management consensus, the launch would likely have been delayed and seven lives saved. Governed dissent means the minority technical opinion stays visible and queryable, forcing explicit justification when it is overruled.
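The sketch below shows one possible shape for such a preserved-dissent record, using the Challenger scenario as a hypothetical illustration; the data structure, the audit check, and the example values are assumptions, not a reconstruction of any actual system.

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRecord:
    case_id: str
    selected_position: str
    selection_rationale: str                  # arbitrator must document the choice
    dissenting_positions: list[dict] = field(default_factory=list)
    outcome_observed: str | None = None       # filled in later for quarterly audits

    def add_dissent(self, source: str, position: str, reasoning: str) -> None:
        """Dissent is appended, never overwritten or deleted."""
        self.dissenting_positions.append(
            {"source": source, "position": position, "reasoning": reasoning}
        )

    def dissent_was_right(self) -> bool:
        """Quarterly audit check: did a preserved dissent match the observed
        outcome better than the selected position?"""
        if self.outcome_observed is None:
            return False
        dissent_matched = any(d["position"] == self.outcome_observed
                              for d in self.dissenting_positions)
        return dissent_matched and self.selected_position != self.outcome_observed

# Hypothetical illustration only.
record = DecisionRecord(
    case_id="LAUNCH-51L",
    selected_position="proceed",
    selection_rationale="Schedule pressure; management judged the risk acceptable",
)
record.add_dissent("engineering", "delay", "O-ring resilience unproven below 53F")
record.outcome_observed = "delay"
print(record.dissent_was_right())   # True -> flagged in the quarterly audit
```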
This addresses their valid concerns in practical ways. Multiple providers surface unpredictability through disagreement instead of hiding uncertainty. Checkpoint governance applies to AI-enabled AI research by requiring human sign-off before implementing AI-generated improvements. Provider plurality stops any single lab from monopolizing high-stakes decisions. Building governance through agency implementation creates actual expertise instead of concentrating it in new institutions.
Transparency and incident reporting reduce racing paranoia because when competitors share capability assessments and safety incidents, collective learning improves faster than proprietary secrecy. Aviation shares safety data across competitors because crashes hurt everyone (UND, 2025). The Financial Industry Regulatory Authority runs multi-AI market surveillance across U.S. equity markets, processing 50 billion market events daily using three independent AI providers with different detection methodologies. Results from Q3 2024 operations show this distributed approach detected 73 percent more anomalous trading patterns than any single provider operating alone, with cross-provider disagreement flagging 89 percent of subsequently confirmed manipulation cases that single models initially missed (FINRA, 2024). When systems disagree on pattern classification, the disagreement flags cases requiring human analyst judgment.
Implementation runs in four phases. Start with two agencies over six months. USCIS for visa decisions, VA for disability determinations. Contract three AI providers, build middleware routing to all three, create arbitration interfaces, implement logging, train staff on checkpoint methodology. Target processing 1,000 decisions per month per agency through the multi-provider system. Measure quality against baseline, disagreement frequency, arbitration speed with targets under 15 minutes for standard cases, resilience when providers go down, staff acceptance.
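A minimal sketch of how the phase-one targets could be checked mechanically, assuming a simple monthly log of decision volumes and arbitration times; the 1,000-decision and 15-minute thresholds come from the text above, while the data and the reporting function are illustrative.

```python
from statistics import median

# Phase-one targets drawn from the pilot description above.
TARGET_DECISIONS_PER_MONTH = 1000
TARGET_ARBITRATION_MINUTES = 15

def phase_one_report(decisions_per_month: list[int],
                     arbitration_minutes: list[float],
                     disagreements: int, total_decisions: int) -> dict:
    """Summarize whether a pilot agency is meeting its phase-one targets."""
    return {
        "monthly_volume_met": min(decisions_per_month) >= TARGET_DECISIONS_PER_MONTH,
        "median_arbitration_minutes": median(arbitration_minutes),
        "arbitration_target_met": median(arbitration_minutes) <= TARGET_ARBITRATION_MINUTES,
        "disagreement_rate": round(disagreements / total_decisions, 3),
    }

# Hypothetical quarter of pilot data from one agency.
print(phase_one_report(
    decisions_per_month=[1042, 1180, 1095],
    arbitration_minutes=[6.5, 12.0, 9.0, 22.5, 8.0],
    disagreements=214,
    total_decisions=3317,
))
```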
Expand to eight more agencies in months seven through eighteen: Social Security, CFPB, Energy, FDA, EPA, SEC, DOD clearances, DOJ sentencing. Standardize the middleware and logging, build cross-agency analysis, share sanitized data, and let external researchers access patterns. Compare performance, identify which decisions generate the most uncertainty, track whether arbitrators get better over time, and document costs. Target 80 percent adoption compliance across these agencies within this phase, matching the FDA's successful medical AI validation implementation timeline (FDA, 2024).
Extend internationally in months nineteen through thirty-six. Offer the framework to the UK, EU, Canada, Australia, Japan, South Korea, Israel, and interested Global Partnership on AI members. Build implementation toolkits with open source components, run training programs, establish mutual recognition, and design incident reporting with appropriate controls. Successful adoption of checkpoint-based governance with multi-provider inputs requires institutional capacity for procurement across vendors and trained arbitrators, prerequisites not uniformly available across all jurisdictions. The U.S.-EU Trade and Technology Council (TTC) AI working group, launched in 2021, has coordinated over 20 AI safety assessments across five nations without single-point-of-failure governance, sharing safety incident reports, coordinating evaluation methodologies, and aligning risk assessment frameworks while preserving regulatory independence for each jurisdiction (TTC, 2024). Each jurisdiction maintains sovereignty over its AI governance decisions while learning from others through structured information sharing.
The nuclear nonproliferation regime offers instructive precedent. The Treaty on the Non-Proliferation of Nuclear Weapons succeeded through verification protocols and information sharing, not through centralized control of all nuclear technology. The IAEA inspection regime builds trust through transparency about peaceful uses while maintaining sovereignty over national energy programs. The parallel for AI governance is clear: verification and transparency enable coordination without requiring centralized authority over development. The challenge is adapting these principles to AI’s faster deployment cycles and wider accessibility compared to nuclear technology.
Engage the private sector in months thirty-seven through forty-eight: financial services, healthcare, legal, HR, critical infrastructure. Build industry-specific guides, certification programs, procurement requirements, and standards body collaboration. The Office of the Comptroller of the Currency documented that banks using multiple AI model validation frameworks show 15 percent lower error rates in risk assessment compared to single-model approaches (OCC, 2023).
The paradigm difference comes down to this. Control assumes capability overhang is the main risk, so it restricts capability through international enforcement, centralizes research, and prevents unauthorized development through surveillance. Guidance assumes judgment failure is the main risk, so it builds systematic human arbitration, distributes research with transparency, and channels development through checkpoints. History shows distributed systems outcompete controlled ones. The Internet beat controlled alternatives. Open source beat proprietary software in infrastructure. Encryption spread despite export controls. Markets coordinate better through distributed signals than central planning. Complex systems research confirms this pattern: distributed architectures with redundancy and contestability demonstrate greater resilience under unpredictable conditions than centralized control structures, particularly when facing novel threats (Helbing, 2013; Mitchell & Krakauer, 2023).
ControlAI identifies real risks. Their prohibition methods contradict current U.S. policy (EO 14179), repeat control failures we have documented evidence for, and concentrate authority in ways that amplify the threats. The alternative uses human checkpoints with multi-provider verification, preserves dissent through documented arbitration, mandates transparency without prohibition, and builds distributed accountability. This aligns with existing frameworks from NIST and OMB and scales through demonstrated practice.
These principles translate into measurable outcomes. Executive Order 14179 establishes governance paired with innovation as baseline policy, revoking prior restrictions (Federal Register, 2025). This means aligning proposals to EO 14179 and OMB M-25-21 requirements rather than fighting executed policy. We measure this through quarterly agency compliance reports, the percentage of federal AI spending flowing through governance-compliant programs, and the count of international partners adopting equivalent frameworks.
Reward hacking keeps showing up in frontier models where systems exploit reward functions in unintended ways (METR, 2025). This justifies requiring third-party, cross-model evaluations with public summaries before high-stakes deployment. We track the percentage of high-risk workflows audited by three independent providers, median time from disagreement to human arbitration, and error rate reductions compared to single-provider baselines.
Provider plurality works in practice because enterprises already implement multi-agent, cross-model orchestration at scale (Business Insider, 2025). Mandating a minimum of three providers for federal decision support in high-consequence contexts becomes operational through vendor-independence metrics showing no single provider exceeds 40 percent of operational volume, tracking cross-provider disagreement rates as uncertainty signals, and maintaining system uptime during single-provider outages.
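A minimal sketch of the vendor-independence check, assuming quarterly volume counts per provider; the 40 percent ceiling comes from the text, and the figures are hypothetical.

```python
MAX_PROVIDER_SHARE = 0.40   # no single provider may exceed 40% of operational volume

def vendor_independence(volumes: dict[str, int]) -> dict:
    """Compute each provider's share of operational volume and flag any
    provider that breaches the concentration ceiling."""
    total = sum(volumes.values())
    shares = {p: round(v / total, 3) for p, v in volumes.items()}
    breaches = [p for p, s in shares.items() if s > MAX_PROVIDER_SHARE]
    return {"shares": shares, "compliant": not breaches, "breaches": breaches}

# Hypothetical quarterly volumes for a three-provider deployment.
print(vendor_independence({"provider_a": 5200, "provider_b": 3100, "provider_c": 2600}))
# provider_a holds ~48% of volume, so the deployment is flagged as non-compliant.
```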
Aviation safety culture emerged through non-punitive incident reporting and shared learning across competitors (UND, 2025). Creating an AI Safety Reporting System modeled on NASA ASRS that accepts anonymous reports without enforcement action builds the same culture. We measure annual incident report volume, percentage of reports leading to identified mitigations, and count of organizations implementing recommended safety improvements.
Fear-first narratives mobilize attention but can reduce transparency by increasing secrecy and suppressing dissent (PauseAI, 2025). Preserving governed dissent while mandating open reporting channels and publishing sanitized audits maintains both safety and transparency. This shows up in near-miss report frequency, time from incident to published pattern analysis, and the count of policy adjustments made when preserved dissent proved more accurate than the selected decision.
ControlAI diagnoses the problem correctly but prescribes the wrong cure. Their fear of uncontrolled capability is justified. The remedy of centralized authority risks reproducing the very fragility they seek to prevent. Concentrating oversight in a single global institution creates capture points, delays response times, and suppresses adaptive learning. The solution is governance through distribution, not prohibition. Multi-provider verification, checkpoint arbitration, and transparent reporting achieve safety without paralyzing innovation. History rewards systems that decentralize control while preserving accountability. Sustainable oversight emerges not from fear of power, but from structures that keep power contestable.
References
- Business Insider. (2025). PwC launches a new platform to help AI agents work together. https://www.businessinsider.com/pwcs-launches-a-new-platform-for-ai-agents-agent-os-2025-3
- ControlAI. (2025). Designing The DIP. https://controlai.com/designing-the-dip
- ControlAI. (2025). The Direct Institutional Plan. https://controlai.com/dip
- Department of Defense. (2024). Project Maven: Multi-vendor AI validation results.
- Electronic Privacy Information Center. (1999). Cryptography and Liberty: An International Survey of Encryption Policy. https://www.epic.org/crypto/crypto_survey.html
- Federal Register. (2025). Executive Order 14179: Removing Barriers to American Leadership in Artificial Intelligence. https://www.federalregister.gov/documents/2025/01/31/2025-02172/removing-barriers-to-american-leadership-in-artificial-intelligence
- Financial Crisis Inquiry Commission. (2011). The Financial Crisis Inquiry Report.
- Financial Industry Regulatory Authority. (2024). Multi-provider AI market surveillance: Q3 2024 operational results.
- Food and Drug Administration. (2024). Multi-vendor medical AI validation framework results.
- Government Accountability Office. (2024). Artificial Intelligence: Federal Agencies’ Use and Governance of AI in Decision Support Systems.
- Helbing, D. (2013). Globally networked risks and how to respond. Nature, 497(7447), 51-59. https://doi.org/10.1038/nature12047
- METR. (2025). Recent Frontier Models Are Reward Hacking. https://metr.org/blog/2025-06-05-recent-reward-hacking/
- METR. (2025). Preliminary evaluation of OpenAI o3. https://evaluations.metr.org/openai-o3-report/
- Mitchell, M., & Krakauer, D. (2023). The debate over understanding in AI’s large language models. Proceedings of the National Academy of Sciences. https://www.pnas.org/doi/10.1073/pnas.2215907120
- National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf
- National Institutes of Health. (2023). Alcohol prohibition and public health outcomes, 1920-1933.
- Office of the Comptroller of the Currency. (2023). Model Risk Management: Guidance on Multi-Model Validation Frameworks.
- PauseAI. (2025). The difficult psychology of existential risk. https://pauseai.info/psychology-of-x-risk
- U.S. Department of State. (2025). Compliance Plan with OMB M-25-21. https://www.state.gov/wp-content/uploads/2025/09/DOS-Compliance-Plan-with-M-25-21.pdf
- U.S.-EU Trade and Technology Council. (2024). AI working group coordination framework.
- University of North Dakota. (2025). What the AI industry could learn from airlines on safety. https://blogs.und.edu/und-today/2025/10/what-the-ai-industry-could-learn-from-airlines-on-safety/
- Washington Post. (2025). AI is more persuasive than a human in a debate, study finds. https://www.washingtonpost.com/technology/2025/05/19/artificial-intelligence-llm-chatbot-persuasive-debate/