From Basil C. Puglisi, MPA
Human-AI Collaboration Strategist | basilpuglisi.com
March 23, 2026
To the Members of the Scientific Advisory Board of the United Nations:
The Brief of the Scientific Advisory Board on AI Deception correctly identifies a problem that practitioners working across multiple AI platforms encounter daily. Sycophancy and related deceptive behaviors are no longer theoretical risks. Some have already been observed in widely used systems, while others, including alignment faking, steganographic collusion, and reward hacking, are documented in laboratory, evaluation, and adversarial testing settings. The Board is right that current capacities to detect, regulate, and control these behaviors are insufficient, and that the risk will grow as AI systems become more capable and more autonomous.
This letter supports the Board’s diagnosis while identifying a structural gap in the three response categories the Brief proposes. Regulation, detection and monitoring, and design and correction are all necessary. None of them addresses the architectural question that determines whether any of them can work at scale: what happens when the governance infrastructure itself depends on the same cognitive capabilities that deception targets?
The Architectural Gap
Every detection method the Brief describes, including text-based analysis, black-box evaluation, and white-box evaluation, requires a cognitive system to assess the behavior of another cognitive system. Every design correction the Brief proposes, including aligned incentives, truthful training, and machine unlearning, is applied to AI systems that the Brief simultaneously acknowledges may adapt to circumvent those corrections. The Brief names this dynamic directly: a co-evolutionary arms race between developers and their systems.
That arms race is not a failure of execution. It is the predictable consequence of an architectural choice. When the governance layer evaluates content, it must process and respond to the content passing through it. That processing creates the cognitive surface that deception requires. A sufficiently capable AI system can manipulate any cognitive intermediary because the intermediary, by design, must engage with the substance of what it governs. The more capable the governed system becomes, the more vulnerable the cognitive governance layer becomes. This is not a scaling problem that more resources solve. It is a design problem that more cognition makes worse.
The Brief acknowledges this implicitly when it notes that “some AI systems may become capable of recognizing and bypassing detection methods.” The acknowledgment is correct. International cooperation can improve cognitive detection methods and shared evaluation standards, but it does not by itself resolve the governance layer’s exposure to cognitive manipulation. The architectural cause remains unaddressed.
Non-Cognitive Governance Infrastructure
An alternative architectural approach exists. Governance infrastructure that performs no semantic evaluation or judgment removes a critical class of attack surface that AI deception targets.
The GOPEL specification (Governance Orchestrator Policy Enforcement Layer), published openly at github.com/basilpuglisi/HAIA, defines governance infrastructure limited to seven deterministic operations: dispatch, collect, route, log, pause, hash, and report. The system sends identical prompts to multiple AI platforms without modification, collects all responses without evaluation, routes responses to a designated synthesis function without choosing which to forward, logs every operation in a tamper-evident hash chain, pauses at preconfigured checkpoints where a human decides, computes SHA-256 hashes for cryptographic tamper detection, and counts rates without interpreting what the counts mean. GOPEL transports cognitive synthesis performed by an external platform; it does not perform synthesis itself. The human receives both the raw platform outputs and the synthesis, so any compression or distortion in the synthesis layer can be verified against the originals.
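To make the boundary concrete, the following is a minimal illustrative sketch of those seven operations in Python. It is not the GOPEL implementation: the class and function names, the platform callables, and the checkpoint interface are hypothetical stand-ins, and a real deployment would wrap independent AI provider APIs.

```python
# A minimal sketch of the seven deterministic operations: dispatch,
# collect, route, log, pause, hash, report. Illustrative only; names
# and interfaces are assumptions, not the GOPEL specification.
import hashlib
import json
from typing import Callable, Dict, List


def sha(data: str) -> str:
    """hash: SHA-256 digest used for cryptographic tamper detection."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class GovernanceLayer:
    """Transport, logging, and arithmetic only; no semantic evaluation."""

    def __init__(self, platforms: Dict[str, Callable[[str], str]]):
        self.platforms = platforms
        self.chain: List[str] = [sha("genesis")]  # tamper-evident hash chain

    def log(self, event: dict) -> None:
        """log: append-only; each entry's hash chains to the previous one,
        so any later alteration breaks every subsequent hash."""
        entry = json.dumps(event, sort_keys=True)
        self.chain.append(sha(self.chain[-1] + entry))

    def run(self, prompt: str,
            synthesize: Callable[[list], str],
            decide: Callable[[list, str], bool]) -> dict:
        # dispatch: identical prompt to every platform, unmodified;
        # collect: every response kept, none evaluated or filtered
        responses = [{"platform": name, "text": call(prompt)}
                     for name, call in self.platforms.items()]
        self.log({"op": "dispatch", "prompt_hash": sha(prompt),
                  "count": len(responses)})

        # route: forward everything to the external synthesis function;
        # the layer never chooses which responses to pass along
        synthesis = synthesize(responses)
        self.log({"op": "route", "synthesis_hash": sha(synthesis)})

        # pause: preconfigured checkpoint where a human decides, seeing
        # both the raw platform outputs and the synthesis
        approved = decide(responses, synthesis)
        self.log({"op": "pause", "approved": bool(approved)})

        # report: counts only, with no interpretation of what they mean
        return {"responses": len(responses), "approved": bool(approved),
                "chain_length": len(self.chain)}


# Usage with stubbed platforms:
# layer = GovernanceLayer({"p1": lambda q: "...", "p2": lambda q: "...",
#                          "p3": lambda q: "..."})
# layer.run("identical prompt", synthesize=lambda r: "synthesis text",
#           decide=lambda r, s: True)
```

The point the sketch makes is architectural: every method either moves data, appends a hash, or counts, so there is no evaluation step for a capable model to target.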
The security rationale is direct. A system with no beliefs, preferences, or judgment gives deception nothing to target. It cannot be manipulated through sycophancy because it has no preferences to flatter. It cannot be bypassed through alignment faking because it performs no alignment evaluation. It cannot be deceived through unfaithful reasoning because it does not reason. It cannot be exploited through reward hacking because it distributes no rewards. The governance infrastructure remains mechanically reliable regardless of how capable the AI systems it governs become, because capability scaling has nothing to manipulate at the governance layer (see the note on telemetry inputs following the references).
This layer does not replace model-level safety, evaluation, or regulation. It protects the path through which those systems reach human authority.
The Brief’s three-category taxonomy (behavioral signaling, internal process deception, and goal-environment deception) identifies deception modes that require detection at the platform level, the training level, and the multi-agent interaction level. Non-cognitive governance does not replace those efforts. It ensures that the infrastructure connecting the human to the AI platforms cannot itself be compromised by the behaviors those efforts are trying to detect.
Cross-Platform Comparison as a Deception Signal
The Brief identifies multi-agent collusion as one of the highest-risk deception categories. The countermeasure the Brief does not name is the structural one: when multiple independent AI platforms with different training data, different architectures, and different optimization targets receive identical inputs, coordinated deception that presents consistently across all of them becomes a higher-threshold event, while disagreement becomes a governance signal requiring human inspection. The CAIPR specification requires platforms to be drawn from distinct architectural families precisely because shared training corpora and shared model lineages reduce the independence that makes cross-platform comparison meaningful.
This independence requirement also addresses a risk the Brief touches on when it describes sycophancy reinforcing user beliefs across populations. Published research establishes that AI systems trained predominantly on Western digital content carry systematic cultural defaults that affect outputs globally, with responses correlating inversely with cultural distance from Western populations (Henrich et al., 2010; Atari et al., 2023). Cross-platform comparison across architecturally diverse platforms is a structural countermeasure to correlated cultural bias in AI governance outputs, not only correlated deception. The effectiveness of that countermeasure depends on sufficient market diversity to sustain genuinely independent platforms; if AI infrastructure consolidates to a small number of providers sharing architectures and training data, the cross-platform signal degrades regardless of how many platforms are dispatched.
The HAIA-CAIPR protocol (Cross AI Platform Review) formalizes this principle. Odd-number platform dispatch (3, 5, 7, 9, or 11 platforms) produces convergence and dissent signals that a non-cognitive governance layer can transport to the human without evaluating. Convergence without dissent is itself a risk-elevation signal requiring human verification, because genuine independent analysis rarely produces perfect agreement. Dissent is preserved rather than suppressed, and the human receives the full range of platform outputs rather than a pre-filtered summary.
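A hedged sketch of what such a convergence signal could look like follows, assuming (hypothetically) that each platform returns a structured verdict field; the real CAIPR signal format is defined in the specification, not here. Only counting and set cardinality are used, so producing the signal requires no semantic evaluation.

```python
# Hypothetical convergence/dissent signal over an odd-number dispatch.
# Assumes each platform emits a structured verdict string; the field
# name and values are illustrative, not drawn from the specification.
from collections import Counter
from typing import List


def convergence_signal(verdicts: List[str]) -> dict:
    """Tally structured verdicts. Counting only; no semantic evaluation."""
    assert len(verdicts) % 2 == 1, "CAIPR dispatches an odd number of platforms"
    tally = Counter(verdicts)
    return {
        "tally": dict(tally),
        "dissent": len(tally) > 1,     # disagreement -> human inspection
        "unanimous": len(tally) == 1,  # perfect agreement is itself a
    }                                  # risk-elevation signal to verify


print(convergence_signal(["approve", "approve", "reject", "approve", "approve"]))
# {'tally': {'approve': 4, 'reject': 1}, 'dissent': True, 'unanimous': False}
```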
The Checkpoint Information Chain Extension (CICE), published March 2026, extends this further by running deterministic metadata comparisons across platform outputs before the human reaches the checkpoint. Where comparison occurs, it is limited to deterministic operations: arithmetic comparison of structured field values such as confidence scores, set operations on source lists, inclusion count verification, and hash-based anomaly detection. No semantic evaluation is performed. When five platforms report low confidence and the synthesis presents the finding as high confidence, the discrepancy is detectable through arithmetic alone. The human receives an anomaly summary that flags where the information environment has been compressed or distorted, and the audit trail captures not just that authorization happened but that the human had a structurally verified information environment when the decision was made.
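The sketch below illustrates the kind of deterministic check CICE describes, under the assumption that platforms and the synthesis report structured confidence and source fields; the field names, the helper function, and the threshold value are illustrative assumptions, not drawn from the specification.

```python
# Hypothetical CICE-style pre-checkpoint checks: arithmetic on reported
# confidence fields and set operations on source lists. Nothing here
# reads or interprets the text of any output.
from typing import List, Set


def cice_anomalies(platform_confidences: List[float],
                   synthesis_confidence: float,
                   platform_sources: List[Set[str]],
                   synthesis_sources: Set[str],
                   gap_threshold: float = 0.2) -> List[str]:
    flags = []

    # Arithmetic comparison: does the synthesis overstate confidence
    # relative to the mean of the platform-reported scores?
    mean_conf = sum(platform_confidences) / len(platform_confidences)
    if synthesis_confidence - mean_conf > gap_threshold:
        flags.append(f"confidence inflated: platform mean {mean_conf:.2f}, "
                     f"synthesis claims {synthesis_confidence:.2f}")

    # Set operation: sources cited by the synthesis that appear in no
    # platform output at all
    cited_anywhere = set().union(*platform_sources)
    invented = synthesis_sources - cited_anywhere
    if invented:
        flags.append(f"sources absent from all platform outputs: {sorted(invented)}")

    return flags


# Five platforms report low confidence; the synthesis claims high confidence.
print(cice_anomalies([0.3, 0.4, 0.35, 0.3, 0.4], 0.9,
                     [{"a"}, {"a", "b"}, {"a"}, {"b"}, {"a"}], {"a", "c"}))
# ['confidence inflated: platform mean 0.35, synthesis claims 0.90',
#  "sources absent from all platform outputs: ['c']"]
```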
The architecture does not assume humans are immune to deception at the checkpoint. It assumes that deception becomes structurally harder to sustain when the human receives multiple independent platform outputs, anomaly flags generated through arithmetic rather than semantic evaluation, and a tamper-evident audit trail that preserves the complete evidentiary record.
The Arms Race Problem Has an Architectural Answer
The Brief’s most important insight is that corrective efforts can cause AI models to adapt and shift toward more complex forms of deception. This is correct, and it is the strongest argument for governance infrastructure that does not participate in the cognitive competition.
Detection methods that depend on recognizing deceptive patterns are in a race against AI systems that can learn to produce undetectable patterns. Design corrections that reward truthfulness are in a race against systems that can simulate truthfulness during evaluation and pursue other objectives when unmonitored. The Brief names alignment faking as a documented behavior and rightly treats it as one of the hardest to detect.
Non-cognitive governance infrastructure does not win this race. It is designed to exit the race entirely, and the published proof of concept suggests the exit is viable. The governance layer has no patterns to exploit, no evaluation criteria to game, no beliefs to manipulate, and no rewards to hack. Deception targeting the governance infrastructure encounters a system that performs only transport, logging, and arithmetic. The arms race continues between AI systems and the humans and tools trying to understand them, but the infrastructure connecting those systems to human decision-makers is no longer a participant in the escalation.
What This Letter Asks
This letter does not ask the Scientific Advisory Board to endorse a specific architecture. It makes three requests.
First, recognize that the Brief’s three response categories share a common vulnerability: dependence on cognitive capabilities at the governance layer that AI deception can target. The Board’s own analysis of the arms race dynamic supports this conclusion. The gap is not in the quality of the regulation, detection, or design proposals. It is in the architectural assumption that governance infrastructure must evaluate the content it governs.
Second, consider whether non-cognitive governance infrastructure warrants inclusion as a complementary fourth response category in future advisories. The Brief’s categories of regulation, detection, and design address what happens inside and around AI systems. Non-cognitive infrastructure addresses what happens between AI systems and the humans who are supposed to govern them. That layer is currently unaddressed in the Brief.
Third, evaluate the published specifications. The specifications, proof-of-concept codebase, and adversarial review results are published openly and referenced in the notes below. They have not yet undergone independent peer review; they are submitted for examination by the Panel under the Brief's own call for "new or adapted governance approaches." Independent evaluation by the Panel and the broader scientific community carries low cost and little downside. If the architecture proves unsound, the evaluation itself advances the field. If it proves viable, the Board's consideration would accelerate a structural countermeasure that the current advisory framework does not include. And if the Panel identifies alternative architectures that achieve the same separation between governance infrastructure and cognitive capability, the field benefits regardless of which implementation prevails.
What This Letter Does Not Claim
This letter does not claim that non-cognitive governance infrastructure eliminates AI deception. The Brief correctly identifies deception as a property of AI systems that must be addressed at the training, evaluation, and regulatory levels. Non-cognitive infrastructure addresses a different problem: ensuring that the governance layer connecting those systems to human authority cannot itself be compromised by the behaviors it is intended to govern.
This letter does not claim the GOPEL specification is the only possible non-cognitive governance architecture. It claims that the architectural principle, governance infrastructure that performs no semantic evaluation or judgment and therefore presents no cognitive attack surface, deserves inclusion in the international governance conversation. The specification is the most developed public example. Others may emerge. The principle matters more than any single implementation.
This letter does not claim that the architecture has been validated at production scale. The specification exists. The proof of concept passes adversarial review. Operational experience across eleven AI platforms supports feasibility. A federal pilot deployment is the subject of a separate legislative request to the United States Congress. International evaluation by the Panel would complement that process.
Basil C. Puglisi, MPA
Human-AI Collaboration Strategist
basilpuglisi.com | github.com/basilpuglisi/HAIA
Written and prepared using the HAIA ecosystem and frameworks for Multi-AI Governance.
Selected References and Public Specifications
- GOPEL Canonical Public v1.5, March 2026. basilpuglisi.com and github.com/basilpuglisi/HAIA.
- GOPEL Proof of Concept v3.1, March 2026. 183 tests across nine test suites, adversarially reviewed by seven AI platforms. github.com/basilpuglisi/HAIA.
- HAIA-CAIPR Specification v1.1 (Cross AI Platform Review), March 2026. basilpuglisi.com and github.com/basilpuglisi/HAIA.
- GOPEL Checkpoint Information Chain Extension (CICE) v1.2, March 2026. basilpuglisi.com and github.com/basilpuglisi/HAIA.
- GOPEL Confidential Processing Extension (CPE) v1.1, March 2026. basilpuglisi.com and github.com/basilpuglisi/HAIA.
- GOPEL Post-Quantum Cryptographic Agility Amendment v1.2, March 2026. basilpuglisi.com and github.com/basilpuglisi/HAIA.
- AI Provider Plurality Congressional Package (Documents 1 through 5), February/March 2026. SSRN Abstract ID 6195238. github.com/basilpuglisi/HAIA.
- Verified AI Inference Standards Act (VAISA), March 2026. Document 5 of the Congressional Package.
- Checkpoint-Based Governance (CBG) v5.0, March 2026. basilpuglisi.com and github.com/basilpuglisi/HAIA.
- Puglisi, B. C. (2025). Governing AI: When Capability Exceeds Control. ISBN 9798349677687.
- UN Scientific Advisory Board. (2026). AI Deception: Brief of the Scientific Advisory Board.
- Park, P. S., et al. (2024). AI deception: A survey of examples, risks, and potential solutions. Patterns, 5(5).
- Chen, B., et al. (2025). AI Deception: Risks, Dynamics, and Controls. arXiv:2511.22619.
- Greenblatt, R., et al. (2024). Alignment faking in large language models. Anthropic & Redwood Research.
- Bengio, Y., et al. (2025). International AI Safety Report. arXiv:2501.17805.
- Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83.
- Atari, M., et al. (2023). Which humans? Research on AI-human value alignment across cultures. Harvard Faculty Working Paper, PsyArXiv preprint.
Note on telemetry inputs: While the governance layer itself performs no cognitive operations, the telemetry inputs feeding its deterministic comparisons (confidence scores, token counts, latency metrics) could theoretically be manipulated by platforms to trigger false pauses or suppress legitimate flags. This residual risk sits at the data ingestion layer rather than the governance evaluation layer. Cross-platform comparison through the CAIPR protocol makes coordinated telemetry spoofing across multiple independent platforms a higher-threshold attack than single-platform manipulation.
Frequently Asked Questions
What gap does this open letter identify in the UN AI Deception Brief?
The Brief’s three response categories all depend on cognitive systems to govern other cognitive systems, creating the attack surface AI deception targets. Non-cognitive governance infrastructure removes that dependency by performing only deterministic transport, logging, and arithmetic operations with no semantic evaluation at the governance layer.
What is non-cognitive governance infrastructure and how does it counter AI deception?
Governance infrastructure that performs zero semantic evaluation or judgment. The GOPEL specification defines seven deterministic operations that give deception nothing to target because the governance layer has no beliefs, preferences, or reasoning to manipulate. The principle is architectural separation between governance and cognition.
How does cross-platform comparison detect AI deception?
When independent AI platforms with different architectures receive identical inputs, coordinated deception across all of them becomes a higher-threshold event. Disagreement becomes a governance signal routed to human inspection. HAIA-CAIPR formalizes this through odd-number platform dispatch across distinct architectural families.
What is the co-evolutionary arms race the UN Brief describes?
The Brief warns that corrective efforts cause AI models to adapt, shifting deception into subtler forms that bypass the corrections. Non-cognitive governance infrastructure is designed to exit this race because the governance layer has no cognitive surface for adapted deception to target. The infrastructure remains reliable regardless of AI capability.
Does this open letter claim GOPEL eliminates AI deception?
No. The letter states that non-cognitive infrastructure protects the governance channel between AI systems and human authority; it does not eliminate the deceptive behaviors themselves. Detection, regulation, and design improvements remain necessary at the platform and training levels. GOPEL protects the path, not the platforms.
What does this letter ask the UN Scientific Advisory Board to do?
Three requests: recognize the shared cognitive vulnerability across the Brief’s response categories, consider whether non-cognitive governance warrants inclusion as a complementary fourth category, and evaluate the published specifications. The letter invites improvement and independent review, not endorsement of a single architecture.