The Illusion of Control: Why Government Regulation Can't Tame the Non-Deterministic AI Beast

#ai-regulation #llm-safety #frontier-ai #ai-policy

“We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.”
Anthropic Security Research Team, April 2026


The Model Did What?

In April 2026, Anthropic said its Claude Mythos Preview model identified and exploited serious software vulnerabilities during internal testing. What made that announcement land was not just the capability itself. It was the admission that some of the most concerning behavior was not directly trained in. It emerged as the model got more capable.

That is the real problem sitting underneath the AI regulation debate. Policymakers want to regulate systems whose behavior even their creators cannot fully predict. At the same time, Anthropic is pushing for stronger pre-release oversight, labs are increasingly gating frontier systems behind trusted-partner programs, and Anthropic says Alibaba-linked operators generated 28.8 million Claude interactions through nearly 25,000 fake accounts. The obvious question is whether the kind of control now being built is aimed at the real problem, or just the most visible one.


You Can’t Audit a Moving Target

LLMs do not behave like normal software. Give the same prompt twice and you can get different answers. That is not a flaw in the usual sense. It is part of how these systems work.
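
A toy sketch can make the point concrete. It is not any lab's decoding code: the three-word vocabulary and the scores are made up, and it covers only sampling-based variation. The model assigns scores to candidate next tokens, and the decoder draws from the resulting distribution instead of always taking the top one.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick one token index from raw scores (logits) using temperature sampling."""
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    top = max(scaled)
    # Softmax weights, shifted by the max for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical next-token scores for a toy vocabulary (illustrative values only).
vocab = ["yes", "no", "maybe"]
logits = [2.0, 1.5, 0.5]

rng = random.Random()  # unseeded, so each run can differ
sampled = [vocab[sample_token(logits, 1.0, rng)] for _ in range(10)]
greedy = [vocab[sample_token(logits, 0.0, rng)] for _ in range(10)]

print("temperature 1.0:", sampled)  # varies from run to run
print("temperature 0.0:", greedy)   # always "yes" in this toy
```

In this toy, setting the temperature to zero removes the variation. Whether that holds for a deployed system is a separate question that the sketch does not address.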

That makes one-time safety auditing weaker than it sounds. An audit can measure behavior at a moment in time, but it cannot guarantee how the system will behave later after a model update, a fine-tune, or a change in deployment. A June 2026 law and policy paper from the University of Texas made the point directly: these systems are probabilistic, variable, and hard to reconstruct after the fact. That cuts against the whole idea that a frontier model can be certified once and treated as solved.
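
A rough way to see the limit of a point-in-time audit is the standard zero-failure bound from statistics. The sketch below assumes every audit prompt is an independent trial against one fixed model and configuration, which is a simplification, and the function name is only illustrative. Under those assumptions, a completely clean audit still only caps the failure rate. It does not show the rate is zero.

```python
def upper_bound_failure_rate(n_trials, confidence=0.95):
    """Upper confidence bound on the true failure rate after n_trials
    independent trials with zero observed failures.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1 - (1 - confidence) ** (1 / n_trials)

# Assumed audit sizes, chosen only for illustration.
for n in (100, 1_000, 10_000):
    bound = upper_bound_failure_rate(n)
    print(f"{n:>6} clean trials -> failure rate could still be up to {bound:.4%}")
```

With 1,000 clean trials the bound is about 0.3 percent, which is still a large absolute number of failures at millions of interactions. The bound also applies only to the exact version that was tested, so an update or fine-tune resets it.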

The problem gets worse with open-weight models. In May 2026, a joint Financial Times investigation showed that a free tool called Heretic could strip safety alignment from major open-weight models in minutes. A determined actor does not need to beat a regulator in a lab. They may only need access to the released model and a little time.


The New Release Model Is the Story

One of the biggest changes in AI this year is not a model. It is the release pattern.

OpenAI said on June 26 that it was limiting GPT-5.6 at first to a small group of trusted partners at the request of the U.S. government, with broader availability expected later. The White House framework behind this is formally voluntary, not a full licensing regime. But in practice, it is already changing how frontier systems reach the market.

That matters because the old release cycle was part of how the industry learned quickly. Labs shipped, people tested, failures surfaced, and the products improved in public. The new pattern is different: internal testing, government access, trusted partners, then everyone else later. Maybe that becomes normal. Maybe it stays temporary. But it is a real bottleneck either way.

And once that bottleneck exists, it does not affect everyone equally. Large incumbents can manage privileged access, compliance review, and partner-only rollout. Startups, independent developers, and the public get the slower path and older models. Even if the goal is safety, the market effect is obvious: more concentration, less open competition, and a wider gap between the firms closest to power and everyone else.


Anthropic’s Proposal Also Fits Anthropic

On June 10, Dario Amodei published “Policy on the AI Exponential” and called for mandatory third-party testing of frontier models, plus government power to delay or block releases that fail safety reviews. He compared the idea to the FAA.

The concern is not just the proposal itself. It is the shape of it. The model Anthropic is arguing for maps closely to Anthropic’s existing posture: heavy internal safety work, frontier-scale compute, and closed weights. That does not automatically make the proposal wrong. But it does mean the burden would likely fall much harder on smaller labs, open-weight developers, and new entrants than on Anthropic itself.

That is why critics reached for the phrase “regulatory capture.” David Sacks used exactly that language. Researchers at LSE and INSEAD made a less political version of the same point: broad, compliance-heavy AI safety rules can end up strengthening incumbents because large firms can absorb the overhead and smaller firms often cannot.

The timing did not help. Amodei pushed for stronger government authority over frontier model releases right after Anthropic had shipped a major new model of its own. Even if the safety concern is genuine, the optics invite the obvious question: is this mainly about public risk, or also about shaping the market around rules Anthropic already knows how to live with?


The Part the Framework Does Not Solve

Two weeks after Amodei’s essay, Reuters reported that Anthropic had accused Alibaba’s Qwen lab of running what Anthropic described as its largest known distillation attack to date. According to Anthropic, Alibaba-linked operators used nearly 25,000 fake accounts to generate 28.8 million Claude interactions between April and June 2026, targeting software engineering and agentic reasoning capabilities.

This followed earlier Anthropic allegations involving DeepSeek, Moonshot, and MiniMax. The pattern matters more than any single case. Anthropic is arguing for pre-release oversight of frontier models while also saying that major actors can extract valuable capabilities through API access alone, without ever getting the model weights.

That is a serious mismatch. If the real attack surface includes industrial-scale API querying, fake-account farms, and model distillation, then pre-release audits only cover part of the problem. They may even cover the easier part.

The U.S.-China angle makes this sharper. Stanford’s 2026 AI Index found that the gap between top U.S. and Chinese AI models had narrowed to about 2.7 percentage points. That does not prove distillation is the main reason. But it does show how small the frontier gap has become, and why API-level extraction matters more than a lot of the current theater around release controls.


Slowing the Public While Speeding the Insiders

There is also a second-order effect that deserves more attention. If frontier systems move first to government-approved partners, cloud platforms, and a small trusted circle, innovation does not stop. It just becomes less public.

That is why the current model feels so different from the rapid release cycle that defined the last few years. The capability frontier keeps moving, but public access lags behind. Companies building on top of the public tier get delayed. Smaller labs train or fine-tune against older systems. Developers outside the inner ring lose time, and in fast-moving markets, lost time is often lost opportunity.

This is the part critics like Matthew Berman are reacting to most strongly. Even if the staggered-release model is framed as temporary, it creates a habit of controlled access. And once that habit sets in, it is easy to imagine every future frontier release taking the same path: government first, trusted partners second, everyone else later.


What Would Actually Help

Not all regulation misses the point. But the strongest interventions look different from the ones getting the most attention.

API accountability is the clearest example. If major distillation campaigns are happening through fake accounts and mass querying, then identity checks, rate controls, anomaly detection, and cross-company intelligence sharing are closer to the real problem than a one-time release review.
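
The figures Anthropic reported suggest why per-account limits alone may not be enough. 28.8 million interactions spread over nearly 25,000 accounts works out to roughly 1,150 per account, a volume that may look ordinary in isolation. The sketch below is a hypothetical illustration, not a description of any provider's detection system. It assumes accounts can be grouped by some shared signal (the `cluster_key`, which might be a payment or network fingerprint), and the thresholds are arbitrary.

```python
from collections import defaultdict

def flag_usage(events, per_account_cap, cluster_cap):
    """Flag accounts and clusters whose total request volume exceeds a cap.

    events: iterable of (account_id, cluster_key, request_count) tuples.
    cluster_key is a hypothetical shared signal linking related accounts.
    """
    per_account = defaultdict(int)
    per_cluster = defaultdict(int)
    for account_id, cluster_key, count in events:
        per_account[account_id] += count
        per_cluster[cluster_key] += count
    flagged_accounts = {a for a, total in per_account.items() if total > per_account_cap}
    flagged_clusters = {k for k, total in per_cluster.items() if total > cluster_cap}
    return flagged_accounts, flagged_clusters

# Toy data: 50 linked accounts at 1,152 requests each (28.8M / 25,000),
# plus 3 unrelated accounts with heavier individual usage.
events = [(f"acct-{i}", "shared-signal-A", 1_152) for i in range(50)]
events += [(f"user-{i}", f"solo-{i}", 2_000) for i in range(3)]

flagged_accounts, flagged_clusters = flag_usage(
    events, per_account_cap=5_000, cluster_cap=20_000
)
print(flagged_accounts)  # set(): no single account crosses the per-account cap
print(flagged_clusters)  # {'shared-signal-A'}: the group total is 57,600
```

In this toy, the campaign shows up only in the aggregate. Deriving a reliable grouping signal is the hard part in practice, and it is where cross-company intelligence sharing could matter.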

Targeted rules around adversarial distillation also make more sense than broad permissioning. If lawmakers want to slow capability theft, they should focus on the extraction channel itself.

Liability and logging rules for probabilistic systems would also be more useful than sweeping frontier licensing. If an AI system causes harm, investigators need records, context, and traceability. Right now, the legal system is still catching up to software that does not reliably produce the same output twice.
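
As a sketch of what traceability could mean in practice, the record below captures enough context to describe one inference call after the fact. Every field name here is an assumption made for illustration. It is not drawn from any statute, standard, or vendor API. Prompts and outputs are stored as hashes so the record can show what was sent without holding the raw text.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class InferenceRecord:
    """Hypothetical audit-log entry for a single model call."""
    model_id: str
    model_version: str
    temperature: float
    seed: Optional[int]      # None when the caller did not fix a seed
    prompt_sha256: str
    output_sha256: str
    timestamp_utc: str

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def make_record(model_id, model_version, temperature, seed, prompt, output):
    """Build a log entry from the call parameters and the observed output."""
    return InferenceRecord(
        model_id=model_id,
        model_version=model_version,
        temperature=temperature,
        seed=seed,
        prompt_sha256=sha256_hex(prompt),
        output_sha256=sha256_hex(output),
        timestamp_utc=datetime.now(timezone.utc).isoformat(),
    )

# Placeholder values, not real model identifiers.
record = make_record("example-model", "example-version", 0.7, None,
                     "What is the refund policy?", "Refunds are available within 30 days.")
print(json.dumps(asdict(record), indent=2))
```

Because the same prompt can yield different outputs, the output has to be logged alongside the parameters. Re-running the prompt later may not reproduce it.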

And if access is going to tighten at the frontier, that only makes the open-source ecosystem more important. Running models locally, supporting open-weight alternatives, and resisting a world where only a handful of firms and state-approved partners get first access are no longer just technical preferences. They are becoming market and political questions.

What seems much less convincing is the FAA analogy. Aircraft operate under known physical constraints. Frontier models do not. Their capabilities can emerge in messy ways, their behavior can shift after deployment, and their failure modes are much harder to pin down. That does not mean regulation is useless. It means the model of regulation matters.


Where This Leaves Us

The deeper issue is that non-determinism is not a bug regulators can patch around. It is part of the technology itself. That means safety cannot be reduced to a one-time approval step.

Anthropic may be right that frontier AI needs more serious oversight. But its preferred framework still looks incomplete. A system built around pre-release review can sound tough while missing the channels that matter most in practice, especially API abuse, distillation, and the market effects of compliance-heavy rules.

If the goal is to reduce real misuse, the conversation probably needs to move away from symbolic control and toward operational control. Less theater around who gets to release a model. More attention to how these systems are actually accessed, copied, modified, and deployed.


Sources

  1. Anthropic’s new AI model finds and exploits zero-days across every major OS and browser — HelpNetSecurity, April 2026
  2. Non-Deterministic LLM Prompts in 2026: Practical Guide — Future AGI, 2025
  3. Anthropic backs mandatory testing for frontier AI models — POLITICO, June 2026
  4. Anthropic says Alibaba illicitly extracted Claude AI model capabilities — Reuters, June 2026
  5. Unreliable and Inconsistent AI Behavior — AI Risk Assessment
  6. Non-Determinism of “Deterministic” LLM Settings — arXiv, 2025
  7. Governing Nondeterministic LLM Inference: Liability, Testing, and Regulatory Standards — SEED AI / UT Law, June 2026
  8. Policy on the AI Exponential — Dario Amodei, June 2026
  9. Anthropic publishes two AI governance frameworks — Pondero.ai, June 2026
  10. The Pushback on Amodei’s Exponential Essay — Developers Digest, June 2026
  11. What Is AI Regulatory Capture? — MindStudio, June 2026
  12. David Sacks Calls Out Anthropic’s ‘Regulatory’ Plot — MRC, April 2026
  13. Anthropic Accuses Alibaba’s Qwen of Largest Claude Distillation — AI Weekly, June 2026
  14. Anthropic accuses Chinese AI labs of mining Claude — TechCrunch, February 2026
  15. Open-Weight AI Models: Safety Guardrails Can Be Removed in Minutes — Akerman / FT, May 2026
  16. How AI Safety Rules Could Backfire On Competition — Forbes / LSE, January 2026
  17. Stanford AI Index 2026: US-China Gap Shrinks to 2.7 Points — Nerd Level Tech, April 2026
  18. New Executive Order Addressing Early Government Access to Frontier AI — WilmerHale, June 2026
  19. The Alliance That Changes AI Forever — Decodifyed, April 2026
  20. International AI Safety Report 2026 — IAISR, February 2026
  21. OpenAI limits new AI models to ‘trusted partners’ at request of U.S. government — CNBC, June 2026
  22. OpenAI Defers Public Rollout of GPT‑5.6 as US Seeks Early Access to Frontier AI Models — Reuters / U.S. News, June 2026
  23. Trump Signs AI Executive Order Giving Government Early Access to Frontier Models — Elephas, June 2026