Stop Letting AI Companies Grade Their Own Safety Homework

🤖 This article was AI-generated. Sources listed below.

Topic	Summary
Core Problem	AI companies evaluate their own models' safety — a clear conflict of interest with no independent verification.
Why It Matters	Companies that self-report control the methodology, benchmarks, and disclosure timelines, which allows cherry-picked results.
What's Needed	Standardized frameworks, mandatory pre-deployment third-party access, transparent disclosure, and real consequences.
Current Progress	The EU AI Act and NIST's AI Safety Institute are steps forward, but independent evaluation remains the exception.
Bottom Line	Until someone other than the builder checks the work, safety reports are theater, not transparency.

Every few weeks, a major AI lab drops a glossy "safety report" alongside its latest model release. The document is full of charts, red-team results, and reassuring language about "alignment research." And every few weeks, the tech press dutifully summarizes it, the stock price holds steady, and the world moves on.

Here's the problem: the company that built the model is the same company telling you it's safe. That's not oversight. That's a press release with footnotes.

The Fox Guarding the Henhouse — With a PhD

Let's be specific. When OpenAI released GPT-4 in 2023, it published a "system card" detailing its internal safety testing [¹]. When Anthropic releases Claude models, it publishes its own Responsible Scaling Policy assessments [²]. Google DeepMind has likewise published its own approach to technical AI safety [³]. These are serious documents written by serious people.

But they all share a fatal flaw: the entity with the most financial incentive to ship the product is also the entity deciding whether the product is safe to ship.

Imagine if Boeing self-certified its own aircraft design, concealing a critical safety system from regulators to avoid extra scrutiny. Oh wait — that basically happened, and people died [⁴]. Imagine if pharmaceutical companies ran their own clinical trials with no FDA review. Oh wait — they tried that too, and we built an entire regulatory infrastructure to stop it.

AI safety in 2026 is where aviation safety was in the 1920s: largely a gentlemen's agreement among companies that they'll do the right thing. There are almost no enforceable standards, and independent verification is rare.

“Who should decide where to draw the line and how to weigh the pros and cons? CEOs of companies or democratically chosen governments? The answer should be obvious if you believe in democracy.” — Yoshua Bengio, co-founder of Mila and 2018 Turing Award laureate [⁵]

The Numbers Don't Lie — But They Can Be Cherry-Picked

Here's what makes self-evaluation so insidious: it's not that these companies are lying. It's that they control the methodology, the benchmarks, the threat models, and the disclosure timeline.

Research examining best practices for measuring broader impacts of generative AI has highlighted significant gaps in how companies self-report safety evaluations, including inconsistencies in methodology and a tendency to underrepresent failure modes [⁶]. In practice, models that passed internal red-teaming have been subsequently jailbroken within days of public release — sometimes within hours.

Key issues with self-evaluation:

🎯 Benchmark selection bias — Companies choose which tests to run and which results to highlight. A model might ace a toxicity benchmark while failing catastrophically on a deception evaluation that never makes it into the report.
⏰ Timing games — Safety cards drop alongside product launches, not months before. There's no cooling-off period for independent review.
🔒 Access asymmetry — External researchers often can't replicate safety evaluations because they don't have access to model weights, training data, or internal tooling.
📊 Moving goalposts — Each company defines "safe" differently. There is no equivalent of the FDA's Phase I/II/III trial structure for AI models.
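
To make the benchmark selection point concrete, here is a toy simulation in Python. It is a sketch under made-up assumptions, not data from any real lab: a hypothetical model scores a random value between 0 and 1 on each of 20 benchmarks, and the report highlights only the 5 best results. The function name `simulate_reporting` and every parameter are invented for illustration.

```python
import random


def simulate_reporting(num_benchmarks=20, num_reported=5, trials=10_000, seed=0):
    """Compare the average over all benchmarks with the average over
    only the best-looking subset (hypothetical scores, not real data)."""
    rng = random.Random(seed)
    full_total = 0.0
    picked_total = 0.0
    for _ in range(trials):
        # Assumption: each benchmark score is uniform on [0, 1].
        scores = [rng.random() for _ in range(num_benchmarks)]
        full_total += sum(scores) / num_benchmarks
        # Cherry-picking: keep only the highest-scoring benchmarks.
        best = sorted(scores, reverse=True)[:num_reported]
        picked_total += sum(best) / num_reported
    return full_total / trials, picked_total / trials


full, picked = simulate_reporting()
print(f"all benchmarks: {full:.2f}, highlighted subset: {picked:.2f}")
```

Under these assumptions the full average lands near 0.5, while the highlighted subset averages around 0.86, even though nothing about the model changed. No individual number in such a report would be false; the selection does all the work.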

"But Independent Auditing Would Slow Down Innovation!"

This is the counterargument you'll hear from every AI executive who's ever sat on a panel at Davos, and honestly? It's not entirely wrong.

Independent auditing would add friction. It would cost money. And in a world where the U.S. and China are locked in an AI competition, some argue that any regulatory friction hands an advantage to Beijing [⁷].

I take this concern seriously. I don't want a bureaucratic nightmare that turns model deployment into a five-year process. Nobody does.

But here's the thing: we already have working precedents for fast, independent technical auditing. The cybersecurity industry has penetration testing firms. The financial sector has independent auditors who operate on compressed timelines. The automotive industry has crash-testing organizations like Euro NCAP that evaluate vehicles without slowing production lines to a crawl.

“If your answer is: Oh, it’s too big to do that, then that means you just shouldn’t do it. Where else can you do that? Can you sell me a food item at a restaurant and be like: Eat this food, I don’t know what it’s made of… You can’t do that, not allowed.” — Timnit Gebru, Founder and Executive Director of DAIR, Tech Won’t Save Us podcast, Episode 151 [⁸]

What we need isn't a Department of AI that takes 18 months to approve a chatbot update. What we need is:

Standardized evaluation frameworks — agreed-upon benchmarks that every frontier model must pass, developed by independent bodies, not the companies themselves.
Mandatory pre-deployment access — qualified third-party auditors get model access before public release, not after.
Transparent disclosure requirements — not voluntary system cards, but legally required safety documentation with standardized formats.
Consequences for failure — right now, if a model causes harm, the worst-case scenario for the company is bad press. That's not accountability.

Some Progress, But Not Enough

To be fair, the landscape isn't completely barren. The EU AI Act, which has been rolling out in phases, does mandate some independent conformity assessments for high-risk AI systems [⁹]. The U.S. AI Safety Institute at NIST (renamed the Center for AI Standards and Innovation in 2025) has been working on evaluation standards, though its funding and authority remain limited as of early 2026 [¹⁰]. And organizations like METR (formerly ARC Evals) have been doing genuinely independent evaluations of frontier models [¹¹].

But these efforts are still the exception, not the rule. The vast majority of AI models deployed in the world today — including the ones making decisions about your loan application, your medical diagnosis, and your social media feed — have never been independently evaluated by anyone.

That should terrify you.

My Stance, Plain and Simple

I'm not anti-AI. I'm not anti-business. I write about this industry every day because I believe it's building some of the most transformative technology in human history.

But transformative cuts both ways. And the idea that we should trust companies spending billions of dollars in a winner-take-all race to honestly assess whether their own products are dangerous is, frankly, adorable in its naivety.

We don't let banks audit their own books. We don't let pilots certify their own aircraft. Why on earth are we letting AI companies grade their own safety homework?

The next time a frontier lab drops a system card and the internet applauds their "transparency," ask yourself one question: Who checked their work?

If the answer is "they did," that's not transparency.

That's theater.