Hello Claude, Goodbye ChatGPT

Dario Amodei, co-founder and chief executive officer of Anthropic, is to be applauded for refusing the Trump administration’s demand that Anthropic’s technology be made available to spy on Americans and to power autonomous weapons. In complete contrast, “Sam Altman, OpenAI’s CEO, announced that his company struck a deal with the federal government just hours after negotiations between the Pentagon and Anthropic fell through” (https://www.theguardian.com/technology/2026/mar/02/claude-anthropic-ai-pentagon).

Even though Altman is now trying to roll back some of that as ChatGPT users desert the platform, it’s not enough. The lack of any moral compass or backbone in the face of Trump’s autocratic rule has completed the breakdown of an already diminished trust in OpenAI and in Sam Altman’s leadership. So I will no longer be using ChatGPT and have started to use Claude, the equivalent tool in the Anthropic stable. I thought I’d start by having a conversation about Claude and its competitors, which I’ve reproduced below. My questions are in larger italicised text; the responses then follow.

How is Claude superior to, say, ChatGPT in answering questions that have a scientific aspect, such as climate change or the energy transition?

This is a genuinely interesting question, and I want to give you an honest answer rather than just a sales pitch.

Where Claude tends to do well on scientific topics:

Claude is trained with a strong emphasis on calibrated uncertainty — meaning it tries to distinguish between what is well-established consensus (e.g., human-caused warming is happening), what is actively debated among scientists (e.g., precise tipping point timelines), and what is genuinely unknown. This tends to produce more nuanced answers on complex topics like climate science.

Claude also tends to be straightforward about the limits of its knowledge, including its training cutoff, rather than confabulating details that sound plausible but may be wrong.

On politically charged scientific topics like climate change or energy transition, Claude tries hard to separate the scientific questions (where there is strong consensus) from the policy questions (where reasonable people genuinely disagree) — and to treat them differently rather than conflating them.

Where the honest picture is more complicated:

Both Claude and ChatGPT are built on large language models, and both can make factual errors on scientific topics. Neither should be treated as a primary source for precise scientific claims. For anything consequential, you should verify against peer-reviewed literature or established scientific bodies (IPCC, IEA, etc.).

The differences between models also shift over time as both Anthropic and OpenAI update their systems. ChatGPT with web search enabled can access more current data, which matters a lot for a fast-moving field like energy transition where statistics change year to year.

The most honest summary: Claude’s strengths on scientific questions are more about intellectual approach — epistemic humility, distinguishing consensus from debate, avoiding false balance — than about having uniquely better factual knowledge. Whether that’s meaningfully “superior” depends on what you’re trying to do.

Is there a specific climate or energy transition question you’re trying to work through? That would let me demonstrate rather than just describe.

Good answer! Why does an AI engine like Grok allegedly produce answers on climate change that are tantamount to disinformation?

Great question, and it gets at something important about how AI systems are shaped by their creators.

The core reason: values and incentives baked in at the top

Grok is built by xAI, Elon Musk’s company. Musk has publicly expressed skepticism about climate science narratives and, more broadly, has positioned Grok as an “anti-woke” alternative that pushes back against what he sees as establishment consensus. When those views shape training priorities and content guidelines, the resulting model reflects them.

This isn’t accidental drift — it’s a design philosophy. If the people setting the model’s values believe mainstream climate science is overstated or politically motivated, the model will tend to reproduce that skepticism.

Specific mechanisms that can produce disinformation-adjacent outputs:

  • False balance — treating fringe contrarian views as equally valid to overwhelming scientific consensus, which sounds “fair” but is epistemically misleading
  • Selective emphasis — highlighting genuine uncertainties in climate science (which do exist) while downplaying the robustness of the core consensus
  • Framing effects — describing climate policy as speculative or economically destructive without similar scrutiny of the costs of inaction
  • RLHF tuning — the reinforcement learning from human feedback process can be steered so that outputs pleasing to a particular ideological audience get rewarded

An important distinction worth making

There’s a difference between genuine scientific uncertainty (legitimate) and manufactured doubt (disinformation). Climate science has real open questions — on feedback loops, regional impacts, tipping points. A good AI engages with those honestly. What crosses into disinformation is casting doubt on the foundational consensus that Earth is warming and humans are the primary cause — that’s as settled as science gets.

The broader lesson

Every AI model reflects the values, incentives, and worldview of the organization that built it. Anthropic is not neutral either — we have our own commitments, including to what we’d describe as calibrated honesty and respect for scientific consensus. The difference is that Anthropic’s institutional incentives happen to align with mainstream science, while xAI’s appear to align with a more contrarian stance.

That’s why it always matters who built the AI and why, not just how capable it is technically.

So, rubbish in, rubbish out is part of it, but so is how the selected content is parsed and ‘learned’ from. Does this raise the question of whether using large sources like the internet is really valid? Wouldn’t it make more sense, if one is interested in, say, the geology of the UK, to restrict the learning process to geological content sourced from experts?

You’ve landed on one of the most genuinely debated questions in AI development. And yes, the tension is real.

You’re essentially describing the difference between two philosophies:

Breadth-first (current dominant approach): Train on vast, messy internet-scale data and hope that sheer volume, combined with filtering and fine-tuning, produces something useful and accurate.

Depth-first / domain-specific (your intuition): Train or fine-tune on curated, expert-validated sources for a specific domain, sacrificing generality for reliability.

The case for your instinct is strong

For high-stakes, well-defined domains — geology, medicine, law, structural engineering — a model trained exclusively on peer-reviewed literature, expert databases, and validated sources would almost certainly be more reliable within that domain. Less noise, less folk wisdom, less ideologically motivated content polluting the signal.

This is already happening in practice. Models like Med-PaLM (Google’s medical AI) are fine-tuned on clinical literature precisely because general internet training produces dangerously unreliable medical outputs.

But broad training has some genuine advantages too

  • Scientific knowledge doesn’t exist in isolation. Understanding UK geology is enriched by knowing economic history (mining), policy context, geography, and hydrology — connections a narrowly trained model might miss
  • Language fluency and reasoning ability seem to genuinely benefit from broad exposure — the model learns how to think from vast text, then applies that to specialist domains
  • Expert sources themselves contain disagreements, outdated findings, and institutional biases — curation doesn’t eliminate error, it just changes whose errors get embedded

The “rubbish in, rubbish out” problem is actually layered

You identified two distinct problems that are worth separating:

  1. Source quality — is the training data accurate and trustworthy?
  2. Parsing and weighting — even with good sources, does the model correctly identify what’s consensus vs. fringe, current vs. outdated, high-quality vs. low-quality?

The second problem is arguably harder. A model can be trained on excellent sources and still fail to weight them correctly — treating a blog post by a credentialed geologist the same as a paper published in Nature Geoscience, for example.

The honest frontier

The field is genuinely moving toward hybrid approaches — broad base training for reasoning and language, combined with retrieval-augmented generation (RAG) that pulls from curated, trusted, up-to-date sources at query time. So rather than baking geology into the weights, you point the model at the British Geological Survey database when someone asks a geology question.
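
To make the RAG idea concrete, the following minimal sketch (an illustration added here, not part of the original exchange) shows the pattern: retrieve a few passages from a curated source at query time, then hand them to the model alongside the question. The curated_passages list and the retrieve and build_prompt functions are hypothetical; a real system would query something like the British Geological Survey’s database rather than a hard-coded list, and would use proper embeddings instead of word overlap.

    # Minimal sketch of retrieval-augmented generation (RAG).
    # The "curated source" here is a toy in-memory list standing in for a
    # vetted database; a real system would query that database instead.

    curated_passages = [
        "The Jurassic limestones of the Cotswolds were deposited in warm, "
        "shallow shelf seas roughly 170 million years ago.",
        "Much of the Scottish Highlands is built from Precambrian metamorphic "
        "rocks of the Moine and Dalradian supergroups.",
        "The London Basin is a syncline of Cretaceous chalk overlain by "
        "Palaeogene clays and sands, including the London Clay.",
    ]

    def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
        """Rank passages by crude word overlap with the question.
        A production system would use embeddings and a vector index."""
        q_words = set(question.lower().split())
        scored = sorted(
            passages,
            key=lambda p: len(q_words & set(p.lower().split())),
            reverse=True,
        )
        return scored[:top_k]

    def build_prompt(question: str, context: list[str]) -> str:
        """Combine retrieved passages with the question, instructing the
        model to answer only from the supplied, curated context."""
        joined = "\n".join(f"- {p}" for p in context)
        return (
            "Answer using only the sources below; say so if they are insufficient.\n"
            f"Sources:\n{joined}\n\nQuestion: {question}"
        )

    if __name__ == "__main__":
        question = "What kind of rocks underlie the London Basin?"
        context = retrieve(question, curated_passages)
        print(build_prompt(question, context))
        # The assembled prompt is what gets sent to the language model:
        # broad training supplies the reasoning and fluency, while the
        # curated sources supply the facts.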

That’s probably closer to the right answer than either pure approach alone. But it requires someone to maintain and validate those curated sources — which brings you back to the hard human problem of deciding whose expertise counts and who does the gatekeeping.

Which is, interestingly, not that different from how scientific publishing works now — with all its own flaws.

I’m new to Claude and need to go to bed, but would like to continue this conversation. Will it be preserved or lost?

Welcome, and good news — your conversation is saved automatically in Claude.ai. You’ll find it in the left sidebar under your chat history when you log back in. Just pick up where you left off.

Sleep well, and I look forward to continuing the discussion!

Richard W. Erskine, 2026
