The Best Analogy for Why Your Context Window Matters

The instinct is to give Claude everything. But there's a point where more context makes the answers worse, not better. Here's the mental model that explains why.

Written by Alex Hillman
Collaboratively edited with JFDIBot

This week, Thariq (@trq212) from the Anthropic team published a detailed breakdown of session management: when to compact, when to rewind, when to start fresh.

This kind of context management has always mattered. But with Claude Code’s new 1M token context window (up from 200k), it’s become even more critical - and a lot more people are feeling the effects without understanding why.

It’s broken AND you’re holding it wrong

There have been a lot of frustrated Claude Code users over the past 30 days. Real complaints, real degraded performance. And Anthropic’s response - to the extent there was one - landed as “you’re holding it wrong,” which burned a lot of goodwill fast.

Here’s the thing: both are true.

In my own experience, resetting to the default settings - specifically switching thinking from medium back to high - cleared up most of the performance problems I was seeing. The issues that remained went away once I set the context window back to 200k instead of letting it run at the full 1M.

I think it’s clear that most people don’t actually know why more context isn’t always better.

And from a product perspective, they shouldn’t have to know.

But we aren’t there yet. It’s still early days. And even once this becomes a solved problem at the product level, I firmly believe that people who understand how this works will have a leg up when it comes to getting the best professional results from their AI tools.

For now, this isn’t a feature problem

Thariq’s post is a solid tactical guide. It tells you what to do and when to do it.

What it doesn’t do is explain why any of it matters - at least not in a way that lands if you’re not already fluent in how these models work.

I’ve read a lot of posts like it, and watched a lot of technical videos.

I always came away with a rough sense: “yeah, context fills up, things get worse.”

But I never really felt like I understood it until I came across a video by Theo (t3dotgg).

The best analogies were buried inside a longer, very technical video - which is fine; that’s what Theo does and the audience he makes it for.

But they finally made the mechanics click for me in a way nothing else had.

So I wanted to share them.

Analogy 1: The driver

AI models work by predicting what word comes next, one small piece at a time. Each piece is called a token — roughly a word or a few characters.
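
If it helps to see that in code, here’s a rough sketch. The ~4-characters-per-token figure is just a common rule of thumb for English text, not Claude’s actual tokenizer, and the example prompt is made up:

```python
# Rough illustration only: real tokenizers split text into subword pieces,
# but ~4 characters per token is a decent rule of thumb for English prose.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

prompt = "Refactor the payment handler so retries are idempotent."
print(estimate_tokens(prompt))  # ~13 tokens by this heuristic
```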

As Theo explained it: “Each of these tokens is effectively creating a path — these are directions that you’re giving the model to drive somewhere. This is like saying drive 10ft forward, then drive 10ft forward, then drive ten feet forward over and over again. When it’s broken up this way, what you really want to do is say drive until you hit the class.”

The “class” he’s referring to is a class definition in the code - a landmark the model can aim for - but the point holds anywhere: fewer, more meaningful instructions get you to the destination faster, with less room for error along the way.

The same logic applies to your context window. The more the model has to navigate - whether from inefficient tokenization or a conversation stuffed with irrelevant history - the more likely it is to miscalculate before it gets to what you actually asked.
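
Here’s a toy version of that contrast - hypothetical prompts and the same rough token estimate as above, nothing Claude-specific:

```python
# Many micro-instructions vs. one goal-level instruction (toy example).
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough rule of thumb, not a real tokenizer

micro_steps = " then ".join(["drive 10ft forward"] * 50)
goal = "drive until you hit the class"

print(estimate_tokens(micro_steps))  # ~300 tokens, and 50 chances to drift off course
print(estimate_tokens(goal))         # ~7 tokens, one clear target
```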

Analogy 2: The cluttered desk

Here’s a simpler version.

You glance at a desk with three things on it, then look away. Easy to remember where everything was.

Now imagine 300 things on the desk. How many can you accurately recall?

As Theo put it: “If there are 300 things on the desk, how likely are you to remember even three of them? Models are working effectively the same way.”

Claude doesn’t hold information the way you do. It navigates everything you gave it every single time it generates a response. The more it has to navigate, the more likely it is to miss the turn.
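
One way to see why: chat models are stateless, so every turn the client sends the entire accumulated history back to the model. A sketch with made-up message sizes and the same rough token estimate:

```python
# Each request carries the whole conversation so far, so the "desk" keeps
# filling up even when your latest question is short. Numbers are illustrative.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

history = []
for turn in range(1, 6):
    history.append({"role": "user", "content": "short question " + "pasted context " * 200})
    history.append({"role": "assistant", "content": "detailed answer " * 150})
    total = sum(estimate_tokens(m["content"]) for m in history)
    print(f"turn {turn}: ~{total} tokens resent with the next request")
```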

What actually goes wrong

This isn’t a flaw you can work around by prompting better.

It’s physics, basically.

The model has to fit everything you gave it into a fixed amount of space. The closer that space gets to full, the harder it is to find what’s relevant inside all that noise.

As Theo put it bluntly: “They get way dumber as you get closer to the end because all of those tokens make them worse at doing the math to predict what’s next.”

The fix isn’t to give Claude more context.

It’s to give it less - but the right less.
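
What “the right less” looks like in practice is what Thariq’s guide covers - compacting, rewinding, starting fresh. As a generic sketch of the underlying idea (not how Claude Code actually implements compaction), here’s a trimmer that keeps only the most recent turns under a token budget:

```python
# Minimal sketch: keep the newest messages that fit under a token budget.
# Real compaction is smarter - it summarizes dropped turns rather than
# discarding them - but the goal is the same: less context, the right context.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough rule of thumb, not a real tokenizer

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    kept, used = [], 0
    for msg in reversed(messages):      # walk backwards so the newest turns survive
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order
```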
