Context management is the difference between Claude Code being useful and Claude Code being transformative.
I walk through how I manage context across multiple businesses, multiple projects, and long-running work sessions: how I keep the right information loaded at the right time, how I avoid context pollution, and the patterns I've developed for working effectively with the 200k-token window.
If you’ve ever had Claude Code “forget” what you were working on or give you answers that feel off, this is probably why.
Full Transcript
Hey Dan. I promised I would explain what I was rambling about the other day, where it starts to get drunk past a certain level. Let me explain what I mean here.
Every time you start a new session - we’ll do this on my screen here, I’ll actually just close out and open up Claude the same way you would - you’re going to start a brand new session from clean. The question is, how do I figure out how much is too much?
The mental model here is that every session has a maximum of 200,000 tokens that the model can work with at any given time. What does that translate to? That’s roughly 200,000 words or word fragments, because a token is not precisely a word. You don’t have to understand that in order to understand what I’m talking about, but for the sake of conversation let’s say 200,000 words in a session.
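If you want a rough feel for the word/token relationship, a common rule of thumb for English prose is about four characters per token. This is only a heuristic for building intuition, not the model's real tokenizer, and actual counts will differ:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token of English text.
    The true count comes from the model's tokenizer; this is intuition only."""
    return max(1, len(text) // 4)

# A 1,000-word email at ~5 chars/word (plus spaces) lands near 1,500 tokens,
# which is why "200,000 words" is a loose but workable mental model.
print(estimate_tokens("Context management is the difference."))
```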
There's some weird stuff in the science of how these things work and where they operate best. The model is the smartest, you could say, at the beginning of that 200,000 tokens. The more tokens you load into the context toward that 200,000 limit, the dumber it gets, or the worse it performs. I think that's a more accurate way to describe it.
What that manifests as is this: as you're using it, it starts out really smart and performant. The more information you give it, the better it can perform, but at a certain threshold it starts acting like a coworker who started sober and got drunk. It makes weird mistakes. It makes weird assumptions that it normally wouldn't. Most bad behavior happens outside a kind of zone of genius: zero tokens isn't good because it's got nothing to respond to, you have to load in useful context, but on the other side, too much is also a problem.
So it is kind of like working with a coworker, in a way, where you want to give them enough information that they can get the job done, but not so much information they get overwhelmed and freak out. That’s another useful way to think about it.
Knowing exactly where you are in a session is a little bit of intuition and a little bit of practice. That’s where I describe it a little bit like riding a bike, you kind of have to ride it. You get a sense of, “oh, it’s behaving weird, I’ve gotta do some stuff.” But beyond the intuition, what I found useful for growing that intuition is a couple of things.
One is coming in and opening up a brand new session and typing /context. This is built into Claude Code by default, and it’s going to load a bunch of things. You’ll notice I have a brand new clean session, we haven’t typed anything in, and it already has a bunch of information in here. It is noticing my skills, my memory files, my CLAUDE.md and the files it points to, my agents, my MCP servers that are connected in here. These all represent some number of tokens. Some are small, 56 tokens, some are enormous, like 2,000 tokens, and that’s just one tool within the Google Calendar MCP.
Now, MCPs are tricky. I don't know if you're using these at all yet, but with an MCP server the thing to be aware of is you don't get to pick. Basically, when it's connected, all the MCP tools it exposes get loaded in here. I'm not going to spend a lot of time on that if you're not using them much yet.
The thing that is most useful from the /context screen is this little visualization that shows a grid. This is a visual representation of those 200,000 tokens. You can see that on a cold brand new session, 23% of my tokens for the 200,000 are already taken up by my MCP tools, my agents that are available, my skills and things like that. That’s not bad, that’s normal. I usually aim for mid to high teens right now, though I have more MCPs on than I normally do. If you can keep it down, that’s all the better. I’m slowly replacing a lot of my MCPs with command line tools, but that’s for another video.
You’ll also notice at the end here there’s an autocompact buffer of another 22%. So basically you start out with only about half left, and that’s what you have to work inside of. Somewhere around there it’s going to start acting dumb.
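Putting numbers on that: with this particular session's overheads (23% of baseline context at startup plus the 22% autocompact buffer reserved at the end), the working budget comes out to roughly half the window. A quick sketch of the arithmetic, using the percentages from my session (yours will differ):

```python
TOTAL = 200_000                   # context window size in tokens

baseline = round(TOTAL * 0.23)    # skills, memory files, agents, MCP tools loaded at startup
buffer   = round(TOTAL * 0.22)    # autocompact buffer reserved at the end of the window

usable = TOTAL - baseline - buffer
print(usable)  # 110000 -- roughly half the window left for actual work
```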
When I was first trying to learn this, I would do a bunch of work, a couple of tasks, and then come in here to see how much this had filled up. Every time I do tasks, these little empty free spaces start turning into the purple ones labeled “messages.” So if we come in here and I type a message - this is going to be the most boring demo - or I can say, “tell me about Dan Maul.” Actually, this is a good one because I can let it run. I can look at my relationship file and you can see how that works.
So it's going, it's searching my relationship files, it's finding my relationship record for you. And here's what it summarizes. Dan, yeah, I mean, it's not everything, but it's not wrong. The important part, though: you'll notice down here, this was at 23%, and now it's up to 48%. That's a little status line I've customized; if you're interested, I can show you or send you the code you can install to customize this line down here.
But if we come back into context, you will see that with just that one message, a bunch more purple blocks appeared. So as you’re working with it, you’re filling up these purple blocks. Not every amount of interaction with it is going to produce the same amount. If it’s doing a bunch of tool calls, that could be a bunch of tokens, but if those tool calls are only bringing back small amounts of information, maybe not. I’d bet that a lot of these tokens were from finding your person file, and there’s actually quite a bit in there. So it filled up pretty quickly.
This little meter here is how I get a sense, I’ve gotten like a vibe, of how aggressively I’m using tokens. Then you combine that with the little meter down here that lets you watch it in real time. That’s part one.
Part two is, what do you do as your tokens are filling up? The option you have out of the box is the /compact feature, which basically uses the model to look at the conversation history, summarize what was done, and throw away the history from the context. Not actually delete it, but clear it from the active context. What’s cool is you can do compact with optional custom summarization instructions, so like, “remember where Dan and I go to dinner.”
This will run, and it takes a minute or so depending on how big or complex the session is. If I’m actually writing code and stuff like that, this can take a minute, sometimes more. After it runs, we’ll be able to go in and look at that context window again.
So yeah, conversation compacted. It took everything we talked about, which included all the stuff behind the scenes. Actually, most of my context was the behind-the-scenes stuff. And now if I come in here and go to /context, that number went down, back to 26%. That's how that works; that's option number one.
Option number two is a faster version for when I don’t need all the context, like if we didn’t figure out a bunch of things but there’s a clear next step. I have a /pause command, and I’ll show you what /pause actually does. Let’s see, show me the contents of the pause command.
So the pause command is effectively a mini, tuned-to-my-personal-workflow version of compact, with two modes, quick and full. The first thing it does is look for uncommitted code in the project and make sure everything is properly committed. Then it looks at everything we did and puts together a copy-pasteable prompt: a one-line description of what we were just working on, a one-line description of the next step, and the files it already knows that next step will need. In full mode it also includes more specifics, like completion criteria.
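The actual command file isn't shown on screen, but Claude Code custom slash commands are just markdown files in the project's `.claude/commands/` directory. Here's a hypothetical sketch of what a `/pause` command like the one described might look like (the wording, steps, and `$ARGUMENTS` usage here are my reconstruction, not the real file):

```markdown
<!-- .claude/commands/pause.md — hypothetical reconstruction -->
Wind down this session and produce a handoff for the next one.
Mode: $ARGUMENTS (default: quick)

1. Run `git status`. If there is uncommitted work in this project,
   commit it with a descriptive message.
2. Output a copy-pasteable handoff prompt containing:
   - One line: what we were just working on.
   - One line: the immediate next step.
   - The file paths the next step will need.
3. If mode is "full", also include completion criteria for the
   next step and any open questions.
```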
The key here is basically a handoff file. “Landing the plane” is another term I’ve heard used. This is telling the current session agent what the next session would need in order to pick up where you left off, roughly speaking. So it spits out - if I were to do /pause, you can see it work, and this is usually quite a bit quicker.
And you can see, this right here is the copy-paste-all part. I would grab this, not a lot of text, and I would go /new and paste it in anywhere. With just this little bit, the new clean session with empty context will pick up where the last one left off.
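The handoff output itself wasn't readable on screen, so here's an invented example of the shape it takes, using this session's demo as the subject (the file path and wording are hypothetical, purely to illustrate the format):

```text
Working on: looking up and summarizing Dan's relationship record.
Next step: update the record with the dinner spots we discussed.
Files needed: memory/relationships/dan.md
```

The point is that it's small: a clean session plus those three lines gets you back to work without dragging the old session's full history along.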
Those are the main techniques I use to manage context. The last thing to know here is a setting you've probably run into if you've been using Claude Code: autocompact. Autocompact basically says, if you're not paying attention and you get to around 160 to 170 thousand tokens, sometimes a little more or a little less depending on what you're working on, it automatically decides, "I'm going to compact this before we hit the ceiling," and it'll do it.
What’s annoying about that is it will interrupt your work in order to do it. That can save you in some ways, and it can screw you in other ways. I go back and forth between this being on and off. I’m actually in a mode right now where I’m strongly considering turning it off and just doing manual compacts or pauses, and seeing how that affects my workflow. But I wanted you to know that it is usually on by default, under /config, autocompact.
So those are a few things that are hopefully useful for you and answer the questions. If you’ve got others, let me know.