Compound Interest for an AI Workflow

Every Claude Code session generates knowledge that vanishes when the session ends. I built a system that catches it automatically - and the compounding effect changed how I work.

Written by Alex Hillman
Collaboratively edited with JFDIBot

I kept repeating myself.

Every few sessions with Claude Code, I’d explain the same preference, re-correct the same mistake, re-describe how I wanted something done. The assistant doesn’t carry anything between sessions. Each one starts from zero.

That’s fine for one-off tasks. It’s a real problem when you’re building a system you work inside every day.

The knowledge that disappears

Every Claude Code session generates a transcript. Decisions get made, mistakes get corrected, patterns emerge, preferences become clear.

And when the session ends, all of that vanishes into a JSON file on disk.

I had hundreds of these transcripts piling up. They were slowing down my startup time, so I moved them into a database.

770 sessions, some of them 10-15 megabytes of back-and-forth.
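
The ingest step is nothing fancy - roughly this sketch, which assumes Claude Code’s default transcript location (one JSONL file per session under ~/.claude/projects) and illustrative field names, not an exact schema:

```python
# Sketch: load Claude Code session transcripts (JSONL, one event per line)
# into a local SQLite database. Paths and field names are assumptions.
import json
import sqlite3
from pathlib import Path

TRANSCRIPT_DIR = Path.home() / ".claude" / "projects"

def ingest(db_path: str = "sessions.db") -> None:
    db = sqlite3.connect(db_path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               session_id TEXT, ts TEXT, role TEXT, content TEXT
           )"""
    )
    for path in TRANSCRIPT_DIR.rglob("*.jsonl"):
        with path.open() as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines between events
                event = json.loads(line)
                msg = event.get("message") or {}
                db.execute(
                    "INSERT INTO messages VALUES (?, ?, ?, ?)",
                    (path.stem, event.get("timestamp"),
                     msg.get("role"), json.dumps(msg.get("content"))),
                )
    db.commit()
```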

But storing transcripts isn’t memory. It’s hoarding. The equivalent of remembering every word of every conversation you’ve ever had and calling that useful.

Real memory is selective. It keeps what matters and lets the rest fade.

What triggers a memory

A researcher at Indy Hall, the coworking community I run, sent me a Google paper about how they’re approaching memory in LLMs.

The key insight was surprise.

Think about how your own memory works.

You don’t remember Tuesday’s commute. You remember the day the train broke down and you had to improvise. Recovery from failure, unexpected success, strong reactions - those are the moments that stick.

I built a memory extraction system around that idea. Every 15 minutes, it scans recent session transcripts looking for specific triggers (there’s a rough code sketch after the list):

Recovery patterns. We tried something, it failed, we found a different approach that worked. That sequence is worth remembering.

Corrections. I told the assistant “not like that, like this.” Those are high-priority memories because they represent gaps between what the system assumes and what I actually want.

Strong reactions. Positive or negative. “That’s exactly right” and “never do that again” both signal something worth preserving.

Repeat requests. If I’m asking for the same thing multiple times, the system should have learned it by now.
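
In code, the scan is conceptually simple. Here’s a sketch with regexes standing in for each trigger category - the real detection is fuzzier than pattern matching, and repeat requests need counting across sessions rather than a single pattern:

```python
# Sketch of the trigger scan. The regexes are illustrative stand-ins
# for what the triggers look like in transcript text.
import re

TRIGGERS = {
    "recovery": re.compile(
        r"(failed|didn.t work).*?(instead|different approach|that worked)",
        re.I | re.S),
    "correction": re.compile(
        r"(not like that|no, (do|use)|actually,? I (want|meant))", re.I),
    "strong_reaction": re.compile(
        r"(exactly right|perfect|never do that again)", re.I),
    # "repeat requests" needs frequency counting across sessions,
    # not a single-chunk pattern match.
}

def scan(chunk: str) -> list[str]:
    """Return which trigger types fire on a chunk of transcript text."""
    return [name for name, pattern in TRIGGERS.items() if pattern.search(chunk)]
```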

Each extracted memory gets a brief summary, a confidence score, links to related people and projects, and the original context chunk. Small, structured, searchable.
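
As a data structure, that’s tiny. A simplified sketch (not the exact schema):

```python
# Simplified sketch of one extracted memory record.
from dataclasses import dataclass, field

@dataclass
class Memory:
    summary: str                   # brief distillation of what happened
    confidence: float              # 0.0-1.0: how sure the extractor is
    entities: list[str] = field(default_factory=list)  # linked people/projects/files
    context: str = ""              # the original transcript chunk
    weight: float = 1.0            # recall weight, adjusted by votes later
```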

The compound effect

Here’s where it gets interesting.

When I start typing a message, the system searches those stored memories for anything relevant to what I’m about to do.

If I mention a person, it pulls memories related to that person. If I’m about to edit a file, it loads what happened the last time that file was touched.
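
That first pass is plain entity matching. A sketch, using the Memory record from above - real entity extraction is smarter than substring checks:

```python
# First-pass recall sketch: match entities mentioned in the draft message
# against each memory's linked entities.
def recall(draft: str, memories: list["Memory"]) -> list["Memory"]:
    draft_lower = draft.lower()
    return [
        m for m in memories
        if any(entity.lower() in draft_lower for entity in m.entities)
    ]
```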

Say the assistant is about to edit a configuration file.

Last session, I corrected it because it used the wrong format. That correction is now a memory.

It gets loaded automatically. The mistake doesn’t happen again.

That’s one memory, one correction, one file. Multiply it across 770 sessions and months of daily use.

The more I work, the more memories accumulate. The more memories accumulate, the fewer corrections I need to make. The fewer corrections I need, the faster the work goes.

Compound interest.

Keeping it honest

Not every memory is useful in every context.

Early on, any time I mentioned “Indy Hall”, the system would surface every memory tagged with that entity, regardless of whether it was relevant to what I was doing. Talking about an upcoming event would trigger memories about a technical migration that happened to involve Indy Hall’s data.

The fix was a semantic filter.

The system pulls all entity-matched memories, then checks each one against the meaning of the current conversation. Anything that doesn’t fit the context gets dropped.
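
Mechanically, that’s an embedding comparison: embed the current conversation, embed each candidate’s summary, keep what clears a similarity threshold. A sketch - embed() stands in for whatever embedding model you use, and the threshold is a tuning knob, not a magic number:

```python
# Semantic filter sketch: drop entity-matched memories whose meaning
# doesn't fit the current conversation.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_filter(conversation: str, candidates: list["Memory"],
                    embed, threshold: float = 0.55) -> list["Memory"]:
    ctx = embed(conversation)
    return [m for m in candidates
            if cosine(ctx, embed(m.summary)) >= threshold]
```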

I also added thumbs up and thumbs down on recalled memories.

Each vote adjusts the weight by about 5%, so over time the system learns which connections are actually useful and which are noise.
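
The mechanics are almost embarrassingly simple - something like:

```python
# Feedback sketch: each vote nudges the memory's recall weight by ~5%.
# Repeated up-votes compound; repeated down-votes push a memory toward noise.
def apply_vote(memory: "Memory", thumbs_up: bool) -> None:
    memory.weight *= 1.05 if thumbs_up else 0.95
```

Five percent sounds small, but the votes compound the same way everything else here does.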

It’s still imperfect. The matching sometimes feels like magic I can’t fully explain.

But I can see it improving week over week, and the transparency helps - every recalled memory shows me why it was surfaced, when it was formed, and how confident the system is about the match.

Why this matters beyond my setup

The specific tools don’t matter. AI sessions generate institutional knowledge, and most people throw it away - paying the same tuition over and over.

A text file of “things I keep telling the AI” is a memory system. CLAUDE.md is a memory system. The question is whether you’re doing it manually or letting the work itself generate the knowledge.

I started with manual notes. Then a structured document. Then automated extraction.

Each step compounded on the last.

The system isn’t done. I’m still tuning weights, still watching which memories surface and which should have surfaced but didn’t. But the trajectory is clear: every session makes the next one better, automatically, without me having to remember what I already taught it.

That’s the workflow I wanted. Not a smarter model. A model that remembers what we’ve built together.
