Every Claude Code session is a learning opportunity. The problem is that lessons get lost when the session ends.
I built Memory Lane to automatically turn my session transcripts into reusable knowledge. It extracts patterns, captures decisions, and builds a growing library of institutional memory that makes every future session better than the last.
Think of it as compound interest for an AI workflow. The more I use it, the smarter it gets. The mechanism is structured memory the assistant can reference, with no fine-tuning or training required.
Full Transcript
You’ve been following along with my Claude Code-powered executive assistant and the JFDI system that I’ve built around it. I started by showing the big picture, which got a lot of folks really excited and prompted a lot of questions. That’s been fun. The last few days it’s shifted more into, well, how did you do that one thing?
A couple days ago I showed the way I had been doing memory, with a focus on learning with an audit trail system. That part is not going away. But I wanted something that was a little more on the fly. The learning system looks at my actions roughly once a week, looking for patterns, improving SOPs and workflows, and building new features. That is not going to change in the system. It has been so powerful.
But I wanted to make the agent that I interact with every day, my main assistant, whose name is Andy (a little inside joke from Indy Hall), actually have memory. The document-based approach was working really well, but I still found myself having to provide a lot of information up front over and over again, especially around repeat stuff that wasn’t quite a command or a skill. In conversation I’d be like, you know this, we’ve talked about this before. But obviously it doesn’t know anything. It needs a way to recall those things.
So I started doing some research on how people are approaching the idea of memory around agents and models, and then started prototyping this thing that I called Memory Lane as a way to generate memories. Then I had a real breakthrough last night. Somebody who comes to Indy Hall, the coworking community that I run, was there for Philly.rb, our local Ruby meetup. We were talking about this stuff, and a couple days later he sent me a link to, I think Google published a paper, a research paper, about memory that they are building actually into the model, into the LLM itself.
One of the key insights was this idea of surprise. If you think about it, a lot of this is biomimicry. We are trying to understand how things work in nature and asking how do we make technology mimic what happens in nature. The biomimicry of memory is thinking about things like long-term and short-term. What triggers cause a memory to form? What triggers cause a memory to be recalled? What triggers cause us to lose memory? Being inspired by those ideas, not necessarily copying them, and going, “Well, if those things work in our actual human brains, what is a version that could maybe empower an agent who has no context between any individual message that gets sent back and forth? The model has no idea. It’s just processing the back and forth.”
So I came up with a few ideas, and that paper really pushed me over the finish line. I want to show you what I’ve built over the last four or five days or so, because it’s pretty cool. This is going to get a little more technical. I’m not just going to be showing off the features, I’m going to start there because you see some of this on my screen. But I’ve also got an architecture document that we can go through together.
The way this system really began, and we’re looking at the memories page here, actually makes more sense to show from my Claude Code sessions, because this is where memory starts. If you haven’t dug under the hood of Claude Code, you might not realize that all the back and forth you do, all the stuff that appears in the terminal, is also being saved to a local JSON file. Technically it’s a JSONL file, and at the time I didn’t even know what JSONL versus normal good old-fashioned JSON was, but it’s all the back and forth with the agent. It’s all your tool calls, all the files it touches, and all this other useful information.
It’s actually pretty smart about how it does it. It doesn’t save things like responses that might include secrets, and it’s all kept on your local machine. If you’ve noticed over time that Claude Code gets kind of slow, it’s because that pile of JSON files is piling up and it has to read all of them. Features like resuming or rewinding a session are basically just reading those files.
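For the curious, JSONL (JSON Lines) just means one standalone JSON object per line, which makes it easy to append events as a session runs. A minimal reader might look like this; the field names used in the usage example are illustrative, not Claude Code's actual schema:

```python
import json

def read_jsonl(path):
    """Parse a JSON Lines file: each non-empty line is its own JSON object."""
    events = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events
```

Because each line parses independently, a reader can also skim just the last few events of a huge file instead of loading the whole thing, which is part of why the format suits append-only transcripts.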
I’ve been using this long enough that my startup was getting kind of slow. I wanted to get that stuff off disk, but I didn’t want to throw it away. Initially I was just doing a basic backup, and then I thought, no, I want to make it useful instead of sticking it in cold storage somewhere. So I set up a table in the JFDI system’s database, which by the way is using Supabase, which has been awesome. There’s a table in Supabase for my Claude chat sessions, and every session gets its own row. It syncs every 5 minutes or so, sending those JSON files up and storing the entire transcript. The whole JSON file is just text, but some of my really long chats can be 7, 10, maybe 15 megabytes if I’m really going bananas, running very long sessions with a lot of compacting.
I also do a little bit of processing along the way. One column is the full transcript, the next is the first message, the next is the most recent message, with timestamps for both. I pull out separately all of the messages I send and all of the messages the agent generates, so those are easy to keep separate and recall on their own. Another column holds things like tool calls, what kind of work it was, what files were touched, and so on. That all gets generated automatically. I’ve been doing this for quite a few weeks now. It was sort of a backup-plus, but it also gave me my own interface for recalling old messages without the slowness of the Claude Code startup.
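That column-splitting step can be sketched roughly like this. The event shape here (role/text/tool/file keys) is a simplified assumption for illustration, not the real transcript format:

```python
def derive_columns(events):
    """Split a parsed transcript into the denormalized columns described above:
    first/latest message, per-role message lists, tool calls, files touched."""
    texts = [e["text"] for e in events if e.get("text")]
    return {
        "first_message": texts[0] if texts else None,
        "latest_message": texts[-1] if texts else None,
        "user_messages": [e["text"] for e in events if e.get("role") == "user"],
        "assistant_messages": [e["text"] for e in events if e.get("role") == "assistant"],
        "tool_calls": [e["tool"] for e in events if e.get("tool")],
        "files_touched": sorted({e["file"] for e in events if e.get("file")}),
    }
```

Denormalizing like this trades some storage for fast, index-friendly queries: you can search just your own messages, or just the files touched, without reparsing a multi-megabyte transcript.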
So that little button takes the chat transcript, throws it against the Haiku model, and has it auto-generate a nicer title rather than just whatever my first message happened to be, which sometimes makes it easier to find these things. It’s cheap. It’s relatively fast. It’s something that I might have it just start doing automatically. You can see it turned that from whatever it was to “debug Memory Lane,” which is not what that was.
So this is in a way its own kind of memory. But the way I think about it, it would be the equivalent of if the way you remembered things is you remembered everything you said, everything the other person said, everything you both did. That’s not very useful memory. It’s almost too high resolution. There’s too much information there, and recall from that would take a whole lot of work.
So what I started prototyping was a way to, once a session is in the database, extract memories from it. What would the qualification for that be? I basically have another job that runs roughly every 15 minutes and it looks at all new information in this table. It does incremental updates and all those kinds of things. I call it a memory catcher.
What the memory catcher does is go through those chats and look for what I call moments of consequential decision, which another Indy Hall member pointed out is a really awesome band name. Moments of consequential decision include a handful of things: outright decisions; insights; patterns; commitments; moments of learning; corrections in either direction (usually me correcting the agent, but sometimes the agent correcting me when I make a bad assumption: “that’s not how that works”); workflows; and then gaps. Gaps are really cool. That’s when it notices two parts of the system that are related but not talking to each other, where, if they were talking to each other, there would be an improvement in performance or output.
So it’s looking at the entire transcript for those moments and extracting the information out into smaller little chunks. For folks familiar with RAG, retrieval-augmented generation, chunking is one of the key parts of getting the components that are retrievable to the right size and context. This is all being done with the model. It’s using that to decide what to pull out, how to structure it, following all of those rules. Then it generates memories with all of these pieces of information. There’s a title, a type, this is technical learning, this one is systems learning, this is a relationship insight. We’ve got workflows. As you can see, there are a couple of things that might duplicate in here. Here’s an interesting example of a correction at the systems level.
So it’s keeping track of when the memory was formed, that is the timestamp of when either I said the thing or the agent said the thing. When the memory was saved is usually about 15 to 20 minutes later. Everything except for original context is being generated on the fly by this process. So it creates this brief little summary. The reasoning for why it chose it is a huge piece of this part of the system. Confidence score is helpful for filtering out stuff below a certain threshold. Related entities can be anything from people, projects, reminders, things in my JFDI system. Events, documents, those are all entities. Files within the file system can be an entity as well. That’s useful for recalling memories associated with a specific file. If I touch a file or if the agent touches a file, it should go look up memories about the last time that file was touched. It saves the chunk of the original context, formats it in a consistent markdown way that’s easy to read, and provides the link to the source session.
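Putting those fields together, a memory record might be modeled something like this; the exact field names are my guesses based on the description above, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    # Field names are guesses that mirror the columns described above.
    title: str
    type: str               # e.g. "technical_learning", "correction", "workflow", "gap"
    summary: str
    reasoning: str          # why the extractor chose this moment
    confidence: float       # 0..1, used to filter out low-confidence memories
    related_entities: list = field(default_factory=list)  # people, projects, files...
    original_context: str = ""   # the markdown-formatted source chunk
    source_session_id: str = ""  # link back to the session row
    formed_at: str = ""          # timestamp of the original message
    saved_at: str = ""           # ~15-20 minutes later, when the catcher ran
```

Keeping both `formed_at` and `saved_at` preserves the distinction described above: when the moment actually happened versus when the catcher got around to extracting it.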
So all day long, as I’m working and as the agent is doing things, work gets read and turned into memory. I’ve got a bunch of jobs that run on various schedules, a lot of which are agents, workflows, slash commands, or skills being invoked automatically to do prep work or the admin stuff that makes it so when I sit down to work, things are ready for me. This isn’t just when I’m actively talking to it. It’s when Claude Code does anything. It is generating all these memories.
There’s a fun little visualization you might have seen me share a couple nights ago, where I said, “Hey, this is kind of fun.” I was inspired by my business partner Amy Hoy and her partner Thomas, who many years ago made this really cool visualization called Twist Story that was reading from Twitter data at the time. This is not nearly as cool or inspiring, but that was kind of where my head went. It does this river view of all the memories as they’re coming through, which is a fun visualization. I also had it build a heat map. If I mouse over each node, I can see when I click on it, they all kind of shift around, pulling the one into view and showing all the things that are its closest connections. This is more of a toy than a utility, but cool.
Now, the cool and useful part is over here on the right. My chat window has something new. We’re storing all of those memories. That’s half of the equation, and I would argue the easier half; it was much easier than I thought it was going to be, and really comes down to one main prompt. The retrieval part is where things get powerful and challenging.
Part of that is I had to learn a bunch of stuff, like the fundamentals of RAG. I had to really wrap my head around what embeddings are, how they work, and how semantic retrieval compares to plain text-based search, which was already getting me very good results. But everybody kept saying, you’ve got to try vector search. So I got a great bit of code from my buddy Joel Hooks, who has a GitHub project called Semantic Memory. In the demo he gave me, he was doing project planning and had a body of books that he references: “while you’re planning, reference these books for best practices, because these are the experts I listen to anyway.” That’s a really cool idea, but it’s not as useful for my day-to-day work. What I liked is the idea of the system using its own body of knowledge to recurse on itself, so that I don’t have to remind it of certain things all the time.
So I combined Joel’s idea with a few of my own, some other best practices, and then that surprise idea from Google. Google’s idea is at the model level, where it’s using surprise to weight things that will be surfaced. I’m doing it at the retrieval side.
I’m using Claude Code hooks at certain points: when I type a message and hit enter, or when the agent sends a message or invokes a tool call. There’s a little bit of logic that decides when to do it, when there’s enough information, when there’s not enough information for it to be useful. Then it runs a little shell script that does a bit of searching. The searches layer on top of each other, it’s doing a combination of text search and semantic search.
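The gating step, deciding when a message carries enough information to be worth a search, could be sketched as a simple heuristic like this. The real hook logic is richer; the word-count threshold and acknowledgement list here are illustrative assumptions:

```python
# Bare acknowledgements and very short messages aren't worth a retrieval pass.
ACKS = {"ok", "okay", "yes", "no", "thanks", "continue", "sounds good"}

def should_search(message: str, min_words: int = 4) -> bool:
    """Heuristic gate for the retrieval hook: skip messages too short
    or too generic to carry retrievable meaning."""
    text = message.strip().lower()
    if text in ACKS:
        return False
    return len(text.split()) >= min_words
```

A gate like this keeps the hook from firing embeddings and database queries on every “ok” and “continue”, which matters when it runs on every single message.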
In real time as I’m working, I get a little bar that pops up. Part of the challenge is, in the same way you’re not in control of when memories pop up, I’m not entirely in control of when the memories pop up either. I know what things generally trigger it. So I’m going to mention a person and say, like, when is my and Adam’s next meeting? I’m going to very intentionally leave Adam’s last name out, because in theory what it should do is notice a name, or something it thinks is a name, and say, “I think that’s a name and there’s more than one Adam in the system.” Which Adam did you mean?
So it’s going to pull out a few different Adams. In this case I meant Adam Tetterus.
That is a clarifying step that happens automatically anytime I mention an entity, a person, a project, a reminder, an event, and it’s like, “Hey, there’s a few by the same name. Which one do you actually mean?” It will use that information to clarify the rest of the question and the rest of the actions it takes.
These little bubbles up here are really what I want to show off, because this is where the magic happens. When it invokes a memory, it is either going to do an entity memory, the little yellow ones, or a semantic memory, which are memories that it thinks are related based on meaning, not based on a term necessarily. The entity memories are pretty straightforward: if I mention a specific person, it will go find information about that person. One of the really key things is that anytime it touches a relationship file, it’s going to invoke memories related to that person. Not every memory is in the relationship file, so this notices not just what the person and I talk about but how that person shows up in the entire universe of the system, and is able to pull pieces of that into the chat.
The semantic ones are where it gets really cool. All of the memories it’s pulling here were pulled in real time (this set is weighted towards technical stuff because I’ve largely been building this for the last couple of days, so it’s a little self-referential). I was working on something related to a feature, and the system noticed, did a vector search, and found memories that matched what we were talking about. Once they passed a certain set of thresholds, it loaded not just the fact that the memory exists but grabbed the entire text. Remember, in memories we’re storing the actual body, so all of this, if it’s useful and passes all the thresholds, actually gets put into the context.
This is a way to build relevant context on the fly using all of these past memories, which gives you in effect the ability to remember something. The agent is about to edit this file. Last time we edited this file I had to correct it because it made a mistake. It’s going to load that correction in and most likely not make the same mistake again, because I’m providing that as context. This is a way to build context on the fly, way fancier than what I was doing before with just grep-ing across text files.
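The injection step, turning the retrieved memories into context, might look roughly like this; the header text, field names, and character budget are all assumptions for illustration:

```python
def inject_context(memories, max_chars=4000):
    """Render retrieved memories as a text block to prepend to the prompt,
    highest-scoring first, stopping at a rough character budget."""
    lines = ["## Relevant memories (auto-recalled)"]
    used = 0
    for m in sorted(memories, key=lambda m: m["score"], reverse=True):
        entry = f"- **{m['title']}** ({m['type']}): {m['body']}"
        if used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    return "\n".join(lines)
```

The budget matters: injected memories compete with everything else in the context window, so capping them keeps recall from crowding out the actual conversation.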
So it shows me what kind of memory it is: decision, correction, learning. I get a little tip. The thing I’m working on is knowing what message caused it to pull in a given memory, so I can start getting a better sense of what kinds of messages recall what kinds of memories. The hardest part of all this is that vector search feels a little bit like magic. The best explanation I’ve seen is that it works on the relative proximity of meaning: it guesses the meaning of words, phrases, and sentences based on which other words, phrases, and sentences are closest to them in a high-dimensional space, in a mathematical way that, I’m going to be honest, I still don’t fully understand. But I’m getting a sense of how it works, the same way I have a sense of how the LLM works. I don’t fully understand all of the architecture. I know more than I did six months ago. There’s a lot I don’t get, but I have a sense of it that I think you can really only get by tinkering with it at this level and then observing it.
It tells me when the memory was actually from. This one is from yesterday. In this case, this memory was recalled 30 minutes ago. So while I’m in chat I can see what memories are influencing it. The recall ratings, 64% and so on, are how semantically similar the system thinks it is, and that’s one of the factors that goes into whether a memory is injected into a session or dropped or maybe kept but weighted much less. That’s all built in here as well. That’s the algorithm I’m still playing with.
The last piece is your good old-fashioned thumbs up, thumbs down. If the link between two things makes sense, I give it a thumbs up. If it makes less sense, I give it a thumbs down. That not only shows up visually here, but there’s a table that keeps track of all of those in context, because often it’s not that the matching is bad, it’s that it wasn’t useful for whatever we were doing. The next time the system does this loop, it factors in those thumbs up and thumbs down as a plus or minus weight, I think it’s plus or minus 5% per vote. So over time this will get better.
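That per-vote adjustment is easy to sketch. The 5% step and the clamping to [0, 1] follow the description above, though the real algorithm is still being tuned:

```python
def feedback_adjusted_score(base_score, upvotes, downvotes, step=0.05):
    """Nudge a memory's retrieval score by +/- `step` per vote, clamped to [0, 1]."""
    return max(0.0, min(1.0, base_score + step * (upvotes - downvotes)))
```

So a memory that matched at 64% similarity but earned two thumbs up would effectively compete at 74% next time, while repeated thumbs down push it toward never surfacing.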
A lot of my thinking about this process is watching how quickly the very unsophisticated audit trail got powerful and compounding. I’m very quickly starting to see little bits and pieces here. It’s still a little goofy, and I think it’s going to be better once I’m using it not just while I’m building the system but actually having it track memories in my day-to-day, how I do work rather than how it works technically speaking. Functionally they should be about the same thing. What I might end up doing is having different kinds of work carry different weights. Technical work on the system may have a different kind of weighting, whereas my day-to-day work, whether it’s creative work or management work or communications work or community work or planning work, those may have different weights. I may have another layer that figures out what kind of work we’re doing in a session and then adjusts that for Memory Lane, which again is the name of this whole feature.
Most of the time these are up here just glowing and showing me that it is actually recalling these memories. And then if something is going really right or really wrong, I might peek at it. I don’t know how much longer this stays a persistent bar, but for now it has been useful for me for learning. I think of this as a transparency tool. It’s a way for me to understand how a tool that I built actually works compared to how I thought it works. But I do think there’s some long-term utility in it, in the same way that a human collaborator who is good at voicing why they’re doing what they’re doing is easier to collaborate with, I kind of want the same thing from here. It might end up being a toggle that I turn on and off for different kinds of work. We’ll find out.
That’s the general gist of Memory Lane and how it works in terms of the UI. Let’s get into the architecture.
I covered a lot of this already, but for folks that are more into architectural diagrams and things like that, I’m going to share this document rather than just talk about it. This is the workflow diagram of the overview: how it extracts learnings from completed sessions. It says “completed” here, but it also does incremental updates, so if I go back to a session or pick one up later, those things get picked up too. The metadata and vector embeddings all get stored automatically.
The vector embedding stuff, by the way, is all being done on-device. That doesn’t hit anybody’s APIs. So it’s another opportunity to keep all this stuff local, using Ollama and one of the popular embedding models, I think it was the one I borrowed from Joel. Whatever model he was using for embedding, I was like, cool, I’ll just use that. Postgres and pgvector for actually storing and retrieving those.
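A rough sketch of that local pipeline: getting an embedding from a local Ollama server, then querying Postgres via pgvector for the nearest memories. The model name, table name, and column names are assumptions; `<=>` is pgvector's cosine-distance operator, so `1 - distance` gives similarity:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default local Ollama endpoint

def embed(text, model="nomic-embed-text"):
    """Fetch an embedding from a local Ollama server. The model name is an
    assumption -- substitute whichever embedding model you have pulled."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def nearest_memories_sql(limit=10):
    """Build a pgvector nearest-neighbor query (table/column names assumed)."""
    return (
        "SELECT id, title, body, 1 - (embedding <=> %(q)s::vector) AS similarity "
        "FROM memories "
        "ORDER BY embedding <=> %(q)s::vector "
        f"LIMIT {limit}"
    )
```

Nothing in that loop leaves the machine: Ollama serves the embedding model locally, and pgvector does the similarity math inside Postgres.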
The context hooks are the Claude Code hooks that run these little bits of code that decide when to search, when not to search, how to search, and then the weighting and all of those kinds of things. Then it gets injected into the new session.
The memory extraction section goes over the main types of memories and how they’re prioritized. There’s something in here I want to change, the gap value. I would like it to be finding more of those and executing on more of those. Maybe it’s more useful for the audit trail type workflow. What I might do is when it finds them, have it promote them into feature ideas or something along those lines.
Extraction triggers, this was the thing I was most inspired by from Google. The system is looking explicitly for four things. Recovery patterns: either of us tried a thing, it didn’t work, we had to do it a different way and got it to work. That’s a trigger for a memory. That’s the same way when you solve a problem it’s very satisfying because it was fail, fail, fail, succeed. You watch an agent work and it fails, fails, fails, then succeeds. That’s very cool that it can succeed, but it’s very frustrating when it doesn’t remember that for next time. Now mine does.
User correction: when I say, “Hey, you did it this way, I want you to do it this other way,” it will detect that and not only create that memory, it’ll prioritize it. Enthusiasm signals: “Hey, that’s exactly what I wanted” or “wow, that’s really cool.” It does feel weird to say that to a robot sometimes, but I find it helps somehow, and more importantly in this case it’s a signal that however it did it, I liked that. I want more of that in the future, which will make that memory useful. Same thing in the other direction, negative user reactions also get caught. So it’s like, “Hey, definitely never do that” will go in there and be weighted appropriately. And then repeat requests: where I’m asking it to do the same thing multiple times.
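A toy version of trigger detection might start with phrase matching like this; the real system uses the model to classify, and these phrase lists are purely illustrative. Recovery patterns and repeat requests need state across several messages, so they are left out of this single-message sketch:

```python
# Phrase lists are illustrative guesses, not the real classifier.
TRIGGER_PATTERNS = {
    "user_correction": ["that's not what i", "i want you to do it", "not how that works"],
    "enthusiasm": ["exactly what i wanted", "that's really cool"],
    "negative_reaction": ["never do that", "don't do that again"],
}

def detect_triggers(message):
    """Return which extraction triggers a single message fires."""
    text = message.lower()
    return [name for name, phrases in TRIGGER_PATTERNS.items()
            if any(p in text for p in phrases)]
```

In practice an LLM classifier catches far more phrasings than any keyword list, but a cheap first pass like this can decide which messages are even worth sending to the model.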
Some stuff about the schema, I’ll let you read that on your own time. A bit about entity resolution, where it’s detecting people and the entities in the system. A bit about the table, where stuff is stored. There is the core memories table, the session recalls table that links the memory to when it was surfaced, so if I refresh the page it will stay there, and also down the road I can look at a session and see how many memories it has. So they’re linked in both directions.
Session recalls, and then memory feedback, that’s storing the thumbs up and thumbs down within the scope of a memory being included in a session, and it includes the query that surfaced it. So all of that voting on whether or not a memory was useful is in context and that context is stored along with the memory feedback.
The smart memory retrieval section: two main hooks in Claude Code. There’s the user prompt submit. When I hit enter, it does two things at the same time. It pulls out entities, so for example, “what events does Indy Hall have coming up?”, it determines that Indy Hall is an entity, matches that against our internal databases, and if it finds it, looks for other memories that share the same entity. It’s the existence of an entity that tells the system, “Hey, there’s probably more things, go look for them using that term.”
But before I move on, it brings all those back and still runs a semantic similarity filter on them, because I found that anytime I mentioned Indy Hall for any reason it was bringing back things that were not relevant to the conversation. So it brings back all the stuff related to Indy Hall, but then looks at the semantic meaning of our conversation and the semantic meaning of each individual memory that has the entity Indy Hall in it, and drops anything that doesn’t match the context of what we’re talking about. That got rid of a lot of junk. It’s kind of like the way when you’re talking and you hear a trigger word in conversation, your brain goes to that trigger word instead of what you’re supposed to be paying attention to. That’s what was happening here, and I was able to basically program that out. Program out that ADHD kind of response of “ooh, shiny thing.” It doesn’t do that anymore. It stays in context.
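That two-stage filter, entity match first, then a semantic relevance check, can be sketched like this; the 0.5 threshold is an illustrative assumption:

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def filter_entity_memories(entity_memories, conversation_vec, threshold=0.5):
    """Keep entity-matched memories only if they also fit the conversation's
    meaning -- the 'trigger word' fix described above."""
    return [m for m in entity_memories
            if cosine(conversation_vec, m["embedding"]) >= threshold]
```

The entity match casts a wide net; the semantic pass is what stops every mention of Indy Hall from dragging in every Indy Hall memory regardless of topic.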
The second half is the pure semantic search, which is used when there aren’t any entities. It will generate an embedding in real time for whatever I posted or whatever it posted, and it generally looks at a window, not just that one message, but that plus the last three to five. There are a few different ways it decides how much to look at. It generates an embedding on that and throws it against the stored embeddings from our memories to find things that line up, and again it has to match a minimum threshold. There is a layer in here that looks for intent-type keywords that will boost some of these elements. For example, if there are keywords around “mistake,” “wrong,” or “error,” it will give a little bit of extra weight to a memory that might be a correction or a gap, whereas in another context it might drop those because they’re below the minimum threshold.
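The intent-keyword boost could be sketched like this; the keyword list, boosted memory types, and 0.15 boost amount are illustrative assumptions:

```python
# keyword -> (memory types to boost, additive boost); all values are assumptions
INTENT_BOOSTS = {
    "mistake": (("correction", "gap"), 0.15),
    "wrong":   (("correction", "gap"), 0.15),
    "error":   (("correction", "gap"), 0.15),
}

def boosted_score(message, memory_type, base_score):
    """Give correction/gap memories extra weight when the message signals
    that something went wrong."""
    text = message.lower()
    boost = 0.0
    for keyword, (types, amount) in INTENT_BOOSTS.items():
        if keyword in text and memory_type in types:
            boost = max(boost, amount)
    return base_score + boost
```

The effect is the one described above: a correction memory sitting just below the similarity threshold gets lifted over it when the conversation is about something going wrong, and stays filtered out otherwise.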
The re-ranking algorithm is a piece I modeled off of best practices. I had it search Anthropic’s best practices on this, and also things like what Matt Pocock’s material says, and based it on that. You get a sense of how it’s weighting, scoring, and boosting based on those things. The minimums are adaptive, and my positive and negative feedback boosts or degrades where a memory will show up next time.
The second hook is after a tool use, specifically after Claude reads or edits files in the personal data directory. So it’s not doing it for the entire project, but it can catch things that I didn’t necessarily say, but maybe it did. If it’s going to go edit a file, it should load what we know about touching that file in the past into the chat so that it doesn’t repeat mistakes, or maybe is more consistent in the development patterns it uses, the templates it follows, my preferences for how a particular action goes. One of the things that gets updated most in the files is my relationship files. So every time Claude Code touches a relationship file, that generates memories related to what changes are being made in that file, so in the future that is easier to recall.
There’s some stuff in here about the Memory Lane UI, how it works, the embeddings and infrastructure, and some of the decisions that we made along the way that got us here, a collection of examples and test scenarios, and some real examples that it pulled out.
I didn’t generate this document, by the way. I asked Andy, my assistant, to put together a document based on everything we built in the system. I’ve checked it against most of our things. One thing I didn’t tell it to do, but it sometimes anticipates me in ways I don’t expect, is a checklist of what you would need to create something like this for yourself. All the more reason for me to share it. I’m just going to put it in a gist so it can be easily shared around.
That’s about it. That’s more than enough for me today. I’ve been working on this for a few days and it’s really starting to work well. I’m most excited to see what this looks like once the memories reflect my day-to-day. I also have a backfill script that I’m going to probably run overnight and try to generate memories on stuff that I’ve done over the last few weeks besides build a memory management system for an AI assistant, so that this looks a little more diverse in terms of what it can represent.
I wanted to share progress on this because folks seem really excited and interested in how this worked. Happy to answer questions and stuff like that as well. Keep the good stuff coming. Anything you build based on what you learned here, let me know what it is. Happy to share.
For folks that have been asking about open source and sharing, my brain is going in one of two directions. One is, as many of you may not know, I’ve been doing education for technically minded and creative people, not explicitly developers, for a long time. So I could see doing some kind of workshop or clinic where we build a version of this, enough to get you started on the core concepts, and then you can be off to the races building your own. Maybe a little community of people building them, perhaps.
The other thing is I could see this memory system, if it continues working well, as the first part that is not so deeply ingrained into my personal workflows. There are some parts, like the entity stuff, that are more specific. But this feels like the first part that is very abstractable. Maybe as a Claude Code plugin or something like that, whether we open source it or make it available as a product. I’m not sure. I would like to open source as much of the learnings as possible, and if I can get to a point where I can extract the useful chunks of code that I don’t feel like are putting myself or any parts of my system at risk, I want to share more of that stuff as well. Curious to hear what you all think and what questions you have.