Everyone wants to talk about RAG, vector databases, and embeddings for AI memory. I went a different direction.
I walk through how I gave my AI assistant persistent memory using nothing but plain text files. No embeddings needed. No vector DB. No infrastructure to maintain.
The key insight: memory designed for learning is fundamentally different from retrieval. Simple text files, organized well, are surprisingly effective at it.
Full Transcript
One of the biggest questions I got from sharing my JFDI system and my AI executive assistant system was about the audit trail, the self-learning part. I want to show you a couple of pieces of that. What’s fun is I’m going to start by showing you how this started, how I learned this was going to work, and how I kind of inadvertently built a very simple but very powerful memory system that I’m now adapting into what I would call a proper memory system.
This has been working so well that I don’t want people to think you have to do a fancy database memory system with vector search and all that stuff to get the power. The way I set things up, and I set this up pretty early, like in the first couple of weeks of building the system, once I had a handful of agents and commands I was using on a daily basis, I wanted them to learn from patterns in usage.
The way I did that: there are two main pieces here, and they’re nested. This is part of a bigger design concept that I build around includes. One of the issues with context management is loading really big files, and some of my agents and slash commands are pretty big. Making it so they can progressively load through included files is really what’s happening here.
So every single agent and command, including the main agent (this is my CLAUDE.md as well), has some version of an important statement that tells it to read the agent startup. There are actually two different versions: a startup and a startup quick. They work basically the same way. The quick one works well in some of my agents because they get spun up, do a single task, and then hand off what they gather; the full one suits a more robust workflow.
My morning overview command, for instance, is enormous. Something like 1500 lines. There’s a lot going on to make sure that one works all the time. A big part of that is before it does anything, it includes this agent startup checklist. The very first thing in the agent startup checklist is the audit trail creation requirements. That tells it after any workflow operation it has to create an audit trail, where to save it, and the structure.
It goes into an audit folder. Every day gets a new folder, and inside that folder is a bunch of files, usually the agent name and then an activity markdown. There’s a specific template of the structure and framework of what goes in there, and then when and why. We’re going to go into that implementation file. This is really at the heart of what makes the audit trail system work on the generating side. Every agent ends up here, and before it’s done it’s required to write a file that provides visibility into what happened, decisions that were made, data for the strategic adviser.
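To make the folder-per-day, file-per-agent idea concrete, here is a minimal sketch of that audit-trail write step. The paths, section names, and the `AuditEntry` shape are my assumptions for illustration, not the author's exact template:

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shape of one agent run's audit record.
interface AuditEntry {
  agent: string;
  whatHappened: string;
  actionsTaken: string[];
  decisions: string[];
  optionsConsidered: string[];
  filesCreated: string[];
  crossAgentNotes: string[];
}

function writeAuditTrail(root: string, entry: AuditEntry, date = new Date()): string {
  // One folder per day, one file per agent run:
  // audit/2025-01-15/relationship-manager-activity.md
  const day = date.toISOString().slice(0, 10);
  const dir = join(root, day);
  mkdirSync(dir, { recursive: true });

  const body = [
    `# ${entry.agent} activity`,
    `## What happened`, entry.whatHappened,
    `## Actions taken`, ...entry.actionsTaken.map(a => `- ${a}`),
    `## Decisions`, ...entry.decisions.map(d => `- ${d}`),
    `## Options considered`, ...entry.optionsConsidered.map(o => `- ${o}`),
    `## Files created`, ...entry.filesCreated.map(f => `- ${f}`),
    `## Cross-agent notes`, ...entry.crossAgentNotes.map(n => `- ${n}`),
  ].join("\n");

  const file = join(dir, `${entry.agent}-activity.md`);
  writeFileSync(file, body);
  return file;
}
```

The point is that the "template" is just a fixed set of markdown headings any agent can fill in before it exits; there is nothing database-shaped about it.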
The strategic adviser is another agent that comes along and notices opportunities within the rule set I gave it. A lot of the proactive stuff comes from the strategic adviser. When I originally started this, I made “trust through transparency” a core value of the system. I design things to provide maximum transparency, and as a byproduct that transparency becomes value to both me and the system.
Effectively what’s happening is this rule set has it come through and say: what happened, what actions were taken, what decisions were made within context, what options were considered. That last one is a really interesting one, because a lot of times I’m asking the system what are my options and how do I trade them off. It does that and then does it autonomously, which is pretty amazing.
Data for analysis, files that were created, cross-agent notes, that’s super helpful, especially for that morning overview, where I’ve got a relationship manager, a task orchestrator, a strategic adviser, and actually a few others that run in parallel. While they’re running they’re not necessarily talking to each other. I’ve seen people do this with agent mail and some other approaches, but I don’t feel like I need extra infrastructure to make this work. It kind of just happens.
There are some data integrity parts that say these are the things that need to be in every file. It is still an LLM, so it follows these rules well, not perfectly. There’s another TypeScript validator that runs, I think once a day or maybe a little less often, that’s basically checking data integrity and quality. I shared a video of this, I think I posted on LinkedIn and Twitter, the quality audit of the files generated by this. That’s a mix of deterministic TypeScript and a rule set where it does an evaluation, looks for improvements, and then actually makes those improvements where the quality standards recommend them.
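The deterministic half of that check can be very small. A sketch in the spirit of the validator described above, assuming the required sections are fixed markdown headings (the specific heading names here are illustrative):

```typescript
// Required headings every audit file must contain. These names are
// assumptions for illustration; the real rule set is richer.
const REQUIRED_SECTIONS = [
  "## What happened",
  "## Actions taken",
  "## Decisions",
  "## Cross-agent notes",
];

// Returns the list of required sections the file is missing,
// so a report (or a follow-up LLM pass) can fix them.
function missingSections(markdown: string): string[] {
  return REQUIRED_SECTIONS.filter(h => !markdown.includes(h));
}
```

The deterministic pass catches structural gaps cheaply; the LLM pass can then be reserved for the fuzzier quality judgments.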
This has been happening with basically everything of consequence in the system for the last two months or so and has been generating a lot of useful information. Every day there’s a folder, and inside that folder individual commands, like my meeting processing, which every time it’s doing research on a person it uses my inbox, not the internet. So it is very tightly scoped research about a person based on actual communication and the relationship we have. That all gets logged.
Here’s a pattern you can see, the person researcher is literally happening all the time in the system, so you get a lot of those files. Outside of those, here’s a cool example: it processed a link I dropped in, a YouTube link, a Raycast hidden features link. This is a small task that happens a lot, but it’s noticing the decisions about where to put things and things like that.
Then I have another command called synthesize that I run once a week. Right now it’s still a manual step, but I could just as easily make it automated and have it happen more or less often. The reason it’s manual is I’m playing with the quality of the results based on how many audits it needs to get good results. What this does is look at all the audit trails, look at past synthesis, look at my daily briefs. There’s also a pattern tracker agent that runs around just looking for things that happened as a pattern. It synthesizes all of that.
It’s also using commit history, which has been amazing. Not everything hits git now that I have more things in a database, but you get so much value out of that. It looks at the audit trails, runs the pattern miner agent, runs the recommendation tracker agent, and the system gap detector agent, which goes, hey, these two things are happening and they’re fine, but if they talk to each other both of them will be better. It notices that and suggests improvements to the system.
There’s cross-week analysis, and then it generates a weekly synthesis. Here’s one of my most recent ones from about a week ago. You can see it’s tracking weekly patterns cross-week, and it’s also paying attention to whether or not I implement its recommendations. Basically it comes in and says - this is oversimplified - the goal was to have it say, hey, you did this a certain way the last three times, do you want to just make that an SOP so I don’t have to guess each time and potentially guess wrong?
It goes through, looks at all those patterns, and assigns them a confidence score, which is what the percentages are. Some of these are patterns in my behavior and decisions. Some are patterns in the systems we’ve built and how they’re working or not, errors they throw, what’s happening. It’s wild what it picks up, especially with the fact that it’s reading my git history. It’s able to pick up things like technical velocity, which is wild. Then it makes recommendations and basically puts those into my queue for things I may want to build, sorted by things like readiness, potential impact, and how much it’ll break the system.
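One plausible way to produce those percentages is simple frequency counting across the week's audits. This is a guess at the mechanism, not the author's actual scoring logic:

```typescript
// Hypothetical frequency-based confidence score: how often a pattern
// recurred, relative to the sessions examined, capped below 100% so the
// system never claims certainty.
interface Pattern {
  description: string;
  occurrences: number;
}

function confidence(p: Pattern, totalSessions: number): number {
  const raw = p.occurrences / totalSessions;
  return Math.min(Math.round(raw * 100), 95);
}
```

A pattern seen in 3 of 10 audited sessions would score 30; something present in nearly every session saturates at the cap rather than reading as a guarantee.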
So this is a pretty sophisticated memory system with no vector search. It’s really geared toward turning actions that are repeated into consistent patterns to make sure they stay consistent. But it also allows a lot of space for emergent recommendations. I don’t know the exact number, I could probably ask it what percentage of features grew out of this system, but if I had to ballpark it, it’s probably somewhere near a third: things like, hey, if we built this feature or updated this feature in this way, this problem would go away or this opportunity would be greater. I’ve never worked with anything quite like that, human or software. I work with creative people all the time and they suggest things, but I’ve never worked with somebody who does it so systematically and consistently. I feel like there’s a lesson in there somewhere for people trying to stand out at work. This is truly blowing my mind.
So that is how the current audit trail system works at a very high level. When things graduate out of here, they end up in my SOPs. At the system level, this docs section is everything from particular patterns and standards for development to various bits of system reference documentation on subsystems. It is the most thorough documentation for anything I’ve ever made, because this thing is really good at documenting things. It’s basically documentation all the way down.
That is what it looks like. It has been running for a while. I just gave it an upgrade, and that upgrade is pretty cool.
So the way this works now: my Claude sessions file. Every time you type into Claude Code it’s logging everything locally, every session, all the messages, all the agent messages, all the tool calls, all the file touches, all that stuff gets saved to a JSONL file (JSON Lines: one JSON object per line). Those don’t take up a lot of space on the hard drive, but if your Claude Code is slow starting up and you’ve been using it for a while, it’s because it’s parsing all of those when you load it. I have those cleaned up about once a week. I might end up doing it a little more often.
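Reading a JSONL transcript is one line of parsing per record. A minimal sketch, with the field names as placeholders since the real transcript schema has many more fields:

```typescript
// Parse a JSONL (JSON Lines) transcript: one JSON object per line,
// blank lines skipped. Field names here are illustrative only.
interface SessionLine {
  type?: string;
  [key: string]: unknown;
}

function parseJsonl(text: string): SessionLine[] {
  return text
    .split("\n")
    .filter(line => line.trim().length > 0)
    .map(line => JSON.parse(line) as SessionLine);
}
```

This format is why ingesting the sessions into a database is straightforward: each line is already a self-contained record.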
Regardless of the cleanup, the reason I can clean them up confidently is I’m putting them right into my database. I have this giant searchable database of every session, you can see almost 2,000 sessions. These 2,000 sessions include things like me talking to it and working with it, but it also includes agent runs and commands that are automated. Stuff that doesn’t include my direct invocation or interaction is in here too. Totally searchable, organized by categories the system guesses. It’s not 100% right, but it’s not super important that it is, at least not yet.
Classification means if I come in and open a session I can say classify with Haiku. Here’s an internal task, actually the memory catcher, we’re going to come back to the memory catcher in a second. I have this little button that appears in a few other places around the interface, like my session picker. Basically it takes the transcript, throws it at Haiku for a fairly cheap fast summary, and generates a nicer title than just the first line of the message. Takes about 15 or 20 seconds.
“Session ended without user interaction.” It’s really basing this on my interactions rather than the agent’s, and that actually might be a bug I want to fix. At any rate, this is in a way your first line of defense when it comes to memory. These files contain everything. I think of this as longest-term memory, it’s everything that went back and forth, the highest resolution. But you generally wouldn’t want to surface all of that, because it would be like remembering everything you ever said and everything everybody else said to you. That’s just not all that useful day-to-day. You need access to moments of consequence.
So, the newest addition to the system is memories. Memories is a system currently set up as a job, and I’m still experimenting before I decide whether to backfill my entire session history. New Claude Code sessions get inserted into the claude sessions table, and then another job comes along and looks at them and extracts - I actually didn’t like the word “extract.” I said let’s make it “catch memories.” I was thinking about dream catchers and memory catchers.
What this does is look at the chat transcript for certain types of memorable moments. Those include decisions, insights, noticed patterns, commitments I make, learning moments, corrections (which are not identical to learning moments), cross-agent observations, workflows, and gaps, which is like two systems doing things that would work better if they talked to each other. It automatically looks at every session, catches the memories, and generates individual memories. Currently it’s set to generate as few as zero to as many as needed; there’s no upper bound. It’s meant to go through and extract everything of consequence. This is going to take a little tuning of the exact prompt, but you can see the actual memories it’s generating.
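A caught memory, as described, could be typed as a small tagged record. The field and type names below are my assumptions about the schema, built from the categories named above:

```typescript
// The memory categories named in the transcript, as a union type.
type MemoryKind =
  | "decision" | "insight" | "pattern" | "commitment"
  | "learning" | "correction" | "workflow" | "gap";

const MEMORY_KINDS: MemoryKind[] = [
  "decision", "insight", "pattern", "commitment",
  "learning", "correction", "workflow", "gap",
];

// Guard for validating the kind an LLM extraction pass emits.
function isMemoryKind(s: string): s is MemoryKind {
  return (MEMORY_KINDS as string[]).includes(s);
}

// Hypothetical shape of one caught memory.
interface CaughtMemory {
  kind: MemoryKind;
  summary: string;
  occurredAt: string;   // when the original moment happened, not extraction time
  confidence: number;   // 0-100
  entities: string[];   // linked database entities: people, businesses, systems
}
```

Validating the `kind` with a guard matters because the extraction is LLM-driven: a model can always invent a category that isn't in the schema.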
The timestamp here is not when the memory was generated by the LLM. It’s tied to when the original moment that was worth remembering occurred. You can see some really interesting things. Without even clicking in here, this memory is related to learning, categorized as data, given a confidence score of 85%. It’s also identified what it’s calling entities, basically any noun, any person in the system, a business I interact with, almost anything that is a database entity has the potential to be linked here. It automatically links the memory to that entity.
Why does that matter? The last piece of this is using Claude Code hooks in a way where, when I post something on the fly, it can extract entities using the same rule set and then instantly look up those entities and any memories linked with them, and contextually inject them along with whatever I type into the chat box. That way I don’t have to open up a new chat and give Claude Code a bunch of context about what I’m working on.
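The injection step can be sketched as a pure function: take the prompt, find entity names, emit the linked memories as extra context. This assumes a Claude Code UserPromptSubmit-style hook whose stdout is appended to the prompt's context; the naive capitalized-word matcher and the in-memory index are stand-ins for the real rule set and database lookup:

```typescript
// Stand-in store: entity name -> memories linked to it.
// In the real system this is a database query.
const memoryIndex: Record<string, string[]> = {
  Kathy: ["Last contact: mentioned in an email thread."],
};

// Given a prompt, return the memory lines to inject as context.
function injectContext(prompt: string, index: Record<string, string[]>): string {
  // Naive entity extraction: any capitalized word that exists in the index.
  const hits = new Set(
    (prompt.match(/\b[A-Z][a-z]+\b/g) ?? []).filter(w => w in index)
  );
  return [...hits]
    .flatMap(name => index[name].map(m => `[memory:${name}] ${m}`))
    .join("\n");
}
```

If nothing matches, the function returns an empty string and the prompt goes through untouched, which keeps the hook invisible when it has nothing useful to add.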
Let me give you a concrete example, and this is a live demo. If I start a brand new clean chat and say “tell me about my friend Dan”, I happen to know I have multiple Dans in the system. If it works correctly, it should figure out that Dan is a person, hit the database, and look up the potential Dans. It found them, both of them, and asked which one I meant, because there are two and it doesn’t want to guess. One of my rules in the system is: don’t guess; if you’re not 100% sure, verify.
If I create another example, “when is the last time I talked to Kathy?” Again, I didn’t even give it a last name. Previously I would have had to say a bunch of things about that person. If we were in a chat session from earlier that day or even a couple of days ago, I’d have to kind of rebuild it all from scratch. This found that there’s only one Kathy entity, went and found the person file in the system which includes the last contact. So in this case it wasn’t the last time I talked to Kathy, it was that she was mentioned in an email. That’s a little tweaking I can do, because I did specify “talk to” versus her showing up anywhere in the system, but you get the drift.
So thinking about memory in terms of the longest-term context, those JSONL files, the useful caught memories, and then the ability to look up memories that have a high likelihood of being relevant to the session I just started. It doesn’t just do this at the top of a new session. It’ll do it anywhere in the session. It’s going to extract memories and keep doing it.
This is brand new, I literally built this in the last 24 hours. It’s pretty awesome. There’s even this little thing for when it extracts a memory but doesn’t know what entity in the system it maps to. In this case, these are partners we have: I have a file for Kent C. Dodds, who is a friend and a client, and it mentions Epic Web, but we don’t have a standalone Epic Web entity yet. I have to figure out what that means, whether I want one or not. This lets me ignore it or type ahead to resolve it.
It does the same thing with files. If I’m talking about a particular problem area, it can look at the last time I talked about that problem area or part of the app or feature, and instead of having to scan for all the files, it can look at what files we touched last time we worked on this, and save a bunch of time and tokens while doing it.
Again, this doesn’t even have a vector search component to it. I had a great call with my buddy Joel Hooks this morning and a bunch of his crew from Egghead. Joel has some cool open source stuff I’m going to try out for adding a vector layer to this, to see what that unlocks, because in my last two attempts I’m pretty sure I was just doing vector stuff wrong. We’ll see; that’ll be an update in the future.
So that is the evolution of a very powerful but also very simple text-based memory system, and the much more sophisticated one I’m evolving toward. My plan is to keep running them in parallel; at some point in the future I may merge them, or keep them separate if they generate notably different results, otherwise they’ll end up being sort of one and the same. I really do like the simplicity and the beauty of the flat file-based one. That’s my update for the day.