I sent Andy three screenshots at the same time.
One was a relationship-refresh job throwing validation errors. One was the task panel in my dashboard showing stale data. One was a health monitor alert I couldn’t explain. Three different systems, three different symptoms, and I figured I was looking at three different problems.
I was looking at one problem wearing three disguises.
That’s the failure mode I worry about most: the silent one.
For my most important workflows, I run a validation layer underneath. The model can report that something finished without the underlying tool calls actually succeeding. So I have health monitoring that watches transcripts, not summaries — checking that the right tools fired, not just that the output looked clean.
Those checks are what surfaced the screenshots.
Pulling the first thread
The relationship-refresh job was the obvious place to start. Health monitoring had flagged it — the job was failing validation because expected TodoWrite tool calls weren’t showing up in the session transcript. Andy traced it to the source in about two minutes: the validator was checking for TodoWrite, but Claude had silently switched to TaskCreate and TaskUpdate instead.
We checked the changelog. TodoWrite was disabled by default as of Claude Code v2.1.142, released May 14. No announcement I noticed. No deprecation warning in my sessions. One day the tool name changed and everything downstream of it kept running — just not correctly.
Which meant every workflow that validated against TodoWrite was checking for a tool call that would never show up again. The tool names evolved and the validators didn’t keep up.
That explained one screenshot. Two more still broken on my screen.
The hooks went dead
I had PreToolUse and PostToolUse hooks keyed to TodoWrite. Those hooks enforced guard rails on task state changes — preventing accidental overwrites, logging task transitions. The moment the tool name changed underneath them, they stopped firing.
Not because they broke. Because the new task tools bypass PreToolUse and PostToolUse hooks entirely. TodoWrite triggered them correctly. TaskCreate and TaskUpdate don’t.
Which meant anything I’d automated around task lifecycle was quietly doing nothing. The hooks were still configured. They looked fine in the settings file. They just never ran.
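One way to catch a configured-but-dead hook is to compare hook matchers against the tool names that actually appear in recent transcripts. The sketch below is hypothetical, not the check from this story. It assumes the hook config is a dict mapping an event name to a list of entries with a "matcher" string, and that a matcher can be treated as a regular expression over tool names. Both are assumptions about the settings layout, so adjust them to whatever your settings file really contains.

```python
import re


def stale_matchers(hooks_config: dict, observed_tools: set) -> list:
    """Return (event, matcher) pairs whose matcher hits no observed tool name.

    Assumed layout (hypothetical): {"PreToolUse": [{"matcher": "TodoWrite"}, ...]}
    """
    stale = []
    for event in ("PreToolUse", "PostToolUse"):
        for entry in hooks_config.get(event, []):
            matcher = entry.get("matcher", "")
            if not matcher:
                # Assumption: an empty matcher applies to every tool.
                continue
            pattern = re.compile(matcher)
            if not any(pattern.fullmatch(name) for name in observed_tools):
                stale.append((event, matcher))
    return stale


config = {
    "PreToolUse": [{"matcher": "TodoWrite"}],
    "PostToolUse": [{"matcher": "TodoWrite|Bash"}],
}
observed = {"TaskCreate", "TaskUpdate", "Bash"}
print(stale_matchers(config, observed))
```

This only flags matchers that point at a tool name nobody calls anymore. It cannot tell you whether a tool that does appear in the transcript actually triggered its hook. For that, the hook itself would need to write a log line you can check for.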
The IDs changed shape
The third break surfaced while fixing the task panel. The SSE pipeline that feeds the dashboard UI was choking on task data, and the reason turned out to be two-fold: the field name for referencing a task changed from id to taskId, and the ID format changed from UUIDs to sequential numbers.
My state-tracking code expected UUID format. So tasks would get created, the ID would come back as "1" instead of a UUID, the tracker wouldn’t recognize it, and every subsequent update would fail to find the task. Silently.
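A minimal sketch of a tracker that survives this kind of change, assuming tool payloads arrive as plain dicts. The names extract_task_id and TaskTracker are hypothetical. The idea is to accept either field name, accept either ID shape, and fail loudly on anything else instead of dropping the update.

```python
import uuid


def extract_task_id(payload: dict) -> str:
    """Return the task ID as a string, from either `taskId` or `id`."""
    raw = payload.get("taskId", payload.get("id"))
    if raw is None:
        raise KeyError("payload has neither 'taskId' nor 'id'")
    task_id = str(raw).strip()
    # Sequential numeric IDs such as "1" are kept as-is.
    if task_id.isascii() and task_id.isdigit():
        return task_id
    # Otherwise it must parse as a UUID; normalize to canonical form.
    try:
        return str(uuid.UUID(task_id))
    except ValueError:
        raise ValueError(f"unrecognized task ID format: {task_id!r}") from None


class TaskTracker:
    """Tracks task status by normalized ID and refuses unknown tasks."""

    def __init__(self) -> None:
        self._status = {}

    def on_create(self, payload: dict) -> str:
        task_id = extract_task_id(payload)
        self._status[task_id] = "pending"
        return task_id

    def on_update(self, payload: dict, status: str) -> None:
        task_id = extract_task_id(payload)
        if task_id not in self._status:
            # Raise instead of skipping, so a mismatch is never silent.
            raise LookupError(f"update for unknown task {task_id!r}")
        self._status[task_id] = status

    def status(self, task_id: str) -> str:
        return self._status[task_id]
```

The design choice that matters is the two raise statements. A tracker that ignores IDs it does not recognize is exactly how updates go missing without anyone noticing.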
Three different failure modes. Same root cause: a tool got renamed and nobody told the ecosystem around it. Tasks appear to create. Updates appear to run. Hooks appear configured. Nothing actually works.
If I’d been checking by reading the session summary, I would have missed all of it. The model said everything was done. The health monitor — three screenshots landing at the same time — told a different story.
Freestyle needs a spotter
Some of my workflows are fully scripted. Every step on rails, predictable inputs, predictable outputs. That works when the task is the same every time.
The more interesting workflows can’t be scripted. They depend on context that changes between runs — what came in overnight, what state a project is in, what needs judgment. The model builds the plan as it goes.
I think of it like Harry Mack. His freestyle looks like pure spontaneity. From the inside, he’s working within constraints — rhyme schemes, callback patterns, structures he’s internalized so deeply they vanish from view.
That’s the approach I use for my most important workflows. Let it freestyle, but put a validation layer underneath so I can tell the difference between “ran” and “ran correctly.”
Without that validation layer, three things would have broken and I wouldn’t have known for days. Maybe weeks.
The pattern that caught it
Tasks as declared intent. Before work starts, create a task for every expected step. Mark each one in_progress before starting it, not after. Mark it completed only when the step is actually verified, not just called. If a task stays pending when it should have progressed, that’s your signal.
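The rules above can be sketched as a small state machine. This is an illustration under assumptions, not the task tooling itself: DeclaredPlan and its methods are hypothetical names, and the verified flag stands in for whatever check proves a step really worked.

```python
from dataclasses import dataclass, field

# Allowed transitions: pending -> in_progress -> completed, nothing else.
ALLOWED = {
    "pending": {"in_progress"},
    "in_progress": {"completed"},
    "completed": set(),
}


@dataclass
class DeclaredPlan:
    steps: list
    status: dict = field(init=False)

    def __post_init__(self) -> None:
        # Declare every expected step up front, before any work starts.
        self.status = {step: "pending" for step in self.steps}

    def start(self, step: str) -> None:
        self._move(step, "in_progress")

    def complete(self, step: str, verified: bool) -> None:
        if not verified:
            raise ValueError(f"{step!r} was called but not verified")
        self._move(step, "completed")

    def _move(self, step: str, new_status: str) -> None:
        current = self.status[step]
        if new_status not in ALLOWED[current]:
            raise ValueError(f"{step!r}: cannot go from {current} to {new_status}")
        self.status[step] = new_status

    def unfinished(self) -> list:
        """Steps that never reached completed, in declared order."""
        return [s for s in self.steps if self.status[s] != "completed"]
```

After a run, anything returned by unfinished() is a step that was declared but never verified, which is the signal worth alerting on.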
Transcript validation as the receipt. After execution, scan the session transcript for the raw tool calls. Not the summary the model wrote. The actual calls. Define a set of tools that should always show up if the workflow ran correctly. If one’s missing, flag it.
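A minimal sketch of that scan, assuming the transcript is JSON Lines where each record may carry a message whose content is a list of blocks, and a tool call is a block with type "tool_use" and a name. That layout is an assumption, so check it against your own transcripts. The function names are hypothetical.

```python
import json


def tool_calls(lines) -> list:
    """Collect tool names from raw transcript lines, in order of appearance."""
    names = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                name = block.get("name")
                if isinstance(name, str):
                    names.append(name)
    return names


def missing_tools(lines, required: set) -> list:
    """Return required tool names that never appear in the transcript."""
    return sorted(required - set(tool_calls(lines)))


transcript = [
    '{"message": {"content": [{"type": "text", "text": "All done."}]}}',
    '{"message": {"content": [{"type": "tool_use", "name": "TaskCreate"}]}}',
    '{"message": {"content": [{"type": "tool_use", "name": "TaskUpdate"}]}}',
]
print(missing_tools(transcript, {"TodoWrite"}))
```

With a required set of TodoWrite, this transcript gets flagged even though the text block claims everything finished. Updating the required set to TaskCreate and TaskUpdate clears the flag.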
Tasks tell you it’s running. Transcript validation tells you it ran correctly. This bug had tasks that said complete — the transcript proved they weren’t.
If you’re still on TodoWrite
The migration checklist:
- Replace TodoWrite with TaskCreate + TaskUpdate
- In TaskUpdate, the required field is taskId, not id
- Task IDs are now sequential numbers, not UUIDs; update any state-tracking code
- Remap hooks from TodoWrite to the four new tool names
- Test end-to-end: create a task, update it, verify the ID you got back matches what your tracking code expects
There’s a temporary escape hatch: CLAUDE_CODE_ENABLE_TASKS=0 re-enables TodoWrite. Use it to buy time, not as a permanent fix.
The real lesson isn’t about this particular migration. Upstream tools change under you. They always will. The question is whether you find out when it happens or three weeks later when the damage has compounded. Silent failures don’t get louder on their own. You need something that’s watching.