Matt Pocock

Full Walkthrough: Workflow for AI Coding

Matt Pocock's two-hour workshop: the full flow from idea through PRD, kanban, AFK loop, and QA. It covers the smart zone vs dumb zone, LLMs as the guy from Memento, the grill me skill, vertical slices (tracer bullets), TDD red-green-refactor, deep module architecture, enforcing coding standards via push vs pull, and parallel agents with sandcastle.

Thought chunks

Software fundamentals carry over to AI

When we talk about AI being a new paradigm, we forget that actually software engineering fundamentals, the stuff that's really crucial to working with humans, also works super well with AI.

Smart zone vs dumb zone — attention scales quadratically

When you're working with LLMs, they have a smart zone and a dumb zone. When you're first working with an LLM and you've just started a new conversation, you start from nothing. That's when the LLM is going to do its best work, because in that situation the attention relationships are the least strained. Every time you add a token to an LLM, it's kind of like you're adding a team to a football league. ... It just scales quadratically.

By around 40%, or I would say around 100k, is kind of my new marker for this, because it doesn't matter whether you're using a 1 million context window or 200k. It's always going to be about this. It starts to just get dumber.
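The football-league analogy can be made concrete with a little arithmetic. The sketch below is only an illustration of the analogy, not a model of any real LLM: `pairCount` is a hypothetical helper that counts how many distinct pairings exist among `n` items, which is the quantity that grows quadratically.

```typescript
// Number of distinct pairings among n items: n * (n - 1) / 2.
// In the football-league analogy, each pairing is one fixture.
function pairCount(n: number): number {
  return (n * (n - 1)) / 2;
}

console.log(pairCount(10)); // 45
console.log(pairCount(20)); // 190
```

Doubling the number of items from 10 to 20 roughly quadruples the pairings, which is why each extra token costs more than the one before it.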

Don't bite off more than you can chew — keep tasks in the smart zone

We kind of want to size our tasks in a way that sticks within the smart zone, right? We don't want the AI to bite off more than it can chew. And this goes back to old advice, like Martin Fowler in Refactoring, or like The Pragmatic Programmer talks about this. Don't bite off more than you can chew.

Multi-phase plans → phase N → Ralph Wiggum loop

A developer worth their salt will look at this and go, "This is a loop, right? This is a loop. We've just got phase one, phase two, phase three, phase four. Why don't we just have phase N, right?"

There's this idea called Ralph Wiggum, which is sort of based on this. Essentially, all you need to do is specify the end of the journey. We create a PRD, a product requirements document, to describe where we're going, and then we just say to the AI: make a small change, make a small change that gets us closer and closer to there.
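The "phase N" idea can be sketched as an ordinary loop. Everything here is a hypothetical stand-in for illustration: `Step`, `MakeSmallChange`, `runLoop`, and `fakeAgent` are invented names, and the fake agent simply pretends the work needs three small changes. The iteration cap is an added safety assumption, not something the talk specifies.

```typescript
// Hypothetical shapes for illustration only; not taken from any real tool.
interface Step {
  description: string;
  done: boolean; // true once the destination described in the PRD is reached
}

// One iteration = one small change toward the PRD.
type MakeSmallChange = (prd: string, iteration: number) => Step;

function runLoop(
  prd: string,
  makeSmallChange: MakeSmallChange,
  maxIterations: number,
): Step[] {
  const history: Step[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const step = makeSmallChange(prd, i);
    history.push(step);
    if (step.done) break; // destination reached, stop looping
  }
  return history;
}

// Stand-in for an agent: pretends the work needs three small changes.
const fakeAgent: MakeSmallChange = (_prd, iteration) => ({
  description: `small change #${iteration}`,
  done: iteration >= 3,
});

const steps = runLoop("Add a points system to lessons", fakeAgent, 10);
console.log(steps.length); // 3
```

The loop only needs to know the destination and whether it has been reached; the numbered phases disappear.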

LLMs are like the guy from Memento — clear beats compact

Another weird constraint of LLMs is that LLMs are kind of like the guy from Memento, right? They just continually forget. They just keep resetting back to the base state.

Devs love compacting for some reason, but I hate it. I much prefer my AI to behave like the guy from Memento because this state is always the same. Always the same. Every time you do it, you clear and you go back to the beginning. And so if you're able to do that and you're able to optimize for that, then you're in a great spot.

Watch your token count — status line is essential

There's this tiny little status line here that tells me how many tokens I'm using. The exact number of tokens I'm using. ... This is essential information on every coding session because you need to know exactly how many tokens you're using so that you know how close you are to the dumb zone. Absolutely essential.
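A status line of this kind boils down to a token count plus a threshold. The following is a minimal sketch, assuming the roughly 100k marker mentioned in the talk; `statusLine` and `SMART_ZONE_LIMIT` are hypothetical names, and this is not how any particular tool renders its status line.

```typescript
// Assumed marker: the ~100k figure from the talk, independent of window size.
const SMART_ZONE_LIMIT = 100_000;

function statusLine(tokensUsed: number, contextWindow: number): string {
  const percent = Math.round((tokensUsed / contextWindow) * 100);
  const zone = tokensUsed < SMART_ZONE_LIMIT ? "smart zone" : "dumb zone";
  return `${tokensUsed} tokens (${percent}% of window) | ${zone}`;
}

console.log(statusLine(40_000, 200_000)); // "40000 tokens (20% of window) | smart zone"
console.log(statusLine(120_000, 1_000_000)); // "120000 tokens (12% of window) | dumb zone"
```

Note that the second session has used only 12% of a 1M window yet is already past the marker, which is the point of tracking the absolute count rather than the percentage.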

Specs-to-code is vibe coding by another name

The specs-to-code movement ... if there's something wrong with the resulting code, you don't look at the code. You look back at the specs, you change the specs, and you sort of just keep going like this. This is kind of like vibe coding by another name, where you're essentially ignoring the code.

I tried this. I really tried it and it sucks. It doesn't work because you need to keep a handle on the code. You need to understand what's in it. You need to shape it because the code is your battleground.

Grill me skill — adversary that pings ideas until alignment

Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies one by one. For each question, provide your recommended answer.

What I found was that I was really trying to find the words for what I wanted instead of that. ... When you're working on something new with someone, when you're all trying to build something together, there's this idea that's shared between all participants, and that is the design concept. That's what I realized I needed with Claude. I needed to reach a shared understanding. I didn't need an asset, I didn't need a plan. I needed to be on the same wavelength as the AI.

Sub agents — delegated isolated context

This explore sub agent has essentially gone and called another LLM, which has an isolated context window, and then that LLM has reported a summary back. So a sub agent is kind of like a delegation. You're delegating a task to a sub agent. It goes off, eagerly does the whole thing, explores a ton of stuff, and then just drip feeds the important stuff back up to the orchestrator agent, to the parent agent.

Grilling reveals questions you never considered

Should points be retroactive? There are existing lesson progress records. ... This is a really nasty question, right? Should we actually go back and backfill all of the lesson progress events? This is the kind of question that you need to be aligned on if you're going to fulfill the feature properly. This is not something I considered and Sarah Chen certainly didn't consider.

Two task types — human in the loop vs AFK

I think of there as being two types of tasks in the AI age. You have human-in-the-loop tasks, where a human needs to sit there and do it, which is this: we are the human in the loop, with multiple humans in the loop. And there are AFK tasks ... where the human can be away from the keyboard and it doesn't matter. Implementation, as we'll see, can be turned into an AFK task, but planning, this alignment phase, has to be human in the loop. Has to be.

Own your stack — beware overusing frameworks

I personally believe that at this stage, when there's no clear winner, when there's no one true way, and when things are changing all the time, you need to own as much of your planning stack as you possibly can. ... They tend to overuse a certain stack. They get into trouble, and because they don't own the stack and they don't have observability over the whole thing, they just go, "This isn't working. This sucks."

Pair programming with AI — domain expert in the room

If you have a question about implementation, it should be you, a fellow developer and the AI in the same room. ... I think the really crucial decisions, the ones you need humans for, you actually need a lot of humans, and it doesn't really matter how many humans are in there. You can actually throw a bunch in, kind of like mob programming with AI, essentially.

Two essential documents — destination and journey

I think of there as being two essential documents that we need. We need a document that documents the destination. ... We need something to document the destination and we need something to document the journey.

Don't review the PRD — you already share the design concept

Should I be reviewing this document? ... I don't look at these. The reason I don't look at these is because what am I testing at this point? ... I know that LLMs are great at summarization, because they're really good at summarization. I have reached the same wavelength as the LLM, right? Using the grill me skill, we have a shared design concept. So if I have a shared design concept, all I'm doing is essentially checking the LLM's ability to summarize.

1M context window is more dumb zone, not smarter coding

What Claude Code did is they essentially just did this. They shipped a lot more dumb zone to you, essentially. Now, this is good for tasks where you want to retrieve things from a large context window. ... It's good for retrieval. It's less good for coding. So, I consider that about 100K is the smart zone at the moment.

Kanban board over multi-phase plan — parallelizable

I like creating a kanban board out of this. ... A kanban board is essentially just a set of tickets that you put on the wall that have blocking relationships to each other.

I prefer a kanban board set up like this to a sequential plan because a sequential plan can really only be picked up by one agent. ... only one agent can work on these because we have numbered phases and they're not parallelizable.
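Tickets with blocking relationships form a small dependency graph, and the tickets with no unfinished blockers are the ones that several agents can pick up at once. The sketch below assumes a hypothetical `Ticket` shape and invented example tickets; it is not the format used in the workshop.

```typescript
// Hypothetical ticket shape for illustration.
interface Ticket {
  id: string;
  title: string;
  blockedBy: string[]; // ids of tickets that must be finished first
}

// Tickets an agent could pick up right now: not done, and every blocker is done.
function readyTickets(board: Ticket[], done: Set<string>): Ticket[] {
  return board.filter(
    (t) => !done.has(t.id) && t.blockedBy.every((b) => done.has(b)),
  );
}

const board: Ticket[] = [
  { id: "T1", title: "Award points when a lesson is completed", blockedBy: [] },
  { id: "T2", title: "Show the points total on the dashboard", blockedBy: ["T1"] },
  { id: "T3", title: "Show a streak counter", blockedBy: ["T1"] },
  { id: "T4", title: "Backfill points for existing progress", blockedBy: ["T1"] },
];

console.log(readyTickets(board, new Set<string>()).map((t) => t.id)); // ["T1"]
console.log(readyTickets(board, new Set<string>(["T1"])).map((t) => t.id)); // ["T2", "T3", "T4"]
```

Once T1 is finished, three tickets become available together, which a numbered sequential plan could not express.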

Tracer bullets / vertical slices — feedback before phase three

AI loves to code horizontally. So it loves to code layer by layer. So in other words, in phase one, it will do all of the database stuff, all of the schema, ... Then it will go into phase two and do all of the API stuff. Then it will add the front end on top of that.

You don't get feedback on your work until you've really started or completed phase three. ... You haven't got an integrated system that you can test against. And so instead you need to think about vertical layers. You need to think about thin slices of functionality that cross all of the layers that you need to.

With tracer bullets, they attach a tiny bit of phosphorescence or phosphor or something to make it glow as it goes. So, this means that every sixth bullet or something, you actually see a line in the sky. So, you have feedback on where you're aiming.

The day shift / night shift — humans plan, AI implements AFK

We can think of this as kind of like the day shift and the night shift. This is the day shift for the human, right? Planning everything, getting all the stuff ready, and then once we kick it over to the night shift, the AI can just work AFK.

Ralph loop — bash script over a backlog of issues

We grab all of the issues, which are inside markdown files, and we cat them into a local variable. So that issues variable contains all of the issues that are in our entire backlog. Then we grab the last five commits. ... and then we grab the prompt and we just run Claude Code with permission mode accept edits and then essentially just pass it all of the information.
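The original is a bash script; the sketch below only mirrors the data assembly it describes, in TypeScript, and does not invoke any CLI. `LoopInputs` and `buildAgentInput` are hypothetical names, the section headings in the output are an assumption, and the inputs are plain strings standing in for the issue files, the git log, and the prompt file.

```typescript
// Hypothetical inputs: in the talk these come from markdown issue files,
// the git log, and a prompt file. Here they are plain strings.
interface LoopInputs {
  issues: string[]; // contents of every issue file in the backlog
  recentCommits: string[]; // commit subjects, newest first
  prompt: string; // standing instructions for the agent
}

// Assemble everything the agent needs into one input for a fresh session.
function buildAgentInput(inputs: LoopInputs): string {
  const lastFive = inputs.recentCommits.slice(0, 5);
  return [
    "## Backlog",
    inputs.issues.join("\n\n"),
    "## Last five commits",
    lastFive.join("\n"),
    "## Instructions",
    inputs.prompt,
  ].join("\n\n");
}

const agentInput = buildAgentInput({
  issues: ["# Issue 1\nAward points when a lesson is completed."],
  recentCommits: ["fix typo", "add points table"],
  prompt: "Pick one unblocked issue, implement it, and commit.",
});
console.log(agentInput);
```

Because the input is rebuilt from the backlog and the commit log on every iteration, each run starts from the same cleared state instead of a compacted conversation.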

Bad codebases make bad agents

I believe that code is very important, and this is kind of going to feed through the whole session, and that bad codebases make bad agents. If you have a garbage codebase, you're going to get garbage out of the agent that's working in that codebase.

Review in a fresh context — implementer in dumb zone, reviewer in smart zone

Let's say you have an implementation that's used up a bunch of tokens in the smart zone. If you get it to try to do its reviewing, it's going to be doing the reviewing in the dumb zone. And so the reviewer will be dumber than the thing that actually implemented it.

Whereas, if you clear the context, then you're essentially going to be able to just review in the smart zone, which is where you want to be.

TDD red-green-refactor — instrument the code before writing it

What it's essentially doing is something called red-green-refactor. ... So what it's doing is it's writing a failing test first. So it's saying, okay, I've broken down the idea of what I'm doing and I'm just going to write a single test that fails, and then I need to make the implementation pass.

AI tends to try to cheat at the tests because it's sort of doing it in layers. It will do the entire implementation and then it will do the entire test layer just below it. ... Using this technique, it generally is a lot harder to cheat because it's sort of instrumenting the code before it's then writing the code.
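One red-green cycle can be sketched without a test framework. The feature here (ten points per completed lesson) and all names are hypothetical, chosen only to illustrate the ordering: the test exists first, a stub proves the test can fail, and the implementation comes last.

```typescript
// Hypothetical feature for illustration: points awarded for completed lessons.

// Step 1 (red): the test is written first and pins down the behaviour
// before any real implementation exists.
function testAwardsTenPointsPerLesson(
  pointsFor: (completedLessons: number) => number,
): boolean {
  return pointsFor(0) === 0 && pointsFor(3) === 30;
}

// A stub makes the test fail, which proves the test is able to fail.
const stub = (_completedLessons: number): number => 0;

// Step 2 (green): the smallest implementation that passes.
const pointsFor = (completedLessons: number): number => completedLessons * 10;

console.log(testAwardsTenPointsPerLesson(stub)); // false (red)
console.log(testAwardsTenPointsPerLesson(pointsFor)); // true (green)
// Step 3 (refactor): tidy the implementation while the test stays green.
```

Since the test was seen failing before the implementation was written, it cannot have been shaped to fit whatever the implementation happens to do.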

Feedback loops are the ceiling of AI quality

If your codebase doesn't have feedback loops, you're never ever ever going to get decent output out of AI, and often what you'll find is that the quality of your feedback loops influences how well your AI can code. Essentially, that is the ceiling. So, if you're getting bad outputs from your AI, you often need to increase the quality of your feedback loops.

QA imposes taste back onto the codebase

QA is how I then impose my opinions back onto the codebase, how I impose my taste. What you'll often find is that there are teams out there who are trying to automate everything, like every part of this process ... if you try to automate the creation of the idea, automate the QA, automate the research, automate the prototype, you end up with apps that I feel just lack taste and are bad.

You need a human touch when you're building this stuff because without that you just end up with slop and we are not producing slop here.

Doc rot — why I delete the PRD after shipping

Let's say a month later, we want some edits to the gamification system. And we go in with Claude and it finds this old PRD and says, "Yes, I found the original documentation for the PRD system." Well, it turns out that the actual code has changed so much from the original PRD that it's almost unrecognizable. ... This is doc rot, where the documentation for something is rotting away in your repo and influencing Claude badly, or Claude agents badly. So I tend to not keep it around.

Don't over-optimize the plan — invest the work in QA

Should you then try to optimize and optimize and optimize that PRD until it's the most perfect PRD you can possibly imagine? I don't think there's a lot of value in that, because I think the journey is really just sort of a hint of where you want to go, and the place that you need to be putting the work is in QA.

Push vs pull — coding standards strategy

Push is where you push instructions to the LLM. So you say, okay, if you put something in CLAUDE.md, like "talk like a pirate", that instruction is always going to be sent to the agent, right? So that is a push action. You're pushing tokens to it. Pull is where you give the agent an opportunity to pull more information.

When you have an implementer ... then you want the coding standards to be available via pull. If it has a question, you want it to be able to sort of answer it. But if you then have an automated reviewer afterwards, then you want it to push. You want to push that information to the reviewer.
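The difference between the two strategies comes down to where the standards live: inside the prompt, or behind a lookup the agent may call. The sketch below is an assumption-laden illustration; `Standard`, `AgentContext`, the two builder functions, and the example rules are all hypothetical and not part of any real agent framework.

```typescript
// Hypothetical standards and shapes, for illustration only.
interface Standard {
  topic: string;
  rule: string;
}

interface AgentContext {
  prompt: string; // always sent to the agent: push
  lookup: (topic: string) => string | undefined; // called only on demand: pull
}

const standards: Standard[] = [
  { topic: "naming", rule: "Use camelCase for functions." },
  { topic: "errors", rule: "Never swallow errors silently." },
];

// Implementer: standards stay out of the prompt and are pulled when needed.
function implementerContext(task: string): AgentContext {
  return {
    prompt: `Task: ${task}`,
    lookup: (topic) => standards.find((s) => s.topic === topic)?.rule,
  };
}

// Reviewer: every standard is pushed into the prompt up front.
function reviewerContext(diff: string): AgentContext {
  const rules = standards.map((s) => `- ${s.rule}`).join("\n");
  return {
    prompt: `Review this diff against these standards:\n${rules}\n\n${diff}`,
    lookup: () => undefined,
  };
}

console.log(implementerContext("add points").lookup("naming")); // "Use camelCase for functions."
console.log(reviewerContext("+ const x = 1;").prompt);
```

The implementer spends tokens on standards only when it has a question, while the reviewer always checks against the full list.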

Sand Castle — parallel agents in worktrees and Docker sandboxes

This thing called Sand Castle. ... a TypeScript library for running these loops. So you have a run function that creates a worktree, sandboxes it in a Docker container, and then allows you to run a prompt inside there. And in that worktree, it's just a git branch, and you have that code and you can then merge it later.

We have here, first of all, a planner that has a plan prompt that looks at the backlog and chooses a certain number of issues to work on in parallel. ... I'm actually using Sonnet for implementation and Opus for reviewing, because I consider that for reviewing I need the smarts.
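The planner, implementer, and reviewer roles can be laid out as plain data. This sketch is NOT Sand Castle's API: `Issue`, `AgentRun`, `planRuns`, and the branch naming are hypothetical, and it only records which model would handle which role on which branch, following the Sonnet-for-implementation, Opus-for-review split described above.

```typescript
// Hypothetical types; this is not Sand Castle's API, only the shape of the flow.
type Model = "sonnet" | "opus";

interface Issue {
  id: number;
  title: string;
}

interface AgentRun {
  issueId: number;
  branch: string; // one branch (worktree) per issue, merged later
  role: "implement" | "review";
  model: Model;
}

// Planner: choose up to maxParallel issues, then schedule an implementer
// and a reviewer for each on its own branch.
function planRuns(backlog: Issue[], maxParallel: number): AgentRun[] {
  const chosen = backlog.slice(0, maxParallel);
  const runs: AgentRun[] = [];
  for (const issue of chosen) {
    const branch = `issue-${issue.id}`;
    runs.push({ issueId: issue.id, branch, role: "implement", model: "sonnet" });
    runs.push({ issueId: issue.id, branch, role: "review", model: "opus" });
  }
  return runs;
}

const backlog: Issue[] = [
  { id: 1, title: "Award points" },
  { id: 2, title: "Show points total" },
  { id: 3, title: "Streak counter" },
];
console.log(planRuns(backlog, 2));
```

Each issue gets its own branch, so the implementers do not touch each other's working copies and the results can be merged independently.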

So how do you turn a codebase that looks like this into a codebase that looks like that? Well, I've got a skill for that. Improve codebase architecture. ... You just sort of explore the codebase, look for opportunities where there's code that's kind of related, and wrap all of that in a deep module.

One really cool thing that it found here is that part of my course video manager app is a video editor. ... I wanted a way that I could wrap the entire front end all the way to the back end in a single big module, so that I could test the fact that I press something on the front end and it goes all the way to the back end. ... And it meant that AI could see the entire flow, could act on the entire flow and test on the entire flow. And honestly, it was just night and day in terms of the ability of AI to actually make changes.

Working harder, knowing your codebase less

Raise your hands if you feel like you're working harder than ever before with AI. Yeah. Raise your hands if you feel like you know your codebase less well than you used to. Yeah. This is a real thing. Because we're moving fast, because we're delegating more things, we end up losing a sense of our codebase. And if we lose the sense of our codebase, we're not going to be able to improve it.

Gray boxes — keep the shape, delegate the inside

What I think you should do is design the interface for these modules, but then delegate the implementation. In other words, these modules can become like gray boxes where you just need to know the shape of them. You need to know what they do and sort of how they behave, but you can delegate the implementation of those modules. ... I've got a big overview of my codebase and I understand kind of the shapes inside it, understand what the interfaces all do, but I can delegate what's inside.
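In TypeScript terms, the shape you keep in your head is an interface, and the delegated inside is any class that satisfies it. The `PointsLedger` interface and its in-memory implementation below are hypothetical examples, not code from the workshop.

```typescript
// The part you design and keep in your head: a small interface.
interface PointsLedger {
  award(userId: string, points: number): void;
  totalFor(userId: string): number;
}

// The part you can delegate: any implementation that honours the interface.
class InMemoryPointsLedger implements PointsLedger {
  private readonly totals = new Map<string, number>();

  award(userId: string, points: number): void {
    this.totals.set(userId, this.totalFor(userId) + points);
  }

  totalFor(userId: string): number {
    return this.totals.get(userId) ?? 0;
  }
}

const ledger: PointsLedger = new InMemoryPointsLedger();
ledger.award("u1", 10);
ledger.award("u1", 5);
console.log(ledger.totalFor("u1")); // 15
```

Callers depend only on `PointsLedger`, so the implementation can be rewritten by an agent without changing what you need to know about the module.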

Front-end is multimodal — use throwaway prototypes

Front end is multimodal. And so my experience with trying to plug AI into, let's say, agent browser or Playwright MCP to give it ... it's not very good at that yet and it can't create a nice front end in a mature codebase. ... What it can do is you say, okay, I want some ideas on how this front end might look. Give me three prototypes that I can click between in a throwaway route, so that I can decide which one looks best. And you take the asset of that prototype and you then feed it back into the grilling session.

More code review than ever — there's no avoiding it

Raise your hand if you feel like you're doing more code review now than you used to. Yeah, definitely. I don't think there's a way to avoid this. If we delegate all of our coding to agents, you notice that the implementation here is really the only AFK bit. We then also need to QA the work and code review the work, right?

Buy the old books

What I recommend, if you take one thing away from this session, is that you should head to Amazon and just buy a ton of those old books, because I just found it so enlightening reading them. You know, pre-AI writing is always really fun to read anyway, and on every single page I found that there was something useful and something interesting to read.
