Simon Willison, 2025-03-11

How to write code with LLMs


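A practical guide from Simon Willison, who has been producing code with LLMs for more than two years. It covers context management, training cut-offs, instructing the model like an intern, responsibility for testing, iterating through conversation, and safe sandboxes. The core is captured in two phrases: 'context is king' and 'LLMs amplify existing expertise'.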


Chunks of thought

Using LLMs for code is difficult and unintuitive

Online discussions about using Large Language Models to help write code inevitably produce comments from developers whose experiences have been disappointing.

Using LLMs to write code is difficult and unintuitive. It takes significant effort to figure out the sharp and soft edges of using them in this way, and there's precious little guidance to help people figure out how best to apply them.

If someone tells you that coding with LLMs is easy they are (probably unintentionally) misleading you. They may well have stumbled on to patterns that work, but those patterns do not come naturally to everyone.

The mental model of an over-confident pair programmer

Ignore the "AGI" hype—LLMs are still fancy autocomplete. All they do is predict a sequence of tokens—but it turns out writing code is mostly about stringing tokens together in the right order, so they can be extremely useful for this provided you point them in the right direction.

My current favorite mental model is to think of them as an over-confident pair programming assistant who's lightning fast at looking things up, can churn out relevant examples at a moment's notice and can execute on tedious tasks without complaint.

Over-confident is important. They'll absolutely make mistakes—sometimes subtle, sometimes huge. These mistakes can be deeply inhuman—if a human collaborator hallucinated a non-existent library or method you would instantly lose trust in them.

Don't fall into the trap of anthropomorphizing LLMs and assuming that failures which would discredit a human should discredit the machine in the same way.

Training cut-off: a constraint on library choice

A crucial characteristic of any model is its training cut-off date. This is the date at which the data it was trained on stopped being collected.

This is extremely important for code, because it influences what libraries they will be familiar with. If the library you are using had a major breaking change since October 2023, some OpenAI models won't know about it!

I gain enough value from LLMs that I now deliberately consider this when picking a library—I try to stick with libraries with good stability and that are popular enough that many examples of them will have made it into the training data. I like applying the principles of boring technology—innovate on your project's unique selling points, stick with tried and tested solutions for everything else.

Context is king

Most of the craft of getting good results out of an LLM comes down to managing its context—the text that is part of your current conversation.

This context isn't just the prompt that you have fed it: successful LLM interactions usually take the form of conversations, and the context consists of every message from you and every reply from the LLM that exist in the current conversation thread.

When you start a new conversation you reset that context back to zero. This is important to know, as often the fix for a conversation that has stopped being useful is to wipe the slate clean and start again.
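The idea that the context is the whole thread, and that a new conversation starts from nothing, can be sketched as a small data structure. This is an illustrative model only: the `Conversation` class and its method names are assumptions made for this example, not the API of any particular LLM tool.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical model of a chat thread: the context is every message so far."""

    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        # Both your prompts and the model's replies become part of the context.
        self.messages.append({"role": role, "content": content})

    def context(self) -> str:
        # What the model is given on the next turn: the whole thread, in order.
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)


thread = Conversation()
thread.add("user", "Write a function that parses ISO dates.")
thread.add("assistant", "def parse_date(text): ...")
thread.add("user", "Use the standard library only.")

# Starting a new conversation resets the context back to zero.
fresh = Conversation()
```

Here `thread.context()` contains all three messages, while `fresh.context()` is empty, which is the "clean slate" described above.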

One of the reasons I mostly work directly with the ChatGPT and Claude web or app interfaces is that it makes it easier for me to understand exactly what is going into the context. LLM tools that obscure that context from me are less effective.

One of my favorite code prompting techniques is to drop in several full examples relating to something I want to build, then prompt the LLM to use them as inspiration for a new project.
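A minimal sketch of that technique, assuming the examples are already available as strings. The `build_prompt` helper, the delimiter format, and the file names are all hypothetical choices for illustration; any clear way of separating the examples from the request would likely work.

```python
def build_prompt(examples: dict[str, str], task: str) -> str:
    """Join full example files and a task description into one prompt string."""
    parts = []
    for name, source in examples.items():
        # Label each example so it is clear where one ends and the next begins.
        parts.append(f"--- Example: {name} ---\n{source.strip()}\n")
    parts.append("--- Task ---")
    parts.append(f"Using the examples above as inspiration, {task}")
    return "\n".join(parts)


# Hypothetical file names and placeholder contents, for illustration only.
prompt = build_prompt(
    {
        "ocr.html": "<script>/* full source of a working OCR page */</script>",
        "pdf_viewer.html": "<script>/* full source of a working PDF viewer */</script>",
    },
    "build a single HTML page that runs OCR on each page of a PDF.",
)
```

The resulting `prompt` string is what would be pasted into a fresh conversation, so everything the model needs is visible in one place.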

Ask them for options: the initial research phase

Most of my projects start with some open questions: is the thing I'm trying to do possible? What are the potential ways I could implement it? Which of those options is the best?

I use LLMs as part of this initial research phase.

The training cut-off is relevant here, since it means newer libraries won't be suggested. Usually that's OK—I don't want the latest, I want the most stable and the one that has been around for long enough for the bugs to be ironed out.

The best way to start any project is with a prototype that proves that the key requirements of that project can be met. I often find that an LLM can get me to that working prototype within a few minutes of me sitting down with my laptop—or sometimes even while working on my phone.

Use it as a digital intern: instructing with function signatures

Once I've completed the initial research I change modes dramatically. For production code my LLM usage is much more authoritarian: I treat it like a digital intern, hired to type code for me based on my detailed instructions.

I find LLMs respond extremely well to function signatures in a prompt. I get to act as the function designer, and the LLM does the work of building the body to my specification.

If your reaction to this is "surely typing out the code is faster than typing out an English instruction of it", all I can tell you is that it really isn't for me any more. Code needs to be correct. English has enormous room for shortcuts, and vagaries, and typos, and saying things like "use that popular HTTP library" if you can't remember the name off the top of your head.

The good coding LLMs are excellent at filling in the gaps. They're also much less lazy than me—they'll remember to catch likely exceptions, add accurate docstrings, and annotate code with the relevant types.
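To make the signature-led style concrete, here is a sketch of what such a prompt might look like. It is not the author's original prompt: the `parse_duration` signature, the spec text, and the `signature_prompt` helper are all hypothetical, chosen only to show a human-designed signature paired with a loose English description.

```python
import textwrap

# Hypothetical signature, chosen only for illustration.
SIGNATURE = "def parse_duration(text: str, default_unit: str = 's') -> float:"

# Plain-English spec: shortcuts and loose wording are fine here.
SPEC = """
Given a string such as '90', '1.5m' or '2h', return the duration in seconds.
A bare number uses default_unit. Supported units are s, m and h.
Raise ValueError for an empty string or an unknown unit.
"""


def signature_prompt(signature: str, spec: str) -> str:
    """Combine a function signature and a plain-English spec into one prompt."""
    return (
        "Write a Python function with this signature:\n\n"
        f"    {signature}\n\n"
        f"{textwrap.dedent(spec).strip()}\n"
    )


prompt = signature_prompt(SIGNATURE, SPEC)
```

The signature pins down the name, parameters, defaults and return type, so the English part only has to describe behavior.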

Testing cannot be outsourced

The one thing you absolutely cannot outsource to the machine is testing that the code actually works.

Your responsibility as a software developer is to deliver working systems. If you haven't seen it run, it's not a working system. You need to invest in strengthening those manual QA habits.

This may not be glamorous but it's always been a critical part of shipping good code, with or without the involvement of LLMs.
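One lightweight way to "see it run" is to exercise the code yourself on a few inputs, including the failure paths, before trusting it. In the sketch below, `parse_duration` is a hypothetical stand-in for code an LLM handed back, and `check_parse_duration` is an assumed name for the checks a developer might write. It complements manual QA and does not replace it.

```python
def parse_duration(text: str, default_unit: str = "s") -> float:
    """Stand-in for LLM-written code: convert '90', '1.5m' or '2h' to seconds."""
    units = {"s": 1.0, "m": 60.0, "h": 3600.0}
    text = text.strip()
    if not text:
        raise ValueError("empty duration")
    unit = default_unit
    if text[-1].isalpha():
        # A trailing letter is treated as the unit; the rest is the number.
        text, unit = text[:-1], text[-1]
    if unit not in units:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(text) * units[unit]


def check_parse_duration() -> None:
    # Run the code and look at what it actually does, not what it claims to do.
    assert parse_duration("90") == 90.0
    assert parse_duration("1.5m") == 90.0
    assert parse_duration("2h") == 7200.0
    for bad in ("", "5d"):
        try:
            parse_duration(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


check_parse_duration()
```

If `check_parse_duration()` returns without raising, the function has at least been observed working on those cases.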

Remember it's a conversation: the first result is a starting point

If I don't like what an LLM has written, they'll never complain at being told to refactor it! "Break that repetitive code out into a function", "use string manipulation methods rather than a regular expression", or even "write that better!"—the code an LLM produces first time is rarely the final implementation, but they can re-type it dozens of times for you without ever getting frustrated or bored.

I often wonder if this is one of the key tricks that people are missing—a bad initial result isn't a failure, it's a starting point for pushing the model in the direction of the thing you actually want.

Code execution tools: choosing on the basis of sandboxing

An increasing number of LLM coding tools now have the ability to run that code for you. I'm slightly cautious about some of these since there's a possibility of the wrong command causing real damage, so I tend to stick to the ones that run code in a safe sandbox.

This run-the-code-in-a-loop pattern is so powerful that I chose my core LLM tools for coding based primarily on whether they can safely run and iterate on my code.
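The shape of that run-and-iterate loop can be sketched in a few lines. This is a rough illustration under strong assumptions: `run_snippet` and `iterate` are hypothetical helpers, the candidate revisions are supplied by hand rather than by a model, and a child process with a timeout is not a security sandbox. Real tools rely on much stronger isolation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional


def run_snippet(code: str, timeout: float = 5.0) -> tuple[int, str, str]:
    """Run Python source in a child process; return (exit code, stdout, stderr).

    NOTE: a child process with a timeout is NOT a security sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, "-I", str(script)],  # -I: isolated mode
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return result.returncode, result.stdout, result.stderr


def iterate(candidates: list[str]) -> Optional[str]:
    """Try candidate revisions in order; return the first that exits cleanly."""
    for code in candidates:
        returncode, _stdout, _stderr = run_snippet(code)
        if returncode == 0:
            return code
        # In a real tool the error output would be fed back to the model
        # to produce the next revision; here the revisions are given up front.
    return None


winner = iterate(["print(1 / 0)", "print(1 + 1)"])
```

The first candidate fails with a `ZeroDivisionError`, so the loop moves on and `winner` ends up as the second snippet.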

Vibe-coding is a way to learn

The best way to learn LLMs is to play with them. Throwing absurd ideas at them and vibe-coding until they almost sort-of work is a genuinely useful way to accelerate the rate at which you build intuition for what works and what doesn't.

My simonw/tools GitHub repository has 77 HTML+JavaScript apps and 6 Python apps, and every single one of them was built by prompting LLMs. I have learned so much from building this collection, and I add to it at a rate of several new prototypes per week.

Be ready for the human to take over

LLMs are no replacement for human intuition and experience. I've spent enough time with GitHub Actions that I know what kind of things to look for, and in this case it was faster for me to step in and finish the project rather than keep on trying to get there with prompts.

Not speed, but expanded ambition

This is why I care so much about the productivity boost I get from LLMs: it's not about getting work done faster, it's about being able to ship projects that I wouldn't have been able to justify spending time on at all.

AI-enhanced development makes me more ambitious with my projects.

The fact that LLMs let me execute my ideas faster means I can implement more of them, which means I can learn even more.

LLMs amplify existing expertise

Could anyone else have done this project in the same way? Probably not! My prompting here leaned on 25+ years of professional coding experience, including my previous explorations of GitHub Actions, GitHub Pages, GitHub itself and the LLM tools I put into play.

I also knew that this was going to work. I've spent enough time working with these tools that I was confident that assembling a new HTML page with information pulled from my Git history was entirely within the capabilities of a good LLM.

If I was trying to build a Linux kernel driver—a field I know virtually nothing about—my process would be entirely different.

Questions about a codebase: dump the whole thing into a long context

Good LLMs are great at answering questions about code.

This is also very low stakes: the worst that can happen is they might get something wrong, which may take you a tiny bit longer to figure out. It's still likely to save you time compared to digging through thousands of lines of code entirely by yourself.

The trick here is to dump the code into a long context model and start asking questions.
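A minimal sketch of the "dump the code" step, assuming the codebase is a local directory of text files. The `collect_source` and `question_prompt` helpers and the path-then-delimiter layout are hypothetical choices for this example; dedicated tools exist for this, and any format that keeps file paths next to their contents should serve.

```python
from pathlib import Path


def collect_source(root: str, suffixes: tuple[str, ...] = (".py",)) -> str:
    """Concatenate matching files under root, each preceded by its relative path."""
    base = Path(root)
    chunks = []
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            relative = path.relative_to(base).as_posix()
            # Keeping the path next to the contents lets the model cite files by name.
            chunks.append(f"{relative}\n---\n{path.read_text(encoding='utf-8')}\n---")
    return "\n\n".join(chunks)


def question_prompt(root: str, question: str) -> str:
    """Put the whole codebase first, then the question about it."""
    return f"{collect_source(root)}\n\nQuestion: {question}"
```

For example, `question_prompt("path/to/project", "Where is the database connection opened?")` produces one long string to paste into a long context model. Whether it fits depends on the size of the codebase and the model's context limit.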

I use this trick several times a week. It's a great way to start diving into a new codebase—and often the alternative isn't spending more time on this, it's failing to satisfy my curiosity at all.
