Andrej Karpathy: From Vibe Coding to Agentic Engineering
Andrej Karpathy
12월의 변곡점 — 청크가 멈추지 않고 나오기 시작했다. Karpathy가 '프로그래머로서 가장 뒤처진 느낌'이라 말한 이유. Software 3.0, 검증가능성, jagged intelligence, agents = ghosts, 그리고 'outsource thinking, not understanding'.
From Vibe Coding to Agentic Engineering
생각 덩어리
12월의 변곡점 — 청크가 멈추지 않고 나오기 시작했다
December was this uh clear point where for me I was on a break so I had a bit more time. ... I just started to notice that with the latest models uh the chunks just came out fine and then I kept asking for more and it just came out fine and then I can't remember the last time I corrected it and then I was I just you know trusted the system more and more and then I was vibe coding
a lot of people experienced AI last year as ChachiPT adjacent thing. Uh but you really had to look again and you had to look as of December uh because things have changed fundamentally and uh especially on this like agentic coherent workflow uh that really started to actually work.
Software 3.0 — 컨텍스트 윈도우가 인터프리터의 레버
software 1.0, I'm writing code, software 2.0, I'm actually programming by creating data sets and training uh training neural networks. ... software 3.0 know is kind of about uh you know your programming now turns to prompting and what's in the context window is your lever over the interpreter that is the LLM that is kind of like interpreting your context
OpenClaw 설치 — shell script가 아니라 copy-paste text
when openclaw came out when you want to install openclaw you would expect that normally this is a bash bash script ... But the thing is you're still stuck in a software 1.0 universe of wanting to write the code. And actually the open claw installation is a is a copy paste of a b bunch of text that you're supposed to give to your agent.
you don't have to precisely spell out you know all the individual details of that setup. The agent has its own intelligence that it packages up and then it kind of like follows the instructions and it looks at your environment, your computer and it kind of like performs intelligent actions to make things work and it debugs things in the loop
Menu Genen이 존재하지 말았어야 한다 — Nano Banana가 다 한다
I built I've vcoded this app that basically lets you upload a photo and it does all this stuff and it runs on Verscell and uh it basically rerenders the menu ... And then I saw the software 3.0 version of this which is which blew my mind which is literally just take your photo give it to Gemini and say use Nanobanana to overlay the the things onto the menu.
all of my menu gen is spirious. It's working in the old paradigm that app shouldn't exist. uh and uh yeah the software 3.0 paradigm is a lot more kind of raw. It just um your neural network is doing more and more of the work and your prompt or context is just the image and the output is an image and there's no need to have any of the app in between.
코드가 아니라 정보 처리 — 이전엔 불가능했던 것들
it's not just about programming and programming becoming faster. This is more general information processing that is automatable now. ... previous code worked over kind of like structured data, right? ... But like for example with my LLM knowledge basis project ... This is not even a program. This is not something that could exist before because there was no there was no code that would create a knowledge base based on a bunch of facts.
50~60년대의 분기점 재방문 — neural net이 host process로
in the early days of computing actually people were a little bit confused as to whether computers would look like calculators or computers would look like neural nets and in 50s and 60s it was not really obvious which way would go and of course we went down the calculator path
a lot of this will flip and that the neural net becomes kind of like the host process and uh the CPUs become kind of like the co-processor ... what's really running the show is these uh neural nets
Verifiability — 왜 능력이 jagged한가
traditional computers can easily automate what you can specify in code and uh kind of this latest round of LLMs can easily automate what you can uh verify in a certain in a certain sense because the way this works is that when frontier labs are training these LLMs these are giant reinforcement learning environments.
they end up basically uh progressing and creating these like jagged entities that really peak in capability in kind of like verifiable domains like math and code and adjacent and kind of like stagnate and are a little bit um you know rough around the edges when uh things are not kind of like in that in that space.
50미터 거리의 세차장 — Opus 4.7의 황당함
the new one is I want to go to a car wash to wash my car and it's 50 meters away. Should I drive or should I walk? And state-of-the-art models today will tell you to walk because it's so close. How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000 like codebase line codebase or find zero day vulnerabilities and yet tells me to walk to this car wash? This is insane.
to whatever extent these uh models are remain jagged, it's an indication that number one maybe something's slightly off or um number two you need to actually be in the loop a little bit and you need to treat them as tools
체스가 갑자기 잘하게 된 이유 — 누가 데이터에 넣었는가
from GPT 3.5 to GPT4 people noticed that chess improved a lot ... a huge amount of like um data of chess made it into the pre-training set and just because it's in a data distribution uh basically the model improved a lot more than it would just by default. So someone at OpenAI decided to add this data and now you have a capability that just peaked a lot more.
we are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix. ... if you're in the circuits that were part of the RL, you fly. And if you're in the circuits that are out of the data distribution, uh you're going to struggle
창업가에게 — labs가 안 다루는 verifiable 영역
verifiability makes something tractable in the current paradigm because you can throw a huge amount of RL at it. ... if you are in a verifiable setting where you could create these RL environments or examples then that actually sets you up to potentially do your own fine tuning
there are some very valuable uh reinforcement learning environments that people could think of that I think are not part of the Yeah, I don't want to give away the answer
궁극적으로는 모든 것이 verifiable해진다
I do think that ultimately almost everything can be made uh verifiable to some extent. some things easier than others. ... even for like things like writing or so on, you can imagine having a council of LLM judges and probably get get to some get get something uh reasonable
Vibe Coding vs Agentic Engineering — 바닥 올리기 vs 천장 지키기
vibe coding is about raising the floor for everyone in terms of what they can do in software. ... But then I would say agentic engineering is about preserving the quality bar of what existed before in professional software. So you're not allowed to introduce vulnerabilities due to VIP coding.
agentic engineering when I call it that because I do think it's kind of like an engineering discipline. You have these agents which are these like spiky entities. They're a bit fable, a little bit stocastic, but they are extremely powerful. is how do you how do you coordinate them to go faster without sacrificing your quality bar
10x를 훌쩍 넘는 천장 — 잘하는 사람은 더 멀리 간다
people used to talk about the 10x engineer previously I think that this is magnified a lot more 10x is uh is not uh the speed up you gain. ... it does seem to me like people who are very good at this um peak a lot more than 10x
채용도 다시 짜야 한다 — Twitter clone을 만들고 codex로 부숴봐라
hiring have to has to look like give me a really big project and see someone implement that big project like let's write say a Twitter clone uh for agents and then uh make it really good make it really secure and then have some agents uh simulate some activity uh on this Twitter and then I'm going to use 10 codecs 5.4x for X high to try to break your break your um uh this website that you deployed and they're going to try to basically break it and they should not be able to break it.
Stripe 이메일과 Google 이메일 — 에이전트가 만드는 이상한 실수
for menu genen uh you sign up with a Google Google account but you um purchase credits using a stripe account ... my agent actually tried to basically ... when you purchase credits, it assigned it using the email address from Stripe to the Google email address like there wasn't a persistent user ID ... You can use different emails, etc. Like this is such a weird thing to do.
인간이 담당하는 것 — spec, taste, oversight
people have to be in charge of this spec, this plan. And um I actually don't even like the plan mode. I I would I mean obviously it's very useful, but I think there's something more general here where you have to work with your agent to design a spec that is very detailed and maybe it's uh maybe basically the docs and then get the agents to write them and you're in charge of the oversight and the top level categories
keep_dim vs keep_dims는 잊어도 된다 — 단 view는 알아야 한다
there's a ton of details between PyTorch and NumPy and all the different like pandas and so on for all the different little API details. And I I already forgot about the keep dims versus keep dim or whether it's dim or axis or reshape or permute or transpose. I don't remember this stuff anymore, right? Because you don't have to.
you still have to know for example that um you know there's underlying tensor there's an underlying view and then you can manipulate view of the same storage or you can have different storage which would be less efficient and so you still have to have an understanding of what this stuff is doing and some of the fundamentals
코드를 보면 가끔 심장마비 — bloat, copy-paste, 어색한 추상화
when you actually look at the code, sometimes I get a little bit of a heart attack because it's not like super amazing code necessarily all the time and it's very bloaty and there's a lot of copy paste and there's awkward abstractions that are brittle and like it works but it's just really gross.
Micro-GPT — 단순화는 RL 회로 밖이다
uh you know micro GPT project which where I was trying to simplify uh LLM training to be as simple as possible. The models hate this. They can't do it. I tried to I keep I kept trying to prompt an LLM to simplify more simplify more and it just can't you feel like you're outside of the RL circuits. It feels like you're obviously you know you're pulling teeth. It's not like light speed.
Animals vs Ghosts — 우리는 동물이 아니라 유령을 소환한다
these things are not, you know, animal intelligences. Like if you yell at them, they're not going to work better or worse or it doesn't have any impact. Um, and uh it's all just kind of like these statistical simulation circuits where the the substrate is pre-training so like statistics and then but then there's RL bolting on top.
인간을 위한 docs — 가장 짜증나는 pet peeve
I still use most of the time when I use uh different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans. This is my favorite pet peeve. Like I don't uh why are people still telling me what to do? Like I don't want to do anything. What is the thing I should copy paste to my agent?
Agent-native 인프라 — Menugen 배포의 진짜 고통
a lot of the work a lot of the trouble was not even writing the code for Menugen it was deploying it in versell because I had to work with all these different services and I had to string them up and I had to go to their settings and the menus and you know configure my DNS and it was just so annoying
I would hope that menu gen that I could give a prompt to an LLM build menu genen and then I didn't have to touch anything and it's deployed in that same way on the internet. Uh I think that would be a good kind of a test for whether or not uh a lot of our infrastructure is becoming more and more agent native.
Agent-to-Agent 세상 — 내 에이전트가 너의 에이전트와 대화한다
we're going towards a world where um there's agent representation for people and for organizations and um you know I'll have my agent talk to your agent uh to figure out some of the details of our meetings or or things like that.
사고는 외주 줄 수 있어도 이해는 외주 줄 수 없다
there was a tweet that blew my mind recently and I keep thinking about it like every other day. It was something along the lines of um, you can outsource your thinking but you can't outsource your understanding.
나는 시스템의 병목이다 — 무엇을 왜 만드는가
I'm still part of the system and I still I still have to somehow information still has to make it into my brain and I feel like I'm becoming a bottleneck of just even knowing what are we trying to build why is it worth doing uh how do I direct you know how do I direct my my agents
LLM 위키 — 이해를 강화하는 도구로서
this is one reason I also was very excited about all the LM knowledge bases because I feel like that's that's a way for me to process information and anytime I see a different projection onto information. I always like feel like I gain insight. ... whenever I read an article I have my uh you know my wiki that's being built up from these articles and I love asking questions about things
these are tools to enhance understanding in a certain way and this is still kind of like a bit of a bottleneck because then you can't direct the you can't be a good director if you still uh because the LM certainly don't excel at understanding you still are uniquely in charge of that.