Agents need more than a chat
Jacob Lauritzen
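Legora's CTO takes on the UX problem of vertical AI agents: the limits of a chat interface that runs for 30 minutes, right up to compaction, and then ruins the output. He argues for decomposing tasks along the axes of Verifier's law, trust, and control, and for moving the collaboration surface through planning → skills → elicitation → high-bandwidth artifacts (documents, tabular review). 'agents are not humans, so we should not constrain them to human language.'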
Thought chunks
Compaction after 30 minutes: the moment you give up
You tell it to research something, draft a contract, make no mistakes, and it starts thinking, it starts reading, launches a bunch of sub-agents, does web search, writes files, launches more sub-agents, does more reading, writes more files, keeps going, takes forever, and after 30 minutes, it gives you your contract.
Then you see this: compaction. That's when you know you can give up. It's going to forget everything. It's in the context rot state.
Was it only clause three that was changed? Probably not. And so you end up in this state. Not the greatest experience.
New economics: planning and reviewing are the real bottleneck
Our goal, and the goal of most vertical AI companies, is to make agents complete more and more complex work end-to-end. Doing that has changed a lot in the past 6 to 12 months because there are new economics of production.
Right now, planning work and reviewing work are the new bottleneck. Doing the actual work is extremely cheap. It's very easy to do. But now you have to spend time planning, you have to get the non-functional requirements, you have to get the specs, and you have to spend a lot of time reviewing the work.
Verifier's law: if it can be verified, it gets solved
Verifier's law is a term coined by Jason, which states that if a task is solvable and easy to verify, then it's going to get solved by AI.
I think it also goes for agents. If you can make a task verifiable, you can just run an agent in a loop and tell it, "Hey, you did this wrong. Please fix it," and it'll eventually get there.
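The loop described above can be sketched as follows. This is a minimal illustration, not Legora's implementation: the `Agent` and `Verifier` signatures, the `max_rounds` cap, and the toy stand-ins are all assumptions made so the example runs without a model.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: an agent takes the task plus the feedback from the
# previous round; a verifier returns a list of problems (empty means "passes").
Agent = Callable[[str, List[str]], str]
Verifier = Callable[[str], List[str]]


def run_until_verified(
    task: str, agent: Agent, verify: Verifier, max_rounds: int = 5
) -> Tuple[Optional[str], int]:
    """Re-run the agent with verifier feedback; return (result, rounds used)."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = agent(task, feedback)
        feedback = verify(draft)
        if not feedback:
            return draft, round_no
    return None, max_rounds  # gave up: the task never verified


# Toy stand-ins so the sketch is runnable.
def toy_agent(task: str, feedback: List[str]) -> str:
    draft = "Agreement between Buyer and Seller."
    if any("governing law" in problem for problem in feedback):
        draft += " Governing law: to be agreed."
    return draft


def toy_verify(draft: str) -> List[str]:
    return [] if "Governing law" in draft else ["missing governing law clause"]


result, rounds = run_until_verified("draft a contract", toy_agent, toy_verify)
print(rounds, result)
```

The loop only "eventually gets there" when the verifier is cheap and reliable, which is why the cap on rounds matters for tasks that are hard to verify.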
Even within one vertical, tasks sit at different positions
If you take legal, we can check definitions in a contract, super easy to verify, super easy to get done.
Writing a contract is very easy to solve, but actually extremely difficult to verify, because if you think about it, when you write a contract, the only time you can actually verify whether the language you used works is if it goes to court and a judge basically verifies it.
Litigation strategy is also basically impossible to verify. ... If you ask five lawyers, "What should be the right strategy for this litigation case?" they're going to give you different answers.
Trust × Control: two axes
There are two things that are important to think about with agent-human collaboration. Control is the first one. Control is how effectively a human can instill their knowledge into the work that the agent is doing. So how effectively can I steer it?
Trust is a matter of how much I need to review. If I have very low trust, I'm going to look at every single agent trace and see exactly what it did. If I have very high trust, I won't look at it at all.
Raising trust 1: pull the task down the spectrum
If you want to implement a feature, well, you can give it browser access, you can do test-driven development, and then suddenly it's actually a verifiable task and it's going to do much better.
Let's take the contract example in legal. You can't really verify it, but you can look for a proxy for verification. For contracts, what you can do is take a look at previous contracts. These are our golden contracts. We know they work well. Let's set up a test: is the new contract similar to the old ones? That's sort of a proxy for verification.
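One way to picture the golden-contract test is a similarity score against known-good documents. The sketch below is an assumption about how such a proxy could look: it uses character-level similarity from Python's standard `difflib`, and the function names and the 0.8 threshold are hypothetical. A real system would likely compare at the clause level.

```python
from difflib import SequenceMatcher
from typing import Iterable


def similarity_to_golden(new_contract: str, golden_contracts: Iterable[str]) -> float:
    """Best 0..1 similarity between the draft and any golden contract."""
    return max(
        (SequenceMatcher(None, new_contract, golden).ratio() for golden in golden_contracts),
        default=0.0,  # no golden contracts means no evidence of similarity
    )


def passes_proxy_check(
    new_contract: str, golden_contracts: Iterable[str], threshold: float = 0.8
) -> bool:
    # The threshold is an arbitrary illustrative value, not a recommendation.
    return similarity_to_golden(new_contract, golden_contracts) >= threshold


golden = ["The Supplier shall provide the Services with reasonable care and skill."]
draft = "The Supplier shall provide the Services with reasonable skill and care."
print(round(similarity_to_golden(draft, golden), 2))
```

A passing score does not prove the contract works. It only says the draft stays close to documents already trusted, which is what makes it a proxy.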
Raising trust 2: task decomposition
I can turn that from one task into a bunch of other tasks. I can leave picking a risk profile, picking the precedent documents, and the negotiation stance to the human, but I can try to get other stuff done where it's easy to verify. So apply formatting: make it look like all my other contracts. Check definitions, which is essentially linting.
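To make the "checking definitions is linting" point concrete, here is a small sketch. It assumes a drafting convention where definitions are written as `"Term" means ...`, and it reports only two kinds of finding (duplicate and unused definitions). Both the convention and the function name are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed drafting convention: a definition looks like  "Term" means ...
DEFINITION = re.compile(r'"([^"]+)"\s+means\b')


def lint_definitions(contract: str) -> Dict[str, List[str]]:
    """Report terms defined more than once and terms defined but never used."""
    counts = Counter(DEFINITION.findall(contract))
    duplicates = sorted(term for term, n in counts.items() if n > 1)
    # A term is unused if it never appears outside its own definition(s).
    unused = sorted(
        term
        for term, n in counts.items()
        if len(re.findall(re.escape(term), contract)) <= n
    )
    return {"duplicate": duplicates, "unused": unused}


sample = (
    '"Services" means the work described in Schedule 1. '
    '"Fee" means the amount in Schedule 2. '
    '"Fee" means the monthly charge. '
    "The Supplier shall perform the Services."
)
print(lint_definitions(sample))
```

Checks like this are deterministic and cheap, so they sit at the easy-to-verify end of the spectrum and can be handed to the agent with high trust.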
Raising trust 3: guardrails
Guardrails are essentially a way to increase trust by limiting what the agent can do. So instead of being able to do all of this, you're just going to say: you can only edit these three files, you can only read from this directory, you can only search these websites. By limiting what it can do, you basically get more trust because you know it won't do all these weird things.
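The three limits mentioned above (editable files, a readable directory, searchable websites) can be sketched as an allowlist check. The `Guardrails` class, its field names, and the example paths and domain are hypothetical. Paths are treated as POSIX-style strings and normalised so that `..` cannot escape the allowed directory.

```python
import posixpath
from dataclasses import dataclass
from typing import FrozenSet
from urllib.parse import urlparse


@dataclass(frozen=True)
class Guardrails:
    editable_files: FrozenSet[str]   # the only files the agent may edit
    readable_dir: str                # the only directory it may read from
    allowed_domains: FrozenSet[str]  # the only websites it may search

    def can_edit(self, path: str) -> bool:
        return posixpath.normpath(path) in self.editable_files

    def can_read(self, path: str) -> bool:
        root = posixpath.normpath(self.readable_dir)
        target = posixpath.normpath(path)  # collapses "./" and "../"
        return target == root or target.startswith(root + "/")

    def can_search(self, url: str) -> bool:
        host = urlparse(url).hostname or ""
        return host in self.allowed_domains


rails = Guardrails(
    editable_files=frozenset({"contracts/nda.docx"}),
    readable_dir="contracts",
    allowed_domains=frozenset({"example.com"}),
)
print(rails.can_read("contracts/../secrets/keys.txt"))
```

Every tool call the agent makes would be checked against an object like this before it executes, so the human does not need to inspect each action to trust the run.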
An example of this, and you probably all know this one, is Claude Code. If there's very low trust, it's going to basically tell you every single time it wants to do anything, which makes it extremely useless. And on the high trust end of the spectrum, you just YOLO mode it, let it rip, and hope that it doesn't delete your prod database.
Tree of work: control is a question of where you can intervene
If you think about complex agent work, you can kind of think about it as a tree of work, as a DAG essentially.
This is extremely low control because essentially, I can only impose my judgment at the root level. So it's going to do all of this work and then it's going to get back to me, and only then can I try to talk to it again.
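A small sketch of the work tree as a DAG, using the standard library's `graphlib` (Python 3.9+). The node names are a hypothetical contract-review job, and `human_checkpoints` is an illustrative helper, not something from the talk. It shows how few nodes a human can touch when judgment enters only at the root.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical work tree: each node maps to the nodes it depends on.
work: Dict[str, Set[str]] = {
    "deliver review": {"review confidentiality", "review termination"},
    "review confidentiality": {"read contracts"},
    "review termination": {"read contracts"},
    "read contracts": set(),
}

# Dependencies come first, so the root task ("deliver review") comes last.
order: List[str] = list(TopologicalSorter(work).static_order())
root = order[-1]


def human_checkpoints(order: List[str], reviewable: Set[str]) -> List[str]:
    """Nodes where a human can impose judgment, in execution order."""
    return [node for node in order if node in reviewable]


# Root-only control: one checkpoint for the whole tree.
print(human_checkpoints(order, {root}))
```

Raising control means growing the `reviewable` set, which is what skills, elicitation, and richer artifacts do in the sections that follow.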
The limits of planning: you only know what to do after doing it all
Planning essentially allows you to steer the agent up front and align on the approach.
The problem with planning is that you basically have to do all the work just to know what to do. ... It's really inefficient. It takes a long time and asks you a bunch of questions, and in the end, it's basically impossible for it to really know if it has all the information it needs.
Essentially, you could compare planning to working with a co-worker who comes up to you, tells you about the approach, aligns with you, and then you never ever hear from them again until they deliver the final document. It's not a super nice way to collaborate.
Skills: encoding judgment into nodes
Skills are really, really, really good. They are really good because they allow you to encode human judgment into essentially the nodes of work that happen here. So I can say whenever you review confidentiality, you should do it in this way.
The really good thing about this is that it allows for contingencies. So here, at one of the nodes reviewing termination clauses, there's a special EU law. But I have that in a skill, so that means whatever happens when it actually does the work, it knows how to handle that special case.
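A skill with a contingency can be pictured as guidance attached to a kind of work node, plus extra guidance that applies only when a condition shows up during the work. The `Skill` class, the `SKILLS` registry, the `"eu"` tag, and the guidance strings below are placeholders invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:
    guidance: str
    # Contingencies: condition tag -> extra guidance used only if the tag is present.
    contingencies: Dict[str, str] = field(default_factory=dict)


# Placeholder content; real skills would carry a firm's actual playbook.
SKILLS: Dict[str, Skill] = {
    "review termination": Skill(
        guidance="Check notice periods and termination triggers.",
        contingencies={"eu": "Apply the special EU rule for termination clauses."},
    ),
}


def instructions_for(node: str, tags: List[str]) -> List[str]:
    """Guidance for a work node, including any contingencies that apply."""
    skill = SKILLS.get(node)
    if skill is None:
        return []  # no skill exists for this node
    extra = [text for tag, text in skill.contingencies.items() if tag in tags]
    return [skill.guidance, *extra]


print(instructions_for("review termination", ["eu"]))
```

The empty list returned for an unknown node is the gap the speaker points at next: skills cannot cover every node.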
The problem is you don't have skills for everything.
Elicitation: ask when stuck, but don't stay blocked
The next step is then to use elicitation, which means ask the user. Ask the human. So you might have skills as well, but then instead of you giving all the info, it's going to come to you. It's going to say, "Hey, here's the thing I don't know how to handle. What do you want me to do?"
What you don't want is for the agent to be blocked. So ideally, if you implement this, what you do is you tell the agent, "If you're unsure about something, make a decision, unblock yourself, but write this to a decision log." Then the human can review the decision log afterwards and reverse decisions if they need to.
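The decision-log pattern can be sketched as a small append-only structure: the agent records each unblocking decision, and the human can later reverse one without losing the original. The class and method names here are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Decision:
    question: str
    choice: str
    rationale: str
    reversed_to: Optional[str] = None  # set by the human during review


@dataclass
class DecisionLog:
    entries: List[Decision] = field(default_factory=list)

    def record(self, question: str, choice: str, rationale: str) -> int:
        """Called by the agent instead of blocking; returns the entry index."""
        self.entries.append(Decision(question, choice, rationale))
        return len(self.entries) - 1

    def reverse(self, index: int, new_choice: str) -> None:
        """Called by the human afterwards to overrule the agent."""
        self.entries[index].reversed_to = new_choice

    def effective_choice(self, index: int) -> str:
        entry = self.entries[index]
        return entry.reversed_to if entry.reversed_to is not None else entry.choice


log = DecisionLog()
idx = log.record(
    question="Which precedent applies to clause 7?",
    choice="Use the most recent precedent",
    rationale="No instruction given; recency seemed safest.",
)
log.reverse(idx, "Use the client's standard form")
print(log.effective_choice(idx))
```

Keeping the original choice alongside the reversal lets the agent, or a later run, see which parts of the work depended on the overruled decision.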
Chat is one-dimensional: it flattens the work tree into a line
If you imagine this work, this tree, being 10 times bigger, 100 times bigger, you don't want this in a chat. You don't want to open up a chat and then it's infinitely long. You have to answer 50 questions. You wouldn't know what to answer.
Chat is one-dimensional. It's a very low bandwidth interface, and it tries to collapse this work tree into a single sort of linear thing.
High-bandwidth artifacts: documents, tabular review
I think humans and agents should collaborate in high-bandwidth artifacts. I think they need to work in things that are typically persistent, and they will look different industry to industry, vertical to vertical, depending on what task you're solving.
An example from us is a document. That's like a durable interface where it makes sense to collaborate. That's how you collaborate with your co-workers. You can highlight clause three and it will only change clause three. You can add comments. You can tag your agents. You can tag your collaborators. You can hand off parts of the document to special agents.
Tabular review: the agent only raises flags and the human judges quickly
Another example is our tabular review. I ask it to do the contract review that I talked about, and it's going to say, "Okay, let me spin up a tabular review," which is a known primitive that our users know.
It's going to say, "I'm going to review all the contracts, and I'm going to just flag a few items for you that I want your take on." And then I can go in there and I can see very quickly where the problems are. So it's high control. It's very effective for me to instill judgment. And I can also very quickly get an idea for what the agent has actually done.
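As a rough picture of the data behind such a review, the sketch below models a table with one row per contract and one column per question, where the agent fills every cell but flags only a few. The `Cell` shape, the file names, and the answers are invented for illustration and say nothing about Legora's actual product.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cell:
    answer: str
    flagged: bool = False  # True when the agent wants the human's take
    note: str = ""


# Rows are contracts, columns are review questions (all hypothetical).
Table = Dict[str, Dict[str, Cell]]

table: Table = {
    "supplier_a.pdf": {
        "Termination notice": Cell("30 days"),
        "Confidentiality term": Cell("5 years"),
    },
    "supplier_b.pdf": {
        "Termination notice": Cell("None stated", flagged=True, note="Clause missing?"),
        "Confidentiality term": Cell("3 years"),
    },
}


def flagged_items(table: Table) -> List[Tuple[str, str, str]]:
    """The short list a human actually needs to look at."""
    return [
        (contract, column, cell.note)
        for contract, row in table.items()
        for column, cell in row.items()
        if cell.flagged
    ]


print(flagged_items(table))
```

The full grid shows what the agent did, while the flagged subset shows where judgment is needed, which is how one artifact serves both trust and control.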
Chat is good as input, just not as the main mode
To be clear, chat boxes as input are great. Chat is extremely flexible and allows you to do a lot of stuff, but you don't want chat to be your main mode of collaboration with a complex agent.
Agents are not humans: don't confine them to human language
The good thing about this is language is essentially the universal interface. It's what people use to communicate. You can do everything with the voice. But agents aren't humans.
I was limited because I can only use language. I wish that I could just draw up an org chart and they could interact with it and they could use it, but I can't because I'm a human. I'm limited by language, but agents are not humans, and so we should not constrain them to human language.