Hot takes from Coding Agents (with receipts)
What six hours of agent talks changed my mind about. Repo included.
Of course I’m going to say our Coding Agents conference was great, but you don’t have to take my word for it.
Two people who were in the room wrote up their own notes.
Michelle Faits focused on the practical bits she wanted to try immediately and even ran Shrivu’s codebase scorecard from the workshop on one of her own projects.
Daniel Pickem went session by session with his own engineering take and notes on what he’s planning to test at NVIDIA.
My version takes, well, a slightly different approach.
PRESENTED BY JFROG
Detect & Govern Shadow AI With JFrog
Shadow AI is emerging as one of the biggest risks in enterprise AI adoption.
Teams are pulling unvetted models, APIs, and datasets—creating blind spots that bypass existing DevSecOps governance.
JFrog’s new AI Catalog capabilities now automatically detect Shadow AI, govern model consumption, and apply policy-based controls across your entire AI pipeline.
Six hours of talks, workshops, and hallway conversations rattled around and came out as five things I now believe that I didn’t quite believe before.
Some of them might be wrong, maybe even a little overcooked.
But I think each one was earned by something said on that stage.
If you want to poke around the ideas directly, I fed the six-hour transcript into Claude Code and asked it to extract the techniques people demonstrated on stage - things like CRISPI Planning, Adversarial Code Review, Context Window Management, and Hooks and Enforcement.
Grab whatever looks useful (the video recording is in there too): https://go.mlops.community/NL_SB_Skills_Repo
On to the spicy stuff...
We spent years building RAG pipelines. Turns out the answer was grep.
Sometimes the model just wants to read the codebase the way a developer would.
Jess from Braintrust ran exactly that comparison - vector search versus agentic search using Claude Code with grep, find, and cat.
On a Django codebase, vector search hit 60% accuracy. Agentic search hit 68%. On a TypeScript/Go codebase, they tied at 70%, but vector search burned significantly more tokens and cost more per run.
The structural issue is context. Vector search returns fragments - chunks without imports, type definitions, or the functions calling into them. Agentic search follows references until the model finds the full picture.
Jess was careful about the caveats. Small sample sizes. Cost estimates still rough. Nobody declared vector search dead.
But if you are adding a vector DB because it feels like the “correct” architecture, it might be worth running the eval first.
Where a few assumptions started to wobble.
Nobody should be reviewing code. Including you.
Most PR reviews are humans double-checking things machines already check better.
Sid from Anthropic described how their team now runs Claude across every PR in multiple passes.
The model reviews the change from different angles, then filters the output down to a small number of high-confidence issues.
That filtering turned out to matter. Once a review agent produces too many low-signal comments, engineers stop reading them entirely.
With the tuning in place, the model regularly finds bugs that would have slipped through human review. Sid also mentioned something interesting about behavior changes. The bar for humans approving PRs has gone down.
The person who directed the agent is accountable for the outcome anyway. Approve quickly, fix quickly.
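The talk didn't show code, but the filtering step is easy to sketch: merge issues from several review passes, dedupe, and keep only what clears a confidence bar - with independent agreement between passes counting as extra evidence. A hypothetical version (the `Issue` shape, the threshold, and the boost rule are all my assumptions, not Anthropic's pipeline):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Issue:
    file: str
    line: int
    summary: str
    confidence: float  # the model's self-reported confidence, 0-1

def filter_review(passes: list[list[Issue]], threshold: float = 0.8) -> list[Issue]:
    """Merge multi-pass review output down to a few high-signal comments."""
    seen: dict[tuple[str, int], Issue] = {}
    for review in passes:
        for issue in review:
            key = (issue.file, issue.line)
            if key in seen:
                # Flagged by more than one pass: independent agreement
                # is itself evidence, so bump the confidence.
                prev = seen[key]
                boosted = min(1.0, max(prev.confidence, issue.confidence) + 0.1)
                seen[key] = Issue(issue.file, issue.line, prev.summary, boosted)
            else:
                seen[key] = issue
    kept = [i for i in seen.values() if i.confidence >= threshold]
    return sorted(kept, key=lambda i: -i.confidence)

# Two passes agree on one bug; a low-confidence nit gets dropped.
pass_a = [Issue("a.py", 3, "possible off-by-one in loop bound", 0.75)]
pass_b = [
    Issue("a.py", 3, "loop bound looks off by one", 0.7),
    Issue("b.py", 9, "nit: stale docstring", 0.4),
]
final = filter_review([pass_a, pass_b])
```

The design point is the one Sid made: the aggressive cut is the feature. Everything below the bar dies, or engineers stop reading.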
There are still gaps. Architecture decisions, cross-project patterns, institutional memory about why something exists.
But line-by-line bug hunting might not be where humans add the most value anymore.
Packing out the room for the 3 hands-on workshops.
The coordination meeting is now a bug, not a feature.
If agents make building cheap, the expensive thing becomes all the discussion around it.
Scott from Kilo described how they run with one engineer owning a feature end-to-end. Not a team, one person. From initial idea through user feedback.
They call the philosophy anti-collaboration. Work with someone else when it adds value, not by default.
His argument was blunt. A lot of collaboration is a safety blanket.
Kilo has fifteen engineers shipping one to two features per week. Scott attributes most of the improvement not to better tooling but to removing the machinery around the work.
This approach obviously has limits. Fifteen engineers is not a hundred and fifty.
But many engineering org structures were designed for a world where implementation was the slow part. That assumption may no longer hold.
Fireside chat with Sid Bidasaria.
The most valuable engineer on your team next year won’t write a single line of code.
They’ll spend their time making sure agents don’t quietly derail themselves.
Sid talked about plan files as if they were the real product. Before any code gets written, Claude interviews him in plan mode about edge cases and failure modes. Those plans go into version control. The quality of the plan strongly predicts whether the build succeeds.
Later in the unconference session, Josh made a related point from the enforcement side. Instructions in CLAUDE.md often get ignored. Hooks do not. Hooks run.
They can enforce tests passing, lint clearing, context pruning, and other constraints that stop sessions from drifting into nonsense.
Someone in the skills repo discussion thread later suggested a name for the role: context janitor.
The person who knows what to keep, what to discard, and when a session needs to be restarted from scratch.
Some teams already do this deliberately. Others are still wondering why their agents collapse after an hour.
Talks, questions, sponsor booths, and the conversations in between.
Your MacBook Pro is about to become a very expensive remote control.
Because running a few agents locally is fine. Running ten quickly melts your laptop.
Zach from Warp described hitting that ceiling already. Warp's codebase is roughly a million lines of Rust, and he often runs four or five worktrees with agents operating in parallel - enough to max out his machine.
That is one engineer running a handful of tasks.

Zach predicted that 2026 becomes the year of agent orchestration, and the pressure point is obvious. Local machines were not designed to run fleets.
Niels from Union made the same argument from the infrastructure side. Production agent systems need replay logs, caching layers, and persistent intermediate state so that a preempted compute instance does not erase hours of work.
None of that fits neatly on a laptop.
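The persistent-state piece is the easiest to picture. A toy sketch of replay-log-style checkpointing, so a preempted instance resumes where it left off instead of starting over - the file layout and step model here are my invention, not Union's:

```python
import json
import tempfile
from pathlib import Path

def run_steps(steps: list[str], do_step, checkpoint: Path) -> dict:
    """Execute steps in order, persisting each result as it completes.
    On restart, anything already in the checkpoint is skipped, not redone."""
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for step in steps:
        if step in state:
            continue  # finished before the preemption: skip it
        state[step] = do_step(step)
        checkpoint.write_text(json.dumps(state))  # persist after every step
    return state

# Simulate a preempted run followed by a restart on fresh compute.
ckpt = Path(tempfile.mkdtemp()) / "agent_state.json"
calls = []
def do_step(name):
    calls.append(name)  # track which steps actually execute
    return f"done:{name}"

run_steps(["clone", "plan"], do_step, ckpt)            # first instance
state = run_steps(["clone", "plan", "build"], do_step, ckpt)  # restarted instance
```

The restarted instance re-runs only `build` - the hours already spent on `clone` and `plan` come back from disk. Scale that pattern up and you get the replay logs and caching layers Niels was describing.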
Local machines will still make sense for quick interactive work. But long-running tasks, parallel exploration, and anything that saturates RAM or battery are already drifting toward remote execution.
Sam Partee and Harrison Chase closing things out with the keynote.
Want your own set of hot takes?
We’re doing it all again in Seattle on April 14. Early bird tickets are live.
One of the real advantages of in-person events is the side conversations. I met Rob at the conference - he built Broomby, an open-source tool for running multiple coding agents at once.
He’s joining us this Friday for a virtual lunch-and-learn session. Come ask him things.
CAPTION COMPETITION
Best caption wins some merch.







