Catch Breakage Before It Catches You

Plus: Terragrunt fixes Terraform pain, NVIDIA’s compact GPU rig, and the State of AI 2025.

MLOps Community

Oct 16, 2025

We’ve been running a quick pulse on what teams are focused on right now.

So far, deployments and infra are taking the lead, but there’s a quiet push for career growth and team structures too.

Add your voice before we close the poll - it only takes a few seconds.

HOT TAKE

Roleplaying Agents

Agents that can’t take action are chatbots with a costume.

Be honest, what are you building: Action or Chat?

ACTION CHAT

LAST WEEK’S TAKE

Rage against the machine

Last week we asked what’s holding you back, and it wasn’t the models.

MLOPS COMMUNITY VIRTUAL CONFERENCE

Agents in Production - MLOps x Prosus

Now in its sixth year, Agents in Production brings together practitioners from OpenAI, NVIDIA, Meta, and Google DeepMind to share what’s actually working when deploying agentic AI systems.

Join us on 18 & 19 November for practical sessions on architecture choices, coordination frameworks, and debugging tools - plus lighter moments between talks to keep the live event energy.

Check out the agenda and register here.

HIDDEN GEMS

Curated finds to help you stay ahead

Unified-memory inference on the DGX Spark shows how NVIDIA’s compact Grace Blackwell system handles large open-weight models locally, benchmarking SGLang and Ollama for prototyping, efficiency, and speculative decoding gains.

GPU-accelerated video curation pipeline built for large-scale physical AI workflows, handling segmentation, annotation, and deduplication to streamline dataset creation and management at scale.

MCP Dev Summit Europe playlist compiles technical sessions on how AI agents, servers, and tools communicate via the Model Context Protocol, covering authentication, orchestration, and implementation details from production systems.

The State of AI Report 2025 tracks a year of acceleration across research, infrastructure, and policy, from China’s rapid rise and agent-based science to the industrial-scale compute race reshaping global AI power.

💡Job of the week

Senior Backend Software Engineer- (AI Platform) // Databricks (San Francisco)

Databricks is expanding its AI Platform team to build and scale core infrastructure powering model training, serving, and AI applications. The role focuses on backend systems engineering for distributed AI workloads and developer-facing APIs.

Responsibilities:

Design and optimize scalable infrastructure for AI workloads
Develop and maintain APIs for model training and serving
Improve performance, reliability, and observability of core systems
Collaborate with ML, infra, and product teams on platform features

Requirements:

5+ years in backend or infrastructure engineering
Proficiency in Scala, Go, or Python
Experience with distributed systems and cloud-native tools
Knowledge of deployment pipelines and system observability

Find more roles on our new jobs board - and if you want to post a role, get in touch.

MLOPS COMMUNITY

Evals Aren’t Useful? Really?

When your agent starts leaking secrets or handing out free discounts, it’s already too late. The only thing standing between a stable system and chaos? Solid evals.

Building evals from zero: Start with curated test sets and treat them like unit tests - flow-by-flow coverage that catches breakage before users do.
Red-teaming with agents: Use persona-based simulations to push your own systems to failure - persistent, goal-driven attackers reveal weak guardrails fast.
Scaling evaluation like CI/CD: Move from handcrafted tests to automated pipelines and production feedback loops that evolve with real user behavior.

It’s a reminder that shipping agents safely isn’t about perfection - it’s about testing like your users are trying to break you.

Video || Spotify || Apple

How to build agents that take ACTION

In a standout session from our AI Agent Builders Summit, Alex Salazar unpacked why most agents never make it past the demo - 70% fail before reaching production due to security gaps, high costs, latency, or poor accuracy.

Evals first: Treat curated scenarios as unit tests; track tool-use accuracy and regressions before rollout.
Tools over APIs: Build intention-based tools, not raw wrappers; push logic into tools to cut LLM loops, cost, and compounding errors.
Auth that scales: Use delegated authorization (user and app scopes) for Gmail/CRM access; handle token refresh and revocation.

Building agents that act means thinking like engineers - testing rigorously, enforcing permissions, and designing for real-world complexity.

Watch the talk here

Why I Use Terragrunt Over Terraform/OpenTofu in 2025

Terraform breaking your CI/CD at 2 AM? That’s not bad luck - it’s a design flaw. A newer approach fixes the duplication, orchestration, and backend chaos that Terraform users have lived with for years.

Code reuse: Shared configs with include blocks remove the folder-copy mess across environments.
Orchestration: Dependency graphs handle deploy order automatically with a single command.
Stacks: Pattern-level reuse turns repeated setups into reusable infrastructure blueprints.

The takeaway: it’s finally possible to manage multi-environment IaC without fighting the tool itself.

Read the blog

IN PERSON EVENTS

Barcelona - October 22
San Jose - October 30
Helsinki - November 6

VIRTUAL EVENTS

Reading Group - October 23
We’ll discuss LiveMCP-101, a new benchmark that stress-tests MCP-enabled agents on real-world, multi-step tasks across web search, data analysis, and more.
Agents in Production - MLOps x Prosus - November 19
Learn how leading teams from OpenAI, NVIDIA, Meta, and Google DeepMind are turning agentic AI experiments into production systems.

MEME OF THE WEEK

ML CONFESSIONS

The early days of ML...

Found in the wild...

Share your confession here.

Discussion about this post

Ready for more?