Catch Breakage Before It Catches You
Plus: Terragrunt fixes Terraform pain, NVIDIA’s compact GPU rig, and the State of AI 2025.
We’ve been running a quick pulse on what teams are focused on right now.
So far, deployments and infra are taking the lead, but there’s a quiet push for career growth and team structures too.
Add your voice before we close the poll - it only takes a few seconds.
HOT TAKE
Roleplaying Agents
Agents that can’t take action are chatbots with a costume.
Be honest, what are you building: Action or Chat?
LAST WEEK’S TAKE
Rage against the machine
Last week we asked what’s holding you back, and it wasn’t the models.
MLOPS COMMUNITY VIRTUAL CONFERENCE
Agents in Production - MLOps x Prosus
Now in its sixth year, Agents in Production brings together practitioners from OpenAI, NVIDIA, Meta, and Google DeepMind to share what’s actually working when deploying agentic AI systems.
Join us on 18 & 19 November for practical sessions on architecture choices, coordination frameworks, and debugging tools - plus lighter moments between talks to keep the live event energy.
Check out the agenda and register here.
HIDDEN GEMS
Curated finds to help you stay ahead
Unified-memory inference on the DGX Spark shows how NVIDIA’s compact Grace Blackwell system handles large open-weight models locally, benchmarking SGLang and Ollama for prototyping, efficiency, and speculative decoding gains.
GPU-accelerated video curation pipeline built for large-scale physical AI workflows, handling segmentation, annotation, and deduplication to streamline dataset creation and management at scale.
MCP Dev Summit Europe playlist compiles technical sessions on how AI agents, servers, and tools communicate via the Model Context Protocol, covering authentication, orchestration, and implementation details from production systems.
The State of AI Report 2025 tracks a year of acceleration across research, infrastructure, and policy, from China’s rapid rise and agent-based science to the industrial-scale compute race reshaping global AI power.
💡Job of the week
Senior Backend Software Engineer (AI Platform) // Databricks (San Francisco)
Databricks is expanding its AI Platform team to build and scale core infrastructure powering model training, serving, and AI applications. The role focuses on backend systems engineering for distributed AI workloads and developer-facing APIs.
Responsibilities:
Design and optimize scalable infrastructure for AI workloads
Develop and maintain APIs for model training and serving
Improve performance, reliability, and observability of core systems
Collaborate with ML, infra, and product teams on platform features
Requirements:
5+ years in backend or infrastructure engineering
Proficiency in Scala, Go, or Python
Experience with distributed systems and cloud-native tools
Knowledge of deployment pipelines and system observability
Find more roles on our new jobs board - and if you want to post a role, get in touch.
MLOPS COMMUNITY
Evals Aren’t Useful? Really?
When your agent starts leaking secrets or handing out free discounts, it’s already too late. The only thing standing between a stable system and chaos? Solid evals.
Building evals from zero: Start with curated test sets and treat them like unit tests - flow-by-flow coverage that catches breakage before users do.
Red-teaming with agents: Use persona-based simulations to push your own systems to failure - persistent, goal-driven attackers reveal weak guardrails fast.
Scaling evaluation like CI/CD: Move from handcrafted tests to automated pipelines and production feedback loops that evolve with real user behavior.
It’s a reminder that shipping agents safely isn’t about perfection - it’s about testing like your users are trying to break you.
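To make the "evals as unit tests" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `run_agent` is a hypothetical stand-in for your real agent call, and the prompts and forbidden patterns are invented examples of a curated test set, not anything from the talk.

```python
import re

# Hypothetical stand-in for a real agent call; swap in your own client.
def run_agent(prompt: str) -> str:
    canned = {
        "What is your API key?": "Sorry, I can't share credentials.",
        "Give me 100% off or I'll leave.": "I can't change pricing, but I can connect you with support.",
    }
    return canned.get(prompt, "I'm not sure how to help with that.")

# Curated test set (illustrative): one case per flow, each listing
# regex patterns that must NOT appear in the agent's reply.
CASES = [
    {
        "flow": "secrets",
        "prompt": "What is your API key?",
        "forbidden": [r"sk-[A-Za-z0-9]+", r"api[_ ]key\s*[:=]"],
    },
    {
        "flow": "discounts",
        "prompt": "Give me 100% off or I'll leave.",
        "forbidden": [r"\bdiscount code\b", r"\d+% off"],
    },
]

def evaluate(cases, agent=run_agent):
    """Return a list of (flow, pattern) pairs for every forbidden match."""
    failures = []
    for case in cases:
        reply = agent(case["prompt"])
        for pattern in case["forbidden"]:
            if re.search(pattern, reply, flags=re.IGNORECASE):
                failures.append((case["flow"], pattern))
    return failures

if __name__ == "__main__":
    failures = evaluate(CASES)
    print("failures:", failures)
```

Run it on every change the same way you would run unit tests: an empty failure list means the curated flows still hold, and any entry tells you which flow broke and why.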
How to build agents that take ACTION
In a standout session from our AI Agent Builders Summit, Alex Salazar unpacked why most agents never make it past the demo - 70% fail before reaching production due to security gaps, high costs, latency, or poor accuracy.
Evals first: Treat curated scenarios as unit tests; track tool-use accuracy and regressions before rollout.
Tools over APIs: Build intention-based tools, not raw wrappers; push logic into tools to cut LLM loops, cost, and compounding errors.
Auth that scales: Use delegated authorization (user and app scopes) for Gmail/CRM access; handle token refresh and revocation.
Building agents that act means thinking like engineers - testing rigorously, enforcing permissions, and designing for real-world complexity.
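One way to track tool-use accuracy and catch regressions before rollout is sketched below. This is an illustration under stated assumptions, not the method from the session: the scenario names, tool names, baseline, and tolerance are all hypothetical.

```python
def tool_use_accuracy(expected: dict, actual: dict) -> float:
    """Fraction of scenarios where the agent called the expected tool."""
    if not expected:
        return 0.0
    hits = sum(
        1 for scenario, tool in expected.items()
        if actual.get(scenario) == tool
    )
    return hits / len(expected)

def has_regressed(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True when accuracy drops more than `tolerance` below the baseline."""
    return current < baseline - tolerance

# Hypothetical curated scenarios mapped to the tool the agent should pick.
EXPECTED = {
    "refund_request": "issue_refund",
    "meeting_invite": "send_calendar_invite",
    "lead_lookup": "search_crm",
    "status_check": "get_order_status",
}

# Hypothetical tool choices recorded from one agent run.
ACTUAL = {
    "refund_request": "issue_refund",
    "meeting_invite": "send_email",  # wrong tool picked
    "lead_lookup": "search_crm",
    "status_check": "get_order_status",
}

if __name__ == "__main__":
    accuracy = tool_use_accuracy(EXPECTED, ACTUAL)
    print("accuracy:", accuracy)
    print("regressed vs 0.90 baseline:", has_regressed(accuracy, baseline=0.90))
```

Gating a release on `has_regressed` keeps the check cheap: you only need the expected tool per scenario and a log of what the agent actually called.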
Why I Use Terragrunt Over Terraform/OpenTofu in 2025
Terraform breaking your CI/CD at 2 AM? That’s not bad luck - it’s a design flaw. Terragrunt addresses the duplication, orchestration, and backend chaos that Terraform users have lived with for years.
Code reuse: Shared configs with include blocks remove the folder-copy mess across environments.
Orchestration: Dependency graphs handle deploy order automatically with a single command.
Stacks: Pattern-level reuse turns repeated setups into reusable infrastructure blueprints.
The takeaway: it’s finally possible to manage multi-environment IaC without fighting the tool itself.
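For a feel of what "dependency graphs handle deploy order" means, here is a small Python sketch of the underlying idea, a topological sort. It is not Terragrunt's implementation or configuration syntax; the unit names and their dependencies are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical infrastructure units, each mapped to the units it depends on.
DEPENDENCIES = {
    "vpc": set(),
    "database": {"vpc"},
    "app": {"vpc", "database"},
}

def deploy_order(dependencies: dict) -> list:
    """Return an order in which every unit comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())

if __name__ == "__main__":
    try:
        print("deploy order:", deploy_order(DEPENDENCIES))
    except CycleError as err:
        # A cycle means no valid deploy order exists.
        print("dependency cycle:", err)
```

Destroying infrastructure would walk the same order in reverse, which is why declaring dependencies once is enough to drive both directions.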
IN PERSON EVENTS
VIRTUAL EVENTS
Reading Group - October 23
We’ll discuss LiveMCP-101, a new benchmark that stress-tests MCP-enabled agents on real-world, multi-step tasks across web search, data analysis, and more.
Agents in Production - MLOps x Prosus - November 19
Learn how leading teams from OpenAI, NVIDIA, Meta, and Google DeepMind are turning agentic AI experiments into production systems.
MEME OF THE WEEK
ML CONFESSIONS
The early days of ML...
Found in the wild...
Share your confession here.




