When Your Agent Books the Flight
Every Halloween there are stories about drugged candy handed to trick-or-treaters. Every year I go out, and every year I come back with nothing but candy.
Anyway, the real danger this year is in your training data - it could be spiked to make your model hallucinate. Anthropic just showed that a few hundred poisoned documents can corrupt even the biggest models.
HOT TAKE
Scaling Fatigue
The scaling race never stopped to ask if less could do more.
Is efficiency the real frontier - or just the unfunded one?
LAST WEEK’S TAKE
I’ll never leave you.
You’ve all turned your stacks into the AI version of Annie Wilkes from Misery.
HIDDEN GEMS
Curated finds to help you stay ahead
The open-source EDR framework combines planning, search, and visualization agents with real-time human steering, enabling steerable multi-agent research workflows for enterprise analytics and benchmarking.
The new ScaleRL paper explores scaling laws for RL training, showing how frontier labs predict learning trajectories, optimize compute, and benchmark algorithms like CISPO and GSPO for LLM reinforcement learning.
NVIDIA’s HotI 2025 tutorial demonstrates high-performance GPU communication, with practical methods for optimizing multi-GPU data exchange using NCCL, NVSHMEM, and other distributed-training libraries.
A breakdown of meta-tools in Claude’s Skills mode, with examples of how context injection, routing, and runtime control shape structured agent workflows and modular reasoning.
💡Job of the week
Senior AI Product Engineer, Backend // Arize (Remote)
Arize builds infrastructure for monitoring and evaluating AI and ML models in production. The Senior AI Product Engineer will focus on backend systems that handle large-scale data processing, analytics, and performance-critical workloads for AI observability tools.
Responsibilities
Design and implement scalable backend systems for AI observability
Develop APIs supporting ML and LLM workflows
Optimize distributed data processing and storage pipelines
Collaborate with product teams to enhance system capabilities
Requirements
5+ years in backend software engineering
Proficiency in Go, Python, or Java for distributed systems
Experience with OLAP systems or message queues like Kafka
Knowledge of AWS, GCP, or Kubernetes environments
Find more roles on our new jobs board - and if you want to post a role, get in touch.
MLOPS COMMUNITY
Best AI Hackathon Project Ever | Unicorn Mafia Wins $100,000+
Something a bit different: a hackathon story about an AI travel agent built in under a week. An agent that books a full group holiday sounds fun - until it starts handling your credit card. This team pulled off both the booking and the payments.
Secure agent transactions: They used browser automation and Skyvern to book real flights and hotels, with Stripe’s one-time virtual cards enabling safe, autonomous payments.
Guardrails for chaos: Multiple agents worked together through a custom orchestrator, managing everything from WhatsApp chats to voice calls without losing control.
Bleeding-edge integrations: ElevenLabs’ new dial-tone API was wired in days after release, letting the agent navigate real phone menus.
A working multi-agent travel planner with real revenue - all built in under a week.
Pretraining: Breaking Down the Modern LLM Training Pipeline
Pretraining is where LLMs learn most of what they know. Misunderstand it and you ship brittle models, burn compute, and chase weird failure modes.
Pipeline: Next-token causal language modeling (CLM) builds world knowledge, then post-training aligns behavior - ULMFiT and InstructGPT shaped this baseline.
Data at scale: Trillion-token corpora, strict curation, and Chinchilla-style data-parameter balance matter more than sheer model size.
New twists: Instruction-augmented corpora, multi-phase and continual pretraining, plus RPT recasting pretraining as RL to boost zero-shot reasoning.
Get the pretraining map right and downstream results get a lot more predictable.
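The Chinchilla-style balance mentioned above reduces to simple arithmetic: with the usual training-compute approximation C ≈ 6·N·D and the rule of thumb of roughly 20 tokens per parameter, a compute budget pins down both model size and data size. A minimal sketch (the 6·N·D and 20x figures are standard approximations, not exact laws):

```python
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Solve for compute-optimal parameter count N and token count D.

    Assumes training FLOPs C ~ 6 * N * D and the Chinchilla rule of
    thumb D ~ tokens_per_param * N.
    """
    # C = 6 * N * (k * N)  =>  N = sqrt(C / (6 * k)),  D = k * N
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Chinchilla's own budget (~5.76e23 FLOPs) recovers roughly 70B params / 1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Plugging in a 10x larger budget scales both N and D by about 3.2x - under this regime, data and parameters grow together, which is why sheer model size alone stopped being the headline number.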
IN PERSON EVENTS
VIRTUAL EVENTS
Agents in Production - MLOps x Prosus - November 18
Learn how leading teams from OpenAI, NVIDIA, Meta, and Google DeepMind are turning agentic AI experiments into production systems.
MEME OF THE WEEK
ML CONFESSIONS
Chronologically Cursed
October’s meant for horror stories. I have one in my data lake.
Last October I was debugging why our recommendation model started drifting. Traced it back through six months of logs. Turns out a schema migration in April silently truncated a timestamp field - everything after 2038 wrapped to 1970. Our feature engineering pipeline didn’t crash. It just… kept going. Generated perfectly valid training data with completely scrambled temporal features.
The model learned around it. Accuracy barely budged. We served 40 million recommendations based on features that were chronologically nonsense. When I fixed it, performance got worse for two weeks while the model relearned actual time.
I told my manager we “discovered a schema optimization opportunity.” The corrupted data is still in our data lake. Every time someone wants to retrain on historical data, I have to pretend I don’t know that April through October is cursed. I should add a filter. I should document it.
I won’t.
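The wrap in that confession is the classic Year 2038 problem: an epoch-seconds field narrowed to 32 bits silently overflows for dates past 19 January 2038. A minimal sketch of how such a migration mangles timestamps without ever crashing (exactly where the wrapped dates land depends on the column’s width and signedness; a plain signed wrap goes to the early 1900s):

```python
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1

def narrow_to_int32(epoch_seconds: int) -> int:
    """Simulate a schema migration that silently narrows an epoch field
    to a signed 32-bit column: the value wraps C-style instead of erroring."""
    wrapped = epoch_seconds & 0xFFFFFFFF
    return wrapped - 2**32 if wrapped > INT32_MAX else wrapped

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# A timestamp past the 2038 rollover...
original = int((datetime(2040, 6, 1, tzinfo=timezone.utc) - EPOCH).total_seconds())
mangled = narrow_to_int32(original)

# ...comes back as a perfectly "valid" date decades in the past.
print(EPOCH + timedelta(seconds=mangled))
```

The pipeline keeps producing well-typed rows, which is exactly why nothing crashed: the corruption is only visible if you look at the values.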
Share your confession here.


