AgentOps for Real: Evals, Tracing, and Regression Tests for AI Agents

Intermediate
Putting an agent into production without evals and tracing is like stepping into a cave with a dim headlamp and no map. You’ll keep moving… right up until something goes wrong and nobody can explain what happened, why, or whether the last “small prompt tweak” made things worse.

In this demo-heavy session, we’ll turn the Shadow AI reality into visibility without killing innovation by connecting your governance foundation: safe sandboxes, policy guardrails, approved connectors, and an AI registry. You will be equipped with the operational discipline that makes agents production-grade: continuous evaluation, telemetry, tracing, and regression testing.

You’ll see a practical AgentOps loop you can plug into real delivery:
• Build golden test sets (happy paths, edge cases, adversarial prompts) and score runs to catch regressions before users do
• Instrument end-to-end tracing across model calls and tool actions so you can replay the decision path. Not just the final answer
• Add quality gates (drift detection, cost/latency thresholds, tool-call correctness) that fail fast instead of failing loudly
• Incorporate safety checks and red-teaming concepts to surface risky behaviors early
• Establish an incident response playbook for agents: triage, rollback, containment, and learning loops

You’ll leave with a clean, repeatable operating model. So, when your agents go deeper, you stay in control.

Speaker note:
Happy to present multiple sessions, former St Louis resident.
Session prerequisites and resources may be available. Sign in to access