AgentOps for Real: Evals, Tracing, and Regression Tests for AI Agents
In this demo-heavy session, we’ll turn the Shadow AI reality into visibility without killing innovation by connecting your governance foundation: safe sandboxes, policy guardrails, approved connectors, and an AI registry. You will be equipped with the operational discipline that makes agents production-grade: continuous evaluation, telemetry, tracing, and regression testing.
You’ll see a practical AgentOps loop you can plug into real delivery:
• Build golden test sets (happy paths, edge cases, adversarial prompts) and score runs to catch regressions before users do
• Instrument end-to-end tracing across model calls and tool actions so you can replay the decision path. Not just the final answer
• Add quality gates (drift detection, cost/latency thresholds, tool-call correctness) that fail fast instead of failing loudly
• Incorporate safety checks and red-teaming concepts to surface risky behaviors early
• Establish an incident response playbook for agents: triage, rollback, containment, and learning loops
You’ll leave with a clean, repeatable operating model. So, when your agents go deeper, you stay in control.
Speaker note:
Happy to present multiple sessions, former St Louis resident.