Why teams switch
A lot of production agent debugging comes down to one question: what changed? Run diff gives teams a direct way to answer it by comparing structure, timing, tool calls, and outputs between two runs.
Pinpoint divergence fast
Compare control flow, tool usage, prompts, and outputs to find the exact moment two runs stopped behaving the same way.
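As a rough illustration of what "finding the exact moment" means, the sketch below walks two runs step by step and reports the first step where they differ. It is not Foxhound's API: the step shape (`kind`, `tool`, `prompt`, `output`) and the `first_divergence` helper are assumptions made for this example.

```python
from itertools import zip_longest
from typing import Any, Optional

# Hypothetical step shape: each run is a list of dicts with these fields.
Step = dict[str, Any]
FIELDS = ("kind", "tool", "prompt", "output")


def first_divergence(run_a: list[Step], run_b: list[Step]) -> Optional[dict[str, Any]]:
    """Return the first step index where the runs differ, or None if they match."""
    for index, (a, b) in enumerate(zip_longest(run_a, run_b)):
        if a is None or b is None:
            # One run ended early, so the step exists on only one side.
            return {"index": index, "fields": ["<missing step>"], "a": a, "b": b}
        changed = [f for f in FIELDS if a.get(f) != b.get(f)]
        if changed:
            return {"index": index, "fields": changed, "a": a, "b": b}
    return None


# Made-up example data: a known-good run and a broken run.
good = [
    {"kind": "llm", "prompt": "Plan the lookup", "output": "call search"},
    {"kind": "tool", "tool": "search", "output": "3 results"},
    {"kind": "llm", "prompt": "Summarize", "output": "Summary A"},
]
bad = [
    {"kind": "llm", "prompt": "Plan the lookup", "output": "call search"},
    {"kind": "tool", "tool": "fetch_page", "output": "timeout"},
    {"kind": "llm", "prompt": "Summarize", "output": "Sorry, no data"},
]

result = first_divergence(good, bad)
print(result["index"], result["fields"])
```

Only the first differing step is reported because later differences are usually consequences of it; in this made-up data the wrong tool call at step 1 explains the bad summary at step 2.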
Validate releases with evidence
Compare runs from before and after a deployment to confirm that the intended changes shipped without introducing unintended behavior.
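One way to picture a before-and-after release check is to separate the changes you meant to ship from everything else. The sketch below assumes each run has been reduced to a flat summary dict; the keys (`prompt_version`, `tools_called`, `final_output`) and the `classify_changes` helper are hypothetical and not part of any real product API.

```python
from typing import Any

Change = tuple[Any, Any]  # (value before, value after)


def classify_changes(
    before: dict[str, Any],
    after: dict[str, Any],
    intended: set[str],
) -> tuple[dict[str, Change], dict[str, Change]]:
    """Split differing keys into intended and unexpected changes."""
    expected: dict[str, Change] = {}
    unexpected: dict[str, Change] = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old == new:
            continue
        # A change is "expected" only if the release notes said it would move.
        target = expected if key in intended else unexpected
        target[key] = (old, new)
    return expected, unexpected


# Made-up run summaries from before and after a deployment.
before = {
    "prompt_version": "v1",
    "tools_called": ["search"],
    "final_output": "Summary A",
}
after = {
    "prompt_version": "v2",
    "tools_called": ["search", "fetch_page"],
    "final_output": "Summary A",
}

expected, unexpected = classify_changes(before, after, intended={"prompt_version"})
print("expected:", expected)
print("unexpected:", unexpected)
```

Here the prompt version bump is the intended change, while the extra tool call is flagged for review even though the final output is the same.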
Support regression workflows
Pair run diff with evaluations and trace search to turn broken behavior into repeatable regression tests for future releases.
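To make the regression idea concrete, the sketch below freezes two facts from a known-good run (its tool-call sequence and a phrase the final answer must contain) and checks a candidate run against them. The step shape and the `tool_sequence` and `regression_failures` helpers are assumptions for illustration only.

```python
from typing import Any

# Hypothetical step shape, used only for this example.
Step = dict[str, Any]


def tool_sequence(run: list[Step]) -> list[str]:
    """Names of the tools a run called, in order."""
    return [step["tool"] for step in run if step.get("kind") == "tool"]


def regression_failures(
    baseline: list[Step],
    candidate: list[Step],
    required_output: str,
) -> list[str]:
    """Return human-readable failures; an empty list means the check passed."""
    failures: list[str] = []
    if tool_sequence(baseline) != tool_sequence(candidate):
        failures.append(
            f"tool sequence changed: {tool_sequence(baseline)} -> {tool_sequence(candidate)}"
        )
    final_output = candidate[-1].get("output", "") if candidate else ""
    if required_output not in final_output:
        failures.append(f"final output no longer contains {required_output!r}")
    return failures


# Made-up data: the known-good run becomes the baseline for future releases.
baseline_run = [
    {"kind": "tool", "tool": "search", "output": "3 results"},
    {"kind": "llm", "prompt": "Summarize", "output": "Summary: refund approved"},
]
broken_run = [
    {"kind": "tool", "tool": "fetch_page", "output": "timeout"},
    {"kind": "llm", "prompt": "Summarize", "output": "Sorry, no data"},
]

failures = regression_failures(baseline_run, broken_run, required_output="refund approved")
for failure in failures:
    print(failure)
```

Returning a list of failures instead of raising on the first one keeps the check easy to wrap in whatever test runner a team already uses.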
Foxhound vs manual run comparison
| Capability | Foxhound | Manual comparison |
|---|---|---|
| Change detection | Highlights divergence across execution flow, tools, prompts, and outputs. | Requires developers to compare logs, outputs, and traces by hand. |
| Release confidence | Makes before-and-after behavior easier to validate during rollout. | Teams often rely on spot checks and user feedback. |
| Regression workflow | Pairs with evaluations and trace search so a broken run can become a repeatable regression test. | Comparison is ad hoc and hard to repeat consistently. |
Frequently asked questions
What is an AI agent run diff?
It is a side-by-side comparison of two executions so teams can see where control flow, prompts, tools, outputs, or timing changed.
Why is run diff useful in production?
Many issues only become obvious when you compare a known-good run with a broken one. A diff shows where the two runs diverge, so you do not have to read both traces end to end.
Can run diff help catch regressions after releases?
Yes. It is especially useful for validating prompt, tool, and orchestration changes across versions and deployments.