RunLedger
CI for tool-using agents.
Control the chaos of probabilistic software.
RunLedger acts as a flight recorder and gatekeeper for your agents. Deterministic evals, replayable tool calls, and schema enforcement in one CLI.
Example failure: Field 'reason' is required in StripeRefundSchema.
Regression Blocking
Catch drift before it costs you.
LLMs are probabilistic, but your tool interfaces are strict. RunLedger enforces Pydantic schemas and compares execution traces against known baselines.
- Trace Comparison: Diff the full execution trace (messages, tool calls, outputs).
- Schema Enforcement: Fail the build if an agent hallucinates a parameter.
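The schema check can be sketched in plain Python. This is a minimal, dependency-free stand-in for the validation a Pydantic model performs, not RunLedger's implementation. The `StripeRefundSchema` fields (`charge_id`, `amount_cents`, `reason`), `validate_tool_call`, and `SchemaViolation` are hypothetical names chosen for illustration. The sketch rejects missing required fields, fields the schema does not define (a hallucinated parameter), and values of the wrong type.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any


# Hypothetical tool schema. RunLedger uses Pydantic models; a stdlib
# dataclass keeps this sketch free of third-party dependencies.
@dataclass
class StripeRefundSchema:
    charge_id: str
    amount_cents: int
    reason: str


class SchemaViolation(Exception):
    """Raised when a tool call does not match its declared schema."""


def validate_tool_call(schema: type, args: dict[str, Any]) -> Any:
    """Reject missing required fields, unknown (hallucinated) fields, and wrong types."""
    declared = {f.name: f for f in fields(schema)}
    errors: list[str] = []

    for name, f in declared.items():
        required = f.default is MISSING and f.default_factory is MISSING
        if name not in args:
            if required:
                errors.append(f"Field '{name}' is required in {schema.__name__}.")
        elif isinstance(f.type, type) and not isinstance(args[name], f.type):
            errors.append(
                f"Field '{name}' must be {f.type.__name__}, got {type(args[name]).__name__}."
            )

    # Any argument the schema does not declare is treated as hallucinated.
    for name in args:
        if name not in declared:
            errors.append(f"Field '{name}' is not defined in {schema.__name__}.")

    if errors:
        raise SchemaViolation(" ".join(errors))
    return schema(**args)


# Usage: the agent forgot the required 'reason' argument.
try:
    validate_tool_call(StripeRefundSchema, {"charge_id": "ch_123", "amount_cents": 500})
except SchemaViolation as exc:
    print(exc)  # Field 'reason' is required in StripeRefundSchema.
```

In a CI job, an uncaught `SchemaViolation` makes the Python test process exit non-zero, which is what fails the build.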
The Developer Infrastructure for Agents
Built for teams shipping to production, not just prototyping.
Deterministic Replay
Network calls (e.g. Stripe, Postgres) are recorded once. In CI, we replay the recording instead of calling the live service, so tests run fast and deterministically.
@record_replay
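The record-once, replay-in-CI pattern can be sketched as a small decorator. This is not RunLedger's `@record_replay`, whose signature is not shown here. The `replayable` name, the `REPLAY_MODE` environment variable, and the JSON cassette file are hypothetical, and recorded results must be JSON-serializable.

```python
import functools
import hashlib
import json
import os
from pathlib import Path
from typing import Any, Callable


def replayable(cassette: Path) -> Callable:
    """Hypothetical record/replay decorator (not RunLedger's @record_replay)."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Key each call by function name plus its JSON-serialized arguments.
            payload = json.dumps([fn.__name__, args, kwargs], sort_keys=True, default=str)
            key = hashlib.sha256(payload.encode()).hexdigest()
            store = json.loads(cassette.read_text()) if cassette.exists() else {}

            # REPLAY_MODE=record calls the real service once and saves the result;
            # any other value (the CI default) serves saved results with no network access.
            if os.environ.get("REPLAY_MODE", "replay") == "record":
                result = fn(*args, **kwargs)
                store[key] = result  # must be JSON-serializable
                cassette.write_text(json.dumps(store, indent=2))
                return result
            if key not in store:
                raise LookupError(
                    f"No recording for {fn.__name__}; re-run with REPLAY_MODE=record"
                )
            return store[key]
        return wrapper
    return decorator


@replayable(Path("stripe_cassette.json"))
def create_refund(charge_id: str, amount_cents: int) -> dict:
    # Stand-in for a real Stripe API call.
    return {"charge": charge_id, "amount": amount_cents, "status": "succeeded"}
```

Run once locally with `REPLAY_MODE=record` to capture real responses, commit the cassette file, and let CI replay it. A call with arguments that were never recorded raises instead of silently reaching the network.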
Flight Recorder
We capture agent messages, tool calls, tool outputs, and structured logs to build a complete trace of execution.
runledger record
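One way to picture a flight-recorder trace and the baseline comparison from the Regression Blocking section is sketched here, assuming each event is serialized as one canonical JSON line. `TraceEvent`, `Trace`, the event kinds, and `diff_against_baseline` are hypothetical; RunLedger's actual trace format is not described on this page.

```python
import difflib
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TraceEvent:
    kind: str               # hypothetical kinds: "message", "tool_call", "tool_output", "log"
    name: str
    payload: dict[str, Any]


@dataclass
class Trace:
    events: list[TraceEvent] = field(default_factory=list)

    def record(self, kind: str, name: str, **payload: Any) -> None:
        self.events.append(TraceEvent(kind, name, payload))

    def lines(self) -> list[str]:
        # One canonical JSON line per event so diffs are stable and readable.
        return [json.dumps(asdict(e), sort_keys=True) for e in self.events]


def diff_against_baseline(baseline: list[str], current: Trace) -> list[str]:
    """Return unified-diff lines; an empty list means the trace matches the baseline."""
    return list(
        difflib.unified_diff(baseline, current.lines(), "baseline", "current", lineterm="")
    )


# Usage: the new run dropped the 'reason' argument from the refund call.
baseline = Trace()
baseline.record("tool_call", "stripe.refund", charge_id="ch_123", reason="duplicate")
current = Trace()
current.record("tool_call", "stripe.refund", charge_id="ch_123")
print("\n".join(diff_against_baseline(baseline.lines(), current)))
```

Serializing with `sort_keys=True` keeps each line stable across runs, so a non-empty diff points at a real behavioral change (here, the missing `reason`) rather than dictionary-ordering noise.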
Cost & Latency Gates
Set budgets for token usage and latency. If a prompt change makes your agent loop or ramble past those budgets, the test fails.
budget: 500ms
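A latency and token gate might look like the following minimal sketch. It assumes the agent callable returns its output together with the number of tokens it consumed. `Budget`, `run_with_budget`, `BudgetExceeded`, and the 2,000-token default are hypothetical; only the 500 ms figure comes from the example above.

```python
import time
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(Exception):
    """Raised when a run goes over its latency or token budget."""


@dataclass
class Budget:
    max_latency_ms: float = 500.0  # mirrors the "budget: 500ms" example
    max_tokens: int = 2_000        # hypothetical token ceiling


@dataclass
class RunResult:
    output: str
    tokens_used: int
    latency_ms: float


def run_with_budget(
    agent: Callable[[str], tuple[str, int]], prompt: str, budget: Budget
) -> RunResult:
    """Run the agent once and raise BudgetExceeded if any budget is violated."""
    start = time.perf_counter()
    output, tokens = agent(prompt)  # assumption: the agent reports its own token usage
    latency_ms = (time.perf_counter() - start) * 1000

    violations = []
    if latency_ms > budget.max_latency_ms:
        violations.append(f"latency {latency_ms:.0f}ms > {budget.max_latency_ms:.0f}ms")
    if tokens > budget.max_tokens:
        violations.append(f"tokens {tokens} > {budget.max_tokens}")
    if violations:
        raise BudgetExceeded("Budget exceeded: " + "; ".join(violations))
    return RunResult(output, tokens, latency_ms)
```

Checking both limits before raising means a single failure message reports every budget the run broke, which makes a looping or rambling agent easier to diagnose.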
Drop-in Middleware
RunLedger works with any agent framework. Wrap your agent as a subprocess runner with minimal code changes.
- LangChain / LangGraph
- LlamaIndex
- AutoGen / CrewAI
- Raw Python / Node.js
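The subprocess-runner idea can be sketched as follows, assuming a simple contract in which the runner writes one JSON task to the agent's stdin and reads one JSON result from its stdout. That contract, `run_agent_subprocess`, and the echo agent are assumptions for illustration, not RunLedger's documented protocol.

```python
import json
import subprocess
import sys
from typing import Any


def run_agent_subprocess(
    cmd: list[str], task: dict[str, Any], timeout_s: float = 30.0
) -> dict[str, Any]:
    """Launch an agent as a child process, send one JSON task, read one JSON result."""
    proc = subprocess.run(
        cmd,
        input=json.dumps(task),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"agent exited with {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Stand-in agent: any framework works the same way as long as the process
# reads its task from stdin and prints a JSON result to stdout.
ECHO_AGENT = (
    "import json, sys\n"
    "task = json.load(sys.stdin)\n"
    "print(json.dumps({'answer': task['prompt'].upper()}))\n"
)

if __name__ == "__main__":
    result = run_agent_subprocess([sys.executable, "-c", ECHO_AGENT], {"prompt": "refund ch_123"})
    print(result)  # {'answer': 'REFUND CH_123'}
```

Because the only interface is a process boundary, the same runner works whether the child is a LangGraph app, a CrewAI crew, or a Node.js script.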