CI FOR TOOL-USING AGENTS

RunLedger
Control the chaos of probabilistic software.

RunLedger acts as a flight recorder and gatekeeper for your agents. Deterministic evals, replayable tool calls, and schema enforcement in one CLI.

Diagram: the agent's input stream passes through RunLedger, the interception layer, before reaching the LLM API in production.

regressions.diff: FAIL

BASELINE (main)

    {
      "tool": "stripe_refund",
      "args": {
        "id": "ch_1Mc...",
        "reason": "fraud"
      }
    }

CURRENT (PR #124)

    {
      "tool": "stripe_refund",
      "args": {
        "id": "ch_1Mc...",
        (MISSING ARGUMENT)
      }
    }

Schema Error: Field 'reason' is required in StripeRefundSchema.

Regression Blocking

Catch drift before it costs you.

LLMs are probabilistic, but your tool interfaces are strict. RunLedger enforces Pydantic schemas and compares execution traces against known baselines.
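To make the baseline comparison concrete, here is a minimal Python sketch of diffing two tool-call traces. It is an illustration only, not RunLedger's implementation: the trace shape (a list of `{"tool", "args"}` dicts), the function name `diff_traces`, and the placeholder charge id are all assumptions.

```python
from typing import Any

Trace = list[dict[str, Any]]


def diff_traces(baseline: Trace, current: Trace) -> list[str]:
    """Return human-readable differences between two tool-call traces."""
    problems: list[str] = []
    if len(baseline) != len(current):
        problems.append(f"step count changed: {len(baseline)} -> {len(current)}")
    # Compare the steps both traces share, position by position.
    for i, (old, new) in enumerate(zip(baseline, current)):
        if old.get("tool") != new.get("tool"):
            problems.append(f"step {i}: tool changed: {old.get('tool')} -> {new.get('tool')}")
            continue
        old_args, new_args = old.get("args", {}), new.get("args", {})
        for key in sorted(old_args.keys() - new_args.keys()):
            problems.append(f"step {i}: missing argument '{key}'")
        for key in sorted(new_args.keys() - old_args.keys()):
            problems.append(f"step {i}: unexpected argument '{key}'")
        for key in sorted(old_args.keys() & new_args.keys()):
            if old_args[key] != new_args[key]:
                problems.append(f"step {i}: argument '{key}' changed")
    return problems


# "ch_example" is a placeholder id, not a real Stripe charge.
baseline = [{"tool": "stripe_refund", "args": {"id": "ch_example", "reason": "fraud"}}]
current = [{"tool": "stripe_refund", "args": {"id": "ch_example"}}]

for problem in diff_traces(baseline, current):
    print(problem)
```

A CI job could fail the build whenever the returned list is non-empty. A full trace diff would also cover messages and tool outputs, which this sketch leaves out.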

Trace Comparison

Diff the full execution trace (messages, tool calls, outputs).

Schema Enforcement

Fail the build if an agent hallucinates a parameter.
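As a rough sketch of what schema enforcement on tool arguments can look like, the example below validates a refund call with Pydantic (v2 API assumed). The model name `StripeRefundSchema` comes from the error message on this page, but its fields, the helper `check_tool_args`, and the placeholder id are assumptions; the real schema may differ.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StripeRefundSchema(BaseModel):
    # Assumed fields, mirroring the example payload on this page.
    # extra="forbid" rejects parameters the schema does not declare,
    # which is how a hallucinated argument would be caught.
    model_config = ConfigDict(extra="forbid")

    id: str
    reason: str


def check_tool_args(args: dict) -> list[str]:
    """Return one message per schema violation; an empty list means valid."""
    try:
        StripeRefundSchema.model_validate(args)
    except ValidationError as exc:
        return [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['type']}"
            for err in exc.errors()
        ]
    return []


# "ch_example" is a placeholder id, not a real Stripe charge.
print(check_tool_args({"id": "ch_example", "reason": "fraud"}))
print(check_tool_args({"id": "ch_example"}))
print(check_tool_args({"id": "ch_example", "reason": "fraud", "force": True}))
```

The second call reports the missing `reason` field and the third reports the undeclared `force` parameter. Either result could be used to fail a build.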

The Developer Infrastructure for Agents

Built for teams shipping to production, not just prototyping.

Live Run: 240ms | Replay: 0ms (CACHED)

Deterministic Replay

Network calls (Stripe, Postgres) are recorded once. In CI, RunLedger replays the recording instead of calling the network, so tests skip network latency and produce the same result on every run.

@record_replay
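The record-once, replay-in-CI idea can be sketched in plain Python. Only the decorator name `record_replay` appears on this page; its parameters (`cassette`, `replay`), the JSON cassette format, and the demo tool below are assumptions made for illustration, not RunLedger's actual API.

```python
import functools
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def record_replay(cassette: Path, replay: bool) -> Callable:
    """Record a tool's results to `cassette`, or serve them back when `replay` is True.

    Arguments and results must be JSON-serializable in this sketch.
    """
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # The tool name plus its arguments identify a recording.
            key = json.dumps(
                {"tool": func.__name__, "args": args, "kwargs": kwargs},
                sort_keys=True,
            )
            store = json.loads(cassette.read_text()) if cassette.exists() else {}
            if replay:
                if key not in store:
                    raise KeyError(f"no recording for {key}")
                return store[key]  # the live function is never called
            result = func(*args, **kwargs)
            store[key] = result
            cassette.write_text(json.dumps(store, indent=2))
            return result
        return wrapper
    return decorator


# Demo: record once against the "live" function, then replay without calling it.
cassette_path = Path(tempfile.mkdtemp()) / "cassette.json"
live_calls: list[str] = []


def search_users(query: str) -> list[dict]:
    live_calls.append(query)  # stands in for a real network call
    return [{"id": 1, "name": query}]


recorded = record_replay(cassette_path, replay=False)(search_users)
replayed = record_replay(cassette_path, replay=True)(search_users)

first = recorded("ada")   # hits the live function and writes the cassette
second = replayed("ada")  # served from the cassette
```

In replay mode an unrecorded call raises an error instead of silently reaching the network, which keeps CI runs reproducible.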

[INFO] Agent initialized
[TOOL] Calling search_users...
[DEBUG] Payload size: 2kb
[OK] Tool output received
[INFO] Reasoning step 2...
[TOOL] Calling update_db...
[OK] Commit successful

Flight Recorder

We capture agent messages, tool calls, tool outputs, and structured logs to build a complete trace of execution.

runledger record
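One way to picture such a trace is as an append-only stream of JSON events, one per line. The sketch below is a guess at the general shape, not the format that `runledger record` actually writes: the class name `FlightRecorder`, the event kinds, and the field names are all hypothetical.

```python
import io
import json
import time
from typing import Any, TextIO


class FlightRecorder:
    """Append-only trace writer: one JSON object per line (JSONL)."""

    def __init__(self, sink: TextIO) -> None:
        self.sink = sink
        self.seq = 0  # monotonically increasing event number

    def emit(self, kind: str, **payload: Any) -> dict[str, Any]:
        """Write one event and return it."""
        event = {"seq": self.seq, "ts": time.time(), "kind": kind, **payload}
        self.sink.write(json.dumps(event) + "\n")
        self.seq += 1
        return event


# Demo: record a run like the log excerpt above into an in-memory buffer.
buffer = io.StringIO()
recorder = FlightRecorder(buffer)
recorder.emit("log", level="INFO", text="Agent initialized")
recorder.emit("tool_call", tool="search_users", args={})
recorder.emit("tool_output", tool="search_users", ok=True)
recorder.emit("tool_call", tool="update_db", args={})
recorder.emit("tool_output", tool="update_db", ok=True)

# Reading the trace back is a line-by-line JSON parse.
trace = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

Sequence numbers give a stable ordering even when timestamps collide, which matters when two traces are diffed later.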

Latency Budget: 420ms / 500ms

Token Budget: FAIL

Cost & Latency Gates

Set budgets for token usage and latency. If a prompt change causes your agent to loop or ramble, the test fails.

budget: 500ms
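A budget gate boils down to comparing measured run statistics against limits. The sketch below assumes hypothetical `Budget` and `RunStats` types. The 420ms and 500ms figures come from the example above, while the token numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_latency_ms: float
    max_tokens: int


@dataclass(frozen=True)
class RunStats:
    latency_ms: float
    tokens: int


def check_budget(stats: RunStats, budget: Budget) -> list[str]:
    """Return one message per exceeded budget; an empty list means the gate passes."""
    failures: list[str] = []
    if stats.latency_ms > budget.max_latency_ms:
        failures.append(
            f"latency {stats.latency_ms:.0f}ms exceeds budget {budget.max_latency_ms:.0f}ms"
        )
    if stats.tokens > budget.max_tokens:
        failures.append(f"token usage {stats.tokens} exceeds budget {budget.max_tokens}")
    return failures


# Latency is within budget; the illustrative token count is not.
budget = Budget(max_latency_ms=500, max_tokens=4000)
stats = RunStats(latency_ms=420, tokens=5200)
failures = check_budget(stats, budget)
print(failures or "all budgets met")
```

An agent that starts looping after a prompt change would show up here as a jump in `tokens`, `latency_ms`, or both.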

Drop-in Middleware

RunLedger works with any agent framework. Wrap your agent as a subprocess runner, with minimal changes to your existing code.

LangChain / LangGraph
LlamaIndex
AutoGen / CrewAI
Raw Python / Node.js
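The subprocess approach is what keeps the harness independent of any one framework: the harness only needs a command to run and an agreed wire format. The sketch below assumes a hypothetical protocol (one JSON task on stdin, JSON events on stdout, one per line) and is not RunLedger's actual interface. The names `run_agent` and `FAKE_AGENT` are illustrative.

```python
import json
import subprocess
import sys


def run_agent(command: list[str], task: dict, timeout_s: float = 30.0) -> list[dict]:
    """Run an agent as a child process and collect the JSON events it prints."""
    proc = subprocess.run(
        command,
        input=json.dumps(task) + "\n",  # the task goes in on stdin
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # a non-zero exit code raises CalledProcessError
    )
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]


# A stand-in "agent": a tiny Python program that emits one tool call.
FAKE_AGENT = """
import json, sys
task = json.loads(sys.stdin.readline())
print(json.dumps({"tool": "search_users", "args": {"query": task["query"]}}))
"""

events = run_agent([sys.executable, "-c", FAKE_AGENT], {"query": "ada"})
print(events)
```

Because the agent is just a command, the same harness could launch a LangChain script, a Node.js process, or anything else that speaks the agreed format.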

Diagram: Client -> RunLedger (proxy) -> LLM API

Ship agents with confidence.
