DOCUMENTATION

Ship deterministic agent CI in hours, not weeks.

This guide covers the core workflow: define suites, record tool calls, replay in CI, and gate merges with strict assertions.


Replay-first CI

Tool call cassettes

Hard assertions

quickstart.sh

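pipx install runledger
runledger init                          # scaffold a demo suite
runledger run ./evals --mode record     # call real tools once, write cassettes
runledger run ./evals --mode replay     # replay cassettes deterministically
open .agentci/runs/**/report.html       # macOS; use xdg-open on Linux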

Record once, replay forever, gate merges.

Quickstart


Install

Use pipx to install the CLI globally, in its own isolated environment.

pipx install runledger

Bootstrap

Generate a demo suite with a sample agent and cassette.

runledger init

Replay

Run deterministic evals and publish artifacts to CI.

runledger run ./evals --mode replay

Core Concepts

Suites

A suite bundles cases, a tool registry, and budgets into a single CI unit.

Cases

Each case defines a task input and a cassette for deterministic replay.

Cassettes

Record tool inputs and outputs once, then reuse them in CI.
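A cassette is essentially a lookup table from a tool call to its recorded result. The minimal Python sketch below illustrates the idea. It assumes a hypothetical JSON file keyed by tool name plus canonicalized arguments. It is not runledger's cassette format or API; `Cassette`, `CassetteMiss`, and `lookup_order` are made-up names.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


class CassetteMiss(KeyError):
    """Replay mode hit a tool call that was never recorded."""


class Cassette:
    """Hypothetical record/replay store: (tool name, args) -> recorded output."""

    def __init__(self, path: Path, mode: str) -> None:
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.path = path
        self.mode = mode
        # Reuse earlier recordings if the file already exists.
        self.entries: dict[str, Any] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    @staticmethod
    def key(tool: str, args: dict[str, Any]) -> str:
        # sort_keys makes the key independent of argument order.
        return f"{tool}:{json.dumps(args, sort_keys=True)}"

    def call(self, tool: str, args: dict[str, Any], fn: Callable[..., Any]) -> Any:
        k = self.key(tool, args)
        if self.mode == "replay":
            if k not in self.entries:
                raise CassetteMiss(k)  # fail loudly instead of calling the live tool
            return self.entries[k]
        result = fn(**args)  # record mode: call the real tool once
        self.entries[k] = result  # result must be JSON-serializable
        return result

    def save(self) -> None:
        self.path.write_text(
            json.dumps(self.entries, indent=2, sort_keys=True), encoding="utf-8"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cassette.json"
        rec = Cassette(path, "record")
        rec.call("lookup_order", {"order_id": 42}, lambda order_id: {"status": "shipped"})
        rec.save()
        rep = Cassette(path, "replay")
        # fn is never invoked in replay, so a stub that raises is safe here.
        print(rep.call("lookup_order", {"order_id": 42}, fn=lambda **_: 1 / 0))
```

Raising on a miss, rather than falling through to the live tool, keeps replay deterministic. An unrecorded call surfaces as a failure instead of silently hitting the network.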

Assertions

Validate JSON output with schemas, required fields, and tool contracts.
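The Python sketch below shows how required-field and regex-guard checks on a final JSON output can work. It is a minimal sketch, not runledger's implementation. The function names are hypothetical, and the guard semantics (the field must be a string that fully matches the pattern) are an assumption. JSON Schema validation is left out because it needs a schema validator library, such as the third-party `jsonschema` package.

```python
import json
import re
from typing import Any


def missing_fields(output: dict[str, Any], fields: list[str]) -> list[str]:
    """One failure message per required key absent from the output."""
    return [f"missing required field: {f}" for f in fields if f not in output]


def regex_guard(output: dict[str, Any], field: str, pattern: str) -> list[str]:
    """Assumed semantics: the field must be a string that fully matches pattern."""
    value = output.get(field)
    if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
        return [f"{field!r} does not match {pattern!r}"]
    return []


def check_output(raw: str, fields: list[str], guards: dict[str, str]) -> list[str]:
    """Hard gate: an empty list means pass; anything else fails the case."""
    try:
        output = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(output, dict):
        return ["output must be a JSON object"]
    failures = missing_fields(output, fields)
    for field, pattern in guards.items():
        failures += regex_guard(output, field, pattern)
    return failures


if __name__ == "__main__":
    raw = '{"category": "billing", "reply": "Your refund is on its way."}'
    print(check_output(raw, ["category", "reply"], {"category": r"[a-z_]+"}))  # []
```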

Budgets

Enforce hard caps on latency, tool calls, and tool errors.
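A budget is a hard cap: a measured value above its limit fails the gate. The Python sketch below illustrates this. Its field names mirror the `max_wall_ms`, `max_tool_calls`, and `max_tool_errors` keys in the suite.yaml example in this guide. The classes are hypothetical, and the semantics are assumptions: caps apply to a single run's stats and are inclusive, so hitting a cap exactly still passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budgets:
    # Names mirror the suite.yaml budget keys; semantics are assumed.
    max_wall_ms: int
    max_tool_calls: int
    max_tool_errors: int


@dataclass(frozen=True)
class RunStats:
    wall_ms: int
    tool_calls: int
    tool_errors: int


def budget_violations(stats: RunStats, budgets: Budgets) -> list[str]:
    """Return one message per exceeded cap; an empty list means within budget."""
    checks = [
        ("wall_ms", stats.wall_ms, budgets.max_wall_ms),
        ("tool_calls", stats.tool_calls, budgets.max_tool_calls),
        ("tool_errors", stats.tool_errors, budgets.max_tool_errors),
    ]
    # Inclusive caps: only values strictly above the limit are violations.
    return [f"{name}={value} exceeds cap {cap}" for name, value, cap in checks if value > cap]


if __name__ == "__main__":
    caps = Budgets(max_wall_ms=20000, max_tool_calls=10, max_tool_errors=0)
    print(budget_violations(RunStats(wall_ms=25000, tool_calls=3, tool_errors=1), caps))
```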

Baselines

Track regressions and gate PRs when the success rate drops.
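A baseline gate compares the success rate of a PR's run against a stored baseline rate. The Python sketch below shows the idea. It is a minimal sketch, assuming the baseline is a single pass rate (for example, from the main branch) and an optional tolerance for small drops. The function names and the tolerance parameter are hypothetical, not runledger options.

```python
def success_rate(passed: list[bool]) -> float:
    """Fraction of cases that passed; an empty run counts as 0.0."""
    return sum(passed) / len(passed) if passed else 0.0


def may_merge(baseline_rate: float, passed: list[bool], tolerance: float = 0.0) -> bool:
    """Allow the PR only if the rate did not drop more than `tolerance` below baseline."""
    return success_rate(passed) >= baseline_rate - tolerance


if __name__ == "__main__":
    # Hypothetical baseline: 9 of 10 cases passed on the main branch.
    baseline = 0.9
    this_pr = [True] * 8 + [False] * 2  # 8 of 10 pass
    print(may_merge(baseline, this_pr))                  # False
    print(may_merge(baseline, this_pr, tolerance=0.15))  # True
```

A tolerance of zero is the strictest setting: any drop blocks the merge.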

Assertions and budgets are hard gates.

Use JSON Schema and required fields for deterministic checks, then layer budgets on top to cap latency and tool usage.

JSON Schema validation of the final output
Required fields and regex guards
Budget caps for wall time and tool calls

suite.yaml

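assertions:
  # Final output must validate against this JSON Schema
  - type: json_schema
    schema_path: schema.json
  # Keys that must be present in the final output
  - type: required_fields
    fields: ["category", "reply"]

budgets:
  max_wall_ms: 20000   # wall-clock cap in milliseconds (20 s)
  max_tool_calls: 10
  max_tool_errors: 0   # any tool error fails the gate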

Artifacts and reporting


Run logs

Every event is written to a JSONL log for auditing and diffs.

run.jsonl
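Each line of a JSONL file is one standalone JSON object, so run logs can be read and compared with only the Python standard library. The sketch below counts events per type and diffs two runs. It assumes, hypothetically, that each event carries a `type` field; check the actual run.jsonl schema before relying on it. The function names are illustrative.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Iterable


def parse_events(lines: Iterable[str]) -> list[dict[str, Any]]:
    """JSONL: every non-blank line is one standalone JSON object."""
    return [json.loads(line) for line in lines if line.strip()]


def load_run(path: Path) -> list[dict[str, Any]]:
    with path.open(encoding="utf-8") as f:
        return parse_events(f)


def event_count_diff(
    old: list[dict[str, Any]], new: list[dict[str, Any]], key: str = "type"
) -> dict[str, int]:
    """Per event type, new count minus old count; unchanged types are omitted."""
    before = Counter(e.get(key, "<none>") for e in old)
    after = Counter(e.get(key, "<none>") for e in new)
    # Counter returns 0 for missing keys, so types seen in only one run still diff.
    return {
        t: after[t] - before[t]
        for t in sorted(before.keys() | after.keys())
        if after[t] != before[t]
    }


if __name__ == "__main__":
    old = parse_events(['{"type": "tool_call"}', '{"type": "tool_call"}', '{"type": "final"}'])
    new = parse_events(['{"type": "tool_call"}', '{"type": "final"}', '{"type": "error"}'])
    print(event_count_diff(old, new))
```

To compare two real runs, pass `load_run(Path(...))` for each log file.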

CI output

JUnit XML and the summary JSON integrate directly with CI dashboards.

junit.xml
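CI dashboards read junit.xml natively, but a script can parse the same file, for example to list failing cases in a PR comment. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes the conventional JUnit layout: `<testcase>` elements with `<failure>` or `<error>` children. The exact attributes runledger writes are not documented here, and the suite and case names in the sample are made up.

```python
import xml.etree.ElementTree as ET


def failing_cases(junit_xml: str) -> list[str]:
    """Names of <testcase> elements that have a <failure> or <error> child."""
    root = ET.fromstring(junit_xml)
    failed = []
    # iter() searches the whole tree, so <testsuites> and bare <testsuite> roots both work.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(f"{case.get('classname', '')}.{case.get('name', '')}")
    return failed


if __name__ == "__main__":
    sample = """<testsuites>
  <testsuite name="evals" tests="2" failures="1">
    <testcase classname="support" name="refund"/>
    <testcase classname="support" name="triage"><failure message="schema mismatch"/></testcase>
  </testsuite>
</testsuites>"""
    print(failing_cases(sample))  # ['support.triage']
```

To read a file on disk, `ET.parse(path).getroot()` yields the same root element.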

Shareable report

A static HTML report that opens in any browser, with no server needed.

report.html

Summary metrics

Use summary.json for baseline diffs and regression gates.

summary.json
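A baseline diff can be as simple as subtracting matching numeric metrics from two summaries and flagging any that moved the wrong way. The Python sketch below shows this. It assumes summary.json is a flat JSON object. The metric names in the demo (`success_rate`, `mean_wall_ms`) and the function names are hypothetical; the real summary.json keys may differ.

```python
import json
from typing import Any


def load_summary(path: str) -> dict[str, Any]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _is_number(v: Any) -> bool:
    # bool is a subclass of int; exclude it so flags are not diffed as numbers.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def metric_deltas(baseline: dict[str, Any], current: dict[str, Any]) -> dict[str, float]:
    """current - baseline for every top-level numeric key present in both."""
    return {
        k: current[k] - baseline[k]
        for k in sorted(baseline.keys() & current.keys())
        if _is_number(baseline[k]) and _is_number(current[k])
    }


def regressions(deltas: dict[str, float], higher_is_better: set[str]) -> list[str]:
    """Metrics that moved the wrong way; unlisted keys are treated as lower-is-better."""
    return [k for k, d in deltas.items() if (d < 0 if k in higher_is_better else d > 0)]


if __name__ == "__main__":
    # Hypothetical metric names and values.
    base = {"success_rate": 0.9, "mean_wall_ms": 1200, "suite": "evals"}
    cur = {"success_rate": 0.8, "mean_wall_ms": 1100, "suite": "evals"}
    print(regressions(metric_deltas(base, cur), higher_is_better={"success_rate"}))  # ['success_rate']
```

Non-numeric fields such as a suite name are skipped rather than compared.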