Debug failures via MCP

The MCP tools turn an AI assistant into an investigation partner that can pull run data, channel waveforms, and event timelines without you leaving the chat. This recipe is the diagnostic workflow — "the run failed, why?"

Prerequisites

MCP server registered with your AI client
A failing run in the data dir (this recipe assumes you already know which one)

The investigative toolkit

Tool	What it surfaces
`litmus_runs(action="get", run_id=...)`	Run-level summary — outcome, station, part, started / ended timestamps
`litmus_steps(run_id=..., action="list")`	Every step the run executed, in order, with outcome + measurement count
`litmus_steps(run_id=..., action="tree")`	Same data as a step_path hierarchy (better for cluster / parametrize layouts)
`litmus_events(session_id=..., event_type=..., role=..., since=..., limit=...)`	Events around the failure — dialogs, instrument connects, errors
`litmus_sessions()`	List of sessions; useful to map a run back to the session it ran in
`litmus_channels(channel_id=..., session_id=..., last_n=..., max_points=...)`	Time-series channel data — supply rails, temperatures, anything logged via `context.observe()`
`litmus_open(type="run", id=...)`	Returns a browser URL to the operator UI's Results detail — fallback when you need to see the rendered view

Recipe: "Why did this run fail?"

1. Get the lay of the land

Show me run a4f8b201.

The assistant calls litmus_runs(action="get", run_id="a4f8b201") and reports the outcome, station, part, and start time. If the outcome is not failed, change course: Litmus distinguishes failed (a measurement crossed a limit or an assertion failed), errored (an exception was raised during the step), terminated (a graceful stop by the operator or harness, with cleanup), and aborted (the run ended without a proper close, so the rig may be in an unknown state). Each implies a different next step.
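The outcome triage above can be sketched as a small lookup. This is a minimal illustration, not part of Litmus: the four outcome names and their meanings come from this recipe, while the `triage` function, the `NEXT_STEP` table, and the wording of the hints (in particular for terminated and aborted, which this recipe does not walk through) are assumptions.

```python
# Outcome names and meanings are from the recipe. The hint text is illustrative,
# and the hints for "terminated" and "aborted" are assumptions.
NEXT_STEP = {
    "failed": "limit or assertion failure: find the step (step 2), then inspect measurements (step 4)",
    "errored": "exception during a step: find the step (step 2), then read the event log (step 3)",
    "terminated": "graceful stop with cleanup: find out who or what stopped the run",
    "aborted": "no proper close: treat the rig state as unknown before rerunning",
}


def triage(outcome: str) -> str:
    """Return a next-step hint for a run outcome, or a fallback for anything else."""
    return NEXT_STEP.get(outcome, f"no triage hint for outcome {outcome!r}")


print(triage("errored"))
```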

2. Find the step that flipped

Which step failed?

litmus_steps(run_id="a4f8b201", action="list") returns the flat step list with outcomes. The assistant scans for the first failed or errored row. From there:

failed: a measurement crossed a limit or an assertion failed. Drill into that step's measurements (see step 4).
errored: an exception was raised. The error message lives in the event log — jump to step 3.
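The scan for the first failed or errored row might look like the sketch below. It assumes the flat step list arrives as a list of dictionaries in execution order; the `name` and `outcome` field names and the sample rows are hypothetical, so check them against what litmus_steps actually returns.

```python
from typing import Optional

FAILING_OUTCOMES = {"failed", "errored"}


def first_failing_step(steps: list[dict]) -> Optional[dict]:
    """Return the first step whose outcome is failed or errored, else None."""
    for step in steps:  # assumed to be in execution order
        if step.get("outcome") in FAILING_OUTCOMES:
            return step
    return None


# Hypothetical rows; real field names may differ.
steps = [
    {"name": "power_on", "outcome": "passed"},
    {"name": "measure_rail", "outcome": "failed"},
    {"name": "teardown", "outcome": "errored"},
]

hit = first_failing_step(steps)
print(hit["name"], hit["outcome"])  # prints: measure_rail failed
```

Stopping at the first hit matters: later steps often fail or error only as a consequence of the first one.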

3. Pull the events around the failure

When the step errored, the exception is in the event log. Get the run's session id from step 1, or use litmus_sessions() to map the run back to the session it ran in, then:

litmus_events(session_id="<session_id>", since="<step_started_at>", limit=100) returns the events in order. The assistant scans the test.step_ended and diagnostic.error events for the step's error; the event body carries the exception type and message.
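As a rough sketch of that scan, the filter below keeps only the two event types named above and only when the body actually carries an exception. The dictionary shape is an assumption: `event_type` mirrors the tool's filter parameter, while `body`, `exception_type`, `message`, and the `instrument.connected` sample event are hypothetical names used for illustration.

```python
ERROR_EVENT_TYPES = {"test.step_ended", "diagnostic.error"}


def error_events(events: list[dict]) -> list[dict]:
    """Keep step_ended / diagnostic.error events whose body names an exception."""
    kept = []
    for event in events:
        body = event.get("body") or {}  # tolerate a missing or null body
        if event.get("event_type") in ERROR_EVENT_TYPES and body.get("exception_type"):
            kept.append(event)
    return kept


# Hypothetical events; field names and values are invented for the example.
events = [
    {"event_type": "instrument.connected", "body": {}},
    {"event_type": "test.step_ended",
     "body": {"exception_type": "TimeoutError", "message": "no response from DMM"}},
    {"event_type": "diagnostic.error",
     "body": {"exception_type": "TimeoutError", "message": "no response from DMM"}},
    {"event_type": "test.step_ended", "body": {}},  # a step that ended cleanly
]

for event in error_events(events):
    print(event["event_type"], event["body"]["exception_type"], event["body"]["message"])
```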

When the step failed (limit violation), events are less useful than the measurement table itself — skip to step 4.

4. Inspect the measurements

For a failed step, the measurement table is the source of truth. The MCP tool surface doesn't have a direct "fetch measurements" method, so the assistant either:

Opens the run in the operator UI via litmus_open(type="run", id="a4f8b201") (returns the URL — paste it into a browser), or
Queries the measurements directly with a parquet / DuckDB query

For a programmatic measurement diff across runs, see the Compare two runs recipe, which walks through the DuckDB join you'd run.

5. Cross-check environment with channels

If the measurement is far out of range but the UUT itself is fine, the cause is usually environmental. Ask for the channels the run logged:

Show me the supply-rail channels from session <session_id> over the last 5 minutes.

litmus_channels(channel_id="<rail_name>", session_id="<session_id>", last_n=300) returns timestamped values. The assistant inspects them for brown-outs, glitches, or thermal drift coincident with the failure window. max_points thins a series down to that many points when it has too many to return in one response.
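One way to look for brown-outs in the returned series is to group consecutive samples that dip below a threshold. The sketch below assumes the data has been reduced to `(timestamp, value)` pairs; that shape, the `dips_below` helper, the 4.75 V threshold, and the sample values are all illustrative, not something Litmus defines.

```python
def dips_below(samples, threshold):
    """Group consecutive samples under threshold into (start_ts, end_ts, min_value)."""
    windows = []
    current = None  # [start_ts, end_ts, min_value] for the dip in progress
    for ts, value in samples:
        if value < threshold:
            if current is None:
                current = [ts, ts, value]
            else:
                current[1] = ts
                current[2] = min(current[2], value)
        elif current is not None:
            windows.append(tuple(current))
            current = None
    if current is not None:  # series ended while still in a dip
        windows.append(tuple(current))
    return windows


# Hypothetical 5 V rail, one sample per second; values invented.
rail = [(0, 5.01), (1, 4.99), (2, 4.42), (3, 4.38), (4, 4.97), (5, 5.00)]
print(dips_below(rail, threshold=4.75))  # prints: [(2, 3, 4.38)]
```

Compare each window's timestamps against the failing step's start and end times to see whether the dip coincides with the failure.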

6. Hand off to a human if needed

When the assistant has narrowed the cause but the operator needs to verify visually:

litmus_open(type="run", id="a4f8b201") returns the /results/<run_id> URL. Share it in the chat so the operator can open it.

Tips that compound

Prefix run IDs. All run-id parameters accept the 8-char prefix Litmus uses in human-readable contexts. No need to copy/paste the full UUID.
Phase filter on metrics. litmus_metrics excludes development runs by default. Pass phase="production" to be explicit, or phase="all" to include development noise when you want to see everything.
Channel queries return raw rows by default. Setting max_points thins a large waveform down to that many points — useful when it has too many points to return in one response; skip it when you want every raw point.
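To make the max_points trade-off concrete, here is a minimal sketch of one possible thinning strategy: keep evenly spaced samples, always including the first and last. How Litmus actually thins a series is not specified here, so treat the `thin` function and its strategy as assumptions for illustration only.

```python
def thin(points, max_points=None):
    """Return at most max_points evenly spaced items; all of them if max_points is None."""
    if max_points is None:
        return list(points)
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    if len(points) <= max_points:
        return list(points)
    if max_points == 1:
        return [points[0]]
    last = len(points) - 1
    # Evenly spaced indices from the first sample to the last, inclusive.
    indices = [round(i * last / (max_points - 1)) for i in range(max_points)]
    return [points[i] for i in indices]


print(thin(list(range(10)), max_points=4))  # prints: [0, 3, 6, 9]
```

Plain decimation like this can drop a glitch that lasts only a sample or two, which is one reason to skip max_points when you are hunting short transients.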