Test Harness Integration #

For new pytest projects, use the plugin: Litmus fixtures (context, verify, measure, pins, … — 20 in total) and Litmus markers (litmus_limits, litmus_sweeps, …) handle setup automatically. TestHarness is the imperative entry point for non-pytest runners (Robot Framework, unittest, custom harnesses) or for situations where you need explicit lifecycle control.

If you only need to record results (run → steps → measurements → finish) from an external system, use LitmusClient. It wires the session/run events and persistence for you in a few lines:
client = LitmusClient()
run = client.start_run(uut_serial="SN12345", station_id="bench_1")
with run.step("voltage_check") as step:
    step.measure("vcc", 3.31, unit="V", low=3.0, high=3.6)
run.finish()
Reach for TestHarness when you need its test-execution machinery: vector expansion, retry loops, spec-driven limit resolution, operator prompts, and channel writes. TestHarness is the lower-level engine; LitmusClient is the thin results path.

TestHarness (in litmus.execution.harness) wraps the same machinery the pytest plugin uses: vector expansion, retry, limit resolution, and measurement logging with full traceability.

What you wire up yourself #

If you don't need the harness's execution machinery, prefer LitmusClient, which does all of the wiring below for you. The rest of this section is for when you want a TestHarness and explicit control of the run boundaries.

TestHarness writes through a RunScope, which you pass as its logger= argument. The RunScope only persists events to disk when it has an EventLog attached. The pytest plugin wires this up automatically; outside pytest you do it yourself:

from litmus.queries import EventStore
from litmus.data.events import RunStarted, SessionStarted
from litmus.execution.harness import TestHarness
from litmus.execution.run_scope import RunScope
 
logger = RunScope(
    uut_serial="SN12345",
    station_id="bench_1",
    test_phase="characterization",
    data_dir="data",
)
 
# Attach an EventLog so emitted events actually hit disk.
store = EventStore(_data_dir="data")
logger.event_log = store.get_event_log(logger.test_run.session_id)
 
# Emit run-open before any measurement, or the run never appears.
logger.event_log.emit(
    SessionStarted.from_station(
        session_id=logger.test_run.session_id,
        station_id=logger.test_run.station_id,
        station_hostname=logger.test_run.station_hostname,
    )
)
logger.event_log.emit(
    RunStarted(
        session_id=logger.test_run.session_id,
        run_id=logger.test_run.id,
        station_id=logger.test_run.station_id,
        station_hostname=logger.test_run.station_hostname,
        uut_serial=logger.test_run.uut.serial,
        test_phase=logger.test_run.test_phase,
    )
)
 
harness = TestHarness(logger=logger, step_name="test_output_voltage")
 
# … iterate vectors, measure, etc.
 
# Close the run: finalize() emits RunEnded and closes the open step, nothing more.
# The caller is responsible for ending the session and closing the event log.
from litmus.data.events import SessionEnded

logger.finalize()  # emits RunEnded and closes the open step
logger.event_log.emit(SessionEnded(session_id=logger.test_run.session_id))
logger.event_log.close()

finalize() emits RunEnded and closes the open step, but it does NOT emit SessionEnded, does NOT close the event log, and does NOT close the channel store. In a long-running process, skipping any of these leaks file handles and leaves the session open. Before exiting, emit SessionEnded and then close the event log (and call channel_store.close() if you wired one).
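One way to make the close-out sequence hard to forget is to wrap it in a context manager. The sketch below is illustrative only: `closing_run` and `make_session_ended` are hypothetical names, and the only calls assumed on the objects are the ones named above (`finalize()`, `event_log.emit()`, `event_log.close()`, `channel_store.close()`).

```python
from contextlib import contextmanager


@contextmanager
def closing_run(scope, make_session_ended, channel_store=None):
    """Yield `scope`, then always run the close-out sequence.

    Hypothetical helper: `scope` is anything with finalize() and an
    event_log that has emit() and close(); `make_session_ended` is a
    zero-argument callable that builds the SessionEnded event.
    """
    try:
        yield scope
    finally:
        try:
            scope.finalize()  # emits RunEnded and closes the open step
            scope.event_log.emit(make_session_ended())
        finally:
            # Release file handles even if finalize() or emit() raised.
            scope.event_log.close()
            if channel_store is not None:
                channel_store.close()
```

With the objects from the wiring example, usage would look like `with closing_run(logger, lambda: SessionEnded(session_id=logger.test_run.session_id)): ...`, with the harness work inside the block.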

Confirm the run landed:

litmus runs            # the run appears with its run_id, serial, station, outcome
litmus show <run_id>   # full run detail

Runs write under the data_dir you passed to RunScope.

RunScope takes the run-level metadata directly (uut_serial, station_id, station_name, operator_id, test_phase, part_id, data_dir, etc.). It constructs a TestRun and a RunContext (which exposes a .set(key, value) method for custom metadata) for you; you don't construct either.

A harness whose logger has no event_log still runs, but nothing is persisted — every event the harness would emit silently no-ops. Useful for unit-testing the harness loop without writing to disk; not what you want for a real run. If your data dir stays empty, this is the first thing to check.
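Because a missing event log fails silently, a small guard at startup can turn the problem into an immediate error. This is a sketch, not part of Litmus: `require_event_log` is a hypothetical helper, and it assumes an unattached scope has `event_log` either missing or set to `None`.

```python
def require_event_log(scope):
    """Return `scope` unchanged, or raise if nothing would be persisted.

    Assumption: a scope without an attached log has no `event_log`
    attribute, or has it set to None.
    """
    if getattr(scope, "event_log", None) is None:
        raise RuntimeError(
            "RunScope has no event_log attached; events would not be persisted"
        )
    return scope
```

Calling it just before constructing the harness, for example `TestHarness(logger=require_event_log(logger), ...)`, keeps the check next to the place where it matters.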

Constructor signature #

TestHarness(
    config: Mapping[str, Any] | None = None,
    logger: RunScope | None = None,
    step_name: str = "test",
    retry: RetryConfig | None = None,
    limits: dict[str, MeasurementLimitConfig | Limit] | None = None,
    part_context: PartContext | None = None,
    instruments: dict[str, Any] | None = None,
    mock_instruments: bool = False,
    channel_store: Any | None = None,
)

| Argument | Purpose |
| --- | --- |
| `config` | Optional dict with `vectors:` / `limits:` / `mocks:` / `retry:` keys, the same shape as the sidecar YAML. This is the integrator's only way to get vector expansion outside pytest. Example: `config={"vectors": [{"vin": 4.75}, {"vin": 5.0}, {"vin": 5.25}], "retry": {"max_retries": 2}}` |
| `logger` | `RunScope` that owns the event log writes |
| `step_name` | Name attached to the step records this harness emits; defaults to `"test"` if not passed |
| `retry` | Explicit `RetryConfig` (overrides `config["retry"]`) |
| `limits` | Map of measurement name → `Limit` (or the raw `{low, high, unit}` limit entry); overrides `config["limits"]` |
| `part_context` | Active part spec; enables `verify(name, value)` style limit + traceability resolution |
| `instruments` | Dict of instrument instances; used by mock-configuration to patch return values |
| `mock_instruments` | Whether mocks are enabled |
| `channel_store` | Optional `ChannelStore` for direct time-series writes |
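As a sketch of a fuller `config`, the dict below combines the `vectors`, `limits`, and `retry` keys. The measurement name `output_voltage` is illustrative, and keying `limits` by measurement name is an assumption inferred from the `limits=` description above rather than something stated for `config["limits"]` directly. The `mocks` key is left out because its shape is not described here.

```python
# Illustrative harness config; the same shape would live in the sidecar YAML.
config = {
    # One dict of stimulus parameters per vector.
    "vectors": [{"vin": 4.75}, {"vin": 5.0}, {"vin": 5.25}],
    # Assumption: limits are keyed by measurement name, each entry using
    # the raw {low, high, unit} shape.
    "limits": {"output_voltage": {"low": 3.135, "high": 3.465, "unit": "V"}},
    "retry": {"max_retries": 2},
}

# Passed to the harness as (not executed in this sketch):
# harness = TestHarness(config=config, logger=logger, step_name="test_output_voltage")
```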

Running vectors #

harness.vectors is the expanded list of Vector instances. Iterate over it and wrap each iteration in run_vector to scope context per vector:

for vector in harness.vectors:
    with harness.run_vector(vector) as test_vector:
        # `harness.context` is now the vector-level context
        if vector.changed("temperature"):
            harness.prompt(f"Set chamber to {vector['temperature']}°C")
        psu.set_voltage(vector["vin"])
        harness.measure("output_voltage", float(dmm.measure_dc_voltage()))

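run_vector is a context manager that opens and closes the vector boundary and stamps every measurement inside with vector params and indices. A with block runs its body only once, so the configured retry loop is applied by run_with_retry and run_all (see Convenience entry points), which call your test function inside run_vector.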
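The `vector.changed("temperature")` check in the loop above lets slow actions, such as an operator prompt, run only when a parameter actually moves. The sketch below shows one plausible way such a flag could be derived; it assumes the comparison is against the previous vector and that everything counts as changed on the first vector, neither of which is confirmed here. `changed_params` is a hypothetical helper operating on plain dicts, not the Litmus `Vector` class.

```python
def changed_params(vectors):
    """For each vector, return the set of keys that differ from the previous one."""
    previous = None
    result = []
    for params in vectors:
        if previous is None:
            # Assumption: on the first vector every parameter counts as changed.
            changed = set(params)
        else:
            changed = {
                key for key, value in params.items()
                if key not in previous or previous[key] != value
            }
        result.append(changed)
        previous = params
    return result


vectors = [
    {"temperature": 25, "vin": 4.75},
    {"temperature": 25, "vin": 5.25},
    {"temperature": 85, "vin": 4.75},
]
flags = changed_params(vectors)
```

Ordering vectors so that slow parameters (chamber temperature) vary least often keeps the number of prompts down.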

Convenience entry points #

For the common case — one test function executed across every vector with retries handled for you — the harness exposes two higher-level entry points:

def measure_rail(vector):
    psu.set_voltage(vector["vin"])
    return float(dmm.measure_dc_voltage())   # value goes to inferred measurement name
 
step = harness.run_all(measure_rail, step_name="output_voltage")
# step is a completed TestStep with one TestVector per harness.vectors entry,
# each carrying a Measurement with name inferred from limits.

| Method | Signature | What it does |
| --- | --- | --- |
| `harness.run_all(test_fn, step_name=None)` | `Callable[[Vector], Any] → TestStep` | Opens a step, iterates `harness.vectors`, runs each through `run_with_retry`. Returns the completed step. |
| `harness.run_with_retry(vector, test_fn)` | `(Vector, Callable[[Vector], Any]) → TestVector` | Runs `test_fn(vector)` inside `run_vector`, retrying up to `retry_config.max_retries` times. Returns the final `TestVector`. |
| `harness.current_vector` (property) | → `Vector \| None` | The vector currently inside `run_vector`, or `None` when called outside a vector boundary. |
| `harness.retry_config` (property) | → `RetryConfig` | The active `RetryConfig` (constructor arg, sidecar `retry:`, or the default `max_retries=0, delay=0`). |

test_fn can return a single value (logged under the inferred measurement name) or yield (name, value) tuples for multiple measurements per vector. See harness.measure(...) below for the per-call form.
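To make the two `test_fn` styles and the retry count concrete, here is a minimal stand-alone sketch. It is not the harness implementation: `collect_measurements` and `run_with_retry_sketch` are hypothetical names, and it assumes a retry is triggered by an exception and that `max_retries` counts extra attempts after the first.

```python
import inspect
import time


def collect_measurements(result, default_name):
    """Normalize a test_fn result into a list of (name, value) pairs."""
    if inspect.isgenerator(result):
        # test_fn yielded (name, value) tuples: consume them all.
        return list(result)
    # test_fn returned a single value: file it under the inferred name.
    return [(default_name, result)]


def run_with_retry_sketch(test_fn, vector, default_name, max_retries=0, delay=0.0):
    """Call test_fn(vector), retrying on exception up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            # Generators run lazily, so consume the result inside the try block.
            return collect_measurements(test_fn(vector), default_name)
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(delay)
```

Under these assumptions, `max_retries=2` allows up to three calls of `test_fn` for one vector.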

Recording measurements #

harness.measure(
    name="output_voltage",
    value=3.31,
    unit="V",                   # optional — defaults to limit.unit
    limit=Limit(low=3.135, high=3.465, unit="V"),   # optional — explicit override
    uut_pin="VOUT",             # optional — auto-resolved from part_context
    instrument_channel="CH1",   # optional
    fixture_connection="vout_dmm",  # optional
)

Limit resolution order (when limit= is not passed):

1. Per-vector limit, if the current vector was built with one.
2. Test-level limits: the harness's limits= constructor kwarg if you passed one; otherwise the entries parsed from config["limits"]. They aren't merged: if you pass limits=, the harness ignores config["limits"] entirely. Pick one source per test.
3. The active part context's get_limit(name, **vector_params). Vector params are passed as condition kwargs so the right SpecBand is selected.
4. None: the measurement is recorded as unchecked.

Pass a Limit object (from litmus import Limit) for explicit limits. The sidecar-style dict shape ({"low": 3.0, "high": 3.6, "unit": "V"}) goes in config["limits"], not as the limit= kwarg.
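The resolution order can be read as a simple fall-through chain. The sketch below mirrors it with plain Python values standing in for `Limit` objects; `resolve_limit` and `pick_test_limits` are hypothetical helpers written for illustration, not harness internals.

```python
def pick_test_limits(limits_kwarg, config):
    """Choose the single test-level source: limits= wins outright, no merging."""
    if limits_kwarg is not None:
        return limits_kwarg
    return (config or {}).get("limits", {})


def resolve_limit(name, explicit=None, vector_limit=None, test_limits=None,
                  part_lookup=None, vector_params=None):
    """Return the first limit found, following the documented order."""
    if explicit is not None:          # limit= passed to measure()
        return explicit
    if vector_limit is not None:      # 1. per-vector limit
        return vector_limit
    if test_limits and name in test_limits:   # 2. test-level limits
        return test_limits[name]
    if part_lookup is not None:       # 3. part spec, conditioned on vector params
        return part_lookup(name, **(vector_params or {}))
    return None                       # 4. unchecked measurement
```

Here `part_lookup` plays the role of the part context's `get_limit`, receiving the vector params as keyword arguments.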

Steps #

A harness writes step records via the step context manager:

import time

with harness.step(name="warmup", description="Drive PSU to nominal"):
    psu.set_voltage(5.0)
    psu.enable_output()
    time.sleep(2.0)  # let the supply settle before measuring

with harness.step(name="measure"):
    for vector in harness.vectors:
        with harness.run_vector(vector):
            harness.measure("output_voltage", float(dmm.measure_dc_voltage()))

When you call harness.measure(...) outside any step() block, the logger silently auto-creates a step named after the measurement. That is convenient for quick scripts, but every loose measurement then becomes its own one-row step in the parquet. For grouped reporting, wrap related measurements in an explicit step() block. step_name (the harness's constructor arg) is the default name harness.run_all(test_fn) uses when it opens a step for you.

Operator prompts #

harness.prompt(
    message="Verify the chamber temperature is 25°C ± 1°C",
    prompt_type="confirm",
    timeout_seconds=30,
)

prompt_type is one of "confirm", "choice", or "input" (default "confirm"). There is no "text" type; for free-form text input use "input".

Hierarchical context #

harness.context returns the active Context (vector ▸ step ▸ run, most-specific-wins). harness.run_context returns the run-level Context directly. Each child context inherits from its parent and can override locally.

To stamp stimulus values (→ parquet in_* columns), use configure(). For environmental readings (→ out_* columns), use observe():

harness.run_context.configure("operator", "jane")            # run scope
with harness.step(name="measure"):
    harness.context.configure("fixture.id", "FIX-01")        # step scope
    for vector in harness.vectors:
        with harness.run_vector(vector):
            harness.context.observe("temp_probe.temperature", 24.8)   # vector scope
            harness.measure("output_voltage", float(dmm.measure_dc_voltage()))

There is no Context.set(name, value) method — the verb pair is configure / observe. The pytest run_context fixture exposes a different object (a RunContext) which DOES have a .set() method for custom run-level metadata. Don't confuse the two: harness.run_context is a Context; the pytest run_context fixture is a RunContext.

Run-scope fields appear as columns in every parquet row this run produces. Step- and vector-scope fields appear only on the rows from that scope.
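The most-specific-wins lookup across vector, step, and run scopes behaves much like a chained dictionary. The sketch below models it with `collections.ChainMap` from the standard library; it illustrates the lookup rule only and is not how the Litmus `Context` is implemented. The vector-level `operator` override is contrived to show precedence.

```python
from collections import ChainMap

run_scope = {"operator": "jane"}
step_scope = {"fixture.id": "FIX-01"}
vector_scope = {"temp_probe.temperature": 24.8, "operator": "sam"}

# ChainMap searches left to right, so the most specific scope wins.
vector_view = ChainMap(vector_scope, step_scope, run_scope)

# A row recorded at step scope never sees vector-level fields.
step_view = ChainMap(step_scope, run_scope)
```

Looking up `operator` in `vector_view` returns the vector-level value, while `step_view` still resolves it from the run scope.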

Bulk seeding (useful when you already hold the dict from somewhere else):

harness.context.set_params({"vin": 5.0, "load": 0.5})
harness.context.set_observations({"temp_probe.temperature": 24.8})

set_params / set_observations are dict-update bulk helpers, equivalent to calling configure(k, v) / observe(k, v) for every key, with one important asymmetry: observe() routes large numeric arrays to the channel store and stashes a channel:// URI on the row, while set_observations() writes whatever you pass straight through with no channel-store routing. Use observe() for waveforms and array readings; use set_observations() for plain scalar dicts you've already assembled.

context.measure(name, value, ...) is a third option for recording. It's a thin redirect to harness.measure(...), so you can record without holding a harness reference — useful inside helper functions that already take a Context:

def log_voltage(ctx, dmm):
    ctx.measure("output_voltage", float(dmm.measure_dc_voltage()))

harness.measure(...) and context.measure(...) produce identical events; pick whichever is in scope.

Spec-driven limits #

from litmus.parts.context import PartContext
 
part_ctx = PartContext.from_file("parts/power_board.yaml", guardband_pct=10)
 
harness = TestHarness(
    logger=logger,
    step_name="characterize",
    part_context=part_ctx,
)
 
harness.measure("output_voltage", float(dmm.measure_dc_voltage()))
# Limit resolved from part YAML, guardband applied, traceability columns populated
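How `guardband_pct` is applied is not spelled out here. One common convention tightens each side of the spec window by the given percentage of its span. The sketch below shows that arithmetic under that assumption only; `apply_guardband` is a hypothetical helper, and Litmus may define the guardband differently.

```python
def apply_guardband(low, high, guardband_pct):
    """Tighten [low, high] by guardband_pct percent of the span on each side.

    Assumption: the guardband is symmetric and relative to the window width.
    """
    margin = (high - low) * guardband_pct / 100.0
    return low + margin, high - margin


# A 3.135 V to 3.465 V window with a 10 % guardband.
test_low, test_high = apply_guardband(3.135, 3.465, 10)
```

Under this convention the 0.33 V window loses 0.033 V on each side, so the test limits land near 3.168 V and 3.432 V.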

Comparison with pytest-native #

| Concern | `LitmusClient` | `TestHarness` | pytest-native |
| --- | --- | --- | --- |
| Use when | Recording results only (no vectors/retry/limits/prompts); persistence handled for you | Embedding execution machinery (vectors, retry, limits, prompts, channel writes) in a non-pytest runner | Writing new tests under pytest |
| Lifecycle | `start_run()` / `run.finish()` | Explicit (`step()`, `run_vector()`) | Implicit (pytest collection + hooks) |
| Vector expansion | Not supported | Configure via `config["vectors"]` | `@pytest.mark.parametrize` / sidecar `sweeps:` |
| Limit resolution | Inline `low=`/`high=` per measurement | Explicit `limits=` / `part_context=` | Fixture + marker chain (see Litmus fixtures + Litmus markers) |
| Trace context | Not supported | `harness.context.*` | `context` fixture |
| Instrument access | Caller-managed | Caller-managed | Auto-fixtures from station YAML |

If you can use pytest-native, prefer it — every feature works out of the box. If you only need to record results from an external system, use LitmusClient. Reach for TestHarness when you need its execution machinery and the embedding environment leaves you no choice.