Event Log Architecture #

The event log is Litmus's unified record of everything that happens during testing — sessions, instrument connections, measurements, diagnostics, and more. It replaces earlier text-log + streaming patterns with a single, typed event stream.

Why a Unified Event Stream #

Previous approaches split test data across multiple systems: a journal for text logs, streaming destinations for live data, and Parquet files for results. This made it hard to reconstruct what happened during a test, correlate instrument reads with measurements, or monitor tests in real time.

The event log unifies all of this into one ordered stream. Every significant action emits a typed event. Subscribers process events for their own purposes — writing Parquet files, updating the UI, streaming to Grafana.

Event Hierarchy #

All events inherit from EventBase, which provides:

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique event identifier |
| occurred_at | datetime | When the event happened |
| received_at | datetime | When EventLog.emit() processed it |
| session_id | UUID | Which session this event belongs to |
| run_id | UUID | Which test run (if applicable) |

Each event type adds a Literal event_type field used as a discriminator for deserialization.
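As a rough illustration of how a Literal event_type tag can drive deserialization, the sketch below uses only stdlib dataclasses. It is not the Litmus implementation: the payload fields step_name and message, the EVENT_TYPES registry, and the parse_event helper are hypothetical names introduced for this example, and the real models may well be built on a validation library instead.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal, Optional
from uuid import UUID, uuid4


@dataclass
class EventBase:
    """Common fields from the table above (a sketch, not the real class)."""
    session_id: UUID
    run_id: Optional[UUID] = None
    id: UUID = field(default_factory=uuid4)
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    received_at: Optional[datetime] = None  # stamped later by emit()


@dataclass
class StepStarted(EventBase):
    event_type: Literal["test.step_started"] = "test.step_started"
    step_name: str = ""  # hypothetical payload field


@dataclass
class DiagnosticWarning(EventBase):
    event_type: Literal["diagnostic.warning"] = "diagnostic.warning"
    message: str = ""  # hypothetical payload field


# Hypothetical registry: discriminator string -> concrete event class.
EVENT_TYPES = {cls.event_type: cls for cls in (StepStarted, DiagnosticWarning)}


def parse_event(raw: dict) -> EventBase:
    """Pick the concrete class from the event_type tag, then build it."""
    cls = EVENT_TYPES[raw["event_type"]]  # KeyError for an unknown tag
    fields = dict(raw)
    fields["session_id"] = UUID(fields["session_id"])
    if fields.get("run_id") is not None:
        fields["run_id"] = UUID(fields["run_id"])
    return cls(**fields)
```

The point of the tag is that a consumer reading raw payloads never has to guess the class from the shape of the data; the type string alone selects it.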

Event Categories #

Litmus defines events across 11 categories.

Session (2 events) #

| Event | Type String | Description |
|---|---|---|
| SessionStarted | session.started | Session-wide metadata: station, operator, fixture |
| SessionEnded | session.ended | Session outcome |

Run (3 events) #

| Event | Type String | Description |
|---|---|---|
| RunStarted | run.started | Full run context: DUT, product, operator, config snapshots |
| RunEnded | run.ended | Run outcome |
| RunMaterialized | run.materialized | Parquet file written; ready for downstream consumers (defined but not currently in the Event discriminated union) |

Slot (2 events — multi-DUT) #

| Event | Type String | Description |
|---|---|---|
| SlotStarted | slot.started | A multi-DUT slot subprocess begins |
| SlotCompleted | slot.completed | A multi-DUT slot subprocess finishes |

Sync (2 events — multi-DUT) #

| Event | Type String | Description |
|---|---|---|
| SyncArrived | sync.arrived | A worker reached a synchronization barrier |
| SyncRelease | sync.release | All workers arrived; barrier released |

Route (2 events — signal switching) #

| Event | Type String | Description |
|---|---|---|
| RouteClosed | route.closed | Switch route closed (signal connected) |
| RouteOpened | route.opened | Switch route opened (signal disconnected) |

Fixture (5 events) #

| Event | Type String | Description |
|---|---|---|
| InstrumentConnected | fixture.instrument_connected | Instrument identified and connected |
| IdentityVerified | fixture.identity_verified | Expected vs actual instrument identity |
| CalibrationWarning | fixture.calibration_warning | Calibration due date approaching |
| DutScanned | fixture.dut_scanned | DUT serial barcode scanned |
| InstrumentDisconnected | fixture.instrument_disconnected | Instrument released during teardown |

Test (5 events) #

| Event | Type String | Description |
|---|---|---|
| StepsDiscovered | test.steps_discovered | Full list of collected test items |
| StepStarted | test.step_started | A test step begins execution |
| MeasurementRecorded | test.measurement | A single measurement with limits and outcome |
| RecordEvent | test.record | A key/value record from harness.record() |
| StepEnded | test.step_ended | A test step finishes |

Instrument (3 events) #

| Event | Type String | Description |
|---|---|---|
| InstrumentRead | instrument.read | Driver read via proxy (scalars inline, arrays as claim-check URIs) |
| InstrumentSet | instrument.set | Driver set via proxy |
| InstrumentConfigure | instrument.configure | Driver configure via proxy |

Diagnostic (2 events) #

| Event | Type String | Description |
|---|---|---|
| DiagnosticWarning | diagnostic.warning | Non-fatal warning |
| DiagnosticError | diagnostic.error | Error condition |

Stream (3 events) #

| Event | Type String | Description |
|---|---|---|
| StreamStarted | stream.started | A data stream begins |
| StreamEnded | stream.ended | A data stream ends |
| StreamFrameIndex | stream.frame_index | Frame count update |

Dialog (2 events) #

| Event | Type String | Description |
|---|---|---|
| DialogOpened | dialog.opened | Operator dialog shown, execution paused |
| DialogResponded | dialog.responded | Operator responded to dialog |

Event Timeline #

A typical test session emits events in this order:

SessionStarted          # Session-wide metadata (station, operator)
├── RunStarted          # Run context (DUT, product, config snapshots)
├── InstrumentConnected # One per instrument role
├── IdentityVerified    # Optional identity check
├── StepsDiscovered     # Full list of collected test items
├── StepStarted         # First test step
│   ├── InstrumentRead  # Instrument interactions
│   ├── InstrumentSet
│   └── MeasurementRecorded
├── StepEnded
├── StepStarted         # Next test step...
│   └── ...
├── StepEnded
├── RunEnded            # All steps complete
├── InstrumentDisconnected
└── SessionEnded        # Cleanup complete

Push Model: emit() → internal materializers #

The EventLog class manages the write path:

  1. emit(event) stamps received_at, buffers the event for batched Arrow IPC writes
  2. Internal materializers are notified immediately — the runs daemon ingests events into an AccumulatorPool, and materialize_run_to_parquet() writes the canonical per-run parquet on RunEnded. The litmus export CLI replay path drives the same materializer post-hoc against stored events.
  3. IPC flush happens every 50 events (configurable), writing a batch to the Arrow IPC file

The EventSubscriber base class is internal scaffolding for these materializers — not a public extension protocol. Adding a new format requires editing litmus.data.exporters, not a third-party plugin.
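The three-step write path can be sketched in a few lines. This is a toy model under stated assumptions, not the real EventLog: the class name SketchEventLog, the add_materializer method, and the flushed_batches list (standing in for the Arrow IPC file) are all hypothetical, and events are plain dicts for brevity.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


class SketchEventLog:
    """Toy write path: stamp, buffer, notify, flush every N events."""

    def __init__(self, flush_every: int = 50) -> None:
        self.flush_every = flush_every
        self._buffer: List[Dict] = []
        self._materializers: List[Callable[[Dict], None]] = []
        # Stand-in for the Arrow IPC file: one list entry per written batch.
        self.flushed_batches: List[List[Dict]] = []

    def add_materializer(self, callback: Callable[[Dict], None]) -> None:
        self._materializers.append(callback)

    def emit(self, event: Dict) -> None:
        # Step 1: stamp received_at and buffer the event.
        event["received_at"] = datetime.now(timezone.utc)
        self._buffer.append(event)
        # Step 2: notify materializers immediately, before any flush.
        for notify in self._materializers:
            notify(event)
        # Step 3: write a batch once the buffer reaches the threshold.
        if len(self._buffer) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.flushed_batches.append(self._buffer)
            self._buffer = []
```

With flush_every=3 and seven emitted events, a registered materializer sees all seven right away, two batches of three are written, and one event stays buffered until the next flush. That is the practical consequence of notifying before flushing: materializers run ahead of what is on disk.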

Storage #

Events are stored as Arrow IPC files, date-partitioned:

<data_dir>/events/
├── 2026-03-10/
│   ├── {session_id}.arrow
│   └── {session_id}_ref/     # Large data (waveforms, images)
└── 2026-03-11/
    └── ...

Each .arrow file contains index columns (id, event_type, occurred_at, received_at, session_id, run_id) plus a json column with the full serialized event for lossless replay.
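To make the layout concrete, the sketch below builds a session file path and projects an event into the row shape described above. It uses only the standard library and never touches Arrow. The helper names session_file and to_row are hypothetical, and which date selects the partition directory is an assumption left to the caller.

```python
import json
from datetime import date
from pathlib import Path
from uuid import UUID

# Index columns listed in the text; everything else lives only in "json".
INDEX_COLUMNS = ("id", "event_type", "occurred_at", "received_at", "session_id", "run_id")


def session_file(data_dir: Path, session_id: UUID, day: date) -> Path:
    """<data_dir>/events/<YYYY-MM-DD>/<session_id>.arrow"""
    return data_dir / "events" / day.isoformat() / f"{session_id}.arrow"


def to_row(event: dict) -> dict:
    """Project index columns and keep the full event as a JSON string."""
    row = {col: event.get(col) for col in INDEX_COLUMNS}
    # default=str is a shortcut for UUID/datetime values in this sketch.
    row["json"] = json.dumps(event, default=str)
    return row
```

Fields that are not index columns (a measurement value, for instance) survive only inside the json column, which is why replay reads from there rather than from the index columns.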

Dual-Write Pattern #

The EventStore layer adds queryability on top of EventLog by writing each flushed batch to two destinations:

  1. Arrow IPC file — crash-safe append-only storage
  2. In-memory DuckDB via Flight — immediate SQL queryability

On each flush, batches are pushed to the DuckDB daemon via Arrow Flight do_put. Queries go through Flight do_get with SQL, so a query issued after a flush sees every event in that batch (read-after-write consistency).
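The shape of the dual write can be shown with stand-ins from the standard library. To be clear about the substitutions: a JSON-lines file replaces the Arrow IPC file, and an in-memory sqlite3 database replaces the DuckDB daemon reached over Arrow Flight. The class SketchEventStore and its methods are hypothetical; only the pattern (durable append first, then push to the query engine) is taken from the text.

```python
import json
import sqlite3
from pathlib import Path
from typing import Dict, List


class SketchEventStore:
    """Dual write: append-only file plus an in-memory SQL table."""

    def __init__(self, wal_path: Path) -> None:
        self.wal_path = wal_path
        self.db = sqlite3.connect(":memory:")  # stand-in for DuckDB via Flight
        self.db.execute("CREATE TABLE events (id TEXT, event_type TEXT, json TEXT)")

    def flush(self, batch: List[Dict]) -> None:
        # 1. Durable append (stand-in for the Arrow IPC batch write).
        with self.wal_path.open("a", encoding="utf-8") as fh:
            for event in batch:
                fh.write(json.dumps(event) + "\n")
        # 2. Push the same batch to the query engine (stand-in for do_put).
        self.db.executemany(
            "INSERT INTO events VALUES (?, ?, ?)",
            [(e["id"], e["event_type"], json.dumps(e)) for e in batch],
        )
        self.db.commit()

    def query(self, sql: str, params: tuple = ()) -> list:
        # Stand-in for do_get with SQL.
        return self.db.execute(sql, params).fetchall()
```

A query issued right after flush() returns already sees the batch, while the file copy remains the source for rebuilding the in-memory table after a restart.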

HARD contract — additive evolution only #

The event WAL is a HARD contract alongside the parquet artifact — events are written append-only and consumers can replay arbitrary history, so the wire format has to evolve additively. Until the 1.0 cut, the following invariants hold and the project must not break them:

  • New event types only. Every release may add event types. The existing event-type discriminator strings (e.g. "test.step_started", "test.measurement", "run.started") and their Literal tags are stable across 0.x.
  • New optional fields only. Existing event types may grow new fields; they must be optional (have a default) so older events (replayed from disk or read from older daemons) still validate against the current schema. Required fields are frozen for 0.x.
  • No type changes on existing fields.
  • event_number monotonicity is part of the contract: insert-order monotonic per-daemon, used as the watcher cursor for live subscribers.
  • JSON column preserves the full serialized event for lossless replay, regardless of which index columns the daemon's DuckDB schema happens to project. Consumers that need the full payload read from JSON and don't depend on the index column set.

Breaking event-shape changes (renaming, removing, or type-narrowing required fields) are deferred to the 1.0 cut.
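A small sketch of why new fields must be optional: an event written before the field existed still has to load against the current schema. The classes below are stdlib dataclasses invented for this example, and the retry_index and operator fields are hypothetical, not real Litmus fields.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class StepStarted:
    step_name: str  # required since the first release: frozen for 0.x
    event_type: Literal["test.step_started"] = "test.step_started"
    retry_index: Optional[int] = None  # hypothetical later addition, has a default


@dataclass
class BrokenStepStarted:
    step_name: str
    operator: str  # hypothetical new REQUIRED field: violates the contract
    event_type: Literal["test.step_started"] = "test.step_started"


def validates(cls, payload: dict) -> bool:
    """True if the stored payload can still be loaded by this schema."""
    try:
        cls(**payload)
        return True
    except TypeError:
        return False


# An event as it was written before retry_index existed.
old_payload = {"event_type": "test.step_started", "step_name": "test_voltage"}

old_event = StepStarted(**old_payload)  # loads; retry_index falls back to None
new_event = StepStarted(**old_payload, retry_index=2)
```

Here validates(StepStarted, old_payload) is True while validates(BrokenStepStarted, old_payload) is False, because the old payload has no value for the new required field. That failure on replay is exactly what the additive-only rule is meant to prevent.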

See also #

Sibling concepts:

  • Event sourcing — why the platform is event-sourced rather than mutation-based
  • Three stores — how EventStore fits with ChannelStore and ParquetBackend
  • Sessions — the observation window the event log keys events by