Proof — not a mockup

The loop runs.
On a real repo. In your terminal.

Every snippet on this page is read directly from the working proof-of-concept in this repository. The same pnpm proof:loop command that produced this output runs on your machine in under a second.

61/61 tests passing · RED → GREEN verified · 0.1s end-to-end · 5 packages · 47-event fixture

Claude Code · MCP · skills — not REST, not curl

Claude Code closes the loop.
One skill. No endpoints.

Every edge of the observe → propose → verify → gate cycle is closed by an MCP tool or a Claude Code skill — not a REST call. An engineer types /cuit-loop and the full cycle runs inside the conversation in 0.18s.

Observe
MCP · cuit__get_session
Whatever feedback source you already use — your recorder, Jam, LogRocket, Sentry Replay, RUM — normalizes into one SessionEvent[] representation: pointer events, semantic selectors, console/errors, window.__cuitDebug snapshots. No curl. Claude Code calls the tool directly.
Propose
skill · /cuit-loop
Typing /cuit-loop in Claude Code invokes @cuit/spec-gen to turn the session into a Playwright/Vitest spec grounded in @cuit/harness primitives — no pixel coords, no waitForTimeout.
Verify
skill · /cuit-loop
The same /cuit-loop skill runs the spec against the unfixed code → RED (bug deterministically reproduced), then against the fix → GREEN. Claude Code sees both signals without leaving the conversation.
Gate
MCP · mcp__gitnexus__detect_changes
The generated spec is committed as a permanent CI regression gate. The GitNexus MCP tool tracks the change graph so future model calls know which specs are load-bearing before they touch related code.

Claude Code · one skill invocation: the full cycle, no REST endpoints

/cuit-loop (Claude Code conversation) · bash

# Claude Code conversation — no curl, no POST /v1/sessions

> /cuit-loop

  [cuit-loop] Reading session via cuit__get_session (source: any adapter)...
  [cuit-loop] 27 events normalized (6 pointer, 20 state-snapshot)

  [cuit-loop] Generating spec with @cuit/spec-gen...
  [cuit-loop] wrote out/agent-loop.spec.ts (18 lines, 6 primitives)

  [cuit-loop] Running spec — EXPECT RED...
  [cuit-loop]   segments[0].x: expected=100, actual=25 ✗  ← bug reproduced

  [cuit-loop] Hypothesis: collision short-circuit in onPointerMove blocks move
  [cuit-loop] Applying fix...

  [cuit-loop] Re-running spec — EXPECT GREEN...
  [cuit-loop]   segments[0].x: expected=100, actual=100 ✓  ← fix verified

  [cuit-loop] Opening PR with spec as regression gate.

  LOOP CLOSED in 0.18s

observe → propose → verify → gate · 0.18s end-to-end

The closed loop, end-to-end

stdout of pnpm proof:agent-loop — copied verbatim from agent-loop-output.log. Real recorder. Real spec-gen. Real RED → fix → GREEN.

closed · 0.18s end-to-end

[step 1/5] Capture — recorder runs while a developer reproduces the bug
              -> recorder captured 27 events (6 pointer, 20 snapshot)
              -> wrote out/recorded-session.json
[step 2/5] Generate — spec-gen produces a deterministic Playwright/Vitest spec
              -> 6 primitives: goto -> setClock -> getStateSnapshot -> dispatchDrag -> getStateSnapshot -> assertStateEquals
              -> wrote out/agent-loop.spec.ts (18 lines)
[step 3/5] Verify - run the spec against the buggy app
              -> segments[0].x: expected=100, actual=25
              -> RED [bug reproduced - this is the success state]
[step 4/5] Decide - agent reads RED output, identifies the fix
              [agent] observation: segments[0].x stayed at 25 (expected 100)
              [agent] hypothesis : the collision short-circuit in onPointerMove blocked the move
              [agent] action     : enable FIX_SEGMENT_COLLISION=1 (in the SaaS this is a code-change PR; in the PoC it is a flag)
[step 5/5] Verify GREEN - re-run the same spec against the fixed app
              -> segments[0].x: expected=100, actual=100
              -> GREEN [fix verified - regression locked in]

AGENT LOOP CLOSED - capture -> generate -> RED -> agent-fix -> GREEN in 0.18s
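
The RED then GREEN contract in steps 3 and 5 can be sketched as a small check: the same spec has to fail on the unfixed code and pass on the fix, or the loop is not closed. This is a minimal illustration, not the PoC's code. `verifyRedGreen`, `SpecRun`, and the toy runner are hypothetical names; only the 25 and 100 values come from the log above.

```typescript
// Hypothetical sketch: none of these names come from the PoC.
type SpecResult = { passed: boolean; expected: number; actual: number };
type SpecRun = (fixEnabled: boolean) => SpecResult;
type LoopVerdict = 'closed' | 'bug-not-reproduced' | 'fix-not-verified';

// The same spec must fail on the unfixed code (RED) and pass on the fix (GREEN).
function verifyRedGreen(run: SpecRun): LoopVerdict {
  const red = run(false);
  if (red.passed) return 'bug-not-reproduced'; // a spec that never fails proves nothing
  const green = run(true);
  return green.passed ? 'closed' : 'fix-not-verified';
}

// Toy stand-in for the demo app: the drag lands at 25 without the fix, 100 with it.
const toyRun: SpecRun = (fixEnabled) => {
  const actual = fixEnabled ? 100 : 25;
  return { passed: actual === 100, expected: 100, actual };
};

const verdict = verifyRedGreen(toyRun);
```

Treating a passing first run as a failure of the loop is the point: RED is the evidence that the spec actually exercises the bug.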

browser side · first-party, no vendor account

recorder.ts · typescript

// Browser side — install once. Drop the Chrome extension OR import
// @cuit/recorder directly. Same module, same JSON shape, same downstream.

import { Recorder, cuitDebugProvider } from '@cuit/recorder';

const recorder = new Recorder({
  sessionId: 'rec-001',
  vendor: 'cuit',
  snapshotProvider: cuitDebugProvider,  // reads window.__cuitDebug.getState()
});

recorder.start();
// ... developer reproduces the bug (drag a segment, click a row, etc.) ...
recorder.stop();

const session = recorder.export();
// session is plain JSON. No vendor account. No API key.
// Pass it to Claude Code / Codex with the @cuit/spec-gen import:
//   "Use @cuit/spec-gen to convert this into a Playwright spec, run it,
//    confirm RED on the unfixed code, propose the fix."

Claude Code side · paste into Claude Code, or just type /cuit-loop

agent-prompt.md · markdown

# Copy this into Claude Code / Codex / Cursor:

I just captured a session reproducing a UI bug. The JSON is attached.

1. Run `@cuit/spec-gen` on the events to produce a Playwright/Vitest spec.
2. Run the spec against the current code. I expect it to fail RED — that
   means the bug is reproduced.
3. Read the failure (expected vs actual). Identify the smallest code change
   that flips the assertion to pass.
4. Apply the fix. Re-run the spec. Confirm GREEN.
5. Open a PR. The same spec becomes the regression gate.

# Why this works: the recorder gave you a deterministic input.
# The harness gives you a deterministic execution model.
# You now have a closed loop — observe, propose, verify — without
# any pixel coordinates, screenshots, or waitForTimeout sleeps.

Two ways to close the loop today.

→Type /cuit-loop in Claude Code. The skill wires whatever feedback source you have, spec-gen, and the harness into one conversation turn. Observe → propose → verify → gate without leaving your editor.
→Run the demo loop locally. Clone the repo, install, run pnpm proof:agent-loop. The same stdout above prints on your machine in under a second — no API key, no account.

Full proof artifacts → · Extension source on GitHub ↗

run it · bash

# Try the recorder against the bundled demo
pnpm install
pnpm proof:agent-loop        # recorder -> spec-gen -> RED -> fix -> GREEN

# Or load the Chrome extension on any page that exposes window.__cuitDebug:
#   chrome://extensions  ->  Developer mode  ->  Load unpacked
#   select: proof-of-concept/packages/recorder-extension/

expected: AGENT LOOP CLOSED in 0.18s · exit 0

For UI developers — show me the code

Four things that flake on your team today.
How this fixes each one.

Every snippet below is verbatim from the working proof-of-concept — not a mockup. Clone the repo and run pnpm proof:loop to reproduce the output yourself.

problem

Your Playwright tests use page.mouse.click(412, 89)

Pixel coordinates depend on viewport, CSS, browser engine, and last-frame layout. Change padding by 4px and your suite flakes.

today · flaky / manual / brittle

before.ts · typescript

await page.mouse.move(412, 89);
await page.mouse.down();
await page.mouse.move(512, 89, { steps: 10 });
await page.mouse.up();

with CUIT · deterministic / generated / permanent

after.ts · typescript

dispatchDrag('seg-0', 100, 0);
// Targets by stable name. No pixels.
// Same call works in Chromium, Firefox, WebKit.
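
Why a named target survives a layout change can be shown with a toy model: the call carries a stable name and a relative delta, so nothing in it depends on where the element is painted. The sketch below is an assumption for illustration only. `NamedTargets` and `dragBy` are hypothetical and are not how `@cuit/harness` implements `dispatchDrag`.

```typescript
// Hypothetical model: not the @cuit/harness implementation.
type Draggable = { name: string; x: number; y: number };

class NamedTargets {
  private byName: Record<string, Draggable> = {};

  register(target: Draggable): void {
    this.byName[target.name] = target;
  }

  // Resolve by stable name, then move by a relative delta.
  dragBy(name: string, dx: number, dy: number): Draggable {
    const target = this.byName[name];
    if (!target) throw new Error(`unknown drag target: ${name}`);
    target.x += dx;
    target.y += dy;
    return target;
  }
}

const targets = new NamedTargets();
targets.register({ name: 'seg-0', x: 0, y: 0 });
const moved = targets.dragBy('seg-0', 100, 0);
```

An unknown name fails loudly instead of clicking whatever happens to sit under a coordinate.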

problem

You sprinkle waitForTimeout(500) because rAF timing is unreliable

Real animations advance on requestAnimationFrame; pixel snapshots and CSS transitions land at the next frame. Sleeps fight non-determinism with prayer.

today · flaky / manual / brittle

before.ts · typescript

await page.waitForTimeout(500);
const box = await el.boundingBox();
// hope the animation finished by now

with CUIT · deterministic / generated / permanent

after.ts · typescript

setClock(1716800000000);
// Deterministic clock. Every rAF callback fires.
// Now state is exactly where the spec says it is.
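
The idea behind a deterministic clock can be sketched with a fake frame queue: time only moves when the test says so, and every queued frame callback runs exactly once per frame. This is an assumed model, not the `setClock` source. `FakeClock`, `requestFrame`, `tick`, and the 16 ms frame length are hypothetical.

```typescript
// Hypothetical fake clock: an illustration, not the @cuit/harness setClock.
type FrameCallback = (now: number) => void;

class FakeClock {
  private queue: FrameCallback[] = [];
  constructor(private nowMs: number) {}

  now(): number {
    return this.nowMs;
  }

  // Stand-in for requestAnimationFrame: queue the callback, do not run it yet.
  requestFrame(cb: FrameCallback): void {
    this.queue.push(cb);
  }

  // Advance time by whole frames, running every queued callback once per frame.
  tick(frames: number, frameMs = 16): void {
    for (let i = 0; i < frames; i++) {
      this.nowMs += frameMs;
      const due = this.queue;
      this.queue = []; // callbacks may re-queue themselves for the next frame
      for (const cb of due) cb(this.nowMs);
    }
  }
}

// A three-frame animation that moves posX by 10 per frame.
const clock = new FakeClock(1716800000000);
let posX = 0;
let framesLeft = 3;
const step: FrameCallback = () => {
  posX += 10;
  framesLeft -= 1;
  if (framesLeft > 0) clock.requestFrame(step);
};
clock.requestFrame(step);
clock.tick(3); // posX is now 30, with no sleeps involved
```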

problem

You hand-translate a Jam replay into a Playwright spec

2–6 hours per bug. Most teams skip it. You end up with no regression net, and the same bug reopens in 3 weeks.

today · flaky / manual / brittle

before.ts · typescript

// 12-minute Jam replay
// → engineer watches it twice
// → engineer guesses selectors
// → engineer writes 80-line spec
// → engineer realizes selectors broke last week

with CUIT · deterministic / generated / permanent

after.sh · bash

pnpm cuit gen jam:sess-2014 --apply
# Reads the session, emits a spec.ts
# grounded in your harness primitives.
# PR opens. You review the diff.

problem

The same bug keeps reopening every release

You shipped a fix but no regression test. Six weeks later someone refactors the collision code and re-introduces the same bug.

today · flaky / manual / brittle

before.ts · typescript

// One-shot fix. No spec.
// Six weeks later: "user reports drag broken"
// File reopened. Eng-days re-spent.

with CUIT · deterministic / generated / permanent

after.ts · typescript

// Generated spec lives in tests/regressions/
// CI runs it on every PR.
// Re-introduce the bug → CI blocks merge.
// The reopened-bug loop is over.

The proof loop, end-to-end

Real artifacts from proof-of-concept/ — copied verbatim. Run pnpm proof:loop to regenerate.

61 tests passing · 0.1s end-to-end

STEP 1 · Input: recorded Jam session

A user files a bug via Jam. The connector pulls 47 normalized events.

fixtures/segment-collision.json · json

{
  "sessionId": "jam-sess-2014",
  "vendor": "jam",
  "url": "http://localhost:5173/",
  "browser": { "name": "chrome", "version": "125.0.0.0", "os": "macOS 14.4" },
  "events": [
    { "seq": 0, "type": "nav", "url": "http://localhost:5173/", "ts": 0 },
    { "seq": 1, "type": "state-snapshot", "path": "segments[0].x", "value": 0 },
    { "seq": 2, "type": "state-snapshot", "path": "segments[1].x", "value": 200 },
    { "seq": 3, "type": "state-snapshot", "path": "segments.length", "value": 2 },
    /* …42 more events: pointerdown / pointermove×N / pointerup / final state-snapshot… */
    { "seq": 45, "type": "pointer", "phase": "up",   "targetName": "seg-0", "x": 240, "y": 32, "pointerId": 1 },
    { "seq": 46, "type": "state-snapshot", "path": "segments[0].x", "value": 0 }
  ]
}
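
The fixture's event shapes map naturally onto a discriminated union. The types below are inferred from the fields visible in the fixture and are an assumption: the real `SessionEvent` definition in the repo may differ, the `'down'` and `'move'` phases are guessed from the elided events, and `countByType` and `lastSnapshot` are hypothetical helpers.

```typescript
// Hypothetical types inferred from the fixture fields above.
type NavEvent = { seq: number; type: 'nav'; url: string; ts: number };
type SnapshotEvent = { seq: number; type: 'state-snapshot'; path: string; value: unknown };
type PointerSessionEvent = {
  seq: number; type: 'pointer'; phase: 'down' | 'move' | 'up';
  targetName: string; x: number; y: number; pointerId: number;
};
type SessionEvent = NavEvent | SnapshotEvent | PointerSessionEvent;

// Tally events per type, in the spirit of the "6 pointer, 20 snapshot" log line.
function countByType(events: SessionEvent[]): Record<SessionEvent['type'], number> {
  const counts: Record<SessionEvent['type'], number> = { nav: 0, 'state-snapshot': 0, pointer: 0 };
  for (const e of events) counts[e.type] += 1;
  return counts;
}

// Last recorded value for a state path, or undefined if it was never snapshotted.
function lastSnapshot(events: SessionEvent[], path: string): unknown {
  let value: unknown;
  for (const e of events) {
    if (e.type === 'state-snapshot' && e.path === path) value = e.value;
  }
  return value;
}

// The six events shown in the fixture (the 41 elided ones are left out).
const sample: SessionEvent[] = [
  { seq: 0, type: 'nav', url: 'http://localhost:5173/', ts: 0 },
  { seq: 1, type: 'state-snapshot', path: 'segments[0].x', value: 0 },
  { seq: 2, type: 'state-snapshot', path: 'segments[1].x', value: 200 },
  { seq: 3, type: 'state-snapshot', path: 'segments.length', value: 2 },
  { seq: 45, type: 'pointer', phase: 'up', targetName: 'seg-0', x: 240, y: 32, pointerId: 1 },
  { seq: 46, type: 'state-snapshot', path: 'segments[0].x', value: 0 },
];
```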

STEP 2 · Output: generated Playwright/Vitest spec

18 lines. 6 harness primitives. No pixel coords, no waitForTimeout, no hand-crafted selectors.

out/issue-2014.spec.ts · typescript

import { describe, expect, test } from 'vitest';
import {
  dispatchDrag,
  getStateSnapshot,
  setClock,
} from '@cuit/harness';

describe('issue-2014 — segment 0 drag must not collide-noop', () => {
  test('drags segment 0 right by 100px and asserts state moves', () => {
    setClock(1716800000000);

    dispatchDrag('seg-0', 100, 0);

    const snapshot = getStateSnapshot();
    expect(snapshot['segments[0].x']).toEqual(100);
  });
});
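
One way a rule-based generator could get from recorded pointer events to a pixel-free call is to keep only the net displacement of a gesture. The rule below is a guess at the flavor of such a mapping, not the `@cuit/spec-gen` source. `gestureToPrimitive` and `PointerStep` are hypothetical, and the down and move coordinates are illustrative (only the up position at x=240, y=32 appears in the fixture).

```typescript
// Hypothetical rule: fold one pointer down...up gesture into a single
// dispatchDrag call with a relative delta.
type PointerStep = { phase: 'down' | 'move' | 'up'; targetName: string; x: number; y: number };

function gestureToPrimitive(steps: PointerStep[]): string {
  const down = steps.filter((s) => s.phase === 'down')[0];
  const ups = steps.filter((s) => s.phase === 'up');
  const up = ups[ups.length - 1];
  if (!down || !up || down.targetName !== up.targetName) {
    throw new Error('expected one down...up gesture on a single target');
  }
  // Only the net displacement survives; absolute positions are dropped.
  return `dispatchDrag('${down.targetName}', ${up.x - down.x}, ${up.y - down.y});`;
}

const line = gestureToPrimitive([
  { phase: 'down', targetName: 'seg-0', x: 140, y: 32 }, // illustrative start point
  { phase: 'move', targetName: 'seg-0', x: 190, y: 32 },
  { phase: 'up', targetName: 'seg-0', x: 240, y: 32 },
]);
```

Because the emitted call holds a name and a delta, the recorded viewport coordinates never reach the spec.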

STEP 3 · RED · Run the spec against the buggy code

The spec reproduces the failure deterministically. RED is the success state — the bug is now caught by an automated test.

App.tsx (buggy) · typescript

// packages/demo-app/src/App.tsx — bug version

// Inside the pointermove handler:
setSegments((prev) => {
  const next = prev.map((s) => ({ ...s }));
  const moving = next[idx];
  const proposedX = drag.originX + dx;

  // BUG: the collision check is too eager — it blocks
  // every move that would even momentarily overlap.
  const collides = next.some((other, j) => {
    if (j === idx) return false;
    return proposedX < other.x + other.width &&
           other.x < proposedX + moving.width;
  });
  if (collides) return prev;          // <-- silently no-op'd

  moving.x = proposedX;
  return next;
});

STEP 4 · GREEN · Apply the fix, re-run the same spec

Same spec, same harness, fixed code. PASS. The spec is now a permanent CI gate.

App.tsx (fixed) · typescript

// packages/demo-app/src/App.tsx — fix version

// Inside the pointermove handler:
setSegments((prev) => {
  const next = prev.map((s) => ({ ...s }));
  const moving = next[idx];
  const proposedX = drag.originX + dx;

  // FIX: drop the over-eager collision short-circuit.
  // Free positioning; downstream layout handles overlap.

  moving.x = proposedX;
  return next;
});
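
To see how the overlap test turns a 100px drag into a stall, the two handlers can be replayed outside React. This is a standalone sketch under loud assumptions: the segment widths (175) and the pointermove deltas are not stated anywhere on this page and were picked so the toy run lands on the same 25 versus 100 values as the log. `overlaps` and `replayDrag` are hypothetical helpers.

```typescript
// Hypothetical replay of the handlers above; widths and deltas are assumed.
type Seg = { x: number; width: number };

// Same interval test as the buggy handler: true when the two x-ranges overlap.
function overlaps(proposedX: number, moving: Seg, other: Seg): boolean {
  return proposedX < other.x + other.width && other.x < proposedX + moving.width;
}

// Replay a drag as a series of pointermove deltas measured from the drag origin.
function replayDrag(segs: Seg[], idx: number, deltas: number[], checkCollision: boolean): number {
  const originX = segs[idx].x;
  let x = originX;
  for (const dx of deltas) {
    const proposedX = originX + dx;
    const blocked = checkCollision &&
      segs.some((other, j) => j !== idx && overlaps(proposedX, segs[idx], other));
    if (!blocked) x = proposedX; // a blocked move is silently dropped
  }
  return x;
}

const segs: Seg[] = [{ x: 0, width: 175 }, { x: 200, width: 175 }]; // widths assumed
const deltas = [25, 50, 75, 100];                                   // deltas assumed
const buggyX = replayDrag(segs, 0, deltas, true);  // stalls once the ranges touch
const fixedX = replayDrag(segs, 0, deltas, false); // free positioning
```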

Actual stdout of pnpm proof:loop, copied verbatim from proof-output.log

[1/6] Loading recorded session events from fixtures/segment-collision.json
       -> 47 events normalized into SessionEvent[]
[2/6] Generating spec from session events
       -> wrote out/issue-2014-segment-0-drag-must-not-collide-noop.spec.ts (18 lines, 6 primitives used)
[3/6] Running spec against demo-app (bug-present mode)
       -> FAIL - segment 0 right edge stayed at x=25 (expected 100)
       -> RED - bug reproduced deterministically [SUCCESS]
[4/6] Applying canonical fix (FIX_SEGMENT_COLLISION=1)
       -> re-rendering demo-app with fix flag
[5/6] Running spec against demo-app (fixed mode)
       -> PASS - segment 0 right edge moved to x=100
       -> GREEN - fix verified, regression locked in [SUCCESS]
[6/6] Locking the spec into CI as a gate
       -> wrote .github/workflows/proof-regression.yml

LOOP COMPLETE - RED to GREEN in 0.1s

Try it. Don't take our word.

Four shell commands. Node 20. About thirty seconds of install. The same six steps of stdout you see above will print on your machine: RED at step 3, GREEN at step 5, exit 0.

→Read the source — every primitive in the spec is a real exported function from @cuit/harness.
→Inspect the tests — 61 unit tests across 5 packages, TDD-first, all green.
→Wire it into your repo — the same primitives work in any React/Vue app that exposes a state-snapshot hook.
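
A state-snapshot hook can be as small as one function that flattens app state into path keys. The sketch below is an assumption about what such a hook might look like: only the `window.__cuitDebug.getState()` name and the `segments[0].x` path format appear on this page. `flattenSegments`, the `Segment` type, and the width values are hypothetical.

```typescript
// Hypothetical hook: the flattened path format mirrors the fixture and spec above.
type Segment = { x: number; width: number };

function flattenSegments(segments: Segment[]): Record<string, number> {
  const out: Record<string, number> = { 'segments.length': segments.length };
  segments.forEach((s, i) => {
    out[`segments[${i}].x`] = s.x;
  });
  return out;
}

// App state as a plain object; widths are illustrative.
const appState = { segments: [{ x: 0, width: 150 }, { x: 200, width: 150 }] };

// In a browser this object would be assigned to window.__cuitDebug.
const cuitDebug = { getState: () => flattenSegments(appState.segments) };

const snapshot = cuitDebug.getState();
```

Reading state through a function rather than a cached object means each snapshot reflects the state at the moment it is taken.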

run on your machine · bash

# 1. Clone the repo
git clone git@github.com:speechlabinc/complex-ui-tester.git
cd complex-ui-tester/proof-of-concept

# 2. Install (Node 20 + pnpm)
pnpm install

# 3. Run the loop end-to-end
pnpm proof:loop

# 4. Run the package tests (61 tests across 5 packages)
pnpm test

expected output: RED to GREEN in 0.1s · exit 0

What this proves — and what it doesn't

Proves

✓A recorded session can be normalized into a stable SessionEvent[].
✓Those events can be mapped to a Playwright/Vitest spec that calls only validated harness primitives.
✓The spec deterministically reproduces a real bug (RED) against the unfixed code.
✓The same spec passes (GREEN) against the fixed code with no spec edits.
✓The architecture in docs/02, 04, and 10 is implementable — not just designed.

Does NOT prove (yet)

○LLM-driven spec generation. The PoC generator is rule-based — a drop-in substitute for the 3-pass LLM pipeline in docs/04.
○Pre-built third-party connectors for Jam / LogRocket / Sentry are designed but not shipped — for now we ship a first-party Chrome recorder that produces the same SessionEvent[] shape. See packages/recorder-extension/.
○Multi-tenant prompt context. Single-tenant in the PoC.
○SOC 2. We're in observation for Type II per docs/05; not yet audited.
○Production deployment of the SaaS infra. Designed in docs/03; not built.

The PoC's job is to prove the loop architecture is real — the rest is engineering, not invention.
