OMX
Oh My CodeX v0.14.0

verifier

Evidence-based verification agent that proves or disproves completion claims with concrete command output.

The verifier agent treats every completion claim as a hypothesis to be tested. It runs commands, inspects diffs, reads test output, and checks build logs to produce a verdict grounded in observable evidence. A claim is only accepted when the verifier can point to specific artifacts that prove it — not when an agent simply says "it's done."
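As a rough illustration of what "grounded in observable evidence" can mean in practice, the sketch below runs a command fresh and records its exit code and output. The `Evidence` class and `gather_evidence` function are hypothetical names chosen for this example; they are not part of OMX, and the real agent may collect evidence quite differently.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    """One command run and what it produced (hypothetical structure, not an OMX type)."""
    command: list[str]
    exit_code: int
    output: str


def gather_evidence(command: list[str], timeout: float = 60.0) -> Evidence:
    """Run a command fresh and capture its exit code and combined output."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same captured stream
        text=True,
        timeout=timeout,
        check=False,  # a non-zero exit is evidence, not an exception
    )
    return Evidence(command=command, exit_code=result.returncode, output=result.stdout)


if __name__ == "__main__":
    # Stand-in for a real test or build command.
    evidence = gather_evidence([sys.executable, "-c", "print('2 passed')"])
    print(evidence.exit_code, evidence.output.strip())
```

Keeping the exit code alongside the raw output means a later reader can check the verdict against the same artifact instead of trusting a summary.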

Role

  • Restate the acceptance criteria being checked, then gather direct evidence for each criterion
  • Run or review the commands (tests, builds, smoke checks) that prove or disprove the claim
  • Distinguish between missing evidence (inconclusive) and failed behavior (definitive failure)
  • Report a structured PASS / FAIL / PARTIAL verdict with the supporting artifacts
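One way to picture the distinction between missing evidence and failed behavior is a per-criterion status that is collapsed into a single verdict. The sketch below is an assumption made for illustration: the `Status` enum, the `overall_verdict` function, and the aggregation rule (any failure gives FAIL, all proven gives PASS, anything else gives PARTIAL) are not taken from OMX.

```python
from enum import Enum


class Status(Enum):
    """Per-criterion outcome (hypothetical names, not an OMX type)."""
    PROVEN = "proven"              # direct evidence shows the criterion holds
    FAILED = "failed"              # direct evidence shows the behavior is wrong
    INCONCLUSIVE = "inconclusive"  # evidence is missing or ambiguous


def overall_verdict(statuses: dict[str, Status]) -> str:
    """Collapse per-criterion statuses into PASS / FAIL / PARTIAL.

    Assumed rule, for illustration only: any failure gives FAIL,
    all proven gives PASS, anything else gives PARTIAL.
    """
    if not statuses:
        return "PARTIAL"  # nothing was checked, so nothing was proven
    values = list(statuses.values())
    if Status.FAILED in values:
        return "FAIL"
    if all(value is Status.PROVEN for value in values):
        return "PASS"
    return "PARTIAL"


if __name__ == "__main__":
    print(overall_verdict({
        "tests pass": Status.PROVEN,
        "API returns 200 on valid input": Status.INCONCLUSIVE,
    }))
```

Under this assumed rule an inconclusive criterion never counts as a failure, but it also prevents a PASS, which matches the idea that a claim is accepted only when every criterion has proof behind it.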

When invoked

  • Automatically at end-of-task checkpoints in $autopilot, $ralph, and $ultraqa cycles
  • After executor marks a task complete, to confirm the implementation matches acceptance criteria
  • During $ultraqa fix loops when a previous cycle's verdict was FAIL or PARTIAL
  • When a user wants an independent second opinion on whether a claimed feature actually works

Inputs

  • The claim to verify (e.g., "all tests pass", "the API returns 200 on valid input")
  • Relevant artifacts: test output, build log, diff, route smoke result, or prior verifier report
  • Optional: .omx/specs/ acceptance criteria file produced by analyst

Outputs

  • A verdict report (PASS / FAIL / PARTIAL) written to stdout or persisted to .omx/verification/<topic>.md
  • A list of commands run and their captured output as evidence
  • A gaps section calling out any missing or inconclusive proof
  • A risks section noting remaining uncertainty and recommended follow-up
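The outputs listed above could be assembled into a single file along these lines. This is a minimal sketch: only the `.omx/verification/<topic>.md` location comes from this page, while the `render_report` and `write_report` helpers and the heading layout inside the file are assumptions made for the example.

```python
from pathlib import Path


def render_report(topic, verdict, commands, gaps, risks):
    """Build report text; the section layout here is assumed, not an OMX format.

    `commands` is a list of (command, captured_output) pairs.
    """
    lines = [f"# Verification: {topic}", "", f"Verdict: {verdict}", ""]
    lines += ["## Commands run", ""]
    for command, output in commands:
        lines.append(f"- `{command}`")
        lines.append("")
        # Indent captured output so it reads as a literal block under the bullet.
        lines.extend(f"      {line}" for line in output.splitlines())
        lines.append("")
    lines += ["## Gaps", ""]
    lines += [f"- {gap}" for gap in gaps] or ["- none identified"]
    lines += ["", "## Risks", ""]
    lines += [f"- {risk}" for risk in risks] or ["- none identified"]
    return "\n".join(lines) + "\n"


def write_report(root: Path, topic: str, text: str) -> Path:
    """Persist the report under <root>/.omx/verification/<topic>.md."""
    path = root / ".omx" / "verification" / f"{topic}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    report = render_report(
        topic="login",
        verdict="PARTIAL",
        commands=[("pytest -q", "2 passed")],
        gaps=["no smoke check for the logout route"],
        risks=[],
    )
    print(report)
```

Writing "none identified" for an empty section keeps the gaps and risks headings present in every report, so a missing section cannot be mistaken for a clean result.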

Limits

  • Does not fix failures — it reports them and hands off to executor, build-fixer, or debugger
  • Does not replace human review for subjective quality, UX, or security judgments
  • Does not reuse stale output — always gathers fresh evidence when possible

Related agents

  • executor — the implementation agent whose output verifier checks
  • test-engineer — designs the test suite that verifier relies on for evidence
  • quality-reviewer — covers subjective quality dimensions that verifier does not assess
  • critic — challenges plans and designs before execution, complementing post-execution verification