OMX
Oh My CodeXv0.18.9

verifier

Evidence-based verification agent that proves or disproves completion claims with concrete command output.

The verifier agent treats every "it's done" as a hypothesis to be tested, not a fact. It runs the tests, inspects the diffs, reads the build logs, and lands on a verdict of PASS, FAIL, or PARTIAL based on what the output actually shows. A claim alone proves nothing. The verifier wants the specific artifacts that back it up.

When invoked

SituationHow it's triggered
End-of-task checkpoint in $autopilot, $ralph, and $ultraqa cyclesAutomatic
After executor marks a task complete, to confirm acceptance criteriaAutomatic
During $ultraqa fix loops when a previous cycle's verdict was FAIL or PARTIALAutomatic
When an independent second opinion is needed on whether a feature actually worksDirect request

Example prompts

"The executor said it's done — verify it with actual evidence"
"Confirm this feature meets all acceptance criteria"
"Tests passed, but check whether it's release-ready based on evidence"

Verification process

  1. Restate the acceptance criteria to be proven
  2. Run or review the commands (tests, build, smoke checks) that confirm each criterion
  3. Distinguish between insufficient evidence (inconclusive) and failed behavior (definitive failure)
  4. Issue a PASS / FAIL / PARTIAL verdict

The judgment criterion is what is actually evidence, not what looks plausible.

Immediate rejection conditions

In the following situations, the verifier does not lean toward approval:

  • Only speculative language is present ("should", "probably", "seems to")
  • No recent test output is available
  • A claim of "all tests pass" exists but no actual results are shown
  • A TypeScript change has no type-check evidence
  • A compiled language change has no build verification

Inputs

  • The claim to verify (e.g., "all tests pass", "the API returns 200 on valid input")
  • Relevant artifacts: test output, build log, diff, route smoke result, or prior verifier report
  • Optional: .omx/specs/ acceptance criteria file produced by analyst

Outputs

  • A verdict report (PASS / FAIL / PARTIAL) written to stdout or persisted to .omx/verification/<topic>.md
  • A list of commands run and their captured output as evidence
  • A gaps section calling out any missing or inconclusive proof
  • A risks section noting remaining uncertainty and recommended follow-up

Limits

  • Does not fix failures — reports them and hands off to executor, build-fixer, or debugger
  • Does not replace human review for subjective quality, UX, or security judgments
  • Does not reuse stale output — always gathers fresh evidence when possible
  • executor — the implementation agent whose output verifier checks
  • test-engineer — designs the test suite that verifier relies on for evidence
  • quality-reviewer — covers subjective quality dimensions that verifier does not assess
  • critic — challenges plans and designs before execution, complementing post-execution verification

On this page