verifier
Evidence-based verification agent that proves or disproves completion claims with concrete command output.
The verifier agent treats every "it's done" as a hypothesis to be tested, not a fact. It runs the tests, inspects the diffs, reads the build logs, and lands on a verdict of PASS, FAIL, or PARTIAL based on what the output actually shows. A claim alone proves nothing. The verifier wants the specific artifacts that back it up.
When invoked
| Situation | How it's triggered |
|---|---|
| End-of-task checkpoint in $autopilot, $ralph, and $ultraqa cycles | Automatic |
| After executor marks a task complete, to confirm acceptance criteria | Automatic |
| During $ultraqa fix loops when a previous cycle's verdict was FAIL or PARTIAL | Automatic |
| When an independent second opinion is needed on whether a feature actually works | Direct request |
Example prompts
"The executor said it's done. Verify it with actual evidence"
"Confirm this feature meets all acceptance criteria"
"Tests passed, but check whether it's release-ready based on evidence"
Verification process
- Restate the acceptance criteria to be proven
- Run or review the commands (tests, build, smoke checks) that confirm each criterion
- Distinguish between insufficient evidence (inconclusive) and failed behavior (definitive failure)
- Issue a PASS / FAIL / PARTIAL verdict
The verdict rests on what the evidence actually shows, not on what looks plausible.
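The steps above can be sketched as a small aggregation over per-criterion results. This is a minimal sketch, not the verifier's actual logic: the names `Status`, `CriterionResult`, and `overall_verdict` are hypothetical, and the mapping (any failed criterion gives FAIL, any inconclusive one gives PARTIAL, otherwise PASS) is an assumption, since the source does not spell out how PARTIAL is decided.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROVEN = "proven"              # fresh evidence confirms the criterion
    FAILED = "failed"              # evidence shows the behavior is wrong
    INCONCLUSIVE = "inconclusive"  # evidence is missing or insufficient


@dataclass
class CriterionResult:
    criterion: str      # the restated acceptance criterion
    status: Status
    evidence: str = ""  # captured command output, empty if none


def overall_verdict(results: list[CriterionResult]) -> str:
    """Assumed aggregation rule; the source does not define one."""
    statuses = {r.status for r in results}
    if Status.FAILED in statuses:
        return "FAIL"      # definitive failure outranks everything else
    if not results or Status.INCONCLUSIVE in statuses:
        return "PARTIAL"   # nothing disproven, but not everything proven
    return "PASS"
```

Keeping `INCONCLUSIVE` separate from `FAILED` mirrors the distinction in the list above: missing proof calls for gathering more evidence, while failed behavior calls for a fix.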
Immediate rejection conditions
In any of the following situations, the verifier withholds approval:
- Only speculative language is present ("should", "probably", "seems to")
- No recent test output is available
- A claim of "all tests pass" exists but no actual results are shown
- A TypeScript change has no type-check evidence
- A compiled language change has no build verification
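A few of the conditions above lend themselves to a mechanical pre-check. The sketch below is illustrative only: `rejection_reasons` and its parameters are hypothetical names, the word list is taken from the examples quoted above, and the check is deliberately stricter than the text (it flags any speculative wording, and it cannot tell whether output is recent).

```python
import re

# Phrases quoted in the list above; extend as needed.
SPECULATIVE = re.compile(r"\b(should|probably|seems to)\b", re.IGNORECASE)


def rejection_reasons(
    claim: str,
    test_output: str,
    changed_files: list[str],
    typecheck_output: str,
) -> list[str]:
    """Return the reasons a claim cannot be approved yet (empty if none)."""
    reasons = []
    if SPECULATIVE.search(claim):
        reasons.append("speculative language in claim")
    if not test_output.strip():
        reasons.append("no test output provided")
    # Assumption: TypeScript changes are identified by file extension.
    touches_ts = any(f.endswith((".ts", ".tsx")) for f in changed_files)
    if touches_ts and not typecheck_output.strip():
        reasons.append("TypeScript change without type-check evidence")
    return reasons
```

An empty list does not mean PASS; it only means none of these quick rejection conditions fired and the full verification can proceed.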
Inputs
- The claim to verify (e.g., "all tests pass", "the API returns 200 on valid input")
- Relevant artifacts: test output, build log, diff, route smoke result, or prior verifier report
- Optional: .omx/specs/ acceptance criteria file produced by analyst
Outputs
- A verdict report (PASS / FAIL / PARTIAL) written to stdout or persisted to .omx/verification/<topic>.md
- A list of commands run and their captured output as evidence
- A gaps section calling out any missing or inconclusive proof
- A risks section noting remaining uncertainty and recommended follow-up
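To make the four outputs concrete, here is a minimal sketch of how such a report could be assembled as text. The function name `render_report`, the section labels, and the layout are assumptions for illustration; the source does not specify the report format.

```python
def render_report(
    verdict: str,
    evidence: list[tuple[str, str]],  # (command, captured output) pairs
    gaps: list[str],
    risks: list[str],
) -> str:
    """Build a plain-text verdict report; the layout is hypothetical."""
    lines = [f"Verdict: {verdict}", "", "Evidence"]
    for command, output in evidence:
        lines.append(f"- $ {command}")
        # Indent captured output under the command that produced it.
        lines.extend(f"    {row}" for row in output.splitlines())
    lines += ["", "Gaps"]
    lines += [f"- {g}" for g in gaps] or ["- none"]
    lines += ["", "Risks"]
    lines += [f"- {r}" for r in risks] or ["- none"]
    return "\n".join(lines)
```

The returned string can be printed to stdout or written to a file under .omx/verification/ by the caller; the sketch leaves that choice out so it has no side effects.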
Limits
- Does not fix failures; it reports them and hands off to executor, build-fixer, or debugger
- Does not replace human review for subjective quality, UX, or security judgments
- Does not reuse stale output — always gathers fresh evidence when possible
Related agents
- executor: the implementation agent whose output verifier checks
- test-engineer: designs the test suite that verifier relies on for evidence
- quality-reviewer: covers subjective quality dimensions that verifier does not assess
- critic: challenges plans and designs before execution, complementing post-execution verification