Governance
8 governance rules that make AI output reliable, not just impressive
The Rules
Rule 1
Trigger: Agent claims success without evidence
Action: Block and demand evidence — logs, screenshots, test results
Real example: Report v5 claimed “all tests passed” — SA-001 flagged it, revealing 14 unsupported claims

Rule 2
Trigger: Agent adds features not in the original prompt
Action: Challenge and require justification before allowing the addition

Rule 3
Trigger: Agent installs new packages or adds dependencies
Action: Verify necessity, check for lighter alternatives, flag for review

Rule 4
Trigger: Credential, API key, or secret appears in chat or code
Action: Immediate block, require credential rotation, purge from history

Rule 5
Trigger: Agent marks work as complete
Action: Run the gap-check template before allowing completion claim

Rule 6
Trigger: BACON-AI branding appears in client-facing output
Action: Block — it is the client’s product, not ours. Remove all framework references.

Rule 7
Trigger: Claims test passed without producing test artifacts
Action: Block progression until real evidence (logs, screenshots, CI output) is produced

Rule 8
Trigger: Plan → Do without Check → Act
Action: Require CHECK phase with documented evidence before claiming completion
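The rules above amount to a simple gate: each rule is a predicate over the agent's output plus a blocking action the reviewer must take when it fires. A minimal sketch in Python — the `Rule` class, the predicates, and the `gate` function are illustrative assumptions, not part of any real framework:

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    name: str
    trigger: Callable[[str], bool]  # predicate over the agent's output
    action: str                     # what the reviewer must do when it fires

# Hypothetical detector for leaked credentials (Rule 4 above)
SECRET_PATTERN = re.compile(r"(api[_-]?key|secret|password)\s*[:=]", re.I)

RULES: List[Rule] = [
    Rule("unsupported-success-claim",
         # crude heuristic: a success claim with no mention of logs
         lambda out: "tests passed" in out.lower() and "log" not in out.lower(),
         "Block and demand evidence: logs, screenshots, test results"),
    Rule("credential-leak",
         lambda out: bool(SECRET_PATTERN.search(out)),
         "Immediate block, rotate credential, purge from history"),
]

def gate(output: str) -> List[str]:
    """Return the blocking actions required before this output may proceed."""
    return [f"{r.name}: {r.action}" for r in RULES if r.trigger(output)]

blocked = gate("All tests passed. api_key = 'sk-123'")  # fires both rules
```

The point of the sketch is that the triggers are mechanical: an orchestrator can run them on every agent message, so "block and demand evidence" does not depend on a human noticing.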
Evidence
The governance system caught its own AI’s overconfidence — and forced 8 correction rounds.
14 factual issues caught across 5 categories
Swapped filenames, unreadable text colours on dark backgrounds, mislabelled BPMN elements
Fake calendar dates, false :done markers on incomplete tasks, invented milestones
Islands Architecture not implemented, 0 components labelled as “tested”, cost comparisons unsourced
Initial report — 14 unsupported claims, “all tests passed”
SA-001 flags overconfidence — factual audit begins
Diagrams regenerated, timeline corrected, sources verified
All 14 issues resolved — evidence documented for every claim
The Checklist
Before marking complete, verify:
□ Primary functionality works end-to-end?
□ Have I tested from the user’s perspective?
□ Is there evidence (not just my claim)?
□ Would this survive a peer review?
□ Have I checked existing lessons learned?
Every sub-agent must answer these 5 questions with evidence before the orchestrator accepts their work as complete. No exceptions.
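That acceptance rule can be sketched as a completion gate: the orchestrator refuses a "complete" claim unless every checklist item maps to a non-empty piece of evidence. The `accept_completion` function and the evidence mapping are hypothetical, for illustration only:

```python
CHECKLIST = [
    "Primary functionality works end-to-end",
    "Tested from the user's perspective",
    "Evidence exists (not just a claim)",
    "Would survive a peer review",
    "Existing lessons learned checked",
]

def accept_completion(evidence: dict) -> bool:
    """Orchestrator-side gate: every checklist item needs a non-empty
    evidence entry (a log path, screenshot, CI URL) before 'complete'
    is accepted. Any missing or empty entry blocks completion."""
    missing = [item for item in CHECKLIST if not evidence.get(item)]
    if missing:
        print("BLOCKED - no evidence for:", "; ".join(missing))
        return False
    return True

ok = accept_completion({item: "ci/run-42.log" for item in CHECKLIST})
```

Note the design choice: the gate checks that evidence *exists*, not that it is valid — validating the artifacts is still the reviewer's job, as the SA-001 audit above illustrates.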
The Point
Every AI framework can generate. Few can govern.
The Algorithmix report would have shipped with 14 false claims. “All tests passed” when no tests existed. “Islands Architecture implemented” when it was not.
The agent would have claimed validation with zero test artifacts, zero screenshots, zero CI logs.
The governance layer is what separates a demo from a deliverable. It turns AI from an impressive toy into a reliable engineering tool.
“The governance system did not prevent the AI from making mistakes — it prevented those mistakes from reaching the client. That is the difference between a clever hack and a professional service.”