Governance

Trust, But Verify

8 governance rules that make AI output reliable, not just impressive

ENFORCE — blocks progression
CHALLENGE — requires justification
WARN — flags for review
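As a sketch, the three severities map naturally to a small dispatch: block outright, block pending justification, or flag and continue. The names below (`Severity`, `apply`) are illustrative only, not identifiers from the framework.

```python
# Hypothetical sketch of the three governance severities as a dispatch.
# Severity names mirror the legend above; everything else is illustrative.
from enum import Enum

class Severity(Enum):
    ENFORCE = "enforce"      # blocks progression
    CHALLENGE = "challenge"  # requires justification
    WARN = "warn"            # flags for review

def apply(severity: Severity, justification: str = "") -> str:
    """Resolve a triggered rule into an outcome for the orchestrator."""
    if severity is Severity.ENFORCE:
        return "blocked"
    if severity is Severity.CHALLENGE:
        # Progression is allowed only once a justification is supplied.
        return "allowed" if justification else "blocked-pending-justification"
    return "flagged"
```

A WARN rule never halts the pipeline; it only marks the output for human review.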

The Rules

Self-Annealing Governance Rules

001

Optimism Detector

Enforce

Trigger: Agent claims success without evidence

Action: Block and demand evidence — logs, screenshots, test results

Real example: Report v5 claimed “all tests passed” — SA-001 flagged it, revealing 14 unsupported claims

002

Scope Creep Guard

Challenge

Trigger: Agent adds features not in the original prompt

Action: Challenge and require justification before allowing the addition

003

Dependency Auditor

Warn

Trigger: Agent installs new packages or adds dependencies

Action: Verify necessity, check for lighter alternatives, flag for review

004

Credential Guard

Enforce

Trigger: Credential, API key, or secret appears in chat or code

Action: Immediate block, require credential rotation, purge from history
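A minimal sketch of how a trigger like SA-004 could detect secrets in chat or code. The patterns below are common public examples (the AWS access-key prefix, a generic key assignment, a PEM private-key header), assumptions for illustration rather than the framework's actual detection rules.

```python
# Hypothetical credential scan. Patterns are illustrative examples,
# not the framework's real rule set.
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key id
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),          # generic key assignment
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # PEM private key
]

def contains_secret(text: str) -> bool:
    """True if any line of the text matches a known secret pattern."""
    return any(p.search(text) for p in SECRET_PATTERNS)
```

On a match, the rule's action would fire immediately: block the message, rotate the exposed credential, and purge it from history.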

005

Gap-Check Protocol

Challenge

Trigger: Agent marks work as complete

Action: Run the gap-check template before allowing completion claim

006

Brand Guard

Enforce

Trigger: BACON-AI branding appears in client-facing output

Action: Block — it is the client’s product, not ours. Remove all framework references.

007

Test Pipeline Gate

Enforce

Trigger: Agent claims tests passed without producing test artifacts

Action: Block progression until real evidence (logs, screenshots, CI output) is produced
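A sketch of that gate, assuming a success claim arrives alongside a list of artifact paths (the function and its signature are hypothetical):

```python
# Hypothetical sketch of SA-007: a "tests passed" claim with no
# accompanying evidence is blocked, not accepted.
def gate_test_claim(claim: str, artifacts: list[str]) -> str:
    """Accept a test claim only when real evidence accompanies it."""
    if "pass" in claim.lower() and not artifacts:
        return "blocked: no test artifacts produced"
    return "accepted"
```

The same claim with a CI log, screenshot, or test report attached passes the gate.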

008

Deming PDCA Cycle

Enforce

Trigger: Plan → Do without Check → Act

Action: Require CHECK phase with documented evidence before claiming completion
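That requirement can be sketched as a completeness check over the four PDCA phases; the phase keys and evidence format below are assumptions for illustration.

```python
# Hypothetical sketch of SA-008: a task is complete only when every
# PDCA phase, including CHECK, carries documented evidence.
REQUIRED_PHASES = ("plan", "do", "check", "act")

def pdca_complete(evidence: dict[str, str]) -> bool:
    """True only when each phase has non-empty documented evidence."""
    return all(evidence.get(phase, "").strip() for phase in REQUIRED_PHASES)
```

Skipping straight from Do to a completion claim leaves CHECK and ACT empty, so the gate stays closed.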

Evidence

A Real Correction: v5 → v13

The governance system caught its own AI’s overconfidence — and forced 8 correction rounds.

14

Factual issues caught across 5 categories

5

Diagram File Fixes

Swapped filenames, unreadable text colours on dark backgrounds, mislabelled BPMN elements

3

Timeline Hallucinations

Fake calendar dates, false :done markers on incomplete tasks, invented milestones

9

Unsupported Code Claims

Islands Architecture not implemented, 0 components labelled as “tested”, cost comparisons unsourced

v5

Initial report — 14 unsupported claims, “all tests passed”

v8

SA-001 flags overconfidence — factual audit begins

v10

Diagrams regenerated, timeline corrected, sources verified

v13

All 14 issues resolved — evidence documented for every claim

The Checklist

Gap-Check Template

SA-005 Gap-Check — required before marking any task complete
Before marking a task complete, verify:

    Does the primary functionality work end-to-end?
    Have I tested from the user's perspective?
    Is there evidence (not just my claim)?
    Would this survive a peer review?
    Have I checked existing lessons learned?

Every sub-agent must answer these 5 questions with evidence before the orchestrator accepts their work as complete. No exceptions.
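As a sketch, the gate reduces to requiring a non-empty evidence answer for every question before the orchestrator accepts the work; the question strings and data shape below are paraphrased assumptions.

```python
# Hypothetical sketch of the SA-005 gap-check gate: each question must
# carry evidence, not just a yes, before completion is accepted.
GAP_CHECK = (
    "Functionality works end-to-end?",
    "Tested from the user's perspective?",
    "Evidence attached (not just a claim)?",
    "Would this survive a peer review?",
    "Existing lessons learned checked?",
)

def accept_completion(answers: dict[str, str]) -> bool:
    """True only when every gap-check question has a non-empty answer."""
    return all(answers.get(q, "").strip() for q in GAP_CHECK)
```

An empty or partially answered checklist keeps the task open, which is the "no exceptions" rule in code form.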

The Point

Why This Matters

Every AI framework can generate. Few can govern.

001 Without it

The Algorithmix report would have shipped with 14 false claims. “All tests passed” when no tests existed. “Islands Architecture implemented” when it was not.

007 Without it

“Tests passed” would have meant “no tests exist”. The agent would have claimed validation with zero test artifacts, zero screenshots, zero CI logs.

The Difference

The governance layer is what separates a demo from a deliverable. It turns AI from an impressive toy into a reliable engineering tool.

“The governance system did not prevent the AI from making mistakes — it prevented those mistakes from reaching the client. That is the difference between a clever hack and a professional service.”