A production-grade infrastructure for building software with autonomous AI agents. Eighteen integrated architectural domains that make AI-generated code defensible, governed, observable, and regulator-ready.
The platform turns AI agents into accountable workers rather than one-shot code generators. It wraps agent execution with the same controls a regulated engineering organization applies to human developers: versioned identity, scoped permissions, peer review, audit trails, cost tracking, reproducibility, and regulatory documentation.
When an agent writes code, the system records who authorized the work, what the agent was allowed to do, what it actually did, which independent validator reviewed the output, what it cost, and which regulations apply. All of this happens automatically in the background while the agent operates.
An LLM produces 200 lines of code. Three weeks later a defect ships to production. Nobody knows what prompt produced it, which model version, what context was in memory, or what the agent was instructed to do. The platform captures all of this as structured, queryable records.
Autonomous agents can escalate privileges, exfiltrate data, ignore scope boundaries, and make decisions they shouldn't. Identity cards, tiered tool classification, constitutional contracts, and runtime policy enforcement constrain what each agent can do based on trust level and data classification.
Every conversation starts from scratch. Knowledge, decisions, and successful patterns from prior work are lost. The persistent vector memory system retains semantic knowledge across sessions, with brain-inspired consolidation, hot/warm/cold tiering, and cross-project context transfer.
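The hot/warm/cold tiering can be sketched as a simple assignment rule. This is an illustrative sketch only: the thresholds, field names, and the recency/frequency heuristic are assumptions, not the platform's actual consolidation policy.

```python
from datetime import datetime, timedelta, timezone

def memory_tier(last_accessed: datetime, access_count: int, now: datetime) -> str:
    """Assign a memory record to hot/warm/cold storage.

    Thresholds are illustrative assumptions: recently or frequently
    accessed memories stay hot; old, rarely touched ones go cold.
    """
    age = now - last_accessed
    if age < timedelta(days=7) or access_count >= 10:
        return "hot"    # kept in the fast vector index
    if age < timedelta(days=90):
        return "warm"   # still searchable, stored compressed
    return "cold"       # archived; recalled only on explicit query
```

A scheduled consolidation job would sweep records through these tiers rather than deciding at query time.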
Agents that grade their own homework claim success regardless of actual output quality. Every agent run is independently validated by a different model (Gemini validates Claude output) with structured PASS/FAIL verdicts and finding-level remediation loops.
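The validate-then-remediate loop can be sketched as follows. The function and record names are hypothetical; `validate` stands in for the independent model (e.g. Gemini grading Claude output) and `remediate` for the fix pass driven by its findings.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    severity: str       # e.g. "blocker", "minor"
    description: str

@dataclass
class Verdict:
    status: str                          # "PASS" or "FAIL"
    findings: list[Finding] = field(default_factory=list)

def validate_with_remediation(output: str, validate, remediate, max_rounds: int = 3) -> Verdict:
    """Have an independent validator grade the output; on FAIL, feed the
    structured findings back for targeted fixes and re-validate."""
    for _ in range(max_rounds):
        verdict = validate(output)                        # different model grades the work
        if verdict.status == "PASS":
            return verdict
        output = remediate(output, verdict.findings)      # fix only the flagged findings
    return verdict                                        # still failing after the round budget
```

The key property is that the grader and the producer are different models, so the producer never certifies its own work.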
Token usage compounds invisibly. A runaway agent loop can consume thousands of dollars before anyone notices. Per-interaction cost tracking, four-tier budget hierarchy (org/project/agent_class/agent_instance) with warn/throttle/pause enforcement, and CPSO (Cost Per Successful Outcome) link cost to value.
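The four-tier budget walk with warn/throttle/pause enforcement might look like this. The limits, threshold fractions, and the rule that the strictest triggered action wins are illustrative assumptions.

```python
# Hypothetical monthly limits per tier (USD); real values would come from config.
BUDGETS = {
    "org":            1000.0,
    "project":         200.0,
    "agent_class":      50.0,
    "agent_instance":   10.0,
}
# Action thresholds as fractions of each tier's limit, strictest first.
THRESHOLDS = [("pause", 1.00), ("throttle", 0.90), ("warn", 0.75)]

def enforcement_action(spend: dict[str, float]) -> str:
    """Check spend against every tier; return the strictest triggered action."""
    severity = {"allow": 0, "warn": 1, "throttle": 2, "pause": 3}
    action = "allow"
    for tier, limit in BUDGETS.items():
        used = spend.get(tier, 0.0) / limit
        for name, cutoff in THRESHOLDS:       # strictest-first, stop at first match
            if used >= cutoff:
                if severity[name] > severity[action]:
                    action = name
                break
    return action
```

Because the check runs per interaction, a runaway loop hits `throttle` and then `pause` at a single agent instance before it can drain the project or org budget.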
NAIC, SOX, GDPR, NY DFS Part 500, SOC 2, ISO 27001, and GLBA all require human attribution, tamper-evident audit trails, model governance documentation, and reproducible evidence. The regulatory compliance layer produces auditor-ready evidence packages from live system state without manual reconstruction.
Senior engineers inspect AI output before merge — but inconsistently, under time pressure, and with no systematic record. The Code Hardener platform runs 37 integrated tools across 12 scan profiles with a 6-stage enrichment pipeline, producing cryptographically signed quality reports.
A pipeline runs for two hours, fails on step 9 of 12, and restarts from scratch — losing the successful work from steps 1–8. Self-healing workflow recovery classifies failures into 7 categories, resumes from checkpoints, and applies category-appropriate remediation (retry, reroute, degrade, escalate).
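Checkpoint resume with category-mapped remediation can be sketched as below. The category names and their mapping to retry/reroute/degrade/escalate are assumptions based on the description above, not the platform's actual 7-category taxonomy.

```python
# Assumed failure taxonomy; the real platform defines its own 7 categories.
REMEDIATION = {
    "transient_network": "retry",
    "rate_limit":        "retry",
    "model_overload":    "reroute",    # route to a fallback model
    "tool_failure":      "reroute",
    "quality_gate":      "degrade",    # proceed with reduced scope
    "budget_exceeded":   "escalate",
    "policy_violation":  "escalate",
}

def resume_pipeline(steps, checkpoint: dict, classify):
    """Re-run only the steps after the last successful checkpoint.

    `classify` maps an exception to a failure category; on failure the
    checkpoint keeps steps 1..i so a later run skips the completed work.
    """
    done = checkpoint.get("completed", 0)
    for i, step in enumerate(steps[done:], start=done):
        try:
            step()
        except Exception as exc:
            return {"completed": i, "action": REMEDIATION.get(classify(exc), "escalate")}
        checkpoint["completed"] = i + 1
    return {"completed": len(steps), "action": "done"}
```

A step-9-of-12 failure then costs one step on retry, not twelve.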
Different vendors (Anthropic, OpenAI, Google) speak different protocols. Agents can't discover or delegate to each other across systems. The A2A interoperability gateway exposes 15 conductor agents via REST, MCP Bridge, and Google A2A protocol adapters with standardized capability discovery at /.well-known/agent.json.
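Capability discovery against the well-known path can be sketched in a few lines. The Agent Card field names follow the A2A spec's general shape but are simplified here; treat the `skills` structure as an assumption.

```python
import json
from urllib.parse import urljoin

def agent_card_url(base_url: str) -> str:
    """Build the well-known discovery URL used by the A2A protocol."""
    return urljoin(base_url, "/.well-known/agent.json")

def supported_skills(card_json: str) -> list[str]:
    """Extract advertised skill ids from a fetched Agent Card (shape simplified)."""
    card = json.loads(card_json)
    return [skill["id"] for skill in card.get("skills", [])]
```

A delegating agent would fetch the card once, match the advertised skills against its task, and then speak whichever protocol adapter (REST, MCP Bridge, A2A) the card declares.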
| Component | Role |
|---|---|
| Plugin ecosystem | Declarative extension model with two-layer hook architecture — drop files in a directory, they become system behavior |
| Multi-agent orchestration | 38 specialized agents coordinated through tiered quality gates (TRIVIAL / MINOR / STANDARD / MAJOR) with independent Gemini validation |
| Persistent vector memory | 60 MCP tools, Qdrant vector database, 32 scheduled n8n workflows for consolidation/pruning, brain-inspired memory architecture |
| Knowledge graph (GraphRAG) | Memgraph-backed relational memory with temporal edges — answer "what changed, when, and why" |
| Agent governance | Identity cards with trust levels 1–5, append-only audit bus, constitutional contracts, runtime policy enforcement |
| Code assurance | 37-tool scan platform with mutation testing, Ed25519 attestation, SLSA provenance, 1000-point quality scoring |
| Agentic data plane | DAG-based lineage, quality validation, PII classification, financial reconciliation with calculation replay |
| Agent economics | Per-interaction cost metering, model routing (Haiku/Sonnet/Opus), budget hierarchy, semantic caching, CPSO metric |
| Regulatory compliance | Human attribution, immutable cryptographically chained audit log, evidence packages, DSR routing, model cards, incident response |
| A2A interoperability | 15 agents exposed via REST, MCP Bridge, and Google A2A protocol with Agent Cards at /.well-known/agent.json |
Insurance (NAIC), financial services (SOX, GLBA, NY DFS), healthcare (HIPAA), and any organization subject to EU AI Act, GDPR, SOC 2, or ISO 27001. The compliance layer produces artifacts examiners recognize without requiring engineering to reconstruct them from logs.
Teams who have moved past AI-assisted autocomplete to autonomous agentic workflows (planner + builder + reviewer + tester) and need accountability, cost control, and systematic quality gates.
Teams building internal developer platforms that incorporate AI agents and need governance, observability, and multi-tenant budget controls that map to organizational hierarchy.
CISOs, compliance officers, and internal auditors who need to demonstrate that AI agent operations are governed, auditable, and aligned with frameworks the organization is subject to.
The eighteen domains are not a loose collection of features — they compose into a coherent execution stack:
At session start: Compliance layer anchors the session to a human identity with MFA verification, lawful basis, and responsible party. Context guard loads CLAUDE.md, memory auto-recall pulls relevant prior work, identity cards establish what each agent can do.
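The session-anchor record might look like the sketch below. The field names are illustrative of what the compliance layer records, not its actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionAnchor:
    """Illustrative session-start record; fields assumed from the description above."""
    human_id: str           # authenticated operator the session is attributed to
    mfa_verified: bool
    lawful_basis: str       # e.g. "contract", "legitimate_interest"
    responsible_party: str  # who answers for the agents' output

def open_session(anchor: SessionAnchor) -> dict:
    """Refuse to start agent work that is not anchored to an MFA-verified human."""
    if not anchor.mfa_verified:
        raise PermissionError("session must be anchored to an MFA-verified human")
    return {"anchor": anchor, "status": "open"}
```

Everything the agents do afterwards hangs off this record, which is what makes later attribution queries answerable.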
During work: The conductor orchestrates 38 agents through tiered workflows. Every agent dispatch is validated by Gemini. Every tool call is governance-checked. Every cost is metered. Every output is recorded in the immutable audit chain with cryptographic ordering. Stigmergy traces let agents coordinate without constant conductor relay.
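The cryptographic ordering of the audit chain works on the familiar hash-chain principle: each record's hash covers the previous record's hash, so editing any earlier entry breaks every later link. A minimal stdlib sketch (record shape is illustrative):

```python
import hashlib
import json

def append_event(chain: list[dict], event: dict) -> dict:
    """Append an event whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {"event": event, "prev": prev_hash}
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(record)
    return record

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash and link; any tampering surfaces as a mismatch."""
    prev = "0" * 64
    for record in chain:
        if record["prev"] != prev:
            return False
        body = {"event": record["event"], "prev": record["prev"]}
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != record["hash"]:
            return False
        prev = record["hash"]
    return True
```

This gives tamper evidence, not tamper prevention; the append-only audit bus plus periodic signing is what turns it into regulator-grade evidence.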
At checkpoint: Code hardener runs 37 tools against generated code. Failing findings dispatch targeted fix agents. Adversarial review (Claude + Gemini debate disputed findings). Reconciliation engine proves calculation integrity against source systems.
At release: Evidence package generator produces a signed bundle containing session records, audit trail, Gemini validations, gate decisions, artifacts, lineage, cost report, and model cards. Ed25519-signed. Versioned. Regulator-ready.
Continuously: 32 n8n workflows consolidate memory, prune stale data, detect contradictions, run red-team scans, generate weekly digests. The outcome collector passively measures 9 metrics including Cost Per Successful Outcome. Predictive scaling analyzes trajectories to optimize model routing and cache warming.
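The CPSO metric itself is simple arithmetic over the metered interactions: total spend divided by the count of successful outcomes. The interaction record shape below is illustrative.

```python
def cost_per_successful_outcome(interactions: list[dict]) -> float:
    """CPSO: total metered spend divided by successful outcomes.

    Record shape ({'cost_usd': ..., 'success': ...}) is an assumption;
    a period with no successes yields infinity, flagging pure waste.
    """
    total = sum(i["cost_usd"] for i in interactions)
    wins = sum(1 for i in interactions if i["success"])
    return total / wins if wins else float("inf")
```

Because failed interactions still count toward the numerator, CPSO rises when agents burn tokens without shipping accepted work, which is exactly the signal budget owners need.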
Next step: Read the architecture PRDs for technical depth on any domain, or the documentation for how-to guides on using each capability.