Agent Runtime Security & Identity — PRD 11 of 19

01 — Problem Statement

The Gap Between Permissions and Behavior

Existing governance frameworks cover what agents can do — tool classification, constitutional contracts, manifest-level authorization. But they do not monitor what agents are doing at runtime, nor do they manage agent identities over time. These are two fundamentally different problems, and both are unsolved in most enterprise AI deployments.

Gartner predicts that through 2026, at least 80% of unauthorized AI transactions will be caused by internal policy violations rather than malicious attacks. An agent authorized to access claims data doesn't need to be hacked: it can drift, be manipulated via prompt injection, or collude with another agent to produce outcomes no single policy explicitly prohibited.

🎭

Behavioral Drift

Agents authorized at deployment time gradually access different file patterns, call APIs at unusual rates, or combine tools in ways that weren't anticipated — invisible without runtime monitoring.

🪪

Identity Vacuum

Most agents operate with persistent, broad credentials provisioned once and never rotated. There is no lifecycle — no provision, no suspension, no revocation. Compromised agents stay active indefinitely.

🧠

Memory Poisoning

Vector memory stores accumulate hallucinated or injected facts over time. Without integrity verification, agents base future decisions on corrupted data — a slow-burn failure mode with no obvious trigger event.

Insurance Company Context

Agents handling PII, claims data, and financial records operate in a regulated environment where behavioral deviations are audit events. State DOI regulations increasingly require traceability of automated decision-making. SOC 2 Type II requires demonstrating that access controls are enforced not just at provisioning time but at every runtime transaction. Runtime security is not optional infrastructure — it is a compliance prerequisite.

02 — Architecture Overview

Six-Component Security Runtime

The runtime security layer wraps the existing agent execution environment with six interlocking components. Each can operate independently but produces maximum value when integrated through the shared governance audit bus.

📊

Behavioral Monitor

Establishes per-agent behavioral baselines during a supervised observation window. Continuously compares live execution metrics against baseline. Emits anomaly events when deviations exceed configurable thresholds. Tracks file access patterns, API call frequency distributions, tool usage mix, and token consumption rates. Does not block — observes and signals.

🔑

Identity Lifecycle Manager

Provisions ephemeral credentials scoped to each agent session. Implements a formal identity state machine: provision → authenticate → authorize → monitor → suspend → revoke. Credentials have configurable TTLs. Automatic rotation triggers on schedule or anomaly signal. No persistent long-lived tokens. Integrates with existing secret stores via adapter interface.

🔒

Memory Integrity Verifier

Intercepts all writes to the vector memory store. Runs semantic consistency checks, fact verification against an established knowledge anchor, provenance validation (source traceability), and anomaly scoring via embedding distance from the established fact centroid. Suspicious entries enter quarantine rather than being silently discarded — reviewable by operators before permanent rejection or promotion.

🤝

Inter-Agent Coordination Scorer

Computes Component Synergy Score (CSS) measuring collaboration quality between agents in multi-agent workflows. Detects collusive failure modes where agents mutually reinforce bad decisions without any single agent triggering a solo violation. Tracks Tool Utilization Efficacy (TUE) measuring whether tool calls produce the expected outcome class. Both scores feed the Guardian Agent decision matrix.

🛡️

Guardian Agent

A Gartner-aligned autonomous oversight agent that monitors other agents in real time. Receives signals from all five other components. Applies a configurable decision matrix to determine whether to observe, warn, throttle, pause, or terminate an agent. Can be configured at three autonomy levels: advisory (human must confirm), semi-autonomous (auto-throttle and auto-suspend, human confirms terminate), or fully autonomous (all interventions without human approval).

🔍

Threat Detection Engine

Dedicated detectors for five threat classes: prompt injection (signature and semantic detection), memory poisoning (write-time and read-time), tool abuse (rate and pattern), privilege escalation (scope creep detection), and data exfiltration patterns (volume and destination anomaly). Each detector is independently configurable and emits structured threat events to the audit bus.

Design Principle

Five of the six components are read-only or append-only with respect to agent data: they observe, score, and signal. The Guardian Agent is the sole component with write authority (throttle, pause, terminate). This separation prevents the security layer from becoming a new attack surface through which agent behavior could be manipulated.

03 — Key Components

Component Specifications

3.1 Behavioral Baseline Metrics

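Each agent accumulates a behavioral baseline during its first N sessions (configurable; default 10, or 20 for the metrics marked as such in the table below). The baseline is stored as a statistical distribution (mean, standard deviation, percentiles) per metric. Scalar metrics are checked with z-scores against configurable sensitivity bands, the tool usage distribution is compared using KL divergence, and some metrics add hard rules such as a new endpoint or a host outside the allowlist.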

Metric	Measurement Unit	Baseline Window	Alert Threshold	Severity
File Access Frequency	files/minute	10 sessions	> 3σ above mean	High
Tool Usage Distribution	% share per tool	10 sessions	KL divergence > 0.4	Medium
Token Consumption Rate	tokens/task	20 sessions	> 2.5σ above mean	Medium
API Call Patterns	calls/minute, endpoint mix	10 sessions	10x rate spike or new endpoint	Critical
Directory Access Breadth	unique paths/session	10 sessions	> 2σ or new path prefix	High
External Network Calls	calls/session to external hosts	10 sessions	Any new host not in allowlist	Critical
Session Duration	seconds	20 sessions	> 3σ above mean	Low
Error Rate	errors/100 tool calls	20 sessions	> 2σ or sudden spike	Medium
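The two statistical checks in the table can be sketched as follows. This is a minimal illustration, not the specified implementation: the `Baseline` shape, the function names, the natural-log base for KL divergence, and the epsilon floor for tools absent from the baseline are all assumptions made for the example.

```typescript
// Baseline statistics for one scalar metric (shape is an assumption for this sketch).
interface Baseline {
  mean: number;
  stddev: number;
  sampleCount: number;
}

// z-score of an observation against its baseline. Returns 0 when the
// baseline has no spread, so a constant metric never divides by zero.
function zScore(value: number, baseline: Baseline): number {
  if (baseline.stddev === 0) return 0;
  return (value - baseline.mean) / baseline.stddev;
}

// One-sided check: only values above the mean count as anomalous,
// matching the "> N sigma above mean" thresholds in the table.
function exceedsSigma(value: number, baseline: Baseline, sigma: number): boolean {
  return zScore(value, baseline) > sigma;
}

// KL divergence D(observed || baseline) in nats over per-tool usage shares.
// Terms with an observed share of 0 contribute nothing, so iterating the
// observed keys is sufficient. Epsilon stands in for tools the baseline never saw.
function klDivergence(
  observed: Record<string, number>,
  baseline: Record<string, number>,
  epsilon = 1e-9,
): number {
  let sum = 0;
  for (const tool of Object.keys(observed)) {
    const p = observed[tool] ?? 0;
    const q = baseline[tool] ?? 0;
    if (p === 0) continue;
    sum += p * Math.log(p / Math.max(q, epsilon));
  }
  return sum;
}
```

With a baseline of mean 10 and standard deviation 2 files/minute, an observation of 17 has a z-score of 3.5 and would cross the 3σ File Access Frequency threshold. An agent that shifts from an even split across two tools to using only one of them yields a divergence of ln 2 (about 0.69), above the 0.4 threshold for Tool Usage Distribution.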

3.2 Identity Lifecycle State Machine

Each agent instance receives a unique identity token with a defined lifecycle. Tokens are scoped to the minimum required access for the session task. State transitions are logged as immutable audit events.

PROVISION → AUTHENTICATE → AUTHORIZE → MONITOR → SUSPEND → REVOKE

State	Entry Condition	Credential Status	Agent Capability	Exit Trigger
PROVISION	Session creation request	Generating	None — awaiting auth	Auth handshake
AUTHENTICATE	Credential issuance complete	Active, unverified	Limited — identity proofs only	Verification success/fail
AUTHORIZE	Identity verified	Active, scoped	Task-scoped tool set	Authorization granted
MONITOR	Authorization granted	Active, full scope	Full authorized capability	Anomaly signal or session end
SUSPEND	Anomaly above threshold	Active, restricted	Read-only, no external calls	Human review decision
REVOKE	Human confirms or auto-policy	Revoked, blacklisted	None — session terminated	Terminal state
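The state table above can be sketched as a transition guard. The forward path follows the table; the edges for a failed verification (AUTHENTICATE to REVOKE), session end (MONITOR to REVOKE), and a cleared review (SUSPEND back to MONITOR) are assumptions, since the table names the triggers but not every target state. The `transition` function and `TransitionEvent` shape are hypothetical.

```typescript
type IdentityState =
  | "PROVISION"
  | "AUTHENTICATE"
  | "AUTHORIZE"
  | "MONITOR"
  | "SUSPEND"
  | "REVOKE";

// Allowed next states per current state. Edges marked "assumed" are not
// spelled out in the table and are illustrative only.
const TRANSITIONS: Record<IdentityState, IdentityState[]> = {
  PROVISION: ["AUTHENTICATE"],
  AUTHENTICATE: ["AUTHORIZE", "REVOKE"], // REVOKE on verification failure (assumed)
  AUTHORIZE: ["MONITOR"],
  MONITOR: ["SUSPEND", "REVOKE"], // REVOKE on session end (assumed)
  SUSPEND: ["MONITOR", "REVOKE"], // MONITOR when human review clears the agent (assumed)
  REVOKE: [], // terminal state
};

// Audit record for one state change (hypothetical shape).
interface TransitionEvent {
  sessionId: string;
  from: IdentityState;
  to: IdentityState;
  reason: string;
  at: string; // ISO 8601 timestamp
}

// Validates a requested transition and returns the audit event to emit.
// Throws on any edge not listed in TRANSITIONS.
function transition(
  sessionId: string,
  from: IdentityState,
  to: IdentityState,
  reason: string,
  now: Date = new Date(),
): TransitionEvent {
  if (TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`Illegal identity transition ${from} -> ${to}`);
  }
  return { sessionId, from, to, reason, at: now.toISOString() };
}
```

Returning the event rather than writing it keeps the guard pure, so the caller decides how the record reaches the audit bus.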

3.3 Memory Integrity Check Types

Every memory write passes through a four-stage integrity pipeline before committing to the vector store. Failures route to quarantine, not silent discard.

Check Type	Method	Pass Condition	Fail Action
Semantic Consistency	Embedding cosine similarity against existing memories in same namespace	Similarity > 0.35 to at least one anchor	Flag as anomalous, route to quarantine
Fact Verification	Cross-reference against verified knowledge anchor collection	No contradiction with confidence > 0.85	Reject with contradiction report
Provenance Validation	Source traceability — every memory must carry agent_id, session_id, timestamp, source_tool	All provenance fields present and valid	Reject missing-provenance writes
Anomaly Scoring	Mahalanobis distance from centroid of agent's memory cluster	Distance < configurable threshold (default: 4.5)	High-distance entries quarantined pending review
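A minimal sketch of the four-stage pipeline, in the order the table lists the checks. The fact checker and the distance computation are injected as hypothetical hooks (`contradictionConfidence`, `anomalyDistance`) because they depend on the knowledge anchor collection and the cluster statistics. Reading the fact verification row as "reject when a contradiction is found with confidence above 0.85" is an interpretation, and the type names are illustrative.

```typescript
// Memory write candidate; provenance field names follow the table above.
interface MemoryEntry {
  text: string;
  embedding: number[];
  agent_id?: string;
  session_id?: string;
  timestamp?: string; // ISO 8601 (assumed format)
  source_tool?: string;
}

interface VerificationResult {
  verdict: "pass" | "quarantine" | "reject";
  failedStage?: string;
}

// Hypothetical hooks for the stages that need external data.
interface ExternalChecks {
  contradictionConfidence(entry: MemoryEntry): number; // 0..1 against knowledge anchors
  anomalyDistance(entry: MemoryEntry): number; // e.g. Mahalanobis distance
}

function cosineSimilarity(a: number[], b: number[]): number {
  const n = Math.min(a.length, b.length);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function hasValidProvenance(e: MemoryEntry): boolean {
  const present = Boolean(e.agent_id && e.session_id && e.source_tool && e.timestamp);
  return present && !isNaN(Date.parse(e.timestamp ?? ""));
}

function verifyWrite(
  entry: MemoryEntry,
  namespaceEmbeddings: number[][],
  checks: ExternalChecks,
  distanceThreshold = 4.5,
): VerificationResult {
  // Stage 1: similarity above 0.35 to at least one existing memory.
  let best = -1;
  for (const other of namespaceEmbeddings) {
    best = Math.max(best, cosineSimilarity(entry.embedding, other));
  }
  if (best <= 0.35) return { verdict: "quarantine", failedStage: "semantic_consistency" };
  // Stage 2: reject on a high-confidence contradiction.
  if (checks.contradictionConfidence(entry) > 0.85) {
    return { verdict: "reject", failedStage: "fact_verification" };
  }
  // Stage 3: all provenance fields present and valid.
  if (!hasValidProvenance(entry)) return { verdict: "reject", failedStage: "provenance_validation" };
  // Stage 4: distance from the memory cluster centroid under the threshold.
  if (checks.anomalyDistance(entry) >= distanceThreshold) {
    return { verdict: "quarantine", failedStage: "anomaly_scoring" };
  }
  return { verdict: "pass" };
}
```

In this sketch an empty namespace quarantines the first write, because there is nothing to be similar to. A real deployment would need a bootstrap rule for new namespaces, which the table does not specify.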

3.4 Threat Taxonomy

Threat Type	Detection Method	Response Action	Severity
Prompt Injection	Signature matching + semantic classifier (instruction override patterns)	Block prompt, log event, notify Guardian	Critical
Memory Poisoning	Write-time integrity checks + read-time provenance verification	Quarantine entry, alert operator	Critical
Tool Abuse	Rate anomaly + unusual tool combination detector	Throttle tool access, escalate to Guardian	High
Privilege Escalation	Scope creep detection — monitor for access outside authorized manifest	Block request, suspend identity, audit log	Critical
Data Exfiltration	Volume anomaly + new destination host detection	Block outbound, suspend session, alert security team	Critical
Inter-Agent Collusion	CSS scoring — detects mutual reinforcement of bad decisions	Isolate agent pair, pause workflow, human review	High

3.5 Guardian Agent Decision Matrix

The Guardian Agent processes incoming signals from all other components and selects an intervention action based on the combined signal severity and configured autonomy level.

Observation Input	Assessment Score	Advisory Mode	Semi-Autonomous	Fully Autonomous
Single low-severity anomaly	Score 1–2	Log only	Log + notify	Log + notify
Repeated medium anomalies	Score 3–4	Notify operator	Auto-throttle	Auto-throttle
Single high-severity event	Score 5–6	Alert + recommend suspend	Auto-throttle, alert	Auto-suspend
Critical threat detected	Score 7–8	Alert + recommend terminate	Auto-suspend, human confirms terminate	Auto-terminate
Confirmed exfiltration or injection	Score 9–10	Emergency alert	Auto-terminate + block	Auto-terminate + block + quarantine
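The matrix lends itself to a table-driven lookup. The sketch below copies the action labels from the rows above. The `decide` function, the type names, and the choice to place fractional scores in the next band up are assumptions for illustration.

```typescript
type AutonomyLevel = "ADVISORY" | "SEMI_AUTONOMOUS" | "FULLY_AUTONOMOUS";

interface MatrixRow {
  maxScore: number; // inclusive upper bound of the score band
  actions: Record<AutonomyLevel, string>;
}

// Rows mirror the decision matrix, ordered by ascending score band.
const DECISION_MATRIX: MatrixRow[] = [
  { maxScore: 2, actions: { ADVISORY: "Log only", SEMI_AUTONOMOUS: "Log + notify", FULLY_AUTONOMOUS: "Log + notify" } },
  { maxScore: 4, actions: { ADVISORY: "Notify operator", SEMI_AUTONOMOUS: "Auto-throttle", FULLY_AUTONOMOUS: "Auto-throttle" } },
  { maxScore: 6, actions: { ADVISORY: "Alert + recommend suspend", SEMI_AUTONOMOUS: "Auto-throttle, alert", FULLY_AUTONOMOUS: "Auto-suspend" } },
  { maxScore: 8, actions: { ADVISORY: "Alert + recommend terminate", SEMI_AUTONOMOUS: "Auto-suspend, human confirms terminate", FULLY_AUTONOMOUS: "Auto-terminate" } },
  { maxScore: 10, actions: { ADVISORY: "Emergency alert", SEMI_AUTONOMOUS: "Auto-terminate + block", FULLY_AUTONOMOUS: "Auto-terminate + block + quarantine" } },
];

// Returns the action label for a score in [1, 10] at the given autonomy level.
// A fractional score such as 4.5 lands in the next band up (assumed behavior).
function decide(score: number, autonomy: AutonomyLevel): string {
  if (!(score >= 1 && score <= 10)) {
    throw new RangeError(`Assessment score out of range: ${score}`);
  }
  for (const row of DECISION_MATRIX) {
    if (score <= row.maxScore) return row.actions[autonomy];
  }
  throw new Error("unreachable: score bands cover 1 to 10");
}
```

Keeping the matrix as data rather than branching logic means an operator-facing configuration could replace `DECISION_MATRIX` without touching `decide`.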

3.6 CSS and TUE Scoring Formulas

Component Synergy Score (CSS)

CSS = (1/n) × Σ [ w_i × outcome_i / max_outcome ] × (1 - collusion_penalty)

Where n = number of agent interactions evaluated, w_i = interaction weight (higher for cross-domain tool calls), outcome_i = measured outcome quality score [0,1], max_outcome = theoretical maximum for task class, collusion_penalty = [0,1] score from mutual-reinforcement detector. CSS range: [0,1]. CSS < 0.4 triggers Guardian review. CSS < 0.2 triggers automatic isolation.
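As a worked example, the CSS formula can be computed directly. This sketch assumes each weight w_i lies in [0,1]; without that bound the result could exceed the stated [0,1] range. The `Interaction` shape and function names are hypothetical.

```typescript
// One evaluated agent interaction (hypothetical shape).
interface Interaction {
  weight: number; // w_i, assumed to lie in [0, 1]
  outcome: number; // outcome_i, quality score in [0, 1]
}

// CSS = (1/n) * sum(w_i * outcome_i / max_outcome) * (1 - collusion_penalty)
function computeCSS(
  interactions: Interaction[],
  maxOutcome: number,
  collusionPenalty: number,
): number {
  if (interactions.length === 0) throw new Error("CSS is undefined with no interactions");
  if (maxOutcome <= 0) throw new Error("maxOutcome must be positive");
  let sum = 0;
  for (const it of interactions) {
    sum += (it.weight * it.outcome) / maxOutcome;
  }
  return (sum / interactions.length) * (1 - collusionPenalty);
}

// Maps a score to the thresholds stated above.
function cssAction(css: number): "isolate" | "guardian_review" | "none" {
  if (css < 0.2) return "isolate";
  if (css < 0.4) return "guardian_review";
  return "none";
}
```

For two interactions with weights 1.0 and 0.5, outcomes 0.8 and 0.6, max_outcome 1, and a collusion penalty of 0.3, the sum is 0.8 + 0.3 = 1.1, the mean is 0.55, and CSS = 0.55 × 0.7 = 0.385, which falls under 0.4 and would trigger Guardian review.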

Tool Utilization Efficacy (TUE)

TUE = (correct_tool_calls / total_tool_calls) × (1 - redundancy_ratio) × precision_weight

Where correct_tool_calls = calls producing expected outcome class (verified post-hoc), redundancy_ratio = duplicate or unnecessary calls / total calls, precision_weight = 1 - (false_positive_tool_calls / total_tool_calls). TUE range: [0,1]. TUE < 0.5 triggers behavioral review. Sustained TUE < 0.35 flags agent for re-evaluation.
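The TUE formula and the "sustained" condition can be sketched the same way. The `ToolWindow` shape is hypothetical, and treating "sustained" as three consecutive windows follows REQ-SEC-009 rather than anything stated in this subsection.

```typescript
// Counts for one rolling window of tool calls (hypothetical shape).
interface ToolWindow {
  total: number;
  correct: number; // calls producing the expected outcome class
  redundant: number; // duplicate or unnecessary calls
  falsePositive: number; // calls counted as false positives
}

// TUE = (correct / total) * (1 - redundancy_ratio) * precision_weight
function computeTUE(w: ToolWindow): number {
  if (w.total <= 0) throw new Error("TUE is undefined for an empty window");
  const accuracy = w.correct / w.total;
  const redundancyRatio = w.redundant / w.total;
  const precisionWeight = 1 - w.falsePositive / w.total;
  return accuracy * (1 - redundancyRatio) * precisionWeight;
}

// True when the most recent `windows` scores are all below `floor`.
function sustainedBelow(scores: number[], floor = 0.35, windows = 3): boolean {
  if (scores.length < windows) return false;
  return scores.slice(-windows).every((s) => s < floor);
}
```

For a 50-call window with 40 correct calls, 5 redundant calls, and 5 false positives, TUE = 0.8 × 0.9 × 0.9 = 0.648, which stays above the 0.5 review threshold.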

04 — Requirements

18 Security Requirements

All requirements are traceable to specific compliance obligations (SOC 2 CC6, CC7, CC9; state DOI agent disclosure requirements) and threat taxonomy entries above.

REQ-SEC-001

Runtime behavioral baseline establishment. The system shall establish a per-agent behavioral baseline across configurable metric dimensions (minimum: file access frequency, tool usage distribution, token consumption rate, API call patterns) using a supervised observation window of at least 10 sessions before anomaly detection becomes active.

REQ-SEC-002

Configurable anomaly detection sensitivity. Anomaly detection thresholds shall be configurable per metric and per agent class (e.g., read-only agents vs. write-authorized agents vs. external-facing agents) without requiring code changes — through an operator-accessible configuration interface.

REQ-SEC-003

Ephemeral identity provisioning per session. Each agent session shall receive a unique, time-scoped identity credential generated at session creation. Credentials shall not be reusable across sessions. Credential TTL shall be configurable with a default maximum of 24 hours.

REQ-SEC-004

Credential rotation on configurable schedule. Agent credentials shall support scheduled rotation without session interruption. Rotation shall be triggered on schedule (configurable interval), on anomaly signal (automatic), or on operator command. Old credentials shall be revoked within 60 seconds of rotation completion.

REQ-SEC-005

Memory write integrity verification. All writes to the agent vector memory store shall pass through a four-stage integrity pipeline (semantic consistency, fact verification, provenance validation, anomaly scoring) before committing. Pipeline stage results shall be logged with the memory entry for audit traceability.

REQ-SEC-006

Semantic anomaly detection on stored memories. The memory integrity verifier shall compute an anomaly score for each write candidate using embedding distance from the agent's established memory cluster centroid. The anomaly threshold shall be configurable per agent namespace, with a default Mahalanobis distance of 4.5.

REQ-SEC-007

Quarantine workflow for suspicious memory entries. Memory entries failing integrity checks shall be routed to a quarantine collection rather than silently discarded. Quarantined entries shall be accessible for operator review with full provenance metadata. Operators shall be able to promote (approve) or permanently reject quarantined entries. The quarantine backlog shall be surfaced in the security dashboard.

REQ-SEC-008

Inter-agent collusion detection via CSS scoring. The system shall compute Component Synergy Score (CSS) for all multi-agent workflow interactions. CSS shall be evaluated against configurable thresholds. CSS scores below 0.4 shall trigger Guardian Agent review. CSS scores below 0.2 shall trigger automatic agent pair isolation pending human review.

REQ-SEC-009

Tool utilization efficacy tracking. The system shall compute Tool Utilization Efficacy (TUE) scores per agent across rolling 50-call windows. TUE scores shall be stored in a time-series collection for trend analysis. Sustained TUE below 0.35 over three consecutive windows shall flag the agent for behavioral re-evaluation.

REQ-SEC-010

Guardian agent with autonomous intervention capability. The Guardian Agent shall support three configurable autonomy levels (advisory, semi-autonomous, fully autonomous). Guardian decisions shall be logged as structured audit events. All autonomous actions (throttle, suspend, terminate) shall generate operator notifications within 30 seconds. Guardian configuration changes shall require elevated operator authorization.

REQ-SEC-011

Prompt injection detection at runtime. The threat detection engine shall implement both signature-based detection (known injection patterns) and semantic classification (embedding-based instruction override detection) for prompt injection. Detection shall occur before tool execution, not post-hoc. Detected injections shall block execution and generate a Critical threat event.

REQ-SEC-012

Memory poisoning detection and remediation. The system shall detect memory poisoning at both write time (integrity pipeline) and read time (provenance verification on retrieval). Detected poisoning events shall trigger: (1) quarantine of the affected entry, (2) review of adjacent entries written in the same session, (3) Guardian Agent notification with threat score.

REQ-SEC-013

Data exfiltration pattern detection. The behavioral monitor shall track outbound data volume per session, destination hosts, and data type distribution. Any outbound call to a host not present in the agent's established baseline and not in the organizational allowlist shall generate an immediate Critical threat event. Volume spikes exceeding 3 standard deviations above baseline shall generate a High threat event.

REQ-SEC-014

Privilege escalation monitoring. The identity lifecycle manager shall enforce scope boundaries on every tool call by cross-referencing the requested access against the session's authorized scope definition. Any access attempt outside authorized scope shall be blocked, logged, and escalated to the Guardian Agent as a Critical severity event regardless of other behavioral indicators.

REQ-SEC-015

Real-time threat dashboard. The system shall provide an operator-facing security dashboard showing: active agent sessions with current risk scores, recent threat events (last 24 hours), quarantine backlog count, Guardian Agent action history, CSS and TUE trends per agent, and identity lifecycle status. Dashboard data shall refresh at intervals not exceeding 30 seconds.

REQ-SEC-016

Governance audit bus integration. All security events (anomaly detections, threat events, identity state transitions, memory integrity failures, Guardian Agent actions, quarantine events) shall be emitted to the governance audit bus as structured events with standardized schema. Events shall be immutable once written. The audit bus integration shall support at-least-once delivery semantics.

REQ-SEC-017

Forensic investigation toolkit. The system shall provide a forensic replay capability allowing operators to reconstruct an agent's session timeline from audit events, including: tool calls in sequence, memory reads and writes, identity state transitions, anomaly signals received, and Guardian interventions. Replay shall be available for any session within the configurable retention window (minimum 90 days).

REQ-SEC-018

Insurance compliance reporting. The system shall generate structured compliance reports suitable for SOC 2 Type II evidence packages (CC6.1, CC6.3, CC6.7, CC7.1, CC7.2, CC7.3, CC9.1) and state DOI agent disclosure requirements. Reports shall be exportable in machine-readable JSON and human-readable PDF format. Report generation shall be auditable (logged as governance events) and support digital signing.
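Several of the requirements above reduce to small, testable predicates. As one illustration, the two rules in REQ-SEC-013 can be sketched as a single check. The `EgressBaseline` shape, the function name, and evaluating volume as a running per-session byte total are assumptions for the example.

```typescript
type ExfilSeverity = "HIGH" | "CRITICAL";

// One outbound call made by an agent (hypothetical shape).
interface OutboundCall {
  host: string;
  bytes: number;
}

// Per-agent egress baseline (hypothetical shape).
interface EgressBaseline {
  knownHosts: string[]; // hosts seen during the baseline window
  meanBytes: number; // mean outbound bytes per session
  stddevBytes: number; // standard deviation of outbound bytes per session
}

// Returns the severity of the threat event to emit, or null when the call is clean.
function checkOutbound(
  call: OutboundCall,
  sessionBytesSoFar: number,
  baseline: EgressBaseline,
  allowlist: string[],
): ExfilSeverity | null {
  // Rule 1: a host in neither the baseline nor the allowlist is Critical.
  const inBaseline = baseline.knownHosts.indexOf(call.host) !== -1;
  const inAllowlist = allowlist.indexOf(call.host) !== -1;
  if (!inBaseline && !inAllowlist) return "CRITICAL";

  // Rule 2: session volume more than 3 standard deviations above baseline is High.
  const total = sessionBytesSoFar + call.bytes;
  if (baseline.stddevBytes > 0 && (total - baseline.meanBytes) / baseline.stddevBytes > 3) {
    return "HIGH";
  }
  return null;
}
```

The host rule runs first so that a call to an unknown destination is reported as Critical even when its volume alone would only rate High.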

05 — Prompt to Build It

Copy-Ready Claude Code Prompt

Paste this into Claude Code to scaffold the runtime security layer. The prompt references existing governance infrastructure and expects a TypeScript/Node.js environment with Qdrant vector storage.

Claude Code — Agent Runtime Security & Identity

Build the Agent Runtime Security & Identity system for an insurance company AI platform. This is a TypeScript/Node.js implementation. Use Qdrant as the vector store (already running), emit events to a governance audit bus (NATS/JetStream), and integrate with the existing agent manifest system.

## System Architecture

Build six components as independent modules with a shared event bus interface:

### 1. Behavioral Monitor (`src/security/behavioral-monitor.ts`)

- Class `BehavioralMonitor` with methods: `recordMetric(agentId, sessionId, metric, value)`, `getBaseline(agentId, metric)`, `checkAnomaly(agentId, metric, value): AnomalyResult`
- Store baselines in Qdrant collection `agent_behavioral_baselines`
- Baseline structure: { agentId, metric, mean, stddev, p95, sampleCount, lastUpdated }
- Anomaly detection: z-score calculation, configurable threshold per metric class
- Metrics to track: FILE_ACCESS_FREQUENCY, TOOL_USAGE_DISTRIBUTION, TOKEN_CONSUMPTION_RATE, API_CALL_PATTERNS, DIRECTORY_ACCESS_BREADTH, EXTERNAL_NETWORK_CALLS, SESSION_DURATION, ERROR_RATE
- Emit `behavioral.anomaly.detected` events with: agentId, sessionId, metric, observedValue, baselineValue, zScore, severity
- Minimum 10 sessions before anomaly detection becomes active (configurable via BASELINE_MIN_SESSIONS env)

### 2. Identity Lifecycle Manager (`src/security/identity-lifecycle.ts`)

- Class `IdentityLifecycleManager` with state machine: PROVISION → AUTHENTICATE → AUTHORIZE → MONITOR → SUSPEND → REVOKE
- Methods: `provision(agentId, taskScope): SessionIdentity`, `authenticate(sessionId, proof): boolean`, `authorize(sessionId, requestedScope): AuthResult`, `suspend(sessionId, reason)`, `revoke(sessionId, reason)`
- Store identity state in Qdrant collection `agent_identity_sessions`
- SessionIdentity: { sessionId, agentId, token, scope, state, issuedAt, expiresAt, rotationSchedule }
- Token TTL configurable via IDENTITY_TOKEN_TTL_SECONDS env (default 86400)
- Automatic rotation: schedule-based (configurable) + anomaly-triggered
- Emit state transition events to audit bus: `identity.state.transitioned`
- Scope enforcement: every tool call validates against session scope definition

### 3. Memory Integrity Verifier (`src/security/memory-integrity.ts`)

- Class `MemoryIntegrityVerifier` with pipeline: semantic consistency → fact verification → provenance validation → anomaly scoring
- Method `verifyWrite(entry: MemoryEntry): VerificationResult`, which passes the entry through the pipeline and returns pass/quarantine/reject
- Semantic consistency: cosine similarity against existing memories in namespace, threshold configurable
- Fact verification: cross-reference against `knowledge_anchors` collection in Qdrant
- Provenance validation: required fields agentId, sessionId, timestamp, sourceTool, taskContext
- Anomaly scoring: Mahalanobis distance from memory cluster centroid (default threshold: 4.5)
- Quarantine collection: `memory_quarantine` in Qdrant with full metadata preserved
- Method `reviewQuarantine(entryId, decision: 'promote'|'reject', operatorId)`: promote moves to main store, reject moves to `memory_rejected`
- Emit events: `memory.integrity.failed`, `memory.quarantined`, `memory.promoted`, `memory.rejected`

### 4. Inter-Agent Coordination Scorer (`src/security/coordination-scorer.ts`)

- Class `CoordinationScorer` implementing CSS and TUE scoring
- CSS formula: `(1/n) * sum(w_i * outcome_i / max_outcome) * (1 - collusion_penalty)`
- collusion_penalty: detect mutual reinforcement; if agent A's output directly copies agent B's output pattern in >60% of interactions, apply penalty 0.3
- TUE formula: `(correct_tool_calls / total_tool_calls) * (1 - redundancy_ratio) * precision_weight`
- Track over rolling 50-call windows
- Methods: `recordInteraction(agentA, agentB, sessionId, outcome)`, `computeCSS(agentPairId): number`, `computeTUE(agentId): number`
- Store scores in Qdrant collection `coordination_scores` time-series
- Emit `coordination.css.threshold.breach` when CSS < 0.4
- Emit `coordination.css.critical.breach` when CSS < 0.2 (triggers isolation)
- Emit `coordination.tue.degraded` when TUE < 0.35 for 3+ consecutive windows

### 5. Threat Detection Engine (`src/security/threat-detection.ts`)

- Class `ThreatDetectionEngine` with five detectors as separate sub-modules
- PromptInjectionDetector: signature patterns array (configurable) + embedding similarity to `injection_signatures` collection in Qdrant; returns ThreatEvent before tool execution
- MemoryPoisoningDetector: write-time (hooks into MemoryIntegrityVerifier) + read-time provenance check
- ToolAbuseDetector: rate monitor (sliding window) + tool combination pattern detector
- PrivilegeEscalationDetector: scope boundary enforcement, cross-references session AuthResult
- DataExfiltrationDetector: outbound volume tracking, destination host allowlist check
- All detectors emit structured ThreatEvent: { threatId, type, severity, agentId, sessionId, timestamp, evidence, recommendedAction }
- Severity levels: LOW, MEDIUM, HIGH, CRITICAL
- CRITICAL events: auto-notify Guardian Agent, never require polling

### 6. Guardian Agent (`src/security/guardian-agent.ts`)

- Class `GuardianAgent`, subscribing to all security events from other components
- Autonomy levels: ADVISORY, SEMI_AUTONOMOUS, FULLY_AUTONOMOUS (configurable via GUARDIAN_AUTONOMY_LEVEL env)
- Decision matrix implementation. Input: combined signal score (1-10); output: action per autonomy level
- Actions: LOG_ONLY, NOTIFY, THROTTLE (rate-limit tool calls), SUSPEND (call IdentityLifecycleManager.suspend), TERMINATE (revoke identity, end session)
- Method `processEvent(event: SecurityEvent): GuardianDecision`
- All Guardian actions logged as `guardian.action.taken` audit events
- Operator notifications via webhook (configurable GUARDIAN_WEBHOOK_URL) within 30 seconds of action
- Audit log: store in Qdrant collection `guardian_audit_log`

## Shared Infrastructure

### Audit Bus (`src/security/audit-bus.ts`)

- NATS/JetStream client wrapper
- Method `emit(eventType: string, payload: object)`: publish with at-least-once semantics
- Method `subscribe(eventType: string, handler)`: durable consumer
- Standardized event schema: { eventId, eventType, timestamp, agentId, sessionId, severity, payload }
- Immutable once written (append-only stream)

### Security Dashboard API (`src/api/security-dashboard.ts`)

- Express routes for dashboard data
- GET /api/security/sessions: active sessions with risk scores
- GET /api/security/threats: recent threat events (24h, paginated)
- GET /api/security/quarantine: quarantine backlog
- GET /api/security/guardian/actions: Guardian action history
- GET /api/security/scores/:agentId: CSS and TUE trends
- POST /api/security/quarantine/:entryId/review: promote or reject quarantined memory
- All routes require operator auth (JWT with security_admin scope)
- WebSocket endpoint for real-time event streaming (refresh <= 30s)

### Forensic Replay (`src/security/forensic-replay.ts`)

- Class `ForensicReplay` reconstructs session timeline from audit events
- Method `replaySession(sessionId, timeRange?): SessionTimeline`
- SessionTimeline: ordered array of { timestamp, eventType, data } covering all tool calls, memory operations, identity transitions, anomalies, Guardian actions
- Store forensic data with 90-day retention (configurable via FORENSIC_RETENTION_DAYS)

### Compliance Reporter (`src/security/compliance-reporter.ts`)

- Class `ComplianceReporter` generating SOC 2 and DOI evidence reports
- Method `generateSOC2Report(period: DateRange, controls: string[]): ComplianceReport`
- Controls: CC6.1, CC6.3, CC6.7, CC7.1, CC7.2, CC7.3, CC9.1
- Method `generateDOIReport(period: DateRange, jurisdiction: string): ComplianceReport`
- Output: JSON (machine-readable) and PDF (via pdfkit)
- All report generations logged as audit events and support digital signing (node-forge)

## Qdrant Collections Required

- `agent_behavioral_baselines`: behavioral baseline stats
- `agent_identity_sessions`: identity lifecycle state
- `memory_quarantine`: quarantined memory entries
- `memory_rejected`: permanently rejected entries
- `knowledge_anchors`: verified facts for memory integrity
- `injection_signatures`: prompt injection embedding signatures
- `coordination_scores`: CSS/TUE time-series
- `guardian_audit_log`: all Guardian actions
- `forensic_events`: raw event stream for replay

## Environment Variables

```
QDRANT_URL=http://localhost:6334
QDRANT_API_KEY=
NATS_URL=nats://localhost:4222
GUARDIAN_AUTONOMY_LEVEL=SEMI_AUTONOMOUS
GUARDIAN_WEBHOOK_URL=
IDENTITY_TOKEN_TTL_SECONDS=86400
BASELINE_MIN_SESSIONS=10
FORENSIC_RETENTION_DAYS=90
INJECTION_SIMILARITY_THRESHOLD=0.85
MEMORY_ANOMALY_THRESHOLD=4.5
```

## Integration Points

- Import existing agent manifest schema from `../governance/manifest-schema`
- Emit all events to existing `governance.audit` NATS subject (prefix security events with `security.`)
- The MemoryIntegrityVerifier should be called by existing memory write paths; export a middleware hook
- Guardian Agent subscribes to `security.>` wildcard subject

Write production-ready TypeScript. No stubs. Full error handling. Each class should be independently testable. Export types from `src/security/types.ts`. Include a `src/security/index.ts` barrel export.

06 — Design Decisions

Key Trade-offs and Rationale

👁️

Agent-Side vs. Observer-Side Monitoring

Decision: Observer-side (external monitor, not instrumented into the agent).

Rationale: Agent-side instrumentation is easier to disable or bypass — a compromised agent can simply stop reporting. An external observer running in a separate process with separate credentials cannot be silenced by the monitored agent. Trade-off: slightly higher latency for anomaly detection (one event loop cycle delay), accepted in exchange for tamper resistance.

📐

Statistical vs. Rule-Based Anomaly Detection

Decision: Statistical (z-score + distribution comparison) with configurable rule overlays.

Rationale: Pure rule-based detection requires anticipating every possible attack pattern — intractable for novel threats. Statistical detection catches unknown deviations from established behavior. Rules are added for known-bad patterns (e.g., specific injection signatures) where statistical detection would be too slow. The hybrid approach catches both novel and known threats.

🤖

Guardian Agent Autonomy Level

Decision: Default to SEMI_AUTONOMOUS; FULLY_AUTONOMOUS requires explicit operator opt-in.

Rationale: Insurance environments have strict audit requirements — unexplained automated terminations would create regulatory exposure. Semi-autonomous mode allows auto-throttle and auto-suspend (recoverable actions) but requires human confirmation for termination (irreversible). FULLY_AUTONOMOUS is available for high-throughput non-critical agent workloads where operator review latency is unacceptable.

🔬

Memory Quarantine vs. Rejection

Decision: Quarantine (hold for review) rather than silent rejection for anomalous entries.

Rationale: False positives in anomaly detection are inevitable. Silent rejection would cause data loss that is invisible to operators — agents would appear to be functioning normally while missing important context. Quarantine creates a reviewable audit trail, allows operators to tune detection thresholds by examining false positive patterns, and preserves evidence of potential poisoning attempts for forensic analysis.

📊

CSS Scoring Methodology

Decision: Outcome-weighted CSS with explicit collusion penalty rather than pure output similarity.

Rationale: Pure output similarity scoring would flag legitimate specialization (two agents developing complementary expertise in the same domain). The collusion penalty specifically targets the bad pattern: agents echoing each other's outputs without independent reasoning. Weighting by outcome quality ensures the score reflects business value, not just behavioral diversity.

⚖️

Ephemeral Credentials vs. Scoped Persistent Tokens

Decision: Ephemeral per-session credentials with task-scoped authorization.

Rationale: Persistent tokens — even narrowly scoped — accumulate risk as agents accumulate sessions. A single compromised token gives an attacker the full credential lifetime. Ephemeral credentials limit the blast radius of any individual compromise to one session. The operational overhead of provisioning per session is acceptable given modern secret management tooling; the security benefit is substantial.

07 — Integration Points

Connections to Existing BulletproofSoftware Infrastructure

The runtime security layer is designed as an additive layer — it extends existing components without requiring changes to their core logic. Integration is achieved through event subscriptions, middleware hooks, and shared Qdrant collections.

⚙️

Governance System

Extends agent manifests with identity_lifecycle configuration block (TTL, rotation schedule, autonomy level). The IdentityLifecycleManager reads manifest scope definitions to build session authorization tokens. All security events are emitted to the existing governance audit bus under the security.* subject namespace. Threat events reference manifest version for traceability.

🧠

Memory System (claude-memory-mcp)

The MemoryIntegrityVerifier hooks into memory write paths via exported middleware. All writes to the 45+ Qdrant collections pass through the four-stage integrity pipeline. The memory_quarantine and knowledge_anchors collections are managed by this PRD's system and consumed by the memory MCP tools. Read-time provenance verification integrates with the memory_recall tool.

🔀

Orchestration Layer

The Guardian Agent registers as a special-role oversight participant in the agent hierarchy. It receives the agent execution graph at workflow start and establishes monitoring subscriptions for all participating agents. Guardian intervention actions (throttle, suspend, terminate) are delivered to the orchestration layer via command channel, not direct process signals — maintaining the observer separation principle.

🔎

Code Assurance System

Static security analysis from Code Assurance is extended with runtime behavioral data. Agents flagged at static analysis time receive tighter anomaly detection thresholds (alerts fire at lower z-scores). Runtime behavioral profiles are fed back to Code Assurance to improve future static analysis rulesets, closing the loop between pre-deployment scanning and live behavioral data.

📡

Data Plane

The DataExfiltrationDetector integrates with the data plane's network egress monitoring. Allowlists are synchronized from the data plane's approved destination registry. Volume thresholds are calibrated against the data plane's normal traffic baselines. Exfiltration alerts trigger data plane-level blocking (not just agent suspension) for immediate containment in parallel with Guardian Agent response.

📋

Compliance & Reporting

The ComplianceReporter pulls evidence from the governance audit bus, forensic event store, identity lifecycle logs, and memory quarantine history. SOC 2 control evidence packages reference specific audit event IDs for examiner traceability. DOI agent disclosure reports use the behavioral baseline data to demonstrate that agent behavior is predictable and bounded — a key regulatory requirement for automated claims processing approval.

Event Schema Compatibility

All security events conform to the governance system's CloudEvents-compatible schema. New event types are registered in the governance event registry before deployment. Existing audit consumers do not require modification — they receive security events as first-class governance audit entries and can filter by event type. The security.* subject prefix allows opt-in subscription for security-specific consumers (SIEM integrations, compliance tools) without polluting the main audit stream.
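One detail matters when wiring consumers to the security prefix. Under NATS wildcard rules, `*` matches exactly one subject token and `>` matches one or more trailing tokens, so a consumer that wants every security event needs `security.>` rather than a literal `security.*` subscription. The matcher below is a minimal sketch of those rules for illustration. The function name is hypothetical, and a real consumer would rely on the NATS server to do this matching.

```typescript
// Returns true when a NATS-style subscription pattern matches a subject.
// "*" matches exactly one token; ">" matches one or more trailing tokens
// and is only valid as the last token of the pattern.
function subjectMatches(pattern: string, subject: string): boolean {
  const p = pattern.split(".");
  const s = subject.split(".");
  for (let i = 0; i < p.length; i++) {
    const token = p[i];
    if (token === ">") {
      // Must be the final pattern token, with at least one subject token left.
      return i === p.length - 1 && s.length > i;
    }
    if (i >= s.length) return false; // subject ran out of tokens
    if (token !== "*" && token !== s[i]) return false; // literal mismatch
  }
  // Without a trailing ">", token counts must agree exactly.
  return p.length === s.length;
}
```

With this matcher, `security.>` matches `security.memory.quarantined`, while `security.*` does not, because that event type spans two tokens after the prefix.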

Integration surface quick reference:

→ Governance audit bus (NATS)

→ Agent manifest schema

→ Qdrant vector store

→ Memory write middleware

→ Orchestration command channel

→ Code assurance event feed

→ Data plane egress registry

→ Compliance evidence store

→ SIEM webhook adapter

→ DOI reporting API
