PRD 11 of 19

Agent Runtime Security & Identity

Runtime behavioral monitoring, agentic identity lifecycle management, memory integrity verification, inter-agent coordination scoring, and guardian agent patterns — purpose-built for insurance company compliance requirements including SOC 2 and state DOI mandates.

The Gap Between Permissions and Behavior

Existing governance frameworks cover what agents can do — tool classification, constitutional contracts, manifest-level authorization. But they do not monitor what agents are doing at runtime, nor do they manage agent identities over time. These are two fundamentally different problems, and both are unsolved in most enterprise AI deployments.

Gartner's 2026 research shows that 80% of unauthorized AI agent transactions come from internal policy violations, not external attacks. An agent authorized to access claims data doesn't need to be hacked — it can drift, be manipulated via prompt injection, or collude with another agent to produce outcomes no single policy explicitly prohibited.

🎭

Behavioral Drift

Agents authorized at deployment time gradually access different file patterns, call APIs at unusual rates, or combine tools in ways that weren't anticipated — invisible without runtime monitoring.

🪪

Identity Vacuum

Most agents operate with persistent, broad credentials provisioned once and never rotated. There is no lifecycle — no provision, no suspension, no revocation. Compromised agents stay active indefinitely.

🧠

Memory Poisoning

Vector memory stores accumulate hallucinated or injected facts over time. Without integrity verification, agents base future decisions on corrupted data — a slow-burn failure mode with no obvious trigger event.

Insurance Company Context

Agents handling PII, claims data, and financial records operate in a regulated environment where behavioral deviations are audit events. State DOI regulations increasingly require traceability of automated decision-making. SOC 2 Type II requires demonstrating that access controls are enforced not just at provisioning time but at every runtime transaction. Runtime security is not optional infrastructure — it is a compliance prerequisite.

Six-Component Security Runtime

The runtime security layer wraps the existing agent execution environment with six interlocking components. Each can operate independently but produces maximum value when integrated through the shared governance audit bus.

Agent Runtime Security Architecture

📊

Behavioral Monitor

Establishes per-agent behavioral baselines during a supervised observation window. Continuously compares live execution metrics against baseline. Emits anomaly events when deviations exceed configurable thresholds. Tracks file access patterns, API call frequency distributions, tool usage mix, and token consumption rates. Does not block — observes and signals.

🔑

Identity Lifecycle Manager

Provisions ephemeral credentials scoped to each agent session. Implements a formal identity state machine: provision → authenticate → authorize → monitor → suspend → revoke. Credentials have configurable TTLs. Automatic rotation triggers on schedule or anomaly signal. No persistent long-lived tokens. Integrates with existing secret stores via adapter interface.

🔒

Memory Integrity Verifier

Intercepts all writes to the vector memory store. Runs semantic consistency checks, fact verification against an established knowledge anchor, provenance validation (source traceability), and anomaly scoring via embedding distance from the established fact centroid. Suspicious entries enter quarantine rather than being silently discarded — reviewable by operators before permanent rejection or promotion.

🤝

Inter-Agent Coordination Scorer

Computes Component Synergy Score (CSS) measuring collaboration quality between agents in multi-agent workflows. Detects collusive failure modes where agents mutually reinforce bad decisions without any single agent triggering a solo violation. Tracks Tool Utilization Efficacy (TUE) measuring whether tool calls produce the expected outcome class. Both scores feed the Guardian Agent decision matrix.

🛡️

Guardian Agent

A Gartner-aligned autonomous oversight agent that monitors other agents in real-time. Receives signals from all five other components. Applies a configurable decision matrix to determine whether to observe, warn, throttle, pause, or terminate an agent. Can be configured at three autonomy levels: advisory (human must confirm), semi-autonomous (auto-throttle, human confirms terminate), or fully autonomous (all interventions without human approval).

🔍

Threat Detection Engine

Dedicated detectors for five threat classes: prompt injection (signature and semantic detection), memory poisoning (write-time and read-time), tool abuse (rate and pattern), privilege escalation (scope creep detection), and data exfiltration patterns (volume and destination anomaly). Each detector is independently configurable and emits structured threat events to the audit bus.
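
The rate-monitoring half of the tool abuse detector can be sketched as a sliding window. This is an illustrative sketch, not the engine's actual API: the class name and window size are assumptions, and the default 10x multiplier simply mirrors the API-call spike threshold from the metrics table in section 3.1.

```typescript
// Sliding-window rate tracker (illustrative): counts calls within the
// last `windowMs` milliseconds and discards anything older.
class SlidingRateWindow {
  private timestamps: number[] = [];
  constructor(private windowMs: number) {}

  // Record a call at `nowMs` and return the current in-window call count.
  record(nowMs: number): number {
    this.timestamps.push(nowMs);
    const cutoff = nowMs - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t >= cutoff);
    return this.timestamps.length;
  }
}

// Signal a spike when the observed rate exceeds a multiple of the
// agent's baseline rate (10x default is an assumption from section 3.1).
function isRateSpike(observedRate: number, baselineRate: number, multiplier = 10): boolean {
  return observedRate > baselineRate * multiplier;
}
```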

Design Principle

All six components are read-only or append-only with respect to agent data — they observe, score, and signal. The Guardian Agent is the sole component with write authority (throttle, pause, terminate). This separation prevents the security layer from becoming a new attack surface through which agent behavior could be manipulated.

Component Specifications

3.1 Behavioral Baseline Metrics

Each agent accumulates a behavioral baseline during its first N sessions (configurable, default 10). Baseline is stored as a statistical distribution (mean, standard deviation, percentiles) per metric. Anomaly detection uses z-score with configurable sensitivity bands.

| Metric | Measurement Unit | Baseline Window | Alert Threshold | Severity |
| --- | --- | --- | --- | --- |
| File Access Frequency | files/minute | 10 sessions | > 3σ above mean | High |
| Tool Usage Distribution | % share per tool | 10 sessions | KL divergence > 0.4 | Medium |
| Token Consumption Rate | tokens/task | 20 sessions | > 2.5σ above mean | Medium |
| API Call Patterns | calls/minute, endpoint mix | 10 sessions | 10x rate spike or new endpoint | Critical |
| Directory Access Breadth | unique paths/session | 10 sessions | > 2σ or new path prefix | High |
| External Network Calls | calls/session to external hosts | 10 sessions | Any new host not in allowlist | Critical |
| Session Duration | seconds | 20 sessions | > 3σ above mean | Low |
| Error Rate | errors/100 tool calls | 20 sessions | > 2σ or sudden spike | Medium |
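
As a rough sketch of how the z-score comparison described above might work (the type and function names here are illustrative, not the implementation's actual API):

```typescript
// Per-metric baseline statistics, as accumulated during the
// observation window (field names are assumptions for illustration).
interface Baseline {
  mean: number;
  stddev: number;
  sampleCount: number;
}

interface AnomalyResult {
  zScore: number;
  anomalous: boolean;
}

// Flag an observation when it sits more than `threshold` standard
// deviations above the baseline mean (the "> 3σ above mean" rows above).
function checkAnomaly(baseline: Baseline, observed: number, threshold = 3): AnomalyResult {
  const zScore = baseline.stddev > 0 ? (observed - baseline.mean) / baseline.stddev : 0;
  return { zScore, anomalous: zScore > threshold };
}
```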

3.2 Identity Lifecycle State Machine

Each agent instance receives a unique identity token with a defined lifecycle. Tokens are scoped to the minimum required access for the session task. State transitions are logged as immutable audit events.

PROVISION → AUTHENTICATE → AUTHORIZE → MONITOR → SUSPEND → REVOKE

| State | Entry Condition | Credential Status | Agent Capability | Exit Trigger |
| --- | --- | --- | --- | --- |
| PROVISION | Session creation request | Generating | None — awaiting auth | Auth handshake |
| AUTHENTICATE | Credential issuance complete | Active, unverified | Limited — identity proofs only | Verification success/fail |
| AUTHORIZE | Identity verified | Active, scoped | Task-scoped tool set | Authorization granted |
| MONITOR | Authorization granted | Active, full scope | Full authorized capability | Anomaly signal or session end |
| SUSPEND | Anomaly above threshold | Active, restricted | Read-only, no external calls | Human review decision |
| REVOKE | Human confirms or auto-policy | Revoked, blacklisted | None — session terminated | Terminal state |
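
The lifecycle can be sketched as a transition table. Two details below are assumptions, since the state table only names exit triggers: that a failed verification moves AUTHENTICATE to REVOKE, and that a favorable human review can return SUSPEND to MONITOR.

```typescript
type IdentityState =
  | "PROVISION" | "AUTHENTICATE" | "AUTHORIZE"
  | "MONITOR" | "SUSPEND" | "REVOKE";

// Allowed transitions: forward through the lifecycle, suspension on
// anomaly, revocation on failure or policy. REVOKE is terminal.
const transitions: Record<IdentityState, IdentityState[]> = {
  PROVISION: ["AUTHENTICATE"],
  AUTHENTICATE: ["AUTHORIZE", "REVOKE"], // revoke-on-failure is an assumption
  AUTHORIZE: ["MONITOR"],
  MONITOR: ["SUSPEND", "REVOKE"],
  SUSPEND: ["MONITOR", "REVOKE"], // reinstatement after review is an assumption
  REVOKE: [],
};

function canTransition(from: IdentityState, to: IdentityState): boolean {
  return transitions[from].includes(to);
}
```

Encoding the lifecycle as data rather than scattered conditionals also makes each transition trivially loggable as an immutable audit event.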

3.3 Memory Integrity Check Types

Every memory write passes through a four-stage integrity pipeline before committing to the vector store. Failures route to quarantine, not silent discard.

| Check Type | Method | Pass Condition | Fail Action |
| --- | --- | --- | --- |
| Semantic Consistency | Embedding cosine similarity against existing memories in same namespace | Similarity > 0.35 to at least one anchor | Flag as anomalous, route to quarantine |
| Fact Verification | Cross-reference against verified knowledge anchor collection | No contradiction with confidence > 0.85 | Reject with contradiction report |
| Provenance Validation | Source traceability — every memory must carry agent_id, session_id, timestamp, source_tool | All provenance fields present and valid | Reject missing-provenance writes |
| Anomaly Scoring | Mahalanobis distance from centroid of agent's memory cluster | Distance < configurable threshold (default: 4.5) | High-distance entries quarantined pending review |
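
Two of the four stages, provenance validation and semantic consistency, are simple enough to sketch directly. The entry shape and function names are illustrative assumptions, not the verifier's actual interface.

```typescript
// Candidate memory write with provenance fields (optional here so the
// validator has something to reject; shape is an assumption).
interface MemoryEntry {
  agentId?: string;
  sessionId?: string;
  timestamp?: number;
  sourceTool?: string;
  embedding: number[];
}

// Provenance validation: all traceability fields must be present.
function hasProvenance(e: MemoryEntry): boolean {
  return Boolean(e.agentId && e.sessionId && e.timestamp && e.sourceTool);
}

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Semantic consistency: pass if the candidate is similar (> 0.35 per the
// table above) to at least one existing anchor in the namespace.
function semanticallyConsistent(candidate: number[], anchors: number[][], threshold = 0.35): boolean {
  return anchors.some((a) => cosine(candidate, a) > threshold);
}
```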

3.4 Threat Taxonomy

| Threat Type | Detection Method | Response Action | Severity |
| --- | --- | --- | --- |
| Prompt Injection | Signature matching + semantic classifier (instruction override patterns) | Block prompt, log event, notify Guardian | Critical |
| Memory Poisoning | Write-time integrity checks + read-time provenance verification | Quarantine entry, alert operator | Critical |
| Tool Abuse | Rate anomaly + unusual tool combination detector | Throttle tool access, escalate to Guardian | High |
| Privilege Escalation | Scope creep detection — monitor for access outside authorized manifest | Block request, suspend identity, audit log | Critical |
| Data Exfiltration | Volume anomaly + new destination host detection | Block outbound, suspend session, alert security team | Critical |
| Inter-Agent Collusion | CSS scoring — detects mutual reinforcement of bad decisions | Isolate agent pair, pause workflow, human review | High |

3.5 Guardian Agent Decision Matrix

The Guardian Agent processes incoming signals from all other components and selects an intervention action based on the combined signal severity and configured autonomy level.

| Observation Input | Assessment Score | Advisory Mode | Semi-Autonomous | Fully Autonomous |
| --- | --- | --- | --- | --- |
| Single low-severity anomaly | 1–2 | Log only | Log + notify | Log + notify |
| Repeated medium anomalies | 3–4 | Notify operator | Auto-throttle | Auto-throttle |
| Single high-severity event | 5–6 | Alert + recommend suspend | Auto-throttle, alert | Auto-suspend |
| Critical threat detected | 7–8 | Alert + recommend terminate | Auto-suspend, human confirms terminate | Auto-terminate |
| Confirmed exfiltration or injection | 9–10 | Emergency alert | Auto-terminate + block | Auto-terminate + block + quarantine |
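
A minimal sketch of the matrix as a lookup function. One simplification is an assumption: the "alert + recommend" advisory rows collapse to a single NOTIFY action here, with the recommendation carried out of band.

```typescript
type Autonomy = "ADVISORY" | "SEMI_AUTONOMOUS" | "FULLY_AUTONOMOUS";
type Action = "LOG_ONLY" | "NOTIFY" | "THROTTLE" | "SUSPEND" | "TERMINATE";

// Map a combined signal score (1-10) to an action per autonomy level,
// following the score bands in the decision matrix above.
function decide(score: number, autonomy: Autonomy): Action {
  if (score <= 2) return autonomy === "ADVISORY" ? "LOG_ONLY" : "NOTIFY";
  if (score <= 4) return autonomy === "ADVISORY" ? "NOTIFY" : "THROTTLE";
  if (score <= 6) {
    if (autonomy === "ADVISORY") return "NOTIFY"; // alert + recommend suspend
    return autonomy === "SEMI_AUTONOMOUS" ? "THROTTLE" : "SUSPEND";
  }
  if (score <= 8) {
    if (autonomy === "ADVISORY") return "NOTIFY"; // alert + recommend terminate
    return autonomy === "SEMI_AUTONOMOUS" ? "SUSPEND" : "TERMINATE";
  }
  // Score 9-10: confirmed exfiltration or injection.
  return autonomy === "ADVISORY" ? "NOTIFY" : "TERMINATE";
}
```

Keeping the matrix in one pure function makes every Guardian decision reproducible from the logged score and autonomy level, which matters for the audit trail.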

3.6 CSS and TUE Scoring Formulas

Component Synergy Score (CSS)

CSS = (1/n) × Σ [ w_i × outcome_i / max_outcome ] × (1 - collusion_penalty)

where n = number of agent interactions evaluated, w_i = interaction weight (higher for cross-domain tool calls), outcome_i = measured outcome quality score in [0,1], max_outcome = theoretical maximum for the task class, and collusion_penalty = a [0,1] score from the mutual-reinforcement detector. CSS range: [0,1]. CSS < 0.4 triggers Guardian review; CSS < 0.2 triggers automatic isolation.

Tool Utilization Efficacy (TUE)

TUE = (correct_tool_calls / total_tool_calls) × (1 - redundancy_ratio) × precision_weight

where correct_tool_calls = calls producing the expected outcome class (verified post-hoc), redundancy_ratio = duplicate or unnecessary calls / total calls, and precision_weight = 1 - (false_positive_tool_calls / total_tool_calls). TUE range: [0,1]. TUE < 0.5 triggers behavioral review; sustained TUE < 0.35 flags the agent for re-evaluation.
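
Both formulas transcribe directly into code. The input shapes below are assumptions for illustration, since the scorer's actual types are not specified in this section.

```typescript
// One evaluated agent interaction (shape is an assumption).
interface Interaction {
  weight: number;  // w_i, higher for cross-domain tool calls
  outcome: number; // outcome_i in [0,1]
}

// CSS = (1/n) * sum(w_i * outcome_i / max_outcome) * (1 - collusion_penalty)
function computeCSS(
  interactions: Interaction[],
  maxOutcome: number,
  collusionPenalty: number, // [0,1] from the mutual-reinforcement detector
): number {
  const n = interactions.length;
  const sum = interactions.reduce((s, i) => s + (i.weight * i.outcome) / maxOutcome, 0);
  return (sum / n) * (1 - collusionPenalty);
}

// TUE = (correct / total) * (1 - redundancy_ratio) * precision_weight
function computeTUE(
  correctCalls: number,
  totalCalls: number,
  redundantCalls: number,
  falsePositiveCalls: number,
): number {
  const redundancyRatio = redundantCalls / totalCalls;
  const precisionWeight = 1 - falsePositiveCalls / totalCalls;
  return (correctCalls / totalCalls) * (1 - redundancyRatio) * precisionWeight;
}
```

For example, two equally weighted interactions with outcomes 0.8 and 0.6 and no collusion penalty yield CSS = 0.7, above both intervention thresholds.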

18 Security Requirements

All requirements are traceable to specific compliance obligations (SOC 2 CC6, CC7, CC9; state DOI agent disclosure requirements) and threat taxonomy entries above.

REQ-SEC-001
Runtime behavioral baseline establishment. The system shall establish a per-agent behavioral baseline across configurable metric dimensions (minimum: file access frequency, tool usage distribution, token consumption rate, API call patterns) using a supervised observation window of at least 10 sessions before anomaly detection becomes active.
REQ-SEC-002
Configurable anomaly detection sensitivity. Anomaly detection thresholds shall be configurable per metric and per agent class (e.g., read-only agents vs. write-authorized agents vs. external-facing agents) without requiring code changes — through an operator-accessible configuration interface.
REQ-SEC-003
Ephemeral identity provisioning per session. Each agent session shall receive a unique, time-scoped identity credential generated at session creation. Credentials shall not be reusable across sessions. Credential TTL shall be configurable with a default maximum of 24 hours.
REQ-SEC-004
Credential rotation on configurable schedule. Agent credentials shall support scheduled rotation without session interruption. Rotation shall be triggered on schedule (configurable interval), on anomaly signal (automatic), or on operator command. Old credentials shall be revoked within 60 seconds of rotation completion.
REQ-SEC-005
Memory write integrity verification. All writes to the agent vector memory store shall pass through a four-stage integrity pipeline (semantic consistency, fact verification, provenance validation, anomaly scoring) before committing. Pipeline stage results shall be logged with the memory entry for audit traceability.
REQ-SEC-006
Semantic anomaly detection on stored memories. The memory integrity verifier shall compute an anomaly score for each write candidate using embedding distance from the agent's established memory cluster centroid. The anomaly threshold shall be configurable per agent namespace with a default of 4.5 standard deviations.
REQ-SEC-007
Quarantine workflow for suspicious memory entries. Memory entries failing integrity checks shall be routed to a quarantine collection rather than silently discarded. Quarantined entries shall be accessible for operator review with full provenance metadata. Operators shall be able to promote (approve) or permanently reject quarantined entries. The quarantine backlog shall be surfaced in the security dashboard.
REQ-SEC-008
Inter-agent collusion detection via CSS scoring. The system shall compute Component Synergy Score (CSS) for all multi-agent workflow interactions. CSS shall be evaluated against configurable thresholds. CSS scores below 0.4 shall trigger Guardian Agent review. CSS scores below 0.2 shall trigger automatic agent pair isolation pending human review.
REQ-SEC-009
Tool utilization efficacy tracking. The system shall compute Tool Utilization Efficacy (TUE) scores per agent across rolling 50-call windows. TUE scores shall be stored in a time-series collection for trend analysis. Sustained TUE below 0.35 over three consecutive windows shall flag the agent for behavioral re-evaluation.
REQ-SEC-010
Guardian agent with autonomous intervention capability. The Guardian Agent shall support three configurable autonomy levels (advisory, semi-autonomous, fully autonomous). Guardian decisions shall be logged as structured audit events. All autonomous actions (throttle, suspend, terminate) shall generate operator notifications within 30 seconds. Guardian configuration changes shall require elevated operator authorization.
REQ-SEC-011
Prompt injection detection at runtime. The threat detection engine shall implement both signature-based detection (known injection patterns) and semantic classification (embedding-based instruction override detection) for prompt injection. Detection shall occur before tool execution, not post-hoc. Detected injections shall block execution and generate a Critical threat event.
REQ-SEC-012
Memory poisoning detection and remediation. The system shall detect memory poisoning at both write time (integrity pipeline) and read time (provenance verification on retrieval). Detected poisoning events shall trigger: (1) quarantine of the affected entry, (2) review of adjacent entries written in the same session, (3) Guardian Agent notification with threat score.
REQ-SEC-013
Data exfiltration pattern detection. The behavioral monitor shall track outbound data volume per session, destination hosts, and data type distribution. Any outbound call to a host not present in the agent's established baseline and not in the organizational allowlist shall generate an immediate Critical threat event. Volume spikes exceeding 3 standard deviations above baseline shall generate a High threat event.
REQ-SEC-014
Privilege escalation monitoring. The identity lifecycle manager shall enforce scope boundaries on every tool call by cross-referencing the requested access against the session's authorized scope definition. Any access attempt outside authorized scope shall be blocked, logged, and escalated to the Guardian Agent as a Critical severity event regardless of other behavioral indicators.
REQ-SEC-015
Real-time threat dashboard. The system shall provide an operator-facing security dashboard showing: active agent sessions with current risk scores, recent threat events (last 24 hours), quarantine backlog count, Guardian Agent action history, CSS and TUE trends per agent, and identity lifecycle status. Dashboard data shall refresh at intervals not exceeding 30 seconds.
REQ-SEC-016
Governance audit bus integration. All security events (anomaly detections, threat events, identity state transitions, memory integrity failures, Guardian Agent actions, quarantine events) shall be emitted to the governance audit bus as structured events with standardized schema. Events shall be immutable once written. The audit bus integration shall support at-least-once delivery semantics.
REQ-SEC-017
Forensic investigation toolkit. The system shall provide a forensic replay capability allowing operators to reconstruct an agent's session timeline from audit events, including: tool calls in sequence, memory reads and writes, identity state transitions, anomaly signals received, and Guardian interventions. Replay shall be available for any session within the configurable retention window (minimum 90 days).
REQ-SEC-018
Insurance compliance reporting. The system shall generate structured compliance reports suitable for SOC 2 Type II evidence packages (CC6.1, CC6.3, CC6.7, CC7.1, CC7.2, CC7.3, CC9.1) and state DOI agent disclosure requirements. Reports shall be exportable in machine-readable JSON and human-readable PDF format. Report generation shall be auditable (logged as governance events) and support digital signing.

Copy-Ready Claude Code Prompt

Paste this into Claude Code to scaffold the runtime security layer. The prompt references existing governance infrastructure and expects a TypeScript/Node.js environment with Qdrant vector storage.

Claude Code — Agent Runtime Security & Identity
Build the Agent Runtime Security & Identity system for an insurance company AI platform. This is a TypeScript/Node.js implementation. Use Qdrant as the vector store (already running), emit events to a governance audit bus (NATS/JetStream), and integrate with the existing agent manifest system.

## System Architecture

Build six components as independent modules with a shared event bus interface:

### 1. Behavioral Monitor (`src/security/behavioral-monitor.ts`)

- Class `BehavioralMonitor` with methods: `recordMetric(agentId, sessionId, metric, value)`, `getBaseline(agentId, metric)`, `checkAnomaly(agentId, metric, value): AnomalyResult`
- Store baselines in Qdrant collection `agent_behavioral_baselines`
- Baseline structure: { agentId, metric, mean, stddev, p95, sampleCount, lastUpdated }
- Anomaly detection: z-score calculation, configurable threshold per metric class
- Metrics to track: FILE_ACCESS_FREQUENCY, TOOL_USAGE_DISTRIBUTION, TOKEN_CONSUMPTION_RATE, API_CALL_PATTERNS, DIRECTORY_ACCESS_BREADTH, EXTERNAL_NETWORK_CALLS, SESSION_DURATION, ERROR_RATE
- Emit `behavioral.anomaly.detected` events with: agentId, sessionId, metric, observedValue, baselineValue, zScore, severity
- Minimum 10 sessions before anomaly detection becomes active (configurable via BASELINE_MIN_SESSIONS env)

### 2. Identity Lifecycle Manager (`src/security/identity-lifecycle.ts`)

- Class `IdentityLifecycleManager` with state machine: PROVISION → AUTHENTICATE → AUTHORIZE → MONITOR → SUSPEND → REVOKE
- Methods: `provision(agentId, taskScope): SessionIdentity`, `authenticate(sessionId, proof): boolean`, `authorize(sessionId, requestedScope): AuthResult`, `suspend(sessionId, reason)`, `revoke(sessionId, reason)`
- Store identity state in Qdrant collection `agent_identity_sessions`
- SessionIdentity: { sessionId, agentId, token, scope, state, issuedAt, expiresAt, rotationSchedule }
- Token TTL configurable via IDENTITY_TOKEN_TTL_SECONDS env (default 86400)
- Automatic rotation: schedule-based (configurable) + anomaly-triggered
- Emit state transition events to audit bus: `identity.state.transitioned`
- Scope enforcement: every tool call validates against session scope definition

### 3. Memory Integrity Verifier (`src/security/memory-integrity.ts`)

- Class `MemoryIntegrityVerifier` with pipeline: semantic consistency → fact verification → provenance validation → anomaly scoring
- Method `verifyWrite(entry: MemoryEntry): VerificationResult` — passes through pipeline, returns pass/quarantine/reject
- Semantic consistency: cosine similarity against existing memories in namespace, threshold configurable
- Fact verification: cross-reference against `knowledge_anchors` collection in Qdrant
- Provenance validation: required fields — agentId, sessionId, timestamp, sourceTool, taskContext
- Anomaly scoring: Mahalanobis distance from memory cluster centroid (default threshold: 4.5)
- Quarantine collection: `memory_quarantine` in Qdrant with full metadata preserved
- Method `reviewQuarantine(entryId, decision: 'promote'|'reject', operatorId)`: promote moves to main store, reject moves to `memory_rejected`
- Emit events: `memory.integrity.failed`, `memory.quarantined`, `memory.promoted`, `memory.rejected`

### 4. Inter-Agent Coordination Scorer (`src/security/coordination-scorer.ts`)

- Class `CoordinationScorer` implementing CSS and TUE scoring
- CSS formula: `(1/n) * sum(w_i * outcome_i / max_outcome) * (1 - collusion_penalty)`
- collusion_penalty: detect mutual reinforcement — if agent A's output directly copies agent B's output pattern in >60% of interactions, apply penalty 0.3
- TUE formula: `(correct_tool_calls / total_tool_calls) * (1 - redundancy_ratio) * precision_weight`
- Track over rolling 50-call windows
- Methods: `recordInteraction(agentA, agentB, sessionId, outcome)`, `computeCSS(agentPairId): number`, `computeTUE(agentId): number`
- Store scores in Qdrant collection `coordination_scores` time-series
- Emit `coordination.css.threshold.breach` when CSS < 0.4
- Emit `coordination.css.critical.breach` when CSS < 0.2 (triggers isolation)
- Emit `coordination.tue.degraded` when TUE < 0.35 for 3+ consecutive windows

### 5. Threat Detection Engine (`src/security/threat-detection.ts`)

- Class `ThreatDetectionEngine` with five detectors as separate sub-modules
- PromptInjectionDetector: signature patterns array (configurable) + embedding similarity to `injection_signatures` collection in Qdrant — returns ThreatEvent before tool execution
- MemoryPoisoningDetector: write-time (hooks into MemoryIntegrityVerifier) + read-time provenance check
- ToolAbuseDetector: rate monitor (sliding window) + tool combination pattern detector
- PrivilegeEscalationDetector: scope boundary enforcement, cross-references session AuthResult
- DataExfiltrationDetector: outbound volume tracking, destination host allowlist check
- All detectors emit structured ThreatEvent: { threatId, type, severity, agentId, sessionId, timestamp, evidence, recommendedAction }
- Severity levels: LOW, MEDIUM, HIGH, CRITICAL
- CRITICAL events: auto-notify Guardian Agent, never require polling

### 6. Guardian Agent (`src/security/guardian-agent.ts`)

- Class `GuardianAgent` — subscribes to all security events from other components
- Autonomy levels: ADVISORY, SEMI_AUTONOMOUS, FULLY_AUTONOMOUS (configurable via GUARDIAN_AUTONOMY_LEVEL env)
- Decision matrix implementation — input: combined signal score (1-10), output: action per autonomy level
- Actions: LOG_ONLY, NOTIFY, THROTTLE (rate-limit tool calls), SUSPEND (call IdentityLifecycleManager.suspend), TERMINATE (revoke identity, end session)
- Method `processEvent(event: SecurityEvent): GuardianDecision`
- All Guardian actions logged as `guardian.action.taken` audit events
- Operator notifications via webhook (configurable GUARDIAN_WEBHOOK_URL) within 30 seconds of action
- Audit log: store in Qdrant collection `guardian_audit_log`

## Shared Infrastructure

### Audit Bus (`src/security/audit-bus.ts`)

- NATS/JetStream client wrapper
- Method `emit(eventType: string, payload: object)`: publish with at-least-once semantics
- Method `subscribe(eventType: string, handler)`: durable consumer
- Standardized event schema: { eventId, eventType, timestamp, agentId, sessionId, severity, payload }
- Immutable once written (append-only stream)

### Security Dashboard API (`src/api/security-dashboard.ts`)

- Express routes for dashboard data
- GET /api/security/sessions — active sessions with risk scores
- GET /api/security/threats — recent threat events (24h, paginated)
- GET /api/security/quarantine — quarantine backlog
- GET /api/security/guardian/actions — Guardian action history
- GET /api/security/scores/:agentId — CSS and TUE trends
- POST /api/security/quarantine/:entryId/review — promote or reject quarantined memory
- All routes require operator auth (JWT with security_admin scope)
- WebSocket endpoint for real-time event streaming (refresh <= 30s)

### Forensic Replay (`src/security/forensic-replay.ts`)

- Class `ForensicReplay` reconstructs session timeline from audit events
- Method `replaySession(sessionId, timeRange?): SessionTimeline`
- SessionTimeline: ordered array of { timestamp, eventType, data } covering all tool calls, memory operations, identity transitions, anomalies, Guardian actions
- Store forensic data with 90-day retention (configurable via FORENSIC_RETENTION_DAYS)

### Compliance Reporter (`src/security/compliance-reporter.ts`)

- Class `ComplianceReporter` generating SOC 2 and DOI evidence reports
- Method `generateSOC2Report(period: DateRange, controls: string[]): ComplianceReport`
- Controls: CC6.1, CC6.3, CC6.7, CC7.1, CC7.2, CC7.3, CC9.1
- Method `generateDOIReport(period: DateRange, jurisdiction: string): ComplianceReport`
- Output: JSON (machine-readable) and PDF (via pdfkit)
- All report generations logged as audit events and support digital signing (node-forge)

## Qdrant Collections Required

- `agent_behavioral_baselines` — behavioral baseline stats
- `agent_identity_sessions` — identity lifecycle state
- `memory_quarantine` — quarantined memory entries
- `memory_rejected` — permanently rejected entries
- `knowledge_anchors` — verified facts for memory integrity
- `injection_signatures` — prompt injection embedding signatures
- `coordination_scores` — CSS/TUE time-series
- `guardian_audit_log` — all Guardian actions
- `forensic_events` — raw event stream for replay

## Environment Variables

```
QDRANT_URL=http://localhost:6334
QDRANT_API_KEY=
NATS_URL=nats://localhost:4222
GUARDIAN_AUTONOMY_LEVEL=SEMI_AUTONOMOUS
GUARDIAN_WEBHOOK_URL=
IDENTITY_TOKEN_TTL_SECONDS=86400
BASELINE_MIN_SESSIONS=10
FORENSIC_RETENTION_DAYS=90
INJECTION_SIMILARITY_THRESHOLD=0.85
MEMORY_ANOMALY_THRESHOLD=4.5
```

## Integration Points

- Import existing agent manifest schema from `../governance/manifest-schema`
- Emit all events to existing `governance.audit` NATS subject (prefix security events with `security.`)
- The MemoryIntegrityVerifier should be called by existing memory write paths — export a middleware hook
- Guardian Agent subscribes to `security.>` wildcard subject

Write production-ready TypeScript. No stubs. Full error handling. Each class should be independently testable. Export types from `src/security/types.ts`. Include a `src/security/index.ts` barrel export.

Key Trade-offs and Rationale

👁️

Agent-Side vs. Observer-Side Monitoring

Decision: Observer-side (external monitor, not instrumented into the agent).

Rationale: Agent-side instrumentation is easier to disable or bypass — a compromised agent can simply stop reporting. An external observer running in a separate process with separate credentials cannot be silenced by the monitored agent. Trade-off: slightly higher latency for anomaly detection (one event loop cycle delay), accepted in exchange for tamper resistance.

📐

Statistical vs. Rule-Based Anomaly Detection

Decision: Statistical (z-score + distribution comparison) with configurable rule overlays.

Rationale: Pure rule-based detection requires anticipating every possible attack pattern — intractable for novel threats. Statistical detection catches unknown deviations from established behavior. Rules are added for known-bad patterns (e.g., specific injection signatures) where statistical detection would be too slow. The hybrid approach catches both novel and known threats.

🤖

Guardian Agent Autonomy Level

Decision: Default to SEMI_AUTONOMOUS; FULLY_AUTONOMOUS requires explicit operator opt-in.

Rationale: Insurance environments have strict audit requirements — unexplained automated terminations would create regulatory exposure. Semi-autonomous mode allows auto-throttle and auto-suspend (recoverable actions) but requires human confirmation for termination (irreversible). FULLY_AUTONOMOUS is available for high-throughput non-critical agent workloads where operator review latency is unacceptable.

🔬

Memory Quarantine vs. Rejection

Decision: Quarantine (hold for review) rather than silent rejection for anomalous entries.

Rationale: False positives in anomaly detection are inevitable. Silent rejection would cause data loss that is invisible to operators — agents would appear to be functioning normally while missing important context. Quarantine creates a reviewable audit trail, allows operators to tune detection thresholds by examining false positive patterns, and preserves evidence of potential poisoning attempts for forensic analysis.

📊

CSS Scoring Methodology

Decision: Outcome-weighted CSS with explicit collusion penalty rather than pure output similarity.

Rationale: Pure output similarity scoring would flag legitimate specialization (two agents developing complementary expertise in the same domain). The collusion penalty specifically targets the bad pattern: agents echoing each other's outputs without independent reasoning. Weighting by outcome quality ensures the score reflects business value, not just behavioral diversity.

⚖️

Ephemeral Credentials vs. Scoped Persistent Tokens

Decision: Ephemeral per-session credentials with task-scoped authorization.

Rationale: Persistent tokens — even narrowly scoped — accumulate risk as agents accumulate sessions. A single compromised token gives an attacker the full credential lifetime. Ephemeral credentials limit the blast radius of any individual compromise to one session. The operational overhead of provisioning per session is acceptable given modern secret management tooling; the security benefit is substantial.

Connections to Existing BulletproofSoftware Infrastructure

The runtime security layer is designed as an additive layer — it extends existing components without requiring changes to their core logic. Integration is achieved through event subscriptions, middleware hooks, and shared Qdrant collections.

⚙️

Governance System

Extends agent manifests with identity_lifecycle configuration block (TTL, rotation schedule, autonomy level). The IdentityLifecycleManager reads manifest scope definitions to build session authorization tokens. All security events are emitted to the existing governance audit bus under the security.* subject namespace. Threat events reference manifest version for traceability.

🧠

Memory System (claude-memory-mcp)

The MemoryIntegrityVerifier hooks into memory write paths via exported middleware. All writes to the 45+ Qdrant collections pass through the four-stage integrity pipeline. The memory_quarantine and knowledge_anchors collections are managed by this PRD's system and consumed by the memory MCP tools. Read-time provenance verification integrates with memory_recall tool.

🔀

Orchestration Layer

The Guardian Agent registers as a special-role oversight participant in the agent hierarchy. It receives the agent execution graph at workflow start and establishes monitoring subscriptions for all participating agents. Guardian intervention actions (throttle, suspend, terminate) are delivered to the orchestration layer via command channel, not direct process signals — maintaining the observer separation principle.

🔎

Code Assurance System

Static security analysis from Code Assurance is extended with runtime behavioral data. Agents flagged at static analysis time receive tighter anomaly detection thresholds (lower z-score alerts). Runtime behavioral profiles are fed back to Code Assurance to improve future static analysis rulesets — closing the loop between pre-deployment scanning and live behavioral data.

📡

Data Plane

The DataExfiltrationDetector integrates with the data plane's network egress monitoring. Allowlists are synchronized from the data plane's approved destination registry. Volume thresholds are calibrated against the data plane's normal traffic baselines. Exfiltration alerts trigger data plane-level blocking (not just agent suspension) for immediate containment in parallel with Guardian Agent response.

📋

Compliance & Reporting

The ComplianceReporter pulls evidence from the governance audit bus, forensic event store, identity lifecycle logs, and memory quarantine history. SOC 2 control evidence packages reference specific audit event IDs for examiner traceability. DOI agent disclosure reports use the behavioral baseline data to demonstrate that agent behavior is predictable and bounded — a key regulatory requirement for automated claims processing approval.

Event Schema Compatibility

All security events conform to the governance system's CloudEvents-compatible schema. New event types are registered in the governance event registry before deployment. Existing audit consumers do not require modification — they receive security events as first-class governance audit entries and can filter by event type. The security.* subject prefix allows opt-in subscription for security-specific consumers (SIEM integrations, compliance tools) without polluting the main audit stream.
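
As a sketch of the standardized event shape and the security.* subject prefixing described above (the `makeEvent` helper and its parameters are illustrative, not part of the governance registry):

```typescript
import { randomUUID } from "node:crypto";

// Standardized security event envelope, following the shared schema
// { eventId, eventType, timestamp, agentId, sessionId, severity, payload }.
interface SecurityEvent {
  eventId: string;
  eventType: string; // e.g. "security.memory.quarantined"
  timestamp: string; // ISO 8601
  agentId: string;
  sessionId: string;
  severity: "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
  payload: Record<string, unknown>;
}

// Build an event, enforcing the security.* prefix so consumers can
// opt in via subject filtering without touching the main audit stream.
function makeEvent(
  eventType: string,
  agentId: string,
  sessionId: string,
  severity: SecurityEvent["severity"],
  payload: Record<string, unknown>,
): SecurityEvent {
  return {
    eventId: randomUUID(),
    eventType: eventType.startsWith("security.") ? eventType : `security.${eventType}`,
    timestamp: new Date().toISOString(),
    agentId,
    sessionId,
    severity,
    payload,
  };
}
```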

Integration surface quick reference:

Governance audit bus (NATS)
Agent manifest schema
Qdrant vector store
Memory write middleware
Orchestration command channel
Code assurance event feed
Data plane egress registry
Compliance evidence store
SIEM webhook adapter
DOI reporting API