PRD 10 of 19

Agent Economics
& Cost Governance

Per-agent cost tracking, budget enforcement, model routing, and ROI dashboards — so insurance companies always know exactly what their AI factory costs and why.

14 Requirements

6 Core Components

3 Model Tiers

<100ms Metering Latency

Section 01

Problem Statement

AI agents consume tokens, compute, and external API calls at scale — but today there is zero visibility into what any of it costs.

Critical Gap

An agent stuck in a loop can burn through a monthly budget in minutes. Without real-time enforcement, a single runaway agent can generate thousands of dollars in unexpected API costs before anyone notices.

📊

No Cost Visibility

Teams have no idea what each agent costs per task, per session, or per project. LLM invoices are a black box — total spend with no attribution.

🚨

No Budget Guardrails

Nothing prevents an agent from exceeding any spending threshold. There are no caps, no warnings, and no automatic pauses when costs spike.

🗺️

No Model Routing

Every task uses the same model regardless of complexity. Formatting a JSON file gets billed at the same rate as a multi-file architectural review.

📉

No ROI Justification

Insurance companies need to show value to procurement. Without cost-per-feature tracking, there's no data to justify AI development investment.

Insurance Context

Insurance companies operate under strict cost controls and fiduciary obligations. Unpredictable AI infrastructure spend is a blocker for procurement approval. Cost predictability — not just performance — is a first-class requirement for this sector.

Section 02

Architecture Overview

Six purpose-built components form the economics layer, sitting between the agent orchestration plane and the external LLM providers.

⚡

Cost Metering Engine

Intercepts every LLM call, tool invocation, and external API request. Records token counts (input/output/cache), model used, latency in milliseconds, and computed dollar cost per interaction. Emits structured cost events to the audit bus with sub-100ms overhead.

🛑

Budget Controller

Maintains per-project and per-agent budget caps with configurable threshold actions. Warns at 80%, throttles new task dispatch at 90%, auto-pauses at 100% with graceful task handoff — saving context and state so work can resume when the budget resets.

🔀

Model Router

Routes every task to the cheapest model capable of completing it. Uses complexity signals extracted from task metadata: token length estimate, tool call count, code diff size, and task classification labels. Haiku for boilerplate; Sonnet for standard; Opus for architecture.

💾

Token Optimizer

Reduces token consumption through semantic caching (unchanged files are not re-read), batched tool calls (multiple reads in one round-trip), and context window management that prunes stale conversation history before it inflates costs.

📈

Economics Dashboard

Real-time cost visualization by agent, project, and session. Displays current spend vs. budget, cost-per-feature metrics, trend charts with rolling 30-day history, and spend forecasting based on current burn rate. Built for both engineering teams and finance stakeholders.

🏢

Chargeback Engine

Allocates AI infrastructure costs to internal projects, departments, and cost centers for insurance company internal billing. Generates monthly chargeback reports with line-item detail by project and feature, suitable for finance system ingestion.

Data Flow

Agent makes LLM call → Metering Engine intercepts and records → Budget Controller checks cap → if within budget, call proceeds → response returns with cost appended → event published to audit bus → Dashboard updates in real time → Chargeback Engine accumulates for period-end reporting.

Section 03

Key Components

Detailed specifications for each subsystem, including pricing models, routing rules, threshold actions, caching strategies, and cost allocation taxonomy.

3.1 — Model Tier Pricing

Tier	Model	Input (per 1M tokens)	Output (per 1M tokens)	Cache Read	Best For
Tier 1 — Haiku	claude-haiku-3-5	$0.80	$4.00	$0.08	Formatting, linting, docstrings, boilerplate generation, simple Q&A
Tier 2 — Sonnet	claude-sonnet-4-5	$3.00	$15.00	$0.30	Standard implementation, bug fixes, test generation, API integration
Tier 3 — Opus	claude-opus-4-5	$15.00	$75.00	$1.50	Architecture design, security review, complex reasoning, cross-system planning

3.2 — Model Routing Decision Matrix

Task Type	Complexity Signal	Est. Token Range	Routed To	Fallback
Code formatting / linting	Single-file, no reasoning	<2K tokens	Haiku	—
Docstring / comment generation	Isolated, low context	<4K tokens	Haiku	—
Unit test generation	Single function, clear spec	2K–8K tokens	Haiku	Sonnet
Bug fix (isolated)	Stack trace + single file	4K–12K tokens	Sonnet	—
Feature implementation	Multi-file, spec provided	8K–32K tokens	Sonnet	—
API integration	External schema + mapping	8K–24K tokens	Sonnet	—
Security/compliance review	Multi-file, cross-cutting	16K–64K tokens	Opus	—
Architecture design	System-level reasoning	12K–48K tokens	Opus	—
Complex multi-system bug	Cross-repo, unknown root cause	>24K tokens	Opus	—

3.3 — Budget Threshold Actions

Threshold	Trigger Condition	Automated Action	Notification	Override Allowed?
60%	Agent crosses 60% of session cap	None — monitoring only	Info log	N/A
80%	Agent crosses 80% of cap	Downgrade routing tier by one level	Dashboard alert	Yes, by project admin
90%	Agent crosses 90% of cap	Throttle: pause new task dispatch, complete active task	Slack + email	Yes, with explicit approval
100%	Budget exhausted	Hard pause: save state, queue remaining tasks, halt billing	All channels + webhook	Only by billing admin
Anomaly	Agent exceeds 10× baseline cost/task	Immediate pause + anomaly flag on audit trail	PagerDuty / webhook	After investigation sign-off

3.4 — Caching Strategies & Hit Rate Targets

Cache Layer	What Is Cached	TTL	Hit Rate Target	Estimated Savings
Prompt Cache (native)	System prompts, static context blocks	5 minutes (sliding)	≥ 70%	90% reduction on cache reads vs. full input
File Content Cache	Unchanged file reads (by git SHA)	Until next commit	≥ 60%	Eliminates redundant file-read tokens
Semantic Result Cache	Equivalent task outputs (cosine similarity ≥ 0.95)	24 hours	≥ 20%	Full LLM call cost avoided
Tool Response Cache	External API responses (read-only endpoints)	Configurable (default 1 hour)	≥ 40%	Avoids external API call costs + latency
Context Compression	Long conversation history → compressed summary	Triggered at 75% context window	N/A — always runs	20–40% reduction in context tokens

3.5 — Cost Allocation Taxonomy

Dimension	Granularity	Example Values	Used For
Organization	Top level	Acme Insurance Group	Enterprise billing, contract compliance
Department	Business unit	Underwriting, Claims, IT	Internal chargeback, budget approval
Project	Initiative or product	Claims Portal v2, Policy API	Project ROI calculation, PO reconciliation
Feature	User story / ticket	FEAT-142: Auto-approval rules	Cost-per-feature, sprint cost tracking
Agent	Individual agent instance	orchestrator-7f2a, coder-3b9c	Agent efficiency analysis, anomaly detection
Session	Single work session	session-2026-04-02-0912	Per-run cost audit, replay analysis

Section 04

Requirements

Fourteen formally specified requirements for the Agent Economics & Cost Governance subsystem.

REQ-ECON-001

Per-Agent Cost Tracking

Every agent instance must have an associated cost ledger that accumulates spend by LLM call, tool use, and external API call. Cost records must be queryable by agent ID with millisecond-precision timestamps.

REQ-ECON-002

Per-Session and Per-Project Cost Tracking

Cost data must be aggregatable at session scope (single agent run) and project scope (all agents across all sessions for a given project). All three scopes — agent, session, project — must be independently queryable without recomputation.

REQ-ECON-003

Real-Time Budget Enforcement with Configurable Caps

Budget caps must be enforceable at the agent level and the project level. Caps are configurable per project by administrators. Enforcement must apply within one LLM call cycle — no cost overrun exceeding one call's maximum cost beyond the cap.

REQ-ECON-004

Automatic Model Routing Based on Task Complexity

The Model Router must classify every task prior to dispatch and select the cheapest model tier capable of completing it. Routing decisions must be logged with the signals used. Manual tier override must be available with audit trail entry.

REQ-ECON-005

Semantic Caching with Configurable TTL

The Token Optimizer must maintain a semantic cache with per-cache-layer TTL configuration. Cache hits must be logged as cost savings events. Cache invalidation must be triggered by code changes (via git SHA tracking) and configurable time expiry.

REQ-ECON-006

Cost Anomaly Detection

The system must detect agents consuming more than 10× their rolling 7-day average cost per task and immediately pause the agent pending review. Anomaly events must be emitted to the governance audit bus with full context (agent ID, task ID, cost, baseline).

REQ-ECON-007

Chargeback Reporting by Project and Department

The Chargeback Engine must generate monthly cost allocation reports broken down by organization, department, project, and feature. Reports must be exportable as CSV and JSON. Report generation must complete within 60 seconds for up to 12 months of history.

REQ-ECON-008

ROI Calculation

For each completed feature, the system must calculate and store: AI cost incurred, estimated manual development time (configurable hours-per-story-point baseline), estimated manual cost (configurable hourly rate), and computed ROI ratio. ROI data must be surfaced in the dashboard.

REQ-ECON-009

Dashboard with Real-Time Cost Visualization

The Economics Dashboard must update in real time (≤2 second lag from event emission). It must display current spend vs. budget for all active projects, per-agent cost breakdown for the current session, and a live cost rate ($/hour) indicator.

REQ-ECON-010

Historical Cost Trending and Forecasting

The dashboard must display a minimum of 90 days of historical cost data with daily granularity. Forecasting must project end-of-period spend based on current burn rate with configurable forecast horizon (7, 14, 30 days). Forecast confidence intervals must be displayed.

REQ-ECON-011

Integration with Governance Audit Bus for Cost Events

All cost events — budget warnings, threshold breaches, anomaly detections, cap enforcements, and routing decisions — must be published to the central governance audit bus as structured events with standardized schema. Events must be immutable once written.

REQ-ECON-012

Budget Inheritance (Project to Agent)

Project-level budgets must automatically distribute to individual agents based on configurable allocation strategies: equal split, weighted by agent role, or dynamic (agents claim from a shared project pool). Unspent agent allocations must return to the project pool at session end.

REQ-ECON-013

Graceful Degradation on Budget Exhaustion

When an agent's budget is exhausted, the system must: complete the current in-flight LLM call, checkpoint agent state to durable storage, queue remaining tasks with their full context, and emit a resumption event when budget is restored. No work in progress may be silently dropped.

REQ-ECON-014

Alert Channels (Slack / Email / Webhook on Budget Thresholds)

Threshold alerts (warn, throttle, pause, anomaly) must be deliverable via: Slack (configurable channel per project), email (configurable recipient list), and outbound webhook (configurable URL with HMAC signature). Alert delivery must be retried up to 3× on failure with exponential backoff.

Section 05

Prompt to Build It

Copy this prompt into Claude Code to scaffold the Agent Economics & Cost Governance subsystem. It references the architecture and requirements above — use it as-is or adjust cost baseline values for your environment.

You are building the Agent Economics & Cost Governance subsystem for an autonomous AI development factory targeting insurance companies.

The system must implement the following components with full production code — no stubs, no TODOs:

## 1. Cost Metering Engine

Create `src/economics/metering_engine.ts` that:
- Intercepts every LLM API call via a wrapping proxy class `MeteredLLMClient`
- Records: agent_id, session_id, project_id, feature_id, model, input_tokens, output_tokens,
  cache_read_tokens, latency_ms, cost_usd (computed), timestamp (ISO 8601), call_id (UUID v4)
- Publishes a structured `cost_event` to the governance audit bus within 100ms of call completion
- Exposes `getCostLedger(agentId: string): CostLedger` for real-time querying
- Token costs (per 1M): Haiku input $0.80 / output $4.00 / cache $0.08;
  Sonnet input $3.00 / output $15.00 / cache $0.30;
  Opus input $15.00 / output $75.00 / cache $1.50

## 2. Budget Controller

Create `src/economics/budget_controller.ts` that:
- Maintains in-memory budget state for all active agents and projects (backed by Redis for HA)
- Enforces configurable caps: project-level (total), agent-level (per session)
- Implements threshold actions at 60% (log), 80% (downgrade model tier), 90% (throttle dispatch),
  100% (hard pause + checkpoint state)
- Anomaly detection: if agent cost-per-task exceeds 10× its 7-day rolling average, auto-pause
- Exposes `checkBudget(agentId, estimatedCost): BudgetDecision` returning ALLOW | WARN | THROTTLE | BLOCK
- Implements budget inheritance: projects distribute to agents via EQUAL | WEIGHTED | POOL strategies

## 3. Model Router

Create `src/economics/model_router.ts` that:
- Accepts a `TaskDescriptor` (type, estimated_tokens, file_count, tool_call_count, requires_reasoning)
- Returns a `RoutingDecision` (model_tier, model_id, rationale, estimated_cost_usd)
- Routing rules:
  - estimated_tokens < 4K AND single_file → Haiku
  - estimated_tokens 4K–32K AND requires_reasoning=false → Sonnet
  - estimated_tokens > 32K OR requires_reasoning=true OR type in [ARCHITECTURE, SECURITY_REVIEW] → Opus
  - Manual override available with audit log entry
- Logs every routing decision with signals used to the audit bus

## 4. Token Optimizer

Create `src/economics/token_optimizer.ts` that:
- Implements a file content cache keyed by (path, git_sha) — invalidates on commit
- Implements a semantic result cache using cosine similarity (threshold 0.95) with 24h TTL
- Implements context compression: when conversation exceeds 75% of model context window,
  summarize oldest 50% of turns into a compressed block
- Implements batched tool calls: collect tool requests within a 50ms window, dispatch as batch
- Records cache hits as `cost_savings_event` on the audit bus with estimated savings_usd

## 5. Chargeback Engine

Create `src/economics/chargeback_engine.ts` that:
- Aggregates cost ledger data by: organization → department → project → feature → agent → session
- Generates monthly reports as `ChargebackReport` objects with line-item detail
- Exports reports as CSV and JSON
- Calculates ROI: cost_ai_usd, estimated_manual_hours (story_points × hours_per_point baseline),
  estimated_manual_cost_usd (hours × hourly_rate), roi_ratio (manual_cost / ai_cost)
- Baselines configurable: default 4 hours/story-point, $150/hour developer rate

## 6. Economics Dashboard API

Create `src/economics/dashboard_api.ts` that exposes:
- GET /economics/live — real-time cost rate ($/hr), active agents, current spend vs. budget
- GET /economics/projects/:id — project cost breakdown with budget status
- GET /economics/agents/:id/session — current session cost by agent
- GET /economics/trends?days=30 — historical daily costs with 14-day forecast
- GET /economics/chargeback?month=YYYY-MM — chargeback report for period
- WebSocket /economics/stream — real-time cost event stream for dashboard

## 7. Alert Dispatcher

Create `src/economics/alert_dispatcher.ts` that:
- Listens for threshold and anomaly events from Budget Controller
- Dispatches to: Slack (webhook URL per project), email (configurable recipients via nodemailer),
  outbound webhook (HMAC-SHA256 signed payload)
- Retries failed deliveries up to 3× with exponential backoff (1s, 2s, 4s)
- Logs all alert delivery attempts (success/failure) to audit trail

## 8. Integration

Wire all components in `src/economics/index.ts`:
- Export a single `EconomicsLayer` class that initializes all subsystems
- Accept config: `EconomicsConfig` with budget caps, model prices, cache TTLs, alert channels
- All cost events must flow through the central `GovernanceAuditBus` from the governance module
- Integrate with the Orchestration layer: wrap agent dispatch in `EconomicsLayer.authorizeDispatch()`

## Technical requirements:
- TypeScript strict mode, no `any` types
- All monetary values stored as integer cents (not floats) to avoid rounding errors
- Redis for budget state (ioredis client), PostgreSQL for cost ledger (via existing DB layer)
- All public methods must have JSDoc with parameter and return type documentation
- Unit tests in `src/economics/__tests__/` using Vitest, ≥80% coverage
- No hardcoded secrets — all config via environment variables

Section 06

Design Decisions

Key architectural trade-offs made in designing the economics layer, with rationale for each choice.

Real-Time vs. Batched Cost Calculation

Option

Pros

Cons

✓ Real-time (chosen)

Immediate budget enforcement; dashboard accuracy; anomaly detection within one call

Adds ~3–8ms per LLM call for cost computation

Batched (rejected)

Zero per-call overhead; simpler implementation

Budget overruns possible; 1–5 minute lag; useless for real-time enforcement

Insurance companies require real-time budget control. A 5-minute lag on enforcement is unacceptable when an agent can burn $50 in a single runaway session. The 3–8ms overhead is negligible against LLM call latency of 1,000–15,000ms.

Centralized vs. Distributed Metering

Option

Pros

Cons

✓ Centralized (chosen)

Single source of truth; consistent cost model; easier audit; simpler budget aggregation

Single point of failure; potential bottleneck at very high agent counts

Distributed (rejected)

Higher throughput; no single failure point

Cost reconciliation lag; split-brain budget state; complex audit trail reconstruction

The current scale (10s–100s of concurrent agents) does not require distributed metering. Centralized metering through a Redis-backed Budget Controller is operationally simpler and provides the consistency guarantees needed for financial audit trails. Can be sharded horizontally if needed at 1,000+ agents.

Model Routing: Heuristics vs. Learned Routing

Option

Pros

Cons

✓ Heuristics (chosen)

Predictable; auditable; no training data required; zero routing latency overhead

May route sub-optimally for edge cases; requires manual rule updates

ML-based routing (deferred)

Potentially more accurate; adapts over time

Requires training data (months); black-box decisions; regulatory audit concerns for insurance

Insurance regulators expect explainable decision-making at every layer. An ML routing model that cannot articulate why it chose Opus for a task creates compliance risk. Heuristic rules are auditable, versioned in code, and produce deterministic routing logs. ML routing is a future enhancement after sufficient operational data is collected.

Caching Granularity: Token-Level vs. Semantic-Level

Option

Pros

Cons

✓ Both layers (chosen)

Maximum savings: native prompt cache for identical prefixes + semantic cache for equivalent tasks

Two cache systems to maintain; semantic cache adds embedding computation cost

Token-level only

Simpler; no embedding overhead; 100% cache precision

Misses semantically equivalent tasks with different phrasing; lower hit rates

The embedding cost for semantic cache lookup (Haiku embedding, ~$0.00002 per query) is negligible against the potential savings from a full LLM call cache hit ($0.05–$2.00). Both layers together target 50–70% effective cache utilization across a mature project's workload.

Section 07

Integration Points

The Economics layer does not operate in isolation — it connects to four other subsystems, each with a defined interface contract.

🎭

Orchestration Layer

Every agent dispatch call passes through EconomicsLayer.authorizeDispatch() before execution. The orchestrator provides task metadata; the economics layer returns a DispatchAuthorization containing the routed model tier, approved budget, and any active throttle flags.

→ Task metadata (type, estimated tokens) ← Model tier + budget authorization ← Throttle / pause signals

⚖️

Governance Layer

Budget enforcement acts as a financial governance gate — similar to compliance gates but driven by cost thresholds. All cost events, anomaly detections, cap enforcements, and routing overrides are published to the central GovernanceAuditBus as immutable, signed events.

→ Cost events (structured, signed) → Budget threshold breaches → Anomaly flags ← Audit trail confirmation

📊

Dashboard Layer

The Economics Dashboard API feeds real-time cost panels, historical trend charts, and ROI summaries to the operator-facing interface. The WebSocket stream at /economics/stream pushes live cost events for sub-2-second dashboard updates without polling.

→ REST API (6 endpoints) → WebSocket event stream ← Dashboard filter parameters

🧠

Memory System

The Token Optimizer's semantic cache is built on top of the vector memory subsystem. File content cache invalidation is coordinated with the memory layer's git-SHA tracking. Context compression writes summaries back to the memory system for future session continuity without token re-cost.

→ Semantic similarity queries ← Cached embeddings + results → Compressed context summaries ← Git SHA change notifications

Key Interface Contracts

// Core types shared across subsystem boundaries

interface CostEvent {
  event_id: string;        // UUID v4
  event_type: 'llm_call' | 'tool_use' | 'external_api' | 'cache_hit';
  agent_id: string;
  session_id: string;
  project_id: string;
  feature_id: string | null;
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cost_cents: number;         // integer cents, never float
  latency_ms: number;
  timestamp: string;          // ISO 8601
}

interface DispatchAuthorization {
  authorized: boolean;
  model_tier: 'haiku' | 'sonnet' | 'opus';
  model_id: string;
  budget_remaining_cents: number;
  throttle_active: boolean;
  block_reason: string | null;
}

interface BudgetDecision {
  decision: 'ALLOW' | 'WARN' | 'THROTTLE' | 'BLOCK';
  threshold_pct: number;
  spent_cents: number;
  cap_cents: number;
  anomaly: boolean;
}

Cost-Per-Feature Tracking Flow

When a feature ticket (e.g., FEAT-142) is dispatched to the orchestrator, its ID is propagated through the feature_id field on every cost event generated during that feature's work. The Chargeback Engine aggregates all events with the same feature_id to produce the total AI cost for that feature — enabling precise cost-per-story-point metrics across your entire project backlog.

Agent Economics& Cost Governance

Problem Statement

Critical Gap

No Cost Visibility

No Budget Guardrails

No Model Routing

No ROI Justification

Insurance Context

Architecture Overview

Cost Metering Engine

Budget Controller

Model Router

Token Optimizer

Economics Dashboard

Chargeback Engine

Data Flow

Key Components

3.1 — Model Tier Pricing

3.2 — Model Routing Decision Matrix

3.3 — Budget Threshold Actions

3.4 — Caching Strategies & Hit Rate Targets

3.5 — Cost Allocation Taxonomy

Requirements

Prompt to Build It

Design Decisions

Real-Time vs. Batched Cost Calculation

Centralized vs. Distributed Metering

Model Routing: Heuristics vs. Learned Routing

Caching Granularity: Token-Level vs. Semantic-Level

Integration Points

Orchestration Layer

Governance Layer

Dashboard Layer

Memory System

Key Interface Contracts

Cost-Per-Feature Tracking Flow

Agent Economics
& Cost Governance