Predictive Scaling — BulletproofSoftware.ai

1

Problem Statement

Agent Economics (PRD 10) is entirely reactive. It records spend only after it occurs, routes models based on the current request in isolation, and has no awareness of historical patterns. The system makes the same cold-start mistakes every Monday morning, every sprint boundary, and every project kickoff.

The consequences are concrete: Opus is invoked for trivial classification tasks because the router lacks pattern memory. Qdrant collections are cold on first access because nobody predicted they would be needed. Budget alerts fire after the overspend occurs rather than before. Concurrency limits are static when workload is dynamic.

No Demand Anticipation

Every session starts cold. The system cannot predict that Monday mornings produce 3x the workload of Friday afternoons, or that sprint boundaries trigger MAJOR-tier workflows.

Wasteful Model Routing

Without historical pattern analysis, the model router treats every request as novel. Trivial tasks receive Opus-level processing. When budget runs low, complex tasks are downgraded to Haiku, which degrades quality, because no capacity was forecast and reserved for them in advance.

Cold Cache Penalty

Qdrant collections, prompt caches, and context pre-loads are populated on demand. First access is always slow. Predictable access patterns (daily standup context, sprint planning data) could be pre-warmed but are not.

Reactive Budget Alerts

Budget warnings fire after thresholds are exceeded. A cost forecast with confidence intervals would enable proactive budget management and prevent mid-project cost surprises.

Real-World Scenarios

Scenario: Monday Morning Spike

Developer logs in at 09:00 after a weekend

Without prediction: cold caches, Opus routed for a TRIVIAL task (backlog grooming), context window fills with stale data. Cost: $2.40 for a $0.15 task.

With prediction: caches pre-warmed at 06:00, Haiku pre-selected for morning classification tasks, relevant Qdrant collections loaded. Cost: $0.12.

Scenario: Sprint Boundary

End-of-sprint retrospective triggers 8 concurrent agent dispatches

Without prediction: concurrency limit hit, tasks queued, TTR doubles. Budget alert fires at 80% after 6 of 8 tasks complete.

With prediction: concurrency limit temporarily raised to 12, budget forecast showed 95% consumption 2 days ago, operator pre-approved the overage.

2

Architecture

The Prediction Engine operates as a sidecar to Agent Economics, consuming the same data streams but producing forward-looking advisories rather than backward-looking reports. All predictions are advisory; no automated system changes occur without operator confirmation or pre-configured policies.

Statistical Forecasting, Not ML

The Prediction Engine uses time-series decomposition (trend + seasonal + residual), exponential smoothing, and simple regression. No neural networks, no training pipelines, no GPU requirements. Runs on the same hardware as the conductor.
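
As a rough illustration of the methods named above, the sketch below combines an additive seasonal profile with simple exponential smoothing. The function names, the additive model, and the `alpha` default are assumptions made for this example, not the Prediction Engine's actual implementation.

```python
from statistics import mean

def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not series:
        return []
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def seasonal_profile(series, period):
    """Average deviation from the overall mean for each position in the cycle.

    Assumes the series covers at least one full period.
    """
    overall = mean(series)
    return [mean(series[i::period]) - overall for i in range(period)]

def forecast_next(series, period, alpha=0.3):
    """One-step forecast: smoothed deseasonalized level plus the next seasonal term."""
    seasonal = seasonal_profile(series, period)
    deseasonalized = [x - seasonal[i % period] for i, x in enumerate(series)]
    level = exponential_smoothing(deseasonalized, alpha)[-1]
    return level + seasonal[len(series) % period]
```

With a hypothetical task-count series that repeats every three slots, such as `[10, 2, 2] * 3`, `forecast_next(series, period=3)` returns a value close to 10 for the next slot, because the seasonal term restores the recurring spike. Everything here runs on the Python standard library, in line with the no-GPU constraint.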

Advisory, Not Mandatory

Every prediction produces an advisory with a confidence interval. Model routing suggestions, concurrency limit changes, and budget forecasts are presented to the operator or to Agent Economics as recommendations, never as forced overrides.
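
One way to picture the advisory contract is as a small record plus a resolution step that applies nothing unless a policy or an operator says so. The `Advisory` fields, the `resolve` function, and the status strings below are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Advisory:
    kind: str            # e.g. "model_routing", "concurrency", "budget" (illustrative)
    recommendation: str  # human-readable suggestion
    low: float           # lower bound of the interval attached to the prediction
    high: float          # upper bound of the interval attached to the prediction

def resolve(advisory, auto_accept_kinds, operator_approved=False):
    """Decide what happens to an advisory; the default outcome is 'pending'."""
    if advisory.kind in auto_accept_kinds:
        return "applied_by_policy"      # operator pre-configured a policy for this kind
    if operator_approved:
        return "applied_by_operator"    # explicit one-off approval
    return "pending"                    # nothing changes without confirmation
```

Under these assumptions, an advisory whose kind is not covered by a policy stays `"pending"` until an operator approves it, which is the behavior the advisory-only rule describes.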

Three Output Channels

Predictions flow to three consumers: the Adaptive Model Router (pre-selects optimal model), the Concurrency Advisor (recommends limit adjustments), and the Budget Alert system (provides forward-looking cost projections).

3

Key Components

Workload Predictor

Analyzes three pattern layers to forecast upcoming workload:

Temporal patterns: Hour-of-day, day-of-week, and day-of-month demand curves built from 30+ days of historical data. Detects recurring spikes (Monday mornings, end-of-month, sprint boundaries).
Session trajectory analysis: Once a session begins, predicts remaining workload based on the current trajectory. A MAJOR-tier workflow at phase 2 of 8 predicts 6 more agent dispatches.
Weekly profiles: Stored in Qdrant as vector embeddings, enabling similarity search. "This week looks like sprint-end week #14" enables pattern matching against similar historical periods.
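
The first and third layers can be sketched in a few lines of plain Python. This is an illustration only: `demand_curve` and `most_similar_week` are hypothetical helpers, and the in-memory cosine similarity stands in for the similarity search that Qdrant would perform on stored profile vectors.

```python
from collections import defaultdict
from math import sqrt

def demand_curve(task_timestamps):
    """Mean tasks per (weekday, hour) slot, averaged over the dates seen for that weekday.

    Dates with no tasks at all are invisible to this sketch, so a real
    implementation would need the full calendar range as input.
    """
    slot_totals = defaultdict(int)        # (weekday, hour) -> total tasks
    dates_by_weekday = defaultdict(set)   # weekday -> distinct dates observed
    for ts in task_timestamps:
        slot_totals[(ts.weekday(), ts.hour)] += 1
        dates_by_weekday[ts.weekday()].add(ts.date())
    return {
        slot: total / len(dates_by_weekday[slot[0]])
        for slot, total in slot_totals.items()
    }

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_week(current_profile, historical_profiles):
    """Return the label of the historical week whose profile is closest to the current one."""
    return max(
        historical_profiles,
        key=lambda label: cosine_similarity(current_profile, historical_profiles[label]),
    )
```

A weekly profile here is assumed to be a fixed-length vector of task counts (for example, one entry per weekday); the document does not specify the embedding format.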

Adaptive Model Router

Extends Agent Economics model routing with predictive pre-selection. Instead of routing each request in isolation, the router considers:

Predicted workload volume: If 20 tasks are forecast in the next hour, reserve Opus capacity for the 3 likely-complex tasks and pre-select Haiku for the 17 likely-trivial ones.
Budget trajectory: If the cost forecast shows 90% budget consumption by Wednesday, begin model downgrade recommendations on Monday.
Historical success rates: For this task type + agent combination, which model historically produces the highest first-pass success rate at the lowest cost?
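
The third consideration balances two objectives, success rate and cost, and the document does not say how they are traded off. The sketch below assumes one simple rule: choose the cheapest model that clears a minimum first-pass success rate, and fall back to the most reliable model if none does. `ModelStats`, `advise_model`, and the 0.9 default are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelStats:
    model: str
    first_pass_success: float  # historical rate in [0, 1] for one task type + agent
    avg_cost_usd: float        # historical mean cost per task

def advise_model(stats, min_success=0.9):
    """Return the advised model name for one task-type/agent combination."""
    if not stats:
        raise ValueError("no historical stats for this task type and agent")
    good_enough = [s for s in stats if s.first_pass_success >= min_success]
    if good_enough:
        # Cheapest model among those that meet the success bar.
        return min(good_enough, key=lambda s: s.avg_cost_usd).model
    # Nothing meets the bar: prefer reliability over cost.
    return max(stats, key=lambda s: s.first_pass_success).model
```

The result is an advisory input, not a routing override; whether it is applied still depends on operator policy or approval.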

Cache Pre-Warming Engine

Runs at 06:00 daily (configurable). Analyzes predicted workload and pre-loads:

Qdrant collections likely to be accessed
Prompt cache entries for predicted task types
Context pre-loads for active projects
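
A minimal sketch of how the pre-warm list for Qdrant collections might be chosen, assuming an access log of `(date, collection_name)` pairs is available. The function name, the log shape, and the `min_share` threshold are all assumptions for illustration.

```python
from collections import defaultdict

def collections_to_prewarm(access_log, target_weekday, min_share=0.5):
    """Pick collections accessed on at least `min_share` of past days with this weekday.

    access_log: iterable of (datetime.date, collection_name) pairs.
    target_weekday: 0 = Monday ... 6 = Sunday, as in date.weekday().
    """
    matching_days = set()
    days_per_collection = defaultdict(set)
    for day, name in access_log:
        if day.weekday() == target_weekday:
            matching_days.add(day)
            days_per_collection[name].add(day)
    if not matching_days:
        return []  # no history for this weekday: pre-warm nothing
    return sorted(
        name
        for name, days in days_per_collection.items()
        if len(days) / len(matching_days) >= min_share
    )
```

Returning an empty list when there is no history matches the conservative default of not pre-warming on weak evidence.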

Cost Forecaster

Produces cost projections with confidence intervals (p25/p50/p75/p95) for the next 24 hours, 7 days, and 30 days. Factors in predicted workload, model routing changes, and historical cost-per-task data.
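
The p25/p50/p75/p95 levels can be read as percentiles of a cost distribution. The sketch below takes the simplest possible view and uses the empirical distribution of historical daily costs as the forecast; the real forecaster would also factor in predicted workload and routing changes. `cost_percentiles` is a hypothetical name.

```python
from statistics import quantiles

def cost_percentiles(daily_costs):
    """Empirical p25/p50/p75/p95 of a list of daily costs.

    statistics.quantiles with n=100 returns 99 cut points, so the p-th
    percentile sits at index p - 1.
    """
    cuts = quantiles(daily_costs, n=100, method="inclusive")
    return {f"p{p}": cuts[p - 1] for p in (25, 50, 75, 95)}
```

The `inclusive` method treats the sample minimum and maximum as the 0th and 100th percentiles, which keeps every reported level inside the observed range.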

Concurrency Advisor

Recommends concurrency limit adjustments based on predicted demand. Proposes temporary increases before anticipated spikes and decreases during predicted quiet periods. All changes require operator policy or explicit approval.
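
A recommendation of this kind could be derived from the predicted peak with a headroom factor and hard bounds. The function name, the 1.5 headroom factor, and the floor and ceiling values are assumptions for this sketch; the document does not define them.

```python
from math import ceil

def recommend_limit(predicted_peak, current_limit, headroom=1.5, floor=1, ceiling=16):
    """Propose a concurrency limit for the forecast window.

    The proposal is predicted_peak * headroom, rounded up and clamped to
    [floor, ceiling]. It is returned as an advisory and never applied here.
    """
    proposed = max(floor, min(ceiling, ceil(predicted_peak * headroom)))
    return {
        "current": current_limit,
        "proposed": proposed,
        "change": proposed != current_limit,
        "requires_approval": True,  # advisory only: policy or operator decides
    }
```

With a predicted peak of 8 concurrent dispatches and the assumed headroom of 1.5, the proposal is 12, which mirrors the sprint-boundary scenario earlier in the document.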

Prediction Quality Monitor

Tracks prediction accuracy over time using MAPE (Mean Absolute Percentage Error). Automatically degrades to conservative defaults when MAPE stays above the configured threshold (40% for 7 consecutive days, per REQ-PS-009). Reports accuracy trends to the dashboard.
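
MAPE is the mean of |actual - forecast| / |actual|, expressed as a percentage. The sketch below computes it and applies the 40%-for-7-days degradation rule from the requirements; the function names and the choice to skip zero actuals (where the percentage error is undefined) are assumptions.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error, in percent.

    Pairs with an actual value of zero are skipped because the
    percentage error is undefined for them.
    """
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined without non-zero actual values")
    return 100 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)

def should_degrade(daily_mape, threshold=40.0, days=7):
    """True when the last `days` daily MAPE values all exceed the threshold."""
    recent = daily_mape[-days:]
    return len(recent) == days and all(m > threshold for m in recent)
```

A single good day inside the window resets the rule in this sketch, since the requirement speaks of consecutive days.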

Prediction Trigger Rules

Trigger	Condition	Action
Session Start	New conductor session detected	Generate session-scope workload forecast and model routing advisory
3-Task Threshold	3 tasks completed in current session	Refine session trajectory prediction with actual data, update model routing
Daily Profile	06:00 local time (configurable)	Generate 24-hour workload forecast, execute cache pre-warming, update budget forecast
Weekly Profile	Sunday 22:00 (configurable)	Generate 7-day workload forecast, identify similar historical weeks, update monthly budget projection
Budget Alert	Cost forecast exceeds 70% of budget at any confidence level	Emit proactive budget warning with forecast details and recommended actions
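
The Budget Alert row can be read as a check across every percentile of the cost forecast: if any level exceeds 70% of the budget, a warning is emitted. A minimal sketch, assuming the forecast is a mapping from percentile label to projected cost in USD (the function name and data shape are hypothetical):

```python
def budget_alert_levels(forecast, budget_usd, threshold=0.70):
    """Return the percentile labels whose projected cost exceeds threshold * budget.

    An empty list means no proactive warning is needed.
    """
    limit = threshold * budget_usd
    return sorted(level for level, cost in forecast.items() if cost > limit)
```

For a hypothetical forecast of `{"p25": 50, "p50": 65, "p75": 72, "p95": 90}` against a 100 USD budget, the p75 and p95 levels exceed the 70 USD limit, so a warning would list those two levels along with the recommended actions.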

4

Requirements

ID	Requirement	Priority
REQ-PS-001	Workload Predictor SHALL analyze at least 30 days of historical data to produce temporal demand curves (hour-of-day, day-of-week, day-of-month) with confidence intervals.	MUST
REQ-PS-002	Session trajectory analysis SHALL activate after 3 completed tasks in a session and produce updated workload forecasts for the remaining session within 2 seconds of trigger.	MUST
REQ-PS-003	Adaptive Model Router SHALL produce model pre-selection advisories that consider predicted workload volume, budget trajectory, and historical success rates per task-type/agent/model combination.	MUST
REQ-PS-004	Cache Pre-Warming Engine SHALL execute daily at a configurable time (default 06:00) and pre-load Qdrant collections, prompt cache entries, and context data predicted for the next 12 hours.	MUST
REQ-PS-005	Cost Forecaster SHALL produce cost projections with p25/p50/p75/p95 confidence intervals for 24-hour, 7-day, and 30-day horizons, updated at each prediction trigger event.	MUST
REQ-PS-006	All predictions SHALL be advisory. No automated system changes (model routing overrides, concurrency limit adjustments, budget modifications) SHALL occur without operator policy or explicit approval.	MUST
REQ-PS-007	Concurrency Advisor SHALL recommend limit adjustments (increase before predicted spikes, decrease during predicted quiet periods) with supporting data from the workload forecast.	SHOULD
REQ-PS-008	Weekly profiles SHALL be stored as vector embeddings in Qdrant, enabling similarity search to match current-week patterns against historical weeks for improved forecasting accuracy.	SHOULD
REQ-PS-009	Prediction Quality Monitor SHALL track MAPE (Mean Absolute Percentage Error) for all forecast types and automatically degrade to conservative defaults when MAPE exceeds 40% for 7 consecutive days.	MUST
REQ-PS-010	System SHALL use statistical methods only (time-series decomposition, exponential smoothing, regression). No machine learning models, no training pipelines, no GPU requirements.	MUST
REQ-PS-011	Budget Alert system SHALL emit proactive warnings when cost forecast exceeds 70% of the configured budget at any confidence level (p25/p50/p75/p95), including recommended mitigation actions.	SHOULD
REQ-PS-012	System SHALL expose prediction data to the Memory Dashboard (PRD 6) including forecast visualizations, prediction accuracy trends, and cache warming effectiveness metrics.	COULD

5

Design Decisions

Key architectural and design choices, with rationale for each decision.

Statistical methods, not machine learning. ML models require training data volumes, GPU compute, and maintenance overhead that are disproportionate to the prediction task. Time-series decomposition and exponential smoothing produce adequate forecasts for workload patterns that are fundamentally cyclical (hour/day/week). When prediction quality degrades, the system falls back to conservative defaults rather than producing confidently wrong forecasts.
Advisory, not mandatory. Predictions are probabilistic by nature. Forcing automated actions based on forecasts introduces a new failure mode: the system actively doing the wrong thing based on a bad prediction. Advisory mode preserves operator agency while providing the data needed for informed decisions. Operators can create policies to auto-accept specific advisory types once they trust the predictions.
3-task prediction trigger. Session trajectory analysis requires a minimum sample size before predictions become meaningful. With fewer than 3 completed tasks, the signal-to-noise ratio is too low. The 3-task threshold was chosen empirically: it provides enough data to distinguish TRIVIAL from MAJOR workflow patterns while activating early enough to be useful.
Daily profile updates at 06:00. Cache pre-warming must complete before the typical start of the workday. 06:00 provides a 2-3 hour buffer for pre-warming execution. The time is configurable because team schedules vary. Pre-warming earlier wastes resources on cache entries that expire; pre-warming later risks cold caches at session start.
Confidence intervals on all forecasts. A point estimate ("you will spend $47 tomorrow") is misleading. A percentile range ("you will most likely spend about $47 (p50), and between $31 (p25) and $68 (p95)") communicates the inherent uncertainty in prediction and enables operators to make risk-appropriate decisions. The p25/p50/p75/p95 format is familiar to engineers.
Automatic degradation on poor prediction quality. When MAPE exceeds 40% for 7 consecutive days, the system stops producing predictions and falls back to conservative defaults (no pre-warming, no routing advisories, static concurrency limits). This prevents a poorly-calibrated prediction engine from causing more harm than no predictions at all.

6

Integration Map

Predictive Scaling integrates with 8 systems across the ecosystem, serving as a forward-looking complement to Agent Economics' backward-looking cost analysis.

Integration	Direction	Data Exchanged
Agent Economics (PRD 10)	Reads / Advises	Reads historical cost data, model routing decisions, token consumption. Provides model routing advisories and cost forecasts back to Agent Economics for enhanced routing decisions.
Conductor (PRD 2)	Reads from	Session start events, tier classifications, phase sequences, and agent dispatch records. Primary data source for workload pattern analysis and session trajectory prediction.
Memory System (PRD 4)	Reads / Writes	Reads historical patterns and weekly profiles from Qdrant. Writes prediction metadata and weekly profile embeddings for cross-session continuity.
Memory Dashboard (PRD 6)	Writes to	Forecast visualizations, prediction accuracy trends, cache warming effectiveness metrics, and cost forecast charts with confidence intervals.
Event-Driven (PRD 13)	Subscribes / Publishes	Subscribes to conductor events for real-time session tracking. Publishes prediction events (forecast generated, advisory issued, cache warming completed) for downstream consumers.
Outcome Measurement (PRD 15)	Reads from	Historical success rates, TTR data, and quality scores to improve model routing accuracy and workload estimation.
Context Guard (PRD 3)	Advises	Provides context pre-load recommendations based on predicted task types, enabling Context Guard to proactively manage context window allocation.
n8n Workflows	Triggers	Prediction events can trigger n8n workflows for external notifications (Slack alerts for budget forecasts, email digests for weekly predictions).


7

Prompt to Build It

Copy and paste this prompt into Claude Code to begin implementing Predictive Scaling. It references the architecture, components, and integration points defined in this PRD.

Build the Predictive Scaling system (PRD 16) for the conductor plugin ecosystem.

## Context
Agent Economics (PRD 10) is reactive — it tracks cost after spend occurs and routes
models without historical pattern awareness. We need a forward-looking prediction
layer that anticipates demand, pre-warms caches, and optimizes model routing using
statistical forecasting.

## Architecture
Prediction Engine (sidecar to Agent Economics) with three sub-components:
- Workload Predictor: temporal patterns (hour/day/week), session trajectory
  analysis (activates after 3 tasks), weekly profiles stored as Qdrant vectors
- Cost Forecaster: p25/p50/p75/p95 confidence intervals for 24h/7d/30d horizons
- Cache Pre-Warming Engine: runs daily at 06:00, pre-loads Qdrant collections,
  prompt caches, and context data based on 12-hour workload forecast

## Output Channels
1. Adaptive Model Router — predictive pre-selection based on forecast + budget
   + historical success rates per task-type/agent/model combination
2. Concurrency Advisor — temporary limit adjustments before predicted spikes
3. Budget Alert system — proactive warnings at 70% forecast threshold

## Key Constraints
- Statistical methods ONLY: time-series decomposition, exponential smoothing,
  regression. No ML, no training pipelines, no GPU requirements
- Advisory ONLY: no automated changes without operator policy or approval
- 3-task minimum before session trajectory prediction activates
- Automatic degradation when MAPE exceeds 40% for 7 consecutive days
- All forecasts include confidence intervals (p25/p50/p75/p95)

## Prediction Triggers
- Session Start: generate session-scope workload forecast
- 3-Task Threshold: refine trajectory with actual session data
- Daily (06:00): 24-hour forecast + cache pre-warming execution
- Weekly (Sunday 22:00): 7-day forecast + similar-week matching
- Budget Alert: proactive warning when forecast exceeds 70% budget

## Integration Points
- Reads: Agent Economics (PRD 10), Conductor (PRD 2), Memory (PRD 4),
  Outcome Measurement (PRD 15)
- Writes: Memory Dashboard (PRD 6), Memory System (PRD 4)
- Subscribes/Publishes: Event-Driven (PRD 13)
- Advises: Agent Economics (PRD 10), Context Guard (PRD 3)
- Triggers: n8n workflows

## Requirements
12 requirements (REQ-PS-001 through REQ-PS-012). All MUST-priority items
are non-negotiable: temporal demand curves, session trajectory analysis,
model routing advisories, daily cache pre-warming, cost confidence intervals,
advisory-only mode, statistical-only methods, and automatic degradation.