PRD 14 of 19

Process Knowledge Base.

Structured business process knowledge store that provides agents with decision rules, SOPs, domain constraints, and edge-case catalogs at decision points -- bridging the gap between episodic memory and actionable process intelligence.

Domain Process Intelligence
PRD Number 14 of 19
Requirements 14 (REQ-PK-001 through REQ-PK-014)
Dependencies Memory System, Governance, Conductor
Knowledge Types 4 (Rules, Trees, SOPs, Catalogs)

1Problem Statement

The system currently manages two forms of knowledge effectively: episodic memory (stored in Qdrant as vectors -- past conversations, decisions, outcomes) and development workflows (encoded as skills with step-by-step procedures). However, there is a critical gap between these two layers: structured business process knowledge.

Business process knowledge encompasses the decision rules, standard operating procedures, domain constraints, and edge-case catalogs that govern how work should be done in specific contexts. This is the knowledge that a senior employee carries in their head -- the "when you see X, always do Y" rules, the "this process has 7 steps and you must not skip step 4" procedures, and the "watch out for this edge case that breaks the standard approach" warnings.

Without this structured layer, agents make decisions based on general capabilities and episodic recall. This leads to three systemic problems:

The Process Knowledge Base fills this gap by providing a structured, queryable, version-controlled repository of business process knowledge that agents can consult at decision points. It transforms implicit organizational knowledge into explicit, machine-readable rules that produce consistent, constraint-aware, edge-case-informed decisions.

Knowledge Hierarchy

The system's knowledge architecture has three layers. The bottom layer is raw data: files, code, configurations. The middle layer is episodic memory: past interactions, decisions, and their outcomes stored as vectors in Qdrant. The top layer -- currently missing -- is process knowledge: the distilled rules, procedures, and constraints that should govern agent behavior. The Process Knowledge Base implements this top layer, completing the knowledge hierarchy.

2Architecture

The Process Knowledge Base follows a four-stage pipeline: Knowledge Ingestion receives raw knowledge from multiple sources, the Process Knowledge Store persists structured knowledge in Qdrant with a dedicated collection, the Query Interface provides three MCP tools for agent access, and the output reaches Agents at Decision Points where process knowledge informs real-time choices.

Architecture Diagram

Knowledge Types

The system defines four distinct knowledge types, each with a specific structure and query pattern:

Rules
IF-THEN decision rules with conditions, actions, and exceptions. Used for deterministic decisions at known decision points.
Decision Trees
Multi-step branching logic for complex decisions. Each node represents a condition; each branch represents an action path.
SOPs
Ordered step sequences for standard processes. Each step has prerequisites, actions, validation criteria, and rollback procedures.
Edge Case Catalogs
Collections of known exceptions to standard rules. Each entry describes the trigger pattern, why the standard approach fails, and the correct alternative.

Domain Packages

Process knowledge is organized into domain packages -- self-contained collections of rules, trees, SOPs, and catalogs for a specific operational domain:

Insurance
Claim processing rules, regulatory notification timelines, coverage determination trees, jurisdiction-specific constraints, edge cases for multi-policy scenarios.
Security
Vulnerability triage rules, severity-based SLA trees, incident response SOPs, remediation procedures, edge cases for false positives and chained vulnerabilities.
Infrastructure
Change window rules, deployment SOPs, rollback procedures, capacity planning trees, edge cases for cascading failures and split-brain scenarios.
Development
Code review rules, branching strategy SOPs, dependency update trees, release procedures, edge cases for breaking changes and API versioning.
Governance
Approval gate rules, compliance check SOPs, data classification trees, audit procedures, edge cases for cross-jurisdiction conflicts.
Operations
Monitoring alert rules, escalation SOPs, capacity decision trees, maintenance procedures, edge cases for holiday/weekend coverage gaps.

Core Components

Knowledge Ingestion Pipeline

Four ingestion channels: (1) Manual entry via YAML authoring, (2) Document extraction from existing documentation, (3) Trajectory mining from successful memory trajectories, (4) Session extraction from conversation patterns. Each channel produces structured YAML that enters the human verification gate before persistence.

Process Knowledge Store

Dedicated Qdrant collection (process_knowledge) with vectors generated from knowledge content. Metadata includes: knowledge type, domain package, version, provenance, verification status, and usage count. Supports semantic search (find relevant rules for a situation) and exact lookup (retrieve a specific SOP by ID).

Query Interface (MCP Tools)

Three MCP tools provide agent access: process_query (semantic search for relevant knowledge), process_lookup (exact retrieval by ID or path), and process_validate (check a proposed action against applicable rules). Tools return structured responses with the knowledge item, its provenance, and confidence score.

Versioning and Provenance

Every knowledge item carries full provenance: original source (manual, document, trajectory, session), author, creation date, last modification date, version number, and verification status. Version history is maintained for audit and rollback. Provenance is returned with every query result so agents can assess knowledge reliability.

Human Verification Gate

All knowledge entering the store must pass human verification. Manually authored knowledge is verified at creation time. Automatically extracted knowledge (from trajectories, sessions, documents) enters a verification queue. Unverified knowledge is queryable but flagged as "unverified" in results, ensuring agents can distinguish between authoritative and provisional knowledge.

Knowledge Structure (YAML)

All knowledge items are authored and stored as YAML with a standardized structure. Each item includes metadata (ID, domain, type, version, provenance), the knowledge content itself (structured according to type -- rule conditions/actions, tree nodes/branches, SOP steps, catalog entries), applicability conditions (when this knowledge applies), and cross-references (related knowledge items, superseded items). The YAML source of truth enables version control, diffing, code review, and programmatic manipulation.

3Requirements

ID Requirement Priority Acceptance Criteria
REQ-PK-001 Knowledge store must support exactly 4 knowledge types: rules, decision trees, SOPs, and edge case catalogs HIGH Each type has a defined YAML schema; items that do not conform to a type schema are rejected at ingestion
REQ-PK-002 Knowledge must be organized into domain packages with defined boundaries HIGH Six initial domains (insurance, security, infrastructure, development, governance, operations); queries can be scoped to a specific domain
REQ-PK-003 YAML must be the source of truth for all process knowledge HIGH Knowledge is authored in YAML, version controlled, and loaded into Qdrant from YAML; Qdrant is a query index, not the source of truth
REQ-PK-004 Dedicated Qdrant collection for process knowledge, separate from episodic memory HIGH Collection process_knowledge exists independently; queries to process knowledge do not return episodic memories and vice versa
REQ-PK-005 Three MCP tools for agent access: process_query, process_lookup, process_validate HIGH process_query returns semantically similar knowledge; process_lookup returns exact items by ID; process_validate checks actions against rules
REQ-PK-006 Human verification gate required for all knowledge entering the store HIGH Manually authored knowledge verified at creation; extracted knowledge enters verification queue; unverified items flagged in query results
REQ-PK-007 Every knowledge item must carry mandatory provenance metadata HIGH Provenance includes: source type, author, creation date, modification date, version, verification status; provenance returned with every query result
REQ-PK-008 Knowledge ingestion pipeline must support 4 channels: manual, document extraction, trajectory mining, session extraction MEDIUM Manual channel operational at launch; extraction channels added incrementally; all channels produce standardized YAML output
REQ-PK-009 Version history must be maintained for all knowledge items with rollback capability MEDIUM Each modification creates a new version; previous versions retrievable by version number; rollback restores a specific version as current
REQ-PK-010 Knowledge items must support cross-references to related and superseded items MEDIUM Cross-references resolved at query time; superseded items return their replacement; related items returned as suggestions
REQ-PK-011 process_validate tool must check proposed actions against all applicable rules and return violations HIGH Given an action description and domain context, returns: applicable rules, pass/fail per rule, violation details, suggested corrections
REQ-PK-012 Usage tracking must record which knowledge items are queried, by which agents, and for which decisions LOW Usage data feeds into knowledge maintenance: high-usage items prioritized for verification; zero-usage items flagged for review
REQ-PK-013 Knowledge query latency must be under 200ms for semantic search and under 50ms for exact lookup MEDIUM p95 latency measured across 1,000 queries; performance degrades gracefully with collection size up to 10,000 items
REQ-PK-014 Edge case catalogs must link to the original discovery context (trajectory or session) MEDIUM Each edge case entry includes a reference to the source trajectory or session; reference is resolvable through the memory system

4Design Decisions

5Integration Points

System PRD Integration Type Description
Memory System PRD 4 Bidirectional Process Knowledge Base uses the same Qdrant infrastructure as the memory system but with a dedicated collection. Trajectory mining extracts candidate knowledge from successful memory trajectories. Edge case catalogs link back to original discovery contexts stored in episodic memory.
Governance Framework PRD 5 Knowledge Consumer Governance agents query the process knowledge base for compliance rules, approval requirements, and jurisdiction-specific constraints. The process_validate tool is used by governance gates to check proposed actions against domain rules before approval.
Conductor Orchestrator PRD 2 Decision Support The conductor queries process knowledge during tier classification and agent routing to apply domain-specific rules. SOPs inform phase sequencing. Decision trees guide complex routing decisions that depend on project context.
Obsidian Vault External Source Material Existing documentation in the Obsidian vault serves as source material for the document extraction ingestion channel. SOPs, procedures, and domain notes in the vault can be transformed into structured YAML knowledge items.
Markdown-for-Agents PRD 7 Format Alignment Knowledge items follow the markdown-for-agents formatting standards when presented to agents, ensuring consistent rendering and parsing. YAML frontmatter on knowledge items aligns with the agent-readable markdown specification.
Context Guard PRD 3 Budget Coordinator Process knowledge query results consume context budget. Context Guard tracks knowledge retrieval size and ensures that large SOPs or decision trees do not exceed context limits. Knowledge items include a size estimate for budget planning.
Event-Driven Automation PRD 13 Event Consumer Knowledge ingestion events (new item, verification status change, version update) are emitted to the event system. Downstream workflows can react to knowledge changes -- for example, notifying domain experts when new unverified knowledge enters their domain package.

6Prompt to Build It

The following prompt can be used to instruct Claude Code to build the Process Knowledge Base. It encapsulates the full specification from this PRD.

Build a Process Knowledge Base that provides structured business process knowledge to agents at decision points. The system bridges the gap between episodic memory (what happened) and process intelligence (what should happen).

Core components to implement:

1. KNOWLEDGE STRUCTURE — 4 knowledge types with YAML schemas:
   - Rules: IF-THEN decision rules with conditions, actions, exceptions, and applicability scope
   - Decision Trees: Multi-step branching logic with condition nodes, action branches, and terminal states
   - SOPs: Ordered step sequences with prerequisites, actions, validation criteria, and rollback procedures per step
   - Edge Case Catalogs: Collections of known exceptions with trigger patterns, why standard approach fails, and correct alternatives
   Each item includes mandatory metadata: ID, domain, type, version, provenance (source, author, dates, verification status), applicability conditions, and cross-references.

2. DOMAIN PACKAGES — 6 initial domains:
   - Insurance: claim rules, regulatory timelines, coverage trees, jurisdiction constraints, multi-policy edge cases
   - Security: vulnerability triage, severity SLAs, incident response SOPs, false positive edge cases
   - Infrastructure: change windows, deployment SOPs, rollback procedures, cascading failure edge cases
   - Development: code review rules, branching SOPs, dependency update trees, breaking change edge cases
   - Governance: approval gate rules, compliance SOPs, data classification trees, cross-jurisdiction edge cases
   - Operations: monitoring alert rules, escalation SOPs, capacity trees, coverage gap edge cases

3. KNOWLEDGE INGESTION PIPELINE — 4 channels:
   - Manual: Direct YAML authoring by domain experts
   - Document extraction: Parse existing documentation into structured YAML
   - Trajectory mining: Extract candidate rules/SOPs from successful memory trajectories
   - Session extraction: Identify process patterns from conversation history
   All channels produce standardized YAML. ALL ingested knowledge enters a human verification queue. Unverified knowledge is queryable but flagged.

4. QUERY INTERFACE — 3 MCP tools:
   - process_query: Semantic search against the process_knowledge Qdrant collection. Parameters: query text, domain filter (optional), knowledge type filter (optional), limit. Returns: matching items with content, provenance, confidence score.
   - process_lookup: Exact retrieval by ID or path. Parameters: item ID or domain/path. Returns: full knowledge item with all metadata and version history.
   - process_validate: Check a proposed action against applicable rules. Parameters: action description, domain context. Returns: applicable rules, pass/fail per rule, violation details, suggested corrections.

5. VERSIONING AND PROVENANCE:
   - YAML files are the source of truth, version-controlled in git
   - Qdrant collection (process_knowledge) is a query index rebuilt from YAML
   - Each modification creates a new version; previous versions retrievable
   - Provenance is mandatory: source type, author, creation date, modification date, version number, verification status
   - Cross-references: related items, superseded items (superseded items return their replacement)
   - Usage tracking: which items queried, by which agents, for which decisions

Design constraints:
- YAML source of truth — Qdrant is a query index, not the primary store
- Exactly 4 knowledge types — no freeform knowledge
- Dedicated Qdrant collection — separate from episodic memory
- Human verification gate — mandatory for all ingested knowledge
- MCP tools over direct Qdrant access — stable interface, usage logging, result enrichment
- Mandatory provenance — every item, every query result
- Query latency: under 200ms for semantic search, under 50ms for exact lookup
- Edge case catalogs link to original discovery context in episodic memory

File locations:
- Knowledge YAML: ~/.claude/knowledge/{domain}/{type}/*.yaml
- Schemas: ~/.claude/knowledge/schemas/
- Ingestion pipeline: ~/.claude/knowledge/ingestion/
- MCP tools: ~/.claude/knowledge/tools/
- Verification queue: ~/.claude/knowledge/verification/

Integration points:
- Memory system (shared Qdrant infrastructure, trajectory mining, edge case links)
- Governance framework (compliance rules, approval requirements, jurisdiction constraints)
- Conductor orchestrator (domain-specific routing rules, phase sequencing SOPs)
- Obsidian vault (source material for document extraction)
- Context Guard (knowledge retrieval budget coordination)
- Event-driven automation (emit knowledge change events)

7Implementation Notes

Phased Delivery

The Process Knowledge Base should be delivered in three phases:

YAML Schema Design

Each knowledge type has a defined YAML schema that enforces structure while allowing domain-specific flexibility:

Qdrant Collection Configuration

The process_knowledge collection uses the same embedding model as episodic memory for consistency but with different indexing parameters optimized for the knowledge retrieval pattern. Knowledge items are typically shorter and more structured than episodic memories, so a smaller vector size may be appropriate. The collection supports filtered search by domain and knowledge type, enabling scoped queries that reduce result noise.

Verification Queue Workflow

Automatically extracted knowledge enters the verification queue as a YAML file in the verification/ directory with metadata indicating its extraction source and confidence score. A periodic review process (manual or via n8n workflow notification) presents pending items to domain experts for approval, rejection, or modification. Approved items move to the domain package directory and are indexed in Qdrant. Rejected items are archived with the rejection reason. Modified items are updated in-place before approval.

Performance Optimization

Semantic search (process_query) must return results within 200ms. The primary optimization is pre-filtering by domain before vector search, which reduces the search space by approximately 6x (one domain out of six). For the process_validate tool, rule evaluation is structured as a parallel scan of all rules in the applicable domain, with short-circuit evaluation for rules that cannot apply based on the action's context. At the expected collection size (100-500 items per domain), these optimizations maintain sub-200ms latency with margin.

Knowledge Lifecycle

Knowledge items follow a defined lifecycle: Draft (authored but not verified) -> Verified (human-approved, active) -> Superseded (replaced by a newer version, returns redirect) -> Archived (no longer applicable, hidden from default queries). Usage tracking identifies stale knowledge (verified items with zero queries over 90 days), which is flagged for review. The lifecycle ensures the knowledge base remains current and relevant rather than accumulating outdated rules.

Security Considerations

Some knowledge items may contain sensitive information (security remediation procedures, compliance requirements with regulatory references, infrastructure topology details). Domain packages support an access level attribute (public, internal, restricted) that the MCP tools enforce at query time. Restricted knowledge is only returned to agents operating within a conductor workflow that has the appropriate security clearance, as determined by the governance framework.