Context Relevance Pre-Parser

Overview

The Context Relevance Pre-Parser is a deterministic, rule-based service that optimizes context injection by selectively excluding irrelevant context nodes based on the current cognitive mode and conversation signals. This reduces unnecessary I/O, token usage, and latency without sacrificing response quality.

Motivation

Previously, every response generation retrieved and injected ALL context nodes (episodic memory, identity, user traits, facts, gists, focus, tools, skills, etc.) into every prompt — regardless of whether the mode-specific template even referenced them.

Example Waste

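An ACKNOWLEDGE for “Hey!” would trigger the full set of context retrievals (per the Expected Savings table: 5 Redis reads, 1 PG vector search, and skill queries), none of which the ACKNOWLEDGE template even uses.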

Expected Savings

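| Mode               | I/O Skipped                                      | Token Savings |
|--------------------|--------------------------------------------------|---------------|
| ACKNOWLEDGE        | 5 Redis reads, 1 PG vector search, skill queries | ~1500-3000    |
| CLARIFY (warm)     | 1 PG vector search, skill queries                | ~500-1500     |
| RESPOND (greeting) | 1 PG vector search, focus queries                | ~800-2000     |
| ACT                | Identity/trait lookups                           | ~300-800      |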

Pre-parser execution: < 0.5ms (pure dict lookups).

Architecture

Seven-Layer Pipeline

The service applies seven layers in order, each gating context node inclusion:

  1. Template Masks — Static per-mode. Excludes nodes the template doesn’t reference (hard exclusion).
  2. Signal Rules — Conditional. Excludes nodes that the template references but are irrelevant given current signals. Each rule specifies strength: "hard" or "soft".
  3. Urgency Overrides — When classification.urgency == 'high', force-include working_memory, world_state, facts for broader awareness.
  4. Soft Exclusion Recovery — Soft-excluded nodes get re-included if token budget has headroom, in configurable priority order.
  5. Dependency Resolution — If a node is included, its dependencies are auto-included (e.g., including episodic_memory auto-includes gists).
  6. Safety Overrides — Force-includes nodes that must always be present under certain conditions (e.g., identity when returning from silence).
  7. Safeguard: MAX_INCLUDED_NODES = 12. Logs a warning if exceeded, to protect prompt integrity.
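
The ordering of the seven layers can be sketched as a single function. This is a minimal illustration, not the service's actual code: the function name and all parameters are hypothetical, the results of signal-rule evaluation, budget-based recovery, and safety-rule matching are assumed to be computed elsewhere and passed in, and it assumes that urgency overrides may re-include nodes the template mask excluded.

```python
import logging
from typing import Dict, List, Set

logger = logging.getLogger(__name__)
MAX_INCLUDED_NODES = 12  # safeguard threshold named in layer 7


def compute_inclusion_sketch(
    template_mask: Dict[str, bool],
    hard_excluded: Set[str],
    soft_excluded: Set[str],
    urgency_high: bool,
    urgency_nodes: List[str],
    recovered_soft: List[str],
    dependencies: Dict[str, List[str]],
    safety_nodes: Set[str],
) -> Dict[str, bool]:
    # Layer 1: keep only nodes the mode's template references.
    included = {node for node, on in template_mask.items() if on}
    # Layer 2: drop nodes excluded by signal rules (hard or soft).
    included -= hard_excluded | soft_excluded
    # Layer 3: high urgency force-includes the configured nodes.
    if urgency_high:
        included |= set(urgency_nodes)
    # Layer 4: re-include soft-excluded nodes that the budget allowed.
    included |= {node for node in recovered_soft if node in soft_excluded}
    # Layer 5: pull in dependencies until nothing new is added.
    pending = list(included)
    while pending:
        for dep in dependencies.get(pending.pop(), []):
            if dep not in included:
                included.add(dep)
                pending.append(dep)
    # Layer 6: safety overrides are applied after everything else.
    included |= safety_nodes
    # Layer 7: this sketch only warns when the limit is exceeded.
    if len(included) > MAX_INCLUDED_NODES:
        logger.warning("included %d nodes (max %d)", len(included), MAX_INCLUDED_NODES)
    all_nodes = set(template_mask) | included
    return {node: node in included for node in all_nodes}
```

Because each layer only adds to or removes from one set, the order of the statements is the whole policy: anything a later layer adds cannot be removed by an earlier one.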

Context Nodes

All context nodes supported:

Configuration

File Location

backend/configs/agents/context-relevance.json

Configuration Structure

Template Masks

Static per-mode inclusion decisions. Excludes nodes the template doesn’t even reference.

{
  "template_masks": {
    "RESPOND": {
      "episodic_memory": true,
      "working_memory": true,
      "facts": true,
      ...
    },
    "ACKNOWLEDGE": {
      "episodic_memory": false,
      "working_memory": false,
      ...
    }
  }
}
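
As a rough sketch of how a mask like this could be applied, the snippet below maps a mode to a per-node boolean. The helper name `apply_template_mask` and the fail-open default (unknown modes and unlisted nodes are included) are assumptions made for illustration; the elided entries in the config above are left out rather than guessed.

```python
from typing import Dict, Iterable

# Only the entries shown in the config above; the rest are elided there.
TEMPLATE_MASKS: Dict[str, Dict[str, bool]] = {
    "RESPOND": {"episodic_memory": True, "working_memory": True, "facts": True},
    "ACKNOWLEDGE": {"episodic_memory": False, "working_memory": False},
}


def apply_template_mask(
    mode: str,
    all_nodes: Iterable[str],
    masks: Dict[str, Dict[str, bool]] = TEMPLATE_MASKS,
    default: bool = True,  # assumption: nodes without an entry stay included
) -> Dict[str, bool]:
    """Return {node: included?} for the given mode."""
    mode_mask = masks.get(mode, {})
    return {node: mode_mask.get(node, default) for node in all_nodes}


mask = apply_template_mask("ACKNOWLEDGE", ["episodic_memory", "facts"])
```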

Signal Rules

Conditional exclusion rules with strength levels:

{
  "signal_rules": {
    "episodic_memory": [
      {
        "when": {
          "context_warmth_gte": 0.5,
          "working_memory_turns_gte": 2
        },
        "strength": "soft"
      },
      {
        "when": {
          "greeting_pattern": true,
          "prompt_token_count_lt": 6
        },
        "strength": "hard"
      }
    ]
  }
}
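
One plausible way to evaluate rules of this shape is to read the suffix of each predicate key as a comparison. The sketch below is an assumption-laden illustration: the helper names are hypothetical, it handles only the `_gte` and `_lt` suffixes and boolean flags that appear in this document, it treats all predicates inside one `when` block as ANDed, it lets a matching hard rule take precedence over a soft one, and it treats a missing signal as a non-match.

```python
import operator
from typing import Any, Dict, List, Optional

# Only the suffixes that appear in this document's config examples.
_SUFFIX_OPS = {"_gte": operator.ge, "_lt": operator.lt}


def predicate_holds(key: str, expected: Any, signals: Dict[str, Any]) -> bool:
    for suffix, op in _SUFFIX_OPS.items():
        if key.endswith(suffix):
            value = signals.get(key[: -len(suffix)])
            # Assumption: a missing signal never satisfies a comparison.
            return value is not None and op(value, expected)
    # No suffix: plain equality, e.g. "greeting_pattern": true.
    return signals.get(key) == expected


def rule_matches(rule: Dict[str, Any], signals: Dict[str, Any]) -> bool:
    return all(predicate_holds(k, v, signals) for k, v in rule["when"].items())


def exclusion_strength(rules: List[Dict[str, Any]], signals: Dict[str, Any]) -> Optional[str]:
    """Return "hard", "soft", or None if no rule matches."""
    strengths = {r["strength"] for r in rules if rule_matches(r, signals)}
    if "hard" in strengths:
        return "hard"
    return "soft" if "soft" in strengths else None


EPISODIC_RULES = [
    {"when": {"context_warmth_gte": 0.5, "working_memory_turns_gte": 2}, "strength": "soft"},
    {"when": {"greeting_pattern": True, "prompt_token_count_lt": 6}, "strength": "hard"},
]
```

With these rules, a warm conversation with several working-memory turns would soft-exclude episodic_memory, while a short greeting would hard-exclude it.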

Predicates:

Strengths:

Dependencies

Dependency graph: each key maps a node to the nodes it depends on. If the key node is included, its listed dependencies are auto-included:

{
  "dependencies": {
    "episodic_memory": ["gists"],
    "available_tools": ["available_skills"],
    "onboarding_nudge": ["identity_context"],
    "warm_return_hint": ["identity_context"]
  }
}
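
Dependency resolution over a map like this amounts to a small graph traversal. The following is a minimal sketch under the assumption that dependencies are followed transitively; the function name `resolve_dependencies` is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

DEPENDENCIES: Dict[str, List[str]] = {
    "episodic_memory": ["gists"],
    "available_tools": ["available_skills"],
    "onboarding_nudge": ["identity_context"],
    "warm_return_hint": ["identity_context"],
}


def resolve_dependencies(
    included: Iterable[str],
    dependencies: Dict[str, List[str]] = DEPENDENCIES,
) -> Tuple[Set[str], List[str]]:
    """Return (final included set, nodes added only because of a dependency)."""
    resolved = set(included)
    added: List[str] = []
    stack = sorted(resolved)
    while stack:
        node = stack.pop()
        for dep in dependencies.get(node, []):
            if dep not in resolved:  # the visited check also guards against cycles
                resolved.add(dep)
                added.append(dep)
                stack.append(dep)
    return resolved, added


final, deps_added = resolve_dependencies({"episodic_memory", "onboarding_nudge"})
```

The second return value corresponds to what a `deps_added` log field would need, though how the service tracks that is not shown in this document.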

Urgency Overrides

Force-include critical nodes when urgent:

{
  "urgency_overrides": ["working_memory", "world_state", "facts"]
}
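
A sketch of how this list might be applied follows. It assumes `classification` is dict-like (the document only shows `classification.urgency`), and the helper name is hypothetical.

```python
from typing import Any, Dict, List

URGENCY_OVERRIDES: List[str] = ["working_memory", "world_state", "facts"]


def apply_urgency_overrides(
    inclusion: Dict[str, bool],
    classification: Dict[str, Any],
    overrides: List[str] = URGENCY_OVERRIDES,
) -> Dict[str, bool]:
    """Return a copy of the inclusion map with override nodes forced on when urgent."""
    updated = dict(inclusion)
    if classification.get("urgency") == "high":
        for node in overrides:
            updated[node] = True
    return updated


urgent = apply_urgency_overrides({"facts": False, "focus": False}, {"urgency": "high"})
```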

Safety Overrides

Force-include under specific conditions:

{
  "safety_overrides": {
    "identity_context": [
      { "when": { "returning_from_silence": true } },
      { "when": { "context_warmth_lt": 0.3 } }
    ],
    "working_memory": [
      { "when": { "working_memory_turns_gte": 1 } }
    ]
  }
}
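
The structure above suggests that each node carries a list of alternative conditions. The sketch below assumes that any one matching entry is enough to force inclusion (OR across entries, AND within a `when` block); that reading, and the helper names, are assumptions rather than documented behavior.

```python
from typing import Any, Dict, List

SAFETY_OVERRIDES: Dict[str, List[Dict[str, Any]]] = {
    "identity_context": [
        {"when": {"returning_from_silence": True}},
        {"when": {"context_warmth_lt": 0.3}},
    ],
    "working_memory": [
        {"when": {"working_memory_turns_gte": 1}},
    ],
}


def _holds(key: str, expected: Any, signals: Dict[str, Any]) -> bool:
    # Suffix-based comparison; a missing signal is treated as a non-match.
    if key.endswith("_gte"):
        value = signals.get(key[:-4])
        return value is not None and value >= expected
    if key.endswith("_lt"):
        value = signals.get(key[:-3])
        return value is not None and value < expected
    return signals.get(key) == expected


def forced_nodes(
    signals: Dict[str, Any],
    overrides: Dict[str, List[Dict[str, Any]]] = SAFETY_OVERRIDES,
) -> List[str]:
    """Return the sorted list of nodes that a safety override force-includes."""
    return sorted(
        node
        for node, rules in overrides.items()
        if any(all(_holds(k, v, signals) for k, v in r["when"].items()) for r in rules)
    )
```

Here `returning_from_silence` is passed as part of `signals` for simplicity, whereas the usage example in this document passes it as a separate argument.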

Recovery Parameters

{
  "soft_recovery_budget": 1500,
  "soft_recovery_priority": [
    "episodic_memory", "working_memory", "world_state", "facts",
    "active_lists", "focus", "gists"
  ]
}
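
To make the interaction between the budget and the priority list concrete, here is a greedy sketch. It assumes `soft_recovery_budget` caps the total estimated tokens spent on recovered nodes, and it relies on a hypothetical per-node token estimate table; neither detail is confirmed by this document.

```python
from typing import Dict, List, Set, Tuple

SOFT_RECOVERY_BUDGET = 1500
SOFT_RECOVERY_PRIORITY = [
    "episodic_memory", "working_memory", "world_state", "facts",
    "active_lists", "focus", "gists",
]


def recover_soft_excluded(
    soft_excluded: Set[str],
    est_tokens: Dict[str, int],  # hypothetical per-node token estimates
    budget: int = SOFT_RECOVERY_BUDGET,
    priority: List[str] = SOFT_RECOVERY_PRIORITY,
) -> Tuple[List[str], int]:
    """Walk the priority list and recover each soft-excluded node that still fits."""
    recovered: List[str] = []
    spent = 0
    for node in priority:
        if node not in soft_excluded:
            continue
        cost = est_tokens.get(node)
        if cost is None:  # no estimate available: leave the node excluded
            continue
        if spent + cost <= budget:
            recovered.append(node)
            spent += cost
    return recovered, spent


recovered, spent = recover_soft_excluded(
    {"gists", "episodic_memory", "facts"},
    {"episodic_memory": 900, "facts": 700, "gists": 400},
)
```

In this example episodic_memory fits first, facts would exceed the budget and is skipped, and the lower-priority gists still fits in the remaining headroom.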

Usage

In Digest Worker

The service is invoked in digest_worker.py before response generation:

from services.context_relevance_service import ContextRelevanceService

# Compute inclusion map
context_relevance_service = ContextRelevanceService()
inclusion_map = context_relevance_service.compute_inclusion_map(
    mode='RESPOND',                    # cognitive mode
    signals=signals,                   # routing signals
    classification=classification,     # topic classification
    returning_from_silence=returning_from_silence,
    token_budget_remaining=4000        # estimated tokens left
)

# Pass to cortex service
response_data = cortex_service.generate_response(
    system_prompt_template=prompt,
    original_prompt=text,
    classification=classification,
    chat_history=chat_history,
    inclusion_map=inclusion_map,       # ← KEY: gates context retrieval
    ...
)

In Frontal Cortex Service

The generate_response() and _inject_parameters() methods gate context retrieval based on inclusion_map:

def _inject_parameters(self, template, ..., inclusion_map=None):
    _include = lambda node: (inclusion_map or {}).get(node, True)

    # Only submit futures for included nodes
    if _include('gists'):
        futures[executor.submit(...)] = 'gists'
    if _include('episodic_memory'):
        futures[executor.submit(...)] = 'episodes'
    ...

    # Only inject placeholders for included nodes
    result = result.replace('', episodic_context if _include('episodic_memory') else '')
    result = result.replace('', facts_context if _include('facts') else '')
    ...

Backward Compatibility: inclusion_map=None includes every node, which matches the behavior before the pre-parser was introduced. A node missing from the map is also included by default.
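
The gating pattern can be shown in isolation with a self-contained sketch. It assumes the executor is a `concurrent.futures.ThreadPoolExecutor` (the snippet above only shows `executor.submit`), and `fetch_included` and `fetchers` are hypothetical names standing in for the real retrieval calls.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Optional


def fetch_included(
    fetchers: Dict[str, Callable[[], Any]],
    inclusion_map: Optional[Dict[str, bool]] = None,
) -> Dict[str, Any]:
    """Run only the fetchers whose node is included; skipped nodes cost no I/O."""

    def include(node: str) -> bool:
        # None (or a missing key) means "include", preserving the old behavior.
        return (inclusion_map or {}).get(node, True)

    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = {
            executor.submit(fn): node
            for node, fn in fetchers.items()
            if include(node)
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


calls = []
fetchers = {
    "gists": lambda: calls.append("gists") or "gist text",
    "facts": lambda: calls.append("facts") or "fact text",
}
gated = fetch_included(fetchers, {"gists": False})
```

Excluded nodes are never submitted to the executor, so the savings come from skipped retrieval rather than from discarding results afterwards.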

Observability

Structured Logging

Every context relevance computation logs a structured entry:

[CONTEXT RELEVANCE] mode=CLARIFY | excluded_hard=[focus, available_skills, available_tools, warm_return_hint] |
excluded_soft=[episodic_memory] | recovered_soft=[] | deps_added=[] |
overrides_applied=[urgency] | total_included=9 | est_tokens=2100
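
For reference, a line in this format could be produced by a small formatter like the one below. This is only a sketch of the layout shown above; the function name and signature are hypothetical, and the real service may build the entry differently.

```python
from typing import List


def format_relevance_log(
    mode: str,
    excluded_hard: List[str],
    excluded_soft: List[str],
    recovered_soft: List[str],
    deps_added: List[str],
    overrides_applied: List[str],
    total_included: int,
    est_tokens: int,
) -> str:
    """Render one structured log line with pipe-separated key=value fields."""

    def fmt(items: List[str]) -> str:
        return "[" + ", ".join(items) + "]"

    parts = [
        f"mode={mode}",
        f"excluded_hard={fmt(excluded_hard)}",
        f"excluded_soft={fmt(excluded_soft)}",
        f"recovered_soft={fmt(recovered_soft)}",
        f"deps_added={fmt(deps_added)}",
        f"overrides_applied={fmt(overrides_applied)}",
        f"total_included={total_included}",
        f"est_tokens={est_tokens}",
    ]
    return "[CONTEXT RELEVANCE] " + " | ".join(parts)


line = format_relevance_log(
    "CLARIFY",
    ["focus", "available_skills", "available_tools", "warm_return_hint"],
    ["episodic_memory"], [], [], ["urgency"], 9, 2100,
)
```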

Fields:

Warnings

Testing

Comprehensive unit tests cover:

Run tests:

pytest backend/tests/test_context_relevance_service.py -v

Configuration Tuning

Mode-Specific Optimization

Adjust template masks per mode to match your mode-specific templates:

{
  "template_masks": {
    "CUSTOM_MODE": {
      "episodic_memory": false,
      "facts": true,
      ...
    }
  }
}

Signal-Driven Exclusion

Add new signal rules to exclude context for specific conversation patterns:

{
  "signal_rules": {
    "focus": [
      {
        "when": { "greeting_pattern": true },
        "strength": "soft"
      }
    ]
  }
}

Budget-Aware Recovery

Adjust the soft recovery budget to suit the model's token limit. Raising it gives soft-excluded nodes more headroom for recovery:

{
  "soft_recovery_budget": 2000
}

Custom Dependencies

Define new dependency relationships:

{
  "dependencies": {
    "new_node": ["existing_node"]
  }
}

Implementation Details

Service Class

Config File

Integration Points

Disabling the Feature

To disable context relevance pre-parsing entirely, set in config:

{
  "enabled": false
}

All context nodes will then be included, matching the behavior before the pre-parser was introduced. This is useful for debugging or when the optimization is not needed.
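
One way the flag could short-circuit the pipeline is sketched below. It assumes that a missing `enabled` key means enabled, and it returns None so that downstream code falls back to including everything (as with inclusion_map=None); the helper name and the `compute` callback are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Optional


def inclusion_map_or_none(
    config_text: str,
    compute: Callable[[Dict[str, Any]], Dict[str, bool]],
) -> Optional[Dict[str, bool]]:
    """Return None when the feature is disabled, otherwise the computed map."""
    config = json.loads(config_text)
    if not config.get("enabled", True):  # assumption: absent key means enabled
        return None
    return compute(config)


disabled = inclusion_map_or_none('{"enabled": false}', lambda cfg: {"focus": False})
```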

Future Enhancements