The Context Relevance Pre-Parser is a deterministic, rule-based service that optimizes context injection by selectively excluding irrelevant context nodes based on the current cognitive mode and conversation signals. This reduces unnecessary I/O, token usage, and latency without sacrificing response quality.
Previously, every response generation retrieved and injected ALL context nodes (episodic memory, identity, user traits, facts, gists, focus, tools, skills, etc.) into every prompt — regardless of whether the mode-specific template even referenced them.
An ACKNOWLEDGE for “Hey!” would trigger the full retrieval pipeline (multiple Redis reads, a PG vector search, skill queries), none of which the ACKNOWLEDGE template even uses.
| Mode | I/O Skipped | Token Savings |
|---|---|---|
| ACKNOWLEDGE | 5 Redis reads, 1 PG vector search, skill queries | ~1500-3000 |
| CLARIFY (warm) | 1 PG vector search, skill queries | ~500-1500 |
| RESPOND (greeting) | 1 PG vector search, focus queries | ~800-2000 |
| ACT | Identity/trait lookups | ~300-800 |
Pre-parser execution: < 0.5ms (pure dict lookups).
The service applies seven layers in order, each gating context node inclusion:
1. **Template masks** — static per-mode inclusion decisions.
2. **Signal rules** — conditional exclusions with `strength: "hard"` or `"soft"`.
3. **Dependencies** — including a node auto-includes the nodes it depends on (e.g. `episodic_memory` auto-includes `gists`).
4. **Urgency overrides** — when `classification.urgency == 'high'`, force-include `working_memory`, `world_state`, and `facts` for broader awareness.
5. **Safety overrides** — force-include critical nodes under specific conditions.
6. **Soft recovery** — re-include soft-excluded nodes when the token budget has headroom.
7. **Node cap** — `MAX_INCLUDED_NODES = 12`; logs a warning if exceeded to protect prompt integrity.

All context nodes supported:

- `identity_context`
- `onboarding_nudge`
- `user_traits`
- `communication_style`
- `active_lists`
- `client_context`
- `focus`
- `working_memory`
- `facts`
- `gists`
- `episodic_memory`
- `act_history`
- `available_skills`
- `available_tools`
- `world_state`
- `warm_return_hint`
- `identity_modulation`

Configuration lives in `backend/configs/agents/context-relevance.json`.
Static per-mode inclusion decisions. Excludes nodes the template doesn’t even reference.
```json
{
  "template_masks": {
    "RESPOND": {
      "episodic_memory": true,
      "working_memory": true,
      "facts": true,
      ...
    },
    "ACKNOWLEDGE": {
      "episodic_memory": false,
      "working_memory": false,
      ...
    }
  }
}
```
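Applying a template mask is a plain dictionary lookup. A minimal sketch, assuming that nodes absent from a mask (and unknown modes) default to included, mirroring the backward-compatibility default; `apply_template_mask` is a hypothetical name:

```python
# Per-mode template masks (abbreviated); nodes not listed default to included.
TEMPLATE_MASKS = {
    "RESPOND": {"episodic_memory": True, "working_memory": True, "facts": True},
    "ACKNOWLEDGE": {"episodic_memory": False, "working_memory": False},
}

def apply_template_mask(mode: str, all_nodes: list) -> dict:
    """Return {node: included?} for the given mode (assumption: default True)."""
    mask = TEMPLATE_MASKS.get(mode, {})
    return {node: mask.get(node, True) for node in all_nodes}

inclusion = apply_template_mask("ACKNOWLEDGE", ["episodic_memory", "facts"])
```

An unlisted node such as `facts` stays included for ACKNOWLEDGE here only because of the assumed default; the real mask files decide per node.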
Conditional exclusion rules with strength levels:
```json
{
  "signal_rules": {
    "episodic_memory": [
      {
        "when": {
          "context_warmth_gte": 0.5,
          "working_memory_turns_gte": 2
        },
        "strength": "soft"
      },
      {
        "when": {
          "greeting_pattern": true,
          "prompt_token_count_lt": 6
        },
        "strength": "hard"
      }
    ]
  }
}
```
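A `when` clause can be evaluated with a small predicate matcher that recognizes the comparison suffixes (`_gte`, `_gt`, `_lte`, `_lt`, `_eq`) and treats a bare key as exact equality. This is a sketch under those assumptions, not the service's actual evaluator:

```python
import operator

# Map predicate suffixes to comparison functions.
_SUFFIX_OPS = {
    "_gte": operator.ge,
    "_gt": operator.gt,
    "_lte": operator.le,
    "_lt": operator.lt,
    "_eq": operator.eq,
}

def rule_matches(when: dict, signals: dict) -> bool:
    """True if every predicate in `when` holds against `signals`."""
    for key, expected in when.items():
        for suffix, op in _SUFFIX_OPS.items():
            if key.endswith(suffix):
                actual = signals.get(key[: -len(suffix)])
                if actual is None or not op(actual, expected):
                    return False
                break
        else:
            # No comparison suffix: exact match (e.g. greeting_pattern).
            if signals.get(key) != expected:
                return False
    return True

hard_rule = {"greeting_pattern": True, "prompt_token_count_lt": 6}
```

With this matcher, the hard rule above fires for a 4-token greeting but not a 9-token one.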
Predicates:

- `"key": value` — exact equality match
- `"key_gte": threshold` — numeric comparison; also `"key_gt"`, `"key_lte"`, `"key_lt"`, `"key_eq"`
- `"returning_from_silence": true/false` — boolean signal

Strengths:

- `"hard"` — never recovered, even with budget headroom
- `"soft"` — recoverable if the token budget has headroom

Dependency graph; if a child node is included, its parents auto-include:
```json
{
  "dependencies": {
    "episodic_memory": ["gists"],
    "available_tools": ["available_skills"],
    "onboarding_nudge": ["identity_context"],
    "warm_return_hint": ["identity_context"]
  }
}
```
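The auto-include pass can be sketched as a fixpoint loop over the graph above, so chained dependencies also propagate; the loop structure and function name are assumptions:

```python
# Dependency graph from the config: including a node auto-includes the
# nodes it depends on, resolved to a fixpoint so chains propagate.
DEPENDENCIES = {
    "episodic_memory": ["gists"],
    "available_tools": ["available_skills"],
    "onboarding_nudge": ["identity_context"],
    "warm_return_hint": ["identity_context"],
}

def apply_dependencies(inclusion: dict) -> list:
    """Mutate `inclusion` in place; return the nodes that were auto-added."""
    added = []
    changed = True
    while changed:
        changed = False
        for node, deps in DEPENDENCIES.items():
            if inclusion.get(node):
                for dep in deps:
                    if not inclusion.get(dep):
                        inclusion[dep] = True
                        added.append(dep)
                        changed = True
    return added
```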
Force-include critical nodes when urgent:
```json
{
  "urgency_overrides": ["working_memory", "world_state", "facts"]
}
```
Force-include under specific conditions:
```json
{
  "safety_overrides": {
    "identity_context": [
      { "when": { "returning_from_silence": true } },
      { "when": { "context_warmth_lt": 0.3 } }
    ],
    "working_memory": [
      { "when": { "working_memory_turns_gte": 1 } }
    ]
  }
}
```
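A node is force-included when any one of its `when` clauses matches. A sketch of that pass; the inline matcher handles only the predicates used above and is an assumption, not the service's evaluator:

```python
SAFETY_OVERRIDES = {
    "identity_context": [
        {"when": {"returning_from_silence": True}},
        {"when": {"context_warmth_lt": 0.3}},
    ],
    "working_memory": [
        {"when": {"working_memory_turns_gte": 1}},
    ],
}

def _matches(when: dict, signals: dict) -> bool:
    """Tiny predicate check covering _lt, _gte, and exact-match keys."""
    for key, expected in when.items():
        if key.endswith("_lt"):
            if not signals.get(key[:-3], float("inf")) < expected:
                return False
        elif key.endswith("_gte"):
            if not signals.get(key[:-4], float("-inf")) >= expected:
                return False
        elif signals.get(key) != expected:
            return False
    return True

def apply_safety_overrides(inclusion: dict, signals: dict) -> list:
    """Force-include any node whose override rules match; return those nodes."""
    applied = []
    for node, rules in SAFETY_OVERRIDES.items():
        if any(_matches(r["when"], signals) for r in rules):
            inclusion[node] = True
            applied.append(node)
    return applied
```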
- `soft_recovery_budget` (default: 1500) — Token headroom threshold for re-including soft-excluded nodes
- `soft_recovery_priority` (default: listed order) — Priority order for soft recovery

```json
{
  "soft_recovery_budget": 1500,
  "soft_recovery_priority": [
    "episodic_memory", "working_memory", "world_state", "facts",
    "active_lists", "focus", "gists"
  ]
}
```
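One way the recovery pass could work: walk the priority list and re-include soft-excluded nodes while their estimated tokens fit in the budget. The per-node estimates and exact budget semantics here are assumptions:

```python
SOFT_RECOVERY_BUDGET = 1500
SOFT_RECOVERY_PRIORITY = [
    "episodic_memory", "working_memory", "world_state", "facts",
    "active_lists", "focus", "gists",
]

def recover_soft(inclusion: dict, soft_excluded: set, est_tokens: dict,
                 budget: int = SOFT_RECOVERY_BUDGET) -> list:
    """Re-include soft-excluded nodes in priority order while budget lasts."""
    recovered = []
    remaining = budget
    for node in SOFT_RECOVERY_PRIORITY:
        cost = est_tokens.get(node, 0)
        if node in soft_excluded and cost <= remaining:
            inclusion[node] = True
            remaining -= cost
            recovered.append(node)
    return recovered
```

Hard-excluded nodes never enter `soft_excluded`, so they can never be recovered by this pass.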
The service is invoked in `digest_worker.py` before response generation:
```python
from services.context_relevance_service import ContextRelevanceService

# Compute inclusion map
context_relevance_service = ContextRelevanceService()
inclusion_map = context_relevance_service.compute_inclusion_map(
    mode='RESPOND',                                 # cognitive mode
    signals=signals,                                # routing signals
    classification=classification,                  # topic classification
    returning_from_silence=returning_from_silence,
    token_budget_remaining=4000                     # estimated tokens left
)

# Pass to cortex service
response_data = cortex_service.generate_response(
    system_prompt_template=prompt,
    original_prompt=text,
    classification=classification,
    chat_history=chat_history,
    inclusion_map=inclusion_map,                    # ← KEY: gates context retrieval
    ...
)
```
The `generate_response()` and `_inject_parameters()` methods gate context retrieval based on `inclusion_map`:
```python
def _inject_parameters(self, template, ..., inclusion_map=None):
    _include = lambda node: (inclusion_map or {}).get(node, True)

    # Only submit futures for included nodes
    if _include('gists'):
        futures[executor.submit(...)] = 'gists'
    if _include('episodic_memory'):
        futures[executor.submit(...)] = 'episodes'
    ...

    # Only inject placeholders for included nodes
    result = result.replace('', episodic_context if _include('episodic_memory') else '')
    result = result.replace('', facts_context if _include('facts') else '')
    ...
```
Backward compatibility: `inclusion_map=None` defaults to including everything (the pre-existing behavior).
Every context relevance computation logs a structured entry:
```
[CONTEXT RELEVANCE] mode=CLARIFY | excluded_hard=[focus, available_skills, available_tools, warm_return_hint] |
  excluded_soft=[episodic_memory] | recovered_soft=[] | deps_added=[] |
  overrides_applied=[urgency] | total_included=9 | est_tokens=2100
```
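The entry is plain string formatting over the decision record. A sketch that reproduces the field layout above; the function name is hypothetical:

```python
def format_relevance_log(mode, excluded_hard, excluded_soft, recovered_soft,
                         deps_added, overrides_applied, total_included,
                         est_tokens) -> str:
    """Render the structured [CONTEXT RELEVANCE] log entry."""
    fmt = lambda nodes: "[" + ", ".join(nodes) + "]"
    return (
        f"[CONTEXT RELEVANCE] mode={mode} | "
        f"excluded_hard={fmt(excluded_hard)} | "
        f"excluded_soft={fmt(excluded_soft)} | "
        f"recovered_soft={fmt(recovered_soft)} | "
        f"deps_added={fmt(deps_added)} | "
        f"overrides_applied={fmt(overrides_applied)} | "
        f"total_included={total_included} | est_tokens={est_tokens}"
    )

line = format_relevance_log("CLARIFY", ["focus"], ["episodic_memory"],
                            [], [], ["urgency"], 9, 2100)
```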
Fields:
- `mode` — Cognitive mode
- `excluded_hard` — Hard-excluded nodes (never recovered)
- `excluded_soft` — Soft-excluded nodes (recoverable)
- `recovered_soft` — Soft-excluded nodes recovered due to budget headroom
- `deps_added` — Dependencies auto-included
- `overrides_applied` — Overrides applied (urgency, safety)
- `total_included` — Total number of included nodes
- `est_tokens` — Estimated tokens for the included nodes

Invalid configuration raises a `ConfigError` at config load time.

The service ships with comprehensive unit tests. Run them with:
```shell
pytest backend/tests/test_context_relevance_service.py -v
```
Adjust template masks per mode to match your mode-specific templates:
```json
{
  "template_masks": {
    "CUSTOM_MODE": {
      "episodic_memory": false,
      "facts": true,
      ...
    }
  }
}
```
Add new signal rules to exclude context for specific conversation patterns:
```json
{
  "signal_rules": {
    "focus": [
      {
        "when": { "greeting_pattern": true },
        "strength": "soft"
      }
    ]
  }
}
```
Adjust the soft recovery budget to match your model's token limits; for example, increase the headroom for lower-token models:

```json
{
  "soft_recovery_budget": 2000
}
```
Define new dependency relationships:
```json
{
  "dependencies": {
    "new_node": ["existing_node"]
  }
}
```
Key files:

- `backend/services/context_relevance_service.py` — `ContextRelevanceService` (main service class) and `compute_inclusion_map()` (core method; returns `{node: True/False}`)
- `backend/configs/agents/context-relevance.json` — Configuration
- `backend/workers/digest_worker.py` — Calls the service before `generate_for_mode()`
- `backend/services/frontal_cortex_service.py` — Uses `inclusion_map` in `_inject_parameters()`

To disable context relevance pre-parsing entirely, set in config:
```json
{
  "enabled": false
}
```
All context nodes will then be included (the pre-existing behavior). This is useful for debugging or when the optimization is not needed.