The Context Relevance Pre-Parser is a deterministic, rule-based service that optimizes context injection by selectively excluding irrelevant context nodes based on the current cognitive mode and conversation signals. This reduces unnecessary I/O, token usage, and latency without sacrificing response quality.
Previously, every response generation retrieved and injected ALL context nodes (episodic memory, identity, user traits, facts, gists, focus, tools, skills, etc.) into every prompt — regardless of whether the mode-specific template even referenced them.
An ACKNOWLEDGE for “Hey!” would trigger the full retrieval pipeline (multiple Redis reads, a PG vector search, skill queries), none of which the ACKNOWLEDGE template even uses.
| Mode | I/O Skipped | Token Savings |
|---|---|---|
| ACKNOWLEDGE | 5 Redis reads, 1 PG vector search, skill queries | ~1500-3000 |
| CLARIFY (warm) | 1 PG vector search, skill queries | ~500-1500 |
| RESPOND (greeting) | 1 PG vector search, focus queries | ~800-2000 |
| ACT | Identity/trait lookups | ~300-800 |
Pre-parser execution: < 0.5ms (pure dict lookups).
The service applies seven layers in order, each gating context node inclusion:
1. **Template masks** — static per-mode inclusion decisions.
2. **Signal rules** — conditional exclusions with `strength: "hard"` or `"soft"`.
3. **Dependencies** — including a node auto-includes the nodes it depends on (e.g. `episodic_memory` auto-includes `gists`).
4. **Urgency overrides** — when `classification.urgency == 'high'`, force-include `working_memory`, `world_state`, and `facts` for broader awareness.
5. **Safety overrides** — force-include critical nodes under specific conditions.
6. **Soft recovery** — re-include soft-excluded nodes when the token budget has headroom.
7. **Node cap** — `MAX_INCLUDED_NODES = 12`; logs a warning if exceeded to protect prompt integrity.

All context nodes supported:

- `identity_context`
- `onboarding_nudge`
- `user_traits`
- `communication_style`
- `active_lists`
- `client_context`
- `focus`
- `working_memory`
- `facts`
- `gists`
- `episodic_memory`
- `act_history`
- `available_skills`
- `available_tools`
- `world_state`
- `warm_return_hint`
- `identity_modulation`

Configuration lives in `backend/configs/agents/context-relevance.json`.
Static per-mode inclusion decisions. Excludes nodes the template doesn’t even reference.
```json
{
  "template_masks": {
    "RESPOND": {
      "episodic_memory": true,
      "working_memory": true,
      "facts": true,
      ...
    },
    "ACKNOWLEDGE": {
      "episodic_memory": false,
      "working_memory": false,
      ...
    }
  }
}
```
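Applying a template mask is a plain dictionary lookup. A minimal sketch, assuming that nodes absent from a mask (and unknown modes) default to included, mirroring the backward-compatibility default; `apply_template_mask` is a hypothetical name:

```python
# Per-mode template masks (abbreviated); nodes not listed default to included.
TEMPLATE_MASKS = {
    "RESPOND": {"episodic_memory": True, "working_memory": True, "facts": True},
    "ACKNOWLEDGE": {"episodic_memory": False, "working_memory": False},
}

def apply_template_mask(mode: str, all_nodes: list) -> dict:
    """Return {node: included?} for the given mode (assumption: default True)."""
    mask = TEMPLATE_MASKS.get(mode, {})
    return {node: mask.get(node, True) for node in all_nodes}

inclusion = apply_template_mask("ACKNOWLEDGE", ["episodic_memory", "facts"])
```

An unlisted node such as `facts` stays included for ACKNOWLEDGE here only because of the assumed default; the real mask files decide per node.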
Conditional exclusion rules with strength levels:
```json
{
  "signal_rules": {
    "episodic_memory": [
      {
        "when": {
          "context_warmth_gte": 0.5,
          "working_memory_turns_gte": 2
        },
        "strength": "soft"
      },
      {
        "when": {
          "greeting_pattern": true,
          "prompt_token_count_lt": 6
        },
        "strength": "hard"
      }
    ]
  }
}
```
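A `when` clause can be evaluated with a small predicate matcher that recognizes the comparison suffixes (`_gte`, `_gt`, `_lte`, `_lt`, `_eq`) and treats a bare key as exact equality. This is a sketch under those assumptions, not the service's actual evaluator:

```python
import operator

# Map predicate suffixes to comparison functions.
_SUFFIX_OPS = {
    "_gte": operator.ge,
    "_gt": operator.gt,
    "_lte": operator.le,
    "_lt": operator.lt,
    "_eq": operator.eq,
}

def rule_matches(when: dict, signals: dict) -> bool:
    """True if every predicate in `when` holds against `signals`."""
    for key, expected in when.items():
        for suffix, op in _SUFFIX_OPS.items():
            if key.endswith(suffix):
                actual = signals.get(key[: -len(suffix)])
                if actual is None or not op(actual, expected):
                    return False
                break
        else:
            # No comparison suffix: exact match (e.g. greeting_pattern).
            if signals.get(key) != expected:
                return False
    return True

hard_rule = {"greeting_pattern": True, "prompt_token_count_lt": 6}
```

With this matcher, the hard rule above fires for a 4-token greeting but not a 9-token one.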
Predicates:

- `"key": value` — exact equality match
- `"key_gte": threshold` — numeric comparison; also `"key_gt"`, `"key_lte"`, `"key_lt"`, `"key_eq"`
- `"returning_from_silence": true/false` — boolean signal

Strengths:

- `"hard"` — never recovered, even with budget headroom
- `"soft"` — recoverable if the token budget has headroom

Dependency graph; if a child node is included, its parents auto-include:
```json
{
  "dependencies": {
    "episodic_memory": ["gists"],
    "available_tools": ["available_skills"],
    "onboarding_nudge": ["identity_context"],
    "warm_return_hint": ["identity_context"]
  }
}
```
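The auto-include pass can be sketched as a fixpoint loop over the graph above, so chained dependencies also propagate; the loop structure and function name are assumptions:

```python
# Dependency graph from the config: including a node auto-includes the
# nodes it depends on, resolved to a fixpoint so chains propagate.
DEPENDENCIES = {
    "episodic_memory": ["gists"],
    "available_tools": ["available_skills"],
    "onboarding_nudge": ["identity_context"],
    "warm_return_hint": ["identity_context"],
}

def apply_dependencies(inclusion: dict) -> list:
    """Mutate `inclusion` in place; return the nodes that were auto-added."""
    added = []
    changed = True
    while changed:
        changed = False
        for node, deps in DEPENDENCIES.items():
            if inclusion.get(node):
                for dep in deps:
                    if not inclusion.get(dep):
                        inclusion[dep] = True
                        added.append(dep)
                        changed = True
    return added
```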
Force-include critical nodes when urgent:
```json
{
  "urgency_overrides": ["working_memory", "world_state", "facts"]
}
```
Force-include under specific conditions:
```json
{
  "safety_overrides": {
    "identity_context": [
      { "when": { "returning_from_silence": true } },
      { "when": { "context_warmth_lt": 0.3 } }
    ],
    "working_memory": [
      { "when": { "working_memory_turns_gte": 1 } }
    ]
  }
}
```
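A node is force-included when any one of its `when` clauses matches. A sketch of that pass; the inline matcher handles only the predicates used above and is an assumption, not the service's evaluator:

```python
SAFETY_OVERRIDES = {
    "identity_context": [
        {"when": {"returning_from_silence": True}},
        {"when": {"context_warmth_lt": 0.3}},
    ],
    "working_memory": [
        {"when": {"working_memory_turns_gte": 1}},
    ],
}

def _matches(when: dict, signals: dict) -> bool:
    """Tiny predicate check covering _lt, _gte, and exact-match keys."""
    for key, expected in when.items():
        if key.endswith("_lt"):
            if not signals.get(key[:-3], float("inf")) < expected:
                return False
        elif key.endswith("_gte"):
            if not signals.get(key[:-4], float("-inf")) >= expected:
                return False
        elif signals.get(key) != expected:
            return False
    return True

def apply_safety_overrides(inclusion: dict, signals: dict) -> list:
    """Force-include any node whose override rules match; return those nodes."""
    applied = []
    for node, rules in SAFETY_OVERRIDES.items():
        if any(_matches(r["when"], signals) for r in rules):
            inclusion[node] = True
            applied.append(node)
    return applied
```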
- `soft_recovery_budget` (default: 1500) — Token headroom threshold for re-including soft-excluded nodes
- `soft_recovery_priority` (default: listed order) — Priority order for soft recovery

```json
{
  "soft_recovery_budget": 1500,
  "soft_recovery_priority": [
    "episodic_memory", "working_memory", "world_state", "facts",
    "active_lists", "focus", "gists"
  ]
}
```
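One way the recovery pass could work: walk the priority list and re-include soft-excluded nodes while their estimated tokens fit in the budget. The per-node estimates and exact budget semantics here are assumptions:

```python
SOFT_RECOVERY_BUDGET = 1500
SOFT_RECOVERY_PRIORITY = [
    "episodic_memory", "working_memory", "world_state", "facts",
    "active_lists", "focus", "gists",
]

def recover_soft(inclusion: dict, soft_excluded: set, est_tokens: dict,
                 budget: int = SOFT_RECOVERY_BUDGET) -> list:
    """Re-include soft-excluded nodes in priority order while budget lasts."""
    recovered = []
    remaining = budget
    for node in SOFT_RECOVERY_PRIORITY:
        cost = est_tokens.get(node, 0)
        if node in soft_excluded and cost <= remaining:
            inclusion[node] = True
            remaining -= cost
            recovered.append(node)
    return recovered
```

Hard-excluded nodes never enter `soft_excluded`, so they can never be recovered by this pass.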
The service is invoked in `digest_worker.py` before response generation:
```python
from services.context_relevance_service import ContextRelevanceService

# Compute inclusion map
context_relevance_service = ContextRelevanceService()
inclusion_map = context_relevance_service.compute_inclusion_map(
    mode='RESPOND',                                 # cognitive mode
    signals=signals,                                # routing signals
    classification=classification,                  # topic classification
    returning_from_silence=returning_from_silence,
    token_budget_remaining=4000                     # estimated tokens left
)

# Pass to cortex service
response_data = cortex_service.generate_response(
    system_prompt_template=prompt,
    original_prompt=text,
    classification=classification,
    chat_history=chat_history,
    inclusion_map=inclusion_map,                    # ← KEY: gates context retrieval
    ...
)
```
The `generate_response()` and `_inject_parameters()` methods gate context retrieval based on `inclusion_map`:
```python
def _inject_parameters(self, template, ..., inclusion_map=None):
    _include = lambda node: (inclusion_map or {}).get(node, True)

    # Only submit futures for included nodes
    if _include('gists'):
        futures[executor.submit(...)] = 'gists'
    if _include('episodic_memory'):
        futures[executor.submit(...)] = 'episodes'
    ...

    # Only inject placeholders for included nodes
    result = result.replace('', episodic_context if _include('episodic_memory') else '')
    result = result.replace('', facts_context if _include('facts') else '')
    ...
```
Backward compatibility: `inclusion_map=None` defaults to including everything (the pre-existing behavior).
Every context relevance computation logs a structured entry:
```
[CONTEXT RELEVANCE] mode=CLARIFY | excluded_hard=[focus, available_skills, available_tools, warm_return_hint] |
  excluded_soft=[episodic_memory] | recovered_soft=[] | deps_added=[] |
  overrides_applied=[urgency] | total_included=9 | est_tokens=2100
```
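The entry is plain string formatting over the decision record. A sketch that reproduces the field layout above; the function name is hypothetical:

```python
def format_relevance_log(mode, excluded_hard, excluded_soft, recovered_soft,
                         deps_added, overrides_applied, total_included,
                         est_tokens) -> str:
    """Render the structured [CONTEXT RELEVANCE] log entry."""
    fmt = lambda nodes: "[" + ", ".join(nodes) + "]"
    return (
        f"[CONTEXT RELEVANCE] mode={mode} | "
        f"excluded_hard={fmt(excluded_hard)} | "
        f"excluded_soft={fmt(excluded_soft)} | "
        f"recovered_soft={fmt(recovered_soft)} | "
        f"deps_added={fmt(deps_added)} | "
        f"overrides_applied={fmt(overrides_applied)} | "
        f"total_included={total_included} | est_tokens={est_tokens}"
    )

line = format_relevance_log("CLARIFY", ["focus"], ["episodic_memory"],
                            [], [], ["urgency"], 9, 2100)
```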
Fields:
- `mode` — Cognitive mode
- `excluded_hard` — Hard-excluded nodes (never recovered)
- `excluded_soft` — Soft-excluded nodes (recoverable)
- `recovered_soft` — Soft-excluded nodes recovered due to budget headroom
- `deps_added` — Dependencies auto-included
- `overrides_applied` — Overrides applied (urgency, safety)
- `total_included` — Total number of included nodes
- `est_tokens` — Estimated tokens for the included nodes

Invalid configuration raises a `ConfigError` at config load time.

The service ships with comprehensive unit tests. Run them with:
```shell
pytest backend/tests/test_context_relevance_service.py -v
```
Adjust template masks per mode to match your mode-specific templates:
```json
{
  "template_masks": {
    "CUSTOM_MODE": {
      "episodic_memory": false,
      "facts": true,
      ...
    }
  }
}
```
Add new signal rules to exclude context for specific conversation patterns:
```json
{
  "signal_rules": {
    "focus": [
      {
        "when": { "greeting_pattern": true },
        "strength": "soft"
      }
    ]
  }
}
```
Adjust the soft recovery budget to match your model's token limits; for example, increase the headroom for lower-token models:

```json
{
  "soft_recovery_budget": 2000
}
```
Define new dependency relationships:
```json
{
  "dependencies": {
    "new_node": ["existing_node"]
  }
}
```
Key files:

- `backend/services/context_relevance_service.py` — `ContextRelevanceService` (main service class) and `compute_inclusion_map()` (core method; returns `{node: True/False}`)
- `backend/configs/agents/context-relevance.json` — Configuration
- `backend/workers/digest_worker.py` — Calls the service before `generate_for_mode()`
- `backend/services/frontal_cortex_service.py` — Uses `inclusion_map` in `_inject_parameters()`

To disable context relevance pre-parsing entirely, set in config:
```json
{
  "enabled": false
}
```
All context nodes will then be included (the pre-existing behavior). This is useful for debugging or when the optimization is not needed.