This document is the single authoritative visual map of how a user message travels through Chalie. Every branch, every storage hit, every LLM call, and every background cycle is shown here.
Legend
⚡ DET – Deterministic (no LLM, <10ms)
🧠 LLM – LLM inference call
📥 R – Redis READ
📤 R – Redis WRITE
📥 DB – PostgreSQL READ
📤 DB – PostgreSQL WRITE
⏱ ~Xms – Typical latency
┌───────────────────────┐
│   User Message POST   │
│      /chat (HTTP)     │
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│  SSE Channel opened   │
│   sse:{request_id}    │
│   📤 R sse_pending    │
└───────────┬───────────┘
            │ daemon thread
┌───────────▼───────────┐
│    digest_worker()    │◄──── background
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│        PHASE A        │
│      Ingestion &      │
│   Context Assembly    │
│       (see §2)        │
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│        PHASE B        │
│   Signal Collection   │
│       & Triage        │
│       (see §3)        │
└───────────┬───────────┘
            │
┌───────────┴─────────────────────────────────┐
│               Triage Branch                 │
│          (CognitiveTriageService)           │
└────────┬─────────────────┬───────────────┬──┘
         │                 │               │
┌────────▼────────┐ ┌──────▼────────┐ ┌────▼──────────────┐
│     PATH A      │ │    PATH B     │ │     PATH C        │
│   Social Exit   │ │     ACT →     │ │    RESPOND /      │
│  CANCEL/IGNORE/ │ │  Tool Worker  │ │    CLARIFY /      │
│   ACKNOWLEDGE   │ │  (RQ Queue)   │ │   ACKNOWLEDGE     │
└────────┬────────┘ └──────┬────────┘ └────┬──────────────┘
         │                 │               │
┌────────▼────────┐ ┌──────▼────────┐ ┌────▼──────────────┐
│  Empty response │ │  Background   │ │   Mode Router     │
│   + WM append   │ │   execution   │ │  (Deterministic)  │
│   📤 R 📤 DB    │ │   (see §5)    │ │   → Generation    │
└─────────────────┘ └───────────────┘ │     (see §4)      │
                                      └────┬──────────────┘
                                           │
                                      ┌────▼──────────────┐
                                      │     PHASE D       │
                                      │  Post-Response    │
                                      │  Commit  (see §6) │
                                      └────┬──────────────┘
                                           │
                                      ┌────▼──────────────┐
                                      │  📤 R pub/sub     │
                                      │  output:{id}      │
                                      │  SSE → Client     │
                                      └───────────────────┘
BACKGROUND (always running, independent of user messages):
PATH D ── Persistent Task Worker (30min ± jitter) (see §7)
PATH E ── Cognitive Drift Engine (300s, idle-only) (see §8)
Runs immediately for every message, before any routing decision.
┌──────────────────────────────────────────────────────────────────────┐
│                      PHASE A: Context Assembly                       │
│                                                                      │
│ Step 1  IIP Hook (Identity Promotion)              ⚡ DET <5ms       │
│         Regex: "call me X", "my name is X", …                        │
│         Match → 📤 R 📤 DB (trait + identity)                        │
│         No match → continue                                          │
│         ▼                                                            │
│ Step 2  Working Memory                             📥 R              │
│         key: wm:{thread_id} (list, 4 turns, 24h TTL)                 │
│         ─────────────────────────────────────────────                │
│ Step 3  Gists                                      📥 R              │
│         key: gist:{topic} (sorted set, 30min TTL)                    │
│         ─────────────────────────────────────────────                │
│ Step 4  Facts                                      📥 R              │
│         key: fact:{topic}:{key} (24h TTL)                            │
│         ─────────────────────────────────────────────                │
│ Step 5  World State                                📥 R              │
│         key: world_state:{topic}                                     │
│         ─────────────────────────────────────────────                │
│ Step 6  FOK (Feeling-of-Knowing) score             📥 R              │
│         key: fok:{topic} (float 0.0–5.0)                             │
│         ─────────────────────────────────────────────                │
│ Step 7  Context Warmth                             ⚡ DET            │
│         warmth = (wm_score + gist_score + world_score) / 3           │
│         ─────────────────────────────────────────────                │
│ Step 8  Memory Confidence                          ⚡ DET            │
│         conf = 0.4×fok + 0.4×warmth + 0.2×density                    │
│         is_new_topic → conf *= 0.7                                   │
│         ─────────────────────────────────────────────                │
│ Step 9  Session / Focus Tracking                   📥📤 R            │
│         topic_streak:{thread_id} (2h TTL)                            │
│         focus:{thread_id} (auto-infer after N exchanges)             │
│         Silence gap > 2700s → trigger episodic memory                │
└──────────────────────────────────────────────────────────────────────┘
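The warmth and confidence arithmetic in Steps 7–8 is simple enough to sketch directly. One caveat: Step 6 stores `fok` on a 0.0–5.0 scale, so the sketch assumes it is normalized to 0.0–1.0 before blending; that normalization is an assumption, not something the diagram states.

```python
def context_warmth(wm_score: float, gist_score: float, world_score: float) -> float:
    """Step 7: average the three context-presence scores (each 0.0-1.0)."""
    return (wm_score + gist_score + world_score) / 3


def memory_confidence(fok: float, warmth: float, density: float,
                      is_new_topic: bool) -> float:
    """Step 8: weighted blend of signals, discounted 30% for new topics.

    `fok` is assumed already normalized to 0.0-1.0 here.
    """
    conf = 0.4 * fok + 0.4 * warmth + 0.2 * density
    if is_new_topic:
        conf *= 0.7
    return conf
```

With all signals at full strength on a known topic, confidence is 1.0; the same signals on a brand-new topic yield 0.7, which biases the router toward clarifying rather than asserting.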
This phase produces the routing decision in two separate layers.
┌──────────────────────────────────────────────────────────────────────┐
│ LAYER 1: Intent Classification                     ⚡ DET ~5ms       │
│                                                                      │
│ IntentClassifierService                                              │
│ Input:  text, topic, warmth, memory_confidence, wm_turns             │
│ Output: { intent_type, complexity, confidence }                      │
│ No external calls – pure heuristics                                  │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
┌───────────────────────────────▼──────────────────────────────────────┐
│ LAYER 2: Cognitive Triage                                            │
│ CognitiveTriageService (4-step pipeline)                             │
│                                                                      │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Step 2a  Social Filter                           ⚡ DET ~1ms     │ │
│ │                                                                  │ │
│ │ Pattern → Result (no LLM, returns immediately)                   │ │
│ │ ─────────────────────────────────────────────────                │ │
│ │ Greeting / positive feedback (short)  → ACKNOWLEDGE              │ │
│ │ Cancel / nevermind                    → CANCEL                   │ │
│ │ Self-resolved / topic drop            → IGNORE                   │ │
│ │ Empty input                           → IGNORE                   │ │
│ │                                                                  │ │
│ │ If matched ──► PATH A (Social Exit)                              │ │
│ └──────────────────────────┬───────────────────────────────────────┘ │
│                            │ not matched                             │
│ ┌──────────────────────────▼───────────────────────────────────────┐ │
│ │ Step 2b  Cognitive Triage LLM                   🧠 LLM ~100–300ms│ │
│ │                                                                  │ │
│ │ Config:  cognitive-triage.json                                   │ │
│ │ Prompt:  cognitive-triage.md                                     │ │
│ │ Model:   lightweight (qwen3:4b or smaller)                       │ │
│ │ Timeout: 500ms (falls back to heuristics on timeout)             │ │
│ │                                                                  │ │
│ │ Context sent to LLM:                                             │ │
│ │  • User text                                                     │ │
│ │  • Previous mode + tools used                                    │ │
│ │  • Tool summaries (from profile service)                         │ │
│ │  • Working memory summary (last 2 turns)                         │ │
│ │  • context_warmth, memory_confidence, gist_count                 │ │
│ │                                                                  │ │
│ │ LLM output (JSON):                                               │ │
│ │   branch: respond | clarify | act                                │ │
│ │   mode: RESPOND|CLARIFY|ACT|ACKNOWLEDGE…                         │ │
│ │   tools: ["tool1", …] (up to 3)                                  │ │
│ │   skills: ["recall", …]                                          │ │
│ │   confidence_internal: 0.0–1.0                                   │ │
│ │   confidence_tool_need: 0.0–1.0                                  │ │
│ │   freshness_risk: 0.0–1.0                                        │ │
│ └──────────────────────────┬───────────────────────────────────────┘ │
│                            │                                         │
│ ┌──────────────────────────▼───────────────────────────────────────┐ │
│ │ Step 2c  Self-Eval Sanity Check                  ⚡ DET ~1ms     │ │
│ │                                                                  │ │
│ │  • Cap tool list at 3 contextual skills                          │ │
│ │  • Validate skill names                                          │ │
│ │  • Factual question detected → may force ACT                     │ │
│ │  • URL in message detected   → may force ACT                     │ │
│ │  • Can OVERRIDE LLM result if heuristics detect issues           │ │
│ └──────────────────────────┬───────────────────────────────────────┘ │
│                            │                                         │
│ ┌──────────────────────────▼───────────────────────────────────────┐ │
│ │ Step 2d  Triage Calibration Log                  📤 DB ~1ms      │ │
│ │ Table: triage_calibration                                        │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                ┌───────────────┼────────────────┐
                │               │                │
          branch=social    branch=act     branch=respond
                │               │                │
             PATH A          PATH B           PATH C
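The cascade above (cheap regex exit first, LLM only on a miss) can be sketched as a short function. The pattern table here is illustrative; the real social filter is richer, and `llm_call` stands in for the Step 2b triage call with its 500ms timeout fallback.

```python
import re

# Hypothetical patterns standing in for the real social-filter rules.
SOCIAL_PATTERNS = {
    "ACKNOWLEDGE": re.compile(r"^(hi|hey|hello|thanks|thank you)\b", re.I),
    "CANCEL": re.compile(r"\b(cancel|nevermind|never mind)\b", re.I),
}


def triage(text: str, llm_call=None) -> str:
    """Steps 2a-2b cascade: deterministic exits before any LLM is spent."""
    if not text.strip():
        return "IGNORE"                  # empty input -> PATH A
    for result, pattern in SOCIAL_PATTERNS.items():
        if pattern.search(text):
            return result                # social exit, no LLM call
    if llm_call is None:
        return "RESPOND"                 # heuristic fallback (timeout path)
    return llm_call(text)                # Step 2b: lightweight triage LLM
```

The ordering is the point: every message pays the ~1ms regex cost, but only non-social messages pay the ~100–300ms LLM cost.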
┌──────────────────────────────────────────────────────────────────────┐
│ ModeRouterService                                  ⚡ DET ~5ms       │
│                                                                      │
│ Signal inputs (all already in memory from Phase A/B):                │
│   context_warmth        topic_confidence       has_question_mark     │
│   working_memory_turns  fok_score              interrogative_words   │
│   gist_count            is_new_topic           greeting_pattern      │
│   fact_count            world_state_present    explicit_feedback     │
│   intent_type           intent_complexity      intent_confidence     │
│   information_density   implicit_reference     prompt_token_count    │
│                                                                      │
│ Scoring formula (per mode):                                          │
│   score[mode] = base_score + Σ(weight[signal] × signal_value)        │
│   Anti-oscillation: hysteresis dampening from prior mode             │
│                                                                      │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Tie-breaker?                                     🧠 LLM ~100ms   │ │
│ │ Triggered when: top-2 scores within effective_margin             │ │
│ │ Model: qwen3:4b                                                  │ │
│ │ Input: mode descriptions + context summary                       │ │
│ │ Output: JSON → pick mode A or B                                  │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│ 📤 DB routing_decisions table                                        │
│ Fields: mode, scores, tiebreaker_used, margin, signal_snapshot       │
└───────────────────────────────┬──────────────────────────────────────┘
                     ┌──────────┼──────────┐
                     │          │          │
                  RESPOND    CLARIFY   ACKNOWLEDGE
                     │          │          │
                     └──────────┴──────────┘
                                │
┌───────────────────────────────▼──────────────────────────────────────┐
│ FrontalCortexService                               🧠 LLM ~500ms–2s  │
│                                                                      │
│ Prompt = soul.md + identity-core.md + frontal-cortex-{mode}.md       │
│                                                                      │
│ Context injected:                                                    │
│  • Working memory (thread_id)                                        │
│  • Chat history                                                      │
│  • Assembled context (semantic retrieval)                            │
│  • Drift gists (if idle thoughts exist)                              │
│  • Context relevance inclusion map (computed dynamically)            │
│                                                                      │
│ Config files:                                                        │
│   RESPOND     → frontal-cortex-respond.json                          │
│   CLARIFY     → frontal-cortex-clarify.json                          │
│   ACKNOWLEDGE → frontal-cortex.json (base)                           │
│                                                                      │
│ Output: { response: str, confidence: float, mode: str }              │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                          Phase D (§6)
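The per-mode scoring formula from §4 can be sketched as below. The weight values, base scores, and the size of the hysteresis bonus are illustrative assumptions, not the shipped configuration.

```python
def score_modes(signals, weights, base_scores, prior_mode=None,
                hysteresis=0.1):
    """score[mode] = base_score + sum(weight[signal] * signal_value),
    plus a small bonus for the prior mode (anti-oscillation dampening).
    The hysteresis value of 0.1 is an illustrative assumption."""
    scores = {}
    for mode, mode_weights in weights.items():
        score = base_scores.get(mode, 0.0)
        score += sum(w * signals.get(name, 0.0)
                     for name, w in mode_weights.items())
        if mode == prior_mode:
            score += hysteresis
        scores[mode] = score
    return scores
```

Because scoring is a pure weighted sum, the router stays deterministic and auditable; the LLM tie-breaker only enters when two modes land within `effective_margin` of each other.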
Triggered when triage branch=respond but mode router selects ACT, or directly from triage branch=act via the internal path in route_and_generate.
┌──────────────────────────────────────────────────────────────────────┐
│ ActLoopService                                                       │
│ Config: cumulative_timeout=60s  per_action=10s  max_iterations=5     │
│                                                                      │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Iteration N                                                      │ │
│ │                                                                  │ │
│ │ 1. Generate action plan                          🧠 LLM          │ │
│ │    Prompt: frontal-cortex-act.md                                 │ │
│ │    Input: user text + act_history (prior results)                │ │
│ │    Output: [{ type, params, … }, …]                              │ │
│ │                                                                  │ │
│ │ 2. Termination check                             ⚡ DET          │ │
│ │    • Fatigue budget exceeded?                                    │ │
│ │    • Cumulative timeout reached?                                 │ │
│ │    • Max iterations reached?                                     │ │
│ │    • No actions in plan?                                         │ │
│ │    • Same action repeated 3× in a row?                           │ │
│ │    If any → exit loop                                            │ │
│ │                                                                  │ │
│ │ 3. Execute actions                               ⚡/🧠 varies    │ │
│ │    ActDispatcherService                                          │ │
│ │    Chains outputs: result[N] → input[N+1]                        │ │
│ │    Action types:                                                 │ │
│ │      recall, memorize, introspect, associate                     │ │
│ │      schedule, list, focus, persistent_task                      │ │
│ │      (+ external tools via tool_worker RQ)                       │ │
│ │                                                                  │ │
│ │ 4. Accumulate fatigue                            ⚡ DET          │ │
│ │    cost *= (1.0 + fatigue_growth_rate × iteration)               │ │
│ │    fatigue += cost                                               │ │
│ │                                                                  │ │
│ │ 5. Log iteration                                 📤 DB           │ │
│ │    Table: cortex_iterations                                      │ │
│ │    Fields: iteration_number, actions_executed,                   │ │
│ │            execution_time_ms, fatigue, mode                      │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│      │                                                               │
│      └──► repeat if can_continue()                                   │
│                                                                      │
│ After loop terminates:                                               │
│   1. Re-route → terminal mode (force previous_mode='ACT')            │
│      Mode router (deterministic, skip_tiebreaker=True)               │
│      Typically selects RESPOND                                       │
│   2. Generate terminal response (FrontalCortex)    🧠 LLM            │
│      act_history passed as context                                   │
│      All-card actions → skip text (mode='IGNORE')                    │
└──────────────────────────────────────────────────────────────────────┘
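The bounded loop with the growing fatigue cost (steps 2 and 4 above) can be sketched as follows. The fatigue budget of 10.0 and a unit base cost per action are assumptions for illustration; only the growth formula comes from the diagram.

```python
def run_act_loop(plan_fn, execute_fn, max_iterations=5,
                 fatigue_budget=10.0, fatigue_growth_rate=0.25):
    """Bounded ACT loop: per-iteration cost grows with the iteration
    index, so later iterations drain the budget faster. The budget and
    unit base cost are illustrative assumptions."""
    fatigue = 0.0
    history = []
    for iteration in range(max_iterations):
        actions = plan_fn(history)
        if not actions:                  # termination: empty plan
            break
        history.extend(execute_fn(a) for a in actions)
        # Step 4: cost inflates as iterations accumulate
        cost = len(actions) * (1.0 + fatigue_growth_rate * iteration)
        fatigue += cost
        if fatigue >= fatigue_budget:    # termination: budget exhausted
            break
    return history, fatigue
```

With two actions per plan, the per-iteration cost climbs 2.0 → 2.5 → 3.0 → 3.5, so the loop self-terminates on iteration 4 even before `max_iterations` is hit; that is the "runaway tool chain" guard.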
Triggered when CognitiveTriageService selects branch=act and specific tools are named.
┌──────────────────────────────────────────────────────────────────────┐
│ _handle_act_triage()                               ⚡ DET            │
│                                                                      │
│ 1. Create cycle record                             📤 DB             │
│    Table: cortex_iterations                                          │
│    Type: 'user_input', source: 'user'                                │
│                                                                      │
│ 2. Enqueue tool work                               📤 R (RQ)         │
│    Queue: tool-queue                                                 │
│    Payload:                                                          │
│      cycle_id, topic, text, intent                                   │
│      context_snapshot: { warmth, tool_hints, exchange_id }           │
│                                                                      │
│ 3. Set SSE pending flag                            📤 R              │
│    key: sse_pending:{request_id}  TTL=600s                           │
│    Tells /chat endpoint: tool_worker will deliver response           │
│                                                                      │
│ 4. Return empty response (digest_worker done)                        │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
               SSE endpoint holds open (polling sse_pending)
                                │
┌───────────────────────────────▼──────────────────────────────────────┐
│ tool_worker (RQ background process)                                  │
│                                                                      │
│ 1. Dequeue from tool-queue                         📥 R (RQ)         │
│                                                                      │
│ 2. Get relevant tools                              📥 DB             │
│    From triage_selected_tools, or compute via relevance              │
│                                                                      │
│ 3. Dispatch each tool                                                │
│    ActDispatcherService (generic, no tool-specific branches)         │
│    Per-tool timeout enforced                                         │
│    Result: { status, result, execution_time }                        │
│                                                                      │
│ 4. Post-action critic verification                 🧠 LLM (optional) │
│    CriticService – lightweight LLM                                   │
│    Safe actions: silent correction                                   │
│    Consequential actions: pause + escalate to user                   │
│                                                                      │
│ 5. Log results                                     📤 DB             │
│                                                                      │
│ 6. Publish response                                📤 R (pub/sub)    │
│    key: output:{request_id}                                          │
│    Payload: { metadata: { response, mode, cards, … } }               │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                      SSE receives pub/sub
                 → streams cards + text to client
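The pending-flag-plus-pub/sub handoff between `_handle_act_triage` and `tool_worker` boils down to two Redis operations. The sketch below uses a tiny in-memory stand-in so it runs without a server; the real code would pass a redis-py client, whose `setex` and `publish` methods have the same shape.

```python
import json


class FakeRedis:
    """Minimal in-memory stand-in for a redis-py client (illustrative)."""
    def __init__(self):
        self.store = {}
        self.published = []

    def setex(self, key, ttl, value):
        self.store[key] = (value, ttl)

    def publish(self, channel, message):
        self.published.append((channel, message))


def mark_pending(r, request_id):
    """Step 3 of _handle_act_triage: tell /chat a worker owns the reply."""
    r.setex(f"sse_pending:{request_id}", 600, "1")   # TTL = 600s


def publish_response(r, request_id, response, mode):
    """Step 6 of tool_worker: hand the result to the waiting SSE channel."""
    payload = {"metadata": {"response": response, "mode": mode, "cards": []}}
    r.publish(f"output:{request_id}", json.dumps(payload))
```

The 600s TTL doubles as a failure bound: if the worker dies, the flag expires and the SSE endpoint can stop holding the connection open.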
Runs after every response is generated (Paths A, B, C).
┌──────────────────────────────────────────────────────────────────────┐
│                    PHASE D: Post-Response Commit                     │
│                                                                      │
│ Step 1  Append to Working Memory                   📤 R              │
│         key: wm:{thread_id} (RPUSH)                                  │
│         { role: 'assistant', content, timestamp }                    │
│         Max 4 turns maintained                                       │
│         ▼                                                            │
│ Step 2  Log interaction event                      📤 DB             │
│         Table: interaction_log                                       │
│         Fields: event_type='system_response', mode,                  │
│                 confidence, generation_time                          │
│         ▼                                                            │
│ Step 3  Onboarding state                           📤 DB             │
│         SparkStateService – increment exchange count                 │
│         Table: spark_state                                           │
│         ▼                                                            │
│ Step 4  Encode response event                      📤 R (async)      │
│         EventBusService → ENCODE_EVENT                               │
│         Triggers downstream memory consolidation:                    │
│                                                                      │
│         ┌────────────────────────────────────────────────────┐       │
│         │ memory-chunker-queue (RQ)                          │       │
│         │   → memory_chunker_worker: gist generation 🧠 LLM  │       │
│         │   → 📤 R gist:{topic} (sorted set)                 │       │
│         │                                                    │       │
│         │ episodic-memory-queue (RQ)                         │       │
│         │   → episodic_memory_worker: episode build 🧠 LLM   │       │
│         │   → 📤 DB episodes (with pgvector embedding)       │       │
│         │                                                    │       │
│         │ semantic_consolidation_queue (RQ)                  │       │
│         │   → semantic consolidation: concept extract 🧠 LLM │       │
│         │   → 📤 DB concepts, semantic_relationships         │       │
│         └────────────────────────────────────────────────────┘       │
│         ▼                                                            │
│ Step 5  Publish to SSE                             📤 R (pub/sub)    │
│         key: output:{request_id}                                     │
│         /chat endpoint receives, streams to client                   │
└──────────────────────────────────────────────────────────────────────┘
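Step 1's append-and-cap pattern is a standard Redis idiom: RPUSH the new turn, LTRIM to the last N entries, refresh the TTL. A sketch, assuming a redis-py-compatible client is passed in (`rpush`, `ltrim`, and `expire` are the standard redis-py method names):

```python
import json
import time


def append_working_memory(r, thread_id, role, content,
                          max_turns=4, ttl_seconds=24 * 3600):
    """Phase D Step 1: RPUSH the new turn, trim to the last `max_turns`
    entries, and refresh the 24h TTL. `r` is any redis-py-like client."""
    key = f"wm:{thread_id}"
    turn = json.dumps({"role": role, "content": content,
                       "timestamp": time.time()})
    r.rpush(key, turn)
    r.ltrim(key, -max_turns, -1)   # keep only the newest max_turns turns
    r.expire(key, ttl_seconds)
```

Trimming on every write keeps the key O(1)-sized regardless of conversation length, which is why Phase A's working-memory read stays under 10ms.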
Operates completely independently of user messages.
┌──────────────────────────────────────────────────────────────────────┐
│ persistent_task_worker (30min ± 30% jitter)                          │
│                                                                      │
│ 1. Expire stale tasks                              📥📤 DB           │
│    Table: persistent_tasks                                           │
│    created_at > max_age → mark EXPIRED                               │
│                                                                      │
│ 2. Pick eligible task (FIFO within priority)       📥 DB             │
│    State machine: PENDING → RUNNING → COMPLETED                      │
│                                                                      │
│ 3. Load task + progress                            📥 DB             │
│    persistent_tasks.progress (JSONB)                                 │
│    Contains: plan DAG, coverage, step statuses                       │
│                                                                      │
│ 4. Execution branch:                                                 │
│    ┌──────────────────┐        ┌───────────────────────────────┐     │
│    │  HAS PLAN DAG?   ├──Yes──►│ Plan-Aware Execution          │     │
│    └────────┬─────────┘        │ Ready steps = steps where     │     │
│             │ No               │   all depends_on are DONE     │     │
│             ▼                  │ Execute each ready step       │     │
│    ┌──────────────────┐        │   via bounded ACT loop        │     │
│    │  Flat ACT Loop   │        └───────────────────────────────┘     │
│    │  Iterate toward  │                                              │
│    │  goal directly   │                                              │
│    └──────────────────┘                                              │
│                                                                      │
│ 5. Bounded ACT Loop (both branches):               🧠 LLM per iter   │
│    max_iterations=5, cumulative_timeout=30min                        │
│    Same fatigue model as interactive ACT loop                        │
│                                                                      │
│ 6. Atomic checkpoint                               📤 DB             │
│    persistent_tasks.progress (JSONB, atomic UPDATE)                  │
│    Saves: plan, coverage %, step statuses, last results              │
│                                                                      │
│ 7. Coverage check                                  ⚡ DET            │
│    100% complete → mark COMPLETED                                    │
│                                                                      │
│ 8. Adaptive surfacing (optional)                                     │
│    After cycle 2, or coverage jumped > 15%                           │
│    → Proactive message to user                                       │
│    → 📤 R pub/sub proactive channel                                  │
│                                                                      │
│ PLAN DECOMPOSITION (called on task creation):      🧠 LLM ~300ms     │
│   PlanDecompositionService                                           │
│   Prompt: plan-decomposition.md                                      │
│   Output: { steps: [{ id, description, depends_on: [] }] }           │
│   Validates: Kahn's cycle detection, quality gates (Jaccard < 0.7),  │
│              confidence > 0.5, step word count 4–30                  │
│   Stores: persistent_tasks.progress.plan (JSONB)                     │
└──────────────────────────────────────────────────────────────────────┘
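The plan validation names Kahn's algorithm: repeatedly consume steps with no unmet dependencies; if some steps can never be consumed, the `depends_on` edges contain a cycle and the plan is rejected. A minimal sketch over the `{ id, depends_on }` step shape from the output above (the function name is illustrative):

```python
from collections import deque


def validate_plan(steps):
    """Reject a plan whose depends_on edges form a cycle, using Kahn's
    algorithm: if topological processing can't consume every step, a
    cycle (or dangling dependency) exists."""
    ids = {s["id"] for s in steps}
    indegree = {s["id"]: 0 for s in steps}
    dependents = {s["id"]: [] for s in steps}
    for s in steps:
        for dep in s.get("depends_on", []):
            if dep not in ids:
                return False             # dangling dependency
            indegree[s["id"]] += 1
            dependents[dep].append(s["id"])
    queue = deque(i for i, d in indegree.items() if d == 0)
    seen = 0
    while queue:
        node = queue.popleft()
        seen += 1
        for nxt in dependents[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return seen == len(steps)            # False => cycle somewhere
```

The same indegree bookkeeping also yields the worker's "ready steps" for plan-aware execution: at any point, the executable steps are exactly those whose indegree has dropped to zero.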
Runs only when all RQ queues are idle. Mimics the brain's Default Mode Network.
┌──────────────────────────────────────────────────────────────────────┐
│ cognitive_drift_engine (300s cycles, idle-gated)                     │
│                                                                      │
│ Preconditions:                                     ⚡ DET            │
│   All queues idle?  📥 R (RQ queue lengths = 0)                      │
│   Recent episodes exist? (lookback 168h)  📥 DB                      │
│   Bail if user is in deep focus  📥 R focus:{thread_id}              │
│                                                                      │
│ 1. Seed Selection (weighted random)                ⚡ DET            │
│    Decaying 0.35 │ Recent 0.25 │ Salient 0.15                        │
│    Insight 0.15 │ Random 0.10                                        │
│    Source: 📥 DB episodes table (by category)                        │
│                                                                      │
│ 2. Spreading Activation (depth ≤ 2)                ⚡ DET            │
│    📥 DB semantic_concepts, semantic_relationships                   │
│    📥📤 R cognitive_drift_activations (sorted set)                   │
│    📥📤 R cognitive_drift_concept_cooldowns (hash)                   │
│    Collect top 5 activated concepts                                  │
│                                                                      │
│ 3. Thought Synthesis                               🧠 LLM ~100ms     │
│    Prompt: cognitive-drift.md + soul.md                              │
│    Input: activated concepts + soul axioms                           │
│    Output: thought text                                              │
│                                                                      │
│ 4. Store drift gist                                📤 R              │
│    key: gist:{topic} (30min TTL)                                     │
│    Will surface in frontal cortex context on next user message       │
│                                                                      │
│ 5. Action Decision Routing                         ⚡ DET            │
│    Scores registered actions:                                        │
│                                                                      │
│    ┌──────────────┬──────────┬─────────────────────────────────┐     │
│    │ Action       │ Priority │ What it does                    │     │
│    ├──────────────┼──────────┼─────────────────────────────────┤     │
│    │ COMMUNICATE  │    10    │ Push thought to user (deferred) │     │
│    │ SUGGEST      │     8    │ Tool recommendation             │     │
│    │ NURTURE      │     7    │ Engagement nudge                │     │
│    │ PLAN         │     7    │ Propose persistent task 🧠 LLM  │     │
│    │ SEED_THREAD  │     6    │ Plant new conversation seed     │     │
│    │ REFLECT      │     5    │ Internal memory consolidation   │     │
│    │ NOTHING      │     0    │ Always available fallback       │     │
│    └──────────────┴──────────┴─────────────────────────────────┘     │
│                                                                      │
│    Winner selected by score (ties broken by priority)                │
│    PLAN action → calls PlanDecompositionService 🧠 LLM               │
│                → stores in persistent_tasks 📤 DB                    │
│                                                                      │
│ 6. Deferred queue                                  📤 R              │
│    COMMUNICATE → stores thought for quiet-hours delivery             │
│    Async: flushes when user returns from absence                     │
└──────────────────────────────────────────────────────────────────────┘
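Step 1's weighted random seed selection maps directly onto `random.choices`. The weights come from the table above; the function and constant names are illustrative.

```python
import random

# Category -> selection weight, from the Seed Selection step above.
SEED_WEIGHTS = {
    "decaying": 0.35,
    "recent": 0.25,
    "salient": 0.15,
    "insight": 0.15,
    "random": 0.10,
}


def pick_seed_category(rng=random):
    """Step 1: weighted random choice over episode categories."""
    categories = list(SEED_WEIGHTS)
    weights = [SEED_WEIGHTS[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]
```

Biasing toward "decaying" episodes means drift preferentially revisits memories about to fade, which is what lets an idle cycle resurface an old thread before it is lost.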
Key Pattern                        TTL       Read    Written by
──────────────────────────────────────────────────────────────────────
wm:{thread_id}                     24h       A,C     D, tool_worker
gist:{topic}                       30min     A,C     Drift, memory_chunker
fact:{topic}:{key}                 24h       A       Frontal cortex
fok:{topic}                        –         A,B     FOK update service
world_state:{topic}                –         A       World state service
topic_streak:{thread_id}           2h        A       Phase A (focus tracking)
focus:{thread_id}                  variable  A,E     FocusSessionService
cognitive_drift_activations        –         E       Drift engine
cognitive_drift_concept_cooldowns  –         E       Drift engine
cognitive_drift_state              –         E       Drift engine
sse_pending:{request_id}           600s      /chat   _handle_act_triage
output:{request_id}                short     /chat   digest_worker, tool_worker
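The key patterns and TTLs above can be centralized so no service hard-codes an expiry. A sketch; the `KEY_TTLS` values mirror the table, while the helper name is illustrative.

```python
# TTLs in seconds, from the key table above; None = no expiry.
KEY_TTLS = {
    "wm:{thread_id}": 24 * 3600,
    "gist:{topic}": 30 * 60,
    "fact:{topic}:{key}": 24 * 3600,
    "topic_streak:{thread_id}": 2 * 3600,
    "sse_pending:{request_id}": 600,
    "fok:{topic}": None,
    "world_state:{topic}": None,
}


def redis_key(pattern: str, **parts) -> str:
    """Fill a key pattern, e.g. redis_key('wm:{thread_id}', thread_id='t1')."""
    return pattern.format(**parts)
```

A single table like this also makes the TTL policy reviewable in one place, instead of scattered across writers.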
RQ Queues (Redis-backed):
  prompt-queue                  –    consumer.py → digest_worker
  tool-queue                    B    _handle_act_triage
  memory-chunker-queue          D    Encode event handler
  episodic-memory-queue         D    memory_chunker_worker
  semantic_consolidation_queue  D    episodic_memory_worker
Table                   When Written                    Read by
──────────────────────────────────────────────────────────────────────────
routing_decisions       Phase C (every message)         routing_reflection_service
interaction_log         Phase D (every message)         observability endpoints
cortex_iterations       ACT loop, Path B                observability endpoints
episodes                memory_chunker (async)          frontal_cortex, drift engine
concepts                semantic_consolidation (async)  drift engine, context assembly
semantic_relationships  semantic_consolidation          drift engine
user_traits             IIP hook, triage calibration    identity service
triage_calibration      Phase B Step 2d                 triage_calibration_service
persistent_tasks        Path D (task worker)            persistent_task_worker
topics                  Phase A (new topic)             topic_classifier
threads                 session management              session_service
chat_history            Phase D                         frontal_cortex
spark_state             Phase D                         onboarding service
place_fingerprints      ambient inference               place_learning_service
curiosity_threads       drift (SEED_THREAD action)      curiosity_pursuit_service
Every LLM call in the system, with typical latency and model used.
Service Model Prompt Latency Triggered by
────────────────────────────────────────────────────────────────────────────────────────────────
TopicClassifierService lightweight topic-classifier.md ~100ms Every message
CognitiveTriageService lightweight cognitive-triage.md ~100-300ms Every message
ModeRouterService (tiebreaker) qwen3:4b mode-tiebreaker.md ~100ms Close scores only
FrontalCortex (RESPOND) primary model soul + respond.md ~500ms-2s Path C
FrontalCortex (CLARIFY) primary model soul + clarify.md ~500ms-2s Path C
FrontalCortex (ACKNOWLEDGE) primary model soul + acknowledge.md ~500ms-2s Path C
FrontalCortex (ACT plan) primary model frontal-cortex-act.md ~500ms-2s Path C ACT loop
FrontalCortex (terminal) primary model mode-specific ~500ms-2s After ACT loop
CriticService lightweight critic.md ~200ms Path B (optional)
CognitiveDrift (thought) lightweight cognitive-drift.md ~100ms Path E
PlanDecompositionService lightweight plan-decomposition.md ~300ms On task creation
memory_chunker_worker lightweight memory-chunker.md ~100ms Phase D async
episodic_memory_worker lightweight episodic-memory.md ~200ms Phase D async
semantic_consolidation lightweight semantic-extract.md ~200ms Phase D async
RoutingReflectionService strong model routing-reflection.md ~1-2s Idle-time only
End-to-end latency by path (routing is deterministic; the bottlenecks below are the LLM and tool calls):

Path              P50 Latency    Bottleneck
────────────────────────────────────────────────────────────
A – Social Exit   ~400ms         Topic classifier LLM
B – ACT + Tools   5s – 30s+      Tool execution (external)
C – RESPOND       1s – 3s        Frontal cortex (primary LLM)
C – CLARIFY       1s – 2s        Frontal cortex (primary LLM)
C – ACT Loop      2s – 30s       N × frontal-cortex-act LLMs
D – Task Worker   30min cycle    Background, no user wait
E – Drift         300s cycle     Background, no user wait
Component latency breakdown (Path C RESPOND, typical):
  Context assembly       <10ms   ── Redis reads (all cached)
  Intent classify         ~5ms   ── Deterministic
  Triage LLM            ~200ms   ── qwen3:4b
  Social filter           ~1ms   ── Regex
  Mode router             ~5ms   ── Math, no LLM
  Frontal cortex LLM    ~800ms   ── Primary model (varies by provider)
  Working memory write    <5ms   ── Redis RPUSH
  DB event log           ~10ms   ── PostgreSQL async-ish
  SSE publish             ~1ms   ── Redis pub/sub
  ────────────────────────────
  Total (typical)        ~1.1s
| Principle | Where it shows up in the flow |
|---|---|
| Attention is sacred | Social filter exits in <1ms – never wastes an LLM call on greetings; the ACT fatigue model prevents runaway tool chains |
| Judgment over activity | Two-layer routing: fast social filter first, then LLM triage only if needed; the mode router is deterministic, not generative |
| Tool agnosticism | ActDispatcherService routes all tools generically – no tool names anywhere in the Phase B/C infrastructure |
| Continuity over transactions | Working memory, gists, episodes, and concepts all feed every response; drift gists surface on the next user message |
| Single authority | RoutingStabilityRegulator is the only process that mutates router weights (24h cycle); no tug-of-war possible |
Last updated: 2026-02-27. See docs/INDEX.md for the full documentation map.