MESSAGE FLOW
Message Flow – Complete Routing Reference
This document is the single authoritative visual map of how a user message travels through Chalie. Every branch, every storage hit, every LLM call, and every background cycle is shown here.
Legend
⚡ DET  – Deterministic (no LLM, <10ms)
🧠 LLM  – LLM inference call
📥 M    – MemoryStore READ
📤 M    – MemoryStore WRITE
📥 DB   – SQLite READ
📤 DB   – SQLite WRITE
⏱ ~Xms – Typical latency
1. Master Overview – All Possible Paths
┌───────────────────────┐
│  User Message via     │
│  /ws (WebSocket)      │
└───────────┬───────────┘
            │ daemon thread
┌───────────▼───────────┐
│    digest_worker()    │◀──── background
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│        PHASE A        │
│      Ingestion &      │
│   Context Assembly    │
│       (see §2)        │
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│        PHASE B        │
│   Signal Collection   │
│    & Unified Path     │
│       (see §3)        │
└───────────┬───────────┘
            │
┌───────────┴────────────┐
│   Unified Generation   │
│   (unified_generate)   │
└───┬────────────────────┘
    │
┌───▼──────────────────────────────────────────┐
│  Single LLM call – LLM decides:              │
│  • Respond directly (Format B)               │
│  • Invoke skills/tools first (Format A)      │
│  • CANCEL / empty → fast exit                │
└──────────────────────┬───────────────────────┘
                       │
               ┌───────▼──────────┐
               │     PHASE D      │
               │  Post-Response   │
               │  Commit (see §5) │
               └───────┬──────────┘
                       │
               ┌───────▼──────────┐
               │  📤 M pub/sub    │
               │  output:{id}     │
               │  WS → Client     │
               └──────────────────┘
BACKGROUND (always running, independent of user messages):
PATH D ── Persistent Task Worker (30min ± jitter) (see §6)
PATH E ── Reasoning Loop (600s, idle-only) (see §7)
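The daemon-thread handoff above can be sketched as a minimal worker loop. This is an illustrative sketch only: every name except digest_worker is a hypothetical stand-in for the phase implementations.

```python
import queue
import threading

def make_digest_worker(inbox, phase_a, phase_b, phase_d, publish):
    """Hypothetical sketch of the digest_worker daemon thread:
    pull a message off the /ws inbox, run Phase A (context assembly),
    Phase B (unified generation), Phase D (commit), then publish."""
    def run():
        while True:
            msg = inbox.get()              # blocks until /ws enqueues a message
            if msg is None:                # sentinel for shutdown
                break
            ctx = phase_a(msg)             # ingestion & context assembly (§2)
            reply = phase_b(msg, ctx)      # unified generation (§3)
            phase_d(msg, reply)            # post-response commit (§5)
            publish(msg["request_id"], reply)  # pub/sub output:{id}
            inbox.task_done()
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t
```

The daemon flag mirrors the "daemon thread" edge in the diagram: the worker never blocks process shutdown.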
2. Phase A – Ingestion & Context Assembly
Runs immediately for every message, before any routing decision.
┌──────────────────────────────────────────────────────────────────────┐
│ PHASE A: Context Assembly                                            │
│                                                                      │
│ Step 1  IIP Hook (Identity Promotion)              ⚡ DET <5ms       │
│         Regex: "call me X", "my name is X", …                        │
│         Match → 📤 M  📤 DB (trait + identity)                       │
│         No match → continue                                          │
│            │                                                         │
│ Step 2  Working Memory (transcript + compaction)   📥 DB             │
│         topic_compactions + topic_transcript (budget-aware)          │
│         ─────────────────────────────────────────                    │
│ Step 3  Gists                                      📥 M              │
│         key: gist:{topic} (sorted set, 30min TTL)                    │
│         ─────────────────────────────────────────                    │
│ Step 4  Facts                                      📥 M              │
│         key: fact:{topic}:{key} (24h TTL)                            │
│         ─────────────────────────────────────────                    │
│ Step 5  World State                                📥 M              │
│         key: world_state:{topic}                                     │
│         ─────────────────────────────────────────                    │
│ Step 6  FOK (Feeling-of-Knowing) score             📥 M              │
│         key: fok:{topic} (float 0.0–5.0)                             │
│         ─────────────────────────────────────────                    │
│ Step 7  Context Warmth                             ⚡ DET            │
│         warmth = (wm_score + world_score) / 2                        │
│         ─────────────────────────────────────────                    │
│ Step 8  Memory Confidence                          ⚡ DET            │
│         conf = 0.4×fok + 0.4×warmth + 0.2×density                    │
│         is_new_topic → conf *= 0.7                                   │
│         ─────────────────────────────────────────                    │
│ Step 9  Session / Focus Tracking                   📥📤 M            │
│         topic_streak:{thread_id} (2h TTL)                            │
│         focus:{thread_id} (auto-infer after N exchanges)             │
│         Silence gap > 2700s → trigger episodic memory                │
└──────────────────────────────────────────────────────────────────────┘
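Steps 7 and 8 are pure arithmetic, so they can be written as two small functions. A minimal sketch: only the formulas come from the steps above; the function names are illustrative.

```python
def context_warmth(wm_score, world_score):
    """Step 7: warmth is the mean of working-memory and world-state scores."""
    return (wm_score + world_score) / 2.0

def memory_confidence(fok_score, warmth, density, is_new_topic=False):
    """Step 8: weighted blend of FOK, warmth, and information density,
    discounted by 0.7 when the topic is new."""
    conf = 0.4 * fok_score + 0.4 * warmth + 0.2 * density
    return conf * 0.7 if is_new_topic else conf
```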
3. Phase B – Signal Collection & Unified Generation
User messages go through a single unified LLM call. No mode gate, no UNIFIED/ACT routing split.
┌──────────────────────────────────────────────────────────────────────┐
│ LAYER 1: NLP Signal Collection                     ⚡ DET <1ms       │
│                                                                      │
│ compute_nlp_signals()                                                │
│   Input:  text                                                       │
│   Output: { has_question_mark, interrogative_words,                  │
│             greeting_pattern, explicit_feedback,                     │
│             information_density, implicit_reference }                │
│   No external calls – pure regex/heuristics                          │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
┌───────────────────────────────▼──────────────────────────────────────┐
│ LAYER 2: Unified Generation                        🧠 LLM            │
│ unified_generate()                                                   │
│                                                                      │
│ Single LLM call with discoverable skills/tools.                      │
│ The LLM decides whether to:                                          │
│   • Respond directly (Format B – conversational response)            │
│   • Invoke skills/tools first (Format A – action + synthesis)        │
│                                                                      │
│ Empty input and CANCEL patterns handled inline (fast exit).          │
│ Context relevance pre-parser selects which context nodes to inject.  │
│                                                                      │
│ Config: frontal-cortex-unified.json                                  │
│ Prompt: frontal-cortex-unified.md                                    │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                                ▼
                          Phase D (§5)
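A minimal sketch of the Layer 1 collector. The regex patterns here are illustrative stand-ins, not the production heuristics; only the signal names come from the diagram above.

```python
import re

def compute_nlp_signals(text):
    """Deterministic signal collection (Layer 1) – pure regex/heuristics,
    no external calls. Patterns are illustrative placeholders."""
    lowered = text.lower()
    words = lowered.split()
    interrogatives = {"who", "what", "when", "where", "why", "how"}
    return {
        "has_question_mark": "?" in text,
        "interrogative_words": sum(w.strip("?.,!") in interrogatives for w in words),
        "greeting_pattern": bool(re.match(r"^(hi|hey|hello|good (morning|evening))\b", lowered)),
        "explicit_feedback": bool(re.search(r"\b(thanks|thank you|wrong|perfect)\b", lowered)),
        "information_density": len(set(words)) / max(len(words), 1),  # crude proxy
        "implicit_reference": bool(re.search(r"\b(it|that|this|they)\b", lowered)),
    }
```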
4. Mode Router – Non-User Flows (Drift, Proactive, Fallback)
4a. Mode Router (Deterministic)
Used only for non-user flows (cognitive drift, proactive notifications, fallback). User messages bypass this entirely via unified_generate.
┌──────────────────────────────────────────────────────────────────────┐
│ ModeRouterService                                  ⚡ DET ~5ms       │
│                                                                      │
│ Signal inputs (all already in memory from Phase A/B):                │
│   context_warmth        topic_confidence       has_question_mark     │
│   working_memory_turns  fok_score              interrogative_words   │
│   gist_count            is_new_topic           greeting_pattern      │
│   fact_count            world_state_present    explicit_feedback     │
│   intent_type           intent_complexity      intent_confidence     │
│   information_density   implicit_reference     prompt_token_count    │
│                                                                      │
│ Scoring formula (per mode):                                          │
│   score[mode] = base_score + Σ(weight[signal] × signal_value)        │
│   Anti-oscillation: hysteresis dampening from prior mode             │
│                                                                      │
│ ┌──────────────────────────────────────────────────────────────┐     │
│ │ Tie-breaker?                                   ⚡ ONNX ~5ms  │     │
│ │ Triggered when: top-2 scores within effective_margin         │     │
│ │ Model: mode-tiebreaker (ONNX classifier)                     │     │
│ │ Output: pick mode A or B                                     │     │
│ └──────────────────────────────────────────────────────────────┘     │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                             UNIFIED
                                │
┌───────────────────────────────▼──────────────────────────────────────┐
│ FrontalCortexService                               🧠 LLM ~500ms–2s  │
│                                                                      │
│ Prompt = soul.md + identity-core.md + frontal-cortex-{mode}.md       │
│                                                                      │
│ Context injected:                                                    │
│   • Working memory (thread_id)                                       │
│   • Chat history                                                     │
│   • Assembled context (semantic retrieval)                           │
│   • Drift gists (if idle thoughts exist)                             │
│   • Context relevance inclusion map (computed dynamically)           │
│                                                                      │
│ Config files:                                                        │
│   UNIFIED → frontal-cortex-unified.json                              │
│                                                                      │
│ Output: { response: str, confidence: float, mode: str }              │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                                ▼
                          Phase D (§5)
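The scoring formula and the tie-breaker trigger reduce to a few lines. A sketch under stated assumptions: the weights, base scores, hysteresis constant, and margin below are placeholders, not the production values.

```python
def score_modes(signals, weights, base_scores, prev_mode=None, hysteresis=0.1):
    """score[mode] = base_score + Σ(weight[signal] × signal_value).
    The prior mode gets a small hysteresis bonus to damp oscillation."""
    scores = {}
    for mode, base in base_scores.items():
        s = base + sum(weights.get(mode, {}).get(name, 0.0) * value
                       for name, value in signals.items())
        if mode == prev_mode:
            s += hysteresis
        scores[mode] = s
    return scores

def pick_mode(scores, margin=0.05):
    """Returns (winner, needs_tiebreak): the ONNX tie-breaker fires only
    when the top-2 scores land within the effective margin."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else None
    needs_tiebreak = second is not None and scores[top] - scores[second] < margin
    return top, needs_tiebreak
```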
4b. ACT Mode – The Action Loop
Used by background workers (tool_worker, persistent_task_worker) and when the mode router (non-user flows) selects ACT.
┌──────────────────────────────────────────────────────────────────────┐
│ ACTOrchestrator                                                      │
│ Config: cumulative_timeout=60s  per_action=10s  max_iterations=30    │
│                                                                      │
│ ┌────────────────────────────────────────────────────────────────┐   │
│ │ Iteration N                                                    │   │
│ │                                                                │   │
│ │ 1. Generate action plan                          🧠 LLM        │   │
│ │    Prompt: frontal-cortex-act.md                               │   │
│ │    Input:  user text + act_history (prior results)             │   │
│ │    Output: [{ type, params, … }, …]                            │   │
│ │                                                                │   │
│ │ 2. Termination check                             ⚡ DET        │   │
│ │    • Cumulative timeout reached?                               │   │
│ │    • Max iterations reached?                                   │   │
│ │    • No actions in plan?                                       │   │
│ │    • Semantic repetition detected? (embedding-based)           │   │
│ │    • Same action type repeated 3× in a row?                    │   │
│ │    If any → exit loop                                          │   │
│ │                                                                │   │
│ │ 3. Execute actions                               ⚡/🧠 varies  │   │
│ │    ActDispatcherService                                        │   │
│ │    Chains outputs: result[N] → input[N+1]                      │   │
│ │    Action types:                                               │   │
│ │      recall, memorize, associate, find_tools                   │   │
│ │        (cognitive primitives, always available)                │   │
│ │      schedule, list, focus, persistent_task, etc.              │   │
│ │        (all innate skills available directly)                  │   │
│ │      (+ external tools via tool_worker thread)                 │   │
│ │                                                                │   │
│ │ 4. Log iteration                                 📤 DB         │   │
│ │    Table: cortex_iterations                                    │   │
│ │    Fields: iteration_number, actions_executed,                 │   │
│ │            execution_time_ms, fatigue, mode                    │   │
│ └────────────────────────────────────────────────────────────────┘   │
│     │                                                                │
│     └──▶ repeat if can_continue()                                    │
│                                                                      │
│ After loop terminates:                                               │
│   1. Re-route → terminal mode (force previous_mode='ACT')            │
│      Mode router (deterministic, skip_tiebreaker=True)               │
│      Typically selects UNIFIED                                       │
│   2. Generate terminal response (FrontalCortex)    🧠 LLM            │
│      act_history passed as context                                   │
│      All-card actions → skip text (mode='IGNORE')                    │
└──────────────────────────────────────────────────────────────────────┘
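The deterministic termination check (step 2) can be sketched as a pure function. The semantic-repetition condition is omitted here because it needs the embedding model; everything else below follows the listed exit conditions, with the config values as defaults.

```python
def should_terminate(plan, history, elapsed_s, *,
                     cumulative_timeout=60.0, max_iterations=30,
                     repeat_limit=3):
    """Deterministic exit conditions for the ACT loop (step 2).
    `plan` is the current action plan; `history` is act_history so far."""
    if elapsed_s >= cumulative_timeout:        # cumulative timeout reached
        return True
    if len(history) >= max_iterations:         # max iterations reached
        return True
    if not plan:                               # no actions in plan
        return True
    last_types = [h["type"] for h in history[-repeat_limit:]]
    if len(last_types) == repeat_limit and len(set(last_types)) == 1:
        return True                            # same action type 3× in a row
    return False
```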
5. Phase D – Post-Response Commit
Runs after every generated response, regardless of which flow produced it.
┌──────────────────────────────────────────────────────────────────────┐
│ PHASE D: Post-Response Commit                                        │
│                                                                      │
│ Step 1  Append to transcript + compaction check    📤 DB             │
│         topic_transcript (append assistant turn)                     │
│         Fires compaction if context > 85% of budget                  │
│            │                                                         │
│ Step 2  Log interaction event                      📤 DB             │
│         Table: interaction_log                                       │
│         Fields: event_type='system_response', mode,                  │
│                 confidence, generation_time                          │
│            │                                                         │
│ Step 3  Encode response event                      📤 M (async)      │
│         EventBusService → ENCODE_EVENT                               │
│         Triggers downstream memory consolidation:                    │
│                                                                      │
│         ┌────────────────────────────────────────────────────┐       │
│         │ episodic-memory-queue (PromptQueue)                │       │
│         │   → episodic_memory_worker: episode build  🧠 LLM  │       │
│         │   → 📤 DB episodes (with sqlite-vec embedding)     │       │
│         │                                                    │       │
│         │ semantic_consolidation_queue (PromptQueue)         │       │
│         │   → semantic consolidation: concept extract 🧠 LLM │       │
│         │   → 📤 DB concepts, semantic_relationships         │       │
│         └────────────────────────────────────────────────────┘       │
│            │                                                         │
│ Step 4  Publish to WebSocket                       📤 M (pub/sub)    │
│         key: output:{request_id}                                     │
│         /chat endpoint receives, streams to client                   │
└──────────────────────────────────────────────────────────────────────┘
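The four commit steps, in order, as a sketch against a single hypothetical `store` facade. This is illustrative only: the real code talks to SQLite, MemoryStore, and EventBusService separately, and the facade method names are assumptions.

```python
def commit_response(store, request_id, thread_id, reply, mode, confidence, gen_ms):
    """Phase D, in order. `store` is a hypothetical facade bundling the
    SQLite handle, MemoryStore, and EventBusService."""
    # Step 1: append assistant turn (compaction fires inside if > 85% budget)
    store.append_transcript(thread_id, "assistant", reply)
    # Step 2: interaction event for observability
    store.log_interaction("system_response", mode, confidence, gen_ms)
    # Step 3: async encode; episodic + semantic consolidation run downstream
    store.emit("ENCODE_EVENT", thread_id=thread_id, text=reply)
    # Step 4: publish so the WebSocket endpoint can stream to the client
    store.publish(f"output:{request_id}", reply)
```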
6. Path D – Persistent Task Worker (Background, 30min Cycle)
Operates completely independently of user messages.
┌──────────────────────────────────────────────────────────────────────┐
│ persistent_task_worker (30min ± 30% jitter)                          │
│                                                                      │
│ 1. Expire stale tasks                              📥📤 DB           │
│    Table: persistent_tasks                                           │
│    created_at > max_age → mark EXPIRED                               │
│                                                                      │
│ 2. Pick eligible task (FIFO within priority)       📥 DB             │
│    State machine: PENDING → RUNNING → COMPLETED                      │
│                                                                      │
│ 3. Load task + progress                            📥 DB             │
│    persistent_tasks.progress (JSON as TEXT)                          │
│    Contains: plan DAG, coverage, step statuses                       │
│                                                                      │
│ 4. Execution branch:                                                 │
│    ┌──────────────────┐       ┌───────────────────────────────┐      │
│    │  HAS PLAN DAG?   │──Yes─▶│ Plan-Aware Execution          │      │
│    └────────┬─────────┘       │ Ready steps = steps where     │      │
│             │ No              │   all depends_on are DONE     │      │
│             ▼                 │ Execute each ready step       │      │
│    ┌──────────────────┐       │   via bounded ACT loop        │      │
│    │  Flat ACT Loop   │       └───────────────────────────────┘      │
│    │  Iterate toward  │                                              │
│    │  goal directly   │                                              │
│    └──────────────────┘                                              │
│                                                                      │
│ 5. Bounded ACT Loop (both branches):               🧠 LLM per iter   │
│    max_iterations=5, cumulative_timeout=30min                        │
│    Same fatigue model as interactive ACT loop                        │
│                                                                      │
│ 6. Atomic checkpoint                               📤 DB             │
│    persistent_tasks.progress (JSON as TEXT, atomic UPDATE)           │
│    Saves: plan, coverage %, step statuses, last results              │
│                                                                      │
│ 7. Coverage check                                  ⚡ DET            │
│    100% complete → mark COMPLETED                                    │
│                                                                      │
│ 8. Adaptive surfacing (optional)                                     │
│    After cycle 2, or coverage jumped > 15%                           │
│    → Proactive message to user                                       │
│    → 📤 M pub/sub proactive channel                                  │
│                                                                      │
│ PLAN DECOMPOSITION (called on task creation):      🧠 LLM ~300ms     │
│    PlanDecompositionService                                          │
│    Prompt: plan-decomposition.md                                     │
│    Output: { steps: [{ id, description, depends_on: [] }] }          │
│    Validates: Kahn's cycle detection, quality gates (Jaccard < 0.7), │
│               confidence > 0.5, step word count 4–30                 │
│    Stores: persistent_tasks.progress.plan (JSON as TEXT)             │
└──────────────────────────────────────────────────────────────────────┘
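Two deterministic pieces of the plan machinery, sketched: ready-step selection and Kahn's cycle check. Step dicts follow the { id, description, depends_on } shape above, plus an assumed status field; the status names are illustrative.

```python
def ready_steps(plan):
    """Plan-aware execution: a step is ready when every dependency is DONE."""
    done = {s["id"] for s in plan if s.get("status") == "DONE"}
    return [s for s in plan
            if s.get("status") == "PENDING"
            and all(d in done for d in s.get("depends_on", []))]

def has_cycle(plan):
    """Kahn's algorithm: if topological processing can't consume every
    step, the depends_on graph contains a cycle."""
    indeg = {s["id"]: len(s.get("depends_on", [])) for s in plan}
    children = {}
    for s in plan:
        for dep in s.get("depends_on", []):
            children.setdefault(dep, []).append(s["id"])
    queue = [sid for sid, n in indeg.items() if n == 0]
    seen = 0
    while queue:
        sid = queue.pop()
        seen += 1
        for child in children.get(sid, []):
            indeg[child] -= 1
            if indeg[child] == 0:
                queue.append(child)
    return seen != len(plan)   # unprocessed steps ⇒ cycle
```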
7. Path E – Reasoning Loop (Background, 600s Idle-Only)
Runs only when all PromptQueues are idle. Signal-driven continuous reasoning.
┌──────────────────────────────────────────────────────────────────────┐
│ reasoning_loop_service (600s idle timeout, signal-driven)            │
│                                                                      │
│ Preconditions:                                     ⚡ DET            │
│    All queues idle? (queue lengths = 0)            📥 M              │
│    Recent episodes exist? (lookback 168h)          📥 DB             │
│    Bail if user is in deep focus                   📥 M focus:{thread_id} │
│                                                                      │
│ 1. Seed Selection (weighted random)                ⚡ DET            │
│    Salient 0.60 / Insight 0.40                                       │
│    Source: 📥 DB episodes table (by category)                        │
│                                                                      │
│ 2. Spreading Activation (depth ≤ 2)                ⚡ DET            │
│    📥 DB semantic_concepts, semantic_relationships                   │
│    📥📤 M cognitive_drift_activations (sorted set)                   │
│    📥📤 M cognitive_drift_concept_cooldowns (hash)                   │
│    Collect top 5 activated concepts                                  │
│                                                                      │
│ 3. Thought Synthesis                               🧠 LLM ~100ms     │
│    Prompt: cognitive-drift.md + soul.md                              │
│    Input:  activated concepts + soul axioms                          │
│    Output: thought text                                              │
│                                                                      │
│ 4. Store drift gist                                📤 M              │
│    key: gist:{topic} (30min TTL)                                     │
│    Will surface in frontal cortex context on next user message       │
│                                                                      │
│ 5. Action Decision Routing                         ⚡ DET            │
│    Scores registered actions:                                        │
│                                                                      │
│    ┌───────────────┬──────────┬─────────────────────────────────┐    │
│    │ Action        │ Priority │ What it does                    │    │
│    ├───────────────┼──────────┼─────────────────────────────────┤    │
│    │ COMMUNICATE   │ 10       │ Push thought to user (deferred) │    │
│    │ PLAN          │ 7        │ Propose persistent task  🧠 LLM │    │
│    │ SEED_THREAD   │ 6        │ Plant new conversation seed     │    │
│    │ REFLECT       │ 5        │ Internal memory consolidation   │    │
│    │ RECONCILE     │ 4        │ Contradiction resolution        │    │
│    │ AMBIENT_TOOL  │ 3        │ Context-triggered tool use      │    │
│    │ NOTHING       │ 0        │ Always available fallback       │    │
│    └───────────────┴──────────┴─────────────────────────────────┘    │
│                                                                      │
│    Winner selected by score (ties broken by priority)                │
│    PLAN action → calls PlanDecompositionService    🧠 LLM            │
│                → stores in persistent_tasks        📤 DB             │
│                                                                      │
│ 6. Deferred queue                                  📤 M              │
│    COMMUNICATE → stores thought for quiet-hours delivery             │
│    Async: flushes when user returns from absence                     │
└──────────────────────────────────────────────────────────────────────┘
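Seed selection (step 1) and action routing (step 5) are both deterministic apart from the weighted coin flip. A sketch using the 0.60/0.40 weights and score-then-priority tie-break above; the function names and candidate tuple shape are hypothetical.

```python
import random

def pick_seed(salient, insights, rng=random.random):
    """Weighted random seed selection: salient 0.60, insight 0.40.
    Falls back to whichever pool is non-empty."""
    if salient and (not insights or rng() < 0.60):
        return random.choice(salient)
    return random.choice(insights) if insights else None

def pick_action(candidates):
    """candidates: [(name, score, priority)]. Highest score wins;
    priority breaks ties. NOTHING (priority 0) is always present,
    so the list is never empty in practice."""
    return max(candidates, key=lambda c: (c[1], c[2]))[0]
```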
8. Complete Storage Access Map
MemoryStore Keys Reference
Key Pattern                    TTL      Read by   Written by
──────────────────────────────────────────────────────────────────────
fok:{topic}                    –        A, B      FOK update service
world_model:items              –        A         WorldStateService
reasoning_loop:activations     –        E         Reasoning loop
reasoning_loop:cooldowns       –        E         Reasoning loop
output:{request_id}            short    /ws       digest_worker

PromptQueues (in-memory, thread-safe):
prompt-queue                   –        –         run.py → digest_worker
episodic-memory-queue          –        D         encode event handler
semantic_consolidation_queue   –        D         episodic_memory_worker
SQLite Tables Reference
Table                    When Written                      When Read
──────────────────────────────────────────────────────────────────────
interaction_log          Phase D (every message)           observability endpoints
cortex_iterations        ACT loop, Path B                  observability endpoints
episodes                 episodic_memory_worker (async)    frontal_cortex, reasoning loop
concepts                 semantic_consolidation (async)    drift engine, context assembly
semantic_relationships   semantic_consolidation            drift engine
user_traits              IIP hook                          identity service
persistent_tasks         Path D (task worker)              persistent_task_worker
topics                   Phase A (new topic)               topic_classifier
threads                  session management                session_service
topic_transcript         Phase D                           context_assembly
place_fingerprints       ambient inference                 place_learning_service
9. LLM Call Inventory
Every LLM call in the system, with typical latency and model used.
Service                          Model           Prompt                   Latency      Triggered by
────────────────────────────────────────────────────────────────────────────────────────────────────
TopicClassifierService           lightweight     topic-classifier.md      ~100ms       Every message
ModeRouterService (tiebreaker)   ONNX            mode-tiebreaker model    ~5ms         Non-user flows only
FrontalCortex (UNIFIED)          primary model   soul + unified.md        ~500ms–2s    User path
FrontalCortex (ACT plan)         primary model   frontal-cortex-act.md    ~500ms–2s    Path C ACT loop
FrontalCortex (terminal)         primary model   mode-specific            ~500ms–2s    After ACT loop
CriticService                    lightweight     critic.md                ~200ms       Path B (optional)
ReasoningLoop (thought)          lightweight     cognitive-drift.md       ~100ms       Path E
PlanDecompositionService         lightweight     plan-decomposition.md    ~300ms       On task creation
episodic_memory_worker           lightweight     episodic-memory.md       ~200ms       Phase D async
semantic_consolidation           lightweight     semantic-extract.md      ~200ms       Phase D async
Deterministic paths (zero LLM):
- IIP hook (regex)
- Intent classifier
- Empty guard / CANCEL detection (inline in unified path)
- Mode router scoring (non-user flows)
- Fatigue budget check in ACT loop
- Termination checks
- Spreading activation in drift engine
- Plan DAG cycle detection (Kahn's)
- FOK / warmth / memory confidence calculations
10. Latency Profile by Path
Path                P50 Latency     Bottleneck
────────────────────────────────────────────────────────────
Unified (user)      1s – 3s         Unified LLM call (primary model)
Unified + skills    2s – 30s        Skill execution (varies)
B – ACT + Tools     5s – 30s+       Tool execution (background workers)
D – Task Worker     30min cycle     Background, no user wait
E – Drift           600s cycle      Background, no user wait

Component latency breakdown (unified path, typical):
Context assembly        <10ms  ── MemoryStore reads (all cached)
Intent classify          ~5ms  ── Deterministic
Unified LLM call       ~800ms  ── Primary model (varies by provider)
Working memory write     <5ms  ── MemoryStore append
DB event log            ~10ms  ── SQLite WAL write
WS publish               ~1ms  ── MemoryStore pub/sub
─────────────────────────────────────────────────────────
Total (typical)        ~0.85s
11. Architectural Principles Visible in the Flow
| Principle | Where it shows up in the flow |
|---|---|
| Attention is sacred | Unified path lets the LLM decide when to act vs respond – no wasted routing overhead; ACT fatigue model prevents runaway tool chains |
| Judgment over activity | Single unified LLM call for user messages; mode router handles non-user flows deterministically |
| Tool agnosticism | ActDispatcherService routes all tools generically – no tool names anywhere in the Phase B/C infrastructure |
| Continuity over transactions | Working memory, gists, episodes, concepts all feed every response; drift gists surface even on next message |
| Single authority | Router weight mutation bounded by single regulator (24h cycle, ±0.02/day max) |
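The regulator bound in the last row can be expressed as a one-line clamp. A sketch: the ±0.02 cap is the only number taken from the table above; the function name is illustrative.

```python
def regulate_weight(current, proposed, max_delta=0.02):
    """Single-authority regulator: clamp any per-cycle router-weight
    change to ±max_delta, regardless of what the mutation proposed."""
    delta = max(-max_delta, min(max_delta, proposed - current))
    return current + delta
```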
Last updated: 2026-03-21. See docs/INDEX.md for the full documentation map.