MESSAGE FLOW
Message Flow – Complete Routing Reference
This document is the single authoritative visual map of how a user message travels through Chalie. Every branch, every storage hit, every LLM call, and every background cycle is shown here.
Legend
⚡ DET  – Deterministic (no LLM, <10ms)
🧠 LLM  – LLM inference call
📥 M    – MemoryStore READ
📤 M    – MemoryStore WRITE
📥 DB   – SQLite READ
📤 DB   – SQLite WRITE
⏱ ~Xms – Typical latency
1. Master Overview – All Possible Paths
┌───────────────────────┐
│  User Message via     │
│  /ws (WebSocket)      │
└───────────┬───────────┘
            │ daemon thread
┌───────────▼───────────┐
│    digest_worker()    │◀──── background
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│        PHASE A        │
│      Ingestion &      │
│   Context Assembly    │
│       (see §2)        │
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│        PHASE B        │
│   Signal Collection   │
│    & Unified Path     │
│       (see §3)        │
└───────────┬───────────┘
            │
┌───────────┴────────────┐
│   Unified Generation   │
│   (unified_generate)   │
└───┬────────────────────┘
    │
┌───▼──────────────────────────────────────────┐
│  Single LLM call – LLM decides:              │
│  • Respond directly (Format B)               │
│  • Invoke skills/tools first (Format A)      │
│  • CANCEL / empty → fast exit                │
└──────────────────────┬───────────────────────┘
                       │
               ┌───────▼──────────┐
               │     PHASE D      │
               │  Post-Response   │
               │  Commit (see §5) │
               └───────┬──────────┘
                       │
               ┌───────▼──────────┐
               │  📤 M pub/sub    │
               │  output:{id}     │
               │  WS → Client     │
               └──────────────────┘
BACKGROUND (always running, independent of user messages):
PATH D ── Persistent Task Worker (30min ± jitter) (see §6)
PATH E ── Reasoning Loop (600s, idle-only) (see §7)
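The daemon-thread handoff above can be sketched as a minimal worker loop. This is an illustrative sketch only: every name except digest_worker is a hypothetical stand-in for the phase implementations.

```python
import queue
import threading

def make_digest_worker(inbox, phase_a, phase_b, phase_d, publish):
    """Hypothetical sketch of the digest_worker daemon thread:
    pull a message off the /ws inbox, run Phase A (context assembly),
    Phase B (unified generation), Phase D (commit), then publish."""
    def run():
        while True:
            msg = inbox.get()              # blocks until /ws enqueues a message
            if msg is None:                # sentinel for shutdown
                break
            ctx = phase_a(msg)             # ingestion & context assembly (§2)
            reply = phase_b(msg, ctx)      # unified generation (§3)
            phase_d(msg, reply)            # post-response commit (§5)
            publish(msg["request_id"], reply)  # pub/sub output:{id}
            inbox.task_done()
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t
```

The daemon flag mirrors the "daemon thread" edge in the diagram: the worker never blocks process shutdown.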
2. Phase A – Ingestion & Context Assembly
Runs immediately for every message, before any routing decision.
┌──────────────────────────────────────────────────────────────────────┐
│ PHASE A: Context Assembly                                            │
│                                                                      │
│ Step 1  IIP Hook (Identity Promotion)              ⚡ DET <5ms       │
│         Regex: "call me X", "my name is X", …                        │
│         Match → 📤 M  📤 DB (trait + identity)                       │
│         No match → continue                                          │
│            │                                                         │
│ Step 2  Working Memory (transcript + compaction)   📥 DB             │
│         topic_compactions + topic_transcript (budget-aware)          │
│         ─────────────────────────────────────────                    │
│ Step 3  Gists                                      📥 M              │
│         key: gist:{topic} (sorted set, 30min TTL)                    │
│         ─────────────────────────────────────────                    │
│ Step 4  Facts                                      📥 M              │
│         key: fact:{topic}:{key} (24h TTL)                            │
│         ─────────────────────────────────────────                    │
│ Step 5  World State                                📥 M              │
│         key: world_state:{topic}                                     │
│         ─────────────────────────────────────────                    │
│ Step 6  FOK (Feeling-of-Knowing) score             📥 M              │
│         key: fok:{topic} (float 0.0–5.0)                             │
│         ─────────────────────────────────────────                    │
│ Step 7  Context Warmth                             ⚡ DET            │
│         warmth = (wm_score + world_score) / 2                        │
│         ─────────────────────────────────────────                    │
│ Step 8  Memory Confidence                          ⚡ DET            │
│         conf = 0.4×fok + 0.4×warmth + 0.2×density                    │
│         is_new_topic → conf *= 0.7                                   │
│         ─────────────────────────────────────────                    │
│ Step 9  Session / Focus Tracking                   📥📤 M            │
│         topic_streak:{thread_id} (2h TTL)                            │
│         focus:{thread_id} (auto-infer after N exchanges)             │
│         Silence gap > 2700s → trigger episodic memory                │
└──────────────────────────────────────────────────────────────────────┘
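Steps 7 and 8 are pure arithmetic, so they can be written as two small functions. A minimal sketch: only the formulas come from the steps above; the function names are illustrative.

```python
def context_warmth(wm_score, world_score):
    """Step 7: warmth is the mean of working-memory and world-state scores."""
    return (wm_score + world_score) / 2.0

def memory_confidence(fok_score, warmth, density, is_new_topic=False):
    """Step 8: weighted blend of FOK, warmth, and information density,
    discounted by 0.7 when the topic is new."""
    conf = 0.4 * fok_score + 0.4 * warmth + 0.2 * density
    return conf * 0.7 if is_new_topic else conf
```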
3. Phase B – Signal Collection & Unified Generation
User messages go through a single unified LLM call. No mode gate, no UNIFIED/ACT routing split.
┌──────────────────────────────────────────────────────────────────────┐
│ LAYER 1: NLP Signal Collection                     ⚡ DET <1ms       │
│                                                                      │
│ compute_nlp_signals()                                                │
│   Input:  text                                                       │
│   Output: { has_question_mark, interrogative_words,                  │
│             greeting_pattern, explicit_feedback,                     │
│             information_density, implicit_reference }                │
│   No external calls – pure regex/heuristics                          │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
┌───────────────────────────────▼──────────────────────────────────────┐
│ LAYER 2: Unified Generation                        🧠 LLM            │
│ unified_generate()                                                   │
│                                                                      │
│ Single LLM call with discoverable skills/tools.                      │
│ The LLM decides whether to:                                          │
│   • Respond directly (Format B – conversational response)            │
│   • Invoke skills/tools first (Format A – action + synthesis)        │
│                                                                      │
│ Empty input and CANCEL patterns handled inline (fast exit).          │
│ Context relevance pre-parser selects which context nodes to inject.  │
│                                                                      │
│ Config: frontal-cortex-unified.json                                  │
│ Prompt: frontal-cortex-unified.md                                    │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                                ▼
                          Phase D (§5)
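A minimal sketch of the Layer 1 collector. The regex patterns here are illustrative stand-ins, not the production heuristics; only the signal names come from the diagram above.

```python
import re

def compute_nlp_signals(text):
    """Deterministic signal collection (Layer 1) – pure regex/heuristics,
    no external calls. Patterns are illustrative placeholders."""
    lowered = text.lower()
    words = lowered.split()
    interrogatives = {"who", "what", "when", "where", "why", "how"}
    return {
        "has_question_mark": "?" in text,
        "interrogative_words": sum(w.strip("?.,!") in interrogatives for w in words),
        "greeting_pattern": bool(re.match(r"^(hi|hey|hello|good (morning|evening))\b", lowered)),
        "explicit_feedback": bool(re.search(r"\b(thanks|thank you|wrong|perfect)\b", lowered)),
        "information_density": len(set(words)) / max(len(words), 1),  # crude proxy
        "implicit_reference": bool(re.search(r"\b(it|that|this|they)\b", lowered)),
    }
```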
4. Mode Router – Non-User Flows (Drift, Proactive, Fallback)
4a. Mode Router (Deterministic)
Used only for non-user flows (cognitive drift, proactive notifications, fallback). User messages bypass this entirely via unified_generate.
┌──────────────────────────────────────────────────────────────────────┐
│ ModeRouterService                                  ⚡ DET ~5ms       │
│                                                                      │
│ Signal inputs (all already in memory from Phase A/B):                │
│   context_warmth        topic_confidence       has_question_mark     │
│   working_memory_turns  fok_score              interrogative_words   │
│   gist_count            is_new_topic           greeting_pattern      │
│   fact_count            world_state_present    explicit_feedback     │
│   intent_type           intent_complexity      intent_confidence     │
│   information_density   implicit_reference     prompt_token_count    │
│                                                                      │
│ Scoring formula (per mode):                                          │
│   score[mode] = base_score + Σ(weight[signal] × signal_value)        │
│   Anti-oscillation: hysteresis dampening from prior mode             │
│                                                                      │
│ ┌──────────────────────────────────────────────────────────────┐     │
│ │ Tie-breaker?                                   ⚡ ONNX ~5ms  │     │
│ │ Triggered when: top-2 scores within effective_margin         │     │
│ │ Model: mode-tiebreaker (ONNX classifier)                     │     │
│ │ Output: pick mode A or B                                     │     │
│ └──────────────────────────────────────────────────────────────┘     │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                             UNIFIED
                                │
┌───────────────────────────────▼──────────────────────────────────────┐
│ FrontalCortexService                               🧠 LLM ~500ms–2s  │
│                                                                      │
│ Prompt = soul.md + identity-core.md + frontal-cortex-{mode}.md       │
│                                                                      │
│ Context injected:                                                    │
│   • Working memory (thread_id)                                       │
│   • Chat history                                                     │
│   • Assembled context (semantic retrieval)                           │
│   • Drift gists (if idle thoughts exist)                             │
│   • Context relevance inclusion map (computed dynamically)           │
│                                                                      │
│ Config files:                                                        │
│   UNIFIED → frontal-cortex-unified.json                              │
│                                                                      │
│ Output: { response: str, confidence: float, mode: str }              │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                                ▼
                          Phase D (§5)
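The scoring formula and the tie-breaker trigger reduce to a few lines. A sketch under stated assumptions: the weights, base scores, hysteresis constant, and margin below are placeholders, not the production values.

```python
def score_modes(signals, weights, base_scores, prev_mode=None, hysteresis=0.1):
    """score[mode] = base_score + Σ(weight[signal] × signal_value).
    The prior mode gets a small hysteresis bonus to damp oscillation."""
    scores = {}
    for mode, base in base_scores.items():
        s = base + sum(weights.get(mode, {}).get(name, 0.0) * value
                       for name, value in signals.items())
        if mode == prev_mode:
            s += hysteresis
        scores[mode] = s
    return scores

def pick_mode(scores, margin=0.05):
    """Returns (winner, needs_tiebreak): the ONNX tie-breaker fires only
    when the top-2 scores land within the effective margin."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else None
    needs_tiebreak = second is not None and scores[top] - scores[second] < margin
    return top, needs_tiebreak
```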
4b. ACT Mode – The Action Loop
Used by background workers (tool_worker, persistent_task_worker) and when the mode router (non-user flows) selects ACT.
┌──────────────────────────────────────────────────────────────────────┐
│ ACTOrchestrator                                                      │
│ Config: cumulative_timeout=60s  per_action=10s  max_iterations=30    │
│                                                                      │
│ ┌────────────────────────────────────────────────────────────────┐   │
│ │ Iteration N                                                    │   │
│ │                                                                │   │
│ │ 1. Generate action plan                          🧠 LLM        │   │
│ │    Prompt: frontal-cortex-act.md                               │   │
│ │    Input:  user text + act_history (prior results)             │   │
│ │    Output: [{ type, params, … }, …]                            │   │
│ │                                                                │   │
│ │ 2. Termination check                             ⚡ DET        │   │
│ │    • Cumulative timeout reached?                               │   │
│ │    • Max iterations reached?                                   │   │
│ │    • No actions in plan?                                       │   │
│ │    • Semantic repetition detected? (embedding-based)           │   │
│ │    • Same action type repeated 3× in a row?                    │   │
│ │    If any → exit loop                                          │   │
│ │                                                                │   │
│ │ 3. Execute actions                               ⚡/🧠 varies  │   │
│ │    ActDispatcherService                                        │   │
│ │    Chains outputs: result[N] → input[N+1]                      │   │
│ │    Action types:                                               │   │
│ │      recall, memorize, associate, find_tools                   │   │
│ │        (cognitive primitives, always available)                │   │
│ │      schedule, list, focus, persistent_task, etc.              │   │
│ │        (all innate skills available directly)                  │   │
│ │      (+ external tools via tool_worker thread)                 │   │
│ │                                                                │   │
│ │ 4. Log iteration                                 📤 DB         │   │
│ │    Table: cortex_iterations                                    │   │
│ │    Fields: iteration_number, actions_executed,                 │   │
│ │            execution_time_ms, fatigue, mode                    │   │
│ └────────────────────────────────────────────────────────────────┘   │
│     │                                                                │
│     └──▶ repeat if can_continue()                                    │
│                                                                      │
│ After loop terminates:                                               │
│   1. Re-route → terminal mode (force previous_mode='ACT')            │
│      Mode router (deterministic, skip_tiebreaker=True)               │
│      Typically selects UNIFIED                                       │
│   2. Generate terminal response (FrontalCortex)    🧠 LLM            │
│      act_history passed as context                                   │
│      All-card actions → skip text (mode='IGNORE')                    │
└──────────────────────────────────────────────────────────────────────┘
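The deterministic termination check (step 2) can be sketched as a pure function. The semantic-repetition condition is omitted here because it needs the embedding model; everything else below follows the listed exit conditions, with the config values as defaults.

```python
def should_terminate(plan, history, elapsed_s, *,
                     cumulative_timeout=60.0, max_iterations=30,
                     repeat_limit=3):
    """Deterministic exit conditions for the ACT loop (step 2).
    `plan` is the current action plan; `history` is act_history so far."""
    if elapsed_s >= cumulative_timeout:        # cumulative timeout reached
        return True
    if len(history) >= max_iterations:         # max iterations reached
        return True
    if not plan:                               # no actions in plan
        return True
    last_types = [h["type"] for h in history[-repeat_limit:]]
    if len(last_types) == repeat_limit and len(set(last_types)) == 1:
        return True                            # same action type 3× in a row
    return False
```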
5. Phase D – Post-Response Commit
Runs after every generated response, regardless of which flow produced it.
┌──────────────────────────────────────────────────────────────────────┐
│ PHASE D: Post-Response Commit                                        │
│                                                                      │
│ Step 1  Append to transcript + compaction check    📤 DB             │
│         topic_transcript (append assistant turn)                     │
│         Fires compaction if context > 85% of budget                  │
│            │                                                         │
│ Step 2  Log interaction event                      📤 DB             │
│         Table: interaction_log                                       │
│         Fields: event_type='system_response', mode,                  │
│                 confidence, generation_time                          │
│            │                                                         │
│ Step 3  Encode response event                      📤 M (async)      │
│         EventBusService → ENCODE_EVENT                               │
│         Triggers downstream memory consolidation:                    │
│                                                                      │
│         ┌────────────────────────────────────────────────────┐       │
│         │ episodic-memory-queue (PromptQueue)                │       │
│         │   → episodic_memory_worker: episode build  🧠 LLM  │       │
│         │   → 📤 DB episodes (with sqlite-vec embedding)     │       │
│         │                                                    │       │
│         │ semantic_consolidation_queue (PromptQueue)         │       │
│         │   → semantic consolidation: concept extract 🧠 LLM │       │
│         │   → 📤 DB concepts, semantic_relationships         │       │
│         └────────────────────────────────────────────────────┘       │
│            │                                                         │
│ Step 4  Publish to WebSocket                       📤 M (pub/sub)    │
│         key: output:{request_id}                                     │
│         /chat endpoint receives, streams to client                   │
└──────────────────────────────────────────────────────────────────────┘
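The four commit steps, in order, as a sketch against a single hypothetical `store` facade. This is illustrative only: the real code talks to SQLite, MemoryStore, and EventBusService separately, and the facade method names are assumptions.

```python
def commit_response(store, request_id, thread_id, reply, mode, confidence, gen_ms):
    """Phase D, in order. `store` is a hypothetical facade bundling the
    SQLite handle, MemoryStore, and EventBusService."""
    # Step 1: append assistant turn (compaction fires inside if > 85% budget)
    store.append_transcript(thread_id, "assistant", reply)
    # Step 2: interaction event for observability
    store.log_interaction("system_response", mode, confidence, gen_ms)
    # Step 3: async encode; episodic + semantic consolidation run downstream
    store.emit("ENCODE_EVENT", thread_id=thread_id, text=reply)
    # Step 4: publish so the WebSocket endpoint can stream to the client
    store.publish(f"output:{request_id}", reply)
```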
6. Path D – Persistent Task Worker (Background, 30min Cycle)
Operates completely independently of user messages.
┌──────────────────────────────────────────────────────────────────────┐
│ persistent_task_worker (30min ± 30% jitter)                          │
│                                                                      │
│ 1. Expire stale tasks                              📥📤 DB           │
│    Table: persistent_tasks                                           │
│    created_at > max_age → mark EXPIRED                               │
│                                                                      │
│ 2. Pick eligible task (FIFO within priority)       📥 DB             │
│    State machine: PENDING → RUNNING → COMPLETED                      │
│                                                                      │
│ 3. Load task + progress                            📥 DB             │
│    persistent_tasks.progress (JSON as TEXT)                          │
│    Contains: plan DAG, coverage, step statuses                       │
│                                                                      │
│ 4. Execution branch:                                                 │
│    ┌──────────────────┐       ┌───────────────────────────────┐      │
│    │  HAS PLAN DAG?   │──Yes─▶│ Plan-Aware Execution          │      │
│    └────────┬─────────┘       │ Ready steps = steps where     │      │
│             │ No              │   all depends_on are DONE     │      │
│             ▼                 │ Execute each ready step       │      │
│    ┌──────────────────┐       │   via bounded ACT loop        │      │
│    │  Flat ACT Loop   │       └───────────────────────────────┘      │
│    │  Iterate toward  │                                              │
│    │  goal directly   │                                              │
│    └──────────────────┘                                              │
│                                                                      │
│ 5. Bounded ACT Loop (both branches):               🧠 LLM per iter   │
│    max_iterations=5, cumulative_timeout=30min                        │
│    Same fatigue model as interactive ACT loop                        │
│                                                                      │
│ 6. Atomic checkpoint                               📤 DB             │
│    persistent_tasks.progress (JSON as TEXT, atomic UPDATE)           │
│    Saves: plan, coverage %, step statuses, last results              │
│                                                                      │
│ 7. Coverage check                                  ⚡ DET            │
│    100% complete → mark COMPLETED                                    │
│                                                                      │
│ 8. Adaptive surfacing (optional)                                     │
│    After cycle 2, or coverage jumped > 15%                           │
│    → Proactive message to user                                       │
│    → 📤 M pub/sub proactive channel                                  │
│                                                                      │
│ PLAN DECOMPOSITION (called on task creation):      🧠 LLM ~300ms     │
│    PlanDecompositionService                                          │
│    Prompt: plan-decomposition.md                                     │
│    Output: { steps: [{ id, description, depends_on: [] }] }          │
│    Validates: Kahn's cycle detection, quality gates (Jaccard < 0.7), │
│               confidence > 0.5, step word count 4–30                 │
│    Stores: persistent_tasks.progress.plan (JSON as TEXT)             │
└──────────────────────────────────────────────────────────────────────┘
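Two deterministic pieces of the plan machinery, sketched: ready-step selection and Kahn's cycle check. Step dicts follow the { id, description, depends_on } shape above, plus an assumed status field; the status names are illustrative.

```python
def ready_steps(plan):
    """Plan-aware execution: a step is ready when every dependency is DONE."""
    done = {s["id"] for s in plan if s.get("status") == "DONE"}
    return [s for s in plan
            if s.get("status") == "PENDING"
            and all(d in done for d in s.get("depends_on", []))]

def has_cycle(plan):
    """Kahn's algorithm: if topological processing can't consume every
    step, the depends_on graph contains a cycle."""
    indeg = {s["id"]: len(s.get("depends_on", [])) for s in plan}
    children = {}
    for s in plan:
        for dep in s.get("depends_on", []):
            children.setdefault(dep, []).append(s["id"])
    queue = [sid for sid, n in indeg.items() if n == 0]
    seen = 0
    while queue:
        sid = queue.pop()
        seen += 1
        for child in children.get(sid, []):
            indeg[child] -= 1
            if indeg[child] == 0:
                queue.append(child)
    return seen != len(plan)   # unprocessed steps ⇒ cycle
```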
7. Path E – Reasoning Loop (Background, 600s Idle-Only)
Runs only when all PromptQueues are idle. Signal-driven continuous reasoning.
┌──────────────────────────────────────────────────────────────────────┐
│ reasoning_loop_service (600s idle timeout, signal-driven)            │
│                                                                      │
│ Preconditions:                                     ⚡ DET            │
│    All queues idle? (queue lengths = 0)            📥 M              │
│    Recent episodes exist? (lookback 168h)          📥 DB             │
│    Bail if user is in deep focus                   📥 M focus:{thread_id} │
│                                                                      │
│ 1. Seed Selection (weighted random)                ⚡ DET            │
│    Salient 0.60 / Insight 0.40                                       │
│    Source: 📥 DB episodes table (by category)                        │
│                                                                      │
│ 2. Spreading Activation (depth ≤ 2)                ⚡ DET            │
│    📥 DB semantic_concepts, semantic_relationships                   │
│    📥📤 M cognitive_drift_activations (sorted set)                   │
│    📥📤 M cognitive_drift_concept_cooldowns (hash)                   │
│    Collect top 5 activated concepts                                  │
│                                                                      │
│ 3. Thought Synthesis                               🧠 LLM ~100ms     │
│    Prompt: cognitive-drift.md + soul.md                              │
│    Input:  activated concepts + soul axioms                          │
│    Output: thought text                                              │
│                                                                      │
│ 4. Store drift gist                                📤 M              │
│    key: gist:{topic} (30min TTL)                                     │
│    Will surface in frontal cortex context on next user message       │
│                                                                      │
│ 5. Action Decision Routing                         ⚡ DET            │
│    Scores registered actions:                                        │
│                                                                      │
│    ┌───────────────┬──────────┬─────────────────────────────────┐    │
│    │ Action        │ Priority │ What it does                    │    │
│    ├───────────────┼──────────┼─────────────────────────────────┤    │
│    │ COMMUNICATE   │ 10       │ Push thought to user (deferred) │    │
│    │ PLAN          │ 7        │ Propose persistent task  🧠 LLM │    │
│    │ SEED_THREAD   │ 6        │ Plant new conversation seed     │    │
│    │ REFLECT       │ 5        │ Internal memory consolidation   │    │
│    │ RECONCILE     │ 4        │ Contradiction resolution        │    │
│    │ AMBIENT_TOOL  │ 3        │ Context-triggered tool use      │    │
│    │ NOTHING       │ 0        │ Always available fallback       │    │
│    └───────────────┴──────────┴─────────────────────────────────┘    │
│                                                                      │
│    Winner selected by score (ties broken by priority)                │
│    PLAN action → calls PlanDecompositionService    🧠 LLM            │
│                → stores in persistent_tasks        📤 DB             │
│                                                                      │
│ 6. Deferred queue                                  📤 M              │
│    COMMUNICATE → stores thought for quiet-hours delivery             │
│    Async: flushes when user returns from absence                     │
└──────────────────────────────────────────────────────────────────────┘
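Seed selection (step 1) and action routing (step 5) are both deterministic apart from the weighted coin flip. A sketch using the 0.60/0.40 weights and score-then-priority tie-break above; the function names and candidate tuple shape are hypothetical.

```python
import random

def pick_seed(salient, insights, rng=random.random):
    """Weighted random seed selection: salient 0.60, insight 0.40.
    Falls back to whichever pool is non-empty."""
    if salient and (not insights or rng() < 0.60):
        return random.choice(salient)
    return random.choice(insights) if insights else None

def pick_action(candidates):
    """candidates: [(name, score, priority)]. Highest score wins;
    priority breaks ties. NOTHING (priority 0) is always present,
    so the list is never empty in practice."""
    return max(candidates, key=lambda c: (c[1], c[2]))[0]
```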
8. Complete Storage Access Map
MemoryStore Keys Reference
Key Pattern                    TTL      Read by   Written by
──────────────────────────────────────────────────────────────────────
fok:{topic}                    –        A, B      FOK update service
world_model:items              –        A         WorldStateService
reasoning_loop:activations     –        E         Reasoning loop
reasoning_loop:cooldowns       –        E         Reasoning loop
output:{request_id}            short    /ws       digest_worker

PromptQueues (in-memory, thread-safe):
prompt-queue                   –        –         run.py → digest_worker
episodic-memory-queue          –        D         encode event handler
semantic_consolidation_queue   –        D         episodic_memory_worker
SQLite Tables Reference
Table                    When Written                      When Read
──────────────────────────────────────────────────────────────────────
interaction_log          Phase D (every message)           observability endpoints
cortex_iterations        ACT loop, Path B                  observability endpoints
episodes                 episodic_memory_worker (async)    frontal_cortex, reasoning loop
concepts                 semantic_consolidation (async)    drift engine, context assembly
semantic_relationships   semantic_consolidation            drift engine
user_traits              IIP hook                          identity service
persistent_tasks         Path D (task worker)              persistent_task_worker
topics                   Phase A (new topic)               topic_classifier
threads                  session management                session_service
topic_transcript         Phase D                           context_assembly
place_fingerprints       ambient inference                 place_learning_service
9. LLM Call Inventory
Every LLM call in the system, with typical latency and model used.
Service                          Model           Prompt                   Latency      Triggered by
────────────────────────────────────────────────────────────────────────────────────────────────────
TopicClassifierService           lightweight     topic-classifier.md      ~100ms       Every message
ModeRouterService (tiebreaker)   ONNX            mode-tiebreaker model    ~5ms         Non-user flows only
FrontalCortex (UNIFIED)          primary model   soul + unified.md        ~500ms–2s    User path
FrontalCortex (ACT plan)         primary model   frontal-cortex-act.md    ~500ms–2s    Path C ACT loop
FrontalCortex (terminal)         primary model   mode-specific            ~500ms–2s    After ACT loop
CriticService                    lightweight     critic.md                ~200ms       Path B (optional)
ReasoningLoop (thought)          lightweight     cognitive-drift.md       ~100ms       Path E
PlanDecompositionService         lightweight     plan-decomposition.md    ~300ms       On task creation
episodic_memory_worker           lightweight     episodic-memory.md       ~200ms       Phase D async
semantic_consolidation           lightweight     semantic-extract.md      ~200ms       Phase D async
Deterministic paths (zero LLM):
- IIP hook (regex)
- Intent classifier
- Empty guard / CANCEL detection (inline in unified path)
- Mode router scoring (non-user flows)
- Fatigue budget check in ACT loop
- Termination checks
- Spreading activation in drift engine
- Plan DAG cycle detection (Kahn's)
- FOK / warmth / memory confidence calculations
10. Latency Profile by Path
Path                P50 Latency     Bottleneck
────────────────────────────────────────────────────────────
Unified (user)      1s – 3s         Unified LLM call (primary model)
Unified + skills    2s – 30s        Skill execution (varies)
B – ACT + Tools     5s – 30s+       Tool execution (background workers)
D – Task Worker     30min cycle     Background, no user wait
E – Drift           600s cycle      Background, no user wait

Component latency breakdown (unified path, typical):
Context assembly        <10ms  ── MemoryStore reads (all cached)
Intent classify          ~5ms  ── Deterministic
Unified LLM call       ~800ms  ── Primary model (varies by provider)
Working memory write     <5ms  ── MemoryStore append
DB event log            ~10ms  ── SQLite WAL write
WS publish               ~1ms  ── MemoryStore pub/sub
─────────────────────────────────────────────────────────
Total (typical)        ~0.85s
11. Architectural Principles Visible in the Flow
| Principle | Where it shows up in the flow |
|---|---|
| Attention is sacred | Unified path lets the LLM decide when to act vs respond – no wasted routing overhead; ACT fatigue model prevents runaway tool chains |
| Judgment over activity | Single unified LLM call for user messages; mode router handles non-user flows deterministically |
| Tool agnosticism | ActDispatcherService routes all tools generically – no tool names anywhere in the Phase B/C infrastructure |
| Continuity over transactions | Working memory, gists, episodes, concepts all feed every response; drift gists surface even on next message |
| Single authority | Router weight mutation bounded by single regulator (24h cycle, ±0.02/day max) |
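The regulator bound in the last row can be expressed as a one-line clamp. A sketch: the ±0.02 cap is the only number taken from the table above; the function name is illustrative.

```python
def regulate_weight(current, proposed, max_delta=0.02):
    """Single-authority regulator: clamp any per-cycle router-weight
    change to ±max_delta, regardless of what the mutation proposed."""
    delta = max(-max_delta, min(max_delta, proposed - current))
    return current + delta
```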
Last updated: 2026-03-21. See docs/INDEX.md for the full documentation map.