MESSAGE FLOW

Message Flow β€” Complete Routing Reference

This document is the single authoritative visual map of how a user message travels through Chalie. Every branch, every storage hit, every LLM call, and every background cycle is shown here.

Legend

⚑ DET   β€” Deterministic (no LLM, <10ms)
🧠 LLM   β€” LLM inference call
πŸ“₯ M     β€” MemoryStore READ
πŸ“€ M     β€” MemoryStore WRITE
πŸ“₯ DB    β€” SQLite READ
πŸ“€ DB    β€” SQLite WRITE
⏱ ~Xms  β€” Typical latency

1. Master Overview β€” All Possible Paths

                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                            β”‚  User Message via    β”‚
                            β”‚    /ws  (WebSocket)  β”‚
                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚ daemon thread
                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                            β”‚   digest_worker()    │◄──── background
                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                            β”‚   PHASE A            β”‚
                            β”‚   Ingestion &        β”‚
                            β”‚   Context Assembly   β”‚
                            β”‚   (see Β§2)           β”‚
                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
                            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                            β”‚   PHASE B            β”‚
                            β”‚   Signal Collection  β”‚
                            β”‚   & Unified Path     β”‚
                            β”‚   (see Β§3)           β”‚
                            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                       β”‚
                           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                           β”‚    Unified Generation    β”‚
                           β”‚    (unified_generate)    β”‚
                           β””β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
               β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
               β”‚  Single LLM call β€” LLM decides:             β”‚
               β”‚  β€’ Respond directly (Format B)               β”‚
               β”‚  β€’ Invoke skills/tools first (Format A)      β”‚
               β”‚  β€’ CANCEL / empty β†’ fast exit                β”‚
               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                      β”‚
                               β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                               β”‚   PHASE D        β”‚
                               β”‚   Post-Response  β”‚
                               β”‚   Commit (see Β§5)β”‚
                               β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                      β”‚
                               β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                               β”‚  πŸ“€ M  pub/sub   β”‚
                               β”‚  output:{id}     β”‚
                               β”‚  WS β†’ Client     β”‚
                               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

BACKGROUND (always running, independent of user messages):
  PATH D  ──  Persistent Task Worker  (30min Β± 30% jitter)  (see Β§6)
  PATH E  ──  Reasoning Loop          (600s, idle-only)   (see Β§7)

2. Phase A β€” Ingestion & Context Assembly

Runs immediately for every message, before any routing decision.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PHASE A: Context Assembly                                          β”‚
β”‚                                                                     β”‚
β”‚  Step 1  IIP Hook (Identity Promotion)            ⚑ DET  <5ms     β”‚
β”‚          Regex: "call me X", "my name is X", …                     β”‚
β”‚          Match β†’ πŸ“€ M  πŸ“€ DB  (trait + identity)                   β”‚
β”‚          No match β†’ continue                                        β”‚
β”‚                           β”‚                                         β”‚
β”‚  Step 2  Working Memory (transcript + compaction)  πŸ“₯ DB             β”‚
β”‚          topic_compactions + topic_transcript (budget-aware)        β”‚
β”‚          ─────────────────────────────────────────────────          β”‚
β”‚  Step 3  Gists                                    πŸ“₯ M              β”‚
β”‚          key: gist:{topic}  (sorted set, 30min TTL)                β”‚
β”‚          ─────────────────────────────────────────────────          β”‚
β”‚  Step 4  Facts                                    πŸ“₯ M              β”‚
β”‚          key: fact:{topic}:{key}  (24h TTL)                        β”‚
β”‚          ─────────────────────────────────────────────────          β”‚
β”‚  Step 5  World State                              πŸ“₯ M              β”‚
β”‚          key: world_state:{topic}                                   β”‚
β”‚          ─────────────────────────────────────────────────          β”‚
β”‚  Step 6  FOK (Feeling-of-Knowing) score           πŸ“₯ M              β”‚
β”‚          key: fok:{topic}  (float 0.0–5.0)                         β”‚
β”‚          ─────────────────────────────────────────────────          β”‚
β”‚  Step 7  Context Warmth                           ⚑ DET            β”‚
β”‚          warmth = (wm_score + world_score) / 2                     β”‚
β”‚          ─────────────────────────────────────────────────          β”‚
β”‚  Step 8  Memory Confidence                        ⚑ DET            β”‚
β”‚          conf = 0.4Γ—fok + 0.4Γ—warmth + 0.2Γ—density                β”‚
β”‚          is_new_topic β†’ conf *= 0.7                                 β”‚
β”‚          ─────────────────────────────────────────────────          β”‚
β”‚  Step 9  Session / Focus Tracking                 πŸ“₯πŸ“€ M            β”‚
β”‚          topic_streak:{thread_id}  (2h TTL)                        β”‚
β”‚          focus:{thread_id}  (auto-infer after N exchanges)         β”‚
β”‚          Silence gap > 2700s β†’ trigger episodic memory             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
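Steps 7 and 8 are plain arithmetic; a minimal sketch of the two formulas (function names are illustrative, signal scales taken as documented):

```python
def context_warmth(wm_score: float, world_score: float) -> float:
    # Step 7: average of working-memory and world-state scores
    return (wm_score + world_score) / 2

def memory_confidence(fok: float, warmth: float, density: float,
                      is_new_topic: bool) -> float:
    # Step 8: weighted blend of feeling-of-knowing, warmth, and density
    conf = 0.4 * fok + 0.4 * warmth + 0.2 * density
    if is_new_topic:
        conf *= 0.7  # new topics take a 30% confidence haircut
    return conf
```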

3. Phase B β€” Signal Collection & Unified Generation

User messages go through a single unified LLM call. No mode gate, no UNIFIED/ACT routing split.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LAYER 1: NLP Signal Collection                   ⚑ DET  <1ms     β”‚
β”‚                                                                     β”‚
β”‚  compute_nlp_signals()                                              β”‚
β”‚  Input:  text                                                       β”‚
β”‚  Output: { has_question_mark, interrogative_words, greeting_pattern, β”‚
β”‚            explicit_feedback, information_density, implicit_reference}β”‚
β”‚  No external calls β€” pure regex/heuristics                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LAYER 2: Unified Generation                      🧠 LLM           β”‚
β”‚  unified_generate()                                                 β”‚
β”‚                                                                     β”‚
β”‚  Single LLM call with discoverable skills/tools.                   β”‚
β”‚  The LLM decides whether to:                                        β”‚
β”‚    β€’ Respond directly (Format B β€” conversational response)          β”‚
β”‚    β€’ Invoke skills/tools first (Format A β€” action + synthesis)      β”‚
β”‚                                                                     β”‚
β”‚  Empty input and CANCEL patterns handled inline (fast exit).        β”‚
β”‚  Context relevance pre-parser selects which context nodes to inject.β”‚
β”‚                                                                     β”‚
β”‚  Config: frontal-cortex-unified.json                                β”‚
β”‚  Prompt: frontal-cortex-unified.md                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                            Phase D  (Β§5)
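The Layer 1 signal bundle can be sketched with pure stdlib regex; the word lists and thresholds below are illustrative stand-ins for the actual heuristics:

```python
import re

INTERROGATIVES = {"who", "what", "when", "where", "why", "how", "which"}
GREETING = re.compile(r"^(hi|hey|hello|good (morning|afternoon|evening))\b", re.I)

def compute_nlp_signals(text: str) -> dict:
    """Pure regex/heuristics -- no external calls, sub-millisecond."""
    words = re.findall(r"[a-z']+", text.lower())
    content = {w for w in words if len(w) > 3}  # crude content-word filter
    return {
        "has_question_mark": "?" in text,
        "interrogative_words": sum(w in INTERROGATIVES for w in words),
        "greeting_pattern": bool(GREETING.match(text.strip())),
        "explicit_feedback": any(w in {"thanks", "wrong", "perfect"} for w in words),
        "information_density": len(content) / max(len(words), 1),
        "implicit_reference": any(w in {"it", "that", "this", "they"} for w in words),
    }
```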

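The Layer 2 dispatch on the unified output can be sketched as follows. The actual wire format is defined by frontal-cortex-unified.md; the JSON `actions` convention here is an assumed stand-in:

```python
import json

def route_unified_output(raw: str):
    """Decide what to do with a unified_generate completion."""
    text = raw.strip()
    if not text or text == "CANCEL":
        return ("exit", None)                # inline fast exit
    try:
        plan = json.loads(text)
        if isinstance(plan, dict) and plan.get("actions"):
            return ("act", plan["actions"])  # Format A: skills/tools first
    except json.JSONDecodeError:
        pass                                 # plain prose is not JSON
    return ("respond", text)                 # Format B: direct response
```
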
4. Mode Router β€” Non-User Flows (Drift, Proactive, Fallback)

4a. Mode Router (Deterministic)

Used only for non-user flows (cognitive drift, proactive notifications, fallback). User messages bypass this entirely via unified_generate.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  ModeRouterService                           ⚑ DET  ~5ms           β”‚
β”‚                                                                     β”‚
β”‚  Signal inputs (all already in memory from Phase A/B):             β”‚
β”‚    context_warmth       topic_confidence     has_question_mark     β”‚
β”‚    working_memory_turns fok_score            interrogative_words   β”‚
β”‚    gist_count           is_new_topic         greeting_pattern      β”‚
β”‚    fact_count           world_state_present  explicit_feedback     β”‚
β”‚    intent_type          intent_complexity    intent_confidence     β”‚
β”‚    information_density  implicit_reference   prompt_token_count    β”‚
β”‚                                                                     β”‚
β”‚  Scoring formula (per mode):                                       β”‚
β”‚    score[mode] = base_score + Ξ£(weight[signal] Γ— signal_value)    β”‚
β”‚    Anti-oscillation: hysteresis dampening from prior mode          β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚  Tie-breaker?                           ⚑ ONNX  ~5ms        β”‚   β”‚
β”‚  β”‚  Triggered when: top-2 scores within effective_margin       β”‚   β”‚
β”‚  β”‚  Model:   mode-tiebreaker (ONNX classifier)                 β”‚   β”‚
β”‚  β”‚  Output:  pick mode A or B                                  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                            UNIFIED
                               β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  FrontalCortexService                        🧠 LLM  ~500ms–2s     β”‚
β”‚                                                                     β”‚
β”‚  Prompt = soul.md + identity-core.md + frontal-cortex-{mode}.md    β”‚
β”‚                                                                     β”‚
β”‚  Context injected:                                                  β”‚
β”‚    β€’ Working memory (thread_id)                                     β”‚
β”‚    β€’ Chat history                                                   β”‚
β”‚    β€’ Assembled context (semantic retrieval)                         β”‚
β”‚    β€’ Drift gists (if idle thoughts exist)                           β”‚
β”‚    β€’ Context relevance inclusion map (computed dynamically)         β”‚
β”‚                                                                     β”‚
β”‚  Config files:                                                      β”‚
β”‚    UNIFIED      β†’ frontal-cortex-unified.json                       β”‚
β”‚                                                                     β”‚
β”‚  Output: { response: str, confidence: float, mode: str }           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                            Phase D  (Β§5)
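The Β§4a scoring formula with hysteresis, as a sketch (weight tables and the hysteresis constant are illustrative; the real values live in the router config):

```python
def score_modes(signals: dict, weights: dict, base: dict,
                prior_mode: str, hysteresis: float = 0.1) -> dict:
    """score[mode] = base_score + sum(weight[signal] * signal_value),
    plus a small bonus for the prior mode (anti-oscillation)."""
    scores = {}
    for mode, mode_weights in weights.items():
        s = base.get(mode, 0.0)
        s += sum(w * signals.get(name, 0.0) for name, w in mode_weights.items())
        if mode == prior_mode:
            s += hysteresis   # hysteresis dampening from prior mode
        scores[mode] = s
    return scores
```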

4b. ACT Mode β€” The Action Loop

Used by background workers (tool_worker, persistent_task_worker) and when the mode router (non-user flows) selects ACT.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  ACTOrchestrator                                                    β”‚
β”‚  Config: cumulative_timeout=60s  per_action=10s  max_iterations=30 β”‚
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚  Iteration N                                                 β”‚  β”‚
β”‚  β”‚                                                              β”‚  β”‚
β”‚  β”‚  1. Generate action plan            🧠 LLM                  β”‚  β”‚
β”‚  β”‚     Prompt: frontal-cortex-act.md                           β”‚  β”‚
β”‚  β”‚     Input:  user text + act_history (prior results)         β”‚  β”‚
β”‚  β”‚     Output: [{ type, params, … }, …]                        β”‚  β”‚
β”‚  β”‚                                                              β”‚  β”‚
β”‚  β”‚  2. Termination check               ⚑ DET                  β”‚  β”‚
β”‚  β”‚     β€’ Cumulative timeout reached?                            β”‚  β”‚
β”‚  β”‚     β€’ Max iterations reached?                                β”‚  β”‚
β”‚  β”‚     β€’ No actions in plan?                                    β”‚  β”‚
β”‚  β”‚     β€’ Semantic repetition detected? (embedding-based)        β”‚  β”‚
β”‚  β”‚     β€’ Same action type repeated 3Γ— in a row?                β”‚  β”‚
β”‚  β”‚     If any β†’ exit loop                                       β”‚  β”‚
β”‚  β”‚                                                              β”‚  β”‚
β”‚  β”‚  3. Execute actions                  ⚑/🧠 varies           β”‚  β”‚
β”‚  β”‚     ActDispatcherService                                     β”‚  β”‚
β”‚  β”‚     Chains outputs: result[N] β†’ input[N+1]                  β”‚  β”‚
β”‚  β”‚     Action types:                                            β”‚  β”‚
β”‚  β”‚       recall, memorize, associate, find_tools               β”‚  β”‚
β”‚  β”‚       (cognitive primitives, always available)              β”‚  β”‚
β”‚  β”‚       schedule, list, focus, persistent_task, etc.          β”‚  β”‚
β”‚  β”‚       (all innate skills available directly)                β”‚  β”‚
β”‚  β”‚       (+ external tools via tool_worker thread)             β”‚  β”‚
β”‚  β”‚                                                              β”‚  β”‚
β”‚  β”‚  4. Log iteration                    πŸ“€ DB                  β”‚  β”‚
β”‚  β”‚     Table: cortex_iterations                                 β”‚  β”‚
β”‚  β”‚     Fields: iteration_number, actions_executed,             β”‚  β”‚
β”‚  β”‚             execution_time_ms, fatigue, mode                β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚           β”‚                                                         β”‚
β”‚           └──► repeat if can_continue()                             β”‚
β”‚                                                                     β”‚
β”‚  After loop terminates:                                             β”‚
β”‚  1. Re-route β†’ terminal mode (force previous_mode='ACT')           β”‚
β”‚     Mode router (deterministic, skip_tiebreaker=True)              β”‚
β”‚     Typically selects UNIFIED                                       β”‚
β”‚  2. Generate terminal response (FrontalCortex)   🧠 LLM           β”‚
β”‚     act_history passed as context                                   β”‚
β”‚     All-card actions β†’ skip text (mode='IGNORE')                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
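The step 2 termination check is a handful of deterministic guards; a sketch (the embedding-based semantic-repetition check is omitted, and argument names are illustrative):

```python
def can_continue(elapsed_s: float, iteration: int, plan: list,
                 recent_types: list, cumulative_timeout: float = 60.0,
                 max_iterations: int = 30) -> bool:
    if elapsed_s >= cumulative_timeout:
        return False                 # cumulative 60s budget exhausted
    if iteration >= max_iterations:
        return False                 # hard cap of 30 iterations
    if not plan:
        return False                 # the LLM planned no actions
    if len(recent_types) >= 3 and len(set(recent_types[-3:])) == 1:
        return False                 # same action type 3x in a row
    return True
```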

5. Phase D β€” Post-Response Commit

Runs after every response is generated, regardless of which path produced it.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  PHASE D: Post-Response Commit                                      β”‚
β”‚                                                                     β”‚
β”‚  Step 1  Append to transcript + compaction check  πŸ“€ DB              β”‚
β”‚          topic_transcript (append assistant turn)                   β”‚
β”‚          Fires compaction if context > 85% of budget               β”‚
β”‚                         β”‚                                           β”‚
β”‚  Step 2  Log interaction event                  πŸ“€ DB              β”‚
β”‚          Table: interaction_log                                      β”‚
β”‚          Fields: event_type='system_response', mode,               β”‚
β”‚                  confidence, generation_time                        β”‚
β”‚                         β”‚                                           β”‚
β”‚  Step 3  Encode response event                  πŸ“€ M  (async)      β”‚
β”‚          EventBusService β†’ ENCODE_EVENT                             β”‚
β”‚          Triggers downstream memory consolidation:                  β”‚
β”‚                                                                     β”‚
β”‚          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚          β”‚  episodic-memory-queue (PromptQueue)                 β”‚  β”‚
β”‚          β”‚    β†’ episodic_memory_worker: episode build  🧠 LLM  β”‚  β”‚
β”‚          β”‚    β†’ πŸ“€ DB  episodes  (with sqlite-vec embedding)    β”‚  β”‚
β”‚          β”‚                                                      β”‚  β”‚
β”‚          β”‚  semantic_consolidation_queue (PromptQueue)          β”‚  β”‚
β”‚          β”‚    β†’ semantic consolidation: concept extract 🧠 LLM β”‚  β”‚
β”‚          β”‚    β†’ πŸ“€ DB  concepts, semantic_relationships         β”‚  β”‚
β”‚          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                         β”‚                                           β”‚
β”‚  Step 4  Publish to WebSocket                   πŸ“€ M  (pub/sub)    β”‚
β”‚          key: output:{request_id}                                   β”‚
β”‚          /chat endpoint receives, streams to client                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
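Two of the deterministic pieces above, sketched (the 85% threshold is the figure stated in the compaction step; function names are illustrative):

```python
def needs_compaction(transcript_tokens: int, budget_tokens: int,
                     threshold: float = 0.85) -> bool:
    # Compaction fires once the transcript exceeds 85% of the budget
    return transcript_tokens > threshold * budget_tokens

def output_channel(request_id: str) -> str:
    # Pub/sub key the /chat endpoint subscribes to for streaming
    return f"output:{request_id}"
```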

6. Path D β€” Persistent Task Worker (Background, 30min Cycle)

Operates completely independently of user messages.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  persistent_task_worker  (30min Β± 30% jitter)                      β”‚
β”‚                                                                     β”‚
β”‚  1. Expire stale tasks                          πŸ“₯πŸ“€ DB            β”‚
β”‚     Table: persistent_tasks                                         β”‚
β”‚     now - created_at > max_age β†’ mark EXPIRED                      β”‚
β”‚                                                                     β”‚
β”‚  2. Pick eligible task (FIFO within priority)   πŸ“₯ DB              β”‚
β”‚     State machine: PENDING β†’ RUNNING β†’ COMPLETED                    β”‚
β”‚                                                                     β”‚
β”‚  3. Load task + progress                        πŸ“₯ DB              β”‚
β”‚     persistent_tasks.progress (JSON as TEXT)                       β”‚
β”‚     Contains: plan DAG, coverage, step statuses                    β”‚
β”‚                                                                     β”‚
β”‚  4. Execution branch:                                               β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚     β”‚  HAS PLAN DAG?   │─Yes─►│  Plan-Aware Execution         β”‚   β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚  Ready steps = steps where    β”‚   β”‚
β”‚              β”‚ No             β”‚  all depends_on are DONE       β”‚   β”‚
β”‚              β–Ό                β”‚  Execute each ready step       β”‚   β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚  via bounded ACT loop         β”‚   β”‚
β”‚     β”‚  Flat ACT Loop   β”‚      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚     β”‚  Iterate toward  β”‚                                           β”‚
β”‚     β”‚  goal directly   β”‚                                           β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                           β”‚
β”‚                                                                     β”‚
β”‚  5. Bounded ACT Loop (both branches):           🧠 LLM  per iter  β”‚
β”‚     max_iterations=5, cumulative_timeout=30min                     β”‚
β”‚     Same fatigue model as interactive ACT loop                     β”‚
β”‚                                                                     β”‚
β”‚  6. Atomic checkpoint                           πŸ“€ DB              β”‚
β”‚     persistent_tasks.progress (JSON as TEXT, atomic UPDATE)        β”‚
β”‚     Saves: plan, coverage %, step statuses, last results           β”‚
β”‚                                                                     β”‚
β”‚  7. Coverage check                              ⚑ DET             β”‚
β”‚     100% complete β†’ mark COMPLETED                                 β”‚
β”‚                                                                     β”‚
β”‚  8. Adaptive surfacing (optional)                                   β”‚
β”‚     After cycle 2, or coverage jumped > 15%                        β”‚
β”‚     β†’ Proactive message to user                                    β”‚
β”‚     β†’ πŸ“€ M  pub/sub proactive channel                              β”‚
β”‚                                                                     β”‚
β”‚  PLAN DECOMPOSITION (called on task creation):  🧠 LLM  ~300ms    β”‚
β”‚  PlanDecompositionService                                           β”‚
β”‚  Prompt: plan-decomposition.md                                      β”‚
β”‚  Output: { steps: [{ id, description, depends_on: [] }] }          β”‚
β”‚  Validates: Kahn's cycle detection, quality gates (Jaccard <0.7),  β”‚
β”‚             confidence > 0.5, step word count 4–30                 β”‚
β”‚  Stores: persistent_tasks.progress.plan (JSON as TEXT)              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
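The plan-aware branch and the decomposition validator both hinge on the depends_on DAG. A sketch of ready-step selection and Kahn's cycle detection (step dicts follow the decomposition output shape above):

```python
from collections import deque

def has_cycle(steps: list) -> bool:
    """Kahn's algorithm: a plan is a valid DAG iff every step can be
    topologically ordered."""
    indegree = {s["id"]: len(s["depends_on"]) for s in steps}
    children = {s["id"]: [] for s in steps}
    for s in steps:
        for dep in s["depends_on"]:
            children[dep].append(s["id"])
    ready = deque(sid for sid, d in indegree.items() if d == 0)
    visited = 0
    while ready:
        sid = ready.popleft()
        visited += 1
        for child in children[sid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return visited != len(steps)     # unvisited steps imply a cycle

def ready_steps(steps: list, done: set) -> list:
    # Plan-aware execution: a step is ready when all depends_on are DONE
    return [s for s in steps
            if s["id"] not in done and all(d in done for d in s["depends_on"])]
```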

7. Path E β€” Reasoning Loop (Background, 600s Idle-Only)

Runs only when all PromptQueues are idle. Signal-driven continuous reasoning.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  reasoning_loop_service  (600s idle timeout, signal-driven)        β”‚
β”‚                                                                     β”‚
β”‚  Preconditions:                               ⚑ DET               β”‚
β”‚    All queues idle?   πŸ“₯ M  (queue lengths = 0)                    β”‚
β”‚    Recent episodes exist? (lookback 168h)  πŸ“₯ DB                   β”‚
β”‚    Bail if user is in deep focus           πŸ“₯ M  focus:{thread_id} β”‚
β”‚                                                                     β”‚
β”‚  1. Seed Selection (weighted random)          ⚑ DET               β”‚
β”‚     Salient  0.60 β”‚ Insight  0.40                                   β”‚
β”‚     Source: πŸ“₯ DB  episodes table (by category)                    β”‚
β”‚                                                                     β”‚
β”‚  2. Spreading Activation (depth ≀ 2)          ⚑ DET               β”‚
β”‚     πŸ“₯ DB  semantic_concepts, semantic_relationships               β”‚
β”‚     πŸ“₯πŸ“€ M  cognitive_drift_activations  (sorted set)              β”‚
β”‚     πŸ“₯πŸ“€ M  cognitive_drift_concept_cooldowns  (hash)              β”‚
β”‚     Collect top 5 activated concepts                               β”‚
β”‚                                                                     β”‚
β”‚  3. Thought Synthesis                         🧠 LLM  ~100ms       β”‚
β”‚     Prompt: cognitive-drift.md + soul.md                           β”‚
β”‚     Input:  activated concepts + soul axioms                       β”‚
β”‚     Output: thought text                                            β”‚
β”‚                                                                     β”‚
β”‚  4. Store drift gist                          πŸ“€ M               β”‚
β”‚     key: gist:{topic}  (30min TTL)                                  β”‚
β”‚     Will surface in frontal cortex context on next user message    β”‚
β”‚                                                                     β”‚
β”‚  5. Action Decision Routing                   ⚑ DET               β”‚
β”‚     Scores registered actions:                                      β”‚
β”‚                                                                     β”‚
β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚     β”‚  Action       β”‚ Priority β”‚  What it does                   β”‚ β”‚
β”‚     β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚
β”‚     β”‚  COMMUNICATE  β”‚    10    β”‚  Push thought to user (deferred)β”‚ β”‚
β”‚     β”‚  PLAN         β”‚     7    β”‚  Propose persistent task 🧠 LLM β”‚ β”‚
β”‚     β”‚  SEED_THREAD  β”‚     6    β”‚  Plant new conversation seed    β”‚ β”‚
β”‚     β”‚  REFLECT      β”‚     5    β”‚  Internal memory consolidation  β”‚ β”‚
β”‚     β”‚  RECONCILE    β”‚     4    β”‚  Contradiction resolution       β”‚ β”‚
β”‚     β”‚  AMBIENT_TOOL β”‚     3    β”‚  Context-triggered tool use     β”‚ β”‚
β”‚     β”‚  NOTHING      β”‚     0    β”‚  Always available fallback      β”‚ β”‚
β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                                     β”‚
β”‚     Winner selected by score (ties broken by priority)             β”‚
β”‚     PLAN action β†’ calls PlanDecompositionService  🧠 LLM          β”‚
β”‚                β†’ stores in persistent_tasks  πŸ“€ DB                 β”‚
β”‚                                                                     β”‚
β”‚  6. Deferred queue                             πŸ“€ M               β”‚
β”‚     COMMUNICATE β†’ stores thought for quiet-hours delivery          β”‚
β”‚     Async: flushes when user returns from absence                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
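Step 5's winner selection, sketched against the priority column above (how each action is scored happens upstream and is not shown; `pick_action` is an illustrative name):

```python
PRIORITY = {
    "COMMUNICATE": 10, "PLAN": 7, "SEED_THREAD": 6, "REFLECT": 5,
    "RECONCILE": 4, "AMBIENT_TOOL": 3, "NOTHING": 0,
}

def pick_action(scores: dict) -> str:
    """Winner by score; ties broken by static priority.
    NOTHING is always present as the zero-score fallback."""
    candidates = dict(scores)
    candidates.setdefault("NOTHING", 0.0)
    return max(candidates, key=lambda a: (candidates[a], PRIORITY.get(a, 0)))
```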

8. Complete Storage Access Map

MemoryStore Keys Reference

Key Pattern                        TTL        Read    Written by
─────────────────────────────────────────────────────────────────────
fok:{topic}                        β€”          A,B     FOK update service
world_model:items                  β€”          A       WorldStateService
reasoning_loop:activations         β€”          E       Reasoning loop
reasoning_loop:cooldowns           β€”          E       Reasoning loop
output:{request_id}                short      /ws     digest_worker

PromptQueues (in-memory, thread-safe):
prompt-queue                       β€”          β€”       run.py β†’ digest_worker
episodic-memory-queue              β€”          D       encode event handler
semantic_consolidation_queue       β€”          D       episodic_memory_worker
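The PromptQueues behave like standard blocking worker queues; Python's stdlib `queue.Queue` has the same thread-safety semantics (a minimal illustration, not the actual PromptQueue implementation):

```python
import queue
import threading

episodic_memory_queue = queue.Queue()   # stands in for a PromptQueue

def episodic_worker() -> None:
    while True:
        item = episodic_memory_queue.get()   # blocks while the queue is idle
        if item is None:
            break                            # shutdown sentinel
        # ... build the episode and write it to SQLite here ...
        episodic_memory_queue.task_done()

threading.Thread(target=episodic_worker, daemon=True).start()
episodic_memory_queue.put("encode: user/assistant exchange")
episodic_memory_queue.join()                 # returns once the item is processed
```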

SQLite Tables Reference

Table                      When Written                    When Read
──────────────────────────────────────────────────────────────────────
interaction_log            Phase D (every message)         observability endpoints
cortex_iterations          ACT loop, Path B                observability endpoints
episodes                   episodic_memory_worker (async)  frontal_cortex, reasoning loop
concepts                   semantic_consolidation (async)  drift engine, context assembly
semantic_relationships     semantic_consolidation          drift engine
user_traits                IIP hook                        identity service
persistent_tasks           Path D (task worker)            persistent_task_worker
topics                     Phase A (new topic)             topic_classifier
threads                    session management              session_service
topic_transcript           Phase D                         context_assembly
place_fingerprints         ambient inference               place_learning_service

9. LLM Call Inventory

Every LLM call in the system, with typical latency and model used.

Service                      Model            Prompt                   Latency   Triggered by
────────────────────────────────────────────────────────────────────────────────────────────────
TopicClassifierService       lightweight      topic-classifier.md      ~100ms    Every message
ModeRouterService (tiebreaker) ONNX           mode-tiebreaker model    ~5ms      Non-user flows only
FrontalCortex (UNIFIED)      primary model    soul + unified.md        ~500ms-2s User path
FrontalCortex (ACT plan)     primary model    frontal-cortex-act.md    ~500ms-2s Path B ACT loop
FrontalCortex (terminal)     primary model    mode-specific            ~500ms-2s After ACT loop
CriticService                lightweight      critic.md                ~200ms    Path B (optional)
ReasoningLoop (thought)      lightweight      cognitive-drift.md       ~100ms    Path E
PlanDecompositionService     lightweight      plan-decomposition.md    ~300ms    On task creation
episodic_memory_worker       lightweight      episodic-memory.md       ~200ms    Phase D async
semantic_consolidation       lightweight      semantic-extract.md      ~200ms    Phase D async

Deterministic paths (zero LLM):

  • IIP hook (regex)
  • Intent classifier
  • Empty guard / CANCEL detection (inline in unified path)
  • Mode router scoring (non-user flows)
  • Fatigue budget check in ACT loop
  • Termination checks
  • Spreading activation in drift engine
  • Plan DAG cycle detection (Kahn’s)
  • FOK / warmth / memory confidence calculations

10. Latency Profile by Path

Path              P50 Latency    Bottleneck
────────────────────────────────────────────────────────────
Unified (user)    1s – 3s        Unified LLM call (primary model)
Unified + skills  2s – 30s       Skill execution (varies)
B β€” ACT + Tools   5s – 30s+      Tool execution (background workers)
D β€” Task Worker   30min cycle    Background, no user wait
E β€” Drift         600s cycle     Background, no user wait

Component latency breakdown (unified path, typical):
  Context assembly     <10ms   ── MemoryStore reads (all cached)
  Intent classify      ~5ms    ── Deterministic
  Unified LLM call     ~800ms  ── Primary model (varies by provider)
  Working memory write <5ms    ── MemoryStore append
  DB event log         ~10ms   ── SQLite WAL write
  WS publish           ~1ms    ── MemoryStore pub/sub
  ─────────────────────────────────────────────────────────
  Total (typical)      ~0.85s

11. Architectural Principles Visible in the Flow

Principle                      Where it shows up in the flow
──────────────────────────────────────────────────────────────────────
Attention is sacred            Unified path lets the LLM decide when to act vs respond β€” no wasted routing overhead; ACT fatigue model prevents runaway tool chains
Judgment over activity         Single unified LLM call for user messages; mode router handles non-user flows deterministically
Tool agnosticism               ActDispatcherService routes all tools generically β€” no tool names anywhere in the Phase B/C infrastructure
Continuity over transactions   Working memory, gists, episodes, concepts all feed every response; drift gists surface even on next message
Single authority               Router weight mutation bounded by single regulator (24h cycle, Β±0.02/day max)

Last updated: 2026-03-21. See docs/INDEX.md for the full documentation map.