ARCHITECTURE
System Architecture
Overview
Chalie is a human-in-the-loop cognitive assistant that combines memory consolidation, semantic reasoning, and proactive assistance. The system processes user prompts through a chain of workers and services, enriching conversations with memory chunks and generating episodic memories for future use.
Core Architecture
System Type
- Synthetic cognitive brain using LLMs to replicate human brain functions
- Tech Stack: Python backend, PostgreSQL + pgvector, Redis, Ollama (configurable LLMs), Vanilla JavaScript frontend (Radiant design system)
- Core Pattern: Worker-based architecture with Redis queue, service-oriented design
Communication Pattern
- User sends message → POST to `/chat` with text
- Backend processes: mode router selects mode → mode-specific LLM generates response
- Response delivered: via SSE stream (status → message → done events)
- Authentication: session cookie-based (`@require_session` decorator)
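The SSE stream can be consumed by parsing `event:`/`data:` lines from the response body. A minimal sketch of a parser for the status → message → done event sequence (the one-`data:`-line-per-event layout is an assumption about this stream, not part of the SSE spec):

```python
def parse_sse(stream_text: str):
    """Parse a raw SSE body into (event, data) tuples.

    Minimal sketch: assumes one `event:` line followed by `data:`
    lines, blank-line separated, as in a status -> message -> done
    stream.
    """
    events = []
    event_type, data_lines = None, []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event_type is not None:
            events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = None, []
    return events

raw = (
    "event: status\ndata: thinking\n\n"
    "event: message\ndata: Hello!\n\n"
    "event: done\ndata: {}\n\n"
)
parsed = parse_sse(raw)
```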
Code Organization
backend/
├── services/ # Business logic (memory, orchestration, routing, embeddings)
├── workers/ # Async workers (digest, memory chunking, consolidation)
├── listeners/ # Input handlers (direct REST API)
├── api/ # REST API blueprints (conversation, memory, proactive, privacy, system)
├── configs/ # Configuration files (connections.json, agent configs, generated/)
├── migrations/ # Database migrations
├── prompts/ # LLM prompt templates (mode-specific)
├── tools/ # Skill implementations
├── tests/ # Test suite
└── consumer.py # Main supervisor process
Frontend applications located separately:
frontend/
├── interface/ # Main chat UI (HTML/CSS/JS, Radiant design system)
├── brain/ # Admin/cognitive dashboard
└── on-boarding/ # Account setup wizard
IMPORTANT: UI code must exist under /interface/, /brain/, or /on-boarding/ only.
Key Services
Core Services (backend/services/)
Routing & Decision Making
- `mode_router_service.py` — Deterministic mode routing (~5ms) with signal collection + tie-breaker
- `routing_decision_service.py` — Routing decision audit trail (PostgreSQL)
- `routing_stability_regulator_service.py` — Single authority for router weight mutation (24h cycle, ±0.02/day max)
- `routing_reflection_service.py` — Idle-time peer review of routing decisions via strong LLM
- `cognitive_triage_service.py` — LLM-based 4-step triage (social filter → LLM → self-eval → dispatch); routes to RESPOND/ACT/CLARIFY/ACKNOWLEDGE; defers tool selection to ACT loop when tools exist but none named
- `cognitive_reflex_service.py` — Learned fast path via semantic abstraction; heuristic pre-screen (~1ms) + pgvector cluster lookup (~5-20ms) bypasses full pipeline for self-contained queries; rolling-average centroids generalize from observed examples; self-correcting per cluster via user corrections and shadow validation
Response Generation
- `frontal_cortex_service.py` — LLM response generation using mode-specific prompts
- `voice_mapper_service.py` — Translates identity vectors to tone instructions
Memory System
- `context_assembly_service.py` — Unified retrieval from all 5 memory layers with weighted budget allocation
- `episodic_retrieval_service.py` — Hybrid vector + FTS search for episodes
- `semantic_retrieval_service.py` — Vector similarity + spreading activation for concepts
- `user_trait_service.py` — Per-user trait management with category-specific decay
- `episodic_storage_service.py` — PostgreSQL CRUD for episodic memories
- `semantic_storage_service.py` — PostgreSQL CRUD for semantic concepts
- `gist_storage_service.py` — Redis-backed short-term memory with deduplication
- `list_service.py` — Deterministic list management (shopping, to-do, chores); perfect recall with full history via `lists`, `list_items`, `list_events` tables
- `moment_service.py` — Pinned message bookmarks with LLM-enriched context, pgvector semantic search, and salience boosting; stores user-pinned Chalie responses as permanent, searchable moments via `moments` table
- `moment_enrichment_service.py` — Background worker (5min poll): collects gists from ±4hr interaction window, generates LLM summaries, seals moments after 4hrs; boosts related episode salience on seal
- `moment_card_service.py` — Inline HTML card emission for moment display in the conversation spine
Autonomous Behavior
- `cognitive_drift_engine.py` — Default Mode Network (DMN) for spontaneous thoughts during idle; attention-gated (skips when user in deep focus)
- `autonomous_actions/` — Decision routing (priority 10→6): CommunicateAction, SuggestAction (skill-matched proactive suggestions), NurtureAction (gentle phase-appropriate presence), PlanAction (proactive plan proposals from recurring topics, 7-gate eligibility with signal persistence), ReflectAction, SeedThreadAction
- `spark_state_service.py` — Tracks relationship phase progression (first_contact → surface → exploratory → connected → graduated)
- `spark_welcome_service.py` — First-contact welcome message triggered on first SSE connection; runs once per lifecycle
- `curiosity_thread_service.py` — Self-directed exploration threads (learning and behavioral) seeded from cognitive drift
- `curiosity_pursuit_service.py` — Background worker exploring active threads via ACT loop
- `decay_engine_service.py` — Periodic decay (episodic 0.05/hr, semantic 0.03/hr)
Ambient Awareness
- `ambient_inference_service.py` — Deterministic inference engine (<1ms, zero LLM): place, attention, energy, mobility, tempo, device_context from browser telemetry + behavioral signals; thresholds loaded from `configs/agents/ambient-inference.json`; emits transition events (place, attention, energy) to event bridge when `emit_events=True`
- `place_learning_service.py` — Accumulates place fingerprints (geohash ~1km, never raw coords) in `place_fingerprints` table; learned patterns override heuristics after 20+ observations
- `client_context_service.py` — Rich client context with location history ring buffer (12 entries), place transition detection, session re-entry detection (>30min absence), demographic trait seeding from locale, and circadian hourly interaction counts; emits session_start/session_resume events to event bridge
- `event_bridge_service.py` — Connects ambient context changes (place, attention, energy, session) to autonomous actions; enforces stabilization windows (90s), per-event cooldowns, confidence gating, aggregation (60s bundle window), and focus gates; config in `configs/agents/event-bridge.json`
ACT Loop & Critic
- `act_loop_service.py` — Iterative action execution with safety limits (60s timeout)
- `act_dispatcher_service.py` — Routes actions to skill handlers with timeout enforcement; returns structured results with confidence and contextual notes
- `critic_service.py` — Post-action verification: evaluates each action result for correctness via lightweight LLM (reuses `cognitive-triage` agent config); safe actions get silent correction, consequential actions pause; EMA-based confidence calibration
- `persistent_task_service.py` — Multi-session background task management with state machine (PROPOSED → ACCEPTED → IN_PROGRESS → COMPLETED/PAUSED/CANCELLED/EXPIRED); duplicate detection via Jaccard similarity; rate limiting (3 cycles/hr, 5 active tasks max)
- `plan_decomposition_service.py` — LLM-powered goal → step DAG decomposition; validates DAG (Kahn's cycle detection), step quality (4–30 word descriptions, Jaccard dedup), and cost classification (cheap/expensive); plans stored in `persistent_tasks.progress` JSONB; ready-step ordering (shallowest depth, cheapest first)
Tool Integration
- `tool_registry_service.py` — Tool discovery, metadata management, and cron execution via `run_interactive` (bidirectional stdin/stdout dialog protocol)
- `tool_container_service.py` — Container lifecycle; `run()` for single-shot, `run_interactive()` for bidirectional tool↔Chalie dialog (JSON-lines stdout, Chalie responses via stdin)
- `tool_config_service.py` — Tool configuration persistence; webhook key generation (HMAC-SHA256 + replay protection via X-Chalie-Signature/X-Chalie-Timestamp)
- `tool_performance_service.py` — Performance metrics tracking; correctness-biased ranking (50% success_rate, 15% speed, 15% reliability, 10% cost, 10% preference); post-triage tool reranking; user correction propagation; 30-day preference decay
- `tool_profile_service.py` — LLM-generated tool capability profiles with `triage_triggers` (short action verbs injected into triage prompt for vocabulary bridging), `short_summary`, `full_profile`, and `usage_scenarios`; Redis-cached triage summaries (5min TTL)
- Webhook endpoint (`/api/tools/webhook/<name>`) — External tool triggers with HMAC-SHA256 or simple token auth, 30 req/min rate limit, 512KB payload cap
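The webhook's signed-request scheme can be verified roughly as follows. This is a sketch: it assumes the signature is hex-encoded HMAC-SHA256 over `timestamp + body` and a 5-minute replay window; the service's actual signing payload and tolerance may differ.

```python
import hmac
import hashlib
import time

REPLAY_WINDOW_S = 300  # assumed tolerance, not a documented value

def verify_webhook(secret: bytes, body: bytes, signature: str,
                   timestamp: str, now=None) -> bool:
    """Check X-Chalie-Signature / X-Chalie-Timestamp style headers.

    Rejects stale timestamps (replay protection), then compares an
    HMAC-SHA256 over timestamp + body in constant time.
    """
    now = time.time() if now is None else now
    try:
        ts = float(timestamp)
    except ValueError:
        return False
    if abs(now - ts) > REPLAY_WINDOW_S:
        return False  # replayed or clock-skewed request
    expected = hmac.new(secret, timestamp.encode() + body,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

secret = b"webhook-key"
body = b'{"trigger": "sync"}'
ts = "1700000000"
sig = hmac.new(secret, ts.encode() + body, hashlib.sha256).hexdigest()
ok = verify_webhook(secret, body, sig, ts, now=1700000100.0)      # fresh
stale = verify_webhook(secret, body, sig, ts, now=1700009999.0)   # replayed
```

`hmac.compare_digest` matters here: a naive `==` comparison leaks timing information about how many leading signature bytes matched.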
Identity & Learning
- `identity_service.py` — 6-dimensional identity vector system with coherence constraints
- `identity_state_service.py` — Tracks identity state changes and evolution
- `user_trait_service.py` — User trait management with category-specific decay
Infrastructure
- `database_service.py` — PostgreSQL connection pool and migrations
- `redis_client.py` — Redis connection handling
- `config_service.py` — Environment and JSON file config (precedence: env > .env > json)
- `output_service.py` — Output queue management for responses
- `event_bus_service.py` — Pub/sub event routing
- `card_renderer_service.py` — Card system rendering engine
Topic Classification
- `topic_classifier_service.py` — Embedding-based deterministic topic classification with adaptive boundary detection
- `adaptive_boundary_detector.py` — 3-layer self-calibrating topic boundary detector (NEWMA + Transient Surprise + Leaky Accumulator); persists per-thread state in Redis; degrades gracefully to static threshold when Redis is unavailable
- `topic_stability_regulator_service.py` — 24h adaptive tuning of topic classification and boundary detector parameters
Session & Conversation
- `thread_conversation_service.py` — Redis-backed conversation thread persistence
- `thread_service.py` — Manages conversation threads with expiry
- `session_service.py` — Tracks user sessions and topic changes
Innate Skills (backend/services/innate_skills/ and backend/skills/)
10 built-in cognitive skills for the ACT loop:
- `recall_skill.py` — Unified retrieval across ALL memory layers (<500ms)
- `memorize_skill.py` — Store gists and facts (<50ms)
- `introspect_skill.py` — Self-examination (context warmth, FOK signal, stats) (<100ms)
- `associate_skill.py` — Spreading activation through semantic graph (<500ms)
- `scheduler_skill.py` — Create/list/cancel reminders and scheduled tasks (<100ms)
- `autobiography_skill.py` — Retrieve synthesized user narrative with optional section extraction (<500ms)
- `list_skill.py` — Deterministic list management: add/remove/check items, view, history (<50ms)
- `focus_skill.py` — Focus session management: set, check, clear with distraction detection (<50ms)
- `moment_skill.py` — Natural language moment recall ("Do you remember…") and listing via pgvector search
- `persistent_task_skill.py` — Multi-session background task management: create (with plan decomposition), pause, resume, cancel, check status, show plan, set priority (<100ms; create ~2-5s with LLM decomposition)
Worker Processes (backend/workers/)
Queue Workers
- Digest Worker — Core pipeline: classify → route → generate response → enqueue memory job
- Memory Chunker Worker — Enriches exchanges with memory chunks via LLM
- Episodic Memory Worker — Builds episodes from sequences of exchanges
- Semantic Consolidation Worker — Extracts concepts + relationships from episodes
Services/Daemons
- REST API Worker — Flask REST API on port 8080
- Cognitive Drift Engine — Generates spontaneous thoughts during worker idle (attention-gated: skips when user in deep focus)
- Ambient Inference Service — Deterministic inference of place, attention, energy, mobility, tempo from browser telemetry (<1ms, zero LLM)
- Place Learning Service — Accumulates place fingerprints in PostgreSQL; learned patterns override heuristics after 20+ observations
- Decay Engine — Periodic memory decay cycle
- Routing Stability Regulator — Single authority for router weight mutation
- Routing Reflection — Idle-time peer review of routing decisions
- Topic Stability Regulator — Adaptive tuning of topic classification parameters
- Experience Assimilation — Tool results → episodic memory (60s poll)
- Thread Expiry Service — Expires stale threads (5min cycle)
- Scheduler Service — Fires due reminders/tasks (60s poll)
- Autobiography Synthesis — Synthesizes user narrative (6h cycle)
- Triage Calibration — Triage correctness scoring (24h cycle); wires user corrections to tool preferences; learns usage scenarios from clarification→tool resolution chains
- Profile Enrichment — Tool profile enrichment (6h cycle, 3 tools/cycle); preference decay; usage-triggered full profile rebuilds (15 successes or reliability < 50%)
- Curiosity Pursuit — Explores curiosity threads via ACT loop (6h cycle)
- Moment Enrichment — Enriches pinned moments with gists + LLM summary, seals after 4hrs (5min poll)
- Persistent Task Worker — Runs eligible multi-session background tasks via bounded ACT loop (30min cycle with ±30% jitter); plan-aware execution follows step DAG when present (up to 3 steps/cycle with per-step fatigue budgets), falls back to flat loop otherwise; adaptive user surfacing at coverage milestones
Data Flow Pipeline
User Input → Response Pipeline
[User Input]
→ [Consumer] → [Prompt Queue] → [Digest Worker]
├─ Classification (embedding-based, adaptive boundary detection)
├─ Context Assembly (retrieve from all 5 memory layers)
├─ Mode Routing (deterministic ~5ms mathematical router)
├─ Mode-Specific LLM Generation
│ └─ If ACT: action loop → re-route → terminal response
└─ Enqueue Memory Chunking Job
→ [Memory Chunker Queue] → [Memory Chunker Worker]
→ [Conversation JSON] (enriched)
→ [Episodic Memory Queue] → [Episodic Memory Worker]
→ PostgreSQL Episodes Table
→ [Semantic Consolidation Queue] → [Semantic Consolidation Worker]
→ PostgreSQL Concepts Table
Background Processes
[Routing Stability Regulator] ← reads routing_decisions (24h cycle)
→ adjusts configs/generated/mode_router_config.json
[Routing Reflection Service] ← reads reflection-queue (idle-time)
→ writes routing_decisions.reflection → feeds pressure to regulator
[Decay Engine] → runs every 1800s (30min)
├─ Episodic decay (salience-weighted)
├─ Semantic decay (strength-weighted)
└─ User trait decay (category-specific)
[Cognitive Drift Engine] → during worker idle
├─ Seed selection (weighted random)
├─ Spreading activation (depth 2, decay 0.7/level)
└─ LLM synthesis → stores as drift gist
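The drift engine's spreading-activation step (depth 2, decay 0.7 per level) can be sketched over a plain adjacency map. Graph shape and concept names here are hypothetical:

```python
def spread_activation(graph, seed, depth=2, decay=0.7):
    """Breadth-first activation spread: the seed starts at 1.0 and
    each hop multiplies activation by `decay`, keeping the max value
    reached per node."""
    activation = {seed: 1.0}
    frontier = [seed]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                candidate = activation[node] * decay
                if candidate > activation.get(neighbor, 0.0):
                    activation[neighbor] = candidate
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return activation

# Hypothetical concept graph
graph = {"coffee": ["morning", "espresso"], "morning": ["routine"]}
act = spread_activation(graph, "coffee")
# one hop: morning/espresso at 0.7; two hops: routine at 0.49
```

The depth-2 cutoff with multiplicative decay is what keeps drift cheap: activation falls below any useful threshold after a couple of hops, so distant concepts never enter LLM synthesis.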
Key Architectural Decisions
Deterministic Mode Router
- Decoupled: Mode selection (mathematical, ~5ms) separate from response generation (LLM, ~2-15s)
- Signals: ~17 observable signals from context + NLP (context warmth, question marks, greeting patterns, etc.)
- Scores: Each mode gets weighted composite score; highest wins
- Tie-breaker: Small LLM (qwen3:4b) for ambiguous cases
- Self-leveling: Router naturally shifts toward RESPOND as memory accumulates
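The weighted-composite scoring plus tie-breaker margin can be sketched as follows. Weights, signal names, and the margin value are illustrative, not the shipped configuration:

```python
def route_mode(signals, weights, margin=0.1):
    """Score each mode as a weighted sum of observable signals.

    Returns (mode, needs_tiebreak): when the top two scores fall
    within `margin`, the caller would consult the small tie-breaker
    LLM instead of trusting the deterministic winner.
    """
    scores = {
        mode: sum(w * signals.get(sig, 0.0) for sig, w in ws.items())
        for mode, ws in weights.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    needs_tiebreak = (len(ranked) > 1 and
                      scores[ranked[0]] - scores[ranked[1]] < margin)
    return ranked[0], needs_tiebreak

# Illustrative weights and signals
weights = {
    "RESPOND": {"context_warmth": 1.0, "question_mark": 0.5},
    "CLARIFY": {"ambiguity": 1.0},
}
signals = {"context_warmth": 0.8, "question_mark": 1.0, "ambiguity": 0.2}
mode, tiebreak = route_mode(signals, weights)
```

Because scoring is pure arithmetic over pre-collected signals, the ~5ms budget holds regardless of how slow the response-generation LLM is.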
Single Authority for Weight Mutation
- Routing Stability Regulator is the only service that modifies router weights
- Other services log “pressure signals” but don’t mutate state
- Updates bounded: max ±0.02/day, 48h cooldown per parameter
- Closed-loop control: Verifies adjustments work before persisting
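The bounded update rule can be sketched like this. The ±0.02/day clamp and 48h cooldown come from the bullets above; the function shape itself is an assumption:

```python
MAX_DELTA_PER_DAY = 0.02
COOLDOWN_S = 48 * 3600

def apply_pressure(current, proposed, last_update_ts, now):
    """Apply a proposed weight change, clamped to +/-0.02 per cycle
    and gated by a 48h per-parameter cooldown.

    Returns (new_weight, new_update_ts); unchanged values mean the
    pressure signal was logged but not acted on.
    """
    if now - last_update_ts < COOLDOWN_S:
        return current, last_update_ts  # still cooling down: no mutation
    delta = max(-MAX_DELTA_PER_DAY, min(MAX_DELTA_PER_DAY, proposed - current))
    return current + delta, now

w, ts = apply_pressure(0.50, 0.60, last_update_ts=0.0, now=200_000.0)
w2, ts2 = apply_pressure(w, 0.60, last_update_ts=ts, now=210_000.0)  # blocked
```

Clamping the step rather than the target is deliberate: even a badly miscalibrated pressure signal can only nudge a weight, never yank it.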
Mode-Specific Prompts
- Each mode (RESPOND, CLARIFY, ACKNOWLEDGE, ACT) has its own focused prompt template
- Replaces old approach: single combined prompt with mode selection embedded
- Focused scope prevents elaboration and improves consistency
Memory Hierarchy
- Working Memory (Redis, 4 turns, 24h TTL) — Current conversation
- Gists (Redis, 30min TTL) — Compressed exchange summaries
- Facts (Redis, 24h TTL) — Atomic key-value assertions
- Episodes (PostgreSQL + pgvector) — Narrative units with decay
- Concepts (PostgreSQL + pgvector) — Knowledge nodes and relationships
- User Traits (PostgreSQL) — Personal facts with category-specific decay
- Lists (PostgreSQL) — Deterministic ground-truth state (shopping, to-do, chores); perfect recall, no decay, full event history
Each layer is optimized for its timescale; all layers are integrated via context assembly. Lists are injected into every prompt for passive awareness; the ACT loop uses the list skill for mutations.
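The weighted budget allocation that context assembly applies across these layers can be sketched as proportional splitting of a token budget. The layer weights below are hypothetical:

```python
def allocate_budget(total_tokens, layer_weights):
    """Split a context token budget across memory layers in
    proportion to their weights.

    Integer truncation may leave a small remainder unallocated;
    a real assembler would redistribute it to the top layer.
    """
    total_weight = sum(layer_weights.values())
    return {
        layer: int(total_tokens * w / total_weight)
        for layer, w in layer_weights.items()
    }

# Hypothetical weights: recency-heavy, with room for long-term layers
budget = allocate_budget(2000, {
    "working": 0.35, "gists": 0.15, "episodes": 0.25,
    "concepts": 0.15, "traits": 0.10,
})
```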
Configuration Precedence
Environment variables > .env file > JSON config files > hardcoded defaults
See docs/02-PROVIDERS-SETUP.md for provider configuration.
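A minimal lookup honoring that precedence might look like this. The `.env` parsing is heavily simplified (no quoting or export handling), and the key names are illustrative:

```python
import os

def load_dotenv_file(path):
    """Very simplified .env parser: KEY=VALUE lines only."""
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, val = line.partition("=")
                    values[key.strip()] = val.strip()
    except FileNotFoundError:
        pass
    return values

def get_config(key, dotenv, json_cfg, default=None):
    """Resolve a key with precedence: env > .env > JSON > default."""
    if key in os.environ:
        return os.environ[key]
    if key in dotenv:
        return dotenv[key]
    if key in json_cfg:
        return json_cfg[key]
    return default

# Env var wins even when every lower layer also defines the key
os.environ["CHALIE_DEMO_PORT"] = "9090"
value = get_config("CHALIE_DEMO_PORT",
                   dotenv={"CHALIE_DEMO_PORT": "8081"},
                   json_cfg={"CHALIE_DEMO_PORT": "8080"},
                   default="8000")
fallback = get_config("CHALIE_DEMO_MISSING", dotenv={}, json_cfg={},
                      default="n/a")
```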
Thread-Safe Worker State
- `WorkerManager` maintains shared dictionary via `multiprocessing.Manager()`
- Workers use `WorkerBase._update_shared_state` to merge per-worker metrics
- Avoids global locks, keeps worker pool lightweight
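A minimal sketch of the merge-by-key pattern, assuming each worker writes only its own slot. A plain dict stands in for the Manager proxy so the sketch runs anywhere; the merge logic is identical against the real proxy:

```python
def update_shared_state(shared, worker_name, metrics):
    """Merge one worker's metrics under its own key.

    With a multiprocessing.Manager().dict() proxy, mutating a nested
    dict in place is NOT propagated back to the manager process; the
    slot must be reassigned, which is why the merged copy is written
    back instead of updated in place.
    """
    current = dict(shared.get(worker_name, {}))
    current.update(metrics)
    shared[worker_name] = current

shared = {}  # stand-in for manager.dict()
update_shared_state(shared, "digest", {"processed": 5})
update_shared_state(shared, "digest", {"errors": 0})
update_shared_state(shared, "memory_chunker", {"processed": 2})
```

Because each worker owns exactly one key, writers never contend on the same slot, which is what makes the no-global-lock claim hold.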
Adaptive Topic Boundary Detection
- Replaces static 0.65 cosine similarity threshold with a 3-layer self-calibrating detector
- NEWMA (fast/slow EWMA divergence) detects gradual semantic drift
- Transient Surprise (z-score of similarity drop) catches sharp topic shifts
- Leaky Accumulator provides hysteresis — single-message outliers don’t create false topics
- All thresholds derived from running conversation statistics; no manual tuning
- State persisted in Redis (`adaptive_boundary:{thread_id}`, 24h TTL); cold-start fallback (0.55 threshold) when Redis unavailable or < 5 messages
- Base parameters (`accumulator_boundary_base`, `accumulator_leak_rate`, NEWMA windows) are the slow outer loop controlled by Topic Stability Regulator
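The NEWMA layer can be sketched as a fast/slow EWMA pair over per-message topic similarity. The smoothing constants and threshold below are illustrative, not the regulator-tuned values:

```python
class NewmaDrift:
    """Fast/slow EWMA pair over message-to-topic similarity.

    When the fast average pulls away from the slow one by more than
    `threshold`, the conversation is likely drifting to a new topic.
    """

    def __init__(self, fast_alpha=0.3, slow_alpha=0.05, threshold=0.15):
        self.fast = self.slow = None
        self.fast_alpha, self.slow_alpha = fast_alpha, slow_alpha
        self.threshold = threshold

    def update(self, similarity):
        if self.fast is None:
            self.fast = self.slow = similarity  # seed both on first sample
            return False
        self.fast += self.fast_alpha * (similarity - self.fast)
        self.slow += self.slow_alpha * (similarity - self.slow)
        return abs(self.fast - self.slow) > self.threshold

detector = NewmaDrift()
stable = [detector.update(s) for s in [0.9, 0.88, 0.91, 0.89, 0.9]]
shifted = [detector.update(s) for s in [0.3, 0.25, 0.2]]
```

On stable similarities both averages track together, so their gap stays near zero; a sustained drop opens the gap because the fast average reacts first, which is exactly the gradual-drift signal the static 0.65 threshold missed.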
Topic Confidence Reinforcement
- Topic confidence updated via bounded reinforcement formula: `new = current + (new_confidence - current) * 0.5`
- Ensures gradual adaptation without oscillation
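Iterating the formula shows the bounded, geometric convergence: each update halves the remaining gap to the new evidence, so confidence can never overshoot the target.

```python
def reinforce(current, new_confidence, rate=0.5):
    """Bounded reinforcement: move halfway (by default) toward the
    new confidence. Repeated conflicting signals converge instead of
    oscillating past the target."""
    return current + (new_confidence - current) * rate

confidence = 0.2
history = []
for _ in range(4):
    confidence = reinforce(confidence, 1.0)
    history.append(round(confidence, 3))
# gap to 1.0 halves each step: 0.6, 0.8, 0.9, 0.95
```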
Error Resilience
- All workers catch JSON decode errors from LLM responses
- Log meaningful messages instead of crashing
- Return status strings for graceful degradation
Safety & Constraints
Hard Boundaries
- Prompt hierarchy immutable (marked as “authoritative and final”)
- Skill registry fixed at startup (no runtime skill registration)
- Data scope parameterized by topic (no cross-topic leakage)
- Speaker confidence gates trait storage (unknown speakers = 0.3 penalty)
Operational Limits
- ACT loop: 60s cumulative timeout, ~7 max iterations; post-action critic verification (0.3 fatigue cost per evaluation)
- Persistent tasks: 5 active max, 3 cycles/hr rate limit, 14-day auto-expiry; plan-decomposed tasks: 3–8 steps per plan, up to 3 steps executed per cycle, 3 ACT iterations per step
- Fatigue budget: 2.5 activation units per 30min
- Per-concept cooldown: 60min (prevents circular rumination)
- Delegation rate: 1 per topic per 30min
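The fatigue budget (2.5 activation units per 30 min) behaves like a windowed allowance. A sketch, assuming the budget simply resets when a new window starts (the actual rollover policy is not documented here):

```python
WINDOW_S = 30 * 60
BUDGET_UNITS = 2.5

class FatigueBudget:
    """Windowed activation allowance: spending is permitted until the
    budget for the current 30-minute window is exhausted."""

    def __init__(self):
        self.window_start = None
        self.spent = 0.0

    def try_spend(self, cost, now):
        if self.window_start is None or now - self.window_start >= WINDOW_S:
            self.window_start, self.spent = now, 0.0  # fresh window
        if self.spent + cost > BUDGET_UNITS:
            return False  # over budget: activity deferred
        self.spent += cost
        return True

budget = FatigueBudget()
first = [budget.try_spend(1.0, now=t) for t in (0.0, 10.0, 20.0)]
later = budget.try_spend(1.0, now=WINDOW_S + 1.0)  # next window
```

The third spend in the first window fails (3.0 > 2.5), and the same cost succeeds again once a new window opens; this is how per-evaluation critic costs (0.3 each) stay bounded per half hour.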
Anti-Manipulation
- Identity isolation: 6 vectors with coherence constraints
- No vulnerability simulation: Explicitly forbidden
- Exponential backoff: System retreats on silence (opposite of dependency)
- No flattery optimization: Soul axiom: “Never optimize by misleading”
Configuration Files
Primary Configuration
- `configs/connections.json` — Redis & PostgreSQL endpoints
- `configs/agents/*.json` — LLM settings (model, temperature, timeout)
- `configs/generated/mode_router_config.json` — Learned router weights (generated)
Provider Configuration
- Stored in PostgreSQL `providers` table (not JSON files)
- Runtime configurable via REST API (`/api/providers`)
- Supports: Ollama, Anthropic, OpenAI, Google Gemini
See docs/02-PROVIDERS-SETUP.md for detailed setup instructions.
REST API
Available Blueprints
- `user_auth` — Account creation, login, API key management
- `conversation` — Chat endpoint (SSE streaming), conversation list/retrieval
- `memory` — Memory search, fact management
- `proactive` — Outreach/notifications, upcoming tasks
- `privacy` — Data deletion, export
- `system` — Health, version, settings, observability (routing, memory, tools, identity, tasks, autobiography, traits)
- `tools` — Tool execution, configuration
- `providers` — LLM provider configuration
- `push` — Push notification subscription
- `scheduler` — Reminders and scheduled tasks
- `lists` — List management
- `stubs` — Placeholder endpoints (calendar, notifications, integrations, voice, permissions) returning 501
Observability Endpoints (/system/observability/*)
- `routing` — Mode router decision distribution and recent activity
- `memory` — Memory layer counts and health indicators
- `tools` — Tool performance stats
- `identity` — Identity vector state
- `tasks` — Active persistent tasks, curiosity threads, triage calibration
- `autobiography` — Current autobiography narrative with delta (changed/unchanged sections)
- `traits` (GET) — User traits grouped by category with confidence scores
- `traits/<key>` (DELETE) — Remove a specific learned trait (user correction)
See API blueprints in backend/api/ for full reference.
Testing Strategy
Test Markers
- `@pytest.mark.unit` — No external dependencies (fast)
- `@pytest.mark.integration` — Requires PostgreSQL/Redis (slower)
Test Organization
backend/tests/
├── test_services/ # Service unit tests
├── test_workers/ # Worker integration tests
└── fixtures/ # Shared test fixtures
Run all tests: `pytest`
Run only unit tests: `pytest -m unit`
Run with verbose output: `pytest -v`
Development Workflow
Setup
cd backend
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
Local Development (without Docker)
# Terminal 1: PostgreSQL + Redis
# (ensure postgres + redis running locally)
# Terminal 2: Consumer (all workers)
python consumer.py
# Terminal 3: Test/debug
python -c "from api import create_app; app = create_app(); app.run()"
Docker Development
docker-compose build
docker-compose up -d
docker-compose logs -f backend
Deployment Notes
- No Telemetry: Zero external calls except to configured LLM/voice providers
- Local First: All data stored locally unless external providers configured
- Encryption: API keys and provider credentials encrypted in PostgreSQL
- CORS: Defaults to localhost, restrict before production
- Default Password: PostgreSQL password is `chalie` — change in production
Future Roadmap
Completed
- User memory transparency API: Observability endpoints for autobiography, traits, memory, routing, identity, tools, and tasks — with user trait deletion for correction
Planned (Priority 1)
Planned (Priority 2)
- Cross-topic pattern mining: Behavioral prediction, sequence rules
- Active error detection: Pre-delivery validation against known facts
- Negative memory mechanism: Store “X is FALSE” assertions
Planned (Priority 3)
- Formal hypothesis testing: A/B evaluation of alternatives
- Sandboxed computation: Math evaluation and code execution skill
- Memory versioning: Track how beliefs change over time
Glossary
- Mode Router: Deterministic mathematical function selecting engagement mode from observable signals
- Tie-Breaker: Small LLM consulted when top 2 modes are within effective margin
- Routing Signals: Observable features collected from Redis and NLP analysis (~5ms)
- Router Confidence: Normalized gap between top 2 scores — measures routing certainty
- Pressure Signal: Metric logged by monitors, consumed by the single regulator
- Context Warmth: Signal (0.0-1.0) measuring how much context is available for current topic
- Drift Gist: Spontaneous thought stored during idle periods (DMN)
- Episode: Narrative memory unit with intent, context, action, emotion, outcome, salience
- Concept: Knowledge node with strength decay and spreading activation
- Salience: Computed importance metric (0.1-1.0) based on novelty, emotion, commitment