Signal Contract — Continuous Reasoning Spine

This document defines the contract that governs Chalie’s transition from independent timer-based services to a unified signal-driven architecture. It is the governing spec for all migration work.

Status: Active — governs all service migration decisions.


1. Governing Principles

1.1 Simplicity Over Cleverness

Every service must be as minimal as possible. Complexity compounds multiplicatively across 40+ services: a 10% complexity increase per service compounds to roughly a 45x increase in system debugging difficulty (1.1^40 ≈ 45). When in doubt, do less.

1.2 Graceful Isolation

“Forgetting my name for a split-second doesn’t put me in a vegetative state.”

No service failure may cascade into other services. Every signal consumer must operate under the assumption that any signal source may be dead, delayed, or producing garbage. The system degrades gracefully — individual capabilities may temporarily weaken, but the core reasoning loop never stops.

Concrete rules:

  • Every signal consumption is wrapped in try/except at the boundary
  • Every service has a fail-open default: if it can’t do its job, it returns a neutral result (empty string, no-op, skip), never raises into the caller
  • No service holds locks that other services need
  • No service writes state that another service must read to function (MemoryStore state is advisory, never mandatory)
  • A service being dead means its signals stop arriving — consumers treat “no signal” as “nothing interesting happened”, not as an error
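The boundary rules above can be sketched as a minimal fail-open consumer. This is an illustrative sketch, not the real service interface: the payload shape and the `{"action": ...}` return values are assumptions.

```python
from __future__ import annotations

import json
import logging

logger = logging.getLogger("consumer")

def consume_signal(raw: str | None) -> dict:
    """Fail-open signal boundary: bad or missing input becomes a neutral result."""
    if raw is None:
        # No signal means "nothing interesting happened", not an error.
        return {"action": "idle"}
    try:
        signal = json.loads(raw)
        return {"action": "process", "signal_type": signal.get("signal_type")}
    except Exception:
        # Garbage from a dead or buggy source is logged, never raised upward.
        logger.debug("dropped malformed signal", exc_info=True)
        return {"action": "skip"}
```

A dead producer therefore looks identical to a quiet one; the consumer's idle path handles both.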

1.3 Independent Testability

Every service must be testable in complete isolation:

  • Unit tests use in-memory MemoryStore and :memory: SQLite — no shared state
  • No test may depend on another service being initialized
  • Every service is covered by at least one chalie-nightly-test blackbox scenario
  • Integration between services is tested by the nightly suite, not by unit tests

1.4 Service Layers (Fault Domains)

Every service belongs to exactly one of three layers. Failures are contained within a layer — they never cascade across layer boundaries.

  • Cognitive (Brain): reasoning, memory formation, consolidation, decay, planning, reflection. If it fails, you stop reasoning well, but you still perceive and can still use tools.
  • Embodiment (Body/Senses): perception, ambient awareness, place learning, context tracking, voice I/O. If it fails, you lose awareness of your surroundings, but you can still think and act on what you know.
  • Capability (Tools/Hands): external tools, document processing, scheduling, list management. If it fails, you lose specific abilities, but you find alternatives or report inability.

Cognitive services: DecayEngine, SemanticConsolidation, EpisodicMemoryWorker, MemoryChunker, ReasoningLoopService, ContextAssembly, ModeRouter, PlanDecomposition, CriticService, UncertaintyService, ContradictionClassifier, IdleConsolidation, GrowthPattern, AutobiographySynthesis, GoalInference, SelfModel

Embodiment services: AmbientInference, PlaceLearning, ClientContext, EventBridge, VoiceService, FolderWatcher, TemporalPattern, EpisodicMemoryObserver, ThreadExpiry

Capability services: ToolRegistry, ToolWorker, ToolSubprocess, ToolConfig, ToolProfile, ToolPerformance, ACTLoop, ACTDispatcher, DocumentService, DocumentProcessing, DocumentPurge, SchedulerService, ListService, PersistentTaskWorker, MomentEnrichment, ProfileEnrichment

Cross-layer rules:

  • Cognitive services never import embodiment or capability services at module level (lazy imports only)
  • Embodiment services write to MemoryStore; cognitive services read from MemoryStore. Never direct calls.
  • Capability failures surface as “tool unavailable” — the cognitive layer plans around them, never crashes
  • A full embodiment outage means ambient signals stop arriving. The cognitive layer treats this as “nothing interesting is happening” (idle), not as an error

1.5 Minimal Surface Area

Each service exposes the minimum interface needed:

  • One public method for its primary job (e.g., process(), consolidate(), decay())
  • Signal emission is a side-effect, not the primary interface
  • No service exposes internal state to other services except through MemoryStore advisory keys

2. Signal Envelope

All signals flowing through the spine use this format:

@dataclasses.dataclass
class ReasoningSignal:
    signal_type: str          # What happened (see §3)
    source: str               # Who emitted it (service name)
    concept_id: int | None    # Direct concept reference (fast path)
    concept_name: str | None  # Human-readable label
    topic: str | None         # Domain/topic context
    content: str | None       # Freeform payload (< 200 chars)
    activation_energy: float  # 0.0–1.0, how important/urgent
    timestamp: float          # When emitted (epoch)
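The envelope round-trips through JSON via dataclasses.asdict() (§2.2). A minimal sketch, restating the dataclass above together with the to_json() helper that §2.2's push step assumes; the field values here are illustrative:

```python
from __future__ import annotations

import dataclasses
import json
import time

@dataclasses.dataclass
class ReasoningSignal:
    signal_type: str
    source: str
    concept_id: int | None
    concept_name: str | None
    topic: str | None
    content: str | None
    activation_energy: float
    timestamp: float

    def to_json(self) -> str:
        # Serialization rule from §2.2: JSON via dataclasses.asdict()
        return json.dumps(dataclasses.asdict(self))

sig = ReasoningSignal(
    signal_type="new_knowledge",
    source="semantic_consolidation",
    concept_id=42,
    concept_name="signal spine",
    topic="architecture",
    content="new concept formed from related episodes",
    activation_energy=0.6,
    timestamp=time.time(),
)
# The consumer rebuilds the envelope from the queued JSON string.
round_tripped = ReasoningSignal(**json.loads(sig.to_json()))
```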

2.1 Signal Types (Registered)

  • memory_pressure: knowledge is fading or contradicted. Emitters: decay_engine, semantic_consolidation. Energy: 0.5–0.7
  • new_knowledge: new concept formed from experience. Emitter: semantic_consolidation. Energy: 0.6
  • novel_observation: surprising tool output stored as episode. Emitter: experience_assimilation. Energy: 0.6
  • ambient_context: environment changed (place, attention, energy). Emitter: event_bridge. Energy: from confidence
  • idle_discovery: nothing happened, engine self-seeds. Emitter: reasoning_loop (internal). Energy: 0.4–0.5
  • episode_created: new narrative episode consolidated. Emitter: episodic_memory_worker. Energy: 0.5
  • trait_changed: user trait created, updated, or corrected. Emitter: knowledge_service. Energy: 0.3–0.7
  • task_state_changed: persistent task state transition. Emitter: persistent_task_service. Energy: 0.5–0.6
  • schedule_fired: scheduled reminder/task fired. Emitter: scheduler_service. Energy: 0.5
  • thread_expired: conversation thread expired. Emitter: thread_expiry_service. Energy: 0.3
  • user_message: user sent a chat message. Emitter: websocket. Energy: 1.0
  • goal_inferred: recurring topic pattern detected as potential goal. Emitter: goal_inference_service. Energy: 0.6

Note: Signal handlers also update the world model cache in MemoryStore (world_model:items). task_state_changed and schedule_fired trigger incremental cache updates via WorldStateService.notify_task_changed() / notify_schedule_changed(). The cache is fully refreshed from DB during idle periods.

New signal types require:

  1. Addition to this table
  2. A nightly test scenario
  3. Documentation of what the consumer should do with it

2.2 Signal Transport

  • Priority queue: reasoning:priority (user messages — processed first)
  • Background queue: reasoning:signals (all other signal types)
  • Pop: blpop([priority, signals], timeout=idle_timeout) — tries priority first
  • Push: rpush(key, signal.to_json())
  • Max depth: 50 signals (oldest dropped on overflow, background queue only)
  • Debounce: 30s minimum between processed background signals (user messages bypass)
  • Serialization: JSON via dataclasses.asdict()
  • Yield points: Background signal processing checks priority queue before expensive operations (LLM calls); if a user message is waiting, background reasoning aborts and the loop picks up the priority signal
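The transport rules above can be sketched with an in-memory stand-in. The real store is Redis-style; here deque(maxlen=...) mimics the drop-oldest overflow on the background queue, and debounce and yield points are omitted for brevity:

```python
from __future__ import annotations

from collections import deque

MAX_DEPTH = 50  # §2.2: oldest signal dropped on overflow (background queue only)

priority = deque()                    # reasoning:priority (user messages)
background = deque(maxlen=MAX_DEPTH)  # reasoning:signals (drops oldest when full)

def push_background(signal_json: str) -> None:
    # rpush equivalent; maxlen silently evicts the oldest entry at the cap
    background.append(signal_json)

def pop_next() -> str | None:
    # blpop([priority, signals]): the priority queue is always tried first
    if priority:
        return priority.popleft()
    if background:
        return background.popleft()
    return None  # timeout in the real loop, which takes the idle/maintenance path
```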

2.3 Emission Rules

  • Emission is always fire-and-forget — the emitter never waits for a response
  • Emission is always wrapped in try/except — a failed emit is logged at DEBUG, never raised
  • Emission uses lazy imports (from services.reasoning_loop_service import emit_reasoning_signal, ReasoningSignal) to avoid import cycles
  • Emitters never instantiate the consumer — they push to the queue and forget

3. Service Lifecycle Contract

3.1 Registration

Every spine-connected service declares in its module docstring:

Emits: signal_type_1, signal_type_2
Consumes: signal_type_3 (via reasoning:signals queue)
Trigger: <timer Ns | signal-driven | request-driven | one-shot>
Fail mode: <fail-open description>

3.2 Health

Every long-running service writes a heartbeat:

store.set(f"health:{service_name}", str(time.time()), ex=ttl)

Where ttl is 2x the expected cycle time. The SelfModelService (30s cycle) reads these heartbeats and includes dead services in its noteworthy[] list. No automated restart — health is observational, not coercive.
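A sketch of the heartbeat pattern with an in-memory stand-in for the store. Real TTL expiry is left to the Redis-style backend; here a missing key plays that role:

```python
from __future__ import annotations

import time

class InMemoryStore:
    """Stand-in for the Redis-style MemoryStore (TTL expiry omitted)."""
    def __init__(self):
        self._data: dict[str, str] = {}
    def set(self, key: str, value: str, ex: int | None = None) -> None:
        self._data[key] = value
    def get(self, key: str) -> str | None:
        return self._data.get(key)

def write_heartbeat(store, service_name: str, cycle_seconds: float) -> None:
    # §3.2: TTL is 2x the expected cycle time
    store.set(f"health:{service_name}", str(time.time()), ex=int(cycle_seconds * 2))

def dead_services(store, expected: list[str]) -> list[str]:
    # Observational only: a missing key means the heartbeat TTL lapsed.
    return [name for name in expected if store.get(f"health:{name}") is None]
```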

3.3 Startup Order

Services start in dependency order (managed by run.py), but no service assumes another service is running. If a dependency isn’t ready:

  • Queue-based: messages accumulate, processed when consumer starts
  • Signal-based: signals accumulate (up to queue cap), processed when consumer starts
  • Direct call: try/except, return neutral default

4. Migration Pattern

4.1 Converting a Timer Service to Signal-Responsive

For a service that currently runs on time.sleep(N):

Before:

def run(self):
    while True:
        time.sleep(self.interval)
        self._do_work()

After (Phase 1 — emit signals, keep timer):

def run(self):
    while True:
        time.sleep(self.interval)
        result = self._do_work()
        # NEW: emit a signal only if something interesting happened
        if result.is_interesting:
            emit_reasoning_signal(ReasoningSignal(...))

After (Phase 2 — consume signals, remove timer):

def run_signal_loop(self):
    while True:
        signal = self.store.blpop("service:signals", timeout=self.max_idle)
        if signal:
            self._process_signal(signal)
        else:
            self._idle_maintenance()

Phase 1 is always safe to ship independently. Phase 2 requires the spine to route signals to the service.

4.2 Migration Checklist (Per Service)

  • [ ] Service docstring updated with Emits/Consumes/Trigger/Fail-mode
  • [ ] Signal emission added (Phase 1)
  • [ ] Unit tests pass in isolation
  • [ ] Nightly scenario created/updated
  • [ ] Timer removed, signal consumption added (Phase 2)
  • [ ] Fail-open verified (service killed → system continues)
  • [ ] Documented in this file’s migration tracker (§5)

5. Migration Tracker

Phase 1 Complete (Emits Signals, Keeps Timer)

  • DecayEngineService: emits memory_pressure. Trigger: 30min timer. Nightly scenario: 966
  • SemanticConsolidationService: emits new_knowledge, memory_pressure. Trigger: queue-driven. Nightly scenario: 967
  • ExperienceAssimilationService: emits novel_observation. Trigger: 60s poll
  • EventBridgeService: emits ambient_context. Trigger: event-driven. Nightly scenario: 968
  • EpisodicMemoryWorker: emits episode_created. Trigger: queue-driven. Nightly scenario: 971
  • KnowledgeService: emits trait_changed. Trigger: request-driven. Nightly scenario: 972
  • PersistentTaskService: emits task_state_changed. Trigger: request/timer. Nightly scenario: 973
  • SchedulerService: emits schedule_fired. Trigger: 60s timer. Nightly scenario: 974
  • ThreadExpiryService: emits thread_expired. Trigger: 5min timer. Nightly scenario: 975
  • EpisodicMemoryWorker: emits goal_emerged. Trigger: post-episode clustering + LLM

Phase 2 Complete (Signal-Driven, No Timer)

  • ReasoningLoopService: consumes all signal types. Idle fallback: 10min → salient/insight. Nightly scenarios: 965, 968, 969

Not Yet Started

  • EpisodicMemoryObserver: 60s timer. Could react to gist-stored signals
  • IdleConsolidationService: 5min timer. Could react to queue-drain signals
  • GrowthPatternService: 30min timer. Could react to trait-change signals
  • AutobiographySynthesis: 6h timer. Priority: low. Long cycle, timer is fine for now
  • PersistentTaskWorker: 30min timer. Could react to plan-ready signals
  • ProfileEnrichmentService: 6h timer. Priority: low. Long cycle, timer is fine
  • TemporalPatternService: 6h timer. Priority: low. Long cycle, timer is fine
  • SelfModelService: 30s timer. Heartbeat aggregator, timer is natural
  • DocumentPurgeService: 6h timer. Priority: low. Maintenance, timer is fine
  • MomentEnrichmentService: 5min timer. Priority: low. Polling for status, timer is fine
  • FolderWatcherService: 30s timer. Priority: low. OS-level polling, timer is natural

6. Anti-Patterns

6.1 Signal Cascades

Bad: Service A emits signal → Service B processes it and emits signal → Service C processes it and emits signal → Service A processes it. Rule: No circular signal paths. If A emits to B, B must never emit back to A through any chain. Draw the signal graph before adding a new emission point.
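Since the rule is a property of the whole signal graph, it can be checked mechanically before adding an emission point. A sketch, assuming a registry mapping each service to the signal types it emits and consumes (the registry shape is hypothetical; the Emits/Consumes declarations from §3.1 could feed it):

```python
from __future__ import annotations

def find_cycle(emits: dict[str, set[str]],
               consumes: dict[str, set[str]]) -> list[str] | None:
    """Return a cyclic service path if any signal chain loops back (§6.1)."""
    # Edge A -> B exists when B consumes any signal type A emits.
    # Self-consumption (e.g. reasoning_loop's internal idle_discovery) is allowed.
    edges = {
        a: {b for b in consumes if emits[a] & consumes[b] and a != b}
        for a in emits
    }

    def dfs(node: str, path: list[str], seen: set[str]) -> list[str] | None:
        if node in path:
            return path[path.index(node):] + [node]  # cycle found
        if node in seen:
            return None  # already fully explored, no cycle through here
        seen.add(node)
        for nxt in edges.get(node, ()):
            found = dfs(nxt, path + [node], seen)
            if found:
                return found
        return None

    seen: set[str] = set()
    for service in edges:
        found = dfs(service, [], seen)
        if found:
            return found
    return None
```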

6.2 Signal as RPC

Bad: Service A emits a signal and waits for a response. Rule: Signals are fire-and-forget. If you need a response, use a direct function call or a dedicated result queue (like bg_llm:result:{job_id}).

6.3 Mandatory Signals

Bad: Service B crashes if it doesn’t receive a signal from Service A within N seconds. Rule: No signal is mandatory. “No signal” means “nothing interesting happened”, never “something is broken”. Timeouts trigger idle/maintenance behavior, not error states.

6.4 Fat Signals

Bad: Signal payload contains the full episode text, embeddings, or large data structures. Rule: Signals carry references (concept_id, topic) and summaries (content < 200 chars). The consumer looks up full data from SQLite/MemoryStore if needed.

6.5 Signal-Driven Configuration

Bad: Using signals to propagate config changes across services. Rule: Config is read from files/DB at service init or on a slow reload cycle. Signals carry cognitive events, not infrastructure state.


7. The Spine (Future)

The current architecture has a single consumer (ReasoningLoopService) reading from one queue pair (reasoning:priority and reasoning:signals, §2.2). The future spine will:

  1. Route signals to multiple consumers — each service registers interest in specific signal types
  2. Priority scheduling — user-facing signals preempt background maintenance
  3. Backpressure — slow consumers don’t cause queue overflow for fast consumers
  4. Observability — signal flow is logged and queryable for debugging

This is explicitly not built yet. The current single-queue model is sufficient for Phase 1 (emit signals) and the initial Phase 2 conversions. The spine emerges when enough services are signal-driven that routing becomes necessary.

Build the spine when you need it, not before.