Cognitive Reflexes and Structured Planning

Learned Cognitive Reflexes

Today's biggest shift was the introduction of the CognitiveReflexService. Inspired by human automaticity, it builds semantic clusters of simple, repetitive queries and stores each cluster as a rolling-average centroid in pgvector. When a query matches a cluster with enough evidence (at least 5 observations and a high success rate), the system bypasses the full cognitive pipeline in favor of a single lightweight LLM call. This cuts response times for routine interactions from ~7500ms to under 1500ms, while a shadow validation loop keeps checking that the reflex hasn’t “atrophied” or become incorrect.
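A minimal sketch of the rolling-average centroid and the evidence gate, kept in plain Python so it runs without a database. The class name ReflexCluster, its fields, and the 0.9 success-rate threshold are assumptions for illustration; only the 5-observation minimum comes from the description above, and the real service keeps its centroids in pgvector.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReflexCluster:
    """Hypothetical in-memory stand-in for one semantic cluster."""
    centroid: List[float]
    observations: int = 0
    successes: int = 0

    def observe(self, embedding: List[float], success: bool) -> None:
        """Fold one query embedding into the running mean."""
        self.observations += 1
        n = self.observations
        # Incremental mean: new = old + (x - old) / n
        self.centroid = [c + (x - c) / n for c, x in zip(self.centroid, embedding)]
        if success:
            self.successes += 1

    def is_reflex_ready(self, min_observations: int = 5,
                        min_success_rate: float = 0.9) -> bool:
        """Gate the fast path on evidence; 0.9 is an assumed threshold."""
        if self.observations < min_observations:
            return False
        return self.successes / self.observations >= min_success_rate
```

The incremental form avoids storing past embeddings: each update needs only the current centroid and the observation count.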

Structured Intent via Plan Decomposition

Persistent tasks have evolved from flat ACT loops into structured execution plans. We added a PlanDecompositionService that uses LLM-powered DAG generation to break complex goals into discrete steps. The service orders steps by dependency using Kahn’s algorithm, removes near-duplicate steps with Jaccard-based deduplication, and applies quality gates to step descriptions. The persistent task worker now executes these steps in dependency order, respecting per-step fatigue budgets. On the frontend, the task strip displays each plan with progress indicators and kind-specific markers.
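The two named techniques can be sketched as follows. This is an illustration under assumptions, not the PlanDecompositionService itself: the function names, the word-level tokenization for Jaccard similarity, and the shape of the dependency map (step id to the set of step ids it depends on, with every dependency also present as a key) are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (assumed tokenization)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def topological_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Kahn's algorithm: repeatedly emit steps with no unmet dependencies."""
    indegree = {step: len(d) for step, d in deps.items()}
    dependents: Dict[str, List[str]] = {step: [] for step in deps}
    for step, d in deps.items():
        for dep in d:
            dependents[dep].append(step)

    # Start with every step that depends on nothing.
    ready = deque(sorted(s for s, n in indegree.items() if n == 0))
    order: List[str] = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for nxt in dependents[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any step left unemitted sits on a cycle, so the plan is not a DAG.
    if len(order) != len(deps):
        raise ValueError("plan contains a dependency cycle")
    return order
```

A deduplication pass would compare step descriptions pairwise with jaccard and drop one of any pair above a chosen threshold; the threshold is not stated in this entry, so none is assumed here.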

Routing Optimizations and SSE Robustness

The message routing pipeline underwent a four-phase cleanup. We hoisted the social filter so that ‘CANCEL’ or ‘IGNORE’ signals are caught before they reach the topic classifier, saving roughly 100ms on trivial inputs. Signal collection is now deduplicated, and we’ve unified the external-tool ACT paths behind a feature flag. For reliability, we implemented SSE job health monitoring: the SSE loop now tracks heartbeats from the tool worker, so the interface no longer hangs indefinitely if a background job stalls or crashes.
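The heartbeat check can be pictured roughly like this. The class name JobHeartbeatMonitor, the 30-second timeout, and the injectable clock are assumptions made for the sketch; the entry does not say how the real loop stores or times heartbeats.

```python
import time
from typing import Callable


class JobHeartbeatMonitor:
    """Hypothetical tracker for the last heartbeat seen from a tool worker."""

    def __init__(self, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s  # assumed value, not from the source
        self._clock = clock
        self._last_beat = clock()

    def beat(self) -> None:
        """Record that the worker reported progress just now."""
        self._last_beat = self._clock()

    def is_stalled(self) -> bool:
        """True once no heartbeat has arrived within the timeout."""
        return self._clock() - self._last_beat > self._timeout_s
```

An SSE loop built on this would call is_stalled() on each iteration and, when it returns True, emit a terminal error event and close the stream instead of waiting forever. Using a monotonic clock keeps the check immune to wall-clock adjustments.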

Failure Classification and UI Polish

We addressed Issue 004 by distinguishing between internal tool failures (crashes) and external ones (rate limits and timeouts). External failures now receive a significantly attenuated reward penalty (-0.05 instead of -0.2), which keeps procedural memory from unfairly devaluing a tool because of upstream instability.
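As a rough sketch of the split, the classifier below maps an exception to a failure kind and then to a penalty. The two penalty values come from the paragraph above; the heuristics (treating TimeoutError, ConnectionError, and a hypothetical status_code attribute equal to 429 as external) and all names are assumptions, since the entry does not describe how failures are actually detected.

```python
from enum import Enum


class FailureKind(Enum):
    INTERNAL = "internal"  # the tool itself crashed
    EXTERNAL = "external"  # upstream rate limit or timeout


# Penalty values as stated in the entry.
PENALTIES = {
    FailureKind.INTERNAL: -0.2,
    FailureKind.EXTERNAL: -0.05,
}


def classify_failure(exc: Exception) -> FailureKind:
    """Assumed heuristic: network-style errors and HTTP 429 count as external."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return FailureKind.EXTERNAL
    # `status_code` is a hypothetical attribute some HTTP client errors carry.
    if getattr(exc, "status_code", None) == 429:
        return FailureKind.EXTERNAL
    return FailureKind.INTERNAL


def reward_penalty(exc: Exception) -> float:
    """Penalty to apply to the tool's procedural-memory score."""
    return PENALTIES[classify_failure(exc)]
```

Defaulting unknown exceptions to INTERNAL is a deliberate choice in this sketch: an unrecognized failure keeps the full penalty rather than being excused by accident.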

On the infrastructure side, we fixed a widespread psycopg2 bug in which queries were being run directly on connection objects instead of through cursors. We also switched the default model to Gemini 2.5 Flash following the retirement of 2.0 Flash for new users. The UI received a safety-net polling interval for task completion and new dismiss actions for both reminders and persistent tasks.
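For the psycopg2 fix, the corrected pattern looks roughly like the helper below. The function name and its signature are hypothetical; the point it illustrates is that psycopg2 connections do not execute SQL themselves, so each query goes through a cursor obtained from the connection.

```python
from typing import Any, Optional, Sequence


def fetch_one_value(conn: Any, sql: str, params: Sequence[Any] = ()) -> Optional[Any]:
    """Run a query through a cursor and return the first column of the first row.

    `conn` is expected to be a psycopg2 connection (or anything with a
    compatible cursor() method); this helper is illustrative only.
    """
    # The cursor context manager closes the cursor on exit.
    with conn.cursor() as cur:
        cur.execute(sql, params)
        row = cur.fetchone()
    return row[0] if row is not None else None
```

With a real connection created by psycopg2.connect(...), a call such as fetch_one_value(conn, "SELECT 1") goes through the cursor, whereas calling execute on the connection itself raises AttributeError.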