February 28, 2026
Opening the Black Box
A major push on transparency and user control, giving users visibility into Chalie's reasoning, memory, and autonomous actions, alongside new cognitive layers for behavioral patterns and action reliability.
Transparency, Control, and Trust
A huge thematic push today focused on making Chalie less of a black box. The goal is to give users total visibility and control over what Chalie knows and does. We shipped a whole suite of features to support this:
- Decision Explanations: The `introspect` skill can now answer “why did you do that?” by surfacing the routing scores and signals that led to an action. The prompt frames the output in plain language (Trigger, Reasoning, Confidence) to avoid exposing raw numerics unless requested.
- Conversational Inspection & Correction: Users can now ask “what do you know about me?” and get a nuanced summary from the new `user_traits` layer in the `recall` skill. Crucially, they can also correct Chalie’s beliefs in-conversation (“I don’t actually like sushi”), and their word always wins, overwriting inferred traits with high confidence.
- Autonomous Activity Feed: To provide insight into background work, a new unified activity feed shows what Chalie did while the user was away—proactive messages, scheduled tasks, etc.—in a simple chronological list.
- Privacy & Data Portability: We completed the data privacy feature set. The `delete-all` function is now exhaustive, wiping data across all ~22 PostgreSQL tables and ~55 Redis key patterns. We also added a full data export endpoint, giving users a complete JSON backup of their data.
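The “user’s word always wins” rule for belief correction can be sketched as a simple overwrite policy. This is an illustrative model only; the `Trait` class, field names, and confidence value are assumptions, not the actual schema:

```python
from dataclasses import dataclass

@dataclass
class Trait:
    key: str
    value: str
    source: str        # "inferred" or "user_stated" (hypothetical labels)
    confidence: float

def correct_trait(store: dict[str, Trait], key: str, value: str) -> Trait:
    """Apply an in-conversation correction: an explicit user statement
    unconditionally replaces any inferred belief, at full confidence."""
    trait = Trait(key=key, value=value, source="user_stated", confidence=1.0)
    store[key] = trait
    return trait

# An inferred belief gets overwritten by "I don't actually like sushi":
traits = {"food_pref": Trait("food_pref", "likes sushi", "inferred", 0.7)}
correct_trait(traits, "food_pref", "dislikes sushi")
```

The key design point is that user-stated traits never lose a confidence comparison against inferred ones, so corrections stick immediately.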
Deeper Cognition and Memory
Beyond just surfacing existing state, we also deepened Chalie’s cognitive architecture.
First, we introduced a Temporal Pattern Mining worker. It analyzes interaction history to find statistically significant behavioral patterns, like when a user is most active or discusses certain topics. These are stored as `behavioral_pattern` traits with generalized labels like “evenings” (instead of “10-11pm”) to be insightful without feeling creepy.
Second, we wired Procedural Memory into the context assembly pipeline. When Chalie is in RESPOND mode, it now gets hints about the reliability of its own skills and tools. This helps it understand which actions have historically yielded good results, using softened labels like “less consistent results so far” for skills with low success rates to avoid discouraging exploration too early.
Finally, the semantic concept graph is now actually wired into the prompt context, which was a long-overdue plumbing fix. Accumulated concepts are now visible to the response generation model for the first time.
System & Interaction Model Refinements
We shipped a significant change to how tools present rich information with Deferred Card Rendering. Previously, a tool had to render its output immediately. Now, a tool can flag its output for deferred rendering, which caches the data and surfaces an `emit_card` offer to the LLM in the ACT loop. The model can then decide whether to show the rich card or just respond with text. This finally lets us get rid of the awkward “I can’t display images” style of response.
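A minimal sketch of the deferred-rendering flow, assuming an in-memory cache keyed by card id (the real system presumably uses Redis, and the function names here are hypothetical):

```python
import uuid

CARD_CACHE: dict[str, dict] = {}

def tool_output(data: dict, defer_card: bool = False) -> dict:
    """A tool returns its result. With defer_card set, the rich payload is
    cached and only an emit_card offer is surfaced to the ACT loop."""
    if not defer_card:
        return {"type": "rendered", "data": data}
    card_id = str(uuid.uuid4())
    CARD_CACHE[card_id] = data
    return {
        "type": "card_offer",
        "card_id": card_id,
        "summary": data.get("title", "rich result available"),
    }

def emit_card(card_id: str) -> dict:
    """Invoked only if the model decides a rich card beats a text reply."""
    return CARD_CACHE.pop(card_id)
```

The cache-and-offer split is what moves the render/don’t-render decision from the tool to the model: the tool commits nothing to the UI, it just makes the option available.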
I also reframed the autobiography feature into a first-person self-narrative. Instead of a clinical third-person profile of the user, Chalie now synthesizes its memories into a story about its own growth and its relationship with the user.
Hardening and Housekeeping
With so many new features in flight, we spent time on stability. A major commit fixed a race condition in persistent task execution and hardened the API endpoints for the activity feed and belief correction systems. We also significantly improved test coverage for the core `ModeRouterService`, adding 22 new tests for edge cases in ACT scoring, hysteresis, and IGNORE/ACKNOWLEDGE modes. To wrap it all up, the flurry of new systems work was reflected in updated architecture and worker documentation.