March 7, 2026
An Economy of Effort
Built an internal effort economy to self-regulate cognitive subsystems and pre-filter tools; shipped the EpisodicMemoryObserver, adaptive working memory, a deep user_id removal refactor, action buttons in chat, and an expanded embodiment library.
An Economy of Effort & Self-Awareness
The biggest theme today was building an internal “effort economy” to make Chalie more efficient and self-aware. Previously, subsystems would run on their schedules regardless of the context available. Now, many background workers — from the decay engine to curiosity pursuit — first check the richness of Chalie’s memory. If the system is too new or memory is too thin to produce useful results, they skip their cycle, saving energy and preventing vacuous output.
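The gating idea can be sketched in a few lines. This is a minimal illustration, not Chalie’s actual code — the names (MemoryStats, MIN_EPISODES, run_decay_cycle) and thresholds are assumptions:

```python
# Illustrative sketch of memory-richness gating for a background worker.
# All names and thresholds here are hypothetical.
from dataclasses import dataclass

@dataclass
class MemoryStats:
    episode_count: int
    fact_count: int

MIN_EPISODES = 25   # below this, a cycle is unlikely to produce useful output
MIN_FACTS = 100

def should_run_cycle(stats: MemoryStats) -> bool:
    """Skip the cycle when memory is too thin to yield meaningful results."""
    return stats.episode_count >= MIN_EPISODES and stats.fact_count >= MIN_FACTS

def run_decay_cycle(stats: MemoryStats) -> str:
    if not should_run_cycle(stats):
        return "skipped"   # save the work entirely; no vacuous output
    return "ran"           # ...real decay logic would go here

print(run_decay_cycle(MemoryStats(episode_count=3, fact_count=10)))    # skipped
print(run_decay_cycle(MemoryStats(episode_count=200, fact_count=500))) # ran
```

The key point is that the check is cheap (a couple of counters) relative to the work it avoids.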
This principle extends to tool selection. We introduced effort-aware pre-filtering, where the system estimates the “effort” of a user’s request and filters the available tools accordingly. Trivial requests won’t see deep, complex tools, and deep requests that are mistakenly routed to simple tools will get a persistent_task injected to ensure they’re handled properly. Effort levels for tools are inferred from their source code and then self-corrected over time using a procedural memory feedback loop.
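A toy version of the pre-filter, assuming a simple three-level effort scale — the tool names and levels are made up for illustration:

```python
# Hypothetical effort ratings; the real values are inferred from tool source
# code and corrected over time, per the procedural memory feedback loop.
TOOL_EFFORT = {
    "clock": 1,          # trivial
    "web_search": 2,     # moderate
    "deep_research": 3,  # deep/complex
}

def filter_tools(request_effort: int) -> list:
    """Hide tools whose effort exceeds the request's estimated effort,
    so trivial requests never see deep, complex tools."""
    return [name for name, effort in TOOL_EFFORT.items()
            if effort <= request_effort]

print(filter_tools(1))  # ['clock']
print(filter_tools(3))  # ['clock', 'web_search', 'deep_research']
```

The inverse case (a deep request landing on simple tools) is handled separately by injecting a persistent_task, as described above.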
Underpinning this is the new SelfModelService, a foundational interoception layer. It runs continuously, providing an always-fresh, zero-LLM snapshot of Chalie’s epistemic, operational, and capability state. This self-awareness is injected into the frontal cortex prompts only when there’s a noteworthy issue, giving the core intelligence the context it needs to adapt its behavior without token overhead during normal operation.
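The conditional-injection pattern looks roughly like this — a hedged sketch where the snapshot fields and threshold are assumptions, not the SelfModelService’s real schema:

```python
# Sketch of injecting self-state into a prompt only when noteworthy.
# Field names and the noteworthiness criteria are illustrative.
from dataclasses import dataclass, field

@dataclass
class SelfSnapshot:
    memory_thin: bool = False
    backlog_depth: int = 0
    failing_tools: list = field(default_factory=list)

    def noteworthy(self) -> bool:
        return self.memory_thin or self.backlog_depth > 10 or bool(self.failing_tools)

def build_prompt(base: str, snap: SelfSnapshot) -> str:
    """Append self-state only when there is an issue, keeping normal prompts lean."""
    if not snap.noteworthy():
        return base  # zero token overhead in the common case
    return base + f"\n[self-state] backlog={snap.backlog_depth} failing={snap.failing_tools}"

print(build_prompt("...", SelfSnapshot()))                       # unchanged prompt
print(build_prompt("...", SelfSnapshot(failing_tools=["web"])))  # annotated prompt
```

Because the snapshot is computed without an LLM call, it can run continuously at negligible cost.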
We also overhauled the episodic memory consolidation process, replacing a brittle queue-and-poll system with a periodic EpisodicMemoryObserver. This new service scans active conversations and triggers consolidation based on “signal density” — a measure of how enriched a conversation is with facts, gists, and emotional variance. This is far more efficient, eliminating redundant jobs and race conditions, and it enables working memory to adaptively grow and shrink based on conversation intensity rather than sitting at a fixed cap.
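A minimal sketch of what a signal-density trigger could look like — the metric’s exact weights and inputs are assumptions for illustration:

```python
# Illustrative "signal density" metric: enrichment per conversation turn.
# The real observer's weighting of facts, gists, and emotional variance
# is not specified here; these are placeholder terms.
CONSOLIDATE_THRESHOLD = 0.5  # hypothetical cutoff

def signal_density(facts: int, gists: int, emotional_variance: float, turns: int) -> float:
    if turns == 0:
        return 0.0
    return (facts + gists + emotional_variance) / turns

def should_consolidate(facts: int, gists: int, variance: float, turns: int) -> bool:
    """A periodic observer can scan conversations and fire only past the threshold,
    instead of queuing a consolidation job per event."""
    return signal_density(facts, gists, variance, turns) >= CONSOLIDATE_THRESHOLD
```

A single periodic scan over this metric replaces the old queue-and-poll jobs, which is where the elimination of redundant work and race conditions comes from.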
Watched Folder Follow-Up
Yesterday’s watched folder feature shipped with a processing bug: the ingestion worker was re-queuing failed files on every cycle, causing an infinite retry loop that processed only 2 of 249 test documents before stalling. Today we replaced the serial, thread-per-job approach with a proper concurrent queue.Queue backed by three worker threads, made embedding generation thread-safe behind a lock, and stopped the system from re-queuing files unless they’ve actually changed on disk.
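The new ingestion shape can be sketched as follows — a simplified illustration of the pattern (bounded workers on a shared queue.Queue, a lock serializing embedding calls, and an mtime check so unchanged files are never re-queued); helper names are illustrative:

```python
# Sketch of the concurrent ingestion pattern: three workers drain a shared
# queue, embedding is serialized behind a lock, and a file is only re-queued
# if its mtime changed on disk. Names are illustrative, not the real code.
import os
import queue
import threading

NUM_WORKERS = 3
jobs: "queue.Queue[str]" = queue.Queue()
embed_lock = threading.Lock()
seen_mtimes: dict = {}

def embed(path: str) -> None:
    with embed_lock:  # embedding backend assumed not thread-safe
        pass          # ...generate embeddings here...

def worker() -> None:
    while True:
        path = jobs.get()
        try:
            embed(path)
        finally:
            jobs.task_done()  # done even on failure; no automatic re-queue

def enqueue_if_changed(path: str) -> bool:
    """Queue a file only when it has actually changed on disk."""
    mtime = os.path.getmtime(path)
    if seen_mtimes.get(path) == mtime:
        return False  # unchanged: skip, which breaks the infinite retry loop
    seen_mtimes[path] = mtime
    jobs.put(path)
    return True

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()
```

The crucial behavioral change is in enqueue_if_changed: a failed file no longer re-enters the queue on every cycle, so one bad document can’t starve the other 248.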
System-Wide Simplification
We made the architectural decision to be a single-user system and followed through with a deep refactoring to remove user_id from the entire codebase. This was a huge cleanup, touching over 50 files, seven database tables, and dozens of service-layer methods. The result is a much simpler, cleaner data model and API surface, eliminating a whole class of potential bugs and logical complexity.
As part of this, we cleaned up the test suite, removing user_id artifacts and fixing several instances of cross-test state pollution by adding fixtures to isolate database writes. We also hardened the database startup process to always apply the core schema.sql file, ensuring that new tables are created idempotently on existing databases without requiring explicit migration files for every change.
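The idempotent-schema approach can be shown in miniature, assuming SQLite and a schema.sql written with CREATE TABLE IF NOT EXISTS statements (the table here is hypothetical):

```python
# Sketch of idempotent schema application at startup: safe to run on every
# boot, existing tables are left untouched, no per-change migration files.
import sqlite3

SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS episodes (
    id INTEGER PRIMARY KEY,
    summary TEXT NOT NULL
);
"""

def apply_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA_SQL)
    conn.commit()

conn = sqlite3.connect(":memory:")
apply_schema(conn)
apply_schema(conn)  # second run is a no-op; new tables appear, old ones survive
```

The trade-off versus explicit migrations is that schema.sql can only add; altering or dropping columns would still need dedicated migration logic.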
Tooling and Interaction Reliability
We expanded the library of installable tools, adding YouTube, the privacy-focused SearXNG, and a Google Account integration for Gmail and Calendar. The YouTube tool was also promoted to a “trusted” status.
On the user-facing side, we introduced deterministic action buttons in the chat UI. When a skill needs a specific response (like confirming a persistent task), it can now return structured actions that render as buttons. Clicking a button sends a payload that routes directly to the skill handler, completely bypassing the main LLM-based mode router. This solves the classic problem where a user’s response of “yes” could be misinterpreted, making multi-step interactions far more reliable.
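The routing shape is simple to illustrate — a hedged sketch where the Action fields, handler registry, and payload keys are all invented for the example:

```python
# Sketch of deterministic action buttons: a skill returns structured actions,
# and a clicked button's payload dispatches straight to the skill handler,
# never through the LLM-based mode router. Names are illustrative.
from dataclasses import dataclass

@dataclass
class Action:
    label: str    # button text shown in the chat UI
    skill: str    # which handler the click routes to
    payload: dict # structured arguments for that handler

HANDLERS = {
    "persistent_task": lambda p: f"task {p['task_id']} {p['decision']}",
}

def propose_task(task_id: int) -> list:
    """A skill needing confirmation returns actions instead of free text."""
    return [
        Action("Confirm", "persistent_task", {"task_id": task_id, "decision": "accepted"}),
        Action("Dismiss", "persistent_task", {"task_id": task_id, "decision": "dismissed"}),
    ]

def on_button_click(action: Action) -> str:
    # Deterministic dispatch: a typed "yes" can be misread; a payload cannot.
    return HANDLERS[action.skill](action.payload)

buttons = propose_task(42)
print(on_button_click(buttons[0]))  # task 42 accepted
```

Free-text replies still flow through the mode router as before; only button clicks take the direct path.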
Bug Fixes
A batch of five smaller fixes landed today: a JSON leak in the frontal cortex response, incorrect task routing that was sending ACT-loop tasks to the wrong handler, a task-strip bug that was dropping the first character of task output, card suppression that was hiding valid tool results, and a document-card rendering issue. We also guarded against a crash caused by the LLM occasionally returning a non-string where a JSON object was expected — now the system logs and skips gracefully rather than throwing.
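The non-string guard is worth showing concretely — a minimal sketch of the log-and-skip behavior, with an invented function name:

```python
# Sketch of the defensive guard: if the LLM returns something other than a
# string where JSON text is expected, log and skip instead of crashing.
import json
import logging

def parse_llm_json(raw):
    if not isinstance(raw, str):
        logging.warning("expected JSON string, got %s; skipping", type(raw).__name__)
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        logging.warning("malformed JSON from LLM; skipping")
        return None

print(parse_llm_json('{"mode": "act"}'))  # {'mode': 'act'}
print(parse_llm_json({"mode": "act"}))    # None (non-string, logged and skipped)
```

Returning None lets callers fall back gracefully rather than propagating an exception up the response path.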
Persistent task handling got tighter too: drift-generated tasks are now auto-accepted without prompting the user, duplicate detection is now topic-based rather than text-similarity-based (far fewer false positives), and proposed tasks now have a proper expiry so they don’t accumulate indefinitely.