March 11, 2026
Hardening the Foundations: Tools, Services, and CI
A day focused on system reliability, from fixing tool installation race conditions and startup blocking to hardening the fail-open guards in background cognitive services and fixing a critical memory search bug.
Hardening the Tool System
A significant portion of today’s work centered on making the tool system more robust, especially during initial setup. On a fresh install, Chalie was starting up and accepting traffic before its default tools were downloaded, leaving it in a degraded state where it couldn’t act on the world. We’ve fixed this by making the default tool installation process synchronous: Chalie now blocks startup until all essential tools are downloaded and registered.
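A minimal sketch of the idea, with hypothetical names (install_default_tools, DEFAULT_TOOLS, start_server are illustrative stand-ins, not Chalie’s actual API):

```python
# Illustrative sketch: block startup until default tools are registered.
DEFAULT_TOOLS = ["web_search", "file_io"]  # hypothetical tool names


def download_and_register(name: str, registry: dict) -> None:
    # Stand-in for the real download + registration step.
    registry[name] = {"status": "registered"}


def install_default_tools(registry: dict) -> None:
    """Block until every essential tool is downloaded and registered."""
    for name in DEFAULT_TOOLS:
        if name not in registry:
            download_and_register(name, registry)


def start_server(registry: dict) -> str:
    # Startup now waits for the tools instead of accepting traffic early.
    install_default_tools(registry)
    return "listening"
```

The key change is simply ordering: the install loop runs to completion before the server reports itself ready.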
We also addressed a subtle race condition where the system might try to use a trusted tool before its Python dependencies were fully installed via pip. A new _deps_ready gate in the ToolRegistryService prevents tools from being exposed to the cognitive loop until they are fully ready for invocation.
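One way such a gate can look, sketched with threading.Event; the _deps_ready name comes from the post, but the surrounding class shape is an assumption:

```python
import threading


class ToolRegistryService:
    """Hypothetical sketch of a registry with a per-tool readiness gate."""

    def __init__(self) -> None:
        self._tools: dict[str, object] = {}
        # Tool name -> Event that is set once its pip dependencies land.
        self._deps_ready: dict[str, threading.Event] = {}

    def register(self, name: str) -> None:
        self._tools[name] = object()
        self._deps_ready[name] = threading.Event()

    def mark_deps_installed(self, name: str) -> None:
        # Called after `pip install` of the tool's requirements completes.
        self._deps_ready[name].set()

    def available_tools(self) -> list[str]:
        # Only expose tools whose dependencies are fully installed,
        # closing the race where the cognitive loop invoked a tool early.
        return [n for n in self._tools if self._deps_ready[n].is_set()]
```

The cognitive loop then only ever sees tools returned by available_tools(), so a half-installed tool is simply invisible rather than invocable-but-broken.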
Observability also got an upgrade. Previously, our tool performance metrics were artificially inflated because failures happening at the registry level (e.g., a tool failing to build or load) weren’t being recorded. Now, _log_outcome() reports to the ToolPerformanceService for all invocation attempts, giving us a much more accurate picture of tool reliability (#150).
Finally, we cleaned up some broken tool repository pointers in embodiment_library.json and created official GitHub releases for several tools that were missing them, streamlining the auto-installation process.
Service Robustness and Core Fixes
We continued the theme of hardening by reviewing several background cognitive services. In the CognitiveDriftEngine, CuriosityPursuitService, and GrowthPatternService, we found several instances of bare except: pass guards. These dangerous “fail-open” patterns would silently swallow database or other service errors, allowing the cognitive processes to run unregulated. We’ve replaced them with proper fail-closed logic: an exception is now logged as a warning, and the cycle is skipped, preventing unpredictable behavior during transient system failures (#135, #133, #127).
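The before/after pattern, as a hedged sketch (service and function names here are illustrative, not the actual engine code):

```python
import logging

logger = logging.getLogger("cognitive")


def run_cycle_fail_closed(step) -> bool:
    """Run one cognitive cycle; on any error, log and skip it (fail closed)."""
    try:
        step()
        return True
    except Exception as exc:
        # Previously a bare `except: pass` here let the loop keep running
        # unregulated through database or service failures.
        logger.warning("cycle skipped after error: %s", exc)
        return False
```

Skipping the cycle is the conservative choice: a transient database error now pauses the process for one tick instead of letting it act on incomplete state.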
A critical bug in memory retrieval was also squashed (#99). The SemanticStorageService was failing to select the embedding vector when retrieving concepts, which caused the hybrid search to skip every single concept during similarity scoring. With this fix, semantic memory search is now functioning correctly.
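A simplified sketch of why the missing column was fatal; the schema and column names are assumptions, not the actual SemanticStorageService schema:

```python
import json
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def search_concepts(conn: sqlite3.Connection, query_vec: list[float], top_k: int = 3):
    # Fixed query: `embedding` is now selected alongside the concept name.
    rows = conn.execute("SELECT name, embedding FROM concepts").fetchall()
    scored = []
    for name, emb_json in rows:
        if emb_json is None:
            # Before the fix, the vector was never selected, so this
            # branch skipped every single concept during scoring.
            continue
        scored.append((cosine(query_vec, json.loads(emb_json)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```

Because the guard silently skipped rows rather than erroring, the search returned empty results that looked like "no matches" instead of a bug, which is why it went unnoticed.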
We also improved API responsiveness by unblocking the scheduler endpoint (#93). The task of embedding a new scheduled item was happening synchronously, which could block the HTTP response for a long time, especially if it triggered a model download. This process now runs in a fire-and-forget background thread.
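A minimal sketch of the fire-and-forget pattern; the handler and embed function are hypothetical stand-ins for the real endpoint:

```python
import threading


def embed_item(item, results: list) -> None:
    # Stand-in for the slow embedding step (which may even trigger
    # a model download on first use).
    results.append(item)


def schedule_endpoint(item, results: list):
    """Return the HTTP response immediately; embed in the background."""
    t = threading.Thread(target=embed_item, args=(item, results), daemon=True)
    t.start()
    return {"status": "accepted"}, t
```

The trade-off is the usual one for fire-and-forget work: the response no longer guarantees the embedding exists yet, so any consumer of the embedding must tolerate a brief window where it is missing.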
Expanding ML & Inference Capabilities
Our ONNXInferenceService has become more flexible. It now supports pruned models with a (batch, num_classes) output shape, in addition to the legacy (batch, seq, vocab) format. This work also introduced support for multi-label classification, using sigmoid activation and per-label thresholds, which is being used to power a new skill-selector model. This allows the system to recognize multiple applicable skills or tools from a single input, a key step toward more complex reasoning.
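The multi-label selection step can be sketched like this; the label names and threshold values are illustrative, not the actual skill-selector configuration:

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def select_skills(logits: list[float], labels: list[str],
                  thresholds: dict[str, float]) -> list[str]:
    """Return every label whose sigmoid probability clears its own threshold.

    Unlike softmax, sigmoid treats each label independently, so several
    skills can fire for a single input (or none at all).
    """
    return [
        label
        for logit, label in zip(logits, labels)
        if sigmoid(logit) >= thresholds[label]
    ]
```

Per-label thresholds matter because classes in a skill selector are rarely balanced: a rare-but-costly skill can demand higher confidence than a cheap, common one.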
Workflow and CI Refinements
We spent some time improving our own development process. The build log generation script was enhanced to handle multi-day gaps in commits; instead of lumping everything into today’s entry, it now correctly generates a separate file for each day. We also documented our branch strategy (feature → rc → main) and refined the associated CI workflows. The build log is now generated on every push to any branch (for a complete internal record) but is only published to the public website when code is merged to main, striking a balance between comprehensive tracking and a clean public narrative.
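The core of the multi-day fix is just grouping commits by calendar date before writing files; a sketch under assumed data shapes (timestamped commit tuples, not the script’s real types):

```python
from collections import defaultdict
from datetime import datetime


def group_commits_by_day(commits: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map ISO date -> commit messages, so each day gets its own log file."""
    by_day: dict[str, list[str]] = defaultdict(list)
    for timestamp, message in commits:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        by_day[day].append(message)
    return dict(by_day)
```

Each key then drives one output file, so a three-day gap in commits produces three dated entries instead of one oversized entry for today.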