March 9, 2026

De-Regexpressing the Front Door

Ripped out the brittle regex-based social filter in favor of more reliable LLM triage, and introduced an embedding cache to boost performance.

Getting Rid of Brittle Regex Filters

A big chunk of today was spent chasing down bugs in our input classification pipeline, and the theme was clear: our fast, regex-based “social filter” was doing more harm than good. It was designed to quickly handle simple things like greetings and thank-yous, saving a roughly 200ms LLM call. The problem: sometimes it was catastrophically wrong.

We found two separate cases where it was misclassifying legitimate questions as simple feedback. A query like, “What makes a great sandwich?” was being flagged by a feedback pattern matching the word “great” and short-circuited into a simple acknowledgement, completely swallowing the user’s intent before the LLM triage could even see it. There was no fallback.
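To see how easily this goes wrong, here is a minimal reconstruction of the failure mode. The pattern and function names are hypothetical stand-ins, not the actual filter code, but the bug is the same: a bare word-boundary match on feedback vocabulary fires on any sentence that happens to contain one of those words.

```python
import re

# Hypothetical reconstruction of the old social filter's feedback check:
# any message containing a "feedback word" was short-circuited into an
# acknowledgement before LLM triage ever saw it.
FEEDBACK_PATTERN = re.compile(r"\b(great|awesome|thanks|perfect)\b", re.IGNORECASE)

def is_simple_feedback(text: str) -> bool:
    return bool(FEEDBACK_PATTERN.search(text))

# A genuine question trips the filter because it contains "great":
assert is_simple_feedback("What makes a great sandwich?")  # misclassified
assert is_simple_feedback("Thanks, that worked!")          # intended match
```

No amount of pattern tuning really fixes this class of error, which is why deleting the pre-filter beat patching it.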

The fix was simple: just delete the whole pre-filter. The LLM-based cognitive triage is more than capable of handling social niceties, and the small latency cost is a tiny price to pay for not silently ignoring user commands. This is a good reminder of the project’s core principle: intelligence emerges. A dumb, static regex filter is the opposite of that.

Performance & Responsiveness Boosts

On the performance front, we added a significant optimization: an in-memory embedding cache. Many internal systems (reflex lookups, topic classification, context assembly) repeatedly request embeddings for the same text. Now, we cache these embeddings in MemoryStore for an hour, keyed by a hash of the text. This avoids redundant calls to the embedding model, which is especially noticeable on CPU.
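The caching scheme is simple enough to sketch in a few lines. This is an illustrative standalone version, not the actual MemoryStore integration; the class and method names are assumptions, but the key idea (hash of the text as the key, one-hour TTL) follows the description above.

```python
import hashlib
import time

class EmbeddingCache:
    """In-memory embedding cache keyed by a hash of the input text,
    with entries expiring after a TTL (one hour by default)."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, embedding)

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str):
        key = self._key(text)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, embedding = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired; force a fresh embedding call
            return None
        return embedding

    def put(self, text: str, embedding) -> None:
        self._store[self._key(text)] = (time.monotonic() + self.ttl, embedding)
```

Callers wrap the embedding model with a get-or-compute pattern: check the cache first, and only hit the model (and pay the CPU cost) on a miss.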

We also added an embedding model “warmup” on application boot. The first inference call on a sentence-transformers model always carries a few seconds of one-time overhead from lazy PyTorch and model initialization. By running a throwaway encoding at startup, we ensure the first real user request is fast. The /system/readiness endpoint now correctly reports when the embedding model is loaded and warmed up.
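The warmup-plus-readiness pattern looks roughly like this. A minimal sketch, assuming a model object with an `encode` method (as sentence-transformers provides); the service and flag names are illustrative, not the real ones.

```python
import threading

class EmbeddingService:
    """Run a throwaway encode at startup so the first real request
    doesn't pay the one-time initialization cost, and expose a flag
    the readiness endpoint can report."""

    def __init__(self, model):
        self.model = model
        self._ready = threading.Event()

    def warm_up(self) -> None:
        self.model.encode("warmup")  # throwaway inference; result discarded
        self._ready.set()            # readiness endpoint can now report "warm"

    def is_ready(self) -> bool:
        return self._ready.is_set()
```

On boot, `warm_up()` runs once (possibly in a background thread so the server can start accepting other traffic), and the readiness endpoint simply returns `is_ready()`.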

To improve the user-perceived responsiveness, I also tightened up the folder watcher’s scan interval from 5 minutes to 60 seconds and added a simple deduplication mechanism to the WebSocket client to prevent duplicate messages from showing up after a reconnect.
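The deduplication itself is a bounded seen-set over message IDs. This is a sketch of the idea rather than the actual client code, and it assumes each message carries a stable `id` field; the class name and window size are made up for illustration.

```python
from collections import deque

class MessageDeduper:
    """Drop messages whose IDs were already seen recently, so a
    reconnect that replays the tail of the stream doesn't produce
    duplicates in the UI."""

    def __init__(self, window: int = 256):
        self._order = deque(maxlen=window)  # insertion order, bounded
        self._seen = set()                  # fast membership checks

    def accept(self, message_id: str) -> bool:
        if message_id in self._seen:
            return False  # duplicate from a replayed stream; drop it
        if len(self._order) == self._order.maxlen:
            # Evict the oldest ID so the set stays in sync with the deque.
            self._seen.discard(self._order[0])
        self._order.append(message_id)
        self._seen.add(message_id)
        return True
```

The bounded window matters: an unbounded seen-set would grow forever in a long-lived client, while a window of a few hundred IDs comfortably covers anything a reconnect could replay.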

Hardening the Document Pipeline

The document ingestion and retrieval system got a round of robustness fixes. We were seeing crashes because metadata could be either a raw JSON string from SQLite or an already-parsed Python dict, depending on the code path. A simple helper now handles both cases gracefully.
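The helper amounts to a normalize-on-read function. A minimal sketch under the assumptions above (the real helper's name and exact fallback behavior may differ): accept either a dict or a raw JSON string and always hand back a dict.

```python
import json

def ensure_metadata_dict(metadata):
    """Normalize metadata that may arrive as an already-parsed dict or
    as a raw JSON string straight from SQLite; return a dict either way,
    falling back to an empty dict on unparseable input."""
    if metadata is None:
        return {}
    if isinstance(metadata, dict):
        return metadata
    try:
        parsed = json.loads(metadata)
    except (TypeError, json.JSONDecodeError):
        return {}  # malformed string: degrade gracefully instead of crashing
    return parsed if isinstance(parsed, dict) else {}
```

Centralizing this in one place means every code path (ingestion, retrieval, export) gets the same tolerant behavior instead of each call site guessing the type.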

I also added timing instrumentation to the document processing service. It now logs the duration for each stage (text extraction, chunking, embedding, LLM summary), which will be invaluable for identifying bottlenecks as we process larger files. Finally, I updated the privacy service to correctly include documents, document_chunks, and watched_folders in its data summary, export, and delete-all operations, which we’d missed when adding the feature.
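The per-stage timing can be expressed as a small context manager. This is a sketch of the instrumentation pattern, not the service's actual code; the logger name and stage names are illustrative.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("doc_pipeline")

@contextmanager
def timed_stage(name: str):
    """Log the wall-clock duration of one pipeline stage, even if the
    stage raises (the finally block always fires)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("stage %s took %.1f ms", name, elapsed_ms)

# Usage in the processing service (extract_text etc. are placeholders):
# with timed_stage("text_extraction"):
#     text = extract_text(path)
# with timed_stage("chunking"):
#     chunks = chunk(text)
```

Because the log line fires in a `finally`, a stage that crashes still reports how long it ran before failing, which helps attribute timeouts on large files.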

Minor Polish

  • Contradiction Prep: Added some signals (is_established, created_at) to the contradiction detection pipeline. This is prep work for a future ONNX-based classifier; for now, the LLM path is still active.
  • Quieter Tools: When a tool renders a card and is configured with synthesize=false, we now suppress Chalie’s default text commentary (e.g., [TOOL:list_documents]...). The UI gets the card, and the conversation doesn’t get cluttered with unnecessary confirmation text.