April 22, 2026

SES Daemon & Embedding Optimizations

The system introduced a single FIFO daemon to handle search expansion, replacing per-write doc2query threads and absorbing _schedule_embeddings. The daemon works through jobs sequentially: it generates the doc2query variants, embeds them, and then persists the results.
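The single-worker pattern described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name ExpansionDaemon and the expand, embed, and persist callables are hypothetical stand-ins for the doc2query step, the embedding step, and the storage write.

```python
import logging
import queue
import threading


class ExpansionDaemon:
    """One background worker draining a FIFO queue (hypothetical sketch)."""

    def __init__(self, expand, embed, persist):
        # expand/embed/persist are assumed callables supplied by the caller.
        self._queue = queue.Queue()  # queue.Queue is first-in, first-out
        self._expand = expand
        self._embed = embed
        self._persist = persist
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, row_id, text):
        """Enqueue a row for expansion; returns immediately."""
        self._queue.put((row_id, text))

    def join(self):
        """Block until every submitted job has been processed."""
        self._queue.join()

    def _run(self):
        while True:
            row_id, text = self._queue.get()
            try:
                variants = self._expand(text)                # doc2query step
                vectors = [self._embed(v) for v in variants]  # embedding step
                self._persist(row_id, variants, vectors)      # write last
            except Exception:
                # A failed job is logged and skipped so the worker survives.
                logging.exception("expansion failed for row %r", row_id)
            finally:
                self._queue.task_done()
```

Because there is only one worker, jobs complete in submission order and writers never wait on expansion; the trade-off is that a slow job delays everything queued behind it.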

Variant retrieval now uses a KNN signal against the newly expanded semantic vectors; matches are joined back to the source row via relationship tables.
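As a rough illustration of that retrieval step, the sketch below ranks variant vectors by cosine similarity and maps each hit back to its source row. The in-memory dictionaries stand in for the vector store and the relationship table, and the function names are hypothetical; the real query path is not shown in these notes.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_source_rows(query_vec, variant_vectors, variant_to_source, k=3):
    """Return (source_id, best_score) pairs for the k nearest variants.

    variant_vectors:   {variant_id: vector}     (stand-in for the vector table)
    variant_to_source: {variant_id: source_id}  (stand-in for the relationship table)
    """
    scored = [(vid, cosine(query_vec, vec)) for vid, vec in variant_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)

    best = {}
    for vid, score in scored[:k]:
        src = variant_to_source[vid]  # join the variant back to its source row
        if src not in best or score > best[src]:
            best[src] = score         # several variants may share one source
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Keeping only the best score per source row prevents a document with many variants from crowding out the rest of the result list.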

Schema additions include new tables for expanded semantic data and associated virtual tables for vector storage, alongside cascade triggers.
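The shape of such a schema might look like the sketch below. It assumes SQLite, and every table, column, and trigger name is hypothetical. A plain table stands in for the vector virtual table so that the example runs without any extension loaded.

```python
import sqlite3

# All names below are illustrative assumptions, not the project's schema.
SCHEMA = """
CREATE TABLE documents (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);

-- One row per generated variant, linked back to its source document.
CREATE TABLE expanded_semantic (
    id        INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL,
    variant   TEXT NOT NULL
);

-- Stand-in for a vector virtual table: one embedding per variant.
CREATE TABLE expanded_semantic_vec (
    variant_id INTEGER PRIMARY KEY,
    embedding  BLOB NOT NULL
);

-- Cascade trigger: deleting a document removes its variants and vectors.
CREATE TRIGGER documents_cascade_delete
AFTER DELETE ON documents
BEGIN
    DELETE FROM expanded_semantic_vec
     WHERE variant_id IN (
         SELECT id FROM expanded_semantic WHERE source_id = OLD.id
     );
    DELETE FROM expanded_semantic WHERE source_id = OLD.id;
END;
"""


def open_db(path=":memory:"):
    """Open a connection and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

A trigger is used here rather than a foreign key with ON DELETE CASCADE because virtual tables generally cannot take part in foreign-key constraints.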

Embedding performance was refined by making the embedding sequence length configurable with a default of 512 and an opt-in maximum of 8192 for long-document profiling.
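One way to expose that setting is through an environment variable that is clamped to the supported range. The variable name EMBED_MAX_SEQ_LEN and the function below are assumptions made for illustration; only the 512 default and the 8192 ceiling come from the notes above.

```python
import os

DEFAULT_SEQ_LEN = 512   # default stated in the release notes
MAX_SEQ_LEN = 8192      # opt-in ceiling stated in the release notes


def resolve_seq_len(env=None, var="EMBED_MAX_SEQ_LEN"):
    """Read the sequence length from the environment, clamped to [1, 8192].

    The variable name is hypothetical. Missing or non-integer values fall
    back to the default rather than raising.
    """
    env = os.environ if env is None else env
    raw = env.get(var)
    if raw is None:
        return DEFAULT_SEQ_LEN
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_SEQ_LEN
    return max(1, min(value, MAX_SEQ_LEN))
```

Keeping the default low means short inputs are not padded or truncated against an 8192-token window, which is where most of the wasted compute would otherwise go.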

ONNX runtime execution provider selection is now automatic, prioritizing CoreML on Apple Silicon and CUDA on compatible containers, with graceful CPU fallback on inference failure.
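The selection and fallback logic can be sketched without importing ONNX Runtime itself. The provider strings below are the names ONNX Runtime uses; the two functions and the make_session callable are hypothetical. In real use, the available list would come from onnxruntime.get_available_providers(), and make_session would wrap onnxruntime.InferenceSession(model_path, providers=[provider]).

```python
PREFERENCE = (
    "CoreMLExecutionProvider",  # Apple Silicon
    "CUDAExecutionProvider",    # CUDA-capable containers
    "CPUExecutionProvider",     # always the last resort
)


def pick_providers(available, preference=PREFERENCE):
    """Order the available providers by preference, always ending on CPU."""
    chosen = [p for p in preference if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def run_with_fallback(make_session, providers, inputs):
    """Try each provider in order; return (provider, output) for the first success.

    make_session(provider) is an assumed factory returning a callable that
    performs inference. Any exception moves on to the next provider.
    """
    last_error = None
    for provider in providers:
        try:
            session = make_session(provider)
            return provider, session(inputs)
        except Exception as exc:  # session creation or inference failed
            last_error = exc
    raise RuntimeError("all execution providers failed") from last_error
```

Catching failures at inference time, not only at session creation, matters because an accelerator can load a model successfully and still reject a particular operator or input shape.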

The embedding session unload timeout was reduced from 600s to 120s, and an idle-watchdog thread was added so that memory held by idle ONNX sessions is released promptly.
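A watchdog of this kind can be sketched as a small wrapper that loads the session lazily and drops it after a period of inactivity. The class name, the load callable, and the poll interval are assumptions; only the 120-second timeout comes from the notes above.

```python
import threading
import time


class IdleUnloader:
    """Lazily loads a session and unloads it after idle_timeout seconds."""

    def __init__(self, load, idle_timeout=120.0, poll_interval=1.0):
        self._load = load                  # assumed factory for the session
        self._idle_timeout = idle_timeout
        self._poll_interval = poll_interval
        self._lock = threading.Lock()
        self._session = None
        self._last_used = time.monotonic()
        self._stop = threading.Event()
        threading.Thread(target=self._watch, daemon=True).start()

    def get(self):
        """Return the session, loading it on first use or after an unload."""
        with self._lock:
            if self._session is None:
                self._session = self._load()
            self._last_used = time.monotonic()
            return self._session

    @property
    def loaded(self):
        with self._lock:
            return self._session is not None

    def close(self):
        """Stop the watchdog thread."""
        self._stop.set()

    def _watch(self):
        # Event.wait returns False on timeout, so this loops once per poll.
        while not self._stop.wait(self._poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last_used
                if self._session is not None and idle >= self._idle_timeout:
                    # Drop the reference so the memory can be reclaimed.
                    self._session = None
```

Without a dedicated thread, an unload check that only runs on the next request would never fire while the service sits idle, which is exactly when the memory should be returned.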

The README was rewritten with a Hermes-style differentiator table and modernized architecture descriptions.

The voice UX was entirely rebuilt, moving from a top-bar orb mode to an inline microphone button and a modal overlay audio player. The backend now returns a single WAV blob per synthesis request.
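To illustrate what returning a single WAV blob can look like on the backend, the sketch below packs float samples into one in-memory WAV file using the standard library. The function name, the mono 16-bit format, and the 22050 Hz sample rate are assumptions; the notes do not state the actual audio parameters.

```python
import io
import struct
import wave


def samples_to_wav_blob(samples, sample_rate=22050):
    """Encode floats in [-1.0, 1.0] as one mono 16-bit PCM WAV byte string."""
    frames = b"".join(
        # Scale to 16-bit range and clamp so out-of-range input cannot overflow.
        struct.pack("<h", max(-32768, min(32767, int(s * 32767))))
        for s in samples
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)           # mono (assumed)
        writer.setsampwidth(2)           # 16-bit samples (assumed)
        writer.setframerate(sample_rate)
        writer.writeframes(frames)
    return buf.getvalue()
```

A single self-describing blob lets the modal player hand the response straight to an audio element, with no client-side reassembly of chunks.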

Replaced per-write doc2query threads with a single FIFO daemon for search expansion.
Added a KNN signal against expanded semantic vectors to KnowledgeService.recall and DataGraphService.recall.
Enabled configurable embedding sequence length (default 512, max 8192) to reduce compute waste on short inputs.
Implemented automatic selection of the best ONNX execution provider (CoreML, CUDA, CPU).
Reduced ONNX session idle timeout from 600s to 120s with an idle-watchdog thread.
Rebuilt voice UX to use an inline mic and a modal audio player, returning a single WAV blob.