May 4, 2026

Voice Stability and Rich Media Overhaul

The TTS pipeline was refactored to replace fragile chunked streaming with a single Kokoro call that returns one WAV blob. This gives gap-free playback and eliminates the mid-stream restart bug.

The TTS playback loop was hardened against AudioContext auto-suspension between chunks: the context is now resumed before each source starts.

Keyboard shortcut handling in the voice overlay was fixed so that shortcuts no longer conflict with typing in the chat input.

The rich media pipeline received a major overhaul that removes all arbitrary truncation caps, which previously dropped user-visible data from the search and news pipelines without any warning.

Image candidate captions were upgraded from unreliable OCR output to deterministic captions derived from URL filenames and og:description.
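
As a rough illustration of what a deterministic caption could look like, the sketch below builds one from og:description when it is present and otherwise from the URL filename. The function name `caption_from_url` and the priority order between the two sources are assumptions made for this example, not details taken from the actual implementation.

```python
import re
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlparse


def caption_from_url(url: str, og_description: Optional[str] = None) -> str:
    """Hypothetical helper: derive a caption without any OCR step."""
    # Assumed priority: page-supplied og:description wins when non-empty.
    if og_description and og_description.strip():
        # Collapse runs of whitespace so the caption is a single clean line.
        return " ".join(og_description.split())

    # Fall back to the filename: decode percent-escapes, drop the extension.
    stem = PurePosixPath(unquote(urlparse(url).path)).stem
    # Treat common filename separators as word boundaries.
    words = re.split(r"[-_+.\s]+", stem)
    return " ".join(word for word in words if word)
```

The same input always yields the same caption, which is the property OCR output lacked.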

The image service now handles missing native thumbnails by falling back to og:image metadata, so rich media can still resolve.

Search routing was enhanced with a DDG supplement: when the primary routing score is weak (in [0.50, 0.60)), DDG results are appended to broaden coverage.
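
The half-open interval matters here: a score of exactly 0.50 triggers the supplement, while 0.60 does not. A minimal sketch of that rule follows; the names `needs_ddg_supplement`, `route_results`, and `fetch_ddg` are hypothetical, and results are modeled as plain strings for brevity.

```python
from typing import Callable, List

WEAK_LOW = 0.50   # inclusive lower bound of the weak-score band
WEAK_HIGH = 0.60  # exclusive upper bound of the weak-score band


def needs_ddg_supplement(score: float) -> bool:
    """True when the primary routing score falls in [0.50, 0.60)."""
    return WEAK_LOW <= score < WEAK_HIGH


def route_results(
    score: float,
    primary: List[str],
    fetch_ddg: Callable[[], List[str]],
) -> List[str]:
    """Return primary results, with DDG results appended for weak scores."""
    results = list(primary)
    if needs_ddg_supplement(score):
        # DDG is only queried when needed, and its results go after primary.
        results.extend(fetch_ddg())
    return results
```

Passing the DDG fetch as a callable keeps the extra request off the path for scores outside the band.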

Subagent result delivery was fixed: async results are now published to the user via OutputService rather than being sent fire-and-forget.
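
The general pattern can be sketched with asyncio: await the subagent and hand its result to the output channel, instead of scheduling the work and never collecting it. The `OutputService` class below is a hypothetical stand-in with an assumed `publish` method; the real interface is not described in these notes, and `run_subagent` is a placeholder.

```python
import asyncio
from typing import List


class OutputService:
    """Hypothetical stand-in that records what would reach the user."""

    def __init__(self) -> None:
        self.published: List[str] = []

    async def publish(self, message: str) -> None:
        self.published.append(message)


async def run_subagent(task: str) -> str:
    """Placeholder for real asynchronous subagent work."""
    await asyncio.sleep(0)
    return f"result for {task}"


async def run_and_publish(task: str, output: OutputService) -> None:
    """Await the subagent, then publish its result or its failure."""
    try:
        message = await run_subagent(task)
    except Exception as exc:
        # Assumption: failures are surfaced to the user rather than dropped.
        message = f"subagent failed: {exc}"
    await output.publish(message)


def demo() -> List[str]:
    output = OutputService()
    asyncio.run(run_and_publish("summarize", output))
    return output.published
```

Calling `demo()` runs one subagent to completion and returns the list of published messages.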

  • Dropped chunked TTS streaming for single WAV blob synthesis.

  • Hardened TTS playback to prevent stalling due to AudioContext suspension.

  • Removed arbitrary silent truncation caps in rich media pipelines.

  • Replaced OCR captions with deterministic captions derived from URL filenames and og:description.

  • Implemented DDG supplement for weak routing scores in search.

  • Ensured reliable delivery of async subagent results.