March 5, 2026
Hardening the Foundations
Shipped a series of fixes across the stack to improve system reliability, from backend timeouts and WebSocket handling to database schema updates and frontend state management.
Backend Reliability and Correctness
Today’s work involved tracking down and fixing several unrelated but important bugs. First, we addressed a critical flaw in our timeout logic for background document processing. We were using signal.alarm(), a common approach with a significant limitation: Python only allows signal handlers to be installed from the main thread, so alarm-based timeouts are unusable in worker threads. As a result, our long-running LLM synthesis jobs, and the workers themselves, were not actually timing out as intended. The fix was to run the work in its own thread and wait on it with threading.Thread.join(timeout=...), which can be called from any thread. One caveat: join() only bounds how long the caller waits; it does not stop work that is still running.
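A minimal sketch of the join-with-timeout pattern described above. The helper name run_with_timeout and its return shape are assumptions for illustration, not the project's actual code.

```python
import threading


def run_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Run func in a helper thread and wait at most timeout_seconds for it.

    Returns (finished, result). Hypothetical helper for illustration only.
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = func(*args, **kwargs)
        except Exception as exc:
            # Capture the error so it can be re-raised in the calling thread.
            outcome["error"] = exc

    # daemon=True so an abandoned worker cannot keep the process alive.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    # Unlike a signal handler, join() can be called from any thread.
    worker.join(timeout=timeout_seconds)

    if worker.is_alive():
        # join() only stopped waiting; the worker itself is still running.
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome["result"]
```

Because the timed-out thread is not killed, a caller that needs the work to actually stop would still have to add some cooperative cancellation, for example a flag the job checks between steps.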
We also squashed a tricky WebSocket bug. On authentication failure, our code would simply return, but the underlying flask-sock library would still send an HTTP 200 success code over the upgraded TCP connection, causing a confusing “Invalid frame header” error on the client. We now explicitly call ws.close() on auth failure to ensure the connection is terminated cleanly before any erroneous success messages can be sent.
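The shape of the auth-failure fix can be sketched as follows. The function handle_connection, the is_valid_token callback, and the RecordingSocket stand-in are all hypothetical; the only assumption about the real connection object is that it exposes close() and send(), as the object flask-sock passes to a route handler does.

```python
class RecordingSocket:
    """Stand-in for a WebSocket connection, used only to demonstrate the flow."""

    def __init__(self):
        self.closed = False
        self.sent = []

    def close(self):
        self.closed = True

    def send(self, data):
        self.sent.append(data)


def handle_connection(ws, token, is_valid_token):
    """Close the socket explicitly on auth failure instead of just returning."""
    if not is_valid_token(token):
        # Terminate the connection ourselves so nothing else is written
        # to the already-upgraded connection.
        ws.close()
        return False
    ws.send("ready")
    return True
```

In a real route, the body of handle_connection would sit inside the handler registered with the WebSocket library, with the token taken from wherever the application carries it.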
Finally, we made a small data integrity fix in routing_decision_service, which was returning several JSON-based database fields as raw strings. A new helper now deserializes these fields, so the rest of the application receives structured data as expected.
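A helper of this kind might look like the sketch below. The field names in JSON_FIELDS are placeholders, since the actual columns are not listed here, and leaving malformed values untouched is an assumed design choice.

```python
import json

# Hypothetical column names; substitute the real JSON-backed fields.
JSON_FIELDS = ("candidates", "metadata")


def deserialize_json_fields(row, json_fields=JSON_FIELDS):
    """Return a copy of row with JSON-encoded string fields parsed."""
    parsed = dict(row)
    for field in json_fields:
        value = parsed.get(field)
        if isinstance(value, str):
            try:
                parsed[field] = json.loads(value)
            except json.JSONDecodeError:
                # Leave malformed values as raw strings rather than crash.
                pass
    return parsed
```

Returning a copy keeps the original row intact, which makes the helper safe to apply at the service boundary without surprising other callers.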
Simpler Schema Management
We’ve simplified our database management strategy. Instead of checking schema versions and running conditional initializations, we now run the full initialize_schema() function on every application startup. This is safe and idempotent because every CREATE TABLE and CREATE INDEX statement in our schema uses IF NOT EXISTS. This change means we no longer need a formal migration system for adding new tables; they will be created automatically in existing databases when the app is deployed, which is a nice developer experience win.
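To illustrate why running the schema on every startup is safe, here is a small sketch that assumes SQLite and an invented documents table; the real schema and database engine may differ.

```python
import sqlite3

# Hypothetical schema; every statement is guarded by IF NOT EXISTS.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_documents_title ON documents (title);
"""


def initialize_schema(conn):
    """Create any missing tables and indexes; existing ones are left alone."""
    conn.executescript(SCHEMA)
    conn.commit()
```

Calling initialize_schema(conn) twice in a row leaves existing rows untouched. The guard only covers objects that are missing entirely: IF NOT EXISTS will not add a column to a table that already exists, so changes to existing tables still need separate handling.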
Frontend Polish
On the client side, we made two changes to improve the user experience. The post-login flow is now simpler and more reliable: instead of attempting to partially re-initialize the application’s state, we now perform a full page reload. This guarantees a clean slate and avoids a whole class of potential state management bugs.
We also added a UX guardrail for document processing. The interface will now wait a maximum of 30 seconds for a document synthesis to complete before displaying its card anyway. This prevents the user from being stuck waiting indefinitely if the backend is slow or unresponsive.