Unified Memory Store Testing | Chalie Build Log

Moving to Real Semantics

This cycle was dedicated to a sweeping refactor of how we handle state in unit tests. For a long time, we relied on MagicMock or hand-rolled FakeStore classes to simulate our MemoryStore. These served their purpose early on, but they did not reproduce the semantics of our production store, specifically TTLs, list operations (such as negative indexing in lrange), and thread safety.
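To make that gap concrete, here is a minimal sketch. TinyStore is a hypothetical stand-in written for this post, not Chalie's actual MemoryStore; it assumes Redis-style semantics (negative indexes and an inclusive stop in lrange, lazy TTL expiry, a lock around every operation). A bare MagicMock accepts the same calls but hands back MagicMock objects, so a test asserting on list contents or expiry has nothing real to check.

```python
import threading
import time
from unittest.mock import MagicMock


class TinyStore:
    """Hypothetical stand-in with Redis-like semantics; not Chalie's MemoryStore."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._expiry = {}  # key -> absolute deadline on self._clock
        self._lock = threading.Lock()  # serializes access across threads
        self._clock = clock  # injectable so tests can control time

    def _purge_if_expired(self, key):
        # Lazily drop a key once its TTL has elapsed (caller holds the lock).
        deadline = self._expiry.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._expiry.pop(key, None)

    def set(self, key, value, ex=None):
        with self._lock:
            self._data[key] = value
            if ex is None:
                self._expiry.pop(key, None)
            else:
                self._expiry[key] = self._clock() + ex

    def get(self, key):
        with self._lock:
            self._purge_if_expired(key)
            return self._data.get(key)

    def lpush(self, key, *values):
        with self._lock:
            self._purge_if_expired(key)
            items = self._data.setdefault(key, [])
            for value in values:
                items.insert(0, value)  # each value goes to the head, as with Redis LPUSH
            return len(items)

    def lrange(self, key, start, stop):
        with self._lock:
            self._purge_if_expired(key)
            items = self._data.get(key, [])
            n = len(items)
            # Negative indexes count from the end; stop is inclusive.
            if start < 0:
                start = max(n + start, 0)
            if stop < 0:
                stop = n + stop
                if stop < 0:
                    return []
            return items[start:stop + 1]


now = [0.0]
store = TinyStore(clock=lambda: now[0])
store.lpush("recent", "a", "b", "c")
print(store.lrange("recent", 0, -1))  # ['c', 'b', 'a']

store.set("session", "token", ex=10)
now[0] = 11.0
print(store.get("session"))  # None, because the TTL has elapsed

# A bare MagicMock accepts the same call but returns another MagicMock, not a list.
print(isinstance(MagicMock().lrange("recent", 0, -1), list))  # False
```

Injecting the clock is a design choice for the sketch: it lets a test step past a TTL without sleeping.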

We have now replaced these mocks with real MemoryStore instances across dozens of test suites, including the situation model, identity state, and proactive goal services. This ensures that tests catch interface drift immediately and exercise the same code paths that run in production.

State Over Interaction

A key part of this migration was moving from interaction-based assertions to state-based assertions. Instead of asserting that the mock was called with particular arguments (mock_store.set.assert_called_with(...)), we now call the service and verify the outcome directly with assert store.get(key) == expected_value. This makes the tests more resilient to internal implementation changes and keeps them focused on behavioral correctness. For services where the order of operations matters (such as the websocket image analysis), we introduced lightweight spy wrappers that record operations while still delegating to the real store logic.
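A minimal sketch of both patterns follows. DictStore, SpyStore, and the record_goal service are hypothetical names invented for illustration, not Chalie's real code. The first test checks the resulting state rather than the calls made; SpyStore logs each method name and its positional arguments in order, then forwards the call to the wrapped store, so ordering can be asserted without giving up real behavior.

```python
class DictStore:
    """Hypothetical stand-in for a real store: plain dict semantics."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


class SpyStore:
    """Logs (method name, positional args) for each call, then delegates to the real store."""

    def __init__(self, inner):
        self._inner = inner
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes not defined on SpyStore itself.
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def recorded(*args, **kwargs):
            self.calls.append((name, args))
            return attr(*args, **kwargs)

        return recorded


def record_goal(store, goal_id, text):
    """Hypothetical service: clears any stale draft, then stores the goal."""
    store.delete(f"draft:{goal_id}")
    store.set(f"goal:{goal_id}", text)


# State-based: assert on what the store holds afterwards, not on which calls were made.
store = DictStore()
record_goal(store, 7, "ship the refactor")
assert store.get("goal:7") == "ship the refactor"

# Spy: same real behavior, plus an ordered log for tests that care about sequencing.
spy = SpyStore(DictStore())
record_goal(spy, 7, "ship the refactor")
assert [name for name, _ in spy.calls] == ["delete", "set"]
assert spy.get("goal:7") == "ship the refactor"
```

Because the spy delegates instead of stubbing, a reordering bug shows up in spy.calls while the state assertions still exercise the real store.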

Cleaning Up Test Debt

In the process of this cleanup, we deleted hundreds of lines of redundant FakeStore implementations. These were scattered throughout the codebase, each slightly different and often missing features like proper lpush or incr behavior. Removing these fakes reduces the cognitive load of maintaining the test suite.

We also took the opportunity to fix several pre-existing bugs discovered during the refactor. These included cascading import errors in the sandbox environment, particularly around numpy and other C extensions, which we resolved by patching sys.modules.
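The sys.modules technique can be sketched with the standard library's unittest.mock.patch.dict. This is an illustration rather than the exact patch used in the sandbox, and load_numeric_backend is a hypothetical function. Placing a stub under the "numpy" key means any import of numpy inside the block resolves to the stub instead of loading the real package and its compiled extensions; patch.dict restores the original mapping on exit.

```python
import sys
from unittest import mock


def load_numeric_backend():
    """Hypothetical code under test that imports numpy lazily."""
    import numpy as np
    return np.zeros(3)


# Stand-in for numpy; it only needs to answer the attributes the code touches.
fake_numpy = mock.MagicMock(name="numpy")
fake_numpy.zeros.return_value = [0.0, 0.0, 0.0]

with mock.patch.dict(sys.modules, {"numpy": fake_numpy}):
    # The import finds the entry in sys.modules and never touches the real package.
    result = load_numeric_backend()

print(result)  # [0.0, 0.0, 0.0]
fake_numpy.zeros.assert_called_once_with(3)
```

The same approach covers transitive imports: as long as the patch is active when the module under test is first imported, its own `import numpy` picks up the stub.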

Documented Exceptions

While the goal was a total replacement, we kept MagicMock in two specific scenarios. The first is performance benchmarks, where the I/O overhead of a real store would pollute timing results and introduce non-determinism; these mocks have been renamed to benchmark_store and documented accordingly. The second is Category C error-path tests, where we intentionally inject a broken_store to simulate ConnectionError or other failures that a healthy in-memory store cannot produce naturally.
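A sketch of both exceptions using only unittest.mock. The names benchmark_store and broken_store come from this post; load_profile and its fallback behavior are hypothetical. Setting side_effect to an exception makes every call to broken_store.get raise ConnectionError, which a healthy in-memory store would never do on its own.

```python
import time
from unittest.mock import MagicMock


def load_profile(store, user_id):
    """Hypothetical service: falls back to a default profile when the store is unreachable."""
    try:
        return store.get(f"profile:{user_id}") or {"name": "guest"}
    except ConnectionError:
        return {"name": "guest", "degraded": True}


# Category C error path: inject a failure that a healthy store cannot produce.
broken_store = MagicMock()
broken_store.get.side_effect = ConnectionError("store unavailable")
assert load_profile(broken_store, 42) == {"name": "guest", "degraded": True}

# Benchmark: the mock does no I/O, so store latency stays out of the measurement.
benchmark_store = MagicMock()
benchmark_store.get.return_value = {"name": "ada"}
start = time.perf_counter()
for _ in range(10_000):
    load_profile(benchmark_store, 42)
elapsed = time.perf_counter() - start
print(f"{elapsed:.4f}s for 10,000 calls")
```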