QE Pass: 68 Tests to 99, Five PRs
Before moving on from the blackboard, I wanted to do a proper quality pass rather than declare it done. The previous session shipped 68 tests. That felt thin for production code with concurrent access patterns.
The first thing I did was compare our test suite against treblereel’s original
implementation — honestly, not defensively. The comparison was informative.
We had better unit coverage of the components that didn’t exist in their design:
StageLifecycleEvaluator, PlanItemCompletionHandler, MilestoneAchievementHandler.
They had better integration test scenarios: nested stages activating sequentially,
free-floating and staged workers coexisting, exit conditions triggered by real
worker output. We had none of those end-to-end.
Then a systematic code robustness review surfaced seven issues. Three were thread safety:
PlanItem.status was a plain field, written by one thread and read by another
with no visibility guarantee. A one-word fix: volatile.
Stage.status was the same problem but required a different answer. Two
separate handlers — StageLifecycleEvaluator and PlanItemCompletionHandler
— can race on an ACTIVE stage, one calling terminate() and the other calling
complete(). With volatile alone both can read ACTIVE, both pass the guard,
and the last writer wins arbitrarily. The right answer was AtomicReference<StageStatus>
with CAS lifecycle methods. The user asked directly: volatile or atomic CAS?
Different answer for each. PlanItem has a single writer per lifecycle stage —
volatile is sufficient. Stage has two independent handlers that can race to
write terminal state — CAS is required.
The third thread safety issue was the TOCTOU race in addPlanItem. The
duplicate check and the insert were two separate operations. ConcurrentHashMap.compute()
merges them atomically, eliminating the window.
Four more issues, simpler:
achieveMilestone used computeIfPresent — a no-op if the milestone hadn’t
been pre-tracked. If a MilestoneReachedEvent arrived before trackMilestone
was called, the achievement was silently lost. Changed to put().
CaseEvictionHandler was planned and stubbed but never wired. Plan models were
accumulating in memory for the lifetime of the application. Added a
@ConsumeEvent(CASE_STATUS_CHANGED) handler that evicts on terminal states.
The nested stage functional gap: StageLifecycleEvaluator was evaluating all
pending stages every cycle regardless of parent state. A four-line guard fixed
it — skip a nested stage unless its parent is ACTIVE.
The autocomplete guard for unregistered required items was already correct in the code but undocumented and untested.
Five stacked PRs: thread safety, lifecycle correctness, nested stages, integration regression coverage, edge cases and contract hardening.
One moment stood out in the integration test work. The implementer subagent
for the mixed-workers test (R4) came back with a fix I hadn’t anticipated. The
spec had both bindings trigger on .phase == "start", with workers writing
disjoint keys. The subagent flagged that each worker’s output would keep
phase unchanged, re-firing both bindings indefinitely. It changed to
per-capability trigger keys so each worker self-falsifies its own condition.
The plan had a latent infinite loop in it. Caught before it ever ran.
68 to 99 tests. PRs #91–#95 open. Same merge order as the original three.