The Guard That Did Too Much

The signal gaps in the engine — Qhorus human messages that never reach cases, WorkItem escalations that freeze WAITING cases, M-of-N group outcomes that vanish — all traced to eleven lines in CaseContextChangedEventHandler:

```java
if (!caseInstance.getState().equals(CaseStatus.RUNNING)) {
    return Uni.createFrom().voidItem();
}
```

The guard does two things. It prevents duplicate dispatches of bindings that are already in flight, which is necessary in the pure choreography path, where there’s no PlanItem tracking. It also blocks all evaluation for any case that isn’t RUNNING. Those are unrelated concerns, and the second is wrong: a WAITING case with an active blackboard has PlanItem tracking that already prevents re-dispatch, so blocking it only drops signals. The guard was carrying load it didn’t need to carry.

The fix is to move the dedup into LoopControl, which already owns dispatch decisions. ChoreographyLoopControl keeps the RUNNING-only restriction, since it has no PlanItem tracking to dedup against. PlanningStrategyLoopControl accepts both RUNNING and WAITING and filters out bindings whose PlanItems are already in flight before dispatching.
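To make the split concrete, here is a minimal sketch of the two LoopControl variants. The select() signature, the Binding record, the map of PlanItem statuses keyed by binding name, and the enum values are hypothetical simplifications, not the engine’s actual API. “In flight” is taken to mean RUNNING or DELEGATED.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for the engine's types; names and signatures are illustrative only.
enum CaseStatus { RUNNING, WAITING, SUSPENDED, COMPLETED, FAULTED, CANCELLED }

enum PlanItemStatus { PENDING, RUNNING, DELEGATED, COMPLETED, FAULTED, CANCELLED }

record Binding(String name) {}

interface LoopControl {
    // Decide which eligible bindings to dispatch; planItems is keyed by binding name.
    List<Binding> select(CaseStatus status, List<Binding> eligible,
                         Map<String, PlanItemStatus> planItems);
}

// Pure choreography: no PlanItem tracking, so RUNNING-only is the sole protection against re-dispatch.
class ChoreographyLoopControl implements LoopControl {
    @Override
    public List<Binding> select(CaseStatus status, List<Binding> eligible,
                                Map<String, PlanItemStatus> planItems) {
        return status == CaseStatus.RUNNING ? eligible : List.of();
    }
}

// Planning path: RUNNING or WAITING, minus bindings whose PlanItem is already in flight.
class PlanningStrategyLoopControl implements LoopControl {
    private static final Set<PlanItemStatus> IN_FLIGHT =
            EnumSet.of(PlanItemStatus.RUNNING, PlanItemStatus.DELEGATED);

    @Override
    public List<Binding> select(CaseStatus status, List<Binding> eligible,
                                Map<String, PlanItemStatus> planItems) {
        if (status != CaseStatus.RUNNING && status != CaseStatus.WAITING) {
            return List.of();
        }
        return eligible.stream()
                .filter(b -> !isInFlight(planItems.get(b.name())))
                .toList();
    }

    private static boolean isInFlight(PlanItemStatus s) {
        return s != null && IN_FLIGHT.contains(s); // no PlanItem yet means not in flight
    }
}
```

The lifecycle rule now sits next to the only thing that can make it safe: a LoopControl that can’t see PlanItems has to stay RUNNING-only, while one that can is free to accept WAITING.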

The interesting part was what “in flight” means. I expected it to cover RUNNING, DELEGATED, COMPLETED, FAULTED, and CANCELLED: any state beyond PENDING. The TDD cycle caught me. The first test, asserting that a COMPLETED PlanItem wouldn’t be re-dispatched, passed before I’d written any production code. That meant the test was wrong, and so was the assumption behind it.

CasePlanModel.addPlanItemIfAbsent() makes the contract clear: a COMPLETED item gets replaced by a new PENDING one. A binding whose conditions hold again after completion should fire again; iterative cases depend on it. So the filter blocks RUNNING and DELEGATED only. There’s a pre-existing timing race this doesn’t address: if a second CONTEXT_CHANGED arrives before a handler marks the PlanItem DELEGATED, it still finds the item PENDING and re-dispatches. That’s engine#364. The proper fix requires a transient DISPATCHING state and is separate work.
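A stripped-down model of that contract, assuming PlanItems keyed by binding name. PlanModelSketch, mark(), and shouldDispatch() are hypothetical; only the COMPLETED-to-PENDING replacement and the RUNNING/DELEGATED filter come from the behaviour described here, and other statuses are simply left as they are.

```java
import java.util.HashMap;
import java.util.Map;

enum PlanItemStatus { PENDING, RUNNING, DELEGATED, COMPLETED, FAULTED, CANCELLED }

// Hypothetical plan model; not the engine's CasePlanModel.
class PlanModelSketch {
    private final Map<String, PlanItemStatus> items = new HashMap<>();

    // Modelled on the described addPlanItemIfAbsent() contract:
    // absent or COMPLETED -> fresh PENDING item; any other status is kept as-is (an assumption).
    PlanItemStatus addPlanItemIfAbsent(String bindingName) {
        return items.compute(bindingName, (k, current) ->
                current == null || current == PlanItemStatus.COMPLETED
                        ? PlanItemStatus.PENDING
                        : current);
    }

    // What a handler would do once it has actually taken the work.
    void mark(String bindingName, PlanItemStatus status) {
        items.put(bindingName, status);
    }

    // The dedup filter: only RUNNING and DELEGATED count as in flight.
    boolean shouldDispatch(String bindingName) {
        PlanItemStatus s = addPlanItemIfAbsent(bindingName);
        return s != PlanItemStatus.RUNNING && s != PlanItemStatus.DELEGATED;
    }
}
```

Calling shouldDispatch() twice before mark(DELEGATED) returns true both times. That is the engine#364 race in miniature: nothing records the dispatch until the handler runs, which is the gap a DISPATCHING state would fill.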

The adapter fixes were simpler. WorkItemLifecycleAdapter was mapping ESCALATED to markFaulted(). But ESCALATED returns the WorkItem to PENDING with new candidate groups: the task isn’t done, it’s reassigned. We removed ESCALATED from the terminal filter. We also added an @ObservesAsync observer for WorkItemGroupLifecycleEvent to handle M-of-N SpawnGroup outcomes: COMPLETED maps to markCompleted(), REJECTED to markFaulted(). The adapter now handles the full lifecycle.
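A sketch of the adapter’s mapping with the CDI wiring stripped out so it runs as plain Java. The WorkItemStatus values other than PENDING and ESCALATED, the GroupOutcome enum, and the PlanItemCallbacks interface are assumptions standing in for the real types; in the real adapter both handlers would be @ObservesAsync methods.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical WorkItem states; only PENDING and ESCALATED are taken from the post.
enum WorkItemStatus { PENDING, IN_PROGRESS, COMPLETED, FAILED, ESCALATED }

// M-of-N SpawnGroup outcomes as described: COMPLETED or REJECTED.
enum GroupOutcome { COMPLETED, REJECTED }

// Hypothetical callback surface standing in for the engine's PlanItem updates.
interface PlanItemCallbacks {
    void markCompleted(String planItemId);
    void markFaulted(String planItemId);
}

class WorkItemLifecycleSketch {
    // ESCALATED is deliberately absent: the WorkItem goes back to PENDING with
    // new candidate groups, so the PlanItem must stay in flight.
    private static final Set<WorkItemStatus> TERMINAL =
            EnumSet.of(WorkItemStatus.COMPLETED, WorkItemStatus.FAILED);

    private final PlanItemCallbacks plan;

    WorkItemLifecycleSketch(PlanItemCallbacks plan) {
        this.plan = plan;
    }

    void onWorkItem(String planItemId, WorkItemStatus status) {
        if (!TERMINAL.contains(status)) {
            return; // non-terminal, including ESCALATED: nothing to report yet
        }
        if (status == WorkItemStatus.COMPLETED) {
            plan.markCompleted(planItemId);
        } else {
            plan.markFaulted(planItemId);
        }
    }

    // Group outcomes map one-to-one onto PlanItem completion or fault.
    void onGroupOutcome(String planItemId, GroupOutcome outcome) {
        switch (outcome) {
            case COMPLETED -> plan.markCompleted(planItemId);
            case REJECTED -> plan.markFaulted(planItemId);
        }
    }
}
```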

The Qhorus bridge is twenty lines: a CDI bean with an @ObservesAsync MessageReceivedEvent observer that calls CaseHubRuntime.signal() for the commitment-resolving message types (RESPONSE, DONE, DECLINE, FAILURE) on channels named case-{caseId}/{purpose}. The signal writes to context["channelMessage"], and case definitions react via contextChange(".channelMessage"). COMMAND, QUERY, and STATUS are excluded, and so is EVENT, which carries null content by protocol.
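The bridge’s routing rule, sketched without CDI. The ChannelMessage record, the channel-name parsing, the payload shape written under "channelMessage", and the BiConsumer standing in for CaseHubRuntime.signal() are all assumptions; the real signal() signature and payload may differ.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.function.BiConsumer;

enum MessageType { COMMAND, QUERY, STATUS, RESPONSE, DONE, DECLINE, FAILURE, EVENT }

// Hypothetical message shape: channel name, type, and (possibly null) content.
record ChannelMessage(String channel, MessageType type, String content) {}

class QhorusBridgeSketch {
    static final String CASE_CHANNEL_PREFIX = "case-";
    private static final Set<MessageType> COMMITMENT_RESOLVING =
            EnumSet.of(MessageType.RESPONSE, MessageType.DONE, MessageType.DECLINE, MessageType.FAILURE);

    // Stand-in for CaseHubRuntime.signal(caseId, contextPatch); the real signature may differ.
    private final BiConsumer<String, Map<String, Object>> signal;

    QhorusBridgeSketch(BiConsumer<String, Map<String, Object>> signal) {
        this.signal = signal;
    }

    // Parses "case-{caseId}/{purpose}" and returns the caseId if the name matches.
    static Optional<String> caseIdOf(String channel) {
        if (!channel.startsWith(CASE_CHANNEL_PREFIX)) return Optional.empty();
        int slash = channel.indexOf('/', CASE_CHANNEL_PREFIX.length());
        if (slash <= CASE_CHANNEL_PREFIX.length()) return Optional.empty(); // no slash or empty caseId
        return Optional.of(channel.substring(CASE_CHANNEL_PREFIX.length(), slash));
    }

    void onMessage(ChannelMessage msg) {
        if (!COMMITMENT_RESOLVING.contains(msg.type())) {
            return; // COMMAND, QUERY, STATUS and EVENT never reach the case
        }
        caseIdOf(msg.channel()).ifPresent(caseId -> {
            Map<String, Object> payload = Map.of(
                    "type", msg.type().name(),
                    "content", Objects.requireNonNullElse(msg.content(), ""));
            signal.accept(caseId, Map.of("channelMessage", payload));
        });
    }
}
```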

Claude caught something during code review: milestone and goal evaluation could now run for SUSPENDED and terminal cases. The old guard blocked them; the new LoopControl checks cover bindings, not them. We added a RUNNING || WAITING check back at the handler level for milestones and goals. Bindings get their dedup from LoopControl; milestones and goals get a lifecycle check at the handler.
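The resulting split inside the handler, as a hypothetical sketch: the actions list stands in for the real evaluation calls, and the terminal CaseStatus values are assumed names.

```java
import java.util.ArrayList;
import java.util.List;

// SUSPENDED is from the post; COMPLETED, FAULTED and CANCELLED are assumed terminal names.
enum CaseStatus { RUNNING, WAITING, SUSPENDED, COMPLETED, FAULTED, CANCELLED }

class ContextChangedHandlerSketch {
    // Records which evaluations ran, in order; a stand-in for the real calls.
    final List<String> actions = new ArrayList<>();

    static boolean acceptsMilestonesAndGoals(CaseStatus s) {
        return s == CaseStatus.RUNNING || s == CaseStatus.WAITING;
    }

    void onContextChanged(CaseStatus status) {
        // Bindings: always handed to LoopControl, which owns the lifecycle and dedup decision.
        actions.add("loopControl.evaluate");
        // Milestones and goals: nothing upstream filters them, so check the lifecycle here.
        if (acceptsMilestonesAndGoals(status)) {
            actions.add("evaluateMilestones");
            actions.add("evaluateGoals");
        }
    }
}
```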

One open thread: ClaudonyReactiveCaseChannelProvider defines its own CHANNEL_PREFIX = "case-" constant, separate from CaseChannel.CASE_CHANNEL_PREFIX. The bridge parses channel names with the engine constant; the provider writes them with its own. If the two ever diverge, the bridge will stop recognizing the provider’s channels. That’s claudony#139.
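One way to close that gap, assuming nothing about how claudony#139 is actually resolved: keep the prefix, the writer, and the parser in one place, so a round-trip test fails the moment they disagree. CaseChannelNames and its methods are hypothetical.

```java
// Hypothetical: one class owns the prefix and both directions of the naming scheme.
final class CaseChannelNames {
    static final String CASE_CHANNEL_PREFIX = "case-";

    private CaseChannelNames() {}

    // What a channel provider would call when creating a case channel.
    static String format(String caseId, String purpose) {
        return CASE_CHANNEL_PREFIX + caseId + "/" + purpose;
    }

    // What the bridge would call when routing a message back to its case.
    static String caseIdOf(String channel) {
        int slash = channel.indexOf('/', CASE_CHANNEL_PREFIX.length());
        if (!channel.startsWith(CASE_CHANNEL_PREFIX) || slash < 0) {
            throw new IllegalArgumentException("not a case channel: " + channel);
        }
        return channel.substring(CASE_CHANNEL_PREFIX.length(), slash);
    }
}
```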

After the squash

Interceptors Eat Exceptions