Three layers to one line

I came in expecting to wait the engine round-trip test out. The previous session left a note: CaseEngineRoundTripTest fails because the engine SNAPSHOT has an in-flight PlanExecutionContext constructor change. Engine problem. Sit tight.

I wasn’t convinced. I brought Claude in to look at it fresh — run the build, read the actual error, diagnose from scratch.

That note was wrong.

The error was a ConditionTimeoutException from Awaitility. Lineage was empty after ten seconds. Buried at the bottom of the output, marked Suppressed:, was a NullPointerException: Cannot invoke CaseInstance.getCaseContext() because caseInstance is null inside WorkflowExecutionCompletedHandler. Not a constructor change. An NPE swallowed by the Vert.x event bus.

We traced it back. The test builds the round-trip by looking up the case instance, then publishing WorkflowExecutionCompleted to the engine’s event bus. If the lookup returns null, the event goes out with a null caseInstance. The handler NPEs. The lifecycle event that was supposed to follow — WorkerExecutionCompleted — never fires. ClaudonyLedgerEventCapture never writes anything. Lineage stays empty.

The lookup was caseInstanceRepository.findByUuid(caseId, "default"). The engine SNAPSHOT added tenancyId filtering to the new two-argument signature. We decompiled InMemoryCaseInstanceRepository from the local Maven JAR to read the actual bytecode.

The implementation does tenancyId.equals(instance.tenancyId). When the engine stores a case with tenancyId=null — which it does when the test profile has no tenancy configured — "default".equals(null) returns false. The method returns Uni.nullItem(). The UUID is in the store. The method returns null anyway, with no log, no warning, nothing.

Three layers of indirection hiding a single mismatched parameter. The previous session anchored on the PlanExecutionContext change because it was visible in the SNAPSHOT notes and looked plausible. A reasonable guess. Wrong.

The fix is one import and two line changes. InMemoryCaseInstanceRepository also implements CrossTenantCaseInstanceRepository, which has findByUuid(UUID) — no tenancy parameter. That’s the right API for a test with no tenancy context. 532 of 532 passing.

Fixing that test raised a question I’d been deferring: if the engine round-trip works now, how close are real-world examples?

Closer than I expected. Two blockers, not ten.

The first is WorkerExecutionManager. The engine schedules workers via Quartz and needs something to close the loop — detect when the Claude process in a tmux session finishes and publish WorkflowExecutionCompleted. The test fires this manually. Production has nothing. The outline is a virtual thread watcher in the provisioner: poll tmux has-session, publish on exit. Not complicated, just not written.

The second is a real CaseDefinition. TestResearcherCase is test scaffolding and doesn’t exist at runtime. One concrete CDI bean following the same pattern is enough to have a workflow you can call startCase() on and watch end to end.

Everything else is working. Channels, SSE delivery, ledger capture, mesh participation, authentication. What’s missing are the two ends: something that knows when a worker finishes, and something that defines what a case is.

The Infrastructure That Found Its Address

Two CDI conflicts, one JAX-RS spec surprise