Sixteen Zombies

The eval harness runs sixteen enrichment calls in parallel. Each one invokes the Claude CLI, and each one has a seven-minute timeout. When all sixteen hit that limit simultaneously — which they do, because they are parallel and the model is slow on eidos profiles — each one triggers ClaudeAsyncClient.close().

That close method schedules cleanup on Reactor’s bounded-elastic thread pool. Inside the cleanup, StreamingTransport.close() sends SIGTERM and then waits five seconds for the process to exit before sending SIGKILL. So when sixteen calls all time out together, sixteen bounded-elastic threads are each blocked for up to five seconds. The seventeenth call — the ProximityJudge — starts a subprocess into a system where nothing can be scheduled. It hangs immediately.

I’d observed this twice before without understanding it. The workaround was Ollama: no subprocess accumulation, reliable sequential calls. But the fix was somewhere in casehub-platform, and I wanted to find it properly.

The obvious approach was reflection. The ClaudeAsyncClient SDK interface doesn’t expose destroyForcibly(), but the underlying StreamingTransport has a Process field. We could reach in and call destroyForcibly() directly, bypassing the five-second wait. The user said no — think from first principles.

And once I thought about it properly, the reflection approach isn’t just ugly: it’s solving the wrong problem. The real issue is that buildEventStream() is using the SDK’s session mode — a bidirectional subprocess designed for multi-turn conversations — to handle single-turn calls. Session mode keeps a subprocess alive, expects to send multiple prompts to it, and closes gracefully when done. A single-turn invoke() call has none of those requirements. It sends one prompt, reads the response, and is done. The subprocess should exit naturally.

One-shot mode changes everything. claude -p "prompt" --output-format stream-json starts, runs the prompt, writes streaming JSON to stdout, and exits. No explicit close needed. If cancelled, you call process.destroyForcibly() on a Process reference you already own. No SDK in the way, no five-second grace period, no reflection.

The architectural split turned out to be clean: invoke() becomes one-shot subprocess (new ClaudeOneShotProcess), openSession() stays on the SDK session mode (ClaudeAgentSession). The session path is still the right tool for multi-turn conversations. It’s only invoke() that was in the wrong mode.

Getting the spec right took six iterations of review. That number sounds like bureaucracy but each round caught something real: the MCP config temp file leaking on cancellation (close() isn’t called on that path, so destroyForcibly() has to delete it); ControlMessageParser.parse() returning null rather than throwing for unrecognised message types (the switch needs a null guard before any dispatch); Process.waitFor() throwing InterruptedException rather than IOException (different catch block needed, with interrupt status restored); the BufferedReader objects surviving until GC after a forcible kill (add closeQuietly to destroyForcibly()). Each of these came from actually reading the SDK source, not from guessing.

The implementation ended up straightforward. ClaudeOneShotProcess constructs the subprocess eagerly — the process is live before any Mutiny subscription exists. On cancellation, destroyForcibly() kills it immediately and closes the Java stream objects; the blocked readLine() on the worker thread gets an IOException and the emitter terminates. On natural completion, stdout closes via EOF, the read loop exits, and the process has already exited. The whole five-second timeout situation just disappears.

One detail worth noting: --verbose is required for streaming to actually stream. Without it, --output-format stream-json emits only the final ResultMessage — you get batch mode dressed up as streaming. StreamingTransport.buildStreamingCommand() adds it unconditionally but with no comment explaining why. We found this by removing it and observing the result. It’s in the garden now.

The deeper observation is about SDK encapsulation. ClaudeAsyncClient was doing exactly what it was designed to do: managing a persistent session with graceful lifecycle. The bug wasn’t in the SDK; it was in how we were using it. A single-turn call is not a session. Treating it as one forced us to rely on the session’s teardown semantics, which were correct for sessions and wrong for single calls. The fix wasn’t to patch the teardown — it was to choose the right abstraction.

Runtime Improvements: Tree Navigation, URL Encoding, and Lazy Pages

The Bug That Documented Itself Wrong