Notes | CaseHub - Engine
Where Does a Timeout Belong?
OutcomePolicy declared onExpired from the start. The YAML schema parsed it, the mapper wired it, the default constructor set it to REROUTE. But nothing in the engine ever used it. A worker that tim...
The Bug That Documented Itself Wrong
The failure cascade landed in the previous session — WorkerOutcome sealed type, OutcomePolicy, structured _outcomes state, agent exclusion. Four follow-up issues remained open. All four turned out ...
The Type Dispatch That Didn’t Belong
The Wrong Hypothesis and the JSON Document That Ate Every Binding
The bridge and the try/catch that lied
The foundation constraint is simple: casehub-qhorus and casehub-work are peer Foundation modules. Neither can depend on the other. But consumer apps often need both — a message arrives on a qhorus ...
The Batch That Paid for Itself
When the issue backlog accumulates fifteen XS/S items, the temptation is to knock them out one by one across separate branches. I tried the opposite — one branch, all fifteen, sequential execution.
The Registry That Ate the Scheduler's Reputation
Every symptom pointed at Quartz. Twelve tests timing out — cases start, workers never fire, ConditionTimeoutException after ten seconds of nothing. The contaminating class was CaseFaultedStateTest,...
The Qualifier Nobody Inherits
Four issues. Two were already done — one we’d fixed in the previous session and never closed the ticket, one adapted to a SNAPSHOT change without noticing the issue still existed on the board. Clos...
The Database That Wasn’t There Yesterday
The factory that wasn't
CaseDefinitionYamlMapper had five call sites that all said the same wrong thing:
The Map Before You Dig
Four small issues in a single batch branch. Three were exactly what they looked like. One had a finding inside.
Breaking static routing in humanTask bindings
Until now a humanTask binding in a case definition had fixed routing. You’d write:
The gate that doesn't exist in the code until it fires
The idea behind ActionRiskClassifier is simple: a worker is about to do something consequential — file a SAR, freeze an account, push to production — and before the engine commits that output to th...
The Mixed-Pool Gap
The initial spec for the bootstrap guard had a gap I missed.
Wiring workflow steps to the engine
Engine#206 has been on the list for a while: when a Worker runs a Serverless Workflow, the steps inside it have no way to dispatch other casehub workers. The workflow executes in quarkus-flow’s exe...
Four issues, two architectural surprises
The tenancy enforcement batch took longer than the individual issue sizes suggested. Four issues on one branch: a migration (#411), a CDI qualifier pattern (#405), DB-level Row Level Security (#406...
What four CI failures found in the multi-tenancy test infrastructure
The PR had seven commits and clean local tests. CI disagreed — four times.
Fixes, a mystery, and three missing migrations
The S/XS backlog from the multi-tenancy session had nine open issues. I went through them all in a single branch today — eight commits, one PR. Most were straightforward. Two weren’t.
Tenancy Threading Gets Explicit
My first instinct for tenancy in the engine repositories was elegant: inject
CurrentPrincipal, read tenancyId() in each query, filter silently. Callers stay
clean. Protocol PP-20260520-e6a5f0 says ...
Hydration and Recovery: Teaching the Engine to Survive a Restart
When a Quarkus app restarts, every in-memory registry starts empty. For BlackboardRegistry, that meant any WorkItemLifecycleEvent arriving before the next CONTEXT_CHANGED — a human completing a del...
Six handlers and a miss
The batch we cleared this session was five issues — all small, all overdue,
all the kind of bugs that pass unit tests and quietly break in production.
Worker carries a definition, not an outcome
Unblocking AML
The batch that closed today was supposed to be an afternoon.
Routing the Uncertain
Two routing issues had been sitting on the backlog since the AgentRoutingStrategy
SPI shipped. One was about what to do when all candidates are borderline on trust —
the four-phase model says escal...
The Guard That Did Too Much
The signal gaps in the engine — Qhorus human messages that never reach cases,
WorkItem escalations that freeze WAITING cases, M-of-N group outcomes that vanish
— all traced to eleven lines in CaseC...
Scope and the Silent Guard
Two small fixes that close a larger gap — scope propagation for SLA preference routing and a silent when-field bug in contextChange bindings.
Waiting Is Not Running
The previous branch left the @QuarkusTest suites in casehub-blackboard
blocked — RoutingCursorStore unsatisfied dependency. The fix was
straightforward: add quarkus.arc.exclude-types=io.casehub.wor...
Clearing the Interim Address
When we moved JQEvaluator into casehub-engine-common as part of the JQ
consolidation a few days ago, we knew it was temporary. The comment in CLAUDE.md
said so explicitly: follow-on platform extrac...
The Wrapper That Earns Its Keep
Before touching any code I stopped and asked whether the casehub Agent class should exist at all. LangChain4j has its own agent model — UntypedAgent, AgenticScope, supervisor patterns. Was the case...
The Hand-Rolled Parser That Shouldn't Exist
The bug was simple: { humanApproval: { status: .decision } } in an outputMapping was producing a String literal in the case context instead of a nested Map. The fix should have been five lines. It ...
The List That Emptied Itself
The “What’s Next” table in my handover had nine items. I went to start the first one — engine#300, add the deadline field to the COMMAND message — and found the issue closed. The code was already t...
P1D Was Never Invalid
The issue description was confident: “Duration.parse() throws DateTimeParseException for any non-PT-prefixed format (e.g. P1D).” We were adding validation to CaseDefinitionYamlMapper.convertHumanTa...
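The claim quoted from the issue can be checked against the JDK directly. A minimal sketch, assuming nothing about CaseDefinitionYamlMapper itself: the class name DurationFormats and the helper parses are illustrative only, not engine code. java.time.Duration.parse accepts a day component and treats one day as exactly 24 hours; what it rejects are period-style units such as months, and text with no ISO-8601 prefix at all.

```java
import java.time.Duration;
import java.time.format.DateTimeParseException;

// Illustrative only: not part of casehub-engine.
public class DurationFormats {

    /** Returns true when java.time.Duration.parse accepts the text. */
    public static boolean parses(String text) {
        try {
            Duration.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A day component is valid and is normalised to hours.
        System.out.println(Duration.parse("P1D"));     // PT24H
        System.out.println(Duration.parse("P1DT12H")); // PT36H

        // Months belong to java.time.Period, so Duration rejects them.
        System.out.println(parses("P1M"));             // false
        // No ISO-8601 prefix at all.
        System.out.println(parses("1d"));              // false
    }
}
```

Any validation built on this should therefore reject only what Duration.parse itself rejects, rather than requiring a PT prefix.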
The Pin Was Two Lines. The CDI Was Not.
work-adapter had a json-schema-validator pin sitting in the wrong place — it belonged in the root pom.xml alongside the other BOM overrides, not buried in the module that happened to need it first....
The Deadline Gets Through
Claudony is about to start tracking Commitments — Qhorus obligation records that bound how long an agent has to act on a COMMAND. To set an expiry, it needs the case budget deadline. The problem: c...
Giving the YAML a Human Concept
The casehub-engine-work-adapter has been implemented for weeks. HumanTaskScheduleHandler creates WorkItems. WorkItemLifecycleAdapter listens for completion and signals the engine to resume. The cod...
Testing the Handler, Not the Bus
Most of the day was cleanup — small things that had accumulated since the atomicity work. A two-arg arity mismatch when casehub-work extended SelectionContext and WorkItemCreateRequest without upda...
Atomic by Design
The bug in HumanTaskScheduleHandler was easy to describe: if WorkItem creation fails after item.markRunning(), the PlanItem ends up stuck RUNNING with no WorkItem to complete it. The engine won’t r...
Honouring the Contract
HumanTaskScheduleHandler has had a stub in template mode for a while. The comment was honest: // Template mode not yet implemented — leave PlanItem PENDING so binding stays eligible. What was less ...
SWF doesn't have a human
The previous entry covered the HumanTask binding design — the sealed
BindingTarget, the outbound HumanTaskScheduleHandler, the outputMapping
round-trip. Claude had built the implementation in a sep...
Config over CDI
The previous entry ended with a loose end: JsonPatchContextDiffStrategy still
using @Alternative while the no-op next door had just moved to @DefaultBean.
Before touching it I wanted to understand ...
Nine defaults, one wrong package
Nine SPI no-op defaults annotated with @DefaultBean to eliminate CDI ambiguity errors in consumer repos — plus a ninth bean found lurking in an import list, and a package that doesn't exist in the ...
BindingTarget and the sealed dispatch
Adding HumanTaskTarget as a sealed BindingTarget permit — replacing nullable fields with a type-safe sealed interface so every dispatch path is exhaustive and no new binding type can be introduced ...
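The shape of that change can be sketched in a few lines of Java 21. This is an illustration under stated assumptions, not the engine's code: only BindingTarget and HumanTaskTarget are named in the entry, while WorkerTarget, the record fields, and the describe method are hypothetical stand-ins.

```java
// Illustrative sketch: requires Java 21 for pattern matching in switch.
public class SealedDispatchSketch {

    // Only the permitted types may implement the interface.
    // WorkerTarget is a hypothetical second permit.
    public sealed interface BindingTarget permits WorkerTarget, HumanTaskTarget {}

    public record WorkerTarget(String workerName) implements BindingTarget {}

    public record HumanTaskTarget(String taskTitle) implements BindingTarget {}

    /** Dispatches on the concrete target type; no default branch is needed. */
    public static String describe(BindingTarget target) {
        return switch (target) {
            case WorkerTarget w -> "dispatch worker " + w.workerName();
            case HumanTaskTarget h -> "create human task " + h.taskTitle();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new WorkerTarget("risk-scorer")));
        System.out.println(describe(new HumanTaskTarget("Approve payout")));
    }
}
```

Because the switch has no default branch, adding a third permitted type makes every such switch fail to compile until it handles the new case, which is the exhaustiveness the entry describes.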
Sub-case coordination — the race condition sequential tests miss
Implementing M-of-N sub-case coordination for multi-site clinical orchestration — the design question of when to use quarkus-flow vs a native counter, and the race condition that sequential tests c...
The half that was missing
The outbound path from engine binding to WorkItem creation was missing — tracing why WorkerProvisioner couldn't work, and designing HumanTaskScheduleHandler as the correct event-driven hook.
Optional Belongs on a Short Leash
When Optional belongs in an API — tracing Brian Goetz's original intent, identifying where casehub-engine's MapCaseFile.get() fell on the wrong side, and codifying the result as a platform rule.
Going Live — and the Two-Backup Mystery
Pushing the reconstructed engine history live surfaces a silent rebase failure and a detective story about two backup branches, a stale fork, and feature code that never made it to casehubio.
Cleaning House Before the Merge
Applying the squash policy to two live PRs collapses docs follow-on commits and surfaces a hitchhiker commit that snuck in under a different author's PR.
What Kind of Message Is This?
Adding MessageType to CaseChannelProvider.postToChannel forces a dependency graph decision: the engine takes a narrow compile dependency on casehub-qhorus-api rather than inventing a parallel vocab...
Workers can finally talk back
The CaseChannelProvider.postToChannel path existed but was never connected — WorkerContext.channels was always empty, and the buildContext() result was silently discarded before workers were schedu...
The Workaround That Wasn't
We submitted a PR to fix a null output bug in sdk-java. The maintainer closed it: it wasn't a bug.
Migration Gaps Closed
Idempotency window, DLQ replay, and SubCaseBinding close the last migration gaps, but a transitive quarkus-ledger dependency bundles JPA entities into every module that touches engine and breaks fo...
Worker registration as a speech act
WorkerRegistry becomes the single source of truth for all three worker entry paths — static, provisioned, self-registering — and Java 21 sealed classes provide the execution fork without an explici...
Connecting quarkus-work to the blackboard
CaseLifecycleEvent gets its missing worker execution call sites, and the casehub-work-adapter uses CDI choreography to route quarkus-work terminal states into PlanItem transitions via the callerRef...
Wiring the SPIs and adding lifecycle control
The Worker Provisioner SPIs get their call sites wired in, and resume adds a single CONTEXT_CHANGED publish so bindings re-evaluate immediately when a suspended case returns to RUNNING.
WorkBroker: wiring the shared SPI into casehub-engine
Wiring WorkBroker into casehub-engine reveals two undocumented quarkus-work-core behaviours — WorkerCandidate.of() silently drops all selections, and WorkItemStatus.EXPIRED.isTerminal() returns false.
casehub-resilience: conflict resolution, a timeout enforcer, and a Vert.x surprise
Building ConflictResolver and CaseTimeoutEnforcer surfaces a Vert.x event bus gotcha: publishing from a non-event-loop thread requires explicit context acquisition, or messages silently disappear.
Ecosystem Mapping, PRs F-J, ADR-0002
Gap closure via PRs F through J expands into a fifteen-issue ecosystem map, and the binding-gating decision — presence of stage.addBinding() as the opt-in — is validated as the only bounded and fai...
Closing Every Gap: Parity, Kogito, and ADR-0002
Four genuine gaps against the prior implementation — including strict PlanItem lifecycle and a CDI pre-registration SPI — are closed via PRs F-G and documented in ADR-0002.
QE Pass: 68 Tests to 99, Five PRs
A systematic comparison against the original implementation surfaces thread-safety gaps in PlanItem and Stage, missing end-to-end scenarios, and a null pointer hidden by the String constructor over...
Blackboard: Research, Analysis, and Implementation
Academic research into LLM-based blackboard architectures shapes one critical design decision — changing LoopControl.select() to return Uni — before the blackboard module ships its first implementa...
Cutting the JPA Wire
PR3 strips JPA from the engine module entirely — three domain objects become plain Java, twelve handlers route through three SPI interfaces, and no framework annotation remains in casehub-core.
The Dedup Wasn't Broken. The Test Was.
PR 2 shipped, Flyway is gone, and two CI failures turned out to be different problems than they appeared.
Persistence PR 1: Container Chaos and a Quarkus Context Trick
PR 1 of the persistence decoupling plan exposes a Podman misconfiguration and a Quarkus Vert.x context requirement that blocks standard JUnit methods from calling reactive Panache.
Phase 2: Resilience, Diff Provenance, and a Persistence Rethink
Three new modules designed and shipped — resilience, EventLog enrichment, and a persistence decoupling spec — plus a conversation that changed the ORM approach entirely.
Phase 2: Standards, a Hidden Bug, and casehub-blackboard
Naming decisions resolved, CaseStatus aligned with CNCF standards, a silent bug found by the tests we wrote, and casehub-blackboard went from brainstorm to 390 tests in one session.
Phase 1: Into casehub-engine
The merge direction reversed before a single line of code was written. casehub-engine becomes the home; Phase 1 lays the extension points incrementally, one PR at a time.
Two CaseHubs, One Design
Discovering a parallel casehub-engine implementation and charting a 9-phase plan to unify both systems into one coherent design.
Session 3: Getting the Architecture Right
Collapsing the lineage graph, redesigning the persistence layer, and researching goal models across BDI, GOAP, CMMN, and HTN.
The Architecture Behind CaseHub: Blackboard Meets CMMN
Two patterns from very different traditions — Blackboard Architecture and CMMN — and why they belong together for agentic AI.
Wanted a Sketch, Got a Framework
One session, 73 files, 14,003 lines of code — what started as a request for a sketch became a working framework.