Ten Things Beyond Bors
Gastown’s Refinery is production software. v1.0.1, battle-tested, doing exactly what it was built to do. It batches PRs, tests the tip of the batch, bisects on failure, rejects the culprit, merges the rest. The algorithm is Bors-style and it works.
The merge queue spec for devtown (Epic 4, devtown#11) doesn’t replicate Refinery. It takes a structurally different approach — a CasePlanModel where the strategy is binding conditions, not compiled code. The architecture enables ten capabilities that go beyond what batch-then-bisect traditionally provides.
What I wanted to get right
The merge queue is the first CaseHub feature where the Gastown comparison stops being theoretical. Layers 1–6 proved the foundation works: trust routing, human review gates, typed agent messaging, tamper-evident audit. But those are infrastructure — they don’t do the thing Gastown actually does in production. A merge queue does.
I wanted the spec to be precise enough that someone reading it could verify every claim against the engine’s actual schema. That turned out to be harder than expected.
The two-tier split
The key design decision was separating the merge queue into two tiers: an imperative queue service and a reactive CasePlanModel.
Queue management — admission, priority ordering, batch formation — is scheduling. “Select the next N PRs satisfying these constraints, respecting a dependency DAG, weighted by trust scores with starvation prevention.” That’s an imperative algorithm. Shoehorning it into reactive binding conditions produces bindings that are really just procedural code pretending to be declarative.
Batch processing — test the tip, bisect on failure, merge on success, reject the culprit — is reactive. “If test failed and batch > 1, bisect.” That’s exactly what CasePlanModel bindings do well. The strategy IS the binding conditions. Different repos get different strategies by swapping the CasePlanModel, not by redeploying the queue service.
The split maps to the module structure: devtown-merge for the queue service, merge-batch.yaml for the CasePlanModel.
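To make "the strategy IS the binding conditions" concrete, here is a minimal sketch of strategy-as-data. It is not the engine's API: the real bindings are JQ expressions in YAML, and the `Binding` class, the `fire` helper, and every binding name except merge-batch are hypothetical. Python lambdas stand in for the JQ conditions.

```python
from dataclasses import dataclass
from typing import Callable

Context = dict  # case context: plain key/value state (assumption for this sketch)


@dataclass(frozen=True)
class Binding:
    name: str
    condition: Callable[[Context], bool]  # stands in for a JQ expression
    action: str                           # capability to invoke when the condition holds


# Hypothetical strategy A: bisect a failing batch, reject a failing single PR.
BISECTING = [
    Binding("merge-batch", lambda c: c.get("tipTest") == "PASSED", "merge"),
    Binding("bisect-batch",
            lambda c: c.get("tipTest") == "FAILED" and c.get("batchSize", 0) > 1,
            "bisect"),
    Binding("reject-single",
            lambda c: c.get("tipTest") == "FAILED" and c.get("batchSize", 0) == 1,
            "reject"),
]

# Hypothetical strategy B: never bisect, reject the whole batch on failure.
REJECT_ALL = [
    Binding("merge-batch", lambda c: c.get("tipTest") == "PASSED", "merge"),
    Binding("reject-batch", lambda c: c.get("tipTest") == "FAILED", "reject"),
]


def fire(strategy: list, context: Context) -> list:
    """Return the actions of every binding whose condition holds."""
    return [b.action for b in strategy if b.condition(context)]


failed_batch = {"tipTest": "FAILED", "batchSize": 4}
print(fire(BISECTING, failed_batch))   # ['bisect']
print(fire(REJECT_ALL, failed_batch))  # ['reject']
```

The evaluation loop never changes; only the list of bindings does. That is the property the two-tier split relies on: swapping the strategy per repo does not touch the queue service.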
Trust-weighted bisection
This is the headline architectural difference.
Traditional Bors-style merge queues bisect mechanically — split the batch in half by position, test both halves. The split uses no information about which half is more likely to contain the culprit. Every bisection round costs a CI run.
CaseHub has trust scores. Every PR author has a Bayesian Beta score from Layer 6 that reflects their historical review outcomes. A low-trust author’s PR is statistically more likely to break the batch. By sorting PRs by trust before splitting, the low-trust PRs cluster in one sub-batch.
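For concreteness, here is a minimal sketch of how a Beta-based score could be computed. All I have said above is that the score is a Bayesian Beta score built from historical review outcomes. The uniform Beta(1, 1) prior, the use of the posterior mean as the score, and the name `trust_score` are assumptions for illustration, not Layer 6's actual implementation.

```python
def trust_score(approved: int, rejected: int,
                prior_alpha: float = 1.0, prior_beta: float = 1.0) -> float:
    """Posterior mean of a Beta distribution after observing review outcomes.

    Assumption: each approved review counts as one success and each
    rejected review as one failure, starting from a Beta(1, 1) prior.
    """
    alpha = prior_alpha + approved
    beta = prior_beta + rejected
    return alpha / (alpha + beta)


print(trust_score(0, 0))    # 0.5: no history, the score sits at the prior mean
print(trust_score(6, 12))   # 0.35: (1 + 6) / (2 + 18)
print(trust_score(8, 2))    # 0.75: (1 + 8) / (2 + 10)
```

A new author starts at 0.5 and moves toward their observed approval rate as evidence accumulates, which is why one bad review does not collapse the score of an author with a long history.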
The worked example that convinced me: a batch of 8 where the culprit has trust 0.35 and everyone else is 0.7+. Trust-weighted bisection puts the suspect in a known half on the first split. With two low-trust culprits, trust-weighted bisection clusters them in the same sub-batch, whereas mechanical bisection can split them across halves, requiring both sub-trees to fully recurse.
The split itself is pluggable — a BisectionSplitStrategy SPI with three implementations. TrustWeightedSplitStrategy sorts by trust and splits at the midpoint. IsolateOutlierStrategy checks if one PR’s trust is >2σ below the batch mean and isolates it solo. BinarySplitStrategy is the traditional positional split, kept for benchmarking.
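The three strategies can be sketched as plain functions. This is an illustration under assumptions, not the SPI itself: the `PR` tuple, the function names, the choice of population standard deviation, and the fallback to the trust-weighted split when there is no single outlier are all mine. The example batch mirrors the batch-of-8 case, with invented PR numbers and trust values.

```python
from statistics import mean, pstdev
from typing import NamedTuple


class PR(NamedTuple):
    number: int
    trust: float


def binary_split(batch):
    """Traditional positional split: first half, second half."""
    mid = len(batch) // 2
    return batch[:mid], batch[mid:]


def trust_weighted_split(batch):
    """Sort by trust ascending, then split at the midpoint."""
    ordered = sorted(batch, key=lambda pr: pr.trust)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def isolate_outlier_split(batch):
    """Isolate a single PR whose trust is more than 2 sigma below the batch mean."""
    trusts = [pr.trust for pr in batch]
    mu, sigma = mean(trusts), pstdev(trusts)
    outliers = [pr for pr in batch if pr.trust < mu - 2 * sigma]
    if sigma > 0 and len(outliers) == 1:
        rest = [pr for pr in batch if pr is not outliers[0]]
        return outliers, rest
    # Assumption: with no single clear outlier, fall back to the trust-weighted split.
    return trust_weighted_split(batch)


# Batch of 8 in queue order; PR 106 is the low-trust suspect.
batch = [PR(101, 0.80), PR(102, 0.75), PR(103, 0.90), PR(104, 0.72),
         PR(105, 0.85), PR(106, 0.35), PR(107, 0.70), PR(108, 0.78)]

print([pr.number for pr in binary_split(batch)[0]])          # [101, 102, 103, 104]
print([pr.number for pr in trust_weighted_split(batch)[0]])  # [106, 107, 104, 102]
print([pr.number for pr in isolate_outlier_split(batch)[0]]) # [106]
```

With the positional split, the suspect's half is an accident of queue order. With the trust-weighted split it is always the low-trust half, and with outlier isolation the suspect is tested alone in the first round.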
The schema alignment problem
The first version of the spec invented fields that don’t exist in the engine’s YAML schema: workerContext, which is not a real field; Jinja-style template expressions in a JQ-only schema; and M-of-N sub-case group fields (groupId, totalInGroup) that exist in the Java API but aren’t exposed in YAML.
Claude caught these during a schema-alignment review against the actual CaseDefinition.yaml. Every binding was checked against the schema’s unevaluatedProperties: false constraint. The fixes were structural, not cosmetic:
- Replaced workerContext with proper capability inputSchema/outputSchema, the pattern already established by pr-review.yaml
- Separated bisection output keys (bisectLeft/bisectRight) to avoid LAST_WRITER_WINS collision
- Replaced manual retry counters with outcomePolicy reroute, which the engine already tracks natively
- Filed engine#574 to expose the M-of-N sub-case group fields in the YAML schema and fix per-child outputMapping
That last one is interesting. The engine’s runtime fully supports M-of-N grouped sub-cases — SubCaseExecutionHandler.handleGrouped(), SubCaseGroupRepository, SubCaseGroupPolicy all work. But CaseDefinitionYamlMapper.convertSubCase() doesn’t map the group fields. And SubCaseCompletionService.handleGroupedCompletion() only applies outputMapping from the child that triggers the M-of-N threshold — earlier-completing children’s outputs are silently lost. For bisection, where left and right halves write to different keys, the first child’s result disappears. ~35 lines to fix both.
The recursive sub-case problem
Bisection is naturally recursive. A batch case spawns two sub-batch cases (left half, right half). Each sub-batch is the same CasePlanModel — test, bisect further or reject. The recursion terminates when batch size hits 1.
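The recursion can be sketched with ordinary function calls standing in for sub-cases. This is a simplification under assumptions: in the real design each level is a separate case instance, not a stack frame, and the `find_culprits` name, the positional split, and the premise that a batch fails exactly when it contains a bad PR (no interaction failures) are mine.

```python
def find_culprits(batch, tip_passes):
    """Recursively bisect a batch.

    Returns (rejected, merged, ci_runs). `tip_passes` is a callable that
    stands in for one CI run on the tip of the given (sub-)batch.
    """
    if tip_passes(batch):
        return [], list(batch), 1   # whole sub-batch is good: merge it
    if len(batch) == 1:
        return list(batch), [], 1   # recursion bottoms out: reject the culprit

    mid = len(batch) // 2
    rejected, merged, runs = [], [], 1  # 1 = the failed tip test at this level
    for half in (batch[:mid], batch[mid:]):
        r, m, n = find_culprits(half, tip_passes)
        rejected += r
        merged += m
        runs += n
    return rejected, merged, runs


bad = {6}  # hypothetical: PR 6 breaks the build
rejected, merged, runs = find_culprits(list(range(1, 9)),
                                       lambda b: not (bad & set(b)))
print(rejected, merged, runs)  # [6] [1, 2, 3, 4, 5, 7, 8] 7
```

Seven CI runs to isolate one culprit in a batch of 8 is the cost that a better first split reduces.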
The engine blocks this. SubCaseExecutionHandler has an explicit circular guard: if a case tries to spawn a sub-case with the same namespace/name/version, the binding faults. The guard prevents infinite recursion, but it also prevents legitimate recursive decomposition.
The fix is engine#573: replace the hard block with a bounded depth limit. maxRecursionDepth: 10 on the SubCase definition handles batches up to 1024 PRs, because each level halves the batch and 2^10 = 1024. The actual recursion depth for a batch of 10 is 3-4 levels. The change is ~30 lines and fully backward compatible: maxRecursionDepth: 0 (the default) preserves the current hard block.
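The depth numbers follow from repeated halving, and a few lines are enough to check them. The helper names are hypothetical, and the sketch assumes the root batch sits at depth 0 and every split is as even as possible.

```python
def bisection_depth(batch_size: int) -> int:
    """Levels of halving needed until every sub-batch has size 1.

    Equivalent to ceil(log2(batch_size)), computed with integers to
    avoid floating-point rounding.
    """
    return (batch_size - 1).bit_length()


def max_batch_for_depth(max_depth: int) -> int:
    """Largest batch that can be fully bisected within max_depth levels."""
    return 2 ** max_depth


print(bisection_depth(10))      # 4
print(bisection_depth(1024))    # 10
print(bisection_depth(1025))    # 11: one PR too many for a depth limit of 10
print(max_batch_for_depth(10))  # 1024
```

For a batch of 10 the halves are uneven (10 splits into 5 and 5, 5 into 2 and 3), so some branches reach a single PR at depth 3 and the deepest at depth 4.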
Every path must terminate
The exhaustive state transition analysis found 15 distinct paths through the CasePlanModel. The first three spec versions had paths that hung:
- A batch of size 1 that fails its tip test: no goal fires, the sub-case never completes, the parent’s M-of-N group waits forever. Fix: single-pr-rejected success goal.
- Human approves merge after reroute exhaustion: nothing re-fires because merge-batch requires .mergeResult == null. Fix: merge-after-escalation binding with contextWrite reset, following the security-review-reduced-scope pattern from pr-review.yaml.
- CI runners exhaust all reroutes: no binding or goal handles .tipTest.status == "REROUTES_EXHAUSTED". Fix: tip-test-escalation humanTask.
- HumanTask SLA expires: writes BLOCKED, not REJECTED. Every failure goal initially checked only for REJECTED. Fix: expand all failure conditions to include BLOCKED.
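The kind of check that finds these hangs can be sketched as a reachability walk over a state graph: any reachable state with no outgoing transition and no goal is a hung path. This is an illustration, not the analysis tooling actually used. The state names and the two toy graphs are hypothetical simplifications of the CasePlanModel, and the walk only detects dead ends, not cycles that never reach a goal.

```python
def find_hung_states(transitions: dict, goals: set, start: str) -> list:
    """Reachable states that are not goals and have no outgoing transition."""
    seen, stack, hung = set(), [start], []
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        successors = transitions.get(state, [])
        if not successors and state not in goals:
            hung.append(state)
        stack.extend(successors)
    return sorted(hung)


# Toy model of an early spec: two outcomes of the tip test lead nowhere.
EARLY_SPEC = {
    "tip-test": ["passed", "failed-multi", "failed-single", "reroutes-exhausted"],
    "passed": ["merged"],
    "failed-multi": ["bisected"],
}
EARLY_GOALS = {"merged", "bisected"}

# Toy model after the fixes: a goal for single-PR rejection and a human
# escalation task for exhausted reroutes.
FIXED_SPEC = {
    **EARLY_SPEC,
    "failed-single": ["single-pr-rejected"],
    "reroutes-exhausted": ["tip-test-escalation"],
    "tip-test-escalation": ["merged", "rejected"],
}
FIXED_GOALS = EARLY_GOALS | {"single-pr-rejected", "rejected"}

print(find_hung_states(EARLY_SPEC, EARLY_GOALS, "tip-test"))
# ['failed-single', 'reroutes-exhausted']
print(find_hung_states(FIXED_SPEC, FIXED_GOALS, "tip-test"))
# []
```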
Each of these follows a pattern already established in pr-review.yaml. The merge queue CasePlanModel is bigger — 11 bindings, 3 success goals, 3 failure goals — but the patterns are the same.
The ten architectural differences
Each capability reflects a structural difference between the traditional Bors-style approach and CaseHub’s CasePlanModel-based design:
| # | Capability | Traditional (Bors-style) | CaseHub |
|---|---|---|---|
| 1 | Strategy as data | Compiled code | CasePlanModel YAML, per repo at runtime |
| 2 | Trust-weighted batch composition | FIFO batching | Batch size ∝ min trust; low-trust PRs get smaller batches |
| 3 | Trust-weighted bisection | Positional binary split | Split by trust — isolate likely culprits first |
| 4 | Priority lanes with starvation prevention | FIFO ordering | Composite score with time-decay |
| 5 | Dependency-aware ordering | Author-managed | DAG from labels + git base-branch |
| 6 | SLA-bounded queue wait | Not tracked | WorkItem per queued PR, tiered escalation |
| 7 | Adaptive batch sizing | Static configuration | Batch size adapts to recent failure rate |
| 8 | Cryptographic merge audit | Application logs | MergeDecisionLedgerEntry in Merkle chain |
| 9 | Human oversight for high-risk merges | Uniform merge path | ActionRiskClassifier gates by risk level |
| 10 | Recursive auditable bisection | Internal to the algorithm | Every level in EventLog with causal chain |
The foundation for all ten already exists. Trust scores from Layer 6. SLA tracking from Layer 2. Typed agent messaging from Layer 3. Tamper-evident audit from Layer 4. Adaptive case management from Layer 5. The merge queue is the first feature that composes all six layers into a single capability.
Two foundation gates remain before implementation: engine#573 (recursive sub-case depth limit) and engine#574 (M-of-N YAML schema + per-child outputMapping). Both are small, well-scoped changes to the engine — the runtime already supports the semantics, the gaps are in the YAML schema and one output-mapping behavior fix.