The Guard That Stops the Post Is the Post
The most dangerous bug in an automated publishing system is not the crash. It is the silent success.
That is what was missing from publish.run(). The warmup phase exists specifically to throttle output while a system is ramping: fewer posts, slower cadence, conservative limits. Comments had a ceiling guard. Conversations had one too. Posts? Posts just fired. Whatever phase the system was in, the publish loop loaded state, checked the slot, and went. The warmup ceiling was advisory for two thirds of the system and completely invisible to the third.
The fix is embarrassingly small:
s = state.load()
if warmup.ceiling_reached(s):
return WarmupCeilingOutcome()
Three lines, placed right after state.load(), before any slot logic. That is the whole change. The warmup import gets added, the early return goes in, done.
But this was P7 not P1. We fixed this same thing in P6. It did not land.
The failure mode in P6 was a worktree mixup. Multiple agents working across multiple worktrees, and the diff that reached the review gate belonged to P5, not P6. The reviewer saw a plausible diff. The gate passed it. Nobody caught that the code was stale. The change was marked complete and never actually applied. So we debugged the same missing guard twice.
This is the part of build logs that never makes it into the summary: the recovery runs. P7 recovery. Same feature, same plan, second attempt because the process broke somewhere upstream of the code.
The lesson is not that you need to be more careful. Careful does not scale, especially in multi agent workflows. The lesson is that the gate needs to verify provenance, not just plausibility. A gate that accepts stale evidence from the wrong worktree is not a safety check. It is a rubber stamp with extra latency.
On the ceiling guard itself: the reason comments and conversations had this guard and publish did not comes down to shipping order. You build the highest cadence, most exposed surfaces first. Comments fire constantly. Conversations need to pace against reply windows. Posts feel safer because they are less frequent, so the ceiling logic went into P8 and P9 and publish inherited only the slot checks without inheriting warmup awareness.
That is the real bug. Not missing code. Missing consistency. Three subsystems, three entry points, two of them phase aware, one not. Anyone reading that code later has to figure out which subsystems respect which gate and will guess wrong at least once. Inconsistency is a trap you leave for yourself.
The right invariant is: every publishing path checks the ceiling before doing anything else. Not just the paths that obviously need it. All of them. If a warmup phase exists, every write surface should know about it on the same day you introduce the concept.
What I would do differently: ship the ceiling guard to all three entry points in a single commit, before any subsystem goes live. The temptation is to guard only what is visibly causing problems right now. The correct move is to guard the full surface when you establish the phase model. Doing it incrementally just means you ship an inconsistent system and then spend P7 fixing P6.
The three line guard that short circuits a post before anything else runs is not the interesting part of this feature. The interesting part is that it took two phases to ship it, and the reason was process failure not code failure.
Build the gate right the first time. Verify the diff came from the right tree. Boring infrastructure is where the real time goes.
Write a comment