Two layers to keep a content engine in its lane
Two functions went into core this week. topic_guardrails_directive() is a prompt fragment that rides along in every generation call. violates_topic_policy() is a deterministic lexical check in pure Python with no model dependency. They do different jobs and they fail in opposite ways. That is the point.
Here is the problem they solve. The voice profile that drives my content engine is deliberately opinionated. It has to be to produce writing that sounds like a real person. But a profile loaded with opinions will bleed those opinions into posts that have no business touching them. Posts about LLM inference costs were pulling in angles that belonged nowhere near a technical topic. The model was accurate to the profile and wrong for the context.
The prompt layer
topic_guardrails_directive() returns a text block that defines exactly where the engine operates: software engineering, AI tooling, coding agents, shipping products, crypto and web3, markets and economics. Outside those domains the voice is a curious generalist, not an authority. No expert posturing on medicine, law, biology, or anything outside the author’s actual work.
This fragment goes into build_voice_block(), which means every engine, every post, and every comment carries it. Not added per call. Structural.
The directive also carves economics out of a broader category that the profile includes but the engine should not touch. Markets, macro, monetary policy, trade, taxes, regulation: fair game when argued from data and incentives. Everything else in that category: skip. The directive explicitly supersedes the profile when they conflict.
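A minimal sketch of how a fragment like this could be wired in. The two function names come from the post; the directive wording, the `profile_text` parameter, and putting the directive after the profile are all assumptions for illustration, not the engine's actual code.

```python
# Domains listed in the post as in scope for the voice.
IN_SCOPE_DOMAINS = (
    "software engineering",
    "AI tooling",
    "coding agents",
    "shipping products",
    "crypto and web3",
    "markets and economics",
)


def topic_guardrails_directive() -> str:
    """Return the prompt fragment that scopes the voice (wording is illustrative)."""
    domains = ", ".join(IN_SCOPE_DOMAINS)
    return (
        "TOPIC GUARDRAILS (these override the voice profile on any conflict):\n"
        f"- Write with authority only on: {domains}.\n"
        "- Outside those domains, write as a curious generalist, not an expert.\n"
        "- Economics is in scope only when argued from data and incentives.\n"
    )


def build_voice_block(profile_text: str) -> str:
    """Join profile and directive so every caller of this block gets both."""
    # Assumption: the directive goes last so it reads as the final word.
    return f"{profile_text.strip()}\n\n{topic_guardrails_directive()}"
```

Because the directive is attached inside `build_voice_block()`, no individual call site has to remember to add it.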
The lexical gate
Injecting a directive into the prompt is probabilistic. The model can drift, or combine two signals from the profile in a way the directive did not predict.
So I added violates_topic_policy(): a deterministic check in pure Python with no model dependency.
The lack of a model dependency matters more than it might look. Every other critic in this pipeline shells out to claude -p. If the CLI is down, those critics pass through silently, because an infra failure should not block publishing. That logic is correct for critics. A guardrail is different. A guardrail that passes through on infra failure is not a guardrail; it is a suggestion.
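For contrast, here is a rough sketch of what a fail-open critic looks like. The function name, the `cmd` parameter, the timeout, and the PASS verdict convention are hypothetical; only the idea of shelling out to a CLI and passing through on infra failure comes from the post.

```python
import subprocess


def run_critic_fail_open(post: str, cmd: tuple[str, ...] = ("claude", "-p")) -> bool:
    """Return True if the post may proceed. Infra failures count as a pass."""
    try:
        result = subprocess.run(
            [*cmd, post],
            capture_output=True,
            text=True,
            timeout=60,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit, or timeout: fail open so publishing
        # is not blocked by infrastructure. Fine for a critic, not for a gate.
        return True
    # Assumption: the critic prompt asks for a reply that starts with PASS or FAIL.
    return result.stdout.strip().upper().startswith("PASS")
```

The `except` branch is the whole story: anything wrapped this way can be skipped without anyone noticing.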
Pure Python never has that failure mode. The gate runs or the program crashes. There is no silent pass.
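A minimal sketch of a gate in this style. The pattern list is a hypothetical placeholder (the real patterns are not shown here), and `publish_if_clean` is an invented caller; the point is that nothing in the path can be skipped by an outage.

```python
import re

# Hypothetical placeholder patterns, loosely based on the out-of-scope
# domains named earlier (medicine, law). Not the real list.
BLOCKED_PATTERNS = (
    r"\bmedical advice\b",
    r"\blegal advice\b",
    r"\bclinical trial\b",
)

_COMPILED = tuple(re.compile(p, re.IGNORECASE) for p in BLOCKED_PATTERNS)


def violates_topic_policy(text: str) -> bool:
    """Return True if any blocked pattern matches. No I/O, no model call."""
    return any(rx.search(text) is not None for rx in _COMPILED)


def publish_if_clean(text: str) -> bool:
    """Return True only when the gate ran and found nothing."""
    # No try/except on purpose: if the gate raises, the exception propagates
    # and stops the run instead of silently letting the post through.
    return not violates_topic_policy(text)
```

The same input always produces the same verdict, which also makes the gate trivial to unit test.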
The tradeoff
Lexical gates are blunt. They cannot read intent. A post about monetary policy could hit a pattern that was meant to catch something more obviously off topic. I tuned the pattern list conservatively to keep false positives low, which means some off-topic edge cases will slip past the first version. Later rounds of tightening will catch them.
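One way to make that tradeoff visible is to audit a gate against small labeled sample sets before tightening it. This is a sketch under assumptions: `audit_gate`, the toy gate, and the sample sentences are all hypothetical.

```python
import re
from typing import Callable, Iterable


def audit_gate(
    gate: Callable[[str], bool],
    in_scope: Iterable[str],
    off_topic: Iterable[str],
) -> dict[str, list[str]]:
    """Report which samples the gate misjudges in each direction."""
    return {
        # In-scope text the gate blocked anyway.
        "false_positives": [s for s in in_scope if gate(s)],
        # Off-topic text the gate let through.
        "false_negatives": [s for s in off_topic if not gate(s)],
    }


def toy_gate(text: str) -> bool:
    # A single illustrative pattern aimed at medical content.
    return re.search(r"\btreatment\b", text, re.IGNORECASE) is not None


report = audit_gate(
    toy_gate,
    in_scope=[
        "The tax treatment of staking rewards",
        "Why inference costs keep falling",
    ],
    off_topic=[
        "The best treatment for a sprained ankle",
        "Which supplements fix poor sleep",
    ],
)
```

Here the tax sentence is blocked even though it is in scope, and the supplements sentence slips through. Adding or removing a pattern moves samples between those two lists, which is exactly the dial being tuned.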
The two layers have complementary failure modes. The prompt can drift or misfire, but it cannot go down. The gate can be too broad, but it is always on. You want both because the ways they fail do not overlap.
What I would do differently
Separate the voice profile from the generation rules before writing the first line of the profile.
The profile carries opinions that are accurate for the author but wrong for most posts. Loading it wholesale and then patching in a directive to suppress the problematic parts is backwards. The cleaner design loads the profile for tone, registers guardrails as a separate constraint layer, and lets them compose at call time instead of having one block override another inside a single concatenated string.
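A sketch of what that separation might look like. Every name here (`Constraint`, `GenerationSpec`, the section labels) is hypothetical; it illustrates the shape of the cleaner design, not code that exists in the engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    name: str
    directive: str                 # prompt-side text for the model
    check: Callable[[str], bool]   # output-side gate; True means violation


@dataclass
class GenerationSpec:
    tone_profile: str
    constraints: list[Constraint] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        """Compose labeled sections at call time instead of one merged string."""
        sections = [f"[VOICE]\n{self.tone_profile}"]
        sections += [f"[CONSTRAINT:{c.name}]\n{c.directive}" for c in self.constraints]
        sections.append(f"[TASK]\n{task}")
        return "\n\n".join(sections)

    def violations(self, output: str) -> list[str]:
        """Names of every constraint whose check flags the output."""
        return [c.name for c in self.constraints if c.check(output)]
```

With this shape, each constraint carries its own directive and its own check, so it can be tested and updated without opening the profile, and a slipped post can be traced to a named constraint.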
That would have made the guardrails easier to test, easier to update without touching the profile, and easier to audit when something slips through. Instead I am patching them in after the fact.
It works. It is not clean. Those two things are not always compatible, but this time they are.