The first failure I had to eliminate in the blog pipeline was not a bad paragraph. It was a bad evidence set. The system was finding a few nearby chunks, mistaking density for coverage, and then drafting as if that narrow slice represented the whole repository. That produces text that sounds confident right up until you compare it with the code. The fix was to stop treating topic selection like a writing problem and start treating it like a retrieval coverage problem.
That distinction matters. If the upstream evidence is thin, no amount of prompt polish saves the result. The draft will still overfit the first cluster of files that happened to match the query. I wanted the pipeline to be disciplined about breadth before it was creative about prose. So the gate moved earlier: query fan-out, file-path-aware dedupe, breadth validation, pinned excerpts, and only then the writing pass.
The gate lives before the writing step
In my pipeline, the important work happens before generation starts. The dispatcher is where the topic search fans out through multiple lanes: curated highlights, a fixed RAG query pool, and recent commit-derived queries. That is deliberate. A single semantic search tends to collapse into the same dense corners of the codebase, which is exactly where a system becomes persuasive and shallow at the same time.
The dispatcher does not need to know how the post will read yet. Its job is to prove that the candidate topic has enough distinct evidence behind it to deserve a draft. That means the retrieval layer has to do more than collect relevant chunks. It has to show spread. It has to show that the match did not come from one file, one subsystem, or one repetitive cluster of adjacent chunks.
That is why the shared blog utilities matter. The generator path imports checkRagSufficiency and fetchFailureEvidence from supabase/functions/_shared/blog-utils.ts, and that is the right place for this logic: the decision about whether retrieval is good enough belongs in one shared module, so every entry point that gates on evidence applies the same standard.
That flow is the real control surface. The generator is downstream of a decision that already happened: is the evidence wide enough to trust?
Three query lanes, three different jobs
The query fan-out is not random, and it is not a single blended prompt pretending to be a strategy. I built it as three separate signals because each one catches a different failure mode.
The curated highlight queries keep the system anchored in the kinds of features and systems I already know are worth revisiting. Those are the posts that usually come from places I have touched repeatedly: workflow orchestration, retrieval, caching, parsing, state management, security boundaries, or data transformation. They help the pipeline remember what is already interesting in the repository family.
The fixed query pool is the broadest lane. It exists to force coverage across architectural themes and implementation patterns rather than letting one topic family dominate. This is the part that looks for the general system shape: event-driven flows, retrieval logic, prompt construction, orchestration, caching, retry paths, auth boundaries, model inference, ETL, and state machines. If I let the selector live only inside the curated highlights, it would become too self-referential. If I let it live only inside the fixed pool, it would become too generic. The combination is what keeps the output grounded and varied.
The recent commit queries add the temporal dimension. They bias the selector toward what actually changed recently instead of letting the system drift into evergreen topics that no longer reflect the repository’s current shape. That matters because the most obvious topic is often the wrong one when recent work has shifted the architecture. A topic can be semantically relevant and still be stale in practice.
The point of splitting those lanes is simple: no single lane is trusted to decide the topic alone. They feed the same retrieval pass, but they do so for different reasons. One lane preserves editorial continuity. One lane broadens architectural search. One lane keeps the system current. When those three are merged, I get a candidate set that is much harder to fool with local similarity alone.
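To make the shape concrete, here is a minimal sketch of that fan-out. The lane contents and the buildCommitQueries helper are illustrative stand-ins, not the real implementation; only the three-lane structure mirrors what the dispatcher actually does.

```ts
// Sketch only: lane contents and buildCommitQueries are hypothetical.
const CURATED_HIGHLIGHTS = [
  "workflow orchestration and retry paths",
  "retrieval and caching layers",
  "security boundaries and auth checks",
];

const FIXED_QUERY_POOL = [
  "event-driven flows",
  "prompt construction",
  "state machines and data transformation",
];

// Derive short queries from recent commit subjects (hypothetical heuristic).
function buildCommitQueries(commitSubjects: string[], limit = 5): string[] {
  return commitSubjects.slice(0, limit).map((s) => s.toLowerCase().trim());
}

// All three lanes feed the same retrieval pass, for different reasons.
function fanOutQueries(recentCommitSubjects: string[]): string[] {
  return [
    ...CURATED_HIGHLIGHTS, // editorial continuity
    ...FIXED_QUERY_POOL, // architectural breadth
    ...buildCommitQueries(recentCommitSubjects), // temporal bias
  ];
}
```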
Why I dedupe by repo and file path, not just text similarity
Once retrieval returns a pile of chunks, the next problem is repetition. Similarity search loves repetition. A single file can dominate a result set by surfacing multiple overlapping excerpts, especially when the file is dense or when several queries land in the same section of code. If I let that happen, the draft starts building itself around one artifact instead of one system.
That is why file-path-aware dedupe matters so much. I want repeated hits from the same repo and file path to collapse early. I do not care whether five adjacent chunks all sound relevant if they are all pointing at the same paragraph of the same file. What I care about is whether the sample spans distinct parts of the codebase.
This is also why repo identity belongs in the dedupe key. In a multi-repo setup, two chunks can look similar for completely different reasons. They may both describe retrieval logic, orchestration, or prompt shaping, but they live in different systems and should not be treated as interchangeable evidence. Repo plus file path tells me whether the system is sampling breadth or simply rediscovering the same neighborhood under different search terms.
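A sketch of that dedupe, assuming a hypothetical chunk shape (the field names here are mine, not the actual schema):

```ts
// Hypothetical chunk shape; field names are assumptions.
interface RetrievedChunk {
  repo: string;
  filePath: string;
  content: string;
  score: number;
}

// Collapse repeated hits from the same repo + file path early, keeping
// only the highest-scoring excerpt per key.
function dedupeByRepoAndPath(chunks: RetrievedChunk[]): RetrievedChunk[] {
  const best = new Map<string, RetrievedChunk>();
  for (const chunk of chunks) {
    const key = `${chunk.repo}::${chunk.filePath}`;
    const seen = best.get(key);
    if (!seen || chunk.score > seen.score) best.set(key, chunk);
  }
  return [...best.values()];
}
```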
The practical effect is that the retrieval layer becomes less greedy. It stops rewarding the first obvious cluster with extra representation. It stops counting repeated evidence as coverage. That makes the candidate set smaller, but it also makes it much more trustworthy.
There is a second-order benefit here too: dedupe reduces the risk that a single implementation detail becomes the skeleton of the whole post. Without it, the draft can end up over-explaining one helper, one file, or one branch of logic simply because retrieval happened to hit it several times. Dedupe breaks that bias before the generator ever sees the prompt.
Breadth is not a vibe; it is a threshold
After dedupe, I do not ask whether the chunks feel diverse. I measure whether the sample is wide enough to support a post. That is the entire point of the breadth gate. It exists to reject candidate sets that are semantically plausible but structurally weak.
This is where checkRagSufficiency fits into the pipeline. The name is exactly what the behavior needs to be: a sufficiency check. If the retrieved set cannot prove enough spread across the repository, it should not advance. The system should fail closed, not guess.
The important thing about the breadth threshold is that it changes the meaning of retrieval. Retrieval is no longer a convenience layer that collects whatever looks closest. It becomes a gate that must establish evidence quality before writing begins. That changes the failure mode from "draft written from a narrow slice" to "candidate rejected because the sample is too narrow." I will take the second failure every time.
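Here is a sketch of the kind of test that gate runs, reusing the RetrievedChunk shape from above. The thresholds are placeholders for illustration; the real signature and numbers live in checkRagSufficiency inside blog-utils.ts.

```ts
// Sketch of a breadth check; thresholds are placeholders, not the
// real values from blog-utils.ts.
function isBreadthSufficient(
  chunks: RetrievedChunk[],
  minDistinctFiles = 5,
  minDistinctRepos = 2,
): boolean {
  const files = new Set(chunks.map((c) => `${c.repo}::${c.filePath}`));
  const repos = new Set(chunks.map((c) => c.repo));
  // Fail closed: a semantically strong but structurally narrow sample
  // does not advance to drafting.
  return files.size >= minDistinctFiles && repos.size >= minDistinctRepos;
}
```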
That rejection path matters in practice. It catches cases where the query pool lands too hard in one subsystem, cases where recent changes dominate the semantic neighborhood, and cases where one file generates too many overlapping hits. A lot of retrieval bugs look like success until the prompt is assembled. The breadth check is the thing that stops those bugs from turning into published text.
I also like that the failure can produce concrete evidence. fetchFailureEvidence belongs in the same shared utility layer because it gives me a way to inspect why a candidate was rejected. That is useful during tuning. If a topic keeps failing breadth, I can see whether the issue is query bias, inadequate file-path diversity, or a retrieval window that is too small for the amount of material I want to cover.
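Concretely, I think of the rejection record as something like this hypothetical shape (the field names are assumptions, not the actual return type of fetchFailureEvidence):

```ts
// Hypothetical rejection record for tuning; field names are assumptions.
interface BreadthFailure {
  topic: string;
  distinctFiles: number;
  distinctRepos: number;
  dominantPath: string; // the file path that absorbed the most hits
  queriesUsed: string[];
}
```

Even a small record like that is enough to tell query bias apart from genuine evidence scarcity.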
The stage-compose step adds a second guardrail
File-path retrieval and semantic retrieval are not competing systems. File-path retrieval gives me boundary-aware evidence. Semantic retrieval gives me breadth across related concepts. In blog-stage-compose, I merge both and dedupe a second time, because the two strategies can land on the same excerpt from different angles — and the evidence set should not regress back into repetition right before drafting.
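A sketch of that second pass, with fetchByFilePath and fetchBySimilarity as hypothetical stand-ins for the two retrieval strategies:

```ts
// Stand-ins for the two strategies; these declarations are assumptions.
declare function fetchByFilePath(topic: string): Promise<RetrievedChunk[]>;
declare function fetchBySimilarity(topic: string): Promise<RetrievedChunk[]>;

async function composeEvidence(topic: string): Promise<RetrievedChunk[]> {
  const [pathHits, semanticHits] = await Promise.all([
    fetchByFilePath(topic), // boundary-aware evidence
    fetchBySimilarity(topic), // breadth across related concepts
  ]);
  // Both strategies can land on the same excerpt from different angles,
  // so the merged set is deduped again before drafting.
  return dedupeByRepoAndPath([...pathHits, ...semanticHits]);
}
```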
Pinned excerpts keep the evidence from drifting
Once a candidate survives the coverage gate, I pin the excerpts that explain why the topic is worth writing about. That is not a cosmetic step. It is a stability step. Without pinned evidence, the drafting stage has too much freedom to wander away from the exact chunks that earned the topic in the first place.
Pinned excerpts act like an anchor for the generation pass. They preserve the evidence trail. They keep the prompt honest about the source material the draft was built from. That matters because the strongest failure mode in a retrieval-driven blog system is not complete hallucination. It is drift: the draft starts from real evidence, then gradually generalizes beyond what was actually retrieved.
Pinning the surviving excerpts makes that harder. It forces the generation stage to stay connected to the specific implementation details that passed the gate. It also makes review easier, because I can inspect exactly which chunks were considered important enough to carry forward.
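A sketch of what pinning can look like at prompt-assembly time; the buildDraftPrompt name and the prompt wording are illustrative, not the real template:

```ts
// Sketch of pinning: the excerpts that earned the topic are carried
// verbatim into the drafting prompt. Prompt wording is illustrative.
function buildDraftPrompt(topic: string, pinned: RetrievedChunk[]): string {
  const evidence = pinned
    .map((c, i) => `[${i + 1}] ${c.repo}/${c.filePath}\n${c.content}`)
    .join("\n\n");
  return [
    `Write a post about: ${topic}`,
    "Ground every claim in the pinned excerpts below; do not generalize past them.",
    evidence,
  ].join("\n\n");
}
```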
The combination of pinning and breadth checking is what gives the pipeline its shape. Breadth says, "This sample is wide enough." Pinning says, "These are the exact pieces that justify the topic." Together, they stop the generator from inventing confidence where the retrieval layer did not earn it.
Why I prefer rejection over a weak draft
I am completely comfortable with a pipeline that says no. In fact, I want it to say no when the evidence is bad. A weak candidate should not be rescued by a polished prompt. If the retrieval set is narrow, the honest response is rejection.
That discipline keeps the blog output specific. It keeps the writing tied to actual systems instead of generic patterns. It also saves me from having to edit around a draft that was born from a bad evidence shape. A rejected topic costs less time than a published post that looks right but misses the structure of the thing it claims to describe.
This is especially important in a multi-repo environment. Once you have a handful of systems with overlapping concepts, retrieval can become too eager to collapse them into one theme. A good gate has to resist that collapse. It has to respect repository boundaries, file boundaries, and evidence density boundaries. If those boundaries are not visible in the sample, the draft should not happen yet.
That is the philosophical change I stopped treating as optional. The system does not owe me a draft. It owes me a trustworthy sample. The sample has to prove that the topic is broad enough, current enough, and distinct enough to justify writing.
The actual win is not creativity; it is control
What changed here was not my ability to generate prose. What changed was the quality of the evidence stack that the prose starts from. The dispatcher fans out through curated highlights, a fixed query pool, and recent commit queries. The retrieval layer dedupes by stable keys. The sufficiency gate checks for breadth. Stage-compose merges file-path and semantic evidence and dedupes again. Pinned excerpts keep the final prompt from drifting.
That sequence turns retrieval into a control system instead of a suggestion engine. It enforces a minimum standard before the draft is allowed to exist. That is the kind of discipline a blog pipeline needs if it is going to write about real systems with real precision.
The result is not just fewer bad drafts. It is a pipeline that knows what it does not know early enough to stop itself. When the evidence is wide, the writing stage does the part it is good at. When the evidence is narrow, the system does the part I am more grateful for: it refuses to pretend it knows.
