A retrieval system returned the right embedding, the right similarity score, and a completely wrong answer. The chunk it surfaced shared vocabulary, naming conventions, even architectural patterns with the query — but it came from the wrong repository. Nothing crashed. Nothing timed out. The system just lied with perfect confidence, and I did not catch it for two days.
That is the most dangerous failure mode in vector search: plausible neighbors from the wrong scope. The embedding math was fine. The rows were valid. The search logic was simply assembling too many decisions in too many places, and by the time the request reached PostgreSQL, the database had to guess which parts belonged together.
I fixed the problem by making the RPC the single source of truth. The caller sends the query embedding, the candidate count, and the JSONB filter together. The SQL function receives those same values together. The index is built for the same embedding column the function reads. Once those pieces move as one unit, the path stops drifting.
The bug was scope drift, not similarity math
The first mistake was treating search as a handful of knobs instead of a single request object. If the embedding is built in one place, the metadata filter is built in another, and the candidate depth is decided a layer above that, the call boundary becomes mushy. The function still runs, but nobody can point to one object and say: this is the exact search intent.
That matters most when retrieval is scoped. In this codebase, the filter is not an afterthought. It can describe a repo, a file path, a language, a type, or any combination of metadata fields that belong in the same search slice. If I am looking for TypeScript files in a specific repository, I want that to be expressed as part of the same request that carries the vector. I do not want the application to infer scope from session state, hidden defaults, or a previous call.
The bug showed up as plausible-but-wrong neighbors because vector similarity is happy to rank related text from the wrong place. That is what makes retrieval bugs so slippery. The results do not look random. They look close enough to distract you. A chunk from the wrong repo can still share concepts, terminology, or naming conventions with the query. If the filter is applied too late, the wrong row can look like a good answer until you inspect the metadata closely.
The fix was to make the request shape boring and explicit. The caller decides the search scope. The database enforces that scope. The index serves that same scope. There is no second pass that tries to patch over a weaker request after the fact.
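Written down as a type, that intent looks something like this. The shape is hypothetical and only for illustration; the actual wrapper in the next section passes the same three values as plain arguments:
// Hypothetical: the three decisions that make up one search request.
interface SearchRequest {
  queryEmbedding: number[];         // which vector the search is about
  matchCount: number;               // how deep the candidate set should be
  filter: Record<string, unknown>;  // which metadata slice is allowed to participate
}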
The caller sends one object
The wrapper I use is intentionally small. It does not hide the request shape, and it does not smuggle in extra search behavior. It passes the vector, the count, and the filter straight into search_embeddings.
import { SupabaseClient } from '@supabase/supabase-js';

type SearchFilter = Record<string, unknown>;

interface SearchResult {
  id: string;
  document_id: string;
  chunk_text: string;
  similarity: number;
  metadata: Record<string, unknown>;
}

export async function runSearch(
  client: SupabaseClient,
  queryEmbedding: number[],
  matchCount = 5,
  filter: SearchFilter = {}
): Promise<SearchResult[]> {
  // One RPC call carries the whole search intent: vector, depth, and scope.
  const { data, error } = await client.rpc('search_embeddings', {
    query_embedding: queryEmbedding,
    match_count: matchCount,
    filter,
  });

  if (error) {
    throw error;
  }

  return (data ?? []) as SearchResult[];
}
I like this shape because it is hard to misunderstand. The only inputs that matter at call time are the embedding, the number of rows to return, and the structured scope filter. If I need to search within one repo, I pass a repo filter. If I need to narrow by language, I add a language key. If I need to restrict by file type or path, I add that too. The caller is not building a query plan; it is declaring intent.
The same function can be called with a narrow filter or an empty one. An empty filter means search the whole corpus. A populated filter means search the subset that matches the metadata predicate. That is a very different outcome, and I want that difference visible right where the request is created.
A typical call site stays just as clear:
const results = await runSearch(supabase, embedding, 8, {
  repo: 'the author/portfolio',
  language: 'typescript',
});
That is the point. The search boundary should be obvious at a glance. If I am debugging a bad answer later, I should be able to inspect one payload and know exactly what the database was asked to do.
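When I want that payload in front of me, a thin logging pass over the same arguments is enough. This is a sketch, not part of the wrapper above: runSearchLogged and the console.debug label are my own names, and it reuses runSearch and its types from the previous block.
// Hypothetical debugging helper: log the exact RPC payload before sending it.
// Everything the database will be asked to do is visible in this one object.
export async function runSearchLogged(
  client: SupabaseClient,
  queryEmbedding: number[],
  matchCount = 5,
  filter: SearchFilter = {}
): Promise<SearchResult[]> {
  console.debug('search_embeddings payload', {
    match_count: matchCount,
    filter,
    embedding_dimensions: queryEmbedding.length,
  });
  return runSearch(client, queryEmbedding, matchCount, filter);
}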
The database function matches the same interface
On the PostgreSQL side, the search_embeddings function accepts the same three inputs the caller sends. The metadata filter stays inside SQL, where it belongs. The rows are filtered first, then ranked by vector distance.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE OR REPLACE FUNCTION search_embeddings(
  query_embedding vector,
  match_count integer DEFAULT 5,
  filter jsonb DEFAULT '{}'::jsonb
)
RETURNS TABLE (
  id uuid,
  document_id uuid,
  chunk_text text,
  similarity double precision,
  metadata jsonb
)
LANGUAGE sql
STABLE
AS $$
  SELECT
    e.id,
    e.document_id,
    e.chunk_text,
    1 - (e.embedding <=> query_embedding) AS similarity,
    e.metadata
  FROM embeddings e
  WHERE filter = '{}'::jsonb OR e.metadata @> filter
  ORDER BY e.embedding <=> query_embedding
  LIMIT match_count;
$$;

CREATE INDEX IF NOT EXISTS idx_embeddings_embedding
  ON embeddings
  USING hnsw (embedding vector_cosine_ops);
The important line is the metadata predicate: e.metadata @> filter. That is not a clean-up step after ranking. It is part of the search itself. Rows that do not belong to the requested scope never enter the ranked candidate set.
That design matters because the database is the only place that can apply the filter consistently at the same moment it applies similarity. If the application filters after ranking, the query can still surface neighbors from the wrong scope first. If the filter is inside SQL, then ranking only happens across rows that already belong to the same metadata neighborhood as the request.
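A sketch of the anti-pattern makes the difference concrete. This is not how search_embeddings behaves; it is the app-side post-filtering I am arguing against, written with the same supabase and embedding placeholders as the call site above:
// Anti-pattern sketch: rank first, filter later.
// The LIMIT happens before the scope check, so rows from the wrong repo can
// crowd the correct rows out of the candidate set entirely.
const candidates = await runSearch(supabase, embedding, 8); // no filter: whole corpus
const scoped = candidates.filter(
  (row) => row.metadata.repo === 'the author/portfolio'
);
// scoped may now hold fewer than 8 rows, or none, even when matching rows exist.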
The similarity field is there for the caller's benefit. Internally, cosine distance still drives the ordering. Externally, I want a score where larger reads as better. Returning both the score and the raw chunk text gives the next stage enough context to render, inspect, or rerank without another round trip.
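A minimal sketch of what that next stage might do with the score; the 0.75 cutoff is an arbitrary example, not a tuned value:
// Illustrative post-processing on the caller side: keep high-scoring chunks only.
const hits = await runSearch(supabase, embedding, 8, { language: 'typescript' });
const confident = hits
  .filter((row) => row.similarity >= 0.75)
  .map((row) => ({ text: row.chunk_text, score: row.similarity }));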
The STABLE marker also fits the way I use the function. For a fixed snapshot and a fixed input payload, this is a deterministic retrieval step. It is not a side-effect machine. It is a search function.
Why JSONB belongs in the request, not outside it
The filter is JSONB because the scope is structured, not free-form. A metadata filter can say more than one thing at once, and it needs to do that without collapsing into a pile of ad hoc parameters. The same object can describe a repository slice, a file path constraint, a language constraint, or a file-type constraint.
That gives me a single place to express the search boundary. If I need to retrieve only TypeScript chunks from a particular repository, I do not want to assemble a special query for that case. I want to build a JSONB object like { repo: 'the author/portfolio', language: 'typescript' } and pass it straight through. The SQL predicate then enforces exactly that constraint.
This is also the reason I keep the filter visible at the RPC boundary instead of burying it in a helper that silently rewrites inputs. Hidden rewrite logic is how search calls become hard to reason about. The JSONB object is simple enough to inspect, easy to log, and unambiguous in SQL.
There is a second benefit that shows up during debugging. When a query returns too much, I can loosen the filter and see the effect immediately. When it returns too little, I can inspect the metadata keys that are actually present in the table. Because the filter is part of the request, there is no mystery about which layer decided the corpus was too broad or too narrow.
The same applies when I expand the metadata model. If I start attaching more structure to a chunk, such as file type or path segments, I do not need to change the RPC signature. I update the JSONB shape, then let the same search_embeddings function enforce the new predicate. The request boundary stays stable while the metadata vocabulary evolves.
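For example, assuming a new file_type key gets attached to chunk metadata (the key name and value here are hypothetical), only the JSONB payload changes:
// Hypothetical richer filter: the wrapper and the SQL function are untouched.
const testChunks = await runSearch(supabase, embedding, 8, {
  repo: 'the author/portfolio',
  language: 'typescript',
  file_type: 'test', // illustrative new metadata key
});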
Why the index is part of the same story
I do not think about the index as a separate optimization pass. It is part of the same guarantee that the RPC makes. If the function says it will search the embeddings table with cosine distance, the index should be built for that exact access path.
That is why the HNSW index sits beside the function in my mental model. The function defines the agreement. The index makes that agreement fast enough to use all the time. vector_cosine_ops matches the ranking strategy, so the storage layer is not fighting the retrieval layer.
The nice thing about HNSW here is that it matches the shape of the workload I care about: lots of dense vector searches, with a metadata filter that keeps the working set scoped before ranking. I am not trying to make the index do the job of the filter. I am making both pieces do the job they are good at. The metadata predicate narrows the rows. The vector index ranks the rows that remain.
That separation is what keeps the system predictable. If I ever need to inspect performance, I know where to look. If the wrong neighbors are showing up, I inspect the filter. If the right neighbors are slow, I inspect the index and the shape of the vector column. The responsibilities do not blur together.
What went wrong the first time
The first broken version of this path made the request feel more flexible than it really was. The caller knew one thing, the search function inferred another, and the database had to reconcile them later. That is exactly the sort of accidental complexity that makes retrieval bugs hard to pin down.
The visible symptom was a result set that looked sane at a glance. The hidden problem was that the request did not fully describe the scope of the search. That is why the bug survived long enough to matter. Nothing crashed. Nothing timed out. The system just answered the wrong question with confidence.
Once I stopped spreading that decision across layers, the failure mode disappeared. The request object became the single place where I could answer three questions at once:
- What text or embedding is this search about?
- How deep should the candidate set be?
- Which metadata fields are allowed to participate?
That is a much better debugging surface than a trail of local variables and implicit defaults. If I have to reason about a bad answer, I want to reason about one request object and one SQL function. That is enough.
Why I trust the boundary now
The version I trust is the one where the search request is explicit enough that the database never has to guess. The caller passes the vector, the candidate count, and the JSONB filter in one place. The SQL function applies the filter inside the ranking query. The HNSW index is built on the same embedding column the function reads. Every step agrees on the same shape.
That is the whole reason I keep search scope inside a single Supabase RPC. Not because it is fashionable, and not because it makes the code shorter, but because it keeps the search intent attached to the request that asked for it. The RPC boundary becomes the line where scope is declared and enforced.
Once I made that change, retrieval stopped feeling like a chain of guesses and started feeling like a reliable interface again. That matters in a system where a correct answer is only useful if it comes from the right slice of data. In the next pass, I am going to push that same discipline further into the ingestion side, because retrieval only stays honest when the embeddings and metadata that feed it are just as deliberate.
