Chapter 12

Ask Flow and Retrieval Strategy

How the ask path expands queries, reranks evidence, exposes diagnostics, and enforces request guardrails.

The ask endpoint delegates retrieval and answer generation

app.MapPost("/api/ask", async (AskRequest request, IChatAnswerService chatAnswerService, CancellationToken cancellationToken) =>
{
    var response = await chatAnswerService.AskAsync(request, cancellationToken);
    return Results.Ok(response);
});

Full source: RAG.Api/Program.cs
Full source: RAG.Core/Services/ChatAnswerService.cs
Full source: RAG.Tests/ChatAnswerServiceTests.cs

POST /api/ask

It accepts a JSON body:

{
  "question": "Can you compare Calpurnia and Hermione?",
  "documentIds": null,
  "includeDiagnostics": false
}

RAG.Core/Services/ChatAnswerService.cs handles the workflow:

validate the question;
build multiple retrieval queries;
embed each query;
search Qdrant;
detect comparison-style questions;
extract named subjects;
add matching literary profiles;
add exact-name chunks;
rank and filter candidates;
send selected chunks to the chat provider;
return answer and citations.

The ask path enforces configured guardrails. Before retrieval begins, the question must be present, its length must not exceed MaxQuestionCharacters, and the number of selected document IDs must not exceed MaxSelectedDocuments. During retrieval, the number of generated queries is capped by MaxRetrievalQueries, and provider calls run with a linked timeout set by ProviderTimeoutSeconds.

Production note: these limits do not replace authentication, authorization, rate limiting, or billing controls. They are local safety rails that keep a tutorial app from accepting unbounded work before it starts embedding queries or calling a model provider.
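The guardrails above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ChatAnswerService.cs: the AskGuardrailOptions record, the AskGuardrails helper, the error messages, and the string type for document IDs are all hypothetical. Only the setting names come from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical request shape; property names mirror the JSON body shown earlier.
public sealed record AskRequest(string? Question, IReadOnlyList<string>? DocumentIds, bool IncludeDiagnostics);

// Hypothetical options type; the property names are the settings named in the text.
public sealed record AskGuardrailOptions(
    int MaxQuestionCharacters,
    int MaxSelectedDocuments,
    int MaxRetrievalQueries,
    int ProviderTimeoutSeconds);

public static class AskGuardrails
{
    // Returns null when the request passes, otherwise a rejection reason.
    public static string? Validate(AskRequest request, AskGuardrailOptions options)
    {
        if (string.IsNullOrWhiteSpace(request.Question))
            return "Question is required.";
        if (request.Question.Length > options.MaxQuestionCharacters)
            return $"Question exceeds {options.MaxQuestionCharacters} characters.";
        // A null DocumentIds list means "no selection", so the comparison is false.
        if (request.DocumentIds?.Count > options.MaxSelectedDocuments)
            return $"At most {options.MaxSelectedDocuments} documents can be selected.";
        return null;
    }

    // Runs a provider call under a token linked to the caller's token, so either a
    // client disconnect or the configured timeout cancels the call.
    public static async Task<T> WithProviderTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> providerCall,
        AskGuardrailOptions options,
        CancellationToken callerToken)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        linked.CancelAfter(TimeSpan.FromSeconds(options.ProviderTimeoutSeconds));
        return await providerCall(linked.Token);
    }
}
```

Validating before any embedding or provider call means a rejected request costs nothing beyond the check itself. The linked token keeps the caller's cancellation and the timeout in one place, so the provider client only has to observe a single token.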

Broad literary questions get expanded queries:

literary book club profile protagonists major characters themes...
main characters protagonists central people important names...
who are the key people and what roles do they have...
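One way to combine query expansion with the MaxRetrievalQueries cap is sketched below. The RetrievalQueryBuilder class and its Build method are hypothetical names, and the expansion strings are passed in by the caller rather than reproduced here, because the real templates are only shown in truncated form above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetrievalQueryBuilder
{
    // Puts the user's question first so the cap can never drop it, then appends
    // expansions, removes blanks and case-insensitive duplicates, and applies the cap.
    public static IReadOnlyList<string> Build(
        string question,
        IEnumerable<string> expansions,
        int maxRetrievalQueries)
    {
        if (maxRetrievalQueries < 1)
            throw new ArgumentOutOfRangeException(nameof(maxRetrievalQueries));

        return new[] { question }
            .Concat(expansions)
            .Select(q => q.Trim())
            .Where(q => q.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Take(maxRetrievalQueries)
            .ToList();
    }
}
```

Each query that survives the cap costs one embedding call and one vector search, which is why the cap is applied before any embedding happens in this sketch.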

Retrieval Is a Ranking Problem

The ask service now separates three ideas that are often blurred in simple RAG demos: candidate generation, reranking, and context selection. Semantic search creates candidates, IRetrievalReranker assigns final ranks with reasons, and context selection deduplicates and trims the final evidence sent to the chat model.

The default HeuristicRetrievalReranker starts with the vector score, then adds explicit boosts for generated profiles, named-subject query matches, and comparison named-subject matches. It returns RankedChunk records with rank reasons, which later appear in diagnostics.
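The scoring idea can be sketched as below. This is not the project's HeuristicRetrievalReranker: the Candidate and RankedChunk shapes, the boost sizes, the reason strings, and the substring test used to detect a named-subject match are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; the real records may carry different fields.
public sealed record Candidate(string ChunkId, string Text, double VectorScore, bool IsGeneratedProfile);
public sealed record RankedChunk(Candidate Candidate, double FinalScore, int Rank, IReadOnlyList<string> Reasons);

public static class HeuristicRerankSketch
{
    // Illustrative boost sizes, not the project's values.
    private const double ProfileBoost = 0.10;
    private const double NamedSubjectBoost = 0.15;
    private const double ComparisonSubjectBoost = 0.10;

    public static IReadOnlyList<RankedChunk> Rerank(
        IEnumerable<Candidate> candidates,
        IReadOnlyCollection<string> namedSubjects,
        bool comparisonMode)
    {
        return candidates
            .Select(c => ScoreCandidate(c, namedSubjects, comparisonMode))
            .OrderByDescending(s => s.Total)
            .Select((s, index) => new RankedChunk(s.Candidate, s.Total, index + 1, s.Reasons))
            .ToList();
    }

    // Starts from the vector score and records a reason for every boost applied.
    private static (Candidate Candidate, double Total, List<string> Reasons) ScoreCandidate(
        Candidate c, IReadOnlyCollection<string> namedSubjects, bool comparisonMode)
    {
        var total = c.VectorScore;
        var reasons = new List<string> { "vector-score" };

        if (c.IsGeneratedProfile)
        {
            total += ProfileBoost;
            reasons.Add("generated-profile");
        }

        var mentionsSubject = namedSubjects.Any(
            s => c.Text.Contains(s, StringComparison.OrdinalIgnoreCase));
        if (mentionsSubject)
        {
            total += NamedSubjectBoost;
            reasons.Add("named-subject-match");
            if (comparisonMode)
            {
                total += ComparisonSubjectBoost;
                reasons.Add("comparison-subject-match");
            }
        }

        return (c, total, reasons);
    }
}
```

Keeping the reasons next to the score is what makes the later diagnostics cheap to produce: the reranker already knows why each chunk moved.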

Comparison questions get extra handling:

terms like similarities, compare, between, both, contrast, and differences activate comparison mode;
capitalized names are treated as named subjects;
unrelated low-rank documents are filtered out;
citations are returned for the full context sent to the LLM.

This design is still generic. It is not hardcoded to Calpurnia, Hermione, Harry Potter, or Eisenhorn. It uses question shape and names to retrieve better evidence.
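A rough sketch of the question-shape checks follows. The comparison terms are the ones listed above; the QuestionShape class, the regular expressions, and the small stop list that keeps sentence-initial words such as "Can" from being treated as names are assumptions, and the real service may handle these cases differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuestionShape
{
    // Terms named in the text as comparison triggers.
    private static readonly string[] ComparisonTerms =
        { "similarities", "compare", "between", "both", "contrast", "differences" };

    // Matches any trigger term as a whole word, ignoring case.
    private static readonly Regex ComparisonPattern = new(
        @"\b(?:" + string.Join("|", ComparisonTerms) + @")\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // An uppercase letter followed by lowercase letters, as a whole word.
    private static readonly Regex CapitalizedWord = new(@"\b\p{Lu}\p{Ll}+\b");

    // Assumption: capitalized words that commonly start a question and are not names.
    private static readonly HashSet<string> NonNames = new(StringComparer.OrdinalIgnoreCase)
    {
        "Can", "Could", "What", "Who", "How", "Why", "Where", "When",
        "Do", "Does", "Is", "Are", "The", "Compare", "Contrast",
    };

    public static bool IsComparison(string question) =>
        ComparisonPattern.IsMatch(question);

    // Returns distinct capitalized words in order of appearance, minus the stop list.
    public static IReadOnlyList<string> ExtractNamedSubjects(string question) =>
        CapitalizedWord.Matches(question)
            .Select(m => m.Value)
            .Where(w => !NonNames.Contains(w))
            .Distinct(StringComparer.Ordinal)
            .ToList();
}
```

Nothing in the sketch refers to a particular book or character, which matches the point above: the heuristics key off the shape of the question, not its subject.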

Retrieval Diagnostics

For local debugging, callers can set includeDiagnostics on POST /api/ask or call the dedicated debug endpoint:

POST /api/ask/debug

Diagnostics include expanded query text, named-subject extraction, comparison-mode detection, raw vector scores, final ranks, rank reasons, selected context, and whether candidates were filtered by comparison threshold, deduplication, or context limits.

Diagnostics matter most when an answer is bad. An engineer can inspect whether retrieval found the wrong chunks, ranking preferred the wrong evidence, the context budget dropped useful material, or the model ignored good evidence.
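To make the "filtered by comparison threshold, deduplication, or context limits" part concrete, here is a minimal sketch of context selection that records why each candidate was kept or dropped. The RankedItem and SelectionDecision records, the reason strings, and the exact-text duplicate check are hypothetical; the real diagnostics payload may look different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes for illustration.
public sealed record RankedItem(string ChunkId, string Text, double Score);
public sealed record SelectionDecision(string ChunkId, bool Selected, string? DropReason);

public static class ContextSelectionSketch
{
    // Walks candidates best-first and records one decision per candidate, so a
    // diagnostics response can explain every chunk that did not reach the model.
    public static IReadOnlyList<SelectionDecision> Select(
        IEnumerable<RankedItem> rankedBestFirst,
        bool comparisonMode,
        double comparisonThreshold,
        int maxContextChunks)
    {
        var decisions = new List<SelectionDecision>();
        var seenTexts = new HashSet<string>(StringComparer.Ordinal);
        var selectedCount = 0;

        foreach (var item in rankedBestFirst)
        {
            string? dropReason = null;
            if (comparisonMode && item.Score < comparisonThreshold)
                dropReason = "comparison-threshold";
            else if (!seenTexts.Add(item.Text.Trim()))
                dropReason = "duplicate";
            else if (selectedCount >= maxContextChunks)
                dropReason = "context-limit";

            if (dropReason is null)
                selectedCount++;

            decisions.Add(new SelectionDecision(item.ChunkId, dropReason is null, dropReason));
        }

        return decisions;
    }
}
```

With decisions like these in the response, a bad answer can be traced to a specific stage: a useful chunk marked context-limit points at the budget, while one that never appears as a candidate points back at retrieval.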
