Chapter 14

Testing the Pipeline

The tests are intentionally focused rather than exhaustive.

Run the test suite from the repo root:

dotnet test RAGPipeline.sln

Full source: RAG.Tests/TextChunkerTests.cs
Full source: RAG.Tests/AiProviderRegistrationTests.cs
Full source: RAG.Tests/ChatAnswerServiceTests.cs
Full source: RAG.Tests/DocumentIngestionServiceTests.cs

TextChunkerTests checks chunk sizing and overlap behavior.
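To make "chunk sizing and overlap" concrete, here is a minimal sketch of the kind of property such a test might assert. SketchChunker and SketchChunkerChecks are hypothetical stand-ins written for this illustration. The project's actual TextChunker may split on different boundaries and expose a different signature.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the project's chunker; the real TextChunker API may differ.
public static class SketchChunker
{
    // Splits text into windows of at most chunkSize characters, where each
    // window starts (chunkSize - overlap) characters after the previous one.
    public static List<string> Chunk(string text, int chunkSize, int overlap)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<string>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            if (start + length >= text.Length) break; // this window reached the end of the text
        }
        return chunks;
    }
}

// Hypothetical helper showing the two properties a sizing/overlap test might check.
public static class SketchChunkerChecks
{
    // Throws if a chunk is too long or adjacent chunks do not share the expected overlap.
    public static void AssertSizingAndOverlap(IReadOnlyList<string> chunks, int chunkSize, int overlap)
    {
        for (int i = 0; i < chunks.Count; i++)
        {
            if (chunks[i].Length > chunkSize)
                throw new InvalidOperationException($"Chunk {i} exceeds {chunkSize} characters.");
            if (i == 0) continue;

            string previousTail = chunks[i - 1].Substring(chunks[i - 1].Length - overlap);
            string currentHead = chunks[i].Substring(0, Math.Min(overlap, chunks[i].Length));
            if (previousTail != currentHead)
                throw new InvalidOperationException($"Chunks {i - 1} and {i} do not share {overlap} characters.");
        }
    }
}
```

With this sketch, SketchChunker.Chunk("abcdefghij", 4, 1) yields "abcd", "defg", and "ghij": every chunk is at most four characters, and each chunk begins with the last character of the one before it.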

AiProviderRegistrationTests verifies provider selection through configuration.

ChatAnswerServiceTests verifies:

retrieved chunks become citations;
broad character questions expand retrieval;
protagonist questions can use document profiles;
comparison questions retrieve evidence for each named subject;
unrelated documents are filtered from comparison context;
citation provenance is returned for generated artifacts;
diagnostics are included when requested;
question length, selected-document, retrieval-query, and timeout guardrails behave correctly.

Golden-Question Evaluation

RAG.Tests/Evaluation/RagEvaluationTests.cs is a small RAG evaluation harness. It uses deterministic fakes for embeddings, vector search, and chat completion, so it can run as an ordinary unit test without Docker, Qdrant, Ollama, Gemini, or network access.
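One way to build a deterministic fake embedding is to hash each token into a fixed-size vector. The sketch below illustrates that idea only. FakeEmbedding, its method names, and the 16-dimension default are assumptions made for this example, and the fakes in the actual harness may work differently.

```csharp
using System;

// Hypothetical fake; the harness's real fakes may be implemented differently.
public static class FakeEmbedding
{
    private static readonly char[] Separators =
        { ' ', '\t', '\n', '\r', '.', ',', '?', '!', ';', ':' };

    // Maps text to a fixed-length bag-of-words vector using a stable hash,
    // so the same text produces the same vector on every run and machine.
    public static float[] Embed(string text, int dimensions = 16)
    {
        var vector = new float[dimensions];
        var tokens = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            int bucket = (int)(StableHash(token) % (uint)dimensions);
            vector[bucket] += 1f;
        }
        return vector;
    }

    // FNV-1a-style hash over UTF-16 code units. string.GetHashCode is
    // randomized per process on modern .NET, so it is avoided here.
    private static uint StableHash(string token)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in token)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Cosine similarity between two vectors of equal length; 0 if either is all zeros.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Because the vector depends only on the tokens, a fake like this lets a test control which chunks rank highest for a question simply by choosing shared words, with no model or network call involved.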

The golden cases check direct factual retrieval, broad literary retrieval, cross-document comparison, selected-document constraints, no-evidence handling, and generated-artifact citation labeling. The assertions focus on selected context and citation types because those are the parts a deterministic test can judge reliably.

public sealed record RagEvaluationCase(
    string Name,
    string Question,
    IReadOnlyList<Guid>? DocumentIds,
    IReadOnlyList<string> ExpectedFileNames,
    IReadOnlyList<string> ExpectedTermsInSelectedContext,
    bool RequiresSourceCitation,
    bool AllowsGeneratedArtifactCitation);

This does not replace human review of answer quality, but it gives the project a regression suite for retrieval behavior. That is a major step beyond manual "ask it a few questions" testing.
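As an illustration, the sketch below shows one plausible way the fields of RagEvaluationCase could drive assertions on selected context and citation types. SelectedChunk, EvaluationOutcome, GoldenCaseChecker, and the "source" and "generated" labels are hypothetical names introduced for this example. The real harness may model results and citation types differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Repeated from the chapter so the sketch compiles on its own.
public sealed record RagEvaluationCase(
    string Name,
    string Question,
    IReadOnlyList<Guid>? DocumentIds,
    IReadOnlyList<string> ExpectedFileNames,
    IReadOnlyList<string> ExpectedTermsInSelectedContext,
    bool RequiresSourceCitation,
    bool AllowsGeneratedArtifactCitation);

// Hypothetical shapes for what the pipeline returns; the real types may differ.
public sealed record SelectedChunk(string FileName, string Text);
public sealed record EvaluationOutcome(
    IReadOnlyList<SelectedChunk> SelectedContext,
    IReadOnlyList<string> CitationTypes); // assumed labels: "source", "generated"

public static class GoldenCaseChecker
{
    // Returns human-readable failures; an empty list means the case passed.
    public static List<string> Check(RagEvaluationCase goldenCase, EvaluationOutcome outcome)
    {
        var failures = new List<string>();

        var selectedFiles = outcome.SelectedContext
            .Select(chunk => chunk.FileName)
            .ToHashSet(StringComparer.OrdinalIgnoreCase);
        foreach (var file in goldenCase.ExpectedFileNames)
        {
            if (!selectedFiles.Contains(file))
                failures.Add($"{goldenCase.Name}: expected context from '{file}'.");
        }

        var contextText = string.Join("\n", outcome.SelectedContext.Select(chunk => chunk.Text));
        foreach (var term in goldenCase.ExpectedTermsInSelectedContext)
        {
            if (!contextText.Contains(term, StringComparison.OrdinalIgnoreCase))
                failures.Add($"{goldenCase.Name}: selected context is missing '{term}'.");
        }

        if (goldenCase.RequiresSourceCitation && !outcome.CitationTypes.Contains("source"))
            failures.Add($"{goldenCase.Name}: no source citation was returned.");

        if (!goldenCase.AllowsGeneratedArtifactCitation && outcome.CitationTypes.Contains("generated"))
            failures.Add($"{goldenCase.Name}: generated-artifact citation is not allowed.");

        return failures;
    }
}
```

Returning a list of failures instead of throwing on the first mismatch is a design choice for this sketch: it lets a single test run report every way a golden case regressed.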

Additional Production-Seam Tests

DatabaseIngestionWorkSourceTests, DocumentManagementServiceTests, HeuristicRetrievalRerankerTests, and QdrantVectorStoreTests cover, respectively, the new seams for polling replacement, delete/reindex state transitions, heuristic reranking, and vector-payload provenance.

The project also benefits from manual testing because RAG behavior depends on real documents, model output, and vector search quality.

Recommended manual checks:

Upload a short TXT file.
Upload a PDF book.
Confirm progress moves through ingestion stages.
Ask a direct question about one document.
Ask a broad character question.
Ask a cross-document comparison question.
Confirm citations reference the expected documents.
