Build answers from evidence, not memory.
A practical walkthrough of a .NET book-club RAG pipeline with Aspire, Qdrant, MinIO, Gemini or Ollama, and citation-backed answers.
Retrieval-augmented generation connects a language model to your own evidence.
Large language models are excellent at language, but they do not automatically know your private documents, your latest operational data, or the exact passages a user needs to trust an answer. RAG adds a retrieval step before generation: store source material, search for relevant chunks, pass those chunks to the model, and return an answer with citations.
That makes RAG the right fit whenever the answer should be grounded in a changing or private corpus: policies, tickets, books, customer records, manuals, research notes, or internal knowledge bases. The model writes the response, but the retrieval layer decides what evidence it is allowed to use.
In practice, model selection is only one of the hard parts. A professional RAG system must also make retrieval inspectable, distinguish primary source evidence from generated support material, evaluate answer behavior against known questions, and put guardrails around cost, latency, and user input.
Series Overview
This guide explains the current sample project as a source-code-backed series. It is written for engineers who already know basic C# and ASP.NET Core, but are still learning how modern RAG systems are assembled and evaluated.
The goal is not to present a perfect production architecture. The goal is to show how the pieces connect, where the boundaries are, and why those boundaries matter when building a document-ingestion and question-answering system in .NET.
Project workflow
The project implements this workflow:
1. Upload PDF/TXT
2. Store original file in object storage
3. Track metadata in SQLite
4. Worker extracts text
5. Generate book-club literary artifacts
6. Chunk source text and artifacts
7. Generate embeddings
8. Store vectors and citation payloads in Qdrant
9. Retrieve relevant chunks for a question
10. Send chunks to an LLM
11. Return answer + citations
At a high level, the system has six responsibilities:
- Orchestration: Aspire starts the API, worker, Qdrant, MinIO, and optionally Ollama.
- User interaction: The API hosts the upload and chat UI.
- Durable state: SQLite tracks document status; MinIO stores originals; Qdrant stores vectors.
- Ingestion: The worker converts files into searchable chunks.
- Answering: The ask service retrieves evidence and asks an LLM to answer from that evidence.
- Evaluation and operations: Tests, diagnostics, provenance, request limits, logging, and delete/reindex controls make the sample inspectable instead of opaque.
The most important design choice is that the API and worker do not know model-specific request formats. They depend on interfaces such as IEmbeddingProvider, IChatCompletionProvider, and IVectorStore. The same idea now applies beyond the AI providers: token estimation, reranking, ingestion work discovery, document management, diagnostics, and evaluation all have explicit seams so the sample can teach the engineering decisions behind RAG, not just the happy-path flow.
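To make the idea of provider-neutral seams concrete, here is a minimal sketch. Only the three interface names come from the project; every member signature, the VectorRecord and VectorHit records, and the FakeEmbeddingProvider are assumptions made for illustration and may not match the project's actual contracts.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical payload shapes (assumed, not taken from the project).
public sealed record VectorRecord(string Id, float[] Vector, IReadOnlyDictionary<string, string> Payload);
public sealed record VectorHit(string Id, float Score, IReadOnlyDictionary<string, string> Payload);

// Interface names are from the guide; the members are assumed.
public interface IEmbeddingProvider
{
    Task<float[]> EmbedAsync(string text, CancellationToken ct = default);
}

public interface IChatCompletionProvider
{
    Task<string> CompleteAsync(string systemPrompt, string userPrompt, CancellationToken ct = default);
}

public interface IVectorStore
{
    Task UpsertAsync(IReadOnlyList<VectorRecord> records, CancellationToken ct = default);
    Task<IReadOnlyList<VectorHit>> SearchAsync(float[] query, int limit, CancellationToken ct = default);
}

// Deterministic stand-in so workflow code can be exercised without a model.
public sealed class FakeEmbeddingProvider : IEmbeddingProvider
{
    public Task<float[]> EmbedAsync(string text, CancellationToken ct = default)
    {
        // Toy 3-dimensional "embedding": counts of vowels, other letters, and digits.
        // It carries no semantic meaning; it only gives tests a stable vector.
        var vector = new float[3];
        foreach (var c in text.ToLowerInvariant())
        {
            if ("aeiou".Contains(c)) vector[0]++;
            else if (char.IsLetter(c)) vector[1]++;
            else if (char.IsDigit(c)) vector[2]++;
        }
        return Task.FromResult(vector);
    }
}
```

With seams like these, the API and worker can be wired to Gemini, Ollama, or a fake by changing registration only, and tests can run without network access.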
This guide is now maintained as the narrative source of truth for the RAGPipeline learning project. It tracks the current source code directly, including retrieval diagnostics, generated-artifact provenance, citation labeling, request guardrails, evaluation tests, and operational seams.
The engineering habits behind credible RAG systems.
The technical chapters walk through the implementation, but the project is also meant to show how experienced engineers think about RAG: preserve source material, make derived artifacts explicit, inspect retrieval, evaluate behavior, and keep operational limits visible.
- Store original documents separately from vectors.
- Track ingestion as durable state.
- Keep long-running ingestion outside request/response paths.
- Use provider-neutral abstractions for AI services.
- Embed generated metadata as well as raw source text, and preserve its provenance.
- Tune retrieval based on expected question types.
- Combine vector search with structured retrieval and simple exact-name fallback.
- Make retrieval inspectable with diagnostics and rank reasons.
- Return citations that distinguish source chunks from generated retrieval aids.
- Surface ingestion failures and progress to the UI.
- Use guardrails for question size, selected documents, retrieval expansion, and provider timeouts.
- Test RAG behavior with deterministic golden-question evaluations.
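The last habit, deterministic golden-question evaluation, can be sketched in a few lines. The types and the hit-rate metric below are hypothetical and chosen for illustration; the project's own evaluation tests may measure different things.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types: a question paired with the chunk that should be retrieved for it.
public sealed record GoldenQuestion(string Question, string ExpectedChunkId);

public sealed record EvaluationResult(int Total, int Hits)
{
    public double HitRate => Total == 0 ? 0 : (double)Hits / Total;
}

public static class GoldenQuestionEvaluator
{
    // retrieve maps a question to ranked chunk ids. In a deterministic test it is
    // backed by fixtures or fakes rather than a live model or vector database.
    public static EvaluationResult Evaluate(
        IEnumerable<GoldenQuestion> questions,
        Func<string, IReadOnlyList<string>> retrieve,
        int topK)
    {
        int total = 0, hits = 0;
        foreach (var q in questions)
        {
            total++;
            // A hit means the expected chunk appears within the first topK results.
            if (retrieve(q.Question).Take(topK).Contains(q.ExpectedChunkId)) hits++;
        }
        return new EvaluationResult(total, hits);
    }
}
```

A test can then assert that the hit rate stays above a chosen threshold, so a retrieval change that silently drops expected evidence fails the build instead of surfacing in production answers.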
What still needs more rigor before real deployment.
This sample is a teaching project with production-shaped seams. It is useful for learning architecture, retrieval behavior, and evaluation habits, but it is not a secure or scalable deployment template by itself.
- Replace ad hoc SQLite schema updates with migrations.
- Add authentication, authorization, auditing, and retention policy.
- Support cloud object storage directly.
- Add provider implementations for Azure OpenAI, Bedrock, Vertex AI, or OpenAI.
- Add deeper observability around token usage, latency, provider errors, retrieval quality, and evaluation drift.
- Improve PDF extraction quality.
- Replace the database work source with queue infrastructure for multi-worker deployments.
- Add provider-compatible tokenization, optional model-based reranking, and citation faithfulness checks.
Solution Topology
Project layout and why ingestion runs outside the request path.
Aspire as the Local Control Plane
RAG.AppHost/AppHost.cs defines the local environment.
Shared Configuration and Contracts
Configuration and interfaces that keep workflow code provider-neutral.
Metadata with SQLite and EF Core
SQLite metadata for document status, progress, and lifecycle state.
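As an illustration of tracking ingestion as durable state, the sketch below models a document row with a status, a progress value, and guarded transitions. The status names, the DocumentRecord type, and the transition table are assumptions; the project's actual entity and EF Core mapping are not shown here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical status values; the project's actual names may differ.
public enum DocumentStatus { Uploaded, Processing, Indexed, Failed }

public sealed class DocumentRecord
{
    // Assumed transition table: which next states each state may move to.
    private static readonly Dictionary<DocumentStatus, DocumentStatus[]> Allowed = new()
    {
        [DocumentStatus.Uploaded] = new[] { DocumentStatus.Processing },
        [DocumentStatus.Processing] = new[] { DocumentStatus.Indexed, DocumentStatus.Failed },
        [DocumentStatus.Failed] = new[] { DocumentStatus.Processing },  // retry
        [DocumentStatus.Indexed] = new[] { DocumentStatus.Processing }, // reindex
    };

    public Guid Id { get; init; } = Guid.NewGuid();
    public string FileName { get; init; } = "";
    public DocumentStatus Status { get; private set; } = DocumentStatus.Uploaded;
    public int ProgressPercent { get; private set; }
    public string? Error { get; private set; }

    public void MoveTo(DocumentStatus next, string? error = null)
    {
        if (Array.IndexOf(Allowed[Status], next) < 0)
            throw new InvalidOperationException($"Cannot move from {Status} to {next}.");

        Status = next;
        // Keep the error message only while the document is in the Failed state.
        Error = next == DocumentStatus.Failed ? error : null;
        if (next == DocumentStatus.Processing) ProgressPercent = 0;
        if (next == DocumentStatus.Indexed) ProgressPercent = 100;
    }

    public void ReportProgress(int percent) => ProgressPercent = Math.Clamp(percent, 0, 100);
}
```

Guarding transitions in one place keeps the UI's progress display and the worker's retry logic working from the same rules.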
Upload API and UI
The upload endpoint is in RAG.Api/Program.cs.
Object Storage with MinIO
Original files are stored in object storage before indexing. The local implementation is RAG.Core/Services/S3ObjectStorage.cs.
Worker Ingestion Pipeline
RAG.Worker/Worker.cs is a polling background service. Every configured interval, it asks IDocumentIngestionService to process pending documents.
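The polling pattern can be sketched without any hosting dependencies. This is not the project's Worker.cs: the PollingLoop type is hypothetical, and the processPending delegate stands in for the call to IDocumentIngestionService, whose real method name is not given in this guide.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingLoop
{
    // In a hosted worker this body would typically live in BackgroundService.ExecuteAsync,
    // and processPending would call the ingestion service.
    public static async Task RunAsync(
        Func<CancellationToken, Task> processPending,
        TimeSpan interval,
        CancellationToken stoppingToken)
    {
        using var timer = new PeriodicTimer(interval);
        try
        {
            do
            {
                try
                {
                    await processPending(stoppingToken);
                }
                catch (Exception ex) when (ex is not OperationCanceledException)
                {
                    // One failed pass should not end the loop; report it and retry on the next tick.
                    Console.Error.WriteLine($"Ingestion pass failed: {ex.Message}");
                }
            }
            while (await timer.WaitForNextTickAsync(stoppingToken));
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal shutdown path.
        }
    }
}
```

The loop runs one pass immediately and then one pass per tick. Because PeriodicTimer does not overlap ticks, a slow ingestion pass delays the next one instead of running concurrently with it.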
Extracting and Chunking Text
Text extraction lives in RAG.Core/Services/TextExtractor.cs.
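Once text is extracted it has to be split into chunks. A minimal sketch of one common approach, fixed-size character windows with overlap, follows. The TextChunker and TextChunk names, and the choice of character-based windows, are assumptions; the project may chunk by sentences, paragraphs, or estimated tokens instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chunk shape: position in the sequence, offset in the source text, and content.
public sealed record TextChunk(int Index, int Start, string Text);

public static class TextChunker
{
    public static IReadOnlyList<TextChunk> Chunk(string text, int chunkSize, int overlap)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<TextChunk>();
        var step = chunkSize - overlap; // each window starts this far after the previous one
        for (var start = 0; start < text.Length; start += step)
        {
            var length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(new TextChunk(chunks.Count, start, text.Substring(start, length)));
            if (start + length >= text.Length) break; // this window already reached the end
        }
        return chunks;
    }
}
```

Overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice. Keeping the start offset makes it possible to point a citation back at the original passage.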
Literary Artifacts
Generated literary profiles improve broad book-club retrieval without pretending to be source evidence.
AI Provider Abstractions
The project supports Ollama and Gemini through provider implementations.
Qdrant Vector Storage
RAG.Core/Services/QdrantVectorStore.cs owns Qdrant interaction.
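To show what a vector store contributes conceptually, here is an in-memory stand-in that keeps vectors alongside a citation payload and ranks them by cosine similarity. It does not use the Qdrant client and is not the project's QdrantVectorStore; the type names, the payload keys in the usage note, and the assumption of a cosine metric are all illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical point shape: an id, an embedding, and a payload used to build citations.
public sealed record StoredPoint(string Id, float[] Vector, IReadOnlyDictionary<string, string> Payload);

public sealed class InMemoryVectorIndex
{
    private readonly Dictionary<string, StoredPoint> _points = new();

    // Upsert semantics: writing an existing id replaces the stored point.
    public void Upsert(StoredPoint point) => _points[point.Id] = point;

    // Brute-force search: score every point, then keep the best `limit` results.
    public IReadOnlyList<(StoredPoint Point, double Score)> Search(float[] query, int limit) =>
        _points.Values
            .Select(p => (Point: p, Score: Cosine(query, p.Vector)))
            .OrderByDescending(x => x.Score)
            .Take(limit)
            .ToList();

    private static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Vector dimensions differ.");
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

A payload such as { "documentId": "...", "kind": "source" } travels with each hit, which is what lets the answer path label a citation as a source chunk or a generated aid without a second lookup. A real vector database replaces the brute-force scan with an approximate index.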
Ask Flow and Retrieval Strategy
Query expansion, reranking, diagnostics, and request guardrails in the ask path.
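The guardrail part of the ask path can be sketched as plain validation that runs before any retrieval or model call. The AskLimits defaults, the AskRequest shape, and the AskGuardrails helper are hypothetical; the project's configured limits and request types are not shown in this guide.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical limits; the numbers are placeholders, not the project's configuration.
public sealed record AskLimits(int MaxQuestionChars = 2000, int MaxSelectedDocuments = 10, int MaxExpandedQueries = 4);

public sealed record AskRequest(string Question, IReadOnlyList<Guid> DocumentIds);

public static class AskGuardrails
{
    // Returns null when the request is acceptable, otherwise a message for the caller.
    public static string? Validate(AskRequest request, AskLimits limits)
    {
        if (string.IsNullOrWhiteSpace(request.Question))
            return "Question is required.";
        if (request.Question.Length > limits.MaxQuestionChars)
            return $"Question exceeds {limits.MaxQuestionChars} characters.";
        if (request.DocumentIds.Count > limits.MaxSelectedDocuments)
            return $"Select at most {limits.MaxSelectedDocuments} documents.";
        return null;
    }

    // Caps query expansion so one question cannot fan out into unbounded searches.
    public static IReadOnlyList<string> CapExpansions(IEnumerable<string> queries, AskLimits limits) =>
        queries
            .Where(q => !string.IsNullOrWhiteSpace(q))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Take(limits.MaxExpandedQueries)
            .ToList();
}
```

Rejecting oversized input up front bounds embedding and completion cost per request, and capping expansions bounds the number of vector searches a single question can trigger.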
Prompting and Citations
Provider prompts turn selected evidence into answers with inspectable citations.
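One way to keep citations inspectable is to number each piece of evidence in the prompt and label whether it is source text or a generated aid. The sketch below assumes that approach; the PromptBuilder, Evidence, and EvidenceKind names and the instruction wording are illustrative and are not the project's actual prompts.

```csharp
using System.Collections.Generic;
using System.Text;

public enum EvidenceKind { Source, GeneratedArtifact }

// Hypothetical evidence shape passed from retrieval to prompting.
public sealed record Evidence(string DocumentTitle, string Text, EvidenceKind Kind);

public static class PromptBuilder
{
    public static string Build(string question, IReadOnlyList<Evidence> evidence)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Answer using only the numbered evidence below.");
        sb.AppendLine("Cite evidence as [n]. If the evidence is insufficient, say so.");
        sb.AppendLine();

        for (var i = 0; i < evidence.Count; i++)
        {
            var e = evidence[i];
            // The label lets both the model and the reader tell source text from generated aids.
            var label = e.Kind == EvidenceKind.Source ? "source" : "generated aid";
            sb.AppendLine($"[{i + 1}] ({label}) {e.DocumentTitle}: {e.Text}");
        }

        sb.AppendLine();
        sb.Append("Question: ").Append(question);
        return sb.ToString();
    }
}
```

Because the numbers in the prompt match positions in the evidence list, a [2] in the model's answer can be mapped back to the exact chunk and its provenance when the response is rendered.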
Testing the Pipeline
The tests are intentionally focused rather than exhaustive.
Local Development Notes
Commands and local URLs for Gemini, Ollama, Aspire, Qdrant, and MinIO.