Introduction to RAG

Build answers from evidence, not memory.

A practical walkthrough of a .NET book-club RAG pipeline with Aspire, Qdrant, MinIO, Gemini or Ollama, and citation-backed answers.

Illustration: a retrieval-augmented generation pipeline from documents to citations.

What is RAG?

Retrieval-augmented generation connects a language model to your own evidence.

Large language models are excellent at language, but they do not automatically know your private documents, your latest operational data, or the exact passages a user needs to trust an answer. RAG adds a retrieval step before generation: store source material, search for relevant chunks, pass those chunks to the model, and return an answer with citations.
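
The retrieve-then-generate loop described above can be sketched in a few lines of C#. This is a minimal illustration, not the project's code: `Chunk`, `MiniRag`, the toy three-dimensional vectors, and the prompt wording are all assumptions made up for this example, and a real pipeline would get its vectors from an embedding model and its ranking from a vector database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy corpus (hypothetical data): real vectors would come from an embedding model.
var chunks = new List<Chunk>
{
    new("c1", "The keeper declines the offer.", new[] { 0.9f, 0.1f, 0.0f }),
    new("c2", "A long letter arrives in spring.", new[] { 0.1f, 0.9f, 0.1f }),
    new("c3", "The village holds a dance.", new[] { 0.2f, 0.2f, 0.9f }),
};

// Stand-in for the embedded question; assumed to be close to chunk c1.
var questionVector = new[] { 1.0f, 0.2f, 0.0f };

var top = MiniRag.Retrieve(chunks, questionVector, topK: 2);
Console.WriteLine(MiniRag.BuildPrompt("Who declines an offer?", top));

public sealed record Chunk(string Id, string Text, float[] Vector);

public static class MiniRag
{
    // Cosine similarity; assumes equal-length, non-zero vectors.
    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Retrieval step: rank every chunk against the query and keep the best topK.
    public static List<Chunk> Retrieve(IEnumerable<Chunk> chunks, float[] query, int topK) =>
        chunks.OrderByDescending(c => Cosine(c.Vector, query)).Take(topK).ToList();

    // Generation input: the model only sees the numbered evidence it may cite.
    public static string BuildPrompt(string question, IReadOnlyList<Chunk> evidence)
    {
        var numbered = evidence.Select((c, i) => $"[{i + 1}] ({c.Id}) {c.Text}");
        return "Answer using only the numbered evidence and cite it like [1].\n\n"
             + string.Join("\n", numbered)
             + $"\n\nQuestion: {question}";
    }
}
```

With these toy vectors, chunk `c1` ranks first and `c2` second, so the prompt carries exactly those two passages. The numbering is what lets the final answer point back to specific evidence.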

That makes RAG necessary whenever the answer should be grounded in a changing or private corpus: policies, tickets, books, customer records, manuals, research notes, or internal knowledge bases. The model writes the response, but the retrieval layer decides what evidence it is allowed to use.

In practice, the hard parts are not only model selection. A professional RAG system must make retrieval inspectable, distinguish primary source evidence from generated support material, evaluate answer behavior against known questions, and put guardrails around cost, latency, and user input.

Upload → Extract → Chunk → Embed → Retrieve → Answer → Evaluate

Series Overview

This guide explains the current sample project as a source-code-backed series. It is written for engineers who already know basic C# and ASP.NET Core, but are still learning how modern RAG systems are assembled and evaluated.

The goal is not to present a perfect production architecture. The goal is to show how the pieces connect, where the boundaries are, and why those boundaries matter when building a document-ingestion and question-answering system in .NET.

Project workflow

The project implements this workflow:

1. Upload PDF/TXT
2. Store original file in object storage
3. Track metadata in SQLite
4. Worker extracts text
5. Generate book-club literary artifacts
6. Chunk source text and artifacts
7. Generate embeddings
8. Store vectors and citation payloads in Qdrant
9. Retrieve relevant chunks for a question
10. Send chunks to an LLM
11. Return answer + citations
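
Of the steps above, chunking (step 6) is the easiest to show in isolation. The sketch below splits text into fixed-size character windows with overlap. It is an assumption-laden illustration: `TextChunker`, `TextChunk`, and the character-based sizing are hypothetical, and the project's own chunker may split on tokens, sentences, or pages instead.

```csharp
using System;
using System.Collections.Generic;

var text = "abcdefghijklmnopqrstuvwxyz";
foreach (var chunk in TextChunker.Split(text, chunkSize: 10, overlap: 3))
    Console.WriteLine($"{chunk.Start,2}: {chunk.Text}");

// Start is the character offset in the source, which a citation can point back to.
public sealed record TextChunk(int Start, string Text);

public static class TextChunker
{
    public static List<TextChunk> Split(string text, int chunkSize, int overlap)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize)
            throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<TextChunk>();
        var step = chunkSize - overlap; // each window starts this far after the previous one

        for (var start = 0; start < text.Length; start += step)
        {
            var length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(new TextChunk(start, text.Substring(start, length)));

            // Stop once a window reaches the end, so no trailing chunk
            // is emitted that lies entirely inside the previous one.
            if (start + length >= text.Length) break;
        }

        return chunks;
    }
}
```

For the 26-letter input this yields four chunks starting at offsets 0, 7, 14, and 21, each sharing three characters with its neighbor. Overlap trades some index size for a lower chance that a relevant sentence is cut in half at a chunk boundary.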

At a high level, the system has six responsibilities:

The most important design choice is that the API and worker do not know model-specific request formats. They depend on interfaces such as IEmbeddingProvider, IChatCompletionProvider, and IVectorStore. The same idea now applies beyond the AI providers: token estimation, reranking, ingestion work discovery, document management, diagnostics, and evaluation all have explicit seams, so the sample can teach the engineering decisions behind RAG and not just the happy-path flow.
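
To make the idea of a seam concrete, here is a minimal sketch of workflow code that depends only on interfaces. The interface names come from the project, but every member signature below is an assumption invented for illustration, as are `AskWorkflow` and the three in-memory fakes. The real contracts are likely richer (cancellation tokens, citation payloads, batching).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Wire the workflow to in-memory fakes. Swapping in a real provider
// would change only these constructor calls, not AskWorkflow.
var embedder = new FakeEmbeddingProvider();
var store = new InMemoryVectorStore();
await store.UpsertAsync("chunk-1", await embedder.EmbedAsync("whales"), "A chapter about whales.");
await store.UpsertAsync("chunk-2", await embedder.EmbedAsync("a long tea party"), "A chapter about a tea party.");

var workflow = new AskWorkflow(embedder, store, new EchoChatProvider());
Console.WriteLine(await workflow.AskAsync("whale?"));

// Hypothetical member signatures, assumed for this sketch only.
public interface IEmbeddingProvider { Task<float[]> EmbedAsync(string text); }
public interface IChatCompletionProvider { Task<string> CompleteAsync(string prompt); }
public interface IVectorStore
{
    Task UpsertAsync(string id, float[] vector, string text);
    Task<IReadOnlyList<string>> SearchAsync(float[] query, int topK);
}

// Workflow code: knows the interfaces, never a concrete model or database.
public sealed class AskWorkflow
{
    private readonly IEmbeddingProvider _embeddings;
    private readonly IVectorStore _store;
    private readonly IChatCompletionProvider _chat;

    public AskWorkflow(IEmbeddingProvider embeddings, IVectorStore store, IChatCompletionProvider chat)
        => (_embeddings, _store, _chat) = (embeddings, store, chat);

    public async Task<string> AskAsync(string question)
    {
        var vector = await _embeddings.EmbedAsync(question);
        var evidence = await _store.SearchAsync(vector, topK: 1);
        return await _chat.CompleteAsync($"Evidence: {string.Join(" | ", evidence)}\nQuestion: {question}");
    }
}

// Fake embedding: [text length, vowel count]. Deterministic, not meaningful.
public sealed class FakeEmbeddingProvider : IEmbeddingProvider
{
    public Task<float[]> EmbedAsync(string text) =>
        Task.FromResult(new float[] { text.Length, text.Count(c => "aeiou".Contains(c)) });
}

// Fake chat model: echoes the prompt so the selected evidence is visible.
public sealed class EchoChatProvider : IChatCompletionProvider
{
    public Task<string> CompleteAsync(string prompt) => Task.FromResult("[echo] " + prompt);
}

// Fake vector store: brute-force nearest neighbor by squared Euclidean distance.
public sealed class InMemoryVectorStore : IVectorStore
{
    private readonly List<(string Id, float[] Vector, string Text)> _items = new();

    public Task UpsertAsync(string id, float[] vector, string text)
    {
        _items.RemoveAll(item => item.Id == id);
        _items.Add((id, vector, text));
        return Task.CompletedTask;
    }

    public Task<IReadOnlyList<string>> SearchAsync(float[] query, int topK)
    {
        IReadOnlyList<string> hits = _items
            .OrderBy(item => item.Vector.Zip(query, (a, b) => (a - b) * (a - b)).Sum())
            .Take(topK)
            .Select(item => item.Text)
            .ToList();
        return Task.FromResult(hits);
    }
}
```

The same fakes that make this sketch runnable are what make deterministic tests possible: the workflow can be exercised end to end without a model, a network call, or a running vector database.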

This guide is now maintained as the narrative source of truth for the RAGPipeline learning project. It tracks the current source code directly, including retrieval diagnostics, generated-artifact provenance, citation labeling, request guardrails, evaluation tests, and operational seams.

What this guide teaches

The engineering habits behind credible RAG systems.

The technical chapters walk through the implementation, but the project is also meant to show how experienced engineers think about RAG: preserve source material, make derived artifacts explicit, inspect retrieval, evaluate behavior, and keep operational limits visible.

  • Store original documents separately from vectors.
  • Track ingestion as durable state.
  • Keep long-running ingestion outside request/response paths.
  • Use provider-neutral abstractions for AI services.
  • Embed generated metadata, not only raw source text, and preserve provenance for it.
  • Tune retrieval based on expected question types.
  • Combine vector search with structured retrieval and simple exact-name fallback.
  • Make retrieval inspectable with diagnostics and rank reasons.
  • Return citations that distinguish source chunks from generated retrieval aids.
  • Surface ingestion failures and progress to the UI.
  • Use guardrails for question size, selected documents, retrieval expansion, and provider timeouts.
  • Test RAG behavior with deterministic golden-question evaluations.
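
The last habit in the list, deterministic golden-question evaluation, can be sketched without any model at all. Everything named here is hypothetical: `GoldenQuestion`, `GoldenEvaluator`, `KeywordRetriever`, the chunk ids, and the toy corpus are assumptions for illustration, and a simple keyword-overlap retriever stands in for the project's real retrieval path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy corpus keyed by a hypothetical "document#chunk" id.
var corpus = new Dictionary<string, string>
{
    ["doc-a#1"] = "The lighthouse keeper writes letters every winter.",
    ["doc-a#2"] = "A storm destroys the harbor wall.",
    ["doc-b#1"] = "The orchard is sold to pay the family debt.",
};

// Each golden question names the chunk that must appear in the results.
var goldenQuestions = new[]
{
    new GoldenQuestion("Who writes letters?", "doc-a#1"),
    new GoldenQuestion("What happens to the orchard?", "doc-b#1"),
};

var results = GoldenEvaluator.Run(goldenQuestions, q => KeywordRetriever.Retrieve(corpus, q, topK: 1));
foreach (var r in results)
    Console.WriteLine($"{(r.Passed ? "PASS" : "FAIL")} {r.Question}");

public sealed record GoldenQuestion(string Question, string ExpectedChunkId);
public sealed record EvalResult(string Question, bool Passed);

public static class GoldenEvaluator
{
    // A case passes when the expected chunk id is among the retrieved ids.
    public static List<EvalResult> Run(
        IEnumerable<GoldenQuestion> cases,
        Func<string, IReadOnlyList<string>> retrieve) =>
        cases.Select(c => new EvalResult(c.Question, retrieve(c.Question).Contains(c.ExpectedChunkId)))
             .ToList();
}

public static class KeywordRetriever
{
    private static readonly char[] Separators = { ' ', '.', ',', '?', '!' };

    private static HashSet<string> Tokens(string text) =>
        new(text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries));

    // Scores each chunk by shared words; ties break on id so runs are repeatable.
    public static List<string> Retrieve(IReadOnlyDictionary<string, string> corpus, string question, int topK)
    {
        var queryTokens = Tokens(question);
        return corpus
            .OrderByDescending(kv => queryTokens.Count(t => Tokens(kv.Value).Contains(t)))
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Take(topK)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

Both cases pass with this corpus. The useful property is that the check is about retrieval, not wording: a change to chunking, ranking, or query expansion that stops surfacing the expected chunk fails the case, even if a model could still produce a fluent answer.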

Production hardening

What still needs more rigor before real deployment.

This sample is a teaching project with production-shaped seams. It is useful for learning architecture, retrieval behavior, and evaluation habits, but it is not a secure or scalable deployment template by itself.

  • Replace ad hoc SQLite schema updates with migrations.
  • Add authentication, authorization, auditing, and retention policy.
  • Support cloud object storage directly.
  • Add provider implementations for Azure OpenAI, Bedrock, Vertex AI, or OpenAI.
  • Add deeper observability around token usage, latency, provider errors, retrieval quality, and evaluation drift.
  • Improve PDF extraction quality.
  • Replace the database work source with queue infrastructure for multi-worker deployments.
  • Add provider-compatible tokenization, optional model-based reranking, and citation faithfulness checks.

Chapter 1

Solution Topology

Project layout and why ingestion runs outside the request path.

Chapter 2

Aspire as the Local Control Plane

RAG.AppHost/AppHost.cs defines the local environment.

Chapter 3

Shared Configuration and Contracts

Configuration and interfaces that keep workflow code provider-neutral.

Chapter 4

Metadata with SQLite and EF Core

SQLite metadata for document status, progress, and lifecycle state.

Chapter 5

Upload API and UI

The upload endpoint is in RAG.Api/Program.cs.

Chapter 6

Object Storage with MinIO

Original files are stored in object storage before indexing. The local implementation is RAG.Core/Services/S3ObjectStorage.cs.

Chapter 7

Worker Ingestion Pipeline

RAG.Worker/Worker.cs is a polling background service. Every configured interval, it asks IDocumentIngestionService to process pending documents.

Chapter 8

Extracting and Chunking Text

Text extraction lives in RAG.Core/Services/TextExtractor.cs.

Chapter 9

Literary Artifacts

Generated literary profiles improve broad book-club retrieval without pretending to be source evidence.

Chapter 10

AI Provider Abstractions

The project supports Ollama and Gemini through provider implementations.

Chapter 11

Qdrant Vector Storage

RAG.Core/Services/QdrantVectorStore.cs owns Qdrant interaction.

Chapter 12

Ask Flow and Retrieval Strategy

Query expansion, reranking, diagnostics, and request guardrails in the ask path.

Chapter 13

Prompting and Citations

Provider prompts turn selected evidence into answers with inspectable citations.

Chapter 14

Testing the Pipeline

The tests are intentionally focused rather than exhaustive.

Chapter 15

Local Development Notes

Commands and local URLs for Gemini, Ollama, Aspire, Qdrant, and MinIO.