Chapter 2

Aspire as the Local Control Plane

RAG.AppHost/AppHost.cs defines the local environment.

AppHost wires local dependencies into API and worker

// Qdrant vector database: HTTP API on port 6333, data kept in a named volume.
var qdrant = builder.AddContainer("qdrant", "qdrant/qdrant")
    .WithHttpEndpoint(port: 6333, targetPort: 6333, name: "http")
    .WithVolume("rag-qdrant-data", "/qdrant/storage");

// MinIO object storage: S3-compatible API on port 9000, web console on 9001.
var minio = builder.AddContainer("minio", "minio/minio")
    .WithArgs("server", "/data", "--console-address", ":9001")
    .WithHttpEndpoint(port: 9000, targetPort: 9000, name: "api");

Full source: RAG.AppHost/AppHost.cs
Full source: RAG.AppHost/appsettings.json

The AppHost starts:

Qdrant on ports 6333 and 6334.
MinIO on ports 9000 and 9001.
API on http://127.0.0.1:5080/.
Worker as a background process.
Ollama only when Gemini is not selected.

The AppHost also creates persistent Docker volumes:

rag-qdrant-data
rag-minio-data
rag-ollama-data

Those volumes allow indexed vectors, uploaded files, and downloaded Ollama models to survive container restarts.

Production note: these services are exposed on fixed local ports and MinIO uses sample credentials. That is acceptable for this learning project, which is not intended for production, but a real deployment should use private networking, managed secrets, and locked-down service access.

Aspire injects configuration into the API and worker through environment variables:

.WithEnvironment("Rag__Qdrant__BaseUrl", qdrant.GetEndpoint("http"))
.WithEnvironment("Rag__Storage__ServiceUrl", minio.GetEndpoint("api"))
.WithEnvironment("Rag__DatabasePath", databasePath)

The double underscore syntax maps environment variables into .NET configuration sections. For example, Rag__Qdrant__BaseUrl becomes Rag:Qdrant:BaseUrl.
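The mapping described above can be illustrated with a small, self-contained sketch. It does not call the real configuration provider; the hypothetical helper ToConfigurationKey only imitates the "__" to ":" translation with plain string handling, and the sample values (localhost URLs, data/rag.db) are placeholders rather than values taken from the project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: imitates how "__" in an environment variable name
// becomes the ":" section delimiter in a .NET configuration key.
static string ToConfigurationKey(string environmentVariableName) =>
    environmentVariableName.Replace("__", ":");

// Placeholder values; the real ones are injected by the AppHost at run time.
var injected = new Dictionary<string, string>
{
    ["Rag__Qdrant__BaseUrl"] = "http://localhost:6333",
    ["Rag__Storage__ServiceUrl"] = "http://localhost:9000",
    ["Rag__DatabasePath"] = "data/rag.db",
};

// .NET configuration keys are case-insensitive, so the lookup table is too.
var configuration = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
foreach (var (name, value) in injected)
{
    configuration[ToConfigurationKey(name)] = value;
}

foreach (var (key, value) in configuration)
{
    Console.WriteLine($"{key} = {value}");
}
```

In the API and worker, the same keys would normally be read through the standard configuration abstractions (for example, by binding the Rag section to an options class) rather than through a hand-built dictionary.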

AI Provider Selection

The AppHost chooses Gemini when GEMINI_API_KEY is present:

GEMINI_API_KEY present -> Gemini
otherwise              -> Ollama

You can override this choice by setting the RAG_AI_PROVIDER environment variable:

export RAG_AI_PROVIDER="Gemini"
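The selection rule can be sketched as a small function. This is an illustration under stated assumptions, not the AppHost's actual code: SelectProvider is a hypothetical helper, and it assumes that a non-empty RAG_AI_PROVIDER value always wins and that blank values count as unset.

```csharp
#nullable enable
using System;

// Hypothetical helper; the real AppHost logic may differ in detail.
// Assumption: an explicit override takes precedence over key detection.
static string SelectProvider(string? overrideValue, string? geminiApiKey)
{
    if (!string.IsNullOrWhiteSpace(overrideValue))
    {
        return overrideValue.Trim();
    }

    // No override: Gemini when an API key is present, otherwise Ollama.
    return string.IsNullOrWhiteSpace(geminiApiKey) ? "Ollama" : "Gemini";
}

var provider = SelectProvider(
    Environment.GetEnvironmentVariable("RAG_AI_PROVIDER"),
    Environment.GetEnvironmentVariable("GEMINI_API_KEY"));

Console.WriteLine($"Selected AI provider: {provider}");
```

A result like this is also what would decide whether the Ollama container is started, since the AppHost runs Ollama only when Gemini is not selected.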

Option	Pros	Cons
Local LLM with Ollama	Keeps prompts and document content on your machine. Works well for offline experimentation after models are downloaded. Avoids per-request API costs.	Requires local CPU/GPU, memory, and disk resources. Model downloads can be large. Responses are usually slower than hosted APIs on modest hardware.
API-hosted LLM with Gemini	Requires no local model hosting. Usually provides faster responses and stronger model quality. Easier to scale beyond one development machine.	Sends prompts and retrieved document context to an external service. Requires an API key, network access, and provider billing/quotas.

The current Gemini defaults are:

Embedding model: gemini-embedding-2
Chat model: gemini-2.5-pro

The local Ollama defaults are:

Embedding model: nomic-embed-text
Chat model: llama3.2

Note: RAG uses two model types because retrieval and answer generation are different jobs. The embedding model turns document chunks and user questions into numeric vectors, called embeddings, that capture semantic meaning. The vector store uses those embeddings to find chunks related to the question. The chat model then receives the question plus the retrieved chunks and writes the final answer.
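To make the division of labor concrete, here is a toy sketch of the retrieval step. It assumes hand-written three-dimensional vectors in place of real embedding model output, an in-memory array in place of the vector store, and cosine similarity as the ranking measure; the chunk texts and the prompt layout are made up for illustration.

```csharp
using System;
using System.Linq;

// Cosine similarity between two equal-length vectors.
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Toy vectors standing in for what an embedding model would produce.
var chunks = new (string Text, double[] Embedding)[]
{
    ("Sample chunk about vector search.", new[] { 0.9, 0.1, 0.0 }),
    ("Sample chunk about file uploads.", new[] { 0.1, 0.9, 0.0 }),
    ("Sample chunk about local models.", new[] { 0.0, 0.2, 0.9 }),
};

var question = "How does vector search work?";
var questionEmbedding = new[] { 0.8, 0.2, 0.1 };

// Retrieval: rank chunks by similarity to the question and keep the best one.
var best = chunks
    .OrderByDescending(c => CosineSimilarity(c.Embedding, questionEmbedding))
    .First();

// Generation input: the chat model would receive the question plus the chunk.
var prompt = $"Context:\n{best.Text}\n\nQuestion: {question}";
Console.WriteLine(prompt);
```

In the real pipeline the embedding model produces the vectors, the vector store performs the similarity search, and the chat model answers from a prompt that combines the question with the retrieved chunks.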
