Chapter 2

Aspire as the Local Control Plane

RAG.AppHost/AppHost.cs defines the local environment.

The AppHost wires local dependencies into the API and worker:
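var builder = DistributedApplication.CreateBuilder(args);

// Qdrant vector database; the named volume keeps indexed vectors across restarts.
var qdrant = builder.AddContainer("qdrant", "qdrant/qdrant")
    .WithHttpEndpoint(port: 6333, targetPort: 6333, name: "http")
    .WithVolume("rag-qdrant-data", "/qdrant/storage");

// MinIO object storage; --console-address serves the web console on container port 9001.
var minio = builder.AddContainer("minio", "minio/minio")
    .WithArgs("server", "/data", "--console-address", ":9001")
    .WithHttpEndpoint(port: 9000, targetPort: 9000, name: "api");

// Ollama and the API/worker project resources are also defined in AppHost.cs (not shown).
builder.Build().Run();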

It starts:

The AppHost also creates persistent Docker volumes:

Those volumes allow indexed vectors, uploaded files, and downloaded Ollama models to survive container restarts.

Production note: these services are exposed on fixed local ports and MinIO uses sample credentials. That is acceptable for this learning project, which is not intended for production, but a real deployment should use private networking, managed secrets, and locked-down service access.

Aspire injects configuration into the API and worker through environment variables:

// Chained onto the API and worker project resources:
.WithEnvironment("Rag__Qdrant__BaseUrl", qdrant.GetEndpoint("http"))
.WithEnvironment("Rag__Storage__ServiceUrl", minio.GetEndpoint("api"))
.WithEnvironment("Rag__DatabasePath", databasePath)

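The double-underscore syntax maps environment variables onto .NET configuration sections: each double underscore becomes a colon section separator, so Rag__Qdrant__BaseUrl becomes Rag:Qdrant:BaseUrl. The double underscore is used because a colon is not a valid character in environment variable names on every platform.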
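
To make the mapping concrete, here is a simplified sketch that mimics the rename in plain C#. It is not the real environment-variable configuration provider (which also handles optional prefixes and other details). ToConfigKey is a hypothetical helper, and the URLs are sample values rather than what Aspire actually injects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a simplified version of the rename performed by the
// .NET environment-variable configuration provider ("__" -> ":").
static string ToConfigKey(string envVarName) => envVarName.Replace("__", ":");

// Sample values only; at run time Aspire injects the real endpoint URLs.
var environment = new Dictionary<string, string>
{
    ["Rag__Qdrant__BaseUrl"] = "http://localhost:6333",
    ["Rag__Storage__ServiceUrl"] = "http://localhost:9000",
};

// .NET configuration keys are case-insensitive, so use a case-insensitive comparer.
var config = environment.ToDictionary(
    pair => ToConfigKey(pair.Key),
    pair => pair.Value,
    StringComparer.OrdinalIgnoreCase);

Console.WriteLine(config["Rag:Qdrant:BaseUrl"]); // prints http://localhost:6333
```

In the real API and worker, reading Rag:Qdrant:BaseUrl from IConfiguration returns the value Aspire supplied through Rag__Qdrant__BaseUrl.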

AI Provider Selection

The AppHost chooses Gemini when GEMINI_API_KEY is present:

GEMINI_API_KEY present -> Gemini
otherwise              -> Ollama

You can override this choice by setting RAG_AI_PROVIDER:

export RAG_AI_PROVIDER="Gemini"
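
The selection rule can be sketched as a small pure function. SelectAiProvider is a hypothetical name, and two details are assumptions rather than confirmed AppHost behavior: an explicit RAG_AI_PROVIDER value takes precedence over the key check, and a blank GEMINI_API_KEY counts as absent.

```csharp
using System;

// Hypothetical helper mirroring the rule described above; not the AppHost's actual code.
// Assumptions: RAG_AI_PROVIDER wins when set, and a blank GEMINI_API_KEY counts as missing.
static string SelectAiProvider(string? providerOverride, string? geminiApiKey)
{
    // An explicit override (e.g. RAG_AI_PROVIDER="Gemini") is used as-is.
    if (!string.IsNullOrWhiteSpace(providerOverride))
    {
        return providerOverride.Trim();
    }

    // Otherwise a usable Gemini key selects Gemini; everything else falls back to Ollama.
    return string.IsNullOrWhiteSpace(geminiApiKey) ? "Ollama" : "Gemini";
}

var provider = SelectAiProvider(
    Environment.GetEnvironmentVariable("RAG_AI_PROVIDER"),
    Environment.GetEnvironmentVariable("GEMINI_API_KEY"));

Console.WriteLine($"AI provider: {provider}");
```

Passing the environment values in as parameters keeps the decision testable without changing real environment variables.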

| Option | Pros | Cons |
| --- | --- | --- |
| Local LLM with Ollama | Keeps prompts and document content on your machine. Works well for offline experimentation after models are downloaded. Avoids per-request API costs. | Requires local CPU/GPU, memory, and disk resources. Model downloads can be large. Responses are usually slower than hosted APIs on modest hardware. |
| API-hosted LLM with Gemini | Requires no local model hosting. Usually provides faster responses and stronger model quality. Easier to scale beyond one development machine. | Sends prompts and retrieved document context to an external service. Requires an API key, network access, and provider billing/quotas. |

The current Gemini defaults are:

The local Ollama defaults are:

Note: RAG uses two model types because retrieval and answer generation are different jobs. The embedding model turns document chunks and user questions into numeric vectors, called embeddings, that capture semantic meaning. The vector store uses those embeddings to find chunks related to the question. The chat model then receives the question plus the retrieved chunks and writes the final answer.
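
The division of labor can be illustrated with a toy example. The three-dimensional vectors below are invented stand-ins for embedding-model output (real embeddings have far more dimensions), and the in-memory ranking uses cosine similarity, one of the distance metrics Qdrant supports, in place of the vector store's search. The chunk texts, question, and top-2 cutoff are hypothetical.

```csharp
using System;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|); values closer to 1.0 mean a more similar direction.
static double Cosine(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical chunks with invented 3-D "embeddings".
var chunks = new (string Text, double[] Vector)[]
{
    ("Qdrant stores chunk vectors in a collection.", new[] { 0.9, 0.1, 0.0 }),
    ("MinIO keeps the original uploaded files.", new[] { 0.1, 0.9, 0.1 }),
    ("Search returns the chunks nearest the query.", new[] { 0.8, 0.0, 0.3 }),
};

// Step 1 (embedding model): the question is turned into a vector as well.
const string question = "How are related chunks found?";
var questionVector = new[] { 1.0, 0.0, 0.1 };

// Step 2 (vector store): rank chunks by similarity to the question and keep the top 2.
var retrieved = chunks
    .OrderByDescending(chunk => Cosine(questionVector, chunk.Vector))
    .Take(2)
    .Select(chunk => chunk.Text)
    .ToList();

// Step 3 (chat model): the question plus the retrieved chunks form the prompt.
var prompt = $"Question: {question}\nContext:\n- {string.Join("\n- ", retrieved)}";
Console.WriteLine(prompt);
```

In this project, step 2 happens inside Qdrant over the stored collection rather than in memory.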