Aspire as the Local Control Plane
RAG.AppHost/AppHost.cs defines the local environment.
The AppHost defines the local dependencies that it wires into the API and worker, for example:

```csharp
// Qdrant vector database; indexed vectors persist in a named Docker volume.
var qdrant = builder.AddContainer("qdrant", "qdrant/qdrant")
    .WithHttpEndpoint(port: 6333, targetPort: 6333, name: "http")
    .WithVolume("rag-qdrant-data", "/qdrant/storage");

// MinIO object storage: S3 API on 9000, web console on 9001.
var minio = builder.AddContainer("minio", "minio/minio")
    .WithArgs("server", "/data", "--console-address", ":9001")
    .WithHttpEndpoint(port: 9000, targetPort: 9000, name: "api");
```
It starts:
- Qdrant on ports `6333` (HTTP) and `6334` (gRPC).
- MinIO on ports `9000` (S3 API) and `9001` (web console).
- The API on `http://127.0.0.1:5080/`.
- The worker as a background process.
- Ollama, but only when Gemini is not selected.
The AppHost also creates persistent Docker volumes:
- `rag-qdrant-data`
- `rag-minio-data`
- `rag-ollama-data`
Those volumes allow indexed vectors, uploaded files, and downloaded Ollama models to survive container restarts.
Production note: these services are exposed on fixed local ports and MinIO uses sample credentials. That is acceptable for this learning project, which is not intended for production, but a real deployment should use private networking, managed secrets, and locked-down service access.
Aspire injects configuration into the API and worker through environment variables:
```csharp
// Chained onto the API and worker resources (excerpt).
.WithEnvironment("Rag__Qdrant__BaseUrl", qdrant.GetEndpoint("http"))
.WithEnvironment("Rag__Storage__ServiceUrl", minio.GetEndpoint("api"))
.WithEnvironment("Rag__DatabasePath", databasePath)
```
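The double-underscore syntax maps environment variables onto .NET configuration sections. The environment-variables configuration provider replaces each `__` with the `:` section separator, because `:` cannot be used in environment variable names on every platform (Bash, for example, rejects it). For example, `Rag__Qdrant__BaseUrl` becomes `Rag:Qdrant:BaseUrl`.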
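The key normalization can be illustrated with a small, self-contained sketch. It assumes only the `__` to `:` replacement plus case-insensitive lookup; the real provider in `Microsoft.Extensions.Configuration` (registered with `AddEnvironmentVariables()`) also supports prefix filtering and other details. `EnvConfigSketch` and its methods are hypothetical names used for illustration, not part of the project.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch of how environment variable names become configuration keys.
// Not the real EnvironmentVariablesConfigurationProvider; it only mimics key normalization.
public static class EnvConfigSketch
{
    // "Rag__Qdrant__BaseUrl" -> "Rag:Qdrant:BaseUrl"
    public static string ToConfigKey(string envName) => envName.Replace("__", ":");

    // Builds a key/value view with case-insensitive keys, as .NET configuration keys are.
    public static Dictionary<string, string> Load(IDictionary env)
    {
        var config = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (DictionaryEntry entry in env)
        {
            var name = (string)entry.Key;
            config[ToConfigKey(name)] = entry.Value as string ?? "";
        }
        return config;
    }
}
```

In a running process you could pass `Environment.GetEnvironmentVariables()` to `Load` and read `config["Rag:Qdrant:BaseUrl"]`; in the actual API and worker, the configuration system performs this mapping for you.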
AI Provider Selection
The AppHost chooses Gemini when GEMINI_API_KEY is present:
```text
GEMINI_API_KEY present -> Gemini
otherwise              -> Ollama
```
You can override this choice by setting `RAG_AI_PROVIDER` before starting the AppHost:

```bash
export RAG_AI_PROVIDER="Gemini"
```
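The selection rule can be sketched as a small pure function. This is a hypothetical illustration, not the project's AppHost code: it assumes that an explicit `RAG_AI_PROVIDER` takes precedence, that it accepts `Gemini` or `Ollama` (case-insensitive), and that "present" means a non-empty `GEMINI_API_KEY`. The `AiProvider` enum and `ProviderSelection.Choose` are invented names.

```csharp
using System;

// Hypothetical types illustrating the provider-selection rule; the real AppHost may differ.
public enum AiProvider { Gemini, Ollama }

public static class ProviderSelection
{
    // getEnv is injected so the rule can be tested without real environment variables;
    // in an AppHost you could pass Environment.GetEnvironmentVariable.
    public static AiProvider Choose(Func<string, string?> getEnv)
    {
        // 1. An explicit RAG_AI_PROVIDER value wins.
        //    Assumption: accepted values are "Gemini" and "Ollama", case-insensitive.
        var explicitChoice = getEnv("RAG_AI_PROVIDER");
        if (!string.IsNullOrWhiteSpace(explicitChoice))
            return Enum.Parse<AiProvider>(explicitChoice.Trim(), ignoreCase: true);

        // 2. Otherwise a non-empty GEMINI_API_KEY selects Gemini.
        // 3. Without a key, fall back to local Ollama.
        return string.IsNullOrWhiteSpace(getEnv("GEMINI_API_KEY"))
            ? AiProvider.Ollama
            : AiProvider.Gemini;
    }
}
```

Injecting `getEnv` keeps the rule testable. A result of `AiProvider.Ollama` is the case in which the AppHost would also need to start the Ollama container.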
The two providers trade off as follows:

| Option | Pros | Cons |
|---|---|---|
| Local LLM with Ollama | Keeps prompts and document content on your machine. Works well for offline experimentation after models are downloaded. Avoids per-request API costs. | Requires local CPU/GPU, memory, and disk resources. Model downloads can be large. Responses are usually slower than hosted APIs on modest hardware. |
| API-hosted LLM with Gemini | Requires no local model hosting. Usually provides faster responses and stronger model quality. Easier to scale beyond one development machine. | Sends prompts and retrieved document context to an external service. Requires an API key, network access, and provider billing/quotas. |
The current Gemini defaults are:
- Embedding model: `gemini-embedding-2`
- Chat model: `gemini-2.5-pro`
The local Ollama defaults are:
- Embedding model: `nomic-embed-text`
- Chat model: `llama3.2`
Note: RAG uses two model types because retrieval and answer generation are different jobs. The embedding model turns document chunks and user questions into numeric vectors, called embeddings, that capture semantic meaning. The vector store uses those embeddings to find chunks related to the question. The chat model then receives the question plus the retrieved chunks and writes the final answer.
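The division of labor described in the note can be sketched in C#. This is a minimal illustration, not the project's code. It assumes cosine similarity as the distance metric (Qdrant collections can also use other metrics such as dot product or Euclidean distance, and this section does not say which one the project configures). Tiny hand-written vectors stand in for real embedding-model output, and `RetrievalSketch`, `Cosine`, `TopK`, and `BuildPrompt` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of RAG retrieval: rank chunk embeddings by cosine similarity
// to the question embedding, then hand the best chunks to the chat model as context.
public static class RetrievalSketch
{
    public static double Cosine(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Treat zero vectors as unrelated instead of dividing by zero.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns the texts of the k chunks most similar to the question embedding.
    public static List<string> TopK(
        double[] question, IEnumerable<(string Text, double[] Vector)> chunks, int k) =>
        chunks
            .OrderByDescending(c => Cosine(question, c.Vector))
            .Take(k)
            .Select(c => c.Text)
            .ToList();

    // The chat model receives the question plus the retrieved chunks.
    public static string BuildPrompt(string question, IEnumerable<string> context) =>
        "Answer using only this context:\n\n" +
        string.Join("\n\n", context) +
        $"\n\nQuestion: {question}";
}
```

In the real pipeline, the nearest-neighbor search runs inside Qdrant rather than in a LINQ query, and the prompt wording here is only a placeholder.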