The stack
Everything retrieval needs
Each stage uses an open model — no separate vendors to stitch together.
Embeddings
Vectorize documents and queries with open embedding models.
Reranking
Re-score retrieved candidates with a cross-encoder reranker for higher precision.
Grounded generation
Feed retrieved context into open LLMs for on-topic, source-grounded answers.
Document store on shared storage
Keep your corpus on a shared filesystem mounted right next to the GPUs.
One API key
Embeddings, reranking, and generation behind a single OpenAI-compatible API.
Multi-region
Low-latency retrieval and generation with automatic failover.