Aquiles-RAG is a high-performance Retrieval-Augmented Generation (RAG) solution built on Redis, Qdrant, or PostgreSQL. It exposes a high-level interface through FastAPI REST APIs.
Aquiles-RAG
Self-hosted RAG infrastructure with MCP Server support
FastAPI • Redis / Qdrant / PostgreSQL • Async • Embedding-agnostic • MCP Ready
What is Aquiles-RAG?
Aquiles-RAG is a production-ready RAG (Retrieval-Augmented Generation) API server that brings high-performance vector search to your applications. Choose your backend (Redis, Qdrant, or PostgreSQL), connect your embedding model, and start building intelligent search systems in minutes.
Why Aquiles-RAG?
| Challenge | Aquiles-RAG Solution |
|---|---|
| Expensive vector databases | Use Redis, Qdrant, or PostgreSQL you already have |
| Data leaves your infrastructure | Everything runs on your servers |
| Complex RAG setup | Interactive wizard configures everything |
| Slow integrations | Async clients, batch operations, optimized pipelines |
| Vendor lock-in | Switch backends without changing code |
Key Features
- Backend Flexibility - Redis HNSW, Qdrant, or PostgreSQL pgvector
- High Performance - Async operations, batch processing, optimized search
- MCP Server Built-in - Native Model Context Protocol support for AI assistants
- Interactive Setup - CLI wizard configures your entire stack
- Sync & Async Clients - Python and TypeScript/JavaScript SDKs included
- Optional Re-ranking - Improve results with semantic re-scoring
Quick Start
Installation
pip install aquiles-rag
Interactive Setup
Configure your vector database in seconds:
aquiles-rag configs
The wizard guides you through:
- Backend selection (Redis, Qdrant, or PostgreSQL)
- Connection settings (host, port, credentials)
- TLS/gRPC options
- Optional re-ranker configuration
Start Server
aquiles-rag serve --host "0.0.0.0" --port 5500
Your First RAG Query
from aquiles.client import AquilesRAG

client = AquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# Create index
client.create_index("documents", embeddings_dim=768, dtype="FLOAT32")

# Store document with your embedding function
def get_embedding(text):
    # your_embedding_model is a placeholder for whichever model you use
    return your_embedding_model.encode(text)

client.send_rag(
    embedding_func=get_embedding,
    index="documents",
    name_chunk="intro",
    raw_text="Your document text here..."
)

# Query: embed the question with the same function, then search
query_embedding = get_embedding("What is this document about?")
results = client.query("documents", query_embedding, top_k=5)
print(results)
That's it! You now have a working RAG system.
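To smoke-test the quick start without loading a real model, a deterministic stand-in for `get_embedding` can help. The sketch below is an assumption for illustration and not part of Aquiles-RAG: `toy_embedding` is a hypothetical helper that hashes tokens into a 768-dimensional vector (matching the `embeddings_dim=768` index above). It only reflects token overlap, not meaning, so swap in a real model before judging search quality.

```python
import hashlib
import math


def toy_embedding(text: str, dim: int = 768) -> list[float]:
    """Hash each lowercase token into one of `dim` buckets, then L2-normalize.

    Hypothetical stand-in for a real embedding model: deterministic,
    but it captures token overlap only, not semantics.
    """
    vector = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # An empty text has no tokens; return the zero vector unchanged.
    return [v / norm for v in vector] if norm else vector


embedding = toy_embedding("Your document text here...")
print(len(embedding))
```

Passing `toy_embedding` as `embedding_func` should be enough to exercise index creation, ingestion, and querying end to end, assuming the client accepts any callable that maps text to a list of floats.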
Supported Backends
| Backend | Features | Best For |
|---|---|---|
| Redis | HNSW indexing, fast in-memory search | Speed-critical applications |
| Qdrant | HTTP/gRPC, collections, filters | Scalable production systems |
| PostgreSQL | pgvector extension, SQL integration | Existing Postgres infrastructure |
All backends support:
- Vector similarity search (cosine, inner product)
- Metadata filtering
- Batch operations
- Optional re-ranking
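As a reminder of how the two similarity metrics in the list above relate, here is a minimal pure-Python sketch. The function names are illustrative and not part of the Aquiles-RAG API; each backend computes these scores internally.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; sensitive to vector magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product divided by both norms; depends only on direction."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return inner_product(a, b) / (norm_a * norm_b)


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length
print(inner_product(a, b))
print(cosine_similarity(a, b))
```

For unit-length embeddings the two scores coincide, which is why many embedding pipelines normalize vectors before storing them.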
MCP Server Integration
Aquiles-RAG includes a built-in Model Context Protocol server for seamless AI assistant integration.
Start MCP Server
aquiles-rag mcp-serve --host "0.0.0.0" --port 5500 --transport "sse"
Example with OpenAI Agent
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerSse

async def main():
    # Connect to MCP server
    mcp_server = MCPServerSse({
        "url": "http://localhost:5500/sse",
        "headers": {"X-API-Key": "YOUR_API_KEY"}
    })
    await mcp_server.connect()

    # Create agent with RAG tools
    agent = Agent(
        name="RAG Assistant",
        instructions="You can store and query documents using the vector database.",
        mcp_servers=[mcp_server],
        model="gpt-4"
    )

    # Agent now has access to:
    # - create_index
    # - send_info (store documents)
    # - query_rag (semantic search)
    # - list_indexes
    # - delete_index
    result = await Runner.run(agent, "Store this document and find similar content")
    print(result.final_output)

# await is only valid inside a coroutine, so run everything through asyncio
asyncio.run(main())
MCP Tools Available:
- Index management (create, list, delete)
- Document ingestion with automatic chunking
- Semantic search with configurable parameters
- Metadata filtering
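How the server chunks documents during ingestion is not described here, so the following is only one plausible approach, shown to make the idea concrete: a fixed-size word window with overlap. `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, and the real chunking strategy in Aquiles-RAG may differ.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of at most `chunk_size` words.

    Consecutive windows share `overlap` words so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


for chunk in chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk would then be embedded and stored under its own `name_chunk`, so that queries can match a specific passage rather than the whole document.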
Client SDKs
Python - Async Client
import asyncio

from aquiles.client import AsyncAquilesRAG

client = AsyncAquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# async_get_embedding and long_text are placeholders for your own
# async embedding function and document text.
async def main():
    # Create index
    await client.create_index("docs", embeddings_dim=1536)

    # Store documents (parallel chunking)
    await client.send_rag(
        embedding_func=async_get_embedding,
        index="docs",
        name_chunk="document_1",
        raw_text=long_text,
        metadata={
            "author": "John Doe",
            "source": "documentation"
        }
    )

    # Query
    query_embedding = await async_get_embedding("What is this document about?")
    results = await client.query("docs", query_embedding, top_k=5)
    print(results)

asyncio.run(main())
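The "parallel chunking" comment in the async example suggests that chunks are embedded concurrently. The client's internals are not shown here, so the sketch below is an assumption about how such a step could look: it bounds concurrency with a semaphore so an embedding API is not flooded. `embed_chunks` and `fake_embed` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List


async def embed_chunks(
    chunks: List[str],
    embed: Callable[[str], Awaitable[List[float]]],
    max_concurrency: int = 4,
) -> List[List[float]]:
    """Embed chunks concurrently with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(chunk: str) -> List[float]:
        async with semaphore:
            return await embed(chunk)

    # gather preserves input order, so result[i] belongs to chunks[i].
    return await asyncio.gather(*(embed_one(chunk) for chunk in chunks))


async def fake_embed(text: str) -> List[float]:
    """Hypothetical embedding call; a real one would hit a model or API."""
    await asyncio.sleep(0)
    return [float(len(text))]


vectors = asyncio.run(embed_chunks(["a", "bb", "ccc"], fake_embed))
print(vectors)
```

Tuning `max_concurrency` is a trade-off between ingestion speed and the rate limits of whichever embedding service is in use.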
TypeScript/JavaScript
npm install @aquiles-ai/aquiles-rag-client
import { AsyncAquilesRAG } from '@aquiles-ai/aquiles-rag-client';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function getEmbedding(text: string): Promise<number[]> {
  const resp = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return resp.data[0].embedding;
}

const client = new AsyncAquilesRAG({
  host: 'http://127.0.0.1:5500',
  apiKey: 'your-api-key',
});

// Create index (1536 dimensions for text-embedding-3-small)
await client.createIndex('my_docs', 1536, 'FLOAT32');

// Store document
await client.sendRAG(
  getEmbedding,
  'my_docs',
  'doc_1',
  'Your document text...',
  {
    embeddingModel: 'text-embedding-3-small',
    metadata: { author: 'John Doe' }
  }
);

// Query
const queryEmb = await getEmbedding('What is this about?');
const results = await client.query('my_docs', queryEmb, { topK: 5 });
console.log(results);
Advanced Features
Optional Re-ranking
Improve search results with semantic re-scoring:
# Enable during setup wizard
aquiles-rag configs
Re-ranking refines results after vector search by scoring (query, document) pairs for better relevance.
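To illustrate the idea of scoring (query, document) pairs, here is a minimal sketch. It is not the re-ranker that Aquiles-RAG configures: `rerank` and `token_overlap` are hypothetical names, and the token-overlap scorer merely stands in for a real model such as a cross-encoder.

```python
from typing import Callable, List, Optional, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and sort by descending score."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    # list.sort is stable, so ties keep their vector-search order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_n is None else scored[:top_n]


def token_overlap(query: str, document: str) -> float:
    """Hypothetical scorer: fraction of query tokens found in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens) if query_tokens else 0.0


candidates = ["postgres sql joins", "redis vector search engine", "vector databases"]
for doc, score in rerank("redis vector search", candidates, token_overlap):
    print(f"{score:.2f}  {doc}")
```

Because the re-ranker sees the raw text of both query and document, it can correct cases where embedding distance alone ranks a loosely related chunk too high, at the cost of extra latency per candidate.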
Web UI Playground
Access the interactive UI:
http://localhost:5500/ui
Features:
- Test index creation and queries
- Inspect live configurations
- Protected Swagger UI documentation
- Real-time request/response monitoring
Architecture
┌─────────────────────────────────────────────────────────────┐
│                           Clients                           │
│  HTTP/HTTPS • Python SDK • TypeScript SDK • MCP Server      │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                       FastAPI Server                        │
│  • Request validation                                       │
│  • Business logic orchestration                             │
│  • Optional re-ranking                                      │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                        Vector Store                         │
│  Redis HNSW • Qdrant Collections • PostgreSQL pgvector      │
└─────────────────────────────────────────────────────────────┘
Flow:
- Client sends embedding + query parameters
- Server validates and routes to vector store
- Vector store returns top-k candidates
- Optional re-ranker refines results
- Formatted response returned to client
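The flow above can be sketched end to end with an in-memory dictionary standing in for the vector store. This is a simplified illustration under stated assumptions, not the server's implementation: `handle_query`, the store layout, and the response fields are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def handle_query(
    store: Dict[str, List[float]],
    embedding: List[float],
    top_k: int = 5,
    reranker: Optional[Callable[[str], float]] = None,
) -> List[dict]:
    # Validate the request before touching the store.
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    if any(len(vec) != len(embedding) for vec in store.values()):
        raise ValueError("embedding dimension does not match the index")
    # Vector store step: keep the top_k most similar chunks.
    candidates = heapq.nlargest(
        top_k, ((cosine(embedding, vec), name) for name, vec in store.items())
    )
    # Optional re-ranker reorders the candidates it was given.
    if reranker is not None:
        candidates = sorted(candidates, key=lambda c: reranker(c[1]), reverse=True)
    # Format the response (field names are illustrative).
    return [{"name_chunk": name, "score": score} for score, name in candidates]


store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(handle_query(store, [1.0, 0.0], top_k=2))
```

Keeping the re-ranker as an optional callable mirrors the diagram: the vector store narrows the field cheaply, and the more expensive scoring only runs on the `top_k` survivors.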
Use Cases
| Who | What |
|---|---|
| AI Startups | Build RAG features without vendor costs |
| Developers | Prototype semantic search quickly |
| Enterprises | Private, scalable document search |
| Researchers | Experiment with embeddings and retrieval |
Requirements
- Python 3.9+
- One of: Redis, Qdrant, or PostgreSQL with pgvector
- pip or uv
Quick Redis Setup (Docker):
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest
PostgreSQL Note: Aquiles-RAG doesn't run automatic migrations. Create the pgvector extension and required tables manually before use.
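As a rough idea of what manual setup involves, the sketch below builds the SQL statements for enabling pgvector and creating a table with a vector column. Only `CREATE EXTENSION IF NOT EXISTS vector` and the `vector(n)` column type are standard pgvector syntax; the table name and column layout are hypothetical and are not the schema Aquiles-RAG requires, so check the project documentation for the actual tables.

```python
def pgvector_setup_sql(table: str, dim: int) -> list[str]:
    """Return SQL statements that enable pgvector and create a demo table.

    The table layout is an illustrative assumption, not the schema
    expected by Aquiles-RAG.
    """
    if not table.isidentifier():
        raise ValueError("table must be a plain identifier")
    if dim <= 0:
        raise ValueError("dim must be positive")
    return [
        # Standard pgvector step: the extension must exist in the database.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # Hypothetical table: one row per chunk with its embedding.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id BIGSERIAL PRIMARY KEY, "
        "name_chunk TEXT NOT NULL, "
        "raw_text TEXT NOT NULL, "
        f"embedding vector({dim})"
        ");",
    ]


for statement in pgvector_setup_sql("chunks_demo", 768):
    print(statement)
```

The statements can be run with `psql` or any PostgreSQL driver; the vector dimension has to match the `embeddings_dim` used when creating the index.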
Tech Stack
- FastAPI - High-performance async API framework
- Redis / Qdrant / PostgreSQL - Vector storage backends
- NumPy - Efficient array operations
- Pydantic - Request/response validation
- HTTPX - Async HTTP client
- Click - CLI framework
REST API Examples
Create Index
curl -X POST http://localhost:5500/create/index \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "indexname": "documents",
    "embeddings_dim": 768,
    "dtype": "FLOAT32"
  }'
Insert Document
curl -X POST http://localhost:5500/rag/create \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "name_chunk": "doc1_part1",
    "raw_text": "Document content...",
    "embeddings": [0.12, 0.34, ...]
  }'
Query
curl -X POST http://localhost:5500/rag/query-rag \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "embeddings": [0.78, 0.90, ...],
    "top_k": 5,
    "cosine_distance_threshold": 0.6
  }'
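The same calls can be made from Python without an SDK. The sketch below uses only the standard library and the endpoint paths, header, and payload fields from the curl examples; `build_request` is a hypothetical helper, and the zero vector is a placeholder for a real query embedding. The network call is left commented out so the snippet runs without a server.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request carrying the X-API-Key header."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "http://localhost:5500",
    "/rag/query-rag",
    "YOUR_API_KEY",
    {
        "index": "documents",
        "embeddings": [0.0] * 768,  # placeholder; use a real query embedding
        "top_k": 5,
        "cosine_distance_threshold": 0.6,
    },
)
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Swapping the path for `/create/index` or `/rag/create` and adjusting the payload covers the other two examples.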
Backend Notes
Redis:
- Fast in-memory HNSW indexing
- Full metrics via /status/ram
- Supports HASH storage with COSINE search
Qdrant:
- HTTP or gRPC connections
- Collection-based organization
- Limited metrics compared to Redis
PostgreSQL:
- Requires manual pgvector setup
- No automatic migrations
- SQL-native filtering and joins
- Check Postgres monitoring for metrics
Documentation
Contributing
We welcome contributions! See the test suite in test/ for examples:
- Client SDK tests
- API endpoint tests
- Deployment validation
License
Star this project • Report issues
Built with ❤️ for the AI community
File details
Details for the file aquiles_rag-0.5.5.tar.gz.
File metadata
- Download URL: aquiles_rag-0.5.5.tar.gz
- Upload date:
- Size: 601.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 33be2891f94906b28f31a1efaed90aa5e647ca4dfc1b433f15a4d20ef58e2323 |
| MD5 | 4b63ddeae7675a4ec662968c5a6ff286 |
| BLAKE2b-256 | 1d9ed124ae1822739e822c9fb3b7950c21a333247171d71b314d2033e4e10ce6 |
File details
Details for the file aquiles_rag-0.5.5-py3-none-any.whl.
File metadata
- Download URL: aquiles_rag-0.5.5-py3-none-any.whl
- Upload date:
- Size: 585.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98d25afcf64f463aa75314ede1dcd8af6eb9da81092ded34c36e03aab14184d2 |
| MD5 | 10f41eeea318a269e3ecdff50cd69621 |
| BLAKE2b-256 | c885eb44e60687c7862042c66204bc8a1e4ce29b9f5e0dd31a545ae2af5ff030 |