Aquiles-RAG is a high-performance Retrieval-Augmented Generation (RAG) solution based on Redis or Qdrant. It offers a high-level interface through FastAPI REST APIs.
Aquiles-RAG
High-performance Retrieval-Augmented Generation (RAG) on Redis or Qdrant
🚀 FastAPI • Redis / Qdrant • Async • Embedding-agnostic
📑 Table of Contents
- Features
- Tech Stack
- Requirements
- Installation
- Configuration & Connection Options
- Usage
- Architecture
- License
⭐ Features
- 📈 High Performance: Vector search powered by Redis HNSW or Qdrant.
- 🛠️ Simple API: Endpoints for index creation, insertion, and querying.
- 🔌 Embedding-agnostic: Works with any embedding model (OpenAI, Llama 3, HuggingFace, etc.).
- 💻 Interactive Setup Wizard: aquiles-rag configs walks you through full configuration for Redis or Qdrant.
- ⚡ Sync & Async clients: AquilesRAG (requests) and AsyncAquilesRAG (httpx) with embedding_model metadata support.
- 🧩 Extensible: Designed to integrate into ML pipelines, microservices, or serverless deployments.
🛠 Tech Stack
- Python 3.9+
- FastAPI
- Redis or Qdrant as vector store
- NumPy
- Pydantic
- Jinja2
- Click (CLI)
- Requests (sync client)
- HTTPX (async client)
- Platformdirs (config management)
⚙️ Requirements
- Redis (standalone or cluster) — or Qdrant (HTTP / gRPC).
- Python 3.9+
- pip
Optional: run Redis locally with Docker:
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest
🚀 Installation
Via PyPI (recommended)
pip install aquiles-rag
From Source (optional)
git clone https://github.com/Aquiles-ai/Aquiles-RAG.git
cd Aquiles-RAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# optional development install
pip install -e .
🔧 Configuration & Connection Options
Configuration is persisted at:
~/.local/share/aquiles/aquiles_config.json
Setup Wizard (recommended)
The previous manual per-flag config flow was replaced by an interactive wizard. Run:
aquiles-rag configs
The wizard prompts for everything required for either Redis or Qdrant (host, ports, TLS/gRPC options, API keys, admin user). At the end it writes aquiles_config.json to the standard location.
Manual config (advanced / CI)
If you prefer automation, generate the same JSON schema the wizard writes and place it at ~/.local/share/aquiles/aquiles_config.json before starting the server (or use the deploy pattern described below).
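In a CI job, placing a pre-generated file can be scripted. The following is a minimal sketch, assuming the JSON was produced elsewhere (for instance by running the wizard once) and that the target is the Linux path quoted above. install_config is a hypothetical helper for illustration and is not part of Aquiles-RAG.

```python
import json
import shutil
from pathlib import Path

# Default location quoted in this README (Linux); other platforms may differ.
DEFAULT_TARGET = Path.home() / ".local" / "share" / "aquiles" / "aquiles_config.json"


def install_config(source: Path, target: Path = DEFAULT_TARGET) -> Path:
    """Check that `source` parses as JSON, then copy it to `target`.

    Hypothetical helper: it does not validate the Aquiles-RAG schema,
    only that the file is well-formed JSON.
    """
    with source.open(encoding="utf-8") as fh:
        json.load(fh)  # raises json.JSONDecodeError on malformed input
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target
```

Rejecting malformed JSON before the copy means a broken file never reaches the location the server reads at startup.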
Redis connection modes (examples)
Aquiles-RAG supports multiple Redis modes:
- Local Cluster
RedisCluster(host=host, port=port, decode_responses=True)
- Standalone Local
redis.Redis(host=host, port=port, decode_responses=True)
- Remote with TLS/SSL
redis.Redis(host=host, port=port, username=username or None,
            password=password or None, ssl=True, decode_responses=True,
            ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile, ssl_ca_certs=ssl_ca_certs)
- Remote without TLS/SSL
redis.Redis(host=host, port=port, username=username or None, password=password or None, decode_responses=True)
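The standalone and remote modes above differ only in the keyword arguments passed to redis.Redis. As a rough sketch of how one function could cover them, the hypothetical helper below (build_redis_kwargs and its use_tls flag are illustrative names, not part of Aquiles-RAG) assembles those arguments without importing the redis package:

```python
from typing import Any, Dict, Optional


def build_redis_kwargs(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
    use_tls: bool = False,
    ssl_certfile: Optional[str] = None,
    ssl_keyfile: Optional[str] = None,
    ssl_ca_certs: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for redis.Redis matching the modes above."""
    kwargs: Dict[str, Any] = {"host": host, "port": port, "decode_responses": True}
    # Credentials are only added for remote modes; empty strings become None,
    # mirroring the `username or None` idiom in the examples.
    if username or password:
        kwargs["username"] = username or None
        kwargs["password"] = password or None
    if use_tls:
        kwargs.update(
            ssl=True,
            ssl_certfile=ssl_certfile,
            ssl_keyfile=ssl_keyfile,
            ssl_ca_certs=ssl_ca_certs,
        )
    return kwargs


# Usage (requires the `redis` package):
#   import redis
#   client = redis.Redis(**build_redis_kwargs("localhost", 6379))
```

Cluster mode uses a different class (RedisCluster), so it is left out of this sketch.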
📖 Usage
CLI
- Interactive Setup Wizard (recommended):
aquiles-rag configs
- Serve the API:
aquiles-rag serve --host "0.0.0.0" --port 5500
- Deploy with bootstrap script (pattern: deploy_*.py with run() that calls gen_configs_file()):
# Redis example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 4 deploy_redis.py
# Qdrant example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 4 deploy_qdrant.py
The deploy command imports the given Python file, executes its run() to generate the config (writes aquiles_config.json), then starts the FastAPI server.
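To make the "import the file, then call run()" step concrete, here is a minimal sketch built on the standard library's importlib. It illustrates the general mechanism only; run_bootstrap is a hypothetical name and this is not the project's actual implementation of deploy.

```python
import importlib.util
from pathlib import Path


def run_bootstrap(script_path: str) -> None:
    """Import a Python file by path and call its top-level run()."""
    path = Path(script_path).resolve()
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    run = getattr(module, "run", None)
    if not callable(run):
        raise AttributeError(f"{path.name} does not define a callable run()")
    run()  # in the deploy pattern, this is where the config file gets written
```

Because the file is executed as ordinary Python, anything at module level in deploy_*.py runs before run() is called.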
REST API — common examples
- Create Index
curl -X POST http://localhost:5500/create/index \
-H "X-API-Key: YOUR_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"indexname": "documents",
"embeddings_dim": 768,
"dtype": "FLOAT32",
"delete_the_index_if_it_exists": false
}'
- Insert Chunk (ingest)
curl -X POST http://localhost:5500/rag/create \
-H "X-API-Key: YOUR_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"index": "documents",
"name_chunk": "doc1_part1",
"dtype": "FLOAT32",
"chunk_size": 1024,
"raw_text": "Text of the chunk...",
"embeddings": [0.12, 0.34, 0.56, ...]
}'
- Query Top-K
curl -X POST http://localhost:5500/rag/query-rag \
-H "X-API-Key: YOUR_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"index": "documents",
"embeddings": [0.78, 0.90, ...],
"dtype": "FLOAT32",
"top_k": 5,
"cosine_distance_threshold": 0.6
}'
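The same requests can be issued from Python without the SDK. The sketch below uses only the standard library and builds the Create Index call from the first curl example. build_request is a hypothetical helper; the URL, header, and payload fields are copied from the examples above.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the X-API-Key header."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


create_index = build_request(
    "http://localhost:5500",
    "/create/index",
    "YOUR_API_KEY",
    {
        "indexname": "documents",
        "embeddings_dim": 768,
        "dtype": "FLOAT32",
        "delete_the_index_if_it_exists": False,
    },
)

# To actually send it against a running server:
#   with urllib.request.urlopen(create_index) as resp:
#       print(resp.read().decode("utf-8"))
```

The /rag/create and /rag/query-rag calls follow the same shape with their respective paths and payloads.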
Python Client
Sync client
from aquiles.client import AquilesRAG
client = AquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")
# Create an index (returns server text)
resp_text = client.create_index("documents", embeddings_dim=768, dtype="FLOAT32")
# Insert chunks using your embedding function
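# embedding_model, full_text and query_embedding below are placeholders
# for your own model object, document text and query vector.
def get_embedding(text):
    return embedding_model.encode(text)
responses = client.send_rag(
    embedding_func=get_embedding,
    index="documents",
    name_chunk="doc1",
    raw_text=full_text,
    embedding_model="text-embedding-v1"  # optional metadata sent with each chunk
)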
# Query the index (returns parsed JSON)
results = client.query("documents", query_embedding, top_k=5)
print(results)
Async client
import asyncio
from aquiles.client import AsyncAquilesRAG
client = AsyncAquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")
async def main():
    await client.create_index("documents_async")
    responses = await client.send_rag(
        embedding_func=async_embedding_func,  # supports sync or async callables
        index="documents_async",
        name_chunk="doc_async",
        raw_text=full_text
    )
    results = await client.query("documents_async", query_embedding)
    print(results)
asyncio.run(main())
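The comment above says embedding_func may be a sync or an async callable. One common way to support both, combined with concurrent uploads, is sketched below. This is an illustration under assumptions: embed, upload_all, and the upload callback are hypothetical names, not the client's internals.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, List, Union

EmbeddingFunc = Callable[[str], Union[List[float], Awaitable[List[float]]]]


async def embed(func: EmbeddingFunc, text: str) -> List[float]:
    """Call `func` and await the result only if it is awaitable."""
    result = func(text)
    if inspect.isawaitable(result):
        result = await result
    return list(result)


async def upload_all(
    chunks: List[str],
    embedding_func: EmbeddingFunc,
    upload: Callable[[str, List[float]], Awaitable[Any]],
) -> List[Any]:
    """Embed and upload every chunk concurrently; results keep chunk order."""

    async def one(chunk: str) -> Any:
        vector = await embed(embedding_func, chunk)
        return await upload(chunk, vector)

    return await asyncio.gather(*(one(c) for c in chunks))
```

asyncio.gather returns results in the order the awaitables were passed, so responses line up with the input chunks even when uploads finish out of order.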
Notes
- Both clients accept an optional embedding_model parameter forwarded as metadata, which is helpful when storing or querying embeddings produced by different models.
- send_rag chunks text using chunk_text_by_words() (default ~600 words / ≈1024 tokens) and uploads each chunk (concurrently in the async client).
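Word-based chunking of the kind mentioned in the notes can be sketched in a few lines. The function below is a stand-in written for illustration, not the library's chunk_text_by_words(). It assumes plain whitespace splitting with no overlap between chunks, and the real function may behave differently.

```python
from typing import List


def chunk_by_words(text: str, words_per_chunk: int = 600) -> List[str]:
    """Split `text` into consecutive chunks of at most `words_per_chunk` words.

    Illustrative only: whitespace is normalised to single spaces and
    chunks do not overlap.
    """
    if words_per_chunk <= 0:
        raise ValueError("words_per_chunk must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
```

With the default of 600 words, a 1,300-word document would yield three chunks: two full ones and a 100-word remainder.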
UI Playground
Open the web UI (protected) at:
http://localhost:5500/ui
Use it to:
- Run the Setup Wizard link (if available) or inspect live configs
- Test /create/index, /rag/create, /rag/query-rag
- Access protected Swagger UI & ReDoc after logging in
🏗 Architecture
- Clients (HTTP/HTTPS, Python SDK, or UI Playground) make asynchronous HTTP requests.
- FastAPI Server — orchestration and business logic; validates requests and translates them to vector store operations.
- Vector Store — either Redis (HASH + HNSW/COSINE search) or Qdrant (collections + vector search).
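For intuition about what a cosine top-k query computes, the sketch below does an exhaustive scan in pure Python. It is not how Redis or Qdrant work internally (HNSW is an approximate index that avoids scanning every vector), and treating the threshold as an upper bound on cosine distance is an assumption made here for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = 5,
    max_distance: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Rank stored vectors by distance to `query`, nearest first."""
    scored = sorted(
        ((name, cosine_distance(query, vec)) for name, vec in stored.items()),
        key=lambda pair: pair[1],
    )
    if max_distance is not None:
        # Assumption: results farther than the threshold are dropped.
        scored = [pair for pair in scored if pair[1] <= max_distance]
    return scored[:k]
```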
⚠️ Backend differences & notes
- Metrics / /status/ram: Redis offers INFO memory and memory_stats(); for Qdrant the same Redis-specific metrics are not available (the endpoint will return a short message explaining this).
- Dtype handling: Server validates dtype for Redis (converts embeddings to the requested NumPy dtype). Qdrant accepts float arrays directly, so dtype is informational/compatibility metadata.
- gRPC: Qdrant can be used over HTTP or gRPC (prefer_grpc=true in the config). Ensure your environment allows gRPC outbound/inbound as needed.
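The dtype step for Redis amounts to validating the requested type name and packing the floats into a fixed-width binary buffer. The sketch below shows that idea with the standard library's array module as a stand-in for NumPy. to_vector_bytes and the set of accepted names are assumptions for illustration: only FLOAT32 appears in this README, and FLOAT64 is listed because Redis vector fields also support it.

```python
from array import array
from typing import Sequence

# Assumed mapping from dtype names to array typecodes (illustrative).
_TYPECODES = {"FLOAT32": "f", "FLOAT64": "d"}


def to_vector_bytes(embedding: Sequence[float], dtype: str = "FLOAT32") -> bytes:
    """Validate `dtype` and pack `embedding` into raw bytes of that width."""
    try:
        code = _TYPECODES[dtype.upper()]
    except KeyError:
        raise ValueError(f"unsupported dtype: {dtype!r}") from None
    return array(code, embedding).tobytes()
```

A 768-dimensional FLOAT32 embedding packs to 768 × 4 = 3072 bytes, which is why the dimension and dtype declared at index creation must match what is inserted.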
🔎 Test Suite
See the test/ directory for automated tests:
- client tests for the Python SDK
- API tests for endpoint behavior
- test_deploy.py for deployment / bootstrap validation
📄 License