Aquiles-RAG is a high-performance Retrieval-Augmented Generation (RAG) solution built on Redis or Qdrant. It exposes a high-level REST API through FastAPI.

Aquiles-RAG

High-performance Retrieval-Augmented Generation (RAG) on Redis or Qdrant
🚀 FastAPI • Redis / Qdrant • Async • Embedding-agnostic

📖 Documentation

📑 Table of Contents

  1. Features
  2. Tech Stack
  3. Requirements
  4. Installation
  5. Configuration & Connection Options
  6. Usage
  7. Architecture
  8. License

⭐ Features

  • 📈 High Performance: Vector search powered by Redis HNSW or Qdrant.
  • 🛠️ Simple API: Endpoints for index creation, insertion, and querying.
  • 🔌 Embedding-agnostic: Works with any embedding model (OpenAI, Llama 3, HuggingFace, etc.).
  • 💻 Interactive Setup Wizard: aquiles-rag configs walks you through full configuration for Redis or Qdrant.
  • Sync & Async clients: AquilesRAG (requests) and AsyncAquilesRAG (httpx) with embedding_model metadata support.
  • 🧩 Extensible: Designed to integrate into ML pipelines, microservices, or serverless deployments.

🛠 Tech Stack

⚙️ Requirements

  1. Redis (standalone or cluster) — or Qdrant (HTTP / gRPC).
  2. Python 3.9+
  3. pip

Optional: run Redis locally with Docker:

docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest

🚀 Installation

Via PyPI (recommended)

pip install aquiles-rag

From Source (optional)

git clone https://github.com/Aquiles-ai/Aquiles-RAG.git
cd Aquiles-RAG

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# optional development install
pip install -e .

🔧 Configuration & Connection Options

Configuration is persisted at:

~/.local/share/aquiles/aquiles_config.json

Setup Wizard (recommended)

The previous manual per-flag config flow was replaced by an interactive wizard. Run:

aquiles-rag configs

The wizard prompts for everything required for either Redis or Qdrant (host, ports, TLS/gRPC options, API keys, admin user). At the end it writes aquiles_config.json to the standard location.

Manual config (advanced / CI)

If you prefer automation, generate the same JSON schema the wizard writes and place it at ~/.local/share/aquiles/aquiles_config.json before starting the server (or use the deploy pattern described below).
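
For CI, the file can be produced by a short script. The sketch below only shows the mechanics of placing a JSON document at the standard path. The helper name `write_config` is hypothetical, and the sketch does not define any keys: copy the schema from a file the wizard has written rather than guessing it.

```python
import json
from pathlib import Path

# Standard location named in this README.
DEFAULT_CONFIG_PATH = Path.home() / ".local" / "share" / "aquiles" / "aquiles_config.json"


def write_config(config: dict, path: Path = DEFAULT_CONFIG_PATH) -> Path:
    """Write `config` as JSON to `path`, creating parent directories if needed.

    Hypothetical helper: it does not validate `config` against the wizard's schema.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary sibling first so a crash never leaves a half-written file.
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    tmp_path.replace(path)
    return path
```

Calling `write_config(my_config)` before `aquiles-rag serve` would place the file where the server expects it, assuming `my_config` matches what the wizard produces.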

Redis connection modes (examples)

Aquiles-RAG supports multiple Redis modes:

  1. Local Cluster
RedisCluster(host=host, port=port, decode_responses=True)
  2. Standalone Local
redis.Redis(host=host, port=port, decode_responses=True)
  3. Remote with TLS/SSL
redis.Redis(host=host, port=port, username=username or None,
            password=password or None, ssl=True, decode_responses=True,
            ssl_certfile=ssl_certfile, ssl_keyfile=ssl_keyfile, ssl_ca_certs=ssl_ca_certs)
  4. Remote without TLS/SSL
redis.Redis(host=host, port=port, username=username or None, password=password or None, decode_responses=True)
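
The three `redis.Redis` modes differ only in which keyword arguments are passed. A minimal sketch of that selection is shown below; the function name `redis_connection_kwargs` and the `use_tls` flag are assumptions for illustration and may not match how Aquiles-RAG picks a mode internally.

```python
from typing import Any, Dict, Optional


def redis_connection_kwargs(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
    use_tls: bool = False,
    ssl_certfile: Optional[str] = None,
    ssl_keyfile: Optional[str] = None,
    ssl_ca_certs: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for redis.Redis(...) covering the modes listed above."""
    kwargs: Dict[str, Any] = {"host": host, "port": port, "decode_responses": True}
    # Credentials are added only when provided; empty strings become None,
    # mirroring the `username or None` idiom in the examples above.
    if username or password:
        kwargs["username"] = username or None
        kwargs["password"] = password or None
    # TLS adds the ssl flag plus the certificate paths.
    if use_tls:
        kwargs.update(
            ssl=True,
            ssl_certfile=ssl_certfile,
            ssl_keyfile=ssl_keyfile,
            ssl_ca_certs=ssl_ca_certs,
        )
    return kwargs
```

With redis-py installed, `redis.Redis(**redis_connection_kwargs("localhost", 6379))` would give the standalone local mode, and passing `use_tls=True` with certificate paths would give the remote TLS mode.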

📖 Usage

CLI

  • Interactive Setup Wizard (recommended):
aquiles-rag configs
  • Serve the API:
aquiles-rag serve --host "0.0.0.0" --port 5500
  • Deploy with bootstrap script (pattern: deploy_*.py with run() that calls gen_configs_file()):
# Redis example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 2 deploy_redis.py

# Qdrant example
aquiles-rag deploy --host "0.0.0.0" --port 5500 --workers 2 deploy_qdrant.py

The deploy command imports the given Python file, executes its run() to generate the config (writes aquiles_config.json), then starts the FastAPI server.
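
The import-then-`run()` step described above can be sketched with the standard library. This is an illustration of the general technique, not the actual implementation of `aquiles-rag deploy`; the function name `load_and_run` is hypothetical.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_and_run(script_path: str) -> ModuleType:
    """Import a Python file by path and call its run() function."""
    path = Path(script_path).resolve()
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top-level code, which defines run().
    spec.loader.exec_module(module)
    run = getattr(module, "run", None)
    if not callable(run):
        raise AttributeError(f"{path.name} must define a callable run()")
    # In the deploy pattern, run() is where the config file gets generated.
    run()
    return module
```

Loading by file path rather than by module name means the bootstrap script does not need to be on `sys.path`.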

REST API — common examples

  1. Create Index
curl -X POST http://localhost:5500/create/index \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "indexname": "documents",
    "embeddings_dim": 768,
    "dtype": "FLOAT32",
    "delete_the_index_if_it_exists": false
  }'
  2. Insert Chunk (ingest)
curl -X POST http://localhost:5500/rag/create \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "name_chunk": "doc1_part1",
    "dtype": "FLOAT32",
    "chunk_size": 1024,
    "raw_text": "Text of the chunk...",
    "embeddings": [0.12, 0.34, 0.56, ...]
  }'
  3. Query Top-K
curl -X POST http://localhost:5500/rag/query-rag \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "embeddings": [0.78, 0.90, ...],
    "dtype": "FLOAT32",
    "top_k": 5,
    "cosine_distance_threshold": 0.6
  }'
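
The same query can be issued from Python without the SDK. The sketch below builds the request with the standard library only; the endpoint, header, and field names are taken from the curl example above, while the function name `build_query_request` and its default values are assumptions for illustration.

```python
import json
import urllib.request
from typing import Sequence


def build_query_request(
    base_url: str,
    api_key: str,
    index: str,
    embeddings: Sequence[float],
    top_k: int = 5,
    cosine_distance_threshold: float = 0.6,
    dtype: str = "FLOAT32",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /rag/query-rag."""
    payload = {
        "index": index,
        "embeddings": list(embeddings),
        "dtype": dtype,
        "top_k": top_k,
        "cosine_distance_threshold": cosine_distance_threshold,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rag/query-rag",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send it against a running server, pass the request to `urllib.request.urlopen(...)` and decode the response body as JSON.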

Python Client

Sync client

from aquiles.client import AquilesRAG

client = AquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# Create an index (returns server text)
resp_text = client.create_index("documents", embeddings_dim=768, dtype="FLOAT32")

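# Insert chunks using your embedding function.
# `embedding_model`, `full_text`, and `query_embedding` are placeholders for
# your own model object, document text, and query vector.
def get_embedding(text):
    return embedding_model.encode(text)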

responses = client.send_rag(
    embedding_func=get_embedding,
    index="documents",
    name_chunk="doc1",
    raw_text=full_text,
    embedding_model="text-embedding-v1"  # optional metadata sent with each chunk
)

# Query the index (returns parsed JSON)
results = client.query("documents", query_embedding, top_k=5)
print(results)

Async client

import asyncio
from aquiles.client import AsyncAquilesRAG

client = AsyncAquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

async def main():
    await client.create_index("documents_async")
    responses = await client.send_rag(
        embedding_func=async_embedding_func,   # supports sync or async callables
        index="documents_async",
        name_chunk="doc_async",
        raw_text=full_text
    )
    results = await client.query("documents_async", query_embedding)
    print(results)

asyncio.run(main())
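
Accepting either a sync or an async embedding callable, and embedding chunks concurrently, can be done roughly as follows. This is a sketch of the general pattern, not the client's actual code; `call_embedding` and `embed_all` are hypothetical names.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, List, Sequence, Union

EmbeddingFunc = Callable[[str], Union[List[float], Awaitable[List[float]]]]


async def call_embedding(func: EmbeddingFunc, text: str) -> List[float]:
    """Call `func` and await the result only if it is awaitable."""
    result = func(text)
    if inspect.isawaitable(result):
        result = await result
    return result


async def embed_all(func: EmbeddingFunc, chunks: Sequence[str]) -> List[List[float]]:
    """Embed all chunks concurrently; results keep the order of `chunks`."""
    return list(await asyncio.gather(*(call_embedding(func, c) for c in chunks)))
```

Note that a slow synchronous callable still blocks the event loop in this sketch; wrapping it with `asyncio.to_thread` is one way around that.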

Notes

  • Both clients accept an optional embedding_model parameter forwarded as metadata — helpful when storing/querying embeddings produced by different models.
  • send_rag chunks text using chunk_text_by_words() (default ~600 words / ≈1024 tokens) and uploads each chunk (concurrently in the async client).
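
Word-based chunking of the kind described in the note above can be sketched in a few lines. The function below is a hypothetical stand-in named `chunk_by_words`, not the library's `chunk_text_by_words()`, whose exact behavior (overlap, whitespace handling) may differ.

```python
from typing import List


def chunk_by_words(text: str, max_words: int = 600) -> List[str]:
    """Split `text` into consecutive chunks of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    # split() with no arguments collapses any run of whitespace.
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Each returned chunk would then be embedded and uploaded as its own record.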

UI Playground

Open the web UI (protected) at:

http://localhost:5500/ui

Use it to:

  • Open the Setup Wizard link (if available) or inspect the live configuration
  • Test /create/index, /rag/create, /rag/query-rag
  • Access protected Swagger UI & ReDoc after logging in

🏗 Architecture

  1. Clients (HTTP/HTTPS, Python SDK, or UI Playground) make asynchronous HTTP requests.
  2. FastAPI Server — orchestration and business logic; validates requests and translates them to vector store operations.
  3. Vector Store — either Redis (HASH + HNSW/COSINE search) or Qdrant (collections + vector search).
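
To make the COSINE search step concrete, here is a brute-force sketch of top-k retrieval by cosine distance. Redis HNSW and Qdrant use approximate indexes rather than a full scan, and the sketch assumes that `cosine_distance_threshold` means "keep results whose distance is at most this value", which this README does not state explicitly. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    k: int = 5,
    max_distance: float = 0.6,
) -> List[Tuple[str, float]]:
    """Return up to `k` (name, distance) pairs within `max_distance`, nearest first."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in stored.items()]
    kept = [item for item in scored if item[1] <= max_distance]
    return sorted(kept, key=lambda item: item[1])[:k]
```

A full scan like this costs one distance computation per stored vector, which is the cost an HNSW index is designed to avoid.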

⚠️ Backend differences & notes

  • Metrics (/status/ram): Redis exposes INFO memory and memory_stats(). Qdrant has no equivalent of these Redis-specific metrics, so the endpoint returns a short message saying so.
  • Dtype handling: Server validates dtype for Redis (converts embeddings to the requested NumPy dtype). Qdrant accepts float arrays directly — dtype is informational/compatibility metadata.
  • gRPC: Qdrant can be used over HTTP or gRPC (prefer_grpc=true in the config). Ensure your environment allows gRPC outbound/inbound as needed.
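
The dtype step for Redis amounts to validating the requested type name and serializing the floats at that width. The sketch below illustrates the idea with the standard-library `struct` module instead of NumPy, to stay dependency-free. The function name `pack_embedding`, the little-endian layout, and the `FLOAT64` entry are assumptions; only `FLOAT32` appears in the examples in this README.

```python
import struct
from typing import Sequence

# Map dtype names to struct format codes (assumed set; only FLOAT32 is shown in this README).
_FORMAT_CODES = {"FLOAT32": "f", "FLOAT64": "d"}


def pack_embedding(values: Sequence[float], dtype: str = "FLOAT32") -> bytes:
    """Validate `dtype` and pack `values` as little-endian floats of that width."""
    try:
        code = _FORMAT_CODES[dtype.upper()]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {dtype!r}") from None
    return struct.pack(f"<{len(values)}{code}", *values)
```

A 768-dimensional FLOAT32 embedding packed this way occupies 768 × 4 = 3072 bytes, and twice that as FLOAT64.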

🔎 Test Suite

See the test/ directory for automated tests:

  • client tests for the Python SDK
  • API tests for endpoint behavior
  • test_deploy.py for deployment / bootstrap validation

📄 License

Apache License
