Aquiles-RAG is a high-performance Retrieval-Augmented Generation (RAG) solution based on Redis, Qdrant, or PostgreSQL. It offers a high-level interface through FastAPI REST APIs.

Aquiles-RAG

Self-hosted RAG infrastructure with MCP Server support
🚀 FastAPI • Redis / Qdrant / PostgreSQL • Async • Embedding-agnostic • MCP Ready

🎯 What is Aquiles-RAG?

Aquiles-RAG is a production-ready RAG (Retrieval-Augmented Generation) API server that brings high-performance vector search to your applications. Choose your backend (Redis, Qdrant, or PostgreSQL), connect your embedding model, and start building intelligent search systems in minutes.

Why Aquiles-RAG?

| Challenge | Aquiles-RAG Solution |
|---|---|
| 💸 Expensive vector databases | Use Redis, Qdrant, or PostgreSQL you already have |
| 🔒 Data leaves your infrastructure | Everything runs on your servers |
| 🔧 Complex RAG setup | Interactive wizard configures everything |
| 🐌 Slow integrations | Async clients, batch operations, optimized pipelines |
| 🚫 Vendor lock-in | Switch backends without changing code |

Key Features

  • 🔌 Backend Flexibility - Redis HNSW, Qdrant, or PostgreSQL pgvector
  • ⚡ High Performance - Async operations, batch processing, optimized search
  • 🤖 MCP Server Built-in - Native Model Context Protocol support for AI assistants
  • 🛠️ Interactive Setup - CLI wizard configures your entire stack
  • 🔄 Sync & Async Clients - Python and TypeScript/JavaScript SDKs included
  • 📊 Optional Re-ranking - Improve results with semantic re-scoring

🚀 Quick Start

Installation

pip install aquiles-rag

Interactive Setup

Configure your vector database in seconds:

aquiles-rag configs

The wizard guides you through:

  • Backend selection (Redis, Qdrant, or PostgreSQL)
  • Connection settings (host, port, credentials)
  • TLS/gRPC options
  • Optional re-ranker configuration

Start Server

aquiles-rag serve --host "0.0.0.0" --port 5500

Your First RAG Query

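from aquiles.client import AquilesRAG

client = AquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# Create an index whose dimension matches your embedding model's output size
client.create_index("documents", embeddings_dim=768, dtype="FLOAT32")

# Embedding function: your_embedding_model is a placeholder for your own model
def get_embedding(text):
    return your_embedding_model.encode(text)

# Store a document; its text is embedded with the function you pass in
client.send_rag(
    embedding_func=get_embedding,
    index="documents",
    name_chunk="intro",
    raw_text="Your document text here..."
)

# Embed the query text with the same model, then search
query_embedding = get_embedding("What is this document about?")
results = client.query("documents", query_embedding, top_k=5)
print(results)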

That's it! You now have a working RAG system.
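An index is created with a fixed `embeddings_dim`, so a vector of a different length will not fit it. The sketch below is a small client-side guard you could run before sending or querying. `check_embedding` is a hypothetical helper written for this example, not part of the Aquiles-RAG client.

```python
from typing import Sequence


def check_embedding(vector: Sequence[float], expected_dim: int) -> list[float]:
    """Return the vector as a list of floats, or raise if its length is wrong."""
    values = [float(x) for x in vector]
    if len(values) != expected_dim:
        raise ValueError(
            f"embedding has {len(values)} dimensions, index expects {expected_dim}"
        )
    return values


# Hypothetical usage: a 768-dimensional index, as in the Quick Start above.
checked = check_embedding([0.0] * 768, expected_dim=768)
print(len(checked))
```

Running the check before `send_rag` or `query` turns a dimension mismatch into a clear local error.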

🎨 Supported Backends

| Backend | Features | Best For |
|---|---|---|
| Redis | HNSW indexing, fast in-memory search | Speed-critical applications |
| Qdrant | HTTP/gRPC, collections, filters | Scalable production systems |
| PostgreSQL | pgvector extension, SQL integration | Existing Postgres infrastructure |

All backends support:

  • Vector similarity search (cosine, inner product)
  • Metadata filtering
  • Batch operations
  • Optional re-ranking
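To make the two similarity measures concrete, here is a minimal pure-Python sketch of inner product and cosine similarity. It illustrates the math only; each backend computes these internally, and the function names here are chosen for this example.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; sensitive to vector length (magnitude)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of the normalized vectors; depends only on direction."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the magnitude
print(inner_product(a, b))      # 18.0
print(cosine_similarity(a, b))  # 1.0
```

The two vectors point in the same direction, so cosine similarity is 1.0 even though the inner product grows with magnitude. If your embeddings are already unit-normalized, the two measures rank results identically.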

🤖 MCP Server Integration

Aquiles-RAG includes a built-in Model Context Protocol server for seamless AI assistant integration.

Start MCP Server

aquiles-rag mcp-serve --host "0.0.0.0" --port 5500 --transport "sse"

Example with OpenAI Agent

import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerSse


async def main():
    # Connect to the MCP server
    mcp_server = MCPServerSse({
        "url": "http://localhost:5500/sse",
        "headers": {"X-API-Key": "YOUR_API_KEY"}
    })
    await mcp_server.connect()

    try:
        # Create an agent with the RAG tools
        agent = Agent(
            name="RAG Assistant",
            instructions="You can store and query documents using the vector database.",
            mcp_servers=[mcp_server],
            model="gpt-4"
        )

        # The agent now has access to:
        # - create_index
        # - send_info (store documents)
        # - query_rag (semantic search)
        # - list_indexes
        # - delete_index

        result = await Runner.run(agent, "Store this document and find similar content")
        print(result.final_output)
    finally:
        # Close the SSE connection when done
        await mcp_server.cleanup()


asyncio.run(main())

MCP Tools Available:

  • Index management (create, list, delete)
  • Document ingestion with automatic chunking
  • Semantic search with configurable parameters
  • Metadata filtering
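The chunking strategy Aquiles-RAG applies during ingestion is not described here. As an illustration of the general idea only, the sketch below splits text into fixed-size character chunks with a small overlap. The function name, default sizes, and character-based splitting are all assumptions made for this example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of storing some text twice.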

💻 Client SDKs

Python - Async Client

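import asyncio

from aquiles.client import AsyncAquilesRAG

client = AsyncAquilesRAG(host="http://127.0.0.1:5500", api_key="YOUR_API_KEY")

# async_get_embedding (an async embedding function) and long_text (the document
# text) are placeholders that you define for your own model and data.

async def main():
    # Create index
    await client.create_index("docs", embeddings_dim=1536)

    # Store documents (parallel chunking)
    await client.send_rag(
        embedding_func=async_get_embedding,
        index="docs",
        name_chunk="document_1",
        raw_text=long_text,
        metadata={
            "author": "John Doe",
            "source": "documentation"
        }
    )

    # Embed the query with the same model, then search
    query_embedding = await async_get_embedding("What is this about?")
    results = await client.query("docs", query_embedding, top_k=5)
    print(results)

asyncio.run(main())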
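The comment "parallel chunking" above suggests that chunks are processed concurrently. How the client does this internally is not shown here, so the following is only a sketch of the general pattern: embedding several chunks at once with `asyncio.gather`, bounded by a semaphore. `fake_embed` is a stand-in for a real async embedding call, and `embed_chunks` is a hypothetical helper.

```python
import asyncio


async def fake_embed(text: str) -> list[float]:
    """Stand-in for a real async embedding call (hypothetical toy vector)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [float(len(text)), float(text.count(" "))]


async def embed_chunks(chunks: list[str], max_concurrency: int = 4) -> list[list[float]]:
    """Embed all chunks concurrently, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(chunk: str) -> list[float]:
        async with semaphore:
            return await fake_embed(chunk)

    # gather returns results in the same order as the input chunks
    return await asyncio.gather(*(embed_one(chunk) for chunk in chunks))


vectors = asyncio.run(embed_chunks(["first chunk", "second", "a b c"]))
print(vectors)  # [[11.0, 1.0], [6.0, 0.0], [5.0, 2.0]]
```

The semaphore matters when the embedding provider enforces rate limits: it caps concurrent requests without changing the order of the results.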

TypeScript/JavaScript

npm install @aquiles-ai/aquiles-rag-client

import { AsyncAquilesRAG } from '@aquiles-ai/aquiles-rag-client';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function getEmbedding(text: string): Promise<number[]> {
    const resp = await openai.embeddings.create({
        model: "text-embedding-3-small",
        input: text,
    });
    return resp.data[0].embedding;
}

const client = new AsyncAquilesRAG({
    host: 'http://127.0.0.1:5500',
    apiKey: 'your-api-key',
});

// Create index (1536 dimensions for text-embedding-3-small)
await client.createIndex('my_docs', 1536, 'FLOAT32');

// Store document
await client.sendRAG(
    getEmbedding,
    'my_docs',
    'doc_1',
    'Your document text...',
    {
        embeddingModel: 'text-embedding-3-small',
        metadata: { author: 'John Doe' }
    }
);

// Query
const queryEmb = await getEmbedding('What is this about?');
const results = await client.query('my_docs', queryEmb, { topK: 5 });
console.log(results);

🛠️ Advanced Features

Optional Re-ranking

Improve search results with semantic re-scoring:

# Enable during setup wizard
aquiles-rag configs

Re-ranking refines results after vector search by scoring (query, document) pairs for better relevance.
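As a rough illustration of that second stage, the sketch below re-orders a list of candidate documents by scoring each (query, document) pair. The scoring function is a deliberately simple word-overlap measure standing in for a real re-ranking model. Both function names are hypothetical and do not reflect how Aquiles-RAG implements re-ranking.

```python
from typing import Callable, Optional


def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words)


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_k: Optional[int] = None,
) -> list[str]:
    """Score every (query, document) pair and return documents, best first."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    ranked = [doc for _, doc in scored]
    return ranked if top_k is None else ranked[:top_k]


candidates = [
    "Redis stores data in memory",
    "Qdrant supports gRPC connections",
    "pgvector adds vector search to PostgreSQL",
]
print(rerank("vector search in PostgreSQL", candidates, top_k=2))
# ['pgvector adds vector search to PostgreSQL', 'Redis stores data in memory']
```

In practice `score_fn` would be a model that reads the query and document together, which is slower than vector search and is why it is applied only to the top-k candidates.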

Web UI Playground

Access the interactive UI:

http://localhost:5500/ui

Features:

  • Test index creation and queries
  • Inspect live configurations
  • Protected Swagger UI documentation
  • Real-time request/response monitoring

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                         Clients                              │
│  HTTP/HTTPS • Python SDK • TypeScript SDK • MCP Server       │
└──────────────────────┬───────────────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────────────┐
│                    FastAPI Server                            │
│  • Request validation                                        │
│  • Business logic orchestration                              │
│  • Optional re-ranking                                       │
└──────────────────────┬───────────────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────────────┐
│                   Vector Store                               │
│  Redis HNSW  •  Qdrant Collections  •  PostgreSQL pgvector   │
└──────────────────────────────────────────────────────────────┘

Flow:

  1. Client sends embedding + query parameters
  2. Server validates and routes to vector store
  3. Vector store returns top-k candidates
  4. Optional re-ranker refines results
  5. Formatted response returned to client
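The five steps above can be sketched end to end with an in-memory stand-in for the vector store. Everything in this example is hypothetical: the `STORE` contents, the `handle_query` function, and the response field names are chosen for illustration and are not the server's actual code or response schema.

```python
import math
from typing import Callable, Optional

# Hypothetical in-memory "vector store": chunk name -> (embedding, raw text).
STORE = {
    "intro": ([1.0, 0.0], "Aquiles-RAG overview"),
    "setup": ([0.6, 0.8], "Installation and setup"),
    "license": ([0.0, 1.0], "License terms"),
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def handle_query(
    embedding: list[float],
    top_k: int,
    reranker: Optional[Callable[[list[dict]], list[dict]]] = None,
) -> dict:
    # Steps 1-2: the client sent an embedding and parameters; validate them.
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if len(embedding) != 2:
        raise ValueError("this toy store holds 2-dimensional vectors")

    # Step 3: the vector store returns the top-k candidates by similarity.
    hits = sorted(
        (
            {"name_chunk": name, "raw_text": text, "score": cosine(embedding, vec)}
            for name, (vec, text) in STORE.items()
        ),
        key=lambda hit: hit["score"],
        reverse=True,
    )[:top_k]

    # Step 4: an optional re-ranker refines the candidate order.
    if reranker is not None:
        hits = reranker(hits)

    # Step 5: a formatted response goes back to the client.
    return {"count": len(hits), "results": hits}


response = handle_query([1.0, 0.0], top_k=2)
print([hit["name_chunk"] for hit in response["results"]])  # ['intro', 'setup']
```

Keeping the re-ranker as an optional callable mirrors the flow: when none is configured, the vector store's own ordering is returned unchanged.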

🎯 Use Cases

| Who | What |
|---|---|
| 🚀 AI Startups | Build RAG features without vendor costs |
| 👨‍💻 Developers | Prototype semantic search quickly |
| 🏢 Enterprises | Private, scalable document search |
| 🔬 Researchers | Experiment with embeddings and retrieval |

📋 Requirements

  • Python 3.9+
  • One of: Redis, Qdrant, or PostgreSQL with pgvector
  • pip or uv

Quick Redis Setup (Docker):

docker run -d --name redis-stack -p 6379:6379 redis/redis-stack-server:latest

PostgreSQL Note: Aquiles-RAG doesn't run automatic migrations. Create the pgvector extension and required tables manually before use.

🛠️ Tech Stack

  • FastAPI - High-performance async API framework
  • Redis / Qdrant / PostgreSQL - Vector storage backends
  • NumPy - Efficient array operations
  • Pydantic - Request/response validation
  • HTTPX - Async HTTP client
  • Click - CLI framework

📚 REST API Examples

Create Index

curl -X POST http://localhost:5500/create/index \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "indexname": "documents",
    "embeddings_dim": 768,
    "dtype": "FLOAT32"
  }'

Insert Document

curl -X POST http://localhost:5500/rag/create \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "name_chunk": "doc1_part1",
    "raw_text": "Document content...",
    "embeddings": [0.12, 0.34, ...]
  }'

Query

curl -X POST http://localhost:5500/rag/query-rag \
  -H "X-API-Key: YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "index": "documents",
    "embeddings": [0.78, 0.90, ...],
    "top_k": 5,
    "cosine_distance_threshold": 0.6
  }'
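The same calls can be made from Python without any SDK. The sketch below builds the query request with the standard library, reusing the endpoint path, headers, and JSON fields from the curl examples. `build_request` is a hypothetical helper, and the two-value `embeddings` list is a placeholder; a real vector must have as many values as the index's `embeddings_dim`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5500"  # same host and port as the curl examples


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an Aquiles-RAG endpoint."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


query_request = build_request(
    "/rag/query-rag",
    {
        "index": "documents",
        "embeddings": [0.78, 0.90],  # placeholder vector, see note above
        "top_k": 5,
        "cosine_distance_threshold": 0.6,
    },
    api_key="YOUR_API_KEY",
)
print(query_request.get_method(), query_request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(query_request) as response:
#     print(json.load(response))
```

The same helper works for `/create/index` and `/rag/create` by changing the path and payload to match the corresponding curl example.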

⚠️ Backend Notes

Redis:

  • Fast in-memory HNSW indexing
  • Full metrics via /status/ram
  • Supports HASH storage with COSINE search

Qdrant:

  • HTTP or gRPC connections
  • Collection-based organization
  • Limited metrics compared to Redis

PostgreSQL:

  • Requires manual pgvector setup
  • No automatic migrations
  • SQL-native filtering and joins
  • Check Postgres monitoring for metrics

📖 Documentation

🤝 Contributing

We welcome contributions! See the test suite in test/ for examples:

  • Client SDK tests
  • API endpoint tests
  • Deployment validation

📄 License

Apache License

⭐ Star this project • 🐛 Report issues

Built with ❤️ for the AI community
