A production-ready, plug-and-play Python SDK for building intelligent RAG systems

Lexora Agentic RAG SDK

Production-ready Agentic RAG SDK with minimal configuration


🚀 What is Lexora?

Lexora is a production-ready Agentic RAG (Retrieval-Augmented Generation) SDK that makes it easy to build intelligent applications with semantic search and AI-powered reasoning. With just a few lines of code, you can:

  • 📚 Create and manage document corpora
  • 🔍 Perform semantic search across your documents
  • 🤖 Build AI agents that reason over your data
  • 🛠️ Extend functionality with custom tools
  • 🎯 Deploy to production with confidence

✨ Key Features

  • Zero-Config Setup: Get started in minutes with sensible defaults
  • Multiple Vector Databases: Support for FAISS, Pinecone, and Chroma
  • Flexible Embeddings: OpenAI, HuggingFace, Gemini, or custom providers
  • Flexible LLM Integration: Works with any LLM via LiteLLM
  • Built-in RAG Tools: 10+ pre-built tools for document management
  • Custom Tool Support: Easily add your own tools
  • Production-Ready: Comprehensive error handling, logging, and testing
  • Type-Safe: Full type hints and Pydantic validation
  • Cost-Effective: Free embedding options available

📦 Installation

Prerequisites

  • Python 3.8 or higher
  • pip or conda package manager

Install from Source

# Clone the repository
git clone https://github.com/yourusername/lexora.git
cd lexora

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

Install from PyPI

pip install lexora

🎯 Quick Start

Basic Usage

import asyncio

from lexora import RAGAgent


async def main():
    # Initialize the agent with defaults
    agent = RAGAgent()

    # Create a document corpus
    await agent.tool_registry.get_tool("create_corpus").run(
        corpus_name="my_docs",
        description="My document collection"
    )

    # Add documents
    documents = [
        {"content": "Python is a programming language.", "metadata": {"topic": "python"}},
        {"content": "Machine learning is a subset of AI.", "metadata": {"topic": "ml"}}
    ]

    await agent.tool_registry.get_tool("add_data").run(
        corpus_name="my_docs",
        documents=documents
    )

    # Query your documents
    result = await agent.tool_registry.get_tool("rag_query").run(
        corpus_name="my_docs",
        query="What is Python?",
        top_k=5
    )

    print(result.data["results"])


# Tool calls are coroutines, so they must run inside an event loop
asyncio.run(main())

Using the Agent for Reasoning

# Ask questions and get AI-powered answers
# (await needs an async context, e.g. a coroutine started with asyncio.run)
response = await agent.query("Explain machine learning in simple terms")

print(f"Answer: {response.answer}")
print(f"Confidence: {response.confidence}")
print(f"Sources: {len(response.sources)}")

📖 Documentation

Table of Contents

  1. Installation Guide
  2. Configuration
  3. Core Concepts
  4. RAG Tools
  5. Custom Tools
  6. Vector Databases
  7. LLM Integration
  8. Error Handling
  9. Best Practices
  10. API Reference

⚙️ Configuration

Lexora supports multiple configuration methods:

1. Default Configuration (Easiest)

from lexora import RAGAgent

# Uses mock LLM and FAISS vector database
agent = RAGAgent()

2. YAML Configuration

# config.yaml
llm:
  provider: "openai"
  model: "gpt-4"
  api_key: "${OPENAI_API_KEY}"
  temperature: 0.7

vector_db:
  provider: "faiss"
  embedding_model: "text-embedding-ada-002"
  dimension: 1536
  connection_params:
    storage_path: "./vector_storage"

agent:
  max_iterations: 5
  enable_reasoning: true
  log_level: "INFO"

# Load the YAML file from Python
from lexora import RAGAgent

agent = RAGAgent.from_yaml("config.yaml")
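
The YAML above references the API key as "${OPENAI_API_KEY}". The README does not say how Lexora resolves such placeholders, so the following is only a minimal sketch of one common approach, not the SDK's implementation. The helper name expand_env_placeholders is hypothetical, and it raises an error when a referenced variable is missing so that a key is never silently left empty.

```python
import os
import re

# Matches ${NAME} placeholders such as ${OPENAI_API_KEY}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(value, env=None):
    """Recursively replace ${NAME} in strings with values from env (hypothetical helper)."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        def _substitute(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"Environment variable {name!r} is not set")
            return env[name]
        return _PLACEHOLDER.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: expand_env_placeholders(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env_placeholders(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged


# A dict shaped like the llm section of config.yaml (already parsed)
config = {"llm": {"provider": "openai", "api_key": "${OPENAI_API_KEY}", "temperature": 0.7}}
resolved = expand_env_placeholders(config, env={"OPENAI_API_KEY": "sk-example"})
print(resolved["llm"]["api_key"])  # sk-example
```

Passing env explicitly, as done here, keeps the example independent of the real process environment; omit it to read from os.environ.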

3. Environment Variables

# .env file
LEXORA_LLM_PROVIDER=openai
LEXORA_LLM_MODEL=gpt-4
LEXORA_LLM_API_KEY=your-api-key
LEXORA_VECTORDB_PROVIDER=faiss
LEXORA_VECTORDB_EMBEDDING_MODEL=text-embedding-ada-002

# Load the settings from the environment
from lexora import RAGAgent

agent = RAGAgent.from_env()
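
The variable names above follow a LEXORA_<SECTION>_<FIELD> pattern. As an illustration of how such names can be grouped into a nested configuration, here is a small sketch. The function config_from_env is hypothetical and the real RAGAgent.from_env() may parse the environment differently.

```python
import os


def config_from_env(env=None, prefix="LEXORA_"):
    """Group PREFIX_SECTION_FIELD variables into {section: {field: value}} (hypothetical helper)."""
    env = os.environ if env is None else env
    config = {}
    for key, value in env.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated variables such as PATH
        # Split "LLM_PROVIDER" into section "LLM" and field "PROVIDER"
        section, _, field = key[len(prefix):].partition("_")
        if not field:
            continue  # skip names that have no field part
        config.setdefault(section.lower(), {})[field.lower()] = value
    return config


example_env = {
    "LEXORA_LLM_PROVIDER": "openai",
    "LEXORA_LLM_MODEL": "gpt-4",
    "LEXORA_VECTORDB_EMBEDDING_MODEL": "text-embedding-ada-002",
    "PATH": "/usr/bin",
}
settings = config_from_env(example_env)
print(settings["llm"])       # {'provider': 'openai', 'model': 'gpt-4'}
print(settings["vectordb"])  # {'embedding_model': 'text-embedding-ada-002'}
```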

4. Programmatic Configuration

from lexora import RAGAgent, LLMConfig, VectorDBConfig, AgentConfig

agent = RAGAgent(
    llm_config=LLMConfig(
        provider="openai",
        model="gpt-4",
        api_key="your-api-key"
    ),
    vector_db_config=VectorDBConfig(
        provider="faiss",
        embedding_model="text-embedding-ada-002",
        dimension=1536
    ),
    agent_config=AgentConfig(
        max_iterations=5,
        enable_reasoning=True
    )
)

🎨 Embedding Options

Important: You are NOT limited to OpenAI embeddings! Lexora supports multiple embedding providers.

Available Options

Provider     Cost       Quality  Privacy   Best For
HuggingFace  Free       High     ✅ Local  Production (recommended)
OpenAI       Paid       Highest  ❌ Cloud  Enterprise
Gemini       Free tier  High     ❌ Cloud  Gemini users
Mock         Free       Low      ✅ Local  Testing

Quick Examples

1. Free Local Embeddings (Recommended)

# Install sentence-transformers
# pip install sentence-transformers

from lexora import RAGAgent
from lexora.models.config import VectorDBConfig

agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="faiss",
        dimension=384,  # all-MiniLM-L6-v2 dimension
        connection_params={
            "index_type": "Flat",
            "persist_directory": "./vector_db"
        }
    )
)

2. OpenAI Embeddings

from lexora import RAGAgent
from lexora.models.config import VectorDBConfig

agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="faiss",
        dimension=1536,  # OpenAI dimension
        connection_params={
            "embedding_model": "text-embedding-ada-002",
            "openai_api_key": "your-key"
        }
    )
)

3. Custom Embedding Provider

from typing import List

from lexora.utils.embeddings import BaseEmbeddingProvider
from sentence_transformers import SentenceTransformer


class HuggingFaceProvider(BaseEmbeddingProvider):
    """Embedding provider backed by a local sentence-transformers model."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    async def generate_embedding(self, text: str) -> List[float]:
        # encode() is synchronous, so it blocks the event loop while it runs
        return self.model.encode(text).tolist()

    def get_dimension(self) -> int:
        return self.model.get_sentence_embedding_dimension()
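
For tests that should not download a model, the same two-method shape can be filled with a deterministic toy embedding. The sketch below is an assumption-laden stand-in: HashEmbeddingProvider is a hypothetical name, it does not inherit from BaseEmbeddingProvider so that it runs without Lexora installed, and its vectors carry no semantic meaning.

```python
import asyncio
import hashlib
import math
from typing import List


class HashEmbeddingProvider:
    """Deterministic toy provider for tests (hypothetical, not part of Lexora)."""

    def __init__(self, dimension: int = 8):
        if dimension <= 0:
            raise ValueError("dimension must be positive")
        self._dimension = dimension

    async def generate_embedding(self, text: str) -> List[float]:
        values: List[float] = []
        counter = 0
        # Hash the text repeatedly until enough bytes are available
        while len(values) < self._dimension:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            values.extend(byte / 255.0 - 0.5 for byte in digest)
            counter += 1
        values = values[: self._dimension]
        # Scale to unit length so cosine similarity reduces to a dot product
        norm = math.sqrt(sum(v * v for v in values))
        return [v / norm for v in values]

    def get_dimension(self) -> int:
        return self._dimension


provider = HashEmbeddingProvider(dimension=8)
vector = asyncio.run(provider.generate_embedding("hello"))
print(len(vector) == provider.get_dimension())  # True
```

In real use the class would subclass BaseEmbeddingProvider, as in the HuggingFace example above.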

📚 Full Embedding Guide - Detailed documentation on all embedding options


🧩 Core Concepts

Document Corpus

A corpus is a collection of documents that can be searched semantically.

# Create a corpus
await agent.tool_registry.get_tool("create_corpus").run(
    corpus_name="knowledge_base",
    description="Company knowledge base",
    metadata={"department": "engineering"}
)

Documents

Documents are the basic unit of information in Lexora.

document = {
    "content": "Your document text here",
    "metadata": {
        "source": "documentation",
        "author": "John Doe",
        "date": "2024-01-01"
    }
}

Semantic Search

Search documents by meaning, not just keywords.

results = await agent.tool_registry.get_tool("rag_query").run(
    corpus_name="knowledge_base",
    query="How do I deploy to production?",
    top_k=5,
    min_score=0.7
)
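
To make the roles of top_k and min_score concrete, here is a self-contained sketch of similarity ranking over a handful of vectors. It assumes cosine similarity as the score; the README does not state which metric Lexora or its vector backends use, and the search function and sample vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, indexed, top_k=5, min_score=0.0):
    """Return up to top_k (doc_id, score) pairs with score >= min_score, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in indexed.items()]
    kept = [item for item in scored if item[0] >= min_score]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [(doc_id, score) for score, doc_id in kept[:top_k]]


# Toy 3-dimensional "embeddings" keyed by document id
index = {
    "deploy_guide": [0.9, 0.1, 0.0],
    "api_reference": [0.1, 0.9, 0.0],
    "release_notes": [0.7, 0.7, 0.0],
}
hits = search([1.0, 0.0, 0.0], index, top_k=2, min_score=0.7)
print([doc_id for doc_id, _ in hits])  # ['deploy_guide', 'release_notes']
```

Here min_score drops api_reference (score about 0.11) before top_k is applied, so raising min_score can return fewer than top_k results.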

🛠️ RAG Tools

Lexora comes with 10+ built-in tools:

Core Tools

Tool             Description
create_corpus    Create a new document corpus
add_data         Add documents to a corpus
rag_query        Search documents semantically
list_corpora     List all available corpora
get_corpus_info  Get detailed corpus information
delete_corpus    Delete a corpus
delete_document  Delete a specific document
update_document  Update an existing document
bulk_add_data    Add large batches of documents
health_check     Check system health

Tool Usage Examples

See examples/ directory for detailed examples of each tool.
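
When loading a large document set, it can help to split it into fixed-size batches before handing each batch to a tool such as add_data or bulk_add_data. The helper below is a generic sketch; the name batched and the batch size of 100 are assumptions, and the parameters that bulk_add_data actually accepts are not documented here.

```python
from typing import Dict, Iterator, List


def batched(documents: List[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Yield consecutive slices of documents, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# 250 documents in the same shape used elsewhere in this README
documents = [{"content": f"Document {i}", "metadata": {"index": i}} for i in range(250)]
batch_sizes = [len(batch) for batch in batched(documents, batch_size=100)]
print(batch_sizes)  # [100, 100, 50]
```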


🔧 Custom Tools

Extend Lexora with your own tools:

from lexora import BaseTool, ToolParameter

class WeatherTool(BaseTool):
    @property
    def name(self) -> str:
        return "get_weather"
    
    @property
    def description(self) -> str:
        return "Get current weather for a location"
    
    @property
    def version(self) -> str:
        return "1.0.0"
    
    def _setup_parameters(self) -> None:
        self._parameters = [
            ToolParameter(
                name="location",
                type="string",
                description="City name",
                required=True
            )
        ]
    
    async def _execute(self, location: str, **kwargs):
        # Your implementation here
        return {"temperature": 72, "condition": "sunny"}

# Register the tool
agent.add_tool(WeatherTool())
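
The parameter declaration in _setup_parameters suggests that arguments can be checked before _execute runs. How BaseTool performs that check is not shown in this README, so the following is only a sketch of the idea using a stand-in dataclass. ParamSpec, validate_arguments, and the type-name mapping are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ParamSpec:
    """Stand-in for a declared tool parameter (hypothetical, not lexora.ToolParameter)."""
    name: str
    type: str
    required: bool = False


# Assumed mapping from declared type names to Python types
_PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_arguments(specs: List[ParamSpec], arguments: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the call is valid."""
    problems = []
    for spec in specs:
        if spec.name not in arguments:
            if spec.required:
                problems.append(f"missing required parameter: {spec.name}")
            continue
        expected = _PYTHON_TYPES.get(spec.type)
        if expected is not None and not isinstance(arguments[spec.name], expected):
            problems.append(f"parameter {spec.name} should be of type {spec.type}")
    return problems


specs = [ParamSpec(name="location", type="string", required=True)]
print(validate_arguments(specs, {}))                     # ['missing required parameter: location']
print(validate_arguments(specs, {"location": "Paris"}))  # []
```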

💾 Vector Databases

FAISS (Default)

from lexora import RAGAgent, VectorDBConfig

agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="faiss",
        embedding_model="text-embedding-ada-002",
        dimension=1536,
        connection_params={"storage_path": "./faiss_storage"}
    )
)

Pinecone

agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="pinecone",
        embedding_model="text-embedding-ada-002",
        dimension=1536,
        connection_params={
            "api_key": "your-pinecone-key",
            "environment": "us-west1-gcp"
        }
    )
)

Chroma

agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="chroma",
        embedding_model="text-embedding-ada-002",
        dimension=1536,
        connection_params={"persist_directory": "./chroma_storage"}
    )
)

🤖 LLM Integration

Lexora uses LiteLLM for universal LLM support:

OpenAI

from lexora import LLMConfig

llm_config = LLMConfig(
    provider="openai",
    model="gpt-4",
    api_key="your-api-key",
    temperature=0.7
)

Anthropic Claude

llm_config = LLMConfig(
    provider="anthropic",
    model="claude-3-opus-20240229",
    api_key="your-api-key"
)

Azure OpenAI

llm_config = LLMConfig(
    provider="azure",
    model="gpt-4",
    api_key="your-api-key",
    api_base="https://your-resource.openai.azure.com/"
)

🚨 Error Handling

Lexora provides structured error handling:

result = await agent.tool_registry.get_tool("rag_query").run(
    corpus_name="nonexistent",
    query="test"
)

if result.status == "error":
    print(f"Error: {result.error}")
    # Error includes context and suggestions

All errors include:

  • Error code
  • Descriptive message
  • Context information
  • Helpful suggestions
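
The four parts listed above can be pictured as a small structured object attached to the result. The dataclasses below are stand-ins written for illustration: only result.status and result.error appear in this README, while the field names code, message, context, and suggestions, and the error code CORPUS_NOT_FOUND, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolError:
    """Hypothetical shape of a structured error."""
    code: str
    message: str
    context: Dict[str, Any] = field(default_factory=dict)
    suggestions: List[str] = field(default_factory=list)


@dataclass
class ToolResult:
    """Hypothetical shape of a tool result."""
    status: str
    data: Optional[Dict[str, Any]] = None
    error: Optional[ToolError] = None


def describe(result: ToolResult) -> str:
    """Format an error result for logs; return 'ok' for anything else."""
    if result.status != "error" or result.error is None:
        return "ok"
    err = result.error
    lines = [f"[{err.code}] {err.message}"]
    lines += [f"  context: {key}={value}" for key, value in err.context.items()]
    lines += [f"  try: {hint}" for hint in err.suggestions]
    return "\n".join(lines)


failed = ToolResult(
    status="error",
    error=ToolError(
        code="CORPUS_NOT_FOUND",  # made-up code for this example
        message="Corpus 'nonexistent' does not exist",
        context={"corpus_name": "nonexistent"},
        suggestions=["Call list_corpora to see available corpora"],
    ),
)
print(describe(failed))
```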

📚 Examples

Check out the examples/ directory for complete examples:

  • 01_quick_start.py - Basic usage
  • 02_custom_configuration.py - Configuration options
  • 03_corpus_management.py - Managing corpora
  • 04_custom_tools.py - Creating custom tools
  • rag_tools_demo.py - All RAG tools
  • rag_agent_with_real_embeddings.py - Production setup

🧪 Testing

Run the test suite:

# Run all tests
python run_tests.py

# Run specific test file
python tests/test_error_handling.py

# Run with pytest
pytest tests/ -v

📊 Performance

  • Query Speed: < 1ms for small corpora
  • Batch Processing: 12,000+ documents/second
  • Concurrent Queries: 10 queries in 5ms
  • Memory Efficient: Handles 200+ documents in batches
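
These figures depend on hardware, corpus size, and the embedding and vector database providers in use, none of which are stated here. To measure concurrent query latency in your own setup, a timing harness along the following lines may help. It is a sketch in which fake_query is a placeholder coroutine standing in for a real rag_query call.

```python
import asyncio
import time


async def fake_query(query: str) -> str:
    """Placeholder for a real query; sleeps to simulate I/O latency."""
    await asyncio.sleep(0.01)
    return f"result for {query}"


async def time_concurrent(queries):
    """Run all queries concurrently and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_query(q) for q in queries))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(time_concurrent([f"q{i}" for i in range(10)]))
print(f"{len(results)} queries in {elapsed * 1000:.1f} ms")
```

Because the coroutines overlap, the total time should be close to the latency of the slowest single query and well below the sum of all ten.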

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🗺️ Roadmap

  • PyPI package distribution
  • Additional vector database support
  • Streaming responses
  • Multi-modal support (images, audio)
  • Advanced caching strategies
  • Distributed deployment support

Made with ❤️ by the Lexora Team
