llmemory

A high-performance document memory system with vector search capabilities for Python applications.

Overview

llmemory provides intelligent document processing with:

  • Complete Document Management API - List, retrieve, search, and manage documents without direct database access
  • State-of-the-art retrieval using PostgreSQL with pgvector, hybrid BM25, multi-query expansion, and reranking
  • Multi-language support with automatic language detection and normalization
  • Hierarchical chunking & summaries with document-type specific configurations and optional auto-summaries
  • Production-ready monitoring with Prometheus metrics and searchable diagnostics
  • Reliable async Postgres foundation via pgdbm (connection pooling, migrations, schema support)

What's New

  • 🔍 Multi-query expansion - Generate semantic + keyword variants automatically and fuse results with reciprocal rank fusion
  • 🎯 Configurable reranking - Plug in OpenAI or cross-encoder rerankers (or use built-in heuristics) for higher precision on the final hit list
  • 🧭 Query routing - Automatic answerability detection routes each query to the best retrieval strategy
  • 🎨 Contextual retrieval - Anthropic-style chunk enrichment with document context for improved semantic matching
  • ⚙️ HNSW presets - Choose fast, balanced, or accurate profiles to tune pgvector index parameters and query-time ef_search
  • 📝 Chunk summaries - Capture short, metadata-aware synopses during ingestion and surface them with every search hit
  • 📈 Richer diagnostics - Search history now records query variants, latency breakdowns, rerank status, and summary usage for easy tuning
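
Reciprocal rank fusion, mentioned above for merging query variants, can be illustrated in a few lines. The following is a minimal standalone sketch, not llmemory's internal code: the function name, the sample chunk ids, and the constant k=60 (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id earns 1 / (k + rank) for every list it appears in,
    where rank starts at 1 for the top hit.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hit lists from a vector search and a keyword search.
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Ids that rank well in both lists (here c1 and c3) rise above ids that appear in only one, which is the property that makes the method useful for combining semantic and keyword results.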

Why llmemory?

Building applications with document search capabilities requires solving complex problems:

  • Vector embeddings for semantic understanding
  • Efficient chunking that preserves context
  • Hybrid search combining vectors and full-text
  • Multi-tenant isolation for SaaS applications
  • Performance optimization for large document sets

llmemory provides a production-ready solution for these challenges.

Key Features

  • 🚀 Fast Search - HNSW indexes for sub-100 ms vector searches, with multi-query expansion and optional cross-encoder reranking for harder queries
  • 🌍 Multi-language - Automatic detection and processing for 14+ languages
  • 📊 Smart Chunking - Document-type aware chunking with contextual enrichment, optional inline summaries, and hierarchical parent context
  • 🔍 Hybrid Search - Combines vector and text search with reciprocal rank fusion, query routing, and rerank scores
  • 📈 Observable - Built-in Prometheus metrics and detailed search diagnostics
  • 🏢 Multi-tenant - Owner-based isolation for SaaS applications
  • 🔌 Flexible Embeddings - Support for OpenAI and local embedding models
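
To make the HNSW tuning knobs concrete, here is a hedged sketch of how named presets could map onto pgvector settings. The index options m and ef_construction and the hnsw.ef_search setting are real pgvector parameters; the preset values, the table name chunks, and the column name embedding are assumptions for illustration and are not taken from llmemory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswPreset:
    m: int                # max connections per graph node (index build)
    ef_construction: int  # candidate list size while building the index
    ef_search: int        # candidate list size at query time


# Hypothetical values; "balanced" mirrors pgvector's documented defaults.
PRESETS = {
    "fast": HnswPreset(m=8, ef_construction=32, ef_search=20),
    "balanced": HnswPreset(m=16, ef_construction=64, ef_search=40),
    "accurate": HnswPreset(m=32, ef_construction=128, ef_search=100),
}


def index_sql(preset_name, table="chunks", column="embedding"):
    """Build the CREATE INDEX statement for a preset (assumed table/column)."""
    p = PRESETS[preset_name]
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {p.m}, ef_construction = {p.ef_construction});"
    )


def search_setting_sql(preset_name):
    """Build the per-session query-time setting for a preset."""
    return f"SET hnsw.ef_search = {PRESETS[preset_name].ef_search};"


print(index_sql("balanced"))
print(search_setting_sql("accurate"))
```

Larger values generally trade build time and query latency for recall, which is why a library might expose a few named profiles instead of raw numbers.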

Quick Start

import asyncio

from llmemory import LLMemory, DocumentType, SearchType


async def main():
    # Initialize the client and prepare the database
    memory = LLMemory(
        connection_string="postgresql://localhost/mydb",
        openai_api_key="sk-..."
    )
    await memory.initialize()

    # Add a document; the result reports chunk count and processing time
    result = await memory.add_document(
        owner_id="workspace-123",
        id_at_origin="user-456",
        document_name="project-report.pdf",
        document_type=DocumentType.REPORT,
        content="Your document content here..."
    )
    print(f"Created {result.chunks_created} chunks in {result.processing_time_ms}ms")

    # List documents with filtering
    docs = await memory.list_documents(
        owner_id="workspace-123",
        document_type=DocumentType.REPORT,
        metadata_filter={"status": "active"}
    )

    # Search and get document metadata with each hit
    results = await memory.search_with_documents(
        owner_id="workspace-123",
        query_text="project timeline",
        search_type=SearchType.HYBRID
    )
    for hit in results.results:
        print(f"Found in: {hit.document_name} - {hit.content[:100]}...")

    # Get statistics for one owner
    stats = await memory.get_statistics("workspace-123")
    print(f"Total: {stats.document_count} docs, {stats.chunk_count} chunks")


# The API is async, so run it inside an event loop
asyncio.run(main())
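
The Quick Start hardcodes the connection string and API key for brevity. A small sketch of reading them from the environment instead is shown below. The helper name load_settings and the variable names DATABASE_URL and OPENAI_API_KEY are assumptions for illustration; only the two keyword arguments come from the example above.

```python
import os


def load_settings(env=None):
    """Collect LLMemory constructor arguments from environment variables.

    The variable names used here are illustrative, not defined by llmemory.
    """
    env = os.environ if env is None else env
    required = ("DATABASE_URL", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "connection_string": env["DATABASE_URL"],
        "openai_api_key": env["OPENAI_API_KEY"],
    }


# Usage (inside your async entry point):
#   memory = LLMemory(**load_settings())
```

Failing early with the list of missing variables tends to be easier to debug than a connection error raised later during initialization.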

Installation

# Install using uv (recommended)
uv add llmemory

# Or using pip
pip install llmemory

Or with optional dependencies:

# Using uv
uv add "llmemory[monitoring]"  # For Prometheus metrics
uv add "llmemory[cache]"       # For Redis caching
uv add "llmemory[local]"       # For local embeddings
uv add "llmemory[reranker-local]"  # For local cross-encoder reranking
uv add "llmemory[bench]"       # For BEIR benchmarking harness

# Using pip
pip install "llmemory[monitoring]"  # For Prometheus metrics
pip install "llmemory[cache]"       # For Redis caching
pip install "llmemory[local]"       # For local embeddings
pip install "llmemory[reranker-local]"  # For local cross-encoder reranking
pip install "llmemory[bench]"       # For BEIR benchmarking harness

Claude Code Skills

llmemory provides expert guidance skills for Claude Code that teach Claude how to work with the library effectively. When you use Claude Code with llmemory, these skills automatically activate to provide:

  • ✅ Production-ready code examples
  • ✅ Best practices and patterns
  • ✅ Common pitfalls to avoid
  • ✅ Architecture guidance
  • ✅ Testing strategies

No manual needed - just ask Claude naturally and the right skills load automatically!

Installation

# In Claude Code terminal, add the marketplace
/plugin marketplace add juanre/ai-tools

# Install all llmemory skills (recommended)
/plugin install llmemory@juanre-ai-tools

# Or install individual skills
/plugin install llmemory-hybrid-search@juanre-ai-tools
/plugin install llmemory-rag@juanre-ai-tools

Available Skills

Skill | Description | Install
llmemory | All llmemory skills (recommended) | /plugin install llmemory@juanre-ai-tools
llmemory-basic-usage | Getting started and basic operations | /plugin install llmemory-basic-usage@juanre-ai-tools
llmemory-hybrid-search | Vector + BM25 hybrid search | /plugin install llmemory-hybrid-search@juanre-ai-tools
llmemory-multi-query | Query expansion for better results | /plugin install llmemory-multi-query@juanre-ai-tools
llmemory-multi-tenant | Multi-tenant SaaS patterns | /plugin install llmemory-multi-tenant@juanre-ai-tools
llmemory-rag | Complete RAG systems | /plugin install llmemory-rag@juanre-ai-tools

How It Works

Example: Building a RAG system

You ask:

"Build a RAG system with hybrid search for customer support docs"

What happens:

  1. Claude sees "RAG", "hybrid search"
  2. Automatically loads llmemory-hybrid-search and llmemory-rag skills
  3. Guides you through document ingestion, search setup, and retrieval
  4. Shows you how to combine vector + keyword search

Result: Complete RAG implementation with best practices built-in!

Example: Multi-tenant document search

You ask:

"Add document search with owner-based isolation for my SaaS app"

What happens:

  1. Claude sees "SaaS", "owner-based isolation"
  2. Loads llmemory-multi-tenant skill
  3. Provides expert guidance on multi-tenant patterns
  4. Shows you exactly how to implement owner isolation

Result: Production-ready multi-tenant search with proper data isolation!
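
The core idea behind owner-based isolation is that every read is scoped by an owner id, so one tenant can never see another tenant's documents. The sketch below illustrates that idea with a toy in-memory store. It is not llmemory's implementation; the class and method names are hypothetical, and a real system would enforce the same filter in its database queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    owner_id: str
    name: str
    content: str


class OwnerScopedStore:
    """Toy store in which every query must name the owner it runs for."""

    def __init__(self):
        self._docs = []

    def add(self, owner_id, name, content):
        self._docs.append(Doc(owner_id, name, content))

    def search(self, owner_id, term):
        if not owner_id:
            raise ValueError("owner_id is required for every query")
        term = term.lower()
        # The owner filter is applied unconditionally, before any matching.
        return [
            d.name
            for d in self._docs
            if d.owner_id == owner_id and term in d.content.lower()
        ]


store = OwnerScopedStore()
store.add("workspace-123", "report.pdf", "Project timeline for Q3")
store.add("workspace-999", "notes.txt", "Timeline of another tenant")
print(store.search("workspace-123", "timeline"))
```

Making owner_id a required argument rather than an optional filter is the design choice that matters: a forgotten filter then becomes an error instead of a data leak.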

Documentation

  • 📖 Installation Guide - Detailed setup instructions
  • 🚀 Quick Start - Get running in 5 minutes
  • 🎯 Usage Patterns - Standalone vs shared pool patterns
  • 📚 API Reference - Complete API documentation
  • 🔧 Integration Guide - Framework integration patterns
  • 🗄️ Migration Guide - How migrations work in each pattern
  • 📊 Monitoring Guide - Production monitoring setup
  • 💡 Examples - Working examples for common use cases
  • 🧪 bench/beir_runner.py - BEIR benchmarking harness (requires llmemory[bench])

Performance

  • Search latency: < 100ms (p95) with proper indexing
  • Throughput: 1000+ searches/second with caching
  • Document processing: Handles documents up to 1MB efficiently
  • Multi-language: Processes 14+ languages with automatic detection
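
To check a figure like the p95 search latency against your own deployment, you can compute the percentile from measured timings. The sketch below uses the nearest-rank method. The sample latencies are made up for illustration and are not llmemory benchmark results.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical search timings in milliseconds.
latencies_ms = [12.0, 15.5, 18.2, 22.1, 25.0, 31.4, 40.3, 55.9, 72.6, 140.0]
p95 = percentile(latencies_ms, 95)
within_budget = p95 < 100.0
print(f"p95 = {p95} ms, within 100 ms budget: {within_budget}")
```

With only ten samples, a single slow request determines the p95, so collect a few hundred measurements before drawing conclusions.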

Requirements

  • PostgreSQL 14+ with pgvector extension
  • Python 3.10+
  • OpenAI API key (or local embedding models)
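
The version requirements above can be checked programmatically before deploying. The helpers below are a hypothetical sketch, not part of llmemory. They assume you have already obtained the server version number (for example from SHOW server_version_num) and the installed extension names (for example from SELECT extname FROM pg_extension) through your own database client.

```python
import sys


def python_ok(version_info=sys.version_info):
    """True when the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def postgres_ok(server_version_num):
    """True for PostgreSQL 14+.

    Since PostgreSQL 10, server_version_num is major * 10000 + minor,
    so 140005 means version 14.5.
    """
    return server_version_num // 10000 >= 14


def pgvector_ok(installed_extensions):
    """True when pgvector is installed; its extension name is 'vector'."""
    return "vector" in installed_extensions


print(python_ok(), postgres_ok(140005), pgvector_ok({"plpgsql", "vector"}))
```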

License

MIT License - see LICENSE file for details.
