A high-performance static internet index for LLM RAG applications

llmindex

🔍 Local semantic search for LLM applications

A lightweight Python library for searching a pre-trained FAISS index locally. Returns URLs and text content ready for your LLM's context window.

Installation

pip install -e .

Quick Start

from llmindex import LLMIndex
import json

# Load the pre-trained index
index = LLMIndex(model_dir="./models")

# Search
results_json = index.search("machine learning algorithms", top_k=5)

# Parse results
results = json.loads(results_json)

for item in results:
    print(f"URL: {item['url']}")
    print(f"Content: {item['content']}\n")

Architecture

flowchart LR
    A[User Query] --> B[Encode with SentenceTransformer]
    B --> C[PCA: 384 → 64 dims]
    C --> D[PackBits to 8‑byte binary]
    D --> E[FAISS Binary Index Search]
    E --> F[Get FAISS Indices]
    F --> G["Lookup → (dataset_id, row_id)"]
    G --> H[Fetch from HuggingFace Datasets]

    subgraph DataSources
        W[[Wikipedia]]
        X[[FineWeb]]
        W --> H
        X --> H
    end

    H --> I[Return Results + Optional Rerank]
    I --> J[JSON Output]


Detailed Architecture Flow

1. Query Encoding

  • Input: User search query string
  • Component: SentenceTransformer (all-MiniLM-L6-v2)
  • Output: 384-dimensional dense embedding vector
  • Device: Auto-detected (CUDA/GPU or CPU)
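
The device auto-detection described above can be sketched with PyTorch's availability check. This is an assumption about the internals (the library's exact code may differ), and `detect_device` is a hypothetical helper name:

```python
import torch

def detect_device() -> str:
    # Prefer CUDA when a GPU is visible, otherwise fall back to CPU.
    return "cuda" if torch.cuda.is_available() else "cpu"

print(detect_device())
```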

2. PCA Compression

  • Input: 384-dim embedding
  • Component: Pre-trained PCA model
  • Process: Project to 64 dimensions
  • Output: 64-dim normalized float vector in [0, 1] range
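
A minimal sketch of this step with scikit-learn, assuming a PCA model fitted offline and simple min-max normalization to [0, 1]; the random embeddings stand in for real SentenceTransformer output:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in corpus embeddings: 1000 vectors of 384 dims.
embeddings = rng.normal(size=(1000, 384)).astype(np.float32)

# Fit a 384 -> 64 PCA (the library ships a pre-trained one).
pca = PCA(n_components=64).fit(embeddings)

# Project one query embedding and min-max normalize to [0, 1].
query_vec = pca.transform(embeddings[:1])[0]
normalized = (query_vec - query_vec.min()) / (query_vec.max() - query_vec.min())

print(normalized.shape)  # (64,)
```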

3. Binary Quantization

  • Input: 64-dim float vector
  • Process: PackBits thresholding (value > 0 → 1, else 0)
  • Output: 8-byte binary vector (64 bits)
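
The thresholding and packing described above map directly onto NumPy's `packbits`; a small sketch with a random stand-in for the PCA output:

```python
import numpy as np

rng = np.random.default_rng(1)

# 64-dim float vector (stand-in for the PCA output).
vec = rng.normal(size=64).astype(np.float32)

# Threshold: value > 0 -> bit 1, else 0.
bits = (vec > 0).astype(np.uint8)

# Pack 64 bits into 8 bytes.
packed = np.packbits(bits)
print(packed.shape)  # (8,)
```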

4. FAISS Binary Index Search

  • Component: FAISS binary index
  • Process: k-NN search on binary vectors
  • Output: Top-K indices with distances
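
A FAISS binary index ranks codes by Hamming distance (popcount of XOR). The NumPy sketch below illustrates the same search over toy 8-byte codes; it is an illustration of the distance metric, not FAISS itself:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy binary index: 10,000 packed 8-byte codes.
db = rng.integers(0, 256, size=(10_000, 8), dtype=np.uint8)
query = db[42]  # identical to row 42, so row 42 should rank first

# Hamming distance = popcount of XOR, summed over the 8 bytes.
distances = np.unpackbits(db ^ query, axis=1).sum(axis=1)

# Top-K indices by smallest distance.
top_k = 5
indices = np.argsort(distances)[:top_k]
print(indices[0])  # 42 (exact match, distance 0)
```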

5. Mapping Lookup

  • Component: Mapping pickle file
  • Format: List of (dataset_id, row_id) tuples
  • Purpose: Links FAISS indices to original dataset rows

6. Context Fetch (Parallel)

  • Component: HuggingFace Datasets Server API
  • Datasets:
    • Wikipedia (wikimedia/wikipedia, config: 20231101.en)
    • FineWeb (HuggingFaceFW/fineweb, config: CC-MAIN-2025-26)
  • Process: Fetch url, text, date, source, dump
  • Implementation: Up to 16 concurrent HTTP requests
  • Note: This step is parallelized and transparent to users
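
The concurrency pattern can be sketched with a `ThreadPoolExecutor` capped at 16 workers. Here `fetch_row` is a stub standing in for the HTTP call to the Datasets Server (the real response carries url, text, date, source, and dump fields):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_row(pair):
    # Stub for the HTTP fetch; returns a minimal record.
    dataset_id, row_id = pair
    return {"source": dataset_id, "row": row_id}

pairs = [("wikipedia", i) for i in range(10)] + [("fineweb", i) for i in range(6)]

# Up to 16 concurrent requests; executor.map preserves input order.
with ThreadPoolExecutor(max_workers=16) as pool:
    rows = list(pool.map(fetch_row, pairs))

print(len(rows))  # 16
```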

7. Optional Reranking

  • When enabled: Fetch additional candidates, re-encode with full embeddings, compute cosine similarity, return best results
  • Benefit: Improves relevance over pure binary search
  • Cost: Additional encoding time
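
The cosine-similarity rerank can be sketched in NumPy. The random vectors below stand in for full SentenceTransformer embeddings of the query and the extra fetched candidates, with one near-duplicate planted so the rerank has a clear winner:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in full embeddings for a query and 20 candidates.
query = rng.normal(size=384)
candidates = rng.normal(size=(20, 384))
candidates[7] = query * 2.0  # plant a scaled copy of the query

def normalize(x):
    # L2-normalize along the last axis.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine similarity = dot product of unit vectors.
sims = normalize(candidates) @ normalize(query)

# Keep the best top_k candidates by similarity.
top_k = 5
order = np.argsort(-sims)[:top_k]
print(order[0])  # 7: the planted duplicate wins
```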

8. Final Output

  • Returns: JSON array with result objects containing URL, content, source, date, dump

API

LLMIndex(model_dir="./models", device=None)

Initialize the search index.

Parameters:

  • model_dir - Directory containing the pre-trained models
  • device - 'cuda' or 'cpu' (auto-detected if None)

index.search(query, top_k=5) → str

Search and return results as JSON.

Parameters:

  • query - Search query string
  • top_k - Number of results (default: 5)

Returns: JSON string with results

[
  {"url": "https://example.com/page1", "content": "..."},
  {"url": "https://example.com/page2", "content": "..."}
]

Getting the Models

The pre-trained models are required to use this library. You have two options:

Option 1: Train Locally (see train.py)

python train.py --target-docs 100000000 --save-dir ./models/

Option 2: Download from HuggingFace

Coming soon - pre-trained models will be available on HuggingFace Hub.

Use Cases

RAG with LLMs

from llmindex import LLMIndex
from openai import OpenAI
import json

index = LLMIndex()
query = "What are transformers?"

# Get context
results_json = index.search(query, top_k=3)
results = json.loads(results_json)
context = "\n".join([r["content"] for r in results])

# Send to LLM
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {query}"
    }]
)

Local Search Engine

from llmindex import LLMIndex
import json

index = LLMIndex()

while True:
    query = input("Search: ")
    results_json = index.search(query, top_k=10)
    results = json.loads(results_json)
    
    for i, item in enumerate(results, 1):
        print(f"{i}. {item['url']}")

Requirements

  • Python 3.8+
  • PyTorch
  • FAISS
  • Sentence Transformers
  • NumPy, scikit-learn

See requirements.txt for exact versions.

Performance

  • Search latency: ~50ms (GPU) to 200ms (CPU) per query
  • Memory: ~4GB for 100M document index
  • Disk: ~2GB for all model files

License

MIT


Download files

Download the file for your platform.

Source Distribution

llmsearchindex-1.0.0.tar.gz (6.9 kB)


Built Distribution

llmsearchindex-1.0.0-py3-none-any.whl (7.5 kB)


File details

Details for the file llmsearchindex-1.0.0.tar.gz.

File metadata

  • Download URL: llmsearchindex-1.0.0.tar.gz
  • Upload date:
  • Size: 6.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llmsearchindex-1.0.0.tar.gz

  • SHA256: 08edc26f23a19e8cfb39619dbcfa34f7920c17b6d3079003c6b44a3c4f4da01d
  • MD5: 49301e23027b771e04d8950cd82ac2e4
  • BLAKE2b-256: cb922623a01e18d3e0a159489cd03f330f972b769b26dceb8380745bc5ef7c56


File details

Details for the file llmsearchindex-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: llmsearchindex-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llmsearchindex-1.0.0-py3-none-any.whl

  • SHA256: ca3f6f02b9a835058993c787a10360dc5031514d51b745e9f3cf86b1856c43a9
  • MD5: 01525c1e246f5f3cbdd8522634122070
  • BLAKE2b-256: 5b862b7d693bc540975114f8eb75d6abefb9b55416912f907f808b41fabea29e

