A high-performance static internet index for LLM RAG applications
llmindex
🔍 Local semantic search for LLM applications
A lightweight Python library for searching a pre-trained FAISS index locally. Returns URLs and text content ready for your LLM's context window.
Installation
```bash
pip install -e .
```
Quick Start
```python
from llmindex import LLMIndex
import json

# Load the pre-trained index
index = LLMIndex(model_dir="./models")

# Search
results_json = index.search("machine learning algorithms", top_k=5)

# Parse results
results = json.loads(results_json)
for item in results:
    print(f"URL: {item['url']}")
    print(f"Content: {item['content']}\n")
```
Architecture
```mermaid
flowchart LR
    A[User Query] --> B[Encode with SentenceTransformer]
    B --> C["PCA: 384 → 64 dims"]
    C --> D["PackBits to 8-byte binary"]
    D --> E[FAISS Binary Index Search]
    E --> F[Get FAISS Indices]
    F --> G["Lookup → (dataset_id, row_id)"]
    G --> H[Fetch from HuggingFace Datasets]
    subgraph DataSources
        W[[Wikipedia]]
        X[[FineWeb]]
    end
    W --> H
    X --> H
    H --> I["Return Results + Optional Rerank"]
    I --> J[JSON Output]
```
Detailed Architecture Flow
1. Query Encoding
- Input: User search query string
- Component: SentenceTransformer (`all-MiniLM-L6-v2`)
- Output: 384-dimensional dense embedding vector
- Device: Auto-detected (CUDA/GPU or CPU)
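A minimal sketch of this step, assuming the standard `sentence-transformers` API rather than llmindex's internal loading code:

```python
# Sketch of step 1 (illustrative): encode a query with sentence-transformers.
import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

# encode() returns a (384,) float32 numpy array for a single string
embedding = model.encode("machine learning algorithms")
print(embedding.shape)  # (384,)
```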
2. PCA Compression
- Input: 384-dim embedding
- Component: Pre-trained PCA model
- Process: Project to 64 dimensions
- Output: 64-dim normalized float vector in [0, 1] range
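A sketch of the projection, assuming a scikit-learn PCA fitted offline by train.py; the artifact format in ./models is not documented here, and the exact scaling into [0, 1] is an assumption beyond the stated output range:

```python
# Sketch of step 2 (illustrative): project 384-dim embeddings to 64 dims.
import numpy as np
from sklearn.decomposition import PCA

# Offline: fit on a sample of corpus embeddings (placeholder data here)
corpus = np.random.randn(10_000, 384).astype(np.float32)
pca = PCA(n_components=64).fit(corpus)

# Online: project the query embedding, then scale into [0, 1]
query_vec = np.random.randn(384).astype(np.float32)  # stands in for step 1's output
reduced = pca.transform(query_vec.reshape(1, -1))[0]
normalized = (reduced - reduced.min()) / (reduced.max() - reduced.min() + 1e-9)
```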
3. Binary Quantization
- Input: 64-dim float vector
- Process: PackBits thresholding (value > 0 → 1, else 0)
- Output: 8-byte binary vector (64 bits)
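The quantization is a one-liner with NumPy; a sketch using the threshold rule stated above:

```python
# Sketch of step 3 (illustrative): threshold and pack 64 floats into 8 bytes.
import numpy as np

vec64 = np.random.randn(64).astype(np.float32)  # stands in for the PCA output
bits = (vec64 > 0).astype(np.uint8)  # value > 0 -> 1, else 0
packed = np.packbits(bits)           # shape (8,), dtype uint8: the 64-bit code
```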
4. FAISS Binary Index Search
- Component: FAISS binary index
- Process: k-NN search on binary vectors
- Output: Top-K indices with distances
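A toy version of the search, assuming FAISS's `IndexBinaryFlat`; llmindex presumably loads a prebuilt index from ./models rather than building one on the fly:

```python
# Sketch of step 4 (illustrative): Hamming-distance k-NN on binary codes.
import faiss
import numpy as np

d = 64  # code size in bits
index = faiss.IndexBinaryFlat(d)

# Database codes: uint8 rows of d // 8 bytes each
db = np.random.randint(0, 256, size=(1000, d // 8), dtype=np.uint8)
index.add(db)

query = np.random.randint(0, 256, size=(1, d // 8), dtype=np.uint8)
distances, ids = index.search(query, k=5)  # Hamming distances and row ids
```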
5. Mapping Lookup
- Component: Mapping pickle file
- Format: List of `(dataset_id, row_id)` tuples
- Purpose: Links FAISS indices to original dataset rows
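A sketch of the lookup; the file name `mapping.pkl` is an assumption, as the README only specifies the pickled list-of-tuples format:

```python
# Sketch of step 5 (illustrative): map FAISS row ids back to dataset rows.
import pickle

with open("./models/mapping.pkl", "rb") as f:  # hypothetical file name
    mapping = pickle.load(f)  # e.g. [("wikipedia", 12345), ("fineweb", 678), ...]

faiss_ids = [42, 7, 913]  # ids returned by the binary index search
for dataset_id, row_id in (mapping[i] for i in faiss_ids):
    print(dataset_id, row_id)
```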
6. Context Fetch (Parallel)
- Component: HuggingFace Datasets Server API
- Datasets:
  - Wikipedia (`wikimedia/wikipedia`, config: `20231101.en`)
  - FineWeb (`HuggingFaceFW/fineweb`, config: `CC-MAIN-2025-26`)
- Process: Fetch `url`, `text`, `date`, `source`, `dump`
- Implementation: Up to 16 concurrent HTTP requests
- Note: This step is parallelized and transparent to users
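A sketch of the parallel fetch against the public Datasets Server `/rows` endpoint; the request code is illustrative rather than llmindex's implementation, and the `train` split name is an assumption:

```python
# Sketch of step 6 (illustrative): fetch rows concurrently from the
# HuggingFace Datasets Server.
from concurrent.futures import ThreadPoolExecutor
import requests

API = "https://datasets-server.huggingface.co/rows"

def fetch_row(dataset: str, config: str, row_id: int) -> dict:
    params = {"dataset": dataset, "config": config,
              "split": "train",  # assumed split name
              "offset": row_id, "length": 1}
    resp = requests.get(API, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["rows"][0]["row"]

hits = [("wikimedia/wikipedia", "20231101.en", 12345),
        ("HuggingFaceFW/fineweb", "CC-MAIN-2025-26", 678)]

with ThreadPoolExecutor(max_workers=16) as pool:  # up to 16 concurrent requests
    rows = list(pool.map(lambda h: fetch_row(*h), hits))
```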
7. Optional Reranking
- When enabled: Fetch additional candidates, re-encode with full embeddings, compute cosine similarity, return best results
- Benefit: Improves relevance over pure binary search
- Cost: Additional encoding time
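A sketch of the rerank, assuming cosine similarity over full 384-dim embeddings as described above; llmindex's internals may differ:

```python
# Sketch of step 7 (illustrative): rerank fetched candidates by cosine
# similarity between full query and document embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "machine learning algorithms"
candidates = ["first candidate text ...", "second ...", "third ..."]

q = model.encode(query, normalize_embeddings=True)
docs = model.encode(candidates, normalize_embeddings=True)

scores = docs @ q              # cosine similarity for unit vectors
order = np.argsort(-scores)    # best match first
reranked = [candidates[i] for i in order]
```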
8. Final Output
- Returns: JSON array with result objects containing URL, content, source, date, dump
API
LLMIndex(model_dir="./models", device=None)
Initialize the search index.
Parameters:
- `model_dir` - Directory containing the pre-trained models
- `device` - `'cuda'` or `'cpu'` (auto-detected if None)
index.search(query, top_k=5) → str
Search and return results as JSON.
Parameters:
- `query` - Search query string
- `top_k` - Number of results (default: 5)
Returns: JSON string with results:

```json
[
  {"url": "https://example.com/page1", "content": "..."},
  {"url": "https://example.com/page2", "content": "..."}
]
```
Getting the Models
The pre-trained models are required to use this library. You have two options:
Option 1: Train Locally (see train.py)
```bash
python train.py --target-docs 100000000 --save-dir ./models/
```
Option 2: Download from HuggingFace
Coming soon - pre-trained models will be available on HuggingFace Hub.
Use Cases
RAG with LLMs
```python
from llmindex import LLMIndex
from openai import OpenAI
import json

index = LLMIndex()
query = "What are transformers?"

# Get context
results_json = index.search(query, top_k=3)
results = json.loads(results_json)
context = "\n".join([r["content"] for r in results])

# Send to LLM
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {query}"
    }]
)
print(response.choices[0].message.content)
```
Local Search Engine
```python
from llmindex import LLMIndex
import json

index = LLMIndex()
while True:
    query = input("Search: ")
    results_json = index.search(query, top_k=10)
    results = json.loads(results_json)
    for i, item in enumerate(results, 1):
        print(f"{i}. {item['url']}")
```
Requirements
- Python 3.8+
- PyTorch
- FAISS
- Sentence Transformers
- NumPy, scikit-learn
See requirements.txt for exact versions.
Performance
- Search latency: ~50ms (GPU) to 200ms (CPU) per query
- Memory: ~4GB for 100M document index
- Disk: ~2GB for all model files
License
MIT
Download files
File details
Details for the file llmsearchindex-1.0.0.tar.gz.
File metadata
- Download URL: llmsearchindex-1.0.0.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `08edc26f23a19e8cfb39619dbcfa34f7920c17b6d3079003c6b44a3c4f4da01d` |
| MD5 | `49301e23027b771e04d8950cd82ac2e4` |
| BLAKE2b-256 | `cb922623a01e18d3e0a159489cd03f330f972b769b26dceb8380745bc5ef7c56` |
File details
Details for the file llmsearchindex-1.0.0-py3-none-any.whl.
File metadata
- Download URL: llmsearchindex-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `ca3f6f02b9a835058993c787a10360dc5031514d51b745e9f3cf86b1856c43a9` |
| MD5 | `01525c1e246f5f3cbdd8522634122070` |
| BLAKE2b-256 | `5b862b7d693bc540975114f8eb75d6abefb9b55416912f907f808b41fabea29e` |