A high-performance static internet index for LLM RAG applications

llmsearchindex

LLMSearchIndex is a Python library for internet-scale retrieval in LLM RAG applications using a fully local search index.

We trained a search index on 203,169,792 web pages sourced from:

  • FineWeb (HuggingFace dataset)
  • Wikipedia (HuggingFace dataset)

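This index can be used as external context to improve LLM responses. Vector search runs against the local index, so no third-party search API is called at query time; the matched rows are then fetched from the HuggingFace-hosted datasets.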

Installation

pip install llmsearchindex

PyPI: https://pypi.org/project/llmsearchindex/

Example Usage:

from llmsearchindex import LLMIndex

# Initializes and downloads index
index = LLMIndex()

# Standard search (Fastest)
results = index.search("who invented sliced bread", top_k=5)

# High-precision search (Reranked)
results = index.search("who invented sliced bread", top_k=5, rerank=True)

for result in results:
    print(result.get('text'))
    print(result.get('url'))
    print("===" * 100)
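
To use the results as RAG context, the retrieved passages can be folded into a prompt. The following is a minimal sketch, not part of the library: `build_rag_prompt`, its `max_chars` parameter, and the sample data are hypothetical, and it assumes only that each result is a dict with 'text' and 'url' keys, as in the loop above.

```python
def build_rag_prompt(question, results, max_chars=2000):
    """Format retrieved passages as numbered sources ahead of the question.

    Hypothetical helper: assumes each result is a dict with 'text' and 'url'.
    """
    blocks = []
    for i, result in enumerate(results, start=1):
        text = (result.get("text") or "").strip()
        if not text:
            continue  # skip hits that carry no usable text
        url = result.get("url") or "unknown source"
        # Truncate long passages so the prompt stays within a rough budget
        blocks.append(f"[{i}] {url}\n{text[:max_chars]}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the numbered sources below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder data standing in for the list returned by index.search(...)
sample_results = [
    {"text": "Placeholder passage about sliced bread.", "url": "https://example.com/bread"},
    {"text": "", "url": "https://example.com/empty"},
]
prompt = build_rag_prompt("who invented sliced bread", sample_results)
print(prompt)
```

The resulting string can be passed to whichever LLM client the application already uses.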

System requirements

  • ~6 GB RAM
  • ~10 GB disk space
  • CPU inference supported (GPU optional)
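
A quick pre-flight check for the disk requirement can be sketched with the standard library. This is an illustration only: the 10 GB threshold is the approximate figure listed above, and checking the current directory is an assumption, since the page does not say where the index is stored.

```python
import shutil

REQUIRED_DISK_GB = 10  # approximate figure from the requirements list


def has_enough_disk(path=".", required_gb=REQUIRED_DISK_GB):
    """Return (ok, free_gb) for the filesystem that contains `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb, free_gb


# Assumption: the index is downloaded to the same filesystem as "."
ok, free_gb = has_enough_disk()
print(f"free disk: {free_gb:.1f} GB, enough for the index: {ok}")
```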

Architecture

flowchart LR
    A[User Query] --> B(Embed: Sentence Transformers all-MiniLM-L6-v2)
    B --> C(PCA: 384d → 64d)
    C --> D(Binary Quantize)
    X[HuggingFace FineWeb] --> G
    Y[HuggingFace Wikipedia] --> G
    D --> E{FAISS Index}
    E --> G(Fetch Indexed Rows from HuggingFace Server)
    G --> H{Rerank?}
    H -->|Yes| I(Cosine Similarity)
    H -->|No| J[Final Results]
    I --> J
    B --> I
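
The quantize, search, and rerank stages in the flowchart can be illustrated with a toy, standard-library-only sketch. It is not the library's implementation: the 4-dimensional vectors stand in for embeddings that have already been reduced by PCA, a brute-force Hamming scan stands in for the FAISS index, and all function names are hypothetical.

```python
import math


def binary_quantize(vec):
    """Pack the sign of each component into the bits of one integer."""
    code = 0
    for value in vec:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def cosine(u, v):
    """Cosine similarity between two float vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def search(query, docs, top_k=2, rerank=False):
    """Return document indices: coarse Hamming ranking, optional cosine rerank."""
    q_code = binary_quantize(query)
    codes = [binary_quantize(d) for d in docs]
    # Stage 1: rank by Hamming distance on the binary codes (stable sort)
    hits = sorted(range(len(docs)), key=lambda i: hamming(q_code, codes[i]))[:top_k]
    if rerank:
        # Stage 2: reorder the shortlist using the full-precision vectors
        hits.sort(key=lambda i: cosine(query, docs[i]), reverse=True)
    return hits


# Toy vectors (made up for illustration)
docs = [
    [0.9, 0.1, -0.3, 0.4],
    [0.1, 0.9, -0.1, 0.1],
    [-0.5, -0.2, 0.7, -0.1],
]
query = [0.2, 0.8, -0.2, 0.3]

print(search(query, docs))               # coarse order from binary codes
print(search(query, docs, rerank=True))  # order after cosine reranking
```

In this toy data the first two documents share the same sign pattern as the query, so the binary codes cannot separate them; the cosine step is what promotes the closer one. That is the trade-off the `rerank=True` option appears to expose: more computation for a finer ordering.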

Resources

  • Embeddings: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  • FAISS vector search: https://github.com/facebookresearch/faiss
  • Wikipedia: https://huggingface.co/datasets/wikimedia/wikipedia
  • FineWeb: https://huggingface.co/datasets/HuggingFaceFW/fineweb

License: MIT License

Download files

Source Distribution

llmsearchindex-1.0.1.tar.gz (5.2 kB)

Uploaded Source

Built Distribution

llmsearchindex-1.0.1-py3-none-any.whl (5.6 kB)

Uploaded Python 3

File details

Details for the file llmsearchindex-1.0.1.tar.gz.

File metadata

  • Download URL: llmsearchindex-1.0.1.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llmsearchindex-1.0.1.tar.gz
Algorithm Hash digest
SHA256 7bad7ba6e7d3bf6bbe66222a3bfdecc84247b400714fed747d87bcd8afd09f05
MD5 05d8b545653fac77e6427fdf292a237b
BLAKE2b-256 470de41b055e5a9679f9fb15e5eac5b66764930cfe148fdcf08c5c08c1fc72a0
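
A downloaded file can be checked against the published SHA256 digest with Python's `hashlib`. This is a minimal sketch: the helper names are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Digest copied from the table above; the path is an assumed download location
EXPECTED = "7bad7ba6e7d3bf6bbe66222a3bfdecc84247b400714fed747d87bcd8afd09f05"
SDIST = "llmsearchindex-1.0.1.tar.gz"
if os.path.exists(SDIST):
    print("hash matches:", matches_published_hash(SDIST, EXPECTED))
```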

File details

Details for the file llmsearchindex-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: llmsearchindex-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llmsearchindex-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 35c35a3f3e13db6c11b54dea9b35f55246822c5d55c5df748b26f987ac0e4450
MD5 b98c3e13a9899e364c348f5837a879d4
BLAKE2b-256 125d52f780b970a4b437dd91fb7683467fbe49a3b61b4cbb6199129d6f3656a3
