
A document search library using embeddings and BM25

Project description

🚀 FindaLedge: Simple Ensemble Search for RAG 🔍

FindaLedge is a Python library for building robust, hybrid search backends for Retrieval-Augmented Generation (RAG) and LLM applications. It unifies vector and keyword search, manages document ingestion, and provides a simple, powerful API.

Build powerful RAG search backends with ease!



🇯🇵 A Japanese version of this README is also available.


🤔 Why FindaLedge?

  • Vector search (semantic) and keyword search (BM25) each have strengths and weaknesses.
  • FindaLedge combines both (ensemble search) so that each covers the other's weaknesses, with minimal setup.
  • It handles the plumbing: document loading, chunking, embedding, index sync, and result fusion (RRF).
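Reciprocal Rank Fusion (RRF) is the fusion step mentioned above. As a rough illustration, and not FindaLedge's own code, here is a minimal standalone version. The constant `k=60` is the value commonly used for RRF and is an assumption here, since the README doesn't say what FindaLedge uses. The two ranked ID lists are hypothetical stand-ins for vector and BM25 results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document's fused score is the sum over lists of 1 / (k + rank),
    where rank is 1-based. A document missing from a list gets nothing from it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from the two retrievers (best match first)
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Because RRF only uses ranks, it needs no normalization between BM25 scores and cosine similarities, which is one reason it's a common choice for hybrid search.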

✨ Features

| Feature | Description |
|---|---|
| 🎯 Hybrid Search | Combines vector and keyword (BM25) search with RRF fusion |
| 🔌 Flexible | Supports Chroma, FAISS, BM25s, OpenAI, Ollama, HuggingFace, etc. |
| 📚 Easy Ingestion | Add files, directories, or LangChain Documents directly |
| 🔄 Auto Indexing | Indices are created, updated, and persisted automatically |
| 🧹 Simple API | Add, search, and remove documents with one-liners |
| 🧩 LangChain Ready | Use as a Retriever in LangChain chains |
| 🧪 Full Test Suite | 100+ tests, compatible with pytest and uv |
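To make the keyword half of hybrid search concrete, here is a minimal Okapi BM25 scorer. It is a from-scratch sketch for illustration, not the BM25s library's implementation. `k1=1.5` and `b=0.75` are typical defaults, the IDF uses the common "+1 inside the log" variant, and the whitespace tokenization is a simplifying assumption (FindaLedge has its own Tokenizer).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # exact-token matching only
            # Smoothed IDF; the "+1" keeps it non-negative for very common terms
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b)
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "embedding models map text to vectors".split(),
]
scores = bm25_scores("cat mat".split(), docs)
print(scores)
```

Note that "cats" does not match "cat": plain BM25 only rewards exact token overlap, which is the gap that semantic vector search fills.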

⚙️ Supported Environment

| Item | Supported |
|---|---|
| Python | 3.11+ (Windows/PowerShell recommended) |
| OS | Windows, macOS, Linux |
| Vector DB | Chroma, FAISS (optional) |
| Embeddings | OpenAI, Ollama, HuggingFace, etc. |
| Agents SDK | OpenAI Agents SDK |
| Test | pytest, pytest-cov, uv |

🛠️ Quick Start

1. Install (uv and a virtual environment recommended)

# Create and activate venv
python -m venv .venv
.venv\Scripts\Activate.ps1  # Windows PowerShell
# source .venv/bin/activate  # macOS/Linux

# Install uv (if not already installed)
pip install uv

# Install dependencies
uv pip install -r requirements.txt
# or: uv pip install .

2. Set Environment Variables (optional)

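# PowerShell syntax
$env:OPENAI_API_KEY="sk-..."  # API key for OpenAI embeddings
$env:FINDALEDGE_EMBEDDING_MODEL_NAME="text-embedding-3-small"  # embedding model name
$env:FINDALEDGE_PERSIST_DIR="./my_data"  # directory where indices are persisted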

3. Basic Usage

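from findaledge import FindaLedge

# Create the search backend (configured via the environment variables above)
ledge = FindaLedge()
# Load, chunk, and index a file
ledge.add_document("docs/manual.txt")
# Hybrid (vector + BM25) search, fused with RRF
results = ledge.search("What is the main topic?")
for r in results:
    print(r.document.page_content, r.score)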

4. Run Tests

.venv\Scripts\Activate.ps1
uv pip install -r requirements.txt
pytest

🏗️ Architecture (Layered)

| Layer | Class | Responsibility |
|---|---|---|
| Controller | FindaLedge | Unified API; orchestrates the layers below |
| UseCase | Finder | Hybrid search, RRF fusion |
| Gateway | ChromaDocumentStore, BM25sStore | Vector/BM25 storage |
| Function | DocumentLoader, DocumentSplitter, EmbeddingModelFactory | Loading, splitting, embedding |
| Data | LangchainDocument, SearchResult | Data objects |
| Utility | Tokenizer, config/env | Tokenization, configuration |
@startuml
FindaLedge --> Finder
FindaLedge --> DocumentLoader
FindaLedge --> DocumentSplitter
FindaLedge --> EmbeddingModelFactory
FindaLedge --> ChromaDocumentStore
FindaLedge --> BM25sStore
Finder --> SearchResult
@enduml
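The Gateway layer hides each storage backend behind a small interface. As a hypothetical illustration of what the vector side of that layer does, the `InMemoryVectorStore` below (an invented name, not a FindaLedge class) ranks stored embeddings by cosine similarity to a query embedding. In FindaLedge this job is delegated to Chroma or FAISS, and the 3-dimensional vectors here are toy stand-ins for real embedding output.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector Gateway: stores (id, vector) pairs."""

    ids: list = field(default_factory=list)
    vectors: list = field(default_factory=list)

    def add(self, doc_id, vector):
        self.ids.append(doc_id)
        self.vectors.append(vector)

    def search(self, query_vector, top_k=3):
        # Brute-force scan; real vector DBs use approximate indexes instead
        scored = [
            (doc_id, cosine_similarity(query_vector, vec))
            for doc_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add("doc_a", [1.0, 0.0, 0.0])
store.add("doc_b", [0.7, 0.7, 0.0])
store.add("doc_c", [0.0, 0.0, 1.0])
hits = store.search([1.0, 0.1, 0.0], top_k=2)
print(hits)
```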

🧑‍💻 Main API (Class Table)

| Class | Role | Key Methods |
|---|---|---|
| FindaLedge | Facade/Controller | add_document, search, remove_document, get_context |
| Finder | UseCase (hybrid) | search (RRF), find |
| ChromaDocumentStore | Gateway | add_documents, as_retriever |
| BM25sStore | Gateway | add_documents, as_retriever |
| EmbeddingModelFactory | Factory | create_embeddings |
| DocumentLoader | Loader | load_file, load_from_directory |
| DocumentSplitter | Splitter | split_documents |
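DocumentSplitter's `split_documents` turns loaded documents into chunks before they are embedded and indexed. The README doesn't say which splitting strategy it uses, so the following is only a generic sketch of overlapping fixed-size character windows; `split_text`, `chunk_size`, and `chunk_overlap` are hypothetical names, not FindaLedge's API.

```python
def split_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no chunk is a pure suffix of the previous one
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

The overlap means text cut at a chunk boundary still appears intact in a neighboring chunk, at the cost of indexing some characters twice.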


🧪 Testing

  • All tests pass (pytest/uv, Windows PowerShell)
  • Run: pytest
  • Coverage: measured with pytest-cov (e.g. pytest --cov=findaledge)

🤝 Contributing

Contributions welcome! Fork, branch, PR, and let's build better RAG search together 🚀

📜 License

MIT License. See LICENSE.


Download files


Source Distribution

findaledge-0.1.0.tar.gz (40.8 kB)

Built Distribution

findaledge-0.1.0-py3-none-any.whl (4.3 kB)

File details

Details for the file findaledge-0.1.0.tar.gz.

File metadata

  • Download URL: findaledge-0.1.0.tar.gz
  • Size: 40.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for findaledge-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40e7e38b7009dd51e5ce8161bc65f816df7c8f856fcbd6139439a8d564229047 |
| MD5 | 9466a90864008f6715356df390c46376 |
| BLAKE2b-256 | 2835816e23e72ddb947330ede89fc1af06a4af8c691eee4013dc930d677a30ac |


File details

Details for the file findaledge-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: findaledge-0.1.0-py3-none-any.whl
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for findaledge-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41c17b33a9cb41d40afb3a53cb50d71696102800e6c0ca102cd1d4fe0a1c29d9 |
| MD5 | 1fba7fdabadfd6858e2d10bc90fd3078 |
| BLAKE2b-256 | 25eef11f4189c09f178010f96858102f156d904253818484430babf5b1971e57 |

