
A document search library using embeddings and BM25


🚀 FindaLedge: Simple Ensemble Search for RAG 🔍

FindaLedge is a Python library for building robust, hybrid search backends for Retrieval-Augmented Generation (RAG) and LLM applications. It unifies vector and keyword search, manages document ingestion, and provides a simple, powerful API.

Build powerful RAG search backends with ease!



🇯🇵 A Japanese README is also available.


🤔 Why FindaLedge?

  • Vector (semantic) search handles paraphrases well but can miss exact terms; keyword search (BM25) matches exact terms but misses synonyms.
  • FindaLedge combines both (ensemble search) to improve retrieval accuracy, with minimal setup.
  • Handles all the plumbing: document loading, chunking, embedding, index sync, result fusion (RRF), and more!
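
The result fusion step mentioned above, Reciprocal Rank Fusion (RRF), can be sketched in a few lines. This is a generic illustration of the technique, not FindaLedge's internal code: the function name, the document IDs, and the constant `k=60` (a commonly used value) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical semantic results
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 results

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise above documents found by only one retriever, which is why RRF works without having to normalize the raw vector and BM25 scores against each other.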

✨ Features

| Feature | Description |
| --- | --- |
| 🎯 Hybrid Search | Combines vector & keyword search (BM25) with RRF fusion |
| 🔌 Flexible | Supports Chroma, FAISS, BM25s, OpenAI, Ollama, HuggingFace, etc. |
| 📚 Easy Ingestion | Add files, directories, or LangChain Documents instantly |
| 🔄 Auto Indexing | Indices are auto-created, updated, and persisted |
| 🧹 Simple API | Add, search, remove documents with one-liners |
| 🧩 LangChain Ready | Use as a Retriever in LangChain chains |
| 🧪 Full Test Suite | 100+ tests, pytest/uv compatible |

⚙️ Supported Environment

| Item | Supported |
| --- | --- |
| Python | 3.11+ (Windows/PowerShell recommended) |
| OS | Windows, macOS, Linux |
| Vector DB | Chroma, FAISS (optional) |
| Embeddings | OpenAI, Ollama, HuggingFace, etc. |
| Agents SDK | OpenAI Agents SDK |
| Test | pytest, pytest-cov, uv |

🛠️ Quick Start

1. Install (with uv & venv recommended)

# Create and activate a virtual environment
python -m venv .venv
.venv\Scripts\Activate.ps1  # Windows PowerShell

# Install uv (if not already installed)
pip install uv

# Install dependencies
uv pip install -r requirements.txt
# or: uv pip install .

2. Set Environment Variables (optional)

$env:OPENAI_API_KEY="sk-..."  # For OpenAI
$env:FINDALEDGE_EMBEDDING_MODEL_NAME="text-embedding-3-small"
$env:FINDALEDGE_PERSIST_DIR="./my_data"
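
To check from Python which of these variables are visible to the process, something like the sketch below may help. The `LedgeSettings` class and `load_settings` function are hypothetical helpers written for this example, and the fallback values are placeholders copied from the snippet above, not documented library defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgeSettings:
    """Hypothetical container for the variables shown in step 2."""
    embedding_model_name: str
    persist_dir: str
    has_openai_key: bool


def load_settings(env=None):
    """Read the FindaLedge-related variables from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    return LedgeSettings(
        # Fallbacks are placeholders for this sketch, not library defaults.
        embedding_model_name=env.get(
            "FINDALEDGE_EMBEDDING_MODEL_NAME", "text-embedding-3-small"
        ),
        persist_dir=env.get("FINDALEDGE_PERSIST_DIR", "./my_data"),
        # Only record whether a key is present; never print the key itself.
        has_openai_key=bool(env.get("OPENAI_API_KEY")),
    )


# Pass a plain dict to try it out without touching the real environment.
settings = load_settings({"FINDALEDGE_PERSIST_DIR": "./index"})
print(settings)
```

Calling `load_settings()` with no argument reads the real process environment instead.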

3. Basic Usage

from findaledge import FindaLedge

ledge = FindaLedge()
ledge.add_document("docs/manual.txt")
results = ledge.search("What is the main topic?")
for r in results:
    print(r.document.page_content, r.score)

4. Run Tests

.venv\Scripts\Activate.ps1
uv pip install -r requirements.txt
pytest

🏗️ Architecture (Layered)

| Layer | Class | Responsibility |
| --- | --- | --- |
| Controller | FindaLedge | Unified API, orchestrates all below |
| UseCase | Finder | Hybrid search, RRF fusion |
| Gateway | ChromaDocumentStore, BM25sStore | Vector/BM25 storage |
| Function | DocumentLoader, DocumentSplitter, EmbeddingModelFactory | Loading, splitting, embedding |
| Data | LangchainDocument, SearchResult | Data objects |
| Utility | Tokenizer, config/env | Tokenize, config |

@startuml
FindaLedge --> Finder
FindaLedge --> DocumentLoader
FindaLedge --> DocumentSplitter
FindaLedge --> EmbeddingModelFactory
FindaLedge --> ChromaDocumentStore
FindaLedge --> BM25sStore
Finder --> SearchResult
@enduml
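
The diagram shows one controller owning both a vector store and a BM25 store. One way such a controller could keep the two indices in sync is to fan every write out to all stores, as in the toy sketch below. `DocumentStore`, `InMemoryStore`, and `ToyController` are hypothetical names invented for this illustration; the real classes and method signatures may differ.

```python
from typing import Protocol


class DocumentStore(Protocol):
    """Hypothetical gateway interface; the real store classes may differ."""

    def add(self, doc_id: str, text: str) -> None: ...
    def remove(self, doc_id: str) -> None: ...
    def ids(self) -> set[str]: ...


class InMemoryStore:
    """Toy stand-in for a vector or keyword index."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def remove(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)

    def ids(self) -> set[str]:
        return set(self._docs)


class ToyController:
    """Fans every write out to all stores so the indices stay in sync."""

    def __init__(self, stores: list[DocumentStore]) -> None:
        self._stores = stores

    def add_document(self, doc_id: str, text: str) -> None:
        for store in self._stores:
            store.add(doc_id, text)

    def remove_document(self, doc_id: str) -> None:
        for store in self._stores:
            store.remove(doc_id)


vector_store, keyword_store = InMemoryStore(), InMemoryStore()
controller = ToyController([vector_store, keyword_store])
controller.add_document("manual", "How to install the device")
controller.add_document("faq", "Frequently asked questions")
controller.remove_document("faq")
print(vector_store.ids() == keyword_store.ids())
```

Because callers only talk to the controller, neither index can be updated without the other, which is the property hybrid search relies on when fusing results by document ID.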

🧑‍💻 Main API (Class Table)

| Class | Role | Key Methods |
| --- | --- | --- |
| FindaLedge | Facade/Controller | add_document, search, remove_document, get_context |
| Finder | UseCase (Hybrid) | search (RRF), find |
| ChromaDocumentStore | Gateway | add_documents, as_retriever |
| BM25sStore | Gateway | add_documents, as_retriever |
| EmbeddingModelFactory | Factory | create_embeddings |
| DocumentLoader | Loader | load_file, load_from_directory |
| DocumentSplitter | Splitter | split_documents |
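
To give a feel for what a splitting step does before embedding and indexing, here is a minimal sketch of fixed-size chunking with overlap. It is not the `DocumentSplitter` implementation: the function name `split_text`, the character-based windows, and the default sizes are all assumptions for illustration.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_text(sample, chunk_size=10, overlap=3)
print(chunks)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of indexing some text twice.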

📖 Documentation

🧪 Testing

  • All tests pass (pytest/uv, Windows PowerShell)
  • Run: pytest
  • Coverage: pytest-cov enabled

🤝 Contributing

Contributions welcome! Fork, branch, PR, and let's build better RAG search together 🚀

📜 License

MIT License. See LICENSE.
