A document search library using embeddings and BM25
🚀 FindaLedge: Simple Ensemble Search for RAG 🔍
FindaLedge is a Python library for building robust, hybrid search backends for Retrieval-Augmented Generation (RAG) and LLM applications. It unifies vector and keyword search, manages document ingestion, and provides a simple, powerful API.
✨ Build powerful RAG search backends with ease! ✨
🇯🇵 A Japanese version of this README is also available.
🤔 Why FindaLedge?
- Vector search (semantic) and keyword search (BM25) each have strengths and weaknesses.
- FindaLedge combines both (ensemble search) to improve retrieval accuracy, with minimal setup.
- Handles all the plumbing: document loading, chunking, embedding, index sync, result fusion (RRF), and more!
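The result fusion step (RRF, Reciprocal Rank Fusion) mentioned above can be sketched in a few lines. This is a minimal illustration of the general technique, not FindaLedge's internal code: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a vector search and a keyword search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise to the top, which is why rank-based fusion works without having to normalize the two retrievers' raw scores.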
✨ Features
| Feature | Description |
|---|---|
| 🎯 Hybrid Search | Combines vector & keyword search (BM25) with RRF fusion |
| 🔌 Flexible | Supports Chroma, FAISS, BM25s, OpenAI, Ollama, HuggingFace, etc. |
| 📚 Easy Ingestion | Add files, directories, or LangChain Documents instantly |
| 🔄 Auto Indexing | Indices are auto-created, updated, and persisted |
| 🧹 Simple API | Add, search, remove documents with one-liners |
| 🧩 LangChain Ready | Use as a Retriever in LangChain chains |
| 🧪 Full Test Suite | 100+ tests, pytest/uv compatible |
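To make the keyword side of the hybrid search concrete, here is a rough pure-Python sketch of BM25 scoring. It is for illustration only: FindaLedge delegates this work to its BM25 backend, and the function name `bm25_scores`, the parameters `k1=1.5` and `b=0.75`, the IDF variant, and the toy corpus are all assumptions of this example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document in `corpus_tokens` against the query."""
    n_docs = len(corpus_tokens)
    if n_docs == 0:
        return []
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


corpus = [
    ["hybrid", "search", "for", "rag"],
    ["keyword", "search", "with", "bm25"],
    ["vector", "embeddings"],
]
scores = bm25_scores(["bm25", "search"], corpus)
print(scores)
```

The second document matches both query terms and scores highest, while the third matches none and scores zero. Exact-term matching like this complements vector search, which can miss rare identifiers or product names.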
⚙️ Supported Environment
| Item | Supported |
|---|---|
| Python | 3.11+ (Windows/PowerShell recommended) |
| OS | Windows, macOS, Linux |
| Vector DB | Chroma, FAISS (optional) |
| Embeddings | OpenAI, Ollama, HuggingFace, etc. |
| Agents SDK | OpenAI Agents SDK |
| Test | pytest, pytest-cov, uv |
🛠️ Quick Start
1. Install (with uv & venv recommended)
# Create and activate venv
python -m venv .venv
.venv\Scripts\Activate.ps1  # Windows PowerShell
# Install uv (if not already installed)
pip install uv
# Install dependencies
uv pip install -r requirements.txt
# or: uv pip install .
2. Set Environment Variables (optional)
$env:OPENAI_API_KEY="sk-..." # For OpenAI
$env:FINDALEDGE_EMBEDDING_MODEL_NAME="text-embedding-3-small"
$env:FINDALEDGE_PERSIST_DIR="./my_data"
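For readers wondering how such variables are typically consumed on the Python side, the sketch below reads them with fallbacks. It is not FindaLedge's own configuration code: the `Settings` class, the `load_settings` helper, and the fallback values are hypothetical. Only the two variable names come from this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    embedding_model_name: str
    persist_dir: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        # Fallback values below are assumptions for this sketch only.
        embedding_model_name=env.get(
            "FINDALEDGE_EMBEDDING_MODEL_NAME", "text-embedding-3-small"
        ),
        persist_dir=env.get("FINDALEDGE_PERSIST_DIR", "./findaledge_data"),
    )


settings = load_settings({"FINDALEDGE_PERSIST_DIR": "./my_data"})
print(settings)
```

Passing the mapping explicitly, rather than reading `os.environ` directly, keeps the helper easy to test without touching the real environment.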
3. Basic Usage
from findaledge import FindaLedge
ledge = FindaLedge()
ledge.add_document("docs/manual.txt")
results = ledge.search("What is the main topic?")
for r in results:
    print(r.document.page_content, r.score)
4. Run Tests
.venv\Scripts\Activate.ps1
uv pip install -r requirements.txt
pytest
🏗️ Architecture (Layered)
| Layer | Class | Responsibility |
|---|---|---|
| Controller | FindaLedge | Unified API, orchestrates all below |
| UseCase | Finder | Hybrid search, RRF fusion |
| Gateway | ChromaDocumentStore, BM25sStore | Vector/BM25 storage |
| Function | DocumentLoader, DocumentSplitter, EmbeddingModelFactory | Loading, splitting, embedding |
| Data | LangchainDocument, SearchResult | Data objects |
| Utility | Tokenizer, config/env | Tokenize, config |
@startuml
FindaLedge --> Finder
FindaLedge --> DocumentLoader
FindaLedge --> DocumentSplitter
FindaLedge --> EmbeddingModelFactory
FindaLedge --> ChromaDocumentStore
FindaLedge --> BM25sStore
Finder --> SearchResult
@enduml
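The splitting step in the Function layer can be pictured as a sliding window over the text. The sketch below is a simplified, character-based stand-in and does not reproduce `DocumentSplitter`: the function name `split_text` and the `chunk_size` and `overlap` parameters are assumptions chosen for illustration.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, overlap=2)
print(chunks)
```

Each chunk would then be embedded for the vector store and tokenized for the keyword index, so both retrievers work on the same units of text.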
🧑‍💻 Main API (Class Table)
| Class | Role | Key Methods |
|---|---|---|
| FindaLedge | Facade/Controller | add_document, search, remove_document, get_context |
| Finder | UseCase (Hybrid) | search (RRF), find |
| ChromaDocumentStore | Gateway | add_documents, as_retriever |
| BM25sStore | Gateway | add_documents, as_retriever |
| EmbeddingModelFactory | Factory | create_embeddings |
| DocumentLoader | Loader | load_file, load_from_directory |
| DocumentSplitter | Splitter | split_documents |
📖 Documentation
🧪 Testing
- All tests pass (pytest/uv, Windows PowerShell)
- Run: pytest
- Coverage: pytest-cov enabled
🤝 Contributing
Contributions welcome! Fork, branch, PR, and let's build better RAG search together 🚀
📜 License
MIT License. See LICENSE.