SAGE Benchmark - RAG and experimental benchmarks for SAGE framework
SAGE Benchmark
Comprehensive benchmarking tools and RAG examples for the SAGE framework
Overview
SAGE Benchmark provides a comprehensive suite of benchmarking tools and RAG (Retrieval-Augmented Generation) examples for evaluating SAGE framework performance. This package enables researchers and developers to:
- Benchmark RAG pipelines with multiple retrieval strategies (dense, sparse, hybrid)
- Compare vector databases (Milvus, ChromaDB, FAISS) for RAG applications
- Evaluate multimodal retrieval with text, image, and video data
- Run reproducible experiments with standardized configurations and metrics
This package is designed for both research experiments and production system evaluation.
Key Features
- Multiple RAG Implementations: Dense, sparse, hybrid, and multimodal retrieval
- Vector Database Support: Milvus, ChromaDB, FAISS integration
- Experiment Framework: Automated benchmarking with configurable experiments
- Evaluation Metrics: Comprehensive metrics for RAG performance
- Sample Data: Included test data for quick start
- Extensible Design: Easy to add new benchmarks and retrieval methods
Package Structure
sage-benchmark/
├── src/
│   └── sage/
│       └── benchmark/
│           ├── __init__.py
│           ├── benchmark_rag/               # RAG benchmarking
│           │   ├── __init__.py
│           │   ├── implementations/         # RAG implementations
│           │   │   ├── pipelines/           # RAG pipeline scripts
│           │   │   │   ├── qa_dense_retrieval_milvus.py
│           │   │   │   ├── qa_sparse_retrieval_milvus.py
│           │   │   │   ├── qa_multimodal_fusion.py
│           │   │   │   └── ...
│           │   │   └── tools/               # Supporting tools
│           │   │       ├── build_chroma_index.py
│           │   │       ├── build_milvus_dense_index.py
│           │   │       └── loaders/
│           │   ├── evaluation/              # Experiment framework
│           │   │   ├── pipeline_experiment.py
│           │   │   ├── evaluate_results.py
│           │   │   └── config/
│           │   ├── config/                  # RAG configurations
│           │   └── data/                    # Test data
│           # Future benchmarks:
│           # ├── benchmark_agent/           # Agent benchmarking
│           # └── benchmark_anns/            # ANNS benchmarking
├── tests/
├── pyproject.toml
└── README.md
Installation
Quick Start (Recommended)
Clone the repository with its submodules and set up the development environment:
# 1. Clone repository
git clone --recurse-submodules https://github.com/intellistream/sage-benchmark.git
cd sage-benchmark
# Or if already cloned, initialize submodules
./quickstart.sh
# 2. Install package with development dependencies
pip install -e ".[dev]"
# 3. Install pre-commit hooks (IMPORTANT for contributors)
pre-commit install
The quickstart.sh script will automatically:
- Initialize all Git submodules (LibAMM, SAGE-DB-Bench, sageData)
- Check environment and dependencies
- Display submodule status
Why install pre-commit? Pre-commit hooks automatically check code quality (formatting, import sorting, linting) before each commit, preventing CI/CD failures.
Manual Installation
If you prefer manual setup:
# Clone repository
git clone https://github.com/intellistream/sage-benchmark.git
cd sage-benchmark
# Initialize submodules (direct level only, not recursive)
git submodule update --init
# Install package
pip install -e .
Or with development dependencies:
pip install -e ".[dev]"
Git Submodules
This repository uses Git submodules for external components:
- benchmark_amm (src/sage/benchmark/benchmark_amm/) → LibAMM
- benchmark_anns (src/sage/benchmark/benchmark_anns/) → SAGE-DB-Bench
- sage.data (src/sage/data/) → sageData
All submodules track the main-dev branch and must be initialized before use.
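An uninitialized submodule typically leaves an empty directory behind, so a quick check before running benchmarks can catch the problem early. The sketch below is a hypothetical helper, not part of the package; it assumes it is run from the repository root and uses the submodule paths listed above.

```python
from pathlib import Path

# Submodule paths as listed in this README, relative to the repository root.
SUBMODULE_PATHS = [
    "src/sage/benchmark/benchmark_amm",
    "src/sage/benchmark/benchmark_anns",
    "src/sage/data",
]


def find_uninitialized_submodules(repo_root, paths=SUBMODULE_PATHS):
    """Return the submodule paths that are missing or empty under repo_root."""
    missing = []
    for rel in paths:
        directory = Path(repo_root) / rel
        # An uninitialized submodule is either absent or an empty directory.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = find_uninitialized_submodules(".")
    if missing:
        print("Uninitialized submodules:", ", ".join(missing))
        print("Run: git submodule update --init")
    else:
        print("All submodules look initialized.")
```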
RAG Benchmarking
The benchmark_rag module provides comprehensive RAG benchmarking capabilities:
RAG Implementations
Various RAG approaches for performance comparison:
Vector Databases:
- Milvus: Dense, sparse, and hybrid retrieval
- ChromaDB: Local vector database with simple setup
- FAISS: Efficient similarity search
Retrieval Methods:
- Dense retrieval (embeddings-based)
- Sparse retrieval (BM25, sparse vectors)
- Hybrid retrieval (combining dense + sparse)
- Multimodal fusion (text + image + video)
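Hybrid retrieval needs some rule for merging the dense and sparse rankings. This README does not say which rule the SAGE pipelines use, so the sketch below shows one common option, reciprocal rank fusion (RRF), purely as an illustration. The function name, the sample document IDs, and the constant k=60 are assumptions, not SAGE's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["d1", "d2", "d3"]   # hypothetical dense-retriever output
sparse_hits = ["d3", "d1", "d4"]  # hypothetical sparse-retriever output
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear near the top of both lists accumulate the largest scores, which is why RRF works without having to normalize the two retrievers' raw similarity values.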
Quick Start
1. Build Vector Index
First, prepare your vector index:
# Build ChromaDB index (simplest)
python -m sage.benchmark.benchmark_rag.implementations.tools.build_chroma_index
# Or build Milvus dense index
python -m sage.benchmark.benchmark_rag.implementations.tools.build_milvus_dense_index
2. Run a RAG Pipeline
Test individual RAG pipelines:
# Dense retrieval with Milvus
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_dense_retrieval_milvus
# Sparse retrieval
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_sparse_retrieval_milvus
# Hybrid retrieval (dense + sparse)
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_hybrid_retrieval_milvus
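To compare the pipelines above on wall-clock time, each module can be launched as a subprocess and timed. This is a rough sketch, not the package's experiment framework: the helper names are hypothetical, only the module paths come from this README, and it assumes the package is installed and the Milvus indexes already exist.

```python
import subprocess
import sys
import time

PIPELINE_PACKAGE = "sage.benchmark.benchmark_rag.implementations.pipelines"

# Module names taken from the commands shown in this README.
PIPELINES = [
    "qa_dense_retrieval_milvus",
    "qa_sparse_retrieval_milvus",
    "qa_hybrid_retrieval_milvus",
]


def build_command(pipeline):
    """Return the argv list equivalent to `python -m <package>.<pipeline>`."""
    return [sys.executable, "-m", f"{PIPELINE_PACKAGE}.{pipeline}"]


def time_pipeline(pipeline):
    """Run one pipeline module and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(build_command(pipeline), check=False)
    return completed.returncode, time.perf_counter() - start


def main():
    """Run every pipeline in sequence and print a one-line summary for each."""
    for name in PIPELINES:
        code, seconds = time_pipeline(name)
        print(f"{name}: exit={code} elapsed={seconds:.1f}s")
```

Calling main() runs the three pipelines one after another. Process start-up and model loading are included in the elapsed time, so the numbers are only a coarse comparison.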
3. Run Benchmark Experiments
Execute full benchmark suite:
# Run comprehensive benchmark
python -m sage.benchmark.benchmark_rag.evaluation.pipeline_experiment
# Evaluate and generate reports
python -m sage.benchmark.benchmark_rag.evaluation.evaluate_results
4. View Results
Results are saved in benchmark_results/:
- experiment_TIMESTAMP/ - Individual experiment runs
- metrics.json - Performance metrics
- comparison_report.md - Comparison report
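The metrics files can be collected programmatically for further analysis. The sketch below assumes that each experiment_TIMESTAMP/ directory contains its own metrics.json holding a JSON document. That layout and the helper name are assumptions, since this README does not spell out the nesting or the schema.

```python
import json
from pathlib import Path


def load_all_metrics(results_dir="benchmark_results"):
    """Map each run directory name to the parsed contents of its metrics.json."""
    metrics_by_run = {}
    # Assumed layout: benchmark_results/experiment_<TIMESTAMP>/metrics.json
    for metrics_file in sorted(Path(results_dir).glob("experiment_*/metrics.json")):
        with metrics_file.open(encoding="utf-8") as handle:
            metrics_by_run[metrics_file.parent.name] = json.load(handle)
    return metrics_by_run
```

The returned dictionary is keyed by run directory name, so runs can be compared side by side or passed to a plotting tool.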
Quick Start
Basic Example
from sage.benchmark.benchmark_rag.implementations.pipelines import (
    qa_dense_retrieval_milvus,
)
from sage.benchmark.benchmark_rag.config import load_config

# Load configuration
config = load_config("config_dense_milvus.yaml")

# Run RAG pipeline
results = qa_dense_retrieval_milvus.run_pipeline(query="What is SAGE?", config=config)

# View results
print(f"Retrieved {len(results)} documents")
for doc in results:
    print(f"- {doc.content[:100]}...")
Run Custom Benchmark
from sage.benchmark.benchmark_rag.evaluation import PipelineExperiment

# Define experiment configuration
experiment = PipelineExperiment(
    name="custom_rag_benchmark",
    pipelines=["dense", "sparse", "hybrid"],
    queries=["query1.txt", "query2.txt"],
    metrics=["precision", "recall", "latency"],
)

# Run experiment
results = experiment.run()

# Generate report
experiment.generate_report(results)
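The precision and recall metrics named above are typically computed per query from the retrieved document IDs and a set of IDs judged relevant. The sketch below shows the standard precision@k and recall@k definitions. It is illustrative only and is not necessarily how PipelineExperiment computes them; the function name and sample IDs are hypothetical.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision@k, recall@k) for one query.

    retrieved: ranked list of unique document IDs returned by the pipeline.
    relevant:  collection of document IDs judged relevant for the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    # Recall is undefined with no relevant documents; report 0.0 in that case.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 2 of the top 3 results are relevant, out of 4 relevant docs.
p, r = precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "x", "y"}, k=3)
print(f"precision@3={p:.2f} recall@3={r:.2f}")
```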
Configuration
Configuration files are located in sage/benchmark/benchmark_rag/config/:
- config_dense_milvus.yaml - Dense retrieval configuration
- config_sparse_milvus.yaml - Sparse retrieval configuration
- config_hybrid_milvus.yaml - Hybrid retrieval configuration
- config_qa_chroma.yaml - ChromaDB configuration
Experiment configurations in sage/benchmark/benchmark_rag/evaluation/config/:
- experiment_config.yaml - Benchmark experiment settings
Data
Test data is included in the package:
- Benchmark Data (benchmark_rag/data/):
  - queries.jsonl - Sample queries for testing
  - qa_knowledge_base.* - Knowledge base in multiple formats (txt, md, pdf, docx)
  - sample/ - Additional sample documents for testing
- Benchmark Config (benchmark_rag/config/):
  - experiment_config.yaml - RAG benchmark configurations
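A JSONL file stores one JSON value per line, so queries.jsonl can be read with the standard library alone. The sketch below makes no assumption about the field names inside each record. The helper name is hypothetical, and the path has to be adjusted to wherever the data directory lives in your checkout.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

For example, `queries = list(read_jsonl("queries.jsonl"))` loads every record into memory, while iterating over the generator directly keeps memory use flat for larger files.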
Development
Running Tests
pytest packages/sage-benchmark/
Code Formatting
# Format code
black packages/sage-benchmark/
# Lint code
ruff check packages/sage-benchmark/
Documentation
For detailed documentation on each component:
- See src/sage/benchmark/rag/README.md for RAG examples
- See src/sage/benchmark/benchmark_rag/README.md for benchmark details
Future Components
- benchmark_agent: Agent system performance benchmarking
- benchmark_anns: Approximate Nearest Neighbor Search benchmarking
- benchmark_llm: LLM inference performance benchmarking
Contributing
This package follows the same contribution guidelines as the main SAGE project. See the main repository's CONTRIBUTING.md.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Related Packages
- sage-kernel: Core computation engine for running benchmarks
- sage-libs: RAG components and utilities
- sage-middleware: Vector database services (Milvus, ChromaDB)
- sage-common: Common utilities and data types
Support
- Documentation: https://intellistream.github.io/SAGE-Pub/guides/packages/sage-benchmark/
- Issues: https://github.com/intellistream/SAGE/issues
- Discussions: https://github.com/intellistream/SAGE/discussions
Part of the SAGE Framework | Main Repository