SAGE Benchmark - RAG and experimental benchmarks for SAGE framework

SAGE Benchmark

Comprehensive benchmarking tools and RAG examples for the SAGE framework

📋 Overview

SAGE Benchmark provides a comprehensive suite of benchmarking tools and RAG (Retrieval-Augmented Generation) examples for evaluating SAGE framework performance. This package enables researchers and developers to:

  • Benchmark RAG pipelines with multiple retrieval strategies (dense, sparse, hybrid)
  • Compare vector databases (Milvus, ChromaDB, FAISS) for RAG applications
  • Evaluate multimodal retrieval with text, image, and video data
  • Run reproducible experiments with standardized configurations and metrics

This package is designed for both research experiments and production system evaluation.

✨ Key Features

  • Multiple RAG Implementations: Dense, sparse, hybrid, and multimodal retrieval
  • Vector Database Support: Milvus, ChromaDB, FAISS integration
  • Experiment Framework: Automated benchmarking with configurable experiments
  • Evaluation Metrics: Comprehensive metrics for RAG performance
  • Sample Data: Included test data for quick start
  • Extensible Design: Easy to add new benchmarks and retrieval methods

📦 Package Structure

sage-benchmark/
├── src/
│   └── sage/
│       └── benchmark/
│           ├── __init__.py
│           └── benchmark_rag/           # RAG benchmarking
│               ├── __init__.py
│               ├── implementations/     # RAG implementations
│               │   ├── pipelines/      # RAG pipeline scripts
│               │   │   ├── qa_dense_retrieval_milvus.py
│               │   │   ├── qa_sparse_retrieval_milvus.py
│               │   │   ├── qa_multimodal_fusion.py
│               │   │   └── ...
│               │   └── tools/          # Supporting tools
│               │       ├── build_chroma_index.py
│               │       ├── build_milvus_dense_index.py
│               │       └── loaders/
│               ├── evaluation/          # Experiment framework
│               │   ├── pipeline_experiment.py
│               │   ├── evaluate_results.py
│               │   └── config/
│               ├── config/              # RAG configurations
│               └── data/                # Test data
│           # Future benchmarks:
│           # ├── benchmark_agent/      # Agent benchmarking
│           # └── benchmark_anns/       # ANNS benchmarking
├── tests/
├── pyproject.toml
└── README.md

🚀 Installation

Quick Start (Recommended)

Clone the repository with its submodules and set up the development environment:

# 1. Clone repository
git clone --recurse-submodules https://github.com/intellistream/sage-benchmark.git
cd sage-benchmark

# Or if already cloned, initialize submodules
./quickstart.sh

# 2. Install package with development dependencies
pip install -e ".[dev]"

# 3. Install pre-commit hooks (IMPORTANT for contributors)
pre-commit install

The quickstart.sh script will automatically:

  • Initialize all Git submodules (LibAMM, SAGE-DB-Bench, sageData)
  • Check environment and dependencies
  • Display submodule status

Why install pre-commit? Pre-commit hooks automatically check code quality (formatting, import sorting, linting) before each commit, preventing CI/CD failures.

Manual Installation

If you prefer manual setup:

# Clone repository
git clone https://github.com/intellistream/sage-benchmark.git
cd sage-benchmark

# Initialize submodules (direct level only, not recursive)
git submodule update --init

# Install package
pip install -e .

Or with development dependencies:

pip install -e ".[dev]"

Git Submodules

This repository uses Git submodules for external components:

  • benchmark_amm (src/sage/benchmark/benchmark_amm/) → LibAMM
  • benchmark_anns (src/sage/benchmark/benchmark_anns/) → SAGE-DB-Bench
  • sage.data (src/sage/data/) → sageData

All submodules track the main-dev branch and must be initialized before use.
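
One way to confirm that the submodules are initialized is to inspect the output of `git submodule status`, in which a leading `-` marks a submodule that has not been initialized. The sketch below parses that output. The helper names `uninitialized_submodules` and `check_repo` are hypothetical and are not part of this package, and the parser assumes submodule paths contain no spaces.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list[str]:
    """Return paths of submodules that `git submodule status` marks with '-'."""
    missing = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        # Each line has the form "<flag><sha> <path> [(<describe>)]";
        # the flag is '-' for a submodule that has not been initialized.
        if line[0] == "-":
            missing.append(line[1:].split()[1])
    return missing


def check_repo(repo_dir: str = ".") -> list[str]:
    """Run `git submodule status` in repo_dir and report uninitialized paths."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return uninitialized_submodules(result.stdout)
```

If `check_repo()` returns a non-empty list, running `git submodule update --init` (or `./quickstart.sh`) as described above should initialize the missing entries.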

📊 RAG Benchmarking

The benchmark_rag module provides comprehensive RAG benchmarking capabilities:

RAG Implementations

Various RAG approaches for performance comparison:

Vector Databases:

  • Milvus: Dense, sparse, and hybrid retrieval
  • ChromaDB: Local vector database with simple setup
  • FAISS: Efficient similarity search

Retrieval Methods:

  • Dense retrieval (embeddings-based)
  • Sparse retrieval (BM25, sparse vectors)
  • Hybrid retrieval (combining dense + sparse)
  • Multimodal fusion (text + image + video)
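
This README does not say how the hybrid pipeline combines dense and sparse results. Reciprocal rank fusion (RRF) is one common way to merge ranked lists, so the following is a minimal sketch of that technique under stated assumptions, not a description of what `qa_hybrid_retrieval_milvus` does. The function name, the constant `k=60`, and the document IDs are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher scores rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense results
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical sparse results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear near the top of both lists accumulate the highest scores, which is why rank-based fusion needs no calibration between dense similarity scores and sparse (for example BM25) scores.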

Quick Start

1. Build Vector Index

First, prepare your vector index:

# Build ChromaDB index (simplest)
python -m sage.benchmark.benchmark_rag.implementations.tools.build_chroma_index

# Or build Milvus dense index
python -m sage.benchmark.benchmark_rag.implementations.tools.build_milvus_dense_index

2. Run a RAG Pipeline

Test individual RAG pipelines:

# Dense retrieval with Milvus
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_dense_retrieval_milvus

# Sparse retrieval
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_sparse_retrieval_milvus

# Hybrid retrieval (dense + sparse)
python -m sage.benchmark.benchmark_rag.implementations.pipelines.qa_hybrid_retrieval_milvus

3. Run Benchmark Experiments

Execute the full benchmark suite:

# Run comprehensive benchmark
python -m sage.benchmark.benchmark_rag.evaluation.pipeline_experiment

# Evaluate and generate reports
python -m sage.benchmark.benchmark_rag.evaluation.evaluate_results

4. View Results

Results are saved in benchmark_results/:

  • experiment_TIMESTAMP/ - Individual experiment runs
  • metrics.json - Performance metrics
  • comparison_report.md - Comparison report
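
To inspect results programmatically, something like the sketch below could locate the most recent run and load its metrics. It assumes that each `experiment_TIMESTAMP/` directory contains its own `metrics.json` and that timestamps sort lexicographically; neither detail is confirmed here, and the schema of `metrics.json` is not documented in this README, so the parsed JSON is returned as-is. The helper name `latest_metrics` is hypothetical.

```python
import json
from pathlib import Path


def latest_metrics(results_dir="benchmark_results"):
    """Load metrics.json from the newest experiment_* directory, if any."""
    root = Path(results_dir)
    # Assumption: runs live in experiment_<TIMESTAMP>/ and timestamps sort
    # lexicographically, so the last name is the newest run.
    runs = sorted(p for p in root.glob("experiment_*") if p.is_dir())
    for run in reversed(runs):
        metrics_file = run / "metrics.json"
        if metrics_file.is_file():
            with metrics_file.open(encoding="utf-8") as fh:
                return run.name, json.load(fh)
    # No run directory with a metrics.json was found.
    return None
```

The function returns a `(run_name, metrics)` tuple, or `None` when no run with a `metrics.json` exists yet.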

📖 Quick Start

Basic Example

from sage.benchmark.benchmark_rag.implementations.pipelines import (
    qa_dense_retrieval_milvus,
)
from sage.benchmark.benchmark_rag.config import load_config

# Load configuration
config = load_config("config_dense_milvus.yaml")

# Run RAG pipeline
results = qa_dense_retrieval_milvus.run_pipeline(query="What is SAGE?", config=config)

# View results
print(f"Retrieved {len(results)} documents")
for doc in results:
    print(f"- {doc.content[:100]}...")

Run Custom Benchmark

from sage.benchmark.benchmark_rag.evaluation import PipelineExperiment

# Define experiment configuration
experiment = PipelineExperiment(
    name="custom_rag_benchmark",
    pipelines=["dense", "sparse", "hybrid"],
    queries=["query1.txt", "query2.txt"],
    metrics=["precision", "recall", "latency"],
)

# Run experiment
results = experiment.run()

# Generate report
experiment.generate_report(results)
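
The metric names above are passed as plain strings, and this README does not show how the framework computes them. For reference, the standard per-query definitions of precision@k and recall@k can be sketched as follows; the function name and document IDs are illustrative and are not taken from `PipelineExperiment`.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Compute precision@k and recall@k for a single query.

    retrieved: ranked list of unique document IDs returned by the pipeline.
    relevant:  collection of document IDs judged relevant for the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    relevant = set(relevant)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Precision divides by k even when fewer than k documents were retrieved.
    precision = hits / k
    # Recall is defined as 0.0 here when there are no relevant documents.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking and relevance judgments for one query.
p, r = precision_recall_at_k(["d1", "d2", "d3", "d4"], {"d2", "d4", "d9"}, k=2)
print(f"precision@2={p:.2f} recall@2={r:.2f}")
```

Averaging these values over all queries gives the aggregate figures that a comparison across the dense, sparse, and hybrid pipelines would typically report.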

Configuration

Configuration files are located in sage/benchmark/benchmark_rag/config/:

  • config_dense_milvus.yaml - Dense retrieval configuration
  • config_sparse_milvus.yaml - Sparse retrieval configuration
  • config_hybrid_milvus.yaml - Hybrid retrieval configuration
  • config_qa_chroma.yaml - ChromaDB configuration

Experiment configurations are located in sage/benchmark/benchmark_rag/evaluation/config/:

  • experiment_config.yaml - Benchmark experiment settings

📖 Data

Test data is included in the package:

  • Benchmark Data (benchmark_rag/data/):

    • queries.jsonl - Sample queries for testing
    • qa_knowledge_base.* - Knowledge base in multiple formats (txt, md, pdf, docx)
    • sample/ - Additional sample documents for testing
  • Benchmark Config (benchmark_rag/config/):

    • experiment_config.yaml - RAG benchmark configurations
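
The sample queries in `queries.jsonl` use the JSON Lines layout, with one JSON object per line. The field names are not documented in this README, so the sketch below only parses each record into a dictionary and leaves interpretation to the caller. The helper name `read_jsonl` is hypothetical.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so bad records are easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
```

Because the function is a generator, large query files can be streamed record by record, for example with `for record in read_jsonl("queries.jsonl"): ...`.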

🔧 Development

Running Tests

pytest packages/sage-benchmark/

Code Formatting

# Format code
black packages/sage-benchmark/

# Lint code
ruff check packages/sage-benchmark/

📚 Documentation

For detailed documentation on each component:

  • See src/sage/benchmark/rag/README.md for RAG examples
  • See src/sage/benchmark/benchmark_rag/README.md for benchmark details

🔮 Future Components

  • benchmark_agent: Agent system performance benchmarking
  • benchmark_anns: Approximate Nearest Neighbor Search benchmarking
  • benchmark_llm: LLM inference performance benchmarking

🤝 Contributing

This package follows the same contribution guidelines as the main SAGE project. See the main repository's CONTRIBUTING.md.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Related Packages

  • sage-kernel: Core computation engine for running benchmarks
  • sage-libs: RAG components and utilities
  • sage-middleware: Vector database services (Milvus, ChromaDB)
  • sage-common: Common utilities and data types

📮 Support


Part of the SAGE Framework | Main Repository
