
iNatInqPerf is a benchmark for evaluating the performance vs. cost trade-offs of running NLP-based search (such as INQUIRE) with different vector databases on a platform like iNaturalist.

Overview

This project provides a modular benchmark pipeline for experimenting with different vector databases (FAISS, Qdrant, …).
It runs end-to-end:

  1. Download → Hugging Face dataset (optionally export images + manifest)
  2. Embed → Generate CLIP embeddings for images
  3. Build → Construct indexes with multiple VectorDBs
  4. Search → Profile queries (latency + Recall@K vs exact baseline)
  5. Update → Test insertions & deletions (index maintenance)
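To make the Update step concrete, here is a minimal sketch of index maintenance on an exact, ID-keyed index. `ToyFlatIndex` and its methods are hypothetical names for illustration only; they are not the iNatInqPerf or FAISS API, and a real vector database also deals with persistence, compression, and concurrency that this toy ignores.

```python
import math


class ToyFlatIndex:
    """Hypothetical exact index keyed by integer IDs (not the iNatInqPerf or FAISS API)."""

    def __init__(self) -> None:
        self._vectors: dict[int, list[float]] = {}

    def add(self, ids: list[int], vectors: list[list[float]]) -> None:
        # Insertion: store each vector under its ID (re-adding an ID overwrites it).
        if len(ids) != len(vectors):
            raise ValueError("ids and vectors must have the same length")
        for vid, vec in zip(ids, vectors):
            self._vectors[vid] = list(vec)

    def remove(self, ids: list[int]) -> int:
        # Deletion: drop the IDs that exist and report how many were removed.
        return sum(self._vectors.pop(vid, None) is not None for vid in ids)

    def search(self, query: list[float], k: int) -> list[int]:
        # Exact search: rank every stored vector by Euclidean (L2) distance.
        ranked = sorted(self._vectors, key=lambda vid: math.dist(query, self._vectors[vid]))
        return ranked[:k]

    def __len__(self) -> int:
        return len(self._vectors)


index = ToyFlatIndex()
index.add([0, 1, 2], [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
print(index.search([0.9, 0.1], k=2))  # [1, 0]
index.remove([1])
print(index.search([0.9, 0.1], k=2))  # [0, 2]
```

A dictionary makes insertions and deletions trivial here; real vector databases differ in how cheaply they support them, which is what the Update step is meant to exercise.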

All steps are run with uv as the package manager.

How to use iNatInqPerf

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Setup environment
uv venv .venv && source .venv/bin/activate
uv sync

# Run an end-to-end benchmark (FAISS IVF+PQ vectordb) on the INQUIRE dataset.
uv run python scripts/run_benchmark.py configs/inquire_benchmark.yaml

# Spin up a 3-node Weaviate cluster (shared Docker network + Raft) and run the benchmark.
uv run python scripts/run_benchmark.py configs/inquire_benchmark_weaviate_cluster.yaml

# Spin up a 3-node Qdrant cluster (HTTP+gRPC+p2p) and run the benchmark.
uv run python scripts/run_benchmark.py configs/inquire_benchmark_qdrant_cluster.yaml

Distributed VectorDB Deployments

  • Benchmark-managed clusters. The configs/inquire_benchmark_weaviate_cluster.yaml and configs/inquire_benchmark_qdrant_cluster.yaml files include the container definitions that container_context launches automatically before each run. Make sure no containers with the same names already exist (running or stopped); otherwise Docker will raise a name-conflict error.
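Before launching a cluster config, you can check for leftover containers that would collide with the configured names. The sketch below is a hypothetical helper, not part of iNatInqPerf: the function names are our own, and you would supply the container names from the YAML config yourself. It lists containers with `docker ps -a`, because a stopped container still holds its name and triggers the same conflict.

```python
import subprocess


def existing_container_names() -> set[str]:
    """Names of all containers Docker knows about, running or stopped (hypothetical helper)."""
    # `docker ps -a --format '{{.Names}}'` prints one container name per line.
    out = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def find_name_conflicts(wanted: list[str], existing: set[str]) -> list[str]:
    """Return the names from `wanted` that are already taken, in config order."""
    return [name for name in wanted if name in existing]
```

For example, `find_name_conflicts(names_from_config, existing_container_names())` returns the names you would need to free (for instance with `docker rm`) before rerunning the benchmark.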

The benchmarking code will

  1. Download the specified dataset from the Hugging Face Hub.
  2. Embed the images using a CLIP model.
  3. Build a vector database index.
  4. Run the given queries against the index to measure query latency, and compute Recall@K against the FAISS Flat baseline.
  5. Update the index.
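Recall@K compares the IDs returned by the index under test with the exact FAISS Flat results for the same query. A minimal sketch of one common definition (the share of the exact top-K that the approximate top-K recovers, averaged over queries) follows; the function name and the assumption that every result list holds at least K IDs are ours, and iNatInqPerf's own implementation may differ in details such as tie handling.

```python
def recall_at_k(approx: list[list[int]], exact: list[list[int]], k: int) -> float:
    """Mean Recall@K: share of each query's exact top-k IDs found in its approximate top-k.

    Assumes every result list holds at least k IDs.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(approx) != len(exact) or not exact:
        raise ValueError("approx and exact must be non-empty and the same length")
    per_query = [len(set(truth[:k]) & set(got[:k])) / k for got, truth in zip(approx, exact)]
    return sum(per_query) / len(per_query)


# Two queries: the approximate index misses one exact neighbor for the first query.
exact = [[1, 2, 9], [4, 5, 6]]   # e.g. IDs from the FAISS Flat baseline
approx = [[1, 2, 3], [6, 5, 4]]  # e.g. IDs from faiss.ivfpq
print(recall_at_k(approx, exact, k=3))
```

Order within the top K does not matter under this definition: the second query scores 1.0 even though its IDs come back in reverse order.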

Dataset Output Structure

data/raw/
  dataset_info.json
  state.json
  data-00000-of-00001.arrow
  images/
    00000000.jpg
    00000001.jpg
    ...
  images/manifest.csv   # [index,filename,label]
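Downstream tooling can pair each exported image with its label through images/manifest.csv. A minimal reader, assuming the three columns shown above and tolerating an optional header row (the README does not say whether one is written), could look like this. `ManifestRow`, `parse_manifest`, and `read_manifest` are hypothetical helpers, and labels are kept as strings because their type is not documented.

```python
import csv
from collections.abc import Iterable
from pathlib import Path
from typing import NamedTuple


class ManifestRow(NamedTuple):
    index: int
    filename: str
    label: str


def parse_manifest(lines: Iterable[str]) -> list[ManifestRow]:
    """Parse manifest lines with the columns index,filename,label (assumed layout)."""
    rows: list[ManifestRow] = []
    for record in csv.reader(lines):
        if not record:
            continue  # ignore blank lines
        if record[:3] == ["index", "filename", "label"]:
            continue  # skip a header row, if the file has one
        idx, filename, label = record[:3]
        rows.append(ManifestRow(int(idx), filename, label))
    return rows


def read_manifest(path: Path = Path("data/raw/images/manifest.csv")) -> list[ManifestRow]:
    # newline="" is what the csv module expects for files it reads.
    with path.open(newline="", encoding="utf-8") as f:
        return parse_manifest(f)
```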

Supported Vector Databases

  • faiss.flat (exact)
  • faiss.ivfpq (IVF + OPQ + PQ)
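The difference between the two configurations can be illustrated in plain Python. This is a simplified sketch, not FAISS code: an exact flat index compares the query with every vector, while an IVF index groups vectors into inverted lists around centroids and scans only the `nprobe` closest lists. Real IVF indexes train their centroids with k-means, and faiss.ivfpq additionally compresses vectors with OPQ + PQ; both steps are omitted here, and the centroids are simply given.

```python
import math

Vector = list[float]


def flat_search(query: Vector, vectors: list[Vector], k: int) -> list[int]:
    """Exact search: compare the query with every vector."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def build_ivf(vectors: list[Vector], centroids: list[Vector]) -> dict[int, list[int]]:
    """Assign each vector ID to the inverted list of its nearest centroid."""
    lists: dict[int, list[int]] = {c: [] for c in range(len(centroids))}
    for i, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(query: Vector, vectors: list[Vector], centroids: list[Vector],
               lists: dict[int, list[int]], k: int, nprobe: int) -> list[int]:
    """Approximate search: scan only the nprobe lists whose centroids are closest."""
    probed = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probed for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k]


vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [4.0, 4.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]  # given here; normally trained with k-means
lists = build_ivf(vectors, centroids)   # {0: [0, 1, 4], 1: [2, 3]}
query = [6.0, 6.0]
print(flat_search(query, vectors, k=1))                             # [4]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=1))  # [2]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=2))  # [4]
```

With `nprobe=1` the query's closest centroid is `[10, 10]`, so the true nearest neighbor `[4, 4]` (ID 4) is never scanned. Raising `nprobe` trades more distance computations for higher recall, which is the kind of trade-off that Recall@K against the flat baseline captures.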

Profiling Outputs

  • Latency statistics (avg, p50, p95)
  • Recall@K vs baseline
  • JSON metrics in .results/
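Latency percentiles can be computed in several ways (nearest-rank, linear interpolation, and others), and this README does not say which one iNatInqPerf uses. The sketch below assumes the nearest-rank method and a flat JSON dictionary; `latency_summary`, `write_metrics`, the key names, and the file layout are hypothetical, not the project's actual output schema.

```python
import json
import math
import statistics
from pathlib import Path


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize per-query latencies (milliseconds) as avg, p50 and p95."""
    if not latencies_ms:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies_ms)
    return {
        "avg_ms": statistics.fmean(ordered),
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
    }


def write_metrics(metrics: dict[str, float], name: str, out_dir: Path = Path(".results")) -> Path:
    """Dump metrics to <out_dir>/<name>.json (hypothetical file layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.json"
    path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return path
```

Timing each query with `time.perf_counter()` and passing the collected values to `latency_summary` gives avg, p50, and p95 in one dictionary that `write_metrics` can store.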

Profiling with py-spy

Use py-spy to record flamegraphs during any step:

bash scripts/pyspy_run.sh search-faiss -- python src/inatinqperf/benchmark/benchmark.py search --vectordb faiss.ivfpq --hf_dir data/emb_hf --topk 10 --queries src/inatinqperf/benchmark/queries.txt

Outputs:

  • .results/search-faiss.svg (flamegraph)
  • .results/search-faiss.speedscope.json
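The contents of scripts/pyspy_run.sh are not shown here. As an assumption about what such a wrapper might do, the sketch below builds two `py-spy record` commands, one writing a flamegraph SVG and one writing a speedscope JSON profile, using py-spy's `--format` and `-o` options. `pyspy_commands` is a hypothetical helper, and each command profiles a separate run of the target.

```python
import shlex
from pathlib import Path


def pyspy_commands(name: str, target: list[str], out_dir: Path = Path(".results")) -> list[list[str]]:
    """Build py-spy record invocations for <name>.svg and <name>.speedscope.json (hypothetical)."""
    outputs = [("flamegraph", f"{name}.svg"), ("speedscope", f"{name}.speedscope.json")]
    return [
        ["py-spy", "record", "--format", fmt, "-o", str(out_dir / filename), "--", *target]
        for fmt, filename in outputs
    ]


target = [
    "python", "src/inatinqperf/benchmark/benchmark.py", "search",
    "--vectordb", "faiss.ivfpq", "--hf_dir", "data/emb_hf", "--topk", "10",
    "--queries", "src/inatinqperf/benchmark/queries.txt",
]
for cmd in pyspy_commands("search-faiss", target):
    # Print a shell-ready command; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

Create .results/ beforehand if it does not exist, since this sketch only builds the commands and does not create the output directory.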

Installation

  • Via uv: uv add inatinqperf
  • Via pip: pip install inatinqperf

Development

Please see CONTRIBUTING.md and DEVELOPMENT.md for information on contributing to this project.

Additional Information

Additional information can be found at these locations.

  • Code of Conduct (CODE_OF_CONDUCT.md): Information about the norms, rules, and responsibilities we adhere to when participating in this open source community.
  • Contributing (CONTRIBUTING.md): Information about contributing to this project.
  • Development (DEVELOPMENT.md): Information about development activities involved in making changes to this project.
  • Governance (GOVERNANCE.md): Information about how this project is governed.
  • Maintainers (MAINTAINERS.md): Information about individuals who maintain this project.
  • Security (SECURITY.md): Information about how to privately report security issues associated with this project.

License

iNatInqPerf is licensed under the MIT license.

Download files

Source Distribution

inatinqperf-0.1.108.tar.gz (307.4 kB)

Built Distribution

inatinqperf-0.1.108-py3-none-any.whl (38.4 kB)

File details

Details for the file inatinqperf-0.1.108.tar.gz.

File metadata

  • Download URL: inatinqperf-0.1.108.tar.gz
  • Size: 307.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17

File hashes

Hashes for inatinqperf-0.1.108.tar.gz
  • SHA256: d34f21fe97f500b6b71e6cde2e5c394c9c27162a799f38d06c0ac4bb77ee736f
  • MD5: 444eee913774dafe2b1500b2ede129a5
  • BLAKE2b-256: f1dc1717770bf2996c67b042f91f060105d4418b4b2272ef5ba6b58b1a628e3b

File details

Details for the file inatinqperf-0.1.108-py3-none-any.whl.

File metadata

  • Download URL: inatinqperf-0.1.108-py3-none-any.whl
  • Size: 38.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17

File hashes

Hashes for inatinqperf-0.1.108-py3-none-any.whl
  • SHA256: 59fd06b9095249d32ef22c6e93d3c34b5cd00d7a4fa195efe33483549b427f37
  • MD5: 717fe32ed737ac38a38394fc9b138afc
  • BLAKE2b-256: 6fefd0274068cf5c6b74b68628ece20fcae8ae8308d88ba3f0c6f259263e83f0
