
A simple ANN benchmark tool


ANNB: Approximate Nearest Neighbor Benchmark


Note: This is a work in progress. The API/CLI is not stable yet.

Installation

pip install annb

# install the vector search index/client you need for the benchmark,
# e.g. install faiss to run the faiss index benchmark
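# for example, the prebuilt CPU-only faiss wheel from PyPI
# (or faiss-gpu, if a GPU wheel is available for your platform)
pip install faiss-cpu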

Usage

CLI Usage

Run Benchmark

start your first benchmark with a random dataset

Just run annb-test to start your first benchmark with a random dataset.

annb-test

It will produce a result like this:

❯ annb-test
... some logs ...

BenchmarkResult:
  attributes:
    query_args: [{'nprobe': 1}]
    topk: 10
    jobs: 1
    loop: 5
    step: 10
    name: Test
    dataset: .annb_random_d256_l2_1000.hdf5
    index: Test
    dim: 256
    metric_type: MetricType.L2
    index_args: {'index': 'ivfflat', 'nlist': 128}
    started: 2023-08-14 13:03:40

  durations:
    training: 1 items, 1000 total, 1490.03266ms
    insert: 1 items, 1000 total, 132.439627ms
    query:
      nprobe=1,recall=0.2173 -> 1000 items, 18.615083ms, 53719.878659686874qps, latency=0.18615083ms, p95=0.31939ms, p99=0.41488ms

This is a simple benchmark with the default index (faiss) on a random L2 dataset. If you want to generate more data or use a dataset with different specifications, see the options below (a combined example follows the list):

  • --index-dim The dimension of the index, default is 256
  • --index-metric-type Index metric type, l2 or ip, default is l2
  • --topk TOPK topk used for queries, default is 10
  • --step STEP the query step; by default annb queries 10 items per query. You can set it to 0 to query all items in one query (similar to batch mode in ann-benchmarks)
  • --batch batch mode, an alias for --step 0
  • --count COUNT the total number of items in the dataset, default is 1000
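For example, a run that generates a larger inner-product dataset and queries it in batch mode could look like this (the flags are the ones listed above; the values are only illustrative):

annb-test --index-dim 128 --index-metric-type ip --count 10000 --batch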
run benchmark with a specific dataset

You can also run benchmarks with ann-benchmarks datasets. Download them locally and run the benchmark with the --dataset option.
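For example, assuming the ann-benchmarks datasets are still hosted at ann-benchmarks.com, you could download one and then point --dataset at the local file:

wget https://ann-benchmarks.com/sift-128-euclidean.hdf5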

annb-test --dataset sift-128-euclidean.hdf5
run benchmark with query args

You may benchmark with different query args, e.g. different nprobe values for the faiss ivfflat index, using the --query-args option.

annb-test --query-args nprobe=10 --query-args nprobe=20

This will output a result like:

durations:
    training: 1 items, 1000 total, 1548.84968ms
    insert: 1 items, 1000 total, 143.402532ms
    query:
      nprobe=1,recall=0.2173 -> 1000 items, 20.074236ms, 49815.09632545916qps, latency=0.20074235999999998ms, p95=0.332276ms, p99=0.455525ms
      nprobe=10,recall=0.5221 -> 1000 items, 49.141931ms, 20349.2207092961qps, latency=0.49141931ms, p95=0.722628ms, p99=0.818012ms
      nprobe=20,recall=0.6861 -> 1000 items, 69.284072ms, 14433.331805324606qps, latency=0.69284072ms, p95=1.126946ms, p99=1.350359ms
run multiple benchmarks with a config file

You may run multiple benchmarks with different indexes and datasets. Use the --run-file option to run benchmarks from a config file (an example invocation follows the config explanation below).

Below is an example config file:

config.yaml

default:
  index_factory: annb.anns.faiss.indexes.index_under_test_factory
  index_factory_args: {}
  index_name: Test
  dataset: gist-960-euclidean.hdf5
  topk: 10
  step: 10
  jobs: 1
  loop: 2
  result: output.pth

runs:
  - name: faiss-gist960-gpu-ivfflat
    index_args:
      gpu: yes
      index: ivfflat
      nlist: 1024
    query_args:
      - nprobe: 1
      - nprobe: 16
      - nprobe: 256
  - name: faiss-gist960-gpu-ivfpq8
    index_args:
      gpu: yes
      index: ivfpq
      nlist: 1024
    query_args:
      - nprobe: 1
      - nprobe: 16
      - nprobe: 256

Explanation of the above config file:

  • The default section is the default config for all benchmarks.
  • The config keys are generally the same as the options of the annb-test command, e.g. index_factory is the same as --index-factory.
  • You can define multiple benchmarks in the runs section, and each run config overrides the default config. In this example, gist-960-euclidean.hdf5 is set as the dataset in default, so all benchmarks use it, while each run uses its own index and query args. For index_args, the two benchmark series use ivfflat (nlist=1024) and ivfpq (nlist=1024); for query_args, each series runs with nprobe=1, 16, and 256. That means 6 benchmarks run in total, 3 per series with a different nprobe each.
  • Results are saved to output.pth by the default setting. Each benchmark series is actually saved to a separate file, so in this example two files are produced: output-1.pth and output-2.pth. You can view them with annb-report.
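With the config file above saved as config.yaml, the whole suite can then be started by pointing --run-file at it:

annb-test --run-file config.yaml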
more options

You could use annb-test --help to see more options.

 annb-test --help

Check Benchmark Results

The annb-report command is used to view benchmark results as plain/CSV text, or to export them as a chart graphic.

annb-report --help
examples of viewing/exporting benchmark results

view benchmark results as plain text

annb-report output.pth

view benchmark results as csv text

annb-report output.pth --format csv
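If the CSV view is printed to stdout like the plain-text view, you can redirect it to a file for further processing:

annb-report output.pth --format csv > results.csv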

export benchmark results to a chart graphic (multiple series)

annb-report output.pth --format png --output output.png output-1.pth output-2.pth
