A simple ANN benchmark tool
ANNB: Approximate Nearest Neighbor Benchmark
Note: This is a work in progress. The API/CLI is not stable yet.
Installation
pip install annb
# Install the vector search index/client you need for the benchmark,
# e.g. install faiss to run faiss index benchmarks.
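For example, to run the default faiss benchmark you could install the CPU build of faiss from PyPI (the faiss-cpu package name is an assumption based on the community build; a GPU build is also published as faiss-gpu):
pip install faiss-cpu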
Usage
CLI Usage
Run Benchmark
Just run annb-test to start your first benchmark with a random dataset.
annb-test
It will produce a result like this:
❯ annb-test
... some logs ...
BenchmarkResult:
attributes:
query_args: [{'nprobe': 1}]
topk: 10
jobs: 1
loop: 5
step: 10
name: Test
dataset: .annb_random_d256_l2_1000.hdf5
index: Test
dim: 256
metric_type: MetricType.L2
index_args: {'index': 'ivfflat', 'nlist': 128}
started: 2023-08-14 13:03:40
durations:
training: 1 items, 1000 total, 1490.03266ms
insert: 1 items, 1000 total, 132.439627ms
query:
nprobe=1,recall=0.2173 -> 1000 items, 18.615083ms, 53719.878659686874qps, latency=0.18615083ms, p95=0.31939ms, p99=0.41488ms
This is a simple benchmark run against the default index (faiss) with a random L2 dataset. If you want to generate more data or a dataset with different specifications, see the options below; an example combining them follows the list:
- --index-dim The dimension of the index, default is 256
- --index-metric-type Index metric type, l2 or ip, default is l2
- --topk TOPK topk used for queries, default is 10
- --step STEP the query step; by default annb queries 10 items per query. You could set it to 0 to query all items in one query (similar to batch mode in ann-benchmarks)
- --batch batch mode, an alias for --step 0
- --count COUNT the total number of items in the dataset, default is 1000
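For instance, the following command (a hypothetical combination of the options above) would benchmark a 128-dimensional inner-product dataset of 10,000 items in batch mode:
annb-test --index-dim 128 --index-metric-type ip --count 10000 --batch
Judging by the default run above, the generated random dataset should be cached in a file like .annb_random_d128_ip_10000.hdf5.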
run benchmark with a specific dataset
You could also use ann-benchmarks datasets to run benchmarks. Download them locally and run the benchmark with the --dataset option.
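For example, you could fetch a dataset first; the ann-benchmarks project publishes its datasets as HDF5 files (download URL assumed from that project):
wget http://ann-benchmarks.com/sift-128-euclidean.hdf5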
annb-test --dataset sift-128-euclidean.hdf5
run benchmark with query args
You mary benchmark with different query args, e.g. different nprobe for faiss ivfflat index. you could try --query-args
option.
annb-test --query-args nprobe=10 --query-args nprobe=20
This will output a result like:
durations:
training: 1 items, 1000 total, 1548.84968ms
insert: 1 items, 1000 total, 143.402532ms
query:
nprobe=1,recall=0.2173 -> 1000 items, 20.074236ms, 49815.09632545916qps, latency=0.20074235999999998ms, p95=0.332276ms, p99=0.455525ms
nprobe=10,recall=0.5221 -> 1000 items, 49.141931ms, 20349.2207092961qps, latency=0.49141931ms, p95=0.722628ms, p99=0.818012ms
nprobe=20,recall=0.6861 -> 1000 items, 69.284072ms, 14433.331805324606qps, latency=0.69284072ms, p95=1.126946ms, p99=1.350359ms
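As a sanity check, the reported numbers appear to follow qps = items / total time, with latency measured per query of step items: for the nprobe=10 row, 1000 items / 0.049142 s ≈ 20349 qps, and 49.142 ms / 100 queries (step=10) ≈ 0.4914 ms latency per query.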
run multiple benchmarks with config file
You may run multiple benchmarks with different indexes and datasets. Use the --run-file option to run benchmarks from a config file. Below is an example config file:
config.yaml
default:
  index_factory: annb.anns.faiss.indexes.index_under_test_factory
  index_factory_args: {}
  index_name: Test
  dataset: gist-960-euclidean.hdf5
  topk: 10
  step: 10
  jobs: 1
  loop: 2
  result: output.pth
runs:
  - name: faiss-gist960-gpu-ivfflat
    index_args:
      gpu: yes
      index: ivfflat
      nlist: 1024
    query_args:
      - nprobe: 1
      - nprobe: 16
      - nprobe: 256
  - name: faiss-gist960-gpu-ivfpq8
    index_args:
      gpu: yes
      index: ivfpq
      nlist: 1024
    query_args:
      - nprobe: 1
      - nprobe: 16
      - nprobe: 256
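With this file saved as config.yaml, all six benchmarks could then be started in one go:
annb-test --run-file config.yaml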
Explanation of the above config file:
- The default section provides the default config for all benchmarks.
- The config keys are generally the same as the options of the annb-test command, e.g. index_factory is the same as --index-factory.
- You could define multiple benchmarks in the runs section, and each run config overrides the default config. In this example, gist-960-euclidean.hdf5 is set as the dataset in default, so it is used for all benchmarks, while each run uses different index and query args. For index_args, ivfflat (nlist=1024) and ivfpq (nlist=1024) form two benchmark series, and for query_args, nprobe=1,16,256 is used within each series. That means 6 benchmarks run in total, 3 per series with different nprobe values.
- By the default setting, the result is saved to the output.pth file. Actually, each benchmark series is saved to a separate file, so in this example we get two files: output-1.pth and output-2.pth. You could use annb-report to view them.
more options
You could use annb-test --help to see more options.
❯ annb-test --help
Check Benchmark Results
The annb-report command is used to view benchmark results as plain/CSV text, or to export them as chart graphics.
annb-report --help
examples for viewing/exporting benchmark results
view benchmark results as plain text
annb-report output.pth
view benchmark results as csv text
annb-report output.pth --format csv
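Assuming the CSV is written to standard output like the plain-text format, you could redirect it to a file:
annb-report output.pth --format csv > results.csv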
export benchmark results to a chart graphic (multiple series)
annb-report output.pth --format png --output output.png output-1.pth output-2.pth