Analytical vector database storage estimator

vector-db-sizer

What it is

vector-db-sizer is a command-line tool that analytically estimates disk and RAM requirements for a vector database from a YAML scenario file, without connecting to a live system.

When to use it

Use it for fast pre-implementation sizing work, such as:

  • early architecture decisions;
  • comparing vector dimensions;
  • comparing engines;
  • comparing index types;
  • estimating metadata/payload impact;
  • generating Markdown/CSV/JSON artifacts for architecture discussions.

What it does not do

  • No live database connections.
  • No ingestion or load execution.
  • No latency/recall benchmarking.
  • No pricing calculations.
  • No production guarantee.

Quick start

uv sync
uv run vector-db-sizer validate examples/qdrant_text_hnsw.yaml
uv run vector-db-sizer estimate examples/qdrant_text_hnsw.yaml --format markdown --out report.md
uv run vector-db-sizer estimate examples/multi_scenario.yaml --format csv --out comparison.csv

Input YAML

name: qdrant_text_hnsw

dataset:
  source_type: text
  total_tokens: 50000000
  chunk_tokens: 512
  chunk_overlap: 64

embedding:
  kind: dense
  dimensions: 1536
  dtype: float32

database:
  engine: qdrant
  index_type: hnsw
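
To make the inputs above concrete, here is a minimal back-of-the-envelope sketch in Python of how the dataset and embedding fields could translate into a chunk count and a raw vector size. The source does not state the formulas vector-db-sizer actually uses, so the sliding-window chunking rule, the single-token-stream assumption, and the function names `chunk_count` and `raw_vector_bytes` are all hypothetical.

```python
import math

# Bytes per stored component for common dtypes (standard type sizes).
BYTES_PER_VALUE = {"float32": 4, "float16": 2, "int8": 1}


def chunk_count(total_tokens: int, chunk_tokens: int, chunk_overlap: int) -> int:
    """Sliding-window chunk count.

    Assumption: the whole corpus is one continuous token stream and each
    chunk starts (chunk_tokens - chunk_overlap) tokens after the previous one.
    """
    stride = chunk_tokens - chunk_overlap
    if stride <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_tokens")
    if total_tokens <= chunk_tokens:
        return 1
    return 1 + math.ceil((total_tokens - chunk_tokens) / stride)


def raw_vector_bytes(vectors: int, dimensions: int, dtype: str) -> int:
    """Uncompressed vector bytes: one dense vector per chunk."""
    return vectors * dimensions * BYTES_PER_VALUE[dtype]


if __name__ == "__main__":
    # Values copied from the example YAML.
    n = chunk_count(total_tokens=50_000_000, chunk_tokens=512, chunk_overlap=64)
    size = raw_vector_bytes(n, dimensions=1536, dtype="float32")
    print(f"chunks: {n:,}")                   # chunks: 111,607
    print(f"raw vector bytes: {size:,}")      # raw vector bytes: 685,713,408
```

Under these assumptions the example scenario yields roughly 112 thousand vectors and about 0.69 GB of raw float32 data, before any index, payload, or replication is added. A real corpus split per document will produce somewhat more chunks.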

Single-scenario example

uv run vector-db-sizer estimate examples/qdrant_text_hnsw.yaml --format markdown

Multi-scenario example

uv run vector-db-sizer estimate examples/multi_scenario.yaml --format csv
uv run vector-db-sizer estimate examples/multi_scenario.yaml --format json

Output formats

  • json (machine-readable)
  • markdown (human report)
  • csv (comparison table)
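
When all three artifacts are needed for the same scenario file, the command line can be assembled programmatically. The sketch below only builds the argument list from the `estimate`, `--format`, and `--out` options shown in this README. The helper name `estimate_command`, the output file naming, and the use of `--out` together with `--format json` are assumptions, not documented behavior.

```python
import shlex

SUPPORTED_FORMATS = ("json", "markdown", "csv")

# Hypothetical mapping from format name to output file extension.
EXTENSIONS = {"json": "json", "markdown": "md", "csv": "csv"}


def estimate_command(scenario: str, fmt: str, out_stem: str = "report") -> list[str]:
    """Build the argv for one `vector-db-sizer estimate` run."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return [
        "uv", "run", "vector-db-sizer", "estimate", scenario,
        "--format", fmt,
        "--out", f"{out_stem}.{EXTENSIONS[fmt]}",
    ]


if __name__ == "__main__":
    for fmt in SUPPORTED_FORMATS:
        # Print the shell-quoted command instead of executing it.
        print(shlex.join(estimate_command("examples/multi_scenario.yaml", fmt)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in an environment where the project is installed.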

Supported engines

  • generic
  • pgvector
  • qdrant
  • milvus
  • elasticsearch
  • opensearch
  • weaviate
  • pinecone

How to interpret the report

  • Raw vectors: uncompressed/base vector bytes.
  • Quantized vectors: additional quantized representation when modeled.
  • Record payload: IDs + metadata/text/provenance payload bytes.
  • Index disk: index structure bytes on disk.
  • Engine overhead: engine/profile-level overhead approximation.
  • Final disk estimate: replicated storage plus WAL/snapshot/safety factors.
  • Final RAM estimate: vectors + payload + index + overhead RAM approximation.
  • Warnings: profile caveats and scenario assumptions to review.
  • Confidence: per-component confidence levels for planning.
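
One way to read the relationship between the per-component lines and the final disk estimate is sketched below in Python. This is an illustration only: the source does not give the tool's formula, so the way the factors combine, the parameter names (`replicas`, `wal_fraction`, `snapshot_fraction`, `safety_factor`), and the sample numbers are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskComponents:
    """Per-replica byte counts, named after the report lines."""
    raw_vectors: int
    quantized_vectors: int
    record_payload: int
    index_disk: int
    engine_overhead: int

    def per_replica(self) -> int:
        return (self.raw_vectors + self.quantized_vectors + self.record_payload
                + self.index_disk + self.engine_overhead)


def final_disk_estimate(components: DiskComponents, replicas: int,
                        wal_fraction: float, snapshot_fraction: float,
                        safety_factor: float) -> int:
    """Assumed combination: replicate, add WAL/snapshot headroom, apply safety margin."""
    replicated = components.per_replica() * replicas
    with_headroom = replicated * (1 + wal_fraction + snapshot_fraction)
    return math.ceil(with_headroom * safety_factor)


if __name__ == "__main__":
    # Placeholder numbers chosen for easy arithmetic, not real tool output.
    parts = DiskComponents(raw_vectors=1000, quantized_vectors=0,
                           record_payload=200, index_disk=300, engine_overhead=100)
    total = final_disk_estimate(parts, replicas=2, wal_fraction=0.25,
                                snapshot_fraction=0.25, safety_factor=1.25)
    print(total)  # 6000
```

Keeping the components separate in this way makes it easy to see which line dominates the total and which low-confidence component deserves validation first.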

Confidence levels

  • high: formulaic or type-level estimate.
  • medium: useful engineering approximation.
  • low: heuristic and engine-dependent; validate with pilot load.

Production sizing warning

The estimates are analytical. Calibrate them against a representative pilot load before using them for production capacity planning.

Development

uv sync
uv run pytest
uv run ruff check .

Current limitations

  • Engine profiles are approximate.
  • No vendor pricing model.
  • No measurements taken from live database systems.
  • No latency/recall estimation.
  • No automatic database selection.
