Skip to main content

PatANN is a massively parallel, distributed, and scalable in-memory/on-disk vector database library for efficient nearest neighbor search across large-scale datasets by finding vector patterns.

Project description

PatANN - Pattern-Aware Vector Database / ANN

Overview

PatANN is a massively parallel, distributed, and scalable vector database library for efficient nearest neighbor search across large-scale datasets by finding vector patterns.

PatANN leverages pattern probing for searching which is a fundamental shift from conventional vector search methodologies. Pattern probing is a preliminary filtering mechanism that examines vector patterns before applying computationally expensive distance metrics. This two-phase approach significantly reduces the search space by quickly identifying potential matches based on pattern similarity rather than calculating exact distances.

While still in beta, PatANN outperforms existing solutions including HNSW, Google ScaNN, Microsoft DiskANN, and Facebook FAISS by large margin.

How Pattern Probing Works

In it's simplest form, when vectors are indexed in PatANN, the system extracts characteristic patterns from each vector. These patterns represent distinctive features or signatures that can be compared much more efficiently than full vector comparisons. These extracted patterns are then hashed and organized into specialized data structures that allow for rapid lookup and comparison.

During a query, PatANN first examines these pattern hashes to identify a subset of vectors that share similar patterns with the query vector. Only after this preliminary filtering does PatANN apply traditional distance metrics (Euclidean, cosine, etc.) to this smaller candidate set.

However, the actual implementation is more involved. Vectors are encoded at multiple resolutions, capturing both macro and micro patterns within the data. This multi-scale approach ensures that both broad similarities and fine details are considered. The patterns are hashed to maintain locality of reference, minimizing cross-shard communication during searches.

The PatANN dynamically selects which patterns to prioritize based on the distribution characteristics of the vector space, optimizing for the specific dataset. Also, PatANN employs probabilistic matching rather than exact pattern match to get massive speed advantage while maintaining high recall.

Performance Implications

This pattern-first approach results in massive performance advantages.

By filtering candidates based on patterns before computing exact distances, PatANN drastically reduces the number of expensive distance calculations.

For disk-based operations, pattern probing allows PatANN to be more selective about which vectors to load from disk, minimizing I/O operations.

Pattern probing operations can be highly parallelized, taking advantage of modern CPU architectures and distributed computing environments. Also, as dataset size increases, the efficiency gains from pattern probing become more pronounced, making PatANN particularly effective for very large-scale vector databases.

Mathematical Foundation

The pattern probing approach is grounded in information theory and dimensionality reduction techniques. While traditional methods like locality-sensitive hashing (LSH) approximate similarity through random projections, PatANN's pattern probing uses a more structured approach that:

  1. Identifies statistically significant patterns in the vector space
  2. Leverages these patterns to create a hierarchical filtering system
  3. Dynamically adjusts the pattern sensitivity based on the density and distribution of the vector space

This mathematically rigorous foundation ensures that PatANN maintains high recall rates while achieving substantial speedups over conventional ANN implementations. By combining this pattern probing technique with traditional distance metrics in a tiered approach, PatANN achieves both speed and accuracy, representing a significant advancement in vector search technology.

Status

Beta Version: Currently uploaded for benchmarking purposes. Complete documentation and updates are under development. Not for production use yet.

Platforms

  • Beta Version: Restricted to Linux to prevent premature circulation of beta versions
  • Production Releases (late April 2025)*: Will support all platforms that are supported by mesibo

Key Features

  • Faster Index building and Searching
  • Supports both in-memory and on-disk operations
  • Dynamic sharding to load balance across servers
  • Refined search, filtering and pagination
  • Unlimited scalability without pre-specified capacity

Algorithmic Approach

  • Novel pattern-based probing technique for ANN search
  • Preliminary results show phenomenal performance in building index and searching
  • Potential slight variations in lower-end matching
  • Detailed research paper forthcoming

Contributions

We are seeking help to:

  • Run additional datasets. So far, all tested datasets (including self-generated) exhibit patterns that helps algorithm. We have yet to test datasets without clear patterns or with uniform distribution.
  • Validate and improve the algorithm

Contact

For support / questions, please contact: support@mesibo.com

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

patann-0.0.75.tar.gz (818.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

patann-0.0.75-py3-none-any.whl (824.4 kB view details)

Uploaded Python 3

File details

Details for the file patann-0.0.75.tar.gz.

File metadata

  • Download URL: patann-0.0.75.tar.gz
  • Upload date:
  • Size: 818.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for patann-0.0.75.tar.gz
Algorithm Hash digest
SHA256 9d809f7fea2f5cefb34f0c2c7f99c64f84ca0fb3cb317a7db1c421551412a06b
MD5 ae0b43ed9d13627198c020db5c645ccc
BLAKE2b-256 cfb3b3255c58bf52a2c72786847871446f0e3e00bd2e08016b07a4c6551c6cde

See more details on using hashes here.

File details

Details for the file patann-0.0.75-py3-none-any.whl.

File metadata

  • Download URL: patann-0.0.75-py3-none-any.whl
  • Upload date:
  • Size: 824.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for patann-0.0.75-py3-none-any.whl
Algorithm Hash digest
SHA256 f43fd5e31239a64969362ada3054b1853ce434494cb26b385df4bb662f44afac
MD5 2fc2ea880e95a6160f1e12fcd0c708bb
BLAKE2b-256 600eea7d042304f962e27b3581a69a37db81e025747bd5bd4f8a4a5d90163859

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page