PatANN is a massively parallel, distributed, and scalable in-memory/on-disk vector database library for efficient nearest neighbor search across large-scale datasets by finding vector patterns.

PatANN - Pattern-Aware Vector Database / ANN

Overview

PatANN is a pattern-aware, massively parallel, distributed, and scalable vector database algorithm and framework for efficient nearest neighbor search, operating both in-memory and on-disk.

Unlike conventional algorithms, PatANN extracts and hashes patterns from vectors and uses them for initial filtering before performing expensive distance computations. During a search, PatANN first examines these pattern hashes to identify a subset of vectors that share similar patterns with the query vector. Only after this preliminary filtering does PatANN apply traditional distance metrics (Euclidean, cosine, etc.) to this smaller candidate set.
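The filter-then-rerank flow described above can be illustrated with a minimal sketch. It is not PatANN's algorithm or API: the pattern_hash function below is a toy stand-in invented for this example (one bit per block of the vector, set when the block mean is at least the overall mean), and the names build_index and search are hypothetical. The sketch assumes the vector length is divisible by the number of blocks.

```python
import math

def pattern_hash(vec, blocks=4):
    """Toy pattern (an assumption, not PatANN's): one bit per block,
    set when the block mean is at least the overall mean."""
    overall = sum(vec) / len(vec)
    size = len(vec) // blocks  # assumes len(vec) is divisible by blocks
    bits = 0
    for b in range(blocks):
        chunk = vec[b * size:(b + 1) * size]
        bits = (bits << 1) | (1 if sum(chunk) / len(chunk) >= overall else 0)
    return bits

def build_index(vectors, blocks=4):
    """Group vector ids by their pattern hash."""
    index = {}
    for i, vec in enumerate(vectors):
        index.setdefault(pattern_hash(vec, blocks), []).append(i)
    return index

def search(query, vectors, index, k=1, blocks=4):
    # Stage 1: cheap filter, keep only vectors sharing the query's pattern.
    candidates = index.get(pattern_hash(query, blocks), [])
    # Stage 2: exact Euclidean distance on the reduced candidate set.
    ranked = sorted(candidates, key=lambda i: math.dist(query, vectors[i]))
    return ranked[:k]

vectors = [
    [9.0, 8.0, 1.0, 0.0, 1.0, 2.0, 0.0, 1.0],
    [8.5, 9.0, 0.5, 1.0, 0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 2.0, 1.0, 9.0, 8.0],
]
index = build_index(vectors)
query = [9.0, 9.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
result = search(query, vectors, index, k=2)
print(result)  # ids of the candidates that passed the filter, nearest first
```

In this toy run the third vector has a different pattern hash than the query, so its distance is never computed. An exact-match filter like this one can miss true neighbors whose hash differs by a single bit, which is one reason a real system would need something more tolerant.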

However, the actual implementation is more involved. Vectors are encoded at multiple resolutions, capturing both macro and micro patterns within the data. This multi-scale approach ensures that both broad similarities and fine details are captured. The patterns are hashed to maintain locality of reference, minimizing cross-shard communication during searches. PatANN also uses recursive patterns to mitigate the curse of dimensionality and hubness in high-dimensional data. The system dynamically selects which patterns to prioritize based on the distribution characteristics of the vector space, optimizing for the specific dataset. Additionally, PatANN employs probabilistic matching rather than exact pattern matching to achieve massive speed advantages while maintaining high recall. A detailed research paper is forthcoming.
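Two of the ideas in this paragraph, encoding at multiple resolutions and tolerant rather than exact matching, can be sketched in a few lines. This is only an illustration under assumptions: the block-mean bit pattern, the resolutions (2, 4, 8), and the mismatch threshold are all hypothetical choices made for the example, and the sketch does not attempt the recursive patterns or dynamic pattern selection that the text mentions.

```python
def block_bits(vec, blocks):
    """One bit per block: 1 when the block mean is at least the overall mean."""
    overall = sum(vec) / len(vec)
    size = len(vec) // blocks  # assumes len(vec) is divisible by blocks
    return [
        1 if sum(vec[b * size:(b + 1) * size]) / size >= overall else 0
        for b in range(blocks)
    ]

def multiscale_signature(vec, resolutions=(2, 4, 8)):
    """Concatenate coarse-to-fine bit patterns (hypothetical encoding)."""
    sig = []
    for blocks in resolutions:
        sig.extend(block_bits(vec, blocks))
    return tuple(sig)

def hamming(a, b):
    """Number of positions where two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))

def approx_match(sig_a, sig_b, max_mismatches=2):
    """Tolerant match: accept signatures that differ in only a few bits."""
    return hamming(sig_a, sig_b) <= max_mismatches

a = [9.0, 8.0, 1.0, 0.0, 1.0, 2.0, 0.0, 1.0]
b = [9.0, 1.0, 1.0, 0.0, 1.0, 2.0, 0.0, 1.0]  # same broad shape, one detail changed
c = list(reversed(a))                          # mass at the opposite end

sig_a, sig_b, sig_c = (multiscale_signature(v) for v in (a, b, c))
print(sig_a == sig_b, approx_match(sig_a, sig_b), approx_match(sig_a, sig_c))
```

Here a and b agree at the two coarse resolutions and differ only in the finest one, so an exact comparison rejects b while the tolerant comparison accepts it; c differs at every resolution and is rejected either way.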

Performance Implications

While still in beta, this pattern-first, details-later approach results in significant performance advantages. PatANN outperforms conventional ANN libraries including HNSW, Google ScaNN, Microsoft DiskANN, and Facebook FAISS by a substantial margin, with superior recall and speed. Detailed benchmarks conducted using industry-standard ann-benchmarks are available at https://patann.dev.
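For readers who want to check recall on their own data, the sketch below uses a common set-based definition of recall@k: the fraction of the true k nearest neighbors that the approximate search returned. The function names are hypothetical, and a benchmark harness may compute recall somewhat differently (for example, by comparing distances rather than ids).

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k ids that appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth)

def mean_recall(approx_results, exact_results, k):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)

# Two queries: the first misses one of three true neighbors, the second misses none.
exact = [[1, 2, 3], [4, 5, 6]]
approx = [[1, 3, 9], [4, 5, 6]]
score = mean_recall(approx, exact, k=3)
print(round(score, 4))
```

The exact ids would come from a brute-force search over the same dataset, which is affordable for a sample of queries even when the full index is large.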

By filtering candidates based on patterns before computing exact distances, PatANN drastically reduces the number of expensive distance calculations.

For disk-based operations, pattern probing allows PatANN to be more selective about which vectors to load from disk, minimizing I/O operations.

Pattern probing operations are highly parallelized, taking advantage of modern CPU architectures and distributed computing environments. Also, as dataset size increases, the efficiency gains from pattern probing become more pronounced, making PatANN particularly effective for very large-scale vector databases.

Mathematical Foundation

The pattern probing approach is grounded in information theory and dimensionality reduction techniques. While traditional methods like locality-sensitive hashing (LSH) approximate similarity through random projections, PatANN's pattern probing uses a more structured approach that:

Identifies statistically significant patterns in the vector space
Leverages these patterns to create a hierarchical filtering system
Dynamically adjusts the pattern sensitivity based on the density and distribution of the vector space

This mathematically rigorous foundation ensures that PatANN maintains high recall rates while achieving substantial speedups over conventional ANN implementations.

By combining this pattern probing technique with traditional distance metrics in a tiered approach, PatANN achieves both speed and accuracy, representing a significant advancement in vector search technology.
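For contrast with the structured approach described in this section, the random-projection LSH baseline mentioned above can be sketched as follows. This is the textbook random-hyperplane scheme, not part of PatANN; the function names, the number of bits, and the seed are illustrative assumptions.

```python
import random

def random_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian hyperplane normals of dimension `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def lsh_signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

planes = random_hyperplanes(dim=4, bits=16)
v = [1.0, 2.0, 3.0, 4.0]
sig_v = lsh_signature(v, planes)
sig_scaled = lsh_signature([2.0 * x for x in v], planes)  # same direction
sig_negated = lsh_signature([-x for x in v], planes)      # opposite direction
print(sig_v == sig_scaled, sig_v == sig_negated)
```

The hyperplanes here are chosen without looking at the data, so the signature depends only on a vector's direction. The approach described in this section differs in that the patterns are derived from the dataset and their sensitivity is adjusted to its density and distribution.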

Platforms

Linux
macOS (Apple Silicon)
Windows
Android
iOS

Key Distinguishing Features

Novel pattern-based probing technique for ANN search
In-Memory, On-Disk and Hybrid Index
Refined search, filtering and pagination algorithm
Unlimited scalability without pre-specified capacity
Dynamic sharding to load balance across servers
Cloud (in-progress) and Serverless
SIMD-accelerated for both x86_64 (SSE*, AVX2, AVX-512) and ARM (NEON, SVE) platforms
OS-optimized I/O using huge pages (Linux), large pages (Windows), and superpages (macOS)
NUMA-aware architecture

Status

Beta Version: Currently in Beta. Not for production use yet.

Contributions

We are seeking help to:

Run additional datasets. So far, all tested datasets (including self-generated ones) exhibit patterns that help the algorithm. We have yet to test datasets without clear patterns or with a uniform distribution.
Validate and improve the algorithm

Contact

For support / questions, please contact: support@mesibo.com

Release history

Version	Release date
0.1.3	Apr 22, 2025
0.1.2	Apr 22, 2025
0.1.1	Apr 22, 2025
0.1.0	Apr 22, 2025
0.0.86	Apr 18, 2025 (this version)
0.0.85	Apr 17, 2025
0.0.84	Apr 17, 2025
0.0.83	Apr 12, 2025
0.0.82	Apr 11, 2025
0.0.81	Mar 28, 2025
0.0.80	Mar 28, 2025
0.0.78	Mar 28, 2025
0.0.77	Mar 27, 2025
0.0.76	Mar 27, 2025
0.0.75	Mar 27, 2025
0.0.74	Mar 21, 2025
0.0.72	Mar 20, 2025
0.0.71	Mar 19, 2025
0.0.70	Mar 19, 2025
0.0.68	Mar 18, 2025
0.0.67	Mar 18, 2025
0.0.66	Mar 18, 2025
0.0.65	Mar 13, 2025
0.0.63	Mar 13, 2025
0.0.62	Mar 12, 2025
0.0.61	Mar 12, 2025
0.0.60	Feb 24, 2025
0.0.59	Feb 24, 2025
0.0.58	Feb 24, 2025
0.0.57	Feb 24, 2025
0.0.56	Feb 22, 2025
0.0.55	Feb 22, 2025
0.0.54	Feb 22, 2025
0.0.53	Feb 22, 2025
0.0.52	Feb 22, 2025
0.0.51	Feb 22, 2025
0.0.50	Feb 22, 2025
0.0.49	Feb 22, 2025
0.0.48	Feb 21, 2025
0.0.47	Feb 21, 2025
0.0.46	Feb 21, 2025
0.0.45	Feb 13, 2025
0.0.44	Feb 13, 2025
0.0.43	Feb 13, 2025
0.0.42	Feb 11, 2025
0.0.41	Feb 11, 2025
0.0.40	Feb 10, 2025
0.0.39	Feb 10, 2025
0.0.38	Feb 5, 2025
0.0.37	Feb 5, 2025
0.0.36	Feb 4, 2025
0.0.35	Feb 4, 2025
0.0.34	Feb 4, 2025
0.0.33	Jan 31, 2025
0.0.32	Jan 31, 2025
0.0.31	Jan 31, 2025
0.0.30	Jan 31, 2025
0.0.29	Jan 31, 2025
0.0.28	Jan 30, 2025
0.0.27	Jan 30, 2025
0.0.26	Jan 30, 2025
0.0.25	Jan 30, 2025
0.0.24	Jan 30, 2025
0.0.23	Jan 29, 2025
0.0.22	Jan 29, 2025
0.0.21	Jan 29, 2025
0.0.20	Jan 29, 2025
0.0.19	Jan 29, 2025
0.0.18	Jan 29, 2025
0.0.17	Jan 29, 2025
0.0.16	Jan 28, 2025
0.0.15	Jan 28, 2025
0.0.14	Jan 28, 2025
0.0.13	Jan 28, 2025
0.0.12	Jan 28, 2025
0.0.10	Jan 28, 2025
0.0.9	Jan 28, 2025
0.0.8	Jan 24, 2025
0.0.7	Jan 24, 2025
0.0.6	Jan 24, 2025
0.0.5	Jan 24, 2025
0.0.4	Jan 24, 2025
0.0.3	Jan 24, 2025
0.0.2	Jan 24, 2025
0.0.1	Jan 24, 2025

Download files

Source Distribution

patann-0.0.86.tar.gz (4.5 MB)

Uploaded Apr 18, 2025 Source

Built Distribution

patann-0.0.86-py3-none-any.whl (4.6 MB)

Uploaded Apr 18, 2025 Python 3

File details

Details for the file patann-0.0.86.tar.gz.

File metadata

Download URL: patann-0.0.86.tar.gz
Upload date: Apr 18, 2025
Size: 4.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for patann-0.0.86.tar.gz
Algorithm	Hash digest
SHA256	`e6f927c2d24cb697e2e12f770392c2a931e036809534830fece1b8daff388547`
MD5	`b611d1051fa8125aeb704bdacb7f8709`
BLAKE2b-256	`c32bee7b012998d27ea96b6d74b910bb4f9c5be1d0959a27dde61d63d891ee6b`
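A downloaded archive can be checked against the SHA256 digest listed above using Python's standard hashlib module. The sketch below is one way to do it; the helper names and the local file path are assumptions, and only the expected digest comes from the table.

```python
import hashlib

# SHA256 digest listed above for patann-0.0.86.tar.gz
EXPECTED_SDIST_SHA256 = "e6f927c2d24cb697e2e12f770392c2a931e036809534830fece1b8daff388547"

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SDIST_SHA256):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()

# Usage, assuming the archive was saved to the current directory:
#   verify("patann-0.0.86.tar.gz")
```

A mismatch means the file differs from the one that was uploaded, so it should be downloaded again rather than installed.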

File details

Details for the file patann-0.0.86-py3-none-any.whl.

File metadata

Download URL: patann-0.0.86-py3-none-any.whl
Upload date: Apr 18, 2025
Size: 4.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for patann-0.0.86-py3-none-any.whl
Algorithm	Hash digest
SHA256	`101d4f1102dc80485378bcafd98f1c212ae0fa4a3c0481e897f6920193b23487`
MD5	`518725761b42473e3233f4a9dfe95b0e`
BLAKE2b-256	`64208fb08c0cd61b6d8caddf3c48e0b98a592cc75cb1d7d03ea98148e9c9b74a`
