Finds close numerical matches across two arrays.

Close Numerical Matches

This package quickly finds close numerical matches between two 2D arrays of shape (n, d) and (m, d), provided there are relatively few matches and d is relatively low. It returns the matches as pairs of row indices.

Installation

You can install close-numerical-matches from PyPI:

$ pip install close-numerical-matches

The package is supported on Python 3.7 and above and requires NumPy.

How to use

Import find_matches from close_numerical_matches and supply two arrays of shape (n, d) and (m, d) and a given tolerance level. Optionally provide your desired distance metric and a bucket tolerance multiplier. The arguments in more detail:

  • arr0 : np.ndarray First array to find matches against. Should be of shape (n, d).
  • arr1 : np.ndarray Second array to find matches against. Should be of shape (m, d).
  • dist : {'norm', 'max'} or Callable[[np.ndarray], np.ndarray], default='norm' Distance metric to use. 'norm' and 'max' are currently supported. To use another distance function, supply your own callable. It should take an (n, d) array as argument and return an (n,) array.
  • tol : float, default=0.1 The tolerance within which values are considered similar enough to count as a match. Should be > 0.
  • bucket_tol_mult : int, default=2 The tolerance multiplier used for assigning buckets. Tweaking it can make the algorithm faster in some instances. Should never be less than 1.

Example

>>> import numpy as np
>>> from close_numerical_matches import find_matches
>>> arr0 = np.array([[25, 24], [50, 50], [25, 26]])
>>> arr1 = np.array([[25, 23], [25, 25], [50.6, 50.6], [60, 60]])
>>> find_matches(arr0, arr1, tol=1.0001)
array([[0, 0], [0, 1], [1, 2], [2, 1]])
>>> find_matches(arr0, arr1, tol=0.9999)
array([[1, 2]])
>>> find_matches(arr0, arr1, tol=0.60001)
array([], dtype=int64)
>>> find_matches(arr0, arr1, tol=0.60001, dist='max')
array([[1, 2]])
>>> manhattan_dist = lambda arr: np.sum(np.abs(arr), axis=1)
>>> matches = find_matches(arr0, arr1, tol=1.0001, dist=manhattan_dist)
>>> matches
array([[0, 0], [0, 1], [2, 1]])
>>> indices0, indices1 = matches.T
>>> arr0[indices0]
array([[25, 24], [25, 24], [25, 26]])

More examples can be found in the test cases.
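
The index pairs in the example can be cross-checked with a brute-force comparison. The following is a minimal, dependency-free sketch; `brute_force_matches` is a hypothetical helper written for illustration and is not part of close_numerical_matches. It assumes that a pair counts as a match when its distance is less than or equal to tol, which the example does not pin down because its tolerances avoid exact ties.

```python
import math


def brute_force_matches(rows0, rows1, tol, dist="norm"):
    """Return [i, j] index pairs whose rows lie within tol of each other.

    Hypothetical cross-checking helper, not the library's implementation.
    Assumes a match means distance <= tol.
    """

    def distance(a, b):
        diffs = [abs(x - y) for x, y in zip(a, b)]
        if dist == "norm":
            # Euclidean length of the difference vector
            return math.sqrt(sum(v * v for v in diffs))
        if dist == "max":
            # Largest absolute coordinate difference
            return max(diffs)
        raise ValueError(f"unsupported dist: {dist!r}")

    # Compare every row of rows0 against every row of rows1
    return [
        [i, j]
        for i, a in enumerate(rows0)
        for j, b in enumerate(rows1)
        if distance(a, b) <= tol
    ]


arr0 = [[25, 24], [50, 50], [25, 26]]
arr1 = [[25, 23], [25, 25], [50.6, 50.6], [60, 60]]
print(brute_force_matches(arr0, arr1, tol=1.0001))
print(brute_force_matches(arr0, arr1, tol=0.60001, dist="max"))
```

Because it compares every pair, this sketch is only practical for small inputs.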

How fast is it?

Here is an unscientific example:

from timeit import default_timer as timer
import numpy as np
from close_numerical_matches import naive_find_matches, find_matches

arr0 = np.random.rand(320_000, 2)
arr1 = np.random.rand(44_000, 2)

start = timer()
naive_find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start)  # 255.335 s

start = timer()
find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start)  # 5.821 s

How it works

Comparing every element in the first array against every element in the second array results in an O(nmd) runtime. Instead, all elements are first assigned to buckets, so that only elements that are relatively close to each other are compared. With relatively few matches and a low dimensionality d, this cuts the runtime down to almost linear, O((n + m)d).
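
To make the bucketing idea concrete, here is a minimal pure-Python sketch of one common way to do it: a uniform grid whose cell side is tol times a multiplier, with each point compared only against points in its own and adjacent cells. This is an illustration under assumptions and not the library's actual implementation; the function name `bucketed_matches`, the grid layout, and the neighbour-cell scan are all choices made for this sketch, and it supports Euclidean distance only.

```python
import itertools
import math
from collections import defaultdict


def bucketed_matches(rows0, rows1, tol, bucket_tol_mult=2):
    """Find [i, j] pairs with Euclidean distance <= tol using a uniform grid.

    Illustrative sketch only; the real library may bucket differently.
    """
    # Cell side is at least tol, so two matching points can differ by
    # at most one cell index along each axis.
    cell = tol * bucket_tol_mult

    def key(row):
        return tuple(math.floor(v / cell) for v in row)

    # Assign every row of the second array to its grid cell.
    buckets = defaultdict(list)
    for j, row in enumerate(rows1):
        buckets[key(row)].append(j)

    d = len(rows0[0]) if rows0 else 0
    # All 3**d offsets to the cell itself and its adjacent cells.
    offsets = list(itertools.product((-1, 0, 1), repeat=d))

    matches = []
    for i, a in enumerate(rows0):
        base = key(a)
        for off in offsets:
            neighbour = tuple(b + o for b, o in zip(base, off))
            for j in buckets.get(neighbour, ()):
                b_row = rows1[j]
                sq = sum((x - y) ** 2 for x, y in zip(a, b_row))
                if math.sqrt(sq) <= tol:
                    matches.append([i, j])
    return sorted(matches)


arr0 = [[25, 24], [50, 50], [25, 26]]
arr1 = [[25, 23], [25, 25], [50.6, 50.6], [60, 60]]
print(bucketed_matches(arr0, arr1, tol=1.0001))
```

In this sketch the number of neighbouring cells grows as 3^d, which is one reason a grid of this kind is only attractive when d is small.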

In general, the runtime of the bucket approach is O((n + m)d + Bd³ + ∑_{b ∈ B} n_b m_b), where B is the number of buckets, the sum runs over all buckets b, and n_b and m_b are the numbers of items from the first and second array assigned to bucket b. As can be seen, it scales poorly with dimensionality and does not improve on the naive approach if all elements are assigned to the same bucket. When the bucket approach is likely to be slower than the naive approach, this library falls back to the naive approach.
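
The ∑ n_b m_b term can be evaluated by hand for a given bucket assignment. The sketch below does so with a hypothetical helper, `pairwise_comparisons`, which takes one bucket label per row of each array; the labels and the helper are made up for illustration and are not part of the library.

```python
from collections import Counter


def pairwise_comparisons(bucket_ids0, bucket_ids1):
    """Sum n_b * m_b over all buckets b.

    bucket_ids0 and bucket_ids1 hold one bucket label per row of the
    first and second array (hypothetical inputs for illustration).
    """
    counts0 = Counter(bucket_ids0)  # n_b for each bucket
    counts1 = Counter(bucket_ids1)  # m_b for each bucket
    # Buckets missing from counts1 contribute zero comparisons.
    return sum(n_b * counts1[b] for b, n_b in counts0.items())


# Rows spread over several buckets: 2*1 + 1*2 + 1*0 = 4 comparisons,
# against 4 * 4 = 16 for the naive approach.
spread = pairwise_comparisons([0, 0, 1, 2], [0, 1, 1, 3])
print(spread)

# Everything in one bucket: 4 * 4 = 16, no better than naive.
single = pairwise_comparisons([0, 0, 0, 0], [0, 0, 0, 0])
print(single)
```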
