Skip to main content

Finds close numerical matches across two arrays.

Project description

Close Numerical Matches

PyPI - Downloads PyPI - Version GitHub Workflow Status (branch) CodeFactor Grade GitHub issues GitHub license PyPI - Python Version PRs welcome

This package finds close numerical matches fast across two 2D arrays of shape (n, d) and (m, d) (if it can be assumed there will be relatively few matches and d is relatively low). Returns the indices of the matches.

Installation

You can install close-numerical-matches from PyPI:

$ pip install close-numerical-matches

The package is supported on Python 3.8 and above and requires Numpy.

How to use

Import find_matches from close_numerical_matches and supply two arrays of shape (n, d) and (m, d) and a given tolerance level. Optionally provide your desired distance metric and a bucket tolerance multiplier. The arguments in more detail:

  • arr0 : np.ndarray First array to find matches against. Should be of size (n, d).
  • arr1 : np.ndarray Second array to find matches against. Should be of size (m, d).
  • dist : {'norm', 'max', 'cos'} or Callable[[np.ndarray, np.ndarray], np.ndarray], default='norm' Distance metric to calculate distance. 'norm', 'max' and 'cos' are currently supported. If you want some other distance function, you can supply your own function. It should take two arrays: One array of size (n, d) of entries and another array of size (d,) of a particular entry. Should return an (n,) array of distances.
  • tol : float, default=0.1 The tolerance where values are considered the similar enough to count as a match. Should be > 0.
  • bucket_tol_mult : int, default=2 The tolerance multiplier to use for assigning buckets. By setting this lower, there are fewer potential close points that need to be checked (faster runtime), but you risk missing some match pairs. By setting it higher, you need to compare more potential matches, but the risk of missing any close pair is lower. For cosine distance, you might need to set this high. If you supply your own distance function, do take this parameter into consideration. Should never be less than 1.

Example

>>> import numpy as np
>>> from close_numerical_matches import find_matches
>>> arr0 = np.array([[25, 24], [50, 50], [25, 26]])
>>> arr1 = np.array([[25, 23], [25, 25], [50.6, 50.6], [60, 60]])
>>> find_matches(arr0, arr1, tol=1.0001)
array([[0, 0], [0, 1], [1, 2], [2, 1]])
>>> find_matches(arr0, arr1, tol=0.9999)
array([[1, 2]])
>>> find_matches(arr0, arr1, tol=0.60001)
array([], dtype=int64)
>>> find_matches(arr0, arr1, tol=0.60001, dist='max')
array([[1, 2]])
>>> find_matches([[0, 0.05]], [[0, 5], [0, -0.01]], tol=0.1, dist='cos')
array([[0, 0]])
>>> manhatten_dist = lambda arr, row: np.sum(np.abs(arr - row), axis=1)
>>> matches = find_matches(arr0, arr1, tol=1.0001, dist=manhatten_dist)
>>> matches
array([[0, 0], [0, 1], [2, 1]])
>>> indices0, indices1 = matches.T
>>> arr0[indices0]
array([[25, 24], [25, 24], [25, 26]])

More examples can be found in the test cases.

How fast is it?

Here is an unscientific example:

from timeit import default_timer as timer
import numpy as np
from close_numerical_matches import naive_find_matches, find_matches

arr0 = np.random.rand(320_000, 2)
arr1 = np.random.rand(44_000, 2)

start = timer()
naive_find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start)  # 255.335 s

start = timer()
find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start)  # 5.821 s

How it works

Instead of comparing every element in the first array against every element in the second array, resulting in an O(nmd) runtime, all elements are at first assigned to buckets so only elements that are relatively close are compared. In the case of relatively few matches and a low dimensionality d, this cuts the runtime down to almost linear O((n + m)d).

In general, the algorithm runtime of the bucket approach is O((n + m)d + Bd³ + ∑_{b ∈ B} n_b m_b) where B is the number of buckets and n_b and m_b are the number of items assigned to bucket b. As can be seen, it scales bad with dimensionality and also does not improve from the naive approach if all elements are assigned to the same bucket. In case the bucket approach is likely to be slower than the naive approach, this library will fall back to the naive approach.

When NOT to use this library

  • If you are working with arrays with very high dimensionalities, the algorithm employed here does not scale well. As mentioned above, the naive algorithm will be used in such cases. See if another library exists for your particular problem.
  • If you are expecting that a lot of pairs will match, this is not suitable. This algorithm is targeted for the case where there are extremely many data points and only a fraction of those are expected to match.
  • If you need to use a distance function that does not map well to being assigned to buckets.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

close_numerical_matches-0.2.0-py3-none-any.whl (9.4 kB view details)

Uploaded Python 3

close_numerical_matches-0.2.0-cp311-cp311-win_amd64.whl (53.0 kB view details)

Uploaded CPython 3.11 Windows x86-64

close_numerical_matches-0.2.0-cp311-cp311-win32.whl (47.4 kB view details)

Uploaded CPython 3.11 Windows x86

close_numerical_matches-0.2.0-cp311-cp311-musllinux_1_1_x86_64.whl (134.9 kB view details)

Uploaded CPython 3.11 musllinux: musl 1.1+ x86-64

close_numerical_matches-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (140.1 kB view details)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

close_numerical_matches-0.2.0-cp311-cp311-macosx_10_9_x86_64.whl (78.5 kB view details)

Uploaded CPython 3.11 macOS 10.9+ x86-64

close_numerical_matches-0.2.0-cp310-cp310-win_amd64.whl (53.4 kB view details)

Uploaded CPython 3.10 Windows x86-64

close_numerical_matches-0.2.0-cp310-cp310-win32.whl (47.5 kB view details)

Uploaded CPython 3.10 Windows x86

close_numerical_matches-0.2.0-cp310-cp310-musllinux_1_1_x86_64.whl (137.4 kB view details)

Uploaded CPython 3.10 musllinux: musl 1.1+ x86-64

close_numerical_matches-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142.1 kB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

close_numerical_matches-0.2.0-cp310-cp310-macosx_10_9_x86_64.whl (80.0 kB view details)

Uploaded CPython 3.10 macOS 10.9+ x86-64

close_numerical_matches-0.2.0-cp39-cp39-win_amd64.whl (53.3 kB view details)

Uploaded CPython 3.9 Windows x86-64

close_numerical_matches-0.2.0-cp39-cp39-win32.whl (47.6 kB view details)

Uploaded CPython 3.9 Windows x86

close_numerical_matches-0.2.0-cp39-cp39-musllinux_1_1_x86_64.whl (137.1 kB view details)

Uploaded CPython 3.9 musllinux: musl 1.1+ x86-64

close_numerical_matches-0.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (141.8 kB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

close_numerical_matches-0.2.0-cp39-cp39-macosx_10_9_x86_64.whl (79.9 kB view details)

Uploaded CPython 3.9 macOS 10.9+ x86-64

close_numerical_matches-0.2.0-cp38-cp38-win_amd64.whl (53.1 kB view details)

Uploaded CPython 3.8 Windows x86-64

close_numerical_matches-0.2.0-cp38-cp38-win32.whl (47.4 kB view details)

Uploaded CPython 3.8 Windows x86

close_numerical_matches-0.2.0-cp38-cp38-musllinux_1_1_x86_64.whl (136.1 kB view details)

Uploaded CPython 3.8 musllinux: musl 1.1+ x86-64

close_numerical_matches-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (139.6 kB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

close_numerical_matches-0.2.0-cp38-cp38-macosx_10_9_x86_64.whl (79.1 kB view details)

Uploaded CPython 3.8 macOS 10.9+ x86-64

File details

Details for the file close_numerical_matches-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5709a8ec8d9e3eaea54a0bd7fcfdcef7f98a8c716ea8b7ca80b6506013854979
MD5 b8a257b344a94bdfe7e2160b11c3eafb
BLAKE2b-256 4db9890c0ae4fb28ee25d50b3f172839056d6e1920e728e0ef26460543675065

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 7df5a95d3c4e96c1eda02b69a437d4f653311a1094b5f3cd4796ea11d3f582c7
MD5 29064a5f61a081002a34574cf4e26fc0
BLAKE2b-256 6ae0b5d274e9cb91eb3a6c5f33d87a3baa99dd00a83d0cbae45b90dd19281e99

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp311-cp311-win32.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp311-cp311-win32.whl
Algorithm Hash digest
SHA256 6033bd5150adfb32e1f3c328bfb7fa6168a0ab03ac47c5b3759f966ddb0d0652
MD5 db8c61cf52c86b45c61758e586b01014
BLAKE2b-256 9940e7a9c72cd2f247ab06ed650637afa79da2d3343604f0824754accf9ce9bd

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp311-cp311-musllinux_1_1_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp311-cp311-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 f3fece8ec841eb8e7d9f80c843ce6d72c555c059c7031c16df4ae4232f15cb0f
MD5 67dd27d9fef1a04182527a54e7657fcf
BLAKE2b-256 5955eb6b5236fe07ad034147439cce56e462b3ef1977870cc7d406610466b829

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5ad1d7784f9ade34db72da41a7dea311fa3401a8076e04588d303e0eaafb6dc2
MD5 93756e88d0992f2320a1d1c055fc9b6f
BLAKE2b-256 147e885b24e2e06c47e56d13c221964af6fe05c5d51591d9d009ef939ac2ae7b

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp311-cp311-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1b4b7beb90d94c2209abba120b3b832ab0f6ae9960681e64424b39dc7e950319
MD5 c9b562a98578e4ca48bdfc6aebec2cc2
BLAKE2b-256 e535112ce6fd8e7c280918b3c06daeae2bf3de7cbc8f974b57b05908831690e3

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp310-cp310-win_amd64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 fbd65efa4308c924345262211a7f6cfc1d49c139a2aa27cd1c19d3b279153dcc
MD5 a9216fe326dc487e865397c379b91f31
BLAKE2b-256 25109cd8a8617e6a77d59c971b58f8e2ac4241402020086b65a355d306cf29d6

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp310-cp310-win32.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 dd1eb738f176f22cc36a070951582ffe2be94ccfc3a9ad35cb2aeff30c64ad01
MD5 3b44ae84195bef4648efe4ba6f0fd7b9
BLAKE2b-256 437b3919abacda3a48409ef0e12b2f76856da6f9114544c8d76f0e301987b8b3

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp310-cp310-musllinux_1_1_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp310-cp310-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 14f6a8ec1b5b84a5d51174dcfa228e786b97673560160fc7259d37cf3d6badab
MD5 405be18e36a43962dc7b005c5917c86e
BLAKE2b-256 afef00f527d38ceda1f6dbd629591df7496f2ff9f641f5330ae6d611831aeb5f

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 94dcca91b2de86d4af27a3f6bebcc6cd56fd030cf9b7ee6efc8a9ac26cc0b259
MD5 4ea4b2f8ac536ed0cb2241a8efb5f382
BLAKE2b-256 341ed229f8cff6945cdeadd5a583d151097270a8c729ef8cc1dae9a7df18b52f

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp310-cp310-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1b7a8c1dac2dfc39c217f7a1c4433c0fc70f5510dbfeeef6737a7f708f80b2f3
MD5 4e03befd926cefa6a0d5aea44f300b94
BLAKE2b-256 ff0ecf8510a2927ecbd88716b34f869c4d8c42de11db77b46e6cdd63ebbd271b

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp39-cp39-win_amd64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 d71c6565495b8781f436fb9993848791a9b4c53e140d4adb405271b64b30b093
MD5 98c77d4b812b3daed2e45e2e676ad977
BLAKE2b-256 9c97d02c3090dbe6a83111d76bb04fb3b17925aff3a59484fa2bf9e9d320cc3a

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp39-cp39-win32.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 d438c9b2cdb2701ff94870ab578d002f631d38a1a13b455d052d1b69aa241b59
MD5 8ac233dbd6bb69809d5912a9c7c3bfb1
BLAKE2b-256 1e79e005f6773db66b20ca69f3b10499cc40b1419f4b3b1796a1c26a53464056

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp39-cp39-musllinux_1_1_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp39-cp39-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 f53104d92748d32c73d4f82434cbf31d77b2dd91d1c38d9e2c236157845d09b9
MD5 8c526c8cd23b8bf2680196a636086a4d
BLAKE2b-256 3617ff0c0b0b0a3855e7463d2b4b895420473409a4e73327ef55ae3716f49715

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 746a6f3ab02ec10a9accfabcec313fe488530a11652b08ad231d62fc4eae9343
MD5 63bfc6c9a73edc33a2eb178651bf205b
BLAKE2b-256 49cced22a233c799a308ba1289cefe7a2ce1f5fcfbaafa115c7168b0b3d28c06

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp39-cp39-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 3835f5fe511b10f18c124d67aeb6f36a715125f7f9b64dc338e9595fe8ba47f4
MD5 a48465dbcf1b54e708844871fb2b0fad
BLAKE2b-256 a087a6647c7afc8b7c9f7004b568c6e834d3b69cd1d8a0810c09090d0ddc485c

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp38-cp38-win_amd64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 17816b2d545000f032b356ee31e5abae4a4547402dec6391d53d11b80a570236
MD5 cab5b9e8a77cd7504fa417316e53ab1f
BLAKE2b-256 0ea851ffb0a8d742579011d267c6e4edf3778da726f747bfea4ae29c5e89e75e

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp38-cp38-win32.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp38-cp38-win32.whl
Algorithm Hash digest
SHA256 0ac3ea318bb38437b61907faefdd4e737e1da6fbb6a20b5d39e7e4926b09735c
MD5 71eb481f7f937e2dc039d7cefa315c14
BLAKE2b-256 00ae6173c10ee64564b4a5877ed3ea91a0ebe45dbf0966f353faf974f8984cb3

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp38-cp38-musllinux_1_1_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp38-cp38-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 430a1e7a0976fb80de214f92f98d65779469ff4aee3758252fc62d3a6fd9a956
MD5 fae40bc17841b77662030bffca26b15b
BLAKE2b-256 18e4f257726d46de48db42abed6a34a38661304ddcf2488d221d987f41543915

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 64849b5781694567e52302fd21fea6621f27e2721b61b247bf8431ec49557827
MD5 68281ad88dbff346eba4450d36333550
BLAKE2b-256 616673857c462b6376f84a7b7a4bb7ed4f56c03a7513a632996aca29b0358ac6

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.0-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.0-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 948f8a6ddf3ac99d14e8d4912c764657a33246810b6d8709864b465a734c9819
MD5 7d8cfde05825c2dc850987505ca17273
BLAKE2b-256 c48b1861b5583a8b3f893b3d4054346d4c757301c2afa4268ab228bf7676a774

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page