Skip to main content

Finds close numerical matches across two arrays.

Project description

Close Numerical Matches

PyPI - Downloads PyPI - Version GitHub Workflow Status (branch) CodeFactor Grade GitHub issues GitHub license PyPI - Python Version PRs welcome

This package finds close numerical matches fast across two 2D arrays of shape (n, d) and (m, d) (if it can be assumed there will be relatively few matches and d is relatively low). Returns the indices of the matches.

Installation

You can install close-numerical-matches from PyPI:

$ pip install close-numerical-matches

The package is supported on Python 3.8 and above and requires Numpy.

How to use

Import find_matches from close_numerical_matches and supply two arrays of shape (n, d) and (m, d) and a given tolerance level. Optionally provide your desired distance metric and a bucket tolerance multiplier. The arguments in more detail:

  • arr0 : np.ndarray First array to find matches against. Should be of size (n, d).
  • arr1 : np.ndarray Second array to find matches against. Should be of size (m, d).
  • dist : {'norm', 'max', 'cos'} or Callable[[np.ndarray, np.ndarray], np.ndarray], default='norm' Distance metric to calculate distance. 'norm', 'max' and 'cos' are currently supported. If you want some other distance function, you can supply your own function. It should take two arrays: One array of size (n, d) of entries and another array of size (d,) of a particular entry. Should return an (n,) array of distances.
  • tol : float, default=0.1 The tolerance where values are considered the similar enough to count as a match. Should be > 0.
  • bucket_tol_mult : int, default=2 The tolerance multiplier to use for assigning buckets. By setting this lower, there are fewer potential close points that need to be checked (faster runtime), but you risk missing some match pairs. By setting it higher, you need to compare more potential matches, but the risk of missing any close pair is lower. For cosine distance, you might need to set this high. If you supply your own distance function, do take this parameter into consideration. Should never be less than 1.

Example

>>> import numpy as np
>>> from close_numerical_matches import find_matches
>>> arr0 = np.array([[25, 24], [50, 50], [25, 26]])
>>> arr1 = np.array([[25, 23], [25, 25], [50.6, 50.6], [60, 60]])
>>> find_matches(arr0, arr1, tol=1.0001)
array([[0, 0], [0, 1], [1, 2], [2, 1]])
>>> find_matches(arr0, arr1, tol=0.9999)
array([[1, 2]])
>>> find_matches(arr0, arr1, tol=0.60001)
array([], dtype=int64)
>>> find_matches(arr0, arr1, tol=0.60001, dist='max')
array([[1, 2]])
>>> find_matches([[0, 0.05]], [[0, 5], [0, -0.01]], tol=0.1, dist='cos')
array([[0, 0]])
>>> manhatten_dist = lambda arr, row: np.sum(np.abs(arr - row), axis=1)
>>> matches = find_matches(arr0, arr1, tol=1.0001, dist=manhatten_dist)
>>> matches
array([[0, 0], [0, 1], [2, 1]])
>>> indices0, indices1 = matches.T
>>> arr0[indices0]
array([[25, 24], [25, 24], [25, 26]])

More examples can be found in the test cases.

How fast is it?

Here is an unscientific example:

from timeit import default_timer as timer
import numpy as np
from close_numerical_matches import naive_find_matches, find_matches

arr0 = np.random.rand(320_000, 2)
arr1 = np.random.rand(44_000, 2)

start = timer()
naive_find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start)  # 255.335 s

start = timer()
find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start)  # 5.821 s

How it works

Instead of comparing every element in the first array against every element in the second array, resulting in an O(nmd) runtime, all elements are at first assigned to buckets so only elements that are relatively close are compared. In the case of relatively few matches and a low dimensionality d, this cuts the runtime down to almost linear O((n + m)d).

In general, the algorithm runtime of the bucket approach is O((n + m)d + Bd³ + ∑_{b ∈ B} n_b m_b) where B is the number of buckets and n_b and m_b are the number of items assigned to bucket b. As can be seen, it scales bad with dimensionality and also does not improve from the naive approach if all elements are assigned to the same bucket. In case the bucket approach is likely to be slower than the naive approach, this library will fall back to the naive approach.

When NOT to use this library

  • If you are working with arrays with very high dimensionalities, the algorithm employed here does not scale well. As mentioned above, the naive algorithm will be used in such cases. See if another library exists for your particular problem.
  • If you are expecting that a lot of pairs will match, this is not suitable. This algorithm is targeted for the case where there are extremely many data points and only a fraction of those are expected to match.
  • If you need to use a distance function that does not map well to being assigned to buckets.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

close_numerical_matches-0.2.1-py3-none-any.whl (9.4 kB view details)

Uploaded Python 3

close_numerical_matches-0.2.1-cp312-cp312-macosx_11_0_arm64.whl (76.3 kB view details)

Uploaded CPython 3.12 macOS 11.0+ ARM64

close_numerical_matches-0.2.1-cp312-cp312-macosx_10_9_x86_64.whl (79.5 kB view details)

Uploaded CPython 3.12 macOS 10.9+ x86-64

close_numerical_matches-0.2.1-cp312-cp312-macosx_10_9_universal2.whl (146.2 kB view details)

Uploaded CPython 3.12 macOS 10.9+ universal2 (ARM64, x86-64)

close_numerical_matches-0.2.1-cp311-cp311-macosx_11_0_arm64.whl (76.3 kB view details)

Uploaded CPython 3.11 macOS 11.0+ ARM64

close_numerical_matches-0.2.1-cp311-cp311-macosx_10_9_x86_64.whl (78.7 kB view details)

Uploaded CPython 3.11 macOS 10.9+ x86-64

close_numerical_matches-0.2.1-cp311-cp311-macosx_10_9_universal2.whl (145.3 kB view details)

Uploaded CPython 3.11 macOS 10.9+ universal2 (ARM64, x86-64)

close_numerical_matches-0.2.1-cp310-cp310-macosx_11_0_arm64.whl (77.7 kB view details)

Uploaded CPython 3.10 macOS 11.0+ ARM64

close_numerical_matches-0.2.1-cp310-cp310-macosx_10_9_x86_64.whl (80.2 kB view details)

Uploaded CPython 3.10 macOS 10.9+ x86-64

close_numerical_matches-0.2.1-cp310-cp310-macosx_10_9_universal2.whl (148.2 kB view details)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64)

close_numerical_matches-0.2.1-cp39-cp39-macosx_11_0_arm64.whl (77.7 kB view details)

Uploaded CPython 3.9 macOS 11.0+ ARM64

close_numerical_matches-0.2.1-cp39-cp39-macosx_10_9_x86_64.whl (80.1 kB view details)

Uploaded CPython 3.9 macOS 10.9+ x86-64

close_numerical_matches-0.2.1-cp39-cp39-macosx_10_9_universal2.whl (148.1 kB view details)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64)

close_numerical_matches-0.2.1-cp38-cp38-macosx_11_0_arm64.whl (76.8 kB view details)

Uploaded CPython 3.8 macOS 11.0+ ARM64

close_numerical_matches-0.2.1-cp38-cp38-macosx_10_9_x86_64.whl (79.3 kB view details)

Uploaded CPython 3.8 macOS 10.9+ x86-64

close_numerical_matches-0.2.1-cp38-cp38-macosx_10_9_universal2.whl (146.4 kB view details)

Uploaded CPython 3.8 macOS 10.9+ universal2 (ARM64, x86-64)

File details

Details for the file close_numerical_matches-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fcf611762c934e64c1b7458433df9da57ac070adf7401c9e032fd49b23a56509
MD5 724416558c26ea0494be7024ca81c340
BLAKE2b-256 3da556571a4a52f81c8d48e75c56a3a42f54badca7e96f13a6c79770daf223f8

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0cdd609ef6265bf80bfa0deff5e07c252981206347a02b59f8e890c53a4f12ca
MD5 e22c34f2568aa4d251dbceed245f4059
BLAKE2b-256 a7136bca5d3fa0ac17c64db9aa456f8c8ddbd8bd72a137a07209c0f5dd9f6efe

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp312-cp312-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 2d7c9878b803cf3c80f2d670c57af1b340e6ce0425f4397783d1183a3465b07d
MD5 9f38d13e75ea640d7ff9e2d4dbf52d8b
BLAKE2b-256 0ca7cb83e5712bf8a070dd83e8e6a811c8a60e0df1abd6b058dd5dd0ffd5a122

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp312-cp312-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp312-cp312-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 6f579d1f4e469b25ab76d976e70b27fc40b655270b33dc06995f2708df0e84d3
MD5 d47e5ba59c8fac73b60dcf03130b655c
BLAKE2b-256 c4ac9808526377da663a6917bca4d8309630b37446abe0a5f29e359a202d625c

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fd529429cec0a39ce3a86af7bb65fb0f55814b5028f8d2f79eca3a925e2c6cce
MD5 7a7aac7841b57db18791bebc6f4535b5
BLAKE2b-256 9554ec16ed808a1bb9ecceb65b2957c31bbfd08b0eeb79fbdd41bb274cd50238

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp311-cp311-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 839ff0c8c92a6f9cb341ec7099b5fff5f64463460a725a2c0eba11fd1b9d137f
MD5 3ad0fd7fbd5de0497bcf800171bc461d
BLAKE2b-256 50cd013e0e9a6d4f3ce7e51678422eccf5d8fc1721091beb2cf848be38597343

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 83e839436757e834e9a110092183b8855ecdcce08600796e1064ee4c7a55c871
MD5 6ecc85cb9ba93acfb52e69543721bd53
BLAKE2b-256 178f14ebcf9ebdda9c6b174d99a4462bc81bf5986f68d311832643cfca20a6a2

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 da1a0148daab222afc698ef21c1c413109fc1ceb1a4d9f814c3ffb3296a2e37a
MD5 84bdd6a6ff5dc9e7b6bf92123082a418
BLAKE2b-256 d0871501a5080e9c73ed19efcdae8507b1f947834bc777a124bdc70e0b2762e2

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp310-cp310-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 73fa5dee867a55557e5f76aa2c883f76819b34f0212f4def2eda8179b9ad6b36
MD5 2c591bdd1a5bf128d3b484a90e78052b
BLAKE2b-256 d6674120808abf7466ae08633ce79910e8a8149398f177c2e515aecb250e75af

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 909e6115b4d13c72de95067e57f7524a0bee3978a33b8f0b7b3122382954cd73
MD5 b913d785cc7ba0c585e88a03833343d6
BLAKE2b-256 f5e07623a1b7072652f1de94aede290149c3b02d2f3728fa7880d8e706246325

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp39-cp39-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 dad77bb35987abe328475cf91050704a5de97c32247e249841bc90840780d7da
MD5 03139b0fe7525a2f1e8f0d85c81c6bd1
BLAKE2b-256 8677baac6b13ca23581a46f701f560ba4fe7a60365c54c2979813cdb3a2723d2

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp39-cp39-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 f6a6224c795e399245a27eb3700758190b8f60fe4c09dc01093843d3a3c9f780
MD5 672c5bfe93766610fa6e35f572d20ebe
BLAKE2b-256 8a8144b447e7a518ed486fef2ee865557f560495938ca84283699e0cb4f3f77c

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp39-cp39-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp39-cp39-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 19992ba1d151e1bfabaa426a105c0e9b0304c47550fdc4c206677288b69c573e
MD5 3492f929d49fc5346ec77ba74c49f350
BLAKE2b-256 a252659eae419ad325d1743749c93789d6d89b3b1d92d33bfc2948238261affa

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp38-cp38-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 de8f3c0d5f4fc4abbefb61dcbe0f72dc7eefabfc49640439efb56cdf5b170670
MD5 789e75da6b4aea30af414673a1b30417
BLAKE2b-256 14458bd66125a4269192f1278cccf753fd87fadff105cd52ae22c2890da3bacb

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 d911bea358a33c5c1ab4e98ad47c9b4267368da7e2aba221b2e95d8a7b5f3275
MD5 94a2ff0df3e498a6774a19833397508d
BLAKE2b-256 e622f4468f89075a21aa9641e6159b3ea5a3f980b1424388704fa20337a6e0b8

See more details on using hashes here.

File details

Details for the file close_numerical_matches-0.2.1-cp38-cp38-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for close_numerical_matches-0.2.1-cp38-cp38-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 d754a32e7bafcd3f9b6eb2a760fec485155e5e6422b37b7e3e0899590899d6fb
MD5 9d8933f23ee6f15b378727840ab4963e
BLAKE2b-256 bb58bad011b18131b2e9edaa95740f2354336894611ea5e099a87a42f58504ce

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page