closest_pairs finds the closest pairs of points in a dataset
Closest Pairs
Find the closest pair in a dataset.
Getting Started
pip install closest_pairs
or install from source:
git clone https://github.com/justinshenk/closest-pairs
cd closest-pairs
pip install .
How to use
import closest_pairs
pairs, distances = closest_pairs.solve(X, n=1)
You can specify how many pairs you want to identify with n.
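To pin down what "the n closest pairs" means, here is a minimal brute-force sketch in plain Python. It is not the package's algorithm, and the function name `brute_force_closest_pairs` and its return format are made up for illustration. It checks every pair, so it is only practical on small inputs.

```python
import itertools
import math


def brute_force_closest_pairs(points, n=1):
    """Return the n closest pairs as (distance, i, j) tuples, nearest first.

    Hypothetical reference helper: compares every pair of points, so the
    cost grows quadratically with the number of points.
    """
    scored = [
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    ]
    scored.sort()  # smallest distance first
    return scored[:n]


points = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.5), (10.0, 10.0)]
closest = brute_force_closest_pairs(points, n=2)
for dist, i, j in closest:
    print(i, j, dist)
```

With n=2 the sketch reports points 1 and 2 first, then points 0 and 1.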
Example
import closest_pairs
import numpy as np
import matplotlib.pyplot as plt
# Create dataset
X = np.random.random((100,2))
pairs, distance = closest_pairs.solve(X, n=1)
# Plot points
z, y = np.split(X, 2, axis=1)
fig, ax = plt.subplots()
ax.scatter(z, y)
for i, txt in enumerate(X):
    if i in pairs:
        ax.annotate(i, (z[i], y[i]), color='red')
    else:
        ax.annotate(i, (z[i], y[i]))
Check pairs:
In [10]: pairs
Out[10]:
array([[[ 7],
        [16]],
       [[96],
        [50]]])
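The nested array shown above can be awkward to work with. The sketch below flattens it into plain index tuples. The helper `as_index_pairs` is hypothetical, and it assumes the output always has the same nesting as the example (one single-element row per point); other shapes are not handled.

```python
def as_index_pairs(pairs):
    """Turn nested output like [[[7], [16]], [[96], [50]]] into [(7, 16), (96, 50)].

    Accepts either a nested list or an object with .tolist(), such as a
    NumPy array. Assumes each pair holds two single-element rows.
    """
    nested = pairs.tolist() if hasattr(pairs, "tolist") else pairs
    return [(int(a[0]), int(b[0])) for a, b in nested]


# Values copied from the "Check pairs" output.
shown = [[[7], [16]], [[96], [50]]]
index_pairs = as_index_pairs(shown)
print(index_pairs)
```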
Output: [figure: scatter plot of the dataset, with the indices of the closest pair annotated in red]
Caveats
closest_pairs uses PCA to reduce your data to two dimensions for faster processing.
When n > 1, it also removes the first point of each pair it finds before searching for the next pair. In rare cases this leads to false negatives if the data is highly overlapping.
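The following sketch illustrates how removing a point after each hit can miss a pair. It is a simplified model of the caveat, not the package's code: `greedy_pairs` and `exact_pairs` are hypothetical helpers, both use brute force, and "first point" is assumed here to mean the lower index of the pair.

```python
import itertools
import math


def closest_pair(points, indices):
    """Brute-force closest pair among the given indices: (distance, i, j)."""
    return min(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(indices, 2)
    )


def greedy_pairs(points, n):
    """Model of the caveat: after each hit, drop the pair's first point."""
    remaining = list(range(len(points)))
    found = []
    for _ in range(n):
        if len(remaining) < 2:
            break
        _, i, j = closest_pair(points, remaining)
        found.append((i, j))
        remaining.remove(i)  # assumption: "first point" is the lower index
    return found


def exact_pairs(points, n):
    """The true n closest pairs, with no points removed."""
    ranked = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    return [(i, j) for _, i, j in ranked[:n]]


# Point 1 sits close to both point 0 and point 2.
points = [(0.0, 0.0), (1.0, 0.0), (1.9, 0.0), (10.0, 0.0)]
greedy = greedy_pairs(points, 2)
exact = exact_pairs(points, 2)
print("greedy:", greedy)
print("exact: ", exact)
```

Here the second-closest pair is (0, 1), but once point 1 has been removed along with the first hit, the greedy search can only report (0, 2). If every pair matters, a brute-force check on the original data is a reasonable cross-check for small datasets.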
Credit
Python code modified from Andriy Lazorenko, packaged by Justin Shenk.