
A clean and easy interface for nearest-neighbors lookup


Simple Neighbors


Simple Neighbors is a clean and easy interface for performing nearest-neighbor lookups on items from a corpus. To install the package:

pip install simpleneighbors[annoy]

Here’s a quick example, showing how to find the names of colors most similar to ‘pink’ in the xkcd colors list:

>>> from simpleneighbors import SimpleNeighbors
>>> import json
>>> color_data = json.load(open('xkcd.json'))['colors']
>>> hex2int = lambda s: [int(s[n:n+2], 16) for n in range(1,7,2)]
>>> colors = [(item['color'], hex2int(item['hex'])) for item in color_data]
>>> sim = SimpleNeighbors(3)
>>> sim.feed(colors)
>>> sim.build()
>>> list(sim.neighbors('pink', 5))
['pink', 'bubblegum pink', 'pale magenta', 'dark mauve', 'light plum']
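The hex2int lambda in the example turns a hex color string into the three-number vector that the index stores. As a rough standalone sketch (hex_to_rgb is a hypothetical name, and "#ff8000" is an arbitrary value chosen for illustration rather than an entry from the xkcd list), the conversion works like this:

```python
def hex_to_rgb(hex_string):
    """Convert a '#rrggbb' string to a list of three integers.

    Same slicing as the hex2int lambda: skip the leading '#',
    then parse each two-character pair as a base-16 number.
    """
    return [int(hex_string[n:n + 2], 16) for n in range(1, 7, 2)]


# Arbitrary illustrative color, not taken from the xkcd data.
rgb = hex_to_rgb("#ff8000")
print(rgb)
```

Each color then becomes a point in a 3-dimensional space, which is why the example constructs the index with SimpleNeighbors(3).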

For a more complete example, refer to my Understanding Word Vectors notebook, which shows how to use Simple Neighbors to perform similarity lookups on word vectors.

Read the complete Simple Neighbors documentation here: https://simpleneighbors.readthedocs.org.

Why Simple Neighbors?

Approximate nearest-neighbor lookups are a quick way to find the items in your data set that are closest to (or most similar to) any other item in your data, or to an arbitrary point in the space that your data defines. Your data items might be colors in an (R, G, B) space, sprites in an (X, Y) space, or word vectors in a 300-dimensional space.

You could always perform pairwise distance calculations to find nearest neighbors in your data, but for data of any appreciable size and complexity, this kind of calculation is unbearably slow. Simple Neighbors uses one of a handful of libraries behind the scenes to provide approximate nearest-neighbor lookups, which are ultimately a little less accurate than pairwise calculations but much, much faster.
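For comparison, here is a minimal sketch of the exact approach described above: measure the distance from a query point to every item, sort, and keep the closest few. The function name and the toy palette are hypothetical, and this is not code from Simple Neighbors or any of its backends. It assumes Python 3.8 or later for math.dist.

```python
import math


def brute_force_neighbors(query, items, n):
    """Return the names of the n items closest to query.

    Exact but slow: every lookup computes one distance per item,
    so the cost grows with both corpus size and dimensionality.
    """
    ranked = sorted(items, key=lambda pair: math.dist(query, pair[1]))
    return [name for name, _ in ranked[:n]]


# Hypothetical toy corpus of (name, RGB vector) pairs.
palette = [
    ("red", (255, 0, 0)),
    ("green", (0, 255, 0)),
    ("blue", (0, 0, 255)),
    ("orange", (255, 128, 0)),
]

nearest = brute_force_neighbors((250, 10, 10), palette, 2)
print(nearest)
```

With four items this is instant, but the work per query scales linearly with the corpus, which is the cost that an approximate index trades a little accuracy to avoid.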

The library also keeps track of your data, sparing you the extra step of mapping each item in your data to its integer index (at the potential cost of some redundancy in data storage, depending on your application).
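To make that bookkeeping concrete, here is a rough sketch of the item-to-index mapping you would otherwise maintain by hand when an index library only deals in integer ids. ItemIndex is a hypothetical helper written for illustration; it does not show how Simple Neighbors stores its data internally.

```python
class ItemIndex:
    """Hypothetical helper that pairs each item with an integer id."""

    def __init__(self):
        self.items = []  # position i holds the item whose id is i
        self.ids = {}    # item -> id, for lookups in the other direction

    def add(self, item):
        """Store an item and return the integer id assigned to it."""
        idx = len(self.items)
        self.items.append(item)
        self.ids[item] = idx
        return idx


index = ItemIndex()
for name in ["pink", "mauve", "plum"]:
    index.add(name)

mauve_id = index.ids["mauve"]       # id to hand to an integer-keyed index
round_trip = index.items[mauve_id]  # item recovered from a returned id
print(mauve_id, round_trip)
```

Keeping both the list and the dictionary is the redundancy mentioned above: the items are stored once for id-to-item lookups and again as dictionary keys.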

I made Simple Neighbors because I use nearest neighbor lookups all the time and found myself writing and rewriting the same bits of wrapper code over and over again. I wanted to hide a little bit of the complexity of using these libraries to make it easier to build small prototypes and teach workshops using nearest-neighbor lookups.

Multiple backend support

Simple Neighbors relies on the approximate nearest neighbor index implementations found in other libraries. By default, Simple Neighbors will choose the best backend based on the packages installed in your environment. (You can also specify which backend to use by hand, or create your own.)

Currently supported backend libraries include:

  • Annoy
  • scikit-learn
  • A pure Python brute-force search

When you install Simple Neighbors, you can direct pip to install the required packages for a given backend. For example, to install Simple Neighbors with Annoy:

pip install simpleneighbors[annoy]

Annoy is highly recommended! This is the preferred way to use Simple Neighbors.

To install Simple Neighbors alongside scikit-learn to use the Sklearn backend (which makes use of scikit-learn’s NearestNeighbors class):

pip install simpleneighbors[sklearn]

If you can’t install Annoy or scikit-learn on your platform, you can also use a pure Python backend:

pip install simpleneighbors[purepython]

Note that the pure Python version uses a brute force search and is therefore very slow. In general, it’s not suitable for datasets with more than a few thousand items (or more than a handful of dimensions).

See the documentation for the SimpleNeighbors class for more information on specifying backends.

History

0.1.0 (2020-01-12)

  • Support for multiple backends. This was implemented primarily to ease installation for users who can’t install Annoy (because of a lack of binary packaging for their platforms).

0.0.1 (2018-07-13)

  • Initial release.
