Skip to main content

An implementation of KD trees

Project description

KD-tree-implementation

An implementation of kd-search trees with functions to find the nearest neighbor, an operation that would take a long time using linear search on large datasets. That is where kd-search trees come in, since they can exclude a larger part of the dataset at once.
This project was created as a final project for the course CS110/Computation: Solving Problems with algorithms.

Installation guide

Open the Command center and paste the following

pip install Katjas-kd-tree

Requirements

You need to have the numpy, math, and random modules installed. For the example.py code you need the json module in addition. If you are unsure, you can run

pip install numpy
pip install math
pip install random
pip install json

How to run

After installing the package import it by typing

import kd_tree as kd

You are now able to use the following functions

kd.build_tree(dict)
# this will build a kd-tree from a given dictionary of format key:[values]
kd.distance(lsta,lstb)
# returns the distance between two points a and b with coordinates given by lsta and lstb
kd.find_approx_nearest(tree,value)
# returns the approximate nearest neighbor for a given value
kd.find_exact_nearest(tree,value)
# returns the exact nearest element of the tree to the value

Example use case

To find the closest color in a dataset of named colors in the LAB color space. This color space works similar to RBG colors, but is design to let . we cannot use our usual quick-search methods or binary search-trees, since the data has more than 1 dimension and cannot simply be ordered. Therefore, we can create a tree with 3 dimensions, where every new level is split along a new dimension, iterating through all of them as often as needed. This allows us to very quickly get an approximation of the nearest neighbor and with slightly more effort find the exact nearest neighbor quicker than with a linear search.

# importing a dataset of paint colors and their position in the LAB colorspace
with open ("paintcolors.json") as json_file:
    paintcolors=json.load(json_file)
# creating a tree out of the paintcolors
painttree=kd.build_tree(paintcolors)
# finding the approximate and exact nearest color to [0,0,0]
print((kd.distance(kd.find_approx_nearest(painttree,[0,0,0]).value,[0,0,0]),
    kd.find_approx_nearest(painttree,[0,0,0]).name,
    kd.find_approx_nearest(painttree,[0,0,0]).value))
print(kd.find_exact_nearest(painttree,[0,0,0]))

This will return the approximate and exact nearest color to [0,0,0]
(0.23327147726897515, 'UniversalBlack', [0.233007, 0.010686, -0.0030215])
(0.22615200000001437, 'TwilightZone', [0.226152, 5.54817e-08, 5.84874e-08])

If you would like to run this code for yourself, please download the data from https://github.com/katjadellalibera/KD-tree-implementation/blob/master/paintcolors.json and the code from https://github.com/katjadellalibera/KD-tree-implementation/blob/master/example.py

Background

Complexity:
A linear search runs with O(n) complexity, since it has to check every value. find_approx_nearest runs with O(log(n)) complexity on average. The find_exact_nearest function will exclude less of the tree at a time, but still run in O(log(n)), just with a higher constant factor.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Katjas_kd_tree-0.0.2.tar.gz (3.7 kB view hashes)

Uploaded Source

Built Distribution

Katjas_kd_tree-0.0.2-py3-none-any.whl (4.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page