Implementation of state-of-the-art distance metrics from research papers which can handle mixed-type data and missing values.

distython

Implementation of state-of-the-art distance metrics from research papers which can handle mixed-type data and missing values. At the moment, HEOM, HVDM and VDM are tested and working. VDM and HVDM were released recently, so please report any bugs you find. Contributions are welcome, as there are few existing implementations of heterogeneous distance metrics.

Installation

Clone the repository with git clone, then install the necessary packages with pipenv install.

Example - HEOM

# Example code of how the HEOM metric can be used together with Scikit-Learn
import numpy as np
from sklearn.neighbors import NearestNeighbors
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
from sklearn.datasets import load_boston
# Importing a custom metric class
from HEOM import HEOM

# Load the dataset from sklearn
boston = load_boston()
boston_data = boston["data"]
# Categorical variables in the data
categorical_ix = [3, 8]
# NearestNeighbors can't handle np.nan,
# so we use a placeholder value as the NaN equivalent
nan_eqv = 12345

# Introduce some missingness to the data for the purpose of the example:
# each cell is replaced with the NaN equivalent with probability 1/20
missing_mask = np.random.randint(20, size=boston_data.shape) == 10
boston_data[missing_mask] = nan_eqv

# Declare the HEOM with a correct NaN equivalent value
heom_metric = HEOM(boston_data, categorical_ix, nan_equivalents=[nan_eqv])

# Declare NearestNeighbors and link the metric
neighbor = NearestNeighbors(metric=heom_metric.heom)

# Fit the model which uses the custom distance metric
neighbor.fit(boston_data)

# Return 5-Nearest Neighbors to the 1st instance (row 1)
result = neighbor.kneighbors(boston_data[0].reshape(1, -1), n_neighbors=5)
print(result)
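
To make it clearer what a metric like HEOM computes for a single pair of rows, here is a minimal NumPy sketch written from the definition in the Wilson and Martinez paper. It is not distython's own code: the names heom_distance and column_ranges, their signatures, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def column_ranges(data, nan_eqv):
    """Per-column (max - min), ignoring the NaN-equivalent placeholder.

    Hypothetical helper for this sketch; not part of distython.
    """
    masked = np.where(data == nan_eqv, np.nan, data)
    rng = np.nanmax(masked, axis=0) - np.nanmin(masked, axis=0)
    rng[rng == 0] = 1.0  # avoid division by zero for constant columns
    return rng


def heom_distance(x, y, categorical_ix, col_range, nan_eqv):
    """HEOM distance between two 1-D rows (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.zeros(x.shape[0])

    # Numerical attributes: absolute difference scaled by the column range
    num_ix = np.setdiff1d(np.arange(x.shape[0]), categorical_ix)
    d[num_ix] = np.abs(x[num_ix] - y[num_ix]) / col_range[num_ix]

    # Categorical attributes: overlap metric (0 if equal, 1 otherwise)
    cat_ix = np.asarray(categorical_ix, dtype=int)
    d[cat_ix] = (x[cat_ix] != y[cat_ix]).astype(float)

    # Missing values: maximal per-attribute distance of 1
    missing = (x == nan_eqv) | (y == nan_eqv)
    d[missing] = 1.0

    return np.sqrt(np.sum(d ** 2))


# Toy data: column 1 is categorical, 12345 marks a missing value
nan_eqv = 12345
data = np.array([
    [1.0, 0.0, 10.0],
    [3.0, 1.0, nan_eqv],
    [5.0, 0.0, 30.0],
])
ranges = column_ranges(data, nan_eqv)
d01 = heom_distance(data[0], data[1], [1], ranges, nan_eqv)
d02 = heom_distance(data[0], data[2], [1], ranges, nan_eqv)
print(d01, d02)
```

For rows 0 and 1 the per-attribute distances are 0.5 (numerical), 1 (categories differ) and 1 (missing value), so the result is sqrt(2.25) = 1.5.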

Research Papers

The code is implemented based on the following literature: HEOM, VDM and HVDM ("Improved Heterogeneous Distance Functions", Wilson and Martinez): https://arxiv.org/pdf/cs/9701101.pdf
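
For reference, the metrics can be summarised as follows. This is a condensed restatement of the definitions in the paper linked above, not a description of how distython implements them. The symbols are assumptions of this summary: m is the number of attributes, max_a and min_a are the observed extremes of attribute a, N_{a,x} counts training instances with value x for attribute a, N_{a,x,c} counts those that also belong to class c, C is the number of classes, and q is a constant (usually 1 or 2).

```latex
\mathrm{HEOM}(x, y) = \sqrt{\sum_{a=1}^{m} d_a(x_a, y_a)^2}

d_a(x, y) =
\begin{cases}
1 & \text{if } x \text{ or } y \text{ is unknown} \\
\mathrm{overlap}(x, y) & \text{if } a \text{ is nominal} \\
\dfrac{|x - y|}{\max_a - \min_a} & \text{otherwise}
\end{cases}
\qquad
\mathrm{overlap}(x, y) =
\begin{cases}
0 & \text{if } x = y \\
1 & \text{otherwise}
\end{cases}

\mathrm{vdm}_a(x, y) = \sum_{c=1}^{C}
\left| \frac{N_{a,x,c}}{N_{a,x}} - \frac{N_{a,y,c}}{N_{a,y}} \right|^{q}
```

HVDM keeps the same square-root-of-squares form as HEOM, but uses a normalised VDM for nominal attributes and |x - y| / (4 sigma_a) for numerical ones, where sigma_a is the standard deviation of attribute a.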
