
Distances and divergences between distributions implemented in Python.

## Project description

Distances and divergences between dictionaries implemented in Python 3.6.

In the complexity notations, n is len(a) and m is len(b).

The timing samples in the metrics table are dictionaries generated by the package's test utilities.
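To make n and m concrete, here is a minimal pure-Python sketch of how a distance between two dictionaries can run in O(n + m): walk the keys of `a` once, then only the keys of `b` that `a` lacks. It assumes a key missing from one dictionary counts as 0, and `minkowski_sketch` is a hypothetical helper written for illustration, not part of the dictances API.

```python
def minkowski_sketch(a: dict, b: dict, p: float = 2.0) -> float:
    """Minkowski distance between two sparse vectors stored as dicts.

    Hypothetical helper. Assumption: a key missing from one dict has value 0.
    Runs in O(n + m) with n = len(a) and m = len(b), since each key is
    visited once and dict lookups are O(1) on average.
    """
    total = 0.0
    # Keys present in a (and possibly also in b): n iterations.
    for key, value in a.items():
        total += abs(value - b.get(key, 0.0)) ** p
    # Keys present only in b: at most m iterations.
    for key, value in b.items():
        if key not in a:
            total += abs(value) ** p
    return total ** (1.0 / p)


a = {"x": 1.0, "y": 2.0}
b = {"y": 4.0, "z": 2.0}
print(minkowski_sketch(a, b, p=1))  # Manhattan: |1| + |2 - 4| + |2| = 5.0
print(minkowski_sketch(a, b, p=2))  # Euclidean: sqrt(1 + 4 + 4) = 3.0
```

With `p=1` and `p=2` the same traversal yields the Manhattan and Euclidean distances, which is why those rows share the same shape of cost.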

## How do I get it?

Just type into your terminal:

```bash
pip install dictances
```

## Basic example

For each metric, an example is available in the `examples` folder. Here’s a basic example for those too lazy to click links (like me).

```python
import random
from dictances import cosine, euclidean, canberra

random.seed(42)  # for reproducibility

# Simple function to generate the example dictionaries
def generate_example_dict(n=1000):
    return {random.randint(0, 1000): random.uniform(0, 1000) for _ in range(n)}

a, b = generate_example_dict(), generate_example_dict()

print(cosine(a, b))
# >>> 0.52336690346601

print(euclidean(a, b))
# >>> 15119.400349404095

print(canberra(a, b))
# >>> 624.9088876554047
```
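The cosine distance printed above compares the two dictionaries as sparse vectors. The following is a minimal sketch of the usual definition, 1 minus the cosine similarity, assuming missing keys count as 0. `cosine_distance_sketch` is a hypothetical name, and the library's exact conventions may differ.

```python
import math


def cosine_distance_sketch(a: dict, b: dict) -> float:
    """1 - cos(angle between a and b), treating absent keys as 0 (assumption)."""
    # Only keys present in both dicts contribute to the dot product,
    # so iterate over the smaller dict: O(min(n, m)) for this step.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(value * large[key] for key, value in small.items() if key in large)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance_sketch({"x": 1.0, "y": 0.0}, {"x": 1.0}))  # same direction: 0.0
print(cosine_distance_sketch({"x": 1.0}, {"y": 5.0}))            # orthogonal: 1.0
```

Vectors pointing the same way are at distance 0 regardless of length, and vectors with no shared non-zero keys are at distance 1.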

## Metrics table

| Metric name | Usage example | Average time on sample | Complexity |
|---|---|---|---|
| Euclidean distance | `euclidean` | 53.7 µs ± 981 ns | |
| Squared variation | `squared_variation` | 50.6 µs ± 982 ns | |
| Total variation | `total_variation` | 50.6 µs ± 2.75 µs | |
| Nth variation | `nth_variation` | 50.6 µs ± 1.07 µs | |
| Manhattan distance | `manhattan` | 51 µs ± 1.2 µs | |
| Mean absolute error | `mae` | 51.9 µs ± 2.4 µs | |
| Mean squared error | `mse` | 51.8 µs ± 1.67 µs | |
| Chebyshev distance | `chebyshev` | 49.8 µs ± 628 ns | |
| Minkowski distance | `minkowsky` | 50.4 µs ± 2.94 µs | |
| Canberra distance | `canberra` | 30.9 µs ± 340 ns | |
| Cosine distance | `cosine` | 41.7 µs ± 1.42 µs | |
| Pearson distance | `pearson` | 49.8 µs ± 2.51 µs | |
| Hamming distance | `hamming` | 5.57 µs ± 191 ns | |
| Normalized total variation | `normal_total_variation` | 15.2 µs ± 573 ns | |
| Kullback-Leibler divergence | `kullback_leibler` | 31.6 µs ± 591 ns | |
| Jensen-Shannon divergence | `jensen_shannon` | 21.1 µs ± 442 ns | |
| Bhattacharyya distance | `bhattacharyya` | 13.7 µs ± 71.9 ns | |
| Hellinger distance | `hellinger` | 16.8 µs ± 372 ns | |
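The divergences near the end of the table (Kullback-Leibler, Jensen-Shannon, Bhattacharyya, Hellinger) compare dictionaries as discrete probability distributions. The sketch below uses common textbook definitions under stated assumptions: both inputs are normalized to sum to 1, logarithms are natural (results in nats), and Hellinger is taken as sqrt(1 - BC), where BC is the Bhattacharyya coefficient. All function names are hypothetical, and dictances may use different conventions for log base, scaling, or zero probabilities.

```python
import math


def normalize(d: dict) -> dict:
    """Scale non-negative values so they sum to 1 (hypothetical helper)."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}


def kl_sketch(p: dict, q: dict) -> float:
    """KL(p || q) in nats; infinite if q is 0 where p is positive."""
    result = 0.0
    for key, pk in p.items():
        if pk == 0:
            continue  # 0 * log(0 / q) is taken as 0
        qk = q.get(key, 0.0)
        if qk == 0:
            return math.inf
        result += pk * math.log(pk / qk)
    return result


def js_sketch(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    keys = p.keys() | q.keys()
    m = {k: (p.get(k, 0.0) + q.get(k, 0.0)) / 2 for k in keys}
    return 0.5 * kl_sketch(p, m) + 0.5 * kl_sketch(q, m)


def bhattacharyya_coefficient(p: dict, q: dict) -> float:
    # Only keys present in both distributions contribute to the overlap.
    return sum(math.sqrt(pk * q[k]) for k, pk in p.items() if k in q)


def bhattacharyya_sketch(p: dict, q: dict) -> float:
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0 else -math.log(bc)


def hellinger_sketch(p: dict, q: dict) -> float:
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))


p = normalize({"a": 1, "b": 1})
q = normalize({"a": 1, "c": 1})
print(kl_sketch(p, q))             # inf: q has no mass on "b"
print(js_sketch(p, q))             # 0.5 * ln 2 ≈ 0.3466, finite
print(bhattacharyya_sketch(p, q))  # ln 2 ≈ 0.6931
print(hellinger_sketch(p, q))      # sqrt(0.5) ≈ 0.7071
```

The example shows why Jensen-Shannon is often preferred when supports differ: KL blows up as soon as `q` misses a key that `p` uses, while JS stays finite because both inputs are compared against their average.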

## Test computer specifications

The computer on which the metrics were timed had the following specifications:

| Computer specifications | |
|---|---|
| Model Name | MacBook Pro |
| Processor Name | Intel Core i7 |
| Processor Speed | 2.3 GHz |
| Number of Processors | 1 |
| Total Number of Cores | 4 |
| L2 Cache (per Core) | 256 KB |
| L3 Cache | 6 MB |
| Memory | 16 GB |

## Download files


### Source Distribution

dictances-1.3.0.tar.gz (8.6 kB)