
Distances and divergences between distributions implemented in Python.

Project description


Distances and divergences between discrete distributions described as dictionaries, implemented in Python.

These functions are meant as fast ways to compute distances and divergences between discrete distributions, especially when the two distributions contain a significant number of events with nil probability, which are simply omitted from the dictionaries.
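To illustrate why omitting zero-probability events helps, here is a minimal sketch of a Euclidean distance computed directly on sparse dictionaries. The function name sparse_euclidean is hypothetical and this is not necessarily how dictances implements it; it only assumes that a key missing from a dictionary stands for the value 0.

```python
from math import sqrt


def sparse_euclidean(a: dict, b: dict) -> float:
    """Euclidean distance between two dicts; a missing key counts as 0."""
    total = 0.0
    # Keys present in a: b contributes 0 when the key is missing there.
    for key, value in a.items():
        total += (value - b.get(key, 0.0)) ** 2
    # Keys present only in b: a contributes 0 for them.
    for key, value in b.items():
        if key not in a:
            total += value ** 2
    return sqrt(total)


print(sparse_euclidean({"x": 3}, {"y": 4}))
```

Only the keys actually stored are visited, so the cost depends on the number of described events rather than on the size of the full event space.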

How do I install this package?

As usual, just download it using pip:

pip install dictances

Available metrics

A number of distances and divergences are available:

Distances                            Methods

Bhattacharyya distance               bhattacharyya
Bhattacharyya coefficient            bhattacharyya_coefficient
Canberra distance                    canberra
Chebyshev distance                   chebyshev
Chi Square distance                  chi_square
Cosine distance                      cosine
Euclidean distance                   euclidean
Hamming distance                     hamming
Jensen-Shannon divergence            jensen_shannon
Kullback-Leibler divergence          kullback_leibler
Mean absolute error                  mae
Taxicab geometry                     manhattan, cityblock, total_variation
Minkowski distance                   minkowsky
Mean squared error                   mse
Pearson's distance                   pearson
Squared deviations from the mean     squared_variation

Usage example with points

Suppose you have a point described by my_first_dictionary and another one described by my_second_dictionary:

from dictances import cosine

my_first_dictionary = {
    "a": 56,
    "b": 34,
    "c": 89
}

my_second_dictionary = {
    "a": 21,
    "d": 51,
    "e": 74
}

cosine(my_first_dictionary, my_second_dictionary)
#>>> 0.8847005261889619
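As a sanity check on the value above, the cosine distance can be recomputed by hand as one minus the cosine similarity, treating missing keys as zero. This is an illustrative sketch with a hypothetical helper name (cosine_distance), not the library's own code.

```python
from math import sqrt


def cosine_distance(a: dict, b: dict) -> float:
    """1 - cosine similarity, with missing keys treated as 0."""
    # Only keys shared by both dictionaries contribute to the dot product.
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sqrt(sum(value ** 2 for value in a.values()))
    norm_b = sqrt(sum(value ** 2 for value in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


first = {"a": 56, "b": 34, "c": 89}
second = {"a": 21, "d": 51, "e": 74}

print(cosine_distance(first, second))
```

Here only the key "a" is shared, so the dot product is 56 * 21 = 1176, and the result agrees with the documented output up to floating-point rounding.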

Usage example with distributions

Suppose you have two discrete distributions, one described by the dictionary a and the other by the dictionary b:

from dictances import bhattacharyya, bhattacharyya_coefficient

a = {
    "event_1": 0.4,
    "event_2": 0.1,
    "event_3": 0.2,
    "event_4": 0.3,
}
b = {
    "event_1": 0.1,
    "event_2": 0.2,
    "event_5": 0.2,
    "event_9": 0.5,
}

bhattacharyya_coefficient(a, b)
#>>> 0.3414213562373095
bhattacharyya(a, b)
#>>> 1.07463791569453
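The two values above follow from the standard definitions: the Bhattacharyya coefficient is the sum of sqrt(p * q) over events, and the Bhattacharyya distance is its negative natural logarithm. A minimal sketch, using hypothetical helper names (bc, bd) and assuming that events missing from either dictionary have probability 0 and so contribute nothing:

```python
from math import log, sqrt


def bc(p: dict, q: dict) -> float:
    """Bhattacharyya coefficient over the events both dicts describe."""
    return sum(sqrt(prob * q[event]) for event, prob in p.items() if event in q)


def bd(p: dict, q: dict) -> float:
    """Bhattacharyya distance: negative log of the coefficient."""
    return -log(bc(p, q))


a = {"event_1": 0.4, "event_2": 0.1, "event_3": 0.2, "event_4": 0.3}
b = {"event_1": 0.1, "event_2": 0.2, "event_5": 0.2, "event_9": 0.5}

print(bc(a, b))
print(bd(a, b))
```

Only event_1 and event_2 appear in both distributions, so the coefficient is sqrt(0.4 * 0.1) + sqrt(0.1 * 0.2), which is approximately 0.3414.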

Handling nested dictionaries

If you need to compute the distance between two nested dictionaries you can use deflate_dict as follows:

from dictances import cosine
from deflate_dict import deflate

my_first_dictionary = {
    "a": 8,
    "b": {
        "c": 3,
        "d": 6
    }
}

my_second_dictionary = {
    "b": {
        "c": 8,
        "d": 1
    },
    "y": 3
}

cosine(deflate(my_first_dictionary), deflate(my_second_dictionary))
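To make the idea of flattening concrete, the sketch below shows roughly what such a step does: nested keys are joined into a single flat key so the two dictionaries can be compared key by key. The flatten helper and the "_" separator are assumptions made for illustration; the exact key format produced by deflate_dict may differ.

```python
def flatten(nested: dict, prefix: str = "", sep: str = "_") -> dict:
    """Hypothetical helper: collapse nested dicts into one flat dict."""
    flat = {}
    for key, value in nested.items():
        # Join the parent key and the child key with the separator.
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


first = {"a": 8, "b": {"c": 3, "d": 6}}
second = {"b": {"c": 8, "d": 1}, "y": 3}

print(flatten(first))
print(flatten(second))
```

After flattening, both dictionaries share the keys derived from "b", which is what lets a metric such as cosine compare the nested values.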

