Similarity Algorithm (Data Mining) implementation in Python
Similarity Py [![Build Status](https://travis-ci.org/cenkbircanoglu/similarityPy.svg?branch=master)](https://travis-ci.org/cenkbircanoglu/similarityPy) [![Coverage Status](https://coveralls.io/repos/cenkbircanoglu/similarityPy/badge.svg?branch=master)](https://coveralls.io/r/cenkbircanoglu/similarityPy?branch=master)
===================
### Distance Algorithms
#### Numerical Data
##### <em>Norm</em>
**Data**: [{x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/norm.gif)
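
A minimal sketch of the norm, assuming the formula above is the usual Euclidean (L2) norm. The function name `norm` is illustrative and is not taken from this package's API.

```python
import math


def norm(vector):
    """Euclidean (L2) norm: square root of the sum of squared components."""
    return math.sqrt(sum(component ** 2 for component in vector))


print(norm([3, 4, 12]))  # 13.0
```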
##### <em>Manhattan Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/manhattan_distance.gif)
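
As an illustration, the Manhattan distance can be sketched as the sum of absolute coordinate differences. The name `manhattan_distance` is an assumption made for this example, not necessarily the package's own function.

```python
def manhattan_distance(u, v):
    """Sum of absolute coordinate differences between u and v."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    return sum(abs(a - x) for a, x in zip(u, v))


print(manhattan_distance([1, 2, 3], [4, 0, 3]))  # 5
```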
##### <em>Euclidean Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/euclidean_distance.gif)
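
A small sketch of the Euclidean distance in plain Python. The function name is hypothetical, and the squared variant described next is the same sum without the square root.

```python
import math


def euclidean_distance(u, v):
    """Square root of the sum of squared coordinate differences."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    return math.sqrt(sum((a - x) ** 2 for a, x in zip(u, v)))


print(euclidean_distance([0, 0, 0], [1, 2, 2]))  # 3.0
```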
##### <em>Squared Euclidean Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/squared_euclidean_distance.gif)
##### <em>Normalized Squared Euclidean Distance</em>
**Data**: [{a, b}, {x, y}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/normalized_squared_euclidean_distance.gif)
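
The sketch below assumes the common definition of this distance: half the squared Euclidean distance between the mean-centered vectors, divided by the sum of their squared norms. If the formula image differs, treat the code as illustrative only. The function name is hypothetical.

```python
def normalized_squared_euclidean_distance(u, v):
    """0.5 * |(u - mean(u)) - (v - mean(v))|^2 / (|u - mean(u)|^2 + |v - mean(v)|^2)."""
    if len(u) != len(v) or not u:
        raise ValueError("u and v must be non-empty and of equal length")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [x - mean_v for x in v]
    numerator = sum((a - x) ** 2 for a, x in zip(centered_u, centered_v))
    denominator = sum(a * a for a in centered_u) + sum(x * x for x in centered_v)
    if denominator == 0:
        # Both vectors are constant, so the ratio is undefined.
        raise ValueError("distance is undefined when both vectors are constant")
    return 0.5 * numerator / denominator


print(normalized_squared_euclidean_distance([1, 2], [2, 1]))  # 1.0
```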
##### <em>Chessboard Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/chessboard_distance.gif)
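
A minimal sketch of the chessboard (Chebyshev) distance, which is the largest absolute coordinate difference. The function name is an assumption for this example.

```python
def chessboard_distance(u, v):
    """Largest absolute coordinate difference between u and v."""
    if len(u) != len(v) or not u:
        raise ValueError("u and v must be non-empty and of equal length")
    return max(abs(a - x) for a, x in zip(u, v))


print(chessboard_distance([1, 2, 3], [4, 0, 3]))  # 3
```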
##### <em>Bray Curtis Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/bray_curtis_distance.gif)
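
Assuming the usual definition, the Bray Curtis distance divides the sum of absolute differences by the sum of absolute pairwise sums. The function name below is hypothetical.

```python
def bray_curtis_distance(u, v):
    """sum(|u_i - v_i|) / sum(|u_i + v_i|)."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    numerator = sum(abs(a - x) for a, x in zip(u, v))
    denominator = sum(abs(a + x) for a, x in zip(u, v))
    if denominator == 0:
        raise ValueError("distance is undefined when all pairwise sums are zero")
    return numerator / denominator


# For [1, 2, 3] and [4, 0, 3] the numerator is 5 and the denominator is 13.
print(bray_curtis_distance([1, 2, 3], [4, 0, 3]))
```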
##### <em>Canberra Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/canberra_distance.gif)
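
A sketch of the Canberra distance under its usual definition, where each coordinate contributes `|a - x| / (|a| + |x|)`. Skipping coordinates where both values are zero is a convention chosen for this example, and the function name is hypothetical.

```python
def canberra_distance(u, v):
    """Sum of |a - x| / (|a| + |x|) over coordinates, skipping 0/0 terms."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    total = 0.0
    for a, x in zip(u, v):
        denominator = abs(a) + abs(x)
        if denominator:  # a == x == 0 contributes nothing
            total += abs(a - x) / denominator
    return total


# Terms for [1, 2, 3] and [4, 0, 3]: 3/5, 2/2 and 0/6.
print(canberra_distance([1, 2, 3], [4, 0, 3]))
```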
##### <em>Cosine Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/cosine_distance.gif)
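
The cosine distance is commonly defined as one minus the cosine of the angle between the two vectors. The sketch below assumes that definition and uses an illustrative function name.

```python
import math


def cosine_distance(u, v):
    """1 - (u . v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    dot = sum(a * x for a, x in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("distance is undefined for a zero vector")
    return 1 - dot / (norm_u * norm_v)


print(cosine_distance([1, 0], [0, 1]))  # 1.0 (orthogonal vectors)
```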
##### <em>Correlation Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Formula**: ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/correlation_distance.gif)
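
Assuming the usual definition, the correlation distance is the cosine distance between the mean-centered vectors, so it equals one minus the Pearson correlation. The function name is hypothetical.

```python
import math


def correlation_distance(u, v):
    """1 - Pearson correlation of u and v."""
    if len(u) != len(v) or not u:
        raise ValueError("u and v must be non-empty and of equal length")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [x - mean_v for x in v]
    dot = sum(a * x for a, x in zip(centered_u, centered_v))
    norm_u = math.sqrt(sum(a * a for a in centered_u))
    norm_v = math.sqrt(sum(x * x for x in centered_v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("distance is undefined for a constant vector")
    return 1 - dot / (norm_u * norm_v)


# Perfectly anti-correlated vectors give a distance close to 2.
print(correlation_distance([1, 2, 3], [3, 2, 1]))
```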
#### Boolean Data
##### <em>Jaccard Dissimilarity</em>
**Data**: [{True,False,True}, {True,True,False}] <br/>
**Explanation**:[u,v] is equivalent to ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/jaccard_dissimilarity.png), where n<sub>ij</sub> is the number of corresponding pairs of elements in u and v respectively equal to i and j.
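
To make the n<sub>ij</sub> notation concrete, the sketch below counts the four kinds of pairs and then applies the standard Jaccard dissimilarity, (n<sub>10</sub>+n<sub>01</sub>)/(n<sub>11</sub>+n<sub>10</sub>+n<sub>01</sub>), which is assumed to match the formula image. The helper names `pair_counts` and `jaccard_dissimilarity` are hypothetical.

```python
from collections import Counter


def pair_counts(u, v):
    """Return (n11, n10, n01, n00) for two equal-length boolean sequences."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    counts = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return (counts[(True, True)], counts[(True, False)],
            counts[(False, True)], counts[(False, False)])


def jaccard_dissimilarity(u, v):
    """(n10 + n01) / (n11 + n10 + n01); undefined if no element is True."""
    n11, n10, n01, _ = pair_counts(u, v)
    return (n10 + n01) / (n11 + n10 + n01)


u = [True, False, True]
v = [True, True, False]
print(pair_counts(u, v))  # (1, 1, 1, 0)
print(jaccard_dissimilarity(u, v))  # 2 of the 3 pairs with a True disagree
```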
##### <em>Matching Dissimilarity</em>
**Data**: [{True,False,True}, {True,True,False}] <br/>
**Explanation**:[u,v] is equivalent to (n<sub>10</sub>+n<sub>01</sub>)/Length[u], where n<sub>ij</sub> is the number of corresponding pairs of elements in u and v respectively equal to i and j.
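
Since n<sub>10</sub>+n<sub>01</sub> is simply the number of positions where u and v disagree, the formula above can be sketched directly. The function name is an assumption for this example.

```python
def matching_dissimilarity(u, v):
    """(n10 + n01) / len(u): the fraction of positions where u and v differ."""
    if len(u) != len(v) or not u:
        raise ValueError("u and v must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(u, v) if bool(a) != bool(b))
    return mismatches / len(u)


# Two of the three positions differ.
print(matching_dissimilarity([True, False, True], [True, True, False]))
```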
##### <em>Dice Dissimilarity</em>
**Data**: [{True,False,True}, {True,True,False}] <br/>
**Explanation**:[u,v] is equivalent to ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/dice_dissimilarity.png), where n<sub>ij</sub> is the number of corresponding pairs of elements in u and v respectively equal to i and j.
##### <em>Rogers Tanimoto Dissimilarity</em>
**Data**: [{True,False,True}, {True,True,False}] <br/>
**Explanation**:[u,v] is equivalent to ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/rogers_tanimoto_dissimilarity.png), where n<sub>ij</sub> is the number of corresponding pairs of elements in u and v respectively equal to i and j.
##### <em>Russell Rao Dissimilarity</em>
**Data**: [{True,False,True}, {True,True,False}] <br/>
**Explanation**:[u,v] is equivalent to (n<sub>10</sub>+n<sub>01</sub>+n<sub>00</sub>)/Length[u], where n<sub>ij</sub> is the number of corresponding pairs of elements in u and v respectively equal to i and j.
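
Because the four counts add up to Length[u], the numerator n<sub>10</sub>+n<sub>01</sub>+n<sub>00</sub> equals Length[u] minus n<sub>11</sub>. A minimal sketch based on that observation, with a hypothetical function name:

```python
def russell_rao_dissimilarity(u, v):
    """(n10 + n01 + n00) / len(u), computed as (len(u) - n11) / len(u)."""
    if len(u) != len(v) or not u:
        raise ValueError("u and v must be non-empty and of equal length")
    n11 = sum(1 for a, b in zip(u, v) if a and b)
    return (len(u) - n11) / len(u)


# Only the first position is True in both sequences, so n11 = 1.
print(russell_rao_dissimilarity([True, False, True], [True, True, False]))
```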
##### <em>Sokal Sneath Dissimilarity</em>
**Data**: [{True,False,True}, {True,True,False}] <br/>
**Explanation**:[u,v] is equivalent to ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/sokal_sneath_dissimilarity.png), where n<sub>ij</sub> is the number of corresponding pairs of elements in u and v respectively equal to i and j.
##### <em>Yule Dissimilarity</em>
**Data**: [{True,False,True}, {True,True,False}] <br/>
**Explanation**:[u,v] is equivalent to ![alt tag](https://raw.githubusercontent.com/cenkbircanoglu/clustering/master/images/yule_dissimilarity.png), where n<sub>ij</sub> is the number of corresponding pairs of elements in u and v respectively equal to i and j.
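
The Dice, Rogers Tanimoto, Sokal Sneath, and Yule dissimilarities in this section are all functions of the same four counts. The sketch below uses their standard textbook definitions, which are assumed (not verified) to match the formula images. All function names are hypothetical, and inputs that make a denominator zero are not handled.

```python
from collections import Counter


def pair_counts(u, v):
    """Return (n11, n10, n01, n00) for two equal-length boolean sequences."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    counts = Counter((bool(a), bool(b)) for a, b in zip(u, v))
    return (counts[(True, True)], counts[(True, False)],
            counts[(False, True)], counts[(False, False)])


def dice_dissimilarity(u, v):
    n11, n10, n01, _ = pair_counts(u, v)
    return (n10 + n01) / (2 * n11 + n10 + n01)


def rogers_tanimoto_dissimilarity(u, v):
    n11, n10, n01, n00 = pair_counts(u, v)
    return 2 * (n10 + n01) / (n11 + 2 * (n10 + n01) + n00)


def sokal_sneath_dissimilarity(u, v):
    n11, n10, n01, _ = pair_counts(u, v)
    return 2 * (n10 + n01) / (n11 + 2 * (n10 + n01))


def yule_dissimilarity(u, v):
    n11, n10, n01, n00 = pair_counts(u, v)
    return 2 * n10 * n01 / (n11 * n00 + n10 * n01)


u = [True, False, True]
v = [True, True, False]
# Here n11 = 1, n10 = 1, n01 = 1, n00 = 0.
print(dice_dissimilarity(u, v))             # 0.5
print(rogers_tanimoto_dissimilarity(u, v))  # 0.8
print(sokal_sneath_dissimilarity(u, v))     # 0.8
print(yule_dissimilarity(u, v))             # 2.0
```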
#### String Data
##### <em>Hamming Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Explanation**:[u,v] gives the number of elements whose values disagree in u and v.
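
A minimal sketch of the Hamming distance as described above. The function name is illustrative, and requiring equal lengths is an assumption of this example.

```python
def hamming_distance(u, v):
    """Number of positions at which u and v hold different values."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    return sum(1 for a, b in zip(u, v) if a != b)


print(hamming_distance("karolin", "kathrin"))  # 3
```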
##### <em>Edit Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Explanation**:[u,v] gives the number of one-element deletions, insertions, and substitutions required to transform u to v.
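
The edit (Levenshtein) distance described above is usually computed with dynamic programming. The sketch below keeps only two rows of the table, and the function name is hypothetical rather than taken from the package.

```python
def edit_distance(u, v):
    """Minimum number of deletions, insertions and substitutions turning u into v."""
    # previous[j] is the distance between the processed prefix of u and v[:j].
    previous = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        current = [i]  # turning u[:i] into the empty sequence takes i deletions
        for j, b in enumerate(v, start=1):
            cost = 0 if a == b else 1
            current.append(min(
                previous[j] + 1,         # delete a
                current[j - 1] + 1,      # insert b
                previous[j - 1] + cost,  # substitute a with b (or keep it)
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```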
##### <em>Damerau Levenshtein Distance</em>
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Explanation**:[u,v] gives the number of one-element deletions, insertions, substitutions, and transpositions required to transform u to v.
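
The sketch below implements the restricted variant known as optimal string alignment, in which each adjacent pair may be transposed at most once and not edited further. It is a simplification of the unrestricted Damerau Levenshtein distance and is not necessarily what this package does. The function name is hypothetical.

```python
def damerau_levenshtein_osa(u, v):
    """Edit distance that also counts adjacent transpositions (restricted variant)."""
    rows, cols = len(u) + 1, len(v) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete everything in u[:i]
    for j in range(cols):
        d[0][j] = j  # insert everything in v[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if u[i - 1] == v[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: the last two elements are swapped.
            if i > 1 and j > 1 and u[i - 1] == v[j - 2] and u[i - 2] == v[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(damerau_levenshtein_osa("ab", "ba"))  # 1 (a single transposition)
```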
##### <em>Needleman Wunsch Similarity</em> (Not Implemented Yet)
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Explanation**:[u,v] finds an optimal global alignment between the elements of u and v, and returns the number of one-element matches.
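
Since this measure is not implemented in the package yet, the following is only a sketch of the classic Needleman Wunsch recurrence, not the project's code. The function name and the scoring parameters `match`, `mismatch`, and `gap` are assumptions. With `mismatch=0` and `gap=0` the score equals the number of matched elements, which corresponds to the description above.

```python
def needleman_wunsch_score(u, v, match=1, mismatch=-1, gap=-1):
    """Score of an optimal global alignment of u and v."""
    # previous[j] is the best score for aligning the processed prefix of u with v[:j].
    previous = [j * gap for j in range(len(v) + 1)]
    for i, a in enumerate(u, start=1):
        current = [i * gap]  # u[:i] aligned against gaps only
        for j, b in enumerate(v, start=1):
            diagonal = previous[j - 1] + (match if a == b else mismatch)
            current.append(max(
                diagonal,            # align a with b
                previous[j] + gap,   # align a with a gap
                current[j - 1] + gap,  # align b with a gap
            ))
        previous = current
    return previous[-1]


print(needleman_wunsch_score("abc", "abd"))                     # 1
print(needleman_wunsch_score("abc", "abd", mismatch=0, gap=0))  # 2
```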
##### <em>Smith Waterman Similarity</em> (Not Implemented Yet)
**Data**: [{a, b, c}, {x, y, z}] <br/>
**Explanation**:[u,v] finds an optimal local alignment between the elements of u and v, and returns the number of one-element matches.
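
As with the global alignment, this measure is not implemented in the package yet, so the code below is only a sketch of the classic Smith Waterman recurrence. The function name and the scoring parameters are assumptions. Scores are clamped at zero so that an alignment can start anywhere, and the best cell in the table is returned.

```python
def smith_waterman_score(u, v, match=1, mismatch=-1, gap=-1):
    """Score of the best local alignment between u and v."""
    best = 0
    previous = [0] * (len(v) + 1)
    for a in u:
        current = [0]
        for j, b in enumerate(v, start=1):
            diagonal = previous[j - 1] + (match if a == b else mismatch)
            score = max(
                0,                     # start a new local alignment here
                diagonal,              # align a with b
                previous[j] + gap,     # align a with a gap
                current[j - 1] + gap,  # align b with a gap
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


print(smith_waterman_score("xxabcyy", "zzabcww"))  # 3 (the shared run "abc")
```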
## Testing
Run all tests:
```bash
$ python -m unittest discover -s tests -p '*_test.py'
```
Run the tests with nose and code coverage:
```bash
$ nosetests --with-cov --cov-report html --cov apps tests/
```