Python implementation of Gower's distance, computed pairwise between records in two data sets
Introduction
Gower's distance calculation in Python. Gower's distance is a measure of the distance between two entities whose attributes are a mix of categorical and numerical values. Reference: Gower (1971), A general coefficient of similarity and some of its properties. Biometrics 27, 857–874.
More details and examples can be found on my personal website: https://www.thinkdatascience.com/post/2019-12-16-introducing-python-package-gower/
Core functions were written by Marcelo Beckmann.
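For orientation, the distance can be written out as a formula. This is a sketch of the usual unweighted form, assuming equal feature weights and no missing values; the symbols p (number of features) and R_k (range of feature k, i.e. its maximum minus its minimum over the data) are notation introduced here, not names used by the package.

```latex
d(x, y) = \frac{1}{p} \sum_{k=1}^{p} d_k(x, y),
\qquad
d_k(x, y) =
\begin{cases}
\dfrac{|x_k - y_k|}{R_k} & \text{if feature } k \text{ is numerical},\\[1ex]
0 & \text{if feature } k \text{ is categorical and } x_k = y_k,\\
1 & \text{if feature } k \text{ is categorical and } x_k \neq y_k.
\end{cases}
```

Each per-feature term lies between 0 and 1, so the overall distance does too.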
Examples
Installation
pip install gower
Generate some data
import numpy as np
import pandas as pd
import gower
# Mixed numerical and categorical columns; the last row is entirely missing
Xd = pd.DataFrame({
    'age': [21, 21, 19, 30, 21, 21, 19, 30, None],
    'gender': ['M', 'M', 'N', 'M', 'F', 'F', 'F', 'F', None],
    'civil_status': ['MARRIED', 'SINGLE', 'SINGLE', 'SINGLE', 'MARRIED', 'SINGLE', 'WIDOW', 'DIVORCED', None],
    'salary': [3000.0, 1200.0, 32000.0, 1800.0, 2900.0, 1100.0, 10000.0, 1500.0, None],
    'has_children': [1, 0, 1, 1, 1, 0, 0, 1, None],
    'available_credit': [2200, 100, 22000, 1100, 2000, 100, 6000, 2200, None]})
# Second data set: rows 1 and 2 of Xd
Yd = Xd.iloc[1:3, :]
X = np.asarray(Xd)
Y = np.asarray(Yd)
Find the distance matrix
gower.gower_matrix(X)
array([[0. , 0.3590238 , 0.6707398 , 0.31787416, 0.16872811,
0.52622986, 0.59697855, 0.47778758, nan],
[0.3590238 , 0. , 0.6964303 , 0.3138769 , 0.523629 ,
0.16720603, 0.45600235, 0.6539635 , nan],
[0.6707398 , 0.6964303 , 0. , 0.6552807 , 0.6728013 ,
0.6969697 , 0.740428 , 0.8151941 , nan],
[0.31787416, 0.3138769 , 0.6552807 , 0. , 0.4824794 ,
0.48108295, 0.74818605, 0.34332284, nan],
[0.16872811, 0.523629 , 0.6728013 , 0.4824794 , 0. ,
0.35750175, 0.43237334, 0.3121036 , nan],
[0.52622986, 0.16720603, 0.6969697 , 0.48108295, 0.35750175,
0. , 0.2898751 , 0.4878362 , nan],
[0.59697855, 0.45600235, 0.740428 , 0.74818605, 0.43237334,
0.2898751 , 0. , 0.57476616, nan],
[0.47778758, 0.6539635 , 0.8151941 , 0.34332284, 0.3121036 ,
0.4878362 , 0.57476616, 0. , nan],
[ nan, nan, nan, nan, nan,
nan, nan, nan, nan]], dtype=float32)
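To see where these numbers come from, the entry for records 0 and 1 can be reproduced by hand. The following is a minimal from-scratch sketch, not the package's own code: `gower_distance`, `ranges`, and `categorical` are hypothetical names introduced for illustration. It assumes equal weights, no missing values, and that `has_children` is treated as a numerical column.

```python
def gower_distance(x, y, ranges, categorical):
    """Unweighted Gower distance between two records of equal length.

    ranges[k] is max - min of feature k over the data set (unused for
    categorical features); categorical[k] is True for categorical features.
    """
    total = 0.0
    for xk, yk, rk, is_cat in zip(x, y, ranges, categorical):
        if is_cat:
            # Categorical: 0 on a match, 1 on a mismatch
            total += 0.0 if xk == yk else 1.0
        elif rk > 0:
            # Numerical: absolute difference scaled by the feature's range
            total += abs(xk - yk) / rk
        # A numerical feature with zero range contributes nothing
    return total / len(x)

# Records 0 and 1 of the example data, in column order
rec0 = [21, 'M', 'MARRIED', 3000.0, 1, 2200]
rec1 = [21, 'M', 'SINGLE', 1200.0, 0, 100]

# Ranges taken over the eight non-missing rows of the example data
ranges = [30 - 19, None, None, 32000.0 - 1100.0, 1 - 0, 22000 - 100]
categorical = [False, True, True, False, False, False]

d01 = gower_distance(rec0, rec1, ranges, categorical)
print(round(d01, 4))  # 0.359
```

The result agrees with the value 0.3590238 shown in row 0, column 1 of the matrix above.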
Find Top n results
# The five records in Xd closest to its first record
gower.gower_topn(Xd.iloc[0:1, :], Xd, n=5)
{'index': array([4, 3, 1, 7, 5]),
'values': array([0.16872811, 0.31787416, 0.3590238 , 0.47778758, 0.52622986],
dtype=float32)}
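The same ranking can be recovered from the first row of the distance matrix shown earlier. This is a plain-Python sketch of the idea, not how `gower_topn` is implemented: `top_n` and its `skip` parameter are hypothetical, and skipping the query record itself and any NaN distances is an assumption made to match the printed result.

```python
import math

# First row of the distance matrix printed above (record 0 against all records)
row0 = [0.0, 0.3590238, 0.6707398, 0.31787416, 0.16872811,
        0.52622986, 0.59697855, 0.47778758, float("nan")]

def top_n(distances, n, skip=None):
    """Return the indices of the n smallest distances, nearest first."""
    candidates = [(d, i) for i, d in enumerate(distances)
                  if i != skip and not math.isnan(d)]
    candidates.sort()  # ascending by distance, then by index
    return [i for _, i in candidates[:n]]

nearest = top_n(row0, n=5, skip=0)
print(nearest)  # [4, 3, 1, 7, 5]
```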