
Python implementation of Gower's distance, computed pairwise between records in two data sets

Project description


Introduction

Gower's distance calculation in Python. Gower distance is a distance measure for two records whose attributes are a mix of categorical and numerical values. Reference: Gower (1971), A general coefficient of similarity and some of its properties. Biometrics 27, 857–874.
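For orientation, the usual textbook statement of the distance is sketched below. The symbols are introduced here for illustration and do not come from the package: p is the number of features, R_k is the range (max minus min) of numerical feature k, w_k is an optional feature weight, and delta_ijk is 0 when either value is missing and 1 otherwise. How the package itself treats weights and missing values may differ in detail.

```latex
d(i, j) = \frac{\sum_{k=1}^{p} w_k \, \delta_{ijk} \, d_{ijk}}{\sum_{k=1}^{p} w_k \, \delta_{ijk}},
\qquad
d_{ijk} =
\begin{cases}
\dfrac{|x_{ik} - x_{jk}|}{R_k} & \text{if feature } k \text{ is numerical},\\[6pt]
0 & \text{if feature } k \text{ is categorical and } x_{ik} = x_{jk},\\
1 & \text{if feature } k \text{ is categorical and } x_{ik} \neq x_{jk}.
\end{cases}
```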

More details and examples can be found on my personal website: https://www.thinkdatascience.com/post/2019-12-16-introducing-python-package-gower/

Core functions were written by Marcelo Beckmann.

Examples

Installation

pip install gower

Generate some data

import numpy as np
import pandas as pd
import gower

# Eight complete records plus one record (row 8) whose values are all missing
Xd = pd.DataFrame({
    'age': [21, 21, 19, 30, 21, 21, 19, 30, None],
    'gender': ['M', 'M', 'N', 'M', 'F', 'F', 'F', 'F', None],
    'civil_status': ['MARRIED', 'SINGLE', 'SINGLE', 'SINGLE', 'MARRIED', 'SINGLE', 'WIDOW', 'DIVORCED', None],
    'salary': [3000.0, 1200.0, 32000.0, 1800.0, 2900.0, 1100.0, 10000.0, 1500.0, None],
    'has_children': [1, 0, 1, 1, 1, 0, 0, 1, None],
    'available_credit': [2200, 100, 22000, 1100, 2000, 100, 6000, 2200, None],
})
Yd = Xd.iloc[1:3, :]  # rows 1 and 2, usable as a second data set
X = np.asarray(Xd)
Y = np.asarray(Yd)

Find the distance matrix

gower.gower_matrix(X)
array([[0.        , 0.3590238 , 0.6707398 , 0.31787416, 0.16872811,
        0.52622986, 0.59697855, 0.47778758,        nan],
       [0.3590238 , 0.        , 0.6964303 , 0.3138769 , 0.523629  ,
        0.16720603, 0.45600235, 0.6539635 ,        nan],
       [0.6707398 , 0.6964303 , 0.        , 0.6552807 , 0.6728013 ,
        0.6969697 , 0.740428  , 0.8151941 ,        nan],
       [0.31787416, 0.3138769 , 0.6552807 , 0.        , 0.4824794 ,
        0.48108295, 0.74818605, 0.34332284,        nan],
       [0.16872811, 0.523629  , 0.6728013 , 0.4824794 , 0.        ,
        0.35750175, 0.43237334, 0.3121036 ,        nan],
       [0.52622986, 0.16720603, 0.6969697 , 0.48108295, 0.35750175,
        0.        , 0.2898751 , 0.4878362 ,        nan],
       [0.59697855, 0.45600235, 0.740428  , 0.74818605, 0.43237334,
        0.2898751 , 0.        , 0.57476616,        nan],
       [0.47778758, 0.6539635 , 0.8151941 , 0.34332284, 0.3121036 ,
        0.4878362 , 0.57476616, 0.        ,        nan],
       [       nan,        nan,        nan,        nan,        nan,
               nan,        nan,        nan,        nan]], dtype=float32)
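As a sanity check, one entry of the matrix above can be recomputed by hand. The sketch below is plain Python and does not call the package; `gower_pair` and `ranges` are hypothetical names introduced here. It assumes that `age`, `salary`, `has_children` and `available_credit` are treated as numerical (scaled by their range over the eight complete records) and that `gender` and `civil_status` are treated as categorical, which is consistent with the printed value for rows 0 and 1.

```python
# Rows 0 and 1 of the example data, in column order:
# age, gender, civil_status, salary, has_children, available_credit
row0 = (21, 'M', 'MARRIED', 3000.0, 1, 2200)
row1 = (21, 'M', 'SINGLE', 1200.0, 0, 100)

# Range (max - min) of each numerical column over the eight complete records.
# None marks a column treated as categorical (an assumption of this sketch).
ranges = (30 - 19, None, None, 32000.0 - 1100.0, 1 - 0, 22000 - 100)


def gower_pair(a, b, ranges):
    """Unweighted textbook Gower distance between two complete records."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: 0 if the values match, 1 otherwise.
            total += 0.0 if x == y else 1.0
        else:
            # Numerical feature: absolute difference scaled by the column range.
            total += abs(x - y) / r
    return total / len(ranges)


d01 = gower_pair(row0, row1, ranges)
print(round(d01, 7))  # 0.3590238, the [0][1] entry of the matrix above
```

Only `civil_status`, `salary`, `has_children` and `available_credit` differ between the two records, so the distance is the mean of 0, 0, 1, 1800/30900, 1 and 2100/21900.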

Find Top n results

gower.gower_topn(Xd.iloc[0:2,:], Xd.iloc[:,], n = 5)
{'index': array([4, 3, 1, 7, 5]),
 'values': array([0.16872811, 0.31787416, 0.3590238 , 0.47778758, 0.52622986],
       dtype=float32)}
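The result above corresponds to the five smallest distances in row 0 of the distance matrix. The sketch below reproduces that selection in plain Python from the printed row; it illustrates the idea of a top-n query and is not the package's internal code. Skipping the query record itself and the all-missing record is an assumption made here to match the output shown.

```python
import math

# Row 0 of the distance matrix printed earlier (distances from record 0).
row0 = [0.0, 0.3590238, 0.6707398, 0.31787416, 0.16872811,
        0.52622986, 0.59697855, 0.47778758, float('nan')]

query = 0  # index of the record being compared against the others
n = 5

# Keep (distance, index) pairs, skipping the query itself and missing distances.
candidates = [(d, i) for i, d in enumerate(row0)
              if i != query and not math.isnan(d)]

# Sort by distance and keep the n closest records.
top = sorted(candidates)[:n]
top_index = [i for _, i in top]
top_values = [d for d, _ in top]

print(top_index)  # [4, 3, 1, 7, 5]
```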

Download files


Source Distribution

gower-0.1.2.tar.gz (5.6 kB)

Uploaded Source

Built Distribution

gower-0.1.2-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file gower-0.1.2.tar.gz.

File metadata

  • Download URL: gower-0.1.2.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.6

File hashes

Hashes for gower-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       34ddb5158f0e8bfba093dca06b9f887bda244998d10af2a3ad8c74a6efa1b5f6
MD5          1d33bdd101ad7196dbadad0fc09de08c
BLAKE2b-256  7cb8f02ffa72009105e981b21fe957895107d1b3c81dece43167d28d8acfdfb0


File details

Details for the file gower-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: gower-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.6

File hashes

Hashes for gower-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       cb46e18243e1d88d2fa0a23d20afb71e5469f25db4ee6236db40f897dfea9e6f
MD5          d7319f211797296951c89c0b4985d67b
BLAKE2b-256  992388b526457ea992e0a47147a886db3d749d07347c8d3a303f6076deee7299

