Third-year research project on developing a way to cluster bank clients using the date, time, and coordinates in their transaction history

# Geographical Transactions clustering algorithm

The module name stands for geographical transactions clustering. The module implements a method developed for a third-year project at HSE University. It takes a dataframe with clients' transaction history in the specified format and returns a list of clusters.

For the record, it was intended for public use in this form, as it is a research project seeking a way to deal with the described problem.

## Installation

Run the following to install:

```bash
pip install geot_cluster
```

## Usage

Before use, make sure that your dataset meets the requirements. For the module to work correctly, the CSV file must contain the following columns:

  • user_id : string type, example: “423156821”

  • event_dt : string type, example: “20190312”

  • event_time: string type, example: “2019-03-12 06:24:00.279”

  • lattitude : float type, example: 49.862621

  • longtitude : float type, same format as lattitude
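The requirements above can be checked before the data is handed to the module. The following is a minimal sketch using only the standard library. The helper name `row_problems`, the coordinate range checks, and the sample rows (including the longitude value) are assumptions made for illustration and are not part of the module.

```python
import csv
import io
from datetime import datetime

# Column names are spelled exactly as the module expects them.
REQUIRED_COLUMNS = ("user_id", "event_dt", "event_time", "lattitude", "longtitude")


def row_problems(row):
    """Return a list of problems found in one CSV row (a dict); empty if none."""
    problems = [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in row]
    if problems:
        return problems

    # event_dt looks like "20190312", event_time like "2019-03-12 06:24:00.279".
    for name, fmt in (("event_dt", "%Y%m%d"), ("event_time", "%Y-%m-%d %H:%M:%S.%f")):
        try:
            datetime.strptime(row[name], fmt)
        except (ValueError, TypeError):
            problems.append(f"bad {name}: {row[name]!r}")

    # Assumed sanity limits: latitude within +/-90, longitude within +/-180 degrees.
    for name, limit in (("lattitude", 90.0), ("longtitude", 180.0)):
        try:
            value = float(row[name])
        except (ValueError, TypeError):
            problems.append(f"{name} is not a float: {row[name]!r}")
            continue
        if not -limit <= value <= limit:
            problems.append(f"{name} out of range: {value}")
    return problems


# Made-up sample data: the first row is valid, the second has a wrongly
# formatted event_dt and an impossible latitude.
sample = io.StringIO(
    "user_id,event_dt,event_time,lattitude,longtitude\n"
    "423156821,20190312,2019-03-12 06:24:00.279,49.862621,8.651177\n"
    "423156821,2019-03-12,2019-03-12 06:24:00.279,149.0,8.651177\n"
)
reports = [row_problems(row) for row in csv.DictReader(sample)]
for line_number, report in enumerate(reports, start=2):
    print(line_number, report or "ok")
```

To check a real file, replace `sample` with an open file handle, for example `open(path, newline="")`.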

```python
import pandas as pd
import numpy as np
import markov_clustering as mc
import matplotlib.pyplot as plt
import math
import pytz
import folium
import os.path
import networkx as nx

from haversine import haversine, Unit
from collections import Counter
from datetime import datetime
from timezonefinder import TimezoneFinder
from IPython.display import clear_output

# Assumption: the helper functions used below are exported
# at the top level of geotrans_cluster.
from geotrans_cluster import *

path = "path/to/data.csv"  # placeholder: path to the file with data
data, names = data_load(path)

%matplotlib notebook
# The line above is a Jupyter magic, so run this example in a notebook.
base = "path/to/libs"  # placeholder: folder where the libs with information about clients are stored

# Flags that switch the individual stages on or off.
archivate = True
libs = True
graph_f = True
cluster_f = True

if archivate:
    archivate_maps(data, names, levels=4)

if libs:
    lib = graph_preparation(data, names, base)
    # "znakomstvo" is Russian for "acquaintance".
    prob_lib = znakomstvo_by_lib(lib, data)

lib, prob_lib = load_libs(base=base)

if graph_f:
    graph = graph_forming(lib, prob_lib, treshold=0.9)

if cluster_f:
    # Run Markov clustering (MCL) on the graph and extract the clusters.
    result = mc.run_mcl(graph, pruning_threshold=0.7, inflation=2, expansion=2)
    clusters = mc.get_clusters(result)

    clust_0 = clusters_to_ids(lib=lib, prob_lib=prob_lib, clusters=clusters, number=0)
    maps = get_cluster_maps(data=data, clust=clust_0)
    print("Number of clusters", len(clusters))

    plt.figure(figsize=(10, 10))
    mc.drawing.draw_graph(result, clusters, edge_color="red", node_size=15,
                          width=1, with_labels=True, font_size=8)
```
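The `expansion` and `inflation` arguments passed to `mc.run_mcl` correspond to the two alternating steps of Markov clustering. The sketch below illustrates those steps in plain Python on a toy graph. It is not the `markov_clustering` implementation (there is no pruning or convergence check), and the names `toy_mcl`, `read_clusters`, and the `two_triangles` graph are hypothetical, introduced only for this example.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    size = len(matrix)
    sums = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(size)]
            for i in range(size)]


def expand(matrix, power):
    """Expansion: raise the matrix to an integer power (longer random walks)."""
    size = len(matrix)
    result = matrix
    for _ in range(power - 1):
        result = [[sum(result[i][k] * matrix[k][j] for k in range(size))
                   for j in range(size)] for i in range(size)]
    return result


def inflate(matrix, power):
    """Inflation: raise each entry to a power, then re-normalize the columns."""
    return normalize_columns([[value ** power for value in row] for row in matrix])


def toy_mcl(adjacency, expansion=2, inflation=2, iterations=50):
    """Alternate expansion and inflation for a fixed number of iterations."""
    size = len(adjacency)
    # Add self-loops, as is customary in MCL, then make columns sum to 1.
    with_loops = [[1.0 if i == j else float(adjacency[i][j]) for j in range(size)]
                  for i in range(size)]
    matrix = normalize_columns(with_loops)
    for _ in range(iterations):
        matrix = inflate(expand(matrix, expansion), inflation)
    return matrix


def read_clusters(matrix, eps=1e-9):
    """Rows with a non-zero diagonal are attractors; each attractor row lists a cluster."""
    clusters = set()
    for i, row in enumerate(matrix):
        if row[i] > eps:
            clusters.add(tuple(j for j, value in enumerate(row) if value > eps))
    return sorted(clusters)


# Toy graph: two triangles (nodes 0-2 and 3-5) with no edges between them.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(read_clusters(toy_mcl(two_triangles)))  # [(0, 1, 2), (3, 4, 5)]
```

A higher `inflation` value sharpens the contrast between strong and weak connections and generally produces more, smaller clusters.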
