A Python package for cleansing and matching text

Project description

What is it?

A package for reducing text to alphabetic characters only, reducing text to digits only, cleaning names, splitting names, cleaning NIKs (Indonesian national identity numbers), validating the NIK format, and scoring text similarity.

Installation

pip install claming

How to use

Import package

from claming import Cleansing, Matching

Create the Cleansing and Matching objects

clean = Cleansing()
match = Matching()

Alphabet only

Use the case_sensitive parameter to choose the output case: upper, lower, capitalize or title. The default is capitalize.

clean.alfabeth_only('+62 adalah kode negara Indonesia', case_sensitive='capitalize')
# Result
# Adalah kode negara indonesia
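The behaviour shown above can be approximated with the standard library. The following is a minimal sketch, not the package's implementation: the helper name letters_only and the rule "keep ASCII letters and whitespace" are assumptions chosen to reproduce the documented result.

```python
import re


def letters_only(text: str, case: str = "capitalize") -> str:
    """Hypothetical helper: keep ASCII letters and spaces, then apply a case style."""
    if case not in {"upper", "lower", "capitalize", "title"}:
        raise ValueError(f"unsupported case style: {case!r}")
    # Drop every character that is not an ASCII letter or whitespace.
    kept = re.sub(r"[^A-Za-z\s]", "", text)
    # Collapse runs of whitespace and trim both ends.
    kept = " ".join(kept.split())
    # str.upper, str.lower, str.capitalize and str.title all take no arguments.
    return getattr(kept, case)()


print(letters_only("+62 adalah kode negara Indonesia"))
# Adalah kode negara indonesia
```

Note that capitalize lowercases everything after the first character, which is why "Indonesia" becomes "indonesia" in the result.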

Number only

Use the output_type parameter to choose the return type: int or str. The default is int.

clean.number_only("+6281234123412", output_type='int')
# Result
# 6281234123412
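The choice between int and str matters when the input may start with a zero. A minimal sketch of the idea, assuming a hypothetical helper digits_only that is not part of claming:

```python
import re
from typing import Union


def digits_only(text: str, output_type: str = "int") -> Union[int, str]:
    """Hypothetical helper: strip everything except the digits 0-9."""
    digits = re.sub(r"[^0-9]", "", text)
    if output_type == "str":
        return digits
    if output_type == "int":
        # int() raises ValueError when no digits are left.
        return int(digits)
    raise ValueError(f"unsupported output_type: {output_type!r}")


print(digits_only("+6281234123412"))          # 6281234123412
print(digits_only("0812-3412", "str"))        # 08123412
print(digits_only("0812-3412", "int"))        # 8123412
```

Converting to int drops leading zeros, so str is the safer choice for phone numbers written in local format.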

Clean name

Output case: upper, lower, capitalize or title. The default is upper.

clean.clean_name(' John D3ve.r  Smith')
# Result
# {'input': ' John D3ve.r  Smith', 'output': 'JOHN DVER SMITH'}

Clean NIK

Use the output_type parameter to choose the return type: int or str. The default is str.

clean.clean_nik(3212300808080003)
# Result
# {'input': '3212300808080003', 'output': '3212300808080003', 'description': 'NIK format is correct'}
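The exact rules clean_nik applies are not documented here. As an illustration only, a format check could rely on the general NIK layout: 16 digits, where digits 7 to 12 encode the birth date as DDMMYY and 40 is added to the day for women. The function name looks_like_nik and the checks below are assumptions, not the package's logic.

```python
def looks_like_nik(value) -> bool:
    """Hypothetical check: 16 digits with a plausible day and month in positions 7-10."""
    nik = str(value).strip()
    if len(nik) != 16 or not (nik.isascii() and nik.isdigit()):
        return False
    day, month = int(nik[6:8]), int(nik[8:10])
    # Women's NIKs store the day of birth plus 40.
    if day > 40:
        day -= 40
    return 1 <= day <= 31 and 1 <= month <= 12


print(looks_like_nik(3212300808080003))    # True
print(looks_like_nik("3212300813080003"))  # False, month 13
```

This sketch does not verify the region code or whether the day is valid for the given month.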

Split name

Output case: upper, lower, capitalize or title. The default is upper.
num_split: number of parts to split the name into, 2 or 3. The default is 3.
When num_split is 2, the first name is the first word and the last name is everything from the second word to the last word.
When num_split is 3, the first name is the first word, the middle name is the second word and the last name is everything from the third word to the last word.

clean.split_name(' John D3ve.r  Smith', num_split=3)
# Result
# {'original_name': ' John D3ve.r  Smith', 'full_name': 'JOHN DVER SMITH', 'first_name': 'JOHN', 'middle_name': 'DVER', 'last_name': 'SMITH'}

clean.split_name(' John D3ve.r  Smith', num_split=2)
# Result
# {'original_name': ' John D3ve.r  Smith', 'full_name': 'JOHN DVER SMITH', 'first_name': 'JOHN', 'last_name': 'DVER SMITH'}
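The splitting rule described above can be sketched on an already cleaned name as follows. The helper name split_words is hypothetical, and returning empty strings for names with too few words is an assumption; the package may behave differently in that case.

```python
def split_words(full_name: str, num_split: int = 3) -> dict:
    """Hypothetical helper: split a cleaned name into 2 or 3 parts by word position."""
    words = full_name.split()
    if num_split == 2:
        return {
            "first_name": " ".join(words[:1]),
            "last_name": " ".join(words[1:]),
        }
    if num_split == 3:
        return {
            "first_name": " ".join(words[:1]),
            "middle_name": " ".join(words[1:2]),
            "last_name": " ".join(words[2:]),
        }
    raise ValueError("num_split must be 2 or 3")


print(split_words("JOHN DVER VAN SMITH", num_split=3))
# {'first_name': 'JOHN', 'middle_name': 'DVER', 'last_name': 'VAN SMITH'}
```

Slicing instead of indexing keeps the sketch from raising IndexError on one-word names.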

Text similarity

match.exact_match('JOHN DVER SMITH', 'JOHN DVER SMITH')
# Result
# {'first_text': 'JOHN DVER SMITH', 'second_text': 'JOHN DVER SMITH', 'score': 1, 'max_score': 1}

match.levenshtein_match('JOHN DVER SMITH', 'JOHN DVER SMITH')
# Result
# {'first_text': 'JOHN DVER SMITH', 'second_text': 'JOHN DVER SMITH', 'score': 1.0, 'max_score': 1}

match.part_exact_match('JOHN DVER SMITH', 'JOHN DVER SMITH')
# Result
# {'first_text': 'JOHN DVER SMITH', 'second_text': 'JOHN DVER SMITH', 'score': 1.0, 'max_score': 1}

match.part_levenshtein_match('JOHN DVER SMITH', 'JOHN DVER SMITH')
# Result
# {'first_text': 'JOHN DVER SMITH', 'second_text': 'JOHN DVER SMITH', 'score': 3.0, 'max_score': 3}

match.all_method_match('JOHN DVER SMITH', 'JOHN DVER SMITH')
# Result
# {'first_text': 'JOHN DVER SMITH', 'second_text': 'JOHN DVER SMITH', 'first_text_clean': 'JOHN DVER SMITH', 'second_text_clean': 'JOHN DVER SMITH', 'exact_match': {'score': 1, 'max_score': 1}, 'levenshtein': {'score': 1.0, 'max_score': 1}, 'part_exact_match': {'score': 3, 'max_score': 3}, 'part_levenshtein': {'score': 3.0, 'max_score': 3}}
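To show where a score of 1.0 out of 1 and a score of 3.0 out of 3 could come from, here is a minimal sketch of Levenshtein-based scoring. The normalisation (1 minus distance divided by the longer length) and the word-by-word pairing are assumptions, and the function names are hypothetical; the package may compute its scores differently.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def part_similarity(a: str, b: str) -> dict:
    """Assumed part-wise scoring: compare words by position and sum the scores."""
    words_a, words_b = a.split(), b.split()
    score = sum(similarity(x, y) for x, y in zip(words_a, words_b))
    return {"score": score, "max_score": max(len(words_a), len(words_b))}


print(similarity("JOHN DVER SMITH", "JOHN DVER SMITH"))       # 1.0
print(part_similarity("JOHN DVER SMITH", "JOHN DVER SMITH"))  # {'score': 3.0, 'max_score': 3}
```

Under these assumptions, the part-wise maximum equals the number of words, which matches the max_score of 3 shown for a three-word name.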

Project details



This version

1.5

