Implementation of the ARCoder Transliterated Arabic Name Matching Encoding Algorithm
Project description
ARCoder
arcoder.py implements an abstract base class called Encoder
along with two concrete
implementations: ARCoder
and Holmes
. Each has an encode
method that takes a
string and encodes it into a series of symbols designed to be used for similarity
measurements.
>>> from arcoder import ARCoder, Holmes
>>> a = ARCoder()
>>> a.encode("Sohaib")
['suhaeb', 'suhib']
>>> h = Holmes()
>>> h.encode("Sohaib")
['sohayb']
The ARCoder algorithm is described more fully in Moore, J., Hamid, S., and Bromberger, S.: "An Evaluation of Transliterated Arabic Name Matching Methods".
The Holmes implementation is derived from Holmes, D., Kashfi, S., Aqeel, S. U.: "Transliterated arabic name search", Communications, Internet, and Information Technology, pp. 267-273. (2004).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.