NYSIIS phonetic encoding algorithm.
The pynysiis package provides a Python implementation of the New York State Identification and Intelligence System (NYSIIS) phonetic encoding algorithm. NYSIIS encodes names based on pronunciation, which is helpful in name-matching and searching applications.
Requirements
Python 2.7 and later.
Setup
Install the package with pip:
$ pip install pynysiis
Or, with the older easy_install tool (deprecated in current setuptools releases):
$ easy_install pynysiis
Basic Usage
from nysiis import NYSIIS
encoder = NYSIIS()
name = "Watkins"
encoded_name = encoder.encode(name)
print(encoded_name) # Output: WATCAN
Name Comparison
from nysiis import NYSIIS
encoder = NYSIIS()
# Compare similar names
name1 = "John Smith"
name2 = "John Smyth"
encoded_name1 = encoder.encode(name1)
encoded_name2 = encoder.encode(name2)
if encoded_name1 == encoded_name2:
    print("Names match phonetically")
else:
    print("Names are phonetically different")
# Output: Names match phonetically
Multi-Language Support
The NYSIIS encoder handles names from various languages:
from nysiis import NYSIIS
encoder = NYSIIS()
# Sample names from different languages
names = [
    # English names
    "Watkins",
    "Robert Johnson",
    # Yoruba name
    "Olanrewaju Akinyele",
    # Igbo name
    "Obinwanne Obiora",
    # Hausa name
    "Abdussalamu Abubakar",
    # Hindi name
    "Virat Kohli",
    # Urdu name
    "Usman Shah",
]
# Process each name (f-strings require Python 3.6 or later)
for name in names:
    encoded_name = encoder.encode(name)
    print(f"{name:<20} -> {encoded_name}")
# Output:
# Watkins              -> WATCAN
# Robert Johnson       -> RABART
# Olanrewaju Akinyele  -> OLANRA
# Obinwanne Obiora     -> OBAWAN
# Abdussalamu Abubakar -> ABDASA
# Virat Kohli          -> VARATC
# Usman Shah           -> USNANS
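The sample names above are all written in unaccented Latin letters. How the encoder treats accented characters is not stated here, so if your data contains diacritics, one option is to strip them before encoding. The sketch below uses only the standard unicodedata module; strip_diacritics is a hypothetical helper name, not part of pynysiis, and the sample name is for illustration only.

```python
import unicodedata


def strip_diacritics(name):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


plain = strip_diacritics("José Núñez")
print(plain)  # Jose Nunez
```

The cleaned string can then be passed to encoder.encode() as in the examples above. Note that this removes marks that may be phonetically meaningful in some languages, so it is a trade-off rather than a universal fix.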
Common Use Cases
Database Search Optimisation
from nysiis import NYSIIS

def find_similar_names(search_name, database_names):
    # Return every database name whose code equals the search name's code
    encoder = NYSIIS()
    search_code = encoder.encode(search_name)
    matches = [
        name for name in database_names
        if encoder.encode(name) == search_code
    ]
    return matches
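The function above re-encodes every database name on each search. When the same list is searched repeatedly, one option is to encode it once and keep a lookup table from code to names. The sketch below assumes nothing about pynysiis beyond an encode(name) callable; build_phonetic_index, lookup_similar, and the stand-in toy_encode are hypothetical names used for illustration, and in practice NYSIIS().encode would be passed in place of toy_encode.

```python
from collections import defaultdict


def build_phonetic_index(names, encode):
    """Group names by phonetic code so each name is encoded only once."""
    index = defaultdict(list)
    for name in names:
        index[encode(name)].append(name)
    return dict(index)


def lookup_similar(index, search_name, encode):
    """Return the names that share a code with search_name (may be empty)."""
    return index.get(encode(search_name), [])


def toy_encode(name):
    """Stand-in encoder for this demo only: NOT the NYSIIS algorithm.

    It upper-cases the name and treats Y like I; replace it with
    NYSIIS().encode in real use.
    """
    return name.upper().replace("Y", "I")


index = build_phonetic_index(["Smith", "Smyth", "Jones"], toy_encode)
similar = lookup_similar(index, "SMITH", toy_encode)
print(similar)  # ['Smith', 'Smyth']
```

Building the index costs one encode call per stored name; each later search costs a single encode call plus a dictionary lookup.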
Name Deduplication
from nysiis import NYSIIS

def find_duplicates(names):
    # Group names by phonetic code; keep only codes shared by 2+ names
    encoder = NYSIIS()
    encoded_names = {}
    for name in names:
        code = encoder.encode(name)
        encoded_names.setdefault(code, []).append(name)
    return {
        code: group
        for code, group in encoded_names.items()
        if len(group) > 1
    }
Fuzzy Name Matching
from nysiis import NYSIIS

def match_names(name1, name2, encoder=None):
    # Pass an existing encoder to avoid creating one per call
    if encoder is None:
        encoder = NYSIIS()
    return encoder.encode(name1) == encoder.encode(name2)
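The sample output earlier in this document shows six-character codes for full names (for example, Robert Johnson -> RABART), which suggests that for multi-word names most of the code comes from the first word. If that holds for your data, comparing names word by word may give stricter matches. The sketch below is one way to do that under this assumption; match_name_parts and toy_encode are hypothetical names, and NYSIIS().encode would replace toy_encode in real use.

```python
def match_name_parts(name1, name2, encode):
    """Return True when both names have the same number of words and
    each pair of corresponding words has the same phonetic code."""
    parts1 = name1.split()
    parts2 = name2.split()
    if not parts1 or len(parts1) != len(parts2):
        return False
    return all(encode(a) == encode(b) for a, b in zip(parts1, parts2))


def toy_encode(name):
    """Stand-in encoder for this demo only: NOT the NYSIIS algorithm."""
    return name.upper().replace("Y", "I")


same = match_name_parts("John Smith", "John Smyth", toy_encode)
different = match_name_parts("John Smith", "John Jones", toy_encode)
print(same, different)  # True False
```

This version treats word order and word count as significant, so "Smith John" and "John Smith" would not match; whether that is desirable depends on the data.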
Best Practices
Reuse the Encoder Instance
from nysiis import NYSIIS

large_name_list = ["Watkins", "Robert Johnson"]  # placeholder data

# Good - create once, use many times
encoder = NYSIIS()
for name in large_name_list:
    encoded = encoder.encode(name)

# Less efficient - creating new instance repeatedly
for name in large_name_list:
    encoded = NYSIIS().encode(name)
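When a large list contains many repeated names, caching the encoded results can complement reusing the encoder. The sketch below wraps any encode(name) callable with functools.lru_cache from the Python 3 standard library; make_cached_encoder and toy_encode are hypothetical names for illustration, and NYSIIS().encode would be the callable wrapped in practice.

```python
from functools import lru_cache


def make_cached_encoder(encode, maxsize=10000):
    """Wrap an encode(name) callable so repeated names are encoded once."""
    @lru_cache(maxsize=maxsize)
    def cached(name):
        return encode(name)
    return cached


calls = []  # records every real call to the underlying encoder


def toy_encode(name):
    """Stand-in encoder for this demo only: NOT the NYSIIS algorithm."""
    calls.append(name)
    return name.upper()


cached_encode = make_cached_encoder(toy_encode)
codes = [cached_encode(n) for n in ["Smith", "Jones", "Smith", "Smith"]]
print(codes, len(calls))  # ['SMITH', 'JONES', 'SMITH', 'SMITH'] 2
```

The cache is keyed on the exact input string, so "smith" and "Smith" are cached separately even though they encode to the same result.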
Handle Empty Inputs
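from nysiis import NYSIIS

encoder = NYSIIS()  # created once and reused across calls

def process_name(name):
    # Return None for None, empty, or whitespace-only input
    if not name or not name.strip():
        return None
    return encoder.encode(name)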
Case Sensitivity
# The encoder handles case automatically
encoder = NYSIIS()
print(encoder.encode("smith")) # Same as "SMITH"
print(encoder.encode("SMITH")) # Same result
Reference
@inproceedings{Rajkovic2007,
author = {Petar Rajkovic and Dragan Jankovic},
title = {Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian Names},
booktitle = {XVII Conference on Applied Mathematics},
editor = {D. Herceg and H. Zarin},
pages = {193--204},
year = {2007},
publisher = {Department of Mathematics and Informatics, Novi Sad},
url = {https://jmp.sh/hukNujCG}
}
License
This project is licensed under the MIT License.
Copyright
Copyright © 2024 Finbarrs Oketunji. All Rights Reserved.