NYSIIS phonetic encoding algorithm.

NYSIIS Python Package

The pynysiis package provides a Python implementation of the New York State Identification and Intelligence System (NYSIIS) phonetic encoding algorithm. NYSIIS encodes names based on pronunciation, which is helpful in name-matching and searching applications.

Requirements

Python 2.7 and later.

Setup

Install the package with pip:

$ pip install pynysiis

Or:

$ easy_install pynysiis

Basic Usage

from nysiis import NYSIIS

encoder = NYSIIS()
name = "Watkins"
encoded_name = encoder.encode(name)
print(encoded_name)  # Output: WATCAN

Name Comparison

from nysiis import NYSIIS

encoder = NYSIIS()

# Compare similar names
name1 = "John Smith"
name2 = "John Smyth"

encoded_name1 = encoder.encode(name1)
encoded_name2 = encoder.encode(name2)

if encoded_name1 == encoded_name2:
    print("Names match phonetically")
else:
    print("Names are phonetically different")

# Output: Names match phonetically

Multi-Language Support

The NYSIIS encoder handles names from various languages:

from nysiis import NYSIIS

encoder = NYSIIS()

# Sample names from different languages
names = [
    # English names
    "Watkins",
    "Robert Johnson",

    # Yoruba name
    "Olanrewaju Akinyele",

    # Igbo name
    "Obinwanne Obiora",

    # Hausa name
    "Abdussalamu Abubakar",

    # Hindi name
    "Virat Kohli",

    # Urdu name
    "Usman Shah"
]

# Process each name
for name in names:
    encoded_name = encoder.encode(name)
    print(f"{name:<20} -> {encoded_name}")

# Output:
# Watkins              -> WATCAN
# Robert Johnson       -> RABART
# Olanrewaju Akinyele  -> OLANRA
# Obinwanne Obiora     -> OBAWAN
# Abdussalamu Abubakar -> ABDASA
# Virat Kohli          -> VARATC
# Usman Shah           -> USNANS
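
Names in several of these languages are often written with diacritics or tone marks. This README does not say how the encoder treats such characters, so one option is to strip them before encoding. The sketch below assumes that folding accented letters to their base letters is acceptable for your data; `strip_diacritics` is a hypothetical helper built on the standard library, not part of pynysiis.

```python
import unicodedata


def strip_diacritics(name):
    """Remove accents and tone marks, e.g. 'José' becomes 'Jose' (hypothetical helper)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("José Muñoz"))  # Jose Munoz
```

The cleaned string can then be passed to `encoder.encode(...)` as in the examples above.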

Common Use Cases

Database Search Optimisation

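from nysiis import NYSIIS

def find_similar_names(search_name, database_names):
    """Return the database names that share search_name's NYSIIS code."""
    encoder = NYSIIS()
    search_code = encoder.encode(search_name)

    # Keep only the names whose phonetic code equals the search code
    matches = [
        name for name in database_names
        if encoder.encode(name) == search_code
    ]
    return matches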
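
The function above re-encodes every database name on each search. When the same list is searched repeatedly, one possible refinement is to encode each name once and group the names by code. The following is a minimal sketch under that assumption; `build_code_index`, `lookup`, and `demo_encode` are hypothetical names, and `demo_encode` is only a stand-in so the example runs without the package installed. In practice you would pass `NYSIIS().encode` as the `encode` argument.

```python
from collections import defaultdict


def build_code_index(names, encode):
    """Group names by phonetic code so each search is a single dict lookup."""
    index = defaultdict(list)
    for name in names:
        index[encode(name)].append(name)
    return index


def lookup(index, search_name, encode):
    """Return every indexed name that shares search_name's code."""
    # .get() avoids creating an empty entry for codes that are not indexed.
    return index.get(encode(search_name), [])


# Stand-in encoder for demonstration only (NOT the NYSIIS algorithm).
def demo_encode(name):
    return name.upper().replace("Y", "I")


index = build_code_index(["Smith", "Smyth", "Jones"], demo_encode)
print(lookup(index, "smith", demo_encode))  # ['Smith', 'Smyth']
```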

Name Deduplication

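from nysiis import NYSIIS

def find_duplicates(names):
    """Group names by NYSIIS code and return only the groups with duplicates."""
    encoder = NYSIIS()
    encoded_names = {}

    # Collect every name under its phonetic code
    for name in names:
        code = encoder.encode(name)
        encoded_names.setdefault(code, []).append(name)

    # Keep only the codes shared by more than one name
    return {
        code: group
        for code, group in encoded_names.items()
        if len(group) > 1
    }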

Fuzzy Name Matching

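from nysiis import NYSIIS

def match_names(name1, name2, encoder=None):
    """Return True if both names produce the same NYSIIS code."""
    # Allow callers to pass a shared encoder; create one only if needed
    if encoder is None:
        encoder = NYSIIS()

    return encoder.encode(name1) == encoder.encode(name2)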
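
Comparing codes for equality gives a yes-or-no answer. If you also want to rank near misses, one option is to score how similar two codes are. The sketch below uses `difflib.SequenceMatcher` from the standard library on codes that have already been computed; `code_similarity`, `is_close_match`, and the 0.8 threshold are illustrative assumptions, not part of pynysiis.

```python
from difflib import SequenceMatcher


def code_similarity(code1, code2):
    """Return a ratio between 0.0 and 1.0; 1.0 means the codes are identical."""
    return SequenceMatcher(None, code1, code2).ratio()


def is_close_match(code1, code2, threshold=0.8):
    # The threshold is an arbitrary illustrative value, not a recommendation.
    return code_similarity(code1, code2) >= threshold


print(code_similarity("WATCAN", "WATCAN"))  # 1.0
print(is_close_match("WATCAN", "XYZ"))      # False
```

A suitable threshold depends on the data, so it is worth tuning against a sample of known matches and non-matches.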

Best Practices

Reuse the Encoder Instance

# Good - create once, use many times
encoder = NYSIIS()
for name in large_name_list:
    encoded = encoder.encode(name)

# Less efficient - creating new instance repeatedly
for name in large_name_list:
    encoded = NYSIIS().encode(name)
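
When a large list contains many repeated names, caching the results may save further work. The following is a minimal sketch using `functools.lru_cache`; `make_cached_encoder` and `counting_encode` are hypothetical names, and `counting_encode` is a stand-in that records its calls so the effect is visible without the package installed. In practice you would wrap `encoder.encode`.

```python
from functools import lru_cache


def make_cached_encoder(encode, maxsize=10000):
    """Wrap an encode callable so each distinct name is encoded only once."""
    return lru_cache(maxsize=maxsize)(encode)


# Stand-in encoder for demonstration only (NOT the NYSIIS algorithm).
calls = []


def counting_encode(name):
    calls.append(name)
    return name.upper()


cached_encode = make_cached_encoder(counting_encode)
for name in ["Smith", "Jones", "Smith", "Smith"]:
    cached_encode(name)

print(len(calls))  # 2, because the repeated "Smith" lookups hit the cache
```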

Handle Empty Inputs

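from nysiis import NYSIIS

encoder = NYSIIS()  # created once and reused across calls

def process_name(name):
    # Treat None, empty, and whitespace-only input as "no name"
    if not name or not name.strip():
        return None

    return encoder.encode(name)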

Case Sensitivity

# The encoder handles case automatically
encoder = NYSIIS()
print(encoder.encode("smith"))  # Same as "SMITH"
print(encoder.encode("SMITH"))  # Same result

Reference

@inproceedings{Rajkovic2007,
  author    = {Petar Rajkovic and Dragan Jankovic},
  title     = {Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian Names},
  booktitle = {XVII Conference on Applied Mathematics},
  editor    = {D. Herceg and H. Zarin},
  pages     = {193--204},
  year      = {2007},
  publisher = {Department of Mathematics and Informatics, Novi Sad},
  url       = {https://jmp.sh/hukNujCG}
}

License

This project is licensed under the MIT License.
