NYSIIS phonetic encoding algorithm.

Project description

NYSIIS

The pynysiis package provides a Python implementation of the New York State Identification and Intelligence System (NYSIIS) phonetic encoding algorithm. NYSIIS encodes names based on pronunciation, which is helpful in name-matching and searching applications.

Requirements

Python 3.8 or later.

Setup

Install the package with pip:

$ pip install pynysiis

Or, with the legacy easy_install tool (deprecated by setuptools; prefer pip where possible):

$ easy_install pynysiis
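
To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. This is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8); the helper name installed_version is hypothetical and not part of pynysiis.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pynysiis")
    print(found if found else "pynysiis is not installed in this environment")
```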

Basic Usage

from nysiis import NYSIIS

encoder = NYSIIS()
name = "Watkins"
encoded_name = encoder.encode(name)
print(encoded_name)  # Output: WATCAN

Name Comparison

from nysiis import NYSIIS

encoder = NYSIIS()

# Compare similar names
name1 = "John Smith"
name2 = "John Smyth"

encoded_name1 = encoder.encode(name1)
encoded_name2 = encoder.encode(name2)

if encoded_name1 == encoded_name2:
    print("Names match phonetically")
else:
    print("Names are phonetically different")

# Output: Names match phonetically

Multi-Language Support

The NYSIIS encoder handles names from various languages:

from nysiis import NYSIIS

encoder = NYSIIS()

# Sample names from different languages
names = [
    # English names
    "Watkins",
    "Robert Johnson",

    # Yoruba name
    "Olanrewaju Akinyele",

    # Igbo name
    "Obinwanne Obiora",

    # Hausa name
    "Abdussalamu Abubakar",

    # Hindi name
    "Virat Kohli",

    # Urdu name
    "Usman Shah"
]

# Process each name
for name in names:
    encoded_name = encoder.encode(name)
    print(f"{name:<20} -> {encoded_name}")

# Output:
# Watkins              -> WATCAN
# Robert Johnson       -> RABART
# Olanrewaju Akinyele  -> OLANRA
# Obinwanne Obiora     -> OBAWAN
# Abdussalamu Abubakar -> ABDASA
# Virat Kohli          -> VARATC
# Usman Shah           -> USNANS
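
Every code in the sample output above is six characters long, and for longer first names (for example "Robert Johnson" -> RABART) the code appears to cover only the first word. If the surname matters for your data, one option is to encode each word separately. The sketch below assumes only that `encode` is a callable mapping a string to a code, such as `NYSIIS().encode`; `encode_each_word` and the stand-in `first_three` are hypothetical names used for illustration and are not part of pynysiis.

```python
def encode_each_word(full_name, encode):
    """Encode every whitespace-separated word of a name on its own.

    `encode` is any callable mapping a string to a phonetic code,
    for example NYSIIS().encode from this package.
    """
    return tuple(encode(word) for word in full_name.split())


def first_three(word):
    """Stand-in encoder for demonstration only; this is NOT NYSIIS."""
    return word.upper()[:3]


print(encode_each_word("Robert Johnson", first_three))  # ('ROB', 'JOH')
```

With the real encoder, the call would be `encode_each_word(name, NYSIIS().encode)`. Comparing the resulting tuples then requires every word to match, which is stricter than comparing a single code.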

Common Use Cases

Database Search Optimisation

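from nysiis import NYSIIS


def find_similar_names(search_name, database_names):
    """Return the database names whose NYSIIS code equals that of search_name."""
    encoder = NYSIIS()
    search_code = encoder.encode(search_name)

    # Keep only the names that encode to the same code as the search term
    return [
        name for name in database_names
        if encoder.encode(name) == search_code
    ]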
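
The function above re-encodes every database name on each search. When the same list is searched repeatedly, one option is to encode it once and group the names by code. This is a minimal sketch, assuming `encode` is any string-to-code callable such as `NYSIIS().encode`; `build_index`, `lookup`, and the stand-in `toy_encode` are hypothetical names and not part of pynysiis.

```python
from collections import defaultdict


def build_index(names, encode):
    """Group names by phonetic code so each name is encoded only once."""
    index = defaultdict(list)
    for name in names:
        index[encode(name)].append(name)
    return dict(index)


def lookup(index, search_name, encode):
    """Return all indexed names that share the search name's code."""
    return index.get(encode(search_name), [])


def toy_encode(name):
    """Stand-in encoder for demonstration only; this is NOT NYSIIS."""
    return name.upper().replace("Y", "I")


index = build_index(["Smith", "Smyth", "Jones"], toy_encode)
print(lookup(index, "smith", toy_encode))  # ['Smith', 'Smyth']
```

Each lookup then costs one encoding plus a dictionary access, at the price of keeping the index in memory and rebuilding it when the name list changes.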

Name Deduplication

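from nysiis import NYSIIS


def find_duplicates(names):
    """Group names by NYSIIS code and return only groups with more than one name."""
    encoder = NYSIIS()
    groups = {}

    for name in names:
        code = encoder.encode(name)
        groups.setdefault(code, []).append(name)

    # Codes shared by two or more names indicate likely duplicates
    return {
        code: group
        for code, group in groups.items()
        if len(group) > 1
    }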

Fuzzy Name Matching

from nysiis import NYSIIS


def match_names(name1, name2, encoder=None):
    """Return True if both names encode to the same NYSIIS code."""
    if encoder is None:
        encoder = NYSIIS()

    return encoder.encode(name1) == encoder.encode(name2)
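
Exact code equality gives a yes-or-no answer. Where a graded score is more useful, one possible extension (not something pynysiis itself provides) is to compare the two codes with difflib.SequenceMatcher from the standard library. The sketch assumes `encode` is any string-to-code callable such as `NYSIIS().encode`; `code_similarity` is a hypothetical helper name, and `str.upper` stands in for the encoder in the demonstration.

```python
from difflib import SequenceMatcher


def code_similarity(name1, name2, encode):
    """Return a 0.0-1.0 similarity ratio between the two names' codes."""
    return SequenceMatcher(None, encode(name1), encode(name2)).ratio()


# str.upper is a stand-in for demonstration only; it is NOT NYSIIS.
print(code_similarity("Smith", "Smith", str.upper))  # 1.0
print(code_similarity("Smith", "Smyth", str.upper))
```

A threshold on this ratio would need tuning against real data, since phonetic codes are short and a single differing character moves the score noticeably.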

Best Practices

Reuse the Encoder Instance

from nysiis import NYSIIS

large_name_list = ["Watkins", "Robert Johnson"]  # placeholder data

# Good - create once, use many times
encoder = NYSIIS()
for name in large_name_list:
    encoded = encoder.encode(name)

# Less efficient - creates a new instance on every iteration
for name in large_name_list:
    encoded = NYSIIS().encode(name)
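
When the same names recur many times in a large list, caching their codes can avoid repeated work on top of reusing the encoder. This is a minimal sketch using functools.lru_cache from the standard library; it assumes `encode` is a deterministic string-to-code callable such as `NYSIIS().encode`, and `make_cached_encoder` and `counting_upper` are hypothetical names used for illustration.

```python
from functools import lru_cache


def make_cached_encoder(encode, maxsize=10_000):
    """Wrap an encode callable so repeated names are encoded only once."""

    @lru_cache(maxsize=maxsize)
    def cached(name):
        return encode(name)

    return cached


calls = []


def counting_upper(name):
    """Stand-in encoder that records each call; this is NOT NYSIIS."""
    calls.append(name)
    return name.upper()


cached_encode = make_cached_encoder(counting_upper)
codes = [cached_encode(n) for n in ["ada", "bob", "ada", "ada"]]
print(codes)       # ['ADA', 'BOB', 'ADA', 'ADA']
print(len(calls))  # 2
```

The cache key is the exact input string, so "ada" and "Ada" are cached separately even if they produce the same code.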

Handle Empty Inputs

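from nysiis import NYSIIS

# Shared instance, created once (see "Reuse the Encoder Instance")
_encoder = NYSIIS()


def process_name(name):
    """Return the NYSIIS code for name, or None if it is empty or whitespace."""
    if not name or not name.strip():
        return None

    return _encoder.encode(name)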

Case Sensitivity

from nysiis import NYSIIS

# The encoder handles case automatically
encoder = NYSIIS()
print(encoder.encode("smith"))  # Same code as for "SMITH"
print(encoder.encode("SMITH"))  # Same result

Reference

@inproceedings{Rajkovic2007,
  author    = {Petar Rajkovic and Dragan Jankovic},
  title     = {Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian Names},
  booktitle = {XVII Conference on Applied Mathematics},
  editor    = {D. Herceg and H. Zarin},
  pages     = {193--204},
  year      = {2007},
  publisher = {Department of Mathematics and Informatics, Novi Sad},
  url       = {https://jmp.sh/hukNujCG}
}

License

This project is licensed under the MIT License.
