
NYSIIS phonetic encoding algorithm.

Project description

NYSIIS

The pynysiis package provides a Python implementation of the New York State Identification and Intelligence System (NYSIIS) phonetic encoding algorithm. NYSIIS maps a name to a short code based on how it is pronounced, so spelling variants that sound alike tend to share a code. This makes it useful for name matching and record search.
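To show what the encoding does, here is a minimal sketch of the textbook NYSIIS rules: prefix and suffix rewrites, a left-to-right letter scan that collapses repeated letters, then clean-up of a trailing S, AY or A. It is not pynysiis's source. The name `nysiis_sketch`, the choice to drop non-letters (including spaces), and the optional six-character truncation are assumptions, and its codes may differ from the package's for some names.

```python
from typing import List, Optional

VOWELS = set("AEIOU")

# Textbook prefix/suffix rewrites; order matters ("KN" must be tried before "K").
PREFIXES = (("MAC", "MCC"), ("KN", "NN"), ("K", "C"),
            ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS"))
SUFFIXES = (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"),
            ("RD", "D"), ("NT", "D"), ("ND", "D"))


def nysiis_sketch(name: str, max_len: Optional[int] = 6) -> str:
    """Hypothetical textbook NYSIIS encoder; max_len=None disables truncation."""
    # Assumption: keep only ASCII letters A-Z; spaces, hyphens and accents are dropped.
    word = "".join(c for c in name.upper() if "A" <= c <= "Z")
    if not word:
        return ""
    for old, new in PREFIXES:
        if word.startswith(old):
            word = new + word[len(old):]
            break
    for old, new in SUFFIXES:
        if word.endswith(old):
            word = word[: -len(old)] + new
            break

    s: List[str] = list(word)  # rewritten in place as the scan proceeds
    key = [s[0]]               # the first letter is kept as-is
    for i in range(1, len(s)):
        c = s[i]
        prev = s[i - 1]
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if c == "E" and nxt == "V":
            s[i], s[i + 1] = "A", "F"
        elif c in VOWELS:
            s[i] = "A"
        elif c == "Q":
            s[i] = "G"
        elif c == "Z":
            s[i] = "S"
        elif c == "M":
            s[i] = "N"
        elif c == "K":
            s[i] = "N" if nxt == "N" else "C"
        elif c == "S" and s[i + 1 : i + 3] == ["C", "H"]:
            s[i : i + 3] = ["S", "S", "S"]
        elif c == "P" and nxt == "H":
            s[i : i + 2] = ["F", "F"]
        elif c == "H" and (prev not in VOWELS or nxt not in VOWELS):
            s[i] = prev
        elif c == "W" and prev in VOWELS:
            s[i] = prev
        # Append only when it differs from the last key letter (collapses runs).
        if s[i] != key[-1]:
            key.append(s[i])

    # Final clean-up of the key's ending.
    if len(key) > 1 and key[-1] == "S":
        key.pop()
    if len(key) > 1 and key[-2:] == ["A", "Y"]:
        key[-2:] = ["Y"]
    if len(key) > 1 and key[-1] == "A":
        key.pop()
    code = "".join(key)
    return code[:max_len] if max_len else code


print(nysiis_sketch("Watkins"))  # WATCAN
```

For "Watkins" the sketch yields `WATCAN`, the same code the package reports under Basic Usage.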

Requirements

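Python 2.7 and later. Note that some examples below use f-strings, which require Python 3.6 or later, and the published wheel is tagged for Python 3 only (py3-none-any).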

Setup

Install the package with pip:

$ pip install pynysiis

Or, with the legacy easy_install tool (deprecated in setuptools):

$ easy_install pynysiis

Basic Usage

from nysiis import NYSIIS

encoder = NYSIIS()
name = "Watkins"
encoded_name = encoder.encode(name)
print(encoded_name)  # Output: WATCAN

Name Comparison

from nysiis import NYSIIS

encoder = NYSIIS()

# Compare similar names
name1 = "John Smith"
name2 = "John Smyth"

encoded_name1 = encoder.encode(name1)
encoded_name2 = encoder.encode(name2)

if encoded_name1 == encoded_name2:
    print("Names match phonetically")
else:
    print("Names are phonetically different")

# Output: Names match phonetically

Multi-Language Support

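The encoder accepts names from many languages. NYSIIS rules were devised for English spellings of names, so codes for other languages reflect those English-based rules rather than each language's own pronunciation: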

from nysiis import NYSIIS

encoder = NYSIIS()

# Sample names from different languages
names = [
    # English names
    "Watkins",
    "Robert Johnson",

    # Yoruba name
    "Olanrewaju Akinyele",

    # Igbo name
    "Obinwanne Obiora",

    # Hausa name
    "Abdussalamu Abubakar",

    # Hindi name
    "Virat Kohli",

    # Urdu name
    "Usman Shah"
]

# Process each name
for name in names:
    encoded_name = encoder.encode(name)
    print(f"{name:<20} -> {encoded_name}")

# Output:
# Watkins              -> WATCAN
# Robert Johnson       -> RABART
# Olanrewaju Akinyele  -> OLANRA
# Obinwanne Obiora     -> OBAWAN
# Abdussalamu Abubakar -> ABDASA
# Virat Kohli          -> VARATC
# Usman Shah           -> USNANS
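Names in these languages are often written with diacritics, while the NYSIIS rules are stated for the letters A to Z. This page does not say how pynysiis treats accented letters, so one hedged option is to fold them to their base letters before encoding. The helper below is a hypothetical sketch using the standard library's `unicodedata`; letters with no Unicode decomposition, such as `ø`, pass through unchanged.

```python
import unicodedata


def strip_diacritics(name: str) -> str:
    """Hypothetical helper: decompose characters (NFKD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    # unicodedata.combining() returns a nonzero value for combining marks (accents, tone marks).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Olánrewájú Akínyẹlé"))  # Olanrewaju Akinyele
```

The folded string can then be passed to the encoder, e.g. `encoder.encode(strip_diacritics(name))`.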

Common Use Cases

Database Search Optimisation

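from nysiis import NYSIIS

def find_similar_names(search_name, database_names):
    """Return the names whose NYSIIS code matches that of search_name."""
    encoder = NYSIIS()
    search_code = encoder.encode(search_name)

    # Keep every stored name that encodes to the same phonetic key
    return [
        name for name in database_names
        if encoder.encode(name) == search_code
    ]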
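`find_similar_names` re-encodes every stored name on each search. When the same collection is searched repeatedly, a common alternative is to encode it once into a dictionary keyed by code, so each query costs one encode plus a dictionary lookup. This is a hypothetical sketch: `build_phonetic_index` and `lookup` are not pynysiis APIs, and the encoding function is a parameter, so `encoder.encode` from a `NYSIIS()` instance can be passed in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_phonetic_index(
    names: Iterable[str], encode: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each phonetic code to the names that produce it (each name encoded once)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        index[encode(name)].append(name)
    return dict(index)


def lookup(
    index: Dict[str, List[str]], search_name: str, encode: Callable[[str], str]
) -> List[str]:
    """Encode only the query and return the bucket it falls into (empty if none)."""
    return index.get(encode(search_name), [])
```

With the package, usage would look like `index = build_phonetic_index(database_names, encoder.encode)` followed by `lookup(index, "Smyth", encoder.encode)`.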

Name Deduplication

from nysiis import NYSIIS

def find_duplicates(names):
    """Group names by NYSIIS code, keeping only codes shared by two or more names."""
    encoder = NYSIIS()
    groups = {}

    for name in names:
        code = encoder.encode(name)
        groups.setdefault(code, []).append(name)

    return {
        code: group
        for code, group in groups.items()
        if len(group) > 1
    }

Fuzzy Name Matching

from nysiis import NYSIIS

def match_names(name1, name2, encoder=None):
    """Return True when both names produce the same NYSIIS code."""
    if encoder is None:
        encoder = NYSIIS()

    return encoder.encode(name1) == encoder.encode(name2)
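Judging by the sample outputs under Multi-Language Support (for instance "Robert Johnson" -> RABART), a multi-word name appears to be encoded as a single key truncated to six characters, so the surname may not affect the result. One hedged workaround is to compare the names token by token. `match_names_by_token` is a hypothetical helper, and requiring the same number of tokens is an assumption that a real application may want to relax.

```python
from typing import Callable


def match_names_by_token(name1: str, name2: str, encode: Callable[[str], str]) -> bool:
    """True if both names have the same number of whitespace-separated tokens
    and every corresponding pair of tokens encodes to the same code."""
    tokens1 = name1.split()
    tokens2 = name2.split()
    if len(tokens1) != len(tokens2):
        return False
    return all(encode(a) == encode(b) for a, b in zip(tokens1, tokens2))
```

With the package, pass the bound method: `match_names_by_token("John Smith", "John Smyth", encoder.encode)`.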

Best Practices

Reuse the Encoder Instance

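from nysiis import NYSIIS

large_name_list = ["Watkins", "Robert Johnson", "Usman Shah"]  # any iterable of names

# Good: create the encoder once and reuse it
encoder = NYSIIS()
for name in large_name_list:
    encoded = encoder.encode(name)

# Less efficient: a new instance is constructed on every iteration
for name in large_name_list:
    encoded = NYSIIS().encode(name)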
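If the list contains many repeated names, memoizing the encode call can save further work. This sketch wraps any encode function with the standard library's `functools.lru_cache`. It assumes encoding is deterministic (the same input always yields the same code), and `make_cached_encoder` is a hypothetical helper, not a pynysiis API.

```python
from functools import lru_cache
from typing import Callable, Optional


def make_cached_encoder(
    encode: Callable[[str], str], maxsize: Optional[int] = None
) -> Callable[[str], str]:
    """Wrap encode so each distinct name is computed once; maxsize=None means unbounded."""

    @lru_cache(maxsize=maxsize)
    def cached(name: str) -> str:
        return encode(name)

    return cached
```

Usage with the package would be `cached_encode = make_cached_encoder(encoder.encode)`, then `cached_encode(name)` inside the loop. An unbounded cache grows with the number of distinct names, so pass a `maxsize` for very large inputs.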

Handle Empty Inputs

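from nysiis import NYSIIS

encoder = NYSIIS()  # shared instance, as recommended under "Reuse the Encoder Instance"

def process_name(name):
    # Treat None, empty, and whitespace-only strings as "no name"
    if not name or not name.strip():
        return None
    return encoder.encode(name)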

Case Sensitivity

from nysiis import NYSIIS

# Input is case-insensitive
encoder = NYSIIS()
print(encoder.encode("smith"))  # Same as "SMITH"
print(encoder.encode("SMITH"))  # Same result

Reference

@inproceedings{Rajkovic2007,
  author    = {Petar Rajkovic and Dragan Jankovic},
  title     = {Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian Names},
  booktitle = {XVII Conference on Applied Mathematics},
  editor    = {D. Herceg and H. Zarin},
  pages     = {193--204},
  year      = {2007},
  publisher = {Department of Mathematics and Informatics, Novi Sad},
  url       = {https://jmp.sh/hukNujCG}
}


License

This project is licensed under the MIT License.


Download files


Source Distribution

pynysiis-1.0.6.tar.gz (17.7 kB)

Built Distribution

pynysiis-1.0.6-py3-none-any.whl (15.5 kB)

File details

Details for the file pynysiis-1.0.6.tar.gz.

File metadata

  • Download URL: pynysiis-1.0.6.tar.gz
  • Upload date:
  • Size: 17.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.7

File hashes

Hashes for pynysiis-1.0.6.tar.gz
Algorithm Hash digest
SHA256 1ce196a4f1f4a18a30a87a9f99874cccaa5bbb57d7f15b6cf1ce28e03370683d
MD5 9e219a0feeb15d95869d1e5987a83ef8
BLAKE2b-256 88634fac4b89e7e2855fbf2827945df251b5d48433b8d30b4937e844850cd2dc


File details

Details for the file pynysiis-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: pynysiis-1.0.6-py3-none-any.whl
  • Upload date:
  • Size: 15.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.7

File hashes

Hashes for pynysiis-1.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 7d8413f813e78ca8a7bb4f151c0349b73ebc898c89dbdd00f1eca586e7f8aaf7
MD5 0741271fdad9ab5446ef80133fb06c6c
BLAKE2b-256 96ed389d1950f2dd7acc15293a5823884dd79bb9c6e8ed3f0550765374dcf0da

