Converts English tokens into the equivalent Sinhala representation using IPA (International Phonetic Alphabet)

Project description

SEETM (Sinhala-English Equivalent Token Mapper) lets you create maps of equivalent tokens and replace every token in a map with a single base token. This avoids out-of-vocabulary (OOV) tokens and produces one feature for all equivalent tokens in a Sinhala-English code-switching dataset used by rasa-based conversational AIs.
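The idea of collapsing equivalent tokens into one base token can be sketched in a few lines of plain Python. This is only an illustration of the concept and does not use SEETM's own API or file format; the map contents and the names `TOKEN_MAP`, `build_lookup`, and `normalize` are hypothetical.

```python
from typing import Dict, List

# Hypothetical token map: one base token and the spellings treated as
# equivalent to it. The entries are illustrative, not shipped with SEETM.
TOKEN_MAP: Dict[str, List[str]] = {
    "බැංකුව": ["bank", "benkuwa", "bankuwa"],
}


def build_lookup(token_map: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the map so every equivalent token points at its base token."""
    lookup: Dict[str, str] = {}
    for base, equivalents in token_map.items():
        lookup[base] = base
        for token in equivalents:
            lookup[token.lower()] = base
    return lookup


def normalize(text: str, lookup: Dict[str, str]) -> List[str]:
    """Split on whitespace and replace each known token with its base token."""
    return [lookup.get(token.lower(), token) for token in text.split()]


lookup = build_lookup(TOKEN_MAP)
tokens = normalize("mage Bank account eka", lookup)
```

After this step every spelling in the map yields the same token, so a downstream featurizer sees one vocabulary entry instead of several rare ones.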

Features

  • Allows mapping multiple equivalent tokens into a base token
  • Fully supports rasa 2.8.x projects
  • Provides an easy-to-use CLI
  • Provides an efficient server-based GUI
  • Provides a fully-functional custom whitespace tokenizer
  • Fully supports Sinhala in the GUI

What's Cooking?

  • Mapping suggestions in the SEETM server GUI
  • Automatically generated mappings

Limitations and Known Issues

  • The SEETM tokenizer must be added to the rasa pipeline manually; otherwise the token maps have no effect
  • IPA-based suggestions can vary slightly depending on the source of the IPA mapping (SEETM uses CMU)

Resources and References

📒 Docs: https://seetm.github.io
📦 PyPI: https://pypi.org/project/seetm/1.0.1/
🪵 Full Changelog: https://github.com/SEETM-NLP/seetm/blob/main/CHANGELOG.md

Download files

Source Distribution

seetm-1.0.1.tar.gz (963.5 kB)

Uploaded Source

Built Distribution

seetm-1.0.1-py3-none-any.whl (990.1 kB)

Uploaded Python 3

File details

Details for the file seetm-1.0.1.tar.gz.

File metadata

  • Download URL: seetm-1.0.1.tar.gz
  • Upload date:
  • Size: 963.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for seetm-1.0.1.tar.gz
Algorithm     Hash digest
SHA256        952988d2492d4d87a88f5cf8f1b0a28aa4b1323f2e94e7e19dbe22d3506f1b69
MD5           378196b38a87d34862da3acbf26f1bcb
BLAKE2b-256   9d43ddb0479fe8c2edad4f4695f8248fad38af71f848ba0717529037add18fba

File details

Details for the file seetm-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: seetm-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 990.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for seetm-1.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        d4c67e68bdadd70f8a546e95eebfe3650b4d2bdc1fe42919153d976924b60c61
MD5           70d078b37585228ba953433908359c49
BLAKE2b-256   e96339fe0e49ab7185c54042a92a11225c9ac0a6f8330c3c7eafdb2153d9662f
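A downloaded file can be checked against the SHA256 digests listed above using only the Python standard library. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the file path in the commented call assumes the wheel was saved to the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call (assumes the wheel is in the current directory):
# matches_published_hash(
#     "seetm-1.0.1-py3-none-any.whl",
#     "d4c67e68bdadd70f8a546e95eebfe3650b4d2bdc1fe42919153d976924b60c61",
# )
```

A result of `False` means the file differs from the one that was uploaded and should not be installed.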
