Converts English tokens into the equivalent Sinhala representation using IPA (International Phonetic Alphabet)

Project description

SEETM (Sinhala-English Equivalent Token Mapper) lets you create maps of equivalent tokens and replace every token in a map with a single base token. This avoids out-of-vocabulary (OOV) tokens and produces a single feature for all equivalent tokens in a Sinhala-English code-switching dataset used by rasa-based conversational AIs.
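The idea of collapsing equivalent tokens into one base token can be sketched in a few lines of Python. This is only an illustration of the concept, not SEETM's implementation or map format: `TOKEN_MAP`, `build_lookup`, and `normalize` are hypothetical names, and the example tokens are made up for the sketch.

```python
from typing import Dict, List

# Hypothetical token map: base token -> tokens treated as equivalent to it.
# The entries are illustrative and do not come from SEETM.
TOKEN_MAP: Dict[str, List[str]] = {
    "බස්": ["bus", "bas"],
}


def build_lookup(token_map: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the map so each equivalent token points at its base token."""
    lookup: Dict[str, str] = {}
    for base, equivalents in token_map.items():
        lookup[base] = base
        for token in equivalents:
            lookup[token.lower()] = base
    return lookup


def normalize(text: str, lookup: Dict[str, str]) -> List[str]:
    """Split on whitespace and replace each known token with its base token."""
    return [lookup.get(token.lower(), token) for token in text.split()]


lookup = build_lookup(TOKEN_MAP)
print(normalize("bus එක කීයටද", lookup))  # ['බස්', 'එක', 'කීයටද']
```

Because every spelling resolves to the same base token before featurization, a model sees one vocabulary entry instead of several, which is the effect the description above refers to.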

Features

  • Allows mapping multiple equivalent tokens into a base token
  • Fully supports rasa 2.8.x projects
  • Provides an easy-to-use CLI
  • Provides an efficient server-based GUI
  • Provides a fully-functional custom whitespace tokenizer
  • Fully supports Sinhala in the GUI

What's Cooking?

  • Mapping suggestions in the SEETM server GUI
  • Automatically generated mappings

Limitations and Known Issues

  • The SEETM tokenizer must be added to the rasa pipeline manually; otherwise the token maps have no effect
  • IPA-based suggestions may vary slightly depending on the origin of the IPA mapping (SEETM uses CMU)
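
To see why the origin of the IPA mapping matters, here is a minimal sketch that assumes "CMU" refers to the CMU Pronouncing Dictionary, which stores pronunciations as ARPAbet phones with stress digits. The two sample entries and the small conversion table are hand-written for this example, and `to_ipa` is a hypothetical helper, not part of SEETM.

```python
import re
from typing import Dict, List, Optional

# Two entries in CMU Pronouncing Dictionary style (ARPAbet with stress digits).
SAMPLE_CMU: Dict[str, List[str]] = {
    "bus": ["B", "AH1", "S"],
    "hello": ["HH", "AH0", "L", "OW1"],
}

# Partial ARPAbet -> IPA table covering only the phones used above.
# Other sources may pick different IPA symbols for the same phone.
ARPABET_TO_IPA: Dict[str, str] = {
    "B": "b",
    "S": "s",
    "HH": "h",
    "L": "l",
    "AH": "ʌ",
    "OW": "oʊ",
}


def to_ipa(word: str) -> Optional[str]:
    """Return an IPA string for a word in the sample, or None if unknown."""
    phones = SAMPLE_CMU.get(word.lower())
    if phones is None:
        return None
    symbols = []
    for phone in phones:
        if phone == "AH0":
            # Unstressed AH is conventionally rendered as schwa.
            symbols.append("ə")
        else:
            # Drop the stress digit before looking up the phone.
            symbols.append(ARPABET_TO_IPA[re.sub(r"\d", "", phone)])
    return "".join(symbols)


print(to_ipa("bus"))    # bʌs
print(to_ipa("hello"))  # həloʊ
```

A dictionary with a different vowel inventory or stress convention would produce a different IPA string for the same word, and any Sinhala suggestion derived from that string would shift accordingly.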

Resources and References

📒 Docs: https://seetm.github.io
📦 PyPI: https://pypi.org/project/seetm/1.1.0/
🪵 Full Changelog: https://github.com/SEETM-NLP/seetm/blob/main/CHANGELOG.md

Download files

Source Distribution

seetm-1.1.0.tar.gz (44.1 kB)

Uploaded Source

Built Distribution

seetm-1.1.0-py3-none-any.whl (990.9 kB)

Uploaded Python 3

File details

Details for the file seetm-1.1.0.tar.gz.

File metadata

  • Download URL: seetm-1.1.0.tar.gz
  • Size: 44.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for seetm-1.1.0.tar.gz
Algorithm Hash digest
SHA256 c603bb4d826b83f8a7647e218520a269f017e20e1db135ad33dc71ef56e15a6c
MD5 f28d6fff09e1d29ca2074f924d75519e
BLAKE2b-256 6cef3eee380322772f929f0a098b6a20c40b5518a75b884e1621e0db49ce1c7a

File details

Details for the file seetm-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: seetm-1.1.0-py3-none-any.whl
  • Size: 990.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for seetm-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7bd5d8ff812bfc8023832ee721de5b75905d9cddf85d099d4327b6f6eba1f84c
MD5 a4658baa2845eb853825d03582e4e94d
BLAKE2b-256 3e412e332140da4aabba170b016fe2d69b9c32d341341bf55822cd6fdf2288f4
