A Python Library for Tokenizers

Lexikanon: A HyFI-based library for Tokenizers

A HyFI-based library for the creation, training, and utilization of tokenizers.

Lexikanon is a high-performance Python library for creating, training, and using tokenizers, which are fundamental components of natural language processing (NLP) and artificial intelligence (AI) systems. Its name draws on the Greek words λέξη (meaning "word") and κάνων (meaning "maker"), reflecting its primary purpose: helping users build robust tokenizers tailored to different languages and specific tasks. Lexikanon is built on the Hydra Fast Interface (HyFI) framework, so it plugs into any HyFI-oriented project, and it can also be used as a standalone library.
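
To make the "create, train, use" cycle concrete, here is a minimal sketch of a whitespace tokenizer in plain Python. It does not use or reflect Lexikanon's actual API: the class `ToyTokenizer`, its methods, and the `[UNK]` token are hypothetical names chosen only for illustration.

```python
from collections import Counter


class ToyTokenizer:
    """Hypothetical whitespace tokenizer for illustration; not part of Lexikanon."""

    def __init__(self, unk_token="[UNK]"):
        # Creation: start with a vocabulary that holds only the unknown token.
        self.unk_token = unk_token
        self.vocab = {unk_token: 0}

    def train(self, corpus, min_freq=1):
        # Training: count lowercased whitespace-separated words, then assign
        # ids by descending frequency (ties broken alphabetically).
        counts = Counter(word for text in corpus for word in text.lower().split())
        for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            if freq >= min_freq and word not in self.vocab:
                self.vocab[word] = len(self.vocab)
        return self

    def encode(self, text):
        # Utilization: map each word to its id, falling back to the unknown id.
        unk_id = self.vocab[self.unk_token]
        return [self.vocab.get(word, unk_id) for word in text.lower().split()]

    def decode(self, ids):
        # Map ids back to words and join them with single spaces.
        inverse = {idx: word for word, idx in self.vocab.items()}
        return " ".join(inverse[idx] for idx in ids)


tokenizer = ToyTokenizer().train(["the cat sat", "the dog sat"])
ids = tokenizer.encode("the cat ran")
text = tokenizer.decode(ids)
```

Real tokenizers usually rely on subword schemes rather than whitespace splitting, so this sketch only shows the general shape of the workflow, not how Lexikanon implements it.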

Citation

@software{lee_2023_8248118,
  author       = {Young Joon Lee},
  title        = {Lexikanon: A HyFI-based library for Tokenizers},
  month        = aug,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v0.6.2},
  doi          = {10.5281/zenodo.8248117},
  url          = {https://doi.org/10.5281/zenodo.8248117}
}
@software{lee_2023_hyfi,
  author       = {Young Joon Lee},
  title        = {Lexikanon: A HyFI-based library for Tokenizers},
  year         = 2023,
  publisher    = {GitHub},
  url          = {https://github.com/entelecheia/lexikanon}
}

Changelog

See the CHANGELOG for more information.

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.

License

This project is released under the MIT License.

