A Python Library for Tokenizers

Project description

Lexikanon: A HyFI-based library for Tokenizers

A HyFI-based library for the creation, training, and utilization of tokenizers.

Lexikanon is a high-performance Python library for creating, training, and using tokenizers, which are fundamental components of natural language processing (NLP) and artificial intelligence (AI) systems. Its name draws on the Greek words λέξη (meaning "word") and κάνων (meaning "maker"), reflecting its primary purpose: helping users build robust tokenizers tailored to different languages and specific tasks. Because Lexikanon is built on the Hydra Fast Interface (HyFI) framework, it plugs into any HyFI-oriented project, and it can also be used as a standalone library.
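
To make the three stages named above (creation, training, utilization) concrete, here is a minimal, self-contained sketch of a word-level tokenizer in plain Python. It does not use Lexikanon's API: the class `ToyWordTokenizer`, its methods, and the `[UNK]` token are hypothetical names chosen purely for illustration, and a real tokenizer would typically use subword units rather than whitespace splitting.

```python
from collections import Counter


class ToyWordTokenizer:
    """Hypothetical word-level tokenizer for illustration; not part of Lexikanon."""

    def __init__(self, unk_token="[UNK]"):
        # Creation: start with a vocabulary that only knows the unknown token.
        self.unk_token = unk_token
        self.vocab = {unk_token: 0}

    def train(self, corpus, vocab_size=100):
        # Training: count lowercase, whitespace-separated words and keep
        # the most frequent ones, leaving one slot for the unknown token.
        counts = Counter(word for line in corpus for word in line.lower().split())
        for word, _ in counts.most_common(vocab_size - 1):
            if word not in self.vocab:
                self.vocab[word] = len(self.vocab)

    def encode(self, text):
        # Utilization: map each word to its id, falling back to the unknown id.
        unk_id = self.vocab[self.unk_token]
        return [self.vocab.get(word, unk_id) for word in text.lower().split()]

    def decode(self, ids):
        # Invert the vocabulary to turn ids back into tokens.
        id_to_token = {index: token for token, index in self.vocab.items()}
        return " ".join(id_to_token[index] for index in ids)


tokenizer = ToyWordTokenizer()
tokenizer.train(["the cat sat", "the dog sat down"])
ids = tokenizer.encode("the cat ran")  # "ran" was never seen during training
print(ids)
print(tokenizer.decode(ids))
```

The word "ran" is absent from the training corpus, so it is encoded as the unknown id and decoded back as `[UNK]`. Handling such out-of-vocabulary input gracefully is one reason practical tokenizers are trained per language and per task.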

Citation

@software{lee_2023_8248118,
  author       = {Young Joon Lee},
  title        = {Lexikanon: A HyFI-based library for Tokenizers},
  month        = aug,
  year         = 2023,
  publisher    = {Zenodo},
  version      = {v0.6.2},
  doi          = {10.5281/zenodo.8248117},
  url          = {https://doi.org/10.5281/zenodo.8248117}
}
@software{lee_2023_hyfi,
  author       = {Young Joon Lee},
  title        = {Lexikanon: A HyFI-based library for Tokenizers},
  year         = 2023,
  publisher    = {GitHub},
  url          = {https://github.com/entelecheia/lexikanon}
}

Changelog

See the CHANGELOG for more information.

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.

License

This project is released under the MIT License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lexikanon-0.6.5.tar.gz (842.2 kB)

Uploaded Source

Built Distribution

lexikanon-0.6.5-py3-none-any.whl (853.9 kB)

Uploaded Python 3

File details

Details for the file lexikanon-0.6.5.tar.gz.

File metadata

  • Download URL: lexikanon-0.6.5.tar.gz
  • Size: 842.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.10.0 readme-renderer/43.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.2.1 tqdm/4.66.2 importlib-metadata/7.1.0 keyring/25.0.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.10.12

File hashes

Hashes for lexikanon-0.6.5.tar.gz
Algorithm    Hash digest
SHA256       b1025bc9d6fc81463ef31971636e25364ae451210fcd15043dde7fcebcc22c1f
MD5          4a54c777456c1dbb0c13615d3367a651
BLAKE2b-256  6e11b8d38d9af44fc40a147985670b51edaf216546a65af0792b73f747787f8f

File details

Details for the file lexikanon-0.6.5-py3-none-any.whl.

File metadata

  • Download URL: lexikanon-0.6.5-py3-none-any.whl
  • Size: 853.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.10.0 readme-renderer/43.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.2.1 tqdm/4.66.2 importlib-metadata/7.1.0 keyring/25.0.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.10.12

File hashes

Hashes for lexikanon-0.6.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       672cbddd2541202f7bfdd811f908037f6888dfba9d49ed28f5dc58e844504791
MD5          b87687db742f8c7da385d40c0b97df69
BLAKE2b-256  f575b70612025dcfd98df0451d9ec1b682bb2264a0a3b29d94ee5219e0759e07
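
The published digests can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The SHA256 values are copied from the listings above; the helper names `sha256_of` and `verify_download`, and the assumption that the file sits in the current working directory, are illustrative only.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "lexikanon-0.6.5.tar.gz":
        "b1025bc9d6fc81463ef31971636e25364ae451210fcd15043dde7fcebcc22c1f",
    "lexikanon-0.6.5-py3-none-any.whl":
        "672cbddd2541202f7bfdd811f908037f6888dfba9d49ed28f5dc58e844504791",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a downloaded file against the digest published for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


# Assumed location: the archive was saved to the current directory.
candidate = Path("lexikanon-0.6.5.tar.gz")
if candidate.exists():
    print("digest matches:", verify_download(candidate))
```

SHA256 is used here rather than MD5 because MD5 is no longer considered collision-resistant; the MD5 and BLAKE2b-256 values could be checked the same way with `hashlib.md5` and `hashlib.blake2b(digest_size=32)`.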
