Lexikanon: A Python Library for Tokenizers

Lexikanon is a Python library for creating, training, and deploying tokenizers, which are essential components of natural language processing (NLP) and artificial intelligence (AI) applications. The name Lexikanon originates from the Greek words λέξη (word) and κάνων (maker), reflecting the library's purpose: helping users build tokenizers for various languages and tasks.

Features

Lexikanon offers an extensive set of features, making it suitable for both newcomers and experienced professionals in the NLP domain:

  • Intuitive API: Lexikanon's API lets users create, train, and use tokenizers in just a few lines of code.

  • Wide range of tokenization techniques: The library supports various tokenization methods, including rule-based, statistical, and subword tokenization, catering to diverse requirements and use cases.

  • Multilingual support: Lexikanon is designed with a focus on multilingualism, providing support for a broad range of languages and seamless integration with other language resources and tools.

  • Customizability: Users can build custom tokenizers from the ground up or modify existing ones, offering complete control over tokenization rules, training data, and output formats.

  • Efficient processing: Lexikanon utilizes advanced algorithms and data structures to ensure high-performance tokenization, even on large-scale text corpora.

  • Pre-trained tokenizers: The library includes a collection of pre-trained tokenizers for various languages and domains, enabling users to take advantage of transfer learning and quickly adapt these tokenizers to their specific needs.
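
To make the difference between rule-based and subword tokenization concrete, here is a minimal sketch in plain Python. It does not use Lexikanon's API, which this page does not document; the function names, the toy vocabulary, the "##" continuation prefix, and the "[UNK]" marker are all assumptions chosen for illustration. The subword function uses greedy longest-match splitting in the style of WordPiece, which may or may not be what the library implements.

```python
import re


def rule_based_tokenize(text):
    """Split text into word and punctuation tokens using one regex rule."""
    # \w+ matches a run of word characters; [^\w\s] matches a single
    # punctuation mark. Whitespace is discarded.
    return re.findall(r"\w+|[^\w\s]", text)


def subword_tokenize(word, vocab, unk="[UNK]"):
    """Greedily split one word into the longest pieces found in vocab.

    Pieces that do not start the word carry a "##" prefix (an assumed,
    WordPiece-style convention). If any position has no matching piece,
    the whole word maps to the unknown marker.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest candidate first, then shrink from the right.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real one would be learned from a corpus.
toy_vocab = {"token", "##izer", "##s", "un", "##break", "##able"}

words = rule_based_tokenize("Tokenizers, unbreakable!")
subwords = [subword_tokenize(w.lower(), toy_vocab) for w in words if w.isalpha()]
```

The rule-based step needs no training data, while the subword step depends entirely on its vocabulary, which is the part a statistical or trained tokenizer would learn from text.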

Installation

You can install Lexikanon using pip:

pip install lexikanon

Getting Started

To begin working with Lexikanon, visit the official documentation and the GitHub repository for examples, tutorials, and additional information.

Changelog

See the CHANGELOG for more information.

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.

License

This project is released under the MIT License.

Download files

Source distribution: lexikanon-0.1.0.tar.gz (8.9 kB)

Built distribution: lexikanon-0.1.0-py3-none-any.whl (11.5 kB, Python 3)
