Lexikanon: A Python Library for Tokenizers
- Documentation: https://lexikanon.entelecheia.ai
- GitHub: https://github.com/entelecheia/lexikanon
- PyPI: https://pypi.org/project/lexikanon
Lexikanon is a robust and efficient Python library designed for creating, training, and deploying tokenizers, an essential component in natural language processing (NLP) and artificial intelligence (AI) applications. The name Lexikanon originates from the Greek words λέξη (word) and κάνων (maker), reflecting the library's purpose in enabling users to build powerful tokenizers for various languages and tasks.
Features
Lexikanon offers an extensive set of features, making it suitable for both newcomers and experienced professionals in the NLP domain:
- Intuitive API: Lexikanon's easy-to-use API allows users to create, train, and utilize tokenizers with just a few lines of code, ensuring a seamless experience.
- Wide range of tokenization techniques: The library supports various tokenization methods, including rule-based, statistical, and subword tokenization, catering to diverse requirements and use cases.
- Multilingual support: Lexikanon is designed with a focus on multilingualism, providing support for a broad range of languages and seamless integration with other language resources and tools.
- Customizability: Users can build custom tokenizers from the ground up or modify existing ones, offering complete control over tokenization rules, training data, and output formats.
- Efficient processing: Lexikanon utilizes advanced algorithms and data structures to ensure high-performance tokenization, even on large-scale text corpora.
- Pre-trained tokenizers: The library includes a collection of pre-trained tokenizers for various languages and domains, enabling users to take advantage of transfer learning and quickly adapt these tokenizers to their specific needs.
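To make the first of these techniques concrete, here is a minimal sketch of rule-based tokenization in plain Python. This is a generic illustration of the technique itself, not Lexikanon's API; the rules shown are illustrative examples.

```python
import re

# A minimal rule-based tokenizer: the alternatives below are the "rules",
# tried left to right. Generic illustration only, not Lexikanon's implementation.
TOKEN_RULES = re.compile(
    r"""
    \d+(?:\.\d+)?      # numbers, including decimals
  | \w+(?:'\w+)?       # words, allowing simple contractions
  | [^\w\s]            # any single punctuation mark
    """,
    re.VERBOSE,
)

def tokenize(text: str) -> list[str]:
    """Split text into tokens according to the rules above."""
    return TOKEN_RULES.findall(text)

print(tokenize("Tokenizers aren't magic: they cost $0.50 each!"))
# → ['Tokenizers', "aren't", 'magic', ':', 'they', 'cost', '$', '0.50', 'each', '!']
```

Because the number rule precedes the word rule, a decimal like `0.50` stays one token instead of splitting at the period; ordering the rules is the core design decision in this style of tokenizer.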
Installation
You can install Lexikanon using pip:
pip install lexikanon
Getting Started
To begin working with Lexikanon, visit the official documentation and the GitHub repository for examples, tutorials, and additional information.
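For a sense of what subword tokenizers learn during training, the sketch below implements a toy byte-pair-encoding (BPE) style learner in plain Python: it repeatedly merges the most frequent adjacent symbol pair. This illustrates the general subword technique, not Lexikanon's training API; the corpus and function names are made up for the example.

```python
from collections import Counter

# Toy BPE-style subword learner: generic illustration, not Lexikanon's API.

def learn_merges(words, num_merges):
    """Learn up to `num_merges` pair merges from a list of words."""
    # Represent each word as a tuple of symbols (single characters to start).
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the corpus with the winning pair fused into one symbol.
        merged = {}
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        corpus = merged
    return merges

def segment(word, merges):
    """Apply the learned merges, in order, to segment a new word."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

merges = learn_merges(["low", "lower", "lowest", "low"], num_merges=2)
print(segment("lowest", merges))
# → ['low', 'e', 's', 't']
```

Frequent character sequences ("low") become single subword units while rare suffixes stay split, which is how subword tokenizers keep vocabularies small without losing coverage of unseen words.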
Changelog
See the CHANGELOG for more information.
Contributing
Contributions are welcome! Please see the contributing guidelines for more information.
License
This project is released under the MIT License.