ekorpkit 【iːkɔːkɪt】 : eKonomic Research Python Toolkit
eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. Its powerful config composition is backed by Hydra.
Key features
Easy Configuration
- You can compose your configuration dynamically, which makes it easy to arrive at the right configuration for each research project.
- You can override everything from the command line, which makes experimentation fast, and removes the need to maintain multiple similar configuration files.
- With the help of the eKonf class, it is also easy to compose configurations in a Jupyter notebook environment.
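The composition and override ideas above can be illustrated with a small, library-independent sketch. It uses only the Python standard library and does not call eKorpkit or Hydra; the helper names (`deep_merge`, `apply_overrides`) and every config key are hypothetical, chosen only to show how layered defaults and `key=value` overrides might combine.

```python
import ast
import copy


def deep_merge(base, extra):
    """Return a new dict in which values from `extra` override those in `base`."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def parse_value(raw):
    """Interpret '10' as int, 'True' as bool, and so on; otherwise keep the string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config, overrides):
    """Apply 'dotted.key=value' strings, as one might pass on a command line."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical config groups; none of these keys come from eKorpkit itself.
defaults = {"tokenizer": {"name": "whitespace", "lowercase": True}}
experiment = {"tokenizer": {"lowercase": False}, "model": {"epochs": 3}}

config = deep_merge(defaults, experiment)
config = apply_overrides(config, ["model.epochs=10", "tokenizer.name=mecab"])
print(config)
```

Keeping one set of defaults and layering small overrides on top is what removes the need for many near-identical configuration files.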
No Boilerplate
- eKorpkit lets you focus on the problem at hand instead of spending time on boilerplate code such as command line flags, configuration file loading, and logging.
Workflows
- A workflow is a configurable automated process that will run one or more jobs.
- You can divide your research into several unit jobs (tasks), then combine those jobs into one workflow.
- You can have multiple workflows, each of which can perform a different set of tasks.
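As a rough illustration of the job and workflow relationship described above, the following sketch models a workflow as an ordered list of jobs, each consuming the previous job's output. The `Job` and `Workflow` classes here are hypothetical stand-ins written for this example and are not eKorpkit's own classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Job:
    """A single unit task: a name plus a function applied to the incoming data."""
    name: str
    run: Callable[[Any], Any]


@dataclass
class Workflow:
    """An ordered collection of jobs executed one after another."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def execute(self, data):
        # Feed each job's output into the next job.
        for job in self.jobs:
            data = job.run(data)
        return data


# Three unit jobs that can be reused across workflows.
extract = Job("extract", lambda text: text.strip())
tokenize = Job("tokenize", lambda text: text.lower().split())
count = Job("count", lambda tokens: len(tokens))

# Two workflows that share jobs but perform different sets of tasks.
preprocess = Workflow("preprocess", [extract, tokenize])
stats = Workflow("stats", [extract, tokenize, count])

tokens = preprocess.execute("  Interest rates rose  ")
n_tokens = stats.execute("  Interest rates rose  ")
print(tokens, n_tokens)
```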
Sharable and Reproducible
- With eKorpkit, you can easily share your datasets and models.
- Sharing configs along with datasets and models makes your research reproducible.
- You can share individual unit jobs or an entire workflow.
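One way to make a shared config checkable, sketched here under the assumption that the config is a plain JSON-serializable dictionary, is to publish a fingerprint of it next to the dataset or model. The `config_fingerprint` helper and the config keys are hypothetical and not part of eKorpkit.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a canonical JSON form of the config so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical configs with identical content but different key order.
shared = {"corpus": "example", "tokenizer": {"lowercase": True, "name": "whitespace"}}
received = {"tokenizer": {"name": "whitespace", "lowercase": True}, "corpus": "example"}

same_setup = config_fingerprint(shared) == config_fingerprint(received)
print(same_setup)
```

A collaborator who receives the config can recompute the fingerprint and confirm they are running the same setup.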
Pluggable Architecture
- eKorpkit has a pluggable architecture, so you can combine it with your own implementations.
Tutorials
Tutorials for the ekorpkit package can be found at https://entelecheia.github.io/ekorpkit-book/
Installation
Install the latest version of ekorpkit:
pip install ekorpkit
To install all extra dependencies:
pip install ekorpkit[all]
The eKorpkit Corpus
The eKorpkit Corpus is a large, diverse, bilingual (ko/en) language modelling dataset.
Citation
@software{lee_2022_6497226,
author = {Young Joon Lee},
title = {eKorpkit: eKonomic Research Python Toolkit},
month = apr,
year = 2022,
publisher = {Zenodo},
doi = {10.5281/zenodo.6497226},
url = {https://doi.org/10.5281/zenodo.6497226}
}
@software{lee_2022_ekorpkit,
author = {Young Joon Lee},
title = {eKorpkit: eKonomic Research Python Toolkit},
month = apr,
year = 2022,
publisher = {GitHub},
url = {https://github.com/entelecheia/ekorpkit}
}
License
- eKorpkit is licensed under the MIT License. This license covers the eKorpkit package and all of its components.
- Each corpus adheres to its own license policy. Please check the license of the corpus before using it!