eKorpkit provides a flexible interface for corpus management and analysis pipelines such as extraction, transformation, tokenization, training, and visualization.
Project description
ekorpkit[iːkɔːkɪt]: (e)nglish (K)orean C(orp)us Tool(kit)
- Powerful config composition backed by Hydra: easily swap out corpora, datasets, models, preprocessors, visualizers, and many other configurations without touching the code.
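To illustrate what swapping configurations without touching the code means, here is a minimal plain-Python sketch of the composition idea. It does not use Hydra's or ekorpkit's actual API. The group names, option names, and the `compose` helper are all hypothetical and exist only for illustration.

```python
from copy import deepcopy

# Hypothetical config groups; these are NOT ekorpkit's real configs.
CONFIG_GROUPS = {
    "corpus": {
        "sample_en": {"name": "sample_en", "lang": "en"},
        "sample_ko": {"name": "sample_ko", "lang": "ko"},
    },
    "tokenizer": {
        "whitespace": {"type": "whitespace"},
        "morph": {"type": "morph", "pos_tags": True},
    },
}

# Default option chosen for each group when no override is given.
DEFAULTS = {"corpus": "sample_en", "tokenizer": "whitespace"}


def compose(overrides=()):
    """Build one config dict from the defaults plus 'group=option' overrides."""
    selection = dict(DEFAULTS)
    for item in overrides:
        group, _, option = item.partition("=")
        if group not in CONFIG_GROUPS:
            raise KeyError(f"unknown config group: {group!r}")
        if option not in CONFIG_GROUPS[group]:
            raise KeyError(f"unknown option {option!r} for group {group!r}")
        selection[group] = option
    # Copy so callers can mutate the result without touching the registry.
    return {
        group: deepcopy(CONFIG_GROUPS[group][option])
        for group, option in selection.items()
    }


# Swap the corpus by passing an override string; the code stays unchanged.
cfg = compose(["corpus=sample_ko"])
```

The calling code is the same whichever corpus or tokenizer is selected. Only the override strings change, which is the property the bullet above describes.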
Tutorials
Tutorials for the ekorpkit package can be found at https://entelecheia.github.io/ekorpkit-config/
Installation
Install the latest version of ekorpkit:
pip install ekorpkit
To install all extra dependencies (the quotes keep shells such as zsh from interpreting the brackets):
pip install "ekorpkit[all]"
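After installing, it can be useful to confirm which version ended up in the environment. The sketch below uses only the standard library's `importlib.metadata`. The helper name `installed_version` is an illustrative choice, not part of ekorpkit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if ekorpkit is installed, otherwise None.
print(installed_version("ekorpkit"))
```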
The eKorpkit Corpus
The eKorpkit Corpus is a large, diverse, bilingual (ko/en) language modelling dataset.
Citation
@software{lee_2022_6497226,
author = {Young Joon Lee},
title = {eKorpkit: English Korean Corpus Toolkit},
month = apr,
year = 2022,
publisher = {Zenodo},
doi = {10.5281/zenodo.6497226},
url = {https://doi.org/10.5281/zenodo.6497226}
}
@software{lee_2022_ekorpkit,
author = {Young Joon Lee},
title = {eKorpkit: English Korean Corpus Toolkit},
month = apr,
year = 2022,
publisher = {GitHub},
url = {https://github.com/entelecheia/ekorpkit}
}
License
- eKorpkit is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license covers the eKorpkit package and all of its components.
- Each corpus adheres to its own license policy. Please check the license of the corpus before using it!
Source Distribution
ekorpkit-0.1.30.tar.gz (6.8 MB)
Built Distribution
Hashes for ekorpkit-0.1.30-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 49ee3caec23722888a1ba100c36479940f2fc57b08d9e2e6489f665beaea2556
MD5 | 6b5d51a9ec70a7ac806fffeff8081e40
BLAKE2b-256 | b32e50664f2e9179e92db51501e7a61889c08156ec16b74f4f3c028e0138e523
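The digests above can be checked against a downloaded file with Python's standard `hashlib`. This is a minimal sketch. The helper names `file_digests` and `matches` are illustrative, and the commented example assumes the wheel has been saved to the current directory under its published filename.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches(path, algorithm, expected_hex):
    """Return True if the file's digest for `algorithm` equals `expected_hex`."""
    return file_digests(path)[algorithm] == expected_hex.strip().lower()


# Example (assumes the wheel was downloaded to the current directory):
# matches(
#     "ekorpkit-0.1.30-py3-none-any.whl",
#     "SHA256",
#     "49ee3caec23722888a1ba100c36479940f2fc57b08d9e2e6489f665beaea2556",
# )
```

SHA256 is the digest to rely on for integrity checks. MD5 is included only because it appears in the table.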