A library to easily access kocohub datasets
koco
koco is a library to easily access kocohub datasets.
kocohub contains KOrean COrpus for natural language processing.
Installation
NOTE: The code is tested on Python 3.6.9.
from PyPI
pip install koco
from source
git clone https://github.com/inmoonlight/koco
cd koco
pip install .
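To confirm that either installation route worked, one option is to check whether Python can locate the package. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of koco.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "koco" is the import name used in this README.
    print("koco installed:", is_installed("koco"))
```

The check only looks the module up; it does not import it, so it stays cheap and has no side effects.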
Usage
Using koco is similar to nlp. The main methods are:
- koco.list_datasets(): list all available datasets and their modes in kocohub
- koco.load_dataset(dataset_name, mode): load a dataset in kocohub with a data-specific mode
example
>>> import koco
>>> koco.list_datasets()
{'korean-hate-speech': ['train_dev', 'unlabeled', 'test'],
'sae4k': ['train_dev', 'test']}
>>> train_dev = koco.load_dataset('korean-hate-speech', mode='train_dev')
>>> type(train_dev)
dict
>>> train_dev.keys()
dict_keys(['train', 'dev'])
>>> train_dev['train'][33]
{'comments': '2,30대 골빈여자들은 이 기사에 다 모이는건가ㅋㅋㅋㅋ 이래서 여자는 투표권 주면 안된다. 엠넷사전투표나 하고 살아야지 계집들은',
'contain_gender_bias': True,
'bias': 'gender',
'hate': 'hate',
'news_title': '"“8년째 연애 중”…‘인생술집’ 블락비 유권♥전선혜, 4살차 연상연하 커플"'}
>>> test = koco.load_dataset('korean-hate-speech', mode='test')
>>> type(test)
list
>>> test[33]
{'comments': '끝낼때도 됐지 요즘같은 분위기엔 성드립 잘못쳤다가 난리. 그동안 잘봤습니다',
'news_title': '[단독] ‘SNL 코리아’ 공식적인 폐지 확정…아름다운 종료'}
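As the session above shows, the return type depends on the mode: 'train_dev' gives a dict of splits, while 'test' gives a plain list of records. The sketch below is one way to walk both shapes uniformly and tally a label field. The helper names `iter_records` and `count_labels`, the "unsplit" split name, and the sample records are all hypothetical placeholders that only mirror the field names shown above; they are not part of koco and not real dataset rows.

```python
from collections import Counter


def iter_records(dataset, default_split="unsplit"):
    """Yield (split_name, record) pairs from a dict of splits or a flat list."""
    if isinstance(dataset, dict):
        for split_name, records in dataset.items():
            for record in records:
                yield split_name, record
    else:
        # A flat list has no split names, so a placeholder name is used.
        for record in dataset:
            yield default_split, record


def count_labels(dataset, key):
    """Count the values of `key` across all records, skipping records without it."""
    return Counter(
        record[key] for _, record in iter_records(dataset) if key in record
    )


# Placeholder data shaped like the outputs above (label values are made up).
sample_train_dev = {
    "train": [
        {"comments": "placeholder a", "hate": "hate"},
        {"comments": "placeholder b", "hate": "example_other"},
    ],
    "dev": [{"comments": "placeholder c", "hate": "example_other"}],
}
sample_test = [{"comments": "placeholder d"}]

print(count_labels(sample_train_dev, "hate"))
print(count_labels(sample_test, "hate"))
```

With koco installed, the same helpers should accept the result of koco.load_dataset('korean-hate-speech', mode='train_dev') in place of the placeholder data. Unlabeled records, such as those in the 'test' mode, are simply skipped by `count_labels`.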
Contributing to kocohub / koco
All Korean datasets with their publications or detailed documentation, as well as bug reports, bug fixes, enhancements, and ideas, are welcome.
Feel free to ask questions via issues. Please tag each issue with an appropriate label.
File details
Details for the file koco-0.2.3.tar.gz.
File metadata
- Download URL: koco-0.2.3.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8232fe033f540368ead1e58642fbc26677005f1f28fd2ca96305139eb85f2c87 |
| MD5 | 03516634a3942a51c2dcd301682ceeec |
| BLAKE2b-256 | 5347f948e856528595fa181cd63d7b52d2da64a10025f42ea3809ace96c742e3 |
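The SHA256 digest listed under File hashes can be compared against a downloaded copy of the archive. The following is a minimal sketch using the standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the commented usage assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes koco-0.2.3.tar.gz was downloaded to the current directory):
# matches_expected(
#     "koco-0.2.3.tar.gz",
#     "8232fe033f540368ead1e58642fbc26677005f1f28fd2ca96305139eb85f2c87",
# )
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 5.5 kB archive but makes the helper reusable for larger downloads.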