
A library to easily access kocohub datasets


koco

koco is a library to easily access kocohub datasets.
kocohub hosts KOrean COrpora (Korean-language datasets) for natural language processing.

Installation

NOTE: The code is tested on Python 3.6.9

from PyPI

pip install koco

from source

git clone https://github.com/inmoonlight/koco
cd koco
pip install .

Usage

Using koco is similar to using the nlp library. The main functions are:

  • koco.list_datasets(): list all available datasets and their modes in kocohub
  • koco.load_dataset(dataset_name, mode): load a dataset from kocohub in one of its dataset-specific modes
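
Because each dataset exposes its own set of modes, it can help to check a requested mode against the listing before loading. The sketch below is only an illustration and is not part of koco: `check_mode` is a hypothetical helper, and the `available` dict is copied by hand from the example output in this README rather than fetched with `koco.list_datasets()`.

```python
# Hypothetical helper, not part of koco: validate a (dataset, mode) pair
# against a listing shaped like the return value of koco.list_datasets().

def check_mode(available, dataset_name, mode):
    """Return True if `mode` is listed for `dataset_name`, else raise ValueError."""
    if dataset_name not in available:
        raise ValueError(
            f"unknown dataset {dataset_name!r}; choose from {sorted(available)}"
        )
    if mode not in available[dataset_name]:
        raise ValueError(
            f"{dataset_name!r} has no mode {mode!r}; "
            f"choose from {available[dataset_name]}"
        )
    return True


# Listing copied from the example output in this README (assumed still current).
available = {
    "korean-hate-speech": ["train_dev", "unlabeled", "test"],
    "sae4k": ["train_dev", "test"],
}

print(check_mode(available, "korean-hate-speech", "test"))
```

In real use, `available` would presumably come from `koco.list_datasets()`, and a successful check would be followed by `koco.load_dataset(dataset_name, mode=mode)`.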

example

>>> import koco

>>> koco.list_datasets()
{'korean-hate-speech': ['train_dev', 'unlabeled', 'test'],
 'sae4k': ['train_dev', 'test']}

>>> train_dev = koco.load_dataset('korean-hate-speech', mode='train_dev')
>>> type(train_dev)
dict
>>> train_dev.keys()
dict_keys(['train', 'dev'])
>>> train_dev['train'][33]
{'comments': '2,30대 골빈여자들은 이 기사에 다 모이는건가ㅋㅋㅋㅋ 이래서 여자는 투표권 주면 안된다. 엠넷사전투표나 하고 살아야지 계집들은',
 'contain_gender_bias': True,
 'bias': 'gender',
 'hate': 'hate',
 'news_title': '"“8년째 연애 중”…‘인생술집’ 블락비 유권♥전선혜, 4살차 연상연하 커플"'}
 
>>> test = koco.load_dataset('korean-hate-speech', mode='test')
>>> type(test)
list
>>> test[33]
{'comments': '끝낼때도 됐지 요즘같은 분위기엔 성드립 잘못쳤다가 난리. 그동안 잘봤습니다',
 'news_title': '[단독] ‘SNL 코리아’ 공식적인 폐지 확정…아름다운 종료'}
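
As the example shows, the 'train_dev' mode returns a dict of splits while the 'test' mode returns a plain list, so downstream code has to handle both shapes. Below is a minimal sketch of one way to normalize them. `as_splits` and `label_counts` are hypothetical helpers, not part of koco, and the tiny records are made-up placeholders that only mirror the field names shown above (the label value 'none' is an assumption).

```python
from collections import Counter


def as_splits(loaded, default_name):
    """Return a dict of split name -> list of records for either shape."""
    if isinstance(loaded, dict):
        return dict(loaded)
    # A bare list (as in 'test' mode) becomes a single named split.
    return {default_name: list(loaded)}


def label_counts(records, field="hate"):
    """Count the values of `field`, skipping records that lack it."""
    return Counter(r[field] for r in records if field in r)


# Placeholder data standing in for koco.load_dataset(...) results.
train_dev = {
    "train": [
        {"comments": "placeholder a", "hate": "hate"},
        {"comments": "placeholder b", "hate": "none"},  # 'none' is assumed
    ],
    "dev": [{"comments": "placeholder c", "hate": "none"}],
}
test = [{"comments": "placeholder d", "news_title": "placeholder title"}]

splits = {**as_splits(train_dev, "train_dev"), **as_splits(test, "test")}
for name, records in splits.items():
    print(name, len(records), dict(label_counts(records)))
```

With real data, the placeholder objects would be replaced by the values returned from `koco.load_dataset`. Unlabeled records such as those in the test split simply produce an empty count.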

Contributing to kocohub / koco

Korean datasets accompanied by a publication or detailed documentation are welcome, as are bug reports, bug fixes, enhancements, and ideas 🎉
Feel free to ask questions via issues. Please use an appropriate label!

