A library to easily access kocohub datasets

koco

koco is a library to easily access kocohub datasets.
kocohub hosts Korean corpora (KOrean COrpus) for natural language processing.

Installation

NOTE: The code is tested on Python 3.6.9

from PyPI

pip install koco

from source

git clone https://github.com/inmoonlight/koco
cd koco
pip install .

Usage

Using koco is similar to using the nlp library. The two main functions are:

  • koco.list_datasets(): list all available datasets and their modes in kocohub
  • koco.load_dataset(dataset_name, mode): load dataset in kocohub with data-specific mode
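
Because the valid modes differ from dataset to dataset, it can help to check a request against the listing before loading. The following is a minimal sketch, assuming `koco.list_datasets()` returns a dict that maps each dataset name to a list of mode names, as the example output in this README shows. `check_mode` is a hypothetical helper written for illustration and is not part of koco.

```python
def check_mode(available, dataset_name, mode):
    """Return `mode` if it is listed for `dataset_name`, else raise ValueError.

    `available` is assumed to be a dict of {dataset name: [mode names]}.
    This helper is hypothetical; koco itself does not provide it.
    """
    if dataset_name not in available:
        raise ValueError(
            f"unknown dataset {dataset_name!r}; choose from {sorted(available)}"
        )
    if mode not in available[dataset_name]:
        raise ValueError(
            f"unknown mode {mode!r} for {dataset_name!r}; "
            f"choose from {available[dataset_name]}"
        )
    return mode


# Literal copy of the listing shown in this README.
# In real use this would be: available = koco.list_datasets()
available = {
    "korean-hate-speech": ["train_dev", "unlabeled", "test"],
    "sae4k": ["train_dev", "test"],
}

mode = check_mode(available, "korean-hate-speech", "train_dev")
# With koco installed, the validated mode can then be passed on:
# dataset = koco.load_dataset("korean-hate-speech", mode=mode)
```

Passing the listing in as an argument keeps the check independent of koco, so it can be exercised without installing the package.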

example

>>> import koco

>>> koco.list_datasets()
{'korean-hate-speech': ['train_dev', 'unlabeled', 'test'],
 'sae4k': ['train_dev', 'test']}

>>> train_dev = koco.load_dataset('korean-hate-speech', mode='train_dev')
>>> type(train_dev)
dict
>>> train_dev.keys()
dict_keys(['train', 'dev'])
>>> train_dev['train'][33]
{'comments': '2,30대 골빈여자들은 이 기사에 다 모이는건가ㅋㅋㅋㅋ 이래서 여자는 투표권 주면 안된다. 엠넷사전투표나 하고 살아야지 계집들은',
 'contain_gender_bias': True,
 'bias': 'gender',
 'hate': 'hate',
 'news_title': '"“8년째 연애 중”…‘인생술집’ 블락비 유권♥전선혜, 4살차 연상연하 커플"'}
 
>>> test = koco.load_dataset('korean-hate-speech', mode='test')
>>> type(test)
list
>>> test[33]
{'comments': '끝낼때도 됐지 요즘같은 분위기엔 성드립 잘못쳤다가 난리. 그동안 잘봤습니다',
 'news_title': '[단독] ‘SNL 코리아’ 공식적인 폐지 확정…아름다운 종료'}
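
Since each split in the example above is a plain list of dicts, ordinary Python tools are enough to summarize it. The sketch below counts the values of one field across records. `count_field` is a hypothetical helper, and the sample records are made-up placeholders that only mimic the field names shown above; they are not taken from the dataset.

```python
from collections import Counter


def count_field(records, field):
    """Count the values of `field` across a list of dict records.

    Records that lack the field (for example, unlabeled test records)
    are skipped rather than raising KeyError.
    """
    return Counter(r[field] for r in records if field in r)


# Placeholder records for illustration only. With koco installed,
# the real records would come from:
#   koco.load_dataset('korean-hate-speech', mode='train_dev')['train']
sample_train = [
    {"comments": "placeholder comment 1", "hate": "hate", "bias": "gender"},
    {"comments": "placeholder comment 2", "hate": "hate", "bias": "gender"},
    {"comments": "placeholder comment 3", "hate": "placeholder-label",
     "bias": "placeholder-bias"},
]

counts = count_field(sample_train, "hate")
for label, n in counts.most_common():
    print(label, n)
```

Skipping records without the field means the same helper can be pointed at the test split, whose records in the example above carry only `comments` and `news_title`.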

Contributing to kocohub / koco

Korean datasets accompanied by a publication or detailed documentation, bug reports, bug fixes, enhancements, and ideas are all welcome 🎉
Feel free to ask questions via issues. I recommend using an appropriate label!
