Korean Dataset Loading Interface
KODALI (Korean Dataset Loading Interface)
1. Supported datasets
Named entity recognition
- Modu Corpus (모두의 말뭉치, Korean Corpus)
- KLUE-NER (KLUE)
Relation extraction
- AI Hub: Korean knowledge-base relation data (한국어 지식기반 관계 데이터)
- KLUE-RE
2. How to use
Named entity recognition
dataset = Kodali(path={PATH}, task="ner", source={DATASET_SOURCE})
# Source List
# - Modu Corpus (모두의 말뭉치): "korean-corpus"
# - KLUE-NER: "klue"
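The call above takes a path, a task name, and one of the two listed source strings. A minimal sketch of checking those arguments before constructing the loader might look like the following. The helper `build_ner_kwargs`, the constant `NER_SOURCES`, and the path `data/klue-ner` are hypothetical names used only for illustration; they are not part of the kodali package.

```python
from typing import Dict

# Source strings listed above for the "ner" task.
NER_SOURCES = {"korean-corpus", "klue"}


def build_ner_kwargs(path: str, source: str) -> Dict[str, str]:
    """Hypothetical helper: validate `source` and return keyword arguments."""
    if source not in NER_SOURCES:
        raise ValueError(
            f"unknown NER source {source!r}; expected one of {sorted(NER_SOURCES)}"
        )
    return {"path": path, "task": "ner", "source": source}


# "data/klue-ner" is a placeholder path, not a location the package defines.
kwargs = build_ner_kwargs("data/klue-ner", "klue")
# With the package installed, these would be passed on as Kodali(**kwargs).
```

Validating the source string up front gives a clear error for a misspelled name instead of relying on however the loader itself reacts to it.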
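3. Data format
Named entity recognition
The named entity recognition datasets support the NerOutputs, NerData, and NerEntity formats.
NerEntity holds the information for a single entity, and NerData contains a sentence together with the entities in that sentence. NerOutputs holds a list of NerData and automatically computes the size of the data when data is supplied.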
NerOutputs:
  data: List[NerData]
  size: int = 0
NerData:
  sentence: str
  entities: List[NerEntity]
NerEntity:
  word: str
  label: str
  start_idx: int
  end_idx: int
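The three formats can be sketched as plain dataclasses to show how they nest. This is an illustration, not the package's actual implementation. It assumes that `size` is the number of NerData items and that `end_idx` is exclusive, neither of which is stated above. The sample sentence and the label "LOC" are also hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NerEntity:
    word: str
    label: str
    start_idx: int
    end_idx: int  # assumed exclusive in this sketch


@dataclass
class NerData:
    sentence: str
    entities: List[NerEntity]


@dataclass
class NerOutputs:
    data: List[NerData]
    size: int = 0

    def __post_init__(self) -> None:
        # Assumption: size is the number of NerData items, derived from data.
        self.size = len(self.data)


sentence = "Kim lives in Seoul"
entity = NerEntity(word="Seoul", label="LOC", start_idx=13, end_idx=18)
outputs = NerOutputs(data=[NerData(sentence=sentence, entities=[entity])])
```

Under the exclusive-end assumption, slicing the sentence with `sentence[entity.start_idx:entity.end_idx]` recovers the entity's `word`, which is a quick consistency check to run on loaded data.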