Preprocess NIKL (National Institute of Korean Language) corpus files
NIKL
National Institute of Korean Language (국립국어원) Language Information Sharing Center (언어정보나눔터): preprocessing code for corpus files
Installation
- PyPI
pip install nikl
- Source code
git clone https://github.com/study-artificial-intelligence/nikl.git
cd nikl
python setup.py install
Requirements
- beautifulsoup4 (install with pip install beautifulsoup4)
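The dependency is published on PyPI as `beautifulsoup4`, but Python code imports it as `bs4`. The sketch below is a small stdlib check you can run before preprocessing to confirm the dependency is present. The helper name `is_importable` is hypothetical and is not part of `nikl`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no module with that name is installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # beautifulsoup4 is installed under the import name "bs4".
    if is_importable("bs4"):
        print("beautifulsoup4 is installed")
    else:
        print("beautifulsoup4 is missing: run pip install beautifulsoup4")
```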
Getting Started
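- Place the NIKL Language Information Sharing Center corpus files you want to convert in the ./data folder.
- In the command below, choose from the options enclosed in square brackets [ ]. However, --filename must be given at least one filename.
- When the code runs successfully, <filename>_info.txt (with --info) and <filename>_content.txt (with --content) are created in the ./data/ folder.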
python main.py --filename [filename.txt] [--info] [--content] [--newline]
# ex1) python preprocess.py --filename test.txt --content --newline
# Creates data/test_content.txt containing only the paragraph content of test.txt, with newline characters preserved
# ex2) python preprocess.py --filename test2.txt test3.txt --info --content
# Saves the file information and paragraph content of test2.txt and test3.txt separately, creating data/test2_info.txt, data/test2_content.txt,
# data/test3_info.txt, and data/test3_content.txt
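The options above map onto a standard `argparse` interface. The sketch below is not the package's actual code. It rests on three assumptions: `--filename` takes one or more names, `--info`, `--content` and `--newline` are on/off flags, and outputs follow the `data/<name>_info.txt` and `data/<name>_content.txt` naming described above. The helpers `build_parser` and `output_paths` are hypothetical.

```python
import argparse
from pathlib import Path
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Declare the command-line options described in Getting Started."""
    parser = argparse.ArgumentParser(description="Preprocess NIKL corpus files")
    # nargs="+" rejects an empty --filename, matching the "at least one file" rule.
    parser.add_argument("--filename", nargs="+", required=True,
                        help="one or more corpus files placed in ./data")
    parser.add_argument("--info", action="store_true",
                        help="write file information to <name>_info.txt")
    parser.add_argument("--content", action="store_true",
                        help="write paragraph content to <name>_content.txt")
    parser.add_argument("--newline", action="store_true",
                        help="keep newline characters in the content output")
    return parser


def output_paths(filename: str, info: bool, content: bool,
                 data_dir: str = "data") -> List[Path]:
    """Return the output files one input would produce (assumed naming scheme)."""
    stem = Path(filename).stem  # "test2.txt" -> "test2"
    outputs = []
    if info:
        outputs.append(Path(data_dir) / f"{stem}_info.txt")
    if content:
        outputs.append(Path(data_dir) / f"{stem}_content.txt")
    return outputs


if __name__ == "__main__":
    # Mirrors ex2: two input files with both --info and --content.
    args = build_parser().parse_args(
        ["--filename", "test2.txt", "test3.txt", "--info", "--content"])
    for name in args.filename:
        print(name, "->", [str(p) for p in output_paths(name, args.info, args.content)])
```

With `nargs="+"`, argparse reports an error on its own when `--filename` is given no names. Because of this, the "at least one filename" rule needs no manual check.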
Download files
Source Distribution
nikl-1.1.0.tar.gz (5.3 kB)
Built Distribution
nikl-1.1.0-py3-none-any.whl (8.3 kB)
File details
Details for the file nikl-1.1.0.tar.gz
File metadata
- Download URL: nikl-1.1.0.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | e1fe104a51824aedd1d5ac3369ed95516196b129e42a6e3e6cc64365c686926a
MD5 | 2f328487a559dc0c37d9bdb584e1dfe6
BLAKE2b-256 | 3609b83ce0281bf21a2caa0d0d5e523a7809639a1e3f76324452c04efc48a779
File details
Details for the file nikl-1.1.0-py3-none-any.whl
File metadata
- Download URL: nikl-1.1.0-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 06b5b4b8671858156c66a7c9acf10eb1dc49be365c7ab1df63a3be92b64f2608
MD5 | c34de3123f16c6ba081e4eb5061a8be9
BLAKE2b-256 | 9b7e9a6390764052644533308fb3f75099d51e49127fd0e3307d7f979a5392e3
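A short stdlib script can check a downloaded distribution against the SHA256 digests listed for nikl 1.1.0. This is a minimal sketch that assumes the files were downloaded into the current directory. The names `sha256_of` and `EXPECTED` are illustrative and are not part of `nikl`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps reading until end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests published for the nikl 1.1.0 distributions.
EXPECTED = {
    "nikl-1.1.0.tar.gz": "e1fe104a51824aedd1d5ac3369ed95516196b129e42a6e3e6cc64365c686926a",
    "nikl-1.1.0-py3-none-any.whl": "06b5b4b8671858156c66a7c9acf10eb1dc49be365c7ab1df63a3be92b64f2608",
}


if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        if Path(name).exists():
            status = "OK" if sha256_of(name) == expected else "MISMATCH"
            print(f"{name}: {status}")
        else:
            print(f"{name}: not found in the current directory")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size. That matters little for these small packages, but it is the usual pattern for hashing files.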