Preprocess NIKL (National Institute of Korean Language) corpus files

Development Status
- 5 - Production/Stable
License
- OSI Approved :: MIT License
Natural Language
- Korean
Operating System
Programming Language
- Python :: 3

Project description

NIKL

Preprocessing code for corpus files from the National Institute of Korean Language (국립국어원) Language Information Sharing Center (언어정보나눔터).

Installation

PyPI
```
pip install nikl
```

Source Code

git clone https://github.com/study-artificial-intelligence/nikl.git
cd nikl
python setup.py install

Requirements

beautifulsoup4 (install with pip install beautifulsoup4)

Getting Started

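Place the National Institute of Korean Language corpus files you want to convert in the ./data folder.
In the command below, choose the options you need from those enclosed in square brackets [ ].
Note that --filename must be given at least one file name.
When the code runs successfully, <filename>_info.txt and <filename>_content.txt are created in the ./data/ folder.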

python main.py --filename [filename.txt] [--info] [--content] [--newline]

# ex1) python preprocess.py --filename test.txt --content --newline
#      Extracts only the paragraph content from test.txt, keeping newline characters, and creates data/test_content.txt
# ex2) python preprocess.py --filename test2.txt test3.txt --info --content
#      Saves the file information and the paragraph content of test2.txt and test3.txt separately,
#      creating data/test2_info.txt, data/test2_content.txt, data/test3_info.txt, and data/test3_content.txt
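The output naming shown in the examples above can be sketched as follows. This is only an illustration of the naming rule described here, not the package's own code; the function name `output_paths` and its parameters are hypothetical.

```python
from pathlib import Path


def output_paths(filename, info=False, content=False, data_dir="data"):
    """Return the output files expected for one input corpus file.

    Hypothetical helper: it mirrors the <filename>_info.txt and
    <filename>_content.txt naming described in this README.
    """
    stem = Path(filename).stem  # "test2.txt" -> "test2"
    paths = []
    if info:
        paths.append(Path(data_dir) / f"{stem}_info.txt")
    if content:
        paths.append(Path(data_dir) / f"{stem}_content.txt")
    return paths


# Mirrors ex2: two input files, both --info and --content requested.
for name in ["test2.txt", "test3.txt"]:
    print([p.as_posix() for p in output_paths(name, info=True, content=True)])
```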

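filename: Enter one or more file names in the form filename.txt. Because of the nature of the National Institute of Korean Language corpus files, only txt files are supported.
info: Whether to output the overall information of the file. The default is False.
content: Whether to output the content of the file. The default is False.
newline: Whether to insert newline characters ('\n') when preprocessing the body text. When they are inserted, the output is written paragraph by paragraph. The default is False.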
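The options described above map naturally onto Python's standard `argparse` module. The following is a minimal sketch under that assumption; the actual parser in the package may be written differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser matching the options described in this README (sketch)."""
    parser = argparse.ArgumentParser(
        description="Preprocess NIKL corpus files (illustrative sketch)"
    )
    # One or more .txt file names are required.
    parser.add_argument("--filename", nargs="+", required=True,
                        help="one or more corpus file names, e.g. test.txt")
    # Each flag is False unless it is given on the command line.
    parser.add_argument("--info", action="store_true",
                        help="output the overall file information")
    parser.add_argument("--content", action="store_true",
                        help="output the file content")
    parser.add_argument("--newline", action="store_true",
                        help="insert '\\n' so output is written per paragraph")
    return parser


# Parse the arguments from ex2 without touching sys.argv.
args = build_parser().parse_args(
    ["--filename", "test2.txt", "test3.txt", "--info", "--content"]
)
print(args.filename, args.info, args.content, args.newline)
```

With `nargs="+"` and `required=True`, omitting `--filename` or passing it without a value makes the parser exit with a usage error, which matches the requirement that at least one file name be supplied.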

Release history

- 1.1.0 (this version): Apr 20, 2020
- 1.0.0.1: Apr 20, 2020
- 1.0.0: Apr 17, 2020

Download files

Source Distribution

nikl-1.1.0.tar.gz (5.3 kB)

Uploaded Apr 20, 2020 Source

Built Distribution

nikl-1.1.0-py3-none-any.whl (8.3 kB)

Uploaded Apr 20, 2020 Python 3

File details

Details for the file nikl-1.1.0.tar.gz.

File metadata

Download URL: nikl-1.1.0.tar.gz
Upload date: Apr 20, 2020
Size: 5.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.10

File hashes

Hashes for nikl-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e1fe104a51824aedd1d5ac3369ed95516196b129e42a6e3e6cc64365c686926a`
MD5	`2f328487a559dc0c37d9bdb584e1dfe6`
BLAKE2b-256	`3609b83ce0281bf21a2caa0d0d5e523a7809639a1e3f76324452c04efc48a779`
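
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical, and it assumes the archive has been saved locally under its original name.

```python
import hashlib

# SHA256 digest listed above for nikl-1.1.0.tar.gz.
EXPECTED_SHA256 = "e1fe104a51824aedd1d5ac3369ed95516196b129e42a6e3e6cc64365c686926a"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

Calling `verify("nikl-1.1.0.tar.gz")` in the download directory should return True for an intact copy of the source distribution.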

File details

Details for the file nikl-1.1.0-py3-none-any.whl.

File metadata

Download URL: nikl-1.1.0-py3-none-any.whl
Upload date: Apr 20, 2020
Size: 8.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.10

File hashes

Hashes for nikl-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`06b5b4b8671858156c66a7c9acf10eb1dc49be365c7ab1df63a3be92b64f2608`
MD5	`c34de3123f16c6ba081e4eb5061a8be9`
BLAKE2b-256	`9b7e9a6390764052644533308fb3f75099d51e49127fd0e3307d7f979a5392e3`
