connlp
A collection of Python code for analyzing text data in the construction industry.
It mainly reconstitutes pre-existing Python libraries for Natural Language Processing (NLP).
Project Information
- Supported by C!LAB (@Seoul Nat'l Univ.)
Contributors
- Seonghyeon Boris Moon (blank54@snu.ac.kr, https://github.com/blank54/)
- Gitaek Lee (lgt0427@snu.ac.kr)
- Taeyeon Chang (jgwoon1838@snu.ac.kr, a.k.a. Kowoon Chang)
- Sehwan Chung (hwani751@snu.ac.kr)
Initialize
Setup
pip install connlp
Test
If the code below runs without errors, connlp has been installed successfully.
from connlp.test import hello
hello()
# 'Helloworld'
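As an alternative check that does not execute any connlp code, the standard library can report whether the package is findable and which version pip installed. The helper names `installed_version` and `is_importable` below are hypothetical and not part of connlp. The sketch assumes Python 3.8+ (for `importlib.metadata`) and that the distribution name and the top-level module name are both `connlp`.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_importable(module_name: str) -> bool:
    """True if a top-level module can be located on sys.path (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumes both the distribution and the module are named "connlp".
    print("connlp importable:", is_importable("connlp"))
    print("connlp version:", installed_version("connlp"))
```

If `installed_version("connlp")` returns None, `pip install connlp` most likely targeted a different interpreter than the one running the check.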
Preprocess
The preprocessing module supports English and Korean.
NOTE: There are currently no plans to support other languages (as of 2021.04.02.).
Normalizer
Normalizer normalizes the input text by eliminating unwanted ("trash") characters and retaining only numbers, letters, and punctuation marks. As the example shows, the output is also lowercased.
from connlp.preprocess import Normalizer
normalizer = Normalizer()
normalizer.normalize(text='I am a boy!')
# 'i am a boy'
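The kind of normalization described above can be approximated with a single regular expression. The following is a minimal sketch, not connlp's implementation: the function `normalize_sketch` and its `keep_punct` allow-list are hypothetical, and connlp's actual set of retained punctuation is not documented here. With the default empty allow-list it drops all punctuation, which matches the documented output for 'I am a boy!'. Python's Unicode-aware `\w` also keeps Hangul, so the same pattern works for Korean text.

```python
import re


def normalize_sketch(text: str, keep_punct: str = "") -> str:
    """Hypothetical normalizer (NOT connlp's code).

    Keeps Unicode letters (including Hangul), digits, whitespace, and any
    characters listed in `keep_punct`; everything else becomes a space.
    """
    allowed = re.escape(keep_punct)
    # Replace disallowed characters with a space; "_" is matched by \w,
    # so it is removed explicitly by the alternation.
    cleaned = re.sub(rf"[^\w\s{allowed}]|_", " ", text)
    # Collapse runs of whitespace and lowercase the result.
    return " ".join(cleaned.split()).lower()


print(normalize_sketch("I am a boy!"))                      # i am a boy
print(normalize_sketch("Slab @ 3.5m!!", keep_punct="."))    # slab 3.5m
print(normalize_sketch("콘크리트 타설!"))                    # 콘크리트 타설
```

Replacing removed characters with a space rather than deleting them avoids gluing neighbouring words together (e.g. "wall/slab" becomes "wall slab", not "wallslab").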
EnglishTokenizer
EnglishTokenizer tokenizes English input text by splitting on word spacing (whitespace); punctuation stays attached to the neighbouring word.
N-gram-based tokenization is in preparation.
from connlp.preprocess import EnglishTokenizer
tokenizer = EnglishTokenizer()
tokenizer.tokenize(text='I am a boy!')
# ['I', 'am', 'a', 'boy!']
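Spacing-based tokenization amounts to a whitespace split, and an n-gram step can be layered on top of its output. The sketch below is illustrative only: `tokenize_sketch` and `word_ngrams` are hypothetical helpers, not connlp's API, and treating n-grams as word-level (rather than, say, character-level) is an assumption about a feature that connlp lists as still in preparation.

```python
from typing import List


def tokenize_sketch(text: str) -> List[str]:
    """Split on runs of whitespace; punctuation stays attached to words."""
    # str.split() with no argument splits on any whitespace and drops empty strings.
    return text.split()


def word_ngrams(tokens: List[str], n: int) -> List[str]:
    """Hypothetical word n-gram helper: join each window of n consecutive tokens."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # There are len(tokens) - n + 1 windows; none if n exceeds the token count.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize_sketch("I am a boy!")
print(tokens)                  # ['I', 'am', 'a', 'boy!']
print(word_ngrams(tokens, 2))  # ['I am', 'am a', 'a boy!']
```

Running the Normalizer before tokenizing would strip the trailing "!" so that "boy" and "boy!" are not counted as different tokens.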