

Project description

connlp

A collection of Python code for analyzing text data in the construction industry.
It mainly repackages existing Python libraries for Natural Language Processing (NLP).

Project Information

  • Supported by C!LAB (@Seoul Nat'l Univ.)

Contributors

Initialize

Setup

pip install connlp

Test

If the code below runs with no error, connlp is installed successfully.

from connlp.test import hello
hello()

# 'Helloworld'
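
As a complementary check that does not rely on connlp.test, the standard library can report whether a package is importable and which version is installed. The following is a minimal sketch, assuming Python 3.8 or later; the helper names is_installed and installed_version are hypothetical and are not part of connlp.

```python
import importlib.util
from importlib import metadata


def is_installed(package):
    """Return True if `package` can be located by the import system."""
    return importlib.util.find_spec(package) is not None


def installed_version(package):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    # Hypothetical usage: both lines only report what is found locally.
    print(is_installed('connlp'))
    print(installed_version('connlp'))
```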

Preprocess

The preprocessing module supports English and Korean.
NOTE: There are currently no plans to support other languages (as of 2021.04.02.).

Normalizer

Normalizer normalizes the input text by removing unwanted characters while retaining numbers, letters, and punctuation marks.

from connlp.preprocess import Normalizer
normalizer = Normalizer()

normalizer.normalize(text='I am a boy!')

# 'i am a boy'
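
To make the idea of normalization concrete, here is a minimal standalone sketch built on the standard re module. It is not connlp's implementation. It assumes, based only on the sample output above, that the text is lowercased and that characters such as '!' are dropped; the function name normalize_sketch and the exact character set it keeps (digits, Latin letters, Hangul syllables, whitespace) are hypothetical.

```python
import re

# Assumption: keep digits, lowercase Latin letters, Hangul syllables, and whitespace.
_UNWANTED = re.compile(r'[^0-9a-z\uac00-\ud7a3\s]')
_SPACES = re.compile(r'\s+')


def normalize_sketch(text):
    """Lowercase the text, replace unwanted characters with spaces, and tidy spacing."""
    lowered = text.lower()
    cleaned = _UNWANTED.sub(' ', lowered)
    # Collapse runs of whitespace left behind by the removed characters.
    return _SPACES.sub(' ', cleaned).strip()


if __name__ == '__main__':
    print(normalize_sketch('I am a boy!'))
```

Replacing unwanted characters with a space rather than deleting them keeps neighboring words from being fused together, which matters when the text is tokenized on whitespace afterwards.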

EnglishTokenizer

EnglishTokenizer tokenizes English input text by splitting it on whitespace.
N-gram-based tokenization is in preparation.

from connlp.preprocess import EnglishTokenizer
tokenizer = EnglishTokenizer()

tokenizer.tokenizer(text='I am a boy!')

# ['I', 'am', 'a', 'boy!']
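
The whitespace-based splitting described above, together with one possible form of n-gram tokenization, can be sketched in plain Python. This is only an illustration and not connlp's code: the function names tokenize_sketch and ngrams_sketch are hypothetical, and the n-gram behavior shown here is an assumption about a feature the library describes as still in preparation.

```python
def tokenize_sketch(text):
    """Split text on runs of whitespace; punctuation stays attached to words."""
    return text.split()


def ngrams_sketch(tokens, n):
    """Return contiguous n-grams, each joined into a single space-separated string."""
    if n < 1:
        raise ValueError('n must be a positive integer')
    # range() is empty when there are fewer than n tokens, so the result is [].
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == '__main__':
    tokens = tokenize_sketch('I am a boy!')
    print(tokens)
    print(ngrams_sketch(tokens, 2))
```

Because the split is purely on whitespace, 'boy!' keeps its exclamation mark, which is why running the Normalizer before tokenizing is usually worthwhile.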

Download files

  • Source Distribution: connlp-0.0.7.tar.gz (6.0 kB)
  • Built Distribution: connlp-0.0.7-py3-none-any.whl (11.4 kB, Python 3)
