Simple NLP pipeline

Project description

SWT-NLP PACKAGE

PACKAGE INSTALLATION

pip install swt-nlp

KEYTERM EXTRACTION

OBJECTIVE

  • extract new keyterms from a corpus

DEMO

Demo code for keyterm extraction:

from swt.nlp.basis import keyterm_extractor
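# Mockup comes from the project's test suite, so this import probably needs a source checkout;
# any list of plain-text strings can serve as the corpus instead
from tests.keyterm_extraction.test_keyterm_extraction_fit_modules import Mockup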

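# the corpus is a list of plain-text strings (Thai in this example)
small_content = Mockup.small_corpus()
# small_content[:5] = [
# 'อยากกระโดดน้ำที่แม่น้ำโขง',  ("I want to jump into the water at the Mekong River")
# 'แม่น้ำที่จังหวัดกาญจนบุรีนี่สุดยอดมาก',  ("The river in Kanchanaburi province is really great")
# 'เหล้ามีหลายยี่ห้อ แสงโสม แม่น้ำโขงหรืออะไรก็มีหมดเลย',  ("There are many liquor brands: Sangsom, Mekhong, or whatever, they have them all")
# 'ข้อความนี้เกี่ยวกับชิมช็อปใช้',  ("This message is about Chim Shop Chai")
# 'รัฐบาลผลักดันชิมช็อปใช้มากขึ้น']  ("The government is pushing Chim Shop Chai harder")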

# extract new terms with the default tokenizer
kt = keyterm_extractor()
# To use a custom tokenizer, pass your own callable tokenizer function.
# This example uses pythainlp's word_tokenize with keep_whitespace=False:
# from pythainlp.tokenize import word_tokenize
# custom_tokenizer = lambda t: word_tokenize(t, keep_whitespace=False)
# kt = keyterm_extractor(tokenizer=custom_tokenizer)
kt.fit(small_content)
new_terms = kt.extract()
# new_terms = ['ชิมช็อปใช้', 'แม่น้ำโขง']  ("Chim Shop Chai", "Mekong River")
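The commented-out lines in the demo suggest that the `tokenizer` argument takes a callable that receives a string and returns its tokens. As a minimal sketch under that assumption, the hypothetical `simple_tokenizer` below shows the shape such a callable might have. It is not part of swt-nlp, and a regex split like this is only an illustration: Thai text has no spaces between words, so a real Thai corpus needs a proper word segmenter such as pythainlp.

```python
import re
from typing import List


def simple_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: return runs of word characters.

    Whitespace and punctuation are dropped, loosely mirroring the
    keep_whitespace=False behaviour mentioned in the demo.
    """
    return re.findall(r"\w+", text)


tokens = simple_tokenizer("simple nlp pipeline")
print(tokens)

# Assumed usage, mirroring the commented-out lines in the demo:
# kt = keyterm_extractor(tokenizer=simple_tokenizer)
```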

HOW TO BUILD AND PUBLISH THE PACKAGE TO PYPI

prerequisites

pip install setuptools wheel tqdm twine

build and upload package

# build the source distribution (.tar.gz) into dist/
python setup.py sdist
# upload the package to the PyPI server
python -m twine upload dist/{package.tar.gz} --verbose

install package

# install or upgrade to the latest version
pip install swt-nlp --upgrade
# install a specific version, bypassing the pip cache
pip install swt-nlp==0.0.11 --no-cache-dir
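After installing, it can be useful to confirm which version pip actually placed in the environment. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical and introduced only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed swt-nlp version, or None if it is not installed.
print(installed_version("swt-nlp"))
```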

install package from a wheel

# build the wheel into dist/
python setup.py bdist_wheel

# install the package from the wheel
# add --force-reinstall if the same version is already installed
pip install dist/{package.whl}

Download files

Source Distribution

swt-nlp-0.0.57.tar.gz (26.6 kB)

File details

Details for the file swt-nlp-0.0.57.tar.gz.

File metadata

  • Download URL: swt-nlp-0.0.57.tar.gz
  • Upload date:
  • Size: 26.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for swt-nlp-0.0.57.tar.gz
Algorithm     Hash digest
SHA256        9f3c5a7fc0717730ef0f629631605aad1571b7996646a6fb9e0c8302c4b5296f
MD5           126950da74959429bc549a239f67fac7
BLAKE2b-256   debbc3a81fe353c73d7d7239c67f0b88a49adda27738e59afedc878af61ff1bb
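
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches_expected` are hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for swt-nlp-0.0.57.tar.gz
EXPECTED_SHA256 = "9f3c5a7fc0717730ef0f629631605aad1571b7996646a6fb9e0c8302c4b5296f"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()


archive = Path("swt-nlp-0.0.57.tar.gz")  # assumed local download location
if archive.exists():
    print("hash OK" if matches_expected(archive) else "hash MISMATCH")
```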
