Simple NLP pipeline

SWT-NLP PACKAGE

PACKAGE INSTALLATION

pip install swt-nlp

KEYTERM EXTRACTION

OBJECTIVE

  • Extract new key terms from a corpus.

DEMO

demo code for keyterm extraction

from swt.nlp.basis import keyterm_extractor
from tests.keyterm_extraction.test_keyterm_extraction_fit_modules import Mockup

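# the corpus is a list of plain-text strings (Thai in this mockup; English glosses added as trailing comments)
small_content = Mockup.small_corpus()
# small_content[:5] = [
# 'อยากกระโดดน้ำที่แม่น้ำโขง',  # "I want to jump into the water at the Mekong River"
# 'แม่น้ำที่จังหวัดกาญจนบุรีนี่สุดยอดมาก',  # "The river in Kanchanaburi province is really great"
# 'เหล้ามีหลายยี่ห้อ แสงโสม แม่น้ำโขงหรืออะไรก็มีหมดเลย',  # "Liquor comes in many brands: Sangsom, Mekong, whatever, they have it all"
# 'ข้อความนี้เกี่ยวกับชิมช็อปใช้',  # "This message is about Chim Shop Chai"
# 'รัฐบาลผลักดันชิมช็อปใช้มากขึ้น']  # "The government is pushing Chim Shop Chai harder"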

# extract new terms
kt = keyterm_extractor()
# Optional: pass a custom tokenizer (any callable that takes a text string).
# This example uses pythainlp's word_tokenize with keep_whitespace=False.
# from pythainlp.tokenize import word_tokenize
# custom_tokenizer = lambda t: word_tokenize(t, keep_whitespace=False)
# kt = keyterm_extractor(tokenizer=custom_tokenizer)
kt.fit(small_content)
new_terms = kt.extract()
# new_terms = ['ชิมช็อปใช้', 'แม่น้ำโขง']
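
The tokenizer hook in the demo takes a plain callable, so a tokenizer can be adapted before it is handed to the extractor. The sketch below is a hedged illustration, not part of swt-nlp: it assumes the extractor only needs a callable that maps a string to a list of string tokens (as the pythainlp example suggests), and the names `Tokenizer`, `drop_whitespace_tokens`, and `simple_tokenizer` are hypothetical.

```python
from typing import Callable, List

# Assumed tokenizer shape: a callable from text to a list of string tokens.
Tokenizer = Callable[[str], List[str]]


def drop_whitespace_tokens(base: Tokenizer) -> Tokenizer:
    """Wrap a tokenizer so empty and whitespace-only tokens are removed."""
    def tokenize(text: str) -> List[str]:
        return [tok for tok in base(text) if tok.strip()]
    return tokenize


# Stand-in base tokenizer for illustration only: it splits on single spaces,
# so repeated spaces produce empty tokens that the wrapper then drops.
simple_tokenizer = drop_whitespace_tokens(lambda text: text.split(" "))

print(simple_tokenizer("key  term extraction"))
```

If the assumption about the callable's shape holds, such a wrapper could be passed as `keyterm_extractor(tokenizer=simple_tokenizer)`, in the same way as `custom_tokenizer` in the demo.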

HOW TO BUILD AND PUBLISH A PACKAGE TO PYPI

prerequisite

pip install setuptools wheel tqdm twine

build and upload package

# preparing tar.gz package 
python setup.py sdist
# uploading package to pypi server
python -m twine upload dist/{package.tar.gz}  --verbose
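
The `{package.tar.gz}` placeholder stands for whatever archive the build step wrote into `dist/`. As a small convenience, the following sketch looks up the most recently modified archive so its path can be copied into the upload command. It is only an illustration: the helper name `newest_artifact` is hypothetical, and it assumes the default `dist/` output directory.

```python
from pathlib import Path
from typing import Optional


def newest_artifact(dist_dir: str = "dist", pattern: str = "*.tar.gz") -> Optional[Path]:
    """Return the most recently modified file in dist_dir matching pattern, or None."""
    root = Path(dist_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.glob(pattern) if p.is_file()]
    # max(..., default=None) covers the case where nothing has been built yet.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    artifact = newest_artifact()
    print(artifact if artifact else "no source distribution found in dist/")
```

Calling `newest_artifact(pattern="*.whl")` does the same lookup for a built wheel.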

install package

# install latest version
pip install swt-nlp --upgrade
# specific version with no cache
pip install swt-nlp==0.0.11  --no-cache-dir

install package by wheel

# build wheel 
python setup.py bdist_wheel

# install package by wheel 
# use --force-reinstall if needed
pip install dist/{package.whl}

Source Distribution

swt-nlp-0.0.55.tar.gz (26.0 kB)

File details

Details for the file swt-nlp-0.0.55.tar.gz.

File metadata

  • Download URL: swt-nlp-0.0.55.tar.gz
  • Upload date:
  • Size: 26.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for swt-nlp-0.0.55.tar.gz

Algorithm     Hash digest
SHA256        c9cc72b69ea3ea7ff0f19975f549418c09e5c943edff224d952acb83e0884c5b
MD5           ccc9b0f76c5deaacc3dd50f4ddc2abd2
BLAKE2b-256   2c1a31a1d6a39994c50eee179c3c5ef0b0df267c1b1ea8e612743ebd568a95cd
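
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names `sha256_of` and `matches_published_hash` are hypothetical, and the local path to the downloaded file is up to the reader.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for swt-nlp-0.0.55.tar.gz (copied from the listing above).
EXPECTED_SHA256 = "c9cc72b69ea3ea7ff0f19975f549418c09e5c943edff224d952acb83e0884c5b"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("swt-nlp-0.0.55.tar.gz")` should return `True` for an intact download of that release and `False` for any other file.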
