Simple NLP pipeline
Project description
SWT-NLP PACKAGE
PACKAGE INSTALLATION
pip install swt-nlp
KEYTERM EXTRACTION
OBJECTIVE
- extract new key terms from a corpus
DEMO
demo code for keyterm extraction
from swt.nlp.basis import keyterm_extractor
from tests.keyterm_extraction.test_keyterm_extraction_fit_modules import Mockup
# the corpus is a list of plain-text strings
small_content = Mockup.small_corpus()
# small_content[:5] = [
# 'อยากกระโดดน้ำที่แม่น้ำโขง',
# 'แม่น้ำที่จังหวัดกาญจนบุรีนี่สุดยอดมาก',
# 'เหล้ามีหลายยี่ห้อ แสงโสม แม่น้ำโขงหรืออะไรก็มีหมดเลย',
# 'ข้อความนี้เกี่ยวกับชิมช็อปใช้',
# 'รัฐบาลผลักดันชิมช็อปใช้มากขึ้น']
# extract new terms
kt = keyterm_extractor()
# - to use a custom tokenizer, pass your own callable tokenizer function
# - this example uses pythainlp's word_tokenize with keep_whitespace=False
# from pythainlp import word_tokenize
# custom_tokenizer = lambda t: word_tokenize(t, keep_whitespace=False)
# kt = keyterm_extractor(tokenizer=custom_tokenizer)
kt.fit(small_content)
new_terms = kt.extract()
# new_terms = ['ชิมช็อปใช้', 'แม่น้ำโขง']
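The commented-out lines in the demo suggest that `tokenizer` accepts any callable that takes one string and returns a list of tokens. The sketch below illustrates that assumed contract only. `simple_tokenizer` and the `Tokenizer` alias are hypothetical names, and whitespace splitting is a stand-in: Thai text is not whitespace-delimited, so a real word segmenter such as pythainlp's `word_tokenize` would be needed for the demo corpus.

```python
from typing import Callable, List


def simple_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: split on whitespace, dropping empty tokens."""
    return text.split()


# Assumed contract (inferred from the demo, not confirmed by the package docs):
# a tokenizer is a callable mapping one string to a list of string tokens.
Tokenizer = Callable[[str], List[str]]
tokenizer: Tokenizer = simple_tokenizer

print(tokenizer("river  in Kanchanaburi "))

# With swt-nlp installed, it would be passed the same way as in the demo:
# kt = keyterm_extractor(tokenizer=simple_tokenizer)
```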
HOW TO BUILD AND PUBLISH THE PACKAGE TO PYPI
prerequisites
pip install setuptools wheel tqdm twine
build and upload the package
# build the source distribution (tar.gz) into dist/
python setup.py sdist
# upload the package to the PyPI server
python -m twine upload dist/{package.tar.gz} --verbose
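The `dist/{package.tar.gz}` placeholder in the upload command stands for whichever archive the build step produced. As a rough sketch, the helper below looks up the most recently modified archive in `dist/` so its path can be pasted into the `twine upload` command. The function name `latest_artifact` and its defaults are assumptions made for illustration and are not part of swt-nlp or twine.

```python
from pathlib import Path
from typing import Optional


def latest_artifact(dist_dir: str = "dist", pattern: str = "*.tar.gz") -> Optional[Path]:
    """Return the most recently modified file in dist_dir matching pattern, or None."""
    directory = Path(dist_dir)
    if not directory.is_dir():
        return None
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    artifact = latest_artifact()
    print(artifact if artifact else "no source distribution found in dist/")
```

Calling `latest_artifact(pattern="*.whl")` would do the same lookup for a built wheel.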
install package
# install or upgrade to the latest version
pip install swt-nlp --upgrade
# install a specific version, bypassing the pip cache
pip install swt-nlp==0.0.11 --no-cache-dir
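After installing, it can be useful to confirm which version actually ended up in the environment. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution name `swt-nlp` is taken from the `pip install` commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None when swt-nlp is absent.
print(installed_version("swt-nlp"))
```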
install package from a wheel
# build the wheel into dist/
python setup.py bdist_wheel
# install the package from the built wheel
# add --force-reinstall to reinstall over an already-installed copy
pip install dist/{package.whl}
Download files
Source Distribution
swt-nlp-0.0.55.tar.gz (26.0 kB)
File details
Details for the file swt-nlp-0.0.55.tar.gz.
File metadata
- Download URL: swt-nlp-0.0.55.tar.gz
- Upload date:
- Size: 26.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | c9cc72b69ea3ea7ff0f19975f549418c09e5c943edff224d952acb83e0884c5b
MD5 | ccc9b0f76c5deaacc3dd50f4ddc2abd2
BLAKE2b-256 | 2c1a31a1d6a39994c50eee179c3c5ef0b0df267c1b1ea8e612743ebd568a95cd
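The published digests can be used to check a downloaded archive. The sketch below computes a file's SHA256 with the standard library's `hashlib` and compares it with the value listed above. The helper name `sha256_of` and the assumption that the archive sits in the current directory are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the SHA256 row listed for swt-nlp-0.0.55.tar.gz.
EXPECTED = "c9cc72b69ea3ea7ff0f19975f549418c09e5c943edff224d952acb83e0884c5b"

# Usage, assuming the archive was downloaded to the current directory:
# assert sha256_of("swt-nlp-0.0.55.tar.gz") == EXPECTED
```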