simple nlp pipeline
SWT-NLP PACKAGE
KEYTERM EXTRACTION
install package
pip install swt-nlp
DEMO
load demo content
from swt.nlp.basis import keyterm_extractor
from tests.keyterm_extraction.test_keyterm_extraction_fit_modules import Mockup
# the corpus is a list of plain-text strings
small_content = Mockup.small_corpus()
# small_content[:5] = [
# 'อยากกระโดดน้ำที่แม่น้ำโขง',
# 'แม่น้ำที่จังหวัดกาญจนบุรีนี่สุดยอดมาก',
# 'เหล้ามีหลายยี่ห้อ แสงโสม แม่น้ำโขงหรืออะไรก็มีหมดเลย',
# 'ข้อความนี้เกี่ยวกับชิมช็อปใช้',
# 'รัฐบาลผลักดันชิมช็อปใช้มากขึ้น']
# extract new terms
kt = keyterm_extractor()
# - to use a custom tokenizer, pass your own callable tokenizer function
# - this example uses word_tokenize from pythainlp with keep_whitespace=False
# from pythainlp.tokenize import word_tokenize
# custom_tokenizer = lambda t: word_tokenize(t, keep_whitespace=False)
# kt = keyterm_extractor(tokenizer=custom_tokenizer)
kt.fit(small_content)
new_terms = kt.extract()
# new_terms = ['ชิมช็อปใช้', 'แม่น้ำโขง']
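The commented lines above suggest that the tokenizer argument accepts any callable that takes a string. The sketch below shows what such a callable could look like. It assumes the extractor expects a list of string tokens back, which the source does not state. The name simple_tokenizer is hypothetical and not part of swt-nlp. It only suits whitespace-delimited text, so Thai input would still need a word segmenter such as pythainlp.

```python
import re
from typing import List

# Hypothetical tokenizer for illustration only; it is not part of swt-nlp.
# It keeps runs of word characters and single punctuation marks, and drops whitespace.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenizer(text: str) -> List[str]:
    """Split text into word and punctuation tokens (assumed return type: list of str)."""
    return _TOKEN_PATTERN.findall(text)


print(simple_tokenizer("NLP pipelines, made simple."))
# prints ['NLP', 'pipelines', ',', 'made', 'simple', '.']
```

Under that assumption, it would be passed the same way as the pythainlp example: keyterm_extractor(tokenizer=simple_tokenizer).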
HOW TO BUILD AND PUBLISH THE PACKAGE TO PYPI
prerequisites
pip install setuptools wheel tqdm twine
build and upload package
# build the source distribution (tar.gz) into dist/
python setup.py sdist
# uploading package to pypi server
python -m twine upload dist/{package.tar.gz} --verbose
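The {package.tar.gz} placeholder above has to be replaced with the actual file name produced by the build. As a rough sketch, the helper below picks the most recently modified archive in dist/ and prints the matching upload command. The function name latest_artifact and its parameters are hypothetical, and it assumes the build output lives in a local dist/ directory.

```python
from pathlib import Path
from typing import Optional


def latest_artifact(dist_dir: str, pattern: str = "*.tar.gz") -> Optional[Path]:
    """Return the most recently modified file in dist_dir matching pattern, or None."""
    directory = Path(dist_dir)
    if not directory.is_dir():
        return None
    candidates = list(directory.glob(pattern))
    if not candidates:
        return None
    # Modification time is used as a proxy for "the build that just ran".
    return max(candidates, key=lambda p: p.stat().st_mtime)


artifact = latest_artifact("dist")
if artifact is None:
    print("no source distribution found in dist/; run the build step first")
else:
    print(f"python -m twine upload {artifact} --verbose")
```

Passing pattern="*.whl" would select a wheel instead of a source distribution.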
install package
# install latest version
pip install swt-nlp --upgrade
# specific version with no cache
pip install swt-nlp==0.0.11 --no-cache-dir
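After installing, it can help to confirm which version actually ended up in the environment. A minimal sketch using the standard library's importlib.metadata follows. The helper name installed_version is hypothetical, and the distribution name swt-nlp is taken from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None when swt-nlp is absent.
print(installed_version("swt-nlp"))
```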
install package from a wheel
# build wheel
python setup.py bdist_wheel
# install the package from the built wheel
# add --force-reinstall to replace an already installed copy
pip install dist/{package.whl}