Simple NLP pipeline
Project description
SWT-NLP PACKAGE
PACKAGE INSTALLATION
pip install swt-nlp
KEYTERM EXTRACTION
OBJECTIVE
- extract new key terms from a corpus
DEMO
demo code for keyterm extraction
from swt.nlp.basis import keyterm_extractor
from tests.keyterm_extraction.test_keyterm_extraction_fit_modules import Mockup
# the corpus is a list of plain-text strings
small_content = Mockup.small_corpus()
# small_content[:5] = [
# 'อยากกระโดดน้ำที่แม่น้ำโขง',
# 'แม่น้ำที่จังหวัดกาญจนบุรีนี่สุดยอดมาก',
# 'เหล้ามีหลายยี่ห้อ แสงโสม แม่น้ำโขงหรืออะไรก็มีหมดเลย',
# 'ข้อความนี้เกี่ยวกับชิมช็อปใช้',
# 'รัฐบาลผลักดันชิมช็อปใช้มากขึ้น']
# extract new terms
kt = keyterm_extractor()
# To use a custom tokenizer, pass your own callable tokenizer function.
# This example uses pythainlp's word_tokenize with keep_whitespace=False:
# from pythainlp.tokenize import word_tokenize
# custom_tokenizer = lambda t: word_tokenize(t, keep_whitespace=False)
# kt = keyterm_extractor(tokenizer=custom_tokenizer)
kt.fit(small_content)
new_terms = kt.extract()
# new_terms = ['ชิมช็อปใช้', 'แม่น้ำโขง']
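The custom tokenizer option in the demo above takes a callable. As a minimal sketch of what such a callable could look like without pythainlp, the function below splits on whitespace. The name `whitespace_tokenizer` is hypothetical, and the assumption that the extractor expects a function from a string to a list of token strings is inferred from the `word_tokenize` example, not confirmed by the package.

```python
def whitespace_tokenizer(text):
    """Hypothetical tokenizer: split on runs of whitespace, dropping empty tokens."""
    return text.split()


# Whitespace itself never appears as a token, similar in spirit to
# keep_whitespace=False in the pythainlp example.
print(whitespace_tokenizer("rivers  and lakes "))

# Assumed usage, mirroring the commented lines in the demo:
# kt = keyterm_extractor(tokenizer=whitespace_tokenizer)
```

Whitespace splitting is not suitable for Thai text, which does not separate words with spaces. It only illustrates the shape of the callable.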
HOW TO BUILD AND UPLOAD A PACKAGE TO PYPI
prerequisites
pip install setuptools wheel tqdm twine
build and upload package
# build the tar.gz source distribution
python setup.py sdist
# uploading package to pypi server
python -m twine upload dist/{package.tar.gz} --verbose
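The `{package.tar.gz}` placeholder in the upload command stands for whichever archive the build step wrote to `dist/`. A small sketch for locating that file is shown below. The helper name `latest_artifact` is hypothetical, and it assumes the newest file by modification time is the one to upload.

```python
from pathlib import Path


def latest_artifact(dist_dir, suffix):
    """Return the most recently modified file in dist_dir whose name ends with suffix, or None."""
    root = Path(dist_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.iterdir() if p.is_file() and p.name.endswith(suffix)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    sdist = latest_artifact("dist", ".tar.gz")
    print(sdist if sdist else "no source distribution found in dist/")
```

The same helper can be called with `".whl"` to find a built wheel.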
install package
# install latest version
pip install swt-nlp --upgrade
# specific version with no cache
pip install swt-nlp==0.0.11 --no-cache-dir
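After installing or upgrading, it can help to confirm which version is actually present in the environment. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed swt-nlp version, or None when the package is absent.
print(installed_version("swt-nlp"))
```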
install package from a wheel
# build wheel
python setup.py bdist_wheel
# install package by wheel
# use --force-reinstall if needed
pip install dist/{package.whl}
Source Distribution
swt-nlp-0.0.57.tar.gz (26.6 kB)
File details
Details for the file swt-nlp-0.0.57.tar.gz.
File metadata
- Download URL: swt-nlp-0.0.57.tar.gz
- Upload date:
- Size: 26.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9f3c5a7fc0717730ef0f629631605aad1571b7996646a6fb9e0c8302c4b5296f
MD5 | 126950da74959429bc549a239f67fac7
BLAKE2b-256 | debbc3a81fe353c73d7d7239c67f0b88a49adda27738e59afedc878af61ff1bb
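A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard library's `hashlib`. The helper name `sha256_of` and the local file path in the commented usage are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for swt-nlp-0.0.57.tar.gz in the table above.
EXPECTED_SHA256 = "9f3c5a7fc0717730ef0f629631605aad1571b7996646a6fb9e0c8302c4b5296f"

# Assumes the archive was saved in the current directory:
# assert sha256_of("swt-nlp-0.0.57.tar.gz") == EXPECTED_SHA256
```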