CWordTM: Towards a Topic Modeling Toolkit from Low-Code to Pro-Code
A Topic Modeling Toolkit from Low-code to Pro-code
Installation
$ pip install cwordtm
Usage
cwordtm supports NLP pre-processing, text exploration (including Chinese text), text visualization (word clouds), and topic modeling (BERTopic, LDA and NMF). Import the submodules as follows:
from cwordtm import meta, util, ta, tm, viz, pivot, quot
version Submodule
Provides version information.
import cwordtm
print(cwordtm.__version__)
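When importing the package is undesirable (for example in a quick environment check), the installed version can also be read from the package metadata. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of cwordtm.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="cwordtm"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    print(installed_version() or "cwordtm is not installed")
```

This avoids loading the package and its dependencies just to report a version.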
meta Submodule
Provides functions to extract the source code of the cwordtm module and to add timing and code-showing features to all of its functions.
print(meta.get_module_info())
print(meta.get_submodule_info('viz', detailed=True))
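To illustrate how timing and code-showing features can be layered onto existing functions, here is a generic decorator sketch. It is not cwordtm's implementation: the decorator name `with_meta` and the sample function `count_words` are hypothetical, and the `code` flag is simplified to a boolean here.

```python
import functools
import inspect
import time


def with_meta(func):
    """Wrap func so callers may pass timing=True and/or code=True (illustrative flags)."""

    @functools.wraps(func)
    def wrapper(*args, timing=False, code=False, **kwargs):
        if code:
            # Show the wrapped function's source when it can be located.
            try:
                print(inspect.getsource(func))
            except (OSError, TypeError):
                print(f"# source of {func.__name__} is unavailable")
        start = time.perf_counter()
        result = func(*args, **kwargs)
        if timing:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} finished in {elapsed:.4f} s")
        return result

    return wrapper


@with_meta
def count_words(text):
    """Hypothetical sample function: count whitespace-separated words."""
    return len(text.split())
```

Calling `count_words("a b c", timing=True)` returns the normal result and additionally prints the elapsed time, so the extra keyword arguments never change what the wrapped function computes.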
quot Submodule
Provides functions to find the Old Testament (OT) source passages quoted in the prescribed New Testament (NT) Scripture.
cdf = util.load_word('cuv.csv')
crom8 = util.extract2(cdf, 'Rom 8')
quot.show_quot(crom8, lang='chi')
pivot Submodule
Provides a pivot table of the prescribed text.
cdf = util.load_word('cuv.csv')
pivot.stat(cdf, chi=True)
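The exact statistics reported by `pivot.stat` are not shown here. As a rough illustration of what a pivot table over Scripture text can look like, the sketch below groups a few made-up rows by book and counts verses, chapters and words. The column names (`book`, `chapter`, `text`) and the helper `pivot_counts` are assumptions for illustration only.

```python
from collections import defaultdict

# Made-up sample rows; the real column layout of the data files may differ.
rows = [
    {"book": "Gen", "chapter": 1, "text": "In the beginning"},
    {"book": "Gen", "chapter": 1, "text": "And the earth"},
    {"book": "Gen", "chapter": 2, "text": "Thus the heavens"},
    {"book": "Rom", "chapter": 8, "text": "There is therefore now"},
]


def pivot_counts(rows, index="book"):
    """Group rows by `index` and count verses, distinct chapters and words per group."""
    groups = defaultdict(lambda: {"verses": 0, "chapters": set(), "words": 0})
    for row in rows:
        group = groups[row[index]]
        group["verses"] += 1
        group["chapters"].add(row["chapter"])
        group["words"] += len(row["text"].split())
    # Replace each chapter set by its size so the result is plain numbers.
    return {
        key: {
            "verses": g["verses"],
            "chapters": len(g["chapters"]),
            "words": g["words"],
        }
        for key, g in groups.items()
    }


if __name__ == "__main__":
    for book, stats in pivot_counts(rows).items():
        print(book, stats)
```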
ta Submodule
Provides text analytics functions, including summarizing the prescribed text.
cdf = util.load_word('cuv.csv')
crom8 = util.extract2(cdf, 'Rom 8')
ta.summary_chi(crom8)
tm Submodule
Provides topic modeling functions, including LDA, NMF and BERTopic modeling.
lda = tm.lda_process("web.csv", eval=True, timing=True)
nmf = tm.nmf_process("web.csv", eval=True, code=1)
btm = tm.btm_process("cuv.csv", chi=True, cat='ot', eval=True)
btm = tm.btm_process("cuv.csv", chi=True, cat='nt', eval=True, code=2)
util Submodule
Provides functions for loading and preprocessing text.
df = util.load_word()
cdf = util.load_word('cuv.csv')
df.head()
cdf.head()
rom8 = util.extract2(df, 'Rom 8')
crom8 = util.extract2(cdf, 'Rom 8')
viz Submodule
Provides word cloud plotting for the prescribed text.
cdf = util.load_word('cuv.csv')
viz.chi_wordcloud(cdf)
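A word cloud sizes each word by how often it occurs, so the underlying data is a word-frequency table. The sketch below computes such a table with the standard library only; it is not how `viz` is implemented, the helper `word_frequencies` is hypothetical, and it tokenizes English text with a simple regular expression. Chinese text would first need a word segmentation step, which is omitted here.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset(), top_n=5):
    """Return up to top_n (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "The Spirit of life. The law of the Spirit of life set you free."
    for word, count in word_frequencies(sample, stopwords={"the", "of"}, top_n=3):
        print(word, count)
```

Filtering stopwords before counting keeps very common function words from dominating the plot.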
Demo
Usage demo files with output:
On BBC News: CWordTM_BBC.pdf
On Chinese Bible (CUV): CWordTM_CUV.pdf
Paper
For a more detailed overview, you can read the [demo paper](https://link.springer.com/chapter/10.1007/978-3-031-70242-6_4).
Documentation
cwordtm documentation is available at: https://cwordtm.readthedocs.io
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
License
cwordtm was created by Dr. Johnny Cheng. It is licensed under the terms of the MIT license.
Credits
cwordtm was created under the guidance of Jehovah, the Almighty God.
Download files
File details
Details for the file cwordtm-0.7.0.tar.gz.
File metadata
- Download URL: cwordtm-0.7.0.tar.gz
- Upload date:
- Size: 18.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0e15573023dd3d43340beaaa870616cc5b47d66d5114b4ceccf5315f4b6a0223
MD5 | e981ff9fe4a51e19e04e306928ffb830
BLAKE2b-256 | 14d7e0ff3d68d2322ab9ae11c65cf608605d4e3500bd0ca83253362878a560e9
File details
Details for the file cwordtm-0.7.0-py3-none-any.whl.
File metadata
- Download URL: cwordtm-0.7.0-py3-none-any.whl
- Upload date:
- Size: 18.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 206ae73860ebc85f71e0fcf3988619bbd577ea329830e60f47475aa7308b1644
MD5 | 83864c4a8bcbc90f8e6643cbbd15935a
BLAKE2b-256 | 9c5c8099a46839183f97057f5ba17a51a1419017d81d8698fa69929de6051289
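The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch with the standard library's `hashlib`; the helper names `sha256_of` and `matches` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("cwordtm-0.7.0.tar.gz", "0e15573023dd3d43340beaaa870616cc5b47d66d5114b4ceccf5315f4b6a0223")` should return `True` for an intact copy of the source distribution listed above.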