CWordTM: Towards a Topic Modeling Toolkit from Low-Code to Pro-Code

A Topic Modeling Toolkit from Low-code to Pro-code

Installation

$ pip install cwordtm

Usage

cwordtm can be used for NLP preprocessing, text exploration (including of Chinese text), text visualization (word clouds), and topic modeling (BERTopic, LDA, and NMF). Its submodules are imported as follows:

from cwordtm import meta, util, ta, tm, viz, pivot, quot

version Submodule

Provides the package version information.

import cwordtm
print(cwordtm.__version__)
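To read the version string without importing the package itself, the standard library's importlib.metadata can query the installed distribution metadata. The helper name installed_version is hypothetical and is not part of cwordtm; this is a minimal sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("cwordtm"))  # a version string, or None if cwordtm is not installed
```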

meta Submodule

Provides functions for extracting the source code of the cwordtm module and for adding timing and code-display features to all functions of the module.

print(meta.get_module_info())

print(meta.get_module_info(detailed=True))
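The meta description, together with the timing=True and code=1 arguments used in the tm examples, suggests a wrapper applied to every function of the module. The following is a minimal standard-library sketch of that idea, assuming the extra flags are keyword arguments that the wrapper consumes before calling the original function. add_features and instrument_module are hypothetical names; this is not cwordtm's actual implementation.

```python
import functools
import inspect
import time
import types


def add_features(func):
    """Wrap func so callers may pass extra keyword-only `timing` and `code` flags."""
    @functools.wraps(func)
    def wrapper(*args, timing=False, code=0, **kwargs):
        if code:
            # Show the wrapped function's source before running it.
            try:
                print(inspect.getsource(func))
            except (OSError, TypeError):
                print(f"<source of {func.__name__} unavailable>")
        start = time.perf_counter()
        result = func(*args, **kwargs)
        if timing:
            print(f"{func.__name__} took {time.perf_counter() - start:.4f} s")
        return result
    return wrapper


def instrument_module(module: types.ModuleType) -> list:
    """Wrap every function defined in `module` itself (imported ones are skipped)."""
    wrapped = []
    for name, obj in list(vars(module).items()):
        if inspect.isfunction(obj) and obj.__module__ == module.__name__:
            setattr(module, name, add_features(obj))
            wrapped.append(name)
    return wrapped


# Usage on a throwaway module built at runtime.
demo = types.ModuleType("demo")
exec("def add(a, b):\n    return a + b\n", demo.__dict__)
print(instrument_module(demo))      # ['add']
print(demo.add(2, 3, timing=True))  # a timing line, then 5
```

Making timing and code keyword-only in the wrapper means the original functions never see these flags, so the features can be added without changing any function signature. A function that already has its own timing or code parameter would have it shadowed by the wrapper.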

quot Submodule

Provides functions to extract the Old Testament (OT) Scripture that is the source of the quotations in a prescribed New Testament (NT) passage.

cdf = util.load_word('cuv.csv')
crom8 = util.extract2(cdf, 'Rom 8')

quot.show_quot(crom8, lang='chi')
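How quot matches NT passages to their OT sources is not documented here. One generic way to shortlist quotation candidates is to compare character n-grams, which suits Chinese text because it needs no word segmentation. The sketch below ranks candidate verses by the Jaccard similarity of their character bigrams; the function names and the {reference: text} input layout are hypothetical, and the example strings are toy data, not Scripture.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Set of character n-grams in text, with all whitespace removed first."""
    chars = "".join(text.split())
    return {chars[i:i + n] for i in range(len(chars) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_sources(nt_text: str, ot_verses: dict, n: int = 2, top: int = 3) -> list:
    """Rank candidate OT verses {ref: text} by n-gram similarity to an NT verse."""
    target = char_ngrams(nt_text, n)
    scored = [(ref, jaccard(target, char_ngrams(text, n)))
              for ref, text in ot_verses.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Toy strings only, not Scripture text.
candidates = {"A": "the quick brown fox", "B": "a slow green turtle"}
print(best_sources("quick brown fox jumps", candidates, top=1))  # [('A', 0.6)]
```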

pivot Submodule

Provides a pivot table of the prescribed text.

cdf = util.load_word('cuv.csv')

pivot.stat(cdf, chi=True)
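To illustrate the kind of summary a pivot table gives, this plain-Python sketch counts the distinct chapters and the verses of each book. The row layout (dicts with book, chapter and verse keys), the toy rows and the function pivot_counts are hypothetical; they are not necessarily what util.load_word returns or what pivot.stat computes.

```python
from collections import defaultdict


def pivot_counts(rows: list) -> dict:
    """Summarize verse rows as {book: (number_of_chapters, number_of_verses)}."""
    chapters = defaultdict(set)
    verses = defaultdict(int)
    for row in rows:
        chapters[row["book"]].add(row["chapter"])
        verses[row["book"]] += 1
    return {book: (len(chapters[book]), verses[book]) for book in verses}


# Toy rows with a hypothetical layout.
rows = [
    {"book": "Rom", "chapter": 8, "verse": 1},
    {"book": "Rom", "chapter": 8, "verse": 2},
    {"book": "Rom", "chapter": 9, "verse": 1},
    {"book": "Jude", "chapter": 1, "verse": 1},
]
for book, (n_chapters, n_verses) in pivot_counts(rows).items():
    print(f"{book:<6}{n_chapters:>4} chapters{n_verses:>5} verses")
```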

ta Submodule

Provides text analytics functions, including summarization of the prescribed text.

cdf = util.load_word('cuv.csv')
crom8 = util.extract2(cdf, 'Rom 8')

ta.summary_chi(crom8)
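The summarization method behind ta.summary_chi is not described here. As a point of reference, a common extractive approach scores each sentence by the average frequency of its words across the whole text and keeps the top-scoring sentences in their original order. The sketch assumes words are already delimited by spaces or punctuation; Chinese text would first need word segmentation (for example with a segmenter such as jieba), which this sketch omits. summarize is a hypothetical name.

```python
import re
from collections import Counter


def summarize(text: str, n_sentences: int = 2) -> list:
    """Return the n highest-scoring sentences of text, in their original order."""
    # Split after ASCII or full-width (Chinese) sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", text) if s.strip()]
    # Whole-text word frequencies; \w+ assumes words are already delimited.
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order for the top ones.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


print(summarize("Cats purr. Cats sleep a lot. Dogs bark.", n_sentences=1))  # ['Cats purr.']
```

Averaging rather than summing the frequencies keeps long sentences from winning just by being long, at the cost of favoring short sentences made of common words.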

tm Submodule

Provides topic modeling functions, including LDA, NMF, and BERTopic modeling.

lda = tm.lda_process("web.csv", eval=True, timing=True)

nmf = tm.nmf_process("web.csv", eval=True, code=1)

btm = tm.btm_process("cuv.csv", chi=True, cat='ot', eval=True)

btm = tm.btm_process("cuv.csv", chi=True, cat='nt', eval=True, code=2)
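What eval=True reports is not spelled out above; presumably it evaluates the fitted topics. One widely used topic-quality measure is UMass coherence (Mimno et al., 2011): for a topic whose top words are ordered w_1, ..., w_M, it sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs l < m, where D counts the documents containing the given words. The sketch computes it in plain Python on toy documents; it is not necessarily the metric cwordtm reports, and some libraries average over word pairs instead of summing.

```python
import math
from itertools import combinations


def umass_coherence(top_words: list, documents: list) -> float:
    """UMass coherence of one topic over tokenized documents.

    top_words must be ordered from most to least probable within the topic.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents that contain every one of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):
        higher, lower = top_words[i], top_words[j]  # i < j: higher is ranked first
        d_higher = doc_freq(higher)
        if d_higher == 0:
            raise ValueError(f"top word {higher!r} does not occur in any document")
        score += math.log((doc_freq(lower, higher) + 1) / d_higher)
    return score


docs = [["god", "love"], ["god", "love"], ["god", "law"]]
print(umass_coherence(["god", "love", "law"], docs))  # about -1.0986, i.e. log(1/3)
```

Scores closer to zero indicate top words that co-occur more often in the corpus; comparing this value across the LDA, NMF and BERTopic runs only makes sense on the same tokenized documents.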

util Submodule

Provides functions for loading text and for text preprocessing.

df = util.load_word()
cdf = util.load_word('cuv.csv')

df.head()
cdf.head()

rom8 = util.extract2(df, 'Rom 8')
crom8 = util.extract2(cdf, 'Rom 8')
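util.extract2 takes a reference string such as 'Rom 8'. Its accepted formats are not listed here, so the following is only an illustration of how such a reference could be parsed and used to filter verse rows. It assumes forms like 'Rom 8', 'Rom 8:28' and 'Rom 8:28-30', and rows that are dicts with book, chapter and verse keys; parse_ref, extract and Ref are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    book: str
    chapter: int
    first_verse: Optional[int] = None
    last_verse: Optional[int] = None


# Book (optionally with a leading digit, e.g. "1 Cor"), chapter, optional verse range.
REF_PATTERN = re.compile(r"^\s*(\d?\s?[A-Za-z]+)\s+(\d+)(?::(\d+)(?:-(\d+))?)?\s*$")


def parse_ref(text: str) -> Ref:
    """Parse 'Rom 8', 'Rom 8:28' or 'Rom 8:28-30' into a Ref."""
    m = REF_PATTERN.match(text)
    if m is None:
        raise ValueError(f"unrecognized reference: {text!r}")
    book, chapter, first, last = m.groups()
    first_v = int(first) if first else None
    last_v = int(last) if last else first_v
    return Ref(book, int(chapter), first_v, last_v)


def extract(rows: list, ref_text: str) -> list:
    """Keep rows whose book and chapter (and verse, if given) match the reference."""
    ref = parse_ref(ref_text)
    return [
        row for row in rows
        if row["book"] == ref.book and row["chapter"] == ref.chapter
        and (ref.first_verse is None or ref.first_verse <= row["verse"] <= ref.last_verse)
    ]


# Toy rows with a hypothetical layout.
rows = [
    {"book": "Rom", "chapter": 8, "verse": 1},
    {"book": "Rom", "chapter": 8, "verse": 28},
    {"book": "Rom", "chapter": 9, "verse": 1},
]
print(len(extract(rows, "Rom 8")))  # 2
```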

viz Submodule

Provides word cloud plotting of the prescribed text.

cdf = util.load_word('cuv.csv')

viz.chi_wordcloud(cdf)
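A word cloud draws each word at a size that grows with its frequency. The sketch below computes such weights from already-segmented tokens: it drops stopwords, keeps the most frequent words and maps their counts linearly onto a font-size range. The linear mapping, the size defaults and the function name cloud_weights are assumptions for illustration; plotting libraries typically apply their own scaling.

```python
from collections import Counter


def cloud_weights(tokens: list, stopwords: set, top: int = 50,
                  min_size: int = 10, max_size: int = 80) -> dict:
    """Map the `top` most frequent non-stopword tokens to font sizes."""
    counts = Counter(t for t in tokens if t not in stopwords)
    common = counts.most_common(top)
    if not common:
        return {}
    hi, lo = common[0][1], common[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (count - lo) * (max_size - min_size) / span)
        for word, count in common
    }


# Toy, pre-segmented Chinese tokens; "的" is treated as a stopword.
print(cloud_weights(["神", "愛", "神", "的", "世人", "神", "愛"], {"的"}))
```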

Demo

Usage demo files with output:

  1. On BBC News: CWordTM_BBC.pdf

  2. On Chinese Bible (CUV): CWordTM_CUV.pdf

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

cwordtm was created by Dr. Johnny Cheng. It is licensed under the terms of the MIT license.

Credits

cwordtm was created under the guidance of Jehovah, the Almighty God.

Download files

Source Distribution

cwordtm-0.6.1.tar.gz (18.2 MB)

Uploaded Source

Built Distribution

cwordtm-0.6.1-py3-none-any.whl (18.3 MB)

Uploaded Python 3

File details

Details for the file cwordtm-0.6.1.tar.gz.

File metadata

  • Download URL: cwordtm-0.6.1.tar.gz
  • Size: 18.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.9

File hashes

Hashes for cwordtm-0.6.1.tar.gz
Algorithm Hash digest
SHA256 aba67e94bf89709881947b1db6f457419faa60d04a296d21b87874c88524c716
MD5 750125f7e48f5b34079da5b84578c327
BLAKE2b-256 934e77c353121c721de007f66d03b0fb1dfd8cdf08da3802a02e2187faeefa41

File details

Details for the file cwordtm-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: cwordtm-0.6.1-py3-none-any.whl
  • Size: 18.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.9

File hashes

Hashes for cwordtm-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f336b3e3c3b6c98656232c2c79e8b0ea1e9ea0d529eeebee82e22470c5e3c2b6
MD5 62828f290bca4d914dd56e5e35991063
BLAKE2b-256 17d307991195a6d5d29df6fd00d07797b189d1ccbca1b121b90f20bd15d31777
