CWordTM: Towards a Topic Modeling Toolkit from Low-Code to Pro-Code

A Topic Modeling Toolkit from Low-code to Pro-code

Installation

$ pip install cwordtm

Usage

cwordtm can be used for NLP preprocessing, text exploration (including Chinese text), text visualization (word clouds), and topic modeling (BERTopic, LDA and NMF). Import its submodules as follows:

from cwordtm import meta, util, ta, tm, viz, pivot, quot

version Submodule

Provides version information for the installed package.

import cwordtm
print(cwordtm.__version__)
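
If you only need the version string and do not want to import the whole package, the standard library's `importlib.metadata` (Python 3.8+) can read it from the installed distribution's metadata. This is a small sketch that assumes the distribution name is `cwordtm`, as in the `pip install` command above; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


# Hypothetical helper; reads the version from installed package metadata.
def installed_version(dist_name="cwordtm"):
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())
```

Returning `None` instead of raising lets a script report a missing installation without a traceback.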

meta Submodule

Provides functions to extract the source code of the cwordtm module, and adds timing and code-showing features to all functions of the module.

print(meta.get_module_info())

print(meta.get_submodule_info('viz', detailed=True))
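
You can add timing and code display to every function of a module with a decorator. The following is a minimal sketch of that general technique, not cwordtm's own implementation. The helpers `add_timing_and_code` and `instrument_module` are hypothetical, and a toy `demo` module built with `types.ModuleType` stands in for a real one.

```python
import functools
import inspect
import time
import types


# Hypothetical helpers: a generic sketch, not cwordtm's implementation.
def add_timing_and_code(func, show_code=False):
    """Wrap func so each call reports its run time and, optionally, its source."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if show_code:
            try:
                print(inspect.getsource(func))
            except (OSError, TypeError):
                # Functions created with exec() or in C have no retrievable source.
                print(f"<source of {func.__name__} unavailable>")
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} finished in {time.perf_counter() - start:.6f} s")
        return result
    return wrapper


def instrument_module(module, show_code=False):
    """Replace every function defined in module with a wrapped version."""
    wrapped = []
    for name, obj in list(vars(module).items()):
        # Skip imported functions; only wrap those defined in this module.
        if inspect.isfunction(obj) and obj.__module__ == module.__name__:
            setattr(module, name, add_timing_and_code(obj, show_code))
            wrapped.append(name)
    return wrapped


# A toy module standing in for a real one.
demo = types.ModuleType("demo")
exec("def add(a, b):\n    return a + b\n", demo.__dict__)
print(instrument_module(demo))
print(demo.add(2, 3))
```

Because the wrapping is done in place with `setattr`, later lookups such as `demo.add(...)` use the instrumented version. `functools.wraps` keeps the original name and docstring.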

quot Submodule

Provides functions to extract the Old Testament (OT) Scriptures quoted in a prescribed New Testament (NT) passage.

cdf = util.load_word('cuv.csv')
crom8 = util.extract2(cdf, 'Rom 8')

quot.show_quot(crom8, lang='chi')
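
A common way to locate quoted source passages is to compare overlapping character n-grams between an NT verse and candidate OT verses. Character n-grams also work for Chinese, which has no spaces between words, so no word segmentation is needed. The sketch below illustrates only that generic technique and is not necessarily how `quot.show_quot` works. The helper names and threshold are hypothetical, and the "verses" are toy strings rather than Scripture text.

```python
# Hypothetical helpers: a generic n-gram matching sketch, not cwordtm's method.
def char_ngrams(text, n=3):
    """Set of character n-grams, ignoring spaces and punctuation (works for CJK too)."""
    chars = [c for c in text if c.isalnum()]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b|, or 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_sources(nt_verse, ot_verses, n=3, threshold=0.2):
    """Rank candidate OT verses by n-gram similarity to an NT verse."""
    target = char_ngrams(nt_verse, n)
    scored = []
    for ref, text in ot_verses.items():
        score = jaccard(target, char_ngrams(text, n))
        if score >= threshold:
            scored.append((ref, round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy strings standing in for verses; the references are hypothetical.
ot_verses = {
    "OT 1:1": "the quick brown fox jumps over the lazy dog",
    "OT 2:5": "an entirely unrelated sentence about rain",
}
nt_verse = "as it is written, the quick brown fox jumps over the dog"
print(find_sources(nt_verse, ot_verses))
```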

pivot Submodule

Provides a pivot table of the prescribed text.

cdf = util.load_word('cuv.csv')

pivot.stat(cdf, chi=True)
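
A pivot table of this kind typically counts records along two keys. The sketch below builds a book-by-chapter verse count from plain dictionaries. The column names `book` and `chapter`, the helper `pivot_count` and the sample rows are all hypothetical, and what `pivot.stat` actually tabulates may differ. With pandas, `pandas.crosstab` gives a similar count table.

```python
from collections import defaultdict


# Hypothetical helper: counts rows per (index, columns) pair.
def pivot_count(rows, index, columns):
    """Count rows for every (index, columns) pair, filling missing cells with 0."""
    table = defaultdict(lambda: defaultdict(int))
    col_keys = set()
    for row in rows:
        table[row[index]][row[columns]] += 1
        col_keys.add(row[columns])
    ordered_cols = sorted(col_keys)
    return {key: {col: table[key].get(col, 0) for col in ordered_cols} for key in table}


# Hypothetical verse records: one dict per verse.
rows = [
    {"book": "Rom", "chapter": 8, "verse": 1},
    {"book": "Rom", "chapter": 8, "verse": 2},
    {"book": "Rom", "chapter": 9, "verse": 1},
    {"book": "Gal", "chapter": 5, "verse": 1},
]

for book, counts in pivot_count(rows, "book", "chapter").items():
    print(book, counts)
```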

ta Submodule

Provides text analytics functions, including summarization of the prescribed text.

cdf = util.load_word('cuv.csv')
crom8 = util.extract2(cdf, 'Rom 8')

ta.summary_chi(crom8)
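
Extractive summarization often scores each sentence by the frequencies of its content words, then keeps the highest-scoring sentences in their original order. The sketch below shows that generic technique for English text with a tiny hypothetical stop-word list. It is not necessarily the method behind `ta.summary_chi`, and Chinese text would first need word segmentation.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny English stop-word list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "that"}


def content_words(text):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / (len(words) or 1)

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


text = "Spirit gives life. The Spirit of life sets free. Rain fell."
print(summarize(text, k=1))
```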

tm Submodule

Provides topic modeling functions, including LDA, NMF and BERTopic modeling.

lda = tm.lda_process("web.csv", eval=True, timing=True)

nmf = tm.nmf_process("web.csv", eval=True, code=1)

btm = tm.btm_process("cuv.csv", chi=True, cat='ot', eval=True)

btm = tm.btm_process("cuv.csv", chi=True, cat='nt', eval=True, code=2)
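
NMF factorizes a non-negative document-term matrix V into a document-topic matrix W and a topic-term matrix H, so that V ≈ WH. The largest entries in each row of H give that topic's top terms. The sketch below implements the classic Lee-Seung multiplicative updates in pure Python on a toy 4×4 count matrix to illustrate the technique. It is not `tm.nmf_process`, whose internals are not described here. For real corpora, a library implementation such as scikit-learn's `sklearn.decomposition.NMF` is the practical choice. All names and data are hypothetical.

```python
import random


# Hypothetical helpers: an illustrative NMF, not cwordtm's implementation.
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||V - WH||^2 with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        # H <- H * (W^T V) / (W^T W H), elementwise
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        # W <- W * (V H^T) / (W H H^T), elementwise
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


def top_terms(H, vocab, n=2):
    """Return the n highest-weighted terms of each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=row.__getitem__, reverse=True)[:n]]
            for row in H]


vocab = ["spirit", "grace", "rain", "harvest"]
# Toy document-term counts: rows are documents, columns follow vocab.
V = [
    [2, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 1, 1],
]
W, H = nmf(V, k=2)
print(round(frob_error(V, W, H), 4))
print(top_terms(H, vocab))
```

The multiplicative form keeps every entry non-negative automatically, because each update multiplies by a ratio of non-negative quantities.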

util Submodule

Provides functions for loading text and preprocessing it.

df = util.load_word()
cdf = util.load_word('cuv.csv')

df.head()
cdf.head()

rom8 = util.extract2(df, 'Rom 8')
crom8 = util.extract2(cdf, 'Rom 8')
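
A call such as `util.extract2(df, 'Rom 8')` selects one chapter by a book-and-chapter reference. The sketch below shows one way such a lookup could work on plain dictionaries. The record fields `book`, `chapter`, `verse` and `text`, the helpers `parse_ref` and `extract_chapter`, and the sample rows are hypothetical. They do not describe the actual layout of `cuv.csv` or the behaviour of `extract2`.

```python
import re


# Hypothetical helpers: not the actual extract2 implementation.
def parse_ref(ref):
    """Split a reference such as 'Rom 8' or '1 Cor 13' into (book, chapter)."""
    match = re.fullmatch(r"\s*(.+?)\s+(\d+)\s*", ref)
    if match is None:
        raise ValueError(f"unrecognised reference: {ref!r}")
    return match.group(1), int(match.group(2))


def extract_chapter(rows, ref):
    """Return the records whose book and chapter match the reference."""
    book, chapter = parse_ref(ref)
    return [row for row in rows if row["book"] == book and row["chapter"] == chapter]


# Hypothetical records with placeholder text.
rows = [
    {"book": "Rom", "chapter": 8, "verse": 1, "text": "(Rom 8:1 text)"},
    {"book": "Rom", "chapter": 8, "verse": 2, "text": "(Rom 8:2 text)"},
    {"book": "Rom", "chapter": 9, "verse": 1, "text": "(Rom 9:1 text)"},
]
print([row["verse"] for row in extract_chapter(rows, "Rom 8")])  # [1, 2]
```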

viz Submodule

Provides word cloud plotting of the prescribed text.

cdf = util.load_word('cuv.csv')

viz.chi_wordcloud(cdf)
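
A word cloud sizes each word by how often it occurs, so the main preparatory step is building a frequency table. The sketch below computes one for English text with a hypothetical stop-word set. Chinese text would first need word segmentation, for example with the `jieba` package. A rendering library such as `wordcloud` can then draw the table via `WordCloud.generate_from_frequencies`. Whether `viz.chi_wordcloud` works this way is not stated here, and `word_frequencies` is a hypothetical helper.

```python
import re
from collections import Counter


# Hypothetical helper: builds the frequency table a word cloud is drawn from.
def word_frequencies(texts, stopwords=frozenset(), top_n=50):
    """Count lower-cased words across texts, skipping stop words; keep the top_n."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return dict(counts.most_common(top_n))


freqs = word_frequencies(
    ["The Spirit gives life", "life in the Spirit"],
    stopwords={"the", "in"},
)
print(freqs)
```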

Demo

Usage demo files with output:

  1. On BBC News: CWordTM_BBC.pdf

  2. On Chinese Bible (CUV): CWordTM_CUV.pdf

Documentation

The cwordtm documentation is available at https://cwordtm.readthedocs.io

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

cwordtm was created by Dr. Johnny Cheng. It is licensed under the terms of the MIT license.

Credits

cwordtm was created under the guidance of Jehovah, the Almighty God.

Download files

Source Distribution

cwordtm-0.6.4.tar.gz (18.2 MB)

Built Distribution

cwordtm-0.6.4-py3-none-any.whl (18.3 MB)

File details

Details for the file cwordtm-0.6.4.tar.gz.

File metadata

  • Download URL: cwordtm-0.6.4.tar.gz
  • Size: 18.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for cwordtm-0.6.4.tar.gz
  • SHA256: 67a096ca9369e5bc7e5307136c8728cca26838e6703846e8c0528ec886f3ed6a
  • MD5: 9ef8b17611624a12a56a79d732b4ce52
  • BLAKE2b-256: 30c3c564e8392f0cd7516126a3f439f80dbe65b85c961ae9d0feceeb606c1343

File details

Details for the file cwordtm-0.6.4-py3-none-any.whl.

File metadata

  • Download URL: cwordtm-0.6.4-py3-none-any.whl
  • Size: 18.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for cwordtm-0.6.4-py3-none-any.whl
  • SHA256: 37575c8f4e59cddc7ced19713fe5405a9daa8bc38d3575753ce14f6a5d774374
  • MD5: a7e5d7193f137c842e5ab0b380327260
  • BLAKE2b-256: bd850d8469e2d886c6f636d5bfbfac55fb0c3a71eed0f687fb71309f2366bbcf