
CWordTM: Towards a Topic Modeling Toolkit from Low-Code to Pro-Code


A Topic Modeling Toolkit from Low-code to Pro-code

Installation

$ pip install cwordtm

Usage

cwordtm can be used to perform NLP preprocessing, text exploration (including of Chinese text), text visualization (word clouds), and topic modeling (BERTopic, LDA and NMF). Its submodules are imported as follows:

from cwordtm import meta, util, ta, tm, viz, pivot, quot

version Submodule

Provides the package version information.

import cwordtm
print(cwordtm.__version__)
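If you only need the installed version (for example, for a bug report) without importing the package, the standard library's `importlib.metadata` can read it from the distribution metadata. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of cwordtm.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "cwordtm") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper; not part of cwordtm.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints the version string, or None if cwordtm is not installed.
print(installed_version())
```

This reads the metadata written by pip at install time, so it reports what is installed even if importing the package would fail.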

meta Submodule

Provides functions for extracting the source code of the cwordtm module and for adding timing and code-showing features to all functions of the module.

print(meta.get_module_info())

print(meta.get_submodule_info('viz', detailed=True))
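The idea of adding timing and code-showing features to every function of a module can be illustrated with a decorator plus a loop over the module's functions. This is a hypothetical sketch using only the standard library, not cwordtm's implementation: the keyword names `timing` and `code` mirror the arguments used in the tm examples (`timing=True`, `code=1`), but the README does not explain what the integer `code` values mean, so the sketch simply treats any truthy value as "show the source".

```python
import functools
import inspect
import time
import types


def add_features(func):
    """Wrap func so callers may pass timing=... and code=... (hypothetical flags)."""

    @functools.wraps(func)
    def wrapper(*args, timing=False, code=False, **kwargs):
        if code:
            try:
                print(inspect.getsource(func))
            except (OSError, TypeError):
                # Source is unavailable, e.g. for functions created with exec().
                print(f"<source of {func.__name__} unavailable>")
        start = time.perf_counter()
        result = func(*args, **kwargs)
        if timing:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} finished in {elapsed:.4f} s")
        return result

    return wrapper


def instrument_module(module):
    """Replace every function defined in module with its wrapped version."""
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip functions that were merely imported from other modules.
        if obj.__module__ == module.__name__:
            setattr(module, name, add_features(obj))
    return module


# Build a throwaway module for demonstration and instrument it.
demo = types.ModuleType("demo")
exec("def add(a, b):\n    return a + b\n", demo.__dict__)
instrument_module(demo)
print(demo.add(2, 3, timing=True))
```

Because `functools.wraps` preserves the original function as `__wrapped__`, the undecorated version stays reachable, which is handy when the extra output is unwanted.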

quot Submodule

Provides functions to extract the Old Testament (OT) Scripture passages that are the sources of quotations in the prescribed New Testament (NT) Scripture.

cdf = util.load_word('cuv.csv')
crom8 = util.extract2(cdf, 'Rom 8')

quot.show_quot(crom8, lang='chi')
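The README does not describe how quot matches an NT passage to its OT source. One simple way to approach the task is to rank candidate OT verses by how many character n-grams they share with the NT verse; character n-grams need no word segmentation, so they also work for Chinese. This is a stand-in technique for illustration only, the function names are hypothetical, and the inputs below are toy data (the NT sentence is invented, not an actual NT verse).

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, ignoring whitespace and case."""
    chars = "".join(text.lower().split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def best_source(nt_verse: str, ot_verses: dict, n: int = 3):
    """Return (reference, score) of the OT verse sharing the most n-grams with nt_verse.

    Hypothetical sketch; not cwordtm's matching method.
    """
    target = char_ngrams(nt_verse, n)
    total = max(sum(target.values()), 1)
    scores = {}
    for ref, text in ot_verses.items():
        shared = target & char_ngrams(text, n)  # multiset intersection
        # Normalize by the NT verse's n-gram count so scores lie in [0, 1].
        scores[ref] = sum(shared.values()) / total
    ref = max(scores, key=scores.get)
    return ref, scores[ref]


# Toy candidates keyed by reference.
ot_verses = {
    "Ps 23:1": "The LORD is my shepherd; I shall not want.",
    "Gen 1:1": "In the beginning God created the heaven and the earth.",
}
nt_text = "As it is written, The LORD is my shepherd."  # toy sentence
print(best_source(nt_text, ot_verses))
```

A production matcher would need to handle paraphrased quotations and translation differences, which raw n-gram overlap captures only partially.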

pivot Submodule

Provides a pivot table of the prescribed text.

cdf = util.load_word('cuv.csv')

pivot.stat(cdf, chi=True)
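The README does not list which statistics `pivot.stat` reports. As a hypothetical illustration of what a per-book pivot of a Scripture table can look like, the sketch below aggregates rows into chapter, verse and character counts per book. The `(book, chapter, text)` row layout and the function name `pivot_counts` are assumptions, and the rows are toy data.

```python
from collections import defaultdict


def pivot_counts(rows):
    """Aggregate (book, chapter, text) rows into per-book counts.

    Hypothetical sketch; not cwordtm's pivot.stat.
    """
    table = defaultdict(lambda: {"chapters": set(), "verses": 0, "chars": 0})
    for book, chapter, text in rows:
        entry = table[book]
        entry["chapters"].add(chapter)
        entry["verses"] += 1
        entry["chars"] += len("".join(text.split()))  # ignore whitespace
    # Replace each chapter set with its size for display.
    return {
        book: {"chapters": len(e["chapters"]), "verses": e["verses"], "chars": e["chars"]}
        for book, e in table.items()
    }


rows = [  # toy rows: (book, chapter, verse text)
    ("Rom", 8, "alpha beta"),
    ("Rom", 8, "gamma"),
    ("Rom", 9, "delta"),
    ("Gal", 1, "epsilon"),
]
print(f"{'book':<6}{'chapters':>9}{'verses':>8}{'chars':>7}")
for book, s in pivot_counts(rows).items():
    print(f"{book:<6}{s['chapters']:>9}{s['verses']:>8}{s['chars']:>7}")
```

Counting characters rather than words sidesteps segmentation, which matters for Chinese text where words are not separated by spaces.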

ta Submodule

Provides text analytics functions, including summarization of the prescribed text.

cdf = util.load_word('cuv.csv')
crom8 = util.extract2(cdf, 'Rom 8')

ta.summary_chi(crom8)
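The summarization algorithm behind `ta.summary_chi` is not described here. A common baseline is frequency-based extractive summarization: score each sentence by the average corpus frequency of its tokens and keep the top k in their original order. The sketch below is that generic technique, not cwordtm's method; `summarize` is a hypothetical name. Its `\w+` tokenizer suits space-separated text, so Chinese text would need a word segmenter first.

```python
import re
from collections import Counter


def tokenize(sentence: str) -> list:
    """Lower-cased word tokens (runs of word characters)."""
    return re.findall(r"\w+", sentence.lower())


def summarize(text: str, k: int = 2) -> list:
    """Return the k highest-scoring sentences in their original order.

    Hypothetical sketch of frequency-based extractive summarization.
    """
    # Split after Western or Chinese sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", text) if s.strip()]
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence: str) -> float:
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]  # keep original order


text = "Topic models find themes. Topic models use word counts. Cats sleep."
print(summarize(text, k=2))
```

Averaging (rather than summing) token frequencies keeps long sentences from winning merely because they are long.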

tm Submodule

Provides topic modeling functions, including LDA, NMF and BERTopic modeling.

lda = tm.lda_process("web.csv", eval=True, timing=True)

nmf = tm.nmf_process("web.csv", eval=True, code=1)

btm = tm.btm_process("cuv.csv", chi=True, cat='ot', eval=True)

btm = tm.btm_process("cuv.csv", chi=True, cat='nt', eval=True, code=2)
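To show what NMF topic modeling does under the hood, here is a from-scratch sketch of the standard factorization using Lee and Seung's multiplicative updates, in pure Python. It is not cwordtm's code (libraries such as scikit-learn provide far more efficient NMF implementations), and the vocabulary and counts below are made up for illustration. A document-term matrix V is approximated as W·H, where each row of H is a topic's weights over terms.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (docs x terms) into W (docs x k) and H (k x terms).

    Multiplicative updates keep every entry non-negative; eps avoids division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


# Toy document-term counts with two loose themes.
vocab = ["god", "faith", "grace", "market", "stock", "trade"]
V = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 1, 3, 2, 2],
]
W, H = nmf(V, k=2)
for t, row in enumerate(H):
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print(f"topic {t}:", [vocab[j] for j in top])
```

The top-weighted terms in each row of H are what topic modeling tools typically report as a topic's keywords.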

util Submodule

Provides functions for loading text and for text preprocessing.

df = util.load_word()
cdf = util.load_word('cuv.csv')

df.head()
cdf.head()

rom8 = util.extract2(df, 'Rom 8')
crom8 = util.extract2(cdf, 'Rom 8')
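The README only shows `extract2` called with a book-and-chapter reference such as 'Rom 8'; it does not document any other syntax. As a hypothetical illustration of how such a reference could be parsed and used to filter verse rows, the sketch below also accepts an optional verse range ('Rom 8:1-4'). The `(book, chapter, verse, text)` row layout and the names `parse_ref` and `select` are assumptions, and the rows are toy data.

```python
import re

# Optional leading digit (e.g. "1 Cor"), book name, chapter, optional verse range.
REF_PATTERN = re.compile(r"^\s*(\d?\s?[A-Za-z]+)\s+(\d+)(?::(\d+)(?:-(\d+))?)?\s*$")


def parse_ref(ref: str):
    """Split 'Rom 8' or 'Rom 8:1-4' into (book, chapter, first_verse, last_verse).

    Hypothetical parser; not cwordtm's extract2.
    """
    match = REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    book, chapter, first, last = match.groups()
    first = int(first) if first else None
    last = int(last) if last else first
    return book, int(chapter), first, last


def select(rows, ref):
    """Keep (book, chapter, verse, text) rows that fall inside the reference."""
    book, chapter, first, last = parse_ref(ref)
    return [
        r for r in rows
        if r[0] == book and r[1] == chapter
        and (first is None or first <= r[2] <= last)
    ]


rows = [("Rom", 8, 1, "toy text"), ("Rom", 8, 2, "toy text"), ("Rom", 9, 1, "toy text")]
print(select(rows, "Rom 8"))
```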

viz Submodule

Provides word cloud plotting of the prescribed text.

cdf = util.load_word('cuv.csv')

viz.chi_wordcloud(cdf)
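A word cloud starts from a word-frequency map that is turned into font sizes. The sketch below shows only that step, scaling sizes linearly between a minimum and a maximum; it is a generic illustration, not how `viz.chi_wordcloud` or any particular word cloud library computes sizes, and `cloud_sizes` is a hypothetical name. Layout and rendering are omitted, and Chinese text would first need word segmentation because the `\w+` tokenizer splits only on non-word characters.

```python
import re
from collections import Counter


def cloud_sizes(text, stopwords=frozenset(), top_n=50, min_size=10, max_size=60):
    """Map the top_n most frequent words to font sizes scaled linearly by count.

    Hypothetical sketch; not cwordtm's viz implementation.
    """
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in stopwords]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    hi, lo = counts[0][1], counts[-1][1]
    span = max(hi - lo, 1)  # avoid division by zero when all counts are equal
    return {w: round(min_size + (c - lo) * (max_size - min_size) / span) for w, c in counts}


print(cloud_sizes("grace grace grace faith faith hope the the", stopwords={"the"}))
```

Dropping stop words before counting matters, since otherwise function words such as "the" would dominate the cloud.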

Demo

Usage demo files with output:

  1. On BBC News: CWordTM_BBC.pdf

  2. On Chinese Bible (CUV): CWordTM_CUV.pdf

Paper

For a more detailed overview, you can read the demo paper: https://link.springer.com/chapter/10.1007/978-3-031-70242-6_4

Documentation

The cwordtm documentation is available at: https://cwordtm.readthedocs.io

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

cwordtm was created by Dr. Johnny Cheng. It is licensed under the terms of the MIT license.

Credits

cwordtm was created under the guidance of Jehovah, the Almighty God.


Download files


Source Distribution

cwordtm-0.7.7.tar.gz (18.2 MB)

Uploaded Source

Built Distribution


cwordtm-0.7.7-py3-none-any.whl (18.3 MB)

Uploaded Python 3

File details

Details for the file cwordtm-0.7.7.tar.gz.

File metadata

  • Download URL: cwordtm-0.7.7.tar.gz
  • Size: 18.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.9

File hashes

Hashes for cwordtm-0.7.7.tar.gz
Algorithm Hash digest
SHA256 006008bf23b65933de6a026ccfcae95791b6d0b5638b72c2bac0e08d8eae58cc
MD5 668c05fbc489b2c5b467cd1ebb92a268
BLAKE2b-256 ca5e6b96e0db6a9e05a83cf5daedabfae86a240463666e93c3dedca65b712f94


File details

Details for the file cwordtm-0.7.7-py3-none-any.whl.

File metadata

  • Download URL: cwordtm-0.7.7-py3-none-any.whl
  • Size: 18.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.9

File hashes

Hashes for cwordtm-0.7.7-py3-none-any.whl
Algorithm Hash digest
SHA256 ab35d4537fe9bb32790dfefe0b3d7a7b2fbb19a0246c7bac539af6b386699f8a
MD5 565a1bd3b4a3d1d2c05e6df99fb7a766
BLAKE2b-256 3cee0785b160b85dbacf05f4c91db4bf0cb35ae632bb7bd113017856c9643737

