Biterm Topic Model

This package implements the Biterm Topic Model (BTM) for short texts, introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. It is based on the biterm package by @markoarnauto, which is unfortunately no longer maintained.

Bitermplus is a fixed and optimized successor to that package. The pure Python version of the BTM class has been removed. The oBTM class has been heavily optimized using typed memoryviews in Cython and now replaces the BTM class.
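
For readers new to the model: a biterm is an unordered pair of words that co-occur in the same short text, and BTM learns topics from these pairs rather than from per-document word counts. The snippet below is a minimal sketch of that idea in plain Python. The function name extract_biterms is hypothetical and is not part of bitermplus; the package's own btm.util.get_biterms works on vectorized documents and may differ in details.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short document.

    Hypothetical helper for illustration only; not the bitermplus API.
    """
    # Sort each pair so that ("b", "a") and ("a", "b") are the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


doc = ["apple", "iphone", "release"]
print(extract_biterms(doc))
# [('apple', 'iphone'), ('apple', 'release'), ('iphone', 'release')]
```

A document with n tokens yields n * (n - 1) / 2 biterms, which is why the approach suits short texts such as search snippets.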

Requirements

  • Cython
  • NumPy
  • Pandas
  • SciPy
  • Scikit-learn
  • pyLDAvis (optional)

Setup

You can install the package from PyPI:

pip install bitermplus

Or from this repo:

pip install git+https://github.com/maximtrp/bitermplus.git

Example

import bitermplus as btm
import numpy as np
from gzip import open as gzip_open

# Loading text data (one document per line)
with gzip_open('dataset/SearchSnippets.txt.gz', 'rb') as file:
    texts = file.readlines()

# Vectorizing documents, obtaining full vocabulary and biterms
X, vocab = btm.util.get_vectorized_docs(texts)
biterms = btm.util.get_biterms(X)

# Initializing and running model
model = btm.BTM(X, T=8, W=vocab.size, M=20, alpha=50/8, beta=0.01, L=0.5)
model.fit(biterms, iterations=10)
P_zd = model.transform(biterms)

# Calculating metrics
perplexity = btm.metrics.perplexity(model.phi_, P_zd, X, 8)
coherence = btm.metrics.coherence(model.phi_, X, M=20)
# or
perplexity = model.perplexity_
coherence = model.coherence_
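
To make the perplexity metric above more concrete, here is a rough sketch of the standard definition: the exponential of the negative average log-likelihood per word, where the probability of a word in a document is the sum over topics of p(w|z) * p(z|d). It uses plain lists so that it runs without dependencies. The function name, the argument layout (topics by words, documents by topics), and the toy data are all assumptions made for illustration; btm.metrics.perplexity may use different array shapes and details.

```python
import math


def perplexity_sketch(p_wz, p_zd, doc_word_counts):
    """Perplexity under a topic model (illustrative, not the bitermplus code).

    p_wz[z][w]         : probability of word w in topic z (assumed layout)
    p_zd[d][z]         : probability of topic z in document d (assumed layout)
    doc_word_counts[d] : dict mapping word index -> count in document d
    """
    n_topics = len(p_wz)
    log_likelihood = 0.0
    total_words = 0
    for d, counts in enumerate(doc_word_counts):
        for w, n in counts.items():
            # p(w | d) = sum over topics of p(w | z) * p(z | d)
            p_wd = sum(p_wz[z][w] * p_zd[d][z] for z in range(n_topics))
            log_likelihood += n * math.log(p_wd)
            total_words += n
    return math.exp(-log_likelihood / total_words)


# Toy data: 2 topics, 4 words, every word equally likely in every topic.
p_wz = [[0.25] * 4, [0.25] * 4]
p_zd = [[0.5, 0.5], [0.5, 0.5]]
docs = [{0: 2, 1: 1}, {2: 1, 3: 3}]
print(perplexity_sketch(p_wz, p_zd, docs))  # close to 4.0 for this uniform model
```

Lower values mean the model assigns higher probability to the observed words; a model that is uniform over a vocabulary of W words scores a perplexity of W.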

Acknowledgement

Markus Tretzmüller @markoarnauto

Source Distribution

bitermplus-0.4.0.tar.gz (591.1 kB)
