Biterm Topic Model
This is a simple Python implementation of the awesome Biterm Topic Model. The model is well suited to classifying short texts: it explicitly models word co-occurrence patterns over the whole corpus, which sidesteps the sparsity of word co-occurrence at the document level.
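For intuition: a biterm is an unordered pair of distinct words that co-occur in the same short text, and the model is trained on all such pairs from the corpus. A minimal sketch of the extraction (the function make_biterms is hypothetical and only illustrates the idea, it is not part of this package):
from itertools import combinations
def make_biterms(tokens):
    # every unordered pair of words in one short text
    return list(combinations(tokens, 2))
print(make_biterms(["oil", "prices", "rise"]))
# [('oil', 'prices'), ('oil', 'rise'), ('prices', 'rise')]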
Simply install it with:
pip install biterm
Load some short texts and vectorize them via sklearn.
from sklearn.feature_extraction.text import CountVectorizer
texts = open('./data/reuters.titles').read().splitlines()[:50]
vec = CountVectorizer(stop_words='english')
X = vec.fit_transform(texts).toarray()
Get the vocabulary and the biterms from the texts.
import numpy as np
from biterm.utility import vec_to_biterms
vocab = np.array(vec.get_feature_names())  # on scikit-learn >= 1.0 use vec.get_feature_names_out()
biterms = vec_to_biterms(X)
Create a BTM and pass the biterms to train it.
from biterm.cbtm import oBTM
btm = oBTM(num_topics=20, V=vocab)
topics = btm.fit_transform(biterms, iterations=100)
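To peek at the learned topics before plotting, you can rank each topic's word distribution. This sketch assumes btm.phi_wz holds the word-by-topic matrix, as used in the full example below:
for t in range(20):
    top_words = vocab[np.argsort(btm.phi_wz[:, t])[::-1][:10]]
    print("topic {}: {}".format(t, ' '.join(top_words)))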
Save a topic plot using pyLDAvis and explore the results! (also see simple_btm.py)
import pyLDAvis
vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
pyLDAvis.save_html(vis, './vis/simple_btm.html')
Inference is done with Gibbs sampling, so it is not particularly fast, and the implementation is not meant for production use. If you have to classify a lot of texts, you can try online learning, fitting the model chunk by chunk as in the full example below. The Cython version (biterm.cbtm) speeds things up a bit.
import numpy as np
import pyLDAvis
from biterm.cbtm import oBTM
from sklearn.feature_extraction.text import CountVectorizer
from biterm.utility import vec_to_biterms, topic_summuary  # helper functions

if __name__ == "__main__":
    texts = open('./data/reuters.titles').read().splitlines()

    # vectorize texts
    vec = CountVectorizer(stop_words='english')
    X = vec.fit_transform(texts).toarray()

    # get vocabulary
    vocab = np.array(vec.get_feature_names())

    # get biterms
    biterms = vec_to_biterms(X)

    # create btm
    btm = oBTM(num_topics=20, V=vocab)

    print("\n\n Train Online BTM ..")
    for i in range(0, len(biterms), 100):  # process chunks of 100 texts
        biterms_chunk = biterms[i:i + 100]
        btm.fit(biterms_chunk, iterations=50)
    topics = btm.transform(biterms)

    print("\n\n Visualize Topics ..")
    vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
    pyLDAvis.save_html(vis, './vis/online_btm.html')

    print("\n\n Topic coherence ..")
    topic_summuary(btm.phi_wz.T, X, vocab, 10)

    print("\n\n Texts & Topics ..")
    for i in range(len(texts)):
        print("{} (topic: {})".format(texts[i], topics[i].argmax()))