Python library that performs Latent Dirichlet Allocation using Gibbs sampling.

# topic-modelling-tools

Topic Modelling with Latent Dirichlet Allocation using Gibbs sampling. This version of the package uses the GNU Scientific Library for random number generation, providing faster performance than numpy.

by Stephen Hansen (stephen.hansen@economics.ox.ac.uk), Associate Professor of Economics, University of Oxford

Python/Cython code for cleaning text and estimating LDA via collapsed Gibbs sampling as in Griffiths and Steyvers (2004).
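For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA. It is not this package's implementation (which is written in Cython) and does not use its API; the function name `lda_gibbs`, its parameters, and the default hyperparameters `alpha` and `beta` are assumptions chosen for illustration only.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the package's code).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic, topic_word, topic_total, assignments).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                 # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # times word w is assigned to topic k
    topic_total = [0] * n_topics                               # total tokens assigned to topic k
    assignments = []

    # Give every token a random initial topic and record it in the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                # Draw a new topic and add the token back to the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, topic_total, assignments


# Tiny made-up corpus: word ids 0-1 and 2-3 tend to occur together.
example_docs = [[0, 1, 0, 1, 0], [2, 3, 3, 2, 2], [0, 1, 2, 3]]
doc_topic, topic_word, topic_total, z = lda_gibbs(example_docs, n_topics=2, vocab_size=4)
```

Each sweep removes one token from the count tables, samples a new topic from the conditional distribution given all other assignments, and adds the token back. A production sampler would typically discard burn-in sweeps and thin the remaining samples.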

Tutorial scripts and notebooks making use of this library, along with some example data, can be found in: https://github.com/sekhansen/text-mining-tutorial

## Installation instructions

This version of the package requires the GNU Scientific Library (GSL). You can download GSL from ftp://ftp.gnu.org/gnu/gsl/. On Mac OS X with Homebrew, run `brew install gsl`; with conda, run `conda install gsl`.

(For a version that doesn’t require GSL but is somewhat slower, check out the “master” branch of this repository, or run `pip install topic-modelling-tools`.)

If you already have GSL, Python and pip installed, `pip install topic-modelling-tools_gsl` should work. The package depends on other Python libraries such as numpy and nltk, but pip should install these automatically.

The only other requirement is a C++ compiler, which is needed to build the Cython code. On Mac OS X you can install the Xcode command-line tools; on Windows you can install the Visual Studio C++ compiler.
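Before running pip, it can help to confirm that the prerequisites above are discoverable. The sketch below is not part of this package; the helper name `check_build_prerequisites` and the list of compiler executable names are assumptions. It also assumes that GSL's `gsl-config` script is on `PATH`, which is typical for Unix-like installs but may not hold on Windows.

```python
import shutil


def check_build_prerequisites(compilers=("c++", "g++", "clang++", "cl")):
    """Report which build prerequisites can be found on PATH (hypothetical helper)."""
    return {
        # gsl-config is installed alongside GSL on Unix-like systems.
        "gsl-config": shutil.which("gsl-config") is not None,
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        # Any one of the candidate compiler executables is enough.
        "c++ compiler": any(shutil.which(c) is not None for c in compilers),
    }


for name, found in check_build_prerequisites().items():
    print(f"{name}: {'found' if found else 'not found on PATH'}")
```

A "not found" result only means the executable is not on `PATH`; the tool may still be installed somewhere the build can locate it.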

Download files

Files for topic-modelling-tools-fast, version 0.7.dev0

| Filename | Size | File type | Python version |
|---|---|---|---|
| topic_modelling_tools_fast-0.7.dev0-cp36-cp36m-macosx_10_7_x86_64.whl | 112.0 kB | Wheel | cp36 |
| topic-modelling-tools_fast-0.7.dev0.tar.gz | 4.1 MB | Source | None |
