Topic modeling with latent Dirichlet allocation

NOTE: This package is in maintenance mode. Critical bugs will be fixed. No new features will be added.

lda implements latent Dirichlet allocation (LDA) using collapsed Gibbs sampling. lda is fast and is tested on Linux, macOS, and Windows.

You can read more about lda in the documentation.

Installation

pip install lda

Getting started

lda.LDA implements latent Dirichlet allocation (LDA). The interface follows conventions found in scikit-learn.

The following demonstrates how to inspect a model of a subset of the Reuters news dataset. The input below, X, is a document-term matrix (sparse matrices are accepted).

>>> import numpy as np
>>> import lda
>>> import lda.datasets
>>> X = lda.datasets.load_reuters()
>>> vocab = lda.datasets.load_reuters_vocab()
>>> titles = lda.datasets.load_reuters_titles()
>>> X.shape
(395, 4258)
>>> X.sum()
84010
>>> model = lda.LDA(n_topics=20, n_iter=1500, random_state=1)
>>> model.fit(X)  # model.fit_transform(X) is also available
>>> topic_word = model.topic_word_  # model.components_ also works
>>> n_top_words = 8
>>> for i, topic_dist in enumerate(topic_word):
...     topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
...     print('Topic {}: {}'.format(i, ' '.join(topic_words)))

Topic 0: british churchill sale million major letters west britain
Topic 1: church government political country state people party against
Topic 2: elvis king fans presley life concert young death
Topic 3: yeltsin russian russia president kremlin moscow michael operation
Topic 4: pope vatican paul john surgery hospital pontiff rome
Topic 5: family funeral police miami versace cunanan city service
Topic 6: simpson former years court president wife south church
Topic 7: order mother successor election nuns church nirmala head
Topic 8: charles prince diana royal king queen parker bowles
Topic 9: film french france against bardot paris poster animal
Topic 10: germany german war nazi letter christian book jews
Topic 11: east peace prize award timor quebec belo leader
Topic 12: n't life show told very love television father
Topic 13: years year time last church world people say
Topic 14: mother teresa heart calcutta charity nun hospital missionaries
Topic 15: city salonika capital buddhist cultural vietnam byzantine show
Topic 16: music tour opera singer israel people film israeli
Topic 17: church catholic bernardin cardinal bishop wright death cancer
Topic 18: harriman clinton u.s ambassador paris president churchill france
Topic 19: city museum art exhibition century million churches set
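
The slice `[:-(n_top_words+1):-1]` used in the loop above can be hard to read at first. It walks backwards from the end of the ascending-sorted array and stops after `n_top_words` items, so the most probable words come first. The following is a minimal pure-Python sketch of the same idiom; the vocabulary and probabilities are made-up illustration values, not taken from the Reuters model.

```python
# Hypothetical toy vocabulary and topic-word probabilities (illustration only).
toy_vocab = ["church", "pope", "film", "music", "prize"]
toy_topic_dist = [0.10, 0.40, 0.05, 0.30, 0.15]
n_top = 3

# Indices ordered by ascending probability, analogous to np.argsort(topic_dist).
ascending = sorted(range(len(toy_topic_dist)), key=lambda j: toy_topic_dist[j])

# Same slice as in the example above: step backwards from the end, n_top items.
top_indices = ascending[:-(n_top + 1):-1]
top_words = [toy_vocab[j] for j in top_indices]
print(top_words)  # ['pope', 'music', 'prize']
```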

The document-topic distributions are available in model.doc_topic_.

>>> doc_topic = model.doc_topic_
>>> for i in range(10):
...     print("{} (top topic: {})".format(titles[i], doc_topic[i].argmax()))
0 UK: Prince Charles spearheads British royal revolution. LONDON 1996-08-20 (top topic: 8)
1 GERMANY: Historic Dresden church rising from WW2 ashes. DRESDEN, Germany 1996-08-21 (top topic: 13)
2 INDIA: Mother Teresa's condition said still unstable. CALCUTTA 1996-08-23 (top topic: 14)
3 UK: Palace warns British weekly over Charles pictures. LONDON 1996-08-25 (top topic: 8)
4 INDIA: Mother Teresa, slightly stronger, blesses nuns. CALCUTTA 1996-08-25 (top topic: 14)
5 INDIA: Mother Teresa's condition unchanged, thousands pray. CALCUTTA 1996-08-25 (top topic: 14)
6 INDIA: Mother Teresa shows signs of strength, blesses nuns. CALCUTTA 1996-08-26 (top topic: 14)
7 INDIA: Mother Teresa's condition improves, many pray. CALCUTTA, India 1996-08-25 (top topic: 14)
8 INDIA: Mother Teresa improves, nuns pray for "miracle". CALCUTTA 1996-08-26 (top topic: 14)
9 UK: Charles under fire over prospect of Queen Camilla. LONDON 1996-08-26 (top topic: 8)
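
The example above loads a ready-made document-term matrix. For your own text, a matrix with the same layout (one row per document, one column per vocabulary word, integer counts) has to be built first. The sketch below shows one way to do this with the standard library only; the corpus, the whitespace tokenization, and the names `docs`, `toy_vocab`, and `dtm` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; real input would be your own documents.
docs = [
    "pope vatican rome pope",
    "film paris film poster",
    "vatican rome church",
]

# Naive whitespace tokenization (an assumption; real text needs more care).
tokenized = [doc.split() for doc in docs]

# Fixed, sorted vocabulary so that column j always refers to the same word.
toy_vocab = sorted({word for tokens in tokenized for word in tokens})
column = {word: j for j, word in enumerate(toy_vocab)}

# One row per document, one column per vocabulary word, integer counts.
dtm = []
for tokens in tokenized:
    row = [0] * len(toy_vocab)
    for word, count in Counter(tokens).items():
        row[column[word]] = count
    dtm.append(row)

print(toy_vocab)
print(dtm)
```

Converting the nested list with `np.array(dtm)` would give an integer array with the same documents-by-vocabulary layout as `X` above.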

Requirements

Python ≥3.10 and NumPy.

Caveat

lda aims for simplicity. (It happens to be fast, as essential parts are written in C via Cython.) If you are working with a very large corpus, you may wish to use more sophisticated topic models such as those implemented in hca and MALLET. hca is written entirely in C and MALLET is written in Java. Unlike lda, hca can use more than one processor at a time. Both MALLET and hca implement topic models known to be more robust than standard latent Dirichlet allocation.

Notes

Latent Dirichlet allocation is described in Blei et al. (2003) and Pritchard et al. (2000). Inference using collapsed Gibbs sampling is described in Griffiths and Steyvers (2004).
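
To give a rough idea of what collapsed Gibbs sampling does, here is a toy pure-Python sketch in the spirit of Griffiths and Steyvers (2004). It is not the package's Cython implementation. The function name `gibbs_lda`, the symmetric hyperparameters `alpha` and `eta` and their default values, and the tiny corpus are all assumptions made for illustration. Each token's topic is resampled in turn, with probability proportional to (document-topic count + alpha) × (topic-word count + eta) / (topic total + vocabulary size × eta), where the counts exclude the token being resampled.

```python
import random


def gibbs_lda(docs, n_vocab, n_topics, n_iter=200, alpha=0.1, eta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    ndz = [[0] * n_topics for _ in docs]
    nzw = [[0] * n_vocab for _ in range(n_topics)]
    nz = [0] * n_topics
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndz[d][k] += 1
            nzw[k][w] += 1
            nz[k] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndz[d][k] -= 1
                nzw[k][w] -= 1
                nz[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (ndz[d][t] + alpha) * (nzw[t][w] + eta) / (nz[t] + n_vocab * eta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndz[d][k] += 1
                nzw[k][w] += 1
                nz[k] += 1
    # Smoothed point estimates from the final counts.
    topic_word = [
        [(nzw[t][w] + eta) / (nz[t] + n_vocab * eta) for w in range(n_vocab)]
        for t in range(n_topics)
    ]
    doc_topic = [
        [(ndz[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, doc_topic


# Hypothetical corpus: word ids 0-2 and 3-5 never share a document.
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 4, 3, 5]]
topic_word, doc_topic = gibbs_lda(toy_docs, n_vocab=6, n_topics=2)
print([[round(p, 2) for p in row] for row in doc_topic])
```

The two returned tables play the same roles as `model.topic_word_` and `model.doc_topic_` in the example above: every row is a probability distribution that sums to one.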

Other implementations

License

lda is licensed under Version 2.0 of the Mozilla Public License.
