Python Framework for Topic Modeling

Project description

Gensim is a Python framework for unsupervised learning from raw, unstructured digital texts.
It learns the hidden (*latent*) structure of a text corpus.
Once that structure is found, documents can be expressed succinctly in terms of it,
queried for topical similarity, and so on.

Gensim includes the following features:
* Memory independence -- the whole text corpus (and any intermediate
term-document matrices) never needs to reside fully in RAM at any one time.
* Implementations of several popular topic inference algorithms,
including Latent Semantic Analysis (LSA, also known as LSI) and Latent Dirichlet
Allocation (LDA), in a design that makes adding new ones simple.
* I/O wrappers and converters for several popular data formats.
* Similarity queries across documents in their latent, topical representation.
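
To make the last feature concrete, here is a minimal sketch of a similarity query over documents that have already been expressed as topic weights. It uses only the standard library and is not Gensim's API: the function names, the topic weights, and the choice of sparse `(id, weight)` pairs with cosine similarity are all assumptions made for illustration.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors given as (id, weight) pairs."""
    weights_a = dict(vec_a)
    weights_b = dict(vec_b)
    # Only ids present in both vectors contribute to the dot product.
    dot = sum(w * weights_b[i] for i, w in weights_a.items() if i in weights_b)
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return (document index, similarity) pairs, most similar first."""
    scores = [(idx, cosine_similarity(query, doc)) for idx, doc in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, each expressed as weights over three latent topics.
docs = [
    [(0, 0.9), (1, 0.1)],
    [(1, 0.8), (2, 0.6)],
    [(0, 0.7), (2, 0.7)],
]
query = [(0, 1.0)]  # a query that is purely about topic 0
ranking = rank_by_similarity(query, docs)
```

Documents that share no topic with the query score zero, so the ranking reflects topical overlap rather than shared surface words.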

The principal design objectives behind Gensim are:
1. Straightforward interfaces and a gentle API learning curve for developers,
facilitating modifications and rapid prototyping.
2. Memory independence with respect to the size of the input corpus; all intermediate
steps and algorithms operate in a streaming fashion, processing one document
at a time.
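
The second objective can be sketched in plain Python. The class and function below are hypothetical and are not part of Gensim; they assume a corpus stored as a text file with one document per line. They show how an algorithm can make a full pass over a corpus while holding only one document in memory at a time.

```python
import os
import tempfile
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus: one document per line, read lazily from disk."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # The file is reopened on every pass, so the corpus can be iterated
        # repeatedly without ever being loaded into memory as a whole.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                yield Counter(line.lower().split())  # bag-of-words for one document


def document_frequencies(corpus):
    """Count, for each word, the number of documents that contain it."""
    doc_freq = Counter()
    for bag_of_words in corpus:  # only the current document is held in memory
        doc_freq.update(bag_of_words.keys())
    return doc_freq


# Build a tiny throwaway corpus file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("human computer interaction\n")
    tmp.write("graph of trees\n")
    tmp.write("computer graph survey\n")
    corpus_path = tmp.name

corpus = StreamedCorpus(corpus_path)
doc_freq = document_frequencies(corpus)
num_docs = sum(1 for _ in corpus)  # a second, independent pass over the same file
os.remove(corpus_path)
```

In this sketch memory use is governed by the vocabulary size and the longest single document rather than by the number of documents, which is the property the objective describes.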

Download files

Source Distribution

gensim-0.3.0.tar.gz (124.4 kB)

Uploaded Source

Built Distribution

gensim-0.3.0-py2.5.egg (130.8 kB)

Uploaded Egg

File details

Details for the file gensim-0.3.0.tar.gz.

File metadata

  • Download URL: gensim-0.3.0.tar.gz
  • Upload date:
  • Size: 124.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-0.3.0.tar.gz

Algorithm     Hash digest
SHA256        9a43e91473b3b7b8471e1ac5d3a2df1615f6d14a343de4bf088504787c24a3b5
MD5           1009141ab11f4b6520a83b4300505cd4
BLAKE2b-256   0eb23c062f4009408209bb899dceb9093d814c76d5fec141136ae5f9075e9e81

File details

Details for the file gensim-0.3.0-py2.5.egg.

File metadata

  • Download URL: gensim-0.3.0-py2.5.egg
  • Upload date:
  • Size: 130.8 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-0.3.0-py2.5.egg

Algorithm     Hash digest
SHA256        6baeb63d82020307ef43a777f91730230d1f9304a0dc6b0433771f4026be9ff6
MD5           a2d0ef0fb9b4a6d7224ec102ddfb6670
BLAKE2b-256   526688ec78bc4ea8aa2318ccf44f9fd2349ebfef75e6ab37fdea8b7974b7696e
