Python Framework for Topic Modeling
Project description
Gensim is a Python framework for unsupervised learning from raw, unstructured digital texts.
It is designed for learning the hidden (*latent*) structure of a corpus.
Once that structure is found, documents can be expressed succinctly in terms of it and
queried for topical similarity.
Gensim includes the following features:
* Memory independence -- there is no need for the whole text corpus (or any
intermediate term-document matrices) to reside fully in RAM at any one time.
* Provides implementations for several popular topic inference algorithms,
including Latent Semantic Analysis (LSA, LSI) and Latent Dirichlet Allocation (LDA),
and makes adding new ones simple.
* Contains I/O wrappers and converters around several popular data formats.
* Allows similarity queries across documents in their latent, topical representation.
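To make the last feature concrete, here is a minimal sketch of a similarity query over documents that have already been mapped to a topical representation. It uses plain Python rather than gensim's own classes. The `cosine_similarity` helper, the `{topic_id: weight}` layout, and the topic weights are all assumptions made for illustration, not output from a real model.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as {topic_id: weight}."""
    dot = sum(weight * vec_b.get(topic, 0.0) for topic, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up topic weights for three documents (hypothetical data)
docs = {
    "doc_a": {0: 0.9, 1: 0.1},
    "doc_b": {0: 0.8, 1: 0.2},
    "doc_c": {1: 1.0},
}

# Rank every document by its similarity to the query document
query = docs["doc_a"]
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query, docs[name]),
    reverse=True,
)
```

Because the comparison happens in the topic space rather than on raw words, two documents can rank as similar even when they share few terms.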
The principal design objectives behind gensim are:
1. Straightforward interfaces and low API learning curve for developers,
facilitating modifications and rapid prototyping.
2. Memory independence with respect to the size of the input corpus; all intermediate
steps and algorithms operate in a streaming fashion, processing one document
at a time.
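The streaming objective can be sketched in plain Python as an iterable that yields one document at a time. This is not gensim's API: `StreamingCorpus` and `document_frequencies` are hypothetical names, and the one-document-per-line, whitespace-tokenized file format is an assumption.

```python
import os
import tempfile
from collections import Counter


class StreamingCorpus:
    """Yield one bag-of-words document at a time from a text file.

    Hypothetical helper for illustration; not part of gensim.
    Assumes one document per line, tokens separated by whitespace.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-open the file on every pass so the corpus can be iterated repeatedly
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                # Only the current line is held in memory
                yield Counter(line.lower().split())


def document_frequencies(corpus):
    """Count, in a single streamed pass, how many documents contain each term."""
    df = Counter()
    for bow in corpus:
        df.update(bow.keys())
    return df


# Tiny demo corpus written to a temporary file
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("human machine interface\n")
    tmp.write("graph of trees\n")
    tmp.write("graph minors survey\n")
    demo_path = tmp.name

corpus = StreamingCorpus(demo_path)
df = document_frequencies(corpus)
os.remove(demo_path)  # clean up the demo file
```

Memory use here depends on the vocabulary size and the longest single document, not on the number of documents in the corpus.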
Download files
Source Distribution
gensim-0.3.0.tar.gz (124.4 kB)
Built Distribution
gensim-0.3.0-py2.5.egg (130.8 kB)
File details
Details for the file gensim-0.3.0.tar.gz.
File metadata
- Download URL: gensim-0.3.0.tar.gz
- Upload date:
- Size: 124.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a43e91473b3b7b8471e1ac5d3a2df1615f6d14a343de4bf088504787c24a3b5 |
| MD5 | 1009141ab11f4b6520a83b4300505cd4 |
| BLAKE2b-256 | 0eb23c062f4009408209bb899dceb9093d814c76d5fec141136ae5f9075e9e81 |
File details
Details for the file gensim-0.3.0-py2.5.egg.
File metadata
- Download URL: gensim-0.3.0-py2.5.egg
- Upload date:
- Size: 130.8 kB
- Tags: Egg
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6baeb63d82020307ef43a777f91730230d1f9304a0dc6b0433771f4026be9ff6 |
| MD5 | a2d0ef0fb9b4a6d7224ec102ddfb6670 |
| BLAKE2b-256 | 526688ec78bc4ea8aa2318ccf44f9fd2349ebfef75e6ab37fdea8b7974b7696e |
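The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the self-check at the end runs on a throwaway file because the real archive is not assumed to be present.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Self-check on a temporary file with known contents
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"gensim")
    demo_path = tmp.name

demo_expected = hashlib.sha256(b"gensim").hexdigest()
demo_ok = matches_published_digest(demo_path, demo_expected)
os.remove(demo_path)  # clean up the demo file
```

To check the source distribution, call `matches_published_digest("gensim-0.3.0.tar.gz", "9a43e91473b3b7b8471e1ac5d3a2df1615f6d14a343de4bf088504787c24a3b5")` from the directory holding the download. A result of `False` means the file differs from the one published here.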