Python framework for fast Vector Space Modelling

==============================================
gensim -- Topic Modelling in Python
==============================================

|Travis|_
|Wheel|_

.. |Travis| image:: https://img.shields.io/travis/RaRe-Technologies/gensim/develop.svg
.. |Wheel| image:: https://img.shields.io/pypi/wheel/gensim.svg

.. _Travis: https://travis-ci.org/RaRe-Technologies/gensim
.. _Downloads: https://pypi.python.org/pypi/gensim
.. _License: http://radimrehurek.com/gensim/about.html
.. _Wheel: https://pypi.python.org/pypi/gensim

Gensim is a Python library for *topic modelling*, *document indexing* and *similarity retrieval* with large corpora.
The target audience is the *natural language processing* (NLP) and *information retrieval* (IR) community.

Features
---------

* All algorithms are **memory-independent** w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core),
* **Intuitive interfaces**

  * easy to plug in your own input corpus/datastream (trivial streaming API)
  * easy to extend with other Vector Space algorithms (trivial transformation API)

* Efficient multicore implementations of popular algorithms, such as online **Latent Semantic Analysis (LSA/LSI/SVD)**,
  **Latent Dirichlet Allocation (LDA)**, **Random Projections (RP)**, **Hierarchical Dirichlet Process (HDP)** or **word2vec deep learning**.
* **Distributed computing**: can run *Latent Semantic Analysis* and *Latent Dirichlet Allocation* on a cluster of computers.
* Extensive `documentation and Jupyter Notebook tutorials <https://github.com/RaRe-Technologies/gensim/#documentation>`_.


If this feature list left you scratching your head, you can first read more about the `Vector
Space Model <http://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
document analysis <http://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.
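The streaming and transformation interfaces from the feature list can be illustrated without gensim itself. The sketch below is plain Python, not gensim's API: ``StreamedCorpus``, ``BinaryTransformation`` and the ``vocabulary`` mapping are hypothetical names chosen for illustration. It assumes a corpus is any iterable that yields one sparse vector per document, written as a list of ``(token_id, value)`` pairs, and that a transformation wraps a corpus and yields transformed vectors just as lazily:

```python
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus: yields one sparse bag-of-words vector per document."""

    def __init__(self, lines, vocabulary):
        self.lines = lines            # any re-iterable source of text lines
        self.vocabulary = vocabulary  # assumed mapping: token -> integer id

    def __iter__(self):
        # Only one document is held in memory at any time.
        for line in self.lines:
            counts = Counter(
                self.vocabulary[token]
                for token in line.lower().split()
                if token in self.vocabulary
            )
            yield sorted(counts.items())


class BinaryTransformation:
    """Hypothetical transformation: replaces every count with 1, lazily."""

    def __init__(self, corpus):
        self.corpus = corpus

    def __iter__(self):
        for vector in self.corpus:
            yield [(token_id, 1) for token_id, _ in vector]


documents = ["human computer interaction", "graph of trees graph"]
vocabulary = {"human": 0, "computer": 1, "graph": 2, "trees": 3}

corpus = StreamedCorpus(documents, vocabulary)
for vector in BinaryTransformation(corpus):
    print(vector)
```

Because both classes only implement ``__iter__``, the list of documents could be swapped for a file object or a database cursor without changing the rest of the code.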

Installation
------------

This software depends on `NumPy and SciPy <http://www.scipy.org/Download>`_, two Python packages for scientific computing.
You must have them installed prior to installing `gensim`.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as `ATLAS <http://math-atlas.sourceforge.net/>`_ or `OpenBLAS <http://xianyi.github.io/OpenBLAS/>`_ is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don't need to do anything special.

The simple way to install `gensim` is::

    pip install -U gensim

Or, if you have instead downloaded and unzipped the `source tar.gz <http://pypi.python.org/pypi/gensim>`_ package,
you'd run::

    python setup.py test
    python setup.py install


For alternative modes of installation (without root privileges, development
installation, optional install features), see the `install documentation <http://radimrehurek.com/gensim/install.html>`_.

This version has been tested under Python 2.7, 3.5 and 3.6. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0; install gensim 0.13.4 if you *must* use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5. Gensim's GitHub repo is hooked up to `Travis CI for automated testing <https://travis-ci.org/RaRe-Technologies/gensim>`_ on every commit push and pull request.

How come gensim is so fast and memory efficient? Isn't it pure Python, and isn't Python slow and greedy?
--------------------------------------------------------------------------------------------------------

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python's built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim's `design goals <http://radimrehurek.com/gensim/about.html>`_, and is a central feature of gensim, rather than something bolted on as an afterthought.
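As a rough illustration of why generators keep memory usage flat, the sketch below computes document frequencies in a single pass over a text stream. It uses only the standard library and is not gensim code; ``iter_documents`` and ``document_frequencies`` are hypothetical helper names, and the in-memory ``io.StringIO`` stands in for a file that could be far larger than RAM:

```python
import io
from collections import Counter


def iter_documents(stream):
    """Lazily yield one tokenised document per non-empty line."""
    for line in stream:
        tokens = line.split()
        if tokens:
            yield tokens


def document_frequencies(documents):
    """Count in how many documents each token occurs, in one pass."""
    frequencies = Counter()
    for tokens in documents:
        # set() ensures a token is counted once per document.
        frequencies.update(set(tokens))
    return frequencies


# Stand-in for a large file; only one line is materialised at a time.
stream = io.StringIO("a b a\n\nb c\n")
frequencies = document_frequencies(iter_documents(stream))
print(sorted(frequencies.items()))
```

Memory use here depends on the vocabulary size and the longest single document, not on the number of documents.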

Documentation
-------------
* `QuickStart`_
* `Tutorials`_
* `Tutorial Videos`_
* `Official Documentation and Walkthrough`_

Citing gensim
-------------

When `citing gensim in academic papers and theses <https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC>`_, please use this BibTeX entry::

    @inproceedings{rehurek_lrec,
          title = {{Software Framework for Topic Modelling with Large Corpora}},
          author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
          booktitle = {{Proceedings of the LREC 2010 Workshop on New
               Challenges for NLP Frameworks}},
          pages = {45--50},
          year = 2010,
          month = May,
          day = 22,
          publisher = {ELRA},
          address = {Valletta, Malta},
          language={English}
    }

----------------

Gensim is open source software released under the `GNU LGPLv2.1 license <http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html>`_.
Copyright (c) 2009-now Radim Rehurek

|Analytics|_

.. |Analytics| image:: https://ga-beacon.appspot.com/UA-24066335-5/your-repo/page-name
.. _Analytics: https://github.com/igrigorik/ga-beacon
.. _Official Documentation and Walkthrough: http://radimrehurek.com/gensim/
.. _Tutorials: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#tutorials
.. _Tutorial Videos: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#videos
.. _QuickStart: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/gensim%20Quick%20Start.ipynb


