Skip to main content

Python framework for fast Vector Space Modelling

Project description

==============================================
gensim -- Topic Modelling in Python
==============================================

|Travis|_
|Wheel|_

.. |Travis| image:: https://img.shields.io/travis/RaRe-Technologies/gensim/develop.svg
.. |Wheel| image:: https://img.shields.io/pypi/wheel/gensim.svg

.. _Travis: https://travis-ci.org/RaRe-Technologies/gensim
.. _Downloads: https://pypi.python.org/pypi/gensim
.. _License: http://radimrehurek.com/gensim/about.html
.. _Wheel: https://pypi.python.org/pypi/gensim

Gensim is a Python library for *topic modelling*, *document indexing* and *similarity retrieval* with large corpora.
Target audience is the *natural language processing* (NLP) and *information retrieval* (IR) community.

Features
---------

* All algorithms are **memory-independent** w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core),
* **Intuitive interfaces**

* easy to plug in your own input corpus/datastream (trivial streaming API)
* easy to extend with other Vector Space algorithms (trivial transformation API)

* Efficient multicore implementations of popular algorithms, such as online **Latent Semantic Analysis (LSA/LSI/SVD)**,
**Latent Dirichlet Allocation (LDA)**, **Random Projections (RP)**, **Hierarchical Dirichlet Process (HDP)** or **word2vec deep learning**.
* **Distributed computing**: can run *Latent Semantic Analysis* and *Latent Dirichlet Allocation* on a cluster of computers.
* Extensive `documentation and Jupyter Notebook tutorials <https://github.com/RaRe-Technologies/gensim/#documentation>`_.


If this feature list left you scratching your head, you can first read more about the `Vector
Space Model <http://en.wikipedia.org/wiki/Vector_space_model>`_ and `unsupervised
document analysis <http://en.wikipedia.org/wiki/Latent_semantic_indexing>`_ on Wikipedia.

Installation
------------

This software depends on `NumPy and Scipy <http://www.scipy.org/Download>`_, two Python packages for scientific computing.
You must have them installed prior to installing `gensim`.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as `ATLAS <http://math-atlas.sourceforge.net/>`_ or `OpenBLAS <http://xianyi.github.io/OpenBLAS/>`_ is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don't need to do anything special.

The simple way to install `gensim` is::

pip install -U gensim

Or, if you have instead downloaded and unzipped the `source tar.gz <http://pypi.python.org/pypi/gensim>`_ package,
you'd run::

python setup.py test
python setup.py install


For alternative modes of installation (without root privileges, development
installation, optional install features), see the `install documentation <http://radimrehurek.com/gensim/install.html>`_.

This version has been tested under Python 2.7, 3.5 and 3.6. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0. Install gensim 0.13.4 if you *must* use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5). Gensim's github repo is hooked against `Travis CI for automated testing <https://travis-ci.org/RaRe-Technologies/gensim>`_ on every commit push and pull request.

How come gensim is so fast and memory efficient? Isn't it pure Python, and isn't Python slow and greedy?
--------------------------------------------------------------------------------------------------------

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python's built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim's `design goals <http://radimrehurek.com/gensim/about.html>`_, and is a central feature of gensim, rather than something bolted on as an afterthought.

Documentation
-------------
* `QuickStart`_
* `Tutorials`_
* `Tutorial Videos`_
* `Official Documentation and Walkthrough`_

Citing gensim
-------------

When `citing gensim in academic papers and theses <https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC>`_, please use this BibTeX entry::

@inproceedings{rehurek_lrec,
title = {{Software Framework for Topic Modelling with Large Corpora}},
author = {Radim { R}eh{
u}{ r}ek and Petr Sojka},
booktitle = {{Proceedings of the LREC 2010 Workshop on New
Challenges for NLP Frameworks}},
pages = {45--50},
year = 2010,
month = May,
day = 22,
publisher = {ELRA},
address = {Valletta, Malta},
language={English}
}

----------------

Gensim is open source software released under the `GNU LGPLv2.1 license <http://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html>`_.
Copyright (c) 2009-now Radim Rehurek

|Analytics|_

.. |Analytics| image:: https://ga-beacon.appspot.com/UA-24066335-5/your-repo/page-name
.. _Analytics: https://github.com/igrigorik/ga-beacon
.. _Official Documentation and Walkthrough: http://radimrehurek.com/gensim/
.. _Tutorials: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#tutorials
.. _Tutorial Videos: https://github.com/RaRe-Technologies/gensim/blob/develop/tutorials.md#videos
.. _QuickStart: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/gensim%20Quick%20Start.ipynb



Project details


Release history Release notifications | RSS feed

This version

3.1.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gensim-3.1.0.tar.gz (15.1 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

gensim-3.1.0-cp36-cp36m-win_amd64.whl (14.3 MB view details)

Uploaded CPython 3.6mWindows x86-64

gensim-3.1.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (14.4 MB view details)

Uploaded CPython 3.6mmacOS 10.10+ Intel (x86-64, i386)macOS 10.10+ x86-64macOS 10.6+ Intel (x86-64, i386)macOS 10.9+ Intel (x86-64, i386)macOS 10.9+ x86-64

gensim-3.1.0-cp35-cp35m-win_amd64.whl (14.3 MB view details)

Uploaded CPython 3.5mWindows x86-64

gensim-3.1.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (14.4 MB view details)

Uploaded CPython 3.5mmacOS 10.10+ Intel (x86-64, i386)macOS 10.10+ x86-64macOS 10.6+ Intel (x86-64, i386)macOS 10.9+ Intel (x86-64, i386)macOS 10.9+ x86-64

gensim-3.1.0-cp27-cp27m-win_amd64.whl (14.3 MB view details)

Uploaded CPython 2.7mWindows x86-64

gensim-3.1.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (14.4 MB view details)

Uploaded CPython 2.7mmacOS 10.10+ Intel (x86-64, i386)macOS 10.10+ x86-64macOS 10.6+ Intel (x86-64, i386)macOS 10.9+ Intel (x86-64, i386)macOS 10.9+ x86-64

File details

Details for the file gensim-3.1.0.tar.gz.

File metadata

  • Download URL: gensim-3.1.0.tar.gz
  • Upload date:
  • Size: 15.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-3.1.0.tar.gz
Algorithm Hash digest
SHA256 eaf6ece613b47eff19be5b1344e9ac2d65a681f3edd2b8d347458bf87ada8b24
MD5 91a3c02e0f4f9ea47e90bd0e9173855c
BLAKE2b-256 92e9a5cff586629b6e6e35707214a7f9ee3a9e3a28cf30d79035f8646466b1f3

See more details on using hashes here.

File details

Details for the file gensim-3.1.0-cp36-cp36m-win_amd64.whl.

File metadata

File hashes

Hashes for gensim-3.1.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 147dd2255ef02fa8a669c78d1d52b9ce605d54bbe5b65603b85e9cbc1b54289f
MD5 0fa50f320830c0b0a226e37c1508d419
BLAKE2b-256 b4d068de2cab60ef083d082c61b79d4393b59ea2af416db0c6ab29a5957a42b1

See more details on using hashes here.

File details

Details for the file gensim-3.1.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File metadata

File hashes

Hashes for gensim-3.1.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 e98314e2472dc7eb567b30286e9e57fb179aa88147e45eaba8934a887eeccf6f
MD5 e8232cd6c6cf30fcdf29214d835c8938
BLAKE2b-256 d314c29431cb7452df52ca57323bfcb16e0c8eea141ce16469cea701ab46a08f

See more details on using hashes here.

File details

Details for the file gensim-3.1.0-cp35-cp35m-win_amd64.whl.

File metadata

File hashes

Hashes for gensim-3.1.0-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 a06d5b209eb9d13868b23d2b629ffa0f9c59a93f5d67b3551c5dc0590b8e3a3e
MD5 34551e6a044e417e7fa305bf5cf97586
BLAKE2b-256 ee0743968ffaf3e496de5cd4e7d682f318a63f4becc3e004781504227fcf5c57

See more details on using hashes here.

File details

Details for the file gensim-3.1.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File metadata

File hashes

Hashes for gensim-3.1.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 7609a19e2cdc1b93f973254495f3e71de22db26ce8c9d5aac7f0f1263901197e
MD5 dd2a1925bb236b6a9720682577655030
BLAKE2b-256 894a5fb42df2257a12c9ecaf848508aae44724b14c1c4892215a2296da6184a0

See more details on using hashes here.

File details

Details for the file gensim-3.1.0-cp27-cp27m-win_amd64.whl.

File metadata

File hashes

Hashes for gensim-3.1.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 bfcb69027f1f4b67e1c7d437027a1a2c71e1bc6c13b7642fedbecacf0966b92a
MD5 637e2e108bdbb4874016faef998798a6
BLAKE2b-256 acb3f53abef41ecda58d2b79e1665c926621c9b8725c247fecb40638bb85a93b

See more details on using hashes here.

File details

Details for the file gensim-3.1.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File metadata

File hashes

Hashes for gensim-3.1.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 188701ebd5ad481b8cbf2c69fc4f8a2bfff0fb1f0c8b417025f1cfd952b9fa42
MD5 8e9216c185193dbb421fe84bff0b31ae
BLAKE2b-256 55593e4088b78580f2a1ef635f24943e17541b7692ff31dbbc05a052326accb6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page