Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size: they can process input larger than RAM, streamed and out-of-core.

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (trivial streaming API)

    • easy to extend with other Vector Space algorithms (trivial transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.
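As a rough illustration of the streaming idea in the list above, the sketch below defines a corpus object that yields one sparse bag-of-words vector per document and never holds the whole corpus in memory. It uses only the standard library. The class name `StreamedCorpus`, the whitespace tokenizer, the hand-written `token2id` mapping and the in-memory `io.StringIO` source are all assumptions made for this example, not part of gensim's API.

```python
import io
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus: one document per line, yielded lazily."""

    def __init__(self, open_source, token2id):
        # open_source is a zero-argument callable returning a fresh
        # iterator over lines, so the corpus can be iterated repeatedly.
        self.open_source = open_source
        self.token2id = token2id

    def __iter__(self):
        for line in self.open_source():
            counts = Counter(line.lower().split())
            # Sparse vector: sorted (token_id, count) pairs, known tokens only.
            yield sorted(
                (self.token2id[tok], n)
                for tok, n in counts.items()
                if tok in self.token2id
            )


text = "human computer interaction\ncomputer system computer\n"
token2id = {"computer": 0, "human": 1, "interaction": 2, "system": 3}
corpus = StreamedCorpus(lambda: io.StringIO(text), token2id)

for vector in corpus:
    print(vector)
# [(0, 1), (1, 1), (2, 1)]
# [(0, 2), (3, 1)]
```

In practice the callable would open a file or another data stream instead of an in-memory string; only one line is held in memory at a time either way.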

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don’t need to do anything special.

The simple way to install gensim is:

pip install -U gensim

Or, if you have instead downloaded and unzipped the source tar.gz package, you’d run:

python setup.py test
python setup.py install

For alternative modes of installation (without root privileges, development installation, optional install features), see the install documentation.

This version has been tested under Python 2.7, 3.5 and 3.6. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0; install gensim 0.13.4 if you must use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you must use Python 2.5. Gensim’s GitHub repo is hooked up to Travis CI for automated testing on every commit push and pull request.
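The support matrix in the paragraph above can be written down as a small lookup. This is only a sketch: the function name `suggested_gensim_version` is hypothetical, and "3.4.0" is returned for the tested interpreters on the assumption that this page describes release 3.4.0.

```python
import sys


def suggested_gensim_version(python_version):
    """Return the gensim release named in the text for a Python version.

    python_version is any sequence starting with (major, minor),
    for example sys.version_info. Returns None when the text above
    says nothing about that interpreter.
    """
    major_minor = tuple(python_version[:2])
    if major_minor in {(2, 7), (3, 5), (3, 6)}:
        return "3.4.0"   # assumed: the release described on this page
    if major_minor in {(2, 6), (3, 3), (3, 4)}:
        return "0.13.4"  # last release before support was dropped in 1.0.0
    if major_minor == (2, 5):
        return "0.9.1"   # release named for Python 2.5
    return None


print(suggested_gensim_version(sys.version_info))
```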

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
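To make the generator-based approach concrete, here is a minimal sketch of chunked stream processing, in which peak memory depends on the chunk size rather than on the corpus size. The helper names `read_documents`, `in_chunks` and `count_tokens` are hypothetical and the code is not taken from gensim; it only illustrates the general pattern with the standard library.

```python
from itertools import islice


def read_documents(lines):
    """Lazily tokenize an iterable of lines, one document at a time."""
    for line in lines:
        yield line.split()


def in_chunks(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def count_tokens(lines, chunk_size=2):
    """Count tokens while holding at most chunk_size documents in memory."""
    total = 0
    for chunk in in_chunks(read_documents(lines), chunk_size):
        total += sum(len(doc) for doc in chunk)
    return total


# A generator expression stands in for a file too large to load at once.
lines = ("doc {} has some words".format(i) for i in range(5))
print(count_tokens(lines))  # 25
```

Because every stage is a generator, nothing is read from the source until the consumer asks for the next chunk.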

Documentation

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek

This version

3.4.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gensim-3.4.0.tar.gz (22.2 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

gensim-3.4.0.win-amd64-py3.6.exe (23.1 MB view details)

Uploaded Source

gensim-3.4.0.win-amd64-py3.5.exe (23.1 MB view details)

Uploaded Source

gensim-3.4.0.win-amd64-py2.7.exe (22.8 MB view details)

Uploaded Source

gensim-3.4.0.win32-py3.6.exe (22.9 MB view details)

Uploaded Source

gensim-3.4.0.win32-py3.5.exe (22.9 MB view details)

Uploaded Source

gensim-3.4.0.win32-py2.7.exe (22.7 MB view details)

Uploaded Source

gensim-3.4.0-cp36-cp36m-win_amd64.whl (22.5 MB view details)

Uploaded CPython 3.6m, Windows x86-64

gensim-3.4.0-cp36-cp36m-win32.whl (22.5 MB view details)

Uploaded CPython 3.6m, Windows x86

gensim-3.4.0-cp36-cp36m-manylinux1_x86_64.whl (22.6 MB view details)

Uploaded CPython 3.6m

gensim-3.4.0-cp36-cp36m-manylinux1_i686.whl (22.5 MB view details)

Uploaded CPython 3.6m

gensim-3.4.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (22.8 MB view details)

Uploaded CPython 3.6m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

gensim-3.4.0-cp35-cp35m-win_amd64.whl (22.5 MB view details)

Uploaded CPython 3.5m, Windows x86-64

gensim-3.4.0-cp35-cp35m-win32.whl (22.5 MB view details)

Uploaded CPython 3.5m, Windows x86

gensim-3.4.0-cp35-cp35m-manylinux1_x86_64.whl (22.6 MB view details)

Uploaded CPython 3.5m

gensim-3.4.0-cp35-cp35m-manylinux1_i686.whl (22.5 MB view details)

Uploaded CPython 3.5m

gensim-3.4.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (22.8 MB view details)

Uploaded CPython 3.5m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

gensim-3.4.0-cp27-cp27mu-manylinux1_x86_64.whl (22.6 MB view details)

Uploaded CPython 2.7mu

gensim-3.4.0-cp27-cp27mu-manylinux1_i686.whl (22.5 MB view details)

Uploaded CPython 2.7mu

gensim-3.4.0-cp27-cp27m-win_amd64.whl (22.5 MB view details)

Uploaded CPython 2.7m, Windows x86-64

gensim-3.4.0-cp27-cp27m-win32.whl (22.5 MB view details)

Uploaded CPython 2.7m, Windows x86

gensim-3.4.0-cp27-cp27m-manylinux1_x86_64.whl (22.6 MB view details)

Uploaded CPython 2.7m

gensim-3.4.0-cp27-cp27m-manylinux1_i686.whl (22.5 MB view details)

Uploaded CPython 2.7m

gensim-3.4.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (22.9 MB view details)

Uploaded CPython 2.7m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64
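The wheel names listed above follow the pattern `name-version[-build]-python-abi-platform.whl`, where the platform field may hold several tags joined by dots. The sketch below splits a file name into those fields using only the standard library. The function name `parse_wheel_filename` is hypothetical, and the code does no validation beyond counting fields.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None  # the build tag is optional
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        # A dot-separated platform field lists several supported platforms.
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_filename("gensim-3.4.0-cp36-cp36m-manylinux1_x86_64.whl")
print(info["python"], info["abi"], info["platforms"])
# cp36 cp36m ['manylinux1_x86_64']
```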

File details

Details for the file gensim-3.4.0.tar.gz.

File metadata

  • Download URL: gensim-3.4.0.tar.gz
  • Upload date:
  • Size: 22.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-3.4.0.tar.gz
Algorithm Hash digest
SHA256 05844c82c7c176449218fd3fc31e55e5d8b3fae460f261b11231f4c8ef2ed5e0
MD5 fa513b7ead371be6601d5ba4d4a2e9fd
BLAKE2b-256 04a682ee7b14c204b82ec00e91fc6b67331cc7b28460ad72b2214384abd0e0a3

See more details on using hashes here.
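One common use of the digests listed on this page is to check a downloaded file before installing it. The sketch below computes a SHA256 digest in fixed-size chunks and compares it with an expected hex string. The helper names `sha256_of_file` and `matches_digest` are hypothetical, and the demonstration runs on a temporary file rather than a real download.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 equals expected_hex (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a temporary file standing in for a downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
try:
    expected = hashlib.sha256(b"hello").hexdigest()
    print(matches_digest(demo_path, expected))  # True
finally:
    os.remove(demo_path)
```

For the source distribution, `expected_hex` would be the SHA256 value listed above for gensim-3.4.0.tar.gz.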

File details

Details for the file gensim-3.4.0.win-amd64-py3.6.exe.

File hashes

Hashes for gensim-3.4.0.win-amd64-py3.6.exe
Algorithm Hash digest
SHA256 2611b550813cd2d6abf36840a2831df791c155c4140ae1e41d7a1ef48d7fb691
MD5 7b0856c670625e43c013da4a9781dcd7
BLAKE2b-256 c89ee61fde22eb303672dd6de0aee43becbcd4775cc2f28fff5bb1bcfd0ca437

File details

Details for the file gensim-3.4.0.win-amd64-py3.5.exe.

File hashes

Hashes for gensim-3.4.0.win-amd64-py3.5.exe
Algorithm Hash digest
SHA256 269504cd3b0cb2b18d421fdb641433e06b974f3d03561dd582ca696c0b63a6f6
MD5 6be0e1b7ee63313a057819ecea446e75
BLAKE2b-256 4bf5ab7434e62d76a437ac8d74d1fa207bf0ce829a655f426728997c58246a7f

File details

Details for the file gensim-3.4.0.win-amd64-py2.7.exe.

File hashes

Hashes for gensim-3.4.0.win-amd64-py2.7.exe
Algorithm Hash digest
SHA256 6f08251fd7daa07ef7e8e2f15fba403b089236ac2e61fcf51f5e6cec1815829c
MD5 65291bd93952678c930c1a5c511f57ed
BLAKE2b-256 41d9222a62e4a42aa6f004b2601ba38f6600d511a07567ce87f294e23ce60509

File details

Details for the file gensim-3.4.0.win32-py3.6.exe.

File hashes

Hashes for gensim-3.4.0.win32-py3.6.exe
Algorithm Hash digest
SHA256 d5dbfc6a10fcec9ef249569110b60ab1f374dad7e88e9a167ad87822371f3d59
MD5 c72fb4f66dcea689dfa3fbea4c4eab3b
BLAKE2b-256 a310d2dd8bc103f6ad03b6f0b88772c0e5c8cc4b1859eb247aef83c34317c6f7

File details

Details for the file gensim-3.4.0.win32-py3.5.exe.

File hashes

Hashes for gensim-3.4.0.win32-py3.5.exe
Algorithm Hash digest
SHA256 5f54fbb5bf14830a0fd2d5d3e554e8627667c3254d28ebd90256bd909b2f9de3
MD5 0037ecbebb6d288c94073b6adc8dfad5
BLAKE2b-256 7bad5f7332b7bf7425cb1491f248ea6945d63c968565857b02641b53c14dde38

File details

Details for the file gensim-3.4.0.win32-py2.7.exe.

File hashes

Hashes for gensim-3.4.0.win32-py2.7.exe
Algorithm Hash digest
SHA256 36dfb678674abdb3390cec8760307a9d038e3196d9dc4f85ad4431723c10654e
MD5 bce132ba67add369dca29fe4edcf468e
BLAKE2b-256 f5b41fdff5b7af49806289b4162e0b792388f4e9e20cf18969e938fd4e6f0a69

File details

Details for the file gensim-3.4.0-cp36-cp36m-win_amd64.whl.

File hashes

Hashes for gensim-3.4.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 cd8e1c062dea18032ec19be21e934f4b151e112ef90e269fbe3fea406c319724
MD5 d59daefcb02eee06a129ed004b8de448
BLAKE2b-256 ebb5e74d478d9e89528cc869c52a6d794f5a7dc5452585e23ad24db513636dc1

File details

Details for the file gensim-3.4.0-cp36-cp36m-win32.whl.

File hashes

Hashes for gensim-3.4.0-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 1f734990fcd472a048ed38696d9f5eeda4266ced0f4f2d7bbba0c18c601e441a
MD5 acf932760bc1c584b53b68eb112c10c0
BLAKE2b-256 b12fafde2f965dc9db54302626cf838f5874a4d2058dd69052a3c31cca9c3b34

File details

Details for the file gensim-3.4.0-cp36-cp36m-manylinux1_x86_64.whl.

File hashes

Hashes for gensim-3.4.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 f6ef9459dc67cfe25976ac0d1c80b24613594c9db985310b6c2cb995955e2173
MD5 7a5efb1ef7e13ea77152df6957f484df
BLAKE2b-256 3333df6cb7acdcec5677ed130f4800f67509d24dbec74a03c329fcbf6b0864f0

File details

Details for the file gensim-3.4.0-cp36-cp36m-manylinux1_i686.whl.

File hashes

Hashes for gensim-3.4.0-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 ac562d4a02a8fe2f3c6021acd21ae027fb8d347c7dfc4872149f77ada60616c2
MD5 d5729b7d5a993b3a1cdd5d9a009ec4bd
BLAKE2b-256 061c15bdfb819cd5c74a8002a1d24d87b3a52c8a51d4ebdad86dfffa7110b986

File details

Details for the file gensim-3.4.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.4.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 7bafe3f2fd49738942ef04396cb1e50a38283fe02203e5d4c66588daa01fb87c
MD5 45420d13ceeda6b5823c960d76673eaa
BLAKE2b-256 82f2c2f2c87ed72483fce010fbfea1a3adbd168c0f0dafc878cbfb5a76381b03

File details

Details for the file gensim-3.4.0-cp35-cp35m-win_amd64.whl.

File hashes

Hashes for gensim-3.4.0-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 a7747ab74d50d9f573e42fe645ab76d61037bb14f256a141503903fb5d762ff4
MD5 4b01807d904eda259e7d89fcddd322b2
BLAKE2b-256 e8c8e2e6cb141aea53927aa1d554ab5919e202e61c2292df07c0c28d833dcf90

File details

Details for the file gensim-3.4.0-cp35-cp35m-win32.whl.

File hashes

Hashes for gensim-3.4.0-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 2418a7036a15a7e11e08f5018d9278e0a4db420b53a99fe1c4e64f683df97b4b
MD5 82dd80d363f8f414d771c9b1bcba31fd
BLAKE2b-256 c89ce443878846a15f996d8116325416374dda73a98fb5d9a658b9506e29e0c3

File details

Details for the file gensim-3.4.0-cp35-cp35m-manylinux1_x86_64.whl.

File hashes

Hashes for gensim-3.4.0-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 f95ccb312977d2333c6dccd83644c4f89b471879ca7577e6e343f011d0d9ee16
MD5 41fe1b0bad5a5887d372e3ce2c1cc2a3
BLAKE2b-256 b8ed89f9fa9c3a290ebc454249df90891c804c3760cf054d54c5f701f2675122

File details

Details for the file gensim-3.4.0-cp35-cp35m-manylinux1_i686.whl.

File hashes

Hashes for gensim-3.4.0-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 e98b493addf7ff10f33322d57dbe642a9bbdac005c9f1f1adb9c613a1b189a0f
MD5 b599b94b23069cb2d5a9219910fd1375
BLAKE2b-256 9192d51e8053d0a519b5b50a41287cebe74e7dc4b1ca57452395197d1995508a

File details

Details for the file gensim-3.4.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.4.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 f27e7043656d3b7fdc8a6bf006e8a03883d92ce62ccfbf97aa55ad40fa017cba
MD5 0985c1e962b3e641ae7d9376fcd11049
BLAKE2b-256 3a82614507b02e49208759bd416af791c7992d3eda1de1b963f781cfa847eb66

File details

Details for the file gensim-3.4.0-cp27-cp27mu-manylinux1_x86_64.whl.

File hashes

Hashes for gensim-3.4.0-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 87c5975a605b881eda15ebc0cd7f60011b257d2af1cb04053dc096e2d2fa42da
MD5 4145f03c9057fd525c30a5b7ea683d94
BLAKE2b-256 c357dc00a059b1b739c71dd25355541ebe141ce1ba31917671c826c5fcdfd145

File details

Details for the file gensim-3.4.0-cp27-cp27mu-manylinux1_i686.whl.

File hashes

Hashes for gensim-3.4.0-cp27-cp27mu-manylinux1_i686.whl
Algorithm Hash digest
SHA256 47e436a06dbe6b56dd253df2e36654a24e39c6d65f78aff93efab478fac0b906
MD5 3681e2605ee7837ca1ef32a89fb38d5f
BLAKE2b-256 dae2c5b9eed2584a531cffd2bea1704ec5645b6ff762f936f98700d4669d12e9

File details

Details for the file gensim-3.4.0-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for gensim-3.4.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 0f75a6e57ec93b9bc63b79d1cc8022993efa03b03ad50adad7a4fef78081e845
MD5 24a2e40c34c5331402bb2e6c5303fe9d
BLAKE2b-256 0ac3ea7223fd9b04f0183c6cc995435289e365024a2679eb997d7fa63b90258b

File details

Details for the file gensim-3.4.0-cp27-cp27m-win32.whl.

File hashes

Hashes for gensim-3.4.0-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 cdc71e6d0d1f983366d2f81bdb363cc3d3018b9cedb0d7cae1fdf76b353aa1fd
MD5 5790a87aceae4b4d203371f23d4e821e
BLAKE2b-256 b96003af3884532064a7b3b96dd040e612c7394045fdcffad758fb0332d0419b

File details

Details for the file gensim-3.4.0-cp27-cp27m-manylinux1_x86_64.whl.

File hashes

Hashes for gensim-3.4.0-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 8c3c9ed2c61c0b4fbd1ee1af45c3d9f9828213d4e8010bd40435b80ea25e7c95
MD5 0796f4b67a2dee606a8fc3c34c5eddc5
BLAKE2b-256 2e758dd6e923d4fef4fcd2c1c3056b7667c63cbdd958307db0db4c16f6d34cec

File details

Details for the file gensim-3.4.0-cp27-cp27m-manylinux1_i686.whl.

File hashes

Hashes for gensim-3.4.0-cp27-cp27m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 645e77e37c5cd84b4e32b35de7a3c5b82e551f8f36e19087bfc4db6a97ce4838
MD5 10dde9aade30ab50ee9502dd5bb7d7b7
BLAKE2b-256 b73ba570daabe95e39868032a9cac3cd5b3827686e80313540a8e6e6a200c8e1

File details

Details for the file gensim-3.4.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.4.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 a1b0cd76770cb4b1378b42b5c5f2013f82bcdaddbce404fb5454881644ff6072
MD5 b857dcdb601c34460b41c6573eb1fad1
BLAKE2b-256 48ad55c6a9cc78a73419be1f93ad4ad327d13263939b9e25cb5881cb138c7eaf
