Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core)

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (trivial streaming API)

    • easy to extend with other Vector Space algorithms (trivial transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.
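
To make the "trivial streaming API" point concrete, here is a minimal sketch in plain Python. It assumes the usual gensim corpus convention: any iterable that yields one sparse vector per document, as a list of (token_id, count) pairs. The class name StreamedCorpus and the tiny hand-written vocabulary are hypothetical and exist only for illustration.

```python
from collections import Counter


class StreamedCorpus:
    """Yield one bag-of-words vector per document, one document at a time."""

    def __init__(self, lines, vocabulary):
        self.lines = lines            # any re-iterable source of text lines
        self.vocabulary = vocabulary  # hypothetical token -> integer id mapping

    def __iter__(self):
        for line in self.lines:
            counts = Counter(
                self.vocabulary[token]
                for token in line.lower().split()
                if token in self.vocabulary
            )
            # Sparse vector: sorted list of (token_id, count) pairs.
            yield sorted(counts.items())


vocabulary = {"human": 0, "computer": 1, "interface": 2}
corpus = StreamedCorpus(
    ["Human computer interface", "computer computer graphics"],
    vocabulary,
)
vectors = list(corpus)
```

Only one document is held in memory at a time inside __iter__; the list() call at the end is there purely to inspect the result. In real use the source of lines would typically be a file or database cursor, and the object would be handed to a model as-is.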

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don’t need to do anything special.

The simple way to install gensim is:

pip install -U gensim

Or, if you have instead downloaded and unzipped the source tar.gz package, you’d run:

python setup.py test
python setup.py install

For alternative modes of installation (without root privileges, development installation, optional install features), see the install documentation.

This version has been tested under Python 2.7, 3.5 and 3.6. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0; install gensim 0.13.4 if you must use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you must use Python 2.5. Gensim’s GitHub repo is hooked up to Travis CI for automated testing on every commit push and pull request.
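
The compatibility notes above can be restated as a small lookup. This is only a sketch: the function name recommended_gensim is hypothetical, and "2.2.0" is assumed to be "this version" because that is the release this page describes.

```python
def recommended_gensim(python_version):
    """Map a (major, minor) Python version to the gensim release named above."""
    if python_version == (2, 5):
        return "0.9.1"
    if python_version in {(2, 6), (3, 3), (3, 4)}:
        return "0.13.4"
    if python_version in {(2, 7), (3, 5), (3, 6)}:
        return "2.2.0"  # assumption: the release described on this page
    return None  # not covered by the compatibility notes
```

For the running interpreter, call it as recommended_gensim(sys.version_info[:2]) after importing sys.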

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).
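
As a rough illustration of that point, the sketch below computes the same matrix product twice: once with explicit Python loops and once with a single NumPy call, which hands the work to compiled code (typically a BLAS routine, depending on how NumPy was built). The helper dot_pure_python is hypothetical and is not part of gensim.

```python
import numpy as np


def dot_pure_python(a, b):
    """Matrix product with explicit Python loops (slow reference version)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


a = np.arange(6, dtype=float).reshape(2, 3)
b = np.arange(12, dtype=float).reshape(3, 4)

fast = np.dot(a, b)  # one call; the loops run in compiled code
slow = dot_pure_python(a.tolist(), b.tolist())
```

Both results agree; the difference is where the inner loops execute. To see which BLAS your NumPy was built against, call np.show_config().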

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
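
The generator pattern described here can be sketched as follows. This is not gensim code; the names read_documents and document_frequencies are made up for illustration. The point is that a statistic over the whole corpus is computed in one pass while only a single document is in memory at any moment.

```python
def read_documents(lines):
    """Lazily tokenize one document at a time."""
    for line in lines:
        yield line.lower().split()


def document_frequencies(documents):
    """Count in how many documents each token occurs, in a single pass."""
    frequencies = {}
    for tokens in documents:
        for token in set(tokens):
            frequencies[token] = frequencies.get(token, 0) + 1
    return frequencies


# iter() stands in for a stream that can only be read once, such as a pipe.
lines = iter(["Human machine interface", "machine learning", "human learning machine"])
frequencies = document_frequencies(read_documents(lines))
```

Memory use depends on the vocabulary size, not on the number of documents, which is what lets this style of processing scale beyond RAM.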

Documentation

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {{Radim Rehurek and Petr Sojka}},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gensim-2.2.0.tar.gz (16.6 MB view details)

Uploaded Source

Built Distributions

gensim-2.2.0.win-amd64-py3.6.exe (6.9 MB view details)

Uploaded Source

gensim-2.2.0.win-amd64-py3.5.exe (6.9 MB view details)

Uploaded Source

gensim-2.2.0.win-amd64-py2.7.exe (6.5 MB view details)

Uploaded Source

gensim-2.2.0.win32-py3.6.exe (6.8 MB view details)

Uploaded Source

gensim-2.2.0.win32-py3.5.exe (6.8 MB view details)

Uploaded Source

gensim-2.2.0.win32-py2.7.exe (6.5 MB view details)

Uploaded Source

gensim-2.2.0-cp36-cp36m-win_amd64.whl (6.3 MB view details)

Uploaded CPython 3.6m Windows x86-64

gensim-2.2.0-cp36-cp36m-win32.whl (6.3 MB view details)

Uploaded CPython 3.6m Windows x86

gensim-2.2.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (6.4 MB view details)

Uploaded CPython 3.6m macOS 10.10+ Intel (x86-64, i386) macOS 10.10+ x86-64 macOS 10.6+ Intel (x86-64, i386) macOS 10.9+ Intel (x86-64, i386) macOS 10.9+ x86-64

gensim-2.2.0-cp35-cp35m-win_amd64.whl (6.3 MB view details)

Uploaded CPython 3.5m Windows x86-64

gensim-2.2.0-cp35-cp35m-win32.whl (6.3 MB view details)

Uploaded CPython 3.5m Windows x86

gensim-2.2.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (6.4 MB view details)

Uploaded CPython 3.5m macOS 10.10+ Intel (x86-64, i386) macOS 10.10+ x86-64 macOS 10.6+ Intel (x86-64, i386) macOS 10.9+ Intel (x86-64, i386) macOS 10.9+ x86-64

gensim-2.2.0-cp27-cp27m-win_amd64.whl (6.3 MB view details)

Uploaded CPython 2.7m Windows x86-64

gensim-2.2.0-cp27-cp27m-win32.whl (6.3 MB view details)

Uploaded CPython 2.7m Windows x86

gensim-2.2.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (6.4 MB view details)

Uploaded CPython 2.7m macOS 10.10+ Intel (x86-64, i386) macOS 10.10+ x86-64 macOS 10.6+ Intel (x86-64, i386) macOS 10.9+ Intel (x86-64, i386) macOS 10.9+ x86-64

File details

Details for the file gensim-2.2.0.tar.gz.

File metadata

  • Download URL: gensim-2.2.0.tar.gz
  • Upload date:
  • Size: 16.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-2.2.0.tar.gz
Algorithm Hash digest
SHA256 eb099de1e50447c42e168a1a99de4721923688afc71b12fe522f79687a4fbb13
MD5 5655142e7df7f33fe1f484df82c35d64
BLAKE2b-256 fe6681322796cb00b5ce9ad7e12a89f57879125c5c64539dbb84c413a3c518ea

See more details on using hashes here.
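
A downloaded file can be checked against a published digest with the standard library alone. The sketch below reads the file in chunks so that large archives need not fit in memory; the function names are hypothetical, and the expected value is the SHA256 listed above for gensim-2.2.0.tar.gz.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 for gensim-2.2.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "eb099de1e50447c42e168a1a99de4721923688afc71b12fe522f79687a4fbb13"


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected
```

Call matches_published_hash("gensim-2.2.0.tar.gz") with the path to your own download; a False result means the file differs from the one that was published.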

File details

Details for the file gensim-2.2.0.win-amd64-py3.6.exe.

File hashes

Hashes for gensim-2.2.0.win-amd64-py3.6.exe
Algorithm Hash digest
SHA256 70dd33066e28a210cbc65181f55a73d1148209aec92616d63cfa857fabc31e19
MD5 7817878687e7874265ddaf6fd378969e
BLAKE2b-256 9ec22f1996296909ddbfb3df317a9c1c740acb9d35f1270803a3cc3f7abe90a4

File details

Details for the file gensim-2.2.0.win-amd64-py3.5.exe.

File hashes

Hashes for gensim-2.2.0.win-amd64-py3.5.exe
Algorithm Hash digest
SHA256 ce6684399576e838919d86ae555b42fbf2ba4c3e836c70c06001bc4c35e22ff3
MD5 66471d032ea95a18f1567d6bebbcd44a
BLAKE2b-256 92712580fdace84b41803bd285493e84ada2c64faae0b67a403a958de30dca62

File details

Details for the file gensim-2.2.0.win-amd64-py2.7.exe.

File hashes

Hashes for gensim-2.2.0.win-amd64-py2.7.exe
Algorithm Hash digest
SHA256 c73d9349c0b287f5066f863538622428a18381b2d86616f9e49dc9a8fdff1276
MD5 059f816b6e2c26a49f782b12276d62f5
BLAKE2b-256 5082fde27629a8a58c8e60dd48fccec69f5bc2e2268cabbf46e8437c4dfdf68d

File details

Details for the file gensim-2.2.0.win32-py3.6.exe.

File hashes

Hashes for gensim-2.2.0.win32-py3.6.exe
Algorithm Hash digest
SHA256 734149f566a6d01cf6db106192f104ad9ae9fd9a93d604608471f78275ab94d0
MD5 58ffd811397d6ba86468dd55371ef877
BLAKE2b-256 94896889ec3816fd5fd99b4c37f3b0a7f1dd68a72b0365a5ee565a9e860fe6b2

File details

Details for the file gensim-2.2.0.win32-py3.5.exe.

File hashes

Hashes for gensim-2.2.0.win32-py3.5.exe
Algorithm Hash digest
SHA256 5505640401cc672b6faf0497347ae49cc992362d9d682a649af8498c81d23e71
MD5 c81ea36c16605c7f529a1c0889a26000
BLAKE2b-256 6f0eb7d5279f69fe0b0acf05dcdbad370ca22bc1ee237b0bae4d1192405ed34c

File details

Details for the file gensim-2.2.0.win32-py2.7.exe.

File hashes

Hashes for gensim-2.2.0.win32-py2.7.exe
Algorithm Hash digest
SHA256 d6ae253d5cdc00f4ef8d20490c43d853dbf663c366c6bf1ea633090b6ea979d2
MD5 32e92d405cb98296c86f9dd002db9ed9
BLAKE2b-256 46bfc76baf1b34120a11f89b846aa1fd75cb115c656e0a280366d7f910529bf7

File details

Details for the file gensim-2.2.0-cp36-cp36m-win_amd64.whl.

File hashes

Hashes for gensim-2.2.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 eb7738c0c43283160039e74ad3ae9dc50f5117853088e165836d21a6714d7861
MD5 0f57ccc7be46740992e39483cc4a09b7
BLAKE2b-256 3ef3d5557ae01fa5eebb8a0c22070005cceaad206a237780d1be60004a8d3736

File details

Details for the file gensim-2.2.0-cp36-cp36m-win32.whl.

File hashes

Hashes for gensim-2.2.0-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 a73c1ba6e2f41c4735d3b7def7279295f219db52f0a4079eded5918313fa7660
MD5 103ea58e1986e716e5a0d269e6fe7750
BLAKE2b-256 5a7a65aa1459926828b4b46a643ba846c84937ec0ed847dcef5e0d0362db0de4

File details

Details for the file gensim-2.2.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-2.2.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 3052f3c85bb263c35bb839839591de7e955b39e772d58773a595f28bd44f30ab
MD5 998dd48d8b1a31f3b8befe3ca91d4da0
BLAKE2b-256 7594b5125b2de45ec2322467571f7016eab290a22ce8e17e124e960e654812c3

File details

Details for the file gensim-2.2.0-cp35-cp35m-win_amd64.whl.

File hashes

Hashes for gensim-2.2.0-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 756b0df5815b77cedb636d233f74c4e86aab230cf2cd0ad2368d55180a07ba7c
MD5 4e1a402303fb2278de248a7bbebc253a
BLAKE2b-256 f283b1e1d7e85e38a02862fb238a7658850509bbb79ced3d404a870388d2454a

File details

Details for the file gensim-2.2.0-cp35-cp35m-win32.whl.

File hashes

Hashes for gensim-2.2.0-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 150223be99066cacca75abb5a5436df3dd45d5514e157e058e1a4fbc1649ac31
MD5 ba65bcdea513a153ef006688e6fa77fd
BLAKE2b-256 ade1654584ab31fa8b6e952b64b37fdde15d47eb743ef74674d4a9f37ff4b614

File details

Details for the file gensim-2.2.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-2.2.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 fe67236fa7054849c7e9e55a2482abf63a5e86b6a136388dda67825c20bffac2
MD5 dddee0da38009c9faaab6b56fe0140e4
BLAKE2b-256 7d47e7165365abfcbea2da9295307ec966069be77f37fa35daf55001a642cc1c

File details

Details for the file gensim-2.2.0-cp27-cp27m-win_amd64.whl.

File hashes

Hashes for gensim-2.2.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 7686ecf30ac6ebc70de89baa63c664a6fadfef8675b9bb8470d25ed37a71672d
MD5 acbe33f16b35cd43bc91927d672900cd
BLAKE2b-256 9c08ba13c24a757efec44b2d5c259c8eec7b0834307430fceb687a0ebe7e1795

File details

Details for the file gensim-2.2.0-cp27-cp27m-win32.whl.

File hashes

Hashes for gensim-2.2.0-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 494ff0bebe7e5ac1965aa1d1166b950feafb60d48b310db0b8226cd5bb7deeeb
MD5 2ac2e55b5bbfe760286dd4cb6153000c
BLAKE2b-256 bd074683df6d88a2e8daf474aec4be641fbfcded955545f49a8a2e969f91593f

File details

Details for the file gensim-2.2.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-2.2.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 d183c7ea23868258c0e7360291b3fd84370365c2c4bff021247576d6a97f0b0f
MD5 735d26cf4cbc592f2d4867d890afea5f
BLAKE2b-256 0bf92920578457901bda2edb0254a2fd5fcfb5ff7360c5631abe18b2eb24d290
