
Python framework for fast Vector Space Modelling

Project description


Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core)

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (simple streaming API)

    • easy to extend with other Vector Space algorithms (simple transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don’t need to do anything special.

Install the latest version of gensim:

pip install --upgrade gensim

Or, if you have instead downloaded and unzipped the source tar.gz package:

python setup.py install

For alternative modes of installation, see the documentation.

Gensim is being continuously tested under Python 3.6, 3.7 and 3.8. Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
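The streaming idea can be illustrated without gensim itself. The following is a minimal, standard-library-only sketch, not gensim's implementation: `StreamedCorpus` and `token_counts` are hypothetical names, and the one-document-per-line file layout is an assumption made for the example. The corpus object yields one tokenized document at a time, so memory use is bounded by the longest document rather than by the corpus size.

```python
import os
import tempfile
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus: yields one tokenized document per line of a text file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-open the file on every pass, so the corpus can be iterated
        # repeatedly while only one line is held in memory at a time.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                yield line.lower().split()


def token_counts(corpus):
    """Count tokens in a single streamed pass over the corpus."""
    counts = Counter()
    for document in corpus:
        counts.update(document)
    return counts


# Tiny demonstration file; a real corpus could be far larger than RAM.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("human machine interface\n")
    tmp.write("graph of trees\n")
    tmp.write("graph minors survey\n")
    demo_path = tmp.name

corpus = StreamedCorpus(demo_path)
counts = token_counts(corpus)
print(counts["graph"])
os.remove(demo_path)
```

The sketch uses a class with `__iter__` rather than a bare generator so that the same corpus can be traversed more than once, which matters for any algorithm that needs several passes over the data.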

Documentation

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek



This version

4.0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gensim-4.0.1.tar.gz (23.1 MB)

Uploaded: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

gensim-4.0.1-cp38-cp38-win_amd64.whl (23.9 MB)

Uploaded: CPython 3.8, Windows x86-64

gensim-4.0.1-cp38-cp38-manylinux1_x86_64.whl (23.9 MB)

Uploaded: CPython 3.8

gensim-4.0.1-cp38-cp38-macosx_10_9_x86_64.whl (23.9 MB)

Uploaded: CPython 3.8, macOS 10.9+ x86-64

gensim-4.0.1-cp37-cp37m-win_amd64.whl (23.9 MB)

Uploaded: CPython 3.7m, Windows x86-64

gensim-4.0.1-cp37-cp37m-manylinux1_x86_64.whl (23.9 MB)

Uploaded: CPython 3.7m

gensim-4.0.1-cp37-cp37m-macosx_10_9_x86_64.whl (23.9 MB)

Uploaded: CPython 3.7m, macOS 10.9+ x86-64

gensim-4.0.1-cp36-cp36m-win_amd64.whl (23.9 MB)

Uploaded: CPython 3.6m, Windows x86-64

gensim-4.0.1-cp36-cp36m-manylinux1_x86_64.whl (23.9 MB)

Uploaded: CPython 3.6m

gensim-4.0.1-cp36-cp36m-macosx_10_9_x86_64.whl (23.9 MB)

Uploaded: CPython 3.6m, macOS 10.9+ x86-64

File details

Details for the file gensim-4.0.1.tar.gz.

File metadata

  • Download URL: gensim-4.0.1.tar.gz
  • Upload date:
  • Size: 23.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1.tar.gz
Algorithm Hash digest
SHA256 b4d0b9562796968684028e06635e0f7aff39ffb33719057fd1667754ea09a6e4
MD5 cca4569aa1a16d41abca448f4ae79ba8
BLAKE2b-256 1f6c363d00aa23642f42b27b908c6474ab981c75882eefc084210d5b8ce8cd8e

See more details on using hashes here.
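A downloaded file can be checked against the digests listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives never need to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published value, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for gensim-4.0.1.tar.gz.
EXPECTED_SHA256 = "b4d0b9562796968684028e06635e0f7aff39ffb33719057fd1667754ea09a6e4"
```

Calling `matches_published_hash("gensim-4.0.1.tar.gz", EXPECTED_SHA256)` on a local copy of the archive should return `True` only if the file's contents match the published digest.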

File details

Details for the file gensim-4.0.1-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 c748b77082125ada8ccbf7b1de9e60b7c726b5371a17f207234fee66fb32da05
MD5 19a2dd155d0304d3d3b84f3511573bae
BLAKE2b-256 63d0d46407c21c8f22ed2a654e0b9cdb7d6db803f03cfe92a2b696a633570b22


File details

Details for the file gensim-4.0.1-cp38-cp38-manylinux1_x86_64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp38-cp38-manylinux1_x86_64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp38-cp38-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 5c0cb64d8c6df7e61497cc9d23fdbf552eb18d6d39341271b647d2e84d970e00
MD5 d1b4ee31e100c76d98685b71831ce1f8
BLAKE2b-256 cf2a32fc28ec0f0b58589cb899cb8617d2a124bd31305e8dfd77045e952f0636


File details

Details for the file gensim-4.0.1-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp38-cp38-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.8, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 a78e9f51ec87930a7a621c6536c20aac41d35aea52f6b0eae165254dc908ff1b
MD5 c0e35a4f652ae75852bcffc1b8fd5f82
BLAKE2b-256 95020ab2d25a8d65e04f7beff899c401bea3ffc45f9f0d3cd488e8c76aa255c1


File details

Details for the file gensim-4.0.1-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 3691ddda33113a713453a9601ae2b21d297a5c197073eef641ae15d6968a92c1
MD5 0ee188585b02b22bce93ae96a92dfcd3
BLAKE2b-256 1589b785d557e3c806abc8beaae664571d71e8c4eb736a2c32b69aba9932cbd1


File details

Details for the file gensim-4.0.1-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 f1200f3cc6449fc4d90c052911cee8876178adc4afbf756a77207b8ee5fb3967
MD5 ff9891f95e4fc0bd996d922ef990ec8c
BLAKE2b-256 4452f1417772965652d4ca6f901515debcd9d6c5430969e8c02ee7737e6de61c


File details

Details for the file gensim-4.0.1-cp37-cp37m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp37-cp37m-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.7m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 ac146eadb15c197eed99280d91ddaed44f3bac099cb32f790476922b203deefd
MD5 6be029863bbe675ef85dda12f79817cd
BLAKE2b-256 5374b43358520b6bc2f5175bd648eba27f78808ed1f5d0854f38c8c17d3261b6


File details

Details for the file gensim-4.0.1-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 fd23b1e19bf0cfaf37ce635b5a91d8ae4e3be7886e04bfb57e61d044995c6ed8
MD5 8a3447bae3a79a1f78003f0b2bfe6952
BLAKE2b-256 1d9a071a3172bc3383e3132b8fd009d82bc96d4718043f2b654bbb2c65640094


File details

Details for the file gensim-4.0.1-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 4cb1761a76338abd6c9345cfacdb75173961e238e01258e0c0e599341d397016
MD5 567a6ad53754f20e02394507021ff47c
BLAKE2b-256 1692787d4c9050d4669f9103d37081b34b06c277b3997f440e53a80ef8128082


File details

Details for the file gensim-4.0.1-cp36-cp36m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: gensim-4.0.1-cp36-cp36m-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 23.9 MB
  • Tags: CPython 3.6m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.7

File hashes

Hashes for gensim-4.0.1-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 17375b33685d0b36646ff6cec5c9e54c7384d3a7c583cbb4480a8ceb9d811221
MD5 d19c1c2aa75f36c45601e8de256b26f9
BLAKE2b-256 eb5a1574985e83d270b26b29a259675829c6ad5b966cdf7093187d2850cec0a3

