Python framework for fast Vector Space Modelling

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core)

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (simple streaming API)

    • easy to extend with other Vector Space algorithms (simple transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) and word2vec.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
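To make the "simple streaming API" item more concrete, here is a minimal sketch in plain Python of what a streamed corpus can look like. It assumes the convention that a corpus is any iterable yielding one document at a time as a sparse list of (token_id, count) pairs. The class name StreamedCorpus, the whitespace tokenizer and the sample lines are illustrative assumptions, not part of gensim.

```python
from collections import Counter


class StreamedCorpus:
    """Iterable that yields one bag-of-words document at a time.

    Hypothetical helper for illustration; not a gensim class.
    """

    def __init__(self, lines, token2id):
        self.lines = lines        # any re-iterable source of text lines
        self.token2id = token2id  # fixed vocabulary: token -> integer id

    def __iter__(self):
        for line in self.lines:
            # Count only tokens that are in the vocabulary.
            counts = Counter(
                self.token2id[tok]
                for tok in line.lower().split()
                if tok in self.token2id
            )
            # Sparse vector: list of (token_id, count) pairs sorted by id.
            yield sorted(counts.items())


token2id = {"human": 0, "computer": 1, "interface": 2}
lines = ["Human computer interface", "Computer computer survey"]
corpus = StreamedCorpus(lines, token2id)
docs = list(corpus)  # materialized here only to inspect the result
```

Because `__iter__` starts over from the source on every call, the corpus can be iterated repeatedly (which multi-pass algorithms need) while only one document is held in memory at a time.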

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

We also recommend installing a fast BLAS library before installing NumPy. This is optional, but an optimized BLAS such as MKL, ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On macOS, NumPy picks up its vecLib BLAS automatically, so you don’t need to do anything special.
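One way to check whether the dependencies mentioned above are already present is to query the installed package metadata. The sketch below assumes Python 3.8 or newer (where `importlib.metadata` is in the standard library); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the state of gensim and the two packages it depends on.
for name in ("numpy", "scipy", "gensim"):
    print(name, installed_version(name) or "not installed")
```

This only reports what is installed; it does not tell you which BLAS library NumPy was built against.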

Install the latest version of gensim:

pip install --upgrade gensim

Or, if you have instead downloaded and unzipped the source tar.gz package:

python setup.py install

For alternative modes of installation, see the documentation.

Gensim is continuously tested under all supported Python versions. Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
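The generator-based streaming described above can be sketched in plain Python. This illustrates the general technique only, not gensim's internal code; `read_documents` and `document_frequencies` are hypothetical names, and the synthetic documents stand in for data read from disk.

```python
def read_documents(n_docs):
    """Generator: produce documents one by one instead of building a list."""
    for i in range(n_docs):
        yield ["token%d" % (i % 3), "common"]


def document_frequencies(docs):
    """Make a single pass over a document stream.

    Only the running counts are kept in memory, never the documents.
    """
    df = {}
    for doc in docs:
        for tok in set(doc):
            df[tok] = df.get(tok, 0) + 1
    return df


df = document_frequencies(read_documents(1000))
```

Memory use here depends on the vocabulary size (the keys of `df`), not on the number of documents, which is the property that lets such code process input larger than RAM.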

Documentation

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek
