Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core)

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (simple streaming API)

    • easy to extend with other Vector Space algorithms (simple transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
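
To make the streaming and transformation bullets above more concrete, here is a minimal pure-Python sketch of the idea. It is not gensim's own code. It assumes documents are sparse bag-of-words vectors, i.e. lists of (token_id, count) pairs, and that a transformation is an object trained on a corpus and then applied with square brackets. The class name `TfIdfSketch` and its internals are hypothetical.

```python
import math
from collections import Counter


class TfIdfSketch:
    """Toy TF-IDF transformation (hypothetical; for illustration only)."""

    def __init__(self, corpus):
        # One pass over the corpus: count the documents and the document
        # frequency of every token id. Only these counters are retained.
        self.num_docs = 0
        self.doc_freq = Counter()
        for bow in corpus:
            self.num_docs += 1
            self.doc_freq.update(token_id for token_id, _ in bow)

    def __getitem__(self, bow):
        # Weight each count by log2(N / df); ids never seen in training are dropped.
        return [
            (token_id, count * math.log2(self.num_docs / self.doc_freq[token_id]))
            for token_id, count in bow
            if self.doc_freq[token_id] > 0
        ]


corpus = [[(0, 1), (1, 2)], [(1, 1), (2, 1)]]  # any iterable of documents works
tfidf = TfIdfSketch(corpus)
weighted = tfidf[[(0, 1), (1, 2)]]
```

Because training needs only a single pass over `corpus`, the same class would work unchanged with a generator or any other iterable that yields one document at a time.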

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended that you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as MKL, ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On macOS, NumPy picks up its vecLib BLAS automatically, so you don’t need to do anything special.
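
One way to check both prerequisites before installing is sketched below. It uses `importlib.util.find_spec` from the standard library to see whether NumPy and SciPy are importable, and then calls `numpy.show_config()`, which reports the BLAS/LAPACK libraries NumPy was built against. The helper name `missing_dependencies` is an assumption made for this example.

```python
import importlib.util


def missing_dependencies(names=("numpy", "scipy")):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        import numpy

        # Prints the BLAS/LAPACK build information for this NumPy install.
        numpy.show_config()
```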

Install the latest version of gensim:

pip install --upgrade gensim

Or, if you have instead downloaded and unzipped the source tar.gz package:

python setup.py install

For alternative modes of installation, see the documentation.

Gensim is being continuously tested under all supported Python versions. Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries through its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
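
The generator-based streaming described above can be sketched in plain Python. This is an illustration under stated assumptions, not gensim's implementation: `StreamedCorpus` and its `open_source` argument are hypothetical names, and for brevity the bag-of-words keeps token strings rather than integer ids.

```python
import io
from collections import Counter


class StreamedCorpus:
    """Yield one bag-of-words document per line of a text source.

    Hypothetical helper: `open_source` is any zero-argument callable that
    returns a fresh iterable of lines.
    """

    def __init__(self, open_source):
        self.open_source = open_source

    def __iter__(self):
        # A new line iterator is opened on every pass, so the corpus can be
        # iterated repeatedly while processing only one line at a time.
        for line in self.open_source():
            yield sorted(Counter(line.lower().split()).items())


text = "human machine interface\nmachine learning machine\n"
corpus = StreamedCorpus(lambda: io.StringIO(text))
docs = list(corpus)  # materialized here only to inspect the result
```

Making the corpus an object with `__iter__`, rather than a one-shot generator, matters because training algorithms often need several passes over the same data.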

Documentation

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • gensim-4.1.2.tar.gz (23.2 MB): Source

Built Distributions

  • gensim-4.1.2-cp39-cp39-win_amd64.whl (24.0 MB): CPython 3.9, Windows x86-64
  • gensim-4.1.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • gensim-4.1.2-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.0 MB): CPython 3.9, manylinux: glibc 2.12+ x86-64
  • gensim-4.1.2-cp39-cp39-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.9, macOS 10.9+ x86-64
  • gensim-4.1.2-cp38-cp38-win_amd64.whl (24.0 MB): CPython 3.8, Windows x86-64
  • gensim-4.1.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • gensim-4.1.2-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.1 MB): CPython 3.8, manylinux: glibc 2.12+ x86-64
  • gensim-4.1.2-cp38-cp38-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.8, macOS 10.9+ x86-64
  • gensim-4.1.2-cp37-cp37m-win_amd64.whl (24.0 MB): CPython 3.7m, Windows x86-64
  • gensim-4.1.2-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.7m, manylinux: glibc 2.17+ ARM64
  • gensim-4.1.2-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.1 MB): CPython 3.7m, manylinux: glibc 2.12+ x86-64
  • gensim-4.1.2-cp37-cp37m-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.7m, macOS 10.9+ x86-64
  • gensim-4.1.2-cp36-cp36m-win_amd64.whl (24.0 MB): CPython 3.6m, Windows x86-64
  • gensim-4.1.2-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (24.0 MB): CPython 3.6m, manylinux: glibc 2.17+ ARM64
  • gensim-4.1.2-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (24.1 MB): CPython 3.6m, manylinux: glibc 2.12+ x86-64
  • gensim-4.1.2-cp36-cp36m-macosx_10_9_x86_64.whl (24.0 MB): CPython 3.6m, macOS 10.9+ x86-64

File details

Details for the file gensim-4.1.2.tar.gz.

File metadata

  • Download URL: gensim-4.1.2.tar.gz
  • Upload date:
  • Size: 23.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2.tar.gz
Algorithm Hash digest
SHA256 1932c257de4eccbb64cc40d46e8577a25f5f47b94b96019a969fb36150f11d15
MD5 cb32008b23b8b68586cc59f44838bb5f
BLAKE2b-256 4b6d22a9a2b934344fbf25ab2613543eeeb724ffd3ba7376e8fed88aabe885c8

See more details on using hashes here.
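
As a rough illustration of how such a digest can be used, the sketch below computes the SHA256 of a downloaded file with the standard-library `hashlib` module and compares it with a published value. The function names `sha256_of_file` and `matches_published_digest` are hypothetical, chosen for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, one would call `matches_published_digest("gensim-4.1.2.tar.gz", ...)` with the SHA256 value listed above, assuming the archive was saved under that name in the current directory.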

File details

Details for the file gensim-4.1.2-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: gensim-4.1.2-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 39139be83c3128e234216189a094f959ac2b052a808911b0b56d980d5f96981f
MD5 5de482380da955fbdc0b9769066eadbf
BLAKE2b-256 0926a1b10ea80b4536b6a5d8b20c5ebd979b5e12cb30a7905d56d3f278444c2e

File details

Details for the file gensim-4.1.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for gensim-4.1.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ea47999c7da97472fce8f0831a63e4089d85539c8e0cdb895f087aea1eed4a3b
MD5 a029a0c95b00aae17f756a03875c380d
BLAKE2b-256 0666e875156aca2edf0416a8739894dc97b05429ebfa4ada934774361fbf25c7

File details

Details for the file gensim-4.1.2-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.1.2-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 8c6a4b271f4d554fdf14b9cb34d4da6cde7084f7f581c5c6dd5fcac648db35be
MD5 b579d234fefe158d232c6a97b386b49e
BLAKE2b-256 61e8ddf62a31b4f97f543a38233047865d02be97c192f7f8d849bbf3353bc094

File details

Details for the file gensim-4.1.2-cp39-cp39-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: gensim-4.1.2-cp39-cp39-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.9, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 7bbc3d6c80c9fd97b89dfee2f44562b75542f72141f5fbacb91334597485f55c
MD5 27cefb584a19dcdb3f1d4cb099ce7d2d
BLAKE2b-256 48223861fdf9834f39b5fb52148dc402105ef9645c2fc63c85609e489360d5cf

File details

Details for the file gensim-4.1.2-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: gensim-4.1.2-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 66a9574f9f2bbf8fd8e6d7a120443793b96bfd4c153b41f266b6299aa3362de7
MD5 930ba8e313ba62f7f208404628f50315
BLAKE2b-256 9d38a80659e573171413e463bd36c14b6925dbbe73aeed0e4223bf8e0e058031

File details

Details for the file gensim-4.1.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for gensim-4.1.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d4b4ca5d1408e2d89e0ac45cd8a432abf747d5b62eea68e6dccacefa03d759c9
MD5 d9d7d4f014daa3b1b68b3211f9139c7a
BLAKE2b-256 64dff546228e18d5ac4cb2de4ed441d778162d598a4ac090deb6fd6cb254998f

File details

Details for the file gensim-4.1.2-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.1.2-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f6133b0f76d0c262231465936cded8920df88edf079df1e7bfe95f049ad8301e
MD5 db598ad0172d56b0951e4766f4bdb0e6
BLAKE2b-256 974fcd282d20e799011dec50c3da1cd645bd5d83c819c7f4ed718e916b3af127

File details

Details for the file gensim-4.1.2-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: gensim-4.1.2-cp38-cp38-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.8, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 58d9ab570b225f3aafec55286864560a25701f7446af9dbc0ad51aa5f61712fa
MD5 57fcbcaa6d8ad68d826f476ccd4a803e
BLAKE2b-256 f6fef2878be55b59c35470901bce5598dd2acf243e47ed9488b16ad16a678b46

File details

Details for the file gensim-4.1.2-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: gensim-4.1.2-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 36222dbf89aa57909131fc79654e92c918e1075b9ebd00532c3d23b76b6ce8eb
MD5 f8215a91e62b033049faf997277dd3a7
BLAKE2b-256 ef92325846f87c2ef1d6e5a6d54a55340877312b3afc4f23775448e3f97f3901

File details

Details for the file gensim-4.1.2-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for gensim-4.1.2-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 8d0bf4074ff467a0b22c5e4cecfb7d12afcca6246dac515d5a06ab7e4c775f8e
MD5 e6db140ead71d7c3afc6fb79adc0a33c
BLAKE2b-256 da46e4fb31929d8871adc90ed85266e8418666020c09bbeb60b8e5544edb1a7e

File details

Details for the file gensim-4.1.2-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.1.2-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 8bd89b791e6729a9dd1c345d32fc9e2ba51348cf54fbaa8d49259eb92e719084
MD5 806802472221d530467f2db2d2c1c179
BLAKE2b-256 9f44985c6291f160aca1257dae9b5bb62d91d0f61f12014297a2fa80e6464be1

File details

Details for the file gensim-4.1.2-cp37-cp37m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: gensim-4.1.2-cp37-cp37m-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.7m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 615d2a57efeaf97cd847e95f83b2fc168f9d22f4922aaa9cda9350f05648560c
MD5 ab848d1411135f2a9c1197655426036b
BLAKE2b-256 02e06c4123d6bf463160f9ef6d9ea4336c71b99cb0591d94b5cce719a7d7a80d

File details

Details for the file gensim-4.1.2-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: gensim-4.1.2-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 d812dcdf2bfaf527a09ecf867303c117d6f497233db08f1d8209ffb71aaf3fdb
MD5 5ec2628eb7645524384e0b720ba5033d
BLAKE2b-256 2dd5c0e043290cad2e6346740cf943151fb8f221081ac9da7e9c79cea9c1a2b5

File details

Details for the file gensim-4.1.2-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for gensim-4.1.2-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 804e18d76d9034bc70f93b8407680b7956c99f03914e85e31dd8b296623dc0ed
MD5 676b721dea3f7bba6532845433b7e058
BLAKE2b-256 75cd0a445917374b98150a1bf783500199de82d219201da9d165f9191b029f8c

File details

Details for the file gensim-4.1.2-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.1.2-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 3e34cfe767a8db52812826136d6e94863081fd1456726bd1ff40b4e25965fbb5
MD5 18da6c755eb2348dcc24f94ed7c51228
BLAKE2b-256 bab3668ace2f0517b7fb01f780f93a75cb0592754d6365d808d2adccb2a94b92

File details

Details for the file gensim-4.1.2-cp36-cp36m-macosx_10_9_x86_64.whl.

File metadata

  • Download URL: gensim-4.1.2-cp36-cp36m-macosx_10_9_x86_64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.6m, macOS 10.9+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.7

File hashes

Hashes for gensim-4.1.2-cp36-cp36m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1ff0171ec5b7473facb1441426a6b41e8ec4599fd62e1820868ab965804e3d4c
MD5 0531b55c689fd28cc0c62d850425a10a
BLAKE2b-256 ffc156a69df76a11808e0e953e6822aa2fa505676040e1adacab102db20e70a9
