Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) communities.

Features

  • All algorithms are memory-independent with respect to corpus size (they can process input larger than RAM, streamed, out-of-core)

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (simple streaming API)

    • easy to extend with other Vector Space algorithms (simple transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.
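
The two interface bullets above can be illustrated without gensim itself. The sketch below is plain Python and assumes only the convention that a corpus is any iterable yielding one document at a time as a sparse list of (token_id, weight) pairs, and that a transformation maps one such vector to another. The names TinyCorpus and BinaryWeights are hypothetical and are not part of gensim's API.

```python
from collections import Counter


class TinyCorpus:
    """Hypothetical streamed corpus: yields one bag-of-words vector per document."""

    def __init__(self, texts):
        self.texts = texts      # any iterable of strings (could be a file object)
        self.token2id = {}      # vocabulary grows as documents stream past

    def __iter__(self):
        for text in self.texts:
            counts = Counter(text.lower().split())
            vector = []
            for token, count in counts.items():
                # Assign the next free integer id to tokens seen for the first time.
                token_id = self.token2id.setdefault(token, len(self.token2id))
                vector.append((token_id, count))
            yield sorted(vector)


class BinaryWeights:
    """Hypothetical toy transformation: replaces every count by 1.0."""

    def __getitem__(self, bow):
        return [(token_id, 1.0) for token_id, _ in bow]

    def apply(self, corpus):
        # Lazy: documents are transformed one by one, only when consumed.
        return (self[bow] for bow in corpus)


corpus = TinyCorpus(["human machine interface", "machine machine learning"])
model = BinaryWeights()
vectors = list(model.apply(corpus))
print(vectors)
```

Because both the corpus and the transformation work one document at a time, nothing in this sketch requires the whole collection to be held in memory.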

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended that you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as MKL, ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On macOS, NumPy picks up its vecLib BLAS automatically, so you don’t need to do anything special.

Install the latest version of gensim:

pip install --upgrade gensim

Or, if you have instead downloaded and unzipped the source tar.gz package:

python setup.py install

For alternative modes of installation, see the documentation.
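
After installing, it can be useful to confirm which versions of gensim and its two dependencies are visible to the current interpreter. The snippet below is a minimal sketch that uses only the standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("numpy", "scipy", "gensim"):
    print(name, installed_version(name) or "not installed")
```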

Gensim is continuously tested under all supported Python versions. Support for Python 2.7 was dropped in gensim 4.0.0; install gensim 3.8.3 if you must use Python 2.7.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
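
One practical consequence of the generator-based approach is sketched below in plain Python. A generator reads one document at a time, but it is exhausted after a single pass; wrapping it in an object with an __iter__ method gives a stream that can be restarted for algorithms that need several passes. The names read_documents and RestartableDocuments, and the file corpus.txt, are assumptions made for this illustration.

```python
import os
import tempfile


def read_documents(path):
    """Generator: yields one tokenised document (one line) at a time."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line.split()


class RestartableDocuments:
    """Iterable wrapper: every iteration starts a fresh pass over the file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        return read_documents(self.path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("graph minors survey\nhuman computer interaction\n")

    one_shot = read_documents(path)
    first_pass = sum(1 for _ in one_shot)   # consumes the generator
    second_pass = sum(1 for _ in one_shot)  # nothing left to read

    docs = RestartableDocuments(path)
    pass_a = sum(1 for _ in docs)
    pass_b = sum(1 for _ in docs)           # a new generator is created each time

print(first_pass, second_pass, pass_a, pass_b)
```

In both cases only the current line is held in memory, so the size of the file does not affect the memory footprint.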

Documentation

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • gensim-4.3.2.tar.gz (23.3 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.
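
As a rough guide to the file names listed in this section, a wheel name has the form name-version-python tag-ABI tag-platform tag.whl, optionally with a build tag after the version, and a platform tag may bundle several platforms separated by dots. The sketch below splits a name along those lines; parse_wheel_filename is a hypothetical helper written for this illustration, not a function of pip or gensim, and it does not validate the individual tags.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its components (no validation of the tags)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "gensim-4.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["python"], info["platforms"])
```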

  • gensim-4.3.2-cp311-cp311-win_amd64.whl (24.0 MB): CPython 3.11, Windows x86-64
  • gensim-4.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • gensim-4.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (26.6 MB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • gensim-4.3.2-cp311-cp311-macosx_11_0_arm64.whl (24.0 MB): CPython 3.11, macOS 11.0+ ARM64
  • gensim-4.3.2-cp311-cp311-macosx_10_9_x86_64.whl (24.1 MB): CPython 3.11, macOS 10.9+ x86-64
  • gensim-4.3.2-cp310-cp310-win_amd64.whl (24.0 MB): CPython 3.10, Windows x86-64
  • gensim-4.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • gensim-4.3.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (26.4 MB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • gensim-4.3.2-cp310-cp310-macosx_11_0_arm64.whl (24.0 MB): CPython 3.10, macOS 11.0+ ARM64
  • gensim-4.3.2-cp310-cp310-macosx_10_9_x86_64.whl (24.1 MB): CPython 3.10, macOS 10.9+ x86-64
  • gensim-4.3.2-cp39-cp39-win_amd64.whl (24.0 MB): CPython 3.9, Windows x86-64
  • gensim-4.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.6 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • gensim-4.3.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (26.5 MB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • gensim-4.3.2-cp39-cp39-macosx_11_0_arm64.whl (24.0 MB): CPython 3.9, macOS 11.0+ ARM64
  • gensim-4.3.2-cp39-cp39-macosx_10_9_x86_64.whl (24.1 MB): CPython 3.9, macOS 10.9+ x86-64
  • gensim-4.3.2-cp38-cp38-win_amd64.whl (24.0 MB): CPython 3.8, Windows x86-64
  • gensim-4.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.6 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • gensim-4.3.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (26.5 MB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • gensim-4.3.2-cp38-cp38-macosx_11_0_arm64.whl (24.0 MB): CPython 3.8, macOS 11.0+ ARM64
  • gensim-4.3.2-cp38-cp38-macosx_10_9_x86_64.whl (24.1 MB): CPython 3.8, macOS 10.9+ x86-64

File details

Details for the file gensim-4.3.2.tar.gz.

File metadata

  • Download URL: gensim-4.3.2.tar.gz
  • Upload date:
  • Size: 23.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for gensim-4.3.2.tar.gz
Algorithm Hash digest
SHA256 99ac6af6ffd40682e70155ed9f92ecbf4384d59fb50af120d343ea5ee1b308ab
MD5 d0f9e2d9db9e4a5316eb5e5b08169b03
BLAKE2b-256 7768074333a52f6fa82402332054ca0dfa721f7dcfa7eace313f64cdb44bacde

See more details on using hashes here.
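
A downloaded file can be checked against the SHA256 digest published for it in this section. The sketch below uses only the standard library hashlib module; the helper names sha256_of_file and matches_published_hash are hypothetical, and the file is read in chunks so that even a large archive is never loaded into memory at once.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call, assuming gensim-4.3.2.tar.gz sits in the current directory:
# matches_published_hash(
#     "gensim-4.3.2.tar.gz",
#     "99ac6af6ffd40682e70155ed9f92ecbf4384d59fb50af120d343ea5ee1b308ab",
# )
```

pip can perform an equivalent check itself when a requirements file pins hashes, which is usually the more convenient route for automated installs.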

File details

Details for the file gensim-4.3.2-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: gensim-4.3.2-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for gensim-4.3.2-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 5a52001226f9e89f7833503f99c9b4fd028fdf837002f24cdc1bc3cf901a4003
MD5 0e5815d967b71e027d938a0a679dc974
BLAKE2b-256 ad97b8253236dfedb9094f4273393a3fd03997da81f27f15822e56128da894ae

File details

Details for the file gensim-4.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8c3b537c1fd4699c8e6d59c3ffa2fdd9918cd4e5555bf5ee7c1fbedd89b2d643
MD5 69045defa3e8eb592ef1e41d0c0e0f40
BLAKE2b-256 22407d2cce3ad4ad5d02aa68e253e6ea5f0acc381f02f594e235fe00a274faff

File details

Details for the file gensim-4.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 e3e8035ac3f54dca3a8ca56bec526ddfe5b23006e0134b7375ca5f5dbfaef70a
MD5 fabf0319117fd28954036baf307f5693
BLAKE2b-256 1434f1e056feda95330f7d8beef6771e3441ce0e8e2d1f55bf754b0b0594b234

File details

Details for the file gensim-4.3.2-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8dcd1419266bd563c371d25530f4dce3505fe78059b2c0c08724e4f9e5479b38
MD5 02a7e3140eb190ee7c47b34ee5ea396e
BLAKE2b-256 63465feab9c524a380bfa9f9f1c0d065743280dca30b216ab4c7a231f22dbed7

File details

Details for the file gensim-4.3.2-cp311-cp311-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 a919493339cfad39d5e76768c1bc546cd507f715c5fca93165cc174a97657457
MD5 3f6287458253970495b4e12db29ad975
BLAKE2b-256 99f58d2cb0b2628bb6482baafbf0ff7262c11fc46e98b23ee79234828b927e8d

File details

Details for the file gensim-4.3.2-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: gensim-4.3.2-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for gensim-4.3.2-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 c46b7395dc57c83329932f3febed9660891fdcc75327d56f55000e3e08898983
MD5 dc2005a9ed07db113a273801b31380e0
BLAKE2b-256 abb0d58dc405fd60ab546ca714321235dc2d455b2dc06bfb4fc1092940c749fc

File details

Details for the file gensim-4.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5e34ee6f8a318fbf0b65e6d39a985ecf9e9051febfd1221ae6255fff1972c547
MD5 909bacab79f626aa748b47f4595e523a
BLAKE2b-256 e8d9104988573fd2c1acdc64e66883b35fb8ae559310d2d9f77db78bf7de9add

File details

Details for the file gensim-4.3.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 a9bf1a8ee2e8214499c517008a0fd175ce5c649954a88569358cfae6bfca42dc
MD5 47abb0993815c91e236d5fbc7d01fbb3
BLAKE2b-256 5984ed371ab548a02e16f83669dee5337ad3917d3bf980878608956817e0534b

File details

Details for the file gensim-4.3.2-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 67c41b15e19e4950f57124f633c45839b5c84268ffa58079c5b0c0f04d2a9cb9
MD5 d58318a6b1baee3879226cf475c468a2
BLAKE2b-256 d784ec2f74713475b57b7d882f5160b0b7c9df40ecc409d793e87656a6f3ada0

File details

Details for the file gensim-4.3.2-cp310-cp310-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 31b3cb313939b6940ee21660177f6405e71b920da462dbf065b2458a24ab33e1
MD5 5c94cd6fb63c36d7d5178249eec494cd
BLAKE2b-256 432121f993356303c4c92352d9e7c732f715e86e7b0bc04674be71bb1e9bb05b

File details

Details for the file gensim-4.3.2-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: gensim-4.3.2-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for gensim-4.3.2-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 b3f26299ac241ff54329a54c37c22eac1bf4c4a337068adf2637259ee0d8484a
MD5 befd99194f04d2d63e958f4e4a08468f
BLAKE2b-256 d368373da90a8b241e2603707c7aa4c8f47829a72729c6e9497f2bc604fa6a6a

File details

Details for the file gensim-4.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4715eafcd309c2f7e030829eddba72fe47bbe9bb466811fce3158127d29c8979
MD5 61b5ec62663eda767094527d26d958a2
BLAKE2b-256 7befd559c7daebb2f00b881575551b23866ebcbf6eeaf33393d692c7f46d0983

File details

Details for the file gensim-4.3.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 226690ea081b92a2289661a25e8a89069ae09b1ed4137b67a0d6ec211e0371d3
MD5 aff89d52f2d452df894e5dc8d8ce5386
BLAKE2b-256 34a934fa729b0700361d35a3640b95395ead6c101ec290c0fe6c53d473c03ccc

File details

Details for the file gensim-4.3.2-cp39-cp39-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 548c7bf983e619d6b8d78b6a5321dcbcba5b39f68779a0d36e38a5a971416276
MD5 d3318a5415c01d001992dca8b743cdc4
BLAKE2b-256 dbaf18b551ae8d26b8731dbe5923565fdf96502bb9aca88a37f241d510c62dc2

File details

Details for the file gensim-4.3.2-cp39-cp39-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 c86915cf0e0b86658a40a070bd7e04db0814065963657e92910303070275865d
MD5 4b5d03630674dd92df0ed3fb9f21cff1
BLAKE2b-256 04590a073bcf0873f64f3c6e82a11c8fa90cd5564cb3e21dc6077bc7b3feb644

File details

Details for the file gensim-4.3.2-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: gensim-4.3.2-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 24.0 MB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for gensim-4.3.2-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 f785b3caf376a1f2989e0f3c890642e5b1566393fd3831dab03fc6670d672814
MD5 0ec9e4c0fa1b386116a7759b76a1e665
BLAKE2b-256 3eb7fba98a65efea29a7d8bf25ade2db67e34ebab8e63769e8927d0a4d42a84f

File details

Details for the file gensim-4.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 99876be00b73c7cef01f427d241b07eb1c1b298fb411580cc1067d22c43a13be
MD5 d3990b5efbd650c9a44c3cc698ab9da0
BLAKE2b-256 d3e217bad124c8dd2aa0a3062e44992eb34c282379450ebbe6fdb6b96aa3c907

File details

Details for the file gensim-4.3.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 6a33ff0d4cf3e50e7ddd7353fb38ed2d4af2e48a6ef58d622809862c30c8b8a2
MD5 d839fc94611d56262714ffcb09740655
BLAKE2b-256 be452ed230b4ef8767c5ccc1fdcc5132d536d7188a2552d37f059bff3dcd4776

File details

Details for the file gensim-4.3.2-cp38-cp38-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 bf7a9dc37c2ca465c7834863a7b264369c1373bb474135df225cee654b8adfab
MD5 ef973dcb56851064c23d38876949df4c
BLAKE2b-256 37169266c7e205d344cd6bea5074ed769e878c9b3919ab4e1e6adf0ad6370eb8

File details

Details for the file gensim-4.3.2-cp38-cp38-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for gensim-4.3.2-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 e8d62604efb8281a25254e5a6c14227034c267ed56635e590c9cae2635196dca
MD5 4d4929ae9ce3f39cbca4f4703aa301f4
BLAKE2b-256 0ca72dd786427bedd2c3dc6c74b70e1e53c6c180a7da0a686c61c2ab17f6fc63
