Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core).

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (trivial streaming API)

    • easy to extend with other Vector Space algorithms (trivial transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended that you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OS X, NumPy automatically picks up the BLAS that ships with the operating system, so you don’t need to do anything special.

The simple way to install gensim is:

pip install -U gensim

Or, if you have instead downloaded and unzipped the source tar.gz package, you’d run:

python setup.py test
python setup.py install

For alternative modes of installation (without root privileges, development installation, optional install features), see the install documentation.

This version has been tested under Python 2.7, 3.5 and 3.6. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0; install gensim 0.13.4 if you must use one of those versions. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you must use Python 2.5. Gensim’s GitHub repo is hooked up to Travis CI for automated testing on every commit push and pull request.
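The version rules above can be written down as a small lookup. This is only a sketch: the function name gensim_requirement and the returned requirement strings are illustrative and not part of gensim. The pin to 3.7.3 assumes "this version" means the release described on this page.

```python
import sys


def gensim_requirement(python_version=None):
    """Return a pip requirement string for a (major, minor) Python version.

    Hypothetical helper: the pins mirror the support statement above.
    Returns None for interpreters that statement does not mention.
    """
    major, minor = (python_version or sys.version_info)[:2]
    if (major, minor) == (2, 5):
        return "gensim==0.9.1"      # last release line supporting Python 2.5
    if (major, minor) in {(2, 6), (3, 3), (3, 4)}:
        return "gensim==0.13.4"     # support dropped in gensim 1.0.0
    if (major, minor) in {(2, 7), (3, 5), (3, 6)}:
        return "gensim==3.7.3"      # versions this release was tested under
    return None                     # not covered by the statement above
```

For example, gensim_requirement((3, 4)) returns "gensim==0.13.4", which can be passed straight to pip install.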

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
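To illustrate the streaming idea, here is a minimal sketch of a corpus object that reads one document per line and yields one sparse vector at a time, so only a single document is held in memory. It is written for Python 3 and does not import gensim. The class name StreamedCorpus and the token2id mapping are hypothetical. The (token_id, count) output mimics the sparse bag-of-words format gensim works with.

```python
from collections import Counter


class StreamedCorpus:
    """Yield one sparse bag-of-words vector per line of a text file.

    Illustrative only: the file is re-opened on every iteration, so the
    corpus can be traversed several times without loading it into RAM.
    """

    def __init__(self, path, token2id):
        self.path = path
        self.token2id = token2id  # hypothetical mapping: token -> integer id

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                counts = Counter(line.lower().split())
                # Keep only known tokens; emit (token_id, count) pairs.
                yield sorted(
                    (self.token2id[token], n)
                    for token, n in counts.items()
                    if token in self.token2id
                )
```

Because the object defines __iter__ rather than being a one-shot generator, an algorithm that needs several passes over the data can simply loop over it again.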

Documentation

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek


This version

3.7.3

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gensim-3.7.3.tar.gz (23.4 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.
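As a rough guide to the file names listed below, a wheel name has the form name-version[-build]-python-abi-platform.whl, and each of the last three fields may hold several tags joined by dots. The following sketch splits a file name into those fields. The function name parse_wheel_filename is illustrative, and the parsing is deliberately simple rather than a full implementation of the wheel specification.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields (simplified)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        # Dots separate alternative tags within a single field.
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }
```

Applied to gensim-3.7.3-cp37-cp37m-manylinux1_x86_64.whl, this gives python tag cp37, ABI tag cp37m and a single platform tag manylinux1_x86_64. The macOS wheels carry five platform tags in one name.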

gensim-3.7.3-cp37-cp37m-manylinux1_x86_64.whl (24.2 MB view details)

Uploaded CPython 3.7m

gensim-3.7.3-cp37-cp37m-manylinux1_i686.whl (24.1 MB view details)

Uploaded CPython 3.7m

gensim-3.7.3-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (24.7 MB view details)

Uploaded CPython 3.7m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

gensim-3.7.3-cp36-cp36m-manylinux1_x86_64.whl (24.2 MB view details)

Uploaded CPython 3.6m

gensim-3.7.3-cp36-cp36m-manylinux1_i686.whl (24.1 MB view details)

Uploaded CPython 3.6m

gensim-3.7.3-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (24.7 MB view details)

Uploaded CPython 3.6m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

gensim-3.7.3-cp35-cp35m-manylinux1_x86_64.whl (24.2 MB view details)

Uploaded CPython 3.5m

gensim-3.7.3-cp35-cp35m-manylinux1_i686.whl (24.1 MB view details)

Uploaded CPython 3.5m

gensim-3.7.3-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (24.6 MB view details)

Uploaded CPython 3.5m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

gensim-3.7.3-cp27-cp27mu-manylinux1_x86_64.whl (24.2 MB view details)

Uploaded CPython 2.7mu

gensim-3.7.3-cp27-cp27mu-manylinux1_i686.whl (24.1 MB view details)

Uploaded CPython 2.7mu

gensim-3.7.3-cp27-cp27m-manylinux1_x86_64.whl (24.2 MB view details)

Uploaded CPython 2.7m

gensim-3.7.3-cp27-cp27m-manylinux1_i686.whl (24.1 MB view details)

Uploaded CPython 2.7m

gensim-3.7.3-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (24.7 MB view details)

Uploaded CPython 2.7m; macOS 10.10+ Intel (x86-64, i386); macOS 10.10+ x86-64; macOS 10.6+ Intel (x86-64, i386); macOS 10.9+ Intel (x86-64, i386); macOS 10.9+ x86-64

File details

Details for the file gensim-3.7.3.tar.gz.

File metadata

  • Download URL: gensim-3.7.3.tar.gz
  • Upload date:
  • Size: 23.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3.tar.gz
Algorithm Hash digest
SHA256 621fe72ee1bb0e16008c34f9f5ca6168bbfc82fc85907f7254974776e482e156
MD5 2537a87355a87049a53e6a6f16f3a0b0
BLAKE2b-256 8180858ef502e80baa6384b75fd5c89f01074b791a13b830487f9e25bdce50ec

See more details on using hashes here.
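One way to use the digests listed on this page is to recompute the SHA256 of a downloaded file and compare it with the published value. The sketch below relies only on Python's standard hashlib module. The function names sha256_of and verify_download are illustrative, and the file is assumed to have been downloaded to a local path already.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 matches the published hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example call (assumes gensim-3.7.3.tar.gz is in the current directory):
# verify_download(
#     "gensim-3.7.3.tar.gz",
#     "621fe72ee1bb0e16008c34f9f5ca6168bbfc82fc85907f7254974776e482e156",
# )
```

A result of False means the file differs from the one published and should not be installed.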

File details

Details for the file gensim-3.7.3-cp37-cp37m-manylinux1_x86_64.whl.

File metadata

  • Download URL: gensim-3.7.3-cp37-cp37m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 24.2 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp37-cp37m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 e72962139f26194fcb8c44adaebca86f167db57f5214b80eb3126a737a3a914f
MD5 13c877fa659a10a64acfc5bbd73e640f
BLAKE2b-256 fa822542fac981c1f9302164127088bb2d2044bf70b18ed181bc745b4432f51a

File details

Details for the file gensim-3.7.3-cp37-cp37m-manylinux1_i686.whl.

File metadata

  • Download URL: gensim-3.7.3-cp37-cp37m-manylinux1_i686.whl
  • Upload date:
  • Size: 24.1 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp37-cp37m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 593ec35a2dc96d19ba4e4e729dbb57838ce12a351f17491b71b9186509a868ef
MD5 28672143a2d1bdf89140dd2a7de14187
BLAKE2b-256 eda276130485b225a236421c1b52af4bfc7fdb45cb87f568a04b89edcfa7357a

File details

Details for the file gensim-3.7.3-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.7.3-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 48ffbc762ea9813f11ef02b26961f5ae48a8ce3e72a22fa9a03cecbe44b8fef6
MD5 c55ca7fdb7076def20af2dc5e98cae29
BLAKE2b-256 82bb56f295a604dfafdef746cc81081ff4c6e825690de95963000300a1cd3d80

File details

Details for the file gensim-3.7.3-cp36-cp36m-manylinux1_x86_64.whl.

File metadata

  • Download URL: gensim-3.7.3-cp36-cp36m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 24.2 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 b069f367744b3a315c5170ff5122664380eef188a911c94d1fe264a7984c9af8
MD5 665e4b41f640e92f90080d7b306e0b57
BLAKE2b-256 d34b19eecdf07d614665fa889857dc56ac965631c7bd816c3476d2f0cac6ea3b

File details

Details for the file gensim-3.7.3-cp36-cp36m-manylinux1_i686.whl.

File metadata

  • Download URL: gensim-3.7.3-cp36-cp36m-manylinux1_i686.whl
  • Upload date:
  • Size: 24.1 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 2191d664471d948f7d6262138777a18d4b5ebaa6c67af970a0af4d94b26548a1
MD5 a52790cc095620b2ea4e7d63b599894a
BLAKE2b-256 4aec7fa32440f92ab90f1f71c7a23e6f7144a005c7e235962d0f688c285cba6a

File details

Details for the file gensim-3.7.3-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.7.3-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 89a95d662efd41829a7c43fad3657c1b4133df5c3a58e088f8cf07b78c011519
MD5 8d93671685a1bd646b3dfdfe1e5ad51a
BLAKE2b-256 b4fbc0cefcecf82b445ff2a714935db5b475a25202d6b63241c7e95ca004136a

File details

Details for the file gensim-3.7.3-cp35-cp35m-manylinux1_x86_64.whl.

File metadata

  • Download URL: gensim-3.7.3-cp35-cp35m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 24.2 MB
  • Tags: CPython 3.5m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 3fdd8b57221e4cbf951d34657a63731dd601a21df593f653f0e449b1abb9d657
MD5 863492bac0397afad26f6ae58a86913c
BLAKE2b-256 ae27e6a9a062104237af82b4df476c21220db6e8321dd9df929b5c91ae915425

File details

Details for the file gensim-3.7.3-cp35-cp35m-manylinux1_i686.whl.

File metadata

  • Download URL: gensim-3.7.3-cp35-cp35m-manylinux1_i686.whl
  • Upload date:
  • Size: 24.1 MB
  • Tags: CPython 3.5m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 c7ea53afeaaf1f7e0040229e9fac6357bd8e045750843b054ab6abdfbcc53efb
MD5 bd2594396e3441338289a0fdb9e8d8b8
BLAKE2b-256 c6041425333e43b5456048c2e7d866d84962fec94ebce67f9dd47a35cdab9c54

File details

Details for the file gensim-3.7.3-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.7.3-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 2fd05a4ef53d3b82056da61a9b4aaff1ccacecc7ab323dc99981e6537396efad
MD5 7295bbe313e772b21cd46df77ac35b8e
BLAKE2b-256 8769c31666c89b21e527eed1cdff93f80dee404ba808fbc36007e854f771710d

File details

Details for the file gensim-3.7.3-cp27-cp27mu-manylinux1_x86_64.whl.

File metadata

  • Download URL: gensim-3.7.3-cp27-cp27mu-manylinux1_x86_64.whl
  • Upload date:
  • Size: 24.2 MB
  • Tags: CPython 2.7mu
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 16d21e0034a7d100832e4a3ad52a72ab5b46b2e78042da1f121b83af71574ac8
MD5 da4b8d45a7af7da1c0b8c5ac6b6cb14e
BLAKE2b-256 25dcd3d0abc16fbddddb3eede644243fdbed0462691a5f23d2c0472704afc126

File details

Details for the file gensim-3.7.3-cp27-cp27mu-manylinux1_i686.whl.

File metadata

  • Download URL: gensim-3.7.3-cp27-cp27mu-manylinux1_i686.whl
  • Upload date:
  • Size: 24.1 MB
  • Tags: CPython 2.7mu
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp27-cp27mu-manylinux1_i686.whl
Algorithm Hash digest
SHA256 af5e401ce0a978707911f2c5392e256bad2dd271cb4ed001a54ded4a6bc0542a
MD5 e00d37b088e968cac7ad8c6d61b15877
BLAKE2b-256 08207403f048ede3682a3abb6c6c3946b2205e3d8f6127922e18a1c1df4eece3

File details

Details for the file gensim-3.7.3-cp27-cp27m-manylinux1_x86_64.whl.

File metadata

  • Download URL: gensim-3.7.3-cp27-cp27m-manylinux1_x86_64.whl
  • Upload date:
  • Size: 24.2 MB
  • Tags: CPython 2.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 16990f0c6ea20c9ccad91f051af42921b902935e9fdb9e077103562721b45162
MD5 cac0ed50f1225297690a37d7b56bbc00
BLAKE2b-256 cf03353526373cf2aea90dd78f02fd5feb59754c1578aa7e637edfd1118eb2f5

File details

Details for the file gensim-3.7.3-cp27-cp27m-manylinux1_i686.whl.

File metadata

  • Download URL: gensim-3.7.3-cp27-cp27m-manylinux1_i686.whl
  • Upload date:
  • Size: 24.1 MB
  • Tags: CPython 2.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for gensim-3.7.3-cp27-cp27m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 319777c2b0e5b37a2f1780e484257133f15f26d9cc1ab5740d2df00f87fa9a7d
MD5 9fd666dfa5e611b0cd60ebbc8b632b7c
BLAKE2b-256 abae289733408ef22b4546061478593045369660c1982c3f976f613da1ea8425

File details

Details for the file gensim-3.7.3-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-3.7.3-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 1cd2b67366167a03ecc289b3065140edb9bd9820ee3323bd8786d76edf3c09ca
MD5 ed65d346c63af5cb92ad176285d0af41
BLAKE2b-256 e78eae1b656131601fb0f31795f4b0b2a3beb5450fba63a2c87144f0f18807bc
