Python framework for fast Vector Space Modelling

Project description

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent w.r.t. the corpus size (can process input larger than RAM, streamed, out-of-core).

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (trivial streaming API)

    • easy to extend with other Vector Space algorithms (trivial transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive documentation and Jupyter Notebook tutorials.
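
The streaming and transformation interfaces in the list above can be illustrated without gensim itself. The following is a minimal sketch in plain Python, not gensim's actual implementation: the names `StreamedCorpus` and `BinaryWeight` are hypothetical, and only the idea is taken from the feature list (a corpus is any iterable that yields one sparse vector at a time, and a transformation wraps a corpus lazily).

```python
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus: yields one sparse bag-of-words vector per document."""

    def __init__(self, lines, vocabulary):
        self.lines = lines            # any re-iterable source of text lines
        self.vocabulary = vocabulary  # maps token -> integer id

    def __iter__(self):
        # Only one document is held in memory at any time.
        for line in self.lines:
            counts = Counter(
                self.vocabulary[tok]
                for tok in line.lower().split()
                if tok in self.vocabulary
            )
            yield sorted(counts.items())


class BinaryWeight:
    """Hypothetical transformation: replaces every count with 1.0, lazily."""

    def __getitem__(self, corpus):
        # Returns a generator, so nothing is computed until it is consumed.
        return ([(token_id, 1.0) for token_id, _ in doc] for doc in corpus)


vocabulary = {"human": 0, "computer": 1, "graph": 2}
lines = ["Human computer interaction", "graph of graph minors", "computer graph computer"]

corpus = StreamedCorpus(lines, vocabulary)
vectors = list(corpus)
binary_vectors = list(BinaryWeight()[corpus])
```

In this sketch, any object with an `__iter__` method can play the role of the corpus, which is why plugging in a custom data source takes only a few lines.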

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OS X, NumPy picks up the BLAS that comes with it automatically, so you don’t need to do anything special.
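
To see which BLAS your NumPy build is linked against, you can inspect NumPy's build configuration. The snippet below is a small sketch: the helper name `describe_numpy_build` is made up for illustration, and the exact text printed by `numpy.show_config()` varies between NumPy versions and platforms.

```python
import contextlib
import io


def describe_numpy_build():
    """Return NumPy's build configuration as text, or a note if NumPy is missing."""
    try:
        import numpy
    except ImportError:
        return "NumPy is not installed"
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()  # prints BLAS/LAPACK build information
    return buffer.getvalue()


report = describe_numpy_build()
print(report)
```

Look for entries mentioning ATLAS, OpenBLAS or another optimized library in the printed report.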

The simple way to install gensim is:

pip install -U gensim

Or, if you have instead downloaded and unzipped the source tar.gz package, you’d run:

python setup.py test
python setup.py install

For alternative modes of installation (without root privileges, development installation, optional install features), see the install documentation.

This version has been tested under Python 2.7, 3.5 and 3.6. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0; install gensim 0.13.4 if you must use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you must use Python 2.5. Gensim’s GitHub repo is hooked up to Travis CI for automated testing on every commit push and pull request.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries, by means of its dependency on NumPy. So while gensim-the-top-level-code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is so configured).

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
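
The generator-based approach described above can be sketched in a few lines of plain Python. This is an illustration of the general technique only, not code taken from gensim; the function names `read_tokens`, `in_chunks` and `document_frequencies` are hypothetical. The point is that the input is consumed one small chunk at a time, so memory use depends on the chunk size and the vocabulary, not on the number of documents.

```python
from itertools import islice


def read_tokens(lines):
    """Yield one tokenized document at a time; nothing is accumulated."""
    for line in lines:
        yield line.lower().split()


def in_chunks(iterable, chunk_size):
    """Group a stream into lists of at most chunk_size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def document_frequencies(lines, chunk_size=2):
    """Count in how many documents each token occurs, one chunk at a time."""
    frequencies = {}
    for chunk in in_chunks(read_tokens(lines), chunk_size):
        for tokens in chunk:
            for token in set(tokens):
                frequencies[token] = frequencies.get(token, 0) + 1
    return frequencies


# A one-shot iterator stands in for a file that is too large to load at once.
lines = iter(["a b a", "b c", "a"])
frequencies = document_frequencies(lines)
```

Because `lines` may be any iterator, the same function works unchanged on an open file object or a network stream.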

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      language={English}
}

Gensim is open source software released under the GNU LGPLv2.1 license. Copyright (c) 2009-now Radim Rehurek

This version

3.2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gensim-3.2.0.tar.gz (15.3 MB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

gensim-3.2.0.win-amd64-py3.6.exe (16.1 MB)
Uploaded Source

gensim-3.2.0.win-amd64-py3.5.exe (16.1 MB)
Uploaded Source

gensim-3.2.0.win-amd64-py2.7.exe (15.8 MB)
Uploaded Source

gensim-3.2.0.win32-py3.6.exe (16.0 MB)
Uploaded Source

gensim-3.2.0.win32-py3.5.exe (16.0 MB)
Uploaded Source

gensim-3.2.0.win32-py2.7.exe (15.7 MB)
Uploaded Source

gensim-3.2.0-cp36-cp36m-win_amd64.whl (15.5 MB)
Uploaded CPython 3.6m, Windows x86-64

gensim-3.2.0-cp36-cp36m-win32.whl (15.5 MB)
Uploaded CPython 3.6m, Windows x86

gensim-3.2.0-cp36-cp36m-manylinux1_x86_64.whl (15.9 MB)
Uploaded CPython 3.6m

gensim-3.2.0-cp36-cp36m-manylinux1_i686.whl (15.8 MB)
Uploaded CPython 3.6m

gensim-3.2.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (15.7 MB)
Uploaded CPython 3.6m, macOS 10.10+ Intel (x86-64, i386), macOS 10.10+ x86-64, macOS 10.6+ Intel (x86-64, i386), macOS 10.9+ Intel (x86-64, i386), macOS 10.9+ x86-64

gensim-3.2.0-cp35-cp35m-win_amd64.whl (15.5 MB)
Uploaded CPython 3.5m, Windows x86-64

gensim-3.2.0-cp35-cp35m-win32.whl (15.5 MB)
Uploaded CPython 3.5m, Windows x86

gensim-3.2.0-cp35-cp35m-manylinux1_x86_64.whl (15.9 MB)
Uploaded CPython 3.5m

gensim-3.2.0-cp35-cp35m-manylinux1_i686.whl (15.8 MB)
Uploaded CPython 3.5m

gensim-3.2.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (15.7 MB)
Uploaded CPython 3.5m, macOS 10.10+ Intel (x86-64, i386), macOS 10.10+ x86-64, macOS 10.6+ Intel (x86-64, i386), macOS 10.9+ Intel (x86-64, i386), macOS 10.9+ x86-64

gensim-3.2.0-cp27-cp27mu-manylinux1_x86_64.whl (15.9 MB)
Uploaded CPython 2.7mu

gensim-3.2.0-cp27-cp27mu-manylinux1_i686.whl (15.8 MB)
Uploaded CPython 2.7mu

gensim-3.2.0-cp27-cp27m-win_amd64.whl (15.5 MB)
Uploaded CPython 2.7m, Windows x86-64

gensim-3.2.0-cp27-cp27m-win32.whl (15.5 MB)
Uploaded CPython 2.7m, Windows x86

gensim-3.2.0-cp27-cp27m-manylinux1_x86_64.whl (15.9 MB)
Uploaded CPython 2.7m

gensim-3.2.0-cp27-cp27m-manylinux1_i686.whl (15.8 MB)
Uploaded CPython 2.7m

gensim-3.2.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (15.7 MB)
Uploaded CPython 2.7m, macOS 10.10+ Intel (x86-64, i386), macOS 10.10+ x86-64, macOS 10.6+ Intel (x86-64, i386), macOS 10.9+ Intel (x86-64, i386), macOS 10.9+ x86-64

File details

Details for the file gensim-3.2.0.tar.gz.

File metadata

  • Download URL: gensim-3.2.0.tar.gz
  • Upload date:
  • Size: 15.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-3.2.0.tar.gz
Algorithm Hash digest
SHA256 db00b68c6567ba0598d400b917c889e8801adf249170ce0a80ec38187d1b0797
MD5 a68b34058d762b5591b26892aecd4c3f
BLAKE2b-256 9f7ef8c34a1291edde755c881cb960deff6785d0b7f8eefa6d1583ef4f3abb14

See more details on using hashes here.
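
A downloaded file can be checked against a published SHA256 digest with Python's standard `hashlib` module. The sketch below is one possible way to do it; the helper names `sha256_of_stream` and `matches_published_hash` are hypothetical, and an in-memory stream stands in for the real download so that the example is self-contained.

```python
import hashlib
import io


def sha256_of_stream(stream, block_size=65536):
    """Hash a binary stream block by block, so large files need little memory."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


def matches_published_hash(stream, published_hex):
    """Compare the computed digest with a published hexadecimal digest."""
    return sha256_of_stream(stream) == published_hex.strip().lower()


# Demonstration with an empty stream, whose SHA256 digest is well known.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
ok = matches_published_hash(io.BytesIO(b""), EMPTY_SHA256)
```

For the source distribution, you would instead open gensim-3.2.0.tar.gz with `open(path, "rb")` and pass the SHA256 value listed above as `published_hex`.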

Hashes for gensim-3.2.0.win-amd64-py3.6.exe
Algorithm Hash digest
SHA256 8a0e465e4e7e09ddf07ea372da685ab058a1435eacc75b10ccee8caf25bcc276
MD5 eaa37755c130c8e90f8ab3d7b4e03af5
BLAKE2b-256 58507d91fb1b815284e76bb6153246acf574e620f4c0e5ea836df6348b51b132

Hashes for gensim-3.2.0.win-amd64-py3.5.exe
Algorithm Hash digest
SHA256 a86990e9c231822862549f0e9d75bdea4c242114f9d1399822c4b52177d1de3c
MD5 d9b2c3ffbe9607cfbcbb4e4495af1607
BLAKE2b-256 4f98566e89b888fe2b2c674b1ba8b67750c9000aec280c1049f34fc3131ac6de

Hashes for gensim-3.2.0.win-amd64-py2.7.exe
Algorithm Hash digest
SHA256 b1ab55a0ebfcf31d39ddda58bc68f0c38e9900f72b855449a86f1edf0ce0f3ed
MD5 6912d12de33279e34d964b31e84c16b7
BLAKE2b-256 ff6e22018ff12ad82d91da285d84d5f18f4d51092bddd01e874b0e69e0890690

Hashes for gensim-3.2.0.win32-py3.6.exe
Algorithm Hash digest
SHA256 e181ec9b254dea6987169342c7f8ede6fbf62af4e58fd3bcb53de1bc1916901b
MD5 7d46f6165bc44833fec0b8defe9c8bed
BLAKE2b-256 0b6b50476bb1244ed1181353d606aa0f66f3099f2f5679cf4949cadecc0e66dc

Hashes for gensim-3.2.0.win32-py3.5.exe
Algorithm Hash digest
SHA256 71a28be2663cd9b115ea4f2c4397435270adefdbcdd34dc3d267be791b196403
MD5 869bd2a704390b3269bd65f1a8b562f0
BLAKE2b-256 ff1921d1441a9ea86a94767f430e4bb4ef67b5f0fb052b1459a296151fe13b83

Hashes for gensim-3.2.0.win32-py2.7.exe
Algorithm Hash digest
SHA256 affb4af6f76f848ff41ffe972cb885f2bf774f3ed46a72e304d95971a5462e16
MD5 a1d91a1b160bf7508bd78f5eaa6efac0
BLAKE2b-256 cc86bfde6e077391b0f57966b4f7c74e4cc9ebbd6c84b87762def52dad3bc7b0

Hashes for gensim-3.2.0-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 5ff568b5c40433c577ddc6a14614166848865168776648fc75091cf772f146f2
MD5 9dc4482ea246372f9f7acc00645d378a
BLAKE2b-256 75805632370ca3534b6760f32664de5bf69009c4d75b256af28719f0676e9406

Hashes for gensim-3.2.0-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 ee697a69002b84ed8c4d1fbea8ec8a5789b1245265043a0a8f77d53396637f78
MD5 b78b17c92b502527fe59daa1810f27da
BLAKE2b-256 30f737f06f8f5740048d20dcfebb1ee02f4a8ec1dda6543e7d87b5dbe85f30a9

Hashes for gensim-3.2.0-cp36-cp36m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 79e3a0d29aa0f2962f1cfbc3b4181bbbcafdf5d034be2b6d40fd2bb827681546
MD5 b2d4b8e50bbed4773e4c1b2d16af0aaf
BLAKE2b-256 f2465a642e2220937e9ddf455b982b3ab442845b6e9dfe571f5f3da463e16592

Hashes for gensim-3.2.0-cp36-cp36m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 3a65dfffa8a426a37465995723d6c4d9e64f713fab95fecf9eae536c87e7dfd9
MD5 9e27bbc0f1c6c57a4a441b412426f224
BLAKE2b-256 b1762207f4d685a67bc03766451a458c365a63c7542ac107aa7a7b786097fce8

Hashes for gensim-3.2.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 de2aac640120a23cb8240f9c42e6ecaac73e854e1bd5c51a28d8cc2d79e06abd
MD5 84d05d0703d4262adb28c4481b662da7
BLAKE2b-256 1e468552b34a3ff24e161a33479da6f68026ea1139e34ca11b86d9ae8f27db7b

Hashes for gensim-3.2.0-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 ff63f5824aa05bb237b47c1076ead327bc215350f63d10d467d0c1b389c09d50
MD5 1f0aa4599bae8b58ec2952d2f7b102a1
BLAKE2b-256 10d41f022cc301d77443a493cced15f1544a8f0e8b8d48a6e7e121f97b9106ba

Hashes for gensim-3.2.0-cp35-cp35m-win32.whl
Algorithm Hash digest
SHA256 cf41d6733f542a6ec2bbc7ba52d331748446aa8f9024386e01ed44493d2b0b82
MD5 cb7f7e27e4381fbfe67181447a6ff892
BLAKE2b-256 c3c341f396d207f8b0e5dac78d1904879fc7f1fe9cae4dc164ae6865ba766ace

Hashes for gensim-3.2.0-cp35-cp35m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 6ac0caed58fa497d6865222233dd1ab692550fff0df98999d0248c3e96213d7f
MD5 c52fe9a72f8cdfd0fa556b44c8f14e30
BLAKE2b-256 1f452c89934244e7f4b81a8b9aaea7153edb1b271e44840b9839566d33510424

Hashes for gensim-3.2.0-cp35-cp35m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 1f0769fd1fb2d1d242742445fb071fc56f449a4bf5c03d1144e056f48fe6f9f8
MD5 f4293dd7a704744de9302d4c1a169b3c
BLAKE2b-256 90b356d3c2143f355e75587f6e425b0821287d49af5f48e6a347e3ae8899aca7

Hashes for gensim-3.2.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 b1910c34c366dc37c0032b539b16c3096c286b5722d39cb5ef1d4381ac99e3b1
MD5 b409c4836e07f5652b97035623079ff3
BLAKE2b-256 92d536234d18bee9f5427e1c77c792af20852516ae30d2774973ff6c579cd4a4

Hashes for gensim-3.2.0-cp27-cp27mu-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 9ea7ec096f32a45d77c80fa184cd01ecd97b9bde23290716580cbf9b36ef52ff
MD5 e4a79a1ab5f11d3618e7c56eebe112d6
BLAKE2b-256 325a2c8fd6c1a5ade3ca4e48f1c97e8a25be72a097fb181e233446354d71ed00

Hashes for gensim-3.2.0-cp27-cp27mu-manylinux1_i686.whl
Algorithm Hash digest
SHA256 604e9119fb70a8471cdb17b3c26342ebc736afe7d5bda64709bb875e7d61a62e
MD5 beaa6f56bf99cf840d07be0e581758c9
BLAKE2b-256 7d650fe89dfbfff4fb96dcf253bf241ba319ca19ac2f5c275272a38551cf28a9

Hashes for gensim-3.2.0-cp27-cp27m-win_amd64.whl
Algorithm Hash digest
SHA256 849fe53a478f089e84884780ce4ec8df062822756ff803839c33efced87c38ae
MD5 bb165faacf710551598cfbade1568bc0
BLAKE2b-256 464d58542bd4d753298f09d99f5b5c5c50cd95c997cc9f8f3159cada6f8d333c

Hashes for gensim-3.2.0-cp27-cp27m-win32.whl
Algorithm Hash digest
SHA256 6d958f758999580c718b1e81e481661cd27ae868db7c75f3f3d46359a236d5d1
MD5 179dd159fde95f0c50076dabe83cc600
BLAKE2b-256 6e027141b2aebc6f800315eda68d16cdbbb96fc5db423233c70597a3cf34dd73

Hashes for gensim-3.2.0-cp27-cp27m-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 864b78dfb05a88364e7a0cdaaa27402bb5b083984eb2f0d20a07abbac9f6fa86
MD5 713879064b6b7aef88bbab092979a140
BLAKE2b-256 cfa59f489feb15203ec8ec759083407c5570742e3e86a3a8f5b8415173b05955

Hashes for gensim-3.2.0-cp27-cp27m-manylinux1_i686.whl
Algorithm Hash digest
SHA256 92ff7882b75f061332d4791063b25995b3a5d80e20db28e5730143a8d4cb54c5
MD5 e542fbc038fc5fa584688a9a49ca5d02
BLAKE2b-256 74c7d1bb612e216734b5c80f68b6840ed7d7f6665f0cf06b16fb723157988720

Hashes for gensim-3.2.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 6d60cc07a38a44afbff8e367b2dda07c47283778296fe610d9a462af04500adb
MD5 47ad297cb1dc4396bd35bbe6b81a8033
BLAKE2b-256 846c99a20c395a5e066a5574c7b98ffa55c239c0d60cf5284f5c955cc8fbae4b
