
Python framework for fast Vector Space Modelling

Project description


Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community.

Features

  • All algorithms are memory-independent with respect to corpus size: they can process input larger than RAM, streamed and out-of-core.

  • Intuitive interfaces

    • easy to plug in your own input corpus/datastream (trivial streaming API)

    • easy to extend with other Vector Space algorithms (trivial transformation API)

  • Efficient multicore implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec deep learning.

  • Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet Allocation on a cluster of computers.

  • Extensive HTML documentation and tutorials.
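The two "trivial" APIs listed above can be illustrated with a short, stdlib-only sketch. It assumes gensim's corpus convention, in which a corpus is any iterable yielding one document at a time as a sparse list of `(feature_id, weight)` pairs, and the convention that a transformation is applied to a document with `model[document]`. The class names `LineCorpus` and `UnitLength` and the toy vocabulary are hypothetical illustrations, not gensim classes.

```python
import math
from collections import Counter


class LineCorpus(object):
    """Hypothetical streamed corpus: one document per line of a text file.

    Only the current line is held in memory, so the file may exceed RAM.
    """

    def __init__(self, path, vocabulary):
        self.path = path              # plain-text file, one document per line
        self.vocabulary = vocabulary  # dict mapping token -> integer feature id

    def __iter__(self):
        # Re-opening the file on every pass lets the corpus be iterated repeatedly.
        with open(self.path) as handle:
            for line in handle:
                counts = Counter(line.lower().split())
                # Sparse bag-of-words vector of (feature_id, weight) pairs,
                # sorted by id; tokens outside the vocabulary are ignored.
                yield sorted((self.vocabulary[token], float(n))
                             for token, n in counts.items()
                             if token in self.vocabulary)


class UnitLength(object):
    """Hypothetical transformation: scales a sparse vector to unit length."""

    def __getitem__(self, vec):
        norm = math.sqrt(sum(weight * weight for _, weight in vec))
        if norm == 0.0:
            return []
        return [(feature_id, weight / norm) for feature_id, weight in vec]


if __name__ == "__main__":
    import os
    import tempfile

    vocabulary = {"human": 0, "computer": 1, "interface": 2}
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("Human computer interface\ncomputer computer\n")
    corpus = LineCorpus(tmp.name, vocabulary)
    transform = UnitLength()
    for doc in corpus:                # streamed: one document at a time
        print(doc, transform[doc])
    os.remove(tmp.name)
```

Because `LineCorpus` reopens its file on each pass, it can be iterated repeatedly (as multi-pass algorithms require) while holding only one line in memory. Applying the transformation inside a generator expression, e.g. `(transform[doc] for doc in corpus)`, keeps the transformed stream lazy as well.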

If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
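In the Vector Space Model each document is a vector over the vocabulary, and similarity retrieval ranks documents by the cosine of the angle between vectors, cos(a, b) = a·b / (‖a‖ ‖b‖). The plain-Python sketch below uses the same sparse `(feature_id, weight)` representation; `cosine` and `rank` are hypothetical helper names for illustration, not gensim functions (gensim's own similarity indexes live in its `similarities` module).

```python
import math


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as (feature_id, weight) lists."""
    a, b = dict(vec_a), dict(vec_b)
    dot = sum(weight * b[fid] for fid, weight in a.items() if fid in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0                    # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank(query, corpus):
    """Return (doc_index, similarity) pairs, most similar first."""
    scores = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


corpus = [
    [(0, 1.0), (1, 1.0)],             # document 0
    [(1, 2.0), (2, 1.0)],             # document 1
    [(3, 1.0)],                       # document 2: no overlap with the query
]
print(rank([(0, 1.0)], corpus))
```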

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed before installing gensim.
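A quick way to confirm that both prerequisites are importable from the interpreter you plan to use is a small stdlib-only script like the one below. `check_prerequisites` is a hypothetical helper, not part of gensim.

```python
import importlib


def check_prerequisites(modules=("numpy", "scipy")):
    """Report which required packages can be imported, with their versions."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None       # missing: install it before gensim
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in sorted(check_prerequisites().items()):
        print("%-6s %s" % (name, version or "NOT INSTALLED"))
```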

It is also recommended that you install a fast BLAS library before installing NumPy. This is optional, but an optimized BLAS such as ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OS X, NumPy automatically picks up the BLAS that ships with the operating system, so you don’t need to do anything special.
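To see which BLAS your NumPy build actually uses, NumPy can print its own build configuration. This relies only on `numpy.show_config()`; the exact output format varies between NumPy versions.

```python
import numpy as np

# Print the build configuration, including the BLAS/LAPACK libraries
# (e.g. OpenBLAS, ATLAS, Accelerate) this NumPy installation is linked against.
np.show_config()
```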

The simplest way to install gensim is:

pip install -U gensim

Alternatively, if you have downloaded and unpacked the source tar.gz package, run:

python setup.py test
python setup.py install
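After installation, a simple sanity check is to import the package and print its version. Run it from a directory other than the unpacked source tree, since Python may otherwise import the local `gensim/` folder instead of the installed copy. `installed_gensim_version` is a hypothetical helper; it only reads the package's `__version__` attribute.

```python
def installed_gensim_version():
    """Return the installed gensim version string, or None if gensim is missing."""
    try:
        import gensim
    except ImportError:
        return None
    return gensim.__version__


if __name__ == "__main__":
    version = installed_gensim_version()
    if version is None:
        print("gensim is not importable from this interpreter")
    else:
        print("gensim %s is installed" % version)
```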

For alternative modes of installation (without root privileges, development installation, optional install features), see the documentation.

This version has been tested under Python 2.6, 2.7, 3.3, 3.4 and 3.5 (support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you must use Python 2.5). Gensim’s GitHub repository is hooked up to Travis CI for automated testing on every push and pull request.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?

Many scientific algorithms can be expressed in terms of large matrix operations (see the BLAS note above). Gensim taps into these low-level BLAS libraries through its dependency on NumPy. So while gensim’s top-level code is pure Python, it actually executes highly optimized Fortran/C under the hood, including multithreading (if your BLAS is configured for it).
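The difference between interpreting every arithmetic step in Python and handing a whole matrix operation to BLAS can be seen in a small experiment. The sketch below computes the same product twice, once with a textbook triple loop over Python lists and once with NumPy's `dot`; `matmul_pure_python` and the matrix size are illustrative choices, not anything gensim itself defines.

```python
import time

import numpy as np


def matmul_pure_python(a, b):
    """Textbook triple loop over nested lists: every multiply-add runs in the interpreter."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


n = 120
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.time()
slow = matmul_pure_python(a.tolist(), b.tolist())
python_seconds = time.time() - start

start = time.time()
fast = a.dot(b)                       # delegated to the BLAS NumPy is linked against
blas_seconds = time.time() - start

assert np.allclose(slow, fast)        # same numbers, very different speed
print("pure Python: %.3f s, NumPy/BLAS: %.5f s" % (python_seconds, blas_seconds))
```

Timings depend on hardware, on `n`, and on which BLAS NumPy is linked against, so treat the printed numbers as indicative only.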

Memory-wise, gensim makes heavy use of Python’s built-in generators and iterators for streamed data processing. Memory efficiency was one of gensim’s design goals, and is a central feature of gensim, rather than something bolted on as an afterthought.
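The generator style described above can be sketched in a few lines: documents are tokenized lazily and grouped into fixed-size chunks, so memory use is bounded by the chunk size rather than by the corpus size. This mirrors the idea behind the `chunksize` argument accepted by gensim's LSI and LDA models, but `read_documents` and `chunked` are hypothetical helpers, not gensim code.

```python
from itertools import islice


def read_documents(lines):
    """Lazily tokenize documents; nothing is read until a consumer asks for it."""
    for line in lines:
        tokens = line.lower().split()
        if tokens:                    # skip blank lines
            yield tokens


def chunked(iterable, chunksize):
    """Yield lists of at most `chunksize` items, holding only one chunk in memory."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    import io

    stream = io.StringIO(u"first document\nsecond one\n\nthird\nfourth doc\n")
    for chunk in chunked(read_documents(stream), chunksize=2):
        print(len(chunk), chunk)      # process each chunk, then discard it
```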

Documentation

The manual for the gensim package is available in HTML. It contains a walk-through of all features and a complete reference section, and is also included in the source distribution package.

Citing gensim

When citing gensim in academic papers and theses, please use this BibTeX entry:

@inproceedings{rehurek_lrec,
      title = {{Software Framework for Topic Modelling with Large Corpora}},
      author = {Radim {\v R}eh{\r u}{\v r}ek and Petr Sojka},
      booktitle = {{Proceedings of the LREC 2010 Workshop on New
           Challenges for NLP Frameworks}},
      pages = {45--50},
      year = 2010,
      month = May,
      day = 22,
      publisher = {ELRA},
      address = {Valletta, Malta},
      note={\url{http://is.muni.cz/publication/884893/en}},
      language={English}
}

Gensim is open source software released under the GNU LGPL license. Copyright (c) 2009-now Radim Rehurek

Download files

Source Distribution

  • gensim-0.12.4.tar.gz (2.4 MB)

Built Distributions

  • gensim-0.12.4.win-amd64-py3.5.exe (2.6 MB): Windows installer, Python 3.5, x86-64

  • gensim-0.12.4.win-amd64-py2.7.exe (2.7 MB): Windows installer, Python 2.7, x86-64

  • gensim-0.12.4.win32-py3.5.exe (2.6 MB): Windows installer, Python 3.5, x86

  • gensim-0.12.4.win32-py2.7.exe (2.6 MB): Windows installer, Python 2.7, x86

  • gensim-0.12.4-cp35-none-win_amd64.whl (2.4 MB): CPython 3.5, Windows x86-64

  • gensim-0.12.4-cp35-none-win32.whl (2.4 MB): CPython 3.5, Windows x86

  • gensim-0.12.4-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (2.5 MB): CPython 3.5m; macOS 10.6+ Intel (x86-64, i386), macOS 10.9+ Intel (x86-64, i386), macOS 10.9+ x86-64, macOS 10.10+ Intel (x86-64, i386), macOS 10.10+ x86-64

  • gensim-0.12.4-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (2.5 MB): CPython 3.4m; macOS 10.6+ Intel (x86-64, i386), macOS 10.9+ Intel (x86-64, i386), macOS 10.9+ x86-64, macOS 10.10+ Intel (x86-64, i386), macOS 10.10+ x86-64

  • gensim-0.12.4-cp33-cp33m-macosx_10_6_x86_64.whl (2.5 MB): CPython 3.3m; macOS 10.6+ x86-64

  • gensim-0.12.4-cp27-none-win_amd64.whl (2.4 MB): CPython 2.7, Windows x86-64

  • gensim-0.12.4-cp27-none-win32.whl (2.4 MB): CPython 2.7, Windows x86

  • gensim-0.12.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (2.5 MB): CPython 2.7; macOS 10.6+ Intel (x86-64, i386), macOS 10.9+ Intel (x86-64, i386), macOS 10.9+ x86-64, macOS 10.10+ Intel (x86-64, i386), macOS 10.10+ x86-64

File details

Details for the file gensim-0.12.4.tar.gz.

File metadata

  • Download URL: gensim-0.12.4.tar.gz
  • Size: 2.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for gensim-0.12.4.tar.gz
Algorithm Hash digest
SHA256 d7c6fa6cd13b29fdc1ffea0946f8172ebab68d68954d5f336e0a750861f8aa15
MD5 7369a2d1f50904320e60b5ef490c1ca6
BLAKE2b-256 f1b097ec78a6374bcd4f3213e8972541145a120d6a00198e52bdfd1213d8a285

See more details on using hashes here.
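The digests above can be checked locally before installing a downloaded file. A minimal sketch using only Python's `hashlib` follows; `sha256_of` and `verify` are hypothetical helper names, and the example assumes the source tarball has been saved in the current directory under its original name.

```python
import hashlib


def sha256_of(path, block_size=1 << 20):
    """Hash a file in fixed-size blocks, so large downloads need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    import os

    path = "gensim-0.12.4.tar.gz"
    # SHA256 digest published above for the source distribution.
    expected = "d7c6fa6cd13b29fdc1ffea0946f8172ebab68d68954d5f336e0a750861f8aa15"
    if os.path.exists(path):
        print("OK" if verify(path, expected) else "MISMATCH: do not install")
    else:
        print("%s not found in the current directory" % path)
```

pip can enforce the same check itself: in hash-checking mode, a requirements line such as `gensim==0.12.4 --hash=sha256:<digest>` makes it refuse any download whose digest does not match.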

File details

Details for the file gensim-0.12.4.win-amd64-py3.5.exe.

File hashes

Hashes for gensim-0.12.4.win-amd64-py3.5.exe
Algorithm Hash digest
SHA256 245116cfff9b7f79119bf8998bfc704c164056933cb16571cf4ef5a4e8e1c814
MD5 803e2abd9b90f59c4f4187d3eec21522
BLAKE2b-256 6cdc4328f895672cf15fe338fe4365ab72d3b5a923839cacf9ed89a067b2d8b7

File details

Details for the file gensim-0.12.4.win-amd64-py2.7.exe.

File hashes

Hashes for gensim-0.12.4.win-amd64-py2.7.exe
Algorithm Hash digest
SHA256 c61d45131f7a013278cced1a1d82b2ea2db81518a7cbd13bade07d757eaf8478
MD5 d52ba199c37ac65e89a2e2b32a677f14
BLAKE2b-256 099576b79f42b2164daf98091cde4050fdb6430b92631cbc54aace5e3863ce5d

File details

Details for the file gensim-0.12.4.win32-py3.5.exe.

File hashes

Hashes for gensim-0.12.4.win32-py3.5.exe
Algorithm Hash digest
SHA256 1bc7af601ebac94f11890cccb52b947c1ebdc97bb9d3e52a93580bc9cdbdb915
MD5 2a2ac77cefae95ad01cc43e6015600b2
BLAKE2b-256 5ded4d17ee0e53576f0e07060a724193979cf4afe458eb4e228f49d71e8a491b

File details

Details for the file gensim-0.12.4.win32-py2.7.exe.

File hashes

Hashes for gensim-0.12.4.win32-py2.7.exe
Algorithm Hash digest
SHA256 342bd2b259d11f854ca463b336d153ab3ab50051e32fe0973f3f1df955bd5812
MD5 6c85d8b43e1ea426be19626ec658d908
BLAKE2b-256 fab583c2b20523a937d23454577b7085b44ed3e90867562ba3db6221d06053b7

File details

Details for the file gensim-0.12.4-cp35-none-win_amd64.whl.

File hashes

Hashes for gensim-0.12.4-cp35-none-win_amd64.whl
Algorithm Hash digest
SHA256 8353fe1eaa224755c06169d80fdc3c13fa221e062bd0cc2269d71dddaabd2749
MD5 1e70cc8110896837a29b55e5e72f9160
BLAKE2b-256 6d7d3155af2d47217a76b64236bae442c95fc60f548714c396b39d19299955b9

File details

Details for the file gensim-0.12.4-cp35-none-win32.whl.

File hashes

Hashes for gensim-0.12.4-cp35-none-win32.whl
Algorithm Hash digest
SHA256 cb5a9856a59975d6a9ccd977ba962a1295b02b16a4fa9c621c9de3bdd245f554
MD5 0b371f366a758baece2502da58422299
BLAKE2b-256 340bb19e01bb0330f630473fe6bd77e11b9e0af05998a0af010fd02c8936e6b7

File details

Details for the file gensim-0.12.4-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-0.12.4-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 18d2485e0c46e18168cc51fb967234c4311714bcc4869c948fcae23e0ba43ab1
MD5 65675b05c50e2b45b24012ce5f7baed4
BLAKE2b-256 3707c044e50724cbe5bdb460a93b0ff1c850c189617ac9895989ef4eceea35db

File details

Details for the file gensim-0.12.4-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-0.12.4-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 d933f534f897eb6202733124c57c74f85274baebca5910928f006601c11c2288
MD5 9224f32333b9348b48ec712371cdbdc6
BLAKE2b-256 d7bca423e4e05dbb3a81bc048538521e8b81a53d98168426688ea3a91883f87e

File details

Details for the file gensim-0.12.4-cp33-cp33m-macosx_10_6_x86_64.whl.

File hashes

Hashes for gensim-0.12.4-cp33-cp33m-macosx_10_6_x86_64.whl
Algorithm Hash digest
SHA256 7f77354a6bee461f155a89596c7e0b497851e293d05b1362f483efb71467a462
MD5 01dea8ab8330fc044da3f04373d8545b
BLAKE2b-256 9eed627e30941e993b3fd3535c147d22d20e4ff2a595453c32ab245806c6604e

File details

Details for the file gensim-0.12.4-cp27-none-win_amd64.whl.

File hashes

Hashes for gensim-0.12.4-cp27-none-win_amd64.whl
Algorithm Hash digest
SHA256 f387ced56b9ec0acb7cb72415585591b53c82a5a44d61b6ce00636b3537de0eb
MD5 612b7e514462c008ebc3c9a763f8bb71
BLAKE2b-256 57b0d4ad982fbb5eec086efc7d4113627d2c801d35772bc22c58ea8a1a2b0ab4

File details

Details for the file gensim-0.12.4-cp27-none-win32.whl.

File hashes

Hashes for gensim-0.12.4-cp27-none-win32.whl
Algorithm Hash digest
SHA256 55feab5360c95b178597f4f57b373b735ba2868c9d5ddab90d6c7319e594838e
MD5 6ab36ef4bf284a95fd3424265e4ca1d2
BLAKE2b-256 92e039100f65c2287aa69e9b73a82cf2bb3fd94f01c7cc4f6e885ae199d3f6f6

File details

Details for the file gensim-0.12.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl.

File hashes

Hashes for gensim-0.12.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Algorithm Hash digest
SHA256 b165593ede01d1e3ca90391d19493c41edcbc86c9c574bb8b1d72c743783a0f4
MD5 e6c9fd06d237d7a5b7c7c66fc6406ef1
BLAKE2b-256 17cada45b5e1b5cb9f282194473b78e57c5b096eb8ef3949480c0bad9719d12a
