GISMO
GISMO is an NLP tool to rank and organize a corpus of documents according to a query.
Gismo stands for Generic Information Search… with a Mind of its Own.
Free software: GNU General Public License v3
GitHub: https://github.com/balouf/gismo
Documentation: https://gismo.readthedocs.io
Features
Gismo combines three main ideas:
TF-IDTF: a symmetric version of the TF-IDF embedding.
DI-Iteration: a fast, push-based variant of the PageRank algorithm.
Fuzzy dendrogram: a variant of the Louvain clustering algorithm.
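The list above describes DI-Iteration only as a push-based PageRank variant. To illustrate what "push-based" means, here is a generic sketch in the spirit of D-iteration (fluid diffusion), not gismo's actual implementation. Each node holds some "fluid" that still has to be spread. When a node is processed, it keeps its fluid as score and pushes a damped share to its out-neighbors. Several details are assumptions made for this example: the function name push_pagerank, the dict-of-lists graph format, the damping and tolerance defaults, and the decision not to handle dangling nodes.

```python
from collections import deque


def push_pagerank(graph, seeds, damping=0.85, tol=1e-9):
    """Push-based (diffusion) personalized PageRank sketch (hypothetical helper).

    graph: dict mapping each node to a non-empty list of out-neighbors.
    seeds: dict mapping seed nodes (e.g. nodes matching a query) to positive weights.
    Returns a dict of scores that sums to approximately 1.
    """
    if any(len(nbrs) == 0 for nbrs in graph.values()):
        raise ValueError("dangling nodes are not handled in this sketch")
    total = sum(seeds.values())
    # fluid: mass still waiting to be diffused; history: mass already absorbed (the scores).
    fluid = {u: 0.0 for u in graph}
    history = {u: 0.0 for u in graph}
    for u, w in seeds.items():
        fluid[u] = (1 - damping) * w / total
    # FIFO queue of nodes whose fluid exceeds the tolerance.
    queue = deque(u for u in graph if fluid[u] > tol)
    in_queue = set(queue)
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        f, fluid[u] = fluid[u], 0.0
        history[u] += f  # absorb the fluid as score
        share = damping * f / len(graph[u])
        for v in graph[u]:  # push a damped share to each out-neighbor
            fluid[v] += share
            if fluid[v] > tol and v not in in_queue:
                queue.append(v)
                in_queue.add(v)
    return history
```

Every push absorbs a fixed fraction (1 - damping) of the fluid it handles, so the loop terminates, and the fluid left below tol bounds the error. For a three-node cycle a, b, c seeded at a with damping 0.85, the score of a tends to 0.15 / (1 - 0.85^3), roughly 0.389.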
Quickstart
Install gismo:
$ pip install gismo
Import gismo in a Python project:
import gismo as gs
Credits
Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, Dohy Hong.
This package was created with Cookiecutter and the francois-durand/package_helper project template.
History
0.3.0 (2020-05-13)
dblp module: url2source function added to load a small DBLP source directly into memory instead of using a FileSource.
Query distortion in Gismo can now be disabled.
XGismo class added to cross-analyze embeddings.
Tutorials updated.
0.2.5 (2020-05-11)
auto_k feature: if k is not specified, a reasonable, query-dependent number of results k is estimated.
Covering methods added to Gismo. It is now possible to use get_covering_* instead of get_ranked_* to maximize coverage and/or eliminate redundancy.
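The changelog does not say how the get_covering_* methods work. One classic way to trade raw rank for coverage is greedy maximum coverage, sketched below as a generic illustration. It is not gismo's algorithm. The function name greedy_covering and the representation of each candidate as a (document id, set of features) pair are hypothetical.

```python
def greedy_covering(candidates, k):
    """Greedy maximum-coverage selection (hypothetical helper, not gismo's API).

    candidates: list of (doc_id, set_of_features) pairs, ordered by relevance rank.
    Picks up to k documents, each time taking the one that adds the most
    not-yet-covered features; ties go to the better-ranked document.
    """
    covered = set()
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        # max() returns the first maximal element, so rank order breaks ties.
        best = max(remaining, key=lambda c: len(c[1] - covered))
        if not best[1] - covered:
            break  # every remaining document is redundant with the selection
        selected.append(best[0])
        covered |= best[1]
        remaining.remove(best)
    return selected
```

As an example, take four candidates: d1 and d2 with {"gizmo", "mogwai"}, d3 with {"sci-fi"}, and d4 with {"gizmo", "movie", "sci-fi"}. For k = 3, the function returns ["d4", "d1"]. The duplicate d2 and the fully covered d3 are dropped, even though k would allow more results.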
0.2.4 (2020-05-07)
Tutorials for ACM and DBLP added. After cleanup, there are currently three tutorials:
Toy model, to get the hang of Gismo on a tiny example,
ACM, to play with Gismo on a small example,
DBLP, to play with a large dataset.
0.2.3 (2020-05-04)
ACM and DBLP dataset creation added.
0.2.2 (2020-05-04)
Notebook tutorials added (early version)
0.2.1 (2020-05-03)
Actual code added.
Coverage badge added.
0.1.0 (2020-04-30)
First release on PyPI.
Hashes for gismo-0.3.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 86f6eb0c421be9b3f0510db1ac09b8fbc31345cfedbafba8804edeb62c433a77
MD5 | ba89857efda7c28c40cddcab9aaaaad0
BLAKE2b-256 | 010f4fc0310e386e25adb89058bdf7062cbb74f94827273ae315d4a6fa17a1c9
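To check a downloaded wheel against the SHA256 digest listed for gismo-0.3.0-py2.py3-none-any.whl, you can hash the file with Python's standard hashlib module. The helper name file_digest_hex is hypothetical, and the file path you pass in should be wherever the wheel was saved.

```python
import hashlib

# SHA256 digest published for gismo-0.3.0-py2.py3-none-any.whl.
EXPECTED_SHA256 = "86f6eb0c421be9b3f0510db1ac09b8fbc31345cfedbafba8804edeb62c433a77"


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Usage: file_digest_hex("gismo-0.3.0-py2.py3-none-any.whl") == EXPECTED_SHA256 should be True for an intact download. pip's hash-checking mode can run the same check at install time: add --hash=sha256:... entries to a requirements file and install with --require-hashes.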