GISMO
GISMO is an NLP tool to rank and organize a corpus of documents according to a query.
Gismo stands for Generic Information Search… with a Mind of its Own.
Free software: GNU General Public License v3
GitHub: https://github.com/balouf/gismo.
Documentation: https://gismo.readthedocs.io.
Features
Gismo combines three main ideas:
TF-IDTF: a symmetric version of the TF-IDF embedding.
DI-Iteration: a fast, push-based variant of the PageRank algorithm.
Fuzzy dendrogram: a variant of the Louvain clustering algorithm.
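To give a rough idea of what "push-based" means for a PageRank-style computation, here is a minimal sketch in plain Python. It is a generic residual-push scheme for personalized PageRank, not Gismo's DI-Iteration and not part of the gismo API. The function name `push_pagerank`, its parameters, and the toy graph are all hypothetical and chosen only for illustration.

```python
from collections import deque


def push_pagerank(graph, source, damping=0.85, tol=1e-10):
    """Approximate personalized PageRank from `source` by pushing residual mass.

    `graph` maps each node to the list of its out-neighbors; every neighbor
    must also appear as a key. This is an illustrative sketch only.
    """
    score = {node: 0.0 for node in graph}     # mass that has settled on each node
    residual = {node: 0.0 for node in graph}  # mass still waiting to be pushed
    residual[source] = 1.0
    queue = deque([source])
    queued = {source}
    while queue:
        node = queue.popleft()
        queued.discard(node)
        mass = residual[node]
        residual[node] = 0.0
        # A fraction (1 - damping) of the mass settles on the node itself.
        score[node] += (1.0 - damping) * mass
        neighbors = graph[node]
        if not neighbors:
            continue  # dangling node: the remaining mass is simply dropped
        # The rest is split evenly and pushed to the out-neighbors.
        share = damping * mass / len(neighbors)
        for neighbor in neighbors:
            residual[neighbor] += share
            if residual[neighbor] > tol and neighbor not in queued:
                queue.append(neighbor)
                queued.add(neighbor)
    return score


# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = push_pagerank(graph, "a")
```

Unlike power iteration, which updates every node at each step, a push scheme only touches nodes that still hold residual mass, which is why this family of methods tends to be fast when the query is localized.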
Quickstart
Install gismo:
$ pip install gismo
Import gismo in a Python project:
import gismo as gs
Credits
Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, Dohy Hong.
This package was created with Cookiecutter and the francois-durand/package_helper project template.
History
0.2.4 (2020-05-07)
Tutorials for ACM and DBLP added. After cleaning, there are currently 3 tutorials:
Toy model, to get the hang of Gismo on a tiny example,
ACM, to play with Gismo on a small example,
DBLP, to play with a large dataset.
0.2.3 (2020-05-04)
ACM and DBLP dataset creation added.
0.2.2 (2020-05-04)
Notebook tutorials added (early version)
0.2.1 (2020-05-03)
Actual code
Coverage badge
0.1.0 (2020-04-30)
First release on PyPI.