GISMO
GISMO is an NLP tool to rank and organize a corpus of documents according to a query.
Gismo stands for Generic Information Search… with a Mind of its Own.
Free software: GNU General Public License v3
GitHub: https://github.com/balouf/gismo.
Documentation: https://gismo.readthedocs.io.
Features
Gismo combines three main ideas:
TF-IDTF: a symmetric version of the TF-IDF embedding.
D-Iteration: a fast, push-based variant of the PageRank algorithm.
Fuzzy dendrogram: a variant of the Louvain clustering algorithm.
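To give a feel for what "push-based" means for a PageRank variant, here is a minimal sketch of a generic forward-push scheme for personalized PageRank. It is not Gismo's implementation: the function name `push_pagerank`, its parameters, and the toy graph are assumptions made for illustration only. Each node holds some pending "fluid"; pushing a node moves that fluid into its final score and forwards a damped share to its successors.

```python
from collections import deque


def push_pagerank(graph, source, damping=0.85, tol=1e-8):
    """Approximate personalized PageRank by pushing residual fluid along edges.

    graph: dict mapping each node to the list of its successors
           (hypothetical input format chosen for this sketch).
    source: node on which the ranking is personalized.
    """
    history = {node: 0.0 for node in graph}  # score accumulated so far
    fluid = {node: 0.0 for node in graph}    # residual still to be pushed
    fluid[source] = 1.0 - damping
    queue = deque([source])
    queued = {source}
    while queue:
        node = queue.popleft()
        queued.discard(node)
        amount = fluid[node]
        if amount <= tol:
            continue
        # Pushing a node: its fluid becomes part of its score...
        history[node] += amount
        fluid[node] = 0.0
        successors = graph[node]
        if not successors:
            continue  # dangling node: the fluid is simply dropped
        # ...and a damped share is forwarded to each successor.
        share = damping * amount / len(successors)
        for nxt in successors:
            fluid[nxt] += share
            if fluid[nxt] > tol and nxt not in queued:
                queue.append(nxt)
                queued.add(nxt)
    return history


# Toy graph: "d" points to "c" but cannot be reached from "a".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = push_pagerank(graph, "a")
```

Because only nodes that actually receive fluid are ever visited, the work stays local to the neighborhood of the source, which is the usual motivation for push-based schemes over full power iteration.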
Quickstart
Install gismo:
$ pip install gismo
Import gismo in a Python project:
import gismo as gs
Credits
Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, Dohy Hong.
This package was created with Cookiecutter and the francois-durand/package_helper project template.
History
0.2.5 (2020-05-11)
auto_k feature: if k is not specified, a reasonable, query-dependent number of results is estimated.
Covering methods added to gismo: get_covering_* can now be used instead of get_ranked_* to maximize coverage and/or eliminate redundancy.
0.2.4 (2020-05-07)
Tutorials for ACM and DBLP added. After cleaning, there are currently three tutorials:
Toy model, to get the hang of Gismo on a tiny example,
ACM, to play with Gismo on a small example,
DBLP, to play with a large dataset.
0.2.3 (2020-05-04)
ACM and DBLP dataset creation added.
0.2.2 (2020-05-04)
Notebook tutorials added (early version)
0.2.1 (2020-05-03)
Actual code
Coverage badge
0.1.0 (2020-04-30)
First release on PyPI.