GISMO is an NLP tool to rank and organize a corpus of documents according to a query.
A Generic Information Search... With a Mind of its Own!
Gismo stands for Generic Information Search... with a Mind of its Own.
- Free software: MIT License
- Github: https://github.com/balouf/gismo/
- Documentation: https://balouf.github.io/gismo/
Features
Gismo combines three main ideas:
- TF-IDTF: a symmetric version of the TF-IDF embedding.
- DIteration: a fast, push-based variant of the PageRank algorithm.
- Fuzzy dendrogram: a variant of the Louvain clustering algorithm.
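The list describes DIteration only as a fast, push-based variant of PageRank. The sketch below shows the general push idea on a small graph. It is not Gismo's implementation. The function name `push_pagerank`, the adjacency-dict input format and the damping convention (personalized PageRank with teleport vector `seeds`) are assumptions made for illustration.

```python
def push_pagerank(graph, seeds, damping=0.85, tol=1e-12):
    """Push-based (diffusion-style) personalized PageRank sketch.

    Hypothetical input format: graph maps node -> list of out-neighbours,
    seeds maps node -> non-negative initial weight (e.g. a query vector).
    Approximates X solving X = (1 - damping) * seeds + damping * M X,
    where M spreads each node's mass uniformly over its out-neighbours.
    """
    nodes = set(graph) | {v for outs in graph.values() for v in outs} | set(seeds)
    # "Fluid" is residual mass that has not yet been settled into the ranking.
    fluid = {v: 0.0 for v in nodes}
    for v, w in seeds.items():
        fluid[v] += (1 - damping) * w
    # "History" accumulates the settled mass, i.e. the ranking estimate.
    history = {v: 0.0 for v in nodes}
    while sum(fluid.values()) > tol:
        # Push from the node currently holding the most fluid.
        i = max(fluid, key=fluid.get)
        f = fluid[i]
        history[i] += f
        fluid[i] = 0.0
        outs = graph.get(i, [])
        # Dangling nodes (no out-neighbours) simply absorb their fluid.
        for j in outs:
            fluid[j] += damping * f / len(outs)
    return history


if __name__ == "__main__":
    ranking = push_pagerank({"a": ["b"], "b": ["c"], "c": ["a"]}, {"a": 1.0})
    print(sorted(ranking.items(), key=lambda kv: -kv[1]))
```

Each push settles the fluid at one node and forwards a `damping` fraction of it, so with non-negative seeds the total residual strictly shrinks and the loop terminates. Pushing from the largest residual first concentrates work where the ranking is still changing, which is the intuition behind push-based schemes.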
Quickstart
Install gismo:
$ pip install gismo
Use gismo in a Python project:
>>> from gismo.common import toy_source_dict
>>> from gismo import Corpus, Embedding, CountVectorizer, Gismo
>>> corpus = Corpus(toy_source_dict, to_text=lambda x: x['content'])
>>> embedding = Embedding(vectorizer=CountVectorizer(dtype=float))
>>> embedding.fit_transform(corpus)
>>> gismo = Gismo(corpus, embedding)
>>> gismo.rank("Mogwaï")
>>> gismo.get_features_by_rank()
['mogwaï', 'gizmo', 'chinese', 'in', 'demon', 'folklore', 'is']
To get the hang of a typical Gismo workflow, check the Toy Example notebook. For more advanced uses, see the other tutorials or go directly to the reference section.
Credits
Thanks to Thomas Bonald, Anne Bouillard, Marc-Olivier Buob and Dohy Hong for their helpful contributions.
This package was created with Cookiecutter and the francois-durand/package_helper project template.
Download files
File details
Details for the file gismo-0.5.3.tar.gz.
File metadata
- Download URL: gismo-0.5.3.tar.gz
- Upload date:
- Size: 78.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.6.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8070a3f203d5def122486c12c6b9fb943360b5196ea9e73649f781a8f340eb70 |
| MD5 | 40a4e7a2cc414edacbc415f6151dee47 |
| BLAKE2b-256 | 2ebc75d904d7f708fc0694f5f58b661e4f10a1f38e71a022bd8394e8950cb6c6 |
File details
Details for the file gismo-0.5.3-py3-none-any.whl.
File metadata
- Download URL: gismo-0.5.3-py3-none-any.whl
- Upload date:
- Size: 82.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.6.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 639f3159d6f5c97cf91e6eda687024440bebfaea17df9a8c0c8e256e047d7262 |
| MD5 | 6ff29a7e6a916be235e7af7bfa20b9a6 |
| BLAKE2b-256 | f865608dd7af96406371e8b8479c120c0abe4724811bd2ffefb44727834dde74 |
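The hash tables can be used to check a downloaded file before installing it. Below is a minimal standard-library sketch. The helper names `file_sha256` and `verify` are hypothetical, and the digests are copied from the SHA256 rows of the tables for this release.

```python
import hashlib

# SHA256 digests listed for the gismo 0.5.3 distribution files.
EXPECTED_SHA256 = {
    "gismo-0.5.3.tar.gz": "8070a3f203d5def122486c12c6b9fb943360b5196ea9e73649f781a8f340eb70",
    "gismo-0.5.3-py3-none-any.whl": "639f3159d6f5c97cf91e6eda687024440bebfaea17df9a8c0c8e256e047d7262",
}


def file_sha256(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return file_sha256(path) == expected.lower()
```

For example, `verify("gismo-0.5.3.tar.gz", EXPECTED_SHA256["gismo-0.5.3.tar.gz"])` should return `True` for an intact download. pip can enforce the same check itself through `--hash=sha256:...` entries in a requirements file combined with `--require-hashes`.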