GISMO is an NLP tool that ranks and organizes a corpus of documents according to a query.

Project description

A Generic Information Search... With a Mind of its Own!

Gismo stands for Generic Information Search... with a Mind of its Own.

Features

Gismo combines three main ideas:

  • TF-IDTF: a symmetric version of the TF-IDF embedding.
  • DIteration: a fast, push-based variant of the PageRank algorithm.
  • Fuzzy dendrogram: a variant of the Louvain clustering algorithm.
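
To give a feel for what "push-based" means for a PageRank-style computation, here is a minimal sketch of a generic diffusion scheme in the spirit of DIteration. It is not Gismo's implementation: the function name push_pagerank, the toy graph, the greedy node choice, and the handling of dangling nodes are all assumptions made for illustration. Each node holds some pending "fluid"; processing a node records that fluid in its score and pushes a damped share to its successors.

```python
def push_pagerank(graph, damping=0.85, tol=1e-9):
    """Approximate PageRank by pushing pending fluid node by node.

    graph: dict mapping each node to the list of nodes it links to.
    Every linked node is assumed to also be a key of the dict.
    """
    nodes = list(graph)
    # Fluid still waiting to be diffused; starts as the teleportation mass.
    fluid = {n: (1.0 - damping) / len(nodes) for n in nodes}
    # Fluid already diffused through each node; converges to the scores.
    history = {n: 0.0 for n in nodes}

    while sum(fluid.values()) > tol:
        # Greedy choice (an assumption): process the node holding the most fluid.
        node = max(fluid, key=fluid.get)
        amount = fluid[node]
        fluid[node] = 0.0
        history[node] += amount
        targets = graph[node]
        # Nodes without successors simply drop their fluid in this sketch.
        for target in targets:
            fluid[target] += damping * amount / len(targets)
    return history


# Hypothetical three-page graph: a links to b and c, b to c, c to a.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = push_pagerank(toy_graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Unlike the classic power iteration, which updates every node at each round, this scheme only touches one node and its successors per step, so the work can be focused where the pending fluid is largest.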

Quickstart

Install gismo:

$ pip install gismo

Use gismo in a Python project:

>>> from gismo.common import toy_source_dict
>>> from gismo import Corpus, Embedding, CountVectorizer, Gismo
>>> corpus = Corpus(toy_source_dict, to_text=lambda x: x['content'])
>>> embedding = Embedding(vectorizer=CountVectorizer(dtype=float))
>>> embedding.fit_transform(corpus)
>>> gismo = Gismo(corpus, embedding)
>>> gismo.rank("Mogwaï")
>>> gismo.get_features_by_rank()
['mogwaï', 'gizmo', 'chinese', 'in', 'demon', 'folklore', 'is']

To get the hang of a typical Gismo workflow, check the Toy Example notebook. For more advanced uses, see the other tutorials or go directly to the reference section.

Credits

Thanks to Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, and Dohy Hong for their helpful contributions.

This package was created with Cookiecutter and the francois-durand/package_helper project template.
