
GISMO is an NLP tool to rank and organize a corpus of documents according to a query.

Project description


A Generic Information Search... With a Mind of its Own!



Gismo stands for Generic Information Search... with a Mind of its Own.

Features

Gismo combines three main ideas:

  • TF-IDTF: a symmetric version of the TF-IDF embedding.
  • DIteration: a fast, push-based variant of the PageRank algorithm.
  • Fuzzy dendrogram: a variant of the Louvain clustering algorithm.
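The push-based idea behind DIteration can be illustrated with a small, self-contained sketch. This is not Gismo's implementation: it is a minimal pure-Python version of push-based PageRank, assuming a graph given as a dict of out-neighbour lists with no dangling nodes; the function name `d_iteration` and its parameters are hypothetical. Each node holds some undiffused "fluid"; pushing a node moves its fluid into its score and forwards a damped share to its out-neighbours, so the scores converge to PageRank without building a transition matrix.

```python
def d_iteration(graph, damping=0.85, tol=1e-10):
    """Push-based PageRank in the spirit of DIteration (illustrative sketch).

    graph: dict mapping each node to the list of its out-neighbours.
    Assumption: every node has at least one out-neighbour and every
    neighbour is itself a key of the dict.
    """
    nodes = list(graph)
    n = len(nodes)
    # Fluid not yet diffused; starts as the uniform teleportation vector.
    fluid = {v: (1 - damping) / n for v in nodes}
    # History: fluid already absorbed by each node (its unnormalized score).
    history = {v: 0.0 for v in nodes}
    while True:
        # Greedy order: push the node currently holding the most fluid.
        v = max(nodes, key=fluid.get)
        f = fluid[v]
        if f < tol:
            break
        history[v] += f
        fluid[v] = 0.0
        # Forward a damped share of the pushed fluid along each out-link.
        share = damping * f / len(graph[v])
        for w in graph[v]:
            fluid[w] += share
    # Normalize so scores sum to 1 (absorbs the small leftover fluid).
    total = sum(history.values())
    return {v: h / total for v, h in history.items()}


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = d_iteration(graph)
# c is linked from both a and b, so it ends up with the highest score.
print(sorted(ranks, key=ranks.get, reverse=True))  # prints ['c', 'a', 'b']
```

Picking the node with the most fluid is a simple greedy schedule that keeps the example short; a practical implementation would avoid scanning every node at each push, for instance by using a priority queue or sparse matrix operations.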

Quickstart

Install gismo:

$ pip install gismo

Use gismo in a Python project:

>>> from gismo.common import toy_source_dict
>>> from gismo import Corpus, Embedding, CountVectorizer, Gismo
>>> corpus = Corpus(toy_source_dict, to_text=lambda x: x['content'])
>>> embedding = Embedding(vectorizer=CountVectorizer(dtype=float))
>>> embedding.fit_transform(corpus)
>>> gismo = Gismo(corpus, embedding)
>>> gismo.rank("Mogwaï")
>>> gismo.get_features_by_rank()
['mogwaï', 'gizmo', 'chinese', 'in', 'demon', 'folklore', 'is']
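To give an intuition of what ranking a corpus "according to a query" involves, here is a rough, hypothetical stand-in written in plain Python. It is not Gismo's code and does not reproduce its output: it uses raw word counts instead of TF-IDTF weights, a fixed number of iterations instead of DIteration, and a made-up three-sentence corpus. The names `DOCS`, `tokenize` and `rank_query` are illustrative only. The sketch runs a random walk with restart on the bipartite document-word graph, restarting on the query words, so documents sharing words with the query, and the words those documents contain, accumulate score.

```python
import re
from collections import Counter

# Hypothetical toy corpus (not gismo's toy_source_dict).
DOCS = [
    "Gizmo is a Mogwaï.",
    "In chinese folklore, a Mogwaï is a demon.",
    "This very long sentence has nothing to do with the others.",
]


def tokenize(text):
    # Lowercase and keep runs of word characters (Unicode-aware, keeps "ï").
    return re.findall(r"\w+", text.lower())


def rank_query(docs, query, damping=0.5, steps=50):
    """Random walk with restart on the document-word graph (sketch)."""
    bags = [Counter(tokenize(d)) for d in docs]
    words = sorted({w for bag in bags for w in bag})
    q = Counter(w for w in tokenize(query) if w in words)
    total_q = sum(q.values())
    if total_q == 0:
        return [], []  # no query word appears in the corpus
    # Restart distribution: concentrated on the query words.
    seed = {w: q[w] / total_q for w in words}
    word_score = dict(seed)
    # Weighted degree of each word: its total count over all documents.
    word_deg = Counter()
    for bag in bags:
        word_deg.update(bag)
    doc_score = [0.0] * len(docs)
    for _ in range(steps):
        # Words -> documents: a word splits its score across its occurrences.
        doc_score = [
            sum(word_score[w] * c / word_deg[w] for w, c in bag.items())
            for bag in bags
        ]
        # Documents -> words: a document splits its score across its tokens.
        spread = Counter()
        for score, bag in zip(doc_score, bags):
            size = sum(bag.values())
            for w, c in bag.items():
                spread[w] += score * c / size
        # Restart: mix the diffused scores with the query distribution.
        word_score = {w: (1 - damping) * seed[w] + damping * spread[w] for w in words}
    docs_by_rank = sorted(range(len(docs)), key=lambda i: -doc_score[i])
    words_by_rank = sorted(words, key=lambda w: -word_score[w])
    return docs_by_rank, words_by_rank


docs_by_rank, words_by_rank = rank_query(DOCS, "Mogwaï")
print(docs_by_rank)       # the unrelated third document (index 2) comes last
print(words_by_rank[:5])  # the query word itself comes first
```

The query word always ranks first here because it keeps the restart mass, while words that never co-occur with the query, such as those of the third sentence, get a score of zero.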

To get the hang of a typical Gismo workflow, check the Toy Example notebook. For more advanced uses, see the other tutorials or go directly to the reference section.

Credits

Thanks to Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, and Dohy Hong for their helpful contributions.

This package was created with Cookiecutter and the francois-durand/package_helper project template.




Download files


Source Distribution

gismo-0.5.3.tar.gz (78.1 kB)

Uploaded Source

Built Distribution


gismo-0.5.3-py3-none-any.whl (82.6 kB)

Uploaded Python 3

File details

Details for the file gismo-0.5.3.tar.gz.

File metadata

  • Download URL: gismo-0.5.3.tar.gz
  • Size: 78.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.11

File hashes

Hashes for gismo-0.5.3.tar.gz
Algorithm Hash digest
SHA256 8070a3f203d5def122486c12c6b9fb943360b5196ea9e73649f781a8f340eb70
MD5 40a4e7a2cc414edacbc415f6151dee47
BLAKE2b-256 2ebc75d904d7f708fc0694f5f58b661e4f10a1f38e71a022bd8394e8950cb6c6


File details

Details for the file gismo-0.5.3-py3-none-any.whl.

File metadata

  • Download URL: gismo-0.5.3-py3-none-any.whl
  • Size: 82.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.11

File hashes

Hashes for gismo-0.5.3-py3-none-any.whl
Algorithm Hash digest
SHA256 639f3159d6f5c97cf91e6eda687024440bebfaea17df9a8c0c8e256e047d7262
MD5 6ff29a7e6a916be235e7af7bfa20b9a6
BLAKE2b-256 f865608dd7af96406371e8b8479c120c0abe4724811bd2ffefb44727834dde74

