Cloud Finder

Keyword cloud and topic hotspot detection in documents. Find where keywords concentrate to detect topic relevance.

Built on top of vibe-finder with a simplified, focused API.

Use Cases

  • Topic Detection: Find paragraphs/regions discussing specific topics
  • Document Relevance: Score documents by keyword concentration
  • Keyword Clustering: Analyze where keywords cluster together
  • Content Analysis: Find the most relevant sections in long documents

Installation

pip install cloud-finder

Quick Start

from cloud_finder import find_topics

# `document` is the text to search (a str).
# Find topic regions; the keywords are Russian for "tax", "optimization", "VAT".
topics = find_topics(
    document,
    keywords=["налоговая", "оптимизация", "НДС"],
    min_relevance=0.3
)

for topic in topics:
    print(f"Relevance: {topic.relevance:.0%}")
    print(f"Position: {topic.start}:{topic.end}")
    print(f"Keywords found: {topic.found_keywords}")
    print(f"Clustering: {topic.clustering:.1f}x concentrated")

API

find_topics()

Find all topic regions in a document:

from cloud_finder import find_topics

topics = find_topics(
    text,
    keywords=["keyword1", "keyword2", "keyword3"],
    min_relevance=0.3,   # Minimum score (0-1)
    min_coverage=0.4,    # At least 40% of the keywords must be found
    window_size=400,     # Search window in chars
    fuzzy=True           # Enable fuzzy matching for typos
)
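
To make the window_size and min_coverage parameters concrete, here is a minimal sketch of a sliding-window coverage scan. It is not Cloud Finder's implementation: the function names window_coverage and filter_windows, the step parameter, and the exact substring matching are all assumptions made for illustration.

```python
from typing import List, Tuple


def window_coverage(text: str, keywords: List[str], window_size: int = 400,
                    step: int = 50) -> List[Tuple[int, int, float]]:
    """Return (start, end, coverage) per window (hypothetical helper).

    Coverage is the share of keywords that occur at least once in the
    window, using exact, case-insensitive substring matching.
    """
    wanted = [k.lower() for k in keywords]
    if not wanted:
        return []
    lowered = text.lower()
    results = []
    # Slide a fixed-size window over the text; always emit at least one window.
    last_start = max(len(lowered) - window_size, 0)
    for start in range(0, last_start + 1, step):
        chunk = lowered[start:start + window_size]
        found = sum(1 for k in wanted if k in chunk)
        end = min(start + window_size, len(lowered))
        results.append((start, end, found / len(wanted)))
    return results


def filter_windows(windows: List[Tuple[int, int, float]],
                   min_coverage: float = 0.4) -> List[Tuple[int, int, float]]:
    """Keep only windows whose keyword coverage reaches min_coverage."""
    return [w for w in windows if w[2] >= min_coverage]
```

With three keywords, min_coverage=0.4 means at least two must appear in a window, since one of three is only about 33%.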

find_best_topic()

Find the single best topic region:

from cloud_finder import find_best_topic

best = find_best_topic(document, ["налоговая", "оптимизация"])
if best:
    print(f"Best match at position {best.position}")

analyze_relevance()

Quick relevance check for a document:

from cloud_finder import analyze_relevance

result = analyze_relevance(document, ["tax", "optimization", "deduction"])

if result["is_relevant"]:
    print(f"Document is relevant: {result['max_relevance']:.0%}")
    print(f"Found {result['topic_count']} topic regions")

TopicFinder (OOP API)

An object-oriented interface: construct a finder for a document once, then run queries against it:

from cloud_finder import TopicFinder

finder = TopicFinder(document, fuzzy_threshold=0.85)

# Variadic keyword syntax
topics = finder.find("налоговая", "оптимизация", "НДС")

# Or find best
best = finder.find_best("tax", "optimization", min_coverage=0.5)

TopicMatch Fields

Field             Type       Description
position          int        Center position of topic region
start             int        Start offset
end               int        End offset
relevance         float      Overall relevance score (0-1)
keyword_coverage  float      Fraction of keywords found (0-1)
found_keywords    List[str]  Keywords that were found
missing_keywords  List[str]  Keywords not found
clustering        float      Concentration vs even distribution (>1 = clustered)
density           float      Keywords per 100 chars
spread            float      Standard deviation of positions
preview           str        Text preview of region
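
As a rough illustration of the density and spread fields, the sketch below computes both from keyword match positions. The helper names are hypothetical, and the use of the population standard deviation is an assumption; the library may compute these differently. The clustering formula is not documented here, so it is left out.

```python
from statistics import pstdev
from typing import List


def density_per_100_chars(match_count: int, start: int, end: int) -> float:
    """Keyword matches per 100 characters of the region [start, end)."""
    length = end - start
    return 100.0 * match_count / length if length > 0 else 0.0


def spread(positions: List[int]) -> float:
    """Population standard deviation of match offsets (assumed definition)."""
    return pstdev(positions) if len(positions) > 1 else 0.0
```

Under these assumptions, four matches in a 400-character region give a density of 1.0, and a smaller spread means the matches sit closer together.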

Properties

topic.is_highly_relevant  # True if relevance >= 0.5 and clustering >= 1.5
topic.interpretation      # "highly_relevant", "relevant", "somewhat_relevant", "weakly_relevant"
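
The is_highly_relevant rule in the comment above can be expressed as a small property. This is only a sketch: TopicMatchSketch is a hypothetical stand-in with two of the documented fields, not the library's TopicMatch class.

```python
from dataclasses import dataclass


@dataclass
class TopicMatchSketch:
    """Hypothetical stand-in carrying two of the documented fields."""
    relevance: float   # overall relevance score (0-1)
    clustering: float  # >1 means keywords are clustered

    @property
    def is_highly_relevant(self) -> bool:
        # Both thresholds must hold, as stated in the comment above.
        return self.relevance >= 0.5 and self.clustering >= 1.5
```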

Fuzzy Matching

Cloud Finder uses Jaro-Winkler similarity for fuzzy matching, which handles:

  • OCR errors (налоговая → налоговоя)
  • Typos (оптимизация → оптимизацыя)
  • Case differences (НДС → ндс)
  • Word form variations (налоговая → налоговой)
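
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of Jaro-Winkler similarity. It follows the standard textbook definition and is not taken from Cloud Finder's source. Lowercasing both strings before comparison is an assumption made so that case differences score as identical, and 0.85 is simply the fuzzy_threshold value from the TopicFinder example.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    match_dist = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - match_dist)
        hi = min(len2, i + match_dist + 1)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    s1, s2 = s1.lower(), s2.lower()  # assumption: case-insensitive comparison
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)
```

Under this sketch, a single-letter OCR error such as налоговая vs налоговоя scores well above 0.85, so it would count as a match at that threshold.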

License

MIT
