Cloud Finder

Keyword cloud and topic hotspot detection in documents. Find where keywords concentrate to detect topic relevance.

Built on top of vibe-finder with a simplified, focused API.

Use Cases

  • Topic Detection: Find paragraphs/regions discussing specific topics
  • Document Relevance: Score documents by keyword concentration
  • Keyword Clustering: Analyze where keywords cluster together
  • Content Analysis: Find the most relevant sections in long documents

Installation

pip install cloud-finder

Quick Start

from cloud_finder import find_topics

# `document` is the text to search, as a str.
# Find topic regions (the keywords are Russian for "tax", "optimization", "VAT")
topics = find_topics(
    document,
    keywords=["налоговая", "оптимизация", "НДС"],
    min_relevance=0.3
)

for topic in topics:
    print(f"Relevance: {topic.relevance:.0%}")
    print(f"Position: {topic.start}:{topic.end}")
    print(f"Keywords found: {topic.found_keywords}")
    print(f"Clustering: {topic.clustering:.1f}x concentrated")

API

find_topics()

Find all topic regions in a document:

from cloud_finder import find_topics

topics = find_topics(
    text,
    keywords=["keyword1", "keyword2", "keyword3"],
    min_relevance=0.3,   # Minimum relevance score (0-1)
    min_coverage=0.4,    # At least 40% of the keywords must be found
    window_size=400,     # Search window size in characters
    fuzzy=True           # Enable fuzzy matching to tolerate typos
)
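
To make window_size and min_coverage more concrete, here is a minimal sketch of a sliding-window coverage scan. It is not Cloud Finder's implementation: the function name window_coverage and the step parameter are hypothetical, and plain case-insensitive substring matching stands in for fuzzy matching.

```python
from typing import List, Tuple


def window_coverage(
    text: str,
    keywords: List[str],
    window_size: int = 400,
    step: int = 100,          # hypothetical: how far the window advances each time
    min_coverage: float = 0.4,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, coverage) for every window that meets min_coverage.

    Coverage is the fraction of distinct keywords that occur in the window.
    """
    if not keywords:
        return []
    lowered = text.lower()
    wanted = [k.lower() for k in keywords]
    hits = []
    # Slide a fixed-size window over the text; offset 0 is always visited,
    # even when the text is shorter than one window.
    for start in range(0, max(len(lowered) - window_size, 0) + 1, step):
        end = min(start + window_size, len(lowered))
        window = lowered[start:end]
        found = sum(1 for k in wanted if k in window)
        coverage = found / len(wanted)
        if coverage >= min_coverage:
            hits.append((start, end, coverage))
    return hits
```

Under these assumptions, a larger window_size makes it easier for distant keywords to count toward the same region, while a higher min_coverage rejects windows that mention only a few of the keywords.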

find_best_topic()

Find the single best topic region:

from cloud_finder import find_best_topic

best = find_best_topic(document, ["налоговая", "оптимизация"])
if best:
    print(f"Best match at position {best.position}")

analyze_relevance()

Quick relevance check for a document:

from cloud_finder import analyze_relevance

result = analyze_relevance(document, ["tax", "optimization", "deduction"])

if result["is_relevant"]:
    print(f"Document is relevant: {result['max_relevance']:.0%}")
    print(f"Found {result['topic_count']} topic regions")

TopicFinder (OOP API)

from cloud_finder import TopicFinder

finder = TopicFinder(document, fuzzy_threshold=0.85)

# Variadic keyword syntax
topics = finder.find("налоговая", "оптимизация", "НДС")

# Or find best
best = finder.find_best("tax", "optimization", min_coverage=0.5)

TopicMatch Fields

Field             Type       Description
position          int        Center position of the topic region
start             int        Start offset
end               int        End offset
relevance         float      Overall relevance score (0-1)
keyword_coverage  float      Fraction of keywords found (0-1)
found_keywords    List[str]  Keywords that were found
missing_keywords  List[str]  Keywords that were not found
clustering        float      Concentration vs. even distribution (>1 = clustered)
density           float      Keywords per 100 characters
spread            float      Standard deviation of keyword positions
preview           str        Text preview of the region
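
The density and spread fields can be illustrated with a short sketch. The helper names below are hypothetical, and the choice of population (rather than sample) standard deviation is an assumption; the library may compute these differently. The clustering formula is not specified here, so it is left out.

```python
from statistics import pstdev
from typing import List


def density_per_100_chars(match_count: int, region_length: int) -> float:
    """Keyword matches per 100 characters of the region."""
    if region_length <= 0:
        return 0.0
    return 100.0 * match_count / region_length


def position_spread(positions: List[int]) -> float:
    """Population standard deviation of match offsets (0.0 for fewer than 2)."""
    if len(positions) < 2:
        return 0.0
    return pstdev(positions)
```

For instance, 4 keyword matches in a 200-character region give a density of 2.0, and matches at offsets 10, 20, and 30 have a spread of about 8.16.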

Properties

topic.is_highly_relevant  # True if relevance >= 0.5 and clustering >= 1.5
topic.interpretation      # "highly_relevant", "relevant", "somewhat_relevant", "weakly_relevant"
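
As a rough sketch of the is_highly_relevant rule, the class below is a hypothetical stand-in for TopicMatch that carries only the two fields involved. The thresholds come from the comment above; the cut-offs behind interpretation are not documented here, so they are not modelled.

```python
from dataclasses import dataclass


@dataclass
class TopicMatchSketch:
    """Hypothetical stand-in for TopicMatch with only the fields used here."""

    relevance: float
    clustering: float

    @property
    def is_highly_relevant(self) -> bool:
        # Both conditions must hold: a strong score AND clustered keywords.
        return self.relevance >= 0.5 and self.clustering >= 1.5
```

A region with relevance 0.6 but clustering 1.2 would not qualify under this rule, because its keywords are spread almost evenly.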

Fuzzy Matching

Cloud Finder uses Jaro-Winkler similarity for fuzzy matching, which handles the following (the examples are Russian: налоговая "tax", оптимизация "optimization", НДС "VAT"):

  • OCR errors (налоговая → налоговоя)
  • Typos (оптимизация → оптимизацыя)
  • Case differences (НДС → ндс)
  • Word form variations (налоговая → налоговой)
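
For readers unfamiliar with the metric, here is a self-contained sketch of textbook Jaro-Winkler similarity. It is not Cloud Finder's code. The fuzzy_equal helper is hypothetical: its case folding is an assumption about how case differences might be handled, and its 0.85 default simply mirrors the fuzzy_threshold value used in the TopicFinder example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match if equal and no farther apart than this distance.
    match_dist = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - match_dist)
        hi = min(len2, i + match_dist + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity plus a bonus for a shared prefix of up to 4 characters.

    Some implementations apply the bonus only above a Jaro score of 0.7;
    this sketch always applies it.
    """
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def fuzzy_equal(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical helper: case-insensitive match above a similarity threshold."""
    return jaro_winkler(a.casefold(), b.casefold()) >= threshold
```

With this sketch, each pair in the list above clears the 0.85 threshold, while unrelated words such as "tax" and "optimization" fall well below it.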

License

MIT
