Cloud Finder

Keyword cloud and topic hotspot detection in documents. Find where keywords concentrate to detect topic relevance.

Built on top of vibe-finder with a simplified, focused API.

Use Cases

  • Topic Detection: Find paragraphs/regions discussing specific topics
  • Document Relevance: Score documents by keyword concentration
  • Keyword Clustering: Analyze where keywords cluster together
  • Content Analysis: Find the most relevant sections in long documents

Installation

pip install cloud-finder

Quick Start

from cloud_finder import find_topics

# `document` is a str holding the text to search
# Find topic regions (the Russian keywords mean "tax", "optimization", "VAT")
topics = find_topics(
    document,
    keywords=["налоговая", "оптимизация", "НДС"],
    min_relevance=0.3
)

for topic in topics:
    print(f"Relevance: {topic.relevance:.0%}")
    print(f"Position: {topic.start}:{topic.end}")
    print(f"Keywords found: {topic.found_keywords}")
    print(f"Clustering: {topic.clustering:.1f}x concentrated")

API

find_topics()

Find all topic regions in a document:

from cloud_finder import find_topics

topics = find_topics(
    text,
    keywords=["keyword1", "keyword2", "keyword3"],
    min_relevance=0.3,   # Minimum relevance score (0-1)
    min_coverage=0.4,    # At least 40% of the keywords must be found
    window_size=400,     # Search window size in characters
    fuzzy=True           # Enable fuzzy matching to tolerate typos
)
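
To make the window_size and min_coverage parameters more concrete, here is a minimal sketch of a sliding-window keyword scan. It is an illustration only, not Cloud Finder's actual algorithm: the function name window_coverage, the step parameter, and the exact-substring matching are all assumptions made for this example.

```python
def window_coverage(text, keywords, window_size=400, step=100, min_coverage=0.4):
    """Return (start, end, coverage) for every window that meets min_coverage.

    Hypothetical helper for illustration; not part of cloud_finder.
    """
    lowered = text.lower()
    wanted = [k.lower() for k in keywords]
    if not wanted:
        return []
    hits = []
    # The last window starts where a full window still fits (or at 0 for short texts).
    last_start = max(len(lowered) - window_size, 0)
    for start in range(0, last_start + 1, step):
        chunk = lowered[start:start + window_size]
        # Coverage is the fraction of distinct keywords present in this window.
        found = sum(1 for k in wanted if k in chunk)
        coverage = found / len(wanted)
        if coverage >= min_coverage:
            end = min(start + window_size, len(lowered))
            hits.append((start, end, coverage))
    return hits


filler = "x" * 1000
sample = filler + "tax optimization and deduction rules" + filler
for start, end, coverage in window_coverage(sample, ["tax", "optimization", "deduction"]):
    print(f"{start}:{end} coverage={coverage:.0%}")
```

Overlapping windows (step smaller than window_size) are used here so that a keyword cluster straddling a window boundary is still seen whole by at least one window.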

find_best_topic()

Find the single best topic region:

from cloud_finder import find_best_topic

best = find_best_topic(document, ["налоговая", "оптимизация"])  # "tax", "optimization"
if best:
    print(f"Best match at position {best.position}")

analyze_relevance()

Quick relevance check for a document:

from cloud_finder import analyze_relevance

result = analyze_relevance(document, ["tax", "optimization", "deduction"])

if result["is_relevant"]:
    print(f"Document is relevant: {result['max_relevance']:.0%}")
    print(f"Found {result['topic_count']} topic regions")

TopicFinder (OOP API)

from cloud_finder import TopicFinder

finder = TopicFinder(document, fuzzy_threshold=0.85)

# Keywords are passed as positional arguments ("tax", "optimization", "VAT")
topics = finder.find("налоговая", "оптимизация", "НДС")

# Or find best
best = finder.find_best("tax", "optimization", min_coverage=0.5)

TopicMatch Fields

Field             Type       Description
position          int        Center position of the topic region
start             int        Start offset
end               int        End offset
relevance         float      Overall relevance score (0-1)
keyword_coverage  float      Fraction of keywords found (0-1)
found_keywords    List[str]  Keywords that were found
missing_keywords  List[str]  Keywords that were not found
clustering        float      Concentration relative to an even distribution (>1 = clustered)
density           float      Keywords per 100 characters
spread            float      Standard deviation of keyword positions
preview           str        Text preview of the region
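
As a rough illustration of the density, spread, and clustering fields, the sketch below computes comparable numbers from a list of keyword positions. The function region_metrics is hypothetical, and the clustering formula in particular is an assumption: it compares the observed spread with the standard deviation of a uniform distribution over the region (length / sqrt(12)). Cloud Finder may define clustering differently.

```python
import math
import statistics


def region_metrics(positions, start, end):
    """Compute density, spread, and an assumed clustering ratio for a region.

    Hypothetical helper for illustration; not part of cloud_finder.
    """
    length = end - start
    if length <= 0 or not positions:
        raise ValueError("need a non-empty region and at least one position")
    # Keywords per 100 characters of the region.
    density = len(positions) / length * 100
    # Population standard deviation of the keyword positions.
    spread = statistics.pstdev(positions)
    # Standard deviation of positions spread uniformly over the region.
    uniform_spread = length / math.sqrt(12)
    # Assumed definition: values above 1 mean tighter than an even distribution.
    clustering = uniform_spread / spread if spread > 0 else math.inf
    return {"density": density, "spread": spread, "clustering": clustering}


tight = region_metrics([100, 110, 120], start=0, end=400)
even = region_metrics([50, 150, 250, 350], start=0, end=400)
print(f"tight: {tight['clustering']:.1f}x, even: {even['clustering']:.1f}x")
```

Under this assumed definition, three hits packed into 20 characters of a 400-character region score far above 1, while evenly spaced hits score close to 1.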

Properties

topic.is_highly_relevant  # True if relevance >= 0.5 and clustering >= 1.5
topic.interpretation      # "highly_relevant", "relevant", "somewhat_relevant", "weakly_relevant"
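
The is_highly_relevant rule stated above can be expressed in a few lines. The TopicScore class below is a hypothetical stand-in used only to show the threshold logic; it is not the library's TopicMatch, and the thresholds behind the interpretation labels are not documented here, so they are left out.

```python
from dataclasses import dataclass


@dataclass
class TopicScore:
    """Hypothetical stand-in carrying only the two fields the rule needs."""

    relevance: float
    clustering: float

    @property
    def is_highly_relevant(self) -> bool:
        # Both conditions must hold: a strong score and a tight keyword cluster.
        return self.relevance >= 0.5 and self.clustering >= 1.5


scores = [TopicScore(0.8, 2.0), TopicScore(0.9, 1.2), TopicScore(0.4, 3.0)]
strong = [s for s in scores if s.is_highly_relevant]
print(len(strong))
```

A region with high relevance but weak clustering (keywords scattered across the window) does not qualify, and neither does a tight cluster with a low overall score.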

Fuzzy Matching

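Cloud Finder uses Jaro-Winkler similarity for fuzzy matching, which handles the following cases (the examples are Russian: налоговая "tax", оптимизация "optimization", НДС "VAT"):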

  • OCR errors (налоговая → налоговоя)
  • Typos (оптимизация → оптимизацыя)
  • Case differences (НДС → ндс)
  • Word form variations (налоговая → налоговой)
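
For readers unfamiliar with Jaro-Winkler, here is a minimal pure-Python sketch of the standard similarity measure applied to the examples above. It is not Cloud Finder's implementation. The helper is_fuzzy_match is hypothetical, the 0.85 threshold is borrowed from the fuzzy_threshold value in the TopicFinder example, and lowercasing before comparison is an assumption about how case differences are handled.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in the range 0-1."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    match_dist = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - match_dist)
        hi = min(len2, i + match_dist + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def is_fuzzy_match(keyword: str, word: str, threshold: float = 0.85) -> bool:
    """Hypothetical helper: case-insensitive comparison against a threshold."""
    return jaro_winkler(keyword.casefold(), word.casefold()) >= threshold


for keyword, variant in [
    ("налоговая", "налоговоя"),
    ("оптимизация", "оптимизацыя"),
    ("НДС", "ндс"),
    ("налоговая", "налоговой"),
]:
    score = jaro_winkler(keyword.casefold(), variant.casefold())
    print(f"{keyword} -> {variant}: {score:.2f}")
```

The prefix boost is what makes the measure tolerant of changed word endings: inflected forms usually share their first few characters with the keyword, so they score higher than the plain Jaro similarity would suggest.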

License

MIT
