Cloud Finder

Keyword cloud and topic hotspot detection in documents. Find where keywords concentrate to detect topic relevance.

Built on top of vibe-finder with a simplified, focused API.

Use Cases

  • Topic Detection: Find paragraphs/regions discussing specific topics
  • Document Relevance: Score documents by keyword concentration
  • Keyword Clustering: Analyze where keywords cluster together
  • Content Analysis: Find the most relevant sections in long documents

Installation

pip install cloud-finder

Quick Start

from cloud_finder import find_topics

# Find topic regions (`document` is the text to analyze)
topics = find_topics(
    document,
    keywords=["налоговая", "оптимизация", "НДС"],  # "tax", "optimization", "VAT"
    min_relevance=0.3
)

for topic in topics:
    print(f"Relevance: {topic.relevance:.0%}")
    print(f"Position: {topic.start}:{topic.end}")
    print(f"Keywords found: {topic.found_keywords}")
    print(f"Clustering: {topic.clustering:.1f}x concentrated")

API

find_topics()

Find all topic regions in a document:

from cloud_finder import find_topics

topics = find_topics(
    text,
    keywords=["keyword1", "keyword2", "keyword3"],
    min_relevance=0.3,   # Minimum relevance score (0-1)
    min_coverage=0.4,    # At least 40% of the keywords must be found
    window_size=400,     # Search window, in characters
    fuzzy=True           # Enable fuzzy matching to tolerate typos
)
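
To give a feel for how `window_size` and `min_coverage` could interact, here is a minimal pure-Python sketch of a sliding-window scan. It is not Cloud Finder's implementation: the helper names `window_coverage` and `candidate_windows`, the `step` parameter, and the exact (non-fuzzy) substring matching are all assumptions made for illustration.

```python
def window_coverage(text, keywords, window_size=400, step=100):
    """Return (start, end, coverage) for each window over the text.

    Coverage is the fraction of keywords that occur in the window.
    Hypothetical helper; matching is a plain case-insensitive substring test.
    """
    lowered = text.lower()
    keys = [k.lower() for k in keywords]
    if not keys:
        return []
    last_start = max(len(lowered) - window_size, 0)
    windows = []
    for start in range(0, last_start + 1, step):
        end = min(start + window_size, len(lowered))
        chunk = lowered[start:end]
        found = sum(1 for k in keys if k in chunk)
        windows.append((start, end, found / len(keys)))
    return windows


def candidate_windows(text, keywords, window_size=400, min_coverage=0.4, step=100):
    """Keep only the windows whose keyword coverage reaches min_coverage."""
    return [
        w for w in window_coverage(text, keywords, window_size, step)
        if w[2] >= min_coverage
    ]


text = "x" * 500 + "tax optimization and VAT rules" + "x" * 500
hits = candidate_windows(text, ["tax", "optimization", "VAT"], window_size=100, step=50)
print(hits)  # [(450, 550, 1.0), (500, 600, 1.0)]
```

Only the two windows that overlap the keyword-bearing sentence survive the coverage filter; every other window contains no keywords at all.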

find_best_topic()

Find the single best topic region:

from cloud_finder import find_best_topic

best = find_best_topic(document, ["налоговая", "оптимизация"])  # "tax", "optimization"
if best:
    print(f"Best match at position {best.position}")

analyze_relevance()

Quick relevance check for a document:

from cloud_finder import analyze_relevance

result = analyze_relevance(document, ["tax", "optimization", "deduction"])

if result["is_relevant"]:
    print(f"Document is relevant: {result['max_relevance']:.0%}")
    print(f"Found {result['topic_count']} topic regions")

TopicFinder (OOP API)

from cloud_finder import TopicFinder

finder = TopicFinder(document, fuzzy_threshold=0.85)

# Variadic keyword syntax
topics = finder.find("налоговая", "оптимизация", "НДС")  # "tax", "optimization", "VAT"

# Or find best
best = finder.find_best("tax", "optimization", min_coverage=0.5)

TopicMatch Fields

Field             Type       Description
position          int        Center position of the topic region
start             int        Start offset of the region
end               int        End offset of the region
relevance         float      Overall relevance score (0-1)
keyword_coverage  float      Fraction of keywords found (0-1)
found_keywords    List[str]  Keywords that were found
missing_keywords  List[str]  Keywords that were not found
clustering        float      Concentration relative to an even distribution (>1 = clustered)
density           float      Keywords per 100 characters
spread            float      Standard deviation of positions
preview           str        Text preview of the region
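
The `density`, `spread`, and `clustering` fields can be illustrated with a small sketch. The formulas below are assumptions, not the library's definitions: `density` is taken as matches per 100 characters of the region, `spread` as the population standard deviation of match offsets, and `clustering` as the region's match density divided by the whole document's match density, so a value above 1 means the region is denser than an even distribution would be. The function `region_metrics` and its parameters are hypothetical.

```python
from statistics import pstdev


def region_metrics(positions, start, end, doc_length, total_hits):
    """Compute illustrative metrics for keyword matches inside one region.

    positions   -- offsets of keyword matches that fall inside [start, end)
    doc_length  -- length of the whole document in characters
    total_hits  -- number of keyword matches in the whole document
    """
    region_length = end - start
    hits = len(positions)
    density = 100 * hits / region_length              # matches per 100 chars
    spread = pstdev(positions) if hits > 1 else 0.0   # std dev of offsets
    # Assumed definition: region density relative to document-wide density.
    clustering = (hits * doc_length) / (total_hits * region_length)
    return {"density": density, "spread": spread, "clustering": clustering}


metrics = region_metrics(
    [410, 450, 500, 590], start=400, end=600, doc_length=1000, total_hits=5
)
print(metrics["density"])     # 2.0
print(metrics["clustering"])  # 4.0
```

Here four of the document's five matches sit inside a 200-character region of a 1000-character document, so under these assumed definitions the region is four times denser than an even spread.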

Properties

topic.is_highly_relevant  # True if relevance >= 0.5 and clustering >= 1.5
topic.interpretation      # "highly_relevant", "relevant", "somewhat_relevant", "weakly_relevant"
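
The `is_highly_relevant` rule stated in the comment above can be expressed as a property. The sketch below uses a hypothetical stand-in class, `TopicScore`, rather than the real `TopicMatch`, and it does not attempt to reproduce `interpretation`, whose thresholds are not documented here.

```python
from dataclasses import dataclass


@dataclass
class TopicScore:
    """Hypothetical stand-in carrying two of the TopicMatch fields."""
    relevance: float
    clustering: float

    @property
    def is_highly_relevant(self) -> bool:
        # Rule as documented: relevance >= 0.5 and clustering >= 1.5
        return self.relevance >= 0.5 and self.clustering >= 1.5


scores = [TopicScore(0.6, 2.0), TopicScore(0.6, 1.2), TopicScore(0.4, 3.0)]
strong = [s for s in scores if s.is_highly_relevant]
print(len(strong))  # 1
```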

Fuzzy Matching

Cloud Finder uses Jaro-Winkler similarity for fuzzy matching, which handles:

  • OCR errors (налоговая → налоговоя)
  • Typos (оптимизация → оптимизацыя)
  • Case differences (НДС → ндс)
  • Word form variations (налоговая → налоговой)
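
To show why Jaro-Winkler tolerates the variations listed above, here is a self-contained reference sketch of the standard algorithm. It is not Cloud Finder's own code: the function names, the `casefold()` normalization, and the reuse of 0.85 (borrowed from the `fuzzy_threshold=0.85` example) as the cutoff are assumptions.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    match_dist = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - match_dist)
        hi = min(len2, i + match_dist + 1)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def fuzzy_equal(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical helper: case-insensitive match above a similarity cutoff."""
    return jaro_winkler(a.casefold(), b.casefold()) >= threshold


print(fuzzy_equal("налоговая", "налоговоя"))  # OCR error
print(fuzzy_equal("НДС", "ндс"))              # case difference
print(fuzzy_equal("tax", "optimization"))     # unrelated words
```

Because the Winkler adjustment rewards a shared prefix, words that differ only near the end, such as inflected forms, score higher than they would under plain Jaro similarity.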

License

MIT
