Cloud Finder

Keyword cloud and topic hotspot detection in documents. Find where keywords concentrate to detect topic relevance.

Built on top of vibe-finder with a simplified, focused API.

Use Cases

  • Topic Detection: Find paragraphs/regions discussing specific topics
  • Document Relevance: Score documents by keyword concentration
  • Keyword Clustering: Analyze where keywords cluster together
  • Content Analysis: Find the most relevant sections in long documents

Installation

pip install cloud-finder

Quick Start

from cloud_finder import find_topics

document = "..."  # the text to analyze (any str)

# Find topic regions (Russian keywords: "tax", "optimization", "VAT")
topics = find_topics(
    document,
    keywords=["налоговая", "оптимизация", "НДС"],
    min_relevance=0.3
)

for topic in topics:
    print(f"Relevance: {topic.relevance:.0%}")
    print(f"Position: {topic.start}:{topic.end}")
    print(f"Keywords found: {topic.found_keywords}")
    print(f"Clustering: {topic.clustering:.1f}x concentrated")

API

find_topics()

Find all topic regions in a document:

from cloud_finder import find_topics

topics = find_topics(
    text,
    keywords=["keyword1", "keyword2", "keyword3"],
    min_relevance=0.3,   # Minimum score (0-1)
    min_coverage=0.4,    # At least 40% of the keywords must be present
    window_size=400,     # Search window in chars
    fuzzy=True           # Enable fuzzy matching for typos
)
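To make the roles of `window_size` and `min_coverage` concrete, here is a minimal sketch of a sliding-window keyword scan. It is a hypothetical illustration, not Cloud Finder's implementation: `scan_windows` and its `step` parameter are invented names, matching is exact and case-insensitive (no fuzzy matching), and relevance scoring is left out.

```python
import re


def scan_windows(text, keywords, window_size=400, min_coverage=0.4, step=0):
    """Hypothetical sketch (not Cloud Finder's code): slide a fixed-size
    character window over `text` and keep windows where at least
    `min_coverage` of the distinct keywords occur. Matching is exact and
    case-insensitive; no fuzzy matching is attempted."""
    step = step or max(window_size // 2, 1)  # assumption: 50% window overlap
    # Collect (offset, keyword) for every case-insensitive occurrence.
    hits = [
        (m.start(), kw)
        for kw in keywords
        for m in re.finditer(re.escape(kw), text, flags=re.IGNORECASE)
    ]
    # Window start offsets; make sure the final window reaches the end of the text.
    last = max(len(text) - window_size, 0)
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)

    results = []
    for start in starts:
        end = start + window_size
        # A keyword counts if one of its hits begins inside the window.
        found = sorted({kw for pos, kw in hits if start <= pos < end})
        coverage = len(found) / len(keywords)
        if coverage >= min_coverage:
            results.append((start, end, coverage, found))
    return results


text = "a" * 300 + "tax optimization" + "b" * 300
for start, end, coverage, found in scan_windows(text, ["tax", "optimization", "deduction"]):
    print(start, end, f"{coverage:.0%}", found)
```

Overlapping windows mean one keyword cluster is usually reported by several neighbouring windows; a real implementation would merge or deduplicate them before returning topic regions.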

find_best_topic()

Find the single best topic region:

from cloud_finder import find_best_topic

best = find_best_topic(document, ["налоговая", "оптимизация"])  # "tax", "optimization"
if best:
    print(f"Best match at position {best.position}")

analyze_relevance()

Quick relevance check for a document:

from cloud_finder import analyze_relevance

result = analyze_relevance(document, ["tax", "optimization", "deduction"])

if result["is_relevant"]:
    print(f"Document is relevant: {result['max_relevance']:.0%}")
    print(f"Found {result['topic_count']} topic regions")

TopicFinder (OOP API)

from cloud_finder import TopicFinder

finder = TopicFinder(document, fuzzy_threshold=0.85)

# Keywords are passed as separate positional arguments
# (Russian: "tax", "optimization", "VAT")
topics = finder.find("налоговая", "оптимизация", "НДС")

# Or return only the single best region
best = finder.find_best("tax", "optimization", min_coverage=0.5)

TopicMatch Fields

Field             Type        Description
position          int         Center position of the topic region
start             int         Start offset
end               int         End offset
relevance         float       Overall relevance score (0-1)
keyword_coverage  float       Fraction of keywords found (0-1)
found_keywords    List[str]   Keywords that were found
missing_keywords  List[str]   Keywords that were not found
clustering        float       Concentration vs. even distribution (>1 = clustered)
density           float       Keywords per 100 chars
spread            float       Standard deviation of keyword match positions
preview           str         Text preview of the region
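The field list does not say exactly how `density`, `spread` and `clustering` are computed. The sketch below shows one plausible reading under stated assumptions: `density` is hits per 100 characters of the region, `spread` is the population standard deviation of hit offsets, and `clustering` divides the spread expected if hits were spread evenly across the region (a uniform distribution over a span of length L has standard deviation L/√12) by the observed spread. `region_metrics` is a hypothetical helper, not part of Cloud Finder.

```python
import math
import statistics


def region_metrics(positions, start, end, keywords, found):
    """Hypothetical sketch of TopicMatch-style metrics (not Cloud Finder's code).

    positions: character offsets of every keyword hit inside [start, end)
    keywords:  all requested keywords; found: the subset that occurred
    """
    length = end - start
    coverage = len(found) / len(keywords)              # fraction 0-1
    density = len(positions) / length * 100            # hits per 100 chars
    if len(positions) < 2:
        # Assumption: with fewer than two hits there is nothing to compare.
        spread, clustering = 0.0, 1.0
    else:
        spread = statistics.pstdev(positions)          # std dev of hit offsets
        even_spread = length / math.sqrt(12)           # std dev of Uniform(0, length)
        # >1 means hits sit closer together than an even spread would.
        clustering = even_spread / spread if spread > 0 else math.inf
    return {
        "keyword_coverage": coverage,
        "density": density,
        "spread": spread,
        "clustering": clustering,
    }


# Three hits packed into 20 chars of a 400-char region: strongly clustered.
print(region_metrics([100, 110, 120], 0, 400, ["a", "b", "c"], ["a", "b", "c"]))
# Five hits spaced evenly across the same region: clustering close to 1.
print(region_metrics([40, 120, 200, 280, 360], 0, 400, ["a", "b"], ["a", "b"]))
```

Normalizing against the uniform spread makes the score independent of window size, which matches the table's description of `clustering` as "concentration vs. even distribution".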

Properties

topic.is_highly_relevant  # True if relevance >= 0.5 and clustering >= 1.5
topic.interpretation      # "highly_relevant", "relevant", "somewhat_relevant", "weakly_relevant"

Fuzzy Matching

Cloud Finder uses Jaro-Winkler similarity for fuzzy matching, which handles:

  • OCR errors (налоговая → налоговоя)
  • Typos (оптимизация → оптимизацыя)
  • Case differences (НДС → ндс)
  • Word form variations (налоговая → налоговой)
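For reference, here is a compact, self-contained version of the standard Jaro-Winkler measure (prefix bonus of 0.1 per shared leading character, capped at 4 characters, applied unconditionally as in many common implementations). Jaro-Winkler itself is case-sensitive, so the hypothetical `is_fuzzy_match` helper lowercases both words first. That step and its 0.85 default (borrowed from the `fuzzy_threshold=0.85` example) are assumptions about how a library like this might be wired, not a description of Cloud Finder's internals.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def is_fuzzy_match(word: str, keyword: str, threshold: float = 0.85) -> bool:
    """Hypothetical helper: case-fold both words, then apply a threshold."""
    return jaro_winkler(word.lower(), keyword.lower()) >= threshold


for word in ["налоговоя", "налоговой", "оптимизация"]:
    print(word, round(jaro_winkler(word, "налоговая"), 3), is_fuzzy_match(word, "налоговая"))
print(is_fuzzy_match("ндс", "НДС"))
```

The prefix bonus suits inflected languages such as Russian, where word forms usually share a stem and differ in their endings.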

License

MIT

Download files


Source Distribution

cloud_finder-1.6.1.tar.gz (12.4 kB)

Built Distribution


cloud_finder-1.6.1-py3-none-any.whl (11.3 kB)

File details

Details for the file cloud_finder-1.6.1.tar.gz.

File metadata

  • Download URL: cloud_finder-1.6.1.tar.gz
  • Upload date:
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for cloud_finder-1.6.1.tar.gz
Algorithm     Hash digest
SHA256        4c99acc169d8c77ad3b7dc2a92815cb5f4b1d0d338aae17c4c7589212b5560bc
MD5           21bcf12864ee1783abe7262025253162
BLAKE2b-256   9a4bc8847d8a9fa545ae61c82dc4c0c23a11a6185e8eef120924baf219b01e2d


File details

Details for the file cloud_finder-1.6.1-py3-none-any.whl.

File metadata

  • Download URL: cloud_finder-1.6.1-py3-none-any.whl
  • Upload date:
  • Size: 11.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for cloud_finder-1.6.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        2889f306457efbc929e4bfc8ee31722cacd23394a20f8eaf22224f37b4214b3f
MD5           f51e0ea5c930b051a18c5377ce7c9fff
BLAKE2b-256   2290e7ee8bbbab23f2a92537874b557df247e71c1387fa9bd47ab9d56e87f56b

