SDK Router Tools — collection of utility tools for automation pipelines (telegram, logging, html cleaner, etc.)

sdkrouter-tools

SDK Router Tools — collection of utility tools for automation pipelines.

Installation

pip install sdkrouter-tools

Tools Included

  • logging — Rich-powered logger with file persistence
  • telegram — Rate-limited Telegram sender with priority queue
  • html — HTML cleaner optimized for LLM pipelines

1. Logging (Rich-powered)

Universal Python logger with Rich console output and file persistence.

from sdkrouter_tools import get_logger

log = get_logger(__name__)
log.info("Hello world")
log.error("Something failed", exc_info=True)

# With custom level
log = get_logger(__name__, level="DEBUG")
data = {"job_id": 42}  # Example payload to log
log.debug("Debug details: %s", data)

Features

  • Rich console output with colors and formatting
  • Automatic file logging (daily rotation)
  • Auto-detects project root for log directory
  • Rich tracebacks with local variables
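The root detection and daily rotation listed above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation: the marker files, the `logs` directory name, and the helper names `find_project_root` and `build_file_logger` are assumptions made for illustration.

```python
import logging
import tempfile
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path

# Hypothetical marker files used to recognise a project root.
ROOT_MARKERS = ("pyproject.toml", "setup.py", ".git")


def find_project_root(start: Path) -> Path:
    """Walk upwards from `start` until a directory containing a marker is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in ROOT_MARKERS):
            return candidate
    return start  # Fall back to the starting directory.


def build_file_logger(name: str, root: Path) -> logging.Logger:
    """Attach a handler that rolls over to a new log file at midnight."""
    log_dir = root / "logs"  # Assumed directory name.
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = TimedRotatingFileHandler(
        log_dir / f"{name}.log", when="midnight", backupCount=7, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Demo in a throwaway directory that contains a marker file.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "pyproject.toml").touch()
nested = demo_root / "src" / "pkg"
nested.mkdir(parents=True)

detected_root = find_project_root(nested)
demo_logger = build_file_logger("demo_app", detected_root)
demo_logger.info("Processing started")
for handler in demo_logger.handlers:
    handler.flush()
```

Searching upwards for a marker file means the log directory stays the same no matter which subdirectory the script is started from.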

Convenience Functions

from sdkrouter_tools.logging import debug, info, warning, error, critical

info("Processing started")
warning("Low memory")
error("Failed to connect")

Configuration

from sdkrouter_tools import setup_logging

setup_logging(
    level="DEBUG",           # Log level
    log_to_file=True,        # Write to file
    log_to_console=True,     # Output to console
    app_name="myapp",        # App name for log file
    rich_tracebacks=True,    # Rich exception formatting
)

2. Telegram Sender

Rate-limited Telegram message sender with priority queue support.

from sdkrouter_tools import TelegramSender, ParseMode

sender = TelegramSender(
    bot_token="YOUR_BOT_TOKEN",
    chat_id="YOUR_CHAT_ID",
)

sender.send_message("Hello from sdkrouter-tools!")
sender.send_message("<b>Bold</b> message", parse_mode=ParseMode.HTML)

Convenience Functions

from sdkrouter_tools.telegram import (
    send_error, send_success, send_warning,
    send_info, send_stats, send_alert,
)

send_error("Something went wrong!", {"details": "error info"})
send_success("Task completed!", {"items_processed": 100})
send_warning("Disk space low", {"available": "10GB"})
send_alert("Critical: Server down!", {"server": "prod-1"})

Environment Variables

export TELEGRAM_BOT_TOKEN="your_bot_token"
export TELEGRAM_CHAT_ID="your_chat_id"
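How the package reads these variables is not described here. If you want your own script to fail early when they are missing, a small check along these lines may help. The helper name `load_telegram_credentials` is hypothetical and not part of sdkrouter-tools.

```python
import os


def load_telegram_credentials(env=os.environ) -> tuple[str, str]:
    """Return (bot_token, chat_id), raising if either variable is missing or empty."""
    required = ("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["TELEGRAM_BOT_TOKEN"], env["TELEGRAM_CHAT_ID"]


# Demo with an explicit mapping instead of the real environment.
token, chat_id = load_telegram_credentials(
    {"TELEGRAM_BOT_TOKEN": "your_bot_token", "TELEGRAM_CHAT_ID": "your_chat_id"}
)
```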

Priority Queue

Messages are queued by priority and sent under a rate limit of 20 messages per second:

from sdkrouter_tools import MessagePriority

# CRITICAL (1), HIGH (2), NORMAL (3), LOW (4)
sender.send_message("Important!", priority=MessagePriority.HIGH)
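To illustrate how a priority queue and a rate limit can work together, here is a minimal standard-library sketch. It is not the package's internal queue: the `Priority` enum only mirrors the values listed above, and `RateLimitedQueue` is a hypothetical stand-in.

```python
import heapq
import itertools
import time
from enum import IntEnum


class Priority(IntEnum):
    # Stand-in mirroring the documented values; not the library's class.
    CRITICAL = 1
    HIGH = 2
    NORMAL = 3
    LOW = 4


class RateLimitedQueue:
    def __init__(self, max_per_second: float = 20.0):
        self._heap = []
        self._counter = itertools.count()  # Tie-breaker keeps FIFO order within a priority.
        self._min_interval = 1.0 / max_per_second
        self._last_sent = float("-inf")  # Nothing sent yet, so the first send never waits.

    def put(self, text: str, priority: Priority = Priority.NORMAL) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), text))

    def drain(self, send) -> None:
        """Pop messages in priority order, sleeping to respect the rate limit."""
        while self._heap:
            wait = self._min_interval - (time.monotonic() - self._last_sent)
            if wait > 0:
                time.sleep(wait)
            _, _, text = heapq.heappop(self._heap)
            send(text)
            self._last_sent = time.monotonic()


queue = RateLimitedQueue(max_per_second=20)
queue.put("nightly report", Priority.LOW)
queue.put("deploy finished", Priority.NORMAL)
queue.put("server down", Priority.CRITICAL)
queue.put("disk almost full", Priority.HIGH)

delivered = []
queue.drain(delivered.append)
print(delivered)
# ['server down', 'disk almost full', 'deploy finished', 'nightly report']
```

Lower numeric values are served first, so a CRITICAL message enqueued late still overtakes pending LOW messages.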

Sending Files

sender.send_photo("/path/to/image.jpg", caption="Check this out!")
sender.send_document("/path/to/file.pdf", caption="Report attached")

Queue Management

from sdkrouter_tools import telegram_queue

stats = telegram_queue.get_stats()
telegram_queue.flush(timeout=10.0)  # Wait before script exit

3. HTML Cleaner

HTML cleaner optimized for LLM pipelines. It provides aggressive DOM cleaning, SSR hydration extraction, CSS class filtering, semantic chunking, and multiple output formats.

from sdkrouter_tools import HTMLCleaner, CleanerConfig, OutputFormat

cleaner = HTMLCleaner()
result = cleaner.clean(html)

print(result.output)
print(f"Reduction: {result.stats.reduction_percent}%")
print(f"Tokens: {result.stats.original_tokens} -> {result.stats.cleaned_tokens}")

Quick Functions

from sdkrouter_tools import clean, clean_to_json

# Quick clean
result = clean(html, max_tokens=5000, output_format="markdown")

# Get JSON if SSR data available, otherwise cleaned HTML
data = clean_to_json(html)

Configuration

from sdkrouter_tools import HTMLCleaner, CleanerConfig, OutputFormat

config = CleanerConfig(
    max_tokens=5000,
    output_format=OutputFormat.MARKDOWN,  # HTML, MARKDOWN, AOM, XTREE
    filter_classes=True,
    class_threshold=0.5,
    try_hydration=True,
)

cleaner = HTMLCleaner(config)
result = cleaner.clean(html)

SSR Hydration Extraction

Extract structured data from server-side rendered pages:

from sdkrouter_tools.html import extract_hydration, detect_framework

framework = detect_framework(html)  # NEXTJS_APP, NUXT3, etc.

data = extract_hydration(html)
if data.has_data:
    products = data.page_props.get("products", [])

Supported frameworks: Next.js, Nuxt 2/3, SvelteKit, Remix, Gatsby, Qwik, Astro.
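For background, hydration data is the JSON that a framework embeds in the page so the client can restore state. The sketch below shows the idea for one case only, the `__NEXT_DATA__` script tag used by the Next.js pages router. It relies on the standard library alone, and `extract_next_data` is a hypothetical helper, not the package's `extract_hydration`.

```python
import json
import re
from typing import Optional

# The Next.js pages router embeds its hydration payload in this script tag.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ payload, or None if absent or malformed."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


sample_html = """
<html><body>
<div id="__next">...</div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"products": [{"id": 1, "name": "Widget"}]}}, "page": "/shop"}
</script>
</body></html>
"""

payload = extract_next_data(sample_html)
page_props = payload["props"]["pageProps"] if payload else {}
products = page_props.get("products", [])
print(products)
# [{'id': 1, 'name': 'Widget'}]
```

Reading this payload directly is often cheaper than cleaning the rendered markup, because the data is already structured.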

CSS Class Filtering

from sdkrouter_tools.html import score_class, filter_classes, detect_css_framework

# Score classes by semantic relevance
result = score_class("product-card")  # High score
result = score_class("css-abc123")    # Low score (hash)

# Filter list of classes
classes = ["product-card", "css-abc123", "flex", "MuiButton-root"]
kept = filter_classes(classes, threshold=0.5)  # ["product-card"]

# Detect CSS framework
framework = detect_css_framework(html)  # "tailwind", "bootstrap", etc.
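The scoring rules the package uses are not documented in this README. As a rough illustration of the idea, the sketch below ranks class names with a few hand-written patterns. Every pattern, score, and function name in it is an assumption chosen for the example.

```python
import re

# Illustrative patterns only; the library's real rules may differ.
# A generated class is assumed to have a known prefix and a suffix containing a digit.
HASH_LIKE = re.compile(r"^(css|sc|jsx|emotion)-(?=[a-z0-9]*\d)[a-z0-9]{4,}$", re.IGNORECASE)
UTILITY = {"flex", "grid", "block", "hidden", "relative", "absolute"}
FRAMEWORK_PREFIX = re.compile(r"^(Mui|ant-|chakra-)")
DESCRIPTIVE = re.compile(r"[a-z]+(-[a-z]+)+")


def score_class_name(name: str) -> float:
    """Return a score in [0, 1]; higher means more likely to describe content."""
    if HASH_LIKE.match(name):
        return 0.0  # Generated hash, carries no meaning.
    if name in UTILITY or FRAMEWORK_PREFIX.match(name):
        return 0.2  # Layout utility or component-library class.
    if DESCRIPTIVE.fullmatch(name):
        return 0.9  # Hyphenated words such as "product-card".
    return 0.5  # Unknown: neutral score.


def keep_semantic(classes, threshold=0.5):
    """Keep classes whose score reaches the threshold."""
    return [c for c in classes if score_class_name(c) >= threshold]


classes = ["product-card", "css-abc123", "flex", "MuiButton-root"]
print(keep_semantic(classes))
# ['product-card']
```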

Output Formats

from sdkrouter_tools.html import to_markdown, to_aom_yaml, to_xtree
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")

# Markdown
md = to_markdown(soup)

# AOM YAML (Playwright-style aria snapshot)
aom = to_aom_yaml(soup)  # Named "aom" to avoid shadowing the yaml module
# - navigation:
#   - link "Home"
#   - link "Products"

# XTree (hierarchical tree)
tree = to_xtree(soup)
# ROOT
# ├─ nav#main-nav
# │  └─ a.nav-link → "Home"
# └─ main

Pipeline API

from sdkrouter_tools import clean_html, clean_for_llm

result = clean_html(html, max_tokens=5000, output_format="markdown")
output = clean_for_llm(html)  # Returns dict (SSR) or str (cleaned HTML)
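Because `clean_for_llm` is described as returning either a dict or a string, calling code usually needs to branch on the type. The helper below is a hypothetical sketch of that branch and does not import the package.

```python
import json
from typing import Union


def to_prompt_text(output: Union[dict, str]) -> str:
    """Normalize a dict-or-str result into a single string for a prompt."""
    if isinstance(output, dict):
        # Compact JSON keeps the token count low.
        return json.dumps(output, ensure_ascii=False, separators=(",", ":"))
    return output.strip()


print(to_prompt_text({"products": [{"id": 1}]}))
# {"products":[{"id":1}]}
print(to_prompt_text("  <p>Hello</p>\n"))
# <p>Hello</p>
```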

Advanced Features

from sdkrouter_tools.html import (
    # Shadow DOM
    flatten_shadow_dom,
    # Downsampling
    downsample_html, estimate_tokens,
    # Semantic Chunking
    SemanticChunker, ChunkConfig,
    # Context Extraction
    extract_context, generate_selector,
    # Helpers
    json_to_toon, html_to_text, extract_links, extract_images,
)
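The signatures of these helpers are not shown here. To give a feel for what token-budgeted chunking involves, the sketch below packs paragraphs into chunks under a token limit. It is a simplification under stated assumptions: paragraph breaks stand in for semantic boundaries, and tokens are estimated at roughly four characters each instead of using a real tokenizer. Both function names are hypothetical.

```python
def rough_token_count(text: str) -> int:
    # Rule of thumb: roughly four characters per token for English text.
    return max(1, len(text) // 4)


def chunk_paragraphs(text: str, max_tokens: int) -> list[str]:
    """Greedily pack paragraphs into chunks that stay within max_tokens.

    A single paragraph larger than the budget becomes its own oversized chunk.
    """
    chunks, current, current_tokens = [], [], 0
    for paragraph in filter(None, (p.strip() for p in text.split("\n\n"))):
        tokens = rough_token_count(paragraph)
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Three paragraphs of about 10 tokens each with a 20-token budget.
document = "\n\n".join(["a" * 40, "b" * 40, "c" * 40])
chunks = chunk_paragraphs(document, max_tokens=20)
print(len(chunks))
# 2
```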

Requirements

  • Python >= 3.10
  • rich >= 13.0
  • pyTelegramBotAPI >= 4.14
  • beautifulsoup4 >= 4.12
  • lxml >= 5.3
  • pydantic >= 2.10
  • markdownify >= 0.14
  • tiktoken >= 0.8

License

MIT
