sdkrouter-tools

SDK Router Tools is a collection of utility tools for automation pipelines.

Installation

pip install sdkrouter-tools

Tools Included

  • logging — Rich-powered logger with file persistence
  • telegram — Rate-limited Telegram sender with priority queue
  • html — HTML cleaner optimized for LLM pipelines

1. Logging (Rich-powered)

Universal Python logger with Rich console output and file persistence.

from sdkrouter_tools import get_logger

log = get_logger(__name__)
log.info("Hello world")
log.error("Something failed", exc_info=True)

# With custom level
log = get_logger(__name__, level="DEBUG")
log.debug("Debug details: %s", data)

Features

  • Rich console output with colors and formatting
  • Automatic file logging (daily rotation)
  • Auto-detects project root for log directory
  • Rich tracebacks with local variables
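
To make the "daily rotation" feature concrete, here is a minimal standard-library sketch of a logger whose file handler rolls over at midnight. It is not the sdkrouter-tools implementation: `build_daily_logger`, the log file name, and the seven-day retention are assumptions chosen for illustration, and the Rich console output is left out.

```python
import logging
import tempfile
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def build_daily_logger(name: str, log_dir: Path, level: str = "INFO") -> logging.Logger:
    """Hypothetical helper: a logger that writes to a file rotated at midnight."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = TimedRotatingFileHandler(
            log_dir / f"{name}.log",
            when="midnight",   # start a new file each day
            backupCount=7,     # assumed retention: keep one week of files
            encoding="utf-8",
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Usage: log into a throwaway directory.
demo_dir = Path(tempfile.mkdtemp())
demo_log = build_daily_logger("demo_app", demo_dir, level="DEBUG")
demo_log.debug("Debug details: %s", {"step": 1})
```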

Convenience Functions

from sdkrouter_tools.logging import debug, info, warning, error, critical

info("Processing started")
warning("Low memory")
error("Failed to connect")

Configuration

from sdkrouter_tools import setup_logging

setup_logging(
    level="DEBUG",           # Log level
    log_to_file=True,        # Write to file
    log_to_console=True,     # Output to console
    app_name="myapp",        # App name for log file
    rich_tracebacks=True,    # Rich exception formatting
)

2. Telegram Sender

Rate-limited Telegram message sender with priority queue support.

from sdkrouter_tools import TelegramSender, ParseMode

sender = TelegramSender(
    bot_token="YOUR_BOT_TOKEN",
    chat_id="YOUR_CHAT_ID",
)

sender.send_message("Hello from sdkrouter-tools!")
sender.send_message("<b>Bold</b> message", parse_mode=ParseMode.HTML)

Convenience Functions

from sdkrouter_tools.telegram import (
    send_error, send_success, send_warning,
    send_info, send_stats, send_alert,
)

send_error("Something went wrong!", {"details": "error info"})
send_success("Task completed!", {"items_processed": 100})
send_warning("Disk space low", {"available": "10GB"})
send_alert("Critical: Server down!", {"server": "prod-1"})

Environment Variables

export TELEGRAM_BOT_TOKEN="your_bot_token"
export TELEGRAM_CHAT_ID="your_chat_id"
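
If a script needs to validate these variables itself before sending anything, a small check like the one below may help. `load_telegram_credentials` is a hypothetical helper written for this example, not a function of sdkrouter-tools; it only reads the two variable names shown above.

```python
import os
from typing import Mapping, Optional


def load_telegram_credentials(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Hypothetical helper: return (bot_token, chat_id) or fail with a clear message."""
    source = os.environ if env is None else env
    required = ("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID")
    missing = [name for name in required if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return source["TELEGRAM_BOT_TOKEN"], source["TELEGRAM_CHAT_ID"]


# Usage with an explicit mapping; pass nothing to read the real environment.
token, chat_id = load_telegram_credentials(
    {"TELEGRAM_BOT_TOKEN": "your_bot_token", "TELEGRAM_CHAT_ID": "your_chat_id"}
)
```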

Priority Queue

Messages pass through a priority queue and are rate-limited to 20 messages per second:

from sdkrouter_tools import MessagePriority

# CRITICAL (1), HIGH (2), NORMAL (3), LOW (4)
sender.send_message("Important!", priority=MessagePriority.HIGH)
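
The interaction between priorities and the rate limit can be sketched with a heap and a minimum interval between sends. This is an illustration only, not the library's queue: `Priority`, `RateLimitedQueue`, and `pop_ready` are hypothetical names, and the sketch assumes that a lower number means "send first" and that messages of equal priority go out in insertion order.

```python
import heapq
import itertools
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    CRITICAL = 1
    HIGH = 2
    NORMAL = 3
    LOW = 4


class RateLimitedQueue:
    """Hypothetical queue: lowest priority value first, at most N pops per second."""

    def __init__(self, max_per_second: float = 20.0) -> None:
        self._min_interval = 1.0 / max_per_second
        self._heap: list[tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per priority
        self._next_allowed = 0.0

    def put(self, text: str, priority: Priority = Priority.NORMAL) -> None:
        heapq.heappush(self._heap, (int(priority), next(self._counter), text))

    def pop_ready(self, now: float) -> Optional[str]:
        """Return the next message if the rate limit allows it, else None."""
        if not self._heap or now < self._next_allowed:
            return None
        _, _, text = heapq.heappop(self._heap)
        self._next_allowed = now + self._min_interval
        return text


q = RateLimitedQueue(max_per_second=20)
q.put("low", Priority.LOW)
q.put("normal")
q.put("critical", Priority.CRITICAL)

first = q.pop_ready(now=0.0)     # "critical"
blocked = q.pop_ready(now=0.01)  # None: 0.05 s have not elapsed yet
second = q.pop_ready(now=0.05)   # "normal"
```

Passing the clock in as `now` keeps the sketch deterministic; a real sender would read `time.monotonic()` instead.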

Sending Files

sender.send_photo("/path/to/image.jpg", caption="Check this out!")
sender.send_document("/path/to/file.pdf", caption="Report attached")

Queue Management

from sdkrouter_tools import telegram_queue

stats = telegram_queue.get_stats()
telegram_queue.flush(timeout=10.0)  # Wait before script exit
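
Flushing matters when messages are delivered by a background thread, because a script that exits immediately can drop whatever is still queued. The sketch below shows that pattern with the standard library only. It assumes a worker-thread design, which the source does not confirm for sdkrouter-tools; `worker`, `flush`, and `_STOP` are hypothetical names.

```python
import queue
import threading

_STOP = object()  # sentinel telling the worker to exit


def worker(q: queue.Queue, sent: list) -> None:
    """Drain the queue until the sentinel arrives."""
    while True:
        item = q.get()
        if item is _STOP:
            break
        sent.append(item)  # stand-in for the real network call


def flush(q: queue.Queue, thread: threading.Thread, timeout: float) -> bool:
    """Ask the worker to finish and wait up to `timeout` seconds for it."""
    q.put(_STOP)
    thread.join(timeout)
    return not thread.is_alive()  # True if everything was processed in time


pending: queue.Queue = queue.Queue()
delivered: list = []
t = threading.Thread(target=worker, args=(pending, delivered), daemon=True)
t.start()

for text in ("one", "two", "three"):
    pending.put(text)

drained = flush(pending, t, timeout=10.0)
```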

3. HTML Cleaner

HTML cleaner optimized for LLM pipelines. It combines aggressive DOM cleaning, SSR hydration extraction, CSS class filtering, and semantic chunking, and supports multiple output formats.

from sdkrouter_tools import HTMLCleaner, CleanerConfig, OutputFormat

cleaner = HTMLCleaner()
result = cleaner.clean(html)

print(result.output)
print(f"Reduction: {result.stats.reduction_percent}%")
print(f"Tokens: {result.stats.original_tokens} -> {result.stats.cleaned_tokens}")
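
As a rough picture of what "cleaning" and the reduction statistic mean, the following standard-library sketch drops script and style content, keeps visible text, and reports how much smaller the result is. It is far simpler than HTMLCleaner and is not its algorithm; `TextOnlyParser`, `strip_noise`, and the character-based `reduction_percent` are assumptions for illustration (the library reports token counts).

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collect visible text, skipping the content of noisy elements."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_noise(html: str) -> str:
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def reduction_percent(original: str, cleaned: str) -> float:
    """Size reduction by character count (an assumed, simplified metric)."""
    if not original:
        return 0.0
    return round(100.0 * (1 - len(cleaned) / len(original)), 1)


sample = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><p>Hello <b>world</b></p></body></html>"
)
cleaned = strip_noise(sample)  # "Hello world"
saved = reduction_percent(sample, cleaned)
```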

Quick Functions

from sdkrouter_tools import clean, clean_to_json

# Quick clean
result = clean(html, max_tokens=5000, output_format="markdown")

# Get JSON if SSR data available, otherwise cleaned HTML
data = clean_to_json(html)

Configuration

from sdkrouter_tools import HTMLCleaner, CleanerConfig, OutputFormat

config = CleanerConfig(
    max_tokens=5000,
    output_format=OutputFormat.MARKDOWN,  # HTML, MARKDOWN, AOM, XTREE
    filter_classes=True,
    class_threshold=0.5,
    try_hydration=True,
)

cleaner = HTMLCleaner(config)
result = cleaner.clean(html)

SSR Hydration Extraction

Extract structured data from server-side rendered pages:

from sdkrouter_tools.html import extract_hydration, detect_framework

framework = detect_framework(html)  # NEXTJS_APP, NUXT3, etc.

data = extract_hydration(html)
if data.has_data:
    products = data.page_props.get("products", [])

Supported frameworks: Next.js, Nuxt 2/3, SvelteKit, Remix, Gatsby, Qwik, Astro
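
To show what hydration extraction involves for one of these frameworks: Next.js pages built with the Pages Router embed their data as JSON in a `<script id="__NEXT_DATA__">` tag, with page data under `props.pageProps`. The sketch below reads that tag using only the standard library. It covers this single case and is not how `extract_hydration` is implemented; `extract_next_page_props` is a hypothetical name.

```python
import json
import re
from typing import Optional

NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_next_page_props(html: str) -> Optional[dict]:
    """Return props.pageProps from a Next.js page, or None if absent or invalid."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    props = payload.get("props")
    if not isinstance(props, dict):
        return {}
    page_props = props.get("pageProps")
    return page_props if isinstance(page_props, dict) else {}


page = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"products": [{"id": 1, "name": "Lamp"}]}}}'
    "</script></body></html>"
)
props = extract_next_page_props(page)
products = props.get("products", []) if props else []
```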

CSS Class Filtering

from sdkrouter_tools.html import score_class, filter_classes, detect_css_framework

# Score classes by semantic relevance
result = score_class("product-card")  # High score
result = score_class("css-abc123")    # Low score (hash)

# Filter list of classes
classes = ["product-card", "css-abc123", "flex", "MuiButton-root"]
kept = filter_classes(classes, threshold=0.5)  # ["product-card"]

# Detect CSS framework
framework = detect_css_framework(html)  # "tailwind", "bootstrap", etc.
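
One way to think about class scoring is as a set of heuristics: generated hash names carry no meaning, utility and component-library classes carry little, and readable word-based names carry the most. The sketch below encodes that idea. The rules, the numeric scores, and the names `score` and `keep` are assumptions made for this example, not the scoring used by `score_class` or `filter_classes`.

```python
import re

# Generated names such as "css-abc123": known prefix plus an id containing a digit.
HASH_LIKE = re.compile(r"^(?:css|sc|jsx)-(?=[a-z0-9]*\d)[a-z0-9]{5,}$", re.IGNORECASE)
UTILITY = {"flex", "grid", "block", "hidden", "relative", "absolute"}
FRAMEWORK_PREFIXES = ("Mui", "ant-", "chakra-")


def score(name: str) -> float:
    """Hypothetical relevance score in [0, 1]; higher means more semantic."""
    if HASH_LIKE.match(name):
        return 0.0
    if name in UTILITY:
        return 0.2
    if name.startswith(FRAMEWORK_PREFIXES):
        return 0.3
    words = [w for w in re.split(r"[-_]", name) if w.isalpha() and len(w) > 2]
    return 0.9 if words else 0.4


def keep(classes: list[str], threshold: float = 0.5) -> list[str]:
    return [c for c in classes if score(c) >= threshold]


classes = ["product-card", "css-abc123", "flex", "MuiButton-root"]
kept = keep(classes, threshold=0.5)  # ["product-card"]
```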

Output Formats

from sdkrouter_tools.html import to_markdown, to_aom_yaml, to_xtree
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")

# Markdown
md = to_markdown(soup)

# AOM YAML (Playwright-style aria snapshot)
aom = to_aom_yaml(soup)  # named "aom" to avoid shadowing a "yaml" module import
# - navigation:
#   - link "Home"
#   - link "Products"

# XTree (hierarchical tree)
tree = to_xtree(soup)
# ROOT
# ├─ nav#main-nav
# │  └─ a.nav-link → "Home"
# └─ main
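
The tree layout shown in the comments above can be reproduced with a short recursive renderer. The sketch below works on a hand-built node structure rather than a parsed document; `Node` and `render` are hypothetical names, and the real `to_xtree` may format labels differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def render(root: Node) -> str:
    """Render a node tree with box-drawing connectors."""
    lines = [root.label]

    def walk(children: list[Node], prefix: str) -> None:
        for index, child in enumerate(children):
            last = index == len(children) - 1
            lines.append(f"{prefix}{'└─' if last else '├─'} {child.label}")
            # Continue the vertical bar only while siblings remain below.
            walk(child.children, prefix + ("   " if last else "│  "))

    walk(root.children, "")
    return "\n".join(lines)


tree = Node(
    "ROOT",
    [
        Node("nav#main-nav", [Node('a.nav-link → "Home"')]),
        Node("main"),
    ],
)
print(render(tree))
# ROOT
# ├─ nav#main-nav
# │  └─ a.nav-link → "Home"
# └─ main
```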

Pipeline API

from sdkrouter_tools import clean_html, clean_for_llm

result = clean_html(html, max_tokens=5000, output_format="markdown")
output = clean_for_llm(html)  # Returns dict (SSR) or str (cleaned HTML)

Advanced Features

from sdkrouter_tools.html import (
    # Shadow DOM
    flatten_shadow_dom,
    # Downsampling
    downsample_html, estimate_tokens,
    # Semantic Chunking
    SemanticChunker, ChunkConfig,
    # Context Extraction
    extract_context, generate_selector,
    # Helpers
    json_to_toon, html_to_text, extract_links, extract_images,
)
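
To illustrate the idea behind token estimation and chunking, the sketch below splits text on blank lines and packs paragraphs into chunks that stay under a token budget. The package lists tiktoken as a requirement, so its own `estimate_tokens` presumably uses a real tokenizer; this example substitutes a rough "four characters per token" guess. `rough_token_count` and `chunk_paragraphs` are hypothetical names, unrelated to the `SemanticChunker` API.

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: about four characters per token, rounded up."""
    return (len(text) + 3) // 4


def chunk_paragraphs(text: str, max_tokens: int) -> list[str]:
    """Pack paragraphs into chunks without exceeding max_tokens where possible.

    A single paragraph larger than the budget becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        cost = rough_token_count(para)
        if current and used + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks


document = "a" * 40 + "\n\n" + "b" * 40 + "\n\n" + "c" * 40
chunks = chunk_paragraphs(document, max_tokens=20)
# Two chunks: the first holds paragraphs one and two, the second holds the third.
```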

Requirements

  • Python >= 3.10
  • rich >= 13.0
  • pyTelegramBotAPI >= 4.14
  • beautifulsoup4 >= 4.12
  • lxml >= 5.3
  • pydantic >= 2.10
  • markdownify >= 0.14
  • tiktoken >= 0.8

License

MIT
