
SDK Router Tools — collection of utility tools for automation pipelines (telegram, logging, html cleaner, etc.)


sdkrouter-tools

SDK Router Tools — collection of utility tools for automation pipelines.

Installation

pip install sdkrouter-tools

Tools Included

  • logging — Rich-powered logger with file persistence
  • telegram — Rate-limited Telegram sender with priority queue
  • html — HTML cleaner optimized for LLM pipelines

1. Logging (Rich-powered)

Python logger with Rich console output and file persistence.

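from sdkrouter_tools import get_logger

log = get_logger(__name__)
log.info("Hello world")

try:
    1 / 0
except ZeroDivisionError:
    # exc_info=True attaches the active exception's traceback
    log.error("Something failed", exc_info=True)

# With custom level
log = get_logger(__name__, level="DEBUG")
data = {"user_id": 42}
log.debug("Debug details: %s", data)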

Features

  • Rich console output with colors and formatting
  • Automatic file logging (daily rotation)
  • Auto-detects project root for log directory
  • Rich tracebacks with local variables

Convenience Functions

from sdkrouter_tools.logging import debug, info, warning, error, critical

info("Processing started")
warning("Low memory")
error("Failed to connect")

Configuration

from sdkrouter_tools import setup_logging

setup_logging(
    level="DEBUG",           # Log level
    log_to_file=True,        # Write to file
    log_to_console=True,     # Output to console
    app_name="myapp",        # App name for log file
    rich_tracebacks=True,    # Rich exception formatting
)

2. Telegram Sender

Rate-limited Telegram message sender with priority queue support.

from sdkrouter_tools import TelegramSender, ParseMode

sender = TelegramSender(
    bot_token="YOUR_BOT_TOKEN",
    chat_id="YOUR_CHAT_ID",
)

sender.send_message("Hello from sdkrouter-tools!")
sender.send_message("<b>Bold</b> message", parse_mode=ParseMode.HTML)

Convenience Functions

from sdkrouter_tools.telegram import (
    send_error, send_success, send_warning,
    send_info, send_stats, send_alert,
)

send_error("Something went wrong!", {"details": "error info"})
send_success("Task completed!", {"items_processed": 100})
send_warning("Disk space low", {"available": "10GB"})
send_alert("Critical: Server down!", {"server": "prod-1"})

Environment Variables

export TELEGRAM_BOT_TOKEN="your_bot_token"
export TELEGRAM_CHAT_ID="your_chat_id"

Priority Queue

Messages are queued by priority and sent with rate limiting (20 msg/sec):

from sdkrouter_tools import MessagePriority

# CRITICAL (1), HIGH (2), NORMAL (3), LOW (4)
sender.send_message("Important!", priority=MessagePriority.HIGH)
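To make "lower number = more urgent, at most 20 per second" concrete, here is a minimal sketch built on `heapq`. It mirrors the documented priority values. The class name and the minimum-interval throttling strategy are assumptions; the package's actual scheduler is not described.

```python
import heapq
import itertools
import time
from enum import IntEnum


class Priority(IntEnum):
    """Mirrors the documented MessagePriority values: lower number = more urgent."""
    CRITICAL = 1
    HIGH = 2
    NORMAL = 3
    LOW = 4


class RateLimitedPriorityQueue:
    """Hypothetical queue: most urgent message first, at most `max_per_second` pops."""

    def __init__(self, max_per_second: float = 20.0) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO order within one priority
        self._min_interval = 1.0 / max_per_second
        self._last_sent = float("-inf")

    def put(self, message: str, priority: Priority = Priority.NORMAL) -> None:
        heapq.heappush(self._heap, (int(priority), next(self._seq), message))

    def get(self) -> str:
        """Pop the most urgent message, sleeping first if the rate limit requires it."""
        _, _, message = heapq.heappop(self._heap)  # raises IndexError if empty
        wait = self._last_sent + self._min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self._last_sent = time.monotonic()
        return message

    def __len__(self) -> int:
        return len(self._heap)
```

The sequence counter matters: without it, two messages with equal priority would be compared by text, breaking FIFO order.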

Sending Files

sender.send_photo("/path/to/image.jpg", caption="Check this out!")
sender.send_document("/path/to/file.pdf", caption="Report attached")

Queue Management

from sdkrouter_tools import telegram_queue

stats = telegram_queue.get_stats()
telegram_queue.flush(timeout=10.0)  # Call before the script exits so queued messages get sent (waits up to 10 s)
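Why flush before exit? Queued messages are presumably delivered by a background worker, and a process that exits while the queue is non-empty can lose them. The sketch below shows one way to implement "wait until the queue is drained, but no longer than `timeout`" with a `threading.Condition`. `BackgroundSender` is a hypothetical class, not the package's `telegram_queue`.

```python
import logging
import queue
import threading
from collections.abc import Callable


class BackgroundSender:
    """Hypothetical worker-thread sender illustrating flush(timeout) semantics."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._queue: queue.Queue[str] = queue.Queue()
        self._send = send
        self._pending = 0                   # submitted but not yet handled
        self._idle = threading.Condition()  # notified whenever _pending drops
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, message: str) -> None:
        with self._idle:
            self._pending += 1
        self._queue.put(message)

    def _worker(self) -> None:
        while True:
            message = self._queue.get()
            try:
                self._send(message)
            except Exception:
                # Keep the worker alive; a real sender would retry or report this.
                logging.getLogger(__name__).exception("send failed")
            finally:
                with self._idle:
                    self._pending -= 1
                    self._idle.notify_all()

    def flush(self, timeout: float) -> bool:
        """Block until every submitted message is handled; return False on timeout."""
        with self._idle:
            return self._idle.wait_for(lambda: self._pending == 0, timeout=timeout)
```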

3. HTML Cleaner

HTML cleaner optimized for LLM pipelines. Aggressive DOM cleaning, SSR hydration extraction, CSS class filtering, semantic chunking, and multiple output formats.

from sdkrouter_tools import HTMLCleaner, CleanerConfig, OutputFormat

cleaner = HTMLCleaner()
result = cleaner.clean(html)

print(result.output)
print(f"Reduction: {result.stats.reduction_percent}%")
print(f"Tokens: {result.stats.original_tokens} -> {result.stats.cleaned_tokens}")
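The README does not define `reduction_percent`. Assuming it is derived from the two token counts printed alongside it, it would be `(1 - cleaned / original) * 100`. For example, 12,000 tokens cleaned down to 3,000 gives 75%. The sketch below spells that out. It also includes a rough 4-characters-per-token estimate as a stand-in for tiktoken (a listed dependency that gives exact counts). Both function names are hypothetical.

```python
def rough_token_count(text: str) -> int:
    """Heuristic: about 4 characters per token for English text (not tiktoken-exact)."""
    return (len(text) + 3) // 4


def reduction_percent(original_tokens: int, cleaned_tokens: int) -> float:
    """Share of tokens removed by cleaning, as a percentage rounded to one decimal."""
    if original_tokens <= 0:
        return 0.0
    return round((1 - cleaned_tokens / original_tokens) * 100, 1)
```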

Quick Functions

from sdkrouter_tools import clean, clean_to_json

# Quick clean
result = clean(html, max_tokens=5000, output_format="markdown")

# Get JSON if SSR data available, otherwise cleaned HTML
data = clean_to_json(html)

Configuration

from sdkrouter_tools import CleanerConfig, OutputFormat

config = CleanerConfig(
    max_tokens=5000,
    output_format=OutputFormat.MARKDOWN,  # HTML, MARKDOWN, AOM, XTREE
    filter_classes=True,
    class_threshold=0.5,
    try_hydration=True,
)

cleaner = HTMLCleaner(config)
result = cleaner.clean(html)

SSR Hydration Extraction

Extract structured data from server-side rendered pages:

from sdkrouter_tools.html import extract_hydration, detect_framework

framework = detect_framework(html)  # NEXTJS_APP, NUXT3, etc.

data = extract_hydration(html)
if data.has_data:
    products = data.page_props.get("products", [])

Supported frameworks: Next.js, Nuxt 2/3, SvelteKit, Remix, Gatsby, Qwik, Astro

CSS Class Filtering

from sdkrouter_tools.html import score_class, filter_classes, detect_css_framework

# Score classes by semantic relevance
result = score_class("product-card")  # High score
result = score_class("css-abc123")    # Low score (hash)

# Filter list of classes
classes = ["product-card", "css-abc123", "flex", "MuiButton-root"]
kept = filter_classes(classes, threshold=0.5)  # ["product-card"]

# Detect CSS framework
framework = detect_css_framework(html)  # "tailwind", "bootstrap", etc.
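The scoring rules behind `score_class` are not documented. The sketch below shows one plausible heuristic that reproduces the README's example, keeping only `product-card`. It scores hashed CSS-in-JS names, utility classes and component-library prefixes low, and names containing domain words high. Every pattern, word list and weight here is a hypothetical choice, not the package's.

```python
import re

# All patterns, word lists and weights below are illustrative assumptions.
HASH_LIKE = re.compile(r"^(css|sc|jsx|emotion|svelte)-[a-z0-9]+$", re.IGNORECASE)
UTILITY_WORDS = {"flex", "grid", "block", "hidden", "container", "row", "col"}
UTILITY_PATTERN = re.compile(r"^-?[a-z]+(-[a-z]+)*-\d+(/\d+)?$")  # p-4, mt-2, w-1/2
FRAMEWORK_PREFIX = re.compile(r"^(Mui|chakra-|ant-)")
SEMANTIC_WORDS = {"product", "card", "title", "price", "nav", "header", "footer",
                  "item", "list", "content", "article", "menu"}


def heuristic_score(name: str) -> float:
    """Score a class name from 0.0 (noise) to 0.9 (likely meaningful)."""
    if HASH_LIKE.match(name):
        return 0.0
    if name in UTILITY_WORDS or UTILITY_PATTERN.match(name):
        return 0.1
    if FRAMEWORK_PREFIX.match(name):
        return 0.2
    words = set(re.split(r"[-_]", name.lower()))
    return 0.9 if words & SEMANTIC_WORDS else 0.5  # unknown names stay neutral


def keep_semantic_classes(classes: list[str], threshold: float = 0.5) -> list[str]:
    return [c for c in classes if heuristic_score(c) >= threshold]
```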

Output Formats

from sdkrouter_tools.html import to_markdown, to_aom_yaml, to_xtree
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "lxml")

# Markdown
md = to_markdown(soup)

# AOM YAML (Playwright-style aria snapshot)
yaml = to_aom_yaml(soup)
# - navigation:
#   - link "Home"
#   - link "Products"

# XTree (hierarchical tree)
tree = to_xtree(soup)
# ROOT
# ├─ nav#main-nav
# │  └─ a.nav-link → "Home"
# └─ main

Pipeline API

from sdkrouter_tools import clean_html, clean_for_llm

result = clean_html(html, max_tokens=5000, output_format="markdown")
output = clean_for_llm(html)  # Returns dict (SSR) or str (cleaned HTML)

Advanced Features

from sdkrouter_tools.html import (
    # Shadow DOM
    flatten_shadow_dom,
    # Downsampling
    downsample_html, estimate_tokens,
    # Semantic Chunking
    SemanticChunker, ChunkConfig,
    # Context Extraction
    extract_context, generate_selector,
    # Helpers
    json_to_toon, html_to_text, extract_links, extract_images,
)
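The README does not describe how `SemanticChunker` splits content. As a simpler point of comparison, the sketch below greedily packs paragraphs into chunks under a token budget, using a rough 4-characters-per-token estimate. This is plain paragraph packing, not semantic chunking, and both function names are hypothetical.

```python
import re


def rough_tokens(text: str) -> int:
    return (len(text) + 3) // 4  # ~4 characters per token; not tiktoken-exact


def chunk_by_paragraph(text: str, max_tokens: int) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of <= max_tokens."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for para in paragraphs:
        tokens = rough_tokens(para)
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A single paragraph larger than `max_tokens` still becomes its own oversized chunk. A fuller implementation would split it further, for example by sentence.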

Requirements

  • Python >= 3.10
  • rich >= 13.0
  • pyTelegramBotAPI >= 4.14
  • beautifulsoup4 >= 4.12
  • lxml >= 5.3
  • pydantic >= 2.10
  • markdownify >= 0.14
  • tiktoken >= 0.8

License

MIT

Download files


Source Distribution

sdkrouter_tools-0.1.1.tar.gz (65.9 kB)

Built Distribution

sdkrouter_tools-0.1.1-py3-none-any.whl (87.8 kB, Python 3)

File details

Details for the file sdkrouter_tools-0.1.1.tar.gz.

File metadata

  • Download URL: sdkrouter_tools-0.1.1.tar.gz
  • Size: 65.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for sdkrouter_tools-0.1.1.tar.gz
Algorithm Hash digest
SHA256 3857e0356f2ff878a9d6fda4d56326aaac78740cfc25b57d8d0e7d11c6ebacc2
MD5 63bbf8bc78c8e637de8f686adea6fb18
BLAKE2b-256 cf00a8b44d9f52f0c79b6ac1d87e81d55f7aec8d798f34586e46f7b01ffc7683

File details

Details for the file sdkrouter_tools-0.1.1-py3-none-any.whl.

File hashes

Hashes for sdkrouter_tools-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2201125478bfe897a59d21f18c29c146835df6c644ff49e5a42c514dee4d4633
MD5 22ef63e70f95a6ef933699f90c19aa58
BLAKE2b-256 445a9f179b3d75f090c936df9b4f1bb24e5ff245a6a10e906b34cef830c67f37
