A personal Python toolkit for common tasks

ToolkitX

License: MIT · Python 3.12+

ToolkitX provides utility functions that simplify common development workflows, with a focus on text processing, HTML conversion, and task resilience.

📖 Full Documentation: https://toolkitx.readthedocs.io/en/latest/

Features

  • HTML Utilities (toolkitx.html_utils):

    • html_to_markdown: Converts HTML to Markdown. Tables with merged cells (colspan/rowspan) are expanded, nested tables are serialized to JSON so that LLMs and agents can read them more easily, and the first row is promoted to a header when the table has none.
  • Text Utilities (toolkitx.text_utils):

    • truncate_text_smart: Truncates text by character or word count, trying to cut at a sentence or word boundary within a configurable tolerance.
    • split_text_by_word_count: Splits long text into overlapping chunks based on word count.
  • Task Utilities (toolkitx.task_utils):

    • with_resilience: A decorator that makes API calls resilient through rate limiting (QPS) and retries with exponential backoff and jitter.
    • PersistentTaskQueue: A persistent task queue with SQLite backend, supporting concurrent processing, automatic retry, crash recovery, and graceful shutdown.
  • Experimental Translator (toolkitx.lab.translator):

    • Translator: A class providing translation capabilities using Baidu or Tencent translation APIs, with disk-based caching.
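
To make the "overlapping chunks based on word count" idea concrete, here is a minimal sketch of that technique. It is not the code behind split_text_by_word_count; the function name chunk_words and the parameters chunk_size and overlap are assumptions chosen for illustration.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Each chunk repeats the last `overlap` words of the previous one.
    Hypothetical helper for illustration, not the toolkitx API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text,
        # so no chunk consists only of repeated overlap words.
        if start + chunk_size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g", chunk_size=4, overlap=1))
# → ['a b c d', 'd e f g']
```

The overlap keeps some shared context between neighbouring chunks, which is usually the reason to prefer it over a plain split.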

Installation

We recommend using uv for fast and reliable dependency management.

# Clone the repository
git clone https://github.com/ider-zh/toolkitx.git
cd toolkitx

# Install with development dependencies
uv pip install -e ".[dev,docs]"

Usage

HTML to Markdown (Robust Table Support)

from toolkitx import html_to_markdown

# Handles merged cells (colspan/rowspan) and nested tables
html = """
<table>
  <tr><td colspan="2">Merged Header</td></tr>
  <tr><td>Cell 1</td><td>Cell 2</td></tr>
  <tr>
    <td>Outer</td>
    <td>
      <table><tr><td>Nested</td></tr></table>
    </td>
  </tr>
</table>
"""

md = html_to_markdown(html)
print(md)
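
For readers curious what "expanding" merged cells can look like, the following is a rough sketch of one common approach: copy each merged cell into every grid slot it covers, then render the grid with the first row as the header. It is an illustration under stated assumptions, not the html_to_markdown implementation; expand_spans, to_markdown_table, and the (text, colspan, rowspan) input shape are all hypothetical.

```python
def expand_spans(rows):
    """Expand merged cells into a rectangular grid of strings.

    `rows` is a list of rows; each row is a list of
    (text, colspan, rowspan) tuples. Hypothetical input shape.
    """
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, colspan, rowspan in row:
            # Skip slots already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Copy the cell text into every slot the cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def to_markdown_table(grid):
    """Render a grid as a pipe table, using the first row as the header."""
    if not grid:
        return ""
    header, *body = grid
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


rows = [
    [("Merged Header", 2, 1)],
    [("Cell 1", 1, 1), ("Cell 2", 1, 1)],
]
print(to_markdown_table(expand_spans(rows)))
# | Merged Header | Merged Header |
# | --- | --- |
# | Cell 1 | Cell 2 |
```

Repeating the merged text in each covered slot keeps every row the same width, which Markdown pipe tables require.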

Smart Text Truncation

from toolkitx import truncate_text_smart

text = "Hello World. This is a long sentence that should be truncated smartly."
# Strips trailing punctuation automatically
truncated = truncate_text_smart(text, limit=12)
print(truncated)  # Output: 'Hello World...'
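
The behaviour above (cut near the limit, prefer a word boundary, strip trailing punctuation, then append a suffix) can be approximated with a few lines of plain Python. This is only a sketch of the general idea, not the truncate_text_smart source; the name truncate_at_word and the tolerance and suffix parameters are assumptions.

```python
def truncate_at_word(text: str, limit: int, tolerance: int = 5, suffix: str = "...") -> str:
    """Truncate `text` to about `limit` characters.

    If the cut would land inside a word and the previous space is at most
    `tolerance` characters back, cut at that space instead.
    Hypothetical helper for illustration, not the toolkitx API.
    """
    if len(text) <= limit:
        return text

    cut = text[:limit]
    # Only back up when the cut falls in the middle of a word.
    if not text[limit].isspace():
        boundary = cut.rfind(" ")
        if boundary != -1 and limit - boundary <= tolerance:
            cut = cut[:boundary]

    # Strip trailing whitespace and punctuation before adding the suffix.
    return cut.rstrip(" \t\n.,;:!?") + suffix


text = "Hello World. This is a long sentence that should be truncated smartly."
print(truncate_at_word(text, limit=12))  # → Hello World...
print(truncate_at_word(text, limit=8))   # → Hello...
```

The tolerance bounds how much text may be sacrificed to reach a clean boundary; when no space is close enough, the sketch falls back to a hard cut at the limit.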

Task Resilience Decorator

from toolkitx import with_resilience

@with_resilience(qps=2.0, max_retries=3)
def fetch_data(url):
    # This function will be rate-limited and retried automatically
    pass
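
As background on what such a decorator typically does, here is a self-contained sketch that combines a simple QPS limiter with exponential backoff and jitter. It is not the with_resilience implementation: the name resilient, the base_delay parameter, and the injectable sleep argument are assumptions made for illustration, and only qps and max_retries mirror the example above.

```python
import functools
import random
import threading
import time


def resilient(qps: float, max_retries: int, base_delay: float = 0.5, sleep=time.sleep):
    """Rate-limit calls to `qps` per second and retry failures with backoff.

    Hypothetical sketch, not the toolkitx API. `sleep` is injectable so
    the behaviour can be tested without real delays.
    """
    if qps <= 0 or max_retries < 0:
        raise ValueError("qps must be > 0 and max_retries must be >= 0")

    min_interval = 1.0 / qps
    lock = threading.Lock()
    next_allowed = [0.0]  # earliest monotonic time the next call may start

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                # Rate limiting: reserve the next free time slot.
                with lock:
                    now = time.monotonic()
                    wait = max(0.0, next_allowed[0] - now)
                    next_allowed[0] = max(now, next_allowed[0]) + min_interval
                if wait > 0:
                    sleep(wait)
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # retries exhausted, surface the error
                    # Exponential backoff: base, 2x base, 4x base, ...
                    delay = base_delay * (2 ** attempt)
                    # Jitter spreads retries out so clients do not sync up.
                    sleep(delay + random.uniform(0, delay))

        return wrapper

    return decorator
```

Retrying on every Exception is a simplification; in practice you would usually restrict retries to transient errors such as timeouts or rate-limit responses.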

Development

Running Tests

# Run unit tests
make test

# Run documentation tests (verify examples in docstrings)
make test-docs

Documentation

# Preview documentation locally
make docs-serve

# Build static documentation site
make docs-build

Changelog

v0.0.5 (2026-05-30)

  • New Feature: Added html_utils with a robust html_to_markdown converter.
  • Improved: truncate_text_smart now strips trailing punctuation before appending the suffix.
  • Documentation: Established a fully automated documentation system with MkDocs, the Material theme, and Read the Docs integration.
  • Verifiable Docs: Added doctest examples to all core functions and a make test-docs target.
  • Workflow: Integrated ruff for linting and formatting.
  • Dependency Management: Fully transitioned to uv and pinned mkdocs for stability.

v0.0.4 (2026-03-07)

  • Added task_utils module with with_resilience decorator and PersistentTaskQueue.
  • Added polars, pydantic, and tqdm as dependencies.
