A personal Python toolkit for common tasks
ToolkitX
A personal Python toolkit for common tasks. This package provides robust utility functions to simplify common development workflows, focusing on text processing, HTML conversion, and task resilience.
📖 Full Documentation: https://toolkitx.readthedocs.io/en/latest/
Features
- HTML Utilities (toolkitx.html_utils):
  - html_to_markdown: Robust HTML to Markdown conversion. Handles complex tables (colspan/rowspan) by expansion and serializes nested tables to JSON for better LLM/Agent understanding. Automatically promotes the first row to header if missing.
- Text Utilities (toolkitx.text_utils):
  - truncate_text_smart: Smartly truncates text by characters or words, attempting to preserve sentence or word boundaries with configurable tolerance.
  - split_text_by_word_count: Splits long text into overlapping chunks based on word count.
- Task Utilities (toolkitx.task_utils):
  - with_resilience: A decorator for API resilience with rate limiting (QPS), exponential backoff retry, and jitter.
  - PersistentTaskQueue: A persistent task queue with SQLite backend, supporting concurrent processing, automatic retry, crash recovery, and graceful shutdown.
- Experimental Translator (toolkitx.lab.translator):
  - Translator: A class providing translation capabilities using Baidu or Tencent translation APIs, with disk-based caching.
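To make the idea of overlapping word-count chunks concrete, here is a minimal standalone sketch. It does not use or reproduce split_text_by_word_count; the function name chunk_words and its parameters chunk_size and overlap are assumptions chosen for illustration, and the real signature may differ.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words. Hypothetical helper,
    not the toolkitx API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

The overlap keeps some shared context between neighbouring chunks, which is usually the reason to chunk this way before passing text to a model or an index.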
Installation
We recommend using uv for fast and reliable dependency management.
# Clone the repository
git clone https://github.com/ider-zh/toolkitx.git
cd toolkitx
# Install with development dependencies
uv pip install -e ".[dev,docs]"
Usage
HTML to Markdown (Robust Table Support)
from toolkitx import html_to_markdown
# Handles merged cells (colspan/rowspan) and nested tables
html = """
<table>
<tr><td colspan="2">Merged Header</td></tr>
<tr><td>Cell 1</td><td>Cell 2</td></tr>
<tr>
<td>Outer</td>
<td>
<table><tr><td>Nested</td></tr></table>
</td>
</tr>
</table>
"""
md = html_to_markdown(html)
print(md)
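The exact Markdown produced by html_to_markdown is not shown here. As a rough illustration of what "expansion" of merged cells can mean, the sketch below repeats a cell's text across every grid position it spans and then renders the grid as a Markdown table with the first row as header. The helpers expand_spans and to_markdown are hypothetical and are not how the library is known to be implemented.

```python
def expand_spans(rows):
    """Expand merged cells into a rectangular grid.

    rows: list of rows; each row is a list of (text, colspan, rowspan).
    Hypothetical helper, not the toolkitx implementation.
    """
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, colspan, rowspan in row:
            # Skip columns already occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Copy the text into every position the cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def to_markdown(grid):
    """Render a non-empty grid as a Markdown table, first row as header."""
    header, *body = grid
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


rows = [
    [("Merged Header", 2, 1)],
    [("Cell 1", 1, 1), ("Cell 2", 1, 1)],
]
print(to_markdown(expand_spans(rows)))
```

Repeating the merged text is one simple way to keep every row the same width, which Markdown tables require.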
Text Smart Truncation
from toolkitx import truncate_text_smart
text = "Hello World. This is a long sentence that should be truncated smartly."
# Strips trailing punctuation automatically
truncated = truncate_text_smart(text, limit=12)
print(truncated) # Output: 'Hello World...'
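For readers curious how boundary-aware truncation can work, the following is a simplified character-based sketch. It is not the truncate_text_smart implementation; the name truncate_at_word and the tolerance and suffix parameters are assumptions, and the library's actual rules for sentence boundaries are not covered.

```python
import string


def truncate_at_word(text, limit, tolerance=10, suffix="..."):
    """Cut text to about `limit` characters without splitting a word.

    Hypothetical helper, not the toolkitx API.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if len(text) <= limit:
        return text

    cut = text[:limit]
    # If the cut lands inside a word, back up to the previous space,
    # but only when that drops at most `tolerance` characters.
    if not text[limit].isspace() and not cut[-1].isspace():
        last_space = cut.rfind(" ")
        if last_space != -1 and limit - last_space <= tolerance:
            cut = cut[:last_space]
    # Strip trailing whitespace and punctuation before adding the suffix.
    cut = cut.rstrip(string.whitespace + string.punctuation)
    return cut + suffix


text = "Hello World. This is a long sentence that should be truncated smartly."
print(truncate_at_word(text, limit=12))  # Hello World...
print(truncate_at_word(text, limit=15))  # Hello World...
```

In this sketch the suffix is appended after the cut, so the result can be a few characters longer than the limit.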
Task Resilience Decorator
from toolkitx import with_resilience
@with_resilience(qps=2.0, max_retries=3)
def fetch_data(url):
    # This function will be rate-limited and retried automatically
    pass
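To show how rate limiting, exponential backoff, and jitter fit together in one decorator, here is a self-contained sketch built only on the standard library. It is not the with_resilience source: the name resilient and the base_delay, max_delay, sleep, and clock parameters are assumptions, and the library may pace or retry differently.

```python
import functools
import random
import threading
import time


def resilient(qps=2.0, max_retries=3, base_delay=0.5, max_delay=30.0,
              sleep=time.sleep, clock=time.monotonic):
    """Rate-limit calls to `qps` and retry failures with jittered backoff.

    Hypothetical decorator, not the toolkitx implementation.
    """
    min_interval = 1.0 / qps
    lock = threading.Lock()
    next_slot = [0.0]  # earliest time the next call may start

    def wait_for_slot():
        # Reserve the next free time slot, then sleep until it arrives.
        with lock:
            now = clock()
            start = max(now, next_slot[0])
            next_slot[0] = start + min_interval
        if start > now:
            sleep(start - now)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                wait_for_slot()
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries, surface the last error
                    # Backoff doubles per attempt; jitter picks a random
                    # delay between 0 and the current cap.
                    cap = min(max_delay, base_delay * 2 ** attempt)
                    sleep(random.uniform(0, cap))
        return wrapper
    return decorator


attempts = []


@resilient(qps=50.0, max_retries=3, base_delay=0.01)
def flaky_fetch(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return f"fetched {url}"


print(flaky_fetch("https://example.com"), "after", len(attempts), "attempts")
```

Retrying on every Exception keeps the sketch short; real code would normally restrict retries to errors that are known to be transient.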
Development
Running Tests
# Run unit tests
make test
# Run documentation tests (verify examples in docstrings)
make test-docs
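The Makefile behind make test-docs is not shown here, so the following only illustrates the general mechanism of verifying docstring examples with the standard-library doctest module. The function word_count is a hypothetical stand-in, not part of toolkitx.

```python
import doctest


def word_count(text: str) -> int:
    """Count whitespace-separated words.

    >>> word_count("Hello World")
    2
    >>> word_count("")
    0
    """
    return len(text.split())


if __name__ == "__main__":
    # Runs every docstring example in this module and reports failures.
    print(doctest.testmod())
```

Each line starting with >>> is executed and its result compared with the line that follows, so an example that drifts from the code's behavior fails the run.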
Documentation
# Preview documentation locally
make docs-serve
# Build static documentation site
make docs-build
Changelog
v0.0.5 (2026-05-30)
- New Feature: Added html_utils with robust html_to_markdown converter.
- Improved: truncate_text_smart now strips trailing punctuation before appending suffix.
- Documentation: Established full automated documentation system with MkDocs, Material theme, and Read the Docs integration.
- Verifiable Docs: Added doctest examples to all core functions and a make test-docs target.
- Workflow: Integrated ruff for linting and formatting.
- Dependency Management: Fully transitioned to uv and pinned mkdocs for stability.
v0.0.4 (2026-03-07)
- Added task_utils module with with_resilience decorator and PersistentTaskQueue.
- Added polars, pydantic, and tqdm as dependencies.
File details
Details for the file toolkitx-0.0.5.tar.gz.
File metadata
- Download URL: toolkitx-0.0.5.tar.gz
- Upload date:
- Size: 86.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93239cc3d5899766c4b4dc6e8df9d47c72b3be73a60a04f3e6a03c952e875748 |
| MD5 | 05c6e4ce2ee0727136454e5a3a7606de |
| BLAKE2b-256 | dd55fb3e20afca2818fbb90866c14726927c548a26e5990bf381bc3e7058916c |
File details
Details for the file toolkitx-0.0.5-py3-none-any.whl.
File metadata
- Download URL: toolkitx-0.0.5-py3-none-any.whl
- Upload date:
- Size: 21.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2c60f60bdd365a6e1da6d028d3c32c62e47e4c1371682a8eeb153b02ad89f974 |
| MD5 | 603483f10ffbed7d961a9f537669b5bd |
| BLAKE2b-256 | bd58eceede6d92a3bb45269f863eec8a0ad9fe8a0af15feb3f52c2d2c2810fc7 |