# toolkitx

A personal Python toolkit for common tasks. This package provides various utility functions to simplify common development workflows.
## Features
- **Text Utilities** (`toolkitx.text_utils`):
  - `truncate_text_smart`: Smartly truncates text by characters or words, with options for suffix and tolerance, attempting to preserve sentence or word boundaries.
  - `split_text_by_word_count`: Splits long text into overlapping chunks based on word count.
- **Task Utilities** (`toolkitx.task_utils`):
  - `with_resilience`: A decorator for API resilience with rate limiting (QPS), exponential backoff retry, and jitter to prevent thundering herd.
  - `PersistentTaskQueue`: A persistent task queue with an SQLite backend, supporting concurrent processing, automatic retry, crash recovery, and graceful shutdown.
- **Experimental Translator** (`toolkitx.lab.translator`):
  - `Translator`: A class providing translation capabilities using the Baidu or Tencent translation APIs, with disk-based caching for performance. (Requires API credentials; see the hypothetical sketch after this list.)
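The translator has no example in the Usage section below, so here is a purely hypothetical sketch. The constructor and method names (`provider`, `api_key`, `api_secret`, `translate`, `target_lang`) are assumptions, not the module's confirmed API:

```python
from toolkitx.lab.translator import Translator

# Hypothetical usage only: every argument name below is an assumption.
# Check toolkitx/lab/translator.py for the real signature.
translator = Translator(provider="tencent", api_key="YOUR_KEY", api_secret="YOUR_SECRET")

# Results are cached on disk, so repeated calls should hit the cache.
print(translator.translate("hello world", target_lang="zh"))
```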
## Installation
- Clone the repository:

  ```bash
  git clone https://github.com/ider-zh/toolkitx.git
  cd toolkitx
  ```
- Install the package. For development, you can install it in editable mode with development dependencies:

  ```bash
  pip install -e ".[dev]"
  ```

  For regular installation:

  ```bash
  pip install .
  ```
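A quick way to confirm the install, using only the standard library (nothing toolkitx-specific is assumed here):

```python
# Print the installed version of the package; expected output: 0.0.4
import importlib.metadata

print(importlib.metadata.version("toolkitx"))
```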
## Usage
### Text Utilities
```python
from toolkitx import truncate_text_smart, split_text_by_word_count

# Smart truncation
text = "This is a very long sentence that needs to be truncated."

truncated_char = truncate_text_smart(text, limit=20, mode="char", suffix="...")
print(f"Char truncated: {truncated_char}")

truncated_word = truncate_text_smart(text, limit=5, mode="word", suffix="...")
print(f"Word truncated: {truncated_word}")

# Split text into overlapping chunks
long_text = "This is a long piece of text that we want to split into several smaller chunks with some overlap between them for context."
chunks = split_text_by_word_count(long_text, max_words=10, overlap=2)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {chunk}")
```
### Task Utilities
#### `with_resilience` Decorator
```python
import requests

from toolkitx.task_utils import with_resilience

@with_resilience(qps=5.0, max_retries=3, base_delay=1.0, max_delay=60.0)
def call_api_with_retry(url: str) -> dict:
    """Call API with automatic retry and rate limiting."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()

# The decorator will automatically:
# - limit requests to 5 per second (QPS)
# - retry up to 3 times on failure with exponential backoff
# - add random jitter to prevent thundering herd

result = call_api_with_retry("https://api.example.com/data")
```
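To make the behavior concrete, here is a minimal sketch of the same pattern (rate limiting, capped exponential backoff, jitter). It illustrates the technique under the semantics described above; it is not toolkitx's actual implementation:

```python
import functools
import random
import time

def resilient(qps: float, max_retries: int, base_delay: float, max_delay: float):
    """Sketch of rate limiting + capped exponential backoff with jitter (illustrative only)."""
    min_interval = 1.0 / qps
    last_call = [0.0]  # mutable closure state for the rate limiter

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                # Rate limit: wait until min_interval has elapsed since the last call.
                wait = last_call[0] + min_interval - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
                last_call[0] = time.monotonic()
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise
                    # Exponential backoff capped at max_delay, plus random jitter.
                    delay = min(base_delay * 2 ** attempt, max_delay)
                    time.sleep(delay + random.uniform(0, delay))
        return wrapper
    return decorator
```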
#### `PersistentTaskQueue`
```python
import tempfile

import polars as pl
from pydantic import BaseModel

from toolkitx.task_utils import PersistentTaskQueue

# Define your data model
class EntityModel(BaseModel):
    name: str
    is_company: bool

# Define your processing function
def extract_entity(text: str) -> EntityModel:
    """Extract entity information from text."""
    # Your processing logic here
    return EntityModel(name=text.split()[0], is_company=True)

# Prepare data
df = pl.DataFrame({
    "batch_id": ["batch1", "batch1", "batch2"],
    "input_text": ["Apple Inc.", "Google Corp.", "Microsoft"],
})

# Initialize the queue with a temporary database
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
    db_path = f.name

queue = PersistentTaskQueue(db_path=db_path, task_name="entity_extraction", max_retries=3)

# Set up and enqueue data
queue.setup()
queue.enqueue_dataframe(df)

# Process tasks with concurrency control (supports Ctrl+C for graceful shutdown)
queue.process(worker_func=extract_entity, concurrency=10)

# Get results
results = queue.get_results(response_model=EntityModel)
print(results)
```
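Because task state is persisted in SQLite, crash recovery should amount to re-running against the same database. The sketch below simply repeats the calls already shown; that this is how recovery is driven is an assumption based on the feature description, not documented behavior:

```python
# Reattach to the same SQLite database after a crash or Ctrl+C.
# Assumption: completed tasks are skipped and only pending/failed
# tasks (up to max_retries) are processed again.
queue = PersistentTaskQueue(db_path=db_path, task_name="entity_extraction", max_retries=3)
queue.setup()
queue.process(worker_func=extract_entity, concurrency=10)
```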
## Changelog
### v0.0.4 (2026-03-07)
- Added `task_utils` module with `with_resilience` decorator for API resilience
- Added `PersistentTaskQueue` class for persistent task processing with SQLite backend
- Added comprehensive documentation for new features
- Bumped version to 0.0.4
- Updated dependencies (httpx, tencentcloud-sdk-python, pytest, etc.)
- Removed `hello` script and related functionality
- Added `polars`, `pydantic`, and `tqdm` as dependencies
- Improved translator module to use `tempfile` for cache paths
## Download files
Download the file for your platform.
### Source Distribution

toolkitx-0.0.4.tar.gz

### Built Distribution

toolkitx-0.0.4-py3-none-any.whl
## File details

Details for the file `toolkitx-0.0.4.tar.gz`.
### File metadata
- Download URL: toolkitx-0.0.4.tar.gz
- Upload date:
- Size: 43.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
### File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e35835718f2b69b3150ee15b5c80635004178f96602900d9e10a0389e013bd8b` |
| MD5 | `e8d845e68e2ee32fc333907bd0c23206` |
| BLAKE2b-256 | `5c20e98e8ffc59b5b91cc2e5a2e3a8f18f03b27be1e7437ac0e8efb42101060f` |
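To check a downloaded file against the digests above, the standard library is enough. A minimal sketch, assuming the sdist has been downloaded to the current working directory:

```python
# Compute the SHA256 of the downloaded sdist and compare it to the published digest.
import hashlib
from pathlib import Path

digest = hashlib.sha256(Path("toolkitx-0.0.4.tar.gz").read_bytes()).hexdigest()
print(digest == "e35835718f2b69b3150ee15b5c80635004178f96602900d9e10a0389e013bd8b")
```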
## File details

Details for the file `toolkitx-0.0.4-py3-none-any.whl`.
### File metadata
- Download URL: toolkitx-0.0.4-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
### File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `d37e0fd66cc71e582e546129e804c3f2bc35537ce8eb662880c4f7d9088b3096` |
| MD5 | `dfa9a933454b85130ba9b0e552211cd9` |
| BLAKE2b-256 | `7310580dcc8ef85a90a2d0b2235e8f6f90c6c33df84653f715884c6d2b35dae7` |