
toolkitx

A personal Python toolkit for common tasks. This package provides various utility functions to simplify common development workflows.

Features

  • Text Utilities (toolkitx.text_utils):
    • truncate_text_smart: Smartly truncates text by characters or words, with options for suffix and tolerance, attempting to preserve sentence or word boundaries.
    • split_text_by_word_count: Splits long text into overlapping chunks based on word count.
  • Task Utilities (toolkitx.task_utils):
    • with_resilience: A decorator for API resilience with rate limiting (QPS), exponential backoff retry, and jitter to prevent the thundering-herd problem.
    • PersistentTaskQueue: A persistent task queue with an SQLite backend, supporting concurrent processing, automatic retry, crash recovery, and graceful shutdown.
  • Experimental Translator (toolkitx.lab.translator):
    • Translator: A class providing translation via the Baidu or Tencent translation APIs, with disk-based caching for performance. (Requires API credentials.)

Installation

  1. Clone the repository:
    git clone https://github.com/ider-zh/toolkitx.git
    cd toolkitx
    
  2. Install the package. For development, you can install it in editable mode with development dependencies:
    pip install -e ".[dev]"
    
    For regular installation:
    pip install .
    

Usage

Text Utilities

from toolkitx import truncate_text_smart, split_text_by_word_count

# Smart Truncation
text = "This is a very long sentence that needs to be truncated."
truncated_char = truncate_text_smart(text, limit=20, mode="char", suffix="...")
print(f"Char truncated: {truncated_char}")

truncated_word = truncate_text_smart(text, limit=5, mode="word", suffix="...")
print(f"Word truncated: {truncated_word}")

# Split Text
long_text = "This is a long piece of text that we want to split into several smaller chunks with some overlap between them for context."
chunks = split_text_by_word_count(long_text, max_words=10, overlap=2)
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {chunk}")

Task Utilities

with_resilience Decorator

from toolkitx.task_utils import with_resilience
import requests

@with_resilience(qps=5.0, max_retries=3, base_delay=1.0, max_delay=60.0)
def call_api_with_retry(url: str) -> dict:
    """Call API with automatic retry and rate limiting"""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()

# The decorator will automatically:
# - Limit requests to 5 per second (QPS)
# - Retry up to 3 times on failure with exponential backoff
# - Add random jitter to prevent thundering herd
result = call_api_with_retry("https://api.example.com/data")
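
For intuition, retry schemes like this commonly compute each delay as capped exponential growth plus randomized jitter. The sketch below is illustrative only and assumes nothing about toolkitx's actual internals; backoff_delay is a hypothetical helper that mirrors the decorator's base_delay and max_delay arguments.

import random

def backoff_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Illustrative capped exponential backoff with full jitter (not toolkitx's internals)."""
    capped = min(max_delay, base_delay * (2 ** attempt))
    # Sampling uniformly over [0, capped] spreads retries out so that many
    # clients failing at once do not retry in lockstep (thundering herd).
    return random.uniform(0.0, capped)

for attempt in range(3):
    print(f"retry {attempt + 1}: sleeping {backoff_delay(attempt):.2f}s")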

PersistentTaskQueue

import polars as pl
from pydantic import BaseModel
from toolkitx.task_utils import PersistentTaskQueue
import tempfile

# Define your data model
class EntityModel(BaseModel):
    name: str
    is_company: bool

# Define your processing function
def extract_entity(text: str) -> EntityModel:
    """Extract entity information from text"""
    # Your processing logic here
    return EntityModel(name=text.split()[0], is_company=True)

# Prepare data
df = pl.DataFrame({
    "batch_id": ["batch1", "batch1", "batch2"],
    "input_text": ["Apple Inc.", "Google Corp.", "Microsoft"]
})

# Initialize queue with temporary database
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
    db_path = f.name

queue = PersistentTaskQueue(db_path=db_path, task_name="entity_extraction", max_retries=3)

# Setup and enqueue data
queue.setup()
queue.enqueue_dataframe(df)

# Process tasks with concurrency control (supports Ctrl+C for graceful shutdown)
queue.process(worker_func=extract_entity, concurrency=10)
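
# Note (illustrative, not verified against the source): because queue state
# lives in the SQLite database, re-creating a PersistentTaskQueue with the
# same db_path and task_name after a crash or Ctrl+C and calling process()
# again should resume unfinished tasks, per the crash-recovery feature above.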

# Get results
results = queue.get_results(response_model=EntityModel)
print(results)
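
Experimental Translator

The Translator class in toolkitx.lab.translator is described above as wrapping the Baidu or Tencent translation APIs with disk-based caching, and it requires API credentials. Its exact interface is not documented in this README, so the sketch below is only a hedged guess: the constructor arguments (provider, app_id, app_key) and the translate method are assumptions, not a confirmed API. Consult the module source before use.

from toolkitx.lab.translator import Translator

# Hypothetical usage: the parameter names and method below are assumptions,
# not the documented API. Check the module source for the real signatures.
translator = Translator(provider="baidu", app_id="YOUR_APP_ID", app_key="YOUR_APP_KEY")
print(translator.translate("Hello, world!", target_lang="zh"))  # assumed method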

Changelog

v0.0.4 (2026-03-07)

  • Added task_utils module with with_resilience decorator for API resilience
  • Added PersistentTaskQueue class for persistent task processing with SQLite backend
  • Added comprehensive documentation for new features
  • Bumped version to 0.0.4
  • Updated dependencies (httpx, tencentcloud-sdk-python, pytest, etc.)
  • Removed hello script and related functionality
  • Added polars, pydantic, and tqdm as dependencies
  • Improved translator module to use tempfile for cache paths

Download files

Download the file for your platform.

Source Distribution

  • toolkitx-0.0.4.tar.gz (43.5 kB)

Built Distribution


  • toolkitx-0.0.4-py3-none-any.whl (18.1 kB)

File details

Details for the file toolkitx-0.0.4.tar.gz.

File metadata

  • Size: 43.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for toolkitx-0.0.4.tar.gz:

  • SHA256: e35835718f2b69b3150ee15b5c80635004178f96602900d9e10a0389e013bd8b
  • MD5: e8d845e68e2ee32fc333907bd0c23206
  • BLAKE2b-256: 5c20e98e8ffc59b5b91cc2e5a2e3a8f18f03b27be1e7437ac0e8efb42101060f

File details

Details for the file toolkitx-0.0.4-py3-none-any.whl.

File metadata

  • Size: 18.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for toolkitx-0.0.4-py3-none-any.whl:

  • SHA256: d37e0fd66cc71e582e546129e804c3f2bc35537ce8eb662880c4f7d9088b3096
  • MD5: dfa9a933454b85130ba9b0e552211cd9
  • BLAKE2b-256: 7310580dcc8ef85a90a2d0b2235e8f6f90c6c33df84653f715884c6d2b35dae7
