Fast and easy-to-use package for data science
Speedy Utils
Speedy Utils is a Python utility library for caching, parallel processing,
file I/O, LLM integration, dataset inspection, and image processing. The repo
ships multiple importable packages and stays under the repository's 0.4s
import-time hook budget by importing heavy external dependencies lazily.
Installation
pip install speedy-utils
# or
uv pip install speedy-utils
Install from source:
pip install git+https://github.com/anhvth/speedy_utils
# or
uv pip install git+https://github.com/anhvth/speedy_utils
Local development:
git clone https://github.com/anhvth/speedy_utils
cd speedy_utils
uv sync
Upgrading from older split packages:
pip uninstall speedy_llm_utils speedy_utils
pip install -U speedy-utils
Packages
The wheel currently installs four packages from src/:
| Package | Purpose |
|---|---|
| speedy_utils | Core utilities: caching, I/O, formatting, parallelism, timing |
| llm_utils | OpenAI-compatible LLM wrappers and chat-format helpers |
| vision_utils | Image loading, plotting, and mmap-backed image datasets |
| datasets_utils | Dataset inspection helpers, including the viz_chat CLI |
Core Utilities
memoize and imemoize
from speedy_utils import memoize, imemoize
@memoize
def expensive_function(x):
    import time
    time.sleep(2)
    return x * x

@imemoize
def fast_function(x):
    return x + 1
memoize uses memory, disk, or both. The default disk cache root is
~/.cache/speedy_cache.
@memoize(
    keys=["x"],
    cache_dir="/tmp/my_cache",
    cache_type="both",  # "memory" | "disk" | "both"
    size=512,
    verbose=True,
)
def fn(x, ignored_arg):
    ...
Both decorators support sync and async functions.
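To make the keys option concrete, here is a minimal in-memory sketch of key-restricted caching. It is not the library's implementation: the helper name memoize_on is hypothetical, and it assumes that keys=["x"] means only x contributes to the cache key, which is how the fn(x, ignored_arg) example reads.

```python
import functools
import inspect


def memoize_on(keys):
    """Cache results using only the named arguments as the cache key.

    Hypothetical helper for illustration; not part of speedy_utils.
    """
    def decorator(func):
        signature = inspect.signature(func)
        cache = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # Only the selected arguments take part in the key.
            cache_key = tuple(bound.arguments[name] for name in keys)
            if cache_key not in cache:
                cache[cache_key] = func(*args, **kwargs)
            return cache[cache_key]

        wrapper.cache = cache  # exposed for inspection in this sketch
        return wrapper
    return decorator


calls = []


@memoize_on(keys=["x"])
def square(x, ignored_arg=None):
    calls.append(x)
    return x * x


first = square(3, ignored_arg="a")
second = square(3, ignored_arg="b")  # same x, so served from the cache
```

The second call differs only in ignored_arg, so the wrapped function body runs once.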
multi_thread
from speedy_utils import multi_thread
results = multi_thread(lambda x: x * 2, [1, 2, 3, 4, 5])
Important public options:
multi_thread(
    func,
    inputs,
    workers=None,
    batch=1,
    ordered=True,
    progress=True,
    progress_update=10,
    progress_total=None,
    progress_weight=None,
    prefetch_factor=4,
    timeout=None,
    error_handler="raise",  # "raise" | "ignore" | "log"
    max_error_files=100,
    store_output_pkl_file=None,
    **fixed_kwargs,
)
Error handling:
def process(item):
    if item == 3:
        raise ValueError("bad item")
    return item * 2

multi_thread(process, [1, 2, 3], error_handler="raise")
multi_thread(process, [1, 2, 3], error_handler="ignore")
multi_thread(process, [1, 2, 3], error_handler="log")
error_handler="log" writes rich error reports under
.cache/speedy_utils/error_logs/.
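The three error_handler modes can be illustrated with a standard-library sketch built on concurrent.futures. The function run_threaded is hypothetical and is not how multi_thread is implemented. In particular, returning None for a failed item is an assumption made here, and the "log" mode below only emits a logging warning instead of writing report files.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("thread_sketch")


def run_threaded(func, inputs, workers=4, error_handler="raise"):
    """Map func over inputs in threads, preserving input order."""
    def guarded(item):
        try:
            return func(item)
        except Exception as exc:
            if error_handler == "raise":
                raise
            if error_handler == "log":
                logger.warning("item %r failed: %s", item, exc)
            return None  # assumption: failed items yield None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map re-raises a worker exception when its result is consumed.
        return list(pool.map(guarded, inputs))


def process(item):
    if item == 3:
        raise ValueError("bad item")
    return item * 2


ignored = run_threaded(process, [1, 2, 3], error_handler="ignore")
```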
multi_process
from speedy_utils import multi_process
results = multi_process(
    func,
    items,
    num_procs=4,
    num_threads=1,
    backend="spawn",  # "spawn" | "fork"
    error_handler="log",  # "raise" | "ignore" | "log"
    progress=True,
    dump_in_thread=True,
    log_worker="first",  # "zero" | "first" | "all"
)
Current behavior worth knowing:
- num_procs=None normalizes to 1, not automatic process-count detection.
- num_procs <= 1 and num_threads <= 1 uses a local sequential backend.
- num_procs <= 1 and num_threads > 1 uses the in-process thread backend.
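The selection rules above can be written as a small decision function. This is a sketch of the documented behavior only. The function name select_backend and the returned labels are hypothetical, and the final "process" branch is an assumption because the rules above do not describe the num_procs > 1 case.

```python
def select_backend(num_procs=None, num_threads=1):
    """Return a label for the backend implied by the documented rules."""
    if num_procs is None:
        num_procs = 1  # None normalizes to 1, not to the CPU count
    if num_procs <= 1 and num_threads <= 1:
        return "sequential"
    if num_procs <= 1:
        return "thread"
    return "process"  # assumption: not covered by the documented rules


default_choice = select_backend()
threaded_choice = select_backend(num_procs=1, num_threads=8)
```

A practical consequence is that callers who want parallel processes should pass num_procs explicitly.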
File I/O
Use load_jsonl() for JSONL files and load_json_or_pickle() for .json and pickle files.
from speedy_utils import (
    dump_json_or_pickle,
    dump_jsonl,
    jdumps,
    jloads,
    load_by_ext,
    load_json_or_pickle,
    load_jsonl,
)
records = load_jsonl("data/file.jsonl")
records = load_jsonl("data/**/*.jsonl")
records = load_jsonl(["train/*.jsonl", "val/file.jsonl"])
data = load_json_or_pickle("data.json")
data = load_json_or_pickle("data.pkl")
dump_json_or_pickle({"name": "Alice"}, "out.json")
dump_jsonl([{"a": 1}, {"a": 2}], "out.jsonl")
obj = jloads('{"key": "value",}')
text = jdumps(obj)
data = load_by_ext("data.csv")
data = load_by_ext(["part1.jsonl", "part2.jsonl"])
For streaming or compressed JSONL, use fast_load_jsonl directly:
from speedy_utils.common.utils_io import fast_load_jsonl
for record in fast_load_jsonl(
    "data/large.jsonl.gz",
    progress=True,
    on_error="skip",
    max_lines=1000,
    use_orjson=True,
):
    ...
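For readers who want to see what streaming a compressed JSONL file involves, here is a standard-library sketch. The generator iter_jsonl is hypothetical and does not reproduce fast_load_jsonl. It assumes that on_error="skip" drops malformed lines and that max_lines counts lines read from the file, which may differ from the library's behavior.

```python
import gzip
import json
import os
import tempfile


def iter_jsonl(path, on_error="skip", max_lines=None):
    """Yield one parsed record per line without loading the whole file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if max_lines is not None and line_number > max_lines:
                break
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                if on_error != "skip":
                    raise  # any other mode propagates the parse error


# Build a tiny compressed file with one malformed line to exercise the reader.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.jsonl.gz")
    with gzip.open(sample, "wt", encoding="utf-8") as handle:
        handle.write('{"a": 1}\nnot json\n{"a": 2}\n')
    records = list(iter_jsonl(sample))
    limited = list(iter_jsonl(sample, max_lines=1))
```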
Data, Printing, and Timing Helpers
from speedy_utils import (
    Clock,
    convert_to_builtin_python,
    dedup,
    flatten_dict,
    flatten_list,
    fprint,
    print_table,
    timef,
)
flatten_list([[1, 2], [3, 4]])
flatten_dict({"a": {"b": 1}, "c": 2})
dedup([3, 1, 2, 1, 3])
fprint({"name": "Dana", "scores": [95, 87, 92]})
print_table([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
@timef
def slow_function():
    ...
clock = Clock()
CLI Tools
The installed console scripts are:
| CLI | Purpose |
|---|---|
| mpython | Launch sharded Python runs across tmux windows |
| kill-mpython | Kill mpython tmux sessions |
| sp_chat | Launch a Chainlit chat UI for an OpenAI-compatible backend |
| spu-prefetch-large-model | Read large model files into the OS page cache |
| viz_chat | Inspect chat datasets from JSON, JSONL, folders, or HF saves |
| openapi_client_codegen | Generate a sync client from an OpenAPI JSON spec |
Examples:
mpython -t 8 script.py
kill-mpython
sp_chat client=8000
sp_chat client=http://10.0.0.3:8000/v1 port=5010 model=Qwen/Qwen2.5-7B-Instruct
spu-prefetch-large-model /path/to/model -j 8
viz_chat data/my_dataset.jsonl
viz_chat data/hf_dataset/ --count 5
viz_chat data/tokenized_dataset/ --tokenizer Qwen/Qwen3-8B
openapi_client_codegen openapi.json -o generated_client.py
LLM
llm_utils wraps OpenAI-compatible chat and completion APIs.
LLM main entry points
from llm_utils import LLM
llm = LLM(client=8000)
The three main sync entry points are:
- chat_completion(...) for chat responses.
- generate(...) for raw prompt continuation through the completions API.
- pydantic_parse(...) for structured outputs.
The convenience llm(...) wrapper routes like this:
- llm("prompt") -> chat_completion(...)
- llm("prompt", response_model=MyModel) -> pydantic_parse(...)
- llm("prompt", return_dict=True) -> normalized dict with raw artifacts
Basic chat completion
from llm_utils import LLM
llm = LLM(model="gpt-4o-mini")
message = llm("What is Python?")
print(message.content)
Equivalent explicit call:
message = llm.chat_completion(
    [
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "What is Python?"},
    ]
)
Structured output with Pydantic
from pydantic import BaseModel
from llm_utils import LLM
class Sentiment(BaseModel):
    sentiment: str
    confidence: float

llm = LLM(model="gpt-4o-mini")
result = llm.pydantic_parse(
    "Return JSON for the sentiment of: I love this product!",
    response_model=Sentiment,
)
print(result.sentiment, result.confidence)
Normalized dict output
result = llm(
    "Return JSON for the sentiment of: I love this product!",
    response_model=Sentiment,
    return_dict=True,
)
print(result.keys())
# dict_keys(["completion", "message", "messages", "parsed"])
Streaming chat responses
Streaming is only supported for text completions, not Pydantic parsing.
from llm_utils import LLM
llm = LLM(model="gpt-4o-mini")
for chunk in llm("Tell me a story", stream=True):
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
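When the full text is needed after streaming, the same loop can accumulate the pieces. The sketch below runs without a backend by using stand-in objects that mimic only the chunk.choices[0].delta.content shape used above. The helpers make_chunk and collect_stream are hypothetical and not part of llm_utils.

```python
from types import SimpleNamespace


def make_chunk(text):
    """Build a stand-in object shaped like a streamed chat chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks):
    """Join the text deltas of a chunk iterator into one string."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:  # some chunks may carry no text
            parts.append(content)
    return "".join(parts)


fake_stream = [make_chunk("Once "), make_chunk(None), make_chunk("upon a time")]
story = collect_stream(fake_stream)
```

With a real client, collect_stream(llm("Tell me a story", stream=True)) should behave the same way, assuming the chunks have the shape shown in the loop above.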
Raw prompt continuation with generate()
generate() uses the completions API and returns an OpenAI
CompletionChoice-like object.
choice = llm.generate(
    "Write a haiku about coding:",
    max_tokens=50,
    temperature=0.8,
)
print(choice.text)
print(choice.finish_reason)
print(choice.usage.total_tokens)
Current public behavior:
- generate() expects prompt to be a string.
- n=1 only; multi-choice generation is rejected.
- backend-specific metadata such as token_ids or prompt_logprobs is kept when the backend returns it.
Client configuration
from llm_utils import LLM
from openai import OpenAI
llm = LLM(
    client=OpenAI(base_url="http://localhost:8000/v1", api_key="sk-..."),
    model="llama-3",
)
llm = LLM(client=8000, model="llama-3")
llm = LLM(client="http://localhost:8000/v1", model="llama-3")
llm = LLM(client=[8000, 8001, 8002], model="llama-3")
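One way to think about the accepted client forms is as a normalization step to a list of base URLs. The sketch below is an assumption-heavy illustration, not the library's logic. The function normalize_client is hypothetical, the mapping from a port number to http://localhost:PORT/v1 is inferred from the parallel examples above, and how a list of endpoints is used (for example load balancing) is not specified in this README. An OpenAI client instance is deliberately left out.

```python
def normalize_client(client):
    """Turn a port, URL, or list of either into a list of base URLs."""
    if isinstance(client, (list, tuple)):
        return [url for item in client for url in normalize_client(item)]
    if isinstance(client, int):
        # assumption: a bare port refers to a local OpenAI-compatible server
        return [f"http://localhost:{client}/v1"]
    if isinstance(client, str):
        return [client]
    raise TypeError(f"unsupported client spec: {client!r}")


single = normalize_client(8000)
several = normalize_client([8000, "http://10.0.0.3:8000/v1"])
```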
Caching and history inspection
llm = LLM(model="gpt-4o-mini", cache=True)
message = llm("What is 2+2?")
again = llm("What is 2+2?")
fresh = llm("What is 2+2?", cache=False)
history = llm.inspect_history()
inspect_history() returns the recent conversation that was recorded for the
last response.
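The cache=True and per-call cache=False behavior can be pictured with a small in-memory sketch. The class CachedClient and the fake backend are hypothetical and say nothing about how llm_utils stores or keys its cache. This version assumes the key is a hash of the prompt plus call parameters and that a cache=False call neither reads nor writes the cache.

```python
import hashlib
import json


class CachedClient:
    """Wrap a callable backend with an in-memory response cache."""

    def __init__(self, backend, cache=True):
        self.backend = backend
        self.cache_enabled = cache
        self._store = {}

    def __call__(self, prompt, cache=None, **params):
        # A per-call cache argument overrides the instance default.
        use_cache = self.cache_enabled if cache is None else cache
        payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if use_cache and key in self._store:
            return self._store[key]
        response = self.backend(prompt, **params)
        if use_cache:
            self._store[key] = response
        return response


backend_calls = []


def fake_backend(prompt, **params):
    backend_calls.append(prompt)
    return f"answer #{len(backend_calls)}"


client = CachedClient(fake_backend)
message = client("What is 2+2?")
again = client("What is 2+2?")  # served from the cache
fresh = client("What is 2+2?", cache=False)  # forces a new backend call
```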
LLMSignature
LLMSignature binds a Signature class to default structured output.
from llm_utils import Input, LLMSignature, Output, Signature
class SentimentSignature(Signature):
    text: str = Input("Text to analyze")
    sentiment: str = Output("positive | negative | neutral")
    confidence: float = Output("Confidence score")
sig = LLMSignature(signature=SentimentSignature, model="gpt-4o-mini")
result = sig("Analyze: I love this!")
print(result.sentiment, result.confidence)
Qwen3LLM
Qwen3LLM adds staged prefix continuation for Qwen3-style reasoning flows.
Standard chat path:
from llm_utils import Qwen3LLM
llm = Qwen3LLM(client=8000)
message = llm.chat_completion(
    [{"role": "user", "content": "Solve x^2 + 2x + 1 = 0"}],
    thinking_max_tokens=32,
    content_max_tokens=128,
)
print(message.content)
print(getattr(message, "reasoning_content", None))
print(getattr(message, "call_count", None))
Custom staged prefix flow:
memory_state = llm.complete_until(
    [{"role": "user", "content": "Plan the answer in stages"}],
    "<memory>",
    stop="</memory>",
    max_tokens=128,
)
think_state = llm.complete_until(
    [{"role": "user", "content": "Plan the answer in stages"}],
    memory_state.assistant_prompt_prefix + "\n<think_efficient>",
    stop="</think_efficient>",
    max_tokens=256,
)
final_state = llm.complete_until(
    [{"role": "user", "content": "Plan the answer in stages"}],
    think_state.assistant_prompt_prefix,
    stop="<|im_end|>",
    max_tokens=256,
)
print(final_state.generated_text)
print(final_state.assistant_prompt_prefix)
print(final_state.call_count)
complete_until() returns a continuation state object, not a
ChatCompletionMessage.
Dataset Tools
datasets_utils.viz_chat is a lightweight dataset inspector for conversation
data.
Supported inputs:
- HuggingFace datasets saved with save_to_disk()
- JSONL files
- JSON files containing one object or a list of objects
- Folders of JSON files
- tokenized datasets when --tokenizer is provided
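As a rough illustration of how a tool might tell these inputs apart, the sketch below classifies a path using only the filesystem. It is not viz_chat's detection logic. The function detect_input_kind and its labels are hypothetical, and it relies on the assumption that save_to_disk() leaves a dataset_info.json (or dataset_dict.json for a DatasetDict) in the saved directory.

```python
import tempfile
from pathlib import Path


def detect_input_kind(path):
    """Guess the kind of chat dataset stored at path."""
    path = Path(path)
    if path.is_dir():
        # Check HF markers first, since they also match the *.json glob below.
        markers = ("dataset_info.json", "dataset_dict.json")
        if any((path / name).exists() for name in markers):
            return "hf_dataset"
        if any(path.glob("*.json")):
            return "json_folder"
        return "unknown"
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        return "jsonl"
    if suffix == ".json":
        return "json"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    hf_dir = Path(tmp) / "hf_dataset"
    hf_dir.mkdir()
    (hf_dir / "dataset_info.json").write_text("{}")
    json_dir = Path(tmp) / "json_folder"
    json_dir.mkdir()
    (json_dir / "chat_0.json").write_text("{}")
    hf_kind = detect_input_kind(hf_dir)
    folder_kind = detect_input_kind(json_dir)

file_kind = detect_input_kind("data/conversations.jsonl")
```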
Examples:
viz_chat data/my_dataset
viz_chat data/conversations.jsonl
viz_chat data/sharegpt.jsonl --format sharegpt
viz_chat data/tokenized_dataset/ --tokenizer Qwen/Qwen3-8B
viz_chat data/with_tools.jsonl --show-tools
Vision Utils
vision_utils exports:
- read_images
- read_images_cpu
- read_images_gpu
- plot_images_notebook
- ImageMmap
- ImageMmapDynamic
Image loading
The image loaders return a dict mapping each input path to a NumPy array or
None on failure.
from vision_utils import read_images, read_images_cpu, read_images_gpu
paths = ["img1.jpg", "img2.png"]
images = read_images(paths)
cpu_images = read_images_cpu(paths)
gpu_images = read_images_gpu(paths)
first_image = images[paths[0]]
Notebook plotting
plot_images_notebook() accepts NumPy arrays, PyTorch tensors, lists, or tuples
of image arrays. If you loaded images with read_images*, pass the values.
from vision_utils import plot_images_notebook, read_images
paths = ["img1.jpg", "img2.png"]
images = read_images(paths)
plot_images_notebook(list(images.values()))
The current defaults include dpi=300, automatic grid sizing, and automatic
format normalization for (H, W), (H, W, C), (C, H, W), (B, H, W, C),
and (B, C, H, W) inputs.
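Format normalization of this kind usually starts by inferring the layout from the array shape. The sketch below works on plain shape tuples so it runs without NumPy. The function infer_layout and the rule that a channel axis has size 1, 3, or 4 are assumptions for illustration, not the heuristic used by plot_images_notebook.

```python
CHANNEL_SIZES = (1, 3, 4)  # assumption: grayscale, RGB, or RGBA


def infer_layout(shape):
    """Guess the axis layout of an image array from its shape tuple."""
    if len(shape) == 2:
        return "HW"
    if len(shape) == 3:
        if shape[-1] in CHANNEL_SIZES:
            return "HWC"
        if shape[0] in CHANNEL_SIZES:
            return "CHW"
    if len(shape) == 4:
        if shape[-1] in CHANNEL_SIZES:
            return "BHWC"
        if shape[1] in CHANNEL_SIZES:
            return "BCHW"
    raise ValueError(f"cannot infer image layout from shape {shape}")


layouts = [
    infer_layout(s)
    for s in [(224, 224), (224, 224, 3), (3, 224, 224), (8, 224, 224, 3), (8, 3, 224, 224)]
]
```

Shapes such as (3, 3, 3) are ambiguous under any rule like this; the sketch resolves them as channels-last.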
Mmap-backed datasets
Both mmap dataset classes take a list of image paths as their positional argument, not a prebuilt mmap filename.
from vision_utils import ImageMmap, ImageMmapDynamic
paths = ["img1.jpg", "img2.jpg"]
fixed = ImageMmap(paths, size=(224, 224))
dynamic = ImageMmapDynamic(paths)
img = fixed[0]
img2 = dynamic[0]
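The reason a fixed size helps can be shown with the standard-library mmap module: when every record has the same byte length, item i lives at a predictable offset and can be read without loading the rest of the file. The class FixedRecordMmap below is a hypothetical teaching sketch over raw bytes. It does not reflect the file layout or API of ImageMmap.

```python
import mmap
import os
import tempfile


class FixedRecordMmap:
    """Read fixed-size records from a file through a memory map."""

    def __init__(self, path, record_size):
        self.record_size = record_size
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // self.record_size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.record_size  # offset follows from the fixed size
        return self._map[start:start + self.record_size]

    def close(self):
        self._map.close()
        self._file.close()


height, width, channels = 2, 2, 3
record_size = height * width * channels  # bytes per uint8 image

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "images.bin")
    with open(path, "wb") as handle:
        handle.write(bytes([10]) * record_size)   # first fake image
        handle.write(bytes([200]) * record_size)  # second fake image
    dataset = FixedRecordMmap(path, record_size)
    count = len(dataset)
    second = dataset[1]
    dataset.close()
```

Variable-size images need an extra index of offsets and shapes, which is presumably the role of a dynamic variant.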
Testing and Checks
# Run all tests with xdist workers
./tools/uv_test.sh -n 32
# Single test file
./tools/uv_test.sh tests/test_thread.py
# Verbose
./tools/uv_test.sh -v
# Check import-time budget
uv run python scripts/debug_import_time.py speedy_utils llm_utils vision_utils \
--max-total-sec 0.4 --top 12 --min-sec 0.01 --no-stdlib
# Type checking
uv run python tools/check_syntax.py
# Ruff
uv run ruff check .
uv run ruff format .
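For a quick local check of the import-time budget without the repository script, a rough measurement can be made with importlib and time.perf_counter. The helpers measure_import and within_budget are hypothetical and do not replicate scripts/debug_import_time.py. Only the top-level module is evicted from sys.modules, so already-imported submodules and dependencies make the number optimistic.

```python
import importlib
import sys
import time


def measure_import(module_name):
    """Return the wall-clock seconds taken to import module_name."""
    sys.modules.pop(module_name, None)  # force a fresh top-level import
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def within_budget(module_names, max_total_sec=0.4):
    """Check whether the summed import time stays under the budget."""
    timings = {name: measure_import(name) for name in module_names}
    return sum(timings.values()) <= max_total_sec, timings


# A small stdlib module stands in for the real packages here.
ok, timings = within_budget(["colorsys"], max_total_sec=5.0)
```

Running the check in a fresh interpreter per package gives more trustworthy numbers than measuring inside a long-lived session.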
TDD and Regression Testing
- Every bug fix should include a regression test that reproduces the bug first.
- Keep regression tests deterministic (no flaky time/random/network behavior).
- Test through public APIs and assert specific outcomes.
- Keep one behavior per test so failures are easy to diagnose.
- Prefer fast, isolated tests that are run frequently.
See the full playbook in docs/TDD.md.