FastNum
Zero-boilerplate conversion from raw data to numeric NumPy arrays.
FastNum detects the kind of data you hand it — an image path, a sentence, a category list, or a batch of any of these — and returns the right numeric representation without a single line of configuration.
Why FastNum?
Most ML preprocessing pipelines repeat the same four patterns:
| Input | Desired output |
|---|---|
| Image file | Float32 pixel array normalised to [0, 1] |
| Text sentence | Integer token-ID sequence |
| Flat category list | One-hot matrix |
| Batch of sentences | Padded token-ID matrix |
FastNum collapses all four into one call: fn.to_num(data).
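One way this kind of input detection could work is sketched below. It is an illustrative heuristic only, not FastNum's actual dispatch logic: the function name guess_kind, the extension list, and the whitespace rule for telling sentences from category labels are all assumptions made for this example. NumPy arrays and other inputs are left out for brevity.

```python
from pathlib import Path

# Hypothetical extension list; FastNum's real detection rules may differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def _looks_like_image(s: str) -> bool:
    """True when the string ends in a known image file extension."""
    return Path(s).suffix.lower() in IMAGE_SUFFIXES


def guess_kind(data) -> str:
    """Classify an input into one of the kinds from the table above."""
    if isinstance(data, str):
        return "image" if _looks_like_image(data) else "sentence"
    if isinstance(data, (list, tuple)) and data and all(isinstance(x, str) for x in data):
        if all(_looks_like_image(x) for x in data):
            return "image_batch"
        # Assumption: inner whitespace means the items are sentences, not labels.
        if any(" " in x.strip() for x in data):
            return "sentence_batch"
        return "category_list"
    return "unknown"


for sample in ("photo.jpg", "the cat sat", ["dog", "cat"], ["hello world", "foo bar baz"]):
    print(repr(sample), "->", guess_kind(sample))
```

A heuristic like this is ambiguous by nature: a list of one-word sentences is indistinguishable from a category list, so inputs near that boundary are worth checking by hand.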
Installation
pip install fastnum
Or from source:
git clone https://github.com/your-username/fastnum.git
cd fastnum
pip install -e ".[dev]"
Requirements: Python ≥ 3.9, numpy ≥ 1.24, opencv-python ≥ 4.8.
Quick start
from fastnum import FastNum
fn = FastNum()
# --- Image -----------------------------------------------------------
pixels = fn.to_num("photo.jpg") # (H, W, 3) float32, values in [0, 1]
pixels = fn.to_num("photo.jpg", image_size=(224, 224)) # resize on the fly
# Batch of images (all resized to the same shape for stacking)
batch = fn.to_num(["a.jpg", "b.jpg"], image_size=(224, 224)) # (2, 224, 224, 3)
# --- Plain text ------------------------------------------------------
tokens = fn.to_num("the cat sat on the mat") # int32 array of token IDs
print(fn.decode(tokens)) # → "the cat sat on the mat"
# --- Category list ---------------------------------------------------
labels = ["dog", "cat", "dog", "bird"]
one_hot = fn.to_num(labels)
# array([[0., 1., 0.],
# [1., 0., 0.],
# [0., 1., 0.],
# [0., 0., 1.]], dtype=float32)
# --- Sentence batch --------------------------------------------------
matrix = fn.to_num(["hello world", "foo bar baz"])
# int32 matrix (2, 3), shorter rows are right-padded with pad_token_id
# --- Raw NumPy array -------------------------------------------------
import numpy as np
fn.to_num(np.array([1, 2, 3])) # cast to float32, no-op otherwise
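For readers curious what the image calls above amount to numerically, here is a minimal NumPy-only sketch. It assumes 8-bit input and uses a synthetic array in place of a decoded file, so it runs without OpenCV. The helper names normalise_pixels and stack_batch are hypothetical and are not part of FastNum.

```python
import numpy as np


def normalise_pixels(img_uint8: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values 0..255 to float32 values in [0, 1]."""
    return img_uint8.astype(np.float32) / 255.0


def stack_batch(images):
    """Stack same-shaped (H, W, C) arrays into one (N, H, W, C) array."""
    return np.stack(images, axis=0)


# Synthetic 4x5 RGB image standing in for a decoded file.
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

pixels = normalise_pixels(fake)
batch = stack_batch([pixels, pixels])
print(pixels.dtype, pixels.shape, batch.shape)
```

Stacking only works when every image has the same shape, which is why the batch example passes image_size.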
API reference
FastNum(pad_token_id=0)
| Parameter | Type | Default | Description |
|---|---|---|---|
| pad_token_id | int | 0 | ID reserved for the [PAD] token. The special token is inserted into the vocabulary at construction time so real words are always assigned different IDs. |
to_num(data, image_size=None) → np.ndarray
| Parameter | Type | Description |
|---|---|---|
| data | str \| list[str] \| np.ndarray | Input to convert. |
| image_size | tuple[int, int] \| None | Target (H, W) for image resizing. |
Return type depends on input:
| Input | dtype | Shape |
|---|---|---|
| Image path / list of paths | float32 | (H, W, C) / (N, H, W, C) |
| Sentence | int32 | (T,) |
| Category list | float32 | (N, num_classes) |
| Sentence batch | int32 | (N, max_len) |
| np.ndarray | float32 | same as input |
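The category-list and sentence-batch shapes in the table can be reproduced by hand with plain NumPy. The sketch below is not FastNum's implementation: one_hot and pad_rows are hypothetical helpers, and the alphabetical column order used for the classes is an assumption that need not match the order FastNum produces.

```python
import numpy as np


def one_hot(labels):
    """Return an (N, num_classes) float32 matrix; classes sorted alphabetically."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    out = np.zeros((len(labels), len(classes)), dtype=np.float32)
    # Set exactly one cell per row: the column of that row's class.
    out[np.arange(len(labels)), [index[label] for label in labels]] = 1.0
    return out


def pad_rows(rows, pad_token_id=0):
    """Right-pad variable-length ID lists into an (N, max_len) int32 matrix."""
    max_len = max(len(r) for r in rows)
    out = np.full((len(rows), max_len), pad_token_id, dtype=np.int32)
    for i, r in enumerate(rows):
        out[i, : len(r)] = r
    return out


print(one_hot(["dog", "cat", "dog", "bird"]).shape)  # (4, 3)
print(pad_rows([[1, 2], [3, 4, 5]]).tolist())        # [[1, 2, 0], [3, 4, 5]]
```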
decode(token_ids) → str
Converts a token-ID sequence back to whitespace-separated text. Padding tokens are silently dropped.
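The behaviour described here can be sketched as a standalone function. This is an assumption-laden illustration rather than the library's code: it takes the inverse vocabulary as an explicit argument, whereas FastNum's decode is a method with the signature shown above.

```python
def decode(token_ids, inverse_vocab, pad_token_id=0):
    """Join the words for token_ids with single spaces, skipping padding."""
    return " ".join(
        inverse_vocab[int(i)] for i in token_ids if int(i) != pad_token_id
    )


# Hypothetical vocabulary for the demonstration.
inverse_vocab = {0: "[PAD]", 1: "hello", 2: "world"}
print(decode([1, 2, 0], inverse_vocab))  # hello world
```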
vocab_size → int
Number of entries currently in the vocabulary, including [PAD].
The [PAD] token and collision safety
FastNum reserves pad_token_id inside the vocabulary at construction time:
self.vocab = {"[PAD]": pad_token_id}
self.inverse_vocab = {pad_token_id: "[PAD]"}
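Because [PAD] occupies a slot before any text is tokenised, _get_or_add assigns each new word an ID equal to len(self.vocab), so word IDs start at 1 and only increase. With the default pad_token_id=0, a word ID can therefore never equal pad_token_id. (As described, this guarantee depends on pad_token_id being less than 1; a positive pad_token_id would eventually be reached by len(self.vocab) unless _get_or_add skips it.) This means: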
- A padded cell in a token matrix will never decode to a real word.
- decode() does not need a special-case filter beyond i != self.pad_token_id; the two sets are disjoint by construction.
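The reservation scheme can be sketched as a small class. TinyVocab and get_or_add are hypothetical names, not FastNum's source. Instead of len(self.vocab), this sketch uses a counter that skips any ID already taken; that yields the same IDs as the len-based rule for the default pad_token_id=0 (1, 2, 3, ...) and, as an extension beyond what the text describes, also stays collision-free for other pad IDs.

```python
class TinyVocab:
    """Hypothetical stand-in for FastNum's vocabulary handling."""

    def __init__(self, pad_token_id: int = 0):
        self.pad_token_id = pad_token_id
        # [PAD] is registered before any real word is seen.
        self.vocab = {"[PAD]": pad_token_id}
        self.inverse_vocab = {pad_token_id: "[PAD]"}
        self._next_id = 0

    def get_or_add(self, word: str) -> int:
        """Return the ID for word, assigning a fresh one on first sight."""
        if word not in self.vocab:
            # Skip any ID that is already taken, including the pad ID.
            while self._next_id in self.inverse_vocab:
                self._next_id += 1
            self.vocab[word] = self._next_id
            self.inverse_vocab[self._next_id] = word
        return self.vocab[word]


v = TinyVocab()
ids = [v.get_or_add(w) for w in "the cat sat on the mat".split()]
print(ids)  # [1, 2, 3, 4, 1, 5]
```

Because the pad ID never appears among the word IDs, a filter on pad_token_id alone is enough when decoding.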
Development
# Run tests with coverage
pytest
# Lint
ruff check fastnum
# Type-check
mypy fastnum
License
MIT © your-username