Zero-boilerplate converter from raw data (images, text, categories) to numeric NumPy arrays.

FastNum

Zero-boilerplate conversion from raw data to numeric NumPy arrays.

FastNum detects the kind of data you hand it — an image path, a sentence, a category list, or a batch of any of these — and returns the right numeric representation without a single line of configuration.


Why FastNum?

Most ML preprocessing pipelines repeat the same four patterns:

Input               Desired output
Image file          Float32 pixel array normalised to [0, 1]
Text sentence       Integer token-ID sequence
Flat category list  One-hot matrix
Batch of sentences  Padded token-ID matrix

FastNum collapses all four into one call: fn.to_num(data).
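
The source does not say how FastNum decides which of the four patterns applies. The following is only a rough sketch of one plausible heuristic, using the standard library alone. The names `IMAGE_SUFFIXES`, `looks_like_image`, and `guess_kind` are hypothetical and are not part of FastNum's API.

```python
from pathlib import Path

# Hypothetical heuristic for illustration; FastNum's real detection may differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def looks_like_image(value: str) -> bool:
    """Treat a string as an image path if it ends in a known image suffix."""
    return Path(value).suffix.lower() in IMAGE_SUFFIXES


def guess_kind(data) -> str:
    """Return a label for the kind of input, mirroring the four patterns above."""
    if isinstance(data, str):
        return "image" if looks_like_image(data) else "sentence"
    if isinstance(data, (list, tuple)) and data and all(isinstance(x, str) for x in data):
        if all(looks_like_image(x) for x in data):
            return "image_batch"
        # Whitespace inside any item suggests sentences rather than labels.
        if any(" " in x.strip() for x in data):
            return "sentence_batch"
        return "category_list"
    raise TypeError(f"unsupported input: {type(data).__name__}")


print(guess_kind("photo.jpg"))                     # image
print(guess_kind("the cat sat on the mat"))        # sentence
print(guess_kind(["dog", "cat", "dog", "bird"]))   # category_list
print(guess_kind(["hello world", "foo bar baz"]))  # sentence_batch
```

A heuristic like this is ambiguous for a list of single-word sentences, which it would classify as categories. That edge case is worth checking against the library's actual behaviour.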


Installation

pip install fastnum

Or from source:

git clone https://github.com/your-username/fastnum.git
cd fastnum
pip install -e ".[dev]"

Requirements: Python ≥ 3.9, numpy ≥ 1.24, opencv-python ≥ 4.8.


Quick start

from fastnum import FastNum

fn = FastNum()

# --- Image -----------------------------------------------------------
pixels = fn.to_num("photo.jpg")          # (H, W, 3) float32, values in [0, 1]
pixels = fn.to_num("photo.jpg", image_size=(224, 224))  # resize on the fly

# Batch of images (all resized to the same shape for stacking)
batch = fn.to_num(["a.jpg", "b.jpg"], image_size=(224, 224))  # (2, 224, 224, 3)

# --- Plain text ------------------------------------------------------
tokens = fn.to_num("the cat sat on the mat")   # int32 array of token IDs
print(fn.decode(tokens))                        # → "the cat sat on the mat"

# --- Category list ---------------------------------------------------
labels = ["dog", "cat", "dog", "bird"]
one_hot = fn.to_num(labels)
# array([[0., 1., 0.],
#        [1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.]], dtype=float32)

# --- Sentence batch --------------------------------------------------
matrix = fn.to_num(["hello world", "foo bar baz"])
# int32 matrix (2, 3), shorter rows are right-padded with pad_token_id

# --- Raw NumPy array -------------------------------------------------
import numpy as np
fn.to_num(np.array([1, 2, 3]))              # returned as float32; values and shape unchanged

API reference

FastNum(pad_token_id=0)

Parameter     Type  Default  Description
pad_token_id  int   0        ID reserved for the [PAD] token. The special token is inserted into the vocabulary at construction time so real words are always assigned different IDs.

to_num(data, image_size=None) → np.ndarray

Parameter   Type                          Description
data        str | list[str] | np.ndarray  Input to convert.
image_size  tuple[int, int] | None        Target (H, W) for image resizing.

The dtype and shape of the returned array depend on the input:

Input                       dtype    Shape
Image path / list of paths  float32  (H, W, C) / (N, H, W, C)
Sentence                    int32    (T,)
Category list               float32  (N, num_classes)
Sentence batch              int32    (N, max_len)
np.ndarray                  float32  same as input
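
To make the dtypes and shapes above concrete, here is a minimal NumPy-only sketch of three of the conversions. The helpers `normalise_pixels`, `one_hot`, and `pad_batch` are hypothetical and are not FastNum functions. Sorting the class names to fix the column order is an assumption, so FastNum's own column order may differ.

```python
import numpy as np

# Hypothetical helpers for illustration; FastNum's internals may differ.


def normalise_pixels(img_uint8: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels in [0, 255] to float32 values in [0, 1]."""
    return img_uint8.astype(np.float32) / 255.0


def one_hot(labels: list[str]) -> tuple[np.ndarray, list[str]]:
    """One-hot encode labels; columns follow sorted class order (an assumption)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    rows = [index[label] for label in labels]
    # Selecting rows of the identity matrix yields one one-hot row per label.
    return np.eye(len(classes), dtype=np.float32)[rows], classes


def pad_batch(sequences: list[list[int]], pad_id: int = 0) -> np.ndarray:
    """Right-pad integer sequences with pad_id up to the longest length."""
    max_len = max(len(s) for s in sequences)
    out = np.full((len(sequences), max_len), pad_id, dtype=np.int32)
    for i, seq in enumerate(sequences):
        out[i, : len(seq)] = seq
    return out


pixels = normalise_pixels(np.zeros((4, 6, 3), dtype=np.uint8))
print(pixels.dtype, pixels.shape)   # float32 (4, 6, 3)

matrix, classes = one_hot(["dog", "cat", "dog", "bird"])
print(classes, matrix.shape)        # ['bird', 'cat', 'dog'] (4, 3)

padded = pad_batch([[1, 2], [3, 4, 5]])
print(padded.dtype, padded.shape)   # int32 (2, 3)
```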

decode(token_ids) → str

Converts a token-ID sequence back to whitespace-separated text. Padding tokens are silently dropped.

vocab_size → int

Number of entries currently in the vocabulary, including [PAD].


The [PAD] token and collision safety

FastNum reserves pad_token_id inside the vocabulary at construction time:

self.vocab        = {"[PAD]": pad_token_id}
self.inverse_vocab = {pad_token_id: "[PAD]"}

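Because [PAD] occupies a slot before any text is tokenised, _get_or_add assigns each new word the ID len(self.vocab), so word IDs start at 1 and only increase. With the default pad_token_id=0, no word can ever receive the padding ID. Note that this argument depends on the padding ID lying below the first word ID: with a positive value such as pad_token_id=5, a counter based purely on len(self.vocab) would reach 5 on the fifth new word, so verify the behaviour before relying on a non-default value. With the default setting, this means: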

  • A padded cell in a token matrix will never decode to a real word.
  • decode() does not need a special-case filter beyond i != self.pad_token_id — the two sets are disjoint by construction.
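
The mechanism described in this section can be sketched in a few lines of plain Python. `TinyVocab` is a hypothetical stand-in written for illustration, not FastNum's actual class. It keeps the default pad_token_id=0, and its `encode` method is an assumed helper that simply splits on whitespace.

```python
class TinyVocab:
    """Hypothetical stand-in for the vocabulary described above."""

    def __init__(self, pad_token_id: int = 0) -> None:
        self.pad_token_id = pad_token_id
        # [PAD] is registered first, so the vocabulary is never empty.
        self.vocab = {"[PAD]": pad_token_id}
        self.inverse_vocab = {pad_token_id: "[PAD]"}

    def _get_or_add(self, word: str) -> int:
        if word not in self.vocab:
            new_id = len(self.vocab)  # 1, 2, 3, ... since [PAD] already holds a slot
            self.vocab[word] = new_id
            self.inverse_vocab[new_id] = word
        return self.vocab[word]

    def encode(self, sentence: str) -> list[int]:
        return [self._get_or_add(w) for w in sentence.split()]

    def decode(self, token_ids) -> str:
        # Padding is dropped with a single comparison, as described above.
        return " ".join(
            self.inverse_vocab[i] for i in token_ids if i != self.pad_token_id
        )


v = TinyVocab()
ids = v.encode("the cat sat on the mat")
print(ids)                      # [1, 2, 3, 4, 1, 5]
print(v.decode(ids + [0, 0]))   # the cat sat on the mat
print(len(v.vocab))             # 6 (five words plus [PAD])
```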

Development

# Run tests with coverage
pytest

# Lint
ruff check fastnum

# Type-check
mypy fastnum

License

MIT © your-username

Download files

Source Distribution

fastnum-0.1.1.tar.gz (7.2 kB)

Uploaded Source

Built Distribution

fastnum-0.1.1-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file fastnum-0.1.1.tar.gz.

File metadata

  • Download URL: fastnum-0.1.1.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for fastnum-0.1.1.tar.gz
Algorithm Hash digest
SHA256 ffca40a01b03c3e1e660c1174008fed3f87a99db72c339b2d7211c0719932f76
MD5 503bb99004c1a860433a117088a7405c
BLAKE2b-256 9c36fda29cfdc9a94dccae4911c56c9820c8e3751db8d7dc0a5c14d50b972f3f

File details

Details for the file fastnum-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: fastnum-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for fastnum-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 052725d7807af7ca82d4a8505005178de4420edf2a8995713cf13042c6432235
MD5 fa19ef001187c27d597aa3e7de616b74
BLAKE2b-256 f4c48a5b5ac186e1c3e0cd7aec05de5a0616b309e02e2e71c2326e51dc2a23bf
