FastNum

Zero-boilerplate conversion from raw data to numeric NumPy arrays.

FastNum detects the kind of data you hand it — an image path, a sentence, a category list, or a batch of any of these — and returns the right numeric representation without a single line of configuration.


Why FastNum?

Most ML preprocessing pipelines repeat the same four patterns:

Input               Desired output
Image file          Float32 pixel array normalised to [0, 1]
Text sentence       Integer token-ID sequence
Flat category list  One-hot matrix
Batch of sentences  Padded token-ID matrix

FastNum collapses all four into one call: fn.to_num(data).


Installation

pip install fastnum

Or from source:

git clone https://github.com/your-username/fastnum.git
cd fastnum
pip install -e ".[dev]"

Requirements: Python ≥ 3.9, numpy ≥ 1.24, opencv-python ≥ 4.8.


Quick start

from fastnum import FastNum

fn = FastNum()

# --- Image -----------------------------------------------------------
pixels = fn.to_num("photo.jpg")          # (H, W, 3) float32, values in [0, 1]
pixels = fn.to_num("photo.jpg", image_size=(224, 224))  # resize on the fly

# Batch of images (all resized to the same shape for stacking)
batch = fn.to_num(["a.jpg", "b.jpg"], image_size=(224, 224))  # (2, 224, 224, 3)

# --- Plain text ------------------------------------------------------
tokens = fn.to_num("the cat sat on the mat")   # int32 array of token IDs
print(fn.decode(tokens))                        # → "the cat sat on the mat"

# --- Category list ---------------------------------------------------
labels = ["dog", "cat", "dog", "bird"]
one_hot = fn.to_num(labels)
# array([[0., 1., 0.],
#        [1., 0., 0.],
#        [0., 1., 0.],
#        [0., 0., 1.]], dtype=float32)

# --- Sentence batch --------------------------------------------------
matrix = fn.to_num(["hello world", "foo bar baz"])
# int32 matrix (2, 3), shorter rows are right-padded with pad_token_id

# --- Raw NumPy array -------------------------------------------------
import numpy as np
fn.to_num(np.array([1, 2, 3]))              # cast to float32, no-op otherwise
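
For comparison, the one-hot pattern can be written by hand in a few lines of NumPy. This is a minimal sketch, not FastNum's implementation: the helper name one_hot_sketch is hypothetical, and it orders columns by sorted class name, which is an assumption and may differ from the column order FastNum produces.

```python
import numpy as np

def one_hot_sketch(labels):
    """Hypothetical helper: one-hot encode a flat list of category labels."""
    classes = sorted(set(labels))          # assumption: columns in sorted order
    index = {name: i for i, name in enumerate(classes)}
    ids = np.array([index[name] for name in labels])
    # Each row of the identity matrix is the one-hot vector for that class index.
    return np.eye(len(classes), dtype=np.float32)[ids], classes

matrix, classes = one_hot_sketch(["dog", "cat", "dog", "bird"])
print(classes)        # ['bird', 'cat', 'dog']
print(matrix.shape)   # (4, 3)
```

Whatever the column order, every row sums to 1 and identical labels map to identical rows.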

API reference

FastNum(pad_token_id=0)

Parameter     Type  Default  Description
pad_token_id  int   0        ID reserved for the [PAD] token. The special token is inserted into the vocabulary at construction time. With the default of 0, real words are always assigned different IDs.

to_num(data, image_size=None) → np.ndarray

Parameter   Type                          Description
data        str | list[str] | np.ndarray  Input to convert.
image_size  tuple[int, int] | None        Target (H, W) for image resizing.
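
Passing image_size matters most for batches, because images can only be stacked into one (N, H, W, C) array when they all share a shape. The sketch below illustrates this with plain NumPy on synthetic arrays. The helper normalise_pixels is hypothetical and stands in for the 8-bit to [0, 1] scaling step; no files are read and no resizing is performed.

```python
import numpy as np

def normalise_pixels(image_uint8):
    """Hypothetical helper: scale 8-bit pixel values into float32 [0, 1]."""
    return image_uint8.astype(np.float32) / 255.0

# Synthetic stand-ins for decoded images (assumption: 8-bit, 3 channels).
small = np.full((2, 2, 3), 255, dtype=np.uint8)
large = np.zeros((4, 4, 3), dtype=np.uint8)

same_shape = np.stack([normalise_pixels(small), normalise_pixels(small)])
print(same_shape.shape, same_shape.dtype)   # (2, 2, 2, 3) float32

try:
    np.stack([normalise_pixels(small), normalise_pixels(large)])
    stacked_mixed = True
except ValueError:
    # np.stack rejects inputs whose shapes differ.
    stacked_mixed = False
print(stacked_mixed)                        # False
```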

The result is always an np.ndarray; its dtype and shape depend on the input:

Input                       dtype    Shape
Image path / list of paths  float32  (H, W, C) / (N, H, W, C)
Sentence                    int32    (T,)
Category list               float32  (N, num_classes)
Sentence batch              int32    (N, max_len)
np.ndarray                  float32  same as input
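
The (N, max_len) shape for a sentence batch comes from right-padding the shorter rows. A minimal sketch of that step follows, assuming the sentences have already been turned into lists of token IDs. The helper pad_batch and the ID values are illustrative, not taken from FastNum.

```python
import numpy as np

def pad_batch(sequences, pad_token_id=0):
    """Hypothetical helper: right-pad variable-length ID lists into one int32 matrix."""
    max_len = max(len(seq) for seq in sequences)
    # Start with a matrix full of padding, then copy each sequence into its row.
    matrix = np.full((len(sequences), max_len), pad_token_id, dtype=np.int32)
    for row, seq in enumerate(sequences):
        matrix[row, :len(seq)] = seq
    return matrix

batch = pad_batch([[1, 2], [3, 4, 5]])   # made-up token IDs
print(batch.shape)      # (2, 3)
print(batch.tolist())   # [[1, 2, 0], [3, 4, 5]]
```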

decode(token_ids) → str

Converts a token-ID sequence back to whitespace-separated text. Padding tokens are silently dropped.
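
The behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's code: decode_sketch and the toy inverse_vocab mapping are hypothetical.

```python
def decode_sketch(token_ids, inverse_vocab, pad_token_id=0):
    """Hypothetical helper: map IDs back to words, skipping padding."""
    words = [inverse_vocab[int(i)] for i in token_ids if int(i) != pad_token_id]
    return " ".join(words)

# Toy ID-to-word mapping (assumption, for illustration only).
inverse_vocab = {0: "[PAD]", 1: "hello", 2: "world"}
text = decode_sketch([1, 2, 0, 0], inverse_vocab)
print(text)   # hello world
```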

vocab_size → int

Number of entries currently in the vocabulary, including [PAD].


The [PAD] token and collision safety

FastNum reserves pad_token_id inside the vocabulary at construction time:

self.vocab        = {"[PAD]": pad_token_id}
self.inverse_vocab = {pad_token_id: "[PAD]"}

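Because [PAD] occupies a slot before any text is tokenised, _get_or_add assigns each new word an ID equal to len(self.vocab), which starts at 1 and only grows. With the default pad_token_id=0, a word ID can therefore never equal the pad ID. A non-zero pad_token_id is not protected by the len(self.vocab) rule alone, because the vocabulary size eventually reaches that value. With the default, this means:

  • A padded cell in a token matrix will never decode to a real word.
  • decode() needs no filter beyond i != self.pad_token_id, because the two ID sets are disjoint by construction.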
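
The disjointness argument can be checked with a small sketch. Assuming IDs are allocated exactly as len(vocab), as described above, the guarantee holds for pad_token_id=0, but a non-zero pad ID is eventually handed to a real word. Both function names are hypothetical, and the second variant is one possible fix, not FastNum's code.

```python
def assign_ids(words, pad_token_id):
    """Allocate IDs as described above: each new word gets len(vocab)."""
    vocab = {"[PAD]": pad_token_id}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def assign_ids_skipping_pad(words, pad_token_id):
    """Variant (an assumption) that never hands out the pad ID."""
    vocab = {"[PAD]": pad_token_id}
    next_id = 0
    for word in words:
        if word in vocab:
            continue
        if next_id == pad_token_id:
            next_id += 1          # step over the reserved ID
        vocab[word] = next_id
        next_id += 1
    return vocab

words = ["the", "cat", "sat"]
print(assign_ids(words, pad_token_id=0))
# {'[PAD]': 0, 'the': 1, 'cat': 2, 'sat': 3}
print(assign_ids(words, pad_token_id=2))
# {'[PAD]': 2, 'the': 1, 'cat': 2, 'sat': 3}   <- 'cat' collides with [PAD]
print(assign_ids_skipping_pad(words, pad_token_id=2))
# {'[PAD]': 2, 'the': 0, 'cat': 1, 'sat': 3}
```

If a non-zero pad_token_id is needed, it is worth confirming that no word shares the pad ID before relying on decode() to drop only padding.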

Development

# Run tests with coverage
pytest

# Lint
ruff check fastnum

# Type-check
mypy fastnum

License

MIT © Ali
