
Unified image to tensor utility with streaming TFRecord support


img2tensor

A unified, high-performance utility to convert images into training-ready tensors for NumPy, PyTorch, and TensorFlow, or stream them directly into TFRecords.

img2tensor handles the standard deep learning data ingestion "papercuts": BGR/RGB swaps, memory layouts (NHWC vs NCHW), and dtype scaling in a single, lightweight function.


✨ Key Features

  • Multi-Framework Support: Automatic conversion to np.ndarray, torch.Tensor (NCHW), or tf.Tensor.
  • Lossless Augmentations: Geometric transformations (orthogonal rotations and flips) implemented as pure NumPy index rearrangements, so no interpolation is involved and no pixel values drift.
  • High-Fidelity Resizing: Support for standard and aspect-ratio-preserving (letterboxed) resizing with synchronized interpolation across PIL and OpenCV backends.
  • Deterministic Parallelism: Thread-safe execution with per-image seeding to guarantee reproducible results across runs.
  • Automatic Memory Management: Internal RAM monitoring (70% threshold) to auto-calculate batch sizes and prevent OOM (Out-Of-Memory) crashes.
  • Production Streaming: Native sharded TFRecord output for massive datasets, enabling parallel I/O during training.

🚀 Installation

pip install img2tensor


📖 Usage

1. Single Image (In-Memory)

Returns a 3D tensor ($C, H, W$ for PyTorch).

    import img2tensor

    # Returns: torch.Tensor of shape (3, 224, 224)
    tensor = img2tensor.get_tensor("cat.jpg", tensor_type="pytorch")

2. Batch Loading (In-Memory)

Returns a 4D tensor ($N, H, W, C$ for NumPy/TF).

    # Returns: np.ndarray of shape (32, 224, 224, 3)
    batch = img2tensor.get_tensor(list_of_paths, n_jobs=8)

3. Production Pipeline (TFRecord)

Writes to disk using a chunked streaming approach to save RAM.

    img2tensor.get_tensor(
        img_paths=large_list_of_paths,
        output_format="tfrecord",
        tfrecord_path="dataset.tfrecord",
        n_jobs=12,
    )
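One way to picture sharded output (num_shards greater than 1) is a round-robin split of the path list before writing. The sketch below is an assumption about how such a split could work, not img2tensor's actual implementation; split_into_shards, shard_filename, and the "-00000-of-00004" suffix are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def split_into_shards(paths: Sequence[str], num_shards: int) -> List[List[str]]:
    """Distribute paths round-robin so shard sizes differ by at most one."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    return [list(paths[i::num_shards]) for i in range(num_shards)]


def shard_filename(base: str, index: int, num_shards: int) -> str:
    """Hypothetical naming scheme, e.g. dataset.tfrecord-00000-of-00004."""
    return f"{base}-{index:05d}-of-{num_shards:05d}"


paths = [f"img_{i}.jpg" for i in range(10)]
shards = split_into_shards(paths, 4)
names = [shard_filename("dataset.tfrecord", i, 4) for i in range(4)]

for name, shard in zip(names, shards):
    print(name, len(shard))
```

Balanced shards let several readers consume the dataset in parallel without one file becoming the bottleneck.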

4. High-Fidelity Resizing (Letterboxed)

Resize images while maintaining the original aspect ratio using high-quality bicubic interpolation.

    import img2tensor

    # Returns: torch.Tensor of shape (3, 224, 224)
    # Pads with black (default) to keep the original image proportions
    tensor = img2tensor.get_tensor(
        "input.jpg",
        tensor_type="pytorch",
        resize=(224, 224),
        preserve_aspect_ratio=True,
    )

🧠 Resizing and Augmentation Logic

get_tensor takes a "Quality-First" approach to data preparation. When a feature is enabled without explicit parameters, the internal defaults described below are applied, favoring reproducibility and image fidelity.

1. High-Fidelity Resizing

Resizing often involves interpolation, which can introduce artifacts or blurriness if not managed carefully.

  • Default Interpolation (Bicubic): If resize is provided but interpolation is None, the system defaults to bicubic interpolation. This method computes each output pixel from a $4 \times 4$ pixel neighborhood, which typically gives sharper edges and better detail preservation than bilinear interpolation ($2 \times 2$ neighborhood).
  • Backend Parity: The function maps each interpolation name to the corresponding flag in PIL and OpenCV, so that "bicubic" resizing behaves consistently regardless of the underlying decoder.
  • Aspect Ratio Preservation: When preserve_aspect_ratio=True is set, the image is scaled uniformly to fit inside the target dimensions without stretching. The remaining space is padded (letterboxed) with letterbox_color, which defaults to black.
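The letterboxing arithmetic can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's code: letterbox_geometry and paste_on_canvas are hypothetical helpers, the padding is centered, and a white placeholder array stands in for the output of the actual resize step.

```python
import numpy as np


def letterbox_geometry(src_h, src_w, dst_h, dst_w):
    """Return (new_h, new_w, top, left) for a centered, aspect-preserving fit."""
    scale = min(dst_h / src_h, dst_w / src_w)  # the tighter axis decides the scale
    new_h = min(dst_h, max(1, round(src_h * scale)))
    new_w = min(dst_w, max(1, round(src_w * scale)))
    top = (dst_h - new_h) // 2
    left = (dst_w - new_w) // 2
    return new_h, new_w, top, left


def paste_on_canvas(resized, dst_h, dst_w, color=(0, 0, 0)):
    """Place an already-resized H x W x 3 image on a solid-color canvas."""
    new_h, new_w = resized.shape[:2]
    top = (dst_h - new_h) // 2
    left = (dst_w - new_w) // 2
    canvas = np.full((dst_h, dst_w, 3), color, dtype=resized.dtype)
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


# A 100 x 200 (H x W) source fitted into a 224 x 224 target.
geometry = letterbox_geometry(100, 200, 224, 224)
new_h, new_w, top, left = geometry

# Placeholder for the resized image; a real pipeline would resize here.
resized = np.full((new_h, new_w, 3), 255, dtype=np.uint8)
canvas = paste_on_canvas(resized, 224, 224)
print(geometry, canvas.shape)
```

The wide source fills the full target width, and the leftover rows are split evenly between the top and bottom bars.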

2. Lossless Geometric Augmentations

Arbitrary-angle rotations (e.g., $15^\circ$) require interpolation, which "guesses" new pixel values and introduces blur. img2tensor enforces a Lossless Philosophy instead.

  • D4 Symmetry Group: When augmentation=True is enabled, the utility randomly selects from bit-perfect orthogonal transformations: $90^\circ$, $180^\circ$, and $270^\circ$ rotations and horizontal/vertical flips.
  • Pure NumPy Permutations: These operations are executed using np.rot90 and np.flip. Because they only reorder existing pixels (transposing or reversing axes), they are exactly lossless: no new pixel values are generated and no information is lost.
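The lossless property can be checked directly: every transform built from np.rot90 and np.flip has an exact inverse. The sketch below is illustrative only; apply_d4 and invert_d4 are hypothetical helpers, and encoding a transform as (k, flip) is an assumption rather than the library's internal representation.

```python
import numpy as np


def apply_d4(image, k, flip):
    """Rotate by k * 90 degrees counter-clockwise, then optionally flip left-right."""
    out = np.rot90(image, k=k, axes=(0, 1))
    if flip:
        out = np.flip(out, axis=1)
    return out


def invert_d4(image, k, flip):
    """Undo apply_d4 by reversing the steps in the opposite order."""
    out = np.flip(image, axis=1) if flip else image
    return np.rot90(out, k=-k, axes=(0, 1))


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

transformed = apply_d4(image, k=1, flip=True)
restored = invert_d4(transformed, k=1, flip=True)

print(transformed.shape)                 # H and W swap for odd k
print(np.array_equal(restored, image))   # round trip recovers the input
```

Four rotations combined with an optional flip give the eight elements of D4; a vertical flip is the 180-degree rotation followed by a horizontal flip. Note that 90-degree and 270-degree rotations swap height and width for non-square images.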

3. Internal Safety Defaults

| Parameter | Internal Default | Rationale |
| --- | --- | --- |
| interpolation | bicubic | Prioritizes higher image quality for model training over faster, blurrier methods. |
| augmentation_seed | None | If provided, generates a unique but deterministic seed per image path to ensure experiments are 100% reproducible. |
| Memory Threshold | 0.7 | Automatically monitors available RAM and caps usage at 70% to prevent system-wide OOM (Out-of-Memory) crashes. |
| Channel Sync | RGB | Automatically replicates 1-channel Grayscale to 3-channels and strips Alpha from RGBA to maintain uniform batch shapes. |
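The Channel Sync behavior can be illustrated with plain NumPy. This is a sketch of the idea under the assumption of channel-last arrays; to_rgb is a hypothetical helper, not a function exported by img2tensor.

```python
import numpy as np


def to_rgb(image):
    """Return an H x W x 3 array from grayscale, RGB, or RGBA input."""
    if image.ndim == 2:                          # H x W grayscale
        return np.repeat(image[:, :, None], 3, axis=2)
    if image.ndim == 3 and image.shape[2] == 1:  # H x W x 1 grayscale
        return np.repeat(image, 3, axis=2)
    if image.ndim == 3 and image.shape[2] == 3:  # already RGB
        return image
    if image.ndim == 3 and image.shape[2] == 4:  # RGBA: drop the alpha channel
        return image[:, :, :3]
    raise ValueError(f"unsupported image shape: {image.shape}")


gray = np.full((2, 3), 7, dtype=np.uint8)
rgba = np.zeros((2, 3, 4), dtype=np.uint8)
rgba[..., 3] = 255  # opaque alpha, discarded by to_rgb

gray_rgb = to_rgb(gray)
rgba_rgb = to_rgb(rgba)
print(gray_rgb.shape, rgba_rgb.shape)
```

With every image normalized to three channels, mixed-format inputs can be stacked into a single batch.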

🛠 API Reference: get_tensor()

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| img_paths | str \| Path \| list | Required | Single path or list of paths to image files. |
| tensor_type | str | "numpy" | Target framework: "numpy", "pytorch", or "tensorflow". |
| dtype | str | "float32" | Target type: "float32", "float16", "uint8". Floats are auto-scaled (1/255). |
| image_layer | str | "PIL" | Backend decoder: "PIL" or "OpenCV". |
| n_jobs | int | 4 | Number of threads for parallel processing and decoding. |
| output_format | str | "tensor" | "tensor" (returns object) or "tfrecord" (writes to disk). |
| tfrecord_path | str \| Path | None | Required if output_format='tfrecord'. |
| num_shards | int | 1 | Number of shards to split TFRecord output into. |
| resize | tuple | None | (H, W) target size. Defaults to Bicubic interpolation if set. |
| interpolation | str | None | nearest, bilinear, bicubic, area, or lanczos (PIL only). |
| preserve_aspect_ratio | bool | False | Uses Letterboxing (padding) to maintain original aspect ratio. |
| augmentation | bool | None | Enables Lossless geometric augmentations (D4 symmetry group). |
| augmentation_angles | list | [90, 180, 270] | Specific orthogonal angles to select from when augmentation=True. |
| augmentation_seed | int | None | Seed for deterministic and reproducible augmentation results. |
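As an illustration of the dtype row, float outputs scaled by 1/255 map 8-bit pixel values into the range [0, 1], while "uint8" leaves them untouched. The helper below is a hypothetical sketch of that rule, not the library's own conversion code.

```python
import numpy as np


def convert_dtype(image_u8, dtype="float32"):
    """Convert a uint8 image to the requested dtype; floats are scaled by 1/255."""
    if dtype == "uint8":
        return image_u8
    if dtype in ("float32", "float16"):
        scaled = image_u8.astype(np.float32) / 255.0  # compute in float32 first
        return scaled.astype(dtype)
    raise ValueError(f"unsupported dtype: {dtype}")


pixels = np.array([[0, 128, 255]], dtype=np.uint8)

as_f32 = convert_dtype(pixels, "float32")
as_f16 = convert_dtype(pixels, "float16")
as_u8 = convert_dtype(pixels, "uint8")
print(as_f32.dtype, as_f16.dtype, as_u8.dtype)
```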

Outputs

  • Single Path Input: Returns a 3D Tensor ($H, W, C$ for NumPy/TF; $C, H, W$ for PyTorch).
  • List Input: Returns a 4D Tensor ($N, H, W, C$ for NumPy/TF; $N, C, H, W$ for PyTorch).
  • TFRecord Mode: Returns a success dictionary containing shard metadata and file counts.

🧠 Design Philosophy

Our design approach for img2tensor is centered on numerical precision, scientific reproducibility, and production reliability. We aim to eliminate the common "silent bugs" that occur during the transition from data loading to model training.

1. Framework-Aware Layouts ($NCHW$ vs $NHWC$)

One of the most frequent errors in Computer Vision pipelines is passing the incorrect channel layout to a model. img2tensor automatically detects your tensor_type and reorders dimensions accordingly:

  • PyTorch: Returns $N \times C \times H \times W$ and ensures memory is .contiguous().
  • NumPy/TensorFlow: Returns $N \times H \times W \times C$.
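The reordering between the two layouts is a single transpose. The NumPy sketch below shows the idea; nhwc_to_nchw is a hypothetical helper, and np.ascontiguousarray plays the role that .contiguous() plays for a PyTorch tensor.

```python
import numpy as np


def nhwc_to_nchw(batch):
    """Reorder (N, H, W, C) to (N, C, H, W) and make the result C-contiguous."""
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))


batch_nhwc = np.zeros((2, 4, 5, 3), dtype=np.float32)
batch_nhwc[..., 0] = 1.0  # mark the first (red) channel

batch_nchw = nhwc_to_nchw(batch_nhwc)
print(batch_nchw.shape, batch_nchw.flags["C_CONTIGUOUS"])
```

A bare transpose only changes strides and returns a non-contiguous view, which is why the explicit copy into contiguous memory matters before handing the array to code that assumes a dense layout.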

2. Lossless vs. Lossy Augmentation

General-purpose rotations typically use rotate() or warpAffine(), which introduce interpolation blur and "black triangle" artifacts at the corners. img2tensor enforces a Lossless Philosophy:

  • Memory Permutations: We use pure NumPy index rearrangements (rot90, fliplr) to perform geometric transformations.
  • Bit-Perfect Integrity: Because these operations only reorder existing pixels, every output value is copied unchanged from the input: no new pixel values are "guessed," and the original image signal is preserved exactly.

3. Synchronized High-Fidelity Resizing

Standard libraries (PIL vs. OpenCV) often have different default behaviors for interpolation. img2tensor synchronizes interpolation flags between both backends:

  • Default Bicubic: We default to Bicubic interpolation over the standard Bilinear to ensure sharper edges and better detail retention for deep learning features.
  • Letterboxing: When preserve_aspect_ratio is enabled, we use a letterboxing strategy that scales the image to fit the target dimensions without distortion, padding the remaining area with a consistent color.

4. Deterministic Parallelism

Multi-threading can break reproducibility when workers draw from a shared random number generator, because the order of draws depends on thread scheduling.

  • Per-Path Seeding: img2tensor pre-calculates an independent seed for every image path before starting the thread pool.
  • Guarantee: This ensures that a specific augmentation_seed will produce the exact same augmented batch regardless of your hardware, the number of workers (n_jobs), or the thread execution order.
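A minimal sketch of per-path seeding, assuming a hash of the base seed and the path is used to derive each image's seed. The hashing scheme and the names seed_for_path, choose_transform, and plan_augmentations are assumptions for illustration; the library's actual derivation may differ.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def seed_for_path(base_seed, path):
    """Derive a stable 64-bit seed from the base seed and the image path."""
    digest = hashlib.sha256(f"{base_seed}:{path}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def choose_transform(seed):
    """Pick (angle, flip) from a private RNG, so no global state is shared."""
    rng = random.Random(seed)
    return rng.choice([0, 90, 180, 270]), rng.random() < 0.5


def plan_augmentations(paths, base_seed, n_jobs):
    # Seeds are computed up front, before any worker thread starts.
    seeds = [seed_for_path(base_seed, p) for p in paths]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(choose_transform, seeds))  # map preserves input order


paths = [f"img_{i}.jpg" for i in range(20)]
plan_1 = plan_augmentations(paths, base_seed=42, n_jobs=1)
plan_8 = plan_augmentations(paths, base_seed=42, n_jobs=8)
print(plan_1 == plan_8)
```

Because each image owns its random generator, the choice made for a given path depends only on the seed and the path, never on which thread ran first.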

5. Industrial-Grade Memory Safety

To prevent the out-of-memory (OOM) crashes that are common when processing large datasets, the library uses psutil to monitor available RAM at run time.

  • RAM Thresholding: We cap memory usage at 70% of available system RAM.
  • Auto-Chunking: The utility automatically calculates the memory footprint of your request and chunks the dataset into safe_batch_size groups, allowing you to process millions of images on a standard workstation without crashing the kernel.
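The chunking arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function safe_batch_size and the helper chunks are hypothetical, the per-image footprint counts only the output tensor, and available_bytes is passed in explicitly (in practice it could come from psutil.virtual_memory().available).

```python
def safe_batch_size(available_bytes, height, width, channels=3,
                    bytes_per_value=4, threshold=0.7):
    """Largest batch whose tensors fit within threshold * available RAM."""
    per_image = height * width * channels * bytes_per_value
    budget = int(available_bytes * threshold)
    return max(1, budget // per_image)  # always process at least one image


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Assumed example: 8 GiB free, 224 x 224 RGB images stored as float32.
batch_size = safe_batch_size(8 * 1024**3, 224, 224)
groups = list(chunks(list(range(10)), 4))
print(batch_size, groups)
```

Decoding temporarily needs more memory than the final tensor, so a real implementation would likely budget extra headroom per image.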

📄 License

MIT

