Unified image to tensor utility with streaming TFRecord support
img2tensor
A unified, high-performance utility to convert images into training-ready tensors for NumPy, PyTorch, and TensorFlow, or stream them directly into TFRecords.
img2tensor handles the standard deep learning data ingestion "papercuts": BGR/RGB swaps, memory layouts (NHWC vs NCHW), and dtype scaling in a single, lightweight function.
✨ Key Features
- Multi-Framework Support: Automatic conversion to `np.ndarray`, `torch.Tensor` (NCHW), or `tf.Tensor`.
- Lossless Augmentations: Geometric transformations (orthogonal rotations and flips) via pure NumPy axis permutations to avoid interpolation drift.
- High-Fidelity Resizing: Support for standard and aspect-ratio-preserving (letterboxed) resizing with synchronized interpolation across PIL and OpenCV backends.
- Deterministic Parallelism: Thread-safe execution with per-image seeding to guarantee reproducible results across runs.
- Automatic Memory Management: Internal RAM monitoring (70% threshold) to auto-calculate batch sizes and prevent OOM (Out-Of-Memory) crashes.
- Production Streaming: Native sharded TFRecord output for massive datasets, enabling parallel I/O during training.
🚀 Installation
```bash
pip install img2tensor
```
📖 Usage
1. Single Image (In-Memory)
Returns a 3D tensor ($C, H, W$ for PyTorch).
```python
import img2tensor

# Returns: torch.Tensor of shape (3, H, W), e.g. (3, 224, 224) for a 224x224 image
tensor = img2tensor.get_tensor("cat.jpg", tensor_type="pytorch")
```
2. Batch Loading (In-Memory)
Returns a 4D tensor ($N, H, W, C$ for NumPy/TF).
```python
import img2tensor

# Returns: np.ndarray of shape (32, 224, 224, 3) for 32 images of 224x224 pixels
batch = img2tensor.get_tensor(list_of_paths, n_jobs=8)
```
3. Production Pipeline (TFRecord)
Writes to disk using a chunked streaming approach to save RAM.
```python
import img2tensor

img2tensor.get_tensor(
    img_paths=large_list_of_paths,
    output_format="tfrecord",
    tfrecord_path="dataset.tfrecord",
    n_jobs=12,
)
```
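The on-disk shard layout is not described here, so the following is only a sketch of how sharded output is commonly organized. `shard_paths` and `assign_shards` are hypothetical helpers, and the `-00000-of-00004` suffix is a widespread TFRecord naming convention, not confirmed img2tensor behavior.

```python
from pathlib import Path


def shard_paths(base_path, num_shards):
    """Build one output filename per shard (hypothetical naming scheme)."""
    base = Path(base_path)
    return [
        base.with_name(f"{base.name}-{i:05d}-of-{num_shards:05d}")
        for i in range(num_shards)
    ]


def assign_shards(img_paths, num_shards):
    """Distribute image paths round-robin so shards stay balanced."""
    shards = [[] for _ in range(num_shards)]
    for index, path in enumerate(img_paths):
        shards[index % num_shards].append(path)
    return shards


names = shard_paths("dataset.tfrecord", 4)
groups = assign_shards([f"img_{i}.jpg" for i in range(10)], 4)
```

Balanced shards of similar size are what make parallel reads during training effective, since no single reader is left with a much larger file than the others.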
4. High-Fidelity Resizing (Letterboxed)
Resize images while maintaining the original aspect ratio using high-quality bicubic interpolation.
```python
import img2tensor

# Returns: torch.Tensor of shape (3, 224, 224)
# Pads with black (default) to keep the original image proportions
tensor = img2tensor.get_tensor(
    "input.jpg",
    tensor_type="pytorch",
    resize=(224, 224),
    preserve_aspect_ratio=True,
)
```
🧠 Resizing and Augmentation Logic
Our get_tensor utility implements a "Quality-First" approach to data preparation. When features are enabled without specific parameters, the following internal defaults are applied to ensure scientific reproducibility and high signal-to-noise ratios.
1. High-Fidelity Resizing
Resizing often involves interpolation, which can introduce artifacts or blurriness if not managed carefully.
- Default Interpolation (Bicubic): If `resize` is provided but `interpolation` is `None`, the system defaults to Bicubic interpolation. This method uses a $4 \times 4$ pixel neighborhood for calculation, resulting in sharper edges and better detail preservation than the standard Bilinear method.
- Backend Parity: The function synchronizes interpolation flags across PIL and OpenCV. This ensures that "Bicubic" resizing yields numerically consistent results regardless of the underlying decoder.
- Aspect Ratio Preservation: When `preserve_aspect_ratio=True` is set, the image is scaled to fit the target dimensions without stretching. Any remaining space is filled using Letterboxing with a default `letterbox_color` (black).
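The letterboxing geometry can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's code: `letterbox` is a hypothetical helper, its `color` argument stands in for `letterbox_color`, centering the image on the canvas is an assumption, and nearest-neighbor sampling replaces bicubic filtering purely to avoid extra dependencies.

```python
import numpy as np


def letterbox(image, target_hw, color=0):
    """Fit an HWC image into target_hw without distortion.

    Nearest-neighbor index sampling keeps the sketch dependency-free;
    it is NOT the bicubic filter described above.
    """
    src_h, src_w = image.shape[:2]
    dst_h, dst_w = target_hw
    # One scale for both axes, chosen so the whole image fits.
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = min(dst_h, max(1, round(src_h * scale)))
    new_w = min(dst_w, max(1, round(src_w * scale)))

    # Map each output row/column back to a source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), src_h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), src_w - 1)
    resized = image[rows][:, cols]

    # Center the resized image on a canvas filled with the padding color.
    canvas = np.full((dst_h, dst_w, image.shape[2]), color, dtype=image.dtype)
    top = (dst_h - new_h) // 2
    left = (dst_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas


wide = np.full((100, 200, 3), 255, dtype=np.uint8)
boxed = letterbox(wide, (224, 224))
```

For the 100x200 input above, the scale is 1.12, so the image becomes 112x224 and receives 56 rows of padding above and below.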
2. Lossless Geometric Augmentations
Standard rotations (e.g., $15^\circ$) require interpolation that "guesses" new pixel values, creating blur. img2tensor enforces a Lossless Philosophy.
- D4 Symmetry Group: When `augmentation=True` is enabled, the utility randomly selects from bit-perfect orthogonal transformations, including $90^\circ, 180^\circ, 270^\circ$ rotations and horizontal/vertical flips.
- Pure NumPy Permutations: These operations are executed using `np.rot90` and `np.flip`. Because these are memory-address rearrangements (swapping axes), they are mathematically lossless: no new pixels are generated and no information is lost.
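A short sketch shows why these transformations are lossless: each one can be undone exactly. `d4_transform` and `d4_inverse` are hypothetical names used only for illustration; the way the library composes rotations and flips internally is not specified here.

```python
import numpy as np


def d4_transform(image, k, flip):
    """Apply k quarter-turns, then an optional horizontal flip, to an HWC array."""
    out = np.rot90(image, k=k, axes=(0, 1))
    if flip:
        out = np.flip(out, axis=1)
    return out


def d4_inverse(image, k, flip):
    """Undo d4_transform by reversing its steps in the opposite order."""
    out = np.flip(image, axis=1) if flip else image
    return np.rot90(out, k=-k, axes=(0, 1))


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# All 8 elements of D4: 4 rotations x {no flip, flip}.
variants = [d4_transform(image, k, flip) for k in range(4) for flip in (False, True)]
```

Every variant holds exactly the same pixel values as the input, only at different positions, which is not true of an interpolated rotation by an arbitrary angle.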
3. Internal Safety Defaults
| Parameter | Internal Default | Rationale |
|---|---|---|
| `interpolation` | `bicubic` | Prioritizes higher image quality for model training over faster, blurrier methods. |
| `augmentation_seed` | `None` | If provided, generates a unique but deterministic seed per image path to ensure experiments are 100% reproducible. |
| Memory Threshold | `0.7` | Automatically monitors available RAM and caps usage at 70% to prevent system-wide OOM (Out-of-Memory) crashes. |
| Channel Sync | RGB | Automatically replicates 1-channel Grayscale to 3-channels and strips Alpha from RGBA to maintain uniform batch shapes. |
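The Channel Sync row can be illustrated with a few lines of NumPy. `to_rgb` is a hypothetical helper, and dropping the alpha channel outright (as opposed to compositing it onto a background) is an assumption based on the word "strips" in the table.

```python
import numpy as np


def to_rgb(image):
    """Return an (H, W, 3) array from grayscale, RGB, or RGBA input."""
    if image.ndim == 2:                      # (H, W) grayscale
        return np.repeat(image[:, :, None], 3, axis=2)
    if image.shape[2] == 1:                  # (H, W, 1) grayscale
        return np.repeat(image, 3, axis=2)
    if image.shape[2] == 4:                  # RGBA: drop the alpha channel
        return image[:, :, :3]
    return image                             # already 3-channel


gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
rgba = np.zeros((2, 3, 4), dtype=np.uint8)

# Mixed inputs now share one shape and can be stacked into a batch.
batch = np.stack([to_rgb(gray), to_rgb(rgba)])
```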
🛠 API Reference: get_tensor()
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| `img_paths` | `str \| Path \| list` | Required | Single path or list of paths to image files. |
| `tensor_type` | `str` | `"numpy"` | Target framework: "numpy", "pytorch", or "tensorflow". |
| `dtype` | `str` | `"float32"` | Target type: "float32", "float16", "uint8". Floats are auto-scaled (1/255). |
| `image_layer` | `str` | `"PIL"` | Backend decoder: "PIL" or "OpenCV". |
| `n_jobs` | `int` | `4` | Number of threads for parallel processing and decoding. |
| `output_format` | `str` | `"tensor"` | "tensor" (returns object) or "tfrecord" (writes to disk). |
| `tfrecord_path` | `str \| Path` | `None` | Required if `output_format='tfrecord'`. |
| `num_shards` | `int` | `1` | Number of shards to split TFRecord output into. |
| `resize` | `tuple` | `None` | (H, W) target size. Defaults to Bicubic interpolation if set. |
| `interpolation` | `str` | `None` | nearest, bilinear, bicubic, area, or lanczos (PIL only). |
| `preserve_aspect_ratio` | `bool` | `False` | Uses Letterboxing (padding) to maintain original aspect ratio. |
| `augmentation` | `bool` | `None` | Enables Lossless geometric augmentations (D4 symmetry group). |
| `augmentation_angles` | `list` | `[90, 180, 270]` | Specific orthogonal angles to select from when `augmentation=True`. |
| `augmentation_seed` | `int` | `None` | Seed for deterministic and reproducible augmentation results. |
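The `dtype` row says floats are auto-scaled by 1/255. A minimal sketch of that rule follows; `scale_dtype` is a hypothetical helper, and scaling in float32 before casting to float16 is a choice made for this example, not a documented detail of the library.

```python
import numpy as np


def scale_dtype(image_u8, dtype="float32"):
    """Convert a uint8 image to the target dtype, scaling floats by 1/255."""
    if dtype == "uint8":
        return image_u8
    if dtype not in ("float32", "float16"):
        raise ValueError(f"unsupported dtype: {dtype}")
    # Scale in float32 first, then cast, so float16 rounding happens only once.
    return (image_u8.astype(np.float32) / 255.0).astype(dtype)


pixels = np.array([[0, 128, 255]], dtype=np.uint8)
as_f32 = scale_dtype(pixels, "float32")
as_f16 = scale_dtype(pixels, "float16")
```

With this rule, float outputs lie in the range [0, 1], while `uint8` output keeps the raw 0 to 255 values.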
Outputs
- Single Path Input: Returns a 3D Tensor ($H, W, C$ for NumPy/TF; $C, H, W$ for PyTorch).
- List Input: Returns a 4D Tensor ($N, H, W, C$ for NumPy/TF; $N, C, H, W$ for PyTorch).
- TFRecord Mode: Returns a success dictionary containing shard metadata and file counts.
🧠 Design Philosophy
Our design approach for img2tensor is centered on numerical precision, scientific reproducibility, and production reliability. We aim to eliminate the common "silent bugs" that occur during the transition from data loading to model training.
1. Framework-Aware Layouts ($NCHW$ vs $NHWC$)
One of the most frequent errors in Computer Vision pipelines is passing the incorrect channel layout to a model. img2tensor automatically detects your tensor_type and reorders dimensions accordingly:
- PyTorch: Returns $N \times C \times H \times W$ and ensures memory is `.contiguous()`.
- NumPy/TensorFlow: Returns $N \times H \times W \times C$.
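The layout change amounts to an axis permutation followed by a copy into dense memory. The sketch below uses NumPy only, with `np.ascontiguousarray` standing in for PyTorch's `.contiguous()`; it illustrates the idea and is not taken from the library.

```python
import numpy as np

# A dummy NHWC batch: 2 images, 4x5 pixels, 3 channels.
nhwc = np.arange(2 * 4 * 5 * 3, dtype=np.float32).reshape(2, 4, 5, 3)

# transpose only changes strides, so the result is a non-contiguous view.
nchw_view = nhwc.transpose(0, 3, 1, 2)

# Copy into a dense, row-major NCHW buffer, the NumPy analogue of .contiguous().
nchw = np.ascontiguousarray(nchw_view)
```

The element at `nchw[n, c, h, w]` is the same value as `nhwc[n, h, w, c]`; only the memory order differs.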
2. Lossless vs. Lossy Augmentation
Standard library rotations often use rotate() or warpAffine(), which introduce interpolation blur and "black triangle" artifacts at the corners. img2tensor enforces a Lossless Philosophy:
- Memory Permutations: We use pure NumPy axis permutations (`rot90`, `fliplr`) to perform geometric transformations.
- Bit-Perfect Integrity: Because these operations simply rearrange existing memory addresses, they are mathematically perfect: no new pixel values are "guessed," and the original image signal remains identical.
3. Synchronized High-Fidelity Resizing
Standard libraries (PIL vs. OpenCV) often have different default behaviors for interpolation. img2tensor synchronizes interpolation flags between both backends:
- Default Bicubic: We default to Bicubic interpolation over the standard Bilinear to ensure sharper edges and better detail retention for deep learning features.
- Letterboxing: When `preserve_aspect_ratio` is enabled, we use a letterboxing strategy that scales the image to fit the target dimensions without distortion, padding the remaining area with a consistent color.
4. Deterministic Parallelism
In most libraries, multi-threading can break reproducibility because the order of operations depends on thread scheduling.
- Per-Path Seeding: `img2tensor` pre-calculates an independent seed for every image path before starting the thread pool.
- Guarantee: This ensures that a specific `augmentation_seed` will produce the exact same augmented batch regardless of your hardware, the number of workers (`n_jobs`), or the thread execution order.
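One way to obtain this guarantee is sketched below. It is an assumption about the general technique, not the library's code: `path_seed`, `pick_augmentation`, and `plan_augmentations` are hypothetical names, and SHA-256 is used because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def path_seed(base_seed, path):
    """Derive a stable 64-bit seed from the base seed and the image path."""
    digest = hashlib.sha256(f"{base_seed}:{path}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_augmentation(task):
    """Choose a rotation count and flip flag from the image's own generator."""
    path, seed = task
    rng = np.random.default_rng(seed)
    return path, int(rng.integers(0, 4)), bool(rng.integers(0, 2))


def plan_augmentations(paths, base_seed, n_jobs):
    # Seeds are fixed before any thread starts, so scheduling cannot change them.
    tasks = [(p, path_seed(base_seed, p)) for p in paths]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(pick_augmentation, tasks))  # map preserves input order


paths = [f"img_{i}.jpg" for i in range(16)]
plan_a = plan_augmentations(paths, base_seed=42, n_jobs=1)
plan_b = plan_augmentations(paths, base_seed=42, n_jobs=8)
```

Because no random generator is shared between threads, `plan_a` and `plan_b` are identical even though one run uses a single worker and the other uses eight.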
5. Industrial-Grade Memory Safety
To prevent the "OOM (Out-Of-Memory) Crash" common when processing large datasets, the library utilizes psutil to monitor real-time available RAM.
- RAM Thresholding: We cap memory usage at 70% of available system RAM.
- Auto-Chunking: The utility automatically calculates the memory footprint of your request and chunks the dataset into `safe_batch_size` groups, allowing you to process millions of images on a standard workstation without crashing the kernel.
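The arithmetic behind such a cap can be sketched as follows. The function below borrows the name `safe_batch_size` from the text, but its signature and the `chunked` helper are hypothetical, and the estimate counts only the decoded tensor, ignoring decoder overhead. The available-memory figure is passed in as a plain number so the example runs without `psutil`.

```python
import numpy as np


def safe_batch_size(image_shape, dtype, available_bytes, threshold=0.7):
    """Largest image count whose decoded size fits in threshold * available RAM."""
    bytes_per_image = int(np.prod(image_shape)) * np.dtype(dtype).itemsize
    budget = int(available_bytes * threshold)
    return max(1, budget // bytes_per_image)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# With psutil installed, available_bytes could come from
# psutil.virtual_memory().available; a fixed 1 GiB is used here instead.
batch = safe_batch_size((224, 224, 3), "float32", available_bytes=1024**3)
chunks = list(chunked(list(range(10)), 4))
```

A 224x224x3 float32 image occupies 602,112 bytes, so a 70% budget of 1 GiB holds 1,248 images per chunk.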
📄 License
MIT