A PyCUDA package for memory-efficient, element-wise tensor operations with unequal dimensions (multiplication, division) on the GPU using batch processing.

tikos-utils-os

CUDA Batched Tensor Operations (tikos-utils-os)

A PyPI package that leverages PyCUDA for memory-efficient, element-wise tensor operations with unequal dimensions (multiplication and division) on NVIDIA GPUs.

This package is designed for large tensors that may not fit entirely in GPU VRAM. It processes element-wise operations in slices (batches), so GPU memory usage stays bounded by the configured slice size. Inputs of unequal shape are handled implicitly, in a manner similar to NumPy broadcasting: the smaller tensor is padded to match the shape of the larger one during the operation.
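
The slice-by-slice idea can be illustrated on the CPU with NumPy alone. The sketch below is not the package's implementation: `chunked_multiply` is a hypothetical helper, it assumes both inputs already have the same shape, and it treats one megabyte as 2**20 bytes. It only shows how capping the number of elements handled per step caps the working memory.

```python
import numpy as np


def chunked_multiply(a, b, slice_size_mb=1):
    """Multiply two same-shaped arrays one slice at a time (hypothetical helper)."""
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal shapes")
    out = np.empty(a.shape, dtype=np.result_type(a, b))
    # Flat views of the data (reshape copies only if an input is non-contiguous).
    a_flat, b_flat, out_flat = a.reshape(-1), b.reshape(-1), out.reshape(-1)
    # Number of output elements that fit in one slice (assumes 1 MB = 2**20 bytes).
    per_slice = max(1, int(slice_size_mb * 2**20) // out.itemsize)
    for start in range(0, out_flat.size, per_slice):
        stop = start + per_slice
        # A GPU version would copy this slice to the device, run a kernel,
        # and copy the result back; here NumPy does the work directly.
        np.multiply(a_flat[start:stop], b_flat[start:stop], out=out_flat[start:stop])
    return out


a = np.random.randn(64, 128, 96).astype(np.float32)
b = np.random.randn(64, 128, 96).astype(np.float32)
result = chunked_multiply(a, b, slice_size_mb=1)
```

With these inputs the output occupies 3 MiB, so the loop runs three times, and the result matches `a * b` computed in one step.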


Installation

pip install tikos-utils-os

Note: This is a placeholder name. Once published, you would install it via the name you choose on PyPI.

Prerequisites

You must have the NVIDIA CUDA Toolkit installed and your environment correctly configured for PyCUDA to function.
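
One way to check the environment before installing is to ask PyCUDA what it can see. The helper below is a hypothetical sketch (`describe_cuda_environment` is not part of this package); it uses only `pycuda.driver` calls and returns a message instead of raising when PyCUDA or a device is missing.

```python
def describe_cuda_environment():
    """Return a one-line summary of whether PyCUDA can see a CUDA device."""
    try:
        import pycuda.driver as cuda
    except ImportError as exc:
        return f"PyCUDA is not importable: {exc}"
    try:
        cuda.init()
        count = cuda.Device.count()
        if count == 0:
            return "PyCUDA imported, but no CUDA device was found"
        device = cuda.Device(0)
        total_mib = device.total_memory() // 2**20
        return f"Found {count} CUDA device(s); device 0 is {device.name()} with {total_mib} MiB"
    except Exception as exc:  # driver-level failures surface here
        return f"PyCUDA imported, but CUDA initialisation failed: {exc}"


print(describe_cuda_environment())
```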

Usage

Here are examples of how to use the multiply and divide functions.

Example: Element-wise Multiplication

import numpy as np
# Hyphens are not valid in Python module names; the import name is assumed to be tikos_utils_os.
from tikos_utils_os import multiply

# Create two tensors of different shapes
tensor_a = np.random.randn(500, 10).astype(np.float32)
tensor_b = np.random.randn(160, 12800, 640).astype(np.float32)

# Perform memory-optimized multiplication on the GPU.
# The smaller 2D tensor 'a' is expanded (padded) to match the shape of the 3D tensor 'b'.
# verbose=True prints logs and timing information.
# slice_size_mb controls the VRAM used for each batch.
product = multiply(tensor_a, tensor_b, slice_size_mb=128, verbose=True)

print("\nMultiplication complete.")
print(f"Result shape: {product.shape}")
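
To get a feel for what `slice_size_mb=128` means for the shapes in this example, the arithmetic can be worked out directly. This is a back-of-the-envelope sketch under two assumptions that the package may not share: one megabyte is taken as 2**20 bytes, and a slice is assumed to hold data for a single float32 array.

```python
import math

import numpy as np

shape = (160, 12800, 640)                 # shape of tensor_b in the example
itemsize = np.dtype(np.float32).itemsize  # 4 bytes per element
slice_size_mb = 128

n_elements = math.prod(shape)
total_mib = n_elements * itemsize / 2**20
elements_per_slice = slice_size_mb * 2**20 // itemsize
n_slices = math.ceil(n_elements / elements_per_slice)

print(f"{n_elements:,} elements, {total_mib:.0f} MiB in float32")
print(f"{n_slices} slices of at most {elements_per_slice:,} elements")
```

Under these assumptions the tensor holds 1,310,720,000 elements (5000 MiB) and is covered by 40 slices. Note that `np.random.randn` returns float64, so the host briefly holds about twice that amount before `astype(np.float32)` completes.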

Example: Safe Element-wise Division

import numpy as np
# Hyphens are not valid in Python module names; the import name is assumed to be tikos_utils_os.
from tikos_utils_os import divide

# Create two tensors
numerator = np.full((100, 200, 300), 10, dtype=np.float32)
denominator = np.random.randn(100, 200, 300).astype(np.float32)

# Introduce some zeros into the denominator to test safety
denominator[10, 20, 30] = 0.0

# Perform safe division. The kernel ensures that division by zero results in 0.0.
result = divide(numerator, denominator, slice_size_mb=64, verbose=True)

print("\nDivision complete.")
# The value at result[10, 20, 30] should be 0 because of the safe division.
print(f"Value at result[10, 20, 30]: {result[10, 20, 30]}")
assert result[10, 20, 30] == 0.0
print("Verification successful!")
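
For cross-checking on the CPU, the same "zero where the denominator is zero" rule can be written with plain NumPy. `safe_divide_reference` is a hypothetical helper for illustration, not part of this package, and it assumes both arrays have the same shape.

```python
import numpy as np


def safe_divide_reference(numerator, denominator):
    """CPU reference: element-wise division that yields 0.0 where denominator == 0."""
    out = np.zeros_like(numerator)
    # Positions where the mask is False are skipped and keep their initial 0.0.
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


numerator = np.full((4, 5), 10, dtype=np.float32)
denominator = np.arange(20, dtype=np.float32).reshape(4, 5)  # denominator[0, 0] is 0
expected = safe_divide_reference(numerator, denominator)
print(expected[0, :3])
```

A GPU result could then be compared against this reference with `np.allclose`, allowing for float32 rounding differences.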

Download files

Source Distribution

tikos_utils_os-0.1.0.tar.gz (14.9 kB)

Uploaded Source

Built Distribution

tikos_utils_os-0.1.0-py3-none-any.whl (14.9 kB)

Uploaded Python 3

File details

Details for the file tikos_utils_os-0.1.0.tar.gz.

File metadata

  • Download URL: tikos_utils_os-0.1.0.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for tikos_utils_os-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       8f798ff8dbc96a8c34ede332faa3930662ae8c14aa96345896b5300b2ad8ea4e
MD5          e50cf033e5ae194b90a40b5bbe21c4c7
BLAKE2b-256  597935d9567cd7321d2a619cd46998d627ffaedb0b942dcf65a9de4d4bb1bf1a

File details

Details for the file tikos_utils_os-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tikos_utils_os-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 14.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for tikos_utils_os-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       a064b3301346645f846bfead16f605d91f431ae6704da7bdd47b72e325fb9a4a
MD5          36d8cf94ed429ecc2246b13d810111c6
BLAKE2b-256  986e6a8d9d94a770dda2029240fecd2c4a24935951f10e393ef7dd82db6df9c5
