LensAI: A Python Library for Image and Model Profiling with Probabilistic Data Structures using TensorFlow

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Build Status

Lens AI - Data and Model Monitoring Library for Edge AI Applications

Lensai Profiler Python is a Python library designed to profile data using probabilistic data structures. These profiles are essential for computing model and data drift, ensuring efficient and accurate monitoring of your AI models on edge devices. The Pytorch support is planned in the upcoming releases.

The following Lens AI tools

Lens AI Python Profiler
Lens AI C++ Profiler
Lens AI Monitoring Server

alt text

Model monitoring pipeline for Edge devices

Model Training - Use the lensai_profiler_tf

During the model training process, use the lensai_profiler_tf library to profile the training data. This step is crucial for establishing baseline profiles for data and model performance.

Model Deployment - Use the lensai_c++_profiler (As most of the edge models use C++ , python support in next release)

During model deployment, integrate the Lensai C++ profiling code from the following repository: GitHub Repository. This integration profiles the real-time data during inference, enabling continuous monitoring.

Monitoring - Use lensai Monitoring Server

Use the LensAI Server to monitor data and model drift by comparing profiles from the training phase and real-time data obtained during inference. Detailed instructions and implementation examples are available in the examples directory.

Key Advantages

The time complexity of the distribution profiles are sublinear space and time
The space complexity is 𝑂(1/𝜖log(𝜖𝑁)) where as traditionally it is O(N).
The time complexity while insertion and query time is O(log(1/ϵ)), where as traditionally it is O(N log(N))

Scalability:

The KLL sketch scales well with large datasets due to its logarithmic space complexity. This makes it suitable for applications involving big data where traditional methods would be too costly.

Streaming Data:

KLL sketches are particularly well-suited for streaming data scenarios. They allow for efficient updates and can provide approximate quantiles on-the-fly without needing to store the entire dataset.

Accuracy:

The KLL sketch provides a controllable trade-off between accuracy and space. By adjusting the error parameter 𝜖, one can balance the precision of the quantile approximations against the memory footprint.

Ease of Use:

The KLL sketch can be easily merged, allowing for distributed computation and aggregation of results from multiple data streams or partitions.

Practical Performance:

In practice, the KLL sketch offers high accuracy for quantile estimation with significantly less memory usage than exact methods, making it a practical choice for many real-world applications.

Installation

Install the Python package using pip:

pip install lensai_profiler_tf

Features

Computes the distribution of image metrics
Saves the metrics in corresponding files for further analysis
Generated the metrics qunatiles that can be used on the edge device for sampling
✨Magic ✨

Documentation:

Please refer for more detailed documentation:

a link

Usage

For detailed usage, please refer to the provided iPython notebook. : This gives the metrics distributions and also their quantiles. These are used as the base line when computing data drift.

NOTE: Please make sure the order in which you register the metrics need to be same as the order in which you they are returned from the batch processing

Tensor Flow Usage

import tensorflow as tf
from lensai_profiler import Metrics, calculate_percentiles
from lensai_profiler.sketches import Sketches
import os
import matplotlib.pyplot as plt
import numpy as np

_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

BATCH_SIZE = 32
IMG_SIZE = (160, 160)

train_dataset = tf.keras.utils.image_dataset_from_directory(train_dir,
                                                            shuffle=True,
                                                            batch_size=BATCH_SIZE,
                                                            image_size=IMG_SIZE)

metrics = Metrics(framework="tf")

# Initialize Sketches class
num_channels = 3  # Assuming RGB images
sketches = Sketches()

sketches.register_metric('brightness')
sketches.register_metric('sharpness')
sketches.register_metric('noise')

sketches.register_metric('mean', num_channels=3)
sketches.register_metric('pixel', num_channels=3)


# Process a batch of images in parallel
train_dataset = train_dataset.map(lambda images, labels: metrics.process_batch(images), num_parallel_calls=tf.data.AUTOTUNE)

# Iterate through the dataset and update the KLL sketches in parallel
for batch in train_dataset:
    brightness, sharpness, mean, noise, pixel = batch
    sketches.update_sketches(
        brightness=brightness,
        sharpness=sharpness,
        mean=channel_mean,
        noise=noise,
        pixel=pixel
    )

# Save the KLL sketches to a specified directory
save_path = '/content/sample_data/'
sketches.save_sketches(save_path)
sketches.publish_sketches(save_path, lensai_server_endpoint)

Pytorch Usage

import multiprocessing as mp
import torch
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from lensai_profiler import Metrics
from lensai_profiler.sketches import Sketches
import os

# Set multiprocessing start method
mp.set_start_method('spawn', force=True)

# Path setup (equivalent to TensorFlow code)
URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

# Image transformation and dataset setup
transform = transforms.Compose([
    transforms.Resize((160, 160)),
    transforms.ToTensor(),
])

train_dataset = datasets.ImageFolder(train_dir, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=0)  # Set num_workers to 0 to avoid threading issues

# Initialize Metrics and Sketches classes
metrics = Metrics(framework="pt")
sketches = Sketches()

num_channels = 3  # Assuming RGB images
sketches.register_metric('brightness')
sketches.register_metric('sharpness')
sketches.register_metric('noise')
sketches.register_metric('mean', num_channels=3)
sketches.register_metric('pixel', num_channels=3)

# Process the dataset and update the KLL sketches
for images, labels in train_loader:
    images = images.to('cuda' if torch.cuda.is_available() else 'cpu')
    brightness, sharpness, mean, noise, pixel = metrics.process_batch(images)
    
    sketches.update_sketches(
        brightness=brightness.cpu(),
        sharpness=sharpness.cpu(),
        mean=mean.cpu(),
        noise=noise.cpu(),
        pixel=pixel.cpu()
    )

# Save the KLL sketches to a specified directory
save_path = '/content/'
sketches.save_sketches(save_path)
sketches.publish_sketches(save_path, lensai_server_endpoint)

Embeddings Logging - TF

class FinalEpochMetricsAndEmbeddingCallback(tf.keras.callbacks.Callback):
    def __init__(self, model, train_dataset, metrics, sketches, layer_name, save_path):
        super().__init__()
        self.train_dataset = train_dataset
        self.metrics = metrics
        self.sketches = sketches
        self.layer_name = layer_name
        self.save_path = save_path
        os.makedirs(save_path, exist_ok=True)

    def on_train_end(self, logs=None):
        print("Final epoch ended. Computing metrics and embeddings...")

        # Create the embedding model based on the trained model
        embedding_model = tf.keras.Model(inputs=self.model.input, outputs=self.model.get_layer(self.layer_name).output)

        all_embeddings = []
        for batch in self.train_dataset:
            images, labels = batch

            # Compute metrics for the batch
            brightness, sharpness, mean, noise, pixel = self.metrics.process_batch(images)

            # Get embeddings from the specified layer
            embeddings = embedding_model.predict(images)
            all_embeddings.append(embeddings)

            # Update sketches with the calculated metrics and embeddings
            self.sketches.update_sketches(
                brightness=brightness,
                sharpness=sharpness,
                mean=mean,
                noise=noise,
                pixel=pixel,
                embeddings=embeddings
            )

        # Save the sketches to a specified directory
        self.sketches.save_sketches(self.save_path)

Complete Model monitoring pipeline

Model Training

During the model training process, use the lensai_profiler_tf library to profile the training data. This step is crucial for establishing baseline profiles for data and model performance.

Model Deployment

Monitoring

Repository Link: GitHub Repository

Examples and API Documentation For implementation details, refer to the examples directory. The API documentation can be found at the following link: API Documentation.

Future Development

The current library focuses on image data profiling and monitoring. Future releases will extend support to other data types.

Contributions

We welcome new feature requests and bug reports. Feel free to create issues and contribute to the project.

Thank you for using Lens AI for your edge AI applications. For more information, please visit our GitHub Repository.

Lens AI Profiler Python uses a number of open source projects to work properly:

[Datasketches]
[Numpy]
[Scipy]

And of course lens_ai_profiler itself is open source with a [public repository][dill] on GitHub.

Upcoming features

Lens AI tensorflow is currently extended with the following features.

Feature	Version
Adding tracking metrics	1.3.0
Adding Inference time profiling	1.4.0
Adding Audio data metrics	1.5.0

Development

Want to contribute? Great!

License

AGPL

Project details

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

1.2.2

Dec 6, 2024

1.2.1

Dec 6, 2024

1.2.0

Sep 3, 2024

1.1.2

Aug 25, 2024

1.1.1

Aug 24, 2024

1.1.0

Aug 23, 2024

1.0.0

Jun 23, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lensai_profiler-1.2.2.tar.gz (18.1 kB view details)

Uploaded Dec 6, 2024 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

lensai_profiler-1.2.2-py3-none-any.whl (16.4 kB view details)

Uploaded Dec 6, 2024 Python 3

File details

Details for the file lensai_profiler-1.2.2.tar.gz.

File metadata

Download URL: lensai_profiler-1.2.2.tar.gz
Upload date: Dec 6, 2024
Size: 18.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for lensai_profiler-1.2.2.tar.gz
Algorithm	Hash digest
SHA256	`17234ca83a9930c80bab1bc0f861c9ff312d1f5f87733b129755168aca268b2c`
MD5	`0162995b4a1dfcbfe9ac5a04be92d6cf`
BLAKE2b-256	`38253df8b665c6b294d35bb14f50e8f829bd9ea63050328b2ed16eb3df970ef5`

See more details on using hashes here.

File details

Details for the file lensai_profiler-1.2.2-py3-none-any.whl.

File metadata

Download URL: lensai_profiler-1.2.2-py3-none-any.whl
Upload date: Dec 6, 2024
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for lensai_profiler-1.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2dd26e7e1923fc1eed0f333cb1661a6c50f85db71052cf2481f6d36d4c89d685`
MD5	`2851528306b4973e77c17cd2df65bef2`
BLAKE2b-256	`5548354a9d1cd2c7da02f7305ddc7798aabbd8901839238483aea0a1f773f67b`

See more details on using hashes here.

lensai-profiler 1.2.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Lens AI - Data and Model Monitoring Library for Edge AI Applications

The following Lens AI tools

Model monitoring pipeline for Edge devices

Model Training - Use the lensai_profiler_tf

Model Deployment - Use the lensai_c++_profiler (As most of the edge models use C++ , python support in next release)

Monitoring - Use lensai Monitoring Server

Key Advantages

Scalability:

Streaming Data:

Accuracy:

Ease of Use:

Practical Performance:

Installation

Features

Documentation:

Usage

Tensor Flow Usage

Pytorch Usage

Embeddings Logging - TF

Complete Model monitoring pipeline

Model Training

Model Deployment

Monitoring

Repository Link: GitHub Repository

Future Development

Contributions

Upcoming features

Development

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes