
A professional tool for cleaning duplicate or near-duplicate image frames using perceptual hashing and embeddings.


CleanFrames

CleanFrames is a Python library for cleaning and summarizing video frames with embedding models and clustering. It loads frames from a video, removes duplicates and near-duplicates, and generates concise reports and visualizations.
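The package summary mentions perceptual hashing, but this description does not say which hash CleanFrames uses or how hashes are combined with embeddings. Purely as an illustration, the sketch below implements average hashing (aHash), one common perceptual hash: each pixel of a small grayscale thumbnail becomes one bit, set when the pixel is brighter than the thumbnail's mean, and two frames are compared by the Hamming distance between their hashes. The function names and the tiny 2x2 inputs are hypothetical; in practice the thumbnail is usually 8x8 and produced by an imaging library.

```python
from typing import Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Average hash (aHash): one bit per pixel, set if brighter than the mean.

    `pixels` is assumed to be an already-downscaled grayscale grid;
    resizing and grayscale conversion are left to an imaging library.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # Shift previous bits left and append this pixel's bit.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 2x2 grayscale thumbnails.
frame_1 = [[10, 200], [30, 220]]
frame_2 = [[12, 198], [28, 225]]  # slightly different brightness
frame_3 = [[200, 10], [220, 30]]  # mirrored layout

h1, h2, h3 = average_hash(frame_1), average_hash(frame_2), average_hash(frame_3)
print(hamming_distance(h1, h2))  # 0 -> near-duplicate
print(hamming_distance(h1, h3))  # 4 -> different frame
```

A small Hamming distance marks a likely near-duplicate; where to draw the cutoff is a tuning choice.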

Key Features

  • Support for multiple embedding models to represent frames.
  • Various clustering methods to group similar frames.
  • Caching mechanisms to optimize performance.
  • Visualization tools to inspect clusters and embeddings.
  • Cleaning functions to remove redundant frames.
  • Reporting capabilities to summarize cleaning results.
  • Two main classes: CleanFrame for standard processing and CleanFrame_optimized for enhanced performance.

Installation

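To install CleanFrames from source, clone the repository and install its dependencies. The repository URL is not given here, so replace <repository-url> with the actual location and adjust the commands to your environment. The package is also distributed on PyPI as cleanframes, so pip install cleanframes should work as well.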

git clone <repository-url>
cd cleanframes
pip install -r requirements.txt

Usage

Using CleanFrame

from cleanframes import CleanFrame

# Initialize with video path and parameters
cf = CleanFrame(
    video_path='path/to/video.mp4',
    embedding_model='clip-ViT-B-32',
    clustering_method='kmeans',
    cache_folder='cache/',
    verbose=True
)

# Load video frames
cf.load_frames()

# Generate embeddings for frames
cf.embed_frames()

# Cluster embeddings to group similar frames
cf.cluster_frames()

# Clean frames by removing duplicates or near-duplicates
cleaned_frames = cf.clean_frames()

# Generate report of cleaning
cf.report()

# Visualize clusters or embeddings
cf.visualize()

Using CleanFrame_optimized

from cleanframes import CleanFrame_optimized

# Initialize with video path and parameters
cf_opt = CleanFrame_optimized(
    video_path='path/to/video.mp4',
    embedding_model='clip-ViT-L-14',
    clustering_method='dbscan',
    cache_folder='cache_optimized/',
    verbose=True
)

# Load video frames with optimized method
cf_opt.load_frames()

# Generate embeddings using optimized pipeline
cf_opt.embed_frames()

# Cluster embeddings
cf_opt.cluster_frames()

# Clean frames
cleaned_frames_opt = cf_opt.clean_frames()

# Generate report
cf_opt.report()

# Visualize results
cf_opt.visualize()

Supported Embedding Models

CleanFrames supports various embedding models to convert video frames into numerical representations, including but not limited to:

  • CLIP models such as clip-ViT-B-32 and clip-ViT-L-14
  • Other models can be integrated as needed.
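After embedding, two frames can be compared by the cosine similarity of their vectors. The sketch below is a minimal, stdlib-only illustration rather than CleanFrames' own API: the function names, the toy 3-dimensional vectors (real CLIP embeddings have hundreds of dimensions) and the 0.95 threshold are all assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_near_duplicate(a: Sequence[float], b: Sequence[float],
                      threshold: float = 0.95) -> bool:
    """Hypothetical rule: embeddings at least `threshold` similar are near-duplicates."""
    return cosine_similarity(a, b) >= threshold


# Toy "embeddings" standing in for real model output.
frame_a = [0.9, 0.1, 0.0]
frame_b = [0.88, 0.12, 0.01]
frame_c = [0.0, 0.2, 0.95]

print(is_near_duplicate(frame_a, frame_b))  # True
print(is_near_duplicate(frame_a, frame_c))  # False
```

Cosine similarity ignores vector length, which is why it is a common choice for comparing embeddings whose magnitudes are not meaningful.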

Clustering Methods

The library provides different clustering algorithms to group similar frames:

  • KMeans clustering
  • DBSCAN clustering
  • Other clustering methods can be added or customized.
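To show what the KMeans option does conceptually, here is a minimal, dependency-free sketch of Lloyd's algorithm applied to frame embeddings. It is an illustration, not CleanFrames' implementation; the function name kmeans and the first-k-points initialisation are assumptions chosen to keep the result deterministic.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Assign each point to one of k clusters using Lloyd's algorithm.

    Centroids start at the first k points: deterministic, but sensitive to
    input order (production code typically uses k-means++ or restarts).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Two tight groups of toy 2-D embeddings.
points = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0], [0.0, 0.2]]
print(kmeans(points, 2))  # [0, 1, 0, 1, 0]
```

KMeans needs the number of clusters up front. DBSCAN does not: it groups points that have enough neighbours within a distance eps, which can suit near-duplicate detection when the number of distinct scenes is unknown.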

Caching

To improve performance, CleanFrames supports caching of intermediate results such as extracted frames and computed embeddings. Users can specify a cache folder where these results are stored and reused.
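One way such a cache can work is to key each embedding by a hash of the frame's content and store it in the cache folder. The sketch below assumes this design; cached_embedding, fake_embed, SHA-256 keys and one JSON file per frame are illustrative choices, not CleanFrames' documented on-disk format.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_embedding(
    frame_bytes: bytes,
    embed: Callable[[bytes], List[float]],
    cache_folder: str = "cache/",
) -> List[float]:
    """Return a frame's embedding, running `embed` only on a cache miss.

    The key is the SHA-256 digest of the raw frame bytes, so identical
    frames share one cache entry wherever they appear in the video.
    """
    folder = Path(cache_folder)
    folder.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(frame_bytes).hexdigest()
    path = folder / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: reuse stored vector
    vector = embed(frame_bytes)  # cache miss: run the (expensive) model
    path.write_text(json.dumps(vector))
    return vector


calls = []


def fake_embed(data: bytes) -> List[float]:
    """Hypothetical stand-in for a real embedding model; records each call."""
    calls.append(data)
    return [float(len(data)), 0.5]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_embedding(b"frame-0", fake_embed, tmp)
    second = cached_embedding(b"frame-0", fake_embed, tmp)  # read from disk

print(len(calls))  # 1: the model ran only once
```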

Visualization

CleanFrames includes visualization tools to help users inspect the clustering results and embedding distributions. This aids in understanding the cleaning process and verifying the quality of frame grouping.

Cleaning and Reporting

The cleaning functions remove redundant frames based on clustering results and embedding similarity. After cleaning, a report summarizes the number of frames processed, removed, and retained, which shows how effective the cleaning was.
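The cleaning and reporting steps can be sketched as follows, assuming one representative (the first frame) is kept per cluster. The helper names, the report keys and the treatment of label -1 (the noise label used by scikit-learn's DBSCAN) are assumptions for illustration, not CleanFrames' actual output.

```python
from typing import Dict, List, Sequence


def clean_by_cluster(labels: Sequence[int]) -> List[int]:
    """Return indices of frames to keep: the first frame of each cluster.

    Frames labelled -1 (noise) have no duplicates, so they are always kept.
    """
    seen = set()
    kept = []
    for idx, label in enumerate(labels):
        if label == -1:
            kept.append(idx)  # noise point: unique frame, keep it
        elif label not in seen:
            seen.add(label)
            kept.append(idx)  # first member of a new cluster
    return kept


def cleaning_report(total: int, kept: Sequence[int]) -> Dict[str, float]:
    """Summarize how many frames were processed, removed, and retained."""
    retained = len(kept)
    removed = total - retained
    return {
        "processed": total,
        "removed": removed,
        "retained": retained,
        "removal_rate": removed / total if total else 0.0,
    }


labels = [0, 0, 1, 1, 1, -1, 2]  # e.g. output of a clustering step
kept = clean_by_cluster(labels)
print(kept)  # [0, 2, 5, 6]
print(cleaning_report(len(labels), kept))
```

Keeping the first member is the simplest policy; choosing the frame closest to the cluster centroid is a common alternative when the most typical frame is wanted.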


For more detailed information and advanced usage, please refer to the source code and examples provided in the repository.

Download files

Source Distribution

cleanframes-0.3.1.tar.gz (10.9 kB)

Built Distribution

cleanframes-0.3.1-py3-none-any.whl (9.8 kB)

File details

Details for the file cleanframes-0.3.1.tar.gz.

File metadata

  • Download URL: cleanframes-0.3.1.tar.gz
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for cleanframes-0.3.1.tar.gz
Algorithm Hash digest
SHA256 70612cb9d169454abf633a6bb8be35de1498e6b053fa54c141b6c9fe8cac489a
MD5 fd6a909ffe4490884be4d62b8e8fa53b
BLAKE2b-256 bac34a43db164eadf36bac91c4b1e9ac2f94b9220c8e40ed54e32d16d1ae2338

File details

Details for the file cleanframes-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: cleanframes-0.3.1-py3-none-any.whl
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for cleanframes-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 dd977aad7f19c400215613309be8b89c2bfef6ca32ce823e3b0bccfadb44e35b
MD5 c2834c4623a34425970302cefa163f91
BLAKE2b-256 a415a3912c0d9277fa9de930bc43954cc7f4e0cd506f79c893cb34d31a35e300