ViT Captioner

A Python package for extracting keyframes from videos and generating captions using the ViT-GPT2 model.

Features

  • Extract keyframes from videos using Katna or uniform sampling
  • Generate captions for images using the ViT-GPT2 model
  • Match keyframes with timestamps in a video
  • Convert videos to SRT subtitle files with captions
  • Visualize keyframes and timeline data
  • Performance optimizations: the model is loaded once, concurrency is bounded, and resources are cleaned up after use
  • Thread-safe image processing and visualization
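
To illustrate the SRT conversion feature, here is a minimal sketch of how timed captions map onto the SubRip (SRT) text format. It is not the package's own code: the helper names srt_timestamp and build_srt are hypothetical, and it assumes each caption already comes with a start and end time in seconds.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3600000)
    minutes, rem = divmod(rem, 60000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(entries):
    """Build SRT text from (start_seconds, end_seconds, caption) tuples.

    Each cue is a 1-based index, a time range, and the caption text;
    cues are separated by a blank line.
    """
    blocks = []
    for index, (start, end, caption) in enumerate(entries, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{caption}\n")
    return "\n".join(blocks)
```

Writing the returned string to a file with a .srt extension should give a subtitle file that common video players can load.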

Installation

pip install vit-captioner

Command Line Usage

Extract keyframes from a video:

vit-captioner extract -V /path/to/video.mp4 -N 10 -v
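
For intuition about uniform sampling, the sketch below picks N evenly spaced frame indices from a video of known length. It is an illustration under stated assumptions, not the package's implementation: the function name uniform_frame_indices is hypothetical, and the choice to sample the middle of each segment is one reasonable convention among several.

```python
def uniform_frame_indices(total_frames, num_frames):
    """Return up to num_frames frame indices spread evenly across a video."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Never request more frames than the video contains.
    num_frames = min(num_frames, total_frames)
    step = total_frames / num_frames
    # Split the video into equal segments and take the middle frame of each.
    return [int(step * i + step / 2) for i in range(num_frames)]
```

Dividing an index by the video's frame rate gives the timestamp of that frame in seconds.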

Generate caption for an image:

vit-captioner caption-image -I /path/to/image.jpg

Convert video to captions:

vit-captioner caption-video -V /path/to/video.mp4 -N 10 -v

The -v flag enables verbose output with progress bars.

Find matching timestamps for keyframes:

vit-captioner find-timestamps -V /path/to/video.mp4 -K /path/to/keyframes_folder -v
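
The commands above can also be driven from a script. The following is a small sketch that builds the argument list for each subcommand and runs it with the standard library. The subcommands and flags (-V, -N, -I, -K, -v) are the ones shown above; the helper names build_command and run, and the keyword names video, num_frames, image, and keyframes, are hypothetical.

```python
import subprocess

# Maps hypothetical keyword names to the CLI flags documented above.
FLAG_NAMES = {"video": "-V", "num_frames": "-N", "image": "-I", "keyframes": "-K"}


def build_command(subcommand, verbose=False, **options):
    """Build an argv list for the vit-captioner CLI."""
    argv = ["vit-captioner", subcommand]
    for name, value in options.items():
        argv += [FLAG_NAMES[name], str(value)]
    if verbose:
        argv.append("-v")
    return argv


def run(subcommand, verbose=False, **options):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(subcommand, verbose, **options), check=True)
```

For example, run("extract", verbose=True, video="/path/to/video.mp4", num_frames=10) is equivalent to the first command in this section. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.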

Python API Usage

from vit_captioner.keyframes.extractor import KeyFrameExtractor
from vit_captioner.captioning.image import ImageCaptioner
from vit_captioner.captioning.video import VideoToCaption

# Extract keyframes
extractor = KeyFrameExtractor("/path/to/video.mp4")
extractor.extract_key_frames("/path/to/video.mp4", 10)

# Generate caption for an image
captioner = ImageCaptioner()
caption = captioner.predict_caption("/path/to/image.jpg")

# Convert video to captions
# Note: verbose flag enables progress bars
converter = VideoToCaption("/path/to/video.mp4", num_frames=10, verbose=True)
converter.convert()
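
When captioning many images, it may help to create the captioner once and reuse it. The sketch below captions every image in a folder; the helper caption_folder and the set of file suffixes are hypothetical. It relies only on the predict_caption(path) call shown above and assumes that call returns the caption text.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your keyframes.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def caption_folder(folder, captioner):
    """Caption every image in a folder, reusing one captioner instance.

    `captioner` is any object with a predict_caption(path) method.
    Returns a dict mapping file name to caption, in sorted file order.
    """
    captions = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            captions[path.name] = captioner.predict_caption(str(path))
    return captions


# Usage with the package (requires vit-captioner to be installed):
# from vit_captioner.captioning.image import ImageCaptioner
# captions = caption_folder("/path/to/keyframes_folder", ImageCaptioner())
```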

Performance Optimizations

  • Smart resource management with proper cleanup
  • The model is loaded once and reused across frames, which reduces memory usage
  • Thread-safe image processing with error fallbacks
  • Progress bars for tracking long-running operations
  • Limited number of concurrent workers to prevent memory issues
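
The general pattern behind these points can be sketched with the standard library. This is an illustration, not the package's code: caption_frames and its parameters are hypothetical names, and the sketch assumes the predict callable (for example, a bound predict_caption method of a captioner created once) is safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def caption_frames(frame_paths, predict, max_workers=2):
    """Caption frames with a bounded thread pool, preserving input order.

    `predict` is created once by the caller and shared by all workers.
    A failure on one frame yields a fallback caption instead of
    aborting the whole batch.
    """
    def safe_predict(path):
        try:
            return predict(path)
        except Exception as exc:  # error fallback: keep going on bad frames
            return f"[caption failed: {exc}]"

    # The with-block shuts the pool down even if an error escapes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_predict, frame_paths))
```

Keeping max_workers small bounds how many frames are in memory at once. If the model is not thread-safe, wrapping the predict call in a threading.Lock is a simple fallback, at the cost of serializing inference.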

Requirements

  • Python 3.6+
  • OpenCV
  • PyTorch
  • Transformers
  • Katna
  • Matplotlib
  • tqdm

Source Code

Source code is available on GitHub: https://github.com/lachlanchen/VideoCaptionerWithVit

License

MIT

Download files

Source Distribution

vit_captioner-0.1.2.tar.gz (15.7 kB)

Uploaded Source

Built Distribution

vit_captioner-0.1.2-py3-none-any.whl (18.6 kB)

Uploaded Python 3

File details

Details for the file vit_captioner-0.1.2.tar.gz.

File metadata

  • Download URL: vit_captioner-0.1.2.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for vit_captioner-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       6c34b173457b2cef4e150f1a86e3f6142a35f9a5475654abdb67110c8b014fd5
MD5          44edb0e906b5e11801d5c6723ef1cb50
BLAKE2b-256  897bc545567fe8bc8759c9b73435a832d7e84aab9ed75f70d6df542d1ee7c5fc

Provenance

The following attestation bundles were made for vit_captioner-0.1.2.tar.gz:

Publisher: publish.yml on lachlanchen/VideoCaptionerWithVit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vit_captioner-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: vit_captioner-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 18.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for vit_captioner-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       5c302eeec7ec1ca1989397db57c0cefbb0aad283a6a16f7b6379e55cf247246d
MD5          3f15f4b5c5c9cd3842e8df9ac6d6e452
BLAKE2b-256  13158bac14a2287fc1bc7b129bc971c9a9bbd8fdfa393296f3d36bad0c3a4a64

Provenance

The following attestation bundles were made for vit_captioner-0.1.2-py3-none-any.whl:

Publisher: publish.yml on lachlanchen/VideoCaptionerWithVit
