ViT Captioner
A Python package for extracting keyframes from videos and generating captions using the ViT-GPT2 model.
Features
- Extract keyframes from videos using Katna or uniform sampling
- Generate captions for images using the ViT-GPT2 model
- Match keyframes with timestamps in a video
- Convert videos to SRT subtitle files with captions
- Visualize keyframes and timeline data
- Optimized performance: the model is loaded once and resources are cleaned up after use
- Thread-safe image processing and visualization
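The uniform sampling option can be pictured as choosing frame indices at regular intervals. This is a minimal sketch, assuming the sampler takes the midpoint of each of N equal segments. The function name `uniform_frame_indices` is hypothetical and not part of the package, which may sample differently.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Return up to num_frames frame indices spread evenly over the video."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Never ask for more samples than there are frames.
    num_frames = min(num_frames, total_frames)
    step = total_frames / num_frames
    # Use the midpoint of each segment so samples avoid the very first and last frame.
    return [int(step * i + step / 2) for i in range(num_frames)]
```

For a 100-frame video and 4 keyframes this picks one frame from each quarter of the video.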
Installation
pip install vit-captioner
Command Line Usage
Extract keyframes from a video:
vit-captioner extract -V /path/to/video.mp4 -N 10 -v
Generate caption for an image:
vit-captioner caption-image -I /path/to/image.jpg
Convert video to captions:
vit-captioner caption-video -V /path/to/video.mp4 -N 10 -v
The -v flag enables verbose output with progress bars.
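For reference, an SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the caption text. The sketch below shows that formatting only. `format_srt_time` and `to_srt` are hypothetical helpers, not the package's API, and the start and end time of each cue are assumed to be known already.

```python
from typing import Iterable, Tuple


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, caption) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        time_range = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{number}\n{time_range}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```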
Find matching timestamps for keyframes:
vit-captioner find-timestamps -V /path/to/video.mp4 -K /path/to/keyframes_folder -v
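How `find-timestamps` matches frames internally is not described here. One common approach is to compare each keyframe against the video's frames and convert the index of the closest frame to a time using the frame rate. The sketch below works under that assumption, with frames represented as flat lists of pixel values to stay dependency-free. All names in it are hypothetical.

```python
from typing import Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_timestamp(
    keyframe: Sequence[int],
    frames: Sequence[Sequence[int]],
    fps: float,
) -> float:
    """Return the time in seconds of the frame most similar to keyframe."""
    if not frames or fps <= 0:
        raise ValueError("need at least one frame and a positive fps")
    # Pick the frame index with the smallest difference to the keyframe.
    best_index = min(
        range(len(frames)),
        key=lambda i: mean_abs_diff(keyframe, frames[i]),
    )
    return best_index / fps
```

A real implementation would read frames with OpenCV and would likely downscale them before comparing to keep the search fast.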
Python API Usage
from vit_captioner.keyframes.extractor import KeyFrameExtractor
from vit_captioner.captioning.image import ImageCaptioner
from vit_captioner.captioning.video import VideoToCaption
# Extract keyframes
extractor = KeyFrameExtractor("/path/to/video.mp4")
extractor.extract_key_frames("/path/to/video.mp4", 10)
# Generate caption for an image
captioner = ImageCaptioner()
caption = captioner.predict_caption("/path/to/image.jpg")
# Convert video to captions
# Note: verbose flag enables progress bars
converter = VideoToCaption("/path/to/video.mp4", num_frames=10, verbose=True)
converter.convert()
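When captioning many images, it helps to create the captioner once and reuse it. The sketch below takes the caption function as a parameter so that it does not depend on the package being installed. `caption_many` is a hypothetical helper, and it assumes `predict_caption` accepts a file path and returns a string, which is inferred from the usage shown here rather than from verified documentation.

```python
from typing import Callable, Dict, Iterable, Optional


def caption_many(
    paths: Iterable[str],
    predict: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Caption each path with predict; failed images map to None."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            results[path] = predict(path)
        except Exception:
            # One unreadable image should not abort the whole batch.
            results[path] = None
    return results


# Example wiring (requires vit-captioner to be installed):
#   from vit_captioner.captioning.image import ImageCaptioner
#   captioner = ImageCaptioner()  # create once, reuse for every image
#   captions = caption_many(["a.jpg", "b.jpg"], captioner.predict_caption)
```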
Performance Optimizations
- Smart resource management with proper cleanup
- The model is loaded once and reused across frames, which reduces memory usage
- Thread-safe image processing with error fallbacks
- Progress bars for tracking long-running operations
- Limited number of concurrent workers to prevent memory issues
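The idea of capping concurrent workers can be illustrated with the standard library. This is a sketch that assumes a thread pool with a fixed `max_workers`. The package's actual concurrency mechanism and limit are not stated here, and `process_frames` and its default of 4 workers are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_frames(
    frames: Iterable[T],
    work: Callable[[T], R],
    max_workers: int = 4,
) -> List[R]:
    """Apply work to every frame using at most max_workers threads."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    # The pool never runs more than max_workers tasks at once, which bounds
    # how many frames are being processed in memory at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, regardless of finish order.
        return list(pool.map(work, frames))
```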
Requirements
- Python 3.6+
- OpenCV
- PyTorch
- Transformers
- Katna
- Matplotlib
- tqdm
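To confirm that the listed dependencies are importable in the current environment, a small check like the following can help. It is a sketch: `check_requirements` is a hypothetical helper, and the import names in `MODULES` are the usual ones for these libraries (for example `cv2` for OpenCV), assumed here rather than taken from the package metadata.

```python
import importlib.util
from typing import Dict

# Assumed import names for the requirements listed above.
MODULES = {
    "OpenCV": "cv2",
    "PyTorch": "torch",
    "Transformers": "transformers",
    "Katna": "Katna",
    "Matplotlib": "matplotlib",
    "tqdm": "tqdm",
}


def check_requirements(modules: Dict[str, str] = MODULES) -> Dict[str, bool]:
    """Map each requirement label to whether its module can be found."""
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for label, found in check_requirements().items():
        print(f"{label}: {'ok' if found else 'MISSING'}")
```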
Source Code
Source code is available on GitHub: https://github.com/lachlanchen/VideoCaptionerWithVit
License
MIT
File details
Details for the file vit_captioner-0.1.2.tar.gz.
File metadata
- Download URL: vit_captioner-0.1.2.tar.gz
- Upload date:
- Size: 15.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c34b173457b2cef4e150f1a86e3f6142a35f9a5475654abdb67110c8b014fd5 |
| MD5 | 44edb0e906b5e11801d5c6723ef1cb50 |
| BLAKE2b-256 | 897bc545567fe8bc8759c9b73435a832d7e84aab9ed75f70d6df542d1ee7c5fc |
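A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library. `sha256_of_file` and `verify_download` are hypothetical names, and the local file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the sdist is in the current directory):
#   verify_download(
#       "vit_captioner-0.1.2.tar.gz",
#       "6c34b173457b2cef4e150f1a86e3f6142a35f9a5475654abdb67110c8b014fd5",
#   )
```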
Provenance
The following attestation bundles were made for vit_captioner-0.1.2.tar.gz:
Publisher: publish.yml on lachlanchen/VideoCaptionerWithVit
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vit_captioner-0.1.2.tar.gz
- Subject digest: 6c34b173457b2cef4e150f1a86e3f6142a35f9a5475654abdb67110c8b014fd5
- Sigstore transparency entry: 199269246
- Sigstore integration time:
- Permalink: lachlanchen/VideoCaptionerWithVit@1a0fadda51bfed1783ad7768442f76b2d7b611e1
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/lachlanchen
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1a0fadda51bfed1783ad7768442f76b2d7b611e1
- Trigger Event: release
File details
Details for the file vit_captioner-0.1.2-py3-none-any.whl.
File metadata
- Download URL: vit_captioner-0.1.2-py3-none-any.whl
- Upload date:
- Size: 18.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c302eeec7ec1ca1989397db57c0cefbb0aad283a6a16f7b6379e55cf247246d |
| MD5 | 3f15f4b5c5c9cd3842e8df9ac6d6e452 |
| BLAKE2b-256 | 13158bac14a2287fc1bc7b129bc971c9a9bbd8fdfa393296f3d36bad0c3a4a64 |
Provenance
The following attestation bundles were made for vit_captioner-0.1.2-py3-none-any.whl:
Publisher: publish.yml on lachlanchen/VideoCaptionerWithVit
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vit_captioner-0.1.2-py3-none-any.whl
- Subject digest: 5c302eeec7ec1ca1989397db57c0cefbb0aad283a6a16f7b6379e55cf247246d
- Sigstore transparency entry: 199269248
- Sigstore integration time:
- Permalink: lachlanchen/VideoCaptionerWithVit@1a0fadda51bfed1783ad7768442f76b2d7b611e1
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/lachlanchen
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1a0fadda51bfed1783ad7768442f76b2d7b611e1
- Trigger Event: release