Generate text descriptions of video clips
Project description
video-clip-describer
Installation
pip install video-clip-describer
Usage
import asyncio

from video_clip_describer import VisionAgent

agent = VisionAgent(
    "~/Videos/test.mp4",
    api_base_url="https://my-litellm-proxy.local/v1",
    api_key="sk-apikey",
    vision_model="claude-3-5-sonnet",
    refine_model="gemini-1.5-flash",
    stack_grid=True,
    stack_grid_size=(3, 3),
    resize_video=(1024, 768),
    hashing_max_frames=200,
    hash_size=8,
    debug=True,
    debug_dir="./debug",
)

description = asyncio.run(agent.run())
print(description)
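The stack_grid and stack_grid_size options composite several frames into one image, so the vision model can describe many frames in a single request instead of one request per frame. A minimal sketch of this kind of grid stacking with NumPy (the helper name and exact layout are assumptions for illustration, not the package's actual implementation):

```python
import numpy as np

def stack_frames_grid(frames, cols=3, rows=3):
    """Tile up to cols*rows equally sized frames into one grid image.

    Unused cells are left black so the grid always has a fixed size.
    """
    h, w, c = frames[0].shape
    grid = np.zeros((rows * h, cols * w, c), dtype=frames[0].dtype)
    for i, frame in enumerate(frames[: cols * rows]):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = frame
    return grid

# Nine dummy 768x1024 RGB frames -> one 2304x3072 grid image.
frames = [np.full((768, 1024, 3), i * 28, dtype=np.uint8) for i in range(9)]
grid = stack_frames_grid(frames, cols=3, rows=3)
print(grid.shape)  # (2304, 3072, 3)
```

With resize_video=(1024, 768) and a 3x3 grid, each request carries one 3072x2304 composite rather than nine separate images, which is the trade-off the two options tune.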
CLI
$ video2text path/to/video.mp4
$ video2text --help
Usage: video2text [OPTIONS] VIDEO_FILE
╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * video_file FILENAME The video file to process. [required] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --resize <width>x<height> Resize frames before sending to GPT-V. [default: 1024x768] │
│ --stack-grid Put video frames in a grid before sending to GPT-V. │
│ --stack-grid-size <cols>x<rows> Grid size to stack frames in. [default: 3x3] │
│ --context Context to add to prompt. [default: None] │
│ --api-base-url OpenAI API compatible base URL. [env var: OPENAI_BASE_URL] [default: https://api.openai.com/v1] │
│ --api-key OpenAI API key. [env var: OPENAI_API_KEY] │
│ --model LLM model to use (overrides --vision-model and --refine-model). [default: None] │
│ --vision-model LLM model to use for vision. [default: claude-3-5-sonnet] │
│ --refine-model LLM model to use for refinement. [default: gemini-1.5-flash] │
│ --no-compress Don't remove similar frames before sending to GPT-V. │
│ --max-frames Max number of frames to allow before decreasing hashing length. [default: 200] │
│ --debug Enable debugging. │
│ --debug-dir PATH Directory to output debug frames to if --debug is enabled. [default: ./debug] │
│ -v Enable verbose output. Repeat for increased verbosity. │
│ --test Don't send requests to LLM. │
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
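The --no-compress and --max-frames options govern near-duplicate removal: frames are perceptually hashed, and frames whose hashes are too close to the previous kept frame are dropped before anything is sent to the vision model. A rough sketch of average-hash deduplication (an assumed approach for illustration; the package's actual hashing may differ):

```python
import numpy as np

def average_hash(frame, hash_size=8):
    """Perceptual hash: downscale to hash_size x hash_size, threshold at the mean."""
    h, w = frame.shape[:2]
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame.astype(float)
    # Crude box downscale: average the pixels in each of hash_size^2 blocks.
    ys = np.linspace(0, h, hash_size + 1, dtype=int)
    xs = np.linspace(0, w, hash_size + 1, dtype=int)
    small = np.array([[gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                       for j in range(hash_size)] for i in range(hash_size)])
    return (small > small.mean()).flatten()

def dedupe(frames, hash_size=8, max_distance=4):
    """Keep a frame only if its hash differs enough (Hamming distance) from the last kept one."""
    kept, last = [], None
    for frame in frames:
        h = average_hash(frame, hash_size)
        if last is None or np.count_nonzero(h != last) > max_distance:
            kept.append(frame)
            last = h
    return kept
```

A larger hash_size or a lower distance threshold keeps more frames; --max-frames caps the result by loosening the hash when too many frames survive.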
Project details
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file video_clip_describer-0.1.0.tar.gz
File metadata
- Download URL: video_clip_describer-0.1.0.tar.gz
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | aad7c7f14b0d1014ffa4fc3538fafe48b507c4b5688c33c40c1d4c52dba620fe
MD5 | dc8232077f7b3be3d5f88ca5ab4a05bb
BLAKE2b-256 | cb813cb62340975567fe4dd669fb3824e759a13e489707f79d48003a549d325c
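To confirm a downloaded archive is intact, recompute its SHA256 digest with Python's standard hashlib and compare it against the value published above:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 16):
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

expected = "aad7c7f14b0d1014ffa4fc3538fafe48b507c4b5688c33c40c1d4c52dba620fe"
# After downloading the sdist into the current directory:
# assert sha256_of("video_clip_describer-0.1.0.tar.gz") == expected
```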
File details
Details for the file video_clip_describer-0.1.0-py3-none-any.whl
File metadata
- Download URL: video_clip_describer-0.1.0-py3-none-any.whl
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8198c58f3a1f7366bbe0e52a8ed18c20be45121691a33052ec54ec8391838a85
MD5 | 517f6da632bdc344d63403ac7a779e44
BLAKE2b-256 | 180167c9f85a20dfe73d6007dd818d80a7209988a5ad278505d1d7938b6ec125