video-clip-describer

Generate text descriptions of video clips.

Project description

Installation

pip install video-clip-describer

Usage

import asyncio
from video_clip_describer import VisionAgent

agent = VisionAgent(
    "~/Videos/test.mp4",
    api_base_url="https://my-litellm-proxy.local/v1",  # OpenAI-compatible API base URL
    api_key="sk-apikey",
    vision_model="claude-3-5-sonnet",  # model used to describe the frames
    refine_model="gemini-1.5-flash",   # model used to refine the final description
    stack_grid=True,                   # stack frames into a grid before sending
    stack_grid_size=(3, 3),            # grid dimensions (columns, rows)
    resize_video=(1024, 768),          # resize frames to (width, height) first
    hashing_max_frames=200,            # max frames allowed before the hash length is reduced
    hash_size=8,                       # frame-hash size used when dropping similar frames
    debug=True,                        # write intermediate frames to debug_dir
    debug_dir="./debug",
)

description = asyncio.run(agent.run())
print(description)
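The endpoint and key don't have to be hard-coded: the CLI below documents the OPENAI_BASE_URL and OPENAI_API_KEY environment variables. A minimal sketch that passes them through explicitly (assuming the remaining parameters keep the defaults shown in the CLI help):

import asyncio
import os

from video_clip_describer import VisionAgent

# Minimal sketch: pull the endpoint and key from the environment variables
# the CLI documents (OPENAI_BASE_URL, OPENAI_API_KEY) instead of hard-coding
# them; all other parameters are left at their defaults.
agent = VisionAgent(
    "~/Videos/test.mp4",
    api_base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    api_key=os.environ["OPENAI_API_KEY"],
)

print(asyncio.run(agent.run()))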

CLI

$ video2text path/to/video.mp4
$ video2text --help

 Usage: video2text [OPTIONS] VIDEO_FILE

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    video_file      FILENAME  The video file to process. [required]                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --resize                      <width>x<height>  Resize frames before sending to GPT-V. [default: 1024x768]                                       │
│ --stack-grid                                    Put video frames in a grid before sending to GPT-V.                                              │
│ --stack-grid-size             <cols>x<rows>     Grid size to stack frames in. [default: 3x3]                                                     │
│ --context                                       Context to add to prompt. [default: None]                                                        │
│ --api-base-url                                  OpenAI API compatible base URL. [env var: OPENAI_BASE_URL] [default: https://api.openai.com/v1]  │
│ --api-key                                       OpenAI API key. [env var: OPENAI_API_KEY]                                                        │
│ --model                                         LLM model to use (overrides --vision-model and --refine-model). [default: None]                  │
│ --vision-model                                  LLM model to use for vision. [default: claude-3-5-sonnet]                                        │
│ --refine-model                                  LLM model to use for refinement. [default: gemini-1.5-flash]                                     │
│ --no-compress                                   Don't remove similar frames before sending to GPT-V.                                             │
│ --max-frames                                    Max number of frames to allow before decreasing hashing length. [default: 200]                   │
│ --debug                                         Enable debugging.                                                                                │
│ --debug-dir                   PATH              Directory to output debug frames to if --debug is enabled. [default: ./debug]                    │
│                       -v                        Enable verbose output. Repeat for increased verbosity.                                           │
│ --test                                          Don't send requests to LLM.                                                                      │
│ --install-completion                            Install completion for the current shell.                                                        │
│ --show-completion                               Show completion for the current shell, to copy it or customize the installation.                 │
│ --help                                          Show this message and exit.                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
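Putting the documented options together (the video path and context string are illustrative):

$ export OPENAI_API_KEY=sk-apikey
$ video2text --stack-grid --stack-grid-size 3x3 --resize 1024x768 \
    --context "dashcam footage" path/to/video.mp4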

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

video_clip_describer-0.1.0.tar.gz (7.8 kB, Source)

Built Distribution

video_clip_describer-0.1.0-py3-none-any.whl (8.6 kB, Python 3)

File details

Details for the file video_clip_describer-0.1.0.tar.gz.

File metadata

  • Download URL: video_clip_describer-0.1.0.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.7

File hashes

Hashes for video_clip_describer-0.1.0.tar.gz:

  • SHA256: aad7c7f14b0d1014ffa4fc3538fafe48b507c4b5688c33c40c1d4c52dba620fe
  • MD5: dc8232077f7b3be3d5f88ca5ab4a05bb
  • BLAKE2b-256: cb813cb62340975567fe4dd669fb3824e759a13e489707f79d48003a549d325c

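To check a downloaded archive against the digests above, pip's hash subcommand prints the SHA256 in requirements-file form; a quick sketch:

$ pip download --no-deps --no-binary :all: video-clip-describer==0.1.0
$ pip hash video_clip_describer-0.1.0.tar.gz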

File details

Details for the file video_clip_describer-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: video_clip_describer-0.1.0-py3-none-any.whl
  • Size: 8.6 kB
  • Tags: Python 3

File hashes

Hashes for video_clip_describer-0.1.0-py3-none-any.whl:

  • SHA256: 8198c58f3a1f7366bbe0e52a8ed18c20be45121691a33052ec54ec8391838a85
  • MD5: 517f6da632bdc344d63403ac7a779e44
  • BLAKE2b-256: 180167c9f85a20dfe73d6007dd818d80a7209988a5ad278505d1d7938b6ec125

