Generate text descriptions of video clips

Project description

video-clip-describer

Installation

pip install video-clip-describer

Usage

import asyncio
from video_clip_describer import VisionAgent

agent = VisionAgent(
    "~/Videos/test.mp4",
    api_base_url="https://my-litellm-proxy.local/v1",  # OpenAI-compatible API base URL
    api_key="sk-apikey",
    vision_model="claude-3-5-sonnet",  # model used to describe the frames
    refine_model="gemini-1.5-flash",   # model used to refine the description
    stack_grid=True,           # stack frames into a grid before sending
    stack_grid_size=(3, 3),    # grid size as (columns, rows)
    resize_video=(1024, 768),  # resize frames before sending
    hashing_max_frames=200,    # max frames allowed before decreasing hashing length
    hash_size=8,               # hash size used when removing similar frames
    debug=True,                # enable debugging
    debug_dir="./debug",       # directory to write debug frames to
)

description = asyncio.run(agent.run())
print(description)
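
Because agent.run() is a coroutine, it can also be awaited directly from inside an already-running event loop instead of going through asyncio.run(). A minimal sketch, assuming every parameter not passed here falls back to the defaults shown in the CLI table below:

import asyncio
from video_clip_describer import VisionAgent

async def main():
    # Minimal construction (assumption): only the video path and API key are
    # given; all other parameters are presumed to keep their defaults.
    agent = VisionAgent("~/Videos/test.mp4", api_key="sk-apikey")
    description = await agent.run()
    print(description)

asyncio.run(main())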

CLI

$ video2text path/to/video.mp4
$ video2text --help

 Usage: video2text [OPTIONS] VIDEO_FILE

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    video_file      FILENAME  The video file to process. [required]                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --resize                      <width>x<height>  Resize frames before sending to GPT-V. [default: 1024x768]                                       │
│ --stack-grid                                    Put video frames in a grid before sending to GPT-V.                                              │
│ --stack-grid-size             <cols>x<rows>     Grid size to stack frames in. [default: 3x3]                                                     │
│ --context                                       Context to add to prompt. [default: None]                                                        │
│ --api-base-url                                  OpenAI API compatible base URL. [env var: OPENAI_BASE_URL] [default: https://api.openai.com/v1]  │
│ --api-key                                       OpenAI API key. [env var: OPENAI_API_KEY]                                                        │
│ --model                                         LLM model to use (overrides --vision-model and --refine-model). [default: None]                  │
│ --vision-model                                  LLM model to use for vision. [default: claude-3-5-sonnet]                                        │
│ --refine-model                                  LLM model to use for refinement. [default: gemini-1.5-flash]                                     │
│ --no-compress                                   Don't remove similar frames before sending to GPT-V.                                             │
│ --max-frames                                    Max number of frames to allow before decreasing hashing length. [default: 200]                   │
│ --debug                                         Enable debugging.                                                                                │
│ --debug-dir                   PATH              Directory to output debug frames to if --debug is enabled. [default: ./debug]                    │
│                       -v                        Enable verbose output. Repeat for increased verbosity.                                           │
│ --test                                          Don't send requests to LLM.                                                                      │
│ --install-completion                            Install completion for the current shell.                                                        │
│ --show-completion                               Show completion for the current shell, to copy it or customize the installation.                 │
│ --help                                          Show this message and exit.                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
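
For example, pointing the CLI at a custom OpenAI-compatible endpoint with frame stacking and extra prompt context (a sketch using only the flags documented above; the endpoint URL, API key, and context string are placeholders):

$ export OPENAI_BASE_URL=https://my-litellm-proxy.local/v1
$ export OPENAI_API_KEY=sk-apikey
$ video2text --stack-grid --stack-grid-size 3x3 --resize 1280x720 --context "dashcam footage" path/to/video.mp4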


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

video_clip_describer-0.2.1.tar.gz (8.0 kB)

Uploaded Source

Built Distribution

video_clip_describer-0.2.1-py3-none-any.whl (8.8 kB)

Uploaded Python 3

File details

Details for the file video_clip_describer-0.2.1.tar.gz.

File metadata

  • Download URL: video_clip_describer-0.2.1.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.7

File hashes

Hashes for video_clip_describer-0.2.1.tar.gz

  Algorithm    Hash digest
  SHA256       560f152e4e9547fc96d6040dfff3a620c24d5a8466bce882afad5a389c410567
  MD5          a36de08a74f50262e5cee2b84e42d6e9
  BLAKE2b-256  65bd880dca3dc76558e166e9032927891ac37edae1a29860e6d94d1572d0499a

See more details on using hashes here.
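
To verify a downloaded archive against the SHA256 digest above, a standard-library check suffices. A minimal sketch, assuming the file has been downloaded to the current directory:

import hashlib

EXPECTED_SHA256 = "560f152e4e9547fc96d6040dfff3a620c24d5a8466bce882afad5a389c410567"

# Read the archive and compare its SHA256 digest to the published value.
with open("video_clip_describer-0.2.1.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == EXPECTED_SHA256, f"unexpected SHA256: {digest}"

pip can enforce the same check automatically at install time via hash-pinned requirements files.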

File details

Details for the file video_clip_describer-0.2.1-py3-none-any.whl.

File hashes

Hashes for video_clip_describer-0.2.1-py3-none-any.whl

  Algorithm    Hash digest
  SHA256       443659af18bd451cfcebb7be8f87f2ea9c6de57a9bf3081b4cf5a5169b7f0776
  MD5          78f73ede6d22ac1d7d61838618b596c4
  BLAKE2b-256  2a86d4425d0e7f4f20ec309124d6f74cde8f599b909dc52ae776f2c0338278d2

See more details on using hashes here.
