video-clip-describer

Generate text descriptions of video clips.

Installation

pip install video-clip-describer
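
To install exactly the release documented on this page, pin the version (standard pip syntax; 0.3.0 matches the files listed under Download files below):

pip install video-clip-describer==0.3.0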

Usage

import asyncio
from video_clip_describer import VisionAgent

agent = VisionAgent(
    "~/Videos/test.mp4",
    api_base_url="https://my-litellm-proxy.local/v1",
    api_key="sk-apikey",
    vision_model="claude-3-5-sonnet",
    refine_model="gemini-1.5-flash",
    stack_grid=True,
    stack_grid_size=(3, 3),
    resize_video=(1024, 768),
    hashing_max_frames=200,
    hash_size=8,
    debug=True,
    debug_dir="./debug",
)

description = asyncio.run(agent.run())
print(description)
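
Because agent.run() is a coroutine (the example above drives it with asyncio.run), you can also await it from your own async code. A minimal sketch, trimmed to a few of the constructor arguments shown above:

import asyncio

from video_clip_describer import VisionAgent


async def main():
    # Same constructor as above, with only a subset of the documented arguments.
    agent = VisionAgent(
        "~/Videos/test.mp4",
        api_base_url="https://my-litellm-proxy.local/v1",
        api_key="sk-apikey",
    )
    # agent.run() is awaitable, per the example above.
    description = await agent.run()
    print(description)


asyncio.run(main())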

CLI

$ video2text path/to/video.mp4
$ video2text --help

 Usage: video2text [OPTIONS] VIDEO_FILE

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    video_file      FILENAME  The video file to process. [required]                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --resize                      <width>x<height>  Resize frames before sending to GPT-V. [default: 1024x768]                                       │
│ --stack-grid                                    Put video frames in a grid before sending to GPT-V.                                              │
│ --stack-grid-size             <cols>x<rows>     Grid size to stack frames in. [default: 3x3]                                                     │
│ --context                                       Context to add to prompt. [default: None]                                                        │
│ --api-base-url                                  OpenAI API compatible base URL. [env var: OPENAI_BASE_URL] [default: https://api.openai.com/v1]  │
│ --api-key                                       OpenAI API key. [env var: OPENAI_API_KEY]                                                        │
│ --model                                         LLM model to use (overrides --vision-model and --refine-model). [default: None]                  │
│ --vision-model                                  LLM model to use for vision. [default: claude-3-5-sonnet]                                        │
│ --refine-model                                  LLM model to use for refinement. [default: gemini-1.5-flash]                                     │
│ --no-compress                                   Don't remove similar frames before sending to GPT-V.                                             │
│ --max-frames                                    Max number of frames to allow before decreasing hashing length. [default: 200]                   │
│ --debug                                         Enable debugging.                                                                                │
│ --debug-dir                   PATH              Directory to output debug frames to if --debug is enabled. [default: ./debug]                    │
│                       -v                        Enable verbose output. Repeat for increased verbosity.                                           │
│ --test                                          Don't send requests to LLM.                                                                      │
│ --install-completion                            Install completion for the current shell.                                                        │
│ --show-completion                               Show completion for the current shell, to copy it or customize the installation.                 │
│ --help                                          Show this message and exit.                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
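
For example, a single invocation combining several of the options above (the path and values are illustrative; the option syntax follows the help text, and the API key can come from the documented OPENAI_API_KEY environment variable):

$ OPENAI_API_KEY=sk-apikey video2text --stack-grid --stack-grid-size 3x3 --resize 1024x768 path/to/video.mp4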

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

video_clip_describer-0.3.0.tar.gz (8.4 kB)

Uploaded Source

Built Distribution

video_clip_describer-0.3.0-py3-none-any.whl (9.1 kB)

Uploaded Python 3

File details

Details for the file video_clip_describer-0.3.0.tar.gz.

File metadata

  • Download URL: video_clip_describer-0.3.0.tar.gz
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.0 CPython/3.12.7

File hashes

Hashes for video_clip_describer-0.3.0.tar.gz:

  • SHA256: fa7a526f04f5474dceec6644ed9c20adb296faa393cbffb73686aba2946d7c93
  • MD5: e8140b1fcb7507437e7b113acf2cb7df
  • BLAKE2b-256: 385b5a7a645addde275d9f1e3df18aa3a417c17db1200a045a3f7124a9cb1c68

See more details on using hashes here.
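
For example, to verify the sdist after downloading it, compare its SHA256 digest against the value above (sha256sum is standard coreutils tooling, not part of this package):

$ sha256sum video_clip_describer-0.3.0.tar.gz
fa7a526f04f5474dceec6644ed9c20adb296faa393cbffb73686aba2946d7c93  video_clip_describer-0.3.0.tar.gz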

File details

Details for the file video_clip_describer-0.3.0-py3-none-any.whl.

File hashes

Hashes for video_clip_describer-0.3.0-py3-none-any.whl:

  • SHA256: 9d19c5ad697c153d6af15004c7382b5193e0dec817a867400ac3c7ce152d7037
  • MD5: 6cac7d6bb98a5f6604258f015e33faee
  • BLAKE2b-256: 0abbddc42840ed0b5f209f1d5622e8f5e7ce1b1f95877a0ce1d004a4ae6637cd

See more details on using hashes here.
