This is a simple project that converts speech from video into subtitles by brice

Project description

Video2SRT Project (Video to Subtitle Project)

Quick Start

First, install the software required by the project (refer to the next section for detailed steps):

python >= 3.13.5
ffmpeg

Install the Video2SRT project package:

pip install video2srt

# To speed up installation, users in mainland China can append the -i parameter to each command to use domestic pip mirrors. For example:
pip install video2srt -i https://pypi.tuna.tsinghua.edu.cn/simple

Run the code (Python):

from video2srt import video_to_srt
# Convert video to subtitles
srt_lines = video_to_srt('input.mp4', 'output.srt')
# Print subtitle content (only the first 10 lines are displayed)
print(srt_lines[:10])

Run the code (Full Parameter Example)

from video2srt import video_to_srt
# ## Generate SRT (using the video_to_srt function)
srt_lines = video_to_srt(
    video_path = 'input.mp4',       
    # Input video path
    is_audio = False, 
    # Whether the input is an audio file, default value: False. If True, the audio file will be used directly for recognition
    srt_output_path = 'output.srt', 
    # Output subtitle path
    model_size = 'base', 
    # Model size, optional values: tiny, base, small, medium, large; default value: base
    language = 'ja',  
    # Japanese, supports en, zh, ko, fr, etc.
    is_translate = True,
    translate_engine = 'api', 
    # Translation engine, optional values: model (local model), api (Google Translate API)
    translate_lang = 'zh',  
    # Translate to Chinese, supports en, zh, ko, fr, etc.
    use_gpu = True  
    # Whether to use GPU acceleration, default value: True. It will automatically switch to CPU mode if CUDA is not installed or the environment is not properly configured
)
print(srt_lines[:10])

Project Overview

This is a simple project that converts speech from video into subtitles.

FFmpeg Audio Extraction

The project uses FFmpeg to extract the audio stream from video files:

Input video formats: mp4, mkv, avi, flv, ts, m3u8, mov, wmv, asf, rmvb, vob, webm, and other formats

Output audio formats: wav, mp3, aac, flac, ogg, wma, m4a, aiff, and other formats

FFmpeg must be installed and added to the system environment variables (PATH). For detailed configuration steps, refer to the official FFmpeg website.
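
As a rough illustration of the extraction step, the sketch below builds an FFmpeg command that drops the video stream and writes a mono 16 kHz WAV file. It is not taken from the project's source: the function names `build_extract_command` and `extract_audio`, the WAV output, and the chosen sample rate are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that extracts the audio track as mono 16 kHz WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM audio (plain WAV)
        "-ar", "16000",          # 16 kHz sample rate (assumed for this sketch)
        "-ac", "1",              # mono
        audio_path,
    ]


def extract_audio(video_path: str) -> str:
    """Run ffmpeg and return the path of the extracted WAV file."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
    return audio_path
```

Calling `extract_audio("input.mp4")` would write `input.wav` next to the video, provided `ffmpeg` is on the PATH.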

Whisper Speech Recognition

The project uses the Whisper model for speech recognition, supporting multilingual recognition (90+ languages).

Official Core Models:

tiny: Tiny model, supports only core languages with low accuracy but fastest speed (approximately 108MB).
base: Base model, supports mainstream languages with high accuracy and fast speed (approximately 1GB). This is the default model.
small: Small model, supports multiple languages with high accuracy and medium speed (approximately 4GB).
medium: Medium model, supports low-resource languages with high accuracy and slow speed (approximately 10GB).
large: Large model, supports 99 languages + dialects (Cantonese, Wu Chinese, etc.) with high accuracy but slow speed (approximately 20GB). Suitable for high-precision requirements and recognition of minority languages/dialects.

Supported Languages List: [https://github.com/openai/whisper/blob/main/whisper/tokenizer.py]

Multilingual Translation

The project uses the facebook/m2m100 model and Google Translate API for multilingual translation, converting source language to target language (supports 100+ languages).

Optional Parameters:

is_translate: Default value: False. Optional value: True.
translate_engine: Default value: model, the facebook/m2m100 local model (private, free and open-source; the model needs to be downloaded on first use and is slow). Optional value: api, the Google Translate API (free and fast).

Supported Languages List:

facebook/m2m100 model: [https://huggingface.co/facebook/m2m100_418M/blob/main/README.md]
Google Translate API: [https://cloud.google.com/translate/docs/languages]
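
The two engines can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the function names are hypothetical, the `api` path assumes the `deep_translator` package (listed in the pip dependencies), and the `model` path assumes the standard `transformers` usage of `facebook/m2m100_418M`.

```python
# Dialect codes that are folded into their base language before translation.
DIALECT_TO_BASE = {"yue": "zh", "wuu": "zh"}


def normalize_lang(code: str) -> str:
    """Map a dialect code to its base language; leave other codes unchanged."""
    code = code.lower()
    return DIALECT_TO_BASE.get(code, code)


def to_google_code(code: str) -> str:
    """deep_translator names Simplified Chinese "zh-CN" rather than "zh"."""
    return "zh-CN" if code == "zh" else code


def translate_with_api(text: str, source: str, target: str) -> str:
    """Translate through Google Translate using deep_translator."""
    from deep_translator import GoogleTranslator  # pip install deep_translator

    translator = GoogleTranslator(source=to_google_code(source), target=to_google_code(target))
    return translator.translate(text)


def translate_with_model(text: str, source: str, target: str) -> str:
    """Translate locally with the facebook/m2m100_418M model."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    name = "facebook/m2m100_418M"
    tokenizer = M2M100Tokenizer.from_pretrained(name)  # downloaded on first use
    model = M2M100ForConditionalGeneration.from_pretrained(name)
    tokenizer.src_lang = source
    encoded = tokenizer(text, return_tensors="pt")
    # Force the decoder to start with the target-language token
    generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id(target))
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

For example, `translate_with_api("こんにちは", normalize_lang("ja"), normalize_lang("yue"))` would request a Japanese to Chinese translation.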

GPU Acceleration

To use GPU acceleration for speech recognition and translation (steps 2 and 3 above), CUDA must be installed along with a GPU-enabled build of torch that matches the installed CUDA version. For detailed installation steps, refer to the official PyTorch website.

# First, verify whether your graphics card supports CUDA.
# If the graphics card information is printed (including the CUDA Version field, e.g., CUDA Version: 12.6), CUDA is supported.
# If a message appears stating that nvidia-smi is not an internal or external command, install the NVIDIA driver first.
nvidia-smi

# Install the torch build for CUDA 12.6
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
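
After installing torch, the GPU setup can be checked from Python. The sketch below shows one way to fall back to CPU when CUDA is unavailable; `pick_device` is a hypothetical helper for illustration, not a function exported by video2srt.

```python
def pick_device(use_gpu: bool = True) -> str:
    """Return "cuda" when requested and usable, otherwise "cpu"."""
    if not use_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"  # torch is not installed at all
    # is_available() is False when the driver, CUDA, or a GPU build of torch is missing
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with an NVIDIA card, the installed torch wheel is probably a CPU-only build or does not match the installed CUDA version.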

Video2SRT Usage Steps

Install environment software: Ensure Python 3.13.5 or higher is installed, and FFmpeg is installed and configured in the system environment variables.
Install project dependencies (Project dependency file: [pyproject.toml]; project dependencies can be installed using pip or uv)
Run the project code (Project main file: [video2srt.py])

Install Environment Software

Ensure Python 3.13.5 or higher is installed: [https://www.python.org/downloads/]
Ensure FFmpeg is installed and configured in the system environment variables: [https://ffmpeg.org/download.html]

### Install FFmpeg
# CentOS
yum install ffmpeg ffmpeg-devel -y

# Ubuntu
apt install ffmpeg

# macOS
brew install ffmpeg

# Windows: Download the FFmpeg installation package, place it in a specific directory, and add the path to the system Path.
# Download URL: https://ffmpeg.org/download.html

# Verify installation
ffmpeg -version # Check FFmpeg version
ffmpeg -formats # View all formats supported by FFmpeg
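
The same checks can be run from Python before calling the library. This is a small sketch using only the standard library; the helper names `python_ok` and `tool_on_path` are made up for this example.

```python
import shutil
import sys

MIN_PYTHON = (3, 13, 5)  # minimum version stated in this README


def python_ok(version=sys.version_info) -> bool:
    """True if the given (major, minor, micro) version meets the minimum."""
    return tuple(version[:3]) >= MIN_PYTHON


def tool_on_path(name: str = "ffmpeg") -> bool:
    """True if an executable with this name can be found on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("FFmpeg on PATH:", tool_on_path())
```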

Install Project Dependencies

Python >= 3.13.5

Using uv (Recommended)

Project configuration file: [pyproject.toml]

uv python install 3.14
uv python pin 3.14
uv sync
uv lock

Using Pip

Install required packages (Project dependency file: [requirements.txt])

pip install deep_translator
pip install openai-whisper
pip install transformers torch
pip install sentencepiece

# To speed up installation, users in mainland China can add -i to each command to use domestic pip mirrors. For example:
pip install pandas -i https://pypi.tuna.tsinghua.edu.cn/simple

# You can also install directly using the requirements.txt file
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Run Project Code

The Whisper model (the base model is used by default) needs to be downloaded on the first run. The model is approximately 1GB in size and will take some time. Please wait patiently.

Usage Example (sample)

    from video2srt import video_to_srt
    srt_lines = video_to_srt(
        video_path = "test_video.mp4", 
        srt_output_path = "test_video.srt",
        language="zh"
    )
    
    print("Sample SRT 0-7 lines:")
    print("\n".join(srt_lines[:7]))

Usage Example (Full Parameter Configuration)

# ## Import the video_to_srt function
from video2srt import video_to_srt

# ## Parameters Configuration

VIDEO_PATH = input("Please input video file path:")  
# Video file path
IS_AUDIO = False       
# Whether the input is an audio file, default value: False. If True, the audio file will be used directly for recognition
SRT_OUTPUT = "test_out.srt"  
# Output SRT path
MODEL_SIZE = "base"  
# Whisper model type (tiny/base/small/medium/large)
# Defaults to base; large offers higher accuracy but requires more VRAM and processing time, and supports minority languages and dialects
LANGUAGE = None       
# Recognition language, auto-detected by default, supports multilingual recognition (90+ languages) (e.g., en/zh/ja/lo/fr/de, etc.; refer to Whisper documentation for more languages)
IS_TRANSLATE = False  
# Whether to translate recognized text, disabled by default
TRANSLATE_ENGINE = "model"  
# Translation engine (model/api); model uses the local facebook/m2m100_418M model by default, api uses Google Translate API by default
TRANSLATE_LANG = "zh"      
# Target translation language (e.g., zh/en/ja/ko/fr), defaults to Chinese, supports translation to 100+ languages. Basically consistent with Whisper but with some differences (e.g., no dialect support: zh/yue/wuu -> zh), the system will automatically convert them to zh
USE_GPU = True      
# Whether to use GPU acceleration, uses GPU by default, if no GPU is available, uses CPU

# ## Generate SRT (using the video_to_srt function)
srt_lines = video_to_srt(
    video_path = VIDEO_PATH,
    is_audio = IS_AUDIO,
    srt_output_path = SRT_OUTPUT,
    model_size = MODEL_SIZE,
    language = LANGUAGE,
    is_translate = IS_TRANSLATE,
    translate_engine = TRANSLATE_ENGINE,
    translate_lang = TRANSLATE_LANG,
    use_gpu = USE_GPU
)

print("Sample SRT 0-7 lines:")
print("\n".join(srt_lines[:7]))

Usage Example (using save_live_captions)

from video2srt import livecaptions_to_srt

file_ = input("Please input live captions file path:")  
language_ = "ja"
is_translate_ = True
translate_lang_ = "zh"

srt_lines = livecaptions_to_srt(
    live_file_name = file_,
    language = language_,
    is_translate = is_translate_,
    translate_lang = translate_lang_
)

print("Sample SRT 0-7 lines:")
print("\n".join(srt_lines[:7]))

View Help and Samples

from video2srt import sample, hello_video2srt

hello_video2srt()

srt_lines = sample()
print("Sample SRT 0-7 lines:")
print("\n".join(srt_lines[:7]))

Project details

Release history

0.2.6 (this version): Jan 19, 2026
0.2.1: Dec 19, 2025
0.1.7: Dec 19, 2025

Download files

Source Distribution

video2srt-0.2.6.tar.gz (1.1 MB view details)

Uploaded Jan 19, 2026 Source

Built Distribution

video2srt-0.2.6-py3-none-any.whl (1.1 MB view details)

Uploaded Jan 19, 2026 Python 3

File details

Details for the file video2srt-0.2.6.tar.gz.

File metadata

Download URL: video2srt-0.2.6.tar.gz
Upload date: Jan 19, 2026
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for video2srt-0.2.6.tar.gz
Algorithm	Hash digest
SHA256	`6bf80c77df8510ad5baf52140bd8f94858032ae9fb2550ff01af86305c0ba81b`
MD5	`c2828d00b88b5c446dada14dc16c3b9e`
BLAKE2b-256	`466c82a50c1dcfc2400d516836be12c316f40da2628e860eb6262496c0a0d5bb`

File details

Details for the file video2srt-0.2.6-py3-none-any.whl.

File metadata

Download URL: video2srt-0.2.6-py3-none-any.whl
Upload date: Jan 19, 2026
Size: 1.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for video2srt-0.2.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6e2b651762926e8ea5ea1f711be7c81081baab6e2aa62d4652fd00bebc6c8297`
MD5	`c44f2c43d72aca8153ea92291ccb7d4a`
BLAKE2b-256	`1509cf89b7c963dca3f1a9dded6ae336e29c0f2479a4da88bee7cb3ce1dbe12c`
