A tool to identify TV episodes from video files and rename them to match Plex naming conventions.

TVIdentify

Python tool that automatically identifies and renames TV show episodes for Plex, given the video files and the series name. It reads subtitles from the video files, using OCR for bitmap formats (PGS/VobSub) or reading SRT text directly, and then uses an LLM to identify the episode from the subtitle text.

With the Ollama OCR integration, you can use glm-ocr for OCR, with tesseract available as a fallback. glm-ocr is a lot more accurate and handles artifacts such as the screen-door effect without any additional processing.

Get Started

To just install the package and use it as a utility

python3 -m venv tvidentify
cd tvidentify
source bin/activate
pip install tvidentify
tvidentify /path/to/TVShows/Game\ Of\ Thrones/Season\ 02/ --max-frames 10 --offset 3 --series-name "Game Of Thrones" --scan-duration 5 --output-dir ~/gots2 --model gemini-3-pro-preview --rename --skip-already-named

To modify the sources or build/work from source

git clone https://github.com/ram-nat/tvidentify tvidentify
cd tvidentify
python3 -m venv venv
source venv/bin/activate
pip install -e .
tvidentify /path/to/TVShows/Game\ Of\ Thrones/Season\ 02/ --max-frames 10 --offset 3 --series-name "Game Of Thrones" --scan-duration 5 --output-dir ~/gots2 --model gemini-3-pro-preview --rename --skip-already-named

Usage

usage: tvidentify [-h] [--size-threshold SIZE_THRESHOLD] [--skip-already-named] [--rename] [--rename-format RENAME_FORMAT] [--provider {google,openai,perplexity}] [--model MODEL] --series-name SERIES_NAME [--max-frames MAX_FRAMES]
                  [--subtitle-track SUBTITLE_TRACK] [--offset OFFSET] [--scan-duration SCAN_DURATION] [--output-dir OUTPUT_DIR] [--log-file LOG_FILE] [--verbose] [--debug]
                  input_dir

Batch identify TV show episodes in a directory and rename them to match Plex TV episode naming.

positional arguments:
  input_dir             The directory containing video files.

options:
  -h, --help            show this help message and exit
  --size-threshold SIZE_THRESHOLD
                        Size similarity threshold for filtering episodes (default: 0.7).
  --skip-already-named  Skip files that are already in the expected naming format (only when --rename is specified).
  --rename              Rename files to "<series_name> S<season>E<episode>" format if identification is successful.
  --rename-format RENAME_FORMAT
                        Format for renamed files. Available placeholders: {{series}}, {{season}}, {{episode}}. Default: "{{series}} S{{season:02d}}E{{episode:02d}}"

LLM Configuration:
  --provider {google,openai,perplexity}
                        LLM provider to use (default: google).
  --model MODEL         Model name. If not provided, defaults based on provider (google: gemini-2.5-flash, openai: gpt-4, perplexity: sonar).
  --series-name SERIES_NAME
                        The name of the TV series.

Subtitle Extraction:
  --max-frames MAX_FRAMES
                        Maximum number of subtitles to extract.
  --subtitle-track SUBTITLE_TRACK
                        The subtitle track index to use. If not specified, finds English automatically. If not found, uses first subtitle track.
  --offset OFFSET       Skip the first N minutes of the video.
  --scan-duration SCAN_DURATION
                        How many minutes of the video to scan for subtitles from the offset (default: 15).
  --output-dir OUTPUT_DIR
                        Optional directory to save JSON output instead of printing to console.

Logging:
  --log-file LOG_FILE   Path to a file to write detailed debug logs to.
  --verbose, -v         Enable verbose output (INFO level is default, this is largely for symmetry).
  --debug               Enable debug output to console.

Features

Subtitle Extraction:
- subtitle_extractor.py is the stand-alone module for this.
- Extracts subtitle stream (supports PGS, VobSub and SRT formats)
- You can specify starting offset and duration of subtitle stream to extract (so entire file is not processed). You can also specify the maximum number of subtitle events to extract.
  - --offset to specify starting offset in minutes
  - --scan-duration to specify how many minutes from starting offset you want to extract.
  - --max-frames to specify how many subtitle events to extract.
- Uses OCR (glm-ocr via Ollama, with pytesseract as a fallback) for bitmap subtitles (PGS/VobSub) and direct text extraction for SRT.
- Applies regex clean-up to normalize text output.
- PGS parsing code is from https://github.com/EzraBC/pgsreader
- Use --output-dir to store output in json format.
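
The way --offset, --scan-duration and --max-frames narrow down the subtitle stream can be pictured with a minimal sketch. The names `SubtitleEvent` and `select_events` are hypothetical and are not taken from subtitle_extractor.py; the sketch also assumes the events are already sorted by start time.

```python
from dataclasses import dataclass


@dataclass
class SubtitleEvent:
    start_seconds: float  # when the subtitle appears on screen
    text: str


def select_events(events, offset_min=0, scan_duration_min=15, max_frames=10):
    """Keep events inside [offset, offset + scan_duration), capped at max_frames."""
    window_start = offset_min * 60
    window_end = window_start + scan_duration_min * 60
    in_window = [e for e in events if window_start <= e.start_seconds < window_end]
    return in_window[:max_frames]


events = [
    SubtitleEvent(30.0, "Previously on..."),
    SubtitleEvent(200.0, "Sorry, Your Grace."),
    SubtitleEvent(260.0, "My deepest apologies."),
    SubtitleEvent(700.0, "No. No, Your Grace."),
]
# Equivalent in spirit to: --offset 3 --scan-duration 5 --max-frames 10
picked = select_events(events, offset_min=3, scan_duration_min=5, max_frames=10)
print([e.text for e in picked])
# prints: ['Sorry, Your Grace.', 'My deepest apologies.']
```

Limiting the window this way keeps OCR and LLM costs bounded, since only a few minutes of subtitles are processed per file.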

Episode Identification:
- episode_identifier.py is the stand-alone module for this.
- With extracted subtitles and the series name, use LLMs to identify the episode of the series.
- Supports different LLM providers - Google Gemini, OpenAI or Perplexity
- Use --model to pass the model to use for episode identification
- You can pass an MKV (or other container format) file or the json output from subtitle_extractor.py as input to this stage.
- Use --output-dir to store output in json format.

Batch Identification:
- batch_identifier.py is the stand-alone module for this.
- Pass an entire season folder to identify all episodes in folder.
- Identifies and ignores non-episode files (assumes largest files are episodes)
- Identifies and does not process duplicate episode files (uses subtitle similarity for duplicates)
- Use --rename option to rename identified episodes to match Plex episode naming requirements.
- Use --output-dir to store output in json format. Stores both batch results and results for individual files.
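
Duplicate detection by subtitle similarity could look roughly like the sketch below. It uses `difflib.SequenceMatcher` from the standard library as a stand-in; the actual comparison in batch_identifier.py may differ, and the function names and the 0.9 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def subtitle_similarity(lines_a, lines_b):
    """Return a 0..1 similarity ratio between two lists of subtitle lines."""
    text_a = "\n".join(line.strip().lower() for line in lines_a)
    text_b = "\n".join(line.strip().lower() for line in lines_b)
    return SequenceMatcher(None, text_a, text_b).ratio()


def is_duplicate(lines_a, lines_b, threshold=0.9):
    # threshold is an illustrative value, not taken from tvidentify
    return subtitle_similarity(lines_a, lines_b) >= threshold


first = ["Sorry, Your Grace.", "My deepest apologies."]
same_again = ["sorry, your grace.", "My deepest apologies. "]
other = ["Winter is coming.", "The night is dark."]
print(is_duplicate(first, same_again), is_duplicate(first, other))
```

Normalizing case and whitespace before comparing makes the check tolerant of small OCR differences between two copies of the same episode.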

File Renaming:
- file_renamer.py is the stand-alone module for this.
- Use --rename-format to specify the rename format. Series, season and episode are the available variables for the format string.
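
The placeholders in the rename format follow Python's `str.format` syntax, so the default format can be tried out directly. This is a minimal sketch; `plex_name` is a hypothetical helper, and keeping the original file extension is an assumption about how file_renamer.py behaves.

```python
from pathlib import Path

DEFAULT_FORMAT = "{series} S{season:02d}E{episode:02d}"


def plex_name(original, series, season, episode, rename_format=DEFAULT_FORMAT):
    """Build the new file name, keeping the original extension."""
    stem = rename_format.format(series=series, season=season, episode=episode)
    return stem + Path(original).suffix


print(plex_name("title_t03.mkv", "Game Of Thrones", 2, 5))
# prints: Game Of Thrones S02E05.mkv
```

The `:02d` part zero-pads season and episode numbers to two digits.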

Installation

1. Clone the repository
2. Set up the Python virtual environment (already configured)
3. Install required packages:

pip install -r requirements.txt

Configuration

Set the appropriate API key environment variables:

# Google Gemini
export GOOGLE_API_KEY="your-google-api-key"

# OpenAI
export OPENAI_API_KEY="your-openai-api-key"

# Perplexity
export PERPLEXITY_API_KEY="your-perplexity-api-key"
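
If you call the modules from your own script, a quick check that the right variable is set can give a clearer error than a failed API call. This is a minimal sketch: `require_api_key` is a hypothetical helper, and only the variable names come from the list above.

```python
import os

# Mapping taken from the variable names listed above.
API_KEY_VARS = {
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "perplexity": "PERPLEXITY_API_KEY",
}


def require_api_key(provider, environ=os.environ):
    """Return the API key for the provider or raise with a helpful message."""
    var = API_KEY_VARS[provider]
    key = environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before using --provider {provider}")
    return key
```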

Usage

Extract Subtitles from Video

python subtitle_extractor.py /path/to/video.mkv \
  --output-dir ./subtitles

Identify Episode from Video File

python episode_identifier.py /path/to/video.mkv \
  --series-name "Game of Thrones" \
  --provider google

Identify Episode from Pre-extracted Subtitles

python episode_identifier.py \
  --series-name "Game of Thrones" \
  --subtitles-json subtitles.json \
  --provider openai

Batch Processing an Entire Season

python batch_identifier.py /path/to/episodes/directory \
  --series-name "Game of Thrones" \
  --provider google \
  --rename

Command-line Options

subtitle_extractor.py

input_file: Path to the video file to extract subtitles from
--max-frames: Maximum number of subtitle events to extract
--subtitle-track: Subtitle track index to use (default: finds English automatically, or uses first subtitle track if not found)
--offset: Skip first N minutes (default: 0)
--scan-duration: Minutes to scan from offset (default: 15)
--output-dir: Directory to save JSON output file
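
The default --subtitle-track behavior (prefer English, otherwise take the first track) can be sketched against ffprobe's JSON output. The helper name is hypothetical, and treating the track index as the position among subtitle streams is an assumption; subtitle_extractor.py may count differently.

```python
import json

# JSON like this can be produced with, for example:
#   ffprobe -v error -select_streams s \
#     -show_entries stream=index,codec_name:stream_tags=language -of json video.mkv


def pick_subtitle_track(ffprobe_json, preferred_language="eng"):
    """Return the position of the preferred subtitle stream, else 0, else None."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return None  # no subtitle streams at all
    for position, stream in enumerate(streams):
        if stream.get("tags", {}).get("language") == preferred_language:
            return position
    return 0  # fall back to the first subtitle track


sample = json.dumps({
    "streams": [
        {"index": 2, "codec_name": "hdmv_pgs_subtitle", "tags": {"language": "fre"}},
        {"index": 3, "codec_name": "hdmv_pgs_subtitle", "tags": {"language": "eng"}},
    ]
})
print(pick_subtitle_track(sample))
# prints: 1
```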

episode_identifier.py

input_file (optional): Path to video file (required if --subtitles-json not provided)
--series-name (required): Name of the TV series
--provider: LLM provider (default: google). Options: google, openai, perplexity
--model: Model name. Defaults: gemini-2.5-flash (google), gpt-4 (openai), sonar (perplexity)
--subtitles-json: Path to JSON file with pre-extracted subtitles (alternative to video input)
--max-frames: Maximum number of subtitle events to process (default: 10)
--subtitle-track: Subtitle track index to use (default: 0)
--offset: Skip first N minutes (default: 0)
--scan-duration: Minutes to scan from offset (default: 15)
--output-dir: Directory to save JSON output file

batch_identifier.py

input_dir: Directory containing video files to process
--series-name (required): Name of the TV series
--size-threshold: File size similarity threshold for filtering episodes (default: 0.7)
--provider: LLM provider (default: google). Options: google, openai, perplexity
--model: Model name. Defaults: gemini-2.5-flash (google), gpt-4 (openai), sonar (perplexity)
--max-frames: Maximum number of subtitle events to process (default: 10)
--subtitle-track: Subtitle track index to use (finds English automatically, or uses first subtitle track if not found)
--offset: Skip first N minutes (default: 0)
--scan-duration: Minutes to scan from offset (default: 15)
--output-dir: Directory to save JSON output files
--rename: Rename identified episodes to match Plex naming format
--rename-format: Format string for renamed files (default: {series} S{season:02d}E{episode:02d})
--skip-already-named: Skip files that are already in the expected naming format (only when --rename is specified)
--log-file: Path to a file to write detailed debug logs to
--verbose, -v: Enable verbose output
--debug: Enable debug output to console
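
A check in the spirit of --skip-already-named might compare the file stem against the default naming pattern. This is only a sketch under that assumption; `is_already_named` is a hypothetical helper and it does not account for a custom --rename-format.

```python
import re
from pathlib import Path


def is_already_named(path, series):
    """True if the file stem looks like '<series> S<season>E<episode>'."""
    pattern = re.compile(
        rf"^{re.escape(series)} S\d{{2,}}E\d{{2,}}$", re.IGNORECASE
    )
    return bool(pattern.match(Path(path).stem))


print(is_already_named("Season 02/Game Of Thrones S02E05.mkv", "Game Of Thrones"))
print(is_already_named("Season 02/title_t03.mkv", "Game Of Thrones"))
```

Skipping files that already match avoids spending OCR and LLM calls on episodes that were identified in an earlier run.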

file_renamer.py

--batch-results (required): Path to batch_results.json from batch_identifier
--series-name (required): Name of the TV series
--rename-format: Format string for renamed files. Available placeholders: {series}, {season}, {episode} (default: {series} S{season:02d}E{episode:02d})
--dry-run: Show what would be renamed without actually renaming
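
The idea behind --dry-run is to print the planned renames without touching any file. A minimal sketch with pathlib follows; `rename_files` and its plan format are hypothetical and do not reflect the layout of batch_results.json.

```python
from pathlib import Path


def rename_files(plan, dry_run=True):
    """plan: list of (source_path, new_name) pairs. Returns the target paths."""
    targets = []
    for source, new_name in plan:
        source = Path(source)
        target = source.with_name(new_name)  # same directory, new file name
        if target.exists() and target != source:
            raise FileExistsError(f"Refusing to overwrite {target}")
        action = "Would rename" if dry_run else "Renaming"
        print(f"{action}: {source.name} -> {target.name}")
        if not dry_run:
            source.rename(target)
        targets.append(target)
    return targets
```

Running once with `dry_run=True` and reviewing the output before the real run is a cheap safeguard against a misidentified episode.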

Components

subtitle_extractor.py

Handles extraction of subtitles from video files:

- Detects subtitle tracks using ffprobe
- Extracts frames for each subtitle event
- Performs OCR on frames using Tesseract
- Filters gibberish using character pattern analysis
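
Gibberish filtering by character patterns could be approximated as below. This heuristic is an illustration only: the function name, the 0.6 letter ratio and the vowel check are assumptions, not the rules used in subtitle_extractor.py.

```python
def looks_like_gibberish(text, min_letter_ratio=0.6):
    """Heuristic: flag OCR output that is mostly non-letters or has no vowels."""
    stripped = text.strip()
    if len(stripped) < 2:
        return True
    letters = sum(ch.isalpha() for ch in stripped)
    non_space = sum(not ch.isspace() for ch in stripped)
    if letters / non_space < min_letter_ratio:
        return True
    return not any(ch in "aeiouyAEIOUY" for ch in stripped)


for line in ["Sorry, Your Grace.", "|_ ~~ ;:'", "No."]:
    print(repr(line), looks_like_gibberish(line))
```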

episode_identifier.py

Identifies TV show episodes from subtitles:

- Loads subtitles from video or JSON file
- Sends subtitles to LLM with identifying prompt
- Parses LLM response for season/episode information
- Supports multiple LLM providers
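
Parsing the season and episode out of an LLM reply might be done along these lines. The reply formats handled here (a JSON object, or an SxxEyy token in free text) are assumptions for illustration; the prompt and response format used by episode_identifier.py are not documented here.

```python
import json
import re


def parse_episode(response_text):
    """Return (season, episode) from an LLM reply, or None if nothing usable is found."""
    # First choice: a JSON object somewhere in the reply.
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            return int(data["season"]), int(data["episode"])
        except (ValueError, KeyError, TypeError):
            pass  # fall through to the plain-text pattern
    # Fallback: an SxxEyy token in free text.
    match = re.search(r"S(\d{1,2})E(\d{1,3})", response_text, re.IGNORECASE)
    if match:
        return int(match.group(1)), int(match.group(2))
    return None


print(parse_episode('{"season": 1, "episode": 2}'))
# prints: (1, 2)
```

Returning `None` instead of guessing lets the caller skip renaming when the model's answer cannot be interpreted.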

batch_identifier.py

Processes multiple video files:

- Discovers episode files by size similarity
- Processes each file with episode_identifier
- Outputs results in JSON format
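
One way to read the size-similarity filter is "keep files whose size is at least --size-threshold times the largest file". The sketch below implements that reading; it is an assumption, and batch_identifier.py may compare sizes differently.

```python
def filter_episode_files(sizes, size_threshold=0.7):
    """sizes: dict of file name -> size in bytes. Keep files close to the largest one."""
    if not sizes:
        return []
    largest = max(sizes.values())
    return sorted(
        name for name, size in sizes.items() if size >= size_threshold * largest
    )


sizes = {"ep1.mkv": 1000, "ep2.mkv": 900, "extras.mkv": 200}
print(filter_episode_files(sizes))
# prints: ['ep1.mkv', 'ep2.mkv']
```

With the default of 0.7, extras and trailers that are much smaller than a full episode are left out of the batch.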

Requirements

System Dependencies

- ffmpeg - For video processing
- ffprobe - For reading video metadata (comes with ffmpeg)
- tesseract-ocr - For optical character recognition (OCR) on subtitle images
- ollama - For OCR using glm-ocr
- mkvtoolnix - For extracting VobSub subtitles (mkvextract and mkvmerge)

Install on Ubuntu/Debian:

sudo apt-get install ffmpeg tesseract-ocr mkvtoolnix
curl -fsSL https://ollama.com/install.sh | sh

Install on macOS:

brew install ffmpeg tesseract mkvtoolnix
curl -fsSL https://ollama.com/install.sh | sh

Python Dependencies

See requirements.txt - includes:

- opencv-python-headless - For video frame processing
- pytesseract - Python interface to Tesseract OCR
- openai - OpenAI API client
- google-genai - Google Generative AI client
- ollama - For OCR using glm-ocr

Example Output

{
  "season": 1,
  "episode": 2,
  "subtitles": [
    "Sorry, Your Grace.",
    "My deepest apologies.",
    "No. No, Your Grace."
  ],
  "provider": "google",
  "model": "gemini-2.5-flash"
}

Notes

- PGS subtitles are image-based, so OCR quality depends on video resolution and subtitle clarity.
- In my tests, gemini-3-pro-preview has been the best model at identifying episodes consistently and correctly.
- About 5 minutes of subtitle input has been sufficient to identify GOT episodes in my testing.

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Project details

Maintainers

nramkumar

Release history

This version

0.1.7

Mar 29, 2026

0.1.6

Feb 1, 2026

0.1.5

Jan 13, 2026

0.1.4

Jan 11, 2026

0.1.3

Jan 3, 2026

0.1.2

Jan 3, 2026

0.1.1

Jan 3, 2026

0.1.0

Dec 20, 2025

Download files

Source Distribution

tvidentify-0.1.7.tar.gz (48.0 kB)

Uploaded Mar 29, 2026 Source

Built Distribution

tvidentify-0.1.7-py3-none-any.whl (38.5 kB)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file tvidentify-0.1.7.tar.gz.

File metadata

Download URL: tvidentify-0.1.7.tar.gz
Upload date: Mar 29, 2026
Size: 48.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tvidentify-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`c6307d9218da02802e93f2f38a381ca086c52d67035f59b7ffcb22ab8427e9b3`
MD5	`8ddb92fbf76b4b061e0b1318f26f0a35`
BLAKE2b-256	`f025a86b4d89f05e8f07947b790f711210ef41390086733f3f333ffae0e07f50`
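
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch; `sha256_of` is a hypothetical helper, and the file path in the usage comment is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the table above.
EXPECTED = "c6307d9218da02802e93f2f38a381ca086c52d67035f59b7ffcb22ab8427e9b3"

# Usage, assuming the archive is in the current directory:
#   if sha256_of("tvidentify-0.1.7.tar.gz") != EXPECTED:
#       raise SystemExit("SHA256 mismatch, do not install this file")
```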

Provenance

The following attestation bundles were made for tvidentify-0.1.7.tar.gz:

Publisher: wheels.yml on ram-nat/tvidentify

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tvidentify-0.1.7.tar.gz
- Subject digest: c6307d9218da02802e93f2f38a381ca086c52d67035f59b7ffcb22ab8427e9b3
- Sigstore transparency entry: 1191711404
- Sigstore integration time: Mar 29, 2026
Source repository:
- Permalink: ram-nat/tvidentify@66fdbc902d0fff1df81482693f002042bdc3ea4e
- Branch / Tag: refs/tags/v0.1.7
- Owner: https://github.com/ram-nat
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: wheels.yml@66fdbc902d0fff1df81482693f002042bdc3ea4e
- Trigger Event: push

File details

Details for the file tvidentify-0.1.7-py3-none-any.whl.

File metadata

Download URL: tvidentify-0.1.7-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 38.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tvidentify-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`46fec9fc7eba7105dd8a531f5803b20935fa98be32b64f9c5194f429f3416b2c`
MD5	`a4b2ae63abfbbf9276fd07d46629f247`
BLAKE2b-256	`77469518b426e80f099b803b65d3cbbbd220dd9812641a506f0fc1c2fd173569`

Provenance

The following attestation bundles were made for tvidentify-0.1.7-py3-none-any.whl:

Publisher: wheels.yml on ram-nat/tvidentify

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tvidentify-0.1.7-py3-none-any.whl
- Subject digest: 46fec9fc7eba7105dd8a531f5803b20935fa98be32b64f9c5194f429f3416b2c
- Sigstore transparency entry: 1191711406
- Sigstore integration time: Mar 29, 2026
Source repository:
- Permalink: ram-nat/tvidentify@66fdbc902d0fff1df81482693f002042bdc3ea4e
- Branch / Tag: refs/tags/v0.1.7
- Owner: https://github.com/ram-nat
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: wheels.yml@66fdbc902d0fff1df81482693f002042bdc3ea4e
- Trigger Event: push
