A command line utility for converting Blu-ray subs to SRT or ASS using AI Language Models.
pgsocr
Convert Blu-ray SUP subtitles to SRT or ASS using AI Language Models or Tesseract.
Prerequisites
If you plan to use Tesseract, install it first by following: https://tesseract-ocr.github.io/tessdoc/Installation.html
Install every language pack you need and note the location of the 'tessdata' directory.
Then set the TESSDATA_PREFIX environment variable to the location of that 'tessdata' directory.
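A minimal sketch of setting the variable in a POSIX shell. The path /usr/share/tessdata is only an assumed location; use the 'tessdata' directory you noted during the Tesseract installation.

```shell
# Assumed location; replace with your own 'tessdata' directory.
export TESSDATA_PREFIX="/usr/share/tessdata"

# List the installed language packs (files named <lang>.traineddata), if any.
ls "$TESSDATA_PREFIX"/*.traineddata 2>/dev/null \
    || echo "No language packs found in $TESSDATA_PREFIX"
```

Add the export line to your shell profile if you want the setting to persist across sessions.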
Installation
Download the latest .whl from the Releases tab and install it with pip.
Make sure to install the [lm] extras if you want to use AI models.
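As a sketch, the install step might look like the following. The wheel name is the one from the 0.1.2 release and is assumed to sit in the current directory; adjust it to the file you actually downloaded.

```shell
# Wheel name from the 0.1.2 release; change it to match your download.
WHEEL="pgsocr-0.1.2-py3-none-any.whl"

if [ -f "$WHEEL" ]; then
    # The quotes keep the shell from interpreting the square brackets.
    # Drop "[lm]" if you only plan to use Tesseract.
    pip install "./${WHEEL}[lm]"
else
    echo "Download $WHEEL from the Releases tab first."
fi
```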
Usage:
Options:
-i: Specify the path to the SUP file or, for batch mode, to a directory of SUP files.
-o: Specify the path to the output directory.
-m: Specify the OCR engine to use (florence2 or tesseract).
-l: (Tesseract only) Specify the languages to use, separated by spaces. Defaults to English.
-b: (Tesseract only) Specify a custom character blacklist for Tesseract. Enter an empty string to turn off the default blacklist.
-f: Specify the output format (SRT or ASS). ASS output also supports subtitle positioning.
Note: Florence2 is more accurate than Tesseract but far more resource-intensive, and it works only for English. A recent GPU with a large amount of VRAM is recommended.
Examples:
# Single file
pgsocr -i /path/to/file -o /path/to/outputdir -m tesseract -l eng jpn
# Multiple files in a directory
pgsocr -i /path/to/inputdir -o /path/to/outputdir -m florence2
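The -f and -b options can be combined with the commands above in the same way. The sketch below is an illustration only: the paths are placeholders, and the lowercase spelling of the -f value is an assumption modelled on the -m values, so check it against the tool's own help output.

```shell
# Placeholder paths; replace with your own.
INPUT="/path/to/inputdir"
OUTPUT="/path/to/outputdir"

if command -v pgsocr >/dev/null 2>&1; then
    # Batch mode with Tesseract (English and Japanese), ASS output,
    # and an empty string for -b to turn off the default blacklist.
    # Assumption: the format value is written in lowercase ("ass").
    pgsocr -i "$INPUT" -o "$OUTPUT" -m tesseract -l eng jpn -f ass -b ""
else
    echo "pgsocr is not installed"
fi
```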
File details
Details for the file pgsocr-0.1.2.tar.gz.
File metadata
- Download URL: pgsocr-0.1.2.tar.gz
- Upload date:
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.17.3 CPython/3.12.4 Linux/6.10.2-arch1-2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 33d2cde26d0110574654f539816a44fc4b516a557ad961acba73a42d3107ac3e
MD5 | 60228538c95380c2854d6bf7c1d631d4
BLAKE2b-256 | 6e443b6143a7670b64d0486111455d0876ea9fdad7192dd564870e466b71aef7
File details
Details for the file pgsocr-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pgsocr-0.1.2-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.17.3 CPython/3.12.4 Linux/6.10.2-arch1-2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e99148eb863f9a2d0d219fa0bd0942d5499f287b92a2da475a010532c42986b
MD5 | 2e9be3626d562a85703b27ba4571accb
BLAKE2b-256 | 844042de015b3591d35d23087e033373ad0f48c678f1b0302ded943abb60bf9e