
A command line utility for converting Blu-ray subs to SRT or ASS using AI Language Models.

Reason this release was yanked: Broken MiniCPM models

Project description

pgsocr

Convert Blu-ray SUP (PGS) subtitles to SRT or ASS using AI language models or Tesseract.

Prerequisites

If you plan to use Tesseract, install it first: https://tesseract-ocr.github.io/tessdoc/Installation.html
Install the language packs for every language you want to recognize, and note down the location of the 'tessdata' directory.
Set the TESSDATA_PREFIX environment variable to that 'tessdata' directory.
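Before running pgsocr with `-m tesseract`, it can help to confirm that the language packs are where Tesseract will look for them. The sketch below is a hypothetical helper, `check_tessdata` (not part of pgsocr), written for POSIX sh. It assumes the Tesseract 4/5 convention of one `<code>.traineddata` file per language inside the directory named by `TESSDATA_PREFIX`; the fallback path `/usr/share/tessdata` is only an example and differs between distributions.

```shell
# Example location only: distributions install 'tessdata' in different places,
# so replace this default with the directory you noted down above.
export TESSDATA_PREFIX="${TESSDATA_PREFIX:-/usr/share/tessdata}"

# check_tessdata LANG...  (hypothetical helper, not part of pgsocr)
# Prints every language whose <code>.traineddata file is missing from
# $TESSDATA_PREFIX, then returns 1 if any were missing and 0 otherwise.
check_tessdata() {
    status=0
    for lang in "$@"; do
        if [ ! -f "$TESSDATA_PREFIX/$lang.traineddata" ]; then
            echo "missing language pack: $TESSDATA_PREFIX/$lang.traineddata" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_tessdata eng jpn && echo "all language packs found"` checks the two languages used in the examples below. Running `tesseract --list-langs` is another way to see which languages Tesseract itself can find.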

Installation

Download the latest .whl from the Releases tab and install it with pip.
Install the [lm] extras if you want to use the AI models.
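A minimal sketch of that step for POSIX sh or bash, assuming the wheel has been saved to the current directory. pip accepts extras appended to a local wheel path; the `[lm]` part is single-quoted so that no shell treats the brackets as a glob pattern (zsh, for instance, would otherwise complain about "no matches found"). The `install_with_lm` function and its `PIP` override are hypothetical conveniences, not part of pgsocr, and the wheel file name in the example is only illustrative.

```shell
# install_with_lm WHEEL  (hypothetical helper, not part of pgsocr)
# Installs a downloaded pgsocr wheel together with its [lm] extras.
# Set PIP to use a different installer command, e.g. PIP="python3 -m pip".
install_with_lm() {
    # ${PIP:-pip} is left unquoted on purpose so "python3 -m pip" splits into
    # separate words; '[lm]' is single-quoted so it is never globbed.
    ${PIP:-pip} install "$1"'[lm]'
}

# Example (file name is illustrative; use the latest release):
# install_with_lm ./pgsocr-0.1.3-py3-none-any.whl
```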

Usage:

Options:
-i: Path to the SUP file or, in batch mode, to a directory.
-o: Path to the output directory.
-m: OCR engine to use: florence2, minicpmv, or tesseract.
-l: (Tesseract only) Languages to use, separated by spaces. Defaults to English.
-b: (Tesseract only) Custom character blacklist for Tesseract. Pass an empty string to turn off the default blacklist.
-f: Output format, SRT or ASS. ASS output also supports subtitle positioning.

Note: The AI models are more accurate than Tesseract but far more resource-intensive. A recent GPU with plenty of VRAM is recommended.

Examples:
# Single file
pgsocr -i /path/to/file -o /path/to/outputdir -m tesseract -l eng jpn

# Multiple files in a directory
pgsocr -i /path/to/inputdir -o /path/to/outputdir -m florence2
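The options can also be combined. The sketch below wraps one such combination in a hypothetical shell function, `ocr_to_ass` (not part of pgsocr): Tesseract with English and Japanese, the default blacklist turned off with an empty `-b` argument, and ASS output so that subtitle positioning is kept. The format value is written as `ASS` to match the option list above; the exact case the tool accepts is an assumption. The `PGSOCR` variable exists only so the command can be swapped out, for example for a dry run.

```shell
# ocr_to_ass INPUT OUTDIR  (hypothetical helper, not part of pgsocr)
# INPUT may be a single SUP file or a directory (batch mode).
# Set PGSOCR=echo to print the arguments instead of running pgsocr.
ocr_to_ass() {
    # -l takes several space-separated Tesseract language codes;
    # -b "" turns off the default character blacklist;
    # -f ASS keeps subtitle positioning in the output.
    ${PGSOCR:-pgsocr} -i "$1" -o "$2" -m tesseract -l eng jpn -b "" -f ASS
}

# Example:
# ocr_to_ass /path/to/file /path/to/outputdir
```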

Download files

Source Distribution

pgsocr-0.1.3.tar.gz (10.9 kB)

Built Distribution

pgsocr-0.1.3-py3-none-any.whl (12.5 kB)

File details

Details for the file pgsocr-0.1.3.tar.gz.

File metadata

  • Download URL: pgsocr-0.1.3.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.18.1 CPython/3.12.5 Linux/6.10.6-arch1-1

File hashes

Hashes for pgsocr-0.1.3.tar.gz
Algorithm Hash digest
SHA256 7f758467769f0f71fe6c3d0a68f1a214b2a6536fbebe2364789065e19f72e747
MD5 4c5743924940a5e50da84e297e7f7b8b
BLAKE2b-256 7c2424299735e942e3aab514f309e052e0a151a49ea523822d2ab041e0709dc6

File details

Details for the file pgsocr-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: pgsocr-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.18.1 CPython/3.12.5 Linux/6.10.6-arch1-1

File hashes

Hashes for pgsocr-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 73cfb308c58d8dc0f175a09728bf53d6db44e9bd9df735efc7c9fe2b6e06c6de
MD5 3ae97764b92deab4243e9c45ebbb30de
BLAKE2b-256 27089a6643e27685a3f71a4557583c6358745133d435f6504a74c1a3982477e8
