Subtitles (VTT, SRT, PDF, DOCX, HTML, images, etc.) to text converter with a GUI, great for preprocessing files to feed to LLMs

subtitles2text

Description

Strip the metadata from subtitle files (VTT, SRT, PDF) and from any file supported by Docling (DOCX, PPTX, XLSX, PNG/JPG/JPEG images, HTML/XHTML web pages), leaving only the text content. This is especially useful for preparing input for generative AI models such as LLMs and GPTs.
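To make the idea of "leaving only the text content" concrete, here is a minimal sketch of stripping cue numbers, timestamps, and inline tags from SRT or WebVTT content. It is not the package's actual implementation; the function name `subtitles_to_text` and both regular expressions are assumptions made for illustration.

```python
import re

# Matches SRT/VTT cue timing lines such as
# "00:00:01,000 --> 00:00:04,000" or "00:01.000 --> 00:04.000 align:start".
TIMING_RE = re.compile(
    r"^\s*(?:\d+:)?\d{1,2}:\d{2}[.,]\d{1,3}\s+-->\s+(?:\d+:)?\d{1,2}:\d{2}[.,]\d{1,3}"
)
# Matches inline markup such as <i>, </b>, <c.yellow> or <00:00:01.500>.
TAG_RE = re.compile(r"<[^>]+>")


def subtitles_to_text(raw: str) -> str:
    """Return only the spoken text from SRT or WebVTT content (hypothetical helper)."""
    # Drop a possible BOM and normalize line endings.
    raw = raw.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    lines_out = []
    # Cues are separated by blank lines in both formats.
    for block in re.split(r"\n\s*\n", raw):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        # Locate the timing line; anything before it is a cue number or identifier.
        idx = next((i for i, ln in enumerate(lines) if TIMING_RE.match(ln)), None)
        if idx is None:
            # Blocks without a timing line (WEBVTT header, NOTE, STYLE) are skipped.
            continue
        for ln in lines[idx + 1:]:
            text = TAG_RE.sub("", ln).strip()
            # Skip empty lines and consecutive duplicates.
            if text and (not lines_out or lines_out[-1] != text):
                lines_out.append(text)
    return "\n".join(lines_out)
```

Calling `subtitles_to_text(open("talk.srt", encoding="utf-8").read())` on a hypothetical file `talk.srt` would return the dialogue lines joined by newlines, ready to paste into a prompt.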

Installation

pip install subtitles2text

Usage

subtitles2text

This will launch a Tk GUI where you can select the files you want to convert.

The app supports OCR.
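For scripted use outside the GUI, the non-subtitle formats could in principle be converted by calling Docling directly. The sketch below is an assumption about how such a pipeline might be wired, not a description of this package's internals: `pick_backend`, `docling_to_text`, and the `SUBTITLE_EXTS` set are hypothetical names, and only `DocumentConverter.convert` and `export_to_markdown` come from Docling itself.

```python
from pathlib import Path

# Extensions treated as plain subtitle formats in this sketch (an assumption).
SUBTITLE_EXTS = {".srt", ".vtt"}


def pick_backend(path: str) -> str:
    """Return "subtitle" for SRT/VTT files and "docling" for everything else."""
    return "subtitle" if Path(path).suffix.lower() in SUBTITLE_EXTS else "docling"


def docling_to_text(path: str) -> str:
    """Convert a document with Docling and return its content as Markdown text."""
    # Imported lazily so this module still loads when Docling is not installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(path)
    return result.document.export_to_markdown()
```

Markdown export is used here because it keeps headings and tables readable as plain text; whether the app itself exports Markdown or another format is not stated in this description.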

License

MIT License.

Author

This app was coded using Roo Code with Gemini 2.0 Flash Thinking Exp 01-21, following the architecture specified by Stephen Karl Larroque.

Download files

Source Distribution

subtitles2text-0.0.3.tar.gz (6.6 kB view details)

Uploaded Source

Built Distribution

subtitles2text-0.0.3-py3-none-any.whl (5.6 kB view details)

Uploaded Python 3

File details

Details for the file subtitles2text-0.0.3.tar.gz.

File metadata

  • Download URL: subtitles2text-0.0.3.tar.gz
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for subtitles2text-0.0.3.tar.gz
Algorithm Hash digest
SHA256 98aa69c687f9f4b3e94a91a6350172856f1178b87193dd5b42cb953db178879b
MD5 d9615af45a333e9b8d63ed810ad6d8cc
BLAKE2b-256 94da70da3b0bfb3e15d89eeb7cc44173ea01e7591014510459885517cfcb2a89

File details

Details for the file subtitles2text-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: subtitles2text-0.0.3-py3-none-any.whl
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for subtitles2text-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a5dbb7850d5cd7e043632a95bc7730b1d45ee6ff96f3d724d477710da9634418
MD5 3b0787e1e0a8a1eda7248b18777b7936
BLAKE2b-256 c249f2c8b87d723e71a0c4b3e16d839b923f4eee19ef2076066e38871cd790d0
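A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of` and `matches_published` are made up for illustration.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("subtitles2text-0.0.3-py3-none-any.whl", "a5dbb7850d5cd7e043632a95bc7730b1d45ee6ff96f3d724d477710da9634418")` should return `True` for an unmodified copy of the wheel.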
