Subtitles (VTT, SRT, PDF, DOCX, HTML, images, etc.) to text converter with a GUI, great for preprocessing files to feed to LLMs
Project description
subtitles2text
Description
Convert subtitle files (VTT, SRT, PDF) and any file supported by Docling (DOCX, PPTX, XLSX, PNG/JPG/JPEG images, HTML/XHTML web pages) to plain text, stripping all metadata so that only the text content remains. This is especially useful for preparing input for generative AI models such as LLMs and GPTs.
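To illustrate what "stripping metadata" means for subtitle files, here is a minimal sketch in Python. It is not the package's actual implementation: the helper name `subtitles_to_text` and both regular expressions are assumptions made for this example. It only handles SRT and WebVTT input, and drops cue numbers, timing lines, the WEBVTT header, and inline tags.

```python
import re

# Cue timing line, e.g. "00:00:01,000 --> 00:00:04,000" (SRT) or
# "00:01.000 --> 00:04.000 align:start" (WebVTT, hours optional).
TIMING = re.compile(
    r"^\s*(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}"
)
# Inline markup such as <i>...</i> or <v Speaker>.
TAG = re.compile(r"<[^>]+>")


def subtitles_to_text(raw: str) -> str:
    """Hypothetical helper: keep only the cue text of SRT/WebVTT content."""
    texts = []
    normalized = raw.replace("\r\n", "\n").strip()
    # In both formats, cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if TIMING.match(line):
                # Everything after the timing line is cue text.
                for text_line in lines[i + 1:]:
                    cleaned = TAG.sub("", text_line).strip()
                    if cleaned:
                        texts.append(cleaned)
                break
        # Blocks without a timing line (WEBVTT header, NOTE, STYLE) are skipped.
    return "\n".join(texts)


sample = "1\n00:00:01,000 --> 00:00:04,000\nHello <i>world</i>\n\n2\n00:00:05,000 --> 00:00:06,500\nSecond line\n"
print(subtitles_to_text(sample))
```

Working block by block keeps the logic short: any block that has no timing line is treated as metadata and discarded, so no separate rule is needed for headers or comments.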
Installation
pip install subtitles2text
Usage
subtitles2text
This will launch a Tk GUI where you can select the files you want to convert.
The app supports OCR.
License
MIT License.
Author
This app was coded using Roo Code with Gemini 2.0 Flash Thinking Exp 01-21, following the architecture specified by Stephen Karl Larroque.
File details
Details for the file subtitles2text-0.0.3.tar.gz.
File metadata
- Download URL: subtitles2text-0.0.3.tar.gz
- Upload date:
- Size: 6.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98aa69c687f9f4b3e94a91a6350172856f1178b87193dd5b42cb953db178879b |
| MD5 | d9615af45a333e9b8d63ed810ad6d8cc |
| BLAKE2b-256 | 94da70da3b0bfb3e15d89eeb7cc44173ea01e7591014510459885517cfcb2a89 |
File details
Details for the file subtitles2text-0.0.3-py3-none-any.whl.
File metadata
- Download URL: subtitles2text-0.0.3-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a5dbb7850d5cd7e043632a95bc7730b1d45ee6ff96f3d724d477710da9634418 |
| MD5 | 3b0787e1e0a8a1eda7248b18777b7936 |
| BLAKE2b-256 | c249f2c8b87d723e71a0c4b3e16d839b923f4eee19ef2076066e38871cd790d0 |
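The hash digests listed for each file can be used to check a manually downloaded archive. The following is a minimal sketch using only Python's standard `hashlib` module. The function name `sha256_of_file` is an assumption made for this example, and the file path in the usage comment assumes the archive was saved in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hypothetical helper: return the hex SHA256 digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the source distribution was downloaded to the current directory):
# expected = "98aa69c687f9f4b3e94a91a6350172856f1178b87193dd5b42cb953db178879b"
# print(sha256_of_file("subtitles2text-0.0.3.tar.gz") == expected)
```

If the comparison fails, the download is incomplete or was altered, and the file should not be installed.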