PyCaptions, parser and converter for captions formats
PyCaptions
PyCaptions is a caption reading/writing library.
Why LGPL-3.0? It ensures that the source code of the library always stays under the same licence and cannot be closed-sourced. The conditions of this licence apply only to the library itself and its modifications. We recommend contributing to the project if you are making modifications, unless they are drastic and specific to your case.
Installation
- PIP
pip install --upgrade pycaptions
- Source
git clone https://github.com/adfreelife/PyCaptions.git
cd PyCaptions
python setup.py install
Supported Formats
- SubRip (SRT) (reader + writer)
- MicroDVD (SUB) (reader + writer*)
- Timed Text Markup Language (TTML, DFXP, XML) (reader* + writer*)
- Web Video Text Tracks Format (VTT) (reader + writer*)
reader* - does not read styling/layout/metadata
writer* - does not write styling/layout/metadata
Future plans
- add writers to all supported formats
- auto-fit lines into multilines or split captions blocks into two parts
- add support for more formats
- Synchronized Accessible Media Interchange (SAMI)
- Universal Subtitle Format (USF)
- LyRiCs (LRC)
- open an issue with "enhancement" label for more
Examples
Read the wiki.
Generic from file name
from pycaptions import Captions

with Captions("tests/test.en.srt") as captions:
    captions.saveVTT("test")
Generic from file stream
from pycaptions import Captions

with open("tests/test.en.srt", encoding="UTF-8") as f:
    captions = Captions(f)
    # or: captions = Captions()
    #     captions.read(f)
    captions.saveVTT("test")
Generic from string
from pycaptions import Captions

srt = """1
00:00:00,500 --> 00:00:02,000
This is a test file
"""
captions = Captions(srt)
# or: captions = Captions()
#     captions.detect(srt)
captions.saveVTT("test")
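For readers unfamiliar with the two formats, here is a minimal standard-library sketch of what an SRT to VTT conversion involves for plain-text cues. It is not PyCaptions' implementation; `srt_to_vtt` and `TIMESTAMP` are hypothetical names used only for illustration. It relies on two format facts: a WebVTT file starts with a `WEBVTT` header, and WebVTT timestamps use a period before the milliseconds where SubRip uses a comma.

```python
import re

# Matches SubRip timestamps such as 00:00:00,500 (hypothetical helper, not part of PyCaptions)
TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert plain-text SRT cues to WebVTT; styling and layout are ignored."""
    # Swap the comma before the milliseconds for a period
    body = TIMESTAMP.sub(r"\1.\2", srt_text.strip())
    # A WebVTT file begins with the WEBVTT header followed by a blank line
    return "WEBVTT\n\n" + body + "\n"


srt = """1
00:00:00,500 --> 00:00:02,000
This is a test file
"""

print(srt_to_vtt(srt))
```

The library call above handles detection, parsing and writing for you; the sketch only shows why the same cue looks slightly different in the two files.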
Specific reader
Have the same functions as generic, except
from pycaptions import SubRip, detectSRT

with open("tests/test.en.srt", encoding="UTF-8") as f:
    if detectSRT(f):  # or SubRip.detect(f)
        captions = SubRip().read(f)
        captions.saveVTT("test")
Multilingual
from pycaptions import Captions

# if the format supports multiple languages
with Captions("tests/test.ttml") as captions:
    # first line will be in English, second one in Spanish
    captions.saveSRT("test", ["en", "es"], lines=1)  # recommended to specify lines=1

# if you have multiple files and want to make a multilingual one
with Captions("tests/test.en.srt") as captions:
    with Captions("tests/test.es.srt") as captions2:
        # only subtitle text and comments (if the format supports them) are added
        captions += captions2
    # first line will be in English, second one in Spanish
    captions.save("test", ["en", "es"], lines=1)  # recommended to specify lines=1
Combine files
from pycaptions import Captions

with Captions("tests/test.en.srt") as captions:
    captions.joinFile("tests/test.en.srt", add_end_time=True)
    captions.save("test")
Changelog
v0.6.0
Release date: 2024-01-26
Changes:
- Added support for inline style conversion for MicroDVD
- Added `style` argument to readers, possible values `None` (no styling), default `full` (converts inline styles only for now)
- Added `lines` argument to readers, possible values: default `-1` (preserves original), `0` (automatically determines number of lines, works only with `style=None` for now), `1` (fits everything in one line), `n` (positive integer bigger than 1, fits text into `n` lines, works only with `style=None` for now)
- Removed `no_styling` argument, replaced by `style=None`
- Renamed `Block.getLines` to `Block.get_lines`
- TTML writer now writes multilingual files the same way as other writers by default, add `mark_language_type=True` to make it write the same as before
- Added dependency for `webcolors` to transform web color names to hex colors
- Added decorators `@captionsDetector`, `@captionsReader`, `@captionsWriter` for better code structure
- Added `MicroTime.recalculate` to recalculate time into the right values (e.g. 99min -> 1h 39min)
- Moved `CaptionsFormat.checkContent` and `CaptionsFormat.getGenerator` to decorators that used them
- Added `Captions.detectors` and improved `Captions.get_format` function
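The `MicroTime.recalculate` entry (99min -> 1h 39min) describes carrying overflow from smaller time units into larger ones. The sketch below illustrates that idea with `divmod`; it is an assumption about the general technique, not the library's code, and `normalize_time` is a hypothetical name that does not exist in PyCaptions.

```python
def normalize_time(hours=0, minutes=0, seconds=0, milliseconds=0):
    """Carry overflow upwards so every unit ends up within its usual range."""
    # 1000 ms roll over into one second
    carry, milliseconds = divmod(milliseconds, 1000)
    # 60 s roll over into one minute
    carry, seconds = divmod(seconds + carry, 60)
    # 60 min roll over into one hour
    carry, minutes = divmod(minutes + carry, 60)
    hours += carry
    return hours, minutes, seconds, milliseconds


print(normalize_time(minutes=99))  # (1, 39, 0, 0)
```

Working from the smallest unit upwards means each carry is applied before the next unit is normalized, so a single pass is enough.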
Fixes:
- Fixed `detectTTML` not seeking file to the original offset
- Fixed `MicroTime.fromTTMLTime` returning 0 instead of infinity if no valid values are provided
- Fixed `TTML.reader` not adding section time to end block time
- Fixed `Block.copy` not returning a deepcopy of itself
- Fixed `Block` subtraction and addition not using `Block.copy`
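The `detectTTML` fix concerns a common pitfall: a detector that reads from a stream must put the position back, or the reader that runs next starts in the middle of the file. Here is a minimal sketch of that pattern using only the standard library. `looks_like_srt` is a hypothetical function for illustration, not one of the library's detectors, and its check is deliberately crude.

```python
import io


def looks_like_srt(stream, peek_size=256):
    """Hypothetical detector: peek at the start of the stream, then restore its position."""
    offset = stream.tell()  # remember where the caller left the stream
    try:
        head = stream.read(peek_size)
        # Crude check: SubRip cues contain an arrow between two timestamps
        return "-->" in head
    finally:
        stream.seek(offset)  # always restore, even if reading raises


f = io.StringIO("1\n00:00:00,500 --> 00:00:02,000\nThis is a test file\n")
print(looks_like_srt(f), f.tell())  # True 0
```

Putting the `seek` in a `finally` block keeps the stream usable for a subsequent read even when detection fails part way through.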
Read past changes here.