pysub-parser
Utility to extract the contents of a subtitle file.
Supported types:
- ssa: SubStation Alpha
- srt: SubRip
- sub: MicroDVD
- txt: SubViewer
For more information: http://write.flossmanuals.net/video-subtitling/file-formats
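When no subtype is passed, parse falls back to the file extension. The sketch below shows one way such a fallback could work. detect_subtype and SUPPORTED_SUBTYPES are hypothetical names used only for illustration; they are not part of pysub-parser's API.

```python
from pathlib import Path

# Hypothetical set mirroring the supported types listed above;
# this is NOT the library's internal table.
SUPPORTED_SUBTYPES = {"ssa", "srt", "sub", "txt"}


def detect_subtype(path, subtype=None):
    """Return the explicit subtype if given, otherwise the file extension."""
    if subtype is None:
        # Path.suffix includes the leading dot, e.g. ".srt"
        subtype = Path(path).suffix.lstrip(".")
    subtype = subtype.lower()
    if subtype not in SUPPORTED_SUBTYPES:
        raise ValueError(f"Unsupported subtitle type: {subtype!r}")
    return subtype


print(detect_subtype("./files/space-jam.srt"))             # srt
print(detect_subtype("./files/movie.txt", subtype="sub"))  # sub
```

Passing subtype explicitly is what lets a file whose extension does not match its actual format still be read correctly.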
Usage
The parse function takes the following parameters:
- path: location of the subtitle file.
- subtype: one of the supported file types; by default, the file extension is used.
- encoding: encoding of the file, utf-8 by default.
- **kwargs: optional parameters:
  - fps: framerate (only used by sub files), 23.976 by default.
from parser import parse

subtitles = parse('./files/space-jam.srt')
# Iterate over the parsed Subtitle objects
for subtitle in subtitles:
    print('{} > {}'.format(subtitle.index, subtitle.text))
Output:
1 > [BALL BOUNCING]
2 > Michael?
3 > What are you doing out here, son? It's after midnight.
4 > MICHAEL: Couldn't sleep, Pops.
5 > Well, neither can we, with all that noise you're making.
6 > Come on, let's go inside.
7 > Just one more shot?
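The fps parameter exists because MicroDVD (.sub) files store each cue's start and end as frame numbers rather than timestamps, in the form {start}{end}text, with | separating lines. Below is a minimal sketch of that conversion. It assumes times are represented as datetime.timedelta offsets (the library's own representation may differ), and frame_to_timedelta and parse_microdvd_line are hypothetical helpers.

```python
import re
from datetime import timedelta

# One MicroDVD cue: {start_frame}{end_frame}text, "|" separates lines.
MICRODVD_LINE = re.compile(r"^\{(\d+)\}\{(\d+)\}(.*)$")


def frame_to_timedelta(frame, fps=23.976):
    """Convert a frame number to an offset from the start of the video."""
    return timedelta(seconds=frame / fps)


def parse_microdvd_line(line, fps=23.976):
    """Return (start, end, text) for a cue line, or None if it is not one."""
    match = MICRODVD_LINE.match(line.strip())
    if match is None:
        return None
    start, end, text = match.groups()
    return (
        frame_to_timedelta(int(start), fps),
        frame_to_timedelta(int(end), fps),
        text.replace("|", "\n"),
    )


start, end, text = parse_microdvd_line("{250}{500}Michael?|Just one more shot?", fps=25)
print(start, end)  # 0:00:10 0:00:20
print(text)
```

With the default 23.976 fps, the same frame numbers map to slightly later times (frame 250 lands at about 10.43 s), so a wrong fps value makes subtitles drift further out of sync as the video goes on.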
Subtitle Class
Each line of dialogue is represented by a Subtitle object with the following properties:
- index: position in the file.
- start: timestamp of the start of the dialogue.
- end: timestamp of the end of the dialogue.
- text: dialogue contents.
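To make the four properties concrete, here is a stand-in Subtitle dataclass and a minimal parser for a single SubRip block (index line, timing line, text lines). This is an illustrative sketch, not the library's implementation. It assumes start and end are datetime.time values, and the timings in the sample block are made up.

```python
import re
from dataclasses import dataclass
from datetime import time


# Stand-in with the same four properties; the real class may differ.
@dataclass
class Subtitle:
    index: int
    start: time
    end: time
    text: str


# SubRip timing line, e.g. "00:00:05,100 --> 00:00:06,250"
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_time(h, m, s, ms):
    # datetime.time takes microseconds, so milliseconds are scaled by 1000
    return time(int(h), int(m), int(s), int(ms) * 1000)


def parse_srt_block(block):
    """Turn one SRT block (index, timing line, text lines) into a Subtitle."""
    lines = block.strip().splitlines()
    index = int(lines[0])
    match = TIMING.search(lines[1])
    if match is None:
        raise ValueError(f"Bad timing line: {lines[1]!r}")
    g = match.groups()
    return Subtitle(index, _to_time(*g[:4]), _to_time(*g[4:]), "\n".join(lines[2:]))


block = """2
00:00:05,100 --> 00:00:06,250
Michael?"""
sub = parse_srt_block(block)
print(sub.index, sub.start, sub.end, sub.text)  # 2 00:00:05.100000 00:00:06.250000 Michael?
```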
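Text clean-up
The Subtitle class provides a clean_up method to normalize its text. By default, it converts the text to lowercase and removes anything that isn't a letter or a number (spaces between words are kept). It accepts the following keyword parameters: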
- to_lowercase: if False, the text won't be converted to lowercase.
- to_ascii: if True, every character will be converted to its closest ASCII representation.
- remove_brackets: if True, everything inside [brackets] will be removed.
- remove_format: if True, formatting tags such as <i>abc</i> will be removed.
from parser import parse

subtitles = parse('./files/space-jam.srt')
for subtitle in subtitles:
    # Normalize the text: ASCII only, [bracketed] annotations removed
    print('{} > {}'.format(subtitle.index, subtitle.clean_up(to_ascii=True, remove_brackets=True)))
Output:
1 >
2 > michael
3 > what are you doing out here son its after midnight
4 > michael couldnt sleep pops
5 > well neither can we with all that noise youre making
6 > come on lets go inside
7 > just one more shot
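The options above can be approximated with the standard library alone. The clean_up function below is a hypothetical stand-in for Subtitle.clean_up, not its actual source; the defaults (only to_lowercase enabled) and the choice to keep the text between formatting tags are assumptions. Called with to_ascii=True and remove_brackets=True on the sample lines, it yields the same strings as the output above.

```python
import re
import unicodedata


def clean_up(text, to_lowercase=True, to_ascii=False,
             remove_brackets=False, remove_format=False):
    """Hypothetical approximation of Subtitle.clean_up (defaults assumed)."""
    if remove_format:
        # Drop markup such as <i> and </i>; the text between tags is kept.
        text = re.sub(r"<[^>]*>", "", text)
    if remove_brackets:
        # Drop bracketed annotations such as [BALL BOUNCING].
        text = re.sub(r"\[[^\]]*\]", "", text)
    if to_ascii:
        # Decompose accented characters (é -> e + accent), then drop non-ASCII.
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    if to_lowercase:
        text = text.lower()
    # Keep letters, digits and whitespace only (underscore is removed too).
    text = re.sub(r"[^\w\s]|_", "", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


lines = [
    "[BALL BOUNCING]",
    "Michael?",
    "What are you doing out here, son? It's after midnight.",
    "MICHAEL: Couldn't sleep, Pops.",
    "Well, neither can we, with all that noise you're making.",
    "Come on, let's go inside.",
    "Just one more shot?",
]
for index, line in enumerate(lines, start=1):
    print('{} > {}'.format(index, clean_up(line, to_ascii=True, remove_brackets=True)))
```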
Source distribution: pysub-parser-1.0.tar.gz (4.5 kB)