pysub-parser

License: MIT

Utility to extract the contents of a subtitle file.

Supported types:

For more information: http://write.flossmanuals.net/video-subtitling/file-formats

Usage

The function parse accepts the following parameters (only path is required):

  • path: location of the subtitle file.
  • subtype: one of the supported file types; by default, it is taken from the file extension.
  • encoding: encoding of the file, utf-8 by default.
  • **kwargs: optional parameters.
    • fps: framerate (only used by sub files), 23.976 by default.
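
The subtype default described above can be pictured with a small sketch. The helper name infer_subtype is hypothetical and not part of pysub-parser; it only illustrates, under that assumption, how a file type might be derived from the extension when no subtype is given.

```python
import os


def infer_subtype(path, subtype=None):
    """Hypothetical helper: pick the subtitle type for a file.

    An explicit subtype wins; otherwise the file extension is used,
    mirroring the default described in the parameter list.
    """
    if subtype is not None:
        return subtype.lower()
    # splitext returns (root, '.ext'); drop the leading dot
    _, extension = os.path.splitext(path)
    return extension.lstrip('.').lower()


print(infer_subtype('./files/space-jam.srt'))        # prints "srt"
print(infer_subtype('./files/space-jam.txt', 'SUB'))  # prints "sub"
```

Passing subtype explicitly is useful when a file carries a generic or misleading extension.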

Example:

from parser import parse

subtitles = parse('./files/space-jam.srt')

for subtitle in subtitles:
    print('{} > {}'.format(subtitle.index, subtitle.text))

Output:

1 > [BALL BOUNCING]
2 > Michael?
3 > What are you doing out here, son? It's after midnight.
4 > MICHAEL: Couldn't sleep, Pops.
5 > Well, neither can we, with all that noise you're making.
6 > Come on, let's go inside.
7 > Just one more shot?
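
The fps parameter is only relevant to sub files. Assuming those files store frame numbers rather than clock times, a frame can be turned into an elapsed time by dividing by the frame rate. The helper below is a hypothetical illustration of that arithmetic, not the library's own code.

```python
from datetime import timedelta

DEFAULT_FPS = 23.976  # default value stated in the parameter list


def frame_to_timedelta(frame, fps=DEFAULT_FPS):
    """Hypothetical helper: convert a frame number to an elapsed time."""
    if fps <= 0:
        raise ValueError('fps must be positive')
    return timedelta(seconds=frame / fps)


# 2400 frames at 24 frames per second is 100 seconds
print(frame_to_timedelta(2400, fps=24.0))  # prints "0:01:40"
```

If the frame rate passed to parse does not match the video, every timestamp derived this way drifts proportionally, so it is worth setting fps explicitly for sub files.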


Subtitle Class

Each line of dialogue is represented by a Subtitle object with the following properties:

  • index: position in the file.
  • start: timestamp of the start of the dialog.
  • end: timestamp of the end of the dialog.
  • text: dialog contents.
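
As a rough mental model, the four properties can be pictured as a plain record. The class below is a stand-in written for illustration only: its name, the use of datetime.time for the timestamps, and the sample times are all assumptions, not the library's actual definition.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class SubtitleSketch:
    """Illustrative stand-in for a Subtitle object (not the real class)."""
    index: int   # position in the file
    start: time  # when the dialog starts (type is an assumption)
    end: time    # when the dialog ends (type is an assumption)
    text: str    # dialog contents


# The times here are made-up placeholder values
line = SubtitleSketch(index=2, start=time(0, 0, 5), end=time(0, 0, 7), text='Michael?')
print('{} > {}'.format(line.index, line.text))  # prints "2 > Michael?"
```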

Text clean up:

The Subtitle class provides a clean_up method to normalize its text. By default it converts the text to lowercase and removes punctuation and any other characters that are not letters or numbers. It accepts the following options:

  • to_lowercase: if False, the string will not be converted to lowercase.
  • to_ascii: if True, every character will be converted to its closest ASCII representation.
  • remove_brackets: if True, everything inside [brackets] will be removed.
  • remove_format: if True, every formatting tag, such as <i>abc</i>, will be removed.

Example:

from parser import parse

subtitles = parse('./files/space-jam.srt')

for subtitle in subtitles:
    print('{} > {}'.format(subtitle.index, subtitle.clean_up(to_ascii=True, remove_brackets=True)))

Output:

1 > 
2 > michael
3 > what are you doing out here son its after midnight
4 > michael couldnt sleep pops
5 > well neither can we with all that noise youre making
6 > come on lets go inside
7 > just one more shot
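
To make the clean-up behaviour concrete, here is a standalone sketch that reproduces the output above with the standard library. It is not the library's implementation: the function name clean_up_sketch is hypothetical, and the order of the steps and the regular expressions are assumptions chosen to match the documented options.

```python
import re
import unicodedata


def clean_up_sketch(text, to_lowercase=True, to_ascii=False,
                    remove_brackets=False, remove_format=False):
    """Hypothetical re-creation of the documented clean-up options."""
    if remove_brackets:
        # drop everything inside [brackets], brackets included
        text = re.sub(r'\[[^\]]*\]', '', text)
    if remove_format:
        # drop formatting tags such as <i> and </i>, keeping their contents
        text = re.sub(r'<[^>]+>', '', text)
    if to_ascii:
        # decompose accented characters, then discard the non-ASCII parts
        text = unicodedata.normalize('NFKD', text)
        text = text.encode('ascii', 'ignore').decode('ascii')
    if to_lowercase:
        text = text.lower()
    # keep only letters, digits and whitespace
    text = re.sub(r'[^\w\s]|_', '', text)
    # collapse runs of whitespace left behind by the removals
    return ' '.join(text.split())


samples = ['[BALL BOUNCING]', 'Michael?', "MICHAEL: Couldn't sleep, Pops."]
for index, text in enumerate(samples, start=1):
    print('{} > {}'.format(index, clean_up_sketch(text, to_ascii=True, remove_brackets=True)))
# prints:
# 1 > 
# 2 > michael
# 3 > michael couldnt sleep pops
```

Removing brackets and tags before stripping punctuation matters: done the other way round, the delimiters would disappear first and their contents would be left in the text.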
