pysub-parser

Utility to extract the contents of a subtitle file.

Supported types:

For more information: http://write.flossmanuals.net/video-subtitling/file-formats

Usage

The parse function accepts the following parameters (only path is required):

  • path: location of the subtitle file.
  • subtype: one of the supported file types; by default, it is inferred from the file extension.
  • encoding: encoding of the file, utf-8 by default.
  • **kwargs: optional parameters.
    • fps: frame rate (only used by sub files), 23.976 by default.

from pysubparser import parser

subtitles = parser.parse('./files/space-jam.srt')

for subtitle in subtitles:
    print(subtitle)

Output:

0 > [BALL BOUNCING]
1 > Michael?
2 > What are you doing out here, son? It's after midnight.
3 > MICHAEL: Couldn't sleep, Pops.
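
The two defaults described above (subtype taken from the file extension, and a frame rate of 23.976 for sub files) can be illustrated with a small standalone sketch. The helper names infer_subtype and frame_to_timedelta are hypothetical and are not part of pysub-parser; the sketch only assumes that frame-based formats store frame numbers that are divided by the frame rate to obtain elapsed time.

```python
import os
from datetime import timedelta

DEFAULT_FPS = 23.976  # default frame rate stated in the parameter list


def infer_subtype(path):
    """Hypothetical helper: derive the subtitle type from the file extension."""
    _, extension = os.path.splitext(path)
    return extension.lstrip('.').lower()


def frame_to_timedelta(frame, fps=DEFAULT_FPS):
    """Hypothetical helper: convert a frame number to elapsed time."""
    return timedelta(seconds=frame / fps)


print(infer_subtype('./files/space-jam.srt'))  # srt
print(frame_to_timedelta(2400, fps=24.0))      # 0:01:40
```

Because the same frame number maps to a different time at a different frame rate, passing the wrong fps for a sub file would shift every subtitle, which is presumably why the parameter is exposed.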

Subtitle Class

Each line of dialogue is represented by a Subtitle object with the following properties:

  • index: position in the file.
  • start: timestamp of the start of the dialogue.
  • end: timestamp of the end of the dialogue.
  • text: dialogue contents.

for subtitle in subtitles:
    print(f'{subtitle.start} > {subtitle.end}')
    print(subtitle.text)
    print()

Output:

00:00:36.328000 > 00:00:38.329000
[BALL BOUNCING]

00:01:03.814000 > 00:01:05.189000
Michael?

00:01:08.402000 > 00:01:11.404000
What are you doing out here, son? It's after midnight.

00:01:11.572000 > 00:01:13.072000
MICHAEL: Couldn't sleep, Pops.
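
To make the shape of these objects concrete, here is a minimal stand-in that carries the four properties listed above. SubtitleSketch is a hypothetical class written for illustration, not the library's Subtitle; it assumes the timestamps are datetime.time values, which matches the printed format but is not confirmed by the text. The duration property is an extra added for the example.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass
class SubtitleSketch:
    """Hypothetical stand-in for the library's Subtitle class."""
    index: int
    start: time
    end: time
    text: str

    @property
    def duration(self):
        # time objects cannot be subtracted directly, so anchor both to a date.
        anchor = date.min
        return datetime.combine(anchor, self.end) - datetime.combine(anchor, self.start)

    def __str__(self):
        return f'{self.index} > {self.text}'


line = SubtitleSketch(0, time(0, 0, 36, 328000), time(0, 0, 38, 329000), '[BALL BOUNCING]')
print(line)                           # 0 > [BALL BOUNCING]
print(f'{line.start} > {line.end}')   # 00:00:36.328000 > 00:00:38.329000
print(line.duration.total_seconds())  # 2.001
```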

Cleaners

Four cleaners are currently provided:

  • ascii translates every Unicode character to its ASCII equivalent.
  • brackets removes brackets and anything between them (e.g., [BALL BOUNCING]).
  • formatting removes formatting tags like <i> and </i>.
  • lower_case converts all text to lower case.

from pysubparser.cleaners import ascii, brackets, formatting, lower_case

subtitles = brackets.clean(
    lower_case.clean(
        subtitles
    )
)

for subtitle in subtitles:
    print(subtitle)

Output:

0 > 
1 > michael?
2 > what are you doing out here, son? it's after midnight.
3 > michael: couldn't sleep, pops.
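
The behaviour of the four cleaners can be approximated with a few generator functions. This is a rough sketch of the idea, not the library's implementation: the function names and regular expressions are assumptions, and the functions operate on plain strings rather than Subtitle objects.

```python
import re
import unicodedata

BRACKETS = re.compile(r'\[[^\]]*\]')       # e.g. [BALL BOUNCING]
FORMATTING = re.compile(r'</?[a-zA-Z]+>')  # e.g. <i> and </i>


def clean_ascii(lines):
    # Decompose accented characters, then drop anything that is not ASCII.
    for text in lines:
        decomposed = unicodedata.normalize('NFKD', text)
        yield decomposed.encode('ascii', 'ignore').decode('ascii')


def clean_brackets(lines):
    for text in lines:
        yield BRACKETS.sub('', text).strip()


def clean_formatting(lines):
    for text in lines:
        yield FORMATTING.sub('', text)


def clean_lower_case(lines):
    for text in lines:
        yield text.lower()


lines = ['[BALL BOUNCING]', '<i>Michael?</i>', "MICHAEL: Couldn't sleep, Pops."]
cleaned = list(clean_brackets(clean_formatting(clean_lower_case(lines))))
print(cleaned)  # ['', 'michael?', "michael: couldn't sleep, pops."]
print(list(clean_ascii(['café'])))  # ['cafe']
```

Writing each cleaner as a generator that wraps another iterable is what allows the nested composition shown in the example above, and it avoids building an intermediate list at each step.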

Writers

Given any list of Subtitle objects and a path, the writer outputs those subtitles in SRT format.
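
The writer's API is not shown here, so the following is only a sketch of what producing SRT text involves. SRT entries consist of a 1-based counter, a line with start and end times in HH:MM:SS,mmm form separated by -->, the text, and a blank line. The helpers format_timestamp and to_srt are hypothetical, and the input is assumed to be (start, end, text) tuples with datetime.time values rather than the library's Subtitle objects.

```python
from datetime import time


def format_timestamp(value):
    """Render a datetime.time in the HH:MM:SS,mmm form used by SRT."""
    milliseconds = value.microsecond // 1000
    return f'{value:%H:%M:%S},{milliseconds:03d}'


def to_srt(entries):
    """Hypothetical helper: entries are (start, end, text) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(entries, start=1):
        timing = f'{format_timestamp(start)} --> {format_timestamp(end)}'
        blocks.append(f'{number}\n{timing}\n{text}\n')
    # Joining with a newline leaves one blank line between entries.
    return '\n'.join(blocks)


entries = [
    (time(0, 0, 36, 328000), time(0, 0, 38, 329000), '[BALL BOUNCING]'),
    (time(0, 1, 3, 814000), time(0, 1, 5, 189000), 'Michael?'),
]
srt_text = to_srt(entries)
print(srt_text)
```

The resulting string could then be written to the target path with the built-in open function in text mode.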
