pysub-parser

Utility to extract the contents of a subtitle file.

Supported types:

For more information: http://write.flossmanuals.net/video-subtitling/file-formats

Usage

The parse function accepts the following parameters:

  • path: location of the subtitle file (required).
  • subtype: one of the supported file types; by default, it is inferred from the file extension.
  • encoding: encoding of the file; utf-8 by default.
  • **kwargs: optional parameters.
    • fps: framerate (only used by sub files); 23.976 by default.

from pysubparser import parser

subtitles = parser.parse('./files/space-jam.srt')

for subtitle in subtitles:
    print(subtitle)

Output:

0 > [BALL BOUNCING]
1 > Michael?
2 > What are you doing out here, son? It's after midnight.
3 > MICHAEL: Couldn't sleep, Pops.
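
The fps parameter matters because a frame-based subtitle file stores cue positions as frame numbers rather than clock times. The following is a minimal sketch of that conversion, assuming sub files are frame-based (as the MicroDVD format is). The helper frame_to_timedelta is hypothetical and is not part of pysub-parser; only the 23.976 default comes from the parameter list above.

```python
from datetime import timedelta

DEFAULT_FPS = 23.976  # default framerate stated in the parameter list


def frame_to_timedelta(frame: int, fps: float = DEFAULT_FPS) -> timedelta:
    """Convert a frame number to elapsed time (hypothetical helper)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return timedelta(seconds=frame / fps)


# The same frame number lands at a different time under each framerate,
# which is why a wrong fps value makes subtitles drift out of sync.
print(frame_to_timedelta(1000))          # using the 23.976 default
print(frame_to_timedelta(1000, fps=25))  # using 25 frames per second
```

If the subtitles of a sub file drift progressively out of sync, a mismatched framerate is a likely cause, and passing the correct fps to parse is the first thing to try.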

Subtitle Class

Each line of dialogue is represented by a Subtitle object with the following properties:

  • index: position in the file.
  • start: timestamp of the start of the dialogue.
  • end: timestamp of the end of the dialogue.
  • text: contents of the dialogue.

for subtitle in subtitles:
    print(f'{subtitle.start} > {subtitle.end}')
    print(subtitle.text)
    print()

Output:

00:00:36.328000 > 00:00:38.329000
[BALL BOUNCING]

00:01:03.814000 > 00:01:05.189000
Michael?

00:01:08.402000 > 00:01:11.404000
What are you doing out here, son? It's after midnight.

00:01:11.572000 > 00:01:13.072000
MICHAEL: Couldn't sleep, Pops.
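
A common follow-up is computing how long each subtitle stays on screen. The sketch below assumes that start and end are datetime.time values, which is consistent with the printed output above but is not confirmed by this description. SubtitleSketch is a hypothetical stand-in that mirrors the four documented properties; it is not the library's Subtitle class.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass
class SubtitleSketch:
    """Hypothetical stand-in with the four documented properties."""
    index: int
    start: time
    end: time
    text: str


def duration(subtitle: SubtitleSketch) -> timedelta:
    """Return how long a subtitle stays on screen."""
    # time objects cannot be subtracted directly, so anchor both to one date.
    anchor = date.min
    return (datetime.combine(anchor, subtitle.end)
            - datetime.combine(anchor, subtitle.start))


first = SubtitleSketch(0, time(0, 0, 36, 328000), time(0, 0, 38, 329000),
                       "[BALL BOUNCING]")
print(duration(first).total_seconds())  # 2.001
```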

Cleaners

Four cleaners are currently provided:

  • ascii translates every Unicode character to its ASCII equivalent.
  • brackets removes bracketed text, brackets included (e.g., [BALL BOUNCING]).
  • formatting removes formatting tags such as <i> and </i>.
  • lower_case converts all text to lower case.

from pysubparser.cleaners import ascii, brackets, formatting, lower_case

subtitles = brackets.clean(
    lower_case.clean(
        subtitles
    )
)

for subtitle in subtitles:
    print(subtitle)

Output:

0 > 
1 > michael?
2 > what are you doing out here, son? it's after midnight.
3 > michael: couldn't sleep, pops.
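
To illustrate the idea of chaining cleaners, here is a self-contained sketch that works on plain strings. It is not pysub-parser's implementation: remove_brackets, lower_case, and clean are hypothetical names defined only for this example, and the regular expression assumes square brackets that are not nested.

```python
import re
from typing import Callable, Iterable, Iterator

BRACKETS = re.compile(r"\[[^\]]*\]")  # matches [ ... ], brackets included


def remove_brackets(text: str) -> str:
    """Drop bracketed spans and trim leftover whitespace."""
    return BRACKETS.sub("", text).strip()


def lower_case(text: str) -> str:
    """Convert the text to lower case."""
    return text.lower()


def clean(texts: Iterable[str],
          *cleaners: Callable[[str], str]) -> Iterator[str]:
    """Lazily apply each cleaner, in order, to every text."""
    for text in texts:
        for cleaner in cleaners:
            text = cleaner(text)
        yield text


lines = ["[BALL BOUNCING]", "Michael?", "MICHAEL: Couldn't sleep, Pops."]
for line in clean(lines, lower_case, remove_brackets):
    print(repr(line))
```

As in the output above, a line that consisted only of bracketed text becomes an empty string rather than being dropped, so callers that want to skip such lines need to filter them out afterwards.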

Writers

Given a list of Subtitle objects and a path, the writer outputs those subtitles to that path in SRT format.
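
The writer's function name and signature are not shown in this description, so the sketch below does not call it. Instead, it illustrates what one SRT entry looks like: a counter, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the text, and a blank line. The helpers srt_timestamp and srt_block are hypothetical, and the sketch assumes timestamps are datetime.time values.

```python
from datetime import time


def srt_timestamp(value: time) -> str:
    """Format a time as HH:MM:SS,mmm, the timestamp layout SRT uses."""
    millis = value.microsecond // 1000
    return f"{value.hour:02}:{value.minute:02}:{value.second:02},{millis:03}"


def srt_block(number: int, start: time, end: time, text: str) -> str:
    """Build one SRT entry: counter, time range, text, then a blank line."""
    time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
    return f"{number}\n{time_range}\n{text}\n\n"


print(srt_block(1, time(0, 1, 3, 814000), time(0, 1, 5, 189000), "Michael?"),
      end="")
```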
