Caption/subtitle processing library with multi-format support (SRT, VTT, ASS, TTML, TextGrid, NLE formats)

lattifai-captions

Caption/subtitle processing library with comprehensive format support.

Features

  • Multi-format support: SRT, VTT, ASS, SSA, TTML, TextGrid, LRC, SRV3, and more
  • YouTube formats: SRV3 (YTT v3), YouTube VTT with word-level timestamps
  • Professional NLE formats: Avid DS, Final Cut Pro XML, Premiere Pro XML, Adobe Audition
  • Word-level timing: Karaoke-style word-by-word timestamps
  • Standardization: Netflix/BBC broadcast guidelines compliance
  • Sentence splitting: AI-powered intelligent sentence segmentation
  • Zero dependencies on heavy ML frameworks: Lightweight and fast

Installation

# Basic installation
pip install lattifai-captions

# With sentence splitting support
pip install "lattifai-captions[splitting]"  # quoted so shells such as zsh do not expand the brackets

Quick Start

from lattifai.caption import Caption

# Read a caption file
caption = Caption.read("input.srt")

# Write to a different format (inferred from the file extension)
caption.write("output.vtt")

# Convert to string
vtt_content = caption.to_string("vtt")

# Access segments
for segment in caption.supervisions:
    print(f"{segment.start:.2f} - {segment.end:.2f}: {segment.text}")

Supported Formats

Input/Output (Read & Write)

| Format   | Extensions   | Description                                             |
|----------|--------------|---------------------------------------------------------|
| SRT      | .srt         | SubRip subtitle format                                  |
| VTT      | .vtt         | WebVTT, includes YouTube VTT with word-level timestamps |
| ASS/SSA  | .ass, .ssa   | Advanced SubStation Alpha                               |
| SRV3     | .srv3, .ytt  | YouTube Timed Text v3 with word-level timing            |
| SBV      | .sbv         | YouTube SubViewer format                                |
| SUB      | .sub         | MicroDVD subtitle format                                |
| SAMI     | .sami, .smi  | SAMI subtitle format                                    |
| JSON     | .json        | Structured data with word-level support                 |
| CSV/TSV  | .csv, .tsv   | Tabular formats                                         |
| TextGrid | .textgrid    | Praat TextGrid format                                   |
| LRC      | .lrc         | Lyrics format with word-level timestamps                |
| Gemini   | .md          | Gemini AI transcript markdown                           |

Output Only

| Format        | Extensions | Description                               |
|---------------|------------|-------------------------------------------|
| TTML          | .ttml      | Timed Text Markup Language (W3C standard) |
| IMSC1         | .ttml      | Netflix/streaming TTML profile            |
| EBU-TT-D      | .ttml      | European broadcast TTML profile           |
| Avid DS       | .txt       | Avid Media Composer SubCap                |
| FCPXML        | .fcpxml    | Final Cut Pro XML                         |
| Premiere XML  | .xml       | Adobe Premiere Pro XML                    |
| Audition CSV  | .csv       | Adobe Audition markers                    |
| EdiMarker CSV | .csv       | Pro Tools markers                         |

Word-Level Timing

Many formats support word-level timing for karaoke-style output:

from lattifai.caption import Caption, KaraokeConfig  # KaraokeConfig import path is assumed

caption = Caption.read("input.srv3")  # SRV3 has built-in word timing

# Access word-level alignment
for segment in caption.supervisions:
    if segment.alignment and "word" in segment.alignment:
        for word in segment.alignment["word"]:
            print(f"  {word.symbol}: {word.start:.3f}s - {word.end:.3f}s")

# Export with word-level timing
caption.write("output.json", word_level=True)  # JSON preserves words array
caption.write("output.ass", word_level=True, karaoke_config=KaraokeConfig(enabled=True))
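
To illustrate what karaoke-style output means in ASS, here is a minimal sketch that turns word timings into `{\k}` tags, whose durations are expressed in centiseconds. It does not use the library: `Word` and `to_ass_karaoke` are hypothetical names introduced only for this example, and the library's own ASS writer may format its output differently.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for a word-level alignment item."""
    symbol: str
    start: float  # seconds
    end: float    # seconds


def to_ass_karaoke(words: list[Word]) -> str:
    """Build ASS dialogue text with {\\k} tags (durations in centiseconds)."""
    parts = []
    for word in words:
        centiseconds = round((word.end - word.start) * 100)
        parts.append(f"{{\\k{centiseconds}}}{word.symbol}")
    return " ".join(parts)


words = [
    Word("Does", 0.24, 0.56),
    Word("fast", 0.56, 0.80),
    Word("charging", 0.80, 1.40),
]
karaoke_text = to_ass_karaoke(words)
print(karaoke_text)
```

This sketch ignores silent gaps between words; a fuller version would likely add a `{\k}` tag for each pause so the highlight stays in sync.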

YouTube SRV3 Format

SRV3 is YouTube's proprietary timed text format with millisecond-precision word timing:

from lattifai.caption import Caption

# Read SRV3 (automatically extracts word-level timing)
caption = Caption.read("video.srv3")

# Convert to other formats
caption.write("output.srt")  # Standard SRT
caption.write("output.vtt", word_level=True)  # VTT with word timing
caption.write("output.srv3", word_level=True)  # Back to SRV3

SRV3 structure example:

<timedtext format="3">
  <body>
    <p t="240" d="6559" w="1">
      <s ac="0">Does</s>
      <s t="320" ac="0"> fast</s>
      <s t="560" ac="0"> charging</s>
    </p>
  </body>
</timedtext>

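The structure above can be read with the standard library alone. The sketch below assumes that `t` on `<p>` is the paragraph start in milliseconds and that `t` on each `<s>` is an offset relative to that start (defaulting to 0 when absent). `word_starts` is a hypothetical helper, not part of lattifai-captions.

```python
import xml.etree.ElementTree as ET

SRV3_SAMPLE = """<timedtext format="3">
  <body>
    <p t="240" d="6559" w="1">
      <s ac="0">Does</s>
      <s t="320" ac="0"> fast</s>
      <s t="560" ac="0"> charging</s>
    </p>
  </body>
</timedtext>"""


def word_starts(srv3_xml: str) -> list[tuple[str, float]]:
    """Return (word, absolute start in seconds) pairs from SRV3 markup."""
    root = ET.fromstring(srv3_xml)
    result = []
    for p in root.iter("p"):
        p_start_ms = int(p.get("t", "0"))
        for s in p.iter("s"):
            # Assumption: the <s> offset is relative to the enclosing <p> start.
            offset_ms = int(s.get("t", "0"))
            word = (s.text or "").strip()
            result.append((word, (p_start_ms + offset_ms) / 1000))
    return result


starts = word_starts(SRV3_SAMPLE)
print(starts)
```

Under those assumptions, "fast" starts at 240 ms + 320 ms = 0.56 s.
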
Sentence Splitting

Split captions into natural sentences (requires the [splitting] extra):

from lattifai.caption import Caption, SentenceSplitter

# Using Caption method
caption = Caption.read("input.srt")
split_caption = caption.split_sentences()

# Or use SentenceSplitter directly
splitter = SentenceSplitter()
split_supervisions = splitter.split_sentences(caption.supervisions)

Format Conversion

from lattifai.caption import Caption

# Read any format
caption = Caption.read("input.srt")

# Write to any supported format
caption.write("output.vtt")
caption.write("output.ass")
caption.write("output.json")
caption.write("output.srv3", word_level=True)
caption.write("output.ttml")

# Or get as string
srt_content = caption.to_string("srt")
json_content = caption.to_string("json", word_level=True)
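
Much of the difference between text formats comes down to how timestamps are written. As a rough illustration, independent of the library, the sketch below formats a time in seconds the way SRT (`HH:MM:SS,mmm`) and WebVTT (`HH:MM:SS.mmm`) cues expect it. Both function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 01:01:01,500."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def vtt_timestamp(seconds: float) -> str:
    """WebVTT uses the same fields with a dot before the milliseconds."""
    return srt_timestamp(seconds).replace(",", ".")


print(srt_timestamp(3661.5))
print(vtt_timestamp(0.24))
```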

Standardization

Apply broadcast standards to captions:

from lattifai.caption import Caption, CaptionStandardizer

standardizer = CaptionStandardizer(
    min_duration=0.7,      # Minimum segment duration
    max_duration=7.0,      # Maximum segment duration
    min_gap=0.08,          # Minimum gap between segments
    max_lines=2,           # Maximum lines per segment
    max_chars_per_line=42, # Maximum characters per line
)

caption = Caption.read("input.srt")
standardized = standardizer.process(caption.supervisions)
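
To make the timing parameters concrete, here is a simplified sketch of how `min_duration`, `max_duration`, and `min_gap` could be applied to plain `(start, end)` pairs. It is not the algorithm `CaptionStandardizer` uses. `enforce_timing` is a hypothetical function, and letting the gap rule win over the minimum duration is an assumption made for this example.

```python
def enforce_timing(segments, min_duration=0.7, max_duration=7.0, min_gap=0.08):
    """Adjust end times of (start, end) pairs that are sorted by start time."""
    adjusted = []
    for index, (start, end) in enumerate(segments):
        # Clamp the duration into [min_duration, max_duration].
        duration = min(max(end - start, min_duration), max_duration)
        end = start + duration
        # Keep at least min_gap before the next segment starts.
        if index + 1 < len(segments):
            next_start = segments[index + 1][0]
            end = min(end, next_start - min_gap)
        adjusted.append((start, round(end, 3)))
    return adjusted


adjusted = enforce_timing([(0.0, 0.3), (0.9, 10.0)])
tight = enforce_timing([(0.0, 1.0), (1.02, 2.0)])
print(adjusted)
print(tight)
```

In the first call the short segment is stretched to 0.7 s and the long one is capped at 7.0 s. In the second, the first segment is trimmed so that 0.08 s remains before the next one. Overlapping or unsorted input is not handled here.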

Validation

Check captions against quality standards:

from lattifai.caption import Caption, CaptionValidator

validator = CaptionValidator(
    min_duration=0.7,
    max_duration=7.0,
    min_gap=0.08,
    max_chars_per_line=42,
)

caption = Caption.read("input.srt")
result = validator.validate(caption.supervisions)

print(f"Valid: {result.valid}")
print(f"Average CPS: {result.avg_cps:.1f}")
print(f"Max CPL: {result.max_cpl}")
print(f"Warnings: {result.warnings}")
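
CPS (characters per second) and CPL (characters per line) are the two reading-speed metrics reported above. The sketch below shows one common way to compute them from `(start, end, text)` tuples. Counting spaces and excluding line breaks is an assumption here, `caption_metrics` is a hypothetical helper, and `CaptionValidator` may count characters differently.

```python
def caption_metrics(segments):
    """Return (average CPS, maximum CPL) for (start, end, text) tuples."""
    cps_values = []
    max_cpl = 0
    for start, end, text in segments:
        duration = end - start
        if duration > 0:
            # Line breaks are not counted as readable characters.
            cps_values.append(len(text.replace("\n", "")) / duration)
        for line in text.split("\n"):
            max_cpl = max(max_cpl, len(line))
    avg_cps = sum(cps_values) / len(cps_values) if cps_values else 0.0
    return avg_cps, max_cpl


segments = [
    (0.0, 2.0, "Hello world"),
    (2.5, 4.5, "Does fast charging\nhurt the battery?"),
]
avg_cps, max_cpl = caption_metrics(segments)
print(f"Average CPS: {avg_cps:.1f}, Max CPL: {max_cpl}")
```

Here the two segments read at 5.5 and 17.5 characters per second, and the longest line has 18 characters.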

API Reference

Caption Class

from lattifai.caption import Caption

# Class methods
Caption.read(path, format=None, normalize_text=True)
Caption.from_string(content, format)
Caption.from_supervisions(supervisions, language=None, metadata=None)

# Instance methods
caption.write(path, include_speaker=True, word_level=False, karaoke_config=None)
caption.to_string(format, include_speaker=True, word_level=False, karaoke_config=None)
caption.split_sentences()
caption.shift_time(seconds)

# Properties
caption.supervisions  # List[Supervision]
caption.duration      # Total duration in seconds
caption.language      # Language code
caption.source_format # Original format

Supervision Class

from lattifai.caption import Supervision

sup = Supervision(
    start=0.0,           # Start time in seconds
    duration=2.5,        # Duration in seconds
    text="Hello world",  # Caption text
    speaker="Alice",     # Optional speaker label
    alignment=None,      # Optional word-level alignment
)

# Properties
sup.end       # start + duration
sup.text      # Caption text
sup.speaker   # Speaker label
sup.alignment # Dict with "word" key containing AlignmentItem list

License

Apache-2.0

