
Semantic subtitle aligner and merger for bilingual subtitle syncing.


DuoSubs


Merging subtitles using only the nearest timestamp often leads to incorrect pairings: lines may end up out of sync, duplicated, or mismatched.

This Python tool uses semantic similarity (via Sentence Transformers) to align subtitle lines based on meaning instead of timestamps, making it possible to pair subtitles across different languages.
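To make the contrast concrete, here is a minimal sketch that is not taken from DuoSubs itself. The 3-dimensional vectors are hypothetical stand-ins for Sentence Transformer embeddings, and `pair_by_time` and `pair_by_meaning` are illustrative names only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical data: (start time in seconds, toy "embedding").
# Real embeddings would come from a Sentence Transformer model.
primary = [(1.0, [1.0, 0.1, 0.0]), (4.0, [0.0, 1.0, 0.1])]
secondary = [(3.2, [0.9, 0.2, 0.0]), (6.5, [0.1, 0.9, 0.0])]  # same lines, drifted late

def pair_by_time(primary, secondary):
    """For each primary line, pick the secondary line with the nearest start time."""
    return [
        min(range(len(secondary)), key=lambda j: abs(secondary[j][0] - start))
        for start, _ in primary
    ]

def pair_by_meaning(primary, secondary):
    """For each primary line, pick the secondary line with the most similar embedding."""
    return [
        max(range(len(secondary)), key=lambda j: cosine(secondary[j][1], emb))
        for _, emb in primary
    ]

print(pair_by_time(primary, secondary))     # [0, 0]: secondary line 0 is used twice
print(pair_by_meaning(primary, secondary))  # [0, 1]: each line finds its counterpart
```

With the drifted timestamps, nearest-timestamp pairing assigns the same secondary line to both primary lines, while the similarity-based pairing recovers the intended one-to-one match.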


Features

  • Aligns subtitle lines based on meaning, not timing
  • Multilingual support, depending on the user-selected Sentence Transformer model
  • Flexible format support: works with SRT, VTT, MPL2, TTML, ASS, and SSA files
  • Easy-to-use Python API for integration
  • Command-line interface with customizable options
  • Web UI: run locally or in the cloud via Google Colab or Hugging Face Spaces

โ˜๏ธ Cloud Deployment

You can launch the Web UI instantly without installing anything locally by running it in the cloud.

  • Open In Colab
  • Hugging Face Spaces

[!NOTE]

  • Google Colab has a limited runtime allocation, especially when using the free instance.
  • On Hugging Face Spaces, only a few models are preloaded, and inference can be slower because it runs on CPU.

Local Deployment

Installation

  1. Install the correct version of PyTorch for your system by following the official instructions: https://pytorch.org/get-started/locally
  2. Install DuoSubs via pip:
    pip install duosubs
    

Usage

Launch Web UI Locally

You can launch the web UI locally:

  • via command line

    duosubs launch-webui
    
  • via Python API

    from duosubs import create_duosubs_gr_blocks
    
    # Build the Web UI layout (Gradio Blocks)
    webui = create_duosubs_gr_blocks() 
    
    # These commands work just like launching a regular Gradio app
    webui.queue(default_concurrency_limit=None) # Allow unlimited concurrent requests
    webui.launch(inbrowser=True)                # Start the Web UI and open it in a browser tab
    

This starts the server, prints its URL (e.g. http://127.0.0.1:7860), and opens the Web UI in a new browser tab.

To launch it on a different host (e.g. 0.0.0.0) and port (e.g. 8000), run:

  • via command line

    duosubs launch-webui --host 0.0.0.0 --port 8000
    
  • via Python API

    from duosubs import create_duosubs_gr_blocks
    
    webui = create_duosubs_gr_blocks() 
    
    webui.queue(default_concurrency_limit=None)
    webui.launch(
        server_name="0.0.0.0",    # use a different address
        server_port=8000,         # use a different port number
        inbrowser=True
    )
    

[!WARNING]

  • The Web UI caches files during processing. Every hour, it clears cached files older than 2 hours. Cached data may remain if the server stops unexpectedly.
  • Sometimes, an older model may fail to be released from memory after switching models or closing sessions. If you run out of RAM or VRAM, simply restart the script.
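If cached files are left behind after an unexpected stop, an age-based cleanup like the one described above can be approximated by hand. The sketch below is an assumption-laden illustration, not DuoSubs code: `purge_stale_files` is a hypothetical helper, and the cache directory must be supplied by you, since its location is not stated here.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 2 * 60 * 60  # mirrors the 2-hour retention mentioned above

def purge_stale_files(cache_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete regular files in cache_dir last modified more than max_age seconds ago.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(cache_dir).iterdir():
        # Only touch plain files; subdirectories are left alone in this sketch.
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Calling `purge_stale_files("/path/to/cache")` once after a crash would remove only files older than two hours and leave recent ones in place.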

To learn more about the launch options, see the Launch Web UI Command and Web UI Launching sections of the documentation.

Merge Subtitles

With the demo files provided, here is the simplest way to merge the subtitles:

  • via command line

    duosubs merge -p demo/primary_sub.srt -s demo/secondary_sub.srt
    
  • via Python API

    from duosubs import MergeArgs, run_merge_pipeline
    
    # Store all arguments
    args = MergeArgs(
        primary="demo/primary_sub.srt",
        secondary="demo/secondary_sub.srt"
    )
    
    # Load, merge, and save subtitles.
    run_merge_pipeline(args, print)
    

Either approach produces primary_sub.zip, with the following structure:

primary_sub.zip
├── primary_sub_combined.ass   # Merged subtitles
├── primary_sub_primary.ass    # Original primary subtitles
└── primary_sub_secondary.ass  # Time-shifted secondary subtitles
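The archive can be inspected with Python's standard `zipfile` module. The helpers below are a hypothetical convenience sketch, not part of the DuoSubs API. They assume only the file-naming pattern shown above.

```python
import zipfile

def list_merged_outputs(zip_path):
    """Return the sorted names of the .ass files stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".ass"))

def extract_combined(zip_path, dest_dir="."):
    """Extract only the merged track(s), i.e. names ending in '_combined.ass'.

    Returns the paths of the extracted files.
    """
    with zipfile.ZipFile(zip_path) as archive:
        targets = [n for n in archive.namelist() if n.endswith("_combined.ass")]
        return [archive.extract(name, path=dest_dir) for name in targets]
```

For example, `extract_combined("primary_sub.zip", "out")` would write only the merged subtitle file into `out/`.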

By default, the Sentence Transformer model used is LaBSE.

To experiment with a different model, pick one from Hugging Face or check the leaderboard for top-performing models.

For example, if the model chosen is Qwen/Qwen3-Embedding-0.6B, you can run:

  • via command line

    duosubs merge -p demo/primary_sub.srt -s demo/secondary_sub.srt --model Qwen/Qwen3-Embedding-0.6B
    
  • via Python API

    from duosubs import MergeArgs, run_merge_pipeline
    
    # Store all arguments
    args = MergeArgs(
        primary="demo/primary_sub.srt",
        secondary="demo/secondary_sub.srt",
        model="Qwen/Qwen3-Embedding-0.6B"
    )
    
    # Load, merge, and save subtitles.
    run_merge_pipeline(args, print)
    
  • via Web UI

    In Configurations → Model & Device → Sentence Transformer Model, replace sentence-transformers/LaBSE with Qwen/Qwen3-Embedding-0.6B.

[!WARNING]

  • Some models, especially larger ones, may require significant RAM or GPU memory (VRAM) to run and might not be compatible with all devices.
  • Also, please ensure the selected model supports your desired language for reliable results.

The tool has three merging modes: synced, mixed, and cuts. Some simple guidelines for choosing the appropriate mode:

  • If both subtitle files are timestamp-synced, use synced for the cleanest result.
  • If timestamps drift or only partially overlap, use mixed.
  • If subtitles come from different editions of the video, with primary subtitles being the extended or longer version, use cuts.
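These guidelines can be turned into a rough pre-check. The following sketch is purely illustrative: `suggest_mode`, its thresholds, and the use of cue overlap and total length as signals are assumptions made here, not DuoSubs behavior. Cues are given as `(start, end)` pairs in seconds.

```python
def overlap_ratio(primary, secondary):
    """Fraction of primary cues that overlap at least one secondary cue in time."""
    if not primary:
        return 0.0
    hits = sum(
        any(p_start < s_end and s_start < p_end for s_start, s_end in secondary)
        for p_start, p_end in primary
    )
    return hits / len(primary)

def suggest_mode(primary, secondary, synced_threshold=0.9, length_ratio=1.1):
    """Suggest 'synced', 'mixed', or 'cuts' from cue timings alone.

    Both thresholds are illustrative assumptions, not DuoSubs defaults.
    """
    # Nearly every primary cue has a time-overlapping partner: treat as synced.
    if overlap_ratio(primary, secondary) >= synced_threshold:
        return "synced"
    # A primary track that runs clearly longer hints at an extended edition.
    primary_end = max((end for _, end in primary), default=0.0)
    secondary_end = max((end for _, end in secondary), default=0.0)
    if secondary_end and primary_end / secondary_end >= length_ratio:
        return "cuts"
    return "mixed"
```

A check like this only looks at timing, so the result is a starting point. Knowing where the two subtitle files came from remains the more reliable guide.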

To merge with a specific mode (e.g. cuts), run:

  • via command line

    duosubs merge -p primary_sub.srt -s secondary_sub.srt --mode cuts
    
  • via Python API

    from duosubs import MergeArgs, MergingMode, run_merge_pipeline
    
    # Store all arguments
    args = MergeArgs(
        primary="primary_sub.srt",
        secondary="secondary_sub.srt",
        merging_mode=MergingMode.CUTS   # Modes available: MergingMode.SYNCED, MergingMode.MIXED, MergingMode.CUTS
    )
    
    # Load, merge, and save subtitles.
    run_merge_pipeline(args, print)
    
  • via Web UI

    In Configurations → Alignment Behavior → Merging Mode, choose Cuts.

[!TIP] For mixed and cuts modes, try to use subtitle files without scene annotations if possible, as they may reduce alignment quality.

To learn more about the merging options, see the Merge Command and Core Subtitle Merging sections of the documentation.


Behind the Scenes

  1. Parse subtitles and detect language.
  2. Tokenize subtitle lines.
  3. Extract and filter non-overlapping subtitles (synced mode only).
  4. Estimate tokenized subtitle pairings using DTW.
  5. Refine alignment using a sliding window approach with size of 3.
  6. Extract and filter extended subtitles from the primary track (cuts mode only).
  7. Refine alignment using a sliding window approach with size of 2.
  8. Combine aligned and non-overlapping subtitles or extended subtitles.
  9. Eliminate unnecessary newlines within subtitle lines.
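Step 4 relies on dynamic time warping (DTW). As a reference point, here is a minimal textbook DTW over a cost matrix. It is a generic sketch, not the DuoSubs implementation. It assumes `cost[i][j]` holds a dissimilarity between primary token `i` and secondary token `j`, for example one minus the cosine similarity of their embeddings.

```python
def dtw_path(cost):
    """Return the minimum-cost monotonic alignment path through a cost matrix.

    The path is a list of (primary_index, secondary_index) pairs that starts
    at (0, 0), ends at the last cell, and never moves backwards.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of aligning the first i and j tokens.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the final cell, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # match both
            (acc[i - 1][j], i - 1, j),          # advance primary only
            (acc[i][j - 1], i, j - 1),          # advance secondary only
        ]
        _, i, j = min(moves)
    path.reverse()
    return path

# Two primary tokens, three secondary tokens: the first primary token
# matches the first two secondary tokens, the second matches the third.
print(dtw_path([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]))  # [(0, 0), (0, 1), (1, 2)]
```

Because the path is monotonic, DTW preserves the order of lines while allowing one token to pair with several on the other side, which gives later refinement passes a starting alignment to adjust.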

Known Limitations

  • The accuracy of the merging process varies with the model selected.
  • Some models may produce unreliable results for unsupported or low-resource languages.
  • Some sentence fragments from the secondary subtitles may be misaligned with the primary subtitle lines due to the tokenization algorithm used.
  • Secondary subtitles might contain extra whitespace as a result of token-level merging.
  • In mixed and cuts modes, the algorithm may not work reliably since matching lines have no timestamp overlap, and either subtitle could contain extra or missing lines.

๐Ÿ™ Acknowledgements

This project wouldn't be possible without the incredible work of the open-source community. Special thanks to:


๐Ÿค Contributing

Contributions are welcome! If you'd like to submit a pull request, please check out the contributing guidelines.


License

Apache-2.0 license - see the LICENSE file for details.
