Semantic subtitle aligner and merger for bilingual subtitle syncing.

Maintainers

CK-Explorer

🎬 DuoSubs

Merging subtitles using only the nearest timestamp often leads to incorrect pairings — lines may end up out of sync, duplicated, or mismatched.

This Python tool uses semantic similarity (via Sentence Transformers) to align subtitle lines based on meaning instead of timestamps — making it possible to pair subtitles across different languages.
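
To make the idea of matching by meaning concrete, here is a minimal sketch. It is not DuoSubs code: the three-dimensional vectors are made-up stand-ins for the embeddings a Sentence Transformer model would produce, and `cosine` and `best_match` are hypothetical helpers written for this illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def best_match(query, candidates):
    """Return the index of the candidate most similar to the query."""
    return max(range(len(candidates)), key=lambda i: cosine(query, candidates[i]))

# Toy embeddings (assumed values, for illustration only).
primary = [1.0, 0.1, 0.0]     # a line from the primary subtitles
secondary = [
    [0.0, 1.0, 0.2],          # unrelated line that happens to be nearest in time
    [0.9, 0.2, 0.1],          # translation of the primary line
]

match_index = best_match(primary, secondary)
print(match_index)
```

The second candidate wins because its vector points in nearly the same direction as the primary line's vector, regardless of where either line sits on the timeline.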

✨ Features

📌 Aligns subtitle lines based on meaning, not timing
🌍 Multilingual support, depending on the user-selected Sentence Transformer model
📄 Flexible format support — works with SRT, VTT, MPL2, TTML, ASS, SSA files
🧩 Easy-to-use Python API for integration
💻 Command-line interface with customizable options
🌐 Web UI — run locally or in the cloud via Google Colab or Hugging Face Spaces

☁️ Cloud Deployment

You can launch the Web UI instantly without installing anything locally by running it in the cloud.

[!NOTE]

Google Colab has a limited runtime allocation, especially when using the free instance.

On Hugging Face Spaces, only a few models are preloaded, and inference can be slower because it runs on CPU.

💻 Local Deployment

🛠️ Installation

1. Install the correct version of PyTorch for your system by following the official instructions: https://pytorch.org/get-started/locally
2. Install this package via pip:
```
pip install duosubs
```

🚀 Usage

🌐 Launch Web UI Locally

You can launch the web UI locally:

via command line
```
duosubs launch-webui
```

via Python API

```
from duosubs import create_duosubs_gr_blocks

# Build the Web UI layout (Gradio Blocks)
webui = create_duosubs_gr_blocks()

# These commands work just like launching a regular Gradio app
webui.queue(default_concurrency_limit=None)  # Allow unlimited concurrent requests
webui.launch(inbrowser=True)                 # Start the Web UI and open it in a browser tab
```

This starts the server, prints its URL (e.g. http://127.0.0.1:7860), and then opens the Web UI in a new browser tab.

To launch it on a different address (e.g. 0.0.0.0) and port (e.g. 8000), run:

via command line

```
duosubs launch-webui --host 0.0.0.0 --port 8000
```

via Python API

```
from duosubs import create_duosubs_gr_blocks

webui = create_duosubs_gr_blocks()

webui.queue(default_concurrency_limit=None)
webui.launch(
    server_name="0.0.0.0",  # use a different address
    server_port=8000,       # use a different port number
    inbrowser=True
)
```

[!WARNING]

The Web UI caches files during processing. Once every hour, it clears cached files that are older than 2 hours. Cached data may remain if the server stops unexpectedly.

Sometimes an older model may fail to be released from memory after switching models or closing sessions. If you run out of RAM or VRAM, simply restart the script.
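
The cache policy described in the warning can be pictured with a short sketch. This is an assumption about how such a sweep might look, not the Web UI's actual code; `sweep_cache` and `MAX_AGE_SECONDS` are hypothetical names. A sweep of this kind only runs while the process is alive, which would explain why files can be left behind after an unexpected stop.

```python
import os
import tempfile
import time
from pathlib import Path

MAX_AGE_SECONDS = 2 * 60 * 60  # files older than 2 hours count as stale

def sweep_cache(cache_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete files in cache_dir whose modification time is older than max_age.

    Returns the names of the removed files, sorted.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(cache_dir).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)

# Demo: one fresh file and one file back-dated by three hours.
with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp, "fresh.srt")
    stale = Path(tmp, "stale.srt")
    fresh.write_text("new")
    stale.write_text("old")
    three_hours_ago = time.time() - 3 * 60 * 60
    os.utime(stale, (three_hours_ago, three_hours_ago))

    removed = sweep_cache(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed, remaining)
```

In a long-running server, a function like this would be called on a timer (hourly, per the warning above) rather than once.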

To learn more about the launching options, please see the documentation.

💻 Merge Subtitles

With the demo files provided, here is the simplest way to merge the subtitles:

via command line

```
duosubs merge -p demo/primary_sub.srt -s demo/secondary_sub.srt
```

via Python API

```
from duosubs import MergeArgs, run_merge_pipeline

# Store all arguments
args = MergeArgs(
    primary="demo/primary_sub.srt",
    secondary="demo/secondary_sub.srt"
)

# Load, merge, and save subtitles.
run_merge_pipeline(args, print)
```

Either approach produces primary_sub.zip with the following structure:

```
primary_sub.zip
├── primary_sub_combined.ass   # Merged subtitles
├── primary_sub_primary.ass    # Original primary subtitles
└── primary_sub_secondary.ass  # Time-shifted secondary subtitles
```
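
To check the output programmatically, the archive can be inspected with Python's standard `zipfile` module. The sketch below is not part of DuoSubs: `list_members` is a hypothetical helper, and the archive is built in memory with empty placeholder files so the example runs without a real merge.

```python
import io
import zipfile

def list_members(archive):
    """Return the sorted member names of a zip archive (path or file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(zf.namelist())

# Stand-in archive built in memory; the file names follow the structure shown above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ("primary_sub_combined.ass",
                 "primary_sub_primary.ass",
                 "primary_sub_secondary.ass"):
        zf.writestr(name, "")  # empty placeholder content

buffer.seek(0)
members = list_members(buffer)
print(members)
```

With a real merge result, pass the path of the generated primary_sub.zip to `list_members` instead of the in-memory buffer.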

By default, the Sentence Transformer model used is LaBSE.

If you want to experiment with different models, pick one from 🤗 Hugging Face or check the leaderboard for top-performing models.

For example, if the model chosen is Qwen/Qwen3-Embedding-0.6B, you can run:

via command line

```
duosubs merge -p demo/primary_sub.srt -s demo/secondary_sub.srt --model Qwen/Qwen3-Embedding-0.6B
```

via Python API

```
from duosubs import MergeArgs, run_merge_pipeline

# Store all arguments
args = MergeArgs(
    primary="demo/primary_sub.srt",
    secondary="demo/secondary_sub.srt",
    model="Qwen/Qwen3-Embedding-0.6B"
)

# Load, merge, and save subtitles.
run_merge_pipeline(args, print)
```

[!WARNING]

Some models may require significant RAM or GPU (VRAM) to run, and might not be compatible with all devices — especially larger models.

Also, please ensure the selected model supports your desired language for reliable results.

To learn more about this tool, please see the documentation.

📚 Behind the Scenes

1. Parse subtitles and detect language.
2. Tokenize subtitle lines.
3. Extract and filter non-overlapping subtitles. (Optional)
4. Estimate tokenized subtitle pairings using DTW.
5. Refine alignment using a sliding window approach.
6. Combine aligned and non-overlapping subtitles.
7. Eliminate unnecessary newlines within subtitle lines.
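
The DTW step in the pipeline above can be illustrated with a small sketch. DuoSubs credits fastdtw, which computes an approximate alignment; the code below is instead the exact textbook dynamic time warping recurrence, applied to a hand-made cost matrix. Treating each cost as something like one minus the cosine similarity of two token embeddings is an assumption made for this example, and `dtw_path` is a hypothetical helper.

```python
def dtw_path(cost):
    """Classic dynamic time warping over a cost matrix.

    cost[i][j] is the dissimilarity between item i of the first sequence
    and item j of the second. Returns (total_cost, path), where path is a
    list of (i, j) index pairs from (0, 0) to (n - 1, m - 1).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of aligning the first i and j items
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # advance both sequences
                acc[i - 1][j],      # advance the first sequence only
                acc[i][j - 1],      # advance the second sequence only
            )
    # Backtrack from the end to recover the alignment.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(acc[i - 1][j - 1], i - 1, j - 1),
                 (acc[i - 1][j], i - 1, j),
                 (acc[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return acc[n][m], path

# Three primary tokens against four secondary tokens (made-up costs).
cost = [
    [0.1, 0.9, 0.9, 0.9],
    [0.9, 0.2, 0.1, 0.9],
    [0.9, 0.9, 0.9, 0.1],
]
total, path = dtw_path(cost)
print(total, path)
```

In the resulting path, the second primary token is paired with two secondary tokens, which is the kind of one-to-many pairing a later refinement step would then need to tidy up.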

🚫 Known Limitations

The accuracy of the merging process varies with the model selected.
Some models may produce unreliable results for unsupported or low-resource languages.
Some sentence fragments from the secondary subtitles may be misaligned with the primary subtitle lines due to the tokenization algorithm used.
Secondary subtitles might contain extra whitespace as a result of token-level merging.
The algorithm may not work reliably if the timestamps of some matching lines don’t overlap at all.

[!TIP] For the final known limitation, there are three possible ways to address it:

1. If all subtitle lines are completely out of sync, first align them with another subtitle syncing tool, e.g.

   - smacke/ffsubsync
   - sc0ty/subsync
   - kaegi/alass

   and then run this tool with ignore-non-overlap-filter disabled. Alternatively, see points 2 and 3.

2. If both subtitle files are known to be perfectly semantically aligned, meaning they have:

   - matching dialogue contents
   - no extra lines such as scene annotations or bonus Director’s Cut material

   then enable the ignore-non-overlap-filter option in one of:

   - Web UI (Advanced Configurations → Alignment Behavior)
   - CLI (--ignore-non-overlap-filter)
   - Python API (see documentation)

   to skip the overlap check. The merge should go smoothly from there.

3. If the subtitle timings are off and the two subtitle files don’t fully match in content, the algorithm likely won’t produce great results. Still, you can try running it with ignore-non-overlap-filter enabled.
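
The overlap condition behind the last limitation can be sketched as a simple interval test. This is an illustration only: the actual filter in DuoSubs may use a different criterion, and `overlaps` and `non_overlapping` are hypothetical helpers with made-up cue times.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if the two time spans share a non-zero stretch of time."""
    return max(start_a, start_b) < min(end_a, end_b)

def non_overlapping(primary, secondary):
    """Return indices of primary cues that overlap no secondary cue at all."""
    return [
        i for i, (ps, pe) in enumerate(primary)
        if not any(overlaps(ps, pe, ss, se) for ss, se in secondary)
    ]

# Cue times in milliseconds (made-up values).
primary = [(1000, 3000), (4000, 6000), (9000, 11000)]
secondary = [(1200, 2800), (5500, 7000)]

isolated = non_overlapping(primary, secondary)
print(isolated)
```

A cue that ends up in a list like `isolated` has no time-based counterpart, so when its true match exists but is shifted in time, a timestamp-based filter cannot find it. That is the situation the three suggestions above try to work around.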

🙏 Acknowledgements

This project wouldn't be possible without the incredible work of the open-source community. Special thanks to:

sentence-transformers — for the semantic embedding backbone
Hugging Face — for hosting models and making them easy to use
PyTorch — for providing the deep learning framework
fastdtw — for aligning the subtitles
lingua-py — for detecting the subtitles' language codes
pysubs2 — for subtitle file I/O utilities
charset_normalizer — for identifying the file encoding
typer — for the CLI application
tqdm — for displaying progress bars
gradio — for creating the Web UI application
Tears of Steel — subtitles used for demo, testing and development purposes. Created by the Blender Foundation, licensed under CC BY 3.0.

🤝 Contributing

Contributions are welcome! If you'd like to submit a pull request, please check out the contributing guidelines.

🔑 License

Apache-2.0 license - see the LICENSE file for details.


Release history

- 1.2.0: Mar 15, 2026
- 1.1.0: Aug 19, 2025
- 1.0.1: Aug 14, 2025
- 1.0.0: Aug 13, 2025 (this version)
- 0.2.0: Jul 23, 2025
- 0.1.0: Jul 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duosubs-1.0.0.tar.gz (45.6 kB)

Uploaded Aug 13, 2025 Source

Built Distribution

duosubs-1.0.0-py3-none-any.whl (50.0 kB)

Uploaded Aug 13, 2025 Python 3

File details

Details for the file duosubs-1.0.0.tar.gz.

File metadata

Download URL: duosubs-1.0.0.tar.gz
Upload date: Aug 13, 2025
Size: 45.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for duosubs-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`4bdc78bf03e2003bad43de09cc73a602ccef39e73872caffa7a0d01341e6320d`
MD5	`5e20904816019ae3bbbeba332f4eff70`
BLAKE2b-256	`060cf4f8511852a789dc79aa7879248b1d003e5fda4ab8f568b1ec4b9c8a3a33`

See more details on using hashes here.
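
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The sketch below uses hypothetical helpers (`sha256_of`, `matches`) and a throwaway temporary file so that it runs without the real download.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.lower()

# Demo on a throwaway file; its expected digest is computed from the same bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"duosubs")
    demo_path = tmp.name

demo_digest = hashlib.sha256(b"duosubs").hexdigest()
ok = matches(demo_path, demo_digest)
os.remove(demo_path)  # clean up the temporary file
print(ok)
```

For the real source distribution, the call would be `matches("duosubs-1.0.0.tar.gz", "4bdc78bf03e2003bad43de09cc73a602ccef39e73872caffa7a0d01341e6320d")`, assuming the file sits in the current directory.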

Provenance

The following attestation bundles were made for duosubs-1.0.0.tar.gz:

Publisher: release.yml on CK-Explorer/DuoSubs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duosubs-1.0.0.tar.gz
- Subject digest: 4bdc78bf03e2003bad43de09cc73a602ccef39e73872caffa7a0d01341e6320d
- Sigstore transparency entry: 391814263
- Sigstore integration time: Aug 13, 2025
Source repository:
- Permalink: CK-Explorer/DuoSubs@b1a4661e0a58cbc3f30a66cc3ed2654a63f92d3f
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/CK-Explorer
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b1a4661e0a58cbc3f30a66cc3ed2654a63f92d3f
- Trigger Event: push

File details

Details for the file duosubs-1.0.0-py3-none-any.whl.

File metadata

Download URL: duosubs-1.0.0-py3-none-any.whl
Upload date: Aug 13, 2025
Size: 50.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for duosubs-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f63f76681ba20fa393ae1f4ed5cffd0dc3125ac649759af3e6d4b81d01964bb`
MD5	`8758e0a9814a91ecd1cc6c8eeab2fb65`
BLAKE2b-256	`96201544de8ab7fec5b839360ed5f851a359e482d1ed9a25f0128bf4522b48ef`

See more details on using hashes here.

Provenance

The following attestation bundles were made for duosubs-1.0.0-py3-none-any.whl:

Publisher: release.yml on CK-Explorer/DuoSubs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: duosubs-1.0.0-py3-none-any.whl
- Subject digest: 9f63f76681ba20fa393ae1f4ed5cffd0dc3125ac649759af3e6d4b81d01964bb
- Sigstore transparency entry: 391814275
- Sigstore integration time: Aug 13, 2025
Source repository:
- Permalink: CK-Explorer/DuoSubs@b1a4661e0a58cbc3f30a66cc3ed2654a63f92d3f
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/CK-Explorer
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b1a4661e0a58cbc3f30a66cc3ed2654a63f92d3f
- Trigger Event: push
