
Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers.

Project description

subaligner


Supported Formats

Subtitle: SubRip, TTML, WebVTT, (Advanced) SubStation Alpha, MicroDVD, MPL2, TMP, EBU STL, SAMI, SCC and SBV.

Video/Audio: MP4, WebM, Ogg, 3GP, FLV, MOV, Matroska, MPEG TS, WAV, MP3, AAC, FLAC, etc.

ℹ️ Subaligner relies on file extensions as default hints to process a wide range of audiovisual or subtitle formats. It is recommended to use extensions widely accepted by the community to ensure compatibility.
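As a rough illustration of extension-based hints, the sketch below classifies a file name by its extension before it is handed to Subaligner. The extension sets are assumptions based on common usage for the formats listed above, not Subaligner's internal tables, and `guess_kind` is a hypothetical helper rather than part of the package.

```python
from pathlib import Path

# Assumed, non-exhaustive extension sets for the formats listed above.
SUBTITLE_EXTENSIONS = {".srt", ".ttml", ".vtt", ".ssa", ".ass", ".stl", ".smi", ".sami", ".scc", ".sbv"}
MEDIA_EXTENSIONS = {".mp4", ".webm", ".ogg", ".3gp", ".flv", ".mov", ".mkv", ".ts", ".wav", ".mp3", ".aac", ".flac"}


def guess_kind(file_name: str) -> str:
    """Return "subtitle", "media" or "unknown", looking at the extension only."""
    suffix = Path(file_name).suffix.lower()
    if suffix in SUBTITLE_EXTENSIONS:
        return "subtitle"
    if suffix in MEDIA_EXTENSIONS:
        return "media"
    return "unknown"


if __name__ == "__main__":
    for name in ("episode.SRT", "episode.mkv", "notes.docx"):
        print(name, "->", guess_kind(name))
```

A file reported as "unknown" is a candidate for renaming to a more conventional extension before alignment.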

Dependencies

Required by the basic installation: FFmpeg

$ apt-get install ffmpeg

or

$ brew install ffmpeg

Basic Installation

$ pip install -U pip && pip install -U setuptools wheel
$ pip install subaligner

or install from source:

$ git clone git@github.com:baxtree/subaligner.git && cd subaligner
$ pip install -U pip && pip install -U setuptools
$ python setup.py install

ℹ️ It is highly recommended to create a virtual environment prior to installation.
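One possible way to create such an environment is Python's standard `venv` module. The sketch below is only an illustration: `create_env` and the `.venv` directory name are hypothetical choices, and any other environment manager works just as well.

```python
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return its absolute path."""
    path = Path(env_dir).expanduser().resolve()
    # with_pip=True bootstraps pip so that packages can be installed right away.
    venv.create(path, with_pip=with_pip)
    return path
```

Calling `create_env(".venv")` creates the environment in the current directory; activate it before running the `pip install` commands above.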

Installation with Optional Packages Supporting Additional Features

# Install dependencies for enabling translation and transcription
$ pip install 'subaligner[llm]'

# Install dependencies for enabling forced alignment
$ pip install 'subaligner[stretch]'

# Install dependencies for setting up the development environment
$ pip install 'subaligner[dev]'

Note that both subaligner[stretch] and subaligner[dev] require additional dependencies to be pre-installed:

$ apt-get install espeak libespeak1 libespeak-dev espeak-data

or

$ brew install espeak

To install all supported features:

$ pip install 'subaligner[harmony]'

Container Support

If you prefer using a containerised environment over installing everything locally, run:

$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner bash

Windows users can install Subaligner through Windows Subsystem for Linux (WSL). Alternatively, use Docker Desktop to pull and run the image. Assuming your media assets are stored under d:\media, open the built-in Command Prompt, PowerShell, or Windows Terminal and run:

docker pull baxtree/subaligner
docker run -v "/d/media":/media -w "/media" -it baxtree/subaligner bash
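The mapping from a Windows folder such as d:\media to the `/d/media` form used in the command above can be sketched as follows. `to_docker_mount` and `docker_run_args` are hypothetical helpers written for illustration; they only build the argument list and do not start a container.

```python
from pathlib import PureWindowsPath
from typing import List


def to_docker_mount(windows_path: str) -> str:
    """Convert an absolute Windows path with a drive letter to the /d/media style."""
    path = PureWindowsPath(windows_path)
    if len(path.drive) != 2 or path.drive[1] != ":" or not path.root:
        raise ValueError(f"Expected an absolute path with a drive letter: {windows_path}")
    drive_letter = path.drive[0].lower()
    # path.parts[0] is the anchor (drive plus root); the rest are directory names.
    return "/" + "/".join([drive_letter, *path.parts[1:]])


def docker_run_args(windows_path: str, image: str = "baxtree/subaligner") -> List[str]:
    """Build the docker run arguments shown above for a given media folder."""
    mount = f"{to_docker_mount(windows_path)}:/media"
    return ["docker", "run", "-v", mount, "-w", "/media", "-it", image, "bash"]


if __name__ == "__main__":
    print(" ".join(docker_run_args(r"d:\media")))
```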

Usage

# Single-stage alignment (high-level shift with lower latency)
$ subaligner -m single -v video.mp4 -s subtitle.srt
$ subaligner -m single -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt

# Dual-stage alignment (low-level shift with higher latency)
$ subaligner -m dual -v video.mp4 -s subtitle.srt
$ subaligner -m dual -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt

# Generate subtitles by transcribing audiovisual files
$ subaligner -m transcribe -v video.mp4 -ml eng -mr whisper -mf small -o subtitle_aligned.srt
$ subaligner -m transcribe -v video.mp4 -ml zho -mr whisper -mf medium -o subtitle_aligned.srt

# Alignment on segmented plain texts (double newlines as the delimiter)
$ subaligner -m script -v test.mp4 -s subtitle.txt -o subtitle_aligned.srt
$ subaligner -m script -v https://example.com/video.mp4 -s https://example.com/subtitle.txt -o subtitle_aligned.srt

# Alignment on multiple subtitles against a single media file
$ subaligner -m script -v test.mp4 -s subtitle_lang_1.txt -s subtitle_lang_2.txt
$ subaligner -m script -v test.mp4 -s subtitle_lang_1.txt subtitle_lang_2.txt

# Alignment on embedded subtitles
$ subaligner -m single -v video.mkv -s embedded:stream_index=0 -o subtitle_aligned.srt
$ subaligner -m dual -v video.mkv -s embedded:stream_index=0 -o subtitle_aligned.srt

# Translative alignment with the ISO 639-3 language code pair (src,tgt)
$ subaligner --languages
$ subaligner -m single -v video.mp4 -s subtitle.srt -t src,tgt
$ subaligner -m dual -v video.mp4 -s subtitle.srt -t src,tgt
$ subaligner -m script -v test.mp4 -s subtitle.txt -o subtitle_aligned.srt -t src,tgt
$ subaligner -m dual -v video.mp4 -s subtitle.srt -tr helsinki-nlp -o subtitle_aligned.srt -t src,tgt
$ subaligner -m dual -v video.mp4 -s subtitle.srt -tr facebook-mbart -tf large -o subtitle_aligned.srt -t src,tgt
$ subaligner -m dual -v video.mp4 -s subtitle.srt -tr whisper -tf small -o subtitle_aligned.srt -t src,eng

# Transcribe audiovisual files and generate translated subtitles

$ subaligner -m transcribe -v video.mp4 -ml src -mr whisper -mf small -tr helsinki-nlp -o subtitle_aligned.srt -t src,tgt

# Shift subtitle manually by offset in seconds
$ subaligner -m shift --subtitle_path subtitle.srt -os 5.5
$ subaligner -m shift --subtitle_path subtitle.srt -os -5.5 -o subtitle_shifted.srt

# Run batch alignment against directories
$ subaligner_batch -m single -vd videos/ -sd subtitles/ -od aligned_subtitles/
$ subaligner_batch -m dual -vd videos/ -sd subtitles/ -od aligned_subtitles/
$ subaligner_batch -m dual -vd videos/ -sd subtitles/ -od aligned_subtitles/ -of ttml

# Run alignments with pipx
$ pipx run subaligner -m single -v video.mp4 -s subtitle.srt
$ pipx run subaligner -m dual -v video.mp4 -s subtitle.srt

# Run the module as a script
$ python -m subaligner -m single -v video.mp4 -s subtitle.srt
$ python -m subaligner -m dual -v video.mp4 -s subtitle.srt

# Run alignments with the docker image
$ docker pull baxtree/subaligner
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner -m single -v video.mp4 -s subtitle.srt
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner -m dual -v video.mp4 -s subtitle.srt
$ docker run -it baxtree/subaligner subaligner -m single -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ docker run -it baxtree/subaligner subaligner -m dual -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt

The aligned subtitle will be saved at subtitle_aligned.srt. For details on the CLIs, run subaligner -h or subaligner_batch -h. For the additional utilities, run subaligner_convert -h, subaligner_train -h and subaligner_tune -h. subaligner_1pass and subaligner_2pass are shortcuts for running subaligner with the -m single and -m dual options, respectively.
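If you would rather drive the CLI from a Python script, a thin wrapper around `subprocess` is one option. The sketch below uses only the `-m`, `-v`, `-s` and `-o` flags shown above; `build_align_command` and `run_align` are hypothetical helper names, not part of Subaligner, and `run_align` assumes the `subaligner` executable is on your PATH.

```python
import shlex
import subprocess
from typing import List, Optional


def build_align_command(mode: str, video: str, subtitle: str,
                        output: Optional[str] = None) -> List[str]:
    """Build the argument list for a single- or dual-stage alignment."""
    if mode not in ("single", "dual"):
        raise ValueError("mode must be 'single' or 'dual'")
    command = ["subaligner", "-m", mode, "-v", video, "-s", subtitle]
    if output:
        command += ["-o", output]
    return command


def run_align(mode: str, video: str, subtitle: str,
              output: Optional[str] = None) -> int:
    """Run the CLI and return its exit code."""
    command = build_align_command(mode, video, subtitle, output)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.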

Advanced Usage

You can train a new model with your own audiovisual files and subtitle files:

$ subaligner_train -vd VIDEO_DIRECTORY -sd SUBTITLE_DIRECTORY -tod TRAINING_OUTPUT_DIRECTORY

Then you can apply it to your subtitle synchronisation with the aforementioned commands. For more details on how to train and tune your own model, please refer to Subaligner Docs.

Anatomy

Subtitles can be out of sync with their companion audiovisual media files for a variety of reasons, including latency introduced by Speech-To-Text on live streams, or calibration and rectification involving human intervention during post-production.

A model has been trained with synchronised video and subtitle pairs and is later used for predicting shifting offsets and directions under the guidance of a dual-stage aligning approach.

First Stage (Global Alignment):

Second Stage (Parallelised Individual Alignment):
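To make the two stages more concrete, here is a toy sketch of the idea only. It is not Subaligner's model or implementation: the offsets are hand-picked where the tool would predict them, and `Cue`, `global_alignment` and `individual_alignment` are hypothetical names. The first stage moves every cue by one shared offset; the second adjusts each cue by its own offset, in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def shift(cue: Cue, offset: float) -> Cue:
    """Move a cue by offset seconds (negative values move it earlier)."""
    return replace(cue, start=cue.start + offset, end=cue.end + offset)


def global_alignment(cues: Sequence[Cue], offset: float) -> List[Cue]:
    """First stage: apply one shared offset to every cue."""
    return [shift(cue, offset) for cue in cues]


def individual_alignment(cues: Sequence[Cue], offsets: Sequence[float]) -> List[Cue]:
    """Second stage: adjust each cue by its own offset, in parallel."""
    if len(cues) != len(offsets):
        raise ValueError("one offset is needed per cue")
    with ThreadPoolExecutor() as pool:
        # pool.map keeps the input order, so cues stay in sequence.
        return list(pool.map(shift, cues, offsets))


if __name__ == "__main__":
    cues = [Cue(1.0, 2.5, "Hello"), Cue(4.0, 6.0, "World")]
    coarse = global_alignment(cues, 2.0)               # hypothetical global offset
    fine = individual_alignment(coarse, [0.25, -0.5])  # hypothetical per-cue offsets
    for cue in fine:
        print(cue)
```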

Acknowledgement

This tool wouldn't be possible without the following packages: librosa, tensorflow, scikit-learn, pycaption, pysrt, pysubs2, aeneas, transformers and openai-whisper.

Thanks to Alan Robinson and Nigel Megitt for their invaluable feedback.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • subaligner-0.3.7.tar.gz (1.7 MB): Source

Built Distributions

  • subaligner-0.3.7-py311-none-macosx_11_0_arm64.whl (1.2 MB): Python 3.11, macOS 11.0+ ARM64
  • subaligner-0.3.7-py311-none-any.whl (1.2 MB): Python 3.11
  • subaligner-0.3.7-py310-none-macosx_11_0_arm64.whl (1.2 MB): Python 3.10, macOS 11.0+ ARM64
  • subaligner-0.3.7-py310-none-any.whl (1.2 MB): Python 3.10
  • subaligner-0.3.7-py39-none-macosx_11_0_arm64.whl (1.2 MB): Python 3.9, macOS 11.0+ ARM64
  • subaligner-0.3.7-py39-none-any.whl (1.2 MB): Python 3.9
  • subaligner-0.3.7-py38-none-macosx_11_0_arm64.whl (1.2 MB): Python 3.8, macOS 11.0+ ARM64
  • subaligner-0.3.7-py38-none-any.whl (1.2 MB): Python 3.8

File details

Details for the file subaligner-0.3.7.tar.gz.

File metadata

  • Download URL: subaligner-0.3.7.tar.gz
  • Upload date:
  • Size: 1.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/37.3 requests/2.31.0 requests-toolbelt/0.10.1 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/6.1.0 keyring/23.13.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.8.2

File hashes

Hashes for subaligner-0.3.7.tar.gz
Algorithm Hash digest
SHA256 43a26ee66705614394642e773ac3a7177b32902c5c429c4eac4bc1d05936c3ab
MD5 3a6b4bc350f1cddc0a3f803b6e6f9397
BLAKE2b-256 a44582f3b170df800f536e9707e13cf31acdfc20f0d11ecfdb0c603737adc928

See more details on using hashes here.
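A downloaded file can be checked against the SHA256 digests listed on this page with Python's standard `hashlib`. The sketch below is one way to do it; `sha256_of` and `matches` are hypothetical helper names, and the file is assumed to have been downloaded to the given path.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, for instance, `matches("subaligner-0.3.7.tar.gz", "43a26ee66705614394642e773ac3a7177b32902c5c429c4eac4bc1d05936c3ab")` should return True for an intact download.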

File details

Details for the file subaligner-0.3.7-py311-none-macosx_11_0_arm64.whl.

File metadata

  • Download URL: subaligner-0.3.7-py311-none-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3.11, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/42.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/1.26.18 tqdm/4.66.1 importlib-metadata/6.8.0 keyring/24.3.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.11.1

File hashes

Hashes for subaligner-0.3.7-py311-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 cb2943e1345815cb182d7a36dba36002adab7ac003e2ccb00c9388215082f795
MD5 85823641db6f8d1f292c2f723e774957
BLAKE2b-256 280efa621b86cdb29ee95a4f1adfcd65f8bcb9fcbdaab9b328be726d212bf14b


File details

Details for the file subaligner-0.3.7-py311-none-any.whl.

File metadata

  • Download URL: subaligner-0.3.7-py311-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3.11
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/42.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/1.26.18 tqdm/4.66.1 importlib-metadata/7.0.0 keyring/24.3.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.11.1

File hashes

Hashes for subaligner-0.3.7-py311-none-any.whl
Algorithm Hash digest
SHA256 c35411f6a483a48a5e43a0a349fb417b639c69c1545bd28437497f81c6ebe9ea
MD5 2f4282c8384ce41bae9cd08c51b5961d
BLAKE2b-256 3ca2909c42c30443cd047cd13f5fd7525a81dde860e3b2dfefa1b1ea73b14f62


File details

Details for the file subaligner-0.3.7-py310-none-macosx_11_0_arm64.whl.

File metadata

  • Download URL: subaligner-0.3.7-py310-none-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3.10, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/40.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/1.26.16 tqdm/4.65.0 importlib-metadata/6.8.0 keyring/24.2.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.10.10

File hashes

Hashes for subaligner-0.3.7-py310-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e84b6fae9cbd56a934b0c15e55f4742bdeb4ef185ebbccd829069e9eefa83a3f
MD5 531d7916ba19e494f92eb995956d3b50
BLAKE2b-256 41ff416978624d1d1512a7154b6fe8143e6eb3d427adc05bd50b47004131debf


File details

Details for the file subaligner-0.3.7-py310-none-any.whl.

File metadata

  • Download URL: subaligner-0.3.7-py310-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3.10
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/37.3 requests/2.31.0 requests-toolbelt/0.10.1 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/6.4.1 keyring/23.13.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.10.2

File hashes

Hashes for subaligner-0.3.7-py310-none-any.whl
Algorithm Hash digest
SHA256 f15f737907652533210550fcbc1ae556c38d8e865803f48a0e48f74a805191b1
MD5 31ba4e89a061549056f7b003785b95b7
BLAKE2b-256 985855cc7fe83e1b3302c6a9b7d4fd6f7278c5458ac48091f2e8ba165e1c309b


File details

Details for the file subaligner-0.3.7-py39-none-macosx_11_0_arm64.whl.

File metadata

  • Download URL: subaligner-0.3.7-py39-none-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3.9, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/40.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/1.26.16 tqdm/4.65.0 importlib-metadata/6.8.0 keyring/24.2.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.9.12

File hashes

Hashes for subaligner-0.3.7-py39-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ac6c52e6dbb6823c20fa2745431c23e2c83178bb6e70a4801d61f595e25c408c
MD5 a82f437e1276e19bec28ff245508a6d9
BLAKE2b-256 ef988cabdd719eae55f31169297a8c2b22a3bcc1ee9a5df5e753005bc7b7e320


File details

Details for the file subaligner-0.3.7-py39-none-any.whl.

File metadata

  • Download URL: subaligner-0.3.7-py39-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/37.3 requests/2.31.0 requests-toolbelt/0.10.1 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/6.4.1 keyring/23.13.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.9.4

File hashes

Hashes for subaligner-0.3.7-py39-none-any.whl
Algorithm Hash digest
SHA256 0853b258115aafe853282f47144e7481b1fb267b42031fa3a6530dd210b7fbc4
MD5 1395ae2e56a86e09d5598783afbc2207
BLAKE2b-256 f8fed804413870ec8aafb8783215932090a02dab1d853e81011902b6d83aea10


File details

Details for the file subaligner-0.3.7-py38-none-macosx_11_0_arm64.whl.

File metadata

  • Download URL: subaligner-0.3.7-py38-none-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3.8, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/40.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/1.26.16 tqdm/4.65.0 importlib-metadata/6.8.0 keyring/25.2.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.8.12

File hashes

Hashes for subaligner-0.3.7-py38-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8aad45c8e884cfb05579c03b75f26234015de7bc97a80d0e4b7f943aafb11caf
MD5 d7510768cfca10d890a5dab4910ebeb9
BLAKE2b-256 9d50c83ef80d00d28a754d4b82bfb847ec67d44c7629673e7dcb0ae50495309f


File details

Details for the file subaligner-0.3.7-py38-none-any.whl.

File metadata

  • Download URL: subaligner-0.3.7-py38-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/37.3 requests/2.31.0 requests-toolbelt/0.10.1 urllib3/1.26.15 tqdm/4.65.0 importlib-metadata/6.1.0 keyring/23.13.1 rfc3986/2.0.0 colorama/0.4.6 CPython/3.8.2

File hashes

Hashes for subaligner-0.3.7-py38-none-any.whl
Algorithm Hash digest
SHA256 e794dc5e3e7f95376ba7062f861239e09bd5b8e2d99c73461c9651d9ff265168
MD5 f669d865bfa392fb58d60a6ef50d626e
BLAKE2b-256 7262e0df2d48f7e78876b01f8ee113c08a158ce8e26a280a42ce213c0d311e31

