
Automatically synchronize and translate subtitles with pretrained deep neural networks, forced alignment and transformers.


subaligner


Supported Formats

Subtitle: SubRip, TTML, WebVTT, (Advanced) SubStation Alpha, MicroDVD, MPL2, TMP, EBU STL, SAMI, SCC and SBV.

Video/Audio: MP4, WebM, Ogg, 3GP, FLV, MOV, Matroska, MPEG TS, WAV, MP3, AAC, FLAC, etc.

Dependencies

Required for the basic installation: FFmpeg

$ apt-get install ffmpeg

or

$ brew install ffmpeg
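Before installing subaligner, you can confirm that FFmpeg is reachable on your PATH. The sketch below uses only the Python standard library. The helper names find_ffmpeg and ffmpeg_version are hypothetical and are not part of subaligner.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the FFmpeg executable, or None if it is not on PATH."""
    return shutil.which(executable)


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is missing."""
    path = find_ffmpeg(executable)
    if path is None:
        return None
    # check=False: a failing ffmpeg should not raise here; we only report what it printed
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "FFmpeg not found; install it with apt-get or brew first.")
```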

Basic Installation

$ pip install -U pip
$ pip install subaligner

Installation with Optional Packages Supporting Additional Features

# Install dependencies for enabling translation

$ pip install 'subaligner[translation]'
# Install dependencies for enabling forced alignment

$ pip install 'subaligner[stretch]'
# Install dependencies for setting up the development environment

$ pip install 'subaligner[dev]'

Note that both subaligner[stretch] and subaligner[dev] require additional dependencies to be pre-installed:

$ apt-get install espeak libespeak1 libespeak-dev espeak-data

or

$ brew install espeak

To install all supported features:

$ pip install 'subaligner[harmony]'
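After installing extras, one quick way to see which optional features are importable is to probe for one representative module per extra. The mapping below (translation → transformers, stretch → aeneas) is an assumption based on the packages credited in the Acknowledgement section. It is not an official list of what each extra installs, so check the package's setup.py for the authoritative definition.

```python
import importlib.util
from typing import Dict

# ASSUMPTION: one representative module per subaligner extra, inferred from the
# Acknowledgement section; this is not the package's official extras definition.
EXTRA_MODULES = {
    "translation": "transformers",
    "stretch": "aeneas",
}


def installed_extras(extra_modules: Dict[str, str] = EXTRA_MODULES) -> Dict[str, bool]:
    """Report, per extra, whether its representative top-level module can be found."""
    # find_spec locates a top-level module without importing it; None means "not installed"
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, present in installed_extras().items():
        print(f"subaligner[{extra}]: {'available' if present else 'missing'}")
```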

Alternative Installations

# Install via pipx
$ pip install -U pip pipx
$ pipx install subaligner

or

# Install via Pipenv
$ pipenv install subaligner
$ pipenv install 'subaligner[stretch]'
$ pipenv install 'subaligner[dev]'

or

# Install from source

$ git clone git@github.com:baxtree/subaligner.git
$ cd subaligner
$ python setup.py install

or

# Use dockerised installation

$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner bash

For users on Windows 10: Docker Desktop is the only option at present. Assuming your media assets are stored under d:\media, open the built-in Command Prompt, PowerShell or Windows Terminal and run:

docker pull baxtree/subaligner
docker run -v "/d/media":/media -w "/media" -it baxtree/subaligner bash
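The -v argument above maps the Windows folder d:\media to /d/media. The sketch below shows that conversion for other drive-letter paths. It only mirrors the form used in the example command: to_docker_mount is a hypothetical helper, lower-casing the drive letter is an assumption taken from that example, and UNC paths such as \\server\share are rejected.

```python
from pathlib import PureWindowsPath


def to_docker_mount(windows_path: str) -> str:
    """Convert a drive-letter path such as d:\\media into the /d/media form."""
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":")
    # Only single-letter drives are handled; relative and UNC paths are refused.
    if len(drive) != 1 or not drive.isalpha():
        raise ValueError(f"expected a path starting with a drive letter: {windows_path!r}")
    parts = path.parts[1:]  # drop the anchor, e.g. 'd:\\'
    return "/" + "/".join([drive.lower(), *parts])


if __name__ == "__main__":
    mount = to_docker_mount(r"d:\media")
    print(f'docker run -v "{mount}":/media -w "/media" -it baxtree/subaligner bash')
```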

Usage

# Single-stage alignment (high-level shift with lower latency)

$ subaligner_1pass -v video.mp4 -s subtitle.srt
$ subaligner_1pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
# Dual-stage alignment (low-level shift with higher latency)

$ subaligner_2pass -v video.mp4 -s subtitle.srt
$ subaligner_2pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt

or

# Pass in single-stage or dual-stage as the alignment mode

$ subaligner -m single -v video.mp4 -s subtitle.srt
$ subaligner -m dual -v video.mp4 -s subtitle.srt
$ subaligner -m single -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ subaligner -m dual -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
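If you call subaligner from a Python script rather than a shell, it can help to build the argument list programmatically. In the sketch below, build_subaligner_command and format_command are hypothetical helpers, not part of subaligner. They use only the flags shown in this README (-m, -v, -s, -o, -t) and accept only the three modes that appear here.

```python
import shlex
from typing import List, Optional, Sequence

# Only the modes demonstrated in this README; other versions may accept more.
MODES = ("single", "dual", "script")


def build_subaligner_command(
    mode: str,
    video: str,
    subtitle: str,
    output: Optional[str] = None,
    languages: Optional[str] = None,
) -> List[str]:
    """Assemble a subaligner argv list from the flags documented above."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    command = ["subaligner", "-m", mode, "-v", video, "-s", subtitle]
    if output is not None:
        command += ["-o", output]
    if languages is not None:
        command += ["-t", languages]  # "src,tgt" pair of ISO 639-3 codes
    return command


def format_command(argv: Sequence[str]) -> str:
    """Render an argv list as a shell-safe string, quoting paths with spaces."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    cmd = build_subaligner_command("dual", "video.mp4", "subtitle.srt", output="subtitle_aligned.srt")
    print(format_command(cmd))
```

To actually run the alignment, pass the list to subprocess.run(cmd, check=True). Passing a list instead of a single string avoids shell-quoting problems with file names that contain spaces.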
# Alignment on segmented plain texts (double newlines as the delimiter)

$ subaligner -m script -v test.mp4 -s subtitle.txt -o subtitle_aligned.srt
$ subaligner -m script -v https://example.com/video.mp4 -s https://example.com/subtitle.txt -o subtitle_aligned.srt
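In script mode, the plain-text input is segmented by double newlines. The sketch below previews how a text file would split into segments under that rule, which is useful for spotting accidental merges before alignment. split_script_segments is a hypothetical helper. Treating blank lines that contain only spaces as delimiters is an assumption beyond the README's wording.

```python
import re
from typing import List


def split_script_segments(text: str) -> List[str]:
    """Split plain text into segments separated by one or more blank lines."""
    normalised = text.replace("\r\n", "\n")  # treat Windows line endings the same way
    # A newline, optional whitespace (including further newlines), then another newline
    # marks a segment boundary; single newlines stay inside a segment.
    chunks = re.split(r"\n\s*\n", normalised)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    with open("subtitle.txt", encoding="utf-8") as handle:
        for index, segment in enumerate(split_script_segments(handle.read()), start=1):
            print(index, repr(segment))
```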
# Translative alignment with the ISO 639-3 language code pair (src,tgt)

$ subaligner_1pass --languages
$ subaligner_1pass -v video.mp4 -s subtitle.srt -t src,tgt
$ subaligner_2pass --languages
$ subaligner_2pass -v video.mp4 -s subtitle.srt -t src,tgt
$ subaligner --languages
$ subaligner -m single -v video.mp4 -s subtitle.srt -t src,tgt
$ subaligner -m dual -v video.mp4 -s subtitle.srt -t src,tgt
$ subaligner -m script -v test.mp4 -s subtitle.txt -o subtitle_aligned.srt -t src,tgt
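The -t value is a comma-separated pair of ISO 639-3 codes, for example eng,fra. The sketch below checks only the shape of that value: two three-letter codes, normalised to lower case. It cannot tell whether subaligner supports the pair; --languages lists the supported ones. parse_language_pair is a hypothetical helper.

```python
import re
from typing import Tuple

_ISO_639_3 = re.compile(r"[a-z]{3}")


def parse_language_pair(value: str) -> Tuple[str, str]:
    """Split a "src,tgt" string into two lower-case three-letter language codes."""
    parts = [part.strip().lower() for part in value.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected 'src,tgt', got {value!r}")
    src, tgt = parts
    for code in (src, tgt):
        # Shape check only; membership in ISO 639-3 is not verified here.
        if not _ISO_639_3.fullmatch(code):
            raise ValueError(f"{code!r} is not a three-letter ISO 639-3 code")
    return src, tgt
```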
# Run batch alignment against directories

$ subaligner_batch -m single -vd videos/ -sd subtitles/ -od aligned_subtitles/
$ subaligner_batch -m dual -vd videos/ -sd subtitles/ -od aligned_subtitles/
$ subaligner_batch -m dual -vd videos/ -sd subtitles/ -od aligned_subtitles/ -of ttml
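Before a batch run, you may want to check that every media file has a matching subtitle. This README does not say how subaligner_batch pairs files across -vd and -sd. The sketch below therefore assumes, purely as a pre-flight check, that files pair by a shared base name (episode1.mp4 with episode1.srt). pair_by_stem and pair_directories are hypothetical helpers, and the extension sets are a small illustrative subset of the supported formats.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Illustrative subsets of the supported formats listed above; extend as needed.
MEDIA_EXTS = {".mp4", ".webm", ".mkv", ".mov", ".wav", ".mp3"}
SUBTITLE_EXTS = {".srt", ".ttml", ".vtt", ".ass", ".ssa", ".sbv"}


def pair_by_stem(
    media_names: Iterable[str], subtitle_names: Iterable[str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair file names sharing a stem; return (pairs, media files with no subtitle)."""
    subtitles = {
        Path(name).stem: name
        for name in sorted(subtitle_names)
        if Path(name).suffix.lower() in SUBTITLE_EXTS
    }
    pairs: List[Tuple[str, str]] = []
    unmatched: List[str] = []
    for name in sorted(media_names):
        if Path(name).suffix.lower() not in MEDIA_EXTS:
            continue  # skip non-media files such as notes or .DS_Store
        subtitle = subtitles.get(Path(name).stem)
        if subtitle is None:
            unmatched.append(name)
        else:
            pairs.append((name, subtitle))
    return pairs, unmatched


def pair_directories(media_dir: str, subtitle_dir: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Apply pair_by_stem to the regular files found in two directories."""
    media = [p.name for p in Path(media_dir).iterdir() if p.is_file()]
    subs = [p.name for p in Path(subtitle_dir).iterdir() if p.is_file()]
    return pair_by_stem(media, subs)
```

Keeping the pairing logic in a pure function over file names makes it easy to test without touching the file system. pair_directories is only a thin wrapper around it.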
# Run alignments with pipx

$ pipx run subaligner -m single -v video.mp4 -s subtitle.srt
$ pipx run subaligner -m dual -v video.mp4 -s subtitle.srt
# Run the module as a script
$ python -m subaligner -m single -v video.mp4 -s subtitle.srt
$ python -m subaligner -m dual -v video.mp4 -s subtitle.srt
$ python -m subaligner.subaligner_1pass -v video.mp4 -s subtitle.srt
$ python -m subaligner.subaligner_2pass -v video.mp4 -s subtitle.srt
# Run alignments with the docker image

$ docker pull baxtree/subaligner
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner_1pass -v video.mp4 -s subtitle.srt
$ docker run -v `pwd`:`pwd` -w `pwd` -it baxtree/subaligner subaligner_2pass -v video.mp4 -s subtitle.srt
$ docker run -it baxtree/subaligner subaligner_1pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt
$ docker run -it baxtree/subaligner subaligner_2pass -v https://example.com/video.mp4 -s https://example.com/subtitle.srt -o subtitle_aligned.srt

The aligned subtitle will be saved at subtitle_aligned.srt. For details on the CLI, run subaligner_1pass -h, subaligner_2pass -h or subaligner -h. Additional utilities are described in the help output of subaligner_batch -h, subaligner_convert -h, subaligner_train -h and subaligner_tune -h.

Advanced Usage

You can train a new model with your own audiovisual files and subtitle files:

$ subaligner_train -vd VIDEO_DIRECTORY -sd SUBTITLE_DIRECTORY -tod TRAINING_OUTPUT_DIRECTORY

You can then use the trained model for subtitle synchronisation with the commands above. For more details on how to train and tune your own model, please refer to Subaligner Docs.

Anatomy

Subtitles can be out of sync with their companion audiovisual media files for a variety of reasons, including latency introduced by speech-to-text on live streams, or calibration and rectification involving human intervention during post-production.

A model has been trained on synchronised video and subtitle pairs and is then used to predict shifting offsets and directions under the guidance of a dual-stage alignment approach.
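An offset and a direction can be combined into one signed number: the magnitude says how far to move the subtitles, and the sign says which way. The sketch below does not predict anything. It only illustrates what applying such a signed offset to SubRip cue times (HH:MM:SS,mmm) means. The helper names are hypothetical, and clamping negative results to zero is a design choice, not documented subaligner behaviour.

```python
import re

_SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_to_ms(timestamp: str) -> int:
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    match = _SRT_TIME.fullmatch(timestamp)
    if match is None:
        raise ValueError(f"not an SRT timestamp: {timestamp!r}")
    h, m, s, ms = (int(group) for group in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def ms_to_srt(total_ms: int) -> str:
    """Convert milliseconds back to 'HH:MM:SS,mmm'."""
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_timestamp(timestamp: str, offset_seconds: float) -> str:
    """Shift by a signed offset: positive delays the cue, negative makes it earlier."""
    shifted = srt_to_ms(timestamp) + round(offset_seconds * 1000)
    return ms_to_srt(max(0, shifted))  # a cue cannot start before 00:00:00,000
```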

The dual-stage approach consists of:

  • First Stage (Global Alignment)
  • Second Stage (Parallelised Individual Alignment)
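As a rough illustration of how the two stages compose, the sketch below applies one global offset to every cue, then a separate offset to each segment of cues, processing the segments concurrently. Everything here is hypothetical: the offsets are supplied as inputs rather than predicted, cues are plain (start, end) pairs in seconds, and a thread pool stands in for whatever parallelism subaligner actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Cue = Tuple[float, float]  # (start, end) in seconds


def apply_offset(cues: Sequence[Cue], offset: float) -> List[Cue]:
    """Shift every cue by the same offset, clamping at zero and keeping end >= start."""
    shifted = []
    for start, end in cues:
        new_start = max(0.0, start + offset)
        new_end = max(new_start, end + offset)
        shifted.append((new_start, new_end))
    return shifted


def two_stage_align(
    cues: Sequence[Cue],
    global_offset: float,
    segments: Sequence[Tuple[int, int]],
    local_offsets: Sequence[float],
) -> List[Cue]:
    """Stage 1: one global shift. Stage 2: an independent shift per segment, run concurrently.

    `segments` are half-open [first, last) cue index ranges that should partition the cue
    list; `local_offsets` holds one externally supplied (hypothetical) offset per segment.
    """
    if len(segments) != len(local_offsets):
        raise ValueError("need exactly one local offset per segment")
    globally_aligned = apply_offset(cues, global_offset)

    def align_segment(job: Tuple[Tuple[int, int], float]) -> List[Cue]:
        (first, last), offset = job
        return apply_offset(globally_aligned[first:last], offset)

    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, so concatenation restores cue order.
        results = list(pool.map(align_segment, zip(segments, local_offsets)))
    return [cue for segment in results for cue in segment]
```

Cues that fall outside every segment are dropped, so the segments are expected to cover the whole cue list without gaps or overlaps.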

Acknowledgement

Thanks to Alan Robinson and Nigel Megitt for their invaluable feedback.

This tool wouldn't be possible without the following packages: librosa, tensorflow, scikit-learn, pycaption, pysrt, pysubs2, aeneas and transformers.


Download files


Source Distribution

subaligner-0.2.1.tar.gz (1.2 MB)

Uploaded Source

Built Distributions


subaligner-0.2.1-py3.9.egg (1.2 MB)

Uploaded Egg

subaligner-0.2.1-py3.8.egg (1.2 MB)

Uploaded Egg

subaligner-0.2.1-py3.7.egg (1.2 MB)

Uploaded Egg

subaligner-0.2.1-py3-none-any.whl (1.2 MB)

Uploaded Python 3

File details

Details for the file subaligner-0.2.1.tar.gz.

File metadata

  • Download URL: subaligner-0.2.1.tar.gz
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.7

File hashes

Hashes for subaligner-0.2.1.tar.gz
Algorithm Hash digest
SHA256 025fc14e78b6e7e8b9f27276032d51511bb033f69e7f769ecce9124d9255a75b
MD5 f661ccbc528372f8522468cd4cc45bf8
BLAKE2b-256 5f6ef1ae4aa02065439f26262a07556639de964c845f6e1d7cb916c630271c00


File details

Details for the file subaligner-0.2.1-py3.9.egg.

File metadata

  • Download URL: subaligner-0.2.1-py3.9.egg
  • Size: 1.2 MB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for subaligner-0.2.1-py3.9.egg
Algorithm Hash digest
SHA256 31afb9c4855a1fb6f7731028b0958b442a64bb28522cb7c86c116af524f8767e
MD5 3c41a46d9a9c980f50e3aed103924c4b
BLAKE2b-256 74722ac60280a0e179ef0e095f1abb712afab7476bb64fedcd3daf2ca29a342d


File details

Details for the file subaligner-0.2.1-py3.8.egg.

File metadata

  • Download URL: subaligner-0.2.1-py3.8.egg
  • Size: 1.2 MB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.5

File hashes

Hashes for subaligner-0.2.1-py3.8.egg
Algorithm Hash digest
SHA256 cea55814c1c4f4c847a4d579d5f32d5cd3612924aa43af5e679651541b8dc9d4
MD5 e1885e0548450718d22dc12889030d1b
BLAKE2b-256 ccd895032c2ec05b9c2e9db888f069c9d645632f7dc447eee537695faf0dbf8c


File details

Details for the file subaligner-0.2.1-py3.7.egg.

File metadata

  • Download URL: subaligner-0.2.1-py3.7.egg
  • Size: 1.2 MB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.7

File hashes

Hashes for subaligner-0.2.1-py3.7.egg
Algorithm Hash digest
SHA256 b4240991456edcf08b1d8c3538b20043d461a21f6b4e32c91e2bd6d70ef2da19
MD5 b3614d56f2f51a65057c22b6a94289d0
BLAKE2b-256 71950983fd43ba50ffac4fc38164731ab5cd4cd6044bdee1f384ecd3325720bb


File details

Details for the file subaligner-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: subaligner-0.2.1-py3-none-any.whl
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.7

File hashes

Hashes for subaligner-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b49ec2eb3f435cd33230658ca534cfab80681b67e709c5c90f72ec4cf7bcd597
MD5 93d003684e25dfae0b2bd2d07bdd7d1d
BLAKE2b-256 8d4ac54633d580fd0cb79193dcc8a4842061f3cd895c312c1a1bc6e4ded21992

