Audio-Video Synchronization — frame-accurate clap-based sync for corpus recording.

-------------------------------------------------------------------------

         █████╗  ██╗   ██╗ ██╗  ███████╗  ███████╗
        ██╔══██╗ ██║   ██║ ██║  ██╔════╝  ██╔════╝
        ███████║ ██║   ██║ ██║  ███████╗  ███████╗
        ██╔══██║ ╚██╗ ██╔╝ ██║       ██║       ██║
        ██║  ██║  ╚████╔╝  ██║  ███████║  ███████║
        ╚═╝  ╚═╝   ╚═══╝   ╚═╝  ╚══════╝  ╚══════╝

        Audio-Video Synchronization in Python

        Copyright (C) 2026 Brigitte Bigi, CNRS
   Laboratoire Parole et Langage, Aix-en-Provence, France
-------------------------------------------------------------------------

AViSS description

Overview

Use cases

You recorded a speaker with one or two cameras and one or two separate audio recorders. You used a clap to mark a synchronization point. Now you need all your media files trimmed and aligned to the exact same frame boundary — ready for phonetic analysis or corpus annotation.

AViSS is the tool you need.

Features

AViSS performs frame-accurate, clap-based synchronization of audio and video files for speech corpus recordings. It is designed for researchers who need reproducible, high-quality media preparation without manual editing.

Its main features are:

Frame-accurate video trimming via OpenCV / SPPAS
Clap-based audio alignment (trim or pad to match the video frame boundary)
Support for 1 or 2 audio files and 1 or 2 video files per session
Optional video crop (x, y, w, h per video)
Optional copyright overlay on video
Optional video rotation (portrait mode)
Optional mono 16 kHz WAV export
Optional MP4 montage (H.264/AAC) for distribution
Optional WebM montage (libvpx-vp9, two-pass) for web distribution
Batch processing from a CSV file
Fully configurable column names and output filename structure

How it works

AViSS is a faithful Python migration of the original montage scripts (montage_step1.py / montage.py, B. Bigi, CNRS/LPL 2021-2024) distributed with the CLeLfPC corpus (https://hdl.handle.net/11403/clelfpc). The algorithm below is reproduced verbatim from those scripts.

Notation

Symbol	Meaning
`vc`	`video_clap + delay` — effective clap time in the video (seconds)
`fps`	frame rate of the video (frames/second)
`dur`	expected output duration (seconds)

Step 1 — clap frame (primary / reference video)

clap_frame_index = int(vc * fps)           # floor, 0-based
clap_frame_time  = clap_frame_index / fps
clap_delta       = vc - clap_frame_time    # sub-frame offset, in [0, 1/fps)

Step 2 — end frame (first excluded frame)

end_frame_index = 1 + int((vc + dur) * fps)
end_frame_time  = end_frame_index / fps
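
Steps 1 and 2 can be sketched as a small Python function. This is an illustration under stated assumptions, not the AViSS source: the function name `frame_bounds` and the sample values (25 fps, 10 s duration) are hypothetical.

```python
def frame_bounds(vc, fps, dur):
    """Return the clap and end frame values described in Steps 1 and 2."""
    # Step 1: frame that contains the clap (floor, 0-based)
    clap_frame_index = int(vc * fps)
    clap_frame_time = clap_frame_index / fps
    clap_delta = vc - clap_frame_time  # sub-frame offset, in [0, 1/fps)
    # Step 2: first excluded frame
    end_frame_index = 1 + int((vc + dur) * fps)
    end_frame_time = end_frame_index / fps
    return (clap_frame_index, clap_frame_time, clap_delta,
            end_frame_index, end_frame_time)


# Hypothetical values: video_clap = 6.410 s and delay = 0.200 s give vc = 6.61 s
clap_idx, clap_time, delta, end_idx, end_time = frame_bounds(6.61, 25.0, 10.0)
print(clap_idx, end_idx)  # 165 416
print(round(clap_time, 3), round(delta, 3), round(end_time, 3))  # 6.6 0.01 16.64
```

The output therefore spans frames 165 to 415 inclusive, and the clap falls 0.01 s after the start of the first kept frame.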

Step 3 — cross-sync (secondary video, fps2 ≠ fps_ref)

When two cameras have different frame rates, the reference camera is the one with the lowest fps. Its clap_delta (reference_delta below) is propagated to the secondary camera, whose effective clap time and frame rate are vc2 and fps2, so that both outputs share the same sub-frame offset at the clap.

shift_frames     = int(reference_delta * fps2)
clap_frame_index = int(vc2 * fps2) - shift_frames
end_frame_index  = 1 + int((vc2 + dur) * fps2) + shift_frames

Note: writing A = vc2, d = reference_delta and fps = fps2, the formula uses int(A*fps) - int(d*fps), not int((A-d)*fps). These can differ by 1 frame when frac(A*fps) < frac(d*fps), where frac denotes the fractional part.
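
The following sketch applies the Step 3 formulas and then shows the one-frame difference mentioned in the note. The function name `cross_sync` and all numeric values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def cross_sync(vc2, fps2, reference_delta, dur):
    """Frame bounds of the secondary video, following the Step 3 formulas."""
    shift_frames = int(reference_delta * fps2)
    clap_frame_index = int(vc2 * fps2) - shift_frames
    end_frame_index = 1 + int((vc2 + dur) * fps2) + shift_frames
    return clap_frame_index, end_frame_index


# Hypothetical secondary camera at 50 fps, reference offset of 0.03 s
sec_clap, sec_end = cross_sync(vc2=8.21, fps2=50.0, reference_delta=0.03, dur=10.0)
print(sec_clap, sec_end)  # 409 912

# The two formulas of the note, with frac(A*fps) = 0.5 < frac(d*fps) = 0.75
A, d, fps = 10.01, 0.015, 50.0
separate = int(A * fps) - int(d * fps)  # int(500.5) - int(0.75)
combined = int((A - d) * fps)           # int(499.75)
print(separate, combined)  # 500 499
```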

Audio alignment (per audio file)

Pass 1: shift audio so its effective clap (audio_clap + delay)
        matches vc — trim from the start or prepend silence.
Pass 2: pad with silence or trim the end to reach end_frame_time.
Pass 3: trim clap_frame_time from the start.

The output audio starts at the clap frame boundary, preserving the clap_delta sub-frame offset between the clap and the first sample.
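
As an illustration, the three passes can be written for a plain list of mono samples. This is a sketch, not the AViSS implementation: the function name `align_audio`, the rounding of durations to the nearest sample, and the 1 kHz sample rate of the example are all assumptions.

```python
def align_audio(samples, rate, audio_clap_eff, vc, clap_frame_time, end_frame_time):
    """Apply the three passes to a list of mono samples (times in seconds)."""
    def n(seconds):
        # Assumption of this sketch: round to the nearest sample
        return round(seconds * rate)

    # Pass 1: move the audio clap to time vc
    shift = vc - audio_clap_eff
    if shift >= 0:
        samples = [0] * n(shift) + list(samples)  # prepend silence
    else:
        samples = list(samples[n(-shift):])       # trim from the start
    # Pass 2: pad with silence or trim the end to reach end_frame_time
    target = n(end_frame_time)
    samples = samples[:target] + [0] * max(0, target - len(samples))
    # Pass 3: drop everything before the clap frame boundary
    return samples[n(clap_frame_time):]


rate = 1000                 # hypothetical sample rate, for readability
audio = [0] * 20000         # 20 s of silence
audio[3200] = 1             # clap at audio_clap + delay = 3.2 s
aligned = align_audio(audio, rate, 3.2, 6.61, 6.6, 16.64)
print(len(aligned), aligned.index(1))  # 10040 10
```

The clap ends up 10 samples (0.01 s) after the first output sample, which is the clap_delta of a video with vc = 6.61 s at 25 fps.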

Audio output files

Two files are produced per audio input:

<stem>-audio.wav — synchronized, original format (sample rate and channel count preserved). Used for montage.
<stem>.wav — mono 16 kHz WAV. If the input has more than one channel, all channels are mixed down to mono (average).

Scientific context

AViSS was developed at the Laboratoire Parole et Langage (LPL), CNRS, Aix-en-Provence, France, for the preparation of speech corpora used in phonetic research, including cued speech and read speech corpora.

Install AViSS

Requirements

The following external programs must be installed and available in the PATH:

ffmpeg — video and audio processing
sox — audio processing

From PyPI

> python -m pip install aviss

From its wheel package

Download the wheel file (aviss-xxx.whl) and install it with:

> python -m pip install aviss-xxx.whl

From the repository

Download or clone the repository, then install in editable mode:

> git clone https://github.com/brigitte-bigi/AViSS.git
> cd AViSS
> python -m pip install -e .

AViSS content

The AViSS package includes the following folders and files:

aviss/ : the source code of the API
aviss/core/ : pipeline, synchronization logic, audio and video operations
scripts/ : ready-to-use scripts for common workflows
tests/ : unit tests
docs/ : code documentation
pyproject.toml : package configuration

Quick start

Prepare the CSV file

The input CSV file describes one recording session per row. The first row is the header. Columns are separated by ; (or ,).

The following columns are required (names are configurable via settings_user.toml):

Column	Description
`audio_file`	relative path to the audio file
`audio_clap`	clap time in the audio (MM:SS.mmm)
`video_file`	relative path to the video file
`video_clap`	clap time in the video (MM:SS.mmm)
`delay`	offset after the clap before cutting (seconds)
`duration`	expected output duration (MM:SS.mmm)

Optional columns for crop, other media files, and output filename metadata are described in the sync section of Customizing settings.

Example:

ID;avSession;Serie;audio_file;video_file;audio_clap;video_clap;delay;duration
spk1;9;2;audio/RME_0038.wav;video/MVI_0038.MP4;00:03.843;00:06.410;0.200;04:08.250
spk2;8;1;audio/RME_0035.wav;video/MVI_0035.MP4;00:04.787;00:09.995;6.230;02:57.000
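
For readers who want to inspect such a file outside AViSS, the sketch below reads the first example row with the standard csv module and converts the MM:SS.mmm fields to seconds. The helper `parse_clock` is hypothetical and is not part of the AViSS API.

```python
import csv
import io


def parse_clock(text):
    """Convert a 'MM:SS.mmm' string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


CSV_TEXT = """ID;avSession;Serie;audio_file;video_file;audio_clap;video_clap;delay;duration
spk1;9;2;audio/RME_0038.wav;video/MVI_0038.MP4;00:03.843;00:06.410;0.200;04:08.250
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT), delimiter=";"))
row = rows[0]
vc = parse_clock(row["video_clap"]) + float(row["delay"])  # effective video clap
dur = parse_clock(row["duration"])
print(round(vc, 3), dur)  # 6.61 248.25
```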

Command-line usage

Synchronize one row (-l N = Nth data row, header excluded):

> aviss sync -c corpus/sessions.csv -l 1

Synchronize all rows:

> aviss sync -c corpus/sessions.csv

Synchronize and produce a distribution MP4:

> aviss sync -c corpus/sessions.csv -l 1 --montage

Synchronize and produce a WebM for web distribution:

> aviss sync -c corpus/sessions.csv -l 1 --webm

Print the full processing report:

> aviss sync -c corpus/sessions.csv -l 1 --verbose

Python API usage

from aviss import avCsvReader, avPipeline, avExporter

# Parse one row from the CSV
reader  = avCsvReader("corpus/sessions.csv")
session = reader.read_row(1)

# Run the synchronization pipeline
pipeline = avPipeline(session)
result   = pipeline.run()

# Export the montage only if synchronization succeeded
if result.success:
    exporter = avExporter(result,
                          stem="spk1_S09_s2",
                          work_dir="spk1_S09_s2")
    exporter.montage()

# Process all rows
sessions = avCsvReader("corpus/sessions.csv").read()
for session in sessions:
    result = avPipeline(session).run()
    # Print the report of each session that failed
    if not result.success:
        print(session, result.report)

Customizing settings

Place a settings_user.toml file in the same directory as your CSV file, then override only what you need:

[output]
crf       = 14
video_fps = 25.0
copyright = "Copyright (C) 2026 CNRS | LPL"

# Rotation — one integer per video in order.
# -1 = no rotation · 0 = CCW+vflip · 1 = CW · 2 = CCW portrait · 3 = CW+vflip
rotate = [2]        # single video, portrait CCW
# rotate = [-1, 2] # two videos: front=none, side=CCW portrait

[[output.name_cols]]
col    = "ID"
prefix = ""
fmt    = ""

[[output.name_cols]]
col    = "avSession"
prefix = "S"
fmt    = "02d"

[[output.name_cols]]
col    = "SerieLabel"
prefix = ""
fmt    = ""

[sync]
col_audio_file = "my_audio"

settings_user.toml is loaded automatically from the CSV directory at sync time.

output keys

Key	Default	Description
`crf`	`18`	Video encoding quality (H.264 CRF). Lower = better quality, larger file. Range: 0–51.
`video_fps`	`50.0`	Native frame rate of the recording camera (frames per second).
`copyright`	(none)	Text overlaid on the video (bottom-left). Use `\\:` to escape colons (ffmpeg).
`rotate`	(none)	Per-video transpose list. See values below.
`output_sep`	`"_"`	Separator between tokens in the output filename.
`work_dir_suffix`	`""`	Suffix appended to the working directory name.

Rotate values (one integer per video, in order):

Value	Effect
`-1`	No rotation
`0`	90° counter-clockwise + vertical flip
`1`	90° clockwise
`2`	90° counter-clockwise (portrait mode)
`3`	90° clockwise + vertical flip
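
These values use the same numbering as the ffmpeg transpose filter (0 to 3). The sketch below maps a rotate list to filter strings; it assumes that each value is passed to that filter unchanged, which is an assumption about AViSS internals, and the helper name `transpose_filter` is hypothetical.

```python
def transpose_filter(value):
    """Return an ffmpeg filter string for one rotate value, or None for -1."""
    if value == -1:
        return None  # no rotation requested
    if value not in (0, 1, 2, 3):
        raise ValueError(f"unsupported rotate value: {value}")
    return f"transpose={value}"


rotate = [-1, 2]  # two videos: front = none, side = CCW portrait
print([transpose_filter(v) for v in rotate])  # [None, 'transpose=2']
```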

sync keys

Key	Default	Description
`col_audio_file`	`"audio_file"`	CSV column name for the audio file path.
`col_audio_clap`	`"audio_clap"`	CSV column name for the audio clap time.
`col_video_file`	`"video_file"`	CSV column name for the video file path.
`col_video_clap`	`"video_clap"`	CSV column name for the video clap time.
`col_video_name`	`"video_name"`	CSV column name for the optional video label (used in output filename suffix).
`col_video_crop_x`	`"video_crop_x"`	CSV column name for the crop left edge (pixels).
`col_video_crop_y`	`"video_crop_y"`	CSV column name for the crop top edge (pixels).
`col_video_crop_w`	`"video_crop_w"`	CSV column name for the crop width (pixels).
`col_video_crop_h`	`"video_crop_h"`	CSV column name for the crop height (pixels).
`col_delay`	`"delay"`	CSV column name for the delay after the clap (seconds).
`col_duration`	`"duration"`	CSV column name for the expected output duration.

output.name_cols format

Each [[output.name_cols]] entry defines one token in the output filename:

Key	Type	Description
`col`	str	CSV column header whose value is used
`prefix`	str	String prepended to the value (`"S"`, `"T"`, `""` for none)
`fmt`	str	`""` → raw string · `"02d"` → zero-padded integer · `"d"` → plain integer

Tokens are joined with output_sep (default "_"). A column whose cell is empty in the CSV is silently skipped.

Example: with col = "avSession", prefix = "S", fmt = "02d" and cell value 9, the token is S09.
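
The token rules can be sketched as follows. The function `build_stem` is hypothetical, and treating a missing column like an empty cell is an assumption of this sketch.

```python
def build_stem(row, name_cols, sep="_"):
    """Join one token per name_cols entry, skipping empty cells."""
    tokens = []
    for spec in name_cols:
        value = row.get(spec["col"], "")
        if value == "":
            continue  # empty cell: the token is skipped
        if spec["fmt"]:
            value = format(int(value), spec["fmt"])  # e.g. "02d" turns 9 into "09"
        tokens.append(spec["prefix"] + value)
    return sep.join(tokens)


name_cols = [
    {"col": "ID", "prefix": "", "fmt": ""},
    {"col": "avSession", "prefix": "S", "fmt": "02d"},
    {"col": "SerieLabel", "prefix": "", "fmt": ""},
]
row = {"ID": "spk1", "avSession": "9", "SerieLabel": ""}
print(build_stem(row, name_cols))  # spk1_S09
```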

Test the source code

Install the optional test dependencies:

> python -m pip install ".[dev]"

Unit tests

Run the unit test suite with coverage (requires coverage, included in the virtual environment):

> .venv/bin/python -m coverage run -m unittest discover -s tests -p "test_*.py" \
  && .venv/bin/python -m coverage report -m

Expected overall coverage: ≥ 68 %.

If coverage is not installed, run the tests without it:

> .venv/bin/python -m unittest discover -s tests -p "test_*.py"

Integration test

The integration test uses synthetic media files built from the demo files shipped in tests/demo/.

Generate test data

bash make_test_data.sh [demo_dir] [output_dir] [n_videos] [n_audios]

Argument	Default	Description
`demo_dir`	`demo`	Directory containing `demo.mp4` and `demo.wav`
`output_dir`	`data`	Directory where test files are written
`n_videos`	`1`	Number of video files to generate
`n_audios`	`1`	Number of audio files to generate

Each generated video/audio file contains random silence/black before and after the content so that every run exercises a different synchronization offset.

Single video + single audio (default):

> cd tests && bash make_test_data.sh && cd ..

Writes tests/data/test_audio.wav, tests/data/test_video.mp4 and tests/data/test.csv.

Two videos + one audio:

> cd tests && bash make_test_data.sh demo data 2 1 && cd ..

Writes test_video.mp4, test_video2.mp4, test_audio.wav and a CSV with columns video_file, video_file2.

Two videos + two audios:

> cd tests && bash make_test_data.sh demo data 2 2 && cd ..

Then run the pipeline on the first CSV row:

> .venv/bin/python main.py sync -c tests/data/test.csv -l 1 --verbose

Expected output — audio (ffprobe tests/data/demo_S01/demo_S01.wav):

Duration: 00:00:10.47, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le, 16000 Hz, 1 channels, s16, 256 kb/s

Expected output — video (ffprobe tests/data/demo_S01/demo_S01.mkv):

Duration: 00:00:10.47, start: 0.000000, bitrate: 3288 kb/s

Both files must have the same duration as tests/demo/demo.mp4.
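
The duration comparison can be automated by parsing the Duration field of the two ffprobe lines shown above. This is a minimal sketch: the helper `duration_seconds` is hypothetical and works on text that has already been captured from ffprobe.

```python
import re


def duration_seconds(ffprobe_line):
    """Extract 'Duration: HH:MM:SS.ss' from an ffprobe line, in seconds."""
    match = re.search(r"Duration: (\d+):(\d+):(\d+(?:\.\d+)?)", ffprobe_line)
    if match is None:
        raise ValueError("no Duration field found")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


audio_line = "Duration: 00:00:10.47, bitrate: 256 kb/s"
video_line = "Duration: 00:00:10.47, start: 0.000000, bitrate: 3288 kb/s"
print(duration_seconds(audio_line) == duration_seconds(video_line))  # True
```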

Scripts

mix_mono.py — mix two mono audio files

Combines two mono WAV files into a single mono WAV by averaging the two signals. Useful when two microphones recorded the same speaker and the result must be a single audio file before synchronization.

> python scripts/mix_mono.py audio1.wav audio2.wav output.wav

Argument	Description
`audio1`	First mono WAV file
`audio2`	Second mono WAV file
`output`	Output mono WAV file (must not already exist)

Requires sox. Both input files must be mono WAV at the same sample rate.
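
The script itself relies on sox; the sketch below only illustrates the averaging arithmetic on two lists of samples. The function name `mix_mono` and the equal-length check are assumptions of this example.

```python
def mix_mono(first, second):
    """Average two mono signals sample by sample."""
    if len(first) != len(second):
        raise ValueError("both signals must have the same length")
    return [(a + b) / 2 for a, b in zip(first, second)]


mic1 = [100, -1000, 8]
mic2 = [300, 1000, 8]
print(mix_mono(mic1, mic2))  # [200.0, 0.0, 8.0]
```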

extract_audio.py — extract audio from a video

Extracts the audio track of a video file and converts it to mono WAV at 48 kHz, 16-bit PCM.

> python scripts/extract_audio.py video.mp4

Argument	Description
`video`	Input video file

The output file is written next to the input video with a .wav extension. Requires ffmpeg.

mp4_to_webm.py — convert MP4 to WebM

Converts an MP4 video to WebM (libvpx-vp9, two-pass encoding, CRF 16). An optional audio file can replace the video's audio track in the output.

> python scripts/mp4_to_webm.py video.mp4 [audio.wav]

Argument	Description
`video`	Input MP4 file
`audio`	Optional audio file to mux into the output

The output file is written next to the input video with a .webm extension. Requires ffmpeg.

Projects using AViSS

AViSS was developed at LPL, CNRS, to prepare the CLeLfPC corpus (Corpus de Lecture en Langue Française Parlée Complétée). This work is carried out in the framework of the AutoCuedSpeech project, which partially funded AViSS development.

Contact the author if you want to add a project here.

Help / How to contribute

If you want to report a bug or suggest a feature, please send an e-mail to the author. Any and all constructive comments are welcome.

If you plan to contribute to the code, please read carefully and agree to both the code of conduct and the code style guide.

AViSS Documentation

Documentation is generated from the source code using ClammingPy: https://github.com/brigitte-bigi/ClammingPy

To generate the documentation locally:

> python -m pip install ClammingPy
> python makedoc.py

License/Copyright

See the accompanying LICENSE file for the license terms and the AUTHORS.md file for the full list of contributors.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Changes

Version 1.0:
- Initial version. Faithful Python migration of the original montage scripts (B. Bigi, CNRS/LPL 2021-2024) distributed with CLeLfPC.
- Frame-accurate, clap-based synchronization of audio and video files.
- Support for any number of audio and video files per session.
- Optional video crop, copyright overlay, rotation (portrait mode).
- Mono 16 kHz WAV output (all channels mixed down).
- Optional MP4 montage (H.264/AAC) and WebM montage (libvpx-vp9, two-pass).
- Batch processing from a CSV file.
- Fully configurable column names and output filename structure.

Project details

Version 1.0, released May 24, 2026.

Built distribution: aviss-1.0-py3-none-any.whl (64.4 kB, Python 3), uploaded May 24, 2026 via twine/6.2.0 CPython/3.12.3, without Trusted Publishing. No source distribution is available for this release.

Hashes for aviss-1.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`ef6f0ccea8a3678aa49c942e0bf009749bbb84e59f58771cb7e137b522976d9f`
MD5	`fee87f16d4f9e429990a9c881617245c`
BLAKE2b-256	`74b86d4d688dea502996c479b3a6893351a3ddf1bd0f956d7644a185d52e057d`
