Audio/lyric synchronisation — dual transcription, word-level alignment, gap detection, timeline validation
lyric-sync
Audio/lyric synchronisation for the AIMOScript video generation pipeline. It grew out of real-world debugging of the MusicVideoCreator project and is designed to eliminate the negative durations, backwards timestamps, and reused words that affected the earlier implementation.
What it solves
| Previous problem | Solution |
|---|---|
| Negative durations (-42.37s) | TimelineValidator 5-pass repair |
| Backwards timestamps (end before start) | ExclusionPoolMatcher prevents word reuse |
| Same transcription words matched to multiple lines | Exclusion pool — once a word is used, it's gone |
| Swedish encoding garbage (Ã¤ instead of ä) | fix_swedish_encoding() in all text paths |
| ElevenLabs timestamps ignored in favour of fixed durations | Consensus builder uses ElevenLabs timing as ground truth |
| Excessive instrumental segments | min_gap_duration threshold (default 1.5s) |
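The README does not show how `fix_swedish_encoding()` works. The usual cause of this kind of garbage is UTF-8 bytes that were decoded as cp1252 or Latin-1, so `ä` (bytes `C3 A4`) turns into `Ã¤`. Below is a minimal sketch of that repair, assuming that cause. The function name `fix_swedish_encoding_sketch` is hypothetical and is not the package's implementation.

```python
def fix_swedish_encoding_sketch(text: str) -> str:
    """Hypothetical repair for UTF-8 text mis-decoded as cp1252/Latin-1."""
    # Only attempt a repair when a typical mojibake lead character is present.
    if "Ã" not in text and "Â" not in text:
        return text
    # cp1252 first: Å/Ä/Ö garbage contains '…', '„', '–', which Latin-1 cannot encode.
    for codec in ("cp1252", "latin-1"):
        try:
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    # Mixed or genuinely non-UTF-8 input: leave it unchanged rather than guess.
    return text


print(fix_swedish_encoding_sketch("SkÃ¶nt pÃ¥ Ã…land"))
```

Trying cp1252 before Latin-1 matters for the capital letters. For example, `Ä` is `C3 84`, and cp1252 displays byte `0x84` as `„`.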
Installation
```bash
pip install lyric-sync                 # core only
pip install "lyric-sync[openai]"       # + Whisper transcription
pip install "lyric-sync[elevenlabs]"   # + ElevenLabs Scribe transcription
pip install "lyric-sync[all]"          # all providers + pydub
```
Set API keys:
```bash
OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...
```
Quick start
```python
from lyric_sync import LyricSyncer
from lyric_sync.exporter.json_exporter import JSONExporter

syncer = LyricSyncer()
result = syncer.sync(
    audio_path="conny.mp3",
    lyrics_path="conny.txt",
)

for seg in result.segments:
    print(f"{seg.start_time:.2f}–{seg.end_time:.2f}  {seg.text}")

# Export for the video renderer
JSONExporter().export(result, "output/conny/")
```
The pipeline
```
audio + lyrics
      │
  ① Transcribe
      ├─ OpenAI Whisper (verbose_json, word timestamps)
      └─ ElevenLabs Scribe (word timestamps, 99 languages)
      │
  ② Consensus merge
      └─ ElevenLabs preferred for Swedish; fills gaps from Whisper
      │
  ③ Align (ExclusionPoolMatcher)
      └─ Sliding-window fuzzy match; exclusion pool prevents reuse
      │
  ④ Interpolate missing
      └─ Linear interpolation between anchors for unmatched lines
      │
  ⑤ Detect instrumental gaps
      └─ [Intro] / [Instrumental] / [Outro] for gaps ≥ min_gap_duration
      │
  ⑥ Validate timeline
      └─ Fix negatives and overlaps, enforce min duration, redistribute
      │
SyncResult → JSON
```
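Step ③ is the core of the fix for reused words, but this README does not show the `ExclusionPoolMatcher` code. The sketch below shows the idea: a sliding window over the transcription, fuzzy per-word comparison, and a pool of consumed word indices that can never match again. It makes a few assumptions. `Word` and `match_lines` are hypothetical names. `difflib.SequenceMatcher` stands in for whatever fuzzy measure `utils.py` actually uses. The thresholds mirror the `SyncConfig` defaults (`match_min_confidence=0.55`, `word_similarity_threshold=0.70`).

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Word:
    """Hypothetical stand-in for a transcribed word with timestamps."""
    text: str
    start: float
    end: float


def _norm(s: str) -> str:
    # Lowercase and drop punctuation; \W is Unicode-aware, so å/ä/ö are kept.
    return re.sub(r"\W", "", s.lower())


def _similar(a: str, b: str, threshold: float) -> bool:
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio() >= threshold


def match_lines(words, lines, min_confidence=0.55, word_threshold=0.70):
    """Match each lyric line to a window of unused words; None means unmatched."""
    used: set[int] = set()  # the exclusion pool: indices of consumed words
    results = []
    for line in lines:
        tokens = line.split()
        n = len(tokens)
        best = None  # (confidence, window_start)
        for i in range(len(words) - n + 1 if n else 0):
            window = range(i, i + n)
            if any(j in used for j in window):
                continue  # a word that is already used can never match again
            hits = sum(_similar(t, words[j].text, word_threshold)
                       for t, j in zip(tokens, window))
            conf = hits / n
            if best is None or conf > best[0]:
                best = (conf, i)
        if best is None or best[0] < min_confidence:
            results.append(None)  # left for interpolation (step ④)
            continue
        conf, i = best
        used.update(range(i, i + n))
        results.append((words[i].start, words[i + n - 1].end, conf))
    return results
```

When a chorus line appears twice in the lyrics, the second occurrence cannot reuse the words of the first. It has to take the next unused match, which is what keeps the timestamps moving forward.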
Configuration
```python
from lyric_sync import LyricSyncer, SyncConfig

config = SyncConfig(
    # Transcription
    use_openai=True,
    use_elevenlabs=True,
    openai_model="whisper-1",
    language="sv",                    # ISO 639-1; Swedish default
    prefer_elevenlabs=True,           # ElevenLabs timing = ground truth
    # Alignment
    match_min_confidence=0.55,        # minimum word-match ratio
    word_similarity_threshold=0.70,   # fuzzy word similarity threshold
    # Gap detection
    min_gap_duration=1.5,             # seconds; shorter gaps ignored
    min_instrumental_duration=1.0,
    # Validation
    fix_negative_durations=True,
    fix_overlaps=True,
    redistribute_on_violation=True,
    min_segment_duration=0.5,
)

syncer = LyricSyncer(config=config)
```
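The validation settings above (`fix_negative_durations`, `fix_overlaps`, `min_segment_duration`) control how the timeline is repaired. The package's `TimelineValidator` runs a 5-pass repair whose individual passes are not documented here. The following is only a single forward-sweep sketch of the same guarantees. `Segment` and `repair_timeline` are hypothetical names, and redistribution is left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical minimal segment; the package's LyricSegment has more fields."""
    text: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def repair_timeline(segments, min_duration=0.5):
    """One forward sweep: no overlaps, no negative durations, each >= min_duration.

    Segments are kept in lyric order and modified in place.
    """
    prev_end = 0.0
    for seg in segments:
        # Never start before the previous segment ended (removes overlaps).
        seg.start = max(seg.start, prev_end)
        # Negative or too-short durations are stretched to the minimum.
        seg.end = max(seg.end, seg.start + min_duration)
        prev_end = seg.end
    return segments


fixed = repair_timeline([Segment("a", 0.0, 3.0), Segment("b", 2.5, 5.0),
                         Segment("c", 6.0, -36.0), Segment("d", 6.2, 6.3)])
```

Stretching segments can push the last one past the end of the audio. That is presumably the case `redistribute_on_violation` handles, and this sketch does not attempt it.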
Output format
video_project_final.json (compatible with MusicVideoCreator / CineForge):
```json
{
  "version": "1.0",
  "song_name": "Conny",
  "audio_duration": 195.3,
  "language": "sv",
  "stats": {
    "segment_count": 28,
    "lyric_count": 22,
    "instrumental_count": 6,
    "interpolated_count": 2,
    "mean_confidence": 0.847,
    "redistributed_count": 0
  },
  "segments": [
    {
      "index": 0,
      "text": "[Intro]",
      "start_time": 0.0,
      "end_time": 3.13,
      "duration": 3.13,
      "has_lyrics": false,
      "segment_type": "intro",
      "confidence": 1.0,
      "is_interpolated": false
    },
    {
      "index": 1,
      "text": "Min handledare hette Conny han var rak som ett vattenpass",
      "start_time": 3.13,
      "end_time": 8.119,
      "duration": 4.989,
      "has_lyrics": true,
      "segment_type": "lyric",
      "confidence": 0.982,
      "is_interpolated": false
    }
  ]
}
```
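The `[Intro]` segment from 0.0 to 3.13 s in the output above comes from step ⑤: the silence before the first lyric line is longer than `min_gap_duration`. The sketch below shows that rule under a few assumptions. `detect_gaps` is a hypothetical name. The input is `(start, end)` lyric spans that have already been validated. The `"instrumental"` and `"outro"` type strings are guesses, because only `"intro"` and `"lyric"` appear in the sample. `min_instrumental_duration` is not modelled, since its exact role is not documented.

```python
def detect_gaps(lyric_spans, audio_duration, min_gap=1.5):
    """Return (text, start, end, segment_type) tuples for every gap >= min_gap.

    lyric_spans: (start, end) pairs in time order, assumed already validated
    (no overlaps, no negative durations).
    """
    if not lyric_spans:
        return []  # hypothetical choice: no lyrics, no labelled gaps
    gaps = []
    first_start = lyric_spans[0][0]
    if first_start >= min_gap:
        gaps.append(("[Intro]", 0.0, first_start, "intro"))
    # Compare each line's end with the next line's start.
    for (_, prev_end), (next_start, _) in zip(lyric_spans, lyric_spans[1:]):
        if next_start - prev_end >= min_gap:
            gaps.append(("[Instrumental]", prev_end, next_start, "instrumental"))
    last_end = lyric_spans[-1][1]
    if audio_duration - last_end >= min_gap:
        gaps.append(("[Outro]", last_end, audio_duration, "outro"))
    return gaps


gaps = detect_gaps([(3.13, 8.119), (8.5, 12.0), (20.0, 25.0)], audio_duration=30.0)
```

In this example, the 0.381 s breath between the first two lines falls below the 1.5 s threshold and gets no label. This is how the threshold prevents excessive instrumental segments.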
Testing without API calls
```python
from lyric_sync import LyricSyncer
from lyric_sync.models import TimedWord

words = [
    TimedWord(word="Min", start=3.13, end=3.5),
    TimedWord(word="handledare", start=3.6, end=4.2),
    TimedWord(word="hette", start=4.3, end=4.7),
    TimedWord(word="Conny", start=4.8, end=5.4),
]
lines = ["Min handledare hette Conny"]

result = LyricSyncer().sync_from_text(words, lines, audio_duration=60.0)
print(result.segments[0].start_time)  # 3.13
print(result.segments[0].end_time)    # 5.4
```
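In the example above, every word of the line is found, so the line becomes an anchor. Lines the matcher cannot place are marked `is_interpolated` and filled in by step ④, using linear interpolation between neighbouring anchors. Below is a minimal sketch that splits each gap evenly between the unmatched lines. `interpolate_missing` is a hypothetical name, and `None` marks an unmatched line. The package's `aligner.py` may weight lines differently, for example by length.

```python
def interpolate_missing(spans, audio_duration):
    """Fill None entries by splitting the gap between neighbouring anchors evenly."""
    out = list(spans)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        # Find the end of this run of unmatched lines.
        j = i
        while j < n and out[j] is None:
            j += 1
        # The gap runs from the previous anchor's end (or 0.0)
        # to the next anchor's start (or the end of the audio).
        left = out[i - 1][1] if i > 0 else 0.0
        right = out[j][0] if j < n else audio_duration
        step = (right - left) / (j - i)
        for k in range(i, j):
            out[k] = (left + (k - i) * step, left + (k - i + 1) * step)
        i = j
    return out


filled = interpolate_missing([(3.0, 5.0), None, None, (9.0, 12.0), None], 20.0)
```

If two anchors come out of order, `step` becomes negative. That is one reason the timeline validator runs after interpolation.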
CLI
```bash
lyric-sync conny.mp3 --lyrics conny.txt --output output/conny/ --language sv
lyric-sync conny.mp3 --lyrics conny.txt --no-openai        # ElevenLabs only
lyric-sync conny.mp3 --lyrics conny.txt --min-gap 2.0 --verbose
```
Package structure
```
lyric_sync/
├── __init__.py                    ← LyricSyncer + re-exports
├── syncer.py                      ← Main pipeline orchestrator
├── models.py                      ← TimedWord, LyricSegment, SyncResult, SyncConfig
├── utils.py                       ← Swedish encoding fix, normalise, fuzzy match
├── cli.py                         ← lyric-sync CLI
├── transcriber/
│   ├── openai_transcriber.py      ← Whisper word timestamps
│   ├── elevenlabs_transcriber.py  ← Scribe word timestamps
│   └── consensus.py               ← Merge two transcriptions
├── aligner/
│   ├── exclusion_pool.py          ← Core word-matching engine
│   └── aligner.py                 ← Align lines + interpolation
├── detector/
│   └── gap_detector.py            ← Intro/Instrumental/Outro detection
├── validator/
│   └── timeline_validator.py      ← Fix negatives, overlaps, redistribute
└── exporter/
    └── json_exporter.py           ← video_project_final.json
```
© Trollfabriken AITrix AB — Proprietary
File details
Details for the file lyric_sync-1.0.0.tar.gz.
File metadata
- Download URL: lyric_sync-1.0.0.tar.gz
- Size: 23.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e0f2209583615c6a6db2a1ba21d0f36c5c85061fc04091e20ccaa76b45297fc |
| MD5 | 2475425d0ed2067b1229c7c796c594a9 |
| BLAKE2b-256 | 917bdefe6ff7788ee4a31303e9b095125ba096ac5fa2e736f707abb117a72b3e |
File details
Details for the file lyric_sync-1.0.0-py3-none-any.whl.
File metadata
- Download URL: lyric_sync-1.0.0-py3-none-any.whl
- Size: 27.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5c163739f3622d9f201fdfaa06034783d475428962d9b949776731aa2d3f451 |
| MD5 | 9db2ee919d1dee099410432532daaf69 |
| BLAKE2b-256 | ae674f3fe34072b7b2b1d958136e3f6e567067e9ee460f0097b4582766699957 |