Audio/lyric synchronisation — dual transcription, word-level alignment, gap detection, timeline validation

lyric-sync

Audio/lyric synchronisation for the AIMOScript video generation pipeline. It grew out of real-world debugging of the MusicVideoCreator project and is designed specifically to eliminate the negative durations, backwards timestamps, and reused-word matches of the earlier implementation.


What it solves

Previous problem                                           | Solution
Negative durations (-42.37s)                               | TimelineValidator 5-pass repair
Backwards timestamps (end before start)                    | ExclusionPoolMatcher prevents word reuse
Same transcription words matched to multiple lines         | Exclusion pool: once a word is used, it's gone
Swedish encoding garbage (Ã¤ instead of ä)                 | fix_swedish_encoding() in all text paths
ElevenLabs timestamps ignored in favour of fixed durations | Consensus builder uses ElevenLabs timing as ground truth
Excessive instrumental segments                            | min_gap_duration threshold (default 1.5s)
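
The Swedish encoding problem is classic mojibake: UTF-8 bytes that were decoded as Latin-1, so that ä shows up as Ã¤. The internals of fix_swedish_encoding() are not documented here, so the following is only a minimal sketch of the usual round-trip repair; the function name repair_mojibake is hypothetical and not part of lyric-sync.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-Latin-1 damage; return the input unchanged if it looks fine.

    Hypothetical helper for illustration, not the library's fix_swedish_encoding().
    """
    try:
        # Re-encode to the bytes that were misread, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the text holds characters outside Latin-1, or the bytes are
        # not valid UTF-8. In both cases the text was probably not damaged.
        return text


print(repair_mojibake("fÃ¶r smÃ¥"))   # damaged input
print(repair_mojibake("för små"))     # already correct, left alone
```

This simple version only covers the Latin-1 case. Text that passed through Windows-1252 (which affects upper-case Ä, Å, and Ö) would likely need an additional fallback.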

Installation

pip install lyric-sync                      # core only
pip install "lyric-sync[openai]"            # + Whisper transcription
pip install "lyric-sync[elevenlabs]"        # + ElevenLabs Scribe transcription
pip install "lyric-sync[all]"               # all providers + pydub

Set API keys:

OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...

Quick start

from lyric_sync import LyricSyncer

syncer = LyricSyncer()
result = syncer.sync(
    audio_path  = "conny.mp3",
    lyrics_path = "conny.txt",
)

for seg in result.segments:
    print(f"{seg.start_time:.2f} -> {seg.end_time:.2f}  {seg.text}")

# Export for video renderer
from lyric_sync.exporter.json_exporter import JSONExporter
JSONExporter().export(result, "output/conny/")

The pipeline

audio + lyrics
      │
  ① Transcribe
      ├─ OpenAI Whisper (verbose_json, word timestamps)
      └─ ElevenLabs Scribe (word timestamps, 99 languages)
      │
  ② Consensus merge
      └─ ElevenLabs preferred for Swedish; fills gaps from Whisper
      │
  ③ Align (ExclusionPoolMatcher)
      └─ Sliding window fuzzy match, exclusion pool prevents reuse
      │
  ④ Interpolate missing
      └─ Linear interpolation between anchors for unmatched lines
      │
  ⑤ Detect instrumental gaps
      └─ [Intro] / [Instrumental] / [Outro] for gaps ≥ min_gap
      │
  ⑥ Validate timeline
      └─ Fix negatives, overlaps, enforce min duration, redistribute
      │
   SyncResult → JSON
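
Steps ③ and ④ can be illustrated with a small standalone sketch. It is not the code in exclusion_pool.py or aligner.py; the function names align_lines and interpolate_missing, the tuple layouts, and the use of difflib for fuzzy matching are all assumptions made for illustration. The idea shown is the one described above: a window slides over the transcribed words, indices that have been matched go into an exclusion pool, and unmatched lines are spread evenly between their matched neighbours.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.70):
    """Fuzzy word comparison; 0.70 mirrors word_similarity_threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def align_lines(words, lines, min_confidence=0.55):
    """Match each lyric line to a window of unused transcription words.

    words: list of (text, start, end) tuples in time order.
    Returns one (start, end, confidence) tuple per line, or None if unmatched.
    """
    used = set()      # the exclusion pool: word indices already consumed
    spans = []
    for line in lines:
        tokens = line.split()
        best = None   # (confidence, index of first word in the window)
        for i in range(len(words) - len(tokens) + 1):
            window = range(i, i + len(tokens))
            if not tokens or any(j in used for j in window):
                continue
            hits = sum(similar(t, words[j][0]) for t, j in zip(tokens, window))
            confidence = hits / len(tokens)
            if best is None or confidence > best[0]:
                best = (confidence, i)
        if best is not None and best[0] >= min_confidence:
            confidence, i = best
            used.update(range(i, i + len(tokens)))
            spans.append((words[i][1], words[i + len(tokens) - 1][2], confidence))
        else:
            spans.append(None)
    return spans


def interpolate_missing(spans, audio_duration):
    """Fill None entries by dividing the time between neighbouring anchors evenly."""
    out = list(spans)
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] is None:
            j += 1                      # j is the next matched line (or the end)
        left = out[i - 1][1] if i > 0 else 0.0
        right = out[j][0] if j < len(out) else audio_duration
        step = (right - left) / (j - i)
        for k in range(i, j):
            out[k] = (left + (k - i) * step, left + (k - i + 1) * step, 0.0)
        i = j
    return out


words = [("la", 0.0, 0.5), ("la", 0.5, 1.0), ("la", 4.0, 4.5), ("la", 4.5, 5.0)]
lines = ["la la", "ooh yeah", "la la"]
spans = interpolate_missing(align_lines(words, lines), audio_duration=10.0)
for line, (start, end, conf) in zip(lines, spans):
    print(f"{start:.2f} -> {end:.2f}  conf={conf:.2f}  {line}")
```

Without the exclusion pool, both "la la" lines would match the first two words and receive the same timestamps. A fuller matcher would probably also restrict each search to words after the previous match; this sketch leaves that out.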

Configuration

from lyric_sync import LyricSyncer, SyncConfig

config = SyncConfig(
    # Transcription
    use_openai             = True,
    use_elevenlabs         = True,
    openai_model           = "whisper-1",
    language               = "sv",          # ISO 639-1; Swedish default
    prefer_elevenlabs      = True,          # ElevenLabs timing = ground truth

    # Alignment
    match_min_confidence   = 0.55,          # minimum word-match ratio
    word_similarity_threshold = 0.70,       # fuzzy word similarity threshold

    # Gap detection
    min_gap_duration       = 1.5,           # seconds; shorter gaps ignored
    min_instrumental_duration = 1.0,

    # Validation
    fix_negative_durations = True,
    fix_overlaps           = True,
    redistribute_on_violation = True,
    min_segment_duration   = 0.5,
)

syncer = LyricSyncer(config=config)
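
To make the validation settings concrete, here is a minimal sketch of what fixing negative durations, removing overlaps, and enforcing min_segment_duration could look like. It is not the TimelineValidator implementation and does not reproduce its five passes; the function name repair_timeline and the (start, end) tuple layout are assumptions.

```python
def repair_timeline(segments, min_duration=0.5):
    """Return (start, end) pairs that are ordered, non-overlapping, and long enough.

    Hypothetical helper for illustration; segments is a list of (start, end) pairs.
    """
    fixed = []
    for start, end in segments:
        if end < start:
            # Negative duration: assume the two timestamps were swapped.
            start, end = end, start
        fixed.append([start, end])

    fixed.sort()  # order by start time, then end time

    cursor = 0.0  # end of the previous repaired segment
    for seg in fixed:
        seg[0] = max(seg[0], cursor)                 # push start past any overlap
        seg[1] = max(seg[1], seg[0] + min_duration)  # enforce the minimum length
        cursor = seg[1]
    return [tuple(seg) for seg in fixed]


broken = [(10.0, 4.0), (3.0, 5.0), (12.0, 12.25)]
print(repair_timeline(broken))
```

Because this version only ever pushes segments later, a dense timeline can run past the end of the audio. That is presumably the situation the redistribute_on_violation option exists for.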

Output format

video_project_final.json (compatible with MusicVideoCreator / CineForge):

{
  "version": "1.0",
  "song_name": "Conny",
  "audio_duration": 195.3,
  "language": "sv",
  "stats": {
    "segment_count": 28,
    "lyric_count": 22,
    "instrumental_count": 6,
    "interpolated_count": 2,
    "mean_confidence": 0.847,
    "redistributed_count": 0
  },
  "segments": [
    {
      "index": 0,
      "text": "[Intro]",
      "start_time": 0.0,
      "end_time": 3.13,
      "duration": 3.13,
      "has_lyrics": false,
      "segment_type": "intro",
      "confidence": 1.0,
      "is_interpolated": false
    },
    {
      "index": 1,
      "text": "Min handledare hette Conny han var rak som ett vattenpass",
      "start_time": 3.13,
      "end_time": 8.119,
      "duration": 4.989,
      "has_lyrics": true,
      "segment_type": "lyric",
      "confidence": 0.982,
      "is_interpolated": false
    }
  ]
}
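
A renderer that consumes this file may want to confirm the timeline invariants before using it. The check below is a sketch that relies only on the field names shown in the sample above; the function name check_project and the tolerance value are assumptions.

```python
import json


def check_project(project, tolerance=1e-3):
    """Return a list of human-readable problems found in a loaded project dict."""
    problems = []
    prev_end = 0.0
    for seg in project["segments"]:
        i, start, end = seg["index"], seg["start_time"], seg["end_time"]
        if end <= start:
            problems.append(f"segment {i}: non-positive duration")
        if start < prev_end - tolerance:
            problems.append(f"segment {i}: starts before the previous segment ends")
        if abs(seg["duration"] - (end - start)) > tolerance:
            problems.append(f"segment {i}: duration does not equal end_time - start_time")
        prev_end = end
    return problems


sample = json.loads("""
{"segments": [
  {"index": 0, "start_time": 0.0,  "end_time": 3.13,  "duration": 3.13},
  {"index": 1, "start_time": 3.13, "end_time": 8.119, "duration": 4.989}
]}
""")
print(check_project(sample) or "timeline looks consistent")
```

In practice the dict would come from json.load() on the exported video_project_final.json rather than from an inline string.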

Testing without API calls

from lyric_sync import LyricSyncer
from lyric_sync.models import TimedWord

words = [
    TimedWord(word="Min", start=3.13, end=3.5),
    TimedWord(word="handledare", start=3.6, end=4.2),
    TimedWord(word="hette", start=4.3, end=4.7),
    TimedWord(word="Conny", start=4.8, end=5.4),
]

lines = ["Min handledare hette Conny"]

result = LyricSyncer().sync_from_text(words, lines, audio_duration=60.0)
print(result.segments[0].start_time)   # 3.13
print(result.segments[0].end_time)     # 5.4

CLI

lyric-sync conny.mp3 --lyrics conny.txt --output output/conny/ --language sv
lyric-sync conny.mp3 --lyrics conny.txt --no-openai   # ElevenLabs only
lyric-sync conny.mp3 --lyrics conny.txt --min-gap 2.0 --verbose
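
The --min-gap flag presumably maps to min_gap_duration. The sketch below shows how such a threshold changes which silent stretches become [Intro], [Instrumental], or [Outro] segments. It is an illustration under that assumption, not the code in gap_detector.py; the function name detect_gaps and the tuple layout are hypothetical.

```python
def detect_gaps(lyric_spans, audio_duration, min_gap=1.5):
    """Label stretches without lyrics that last at least min_gap seconds.

    lyric_spans: list of (start, end) pairs in time order.
    Returns a list of (label, start, end) tuples.
    """
    gaps = []
    cursor = 0.0  # end of the lyrics seen so far
    for idx, (start, end) in enumerate(lyric_spans):
        if start - cursor >= min_gap:
            label = "[Intro]" if idx == 0 else "[Instrumental]"
            gaps.append((label, cursor, start))
        cursor = max(cursor, end)
    if audio_duration - cursor >= min_gap:
        gaps.append(("[Outro]", cursor, audio_duration))
    return gaps


spans = [(3.0, 8.0), (9.75, 12.0), (15.0, 20.0)]
for threshold in (1.5, 2.0):
    print(threshold, detect_gaps(spans, audio_duration=25.0, min_gap=threshold))
```

With the default of 1.5 the 1.75-second pause between the first two lines becomes its own [Instrumental] segment; raising the threshold to 2.0 absorbs it, which is how a higher value reduces excessive instrumental segments.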

Package structure

lyric_sync/
├── __init__.py                   ← LyricSyncer + re-exports
├── syncer.py                     ← Main pipeline orchestrator
├── models.py                     ← TimedWord, LyricSegment, SyncResult, SyncConfig
├── utils.py                      ← Swedish encoding fix, normalise, fuzzy match
├── cli.py                        ← lyric-sync CLI
├── transcriber/
│   ├── openai_transcriber.py     ← Whisper word timestamps
│   ├── elevenlabs_transcriber.py ← Scribe word timestamps
│   └── consensus.py              ← Merge two transcriptions
├── aligner/
│   ├── exclusion_pool.py         ← Core word-matching engine
│   └── aligner.py                ← Align lines + interpolation
├── detector/
│   └── gap_detector.py           ← Intro/Instrumental/Outro detection
├── validator/
│   └── timeline_validator.py     ← Fix negatives, overlaps, redistribute
└── exporter/
    └── json_exporter.py          ← video_project_final.json

© Trollfabriken AITrix AB — Proprietary
