Bilingual audiobook interleaver

Bilbo

Bilingual audiobook interleaver. Takes two audiobooks of the same book in different languages and creates a single audiobook that alternates between them sentence-by-sentence.

Example

From The Alloy of Law by Brandon Sanderson (English + German):

https://github.com/user-attachments/assets/aff6c1cb-6f67-43a8-bc9f-3fd6ab3a5c8a

EN: The revolver was nothing fancy to look at, though the six-shot cylinder was machined with such care in the steel alloy frame that there was no play in its movement.

DE: Der Revolver machte zwar keinen besonders ansehnlichen Eindruck, doch die sechsschüssige Trommel war mit solcher Präzision in den Rahmen aus einer Stahllegierung eingesetzt, dass in ihren Bewegungen nicht das geringste Spiel war.

EN: There was no gleam to the metal or exotic material on the grip, but it fit his hand like it was meant to be there.

DE: Das Metall schimmerte nicht und in den Griff waren keinerlei exotische Materialien eingelassen, aber die Waffe lag so gut in seiner Hand, als wäre sie eigens dafür geschaffen worden.

EN: The waist-high fence was flimsy, the wood grayed with time, held together with fraying lengths of rope.

DE: Der hüfthohe Zaun war baufällig, das Holz, mit der Zeit grau geworden, wurde von ausgefransten Seilen zusammengehalten.
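The alternation in the example above can be sketched as a simple playback order built from aligned sentence pairs. This is an illustrative sketch only: the `Clip` class and `interleave` function are hypothetical names, the timestamps are made up, and Bilbo's actual assembly step (which works on audio via ffmpeg) is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """One sentence-long slice of a source audiobook (hypothetical type)."""
    lang: str
    start: float  # seconds into the source file
    end: float    # seconds into the source file

    @property
    def duration(self) -> float:
        return self.end - self.start


def interleave(pairs):
    """Flatten aligned (L1, L2) clip pairs into one alternating playback order."""
    order = []
    for l1_clip, l2_clip in pairs:
        order.append(l1_clip)  # first language sentence
        order.append(l2_clip)  # its counterpart in the second language
    return order


# Made-up timestamps for two aligned sentence pairs.
pairs = [
    (Clip("EN", 0.0, 2.5), Clip("DE", 0.0, 3.0)),
    (Clip("EN", 2.5, 4.0), Clip("DE", 3.0, 5.0)),
]
order = interleave(pairs)
total_seconds = sum(clip.duration for clip in order)
print([clip.lang for clip in order], total_seconds)
```

Because every sentence is played once per language, the combined length is roughly the sum of the two sources for the sentences that were paired.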

Prerequisites

  • Python 3.10+
  • ffmpeg and ffprobe on PATH
  • CUDA-capable GPU recommended (CPU works but is much slower)
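
A quick way to check the first two prerequisites before installing is a short Python script. This is a minimal sketch using only the standard library; the function names are hypothetical and it does not check for a GPU.

```python
import shutil
import sys

REQUIRED_TOOLS = ("ffmpeg", "ffprobe")  # taken from the prerequisites list
MIN_PYTHON = (3, 10)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```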

Installation

pip install bilbo-audiobook

For a CPU-only install, use the cpu extra (quoted so that shells such as zsh do not interpret the brackets):

pip install "bilbo-audiobook[cpu]"

Usage

Process a book

To process an entire book from start to finish, run

bilbo process data/en-5min.m4a data/de-5min.m4a --title "My Book"

which runs the full pipeline (transcribe, segment, align, export) and stores results in ~/.bilbo/books/<slug>/.

── Stage 0: Input ────────────────────────────────
  ✓ Input audio copied

── Stage 1: Transcription ────────────────────────
  ✓ Model loaded  (large-v3-turbo, cuda)
  ✓ L1: 10 segments, L2: 13 segments
  Detected L1 language: en
  Detected L2 language: de

── Stage 2: Segmentation ─────────────────────────
  ✓ EN: 63 sentences, DE: 64 sentences
    EN: refined 54/63 endpoints via VAD — 54 extended (avg 91ms)
    DE: refined 57/64 endpoints via VAD — 57 extended (avg 30ms)

── Stage 3: Alignment ────────────────────────────
  ✓ LaBSE model loaded  (cuda)
  ✓ Embeddings computed
  ⠋ Filling gaps...
  ✓ 30 anchors
  ✓ 55 pairs

── Stage 4: Assembly ─────────────────────────────
  ✓ Metadata extracted
    Titles: The Alloy of Law: A Mistborn Novel / Hüter des Gesetzes: Mistborn 4
    Artists: Brandon Sanderson / Brandon Sanderson, Michael Siefener - Übersetzer
    Chapters: EN=1, DE=2
    Cover art: both sources
  ✓ Preprocessed (EN: 295s, DE: 343s)
  ⠋ Assembling...
  ✓ Metadata merged via LLM
    comment: Three hundred years after the events of the Mistborn trilogy, Scadrial is now on the verge of modernity. Yet the old magics of Allomancy and Feruchemy continue to play a role in this reborn world....
    title: The Alloy of Law: A Mistborn Novel / Hüter des Gesetzes: Mistborn 4
    artist: Brandon Sanderson
    album: The Alloy of Law (Unabridged) / Hüter des Gesetzes: Mistborn 4
  ✓ 10.5 minutes
  ✓ 1 output chapters

Done in 28.5s  (Stage 0: 0.0s, Stage 1: 4.8s, Stage 2: 0.2s, Stage 3: 5.8s, Stage 4: 16.2s)

Book 'My Book' saved.

On CPU only, processing a full-length book will take a very long time; CPU runs are practical mainly for short snippets.
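
For scripted or batch runs, the `bilbo process` command can be wrapped from Python. This is a minimal sketch that assumes only the invocation shown above (two audio files plus `--title`); the helper names are hypothetical and not part of Bilbo.

```python
import subprocess
from pathlib import Path


def build_process_command(l1_audio, l2_audio, title):
    """Build the argument list for `bilbo process` as shown in the usage example."""
    return ["bilbo", "process", str(l1_audio), str(l2_audio), "--title", title]


def process_book(l1_audio, l2_audio, title):
    """Run the full pipeline for one book; raises if an input file is missing
    or if the command exits with a non-zero status."""
    for path in (l1_audio, l2_audio):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    cmd = build_process_command(l1_audio, l2_audio, title)
    subprocess.run(cmd, check=True)
    return cmd


# Inspect the command without running it.
print(build_process_command("data/en-5min.m4a", "data/de-5min.m4a", "My Book"))
```

Passing the arguments as a list avoids shell quoting problems with titles that contain spaces.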

Library management

bilbo list                          # List all books
bilbo info <title>                  # Show details about a book
bilbo rename <title> "New Title"    # Rename a book
bilbo delete <title>                # Delete a book

How it works

  1. Transcription — Speech-to-text via faster-whisper with word-level timestamps
  2. Segmentation — Sentence boundary detection via pySBD; start/end timestamps refined using Silero VAD speech regions
  3. Alignment — Cross-lingual sentence matching using LaBSE embeddings via sentence-transformers
  4. Assembly — Audio normalization/extraction/interleaving via ffmpeg
  5. Metadata — Cover art + text metadata merging (optionally via local LLM with ollama)
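
The alignment step (3) can be illustrated with a small, dependency-free sketch. It assumes sentence embeddings are already available (toy vectors stand in for LaBSE output here) and uses a simple order-preserving dynamic program with a similarity threshold. This is a simplified stand-in chosen for illustration, not Bilbo's actual matching procedure; the anchor and gap-filling logic seen in the log output is not described in this README. The names `cosine`, `align`, and `min_sim` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align(l1_vecs, l2_vecs, min_sim=0.5):
    """Return order-preserving (l1_index, l2_index) pairs that maximise total
    similarity; sentences with no counterpart above min_sim stay unpaired."""
    n, m = len(l1_vecs), len(l2_vecs)
    sim = [[cosine(u, v) for v in l2_vecs] for u in l1_vecs]
    # best[i][j]: best total similarity using the first i L1 and first j L2 sentences
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sim[i - 1][j - 1]
            match = best[i - 1][j - 1] + s if s >= min_sim else -math.inf
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])
    # Walk back through the table to recover which pairs were chosen.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = sim[i - 1][j - 1]
        if s >= min_sim and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1  # L1 sentence i-1 left unpaired
        else:
            j -= 1  # L2 sentence j-1 left unpaired
    pairs.reverse()
    return pairs


# Toy embeddings: L2 has one extra sentence (index 1) with no L1 counterpart.
l1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
l2 = [[0.9, 0.1, 0, 0], [0, 0, 0, 1], [0, 1, 0.1, 0], [0, 0.1, 1, 0]]
print(align(l1, l2))
```

In a real run the vectors would come from an embedding model, and unpaired sentences explain why the number of pairs can be lower than the sentence count in either language.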
