Composable lyric-audio alignment pipeline with staged execution, batch processing, and LRC/ASS export.

py-roller

py-roller is a CLI tool for automatically generating rolling (time-synced) lyrics.

Specifically, py-roller is a composable lyric-audio alignment pipeline CLI with staged execution, batch processing, and LRC/ASS export. It is designed to support multiple local transcriber backends, including faster-whisper and wav2vec2.

Package name: py-roller
CLI command: py-roller
Python import package: pyroller

Quick overview

py-roller treats alignment as a contiguous stage chain:

s -> f -> t -> p -> a -> w
splitter -> filter -> transcriber -> parser -> aligner -> writer

Core modes:

run: execute one contiguous stage chain for one task
batch: execute the same contiguous stage chain across many tasks

Core artifact types:

input audio / lyrics
intermediate vocal and filtered audio
timed_units
parsed_lyrics
alignment_result
roller outputs such as LRC or ASS

Installation

From source, first install the lightweight base package:

pip install -e .

Then let py-roller install the validated audio/transcriber runtime for your machine:

py-roller install

Profiles:

auto (default): choose the best validated profile for this machine, then automatically fall back to CPU if validation fails
cpu: official CPU-only stable profile
cu124: official CUDA 12.4 profile

Useful variants:

py-roller install --profile cpu
py-roller install --profile cu124
py-roller install --dry-run
py-roller doctor

Notes:

py-roller install still uses pip underneath, but it does not rely on pip to guess the correct Torch/Torchaudio flavor.
The command first installs the selected Torch profile, then installs the bundled audio-core runtime requirements with the matching constraints file, validates the resulting environment, and finally runs py-roller doctor unless you pass --skip-doctor.

After installation, the CLI command is:

py-roller

Quick start

Full pipeline: audio + lyrics -> LRC

py-roller run \
  --stages s,f,t,p,a,w \
  --audio ./song.mp3 \
  --lyrics ./song.txt \
  --filter-chain noise_gate,dereverb \
  --output-roller ./song.lrc \
  --language zh  # choose the language of your lyrics

Start from a prepared vocal track

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass \
  --language zh  # choose the language of your lyrics

Batch processing by stem

py-roller batch \
  --stages t,p,a,w \
  --audio ./audio_dir \
  --lyrics ./lyrics_dir \
  --output-roller ./out_dir \
  --language zh  # choose the language of your lyrics

Core execution model

Contiguous stage chains only

The CLI only accepts contiguous subchains of the canonical order.

Valid examples:

s,f,t,p,a,w
t,p,a,w
a,w
w

Invalid examples:

s,t,w
s,p,a
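
The contiguity rule can be illustrated with a minimal Python sketch. This is not py-roller's actual validator; the function name parse_stage_chain and the error messages are hypothetical, and only the canonical order comes from the overview above.

```python
# Canonical stage order from the overview: splitter .. writer.
CANONICAL_ORDER = ["s", "f", "t", "p", "a", "w"]


def parse_stage_chain(spec: str) -> list[str]:
    """Parse a comma-separated --stages value and require a contiguous subchain."""
    stages = [part.strip() for part in spec.split(",") if part.strip()]
    if not stages:
        raise ValueError("stage chain is empty")
    unknown = [stage for stage in stages if stage not in CANONICAL_ORDER]
    if unknown:
        raise ValueError(f"unknown stages: {unknown}")
    # A contiguous chain must equal the canonical slice that begins at its first stage.
    start = CANONICAL_ORDER.index(stages[0])
    expected = CANONICAL_ORDER[start:start + len(stages)]
    if stages != expected:
        raise ValueError(f"{spec!r} is not a contiguous subchain of s,f,t,p,a,w")
    return stages
```

Comparing against a slice of the canonical order rejects gaps (s,t,w), reordering, and duplicated stages with a single check.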

Legal chain starts

Explicit artifact inputs are only valid at the correct chain start:

--audio is valid when the chain starts at s, f, or t
--lyrics is valid when the chain includes p
--timed-units and --parsed-lyrics are only valid when the chain starts at a
--alignment-result is only valid when the chain starts at w
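
As a rough illustration, the four rules above can be encoded as a lookup from a stage chain to the set of explicit inputs it accepts. The helper names allowed_inputs and check_inputs are hypothetical, and the sketch encodes only the rules listed here, nothing else the real CLI may enforce.

```python
def allowed_inputs(stages: list[str]) -> set[str]:
    """Return the explicit input options that the listed rules permit for a chain."""
    start = stages[0]
    allowed: set[str] = set()
    if start in {"s", "f", "t"}:
        allowed.add("audio")
    if "p" in stages:
        allowed.add("lyrics")
    if start == "a":
        allowed.update({"timed_units", "parsed_lyrics"})
    if start == "w":
        allowed.add("alignment_result")
    return allowed


def check_inputs(stages: list[str], provided: set[str]) -> None:
    """Raise ValueError when an input is supplied that the chain start does not accept."""
    extra = provided - allowed_inputs(stages)
    if extra:
        raise ValueError(f"inputs not valid for chain {','.join(stages)}: {sorted(extra)}")
```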

Final outputs vs intermediate artifacts

Final user-requested outputs are only the explicit --output-* paths:

--output-vocal-audio
--output-filtered-audio
--output-timed-units
--output-parsed-lyrics
--output-alignment-result
--output-roller

Everything else created under --intermediate is treated as intermediate state.

Common workflows

Start from raw audio

Use the full chain when you want splitting, filtering, transcription, parsing, alignment, and final writing in one command.

py-roller run \
  --stages s,f,t,p,a,w \
  --audio ./song.mp3 \
  --lyrics ./song.txt \
  --output-roller ./song.lrc

Start from filtered or vocal audio

Skip splitter/filter when you already have a suitable track for transcription.

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --output-roller ./song.lrc

Start from aligner artifacts

py-roller run \
  --stages a,w \
  --timed-units ./song.timed_units.json \
  --parsed-lyrics ./song.parsed_lyrics.json \
  --output-roller ./song.lrc

For repeated or partially omitted lyrics, explicitly choose the repetition strategy:

py-roller run \
  --stages a,w \
  --timed-units ./song.timed_units.json \
  --parsed-lyrics ./song.parsed_lyrics.json \
  --aligner-repetition few \
  --output-roller ./song.lrc

--aligner-repetition accepts:

none: default; preserves the existing global_dp_v1 behavior for lyrics where repeated lines are fully written out.
few: uses global DP as a proposal, then repairs weak repeated/omitted regions between trusted anchors with a local candidate lattice.
full: skips anchor reliance and uses per-line top-k candidate generation plus beam search for highly repetitive or anchorless songs.

Rewrite only from an existing alignment result

py-roller run \
  --stages w \
  --alignment-result ./song.alignment.json \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass

Backend defaults

Default backend selection is language-aware. Note that the default language is "mul", which performed poorly in testing on both Chinese and English. If your language is directly supported, specify it with the --language flag.

Transcriber defaults

zh -> faster_whisper
en -> faster_whisper
mul -> faster_whisper

Optional transcriber backends:

zh also supports --transcriber-backend mms_phonetic for the existing Chinese phonetic CTC path
mul also supports --transcriber-backend wav2vec2_phoneme for the existing multilingual phoneme-CTC fallback

Parser defaults

zh -> zh_router_pinyin
en -> en_arpabet
mul -> mul_ipa

Other defaults

aligner backend -> global_dp_v1
aligner repetition mode -> none
writer backend -> lrc_ms
language -> mul
writer_spacing -> keep
cleanup -> on-success
transcriber model store -> ~/.cache/py-roller/models/transcriber

Transcriber model store and offline behavior

Transcriber execution is local-only. py-roller does not send audio to a cloud transcription API.

Model resolution follows this order:

resolve transcriber_model_name (or the backend default model name)
look for the model in the py-roller transcriber model store
if not found and local-only mode is disabled, download or materialize it into the model store
load the resolved local model path for inference
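
The resolution order can be sketched as follows. This is an illustration under stated assumptions, not py-roller's code: resolve_model, the download callback, and the on-disk layout (one directory per model name with "/" replaced by "--") are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_model(
    name: Optional[str],
    default_name: str,
    store_root: Path,
    local_files_only: bool,
    download: Callable[[str, Path], Path],
) -> Path:
    """Return a local model path, downloading into the store only when allowed."""
    # Step 1: fall back to the backend default when no model name is given.
    model_name = name or default_name
    # Step 2: look in the model store (hypothetical directory layout).
    candidate = store_root / model_name.replace("/", "--")
    if candidate.exists():
        return candidate
    # Step 3: download only when local-only mode is disabled.
    if local_files_only:
        raise FileNotFoundError(f"{model_name} is not in {store_root} and local-only mode is on")
    return download(model_name, candidate)
    # Step 4 (loading the returned path for inference) is left to the caller.
```

Passing the download step in as a callback keeps the sketch free of any network library and makes the local-only branch easy to exercise.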

Useful options:

--transcriber-model-path: choose the py-roller transcriber model store root
--transcriber-model-name: choose a model alias, model repo id, or an explicit local model path
- for faster_whisper, bare aliases like large-v2, large-v3, or turbo resolve to Systran/faster-whisper-* snapshots
--transcriber-local-files-only: refuse network access and read only from local files/cache

audio-core installs the project's official audio feature set around the faster-whisper and CTranslate2 local transcription stack.

Examples:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --transcriber-model-path ./models/transcriber \
  --output-roller ./song.lrc

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --transcriber-model-path ./models/transcriber \
  --transcriber-local-files-only \
  --output-roller ./song.lrc

If you are on a restricted network, pre-populate the model store and then rerun with --transcriber-local-files-only.

Writer behavior

LRC

The default writer is lrc_ms, which writes LRC lines with millisecond precision. Other supported writer backends are:

lrc_cs: writes LRC lines with centisecond precision
lrc_compressed: writes LRC lines with millisecond precision, but compresses consecutive lines with the same timestamp
ass_karaoke: see below
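
To make the precision difference concrete, here is a small sketch of formatting an LRC timestamp at millisecond or centisecond precision. The helper lrc_timestamp is hypothetical and not part of py-roller; it only follows the common [mm:ss.xx] / [mm:ss.xxx] LRC tag convention.

```python
def lrc_timestamp(seconds: float, precision: str = "ms") -> str:
    """Format a time in seconds as an LRC tag with ms (3-digit) or cs (2-digit) fractions."""
    digits = {"ms": 3, "cs": 2}[precision]
    scale = 10 ** digits
    # Round once to integer ticks so that carries (e.g. 59.9996 s) propagate correctly.
    total = round(seconds * scale)
    minutes, rest = divmod(total, 60 * scale)
    secs, frac = divmod(rest, scale)
    return f"[{minutes:02d}:{secs:02d}.{frac:0{digits}d}]"


print(lrc_timestamp(83.456, "ms") + "first line")   # [01:23.456]first line
print(lrc_timestamp(83.456, "cs") + "first line")   # [01:23.46]first line
```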

ASS karaoke

ass_karaoke writes ASS dialogue lines with karaoke timing tags.

Current defaults:

structural / spacing line output follows writer_spacing (keep by default)
display end time prefers matched unit timing instead of blindly extending to the next line
unmatched lines receive a short visible duration fallback

Example:

py-roller run \
  --stages w \
  --alignment-result ./song.alignment.json \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass
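
For readers unfamiliar with ASS karaoke tags, the sketch below shows how timed units could be turned into one Dialogue line with \k tags, whose durations are expressed in centiseconds. It is an illustration only: ass_time and karaoke_dialogue are hypothetical helpers, and py-roller's real writer (spacing handling, end-time preferences, unmatched-line fallback) is not reproduced here.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc)."""
    cs = round(seconds * 100)
    hours, rest = divmod(cs, 360000)
    minutes, rest = divmod(rest, 6000)
    secs, centis = divmod(rest, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"


def karaoke_dialogue(units: list[tuple[str, float, float]], style: str = "Default") -> str:
    """Build one Dialogue line from (text, start, end) units given in seconds, in time order."""
    line_start, line_end = units[0][1], units[-1][2]
    parts = []
    cursor = line_start
    for text, start, end in units:
        gap = round((start - cursor) * 100)
        if gap > 0:
            parts.append(f"{{\\k{gap}}}")  # silent gap before this unit
        parts.append(f"{{\\k{round((end - start) * 100)}}}{text}")
        cursor = end
    body = "".join(parts)
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{ass_time(line_start)},{ass_time(line_end)},{style},,0,0,0,,{body}"


print(karaoke_dialogue([("Hel", 1.0, 1.5), ("lo", 1.5, 2.0)]))
# Dialogue: 0,0:00:01.00,0:00:02.00,Default,,0,0,0,,{\k50}Hel{\k50}lo
```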

Progress reporting

The project exposes a reusable progress-reporting interface so CLI and future GUI frontends can share the same stage updates.

Current behavior:

splitter: Demucs progress plus wrapper stage progress
filter: phase progress
transcriber: phase progress
aligner: phase progress plus DP row progress

In single-task run, progress is shown as CLI progress bars when the terminal supports it. In batch, per-task progress is logged to avoid multiple workers fighting for one terminal.
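
A shared progress interface of this kind might look like the sketch below. All names here (ProgressReporter, LogReporter, run_stage) are hypothetical and chosen for illustration; the project's real interface may differ.

```python
from typing import Callable, Protocol, Sequence


class ProgressReporter(Protocol):
    """Anything that can receive stage updates (CLI bar, logger, GUI widget)."""

    def update(self, stage: str, completed: int, total: int) -> None: ...


class LogReporter:
    """Line-based reporter, similar in spirit to per-task logging in batch mode."""

    def __init__(self, task_id: str, sink: Callable[[str], None] = print) -> None:
        self.task_id = task_id
        self.sink = sink

    def update(self, stage: str, completed: int, total: int) -> None:
        self.sink(f"[{self.task_id}] {stage}: {completed}/{total}")


def run_stage(stage: str, steps: Sequence[Callable[[], None]], reporter: ProgressReporter) -> None:
    """Run each step of a stage and report progress after every step."""
    for index, step in enumerate(steps, start=1):
        step()
        reporter.update(stage, index, len(steps))
```

Because stages depend only on the update method, a progress-bar frontend and a logging frontend can be swapped without touching stage code.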

Intermediate files and cleanup

Intermediate files live under:

--intermediate/<task-id>/splitter
--intermediate/<task-id>/filter
--intermediate/<task-id>/logs

Default intermediate root:

<system temp>/py-roller-artifacts

Cleanup policy:

--cleanup on-success keeps successful runs tidy by removing per-task intermediate directories
--cleanup never keeps intermediate audio and logs for inspection
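
The two cleanup policies can be sketched like this. The function names are hypothetical, and the sketch assumes that a failed task keeps its intermediate directory under on-success, which the description above implies but does not state outright.

```python
import shutil
import tempfile
from pathlib import Path


def default_intermediate_root() -> Path:
    """Default root described above: <system temp>/py-roller-artifacts."""
    return Path(tempfile.gettempdir()) / "py-roller-artifacts"


def finish_task(intermediate_root: Path, task_id: str, succeeded: bool,
                cleanup: str = "on-success") -> bool:
    """Apply the cleanup policy to one task directory; return True if it was removed."""
    if cleanup not in {"on-success", "never"}:
        raise ValueError(f"unknown cleanup policy: {cleanup}")
    task_dir = intermediate_root / task_id
    if cleanup == "on-success" and succeeded and task_dir.exists():
        shutil.rmtree(task_dir)
        return True
    return False  # kept for inspection (policy is "never", or the task failed)
```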

Batch mode

batch uses the same stage semantics as run, but applies them to many tasks.

Directory pairing

Directory mode currently supports:

--pair-by stem

Default candidate globs:

--audio-glob "*.mp3"
--lyrics-glob "*.txt"

Matching is non-recursive.

Example:

py-roller batch \
  --stages t,p,a,w \
  --audio ./audio_dir \
  --lyrics ./lyrics_dir \
  --output-roller ./out_dir
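
Stem pairing with the default globs can be approximated by the sketch below. pair_by_stem is a hypothetical helper, and silently ignoring unmatched files is an assumption; py-roller may warn or fail instead.

```python
from pathlib import Path


def pair_by_stem(audio_dir: Path, lyrics_dir: Path,
                 audio_glob: str = "*.mp3", lyrics_glob: str = "*.txt"):
    """Pair audio and lyrics files that share a filename stem (non-recursive)."""
    # Path.glob with a pattern like "*.mp3" only looks at direct children.
    audio = {path.stem: path for path in audio_dir.glob(audio_glob) if path.is_file()}
    lyrics = {path.stem: path for path in lyrics_dir.glob(lyrics_glob) if path.is_file()}
    common = sorted(audio.keys() & lyrics.keys())
    return [(stem, audio[stem], lyrics[stem]) for stem in common]
```

With audio_dir/song01.mp3 and lyrics_dir/song01.txt present, the result contains one task with stem song01; files in subdirectories are not considered.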

Batch controls

--jobs N: maximum number of parallel workers
--continue-on-error: keep processing remaining tasks after failures
--skip-existing: skip tasks whose declared final outputs already exist
--manifest jobs.yaml: load explicit per-task paths from YAML instead of pairing by stem
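
The interaction between --jobs and --continue-on-error can be pictured with a thread-pool sketch. This is a simplified model, not the project's scheduler: run_batch and worker are hypothetical, and the real implementation may use processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def run_batch(task_ids: Iterable[str], worker: Callable[[str], object],
              jobs: int = 1, continue_on_error: bool = False):
    """Run worker(task_id) for every task with at most `jobs` tasks in flight."""
    results: dict = {}
    errors: dict = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(worker, task_id): task_id for task_id in task_ids}
        for future in as_completed(futures):
            task_id = futures[future]
            try:
                results[task_id] = future.result()
            except Exception as exc:
                errors[task_id] = exc
                if not continue_on_error:
                    # Fail fast: cancel work that has not started yet.
                    for pending in futures:
                        pending.cancel()
                    break
    return results, errors
```

In this model, tasks that are already running when the first failure arrives still finish; only queued tasks are cancelled.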

Parallelism guidance

--jobs controls how many tasks run at the same time. This is separate from any model-level batch size.

Recommended starting point:

CPU-only: --jobs 1 or --jobs 2
single GPU: usually --jobs 1

YAML manifest format

Manifest mode is useful when filenames do not match cleanly by stem.

The manifest defines per-task input and output paths only. It does not override stage selection, language, backend choice, filter settings, jobs, or other batch-level options.

Supported top-level forms:

tasks:
  - id: song01
    audio: ./audio/song01_master.mp3
    lyrics: ./lyrics/song01_final.txt
    output_roller: ./out/song01.lrc

or:

- id: song01
  audio: ./audio/song01_master.mp3
  lyrics: ./lyrics/song01_final.txt
  output_roller: ./out/song01.lrc

Allowed manifest input keys:

audio
lyrics
timed_units
parsed_lyrics
alignment_result

Allowed manifest output keys:

output_vocal_audio
output_filtered_audio
output_timed_units
output_parsed_lyrics
output_alignment_result
output_roller

Optional helper key:

id

Validation rules:

each task must be a mapping
unknown keys are rejected
inputs must match the selected chain start
outputs must be valid final outputs for the selected chain
task ids / stems must be unique
final output paths must not conflict across tasks
relative paths are resolved relative to the manifest file location
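
Several of these rules can be sketched in a few lines of Python operating on already-parsed YAML data (for example, the result of PyYAML's yaml.safe_load). normalize_manifest is a hypothetical helper, the fallback from a missing id to the first input's stem is an assumption, and the two chain-dependent rules (inputs and outputs matching the selected chain) are not covered.

```python
from pathlib import Path

INPUT_KEYS = {"audio", "lyrics", "timed_units", "parsed_lyrics", "alignment_result"}
OUTPUT_KEYS = {
    "output_vocal_audio", "output_filtered_audio", "output_timed_units",
    "output_parsed_lyrics", "output_alignment_result", "output_roller",
}


def normalize_manifest(data, manifest_path: Path) -> list[dict]:
    """Validate parsed manifest data and resolve paths relative to the manifest file."""
    tasks = data.get("tasks") if isinstance(data, dict) else data
    if not isinstance(tasks, list):
        raise ValueError("manifest must be a task list or a mapping with a 'tasks' list")
    base = manifest_path.parent
    seen_ids, seen_outputs, normalized = set(), set(), []
    for index, task in enumerate(tasks):
        if not isinstance(task, dict):
            raise ValueError(f"task #{index} must be a mapping")
        unknown = set(task) - INPUT_KEYS - OUTPUT_KEYS - {"id"}
        if unknown:
            raise ValueError(f"task #{index} has unknown keys: {sorted(unknown)}")
        resolved = {key: base / str(value) for key, value in task.items() if key != "id"}
        # Assumption: without an explicit id, use the stem of the first input path.
        first_input = next((resolved[key] for key in task if key in INPUT_KEYS), None)
        task_id = str(task.get("id") or (first_input.stem if first_input else index))
        if task_id in seen_ids:
            raise ValueError(f"duplicate task id: {task_id}")
        seen_ids.add(task_id)
        for key, path in resolved.items():
            if key in OUTPUT_KEYS:
                if path in seen_outputs:
                    raise ValueError(f"output path used by more than one task: {path}")
                seen_outputs.add(path)
        normalized.append({"id": task_id, **resolved})
    return normalized
```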

YAML config for default CLI options

Use --config to load YAML defaults.

Priority order:

built-in defaults < config YAML < explicit CLI arguments

Section model:

shared: defaults applied to both run and batch
run: currently no extra keys beyond shared
batch: defaults for batch-only options such as jobs and skip_existing

Example:

shared:
  language: mul
  writer_spacing: keep
  writer_backend: lrc_ms
  intermediate: ./tmp/py-roller-artifacts
  cleanup: on-success
  transcriber_device: cpu
  transcriber_model_path: ~/.cache/py-roller/models/transcriber
  transcriber_local_files_only: false
  splitter_backend: demucs
  splitter_demucs_model: htdemucs
  splitter_demucs_device: cpu
  splitter_demucs_jobs: 0
  splitter_demucs_overlap: 0.25
  splitter_demucs_segment: 8
  filter_chain:
    - noise_gate
    - dereverb

batch:
  jobs: 2
  audio_glob: "*.mp3"
  lyrics_glob: "*.txt"
  timed_units_glob: "*.json"
  parsed_lyrics_glob: "*.json"
  alignment_result_glob: "*.json"
  continue_on_error: true

filter_chain can be written either as a comma-separated string or as a YAML list.
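
The priority order and the two accepted filter_chain spellings can be illustrated with a short merge sketch. merge_options is a hypothetical helper working on plain dictionaries; treating a CLI value of None as "not given" is an assumption about how unset arguments are represented.

```python
def merge_options(defaults: dict, config: dict, cli: dict, mode: str) -> dict:
    """Merge built-in defaults < config YAML (shared, then mode section) < explicit CLI."""
    merged = dict(defaults)
    merged.update(config.get("shared") or {})
    merged.update(config.get(mode) or {})  # mode is "run" or "batch"
    # Assumption: CLI options that were not passed are represented as None.
    merged.update({key: value for key, value in cli.items() if value is not None})
    chain = merged.get("filter_chain")
    if isinstance(chain, str):
        # Accept "noise_gate,dereverb" as well as a YAML list.
        merged["filter_chain"] = [name.strip() for name in chain.split(",") if name.strip()]
    return merged


config = {"shared": {"language": "zh", "filter_chain": "noise_gate,dereverb"},
          "batch": {"jobs": 2}}
print(merge_options({"language": "mul", "jobs": 1}, config, {"jobs": 4}, mode="batch"))
# {'language': 'zh', 'jobs': 4, 'filter_chain': ['noise_gate', 'dereverb']}
```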

Troubleshooting

Interruption and child process cleanup

When --continue-on-error is not set, batch mode fails fast: it stops launching further work after the first task failure. Worker cleanup includes a Windows-specific process-tree branch.

For older runs or already orphaned processes, Linux/macOS cleanup examples are still useful:

pkill -TERM -f 'python .*pyroller'
pkill -TERM -f 'demucs.separate|demucs'

If anything still survives:

pkill -KILL -f 'python .*pyroller'
pkill -KILL -f 'demucs.separate|demucs'

Inspect candidates first with:

ps -ef | grep -E 'pyroller|demucs'

Dependency policy

py-roller install now prefers the newest validated dependency line instead of the older pre-2.6 Torch family. In practice this means:

Torch/TorchAudio/TorchVision are installed from the official 2.6.0 family for every built-in profile.
SOCKS-proxy support is installed by default through httpx[socks], so Hugging Face downloads do not fail just because socksio is missing.

If you upgrade or override these packages manually, run py-roller doctor before using transcription-heavy pipelines.
