Composable lyric-audio alignment pipeline with staged execution, batch processing, and LRC/ASS export.

These details have not been verified by PyPI

Project description

py-roller

py-roller is a local command-line pipeline for generating rolling lyric files from audio and plain-text lyrics.

It can split vocals, filter audio, transcribe local audio with faster-whisper or wav2vec2-style backends, parse lyrics, align lyric lines, and export LRC or ASS karaoke output.

Package name: py-roller
CLI command: py-roller
Python import package: pyroller

Install

Use a fresh virtual environment when possible. First install the lightweight base package from source:

pip install -e .

Then install the validated audio/transcriber runtime:

py-roller install
py-roller doctor

py-roller install installs a pinned Torch/Torchaudio/TorchVision profile first, then installs the bundled audio-core requirements with matching constraints, validates the environment, and runs py-roller doctor unless --skip-doctor is passed.

Install profiles:

auto default: try the best validated profile for this machine, then fall back to CPU if validation fails.
cpu: force the CPU-only profile.
cu124: force the CUDA 12.4 profile.

Useful install commands:

py-roller install --profile cpu
py-roller install --profile cu124
py-roller install --dry-run
py-roller install --no-reset-torch

Quick start

Set --language explicitly whenever the song language is known. Use zh for Chinese, en for English, and mul only when you need the multilingual fallback.

Raw audio + lyrics -> LRC

py-roller run \
  --stages s,f,t,p,a,w \
  --audio ./song.mp3 \
  --lyrics ./song.txt \
  --language zh \
  --filter-chain noise_gate,dereverb \
  --output-roller ./song.lrc

Prepared vocal track + lyrics -> ASS karaoke

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass

Batch process directories by filename stem

py-roller batch \
  --stages t,p,a,w \
  --audio ./audio_dir \
  --lyrics ./lyrics_dir \
  --language zh \
  --output-roller ./out_dir

Pipeline model

py-roller runs a contiguous chain of stages in this fixed order:

s -> f -> t -> p -> a -> w
splitter -> filter -> transcriber -> parser -> aligner -> writer

Valid examples:

s,f,t,p,a,w: full pipeline from raw audio and lyrics.
t,p,a,w: start from prepared vocal/filtered audio.
a,w: start from existing timed_units and parsed_lyrics artifacts.
w: rewrite from an existing alignment_result artifact.

Invalid examples:

s,t,w: skips required intermediate stages.
s,p,a: skips required intermediate stages.

Legal chain-start inputs:

--audio: valid when the chain starts at s, f, or t.
--lyrics: required when the chain includes p.
--timed-units and --parsed-lyrics: valid when the chain starts at a.
--alignment-result: valid when the chain starts at w.

Final outputs are only the explicit --output-* paths:

--output-vocal-audio
--output-filtered-audio
--output-timed-units
--output-parsed-lyrics
--output-alignment-result
--output-roller

Intermediate files under --intermediate are temporary working state unless --cleanup never is used.

Common workflows

Start from raw audio

py-roller run \
  --stages s,f,t,p,a,w \
  --audio ./song.mp3 \
  --lyrics ./song.txt \
  --language zh \
  --output-roller ./song.lrc

Start from already-separated vocals

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --output-roller ./song.lrc

Start from aligner artifacts

py-roller run \
  --stages a,w \
  --timed-units ./song.timed_units.json \
  --parsed-lyrics ./song.parsed_lyrics.json \
  --output-roller ./song.lrc

For repeated or partially omitted lyrics, choose a repetition mode explicitly:

py-roller run \
  --stages a,w \
  --timed-units ./song.timed_units.json \
  --parsed-lyrics ./song.parsed_lyrics.json \
  --aligner-repetition few \
  --output-roller ./song.lrc

--aligner-repetition accepts:

none: default standard global_dp_v1 behavior; best when repeated lyric lines are written out in full.
few: uses global DP as a proposal, then repairs sparse repeated or omitted regions between trusted anchors.
full: uses per-line candidate generation plus beam search for highly repetitive or anchorless songs.

Rewrite only from an existing alignment result

py-roller run \
  --stages w \
  --alignment-result ./song.alignment.json \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass

Backends and defaults

Backend selection is language-aware. The default language is mul for compatibility, but zh or en gives clearer transcriber/parser defaults when the song language is known.

Transcriber defaults

zh -> faster_whisper
en -> faster_whisper
mul -> faster_whisper

Additional transcriber backends:

zh also supports --transcriber-backend mms_phonetic for the Chinese phonetic CTC path.
mul also supports --transcriber-backend wav2vec2_phoneme for the multilingual phoneme CTC fallback.

Parser defaults

zh -> zh_router_pinyin
en -> en_arpabet
mul -> mul_ipa

Other defaults

aligner backend -> global_dp_v1
aligner repetition mode -> none
writer backend -> lrc_ms
writer spacing -> keep
cleanup policy -> on-success
transcriber model store -> ~/.cache/py-roller/models/transcriber

Transcriber models and Hugging Face downloads

Transcriber execution is local. py-roller does not send audio to a cloud transcription API.

Model resolution order:

Resolve --transcriber-model-name, or use the backend default model name.
Look for the model in the py-roller transcriber model store.
If not found and offline mode is not enabled, materialize/download the model into the model store.
Load the resolved local model path for inference.

Useful model options:

--transcriber-model-path: local model store root.
--transcriber-model-name: model alias, Hugging Face repo id, or explicit local path.
--transcriber-local-files-only: refuse network access and use only local files/cache.

For faster_whisper, aliases such as large-v2, large-v3, and turbo resolve to the corresponding Systran/faster-whisper-* snapshots.

Example with a custom model store:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --transcriber-model-path ./models/transcriber \
  --output-roller ./song.lrc

Offline run after the model already exists locally:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --transcriber-model-path ./models/transcriber \
  --transcriber-local-files-only \
  --output-roller ./song.lrc

Restricted or unstable networks

Hugging Face model downloads can be affected by proxies, timeouts, and XET/CAS behavior. py-roller exposes the common controls directly:

--transcriber-hf-xet {auto,on,off}: use off when XET/CAS hangs or fails on your network.
--transcriber-hf-proxy URL: use one HTTP or SOCKS proxy for model downloads.
--transcriber-hf-etag-timeout SECONDS: metadata/etag timeout.
--transcriber-hf-download-timeout SECONDS: large file download timeout.
--transcriber-hf-max-workers INT: snapshot download parallelism; lower values such as 1 or 2 are often better for fragile proxies.

Avoid XET/CAS when it is unreliable:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --transcriber-hf-xet off \
  --output-roller ./song.lrc

Use a local SOCKS proxy and conservative download settings:

py-roller run \
  --stages t,p,a,w \
  --audio ./vocals.wav \
  --lyrics ./song.txt \
  --language zh \
  --transcriber-hf-proxy socks5://127.0.0.1:7890 \
  --transcriber-hf-download-timeout 120 \
  --transcriber-hf-etag-timeout 30 \
  --transcriber-hf-max-workers 2 \
  --output-roller ./song.lrc

audio-core installs SOCKS support through httpx[socks]. If the environment was installed manually and SOCKS support is missing, run:

py-roller install

or install the missing dependency directly:

pip install "httpx[socks]"

Writer behavior

LRC

The default writer is lrc_ms, which writes LRC lines with millisecond precision.

Supported writer backends:

lrc_ms: millisecond precision.
lrc_cs: centisecond precision.
lrc_compressed: millisecond precision, with consecutive identical timestamps compressed.
ass_karaoke: ASS dialogue output with karaoke timing tags.

ASS karaoke

ass_karaoke writes ASS dialogue lines with karaoke timing tags.

Current behavior:

structural/spacing line output follows --writer-spacing (keep by default).
display end time prefers matched unit timing instead of blindly extending to the next line.
unmatched lines receive a short visible-duration fallback.

Example:

py-roller run \
  --stages w \
  --alignment-result ./song.alignment.json \
  --writer-backend ass_karaoke \
  --output-roller ./song.ass

Batch mode

batch uses the same stage semantics as run, but applies them to many tasks.

Directory pairing

Directory mode currently supports:

--pair-by stem

Default candidate globs:

--audio-glob "*.mp3"
--lyrics-glob "*.txt"

Matching is non-recursive.

Example:

py-roller batch \
  --stages t,p,a,w \
  --audio ./audio_dir \
  --lyrics ./lyrics_dir \
  --language zh \
  --output-roller ./out_dir

Batch controls

--jobs N: maximum number of parallel workers.
--continue-on-error: keep processing remaining tasks after failures.
--skip-existing: skip tasks whose declared final outputs already exist.
--manifest jobs.yaml: load explicit per-task paths from YAML instead of pairing by stem.

Parallelism guidance:

CPU-only: start with --jobs 1 or --jobs 2.
Single GPU: usually start with --jobs 1.

YAML manifest format

Manifest mode is useful when filenames do not match cleanly by stem.

The manifest defines per-task input and output paths only. It does not override stage selection, language, backend choice, filter settings, jobs, or other batch-level options.

Supported top-level forms:

tasks:
  - id: song01
    audio: ./audio/song01_master.mp3
    lyrics: ./lyrics/song01_final.txt
    output_roller: ./out/song01.lrc

or:

- id: song01
  audio: ./audio/song01_master.mp3
  lyrics: ./lyrics/song01_final.txt
  output_roller: ./out/song01.lrc

Allowed manifest input keys:

audio
lyrics
timed_units
parsed_lyrics
alignment_result

Allowed manifest output keys:

output_vocal_audio
output_filtered_audio
output_timed_units
output_parsed_lyrics
output_alignment_result
output_roller

Optional helper key:

id

Validation rules:

each task must be a mapping.
unknown keys are rejected.
inputs must match the selected chain start.
outputs must be valid final outputs for the selected chain.
task ids/stems must be unique.
final output paths must not conflict across tasks.
relative paths are resolved relative to the manifest file location.

YAML config for CLI defaults

Use --config to load YAML defaults.

Priority order:

built-in defaults < config YAML < explicit CLI arguments

Section model:

shared: defaults applied to both run and batch.
run: currently no extra keys beyond shared.
batch: defaults for batch-only options such as jobs and skip_existing.

Example:

shared:
  language: zh
  writer_spacing: keep
  writer_backend: lrc_ms
  intermediate: ./tmp/py-roller-artifacts
  cleanup: on-success
  transcriber_device: cpu
  transcriber_model_path: ~/.cache/py-roller/models/transcriber
  transcriber_local_files_only: false
  transcriber_hf_xet: auto
  transcriber_hf_proxy: null
  transcriber_hf_download_timeout: 120
  transcriber_hf_etag_timeout: 30
  transcriber_hf_max_workers: 2
  splitter_backend: demucs
  splitter_demucs_model: htdemucs
  splitter_demucs_device: cpu
  splitter_demucs_jobs: 0
  splitter_demucs_overlap: 0.25
  splitter_demucs_segment: 8
  filter_chain:
    - noise_gate
    - dereverb

batch:
  jobs: 2
  audio_glob: "*.mp3"
  lyrics_glob: "*.txt"
  timed_units_glob: "*.json"
  parsed_lyrics_glob: "*.json"
  alignment_result_glob: "*.json"
  continue_on_error: true

filter_chain can be written either as a comma-separated string or as a YAML list. Quote on or off for transcriber_hf_xet if your YAML parser treats them as booleans; py-roller also accepts boolean true/false there as on/off for convenience.

Progress, logs, and cleanup

The project exposes a reusable progress-reporting interface so CLI and future GUI frontends can share stage updates.

Current behavior:

splitter: Demucs progress plus wrapper stage progress.
filter: phase progress.
transcriber: phase progress.
aligner: phase progress plus DP row progress.

In single-task run, progress is shown as CLI progress bars when the terminal supports it. In batch, per-task progress is logged to avoid multiple workers fighting for one terminal.

Intermediate files live under:

--intermediate/<task-id>/splitter
--intermediate/<task-id>/filter
--intermediate/<task-id>/logs

Default intermediate root:

<system temp>/py-roller-artifacts

Cleanup policy:

--cleanup on-success: remove per-task intermediate directories after successful tasks.
--cleanup never: keep intermediate audio and logs for inspection.

Troubleshooting

Check the environment

py-roller doctor

doctor checks Python, Torch, Torchaudio, faster-whisper, CTranslate2, transformers, SOCKS proxy support, Demucs, and librosa.

If it reports a broken audio/transcriber environment, start with:

py-roller install

Interruption and child process cleanup

Batch fail-fast actively stops launching further work after the first task failure when --continue-on-error is not set, and worker cleanup includes a Windows-specific process-tree branch.

For older runs or already orphaned processes, Linux/macOS cleanup examples are still useful:

pkill -TERM -f 'python .*pyroller'
pkill -TERM -f 'demucs.separate|demucs'

If anything still survives:

pkill -KILL -f 'python .*pyroller'
pkill -KILL -f 'demucs.separate|demucs'

Inspect candidates first with:

ps -ef | grep -E 'pyroller|demucs'

Dependency policy

py-roller install prefers the newest validated dependency line for this release:

Torch/TorchAudio/TorchVision are installed from the official 2.6.0 family for every built-in profile.
SOCKS proxy support is installed by default through httpx[socks], so Hugging Face downloads do not fail merely because socksio is missing.

If you upgrade or override audio/transcriber packages manually, run py-roller doctor before using transcription-heavy pipelines.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.8.0

Jun 3, 2026

0.6.4

May 20, 2026

0.6.3

May 20, 2026

0.6.2

May 17, 2026

0.6.1

May 17, 2026

0.6.0

May 16, 2026

0.5.10

May 15, 2026

0.5.9

May 15, 2026

0.5.8

May 15, 2026

0.5.7

May 14, 2026

0.5.6

May 13, 2026

0.5.5

May 12, 2026

0.5.4

May 12, 2026

0.5.3

May 12, 2026

This version

0.5.2

May 12, 2026

0.5.1

May 12, 2026

0.5.0

May 11, 2026

0.4.1

Apr 10, 2026

0.4.0

Apr 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

py_roller-0.5.2.tar.gz (93.7 kB view details)

Uploaded May 12, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

py_roller-0.5.2-py3-none-any.whl (121.4 kB view details)

Uploaded May 12, 2026 Python 3

File details

Details for the file py_roller-0.5.2.tar.gz.

File metadata

Download URL: py_roller-0.5.2.tar.gz
Upload date: May 12, 2026
Size: 93.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for py_roller-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`afb8f7a146203b0ba664c096c1718de4c5cc42b930383aa5f5a40fe0766faa28`
MD5	`21e1df2530a67bf51c92a61aeca22868`
BLAKE2b-256	`bd143c5fd4c1e0e1edb7ff23ffec6cc17f8a410c367e48c1bc4f86b98e861c39`

See more details on using hashes here.

Provenance

The following attestation bundles were made for py_roller-0.5.2.tar.gz:

Publisher: python-publish.yml on Harmonese/py-roller

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_roller-0.5.2.tar.gz
- Subject digest: afb8f7a146203b0ba664c096c1718de4c5cc42b930383aa5f5a40fe0766faa28
- Sigstore transparency entry: 1517520280
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: Harmonese/py-roller@05251ffb60c340f8e0ae68253553494dc1a89ab7
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/Harmonese
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@05251ffb60c340f8e0ae68253553494dc1a89ab7
- Trigger Event: release

File details

Details for the file py_roller-0.5.2-py3-none-any.whl.

File metadata

Download URL: py_roller-0.5.2-py3-none-any.whl
Upload date: May 12, 2026
Size: 121.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for py_roller-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fe797818b6dbf3c1d49b73c9924ce5945769a3fb845414d8a6d93a3b9787c017`
MD5	`173163066478274afa92cdf2fa20b2cd`
BLAKE2b-256	`f13b9befa04439bc4975ac6fc7c4650f8c652d5cf88734556e3c0032b4751f4f`

See more details on using hashes here.

Provenance

The following attestation bundles were made for py_roller-0.5.2-py3-none-any.whl:

Publisher: python-publish.yml on Harmonese/py-roller

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_roller-0.5.2-py3-none-any.whl
- Subject digest: fe797818b6dbf3c1d49b73c9924ce5945769a3fb845414d8a6d93a3b9787c017
- Sigstore transparency entry: 1517520890
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: Harmonese/py-roller@05251ffb60c340f8e0ae68253553494dc1a89ab7
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/Harmonese
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@05251ffb60c340f8e0ae68253553494dc1a89ab7
- Trigger Event: release

py-roller 0.5.2

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

py-roller

Install

Quick start

Raw audio + lyrics -> LRC

Prepared vocal track + lyrics -> ASS karaoke

Batch process directories by filename stem

Pipeline model

Common workflows

Start from raw audio

Start from already-separated vocals

Start from aligner artifacts

Rewrite only from an existing alignment result

Backends and defaults

Transcriber defaults

Parser defaults

Other defaults

Transcriber models and Hugging Face downloads

Restricted or unstable networks

Writer behavior

LRC

ASS karaoke

Batch mode

Directory pairing

Batch controls

YAML manifest format

YAML config for CLI defaults

Progress, logs, and cleanup

Troubleshooting

Check the environment

Interruption and child process cleanup

Dependency policy

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance