Jukebox-Infer
Inference-only implementation of OpenAI Jukebox for modern PyTorch (2.7+)
High-quality music generation models for creating music from scratch or continuing existing audio tracks.
Overview
Jukebox-Infer is a streamlined, inference-only version of OpenAI Jukebox, optimized for PyTorch 2.7+ with minimal dependencies.
Note: This project is based on OpenAI Jukebox. All credit for the original model and research belongs to OpenAI and the Jukebox authors.
What's New
- v0.1.0 (Latest): Initial release - Clean inference-only implementation extracted from OpenAI Jukebox
Features
- 100% Parity Verified - VQ-VAE features identical to original Jukebox (see Parity Verification)
- Inference-only - No training code, significantly reduced codebase (~47% reduction)
- Modern PyTorch - Compatible with PyTorch 2.7+
- Single-GPU - No MPI or distributed dependencies
- Minimal dependencies - Removed tensorboardX, apex, and training-specific libs
- Auto-download - Automatic checkpoint downloads on first use
- GPU acceleration - Full CUDA support with optimized device management
- Simple API - High-level Jukebox class for easy music generation
- Audio continuation - Support for primed sampling from audio prompts
Quick Start
Installation
# Using pip
pip install jukebox-infer
# Using UV (recommended for development)
uv pip install jukebox-infer
# For development/comparison with original Jukebox
cd jukebox-infer
pip install -e . # Must run from inside jukebox-infer/ directory
Note: If you're setting up both the original Jukebox and jukebox-infer for comparison testing, see ../JUKEBOX_SETUP.md for detailed environment setup instructions.
Command-Line Interface (Fastest)
# Basic generation (default: 20 seconds, The Beatles, Rock)
python quick_infer.py
# Custom artist and genre
python quick_infer.py --artist "Taylor Swift" --genre "Pop" --duration 30
# Audio continuation from existing audio
python quick_infer.py --prompt input.wav --prompt-duration 5 --duration 20 --output continuation.wav
# See all options
python quick_infer.py --help
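To run several generations in a row, the CLI calls above can be assembled from Python. The following is a minimal sketch, not part of the package: the helper name `build_quick_infer_command` and the output file names are hypothetical, and it assumes `--output` is accepted without `--prompt` (the examples above only show it together with `--prompt`).

```python
import shlex
import sys


def build_quick_infer_command(artist, genre, duration, output=None, script="quick_infer.py"):
    """Build the argument list for one quick_infer.py run (hypothetical helper)."""
    cmd = [sys.executable, script, "--artist", artist, "--genre", genre, "--duration", str(duration)]
    if output is not None:
        # Assumption: --output also works for plain (non-prompted) generation.
        cmd += ["--output", output]
    return cmd


# (artist, genre, duration in seconds) taken from the CLI examples above.
jobs = [("The Beatles", "Rock", 20), ("Taylor Swift", "Pop", 30)]
commands = [
    build_quick_infer_command(artist, genre, duration, output=f"sample_{i}.wav")
    for i, (artist, genre, duration) in enumerate(jobs)
]

for cmd in commands:
    # Print the shell-quoted command; to execute it, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review a batch before committing to several minutes of generation per job.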
Simple API (Recommended for Python)
from jukebox_infer import Jukebox
# Initialize model (checkpoints auto-download on first use)
model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)
# Generate music
audio = model.generate(
    artist="The Beatles",
    genre="Rock",
    duration_seconds=20,
    output_path="output.wav",
)
Audio Continuation
CLI:
python quick_infer.py --prompt input.wav --prompt-duration 5 --duration 20 --output continuation.wav
Python API:
from jukebox_infer import Jukebox
model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)
# Continue from existing audio
audio = model.generate_from_audio(
    prompt_audio="input.wav",
    prompt_duration=5,   # Use first 5 seconds as prompt
    total_duration=20,   # Generate 20 seconds total
    output_path="continuation.wav",
)
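Because generation is slow, it can help to check the two durations before calling `generate_from_audio`. This is a small sketch with a hypothetical helper, `plan_continuation`; it assumes the total duration includes the prompt, as the comments in the example above suggest, so the newly generated portion is the difference between the two.

```python
def plan_continuation(prompt_duration, total_duration):
    """Return the seconds of newly generated audio for a continuation request.

    Assumption: total_duration includes the prompt, so new audio is the difference.
    """
    if prompt_duration <= 0:
        raise ValueError("prompt_duration must be positive")
    if prompt_duration >= total_duration:
        raise ValueError("total_duration must be longer than prompt_duration")
    return total_duration - prompt_duration


# Values from the example above: a 5-second prompt extended to 20 seconds total.
new_seconds = plan_continuation(prompt_duration=5, total_duration=20)
print(f"{new_seconds} seconds of new audio will be generated")
```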
Download Checkpoints
Checkpoints are automatically downloaded when you first use a model. No manual download needed!
If you prefer to pre-download checkpoints manually:
# Option 1: Use the download script
bash download_checkpoints.sh
# Option 2: Use Python API
from jukebox_infer import download_checkpoints
download_checkpoints('1b_lyrics') # Downloads ~6.2GB
Checkpoints are cached in ~/.cache/jukebox/models/:
- VQ-VAE (7.4MB) - shared encoder/decoder
- Prior level 0 & 1 (4.4GB) - shared upsamplers
- Prior level 2 (1.8GB) - 1b_lyrics top-level model
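To see how much of the download is already on disk, the cache directory can be inspected with the standard library. This is a sketch only: `cache_usage` is a hypothetical helper, and it makes no assumption about file names inside the cache, it just sums every file it finds.

```python
from pathlib import Path


def cache_usage(cache_dir):
    """Return (file_count, total_bytes) for all files under cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        # Nothing has been downloaded yet (or the cache lives elsewhere).
        return 0, 0
    sizes = [p.stat().st_size for p in root.rglob("*") if p.is_file()]
    return len(sizes), sum(sizes)


count, total = cache_usage("~/.cache/jukebox/models")
print(f"{count} files, {total / 1e9:.2f} GB cached")

# Expected total from the sizes listed above: 7.4 MB + 4.4 GB + 1.8 GB.
expected_gb = 0.0074 + 4.4 + 1.8
print(f"expected about {expected_gb:.1f} GB for 1b_lyrics")
```

The three listed sizes add up to roughly 6.2 GB, which matches the download size quoted for `1b_lyrics`.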
Available Models
| Model | Parameters | Download Size | VRAM | Description |
|---|---|---|---|---|
| 1b_lyrics | 1B | ~6.2GB | ~12GB | Lyrics conditioning support |
Requirements
- Python: ≥3.10
- PyTorch: ≥2.7.0
- GPU: CUDA-capable GPU (16GB+ VRAM recommended for 1b_lyrics)
- OS: Linux, macOS, Windows
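The requirements above can be checked before downloading several gigabytes of checkpoints. The following is a sketch, not something shipped with the package: `parse_version` and `check_environment` are hypothetical helpers, and the 16 GB threshold simply restates the recommendation above. It uses only standard PyTorch calls (`torch.__version__`, `torch.cuda.is_available`, `torch.cuda.get_device_properties`) and runs even when PyTorch is not installed.

```python
import importlib.util
import re
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 7, 0)


def parse_version(text):
    """Parse the leading 'X.Y[.Z]' of a version string such as '2.7.0+cu126'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10 or newer is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems

    import torch  # imported lazily so this check also runs without PyTorch

    if parse_version(torch.__version__) < MIN_TORCH:
        problems.append(f"PyTorch {torch.__version__} is older than 2.7.0")
    if not torch.cuda.is_available():
        problems.append("no CUDA-capable GPU detected")
    else:
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if vram_gb < 16:
            problems.append(f"GPU has {vram_gb:.1f} GB VRAM; 16 GB+ is recommended")
    return problems


for problem in check_environment():
    print("warning:", problem)
```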
Performance
Generation is inherently slow because the model is autoregressive:
- ~5-15 seconds per second of audio on RTX 4090 (with GPU acceleration)
- 18 seconds: ~3-5 minutes
- 60 seconds: ~5-15 minutes
This matches the original implementation's performance characteristics.
Note: Generation speed depends on GPU, model size, and generation length. The autoregressive nature means longer generations take proportionally longer.
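As a rough planning aid, the per-second rate quoted above can be turned into a wall-clock estimate. This is a sketch under the assumption that time scales linearly with audio length at 5-15 seconds of compute per second of audio; `estimate_generation_minutes` is a hypothetical helper, and real timings will vary with hardware.

```python
def estimate_generation_minutes(audio_seconds, low_rate=5.0, high_rate=15.0):
    """Return (low, high) wall-clock estimates in minutes.

    low_rate and high_rate are compute seconds per second of audio; the defaults
    restate the RTX 4090 range quoted above.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return audio_seconds * low_rate / 60, audio_seconds * high_rate / 60


for seconds in (20, 60):
    low, high = estimate_generation_minutes(seconds)
    print(f"{seconds} s of audio: roughly {low:.1f}-{high:.1f} min")
```

For 60 seconds this linear estimate reproduces the 5-15 minute range listed above. The 18-second figure above (3-5 minutes) sits higher than the linear estimate at the low end, so treat these numbers as rough bounds rather than guarantees.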
Documentation
- PARITY_VERIFICATION.md - 100% parity verification against the original Jukebox
- CHECKPOINT_ARCHITECTURE.md - Details on checkpoint structure and sharing between models
- Development Guidelines - Development principles, code style, and contribution guidelines
Project Structure
jukebox-infer/
├── jukebox_infer/                  # Main package
│   ├── api.py                      # High-level Jukebox API
│   ├── cli.py                      # CLI interface
│   ├── make_models.py              # Model loading and checkpoint management
│   ├── sample.py                   # Sampling functions
│   ├── prior/                      # Prior model implementations
│   ├── vqvae/                      # VQ-VAE encoder/decoder
│   ├── transformer/                # Transformer architecture
│   └── data/                       # Data processing utilities
├── docs/                           # Documentation
│   ├── PARITY_VERIFICATION.md      # 100% parity proof
│   ├── CHECKPOINT_ARCHITECTURE.md
│   └── dev/                        # Development guidelines
│       └── PRINCIPLES.md
├── examples/                       # Example scripts
├── quick_infer.py                  # Quick inference script (standalone)
├── download_checkpoints.sh         # Manual download script
├── pyproject.toml
├── LICENSE
└── README.md
Parity Verification
jukebox-infer has been verified to produce VQ-VAE features identical to those of the original OpenAI Jukebox.
Test Results
| Metric | Result |
|---|---|
| max \|Δ\| | 0.000000e+00 |
| mean \|Δ\| | 0.000000e+00 |
| Feature shape | (1, 6146) - identical |
| Feature range | [8, 2035] - identical |
| Parity status | 100% VERIFIED |
What This Means
- Perfect numerical match - Zero difference in VQ-VAE feature extraction
- Drop-in replacement - Can completely replace original Jukebox for feature extraction
- No accuracy loss - Maintains 100% fidelity to original implementation
- Research confidence - Validated for academic and production use
Testing Methodology
Parity was verified using:
- Multiple audio durations (5s, 20s)
- Identical official OpenAI checkpoints
- Rigorous numerical comparison (rtol=1e-4, atol=1e-6)
- Both CPU and GPU modes tested
For full details, see PARITY_VERIFICATION.md
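The metrics in the table above (max |Δ|, mean |Δ|, and a tolerance check) can be computed as in the following sketch. It is an illustration in plain Python, not the project's actual test script: `parity_report` is a hypothetical helper, the code values are made up, and the tolerance rule follows the `numpy.allclose` convention of `|a - b| <= atol + rtol * |b|`.

```python
def parity_report(reference, candidate, rtol=1e-4, atol=1e-6):
    """Compare two equal-length feature sequences element by element."""
    if len(reference) != len(candidate):
        raise ValueError("feature sequences differ in length")
    if not reference:
        raise ValueError("feature sequences are empty")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    # Same rule as numpy.allclose, with the original implementation as the reference.
    within_tolerance = all(d <= atol + rtol * abs(r) for d, r in zip(diffs, reference))
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "within_tolerance": within_tolerance,
    }


# Made-up VQ-VAE code indices for illustration (within the [8, 2035] range above).
original_codes = [8, 417, 2035, 1203]
reimplemented_codes = [8, 417, 2035, 1203]

report = parity_report(original_codes, reimplemented_codes)
print(report)
```

A maximum absolute difference of exactly zero is a stricter result than the tolerances require, which is what the table above reports.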
Acknowledgments
Original Research by OpenAI
Jukebox-Infer is built upon the groundbreaking work of OpenAI Jukebox. The original Jukebox represents a major advancement in music generation, achieving state-of-the-art results through innovative hierarchical VQ-VAE and transformer architectures.
Research Paper
Jukebox: A Generative Model for Music
This seminal work introduced hierarchical music generation with conditioning on artist, genre, and lyrics, enabling high-quality music generation at multiple time scales.
Original Authors
- Prafulla Dhariwal
- Heewoo Jun
- Christine Payne
- Jong Wook Kim
- Alec Radford
- Ilya Sutskever
About This Implementation
Note: The original Jukebox repository is no longer actively maintained. This package was created to continue the excellent work by providing ongoing maintenance and PyTorch 2.7+ compatibility for the inference capabilities, while preserving 100% of the original model quality and algorithms.
What we maintain:
- PyTorch 2.7+ compatibility
- Modern dependency management
- Inference-only packaging
- GPU optimization
What remains unchanged:
- All model architectures (100% original)
- All generation algorithms (100% original)
- All model weights (100% original)
- VQ-VAE feature extraction (100% parity verified - see PARITY_VERIFICATION.md)
Citation
Please cite using the following bibtex entry:
@article{dhariwal2020jukebox,
title={Jukebox: A Generative Model for Music},
author={Dhariwal, Prafulla and Jun, Heewoo and Payne, Christine and Kim, Jong Wook and Radford, Alec and Sutskever, Ilya},
journal={arXiv preprint arXiv:2005.00341},
year={2020}
}
If you use Jukebox-Infer in your research, please cite the original Jukebox paper above. This package is merely a maintenance fork to ensure continued compatibility with modern PyTorch versions - all credit for the models, algorithms, and research belongs to the original authors.
License
MIT License (same as original Jukebox)
Copyright (c) 2020 OpenAI (Original Jukebox)
Copyright (c) 2025 (Jukebox-Infer modifications)
See LICENSE for details.
This project includes code adapted from OpenAI Jukebox (MIT License, Copyright 2020 OpenAI).
Limitations
- Inference only - No training capabilities
- Single GPU - No distributed inference
- Slow generation - Autoregressive model, ~5-15 seconds per second of audio
- Duration limits - 1b_lyrics requires a duration between 17.84 and 600 seconds
- Large checkpoints - ~6.2GB download required
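Since an out-of-range duration would only fail after the model has loaded, a request can be validated up front. This is a minimal sketch with a hypothetical helper, `validate_duration`; it assumes both bounds of the 17.84-600 second range for 1b_lyrics are inclusive, which the list above does not state.

```python
MIN_SECONDS = 17.84   # lower bound quoted above for 1b_lyrics
MAX_SECONDS = 600.0   # upper bound quoted above for 1b_lyrics


def validate_duration(seconds):
    """Return the duration as a float, or raise ValueError if it is out of range.

    Assumption: both bounds are inclusive.
    """
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"duration must be between {MIN_SECONDS} and {MAX_SECONDS} seconds, got {seconds}"
        )
    return float(seconds)


print(validate_duration(20))
```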
Contributing
We welcome contributions! Please:
- Read docs/dev/PRINCIPLES.md for development guidelines
- Follow the code style (ruff/black)
- Add tests for new features
- Update documentation
- Submit PRs with clear descriptions
Development Setup
# Install dependencies with UV
uv sync
# Run quick inference script
uv run python quick_infer.py
# Format and lint code
uv run ruff format . && uv run ruff check .
See docs/dev/PRINCIPLES.md for detailed development guidelines.
Support
For issues and questions:
- GitHub Issues: github.com/openmirlab/jukebox-infer/issues
- Documentation: docs/
- Examples: examples/
Made with ❤️ for the ML community
Based on the excellent work by OpenAI and the Jukebox authors.