
Inference-only implementation of OpenAI Jukebox for PyTorch 2.7+


Jukebox-Infer

Inference-only implementation of OpenAI Jukebox for modern PyTorch (2.7+)

Python 3.10+ | PyTorch 2.7+ | License: MIT

High-quality music generation models for creating music from scratch or continuing existing audio tracks.


📌 Overview

Jukebox-Infer is a streamlined, inference-only version of OpenAI Jukebox, optimized for PyTorch 2.7+ with minimal dependencies.

Note: This project is based on OpenAI Jukebox. All credit for the original model and research belongs to OpenAI and the Jukebox authors.


🎉 What's New

  • v0.1.0 (Latest): Initial release - Clean inference-only implementation extracted from OpenAI Jukebox

✨ Features

  • ✅ 100% Parity Verified - VQ-VAE features identical to original Jukebox (see Parity Verification)
  • ✅ Inference-only - No training code; the codebase is ~47% smaller than the original
  • ✅ Modern PyTorch - Compatible with PyTorch 2.7+
  • ✅ Single-GPU - No MPI or distributed dependencies
  • ✅ Minimal dependencies - Removed tensorboardX, apex, and training-specific libraries
  • ✅ Auto-download - Checkpoints are downloaded automatically on first use
  • ✅ GPU acceleration - Full CUDA support with optimized device management
  • ✅ Simple API - High-level Jukebox class for easy music generation
  • ✅ Audio continuation - Primed sampling from audio prompts


🚀 Quick Start

Installation

# Using pip
pip install jukebox-infer

# Using UV (recommended for development)
uv pip install jukebox-infer

# For development/comparison with original Jukebox
cd jukebox-infer
pip install -e .  # Must run from inside jukebox-infer/ directory

Note: If you're setting up both the original Jukebox and jukebox-infer for comparison testing, see ../JUKEBOX_SETUP.md for detailed environment setup instructions.

Command-Line Interface (Fastest)

# Basic generation (default: 20 seconds, The Beatles, Rock)
python quick_infer.py

# Custom artist and genre
python quick_infer.py --artist "Taylor Swift" --genre "Pop" --duration 30

# Audio continuation from existing audio
python quick_infer.py --prompt input.wav --prompt-duration 5 --duration 20 --output continuation.wav

# See all options
python quick_infer.py --help
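To script several CLI runs (for example, a batch of artist/genre pairs), the flags shown above can be assembled into an argument list for `subprocess.run`. This is a minimal sketch that uses only the documented flags; the helper names `build_quick_infer_cmd` and `run_batch` are hypothetical, and it assumes `quick_infer.py` sits in the current directory and that `--output` is accepted without `--prompt`.

```python
import subprocess
import sys
from typing import Optional


def build_quick_infer_cmd(
    artist: str,
    genre: str,
    duration: float,
    output: str,
    prompt: Optional[str] = None,
    prompt_duration: Optional[float] = None,
) -> list[str]:
    """Hypothetical helper: build a quick_infer.py command from the documented flags."""
    cmd = [
        sys.executable, "quick_infer.py",
        "--artist", artist,
        "--genre", genre,
        "--duration", str(duration),
        "--output", output,
    ]
    if prompt is not None:
        # Audio continuation: prime generation with an existing audio file.
        cmd += ["--prompt", prompt]
        if prompt_duration is not None:
            cmd += ["--prompt-duration", str(prompt_duration)]
    return cmd


def run_batch(jobs: list[tuple[str, str]], duration: float = 20) -> None:
    """Hypothetical helper: one generation per (artist, genre) pair, run sequentially."""
    for i, (artist, genre) in enumerate(jobs):
        cmd = build_quick_infer_cmd(artist, genre, duration, f"sample_{i}.wav")
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a run fails
```

Each CLI call starts a fresh process, so the model is presumably reloaded every time; for many clips, the Python API below, which loads the model once, is likely cheaper.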

Simple API (Recommended for Python)

from jukebox_infer import Jukebox

# Initialize model (checkpoints auto-download on first use)
model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)

# Generate music
audio = model.generate(
    artist="The Beatles",
    genre="Rock",
    duration_seconds=20,
    output_path="output.wav"
)
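Because `load()` is called once before `generate()`, one loaded model can plausibly be reused for several clips. The sketch below relies only on the `generate(artist=..., genre=..., duration_seconds=..., output_path=...)` keywords shown in the example above; `generate_batch` and its file-naming scheme are hypothetical, and `duration_seconds` defaults to the length passed to `load()` because support for other lengths is not documented here.

```python
from pathlib import Path
from typing import Any, Iterable


def generate_batch(
    model: Any,
    requests: Iterable[tuple[str, str]],
    out_dir: str,
    duration_seconds: float = 20,
) -> list[Path]:
    """Hypothetical helper: one output file per (artist, genre) pair.

    Assumes `model` has already been loaded and exposes generate() with the
    keyword arguments used in the Simple API example.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for artist, genre in requests:
        # Filesystem-friendly name, e.g. "the_beatles_rock.wav".
        name = f"{artist}_{genre}".lower().replace(" ", "_") + ".wav"
        path = out / name
        model.generate(
            artist=artist,
            genre=genre,
            duration_seconds=duration_seconds,
            output_path=str(path),
        )
        paths.append(path)
    return paths
```

Usage, continuing the example above: `generate_batch(model, [("The Beatles", "Rock"), ("Taylor Swift", "Pop")], "outputs")`.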

Audio Continuation

CLI:

python quick_infer.py --prompt input.wav --prompt-duration 5 --duration 20 --output continuation.wav

Python API:

from jukebox_infer import Jukebox

model = Jukebox(model_name="1b_lyrics", device="cuda")
model.load(sample_length_in_seconds=20)

# Continue from existing audio
audio = model.generate_from_audio(
    prompt_audio="input.wav",
    prompt_duration=5,  # Use first 5 seconds as prompt
    total_duration=20,  # Generate 20 seconds total
    output_path="continuation.wav"
)
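Since `generate_from_audio` uses the first `prompt_duration` seconds of the file, a quick pre-flight check can catch a prompt file that is too short before starting a long run. A minimal sketch with the standard-library `wave` module (uncompressed PCM WAV only); `wav_duration_seconds` and `prompt_plan` are hypothetical helpers, and treating the new audio as `total_duration - prompt_duration` assumes the total includes the prompt, as the comments in the example above indicate.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def prompt_plan(path: str, prompt_duration: float, total_duration: float) -> float:
    """Hypothetical check; returns the seconds of new audio to be generated."""
    available = wav_duration_seconds(path)
    if prompt_duration > available:
        raise ValueError(
            f"prompt_duration={prompt_duration}s exceeds file length {available:.2f}s"
        )
    if total_duration <= prompt_duration:
        raise ValueError("total_duration must be longer than prompt_duration")
    # total_duration includes the prompt, so only the remainder is new audio.
    return total_duration - prompt_duration
```

With the example values above (`prompt_duration=5`, `total_duration=20`), this reports 15 seconds of newly generated audio.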


📦 Download Checkpoints

Checkpoints are automatically downloaded when you first use a model. No manual download needed!

If you prefer to pre-download checkpoints manually:

# Option 1: Use the download script
bash download_checkpoints.sh

Option 2: Use the Python API

from jukebox_infer import download_checkpoints
download_checkpoints('1b_lyrics')  # Downloads ~6.2GB

Checkpoints are cached in ~/.cache/jukebox/models/:

  • VQ-VAE (7.4MB) - shared encoder/decoder
  • Prior level 0 & 1 (4.4GB) - shared upsamplers
  • Prior level 2 (1.8GB) - 1b_lyrics top-level model
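To see how much of the ~6.2GB is already on disk (for example, before working offline), the cache directory can be summed with `pathlib`. A minimal sketch: it assumes only the `~/.cache/jukebox/models/` location stated above, does not depend on specific checkpoint file names, and uses decimal gigabytes (1 GB = 10^9 bytes); `cache_usage_bytes`, `cache_report`, and the `EXPECTED_GB` labels are hypothetical.

```python
from pathlib import Path

# Cache location documented above; file names inside it are not assumed.
DEFAULT_CACHE = Path.home() / ".cache" / "jukebox" / "models"

# Approximate sizes listed above, in decimal gigabytes (hypothetical labels).
EXPECTED_GB = {"vqvae": 0.0074, "prior_level_0_1": 4.4, "prior_level_2": 1.8}


def cache_usage_bytes(cache_dir: Path = DEFAULT_CACHE) -> int:
    """Total size of all files under cache_dir (0 if it does not exist yet)."""
    if not cache_dir.exists():
        return 0
    return sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())


def cache_report(cache_dir: Path = DEFAULT_CACHE) -> str:
    """Human-readable comparison of cached bytes against the expected total."""
    expected = sum(EXPECTED_GB.values())  # ~6.2 GB
    used = cache_usage_bytes(cache_dir) / 1e9
    return f"{used:.2f} GB cached of ~{expected:.1f} GB expected"
```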

🎵 Available Models

Model       Parameters   Download size   VRAM    Description
1b_lyrics   1B           ~6.2GB          ~12GB   Lyrics conditioning support

📋 Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.7.0
  • GPU: CUDA-capable GPU (1b_lyrics uses ~12GB of VRAM; 16GB+ recommended)
  • OS: Linux, macOS, Windows
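A small pre-flight script can check these minimums before downloading ~6.2GB of checkpoints. This sketch uses `importlib.metadata` and standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_properties`); `version_tuple` and `check_requirements` are hypothetical names, and the 16GB threshold mirrors the recommendation above rather than a hard limit.

```python
import sys
from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric part, e.g. '2.7.1+cu126' -> (2, 7, 1)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # stop at suffixes such as 'dev20250101'
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements() -> list[str]:
    """Return the problems found; an empty list means the minimums are met."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} < 3.10")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return problems + ["PyTorch is not installed"]
    if version_tuple(torch_version) < (2, 7):
        problems.append(f"PyTorch {torch_version} < 2.7.0")

    import torch  # imported lazily so the checks above work without torch

    if not torch.cuda.is_available():
        problems.append("No CUDA device visible to PyTorch")
    else:
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if vram_gb < 16:
            problems.append(f"GPU 0 has {vram_gb:.1f} GB VRAM; 16GB+ recommended")
    return problems
```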


⚡ Performance

Generation is inherently slow because the model is autoregressive:

  • ~5-15 seconds per second of audio on RTX 4090 (with GPU acceleration)
  • 18 seconds: ~3-5 minutes
  • 60 seconds: ~5-15 minutes

This matches the original implementation's performance characteristics.

Note: Generation speed depends on GPU, model size, and generation length. The autoregressive nature means longer generations take proportionally longer.
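For planning runs, the quoted rate of ~5-15 seconds of compute per second of audio can be turned into a rough wall-clock range. A minimal sketch: `estimate_minutes` is a hypothetical helper, the default rates are the RTX 4090 figures above, and the linear model ignores fixed costs such as model loading, so short clips can take longer than it predicts (compare the 18-second figure above).

```python
def estimate_minutes(
    duration_s: float,
    sec_per_audio_sec: tuple[float, float] = (5.0, 15.0),
) -> tuple[float, float]:
    """Rough (low, high) minutes, scaling linearly with audio length.

    Defaults are the ~5-15 s per second of audio quoted for an RTX 4090.
    Fixed overhead (model loading, upsampling setup) is not included.
    """
    low, high = sec_per_audio_sec
    return duration_s * low / 60, duration_s * high / 60
```

For a 60-second clip this gives (5.0, 15.0) minutes, matching the range listed above.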


📚 Documentation


๐Ÿ—๏ธ Project Structure

jukebox-infer/
├── jukebox_infer/      # Main package
│   ├── api.py         # High-level Jukebox API
│   ├── cli.py         # CLI interface
│   ├── make_models.py # Model loading and checkpoint management
│   ├── sample.py      # Sampling functions
│   ├── prior/         # Prior model implementations
│   ├── vqvae/         # VQ-VAE encoder/decoder
│   ├── transformer/   # Transformer architecture
│   └── data/          # Data processing utilities
├── docs/              # Documentation
│   ├── PARITY_VERIFICATION.md      # ✅ 100% parity proof
│   ├── CHECKPOINT_ARCHITECTURE.md
│   └── dev/           # Development guidelines
│       └── PRINCIPLES.md
├── examples/          # Example scripts
├── quick_infer.py     # Quick inference script (standalone)
├── download_checkpoints.sh  # Manual download script
├── pyproject.toml
├── LICENSE
└── README.md


✅ Parity Verification

jukebox-infer has been verified to produce VQ-VAE features that are 100% identical to those of the original OpenAI Jukebox.

Test Results

Metric            Result
max |Δ|           0.000000e+00
mean |Δ|          0.000000e+00
Feature shape     (1, 6146), identical
Feature range     [8, 2035], identical
Parity status     ✅ 100% VERIFIED

What This Means

  • ✅ Perfect numerical match - Zero difference in VQ-VAE feature extraction
  • ✅ Drop-in replacement - Can completely replace original Jukebox for feature extraction
  • ✅ No accuracy loss - Maintains 100% fidelity to original implementation
  • ✅ Research confidence - Validated for academic and production use

Testing Methodology

Parity was verified using:

  • Multiple audio durations (5s, 20s)
  • Identical official OpenAI checkpoints
  • Rigorous numerical comparison (rtol=1e-4, atol=1e-6)
  • Both CPU and GPU modes tested

For full details, see PARITY_VERIFICATION.md
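The metrics in the table above (max |Δ|, mean |Δ|, shape match, tolerance check) can be reproduced for any two flattened feature arrays with a few lines of plain Python. This sketch is not the project's verification script (PARITY_VERIFICATION.md describes that); `parity_report` is a hypothetical name, and it applies the elementwise rule |new - ref| <= atol + rtol * |ref|, the same form used by `numpy.allclose` and `torch.allclose`, with the tolerances listed above as defaults.

```python
from typing import Optional, Sequence


def parity_report(
    ref: Sequence[float],
    new: Sequence[float],
    rtol: float = 1e-4,
    atol: float = 1e-6,
) -> dict[str, Optional[object]]:
    """Compare two flattened feature arrays (reference vs. reimplementation)."""
    if len(ref) != len(new):
        return {"shape_match": False, "max_abs_diff": None,
                "mean_abs_diff": None, "allclose": False}
    diffs = [abs(r - n) for r, n in zip(ref, new)]
    # Elementwise tolerance test in the allclose form: d <= atol + rtol * |ref|.
    close = all(d <= atol + rtol * abs(r) for d, r in zip(diffs, ref))
    return {
        "shape_match": True,
        "max_abs_diff": max(diffs, default=0.0),
        "mean_abs_diff": sum(diffs) / len(diffs) if diffs else 0.0,
        "allclose": close,
    }
```

For VQ-VAE code indices such as those in the table, identical outputs give a max and mean |Δ| of 0 and `allclose` of True.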


🙏 Acknowledgments

Original Research by OpenAI

Jukebox-Infer is built upon the groundbreaking work of OpenAI Jukebox. The original Jukebox represents a major advancement in music generation, achieving state-of-the-art results through innovative hierarchical VQ-VAE and transformer architectures.

Research Paper

Jukebox: A Generative Model for Music

This seminal work introduced hierarchical music generation with conditioning on artist, genre, and lyrics, enabling high-quality music generation at multiple time scales.

Original Authors

  • Prafulla Dhariwal
  • Heewoo Jun
  • Christine Payne
  • Jong Wook Kim
  • Alec Radford
  • Ilya Sutskever

About This Implementation

Note: The original Jukebox repository is no longer actively maintained. This package was created to continue the excellent work by providing ongoing maintenance and PyTorch 2.7+ compatibility for the inference capabilities, while preserving 100% of the original model quality and algorithms.

What we maintain:

  • PyTorch 2.7+ compatibility
  • Modern dependency management
  • Inference-only packaging
  • GPU optimization

What remains unchanged:

  • All model architectures (100% original)
  • All generation algorithms (100% original)
  • All model weights (100% original)
  • VQ-VAE feature extraction (✅ 100% parity verified - see PARITY_VERIFICATION.md)

📄 Citation

Please cite using the following bibtex entry:

@article{dhariwal2020jukebox,
  title={Jukebox: A Generative Model for Music},
  author={Dhariwal, Prafulla and Jun, Heewoo and Payne, Christine and Kim, Jong Wook and Radford, Alec and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2005.00341},
  year={2020}
}

If you use Jukebox-Infer in your research, please cite the original Jukebox paper above. This package is merely a maintenance fork to ensure continued compatibility with modern PyTorch versions - all credit for the models, algorithms, and research belongs to the original authors.


📄 License

MIT License (same as original Jukebox)

Copyright (c) 2020 OpenAI (Original Jukebox)
Copyright (c) 2025 (Jukebox-Infer modifications)

See LICENSE for details.

This project includes code adapted from OpenAI Jukebox (MIT License, Copyright 2020 OpenAI).


⚠️ Limitations

  • Inference only - No training capabilities
  • Single GPU - No distributed inference
  • Slow generation - Autoregressive model, ~5-15 seconds per second of audio
  • Duration range - 1b_lyrics requires a sample length between 17.84 and 600 seconds
  • Large checkpoints - ~6.2GB download required
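Because a run can take many minutes, it may be worth rejecting out-of-range lengths before loading the model. A minimal sketch using the 17.84-600 second range stated above; `check_duration` is a hypothetical guard, and whether jukebox-infer enforces these bounds itself, and with what error, is not documented here.

```python
MIN_SECONDS = 17.84  # 1b_lyrics lower bound stated above
MAX_SECONDS = 600.0  # 1b_lyrics upper bound stated above


def check_duration(seconds: float) -> float:
    """Hypothetical guard: raise ValueError outside 1b_lyrics' supported range."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"1b_lyrics supports {MIN_SECONDS}-{MAX_SECONDS:.0f} s; got {seconds} s"
        )
    return seconds
```

For example, `check_duration(20)` passes (the CLI default), while `check_duration(10)` raises before any checkpoint is loaded.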

🤝 Contributing

We welcome contributions! Please:

  1. Read docs/dev/PRINCIPLES.md for development guidelines
  2. Follow the code style (ruff/black)
  3. Add tests for new features
  4. Update documentation
  5. Submit PRs with clear descriptions

Development Setup

# Install dependencies with UV
uv sync

# Run quick inference script
uv run python quick_infer.py

# Format and lint code
uv run ruff format . && uv run ruff check .

See docs/dev/PRINCIPLES.md for detailed development guidelines.


📞 Support

For issues and questions:


Made with ❤️ for the ML community

Based on the excellent work by OpenAI and the Jukebox authors.



Download files

Download the file for your platform.

Source Distribution

jukebox_infer-0.1.0.tar.gz (159.9 kB)

Uploaded Source

Built Distribution


jukebox_infer-0.1.0-py3-none-any.whl (172.7 kB)

Uploaded Python 3

File details

Details for the file jukebox_infer-0.1.0.tar.gz.

File metadata

  • Download URL: jukebox_infer-0.1.0.tar.gz
  • Upload date:
  • Size: 159.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for jukebox_infer-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        ff014b6b6b87b3c9e660dea43dfcc5b311189325ef2ee7ded5573ae5af685955
MD5           c3f283b539b00281ab110b61fde46925
BLAKE2b-256   f5a0fb9da94efdbc16f487391740843d59f491acf6866b924b98c260a63a03ea


Provenance

The following attestation bundles were made for jukebox_infer-0.1.0.tar.gz:

Publisher: publish.yml on openmirlab/jukebox-infer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file jukebox_infer-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: jukebox_infer-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 172.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for jukebox_infer-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        92f553a163961090344c61fbc2a34a5188706d741465108942d3665ac1e96a77
MD5           8feecded2b2f86f5c68427107fef0dcc
BLAKE2b-256   eeb34ce67b7a6fa752d5d9488d3c81e47caad167e6dff942e61ab306633ded87


Provenance

The following attestation bundles were made for jukebox_infer-0.1.0-py3-none-any.whl:

Publisher: publish.yml on openmirlab/jukebox-infer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
