Convert academic papers to high-quality audio with precise mathematical explanations and intelligent content processing

Paper Voice

Convert academic papers to high-quality audio narration with precise mathematical explanations, enhanced figure descriptions, and intelligent content processing.

Features

🧮 Precise Math Explanations: LLM-powered contextual explanations of mathematical expressions
🖼️ Enhanced Figure Descriptions: AI-generated audio-friendly descriptions of figures and tables
📄 Multi-Format Support: PDFs, LaTeX, Markdown, and plain text with math notation
🔗 ArXiv Integration: Direct download and processing of LaTeX source with figures
⚡ Batch Processing: Process multiple papers simultaneously with parallel execution
🎯 Selective Enhancement: Preserves original text while enhancing only math, figures, and tables
🗣️ Multiple TTS Options: OpenAI TTS (with chunking) or offline pyttsx3
💻 CLI & Web Interface: Full command-line interface plus Streamlit web app
👁️ Vision Analysis: Optional GPT-4V integration for superior PDF content extraction
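The chunking mentioned for OpenAI TTS matters because the speech endpoint limits how much text one request can carry (4096 characters at the time of writing). Below is a minimal sketch of sentence-aware chunking. The function name `chunk_text` and the `max_chars` default are hypothetical and chosen for illustration; this is not Paper Voice's actual implementation.

```python
import re


def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split a single over-long sentence so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent to the TTS backend separately and the resulting audio segments concatenated, for example with ffmpeg.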

Installation

From PyPI (Recommended)

pip install paper_voice

From Source

git clone https://github.com/gojiplus/paper_voice.git
cd paper_voice
pip install -e .

Usage

Command Line Interface

Paper Voice includes a comprehensive CLI for batch processing and automation:

# Single file processing
paper_voice paper.pdf --api-key YOUR_KEY

# Process arXiv paper
paper_voice https://arxiv.org/abs/2301.12345 --api-key YOUR_KEY --output research.mp3

# LaTeX file with custom voice
paper_voice paper.tex --latex --api-key YOUR_KEY --voice nova

# Vision-enhanced PDF analysis
paper_voice paper.pdf --vision --api-key YOUR_KEY --output enhanced.mp3

# Batch processing a directory
paper_voice --batch papers/ --api-key YOUR_KEY --output-dir ./audio_output

# Batch processing multiple files
paper_voice --batch paper1.pdf paper2.pdf paper3.tex --api-key YOUR_KEY --output-dir ./batch_output

CLI Options

--batch: Enable batch processing for multiple files/directories
--vision: Use GPT-4V for enhanced PDF analysis
--voice: Choose TTS voice (alloy, echo, fable, onyx, nova, shimmer)
--speed: Adjust speech speed (0.25 to 4.0)
--max-workers: Set concurrent workers for batch processing (default: 3)
--output-dir: Output directory for batch processing
--offline: Use offline TTS instead of OpenAI TTS
--no-enhancement: Skip LLM enhancement for faster processing
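For automation, the options above can be assembled programmatically. The sketch below builds an argument list using only the flags documented in this section and checks the documented voice names and speed range before anything runs. The helper name `build_command` and its keyword arguments are hypothetical; only the flag names come from the list above.

```python
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}


def build_command(inputs, api_key, *, batch=False, output_dir=None,
                  voice=None, speed=None, max_workers=None, vision=False):
    """Return an argv list for the paper_voice CLI (hypothetical helper)."""
    if voice is not None and voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    cmd = ["paper_voice"]
    if batch:
        cmd.append("--batch")  # the flag precedes the inputs, as in the examples above
    cmd.extend(inputs)
    cmd.extend(["--api-key", api_key])
    if output_dir is not None:
        cmd.extend(["--output-dir", output_dir])
    if max_workers is not None:
        cmd.extend(["--max-workers", str(max_workers)])
    if voice is not None:
        cmd.extend(["--voice", voice])
    if speed is not None:
        cmd.extend(["--speed", str(speed)])
    if vision:
        cmd.append("--vision")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.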

Web Interface

streamlit run streamlit/app.py

Upload a PDF, LaTeX file, or enter text directly. For best results with mathematical content, provide an OpenAI API key to enable LLM-powered explanations.

Python API

from paper_voice import pdf_utils, math_to_speech
from paper_voice.arxiv_downloader import download_arxiv_paper

# Extract text from PDF
pages = pdf_utils.extract_raw_text("paper.pdf")

# Process mathematical expressions
processed = math_to_speech.process_text_with_math(pages[0])
print(processed)

# Download from arXiv
paper = download_arxiv_paper("2301.12345")
if paper:
    print(f"Downloaded: {paper.title}")
    print(f"LaTeX content: {len(paper.latex_content)} characters")

✨ What's New in v0.2.0

Precise Mathematical Explanations

Instead of basic conversions like "alpha squared plus beta", Paper Voice now generates contextual explanations:

Before: $\alpha^2 + \beta = \gamma$ → "alpha squared plus beta equals gamma"

After: $\alpha^2 + \beta = \gamma$ → "In a machine learning context, this equation shows that the outcome gamma is determined by the sum of the learning rate alpha squared and the regularization parameter beta..."
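To make the "Before" behavior concrete, here is a small rule-based sketch that reads a LaTeX expression token by token with no context. It is an illustration only: the function `naive_math_to_speech` and its lookup tables are hypothetical and do not reproduce the package's `math_to_speech` module.

```python
import re

OPERATORS = {"+": "plus", "-": "minus", "=": "equals"}
POWERS = {"2": "squared", "3": "cubed"}

# LaTeX commands, exponents (with optional braces), plain symbols, operators.
TOKEN = re.compile(r"\\[A-Za-z]+|\^\{?\w+\}?|[A-Za-z0-9]+|[+\-=]")


def naive_math_to_speech(expr: str) -> str:
    """Read a simple LaTeX expression aloud, one token at a time."""
    words = []
    for tok in TOKEN.findall(expr.strip().strip("$")):
        if tok.startswith("\\"):
            words.append(tok[1:])  # \alpha -> "alpha"
        elif tok.startswith("^"):
            exponent = tok[1:].strip("{}")
            words.append(POWERS.get(exponent, f"to the power of {exponent}"))
        elif tok in OPERATORS:
            words.append(OPERATORS[tok])
        else:
            words.append(tok)
    return " ".join(words)


print(naive_math_to_speech(r"$\alpha^2 + \beta = \gamma$"))
# prints: alpha squared plus beta equals gamma
```

A rule table like this can name symbols but cannot say what they mean in the paper, which is the gap the LLM-generated explanation is meant to fill.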

Selective Enhancement

✅ Only enhances: Math expressions, figure captions, table descriptions
✅ Preserves exactly: All other academic text, structure, and content
❌ No summarization: Original text remains unchanged
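One way to picture selective enhancement is as a split-and-rejoin pass: find the math spans, rewrite only those, and stitch the untouched text back in byte for byte. The sketch below handles inline `$...$` math only and takes the rewriting step as a callback. The names `enhance_math_only` and `explain` are hypothetical, and the package's real pipeline (which also covers figures and tables) may work differently.

```python
import re
from typing import Callable

# Capturing group so that re.split keeps the math spans in its output.
MATH_PATTERN = re.compile(r"(\$[^$]+\$)")


def enhance_math_only(text: str, explain: Callable[[str], str]) -> str:
    """Apply explain() to inline math spans and leave all other text unchanged."""
    parts = MATH_PATTERN.split(text)
    # With one capturing group, matched spans land at the odd indices.
    return "".join(
        explain(part) if index % 2 == 1 else part
        for index, part in enumerate(parts)
    )
```

In practice `explain` would wrap an LLM call; passing a simple function such as `lambda m: m.strip("$")` makes it easy to check that the surrounding prose is preserved exactly.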

Enhanced LaTeX Processing

LaTeX files now get full LLM enhancement while preserving document structure:

Mathematical expressions → Contextual explanations
Figure captions → Audio-friendly descriptions
Table content → Clear narration
Regular text → Preserved exactly as written

Requirements

Python 3.9+ (excluding 3.9.7)
OpenAI API key (optional but recommended for enhanced explanations)
ffmpeg (for audio processing)

Advanced Features

Batch Processing

Process multiple papers efficiently with parallel processing:

# Process all PDFs in a directory
paper_voice --batch research_papers/ --api-key YOUR_KEY --output-dir audio_papers/

# Process specific files with custom settings
paper_voice --batch paper1.pdf paper2.tex paper3.pdf \
  --api-key YOUR_KEY \
  --output-dir batch_output/ \
  --max-workers 5 \
  --voice nova \
  --vision
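The effect of `--max-workers` can be sketched with a thread pool from the standard library. This is an assumption about the general approach, not a copy of the package's code: `process_batch` and the `convert` callback are hypothetical names, and the default of 3 mirrors the documented CLI default.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch(paths, convert, max_workers=3):
    """Run convert(path) for every path in parallel; collect results and errors."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input so failures can be attributed.
        futures = {pool.submit(convert, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                # One failing paper should not abort the rest of the batch.
                errors[path] = exc
    return results, errors
```

Threads suit this workload because most of the time is spent waiting on network calls to the LLM and TTS services rather than on local computation.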

ArXiv Integration

Download and process papers directly from arXiv:

# Single arXiv paper
paper_voice https://arxiv.org/abs/2301.12345 --api-key YOUR_KEY

# Multiple arXiv papers
paper_voice --batch \
  https://arxiv.org/abs/2301.12345 \
  https://arxiv.org/abs/2302.67890 \
  --api-key YOUR_KEY \
  --output-dir arxiv_audio/
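The CLI examples take full arXiv URLs, while the Python API example passes a bare identifier to `download_arxiv_paper`. A short sketch of converting one to the other is shown below. The helper `extract_arxiv_id` is hypothetical, and it only recognizes new-style identifiers such as `2301.12345` (optionally followed by a version suffix), not the older `hep-th/9901001` form.

```python
import re
from typing import Optional

# New-style arXiv identifier: YYMM.NNNN or YYMM.NNNNN, optional version suffix.
ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(?:v\d+)?(?!\d)")


def extract_arxiv_id(reference: str) -> Optional[str]:
    """Return the bare arXiv identifier found in a URL or string, or None."""
    match = ARXIV_ID.search(reference)
    return match.group(1) if match else None


print(extract_arxiv_id("https://arxiv.org/abs/2301.12345"))
# prints: 2301.12345
```

Returning `None` for inputs without an identifier lets a caller fall back to treating the argument as a local file path.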

Vision-Enhanced PDF Analysis

Use GPT-4V for superior content extraction:

paper_voice complex_paper.pdf --vision --api-key YOUR_KEY

Examples

See the demos/ directory for usage examples:

demos/basic_usage.py - Simple math processing examples
demos/before_after_comparison.py - Shows improvement from LLM explanations

See the tests/ directory for comprehensive test cases including batch processing tests.

Contributing

See CONTRIBUTING.md for development guidelines and how to contribute to the project.

Release history

0.3.0: Sep 13, 2025
0.2.0: Sep 13, 2025 (this version)
0.1.0: Sep 13, 2025

Download files

Source Distribution

paper_voice-0.2.0.tar.gz (80.6 kB), uploaded Sep 13, 2025

Built Distribution

paper_voice-0.2.0-py3-none-any.whl (91.8 kB), uploaded Sep 13, 2025, Python 3

File details

Details for the file paper_voice-0.2.0.tar.gz.

File metadata

Download URL: paper_voice-0.2.0.tar.gz
Upload date: Sep 13, 2025
Size: 80.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for paper_voice-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`2da7782d6082b3176e3ddc5bf0d50167e8c86ffd7150dde1f20bd415103c4085`
MD5	`7f4b6a0eb341ef346ff83956ae2b0f19`
BLAKE2b-256	`6261ab7fc4ab5a6a7d671b3721c5de662ffd81bb78bb80b9c4dff09b755fedd3`

File details

Details for the file paper_voice-0.2.0-py3-none-any.whl.

File metadata

Download URL: paper_voice-0.2.0-py3-none-any.whl
Upload date: Sep 13, 2025
Size: 91.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for paper_voice-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3ccfc3b0f3b2d8efa7b9fa84879b6c7f8148492c4e6c726aa7608a05fec2331b`
MD5	`d8fa1663f977f2364825c8c9650f46f1`
BLAKE2b-256	`9941bf02cc3c1c428ffe71fa5cc33fd16197517eb3593e3de293c4275ec13fa5`
