Skip to main content

Convert academic papers to high-quality audio with precise mathematical explanations and intelligent content processing

Project description

Paper Voice

Convert academic papers to high-quality audio narration with precise mathematical explanations, enhanced figure descriptions, and intelligent content processing.

Features

  • 🧮 Precise Math Explanations: LLM-powered contextual explanations of mathematical expressions
  • 🖼️ Enhanced Figure Descriptions: AI-generated audio-friendly descriptions of figures and tables
  • 📄 Multi-Format Support: PDFs, LaTeX, Markdown, and plain text with math notation
  • 🔗 ArXiv Integration: Direct download and processing of LaTeX source with figures
  • Batch Processing: Process multiple papers simultaneously with parallel execution
  • 🎯 Selective Enhancement: Preserves original text while enhancing only math, figures, and tables
  • 🗣️ Multiple TTS Options: OpenAI TTS (with chunking) or offline pyttsx3
  • 💻 CLI & Web Interface: Full command-line interface plus Streamlit web app
  • 👁️ Vision Analysis: Optional GPT-4V integration for superior PDF content extraction

Installation

From PyPI (Recommended)

pip install paper_voice

From Source

git clone https://github.com/gojiplus/paper_voice.git
cd paper_voice
pip install -e .

Usage

Command Line Interface

Paper Voice includes a comprehensive CLI for batch processing and automation:

# Single file processing
paper_voice paper.pdf --api-key YOUR_KEY

# Process arXiv paper
paper_voice https://arxiv.org/abs/2301.12345 --api-key YOUR_KEY --output research.mp3

# LaTeX file with custom voice
paper_voice paper.tex --latex --api-key YOUR_KEY --voice nova

# Vision-enhanced PDF analysis
paper_voice paper.pdf --vision --api-key YOUR_KEY --output enhanced.mp3

# Batch processing a directory
paper_voice --batch papers/ --api-key YOUR_KEY --output-dir ./audio_output

# Batch processing multiple files
paper_voice --batch paper1.pdf paper2.pdf paper3.tex --api-key YOUR_KEY --output-dir ./batch_output

CLI Options

  • --batch: Enable batch processing for multiple files/directories
  • --vision: Use GPT-4V for enhanced PDF analysis
  • --voice: Choose TTS voice (alloy, echo, fable, onyx, nova, shimmer)
  • --speed: Adjust speech speed (0.25 to 4.0)
  • --max-workers: Set concurrent workers for batch processing (default: 3)
  • --output-dir: Output directory for batch processing
  • --offline: Use offline TTS instead of OpenAI TTS
  • --no-enhancement: Skip LLM enhancement for faster processing

Web Interface

streamlit run streamlit/app.py

Upload a PDF, LaTeX file, or enter text directly. For best results with mathematical content, provide an OpenAI API key to enable LLM-powered explanations.

Python API

from paper_voice import pdf_utils, math_to_speech
from paper_voice.arxiv_downloader import download_arxiv_paper

# Extract text from PDF
pages = pdf_utils.extract_raw_text("paper.pdf")

# Process mathematical expressions
processed = math_to_speech.process_text_with_math(pages[0])
print(processed)

# Download from arXiv
paper = download_arxiv_paper("2301.12345")
if paper:
    print(f"Downloaded: {paper.title}")
    print(f"LaTeX content: {len(paper.latex_content)} characters")

✨ What's New in v0.2.0

Precise Mathematical Explanations

Instead of basic conversions like "alpha squared plus beta", Paper Voice now generates contextual explanations:

Before: $\alpha^2 + \beta = \gamma$ → "alpha squared plus beta equals gamma"

After: $\alpha^2 + \beta = \gamma$ → "In machine learning context, this equation shows that the outcome gamma is determined by the sum of the learning rate alpha squared and the regularization parameter beta..."

Selective Enhancement

  • Only enhances: Math expressions, figure captions, table descriptions
  • Preserves exactly: All other academic text, structure, and content
  • No summarization: Original text remains unchanged

Enhanced LaTeX Processing

LaTeX files now get full LLM enhancement while preserving document structure:

  • Mathematical expressions → Contextual explanations
  • Figure captions → Audio-friendly descriptions
  • Table content → Clear narration
  • Regular text → Preserved exactly as written

Requirements

  • Python 3.9+ (excluding 3.9.7)
  • OpenAI API key (optional but recommended for enhanced explanations)
  • ffmpeg (for audio processing)

Advanced Features

Batch Processing

Process multiple papers efficiently with parallel processing:

# Process all PDFs in a directory
paper_voice --batch research_papers/ --api-key YOUR_KEY --output-dir audio_papers/

# Process specific files with custom settings
paper_voice --batch paper1.pdf paper2.tex paper3.pdf \
  --api-key YOUR_KEY \
  --output-dir batch_output/ \
  --max-workers 5 \
  --voice nova \
  --vision

ArXiv Integration

Download and process papers directly from arXiv:

# Single arXiv paper
paper_voice https://arxiv.org/abs/2301.12345 --api-key YOUR_KEY

# Multiple arXiv papers
paper_voice --batch \
  https://arxiv.org/abs/2301.12345 \
  https://arxiv.org/abs/2302.67890 \
  --api-key YOUR_KEY \
  --output-dir arxiv_audio/

Vision-Enhanced PDF Analysis

Use GPT-4V for superior content extraction:

paper_voice complex_paper.pdf --vision --api-key YOUR_KEY

Examples

See the demos/ directory for usage examples:

  • demos/basic_usage.py - Simple math processing examples
  • demos/before_after_comparison.py - Shows improvement from LLM explanations

See the tests/ directory for comprehensive test cases including batch processing tests.

Contributing

See CONTRIBUTING.md for development guidelines and how to contribute to the project.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paper_voice-0.2.0.tar.gz (80.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

paper_voice-0.2.0-py3-none-any.whl (91.8 kB view details)

Uploaded Python 3

File details

Details for the file paper_voice-0.2.0.tar.gz.

File metadata

  • Download URL: paper_voice-0.2.0.tar.gz
  • Upload date:
  • Size: 80.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for paper_voice-0.2.0.tar.gz
Algorithm Hash digest
SHA256 2da7782d6082b3176e3ddc5bf0d50167e8c86ffd7150dde1f20bd415103c4085
MD5 7f4b6a0eb341ef346ff83956ae2b0f19
BLAKE2b-256 6261ab7fc4ab5a6a7d671b3721c5de662ffd81bb78bb80b9c4dff09b755fedd3

See more details on using hashes here.

File details

Details for the file paper_voice-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: paper_voice-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 91.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for paper_voice-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3ccfc3b0f3b2d8efa7b9fa84879b6c7f8148492c4e6c726aa7608a05fec2331b
MD5 d8fa1663f977f2364825c8c9650f46f1
BLAKE2b-256 9941bf02cc3c1c428ffe71fa5cc33fd16197517eb3593e3de293c4275ec13fa5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page