Skip to main content

Convert academic papers to high-quality audio with precise mathematical explanations and intelligent content processing

Project description

Paper Voice

Convert academic papers to high-quality audio narration with precise mathematical explanations using a simplified LLM-powered approach.

Features

  • 🧮 Natural Math Narration: Professor-style explanations of mathematical expressions
  • 📄 Multi-Format Support: PDFs, LaTeX, Markdown, and plain text with math notation
  • 🎯 Simple LLM Enhancement: Single comprehensive prompt for natural audio conversion
  • 🗣️ Multiple TTS Options: OpenAI TTS (with chunking) or offline pyttsx3
  • 💻 Web Interface: Easy-to-use Streamlit web app
  • Intelligent Chunking: Handles large documents with smart OpenAI API limits

Installation

From PyPI (Recommended)

pip install paper_voice

From Source

git clone https://github.com/gojiplus/paper_voice.git
cd paper_voice
pip install -e .

Usage

Web Interface (Recommended)

streamlit run paper_voice/streamlit/app.py

Upload a PDF, LaTeX file, or enter text directly. Provide an OpenAI API key for LLM-enhanced natural language conversion of mathematical expressions.

Python API

Simple Enhancement (New in v0.3.0)

from paper_voice.simple_llm_enhancer import enhance_document_simple

# Convert any academic content with math to natural language
content = "The equation $E = mc^2$ represents energy-mass equivalence."
enhanced = enhance_document_simple(content, api_key="your-openai-key")
print(enhanced)
# Output: "The equation energy equals mass times the speed of light squared represents energy-mass equivalence."

Complete Workflow

from paper_voice import pdf_utils
from paper_voice.simple_llm_enhancer import enhance_document_simple
from paper_voice import tts

# 1. Extract text from PDF
pages = pdf_utils.extract_raw_text("paper.pdf")
content = '\n\n'.join(pages)

# 2. Enhance with LLM (converts math to natural language)
enhanced_script = enhance_document_simple(content, api_key="your-openai-key")

# 3. Generate audio
tts.synthesize_speech_chunked(
    enhanced_script, 
    "output.mp3", 
    use_openai=True, 
    api_key="your-openai-key"
)

LaTeX Processing

from paper_voice.content_processor import process_content_unified

latex_content = r"""
\documentclass{article}
\begin{document}
The algorithm minimizes $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$.
\end{document}
"""

processed = process_content_unified(
    content=latex_content,
    input_type='latex',
    api_key='your-openai-key',
    use_llm_enhancement=True
)

print(processed.enhanced_text)

✨ What's New in v0.3.0

Simplified LLM Architecture

  • Single comprehensive prompt: Handles all math conversion in one API call
  • Professor-style narration: Natural explanations instead of robotic "subscript" language
  • Intelligent chunking: Automatically handles large documents within OpenAI limits
  • Better error handling: Clear failures instead of silent returns

Natural Mathematical Explanations

Before: $p_C$ → "p subscript C"

After: $p_C$ → "p underscore capital C, the proportion of compliers"

Complex expressions:

  • $F_{1C}$ → "F underscore one capital C, the outcome distribution for treated compliers"
  • $E = mc^2$ → "energy equals mass times the speed of light squared"

Key API Changes

  • Main function: simple_llm_enhancer.enhance_document_simple()
  • Smart chunking for documents > 128K tokens
  • Single LLM call for most documents
  • Professor-style math conversion prompt

Requirements

  • Python 3.9+ (excluding 3.9.7)
  • OpenAI API key (required for LLM enhancement)
  • pydub (for audio chunking)
  • PyPDF2 or PyMuPDF (for PDF processing)

Optional Dependencies

# For better PDF processing
pip install PyMuPDF

# For offline TTS
pip install pyttsx3

# For audio format conversion
# Install ffmpeg via your system package manager

Architecture

Paper Voice uses a clean modular pipeline:

PDF → LaTeX/Markdown → LLM Enhancement → TTS

  1. PDF Extraction: Extract text with pdf_utils.extract_raw_text()
  2. LLM Enhancement: Convert math to natural language with simple_llm_enhancer.enhance_document_simple()
  3. Audio Generation: Create audio with tts.synthesize_speech_chunked()

Examples

Basic Usage

from paper_voice.simple_llm_enhancer import enhance_document_simple

# Simple math conversion
text = "The learning rate α controls convergence of $\\theta^* = \\arg\\min J(\\theta)$."
enhanced = enhance_document_simple(text, "your-api-key")
# Result: Natural professor-style explanation of the math

With Progress Tracking

def progress_callback(message):
    print(f"Progress: {message}")

enhanced = enhance_document_simple(
    content, 
    api_key, 
    progress_callback=progress_callback
)

Large Document Handling

The system automatically handles large documents:

  • Documents < 128K tokens: Single LLM call
  • Documents > 128K tokens: Intelligent chunking with natural breakpoints

Configuration

Set your OpenAI API key:

export OPENAI_API_KEY="your-key-here"

Or pass it directly to functions:

enhanced = enhance_document_simple(content, api_key="your-key")

License

MIT License - see LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paper_voice-0.3.0.tar.gz (70.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

paper_voice-0.3.0-py3-none-any.whl (80.1 kB view details)

Uploaded Python 3

File details

Details for the file paper_voice-0.3.0.tar.gz.

File metadata

  • Download URL: paper_voice-0.3.0.tar.gz
  • Upload date:
  • Size: 70.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for paper_voice-0.3.0.tar.gz
Algorithm Hash digest
SHA256 daec92e30a28917a313bc9b8027e1e522ea5c8524ba211b1c12f9373df8a16f2
MD5 157fb060addfe5fd643323ffad2ac118
BLAKE2b-256 9dd1e3227606adedb4bbbaae491090a8eedf2b46b76d98847494628427235dac

See more details on using hashes here.

File details

Details for the file paper_voice-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: paper_voice-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 80.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for paper_voice-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fe939ca2607d51fb5c9ffac721e8296bb5618650168c9e7eac3010cdc7a041a8
MD5 2e3c476b28f291bc705e7fdf0cc2cec3
BLAKE2b-256 7ec9ae3c2acc0a420e89ed3afc958d2664f4d2f7b0aab46f91cff7435928ef59

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page