Convert academic papers to high-quality audio with precise mathematical explanations and intelligent content processing

Paper Voice

Convert academic papers to high-quality audio narration with precise mathematical explanations using a simplified LLM-powered approach.

Features

🧮 Natural Math Narration: Professor-style explanations of mathematical expressions
📄 Multi-Format Support: PDFs, LaTeX, Markdown, and plain text with math notation
🎯 Simple LLM Enhancement: Single comprehensive prompt for natural audio conversion
🗣️ Multiple TTS Options: OpenAI TTS (with chunking) or offline pyttsx3
💻 Web Interface: Easy-to-use Streamlit web app
⚡ Intelligent Chunking: Splits large documents so each request stays within OpenAI API limits

Installation

From PyPI (Recommended)

pip install paper_voice

From Source

git clone https://github.com/gojiplus/paper_voice.git
cd paper_voice
pip install -e .

Usage

Web Interface (Recommended)

streamlit run paper_voice/streamlit/app.py

Upload a PDF, LaTeX file, or enter text directly. Provide an OpenAI API key for LLM-enhanced natural language conversion of mathematical expressions.

Python API

Simple Enhancement (New in v0.3.0)

from paper_voice.simple_llm_enhancer import enhance_document_simple

# Convert any academic content with math to natural language
content = "The equation $E = mc^2$ represents energy-mass equivalence."
enhanced = enhance_document_simple(content, api_key="your-openai-key")
print(enhanced)
# Example output (exact wording can vary between LLM runs):
# "The equation energy equals mass times the speed of light squared represents energy-mass equivalence."

Complete Workflow

from paper_voice import pdf_utils
from paper_voice.simple_llm_enhancer import enhance_document_simple
from paper_voice import tts

# 1. Extract text from PDF
pages = pdf_utils.extract_raw_text("paper.pdf")
content = '\n\n'.join(pages)

# 2. Enhance with LLM (converts math to natural language)
enhanced_script = enhance_document_simple(content, api_key="your-openai-key")

# 3. Generate audio
tts.synthesize_speech_chunked(
    enhanced_script, 
    "output.mp3", 
    use_openai=True, 
    api_key="your-openai-key"
)

LaTeX Processing

from paper_voice.content_processor import process_content_unified

latex_content = r"""
\documentclass{article}
\begin{document}
The algorithm minimizes $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$.
\end{document}
"""

processed = process_content_unified(
    content=latex_content,
    input_type='latex',
    api_key='your-openai-key',
    use_llm_enhancement=True
)

print(processed.enhanced_text)

✨ What's New in v0.3.0

Simplified LLM Architecture

Single comprehensive prompt: Handles all math conversion in one API call
Professor-style narration: Natural explanations instead of robotic "subscript" language
Intelligent chunking: Automatically handles large documents within OpenAI limits
Better error handling: Clear failures instead of silent returns

Natural Mathematical Explanations

Before: $p_C$ → "p subscript C"

After: $p_C$ → "p underscore capital C, the proportion of compliers"

Complex expressions:

$F_{1C}$ → "F underscore one capital C, the outcome distribution for treated compliers"
$E = mc^2$ → "energy equals mass times the speed of light squared"
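Paper Voice hands the conversion itself to an LLM, so the following is not how the library works internally. It is only a small sketch for spot-checking: it lists the inline `$...$` spans in a text so each can be compared against the narrated output. The name `find_inline_math` is hypothetical, and display math written as `$$...$$` is not handled.

```python
import re

# Matches $...$ spans whose delimiters are not backslash-escaped.
# Assumption: inline math only; $$...$$ display math is out of scope here.
INLINE_MATH = re.compile(r"(?<!\\)\$(.+?)(?<!\\)\$")


def find_inline_math(text):
    """Return the contents of each inline $...$ span, in document order."""
    return INLINE_MATH.findall(text)


sample = "Let $p_C$ be the share of compliers and $F_{1C}$ their outcome distribution."
print(find_inline_math(sample))  # prints ['p_C', 'F_{1C}']
```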

Key API Changes

Main function: simple_llm_enhancer.enhance_document_simple()
Smart chunking for documents > 128K tokens
Single LLM call for most documents
Professor-style math conversion prompt

Requirements

Python 3.9+ (excluding 3.9.7)
OpenAI API key (required for LLM enhancement)
pydub (for audio chunking)
PyPDF2 or PyMuPDF (for PDF processing)
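The version constraint above can be checked before installing. This is a minimal sketch with a hypothetical helper name, `is_supported_python`. It encodes only the rule stated in this list: 3.9 or newer, except 3.9.7.

```python
import sys


def is_supported_python(version_info=None):
    """Return True for Python 3.9+ except the excluded 3.9.7 release."""
    major, minor, micro = tuple(version_info or sys.version_info)[:3]
    if (major, minor, micro) == (3, 9, 7):
        return False
    return (major, minor) >= (3, 9)


if not is_supported_python():
    print("This interpreter does not meet the stated Python requirement.")
```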

Optional Dependencies

# For better PDF processing
pip install PyMuPDF

# For offline TTS
pip install pyttsx3

# For audio format conversion
# Install ffmpeg via your system package manager
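To see which of these optional pieces are already present, a quick check along these lines may help. It is a sketch, not part of Paper Voice: `optional_features` is a hypothetical name, and the check assumes PyMuPDF is importable as `pymupdf` or under its older module name `fitz`.

```python
import importlib.util
import shutil


def optional_features():
    """Report which optional extras appear to be available locally."""
    return {
        # Assumption: PyMuPDF is importable as "pymupdf" or the older "fitz".
        "pymupdf": any(
            importlib.util.find_spec(name) is not None
            for name in ("pymupdf", "fitz")
        ),
        "pyttsx3": importlib.util.find_spec("pyttsx3") is not None,
        # ffmpeg is a system binary, so look for it on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


for name, available in optional_features().items():
    print(f"{name}: {'found' if available else 'missing'}")
```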

Architecture

Paper Voice uses a clean modular pipeline:

PDF → LaTeX/Markdown → LLM Enhancement → TTS

PDF Extraction: Extract text with pdf_utils.extract_raw_text()
LLM Enhancement: Convert math to natural language with simple_llm_enhancer.enhance_document_simple()
Audio Generation: Create audio with tts.synthesize_speech_chunked()
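The three stages can be wired together behind one function so that each stage is swappable, for example with stubs when no API key is available. This is a sketch under assumptions: `run_pipeline` and its parameters are hypothetical, not part of the package.

```python
from typing import Callable, List


def run_pipeline(
    source: str,
    extract: Callable[[str], List[str]],
    enhance: Callable[[str], str],
    synthesize: Callable[[str, str], None],
    output_path: str,
) -> str:
    """Run extraction, enhancement, and synthesis in order; return the script."""
    pages = extract(source)                 # stage 1: one string per page
    script = enhance("\n\n".join(pages))    # stage 2: math to natural language
    synthesize(script, output_path)         # stage 3: write the audio file
    return script
```

With the real library, `extract` would presumably be `pdf_utils.extract_raw_text`, while `enhance` and `synthesize` would be small wrappers that pass the API key to `enhance_document_simple` and `tts.synthesize_speech_chunked`.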

Examples

Basic Usage

from paper_voice.simple_llm_enhancer import enhance_document_simple

# Simple math conversion
text = "The learning rate α controls convergence of $\\theta^* = \\arg\\min J(\\theta)$."
enhanced = enhance_document_simple(text, "your-api-key")
# Result: Natural professor-style explanation of the math

With Progress Tracking

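from paper_voice.simple_llm_enhancer import enhance_document_simple

content = "The equation $E = mc^2$ represents energy-mass equivalence."
api_key = "your-openai-key"

def progress_callback(message):
    # Receives a status message while enhancement runs
    print(f"Progress: {message}")

enhanced = enhance_document_simple(
    content,
    api_key,
    progress_callback=progress_callback
)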
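A callback does not have to print. The sketch below records each message with its elapsed time so progress can be inspected afterwards. `ProgressLog` is a hypothetical helper, and it assumes the callback receives a single message argument, as in the example above.

```python
import time


class ProgressLog:
    """Callable that stores (seconds_since_start, message) pairs."""

    def __init__(self):
        self.events = []
        self._start = time.monotonic()

    def __call__(self, message):
        elapsed = time.monotonic() - self._start
        self.events.append((elapsed, message))


log = ProgressLog()
log("starting")  # in real use, pass `log` as progress_callback instead
log("done")
print([message for _, message in log.events])  # prints ['starting', 'done']
```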

Large Document Handling

The system automatically handles large documents:

Documents up to 128K tokens: Single LLM call
Documents over 128K tokens: Intelligent chunking with natural breakpoints
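The idea of chunking at natural breakpoints can be illustrated with a paragraph-based splitter. This is a sketch, not the library's algorithm: the function names are hypothetical, paragraphs are assumed to be separated by blank lines, and the token count is a rough heuristic of about four characters per token rather than a real tokenizer.

```python
def estimate_tokens(text):
    """Rough heuristic: about four characters per token for English text."""
    return max(1, len(text) // 4)


def chunk_by_paragraph(text, max_tokens):
    """Group whole paragraphs into chunks that stay near max_tokens each.

    A single paragraph larger than the budget is kept whole as its own
    oversized chunk rather than being cut mid-sentence.
    """
    chunks, current, current_tokens = [], [], 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        tokens = estimate_tokens(paragraph)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting only at paragraph boundaries keeps each equation together with the sentence that introduces it, at the cost of chunks that are not perfectly even in size.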

Configuration

Set your OpenAI API key:

export OPENAI_API_KEY="your-key-here"

Or pass it directly to functions:

enhanced = enhance_document_simple(content, api_key="your-key")
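In scripts that support both options, the key can be resolved once and then passed explicitly. This is a minimal sketch: `resolve_api_key` is a hypothetical helper, and the precedence (explicit argument first, then the environment variable) is an assumption rather than documented library behavior.

```python
import os


def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY"):
    """Prefer an explicitly passed key, then fall back to the environment."""
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key given and {env_var} is not set")
    return key
```

The result can then be passed as `api_key=resolve_api_key()`, which fails early with a clear message when no key is configured.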

License

MIT License - see LICENSE file for details.

Release history

0.3.0 (this version): Sep 13, 2025
0.2.0: Sep 13, 2025
0.1.0: Sep 13, 2025

Download files

Source Distribution

paper_voice-0.3.0.tar.gz (70.2 kB), uploaded Sep 13, 2025, Source

Built Distribution

paper_voice-0.3.0-py3-none-any.whl (80.1 kB), uploaded Sep 13, 2025, Python 3

File details

Details for the file paper_voice-0.3.0.tar.gz.

File metadata

Download URL: paper_voice-0.3.0.tar.gz
Upload date: Sep 13, 2025
Size: 70.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for paper_voice-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`daec92e30a28917a313bc9b8027e1e522ea5c8524ba211b1c12f9373df8a16f2`
MD5	`157fb060addfe5fd643323ffad2ac118`
BLAKE2b-256	`9dd1e3227606adedb4bbbaae491090a8eedf2b46b76d98847494628427235dac`

File details

Details for the file paper_voice-0.3.0-py3-none-any.whl.

File metadata

Download URL: paper_voice-0.3.0-py3-none-any.whl
Upload date: Sep 13, 2025
Size: 80.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for paper_voice-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fe939ca2607d51fb5c9ffac721e8296bb5618650168c9e7eac3010cdc7a041a8`
MD5	`2e3c476b28f291bc705e7fdf0cc2cec3`
BLAKE2b-256	`7ec9ae3c2acc0a420e89ed3afc958d2664f4d2f7b0aab46f91cff7435928ef59`
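The published digests above can be compared against a downloaded file. This sketch uses only the standard library. The helper name `sha256_of_file` is hypothetical, and the file is read in blocks so large archives do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

For example, `sha256_of_file("paper_voice-0.3.0.tar.gz")` should equal the SHA256 digest listed above for that file.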
