Paper Voice
Convert academic papers to high-quality audio narration with precise mathematical explanations using a simplified LLM-powered approach.
Features
- 🧮 Natural Math Narration: Professor-style explanations of mathematical expressions
- 📄 Multi-Format Support: PDFs, LaTeX, Markdown, and plain text with math notation
- 🎯 Simple LLM Enhancement: Single comprehensive prompt for natural audio conversion
- 🗣️ Multiple TTS Options: OpenAI TTS (with chunking) or offline pyttsx3
- 💻 Web Interface: Easy-to-use Streamlit web app
- ⚡ Intelligent Chunking: Splits large documents automatically to stay within OpenAI API limits
Installation
From PyPI (Recommended)
pip install paper_voice
From Source
git clone https://github.com/gojiplus/paper_voice.git
cd paper_voice
pip install -e .
Usage
Web Interface (Recommended)
streamlit run paper_voice/streamlit/app.py
Upload a PDF, LaTeX file, or enter text directly. Provide an OpenAI API key for LLM-enhanced natural language conversion of mathematical expressions.
Python API
Simple Enhancement (New in v0.3.0)
from paper_voice.simple_llm_enhancer import enhance_document_simple
# Convert any academic content with math to natural language
content = "The equation $E = mc^2$ represents energy-mass equivalence."
enhanced = enhance_document_simple(content, api_key="your-openai-key")
print(enhanced)
# Output: "The equation energy equals mass times the speed of light squared represents energy-mass equivalence."
Complete Workflow
from paper_voice import pdf_utils
from paper_voice.simple_llm_enhancer import enhance_document_simple
from paper_voice import tts
# 1. Extract text from PDF
pages = pdf_utils.extract_raw_text("paper.pdf")
content = '\n\n'.join(pages)
# 2. Enhance with LLM (converts math to natural language)
enhanced_script = enhance_document_simple(content, api_key="your-openai-key")
# 3. Generate audio
tts.synthesize_speech_chunked(
    enhanced_script,
    "output.mp3",
    use_openai=True,
    api_key="your-openai-key"
)
LaTeX Processing
from paper_voice.content_processor import process_content_unified
latex_content = r"""
\documentclass{article}
\begin{document}
The algorithm minimizes $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$.
\end{document}
"""
processed = process_content_unified(
    content=latex_content,
    input_type='latex',
    api_key='your-openai-key',
    use_llm_enhancement=True
)
print(processed.enhanced_text)
✨ What's New in v0.3.0
Simplified LLM Architecture
- Single comprehensive prompt: Handles all math conversion in one API call
- Professor-style narration: Natural explanations instead of robotic "subscript" language
- Intelligent chunking: Automatically handles large documents within OpenAI limits
- Better error handling: Clear failures instead of silent returns
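Because failures now surface as exceptions rather than silent returns, a caller that wants to keep going has to handle them explicitly. The following is a minimal sketch of one way to do that, not part of Paper Voice itself: `enhance_or_fallback` is a hypothetical helper, and catching the broad `Exception` is an assumption, since the exact exception types raised by the enhancer are not documented here.

```python
from typing import Callable


def enhance_or_fallback(
    text: str,
    enhance: Callable[[str], str],
    on_error: Callable[[Exception], None] = lambda exc: None,
) -> tuple[str, bool]:
    """Return (script, enhanced_ok); fall back to the raw text if enhance raises."""
    try:
        return enhance(text), True
    except Exception as exc:  # assumption: exact exception type is not documented
        on_error(exc)
        return text, False


def failing_enhancer(text: str) -> str:
    """Stand-in enhancer that always fails, used only for this demonstration."""
    raise RuntimeError("simulated API failure")


script, ok = enhance_or_fallback("Let $x$ be real.", failing_enhancer, on_error=print)
print(ok, script)
```

With the real library, the `enhance` argument could be something like `lambda t: enhance_document_simple(t, api_key="your-openai-key")`.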
Natural Mathematical Explanations
Before: $p_C$ → "p subscript C"
After: $p_C$ → "p underscore capital C, the proportion of compliers"
Complex expressions:
$F_{1C}$ → "F underscore one capital C, the outcome distribution for treated compliers"
$E = mc^2$ → "energy equals mass times the speed of light squared"
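To preview which expressions in a document would be narrated this way, it can help to list the inline math spans before calling the LLM. The sketch below is only an illustration and does not reflect how Paper Voice locates math internally: `find_inline_math` is a hypothetical helper, and it assumes single-dollar delimiters with no escaped dollar signs.

```python
import re

# Matches inline math delimited by single dollar signs, e.g. $p_C$.
# Assumption: no escaped dollars (\$); display math ($$...$$) is skipped.
INLINE_MATH = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)")


def find_inline_math(text: str) -> list[str]:
    """Return the LaTeX source of each inline math span, in document order."""
    return [m.group(1).strip() for m in INLINE_MATH.finditer(text)]


sample = "Let $p_C$ be the share of compliers and $F_{1C}$ their outcome distribution."
print(find_inline_math(sample))
```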
Key API Changes
- Main function: simple_llm_enhancer.enhance_document_simple()
- Smart chunking for documents > 128K tokens
- Single LLM call for most documents
- Professor-style math conversion prompt
Requirements
- Python 3.9+ (excluding 3.9.7)
- OpenAI API key (required for LLM enhancement)
- pydub (for audio chunking)
- PyPDF2 or PyMuPDF (for PDF processing)
Optional Dependencies
# For better PDF processing
pip install PyMuPDF
# For offline TTS
pip install pyttsx3
# For audio format conversion
# Install ffmpeg via your system package manager
Architecture
Paper Voice uses a clean modular pipeline:
PDF → LaTeX/Markdown → LLM Enhancement → TTS
- PDF Extraction: Extract text with pdf_utils.extract_raw_text()
- LLM Enhancement: Convert math to natural language with simple_llm_enhancer.enhance_document_simple()
- Audio Generation: Create audio with tts.synthesize_speech_chunked()
Examples
Basic Usage
from paper_voice.simple_llm_enhancer import enhance_document_simple
# Simple math conversion
text = "The learning rate α controls convergence of $\\theta^* = \\arg\\min J(\\theta)$."
enhanced = enhance_document_simple(text, "your-api-key")
# Result: Natural professor-style explanation of the math
With Progress Tracking
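from paper_voice.simple_llm_enhancer import enhance_document_simple
def progress_callback(message):
    # Receives status messages from the enhancer
    print(f"Progress: {message}")
# content and api_key are defined as in the earlier examples
enhanced = enhance_document_simple(
    content,
    api_key,
    progress_callback=progress_callback
)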
Large Document Handling
The system automatically handles large documents:
- Documents < 128K tokens: Single LLM call
- Documents > 128K tokens: Intelligent chunking with natural breakpoints
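The chunking rule above can be sketched as a greedy packer that breaks only at paragraph boundaries. This is an illustration under stated assumptions, not the library's actual implementation: `estimate_tokens` and `chunk_by_paragraph` are hypothetical names, and the four-characters-per-token estimate is a rough heuristic (a real tokenizer would be more precise).

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def chunk_by_paragraph(text: str, max_tokens: int = 128_000) -> list[str]:
    """Greedily pack paragraphs into chunks that stay within max_tokens.

    A single paragraph larger than the budget is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        cost = estimate_tokens(para)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Ten paragraphs of about 103 estimated tokens each, with a small demo budget.
doc = "\n\n".join(f"Paragraph {i} " + "x" * 400 for i in range(10))
print(len(chunk_by_paragraph(doc, max_tokens=250)))
```

A document under the budget comes back as a single chunk, which corresponds to the single-call case.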
Configuration
Set your OpenAI API key:
export OPENAI_API_KEY="your-key-here"
Or pass it directly to functions:
enhanced = enhance_document_simple(content, api_key="your-key")
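A script that accepts the key from either source may want to fail early when neither is set. The helper below is a minimal sketch of that idea; `resolve_api_key` is a hypothetical name and is not provided by Paper Voice.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Prefer an explicitly passed key, then fall back to OPENAI_API_KEY."""
    key = explicit or os.environ.get("OPENAI_API_KEY", "")
    if not key.strip():
        raise RuntimeError("No OpenAI API key: pass one or set OPENAI_API_KEY")
    return key.strip()


# An explicit argument takes precedence over the environment variable.
print(resolve_api_key("your-key"))
```

The returned value can then be passed as `api_key` to `enhance_document_simple`.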
License
MIT License - see LICENSE file for details.