Convert academic papers to high-quality audio with precise mathematical explanations and intelligent content processing
Project description
Paper Voice
Convert academic papers to high-quality audio narration with precise mathematical explanations, enhanced figure descriptions, and intelligent content processing.
Features
- 🧮 Precise Math Explanations: LLM-powered contextual explanations of mathematical expressions
- 🖼️ Enhanced Figure Descriptions: AI-generated audio-friendly descriptions of figures and tables
- 📄 Multi-Format Support: PDFs, LaTeX, Markdown, and plain text with math notation
- 🔗 ArXiv Integration: Direct download and processing of LaTeX source with figures
- ⚡ Batch Processing: Process multiple papers simultaneously with parallel execution
- 🎯 Selective Enhancement: Preserves original text while enhancing only math, figures, and tables
- 🗣️ Multiple TTS Options: OpenAI TTS (with chunking) or offline pyttsx3
- 💻 CLI & Web Interface: Full command-line interface plus Streamlit web app
- 👁️ Vision Analysis: Optional GPT-4V integration for superior PDF content extraction
Installation
From PyPI (Recommended)
pip install paper_voice
From Source
git clone https://github.com/gojiplus/paper_voice.git
cd paper_voice
pip install -e .
Usage
Command Line Interface
Paper Voice includes a comprehensive CLI for batch processing and automation:
# Single file processing
paper_voice paper.pdf --api-key YOUR_KEY
# Process arXiv paper
paper_voice https://arxiv.org/abs/2301.12345 --api-key YOUR_KEY --output research.mp3
# LaTeX file with custom voice
paper_voice paper.tex --latex --api-key YOUR_KEY --voice nova
# Vision-enhanced PDF analysis
paper_voice paper.pdf --vision --api-key YOUR_KEY --output enhanced.mp3
# Batch processing a directory
paper_voice --batch papers/ --api-key YOUR_KEY --output-dir ./audio_output
# Batch processing multiple files
paper_voice --batch paper1.pdf paper2.pdf paper3.tex --api-key YOUR_KEY --output-dir ./batch_output
CLI Options
--batch: Enable batch processing for multiple files/directories--vision: Use GPT-4V for enhanced PDF analysis--voice: Choose TTS voice (alloy, echo, fable, onyx, nova, shimmer)--speed: Adjust speech speed (0.25 to 4.0)--max-workers: Set concurrent workers for batch processing (default: 3)--output-dir: Output directory for batch processing--offline: Use offline TTS instead of OpenAI TTS--no-enhancement: Skip LLM enhancement for faster processing
Web Interface
streamlit run streamlit/app.py
Upload a PDF, LaTeX file, or enter text directly. For best results with mathematical content, provide an OpenAI API key to enable LLM-powered explanations.
Python API
from paper_voice import pdf_utils, math_to_speech
from paper_voice.arxiv_downloader import download_arxiv_paper
# Extract text from PDF
pages = pdf_utils.extract_raw_text("paper.pdf")
# Process mathematical expressions
processed = math_to_speech.process_text_with_math(pages[0])
print(processed)
# Download from arXiv
paper = download_arxiv_paper("2301.12345")
if paper:
print(f"Downloaded: {paper.title}")
print(f"LaTeX content: {len(paper.latex_content)} characters")
✨ What's New in v0.2.0
Precise Mathematical Explanations
Instead of basic conversions like "alpha squared plus beta", Paper Voice now generates contextual explanations:
Before: $\alpha^2 + \beta = \gamma$ → "alpha squared plus beta equals gamma"
After: $\alpha^2 + \beta = \gamma$ → "In machine learning context, this equation shows that the outcome gamma is determined by the sum of the learning rate alpha squared and the regularization parameter beta..."
Selective Enhancement
- ✅ Only enhances: Math expressions, figure captions, table descriptions
- ✅ Preserves exactly: All other academic text, structure, and content
- ❌ No summarization: Original text remains unchanged
Enhanced LaTeX Processing
LaTeX files now get full LLM enhancement while preserving document structure:
- Mathematical expressions → Contextual explanations
- Figure captions → Audio-friendly descriptions
- Table content → Clear narration
- Regular text → Preserved exactly as written
Requirements
- Python 3.9+ (excluding 3.9.7)
- OpenAI API key (optional but recommended for enhanced explanations)
- ffmpeg (for audio processing)
Advanced Features
Batch Processing
Process multiple papers efficiently with parallel processing:
# Process all PDFs in a directory
paper_voice --batch research_papers/ --api-key YOUR_KEY --output-dir audio_papers/
# Process specific files with custom settings
paper_voice --batch paper1.pdf paper2.tex paper3.pdf \
--api-key YOUR_KEY \
--output-dir batch_output/ \
--max-workers 5 \
--voice nova \
--vision
ArXiv Integration
Download and process papers directly from arXiv:
# Single arXiv paper
paper_voice https://arxiv.org/abs/2301.12345 --api-key YOUR_KEY
# Multiple arXiv papers
paper_voice --batch \
https://arxiv.org/abs/2301.12345 \
https://arxiv.org/abs/2302.67890 \
--api-key YOUR_KEY \
--output-dir arxiv_audio/
Vision-Enhanced PDF Analysis
Use GPT-4V for superior content extraction:
paper_voice complex_paper.pdf --vision --api-key YOUR_KEY
Examples
See the demos/ directory for usage examples:
demos/basic_usage.py- Simple math processing examplesdemos/before_after_comparison.py- Shows improvement from LLM explanations
See the tests/ directory for comprehensive test cases including batch processing tests.
Contributing
See CONTRIBUTING.md for development guidelines and how to contribute to the project.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file paper_voice-0.2.0.tar.gz.
File metadata
- Download URL: paper_voice-0.2.0.tar.gz
- Upload date:
- Size: 80.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2da7782d6082b3176e3ddc5bf0d50167e8c86ffd7150dde1f20bd415103c4085
|
|
| MD5 |
7f4b6a0eb341ef346ff83956ae2b0f19
|
|
| BLAKE2b-256 |
6261ab7fc4ab5a6a7d671b3721c5de662ffd81bb78bb80b9c4dff09b755fedd3
|
File details
Details for the file paper_voice-0.2.0-py3-none-any.whl.
File metadata
- Download URL: paper_voice-0.2.0-py3-none-any.whl
- Upload date:
- Size: 91.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3ccfc3b0f3b2d8efa7b9fa84879b6c7f8148492c4e6c726aa7608a05fec2331b
|
|
| MD5 |
d8fa1663f977f2364825c8c9650f46f1
|
|
| BLAKE2b-256 |
9941bf02cc3c1c428ffe71fa5cc33fd16197517eb3593e3de293c4275ec13fa5
|