Convert academic PDFs to spoken audio, handling math, tables and figures
Project description
Paper Voice
Convert academic PDFs to audio with proper mathematical notation handling.
Features
- Math handling: Converts LaTeX expressions to natural speech (e.g., $\alpha^2$ → "alpha squared")
- LLM-powered explanations: Uses GPT-4 to explain complex mathematical expressions clearly
- Figure/table summaries: Extracts and summarizes figure captions and table content
- Multiple TTS options: Offline (pyttsx3) or OpenAI TTS
- Direct LaTeX/Markdown support: Process text with math notation directly
- Web interface: Streamlit app for easy file uploads and processing
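To make the math handling feature concrete, here is a minimal sketch of the kind of rule-based conversion that maps $\alpha^2$ to "alpha squared". It is not Paper Voice's implementation: the names `SYMBOLS`, `POWERS`, `speak_inline_math`, and `speak_text` are hypothetical, and the vocabulary covers only a handful of Greek letters and simple exponents.

```python
import re

# Hypothetical, deliberately tiny vocabulary; a real converter needs far more entries.
SYMBOLS = {
    r"\alpha": "alpha",
    r"\beta": "beta",
    r"\gamma": "gamma",
    r"\sigma": "sigma",
    r"\pi": "pi",
}
POWERS = {"2": "squared", "3": "cubed"}


def speak_inline_math(expr: str) -> str:
    """Turn a very small subset of LaTeX (Greek letters, simple powers) into words."""

    def power(match: re.Match) -> str:
        # The exponent is either braced (^{2}) or a single character (^2).
        exponent = match.group(1) or match.group(2)
        return " " + POWERS.get(exponent, f"to the power of {exponent}")

    spoken = re.sub(r"\^(?:\{([^{}]+)\}|(\w))", power, expr)
    # Plain string replacement is enough for this sketch; it does not guard
    # against longer commands that share a prefix with a known symbol.
    for command, word in SYMBOLS.items():
        spoken = spoken.replace(command, word)
    return " ".join(spoken.split())


def speak_text(text: str) -> str:
    """Replace every $...$ span in `text` with its spoken form."""
    return re.sub(r"\$([^$]+)\$", lambda m: speak_inline_math(m.group(1)), text)


print(speak_text(r"The variance is $\sigma^2$ and the rate is $\alpha^{3}$."))
```

Anything outside this small vocabulary passes through unchanged, which is the kind of case the LLM-powered explanations are meant to cover.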
Installation
git clone <your-repo-url>
cd paper_voice
pip install -e .
Usage
Web Interface
streamlit run paper_voice/streamlit/app.py
Upload a PDF, LaTeX file, or enter text directly. For best results with mathematical content, provide an OpenAI API key to enable LLM-powered explanations.
Python API
from paper_voice import pdf_utils, math_to_speech
# Extract text from PDF
pages = pdf_utils.extract_raw_text("paper.pdf")
# Process mathematical expressions
processed = math_to_speech.process_text_with_math(pages[0])
print(processed)
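The snippet above processes only the first page. A possible extension, sketched below, runs every page through the math processor and writes the result to an audio file with the offline pyttsx3 engine. The helper names `build_narration`, `save_speech`, and `narrate_pdf` are hypothetical and not part of Paper Voice. The sketch also assumes that `extract_raw_text` returns a sequence of page strings, as the `pages[0]` usage suggests.

```python
from typing import Callable, Iterable


def build_narration(pages: Iterable[str], process: Callable[[str], str]) -> str:
    """Apply `process` to every non-empty page and join the results."""
    spoken = [process(page) for page in pages if page.strip()]
    return "\n\n".join(spoken)


def save_speech(text: str, out_path: str) -> None:
    """Write `text` to an audio file using the offline pyttsx3 engine."""
    import pyttsx3  # imported lazily so build_narration works without it

    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)
    engine.runAndWait()  # blocks until the queued file has been written


def narrate_pdf(pdf_path: str, out_path: str) -> None:
    """Hypothetical end-to-end helper: PDF in, audio file out."""
    from paper_voice import pdf_utils, math_to_speech

    # Assumption: extract_raw_text returns one string per page.
    pages = pdf_utils.extract_raw_text(pdf_path)
    narration = build_narration(pages, math_to_speech.process_text_with_math)
    save_speech(narration, out_path)
```

Calling `narrate_pdf("paper.pdf", "paper.wav")` would then produce one audio file for the whole paper. Passing the processor in as a parameter keeps `build_narration` easy to test without a PDF or a speech engine.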
Requirements
- Python 3.9+ (excluding 3.9.7)
- OpenAI API key (optional, for enhanced math explanations)
- ffmpeg (for audio processing)
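The requirements above can be checked before running the tool. The following is a small sketch using only the standard library; the function names are hypothetical and Paper Voice itself may perform different checks, or none.

```python
import shutil
import sys
from typing import Optional, Tuple


def python_version_ok(version: Optional[Tuple[int, int, int]] = None) -> bool:
    """Return True for Python 3.9 or newer, except the excluded 3.9.7."""
    if version is None:
        version = tuple(sys.version_info[:3])
    return version[:2] >= (3, 9) and version[:3] != (3, 9, 7)


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if not python_version_ok():
    print("Unsupported Python version: need 3.9+ and not 3.9.7")
if not ffmpeg_available():
    print("ffmpeg not found on PATH; audio processing may fail")
```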
Applications
There are several Streamlit applications available:
- paper_voice/streamlit/app.py - Basic PDF to audio conversion
- paper_voice/streamlit/app_with_llm.py - Enhanced with LLM math explanations
- paper_voice/streamlit/app_enhanced.py - Full-featured version with LaTeX/Markdown support
Examples
See the demos/ directory for usage examples:
- demos/basic_usage.py - Simple math processing examples
- demos/before_after_comparison.py - Shows improvement from LLM explanations
See the tests/ directory for test cases.
File details
Details for the file paper_voice-0.1.0.tar.gz.
File metadata
- Download URL: paper_voice-0.1.0.tar.gz
- Upload date:
- Size: 54.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb33b1f59e4e1b0e0f94d85ce73ac1099a6b47984737779b89998cf64a25d635 |
| MD5 | 99a1957f5528635eaac18fdcbe349d62 |
| BLAKE2b-256 | efa45c4947d87f090c40274ae1b2ae6e2221675a6f464c754e3798837f7c1f57 |
File details
Details for the file paper_voice-0.1.0-py3-none-any.whl.
File metadata
- Download URL: paper_voice-0.1.0-py3-none-any.whl
- Upload date:
- Size: 61.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b39597bf9fcf8ad5d510a7e09a4411038475384f4adbe3219b3ef93cd4ae25c0 |
| MD5 | f3b38d542bf67d70154599508cb8d128 |
| BLAKE2b-256 | 33c54efa0ca758198b263974a25eacf142072f0d3d525ea4b55c975ceb9562ba |