Add Pinyin annotations to Chinese text in EPUB files with smart polyphonic character handling
EPUB Pinyin
A Python tool to add Pinyin annotations above Chinese characters in EPUB files. This tool helps Chinese language learners by automatically adding pronunciation guides to Chinese text while maintaining the EPUB format and structure.
Features
- Automatically detects Chinese characters in EPUB content
- Adds Pinyin annotations using <ruby> tags
- Preserves original EPUB structure and formatting
- Supports tone marks in Pinyin
- Handles complex EPUB structures
- Maintains original file organization
- Processes files in correct spine order
- NEW: Convert EPUB to PDF with Pinyin annotations
- NEW: Smart context-aware pronunciation for polyphonic characters
- NEW: Custom font support for Chinese text and Pinyin
- NEW: Professional PDF layout with proper Chinese typography
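The "correct spine order" feature refers to the reading order declared in an EPUB's package document (the .opf file), where the manifest maps ids to files and the spine lists those ids in sequence. The snippet below is a minimal stdlib-only sketch of that lookup, not the tool's own parser; `spine_order` and `SAMPLE_OPF` are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by EPUB package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Tiny illustrative package document: the manifest is deliberately out of order.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="c1"/>
    <itemref idref="c2"/>
  </spine>
</package>"""


def spine_order(opf_xml: str) -> List[str]:
    """Return manifest hrefs in the order given by the spine."""
    root = ET.fromstring(opf_xml)
    # Map each manifest id to the file it points at.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Walk the spine and resolve each idref, skipping dangling references.
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.get("idref") in manifest
    ]


print(spine_order(SAMPLE_OPF))
```

Reading the spine rather than sorting file names matters because file names inside an EPUB are not required to reflect reading order.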
Installation
You can install the package directly from PyPI:
pip install epub-pinyin
Or install from source:
git clone https://github.com/tony-develop-2025/epub-pinyin.git
cd epub-pinyin
pip install -e .
Usage
EPUB Processing
Command Line
After installation, you can use the tool from the command line:
epub-pinyin input.epub -o output.epub
Or use the module directly:
python -m epub_pinyin.main input.epub -o output.epub
Where:
- input.epub is your source EPUB file containing Chinese text
- -o output.epub (optional) specifies the output file name (defaults to "input_annotated.epub")
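The default output name follows an "_annotated" suffix pattern. A sketch of how such a name could be derived is shown here; `default_output_path` is a hypothetical helper, and placing the result next to the input file is an assumption rather than documented behavior.

```python
from pathlib import Path


def default_output_path(input_path):
    """Derive '<stem>_annotated<suffix>' alongside the input file (assumed location)."""
    src = Path(input_path)
    return src.with_name(f"{src.stem}_annotated{src.suffix}")


print(default_output_path("input.epub"))
```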
Python API
You can also use the tool programmatically in your Python code:
from epub_pinyin import process_epub
# Process an EPUB file
process_epub("input.epub", "output.epub")
PDF Conversion
Command Line
Convert EPUB to PDF with Pinyin annotations:
python -m epub_pinyin.main input.epub --pdf output.pdf
Or use the dedicated PDF converter:
python -m epub_pinyin.pdf_converter input.epub output.pdf
Python API
from epub_pinyin.pdf_converter import convert_epub_to_pdf
# Convert EPUB to PDF
convert_epub_to_pdf("input.epub", "output.pdf")
Advanced PDF Usage
For more control over PDF generation:
from epub_pinyin.epub_parser import EpubParser
from epub_pinyin.pdf_converter import PdfConverter
import tempfile
# Extract EPUB content
with tempfile.TemporaryDirectory() as temp_dir:
    parser = EpubParser(temp_dir)
    parser.extract_epub("input.epub")
    # Convert to PDF with custom settings
    converter = PdfConverter(parser)
    converter.convert_to_pdf("output.pdf")
Examples
EPUB Output
When rendered in an EPUB reader, the Pinyin appears above each character:
nǐ hǎo shì jiè
你好,世界!
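Behind that rendering is ordinary HTML ruby markup. The sketch below shows roughly what per-character annotation could look like; it is not the tool's implementation. `annotate` is a hypothetical function, and the small `SAMPLE_READINGS` table stands in for a real pronunciation source such as pypinyin.

```python
import re

# Matches one character from the basic CJK Unified Ideographs block.
HAN = re.compile(r"[\u4e00-\u9fff]")

# Hard-coded readings for the demo only; a real tool would look these up.
SAMPLE_READINGS = {"你": "nǐ", "好": "hǎo", "世": "shì", "界": "jiè"}


def annotate(text, readings):
    """Wrap each known Chinese character in <ruby>...<rt>pinyin</rt></ruby>."""
    def wrap(match):
        ch = match.group(0)
        reading = readings.get(ch)
        # Leave characters without a known reading untouched.
        if reading is None:
            return ch
        return f"<ruby>{ch}<rt>{reading}</rt></ruby>"

    return HAN.sub(wrap, text)


print(annotate("你好,世界!", SAMPLE_READINGS))
```

Punctuation and non-Chinese text pass through unchanged, which is why only the four characters in the example carry annotations.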
PDF Output
The PDF conversion creates a professional document with:
- Smart Pinyin: Context-aware pronunciation for polyphonic characters
  - 长大 (zhǎng dà) vs 长度 (cháng dù)
  - 银行 (yín háng) vs 行走 (xíng zǒu)
  - 觉得 (jué de) vs 得到 (dé dào)
- Professional Layout:
  - Custom Chinese fonts (SimSun, FangSong, STSong)
  - Separate font styling for Pinyin annotations
  - Proper Chinese typography rules
  - No punctuation at line start (except quotes)
- Typography Features:
  - Automatic line breaking with proper Chinese rules
  - Ruby text positioning for Pinyin
  - Optimized spacing and margins
  - Chapter and page organization
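The "no punctuation at line start" rule can be illustrated with a small wrapping routine. This is one possible strategy (letting the punctuation hang at the end of the previous line), written as a sketch; the converter's actual algorithm is not documented here, and `wrap_lines` and `NO_LINE_START` are hypothetical names.

```python
# Punctuation that should not begin a line; quotes are left out on purpose,
# matching the "except quotes" note above.
NO_LINE_START = set(",。!?;:、)》")


def wrap_lines(text, width):
    """Split text into lines of `width` characters, never starting a line with punctuation."""
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # If the next line would open with forbidden punctuation,
        # pull that punctuation onto the current line instead.
        while end < len(text) and text[end] in NO_LINE_START:
            end += 1
        lines.append(text[i:end])
        i = end
    return lines


print(wrap_lines("你好世界,欢迎你。", 4))
```

The first line ends up one character longer than the nominal width so that the comma stays attached to the text it follows.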
Requirements
Core Requirements
- Python 3.7 or higher
- beautifulsoup4
- pypinyin
- lxml
PDF Conversion Requirements
- reportlab (for PDF generation)
- Custom Chinese fonts (optional, will use system fonts as fallback)
Installation with PDF Support
# Install with PDF support
pip install epub-pinyin[pdf]
# Or install all dependencies
pip install epub-pinyin[dev]
Development
To set up the development environment:
- Clone the repository:
git clone https://github.com/tony-develop-2025/epub-pinyin.git
cd epub-pinyin
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install development dependencies:
pip install -r requirements-dev.txt
- Run tests:
pytest tests/
- Test PDF conversion:
# Create a test EPUB file first, then convert to PDF
python -m epub_pinyin.main test.epub --pdf test.pdf
Configuration
Custom Fonts
You can add custom Chinese fonts to improve PDF output quality:
- Place your font files in the epub_pinyin/fonts/ directory
- Supported formats: .ttf, .ttc
- The system will automatically detect and use available fonts
PDF Settings
The PDF converter includes several configurable options:
- Font sizes: Adjustable for title, chapter, body, and Pinyin text
- Margins: Customizable page margins
- Line spacing: Configurable line height
- Character width: Uniform character spacing for Chinese text
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Areas for Contribution
- Font Support: Add support for more Chinese font formats
- Typography: Improve Chinese typography rules
- Performance: Optimize PDF generation speed
- Testing: Add more comprehensive test cases
- Documentation: Improve user guides and examples
Troubleshooting
Common Issues
PDF Generation Fails
- Error: "No Chinese fonts found"
- Solution: Install ReportLab with PDF support: pip install epub-pinyin[pdf]
- Solution: Add custom fonts to the epub_pinyin/fonts/ directory
Pinyin Accuracy Issues
- Issue: Incorrect pronunciation for polyphonic characters
- Solution: The system uses context-aware pronunciation. For better accuracy, ensure proper word boundaries in your text.
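Why word boundaries matter can be seen in a toy phrase-first lookup: a known word decides the reading, and a character on its own falls back to a default. The code below is a conceptual sketch and not how the package (or pypinyin) is implemented; `read_pinyin` and both tables are hypothetical.

```python
# Toy dictionaries for illustration only.
PHRASES = {"银行": ["yín", "háng"], "行走": ["xíng", "zǒu"]}
SINGLE = {"银": "yín", "行": "xíng", "走": "zǒu"}


def read_pinyin(text, phrases=PHRASES, single=SINGLE, max_len=4):
    """Greedy longest-match: prefer multi-character words, else per-character readings."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate word first, down to two characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in phrases:
                out.extend(phrases[chunk])
                i += n
                break
        else:
            # No word matched: use the default single-character reading.
            out.append(single.get(text[i], text[i]))
            i += 1
    return out


print(read_pinyin("银行"))
print(read_pinyin("行走"))
```

If stray spaces or markup split 银行 into separate characters, the lookup would only see 行 alone and pick the default reading, which is the kind of error clean word boundaries avoid.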
Font Display Issues
- Issue: Chinese characters not displaying correctly in PDF
- Solution: Check that your font files are valid and supported
- Solution: Try different font formats (.ttf, .ttc)
Performance Tips
- For large EPUB files, PDF conversion may take some time
- Consider processing chapters separately for very large documents
- Ensure sufficient disk space for temporary files during conversion
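A quick pre-flight check for free disk space can be done with the standard library. This is a generic sketch, not part of the package; `has_free_space` is a hypothetical helper, and how much space a conversion actually needs is left to the caller to estimate.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem containing `path` has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


# Example: check the current directory for roughly 100 MB of headroom.
print(has_free_space(".", 100 * 1024 * 1024))
```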
License
This project is licensed under the MIT License - see the LICENSE file for details.