Tiny helper that lists fonts used in a PDF via MuPDF (mutool).
PDF Font Checker
A lightweight Python utility that extracts and lists all fonts used in PDF documents using MuPDF's mutool command-line tool.
Features
Key features:
- Font extraction from PDF files
- Multiple mutool output format parsing
- Automatic MuPDF installation
- Command-line interface
- Python API
- Error handling
- Linux and macOS compatibility
- Package building and distribution
Installation
From PyPI (recommended)
pip install pdf-font-checker
From Source
git clone https://github.com/genie360s/pdf-font-checker.git
cd pdf-font-checker
pip install -e .
Dependencies
This package requires MuPDF's mutool command-line tool. The package will attempt to automatically install it using your system's package manager:
- macOS: via Homebrew (brew install mupdf-tools)
- Linux: via apt, dnf, yum, pacman, or zypper (mupdf-tools or mupdf)
Note: This package currently supports Linux and macOS only. Windows support is not available at this time.
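The lookup behind automatic installation can be pictured as a probe of the PATH for each package manager listed above. The following is a minimal sketch, not the package's actual code: the function name `detect_package_manager`, the probe order, and the use of `apt-get` as the binary for apt are all assumptions made for illustration.

```python
import shutil
from typing import Callable, Optional

# Package managers named in the list above. The probe order is an assumption
# of this sketch, not documented behaviour of pdf-font-checker.
CANDIDATE_MANAGERS = ("brew", "apt-get", "dnf", "yum", "pacman", "zypper")


def detect_package_manager(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate manager found on PATH, or None.

    `which` is injectable so the lookup can be exercised without
    depending on what is installed on the current machine.
    """
    for name in CANDIDATE_MANAGERS:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    print(detect_package_manager())
```

If the function returns None, no supported manager was found, and manual installation is the remaining option.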
If automatic installation fails, you can install MuPDF manually:
Manual Installation
macOS
brew install mupdf-tools
Ubuntu/Debian
sudo apt-get update
sudo apt-get install mupdf-tools
Fedora/CentOS/RHEL
sudo dnf install mupdf-tools
# or on older systems:
sudo yum install mupdf-tools
Arch Linux
sudo pacman -S mupdf-tools
Platform Support
This package is designed to work on:
- Linux (all major distributions)
- macOS (Intel and Apple Silicon)
Windows is not supported.
Usage
Quick Reference
| Command | Output Format | Description |
|---|---|---|
| pdf-font-checker file.pdf | Text list | Font names only |
| pdf-font-checker file.pdf --detailed | Formatted text | Full PDF analysis |
| pdf-font-checker file.pdf --dict | Python dict | Structured data |
| pdf-font-checker file.pdf --dict --json | JSON | Structured JSON |

| Python Function | Return Type | Description |
|---|---|---|
| get_pdf_info_dict() | dict | Recommended - Structured data |
| get_pdf_terminal_output() | str | Terminal-style formatted output |
| list_pdf_fonts() | list | Font names only |
| analyze_pdf() | dict | Complete analysis with all metadata |
Command Line Interface
Basic Usage - Font Names Only
Extract just the font names from a PDF file:
pdf-font-checker document.pdf
Output:
Helvetica
AZHGJL+ArialMT
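The `AZHGJL+` prefix in the output above is a subset tag: PDF producers that embed only part of a font prepend six uppercase letters and a plus sign to its name. If the base font name is what matters, the tag can be stripped after extraction. A minimal sketch follows; `strip_subset_tag` is a hypothetical helper and not part of pdf-font-checker.

```python
import re

# A subset tag is exactly six uppercase ASCII letters followed by "+".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def strip_subset_tag(font_name: str) -> str:
    """Return the font name without a leading subset tag, if one is present."""
    return _SUBSET_TAG.sub("", font_name)


if __name__ == "__main__":
    for name in ["Helvetica", "AZHGJL+ArialMT"]:
        print(strip_subset_tag(name))
```

Names without a well-formed tag pass through unchanged, so the helper is safe to apply to every entry in a font list.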
Detailed Analysis
Get comprehensive PDF metadata including version, pages, and detailed font information:
pdf-font-checker document.pdf --detailed
Output:
PDF Version: PDF-1.4
Pages: 2
Info Object (20 0 R): <</ModDate(D:20250207153904+03'00')/Creator(JasperReports Library version 6.6.0)/CreationDate(D:20250207153904+03'00')/Producer(iText 2.1.7 by 1T3XT)>>
Fonts (2):
1 (2 0 R): Type1 'Helvetica' WinAnsiEncoding (3 0 R)
1 (2 0 R): Type0 'AZHGJL+ArialMT' Identity-H (4 0 R)
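Each line under "Fonts" in the output above follows a regular shape, so it can be split into fields with a regular expression. The sketch below is an illustration only: the field names (`page`, `page_ref`, `type`, `name`, `encoding`, `font_ref`) are assumptions about what each column means, and the pattern is fitted to the two sample lines rather than to every format mutool may emit.

```python
import re
from typing import Any, Dict, Optional

# Pattern fitted to lines such as:
#   1 (2 0 R): Type1 'Helvetica' WinAnsiEncoding (3 0 R)
FONT_LINE = re.compile(
    r"^\s*(?P<page>\d+)\s+\((?P<page_ref>\d+ \d+ R)\):\s+"
    r"(?P<type>\S+)\s+'(?P<name>[^']*)'\s+"
    r"(?P<encoding>\S+)\s+\((?P<font_ref>\d+ \d+ R)\)\s*$"
)


def parse_font_line(line: str) -> Optional[Dict[str, Any]]:
    """Split one font line into named fields, or return None if it does not match."""
    match = FONT_LINE.match(line)
    if match is None:
        return None
    fields: Dict[str, Any] = match.groupdict()
    fields["page"] = int(fields["page"])  # the leading number is kept as an int
    return fields


if __name__ == "__main__":
    print(parse_font_line("1 (2 0 R): Type1 'Helvetica' WinAnsiEncoding (3 0 R)"))
```

Lines that are not font entries, such as the "Pages:" header, return None and can be skipped by the caller.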
Dictionary Format
Get structured data in dictionary format:
pdf-font-checker document.pdf --dict
Output:
{'pdf_version': 'PDF-1.4', 'total_no_of_fonts': 2, 'font_names': ['Helvetica', 'AZHGJL+ArialMT'], 'info_object': '20 0 R'}
JSON Output
Get any output in JSON format:
pdf-font-checker document.pdf --dict --json
Output:
{
"pdf_version": "PDF-1.4",
"total_no_of_fonts": 2,
"font_names": [
"Helvetica",
"AZHGJL+ArialMT"
],
"info_object": "20 0 R"
}
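JSON output is convenient when another program consumes the result, for example to flag fonts that are not on an approved list. The following sketch assumes the JSON has the shape shown above; `unexpected_fonts` and the allow-list are hypothetical names introduced for illustration.

```python
import json
from typing import List, Set


def unexpected_fonts(json_text: str, allowed: Set[str]) -> List[str]:
    """Return font names from the JSON report that are not in `allowed`."""
    info = json.loads(json_text)
    return [name for name in info["font_names"] if name not in allowed]


if __name__ == "__main__":
    # Sample text copied from the JSON output shown above.
    sample = """
    {
      "pdf_version": "PDF-1.4",
      "total_no_of_fonts": 2,
      "font_names": ["Helvetica", "AZHGJL+ArialMT"],
      "info_object": "20 0 R"
    }
    """
    print(unexpected_fonts(sample, allowed={"Helvetica"}))
```

In a pipeline, the text passed to the function would typically be the captured standard output of the command above.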
Disable Automatic Installation
Disable automatic MuPDF installation:
pdf-font-checker --no-auto-install document.pdf
Python API
1. Dictionary Format (Recommended)
Get structured PDF information in a simple dictionary format:
from pdf_font_checker import get_pdf_info_dict
# Get structured PDF info
result = get_pdf_info_dict("document.pdf")
print(result)
# Output: {'pdf_version': 'PDF-1.4', 'total_no_of_fonts': 2, 'font_names': ['Helvetica', 'AZHGJL+ArialMT'], 'info_object': '20 0 R'}
# Access individual fields
print(f"PDF Version: {result['pdf_version']}")
print(f"Number of fonts: {result['total_no_of_fonts']}")
print(f"Font names: {result['font_names']}")
print(f"Info object: {result['info_object']}")
2. Terminal-Style Output
Get the exact same output as the command line:
from pdf_font_checker import get_pdf_terminal_output
# Detailed output (same as --detailed)
detailed_output = get_pdf_terminal_output("document.pdf", detailed=True)
print(detailed_output)
# Simple output (just font names)
simple_output = get_pdf_terminal_output("document.pdf", detailed=False)
print(simple_output)
3. Font Names Only (Legacy)
Get just the list of font names:
from pdf_font_checker import list_pdf_fonts
fonts = list_pdf_fonts("document.pdf")
print("Fonts found:")
for font in fonts:
    print(f" - {font}")
4. Complete Analysis
Get full detailed analysis with all metadata:
from pdf_font_checker import analyze_pdf
analysis = analyze_pdf("document.pdf")
print(f"PDF Version: {analysis['pdf_version']}")
print(f"Pages: {analysis['pages']}")
print(f"Font count: {analysis['font_count']}")
# Detailed font information
for font in analysis['fonts']:
    print(f"Font: {font['name']} (Type: {font['type']}, Page: {font['page']})")
5. JSON Output in Python
Convert any result to JSON:
import json
from pdf_font_checker import get_pdf_info_dict
result = get_pdf_info_dict("document.pdf")
json_output = json.dumps(result, indent=2)
print(json_output)
Advanced Usage
Process Multiple PDF Files
from pdf_font_checker import get_pdf_info_dict, ensure_mutool
import json
# Ensure mutool is available before processing multiple files
ensure_mutool()
pdf_files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
results = []
for pdf_file in pdf_files:
    try:
        # mutool was already verified by ensure_mutool() above, so skip the per-file check
        info = get_pdf_info_dict(pdf_file, ensure=False)
        info['filename'] = pdf_file
        results.append(info)
        print(f"[x] Processed {pdf_file}: {info['total_no_of_fonts']} fonts")
    except Exception as e:
        print(f"[ ] Error processing {pdf_file}: {e}")
# Save results to JSON file
with open('pdf_analysis_results.json', 'w') as f:
    json.dump(results, f, indent=2)
print(f"\nProcessed {len(results)} files successfully")
Extract Unique Fonts Across Multiple PDFs
from pdf_font_checker import get_pdf_info_dict
pdf_files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
all_fonts = set()
pdf_versions = set()
for pdf_file in pdf_files:
    try:
        info = get_pdf_info_dict(pdf_file)
        all_fonts.update(info['font_names'])
        pdf_versions.add(info['pdf_version'])
        print(f"{pdf_file}: {info['total_no_of_fonts']} fonts")
    except Exception as e:
        print(f"Error processing {pdf_file}: {e}")
print(f"\nSummary:")
print(f"Total unique fonts: {len(all_fonts)}")
print(f"PDF versions found: {sorted(pdf_versions)}")
print(f"Font list: {sorted(all_fonts)}")
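Beyond the set of unique fonts, it can be useful to know how many documents use each font. The sketch below works on dictionaries shaped like the `get_pdf_info_dict` results shown earlier; `font_document_counts` is a hypothetical helper, and the sample data is made up for the demonstration.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def font_document_counts(infos: Iterable[Mapping[str, Any]]) -> Counter:
    """Count, for each font name, the number of documents that use it."""
    counts: Counter = Counter()
    for info in infos:
        # set() ensures a font listed twice in one document is counted once
        counts.update(set(info["font_names"]))
    return counts


if __name__ == "__main__":
    # Illustrative data only, not real extraction results.
    infos = [
        {"font_names": ["Helvetica", "Arial"]},
        {"font_names": ["Helvetica"]},
    ]
    for name, n in font_document_counts(infos).most_common():
        print(f"{name}: used in {n} document(s)")
```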
Error Handling
from pdf_font_checker import get_pdf_info_dict
try:
    result = get_pdf_info_dict("document.pdf")
    if result['total_no_of_fonts'] == 0:
        print("No fonts found in PDF")
    else:
        print(f"Found {result['total_no_of_fonts']} fonts")
except FileNotFoundError:
    print("PDF file not found")
except RuntimeError as e:
    print(f"PDF processing error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Output Example
$ pdf-font-checker sample.pdf
Arial-Bold
Helvetica
TimesNewRomanPSMT
Calibri-Light
Verdana-Italic
API Reference
get_pdf_info_dict(pdf_path, ensure=True, auto_install=True)
[RECOMMENDED] Extract PDF information in a structured dictionary format.
Parameters:
- pdf_path (str): Path to the PDF file
- ensure (bool, default=True): Check for mutool availability before processing
- auto_install (bool, default=True): Attempt to install MuPDF tools automatically
Returns:
Dict[str, Any]: Dictionary containing:
- pdf_version: PDF version (e.g., "PDF-1.4")
- total_no_of_fonts: Number of fonts
- font_names: List of font names
- info_object: Info object reference
Example:
result = get_pdf_info_dict("document.pdf")
# {'pdf_version': 'PDF-1.4', 'total_no_of_fonts': 2, 'font_names': ['Helvetica', 'Arial'], 'info_object': '20 0 R'}
get_pdf_terminal_output(pdf_path, detailed=True, ensure=True, auto_install=True)
Get PDF analysis output formatted exactly like the terminal command.
Parameters:
- pdf_path (str): Path to the PDF file
- detailed (bool, default=True): If True, return detailed output; if False, return just font names
- ensure (bool, default=True): Check for mutool availability before processing
- auto_install (bool, default=True): Attempt to install MuPDF tools automatically
Returns:
str: Formatted output exactly like terminal command
analyze_pdf(pdf_path, ensure=True, auto_install=True)
Extract comprehensive PDF metadata and font information.
Parameters:
- pdf_path (str): Path to the PDF file
- ensure (bool, default=True): Check for mutool availability before processing
- auto_install (bool, default=True): Attempt to install MuPDF tools automatically
Returns:
Dict[str, Any]: Comprehensive analysis containing:
- pdf_version: PDF version
- info_object: Info object data with reference and content
- pages: Number of pages
- fonts: List of detailed font dictionaries
- font_count: Total number of fonts
- font_names: List of font names
list_pdf_fonts(pdf_path, ensure=True, auto_install=True)
Extract font names from a PDF file (legacy function for backward compatibility).
Parameters:
- pdf_path (str): Path to the PDF file
- ensure (bool, default=True): Check for mutool availability before processing
- auto_install (bool, default=True): Attempt to install MuPDF tools automatically
Returns:
List[str]: List of unique font names found in the PDF
ensure_mutool(auto_install=True)
Ensure MuPDF's mutool is available on the system.
Parameters:
- auto_install (bool, default=True): Attempt automatic installation if mutool is missing
Raises:
RuntimeError: If mutool cannot be found or installed
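The availability check described above can be approximated with a PATH lookup. This is a rough stand-in written for illustration, not the package's `ensure_mutool`: the name `require_mutool` is hypothetical, and the sketch performs no installation, it only reports whether the binary can be found.

```python
import shutil
from typing import Callable, Optional


def require_mutool(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the path to mutool, or raise RuntimeError if it is not on PATH.

    `which` is injectable so the behaviour can be checked without mutool installed.
    """
    path = which("mutool")
    if path is None:
        raise RuntimeError("mutool not found; install MuPDF tools first")
    return path


if __name__ == "__main__":
    try:
        print(require_mutool())
    except RuntimeError as exc:
        print(f"Check failed: {exc}")
```

Raising RuntimeError mirrors the documented failure mode, so calling code can handle both with the same except clause.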
Development
Setting up Development Environment
git clone https://github.com/genie360s/pdf-font-checker.git
cd pdf-font-checker
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Install development dependencies
pip install pytest pytest-cov black flake8
Running Tests
# Run all tests
python -m pytest
# Run with coverage
python -m pytest --cov=pdf_font_checker
# Run specific test file
python -m pytest tests/test_core.py
# Run specific test
python -m pytest tests/test_core.py::TestPdfFontChecker::test_parse_mutool_fonts_various_formats
Code Quality
# Format code
black src/ tests/
# Lint code
flake8 src/ tests/
# Type checking (if mypy is installed)
mypy src/
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for your changes
- Ensure all tests pass (python -m pytest)
- Format your code (black .)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Troubleshooting
Common Issues
Q: "mutool not found" error
A: Install MuPDF tools using your system package manager. See the Dependencies section above.
Q: "Permission denied" when auto-installing
A: Automatic installation requires administrator privileges on some systems. Install MuPDF manually, or rerun with sudo on Linux.
Q: No fonts detected in PDF
A: Some PDFs use embedded fonts in formats that mutool doesn't recognize, and others contain images instead of text.
Q: Does this work on Windows?
A: No, this package currently supports only Linux and macOS. Windows support may be added in future versions.
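When no fonts are detected, it can help to look at what mutool itself reports before any parsing happens. The sketch below calls `mutool info -F`, which restricts the report to fonts. Whether pdf-font-checker invokes mutool with these exact arguments is an assumption, and both function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def font_info_command(pdf_path: str, mutool: str = "mutool") -> List[str]:
    """Build the argument list for `mutool info -F`, which reports fonts only."""
    return [mutool, "info", "-F", pdf_path]


def raw_font_report(pdf_path: str) -> str:
    """Run mutool directly and return its unparsed standard output."""
    mutool = shutil.which("mutool")
    if mutool is None:
        raise RuntimeError("mutool not found on PATH")
    completed = subprocess.run(
        font_info_command(pdf_path, mutool),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mutool exits with an error
    )
    return completed.stdout
```

If the raw report lists fonts that the package does not return, the issue is likely in parsing; if the report is empty, the PDF itself probably contains no font resources.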
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on top of MuPDF - a lightweight PDF toolkit
- Inspired by the need for simple font analysis in PDF workflows