Tiny helper that lists fonts used in a PDF via MuPDF (mutool).

PDF Font Checker

A lightweight Python utility that extracts and lists all fonts used in PDF documents using MuPDF's mutool command-line tool.

Features

  • 🔍 Extract font information from PDF files
  • 🚀 Automatic dependency management - attempts to install MuPDF tools on macOS and Linux if mutool is missing
  • 🖥️ Cross-platform support - works on Linux, macOS, and Windows (MuPDF must be installed manually on Windows)
  • 📦 Simple API - easy to integrate into your Python projects
  • 🛠️ Command-line interface - use directly from terminal
  • 🧪 Well-tested - comprehensive test suite included

Installation

From PyPI (recommended)

pip install pdf-font-checker

From Source

git clone https://github.com/genie360s/pdf-font-checker.git
cd pdf-font-checker
pip install -e .

Dependencies

This package requires MuPDF's mutool command-line tool. The package will attempt to automatically install it using your system's package manager:

  • macOS: via Homebrew (brew install mupdf-tools)
  • Linux: via apt, dnf, yum, pacman, or zypper (mupdf-tools or mupdf)
  • Windows: Manual installation required
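
Before relying on automatic installation, it can help to check whether mutool is already on your PATH. The following is a minimal sketch using only the standard library; `find_mutool`, `install_hint`, and `INSTALL_HINTS` are hypothetical names for illustration and are not part of this package.

```python
import platform
import shutil

# Hints mirror the per-platform list above; these names are illustrative only.
INSTALL_HINTS = {
    "Darwin": "brew install mupdf-tools",
    "Linux": "install mupdf-tools (or mupdf) with apt, dnf, yum, pacman, or zypper",
    "Windows": "download MuPDF from mupdf.com and add mutool.exe to PATH",
}


def find_mutool():
    """Return the full path to mutool, or None if it is not on PATH."""
    return shutil.which("mutool")


def install_hint(system=None):
    """Return a manual-install hint for a platform.system() value."""
    system = system or platform.system()
    return INSTALL_HINTS.get(system, "install MuPDF with your package manager")


path = find_mutool()
if path is None:
    print(f"mutool not found; {install_hint()}")
else:
    print(f"mutool found at {path}")
```

If mutool is reported as found, passing `auto_install=False` avoids any attempt to call the system package manager.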

If automatic installation fails, you can install MuPDF manually:

Manual Installation

macOS

brew install mupdf-tools

Ubuntu/Debian

sudo apt-get update
sudo apt-get install mupdf-tools

Fedora/CentOS/RHEL

sudo dnf install mupdf-tools
# or on older systems:
sudo yum install mupdf-tools
# if mupdf-tools is not available, try the mupdf package instead

Arch Linux

sudo pacman -S mupdf-tools

Usage

Command Line Interface

Extract fonts from a PDF file:

pdf-font-checker document.pdf

Disable automatic MuPDF installation:

pdf-font-checker --no-auto-install document.pdf

Python API

from pdf_font_checker import list_pdf_fonts

# Basic usage
fonts = list_pdf_fonts("document.pdf")
print("Fonts found:")
for font in fonts:
    print(f"  - {font}")

# Disable automatic installation of mutool
fonts = list_pdf_fonts("document.pdf", auto_install=False)

# Disable mutool availability check entirely
fonts = list_pdf_fonts("document.pdf", ensure=False)

Advanced Usage

from pdf_font_checker.core import list_pdf_fonts, ensure_mutool

# Ensure mutool is available before processing multiple files
ensure_mutool()

pdf_files = ["doc1.pdf", "doc2.pdf", "doc3.pdf"]
all_fonts = set()

for pdf_file in pdf_files:
    try:
        fonts = list_pdf_fonts(pdf_file, ensure=False)  # mutool was already verified above
        all_fonts.update(fonts)
        print(f"{pdf_file}: {len(fonts)} fonts")
    except (RuntimeError, FileNotFoundError) as e:  # the documented failure modes
        print(f"Error processing {pdf_file}: {e}")

print(f"\nUnique fonts across all documents: {len(all_fonts)}")
for font in sorted(all_fonts):
    print(f"  - {font}")
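
When merging font names across documents, embedded subset fonts can appear under different names because PDF prefixes them with a six-letter tag and a plus sign (for example `ABCDEF+Arial-Bold`). Whether `list_pdf_fonts` already strips these tags is not stated here, so the sketch below is an assumption-laden post-processing step; `strip_subset_tag` and `unique_base_fonts` are hypothetical helpers, not part of the package.

```python
import re

# A subset tag is six uppercase letters followed by "+", e.g. "ABCDEF+Arial-Bold".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def strip_subset_tag(font_name):
    """Remove a leading subset tag if present; otherwise return the name unchanged."""
    return _SUBSET_TAG.sub("", font_name)


def unique_base_fonts(font_names):
    """Return sorted, de-duplicated font names with subset tags removed."""
    return sorted({strip_subset_tag(name) for name in font_names})
```

For example, `unique_base_fonts(all_fonts)` could replace `sorted(all_fonts)` in the loop above if subset tags show up in your results.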

Output Example

$ pdf-font-checker sample.pdf
Arial-Bold
Helvetica
TimesNewRomanPSMT
Calibri-Light
Verdana-Italic
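
The CLI output can also be consumed from another script. The sketch below assumes the format shown above (one font name per line) and assumes the command exits with a nonzero status on failure, which is not confirmed here; `parse_cli_output` and `fonts_via_cli` are hypothetical helper names.

```python
import subprocess


def parse_cli_output(stdout):
    """Split CLI output into font names, dropping blank lines and duplicates in order."""
    names = [line.strip() for line in stdout.splitlines() if line.strip()]
    return list(dict.fromkeys(names))


def fonts_via_cli(pdf_path):
    """Run pdf-font-checker on pdf_path and return the parsed font names.

    Assumption: a nonzero exit status signals failure, so check=True
    raises subprocess.CalledProcessError in that case.
    """
    result = subprocess.run(
        ["pdf-font-checker", pdf_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cli_output(result.stdout)
```

Inside Python code, calling `list_pdf_fonts` directly is simpler; the subprocess route is mainly useful when the package is installed in a different environment.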

API Reference

list_pdf_fonts(pdf_path, ensure=True, auto_install=True)

Extract font names from a PDF file.

Parameters:

  • pdf_path (str): Path to the PDF file
  • ensure (bool, default=True): Check for mutool availability before processing
  • auto_install (bool, default=True): Attempt to install MuPDF tools automatically

Returns:

  • List[str]: List of unique font names found in the PDF

Raises:

  • RuntimeError: If mutool is not available or PDF processing fails
  • FileNotFoundError: If the PDF file doesn't exist
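
Some failures can be caught before mutool is invoked at all. The sketch below is a hypothetical pre-check, not part of the package: it raises FileNotFoundError for a missing path and tests whether the file begins with the `%PDF-` header that PDF files normally start with.

```python
from pathlib import Path


def looks_like_pdf(pdf_path):
    """Return True if the file starts with the PDF header bytes.

    Raises FileNotFoundError if pdf_path is not an existing file.
    """
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {pdf_path}")
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"
```

A batch script could skip any file for which `looks_like_pdf` returns False and report it separately from genuine processing errors.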

ensure_mutool(auto_install=True)

Ensure MuPDF's mutool is available on the system.

Parameters:

  • auto_install (bool, default=True): Attempt automatic installation if mutool is missing

Raises:

  • RuntimeError: If mutool cannot be found or installed

Development

Setting up Development Environment

git clone https://github.com/genie360s/pdf-font-checker.git
cd pdf-font-checker

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .

# Install development dependencies
pip install pytest pytest-cov black flake8

Running Tests

# Run all tests
python -m pytest

# Run with coverage
python -m pytest --cov=pdf_font_checker

# Run specific test file
python -m pytest tests/test_core.py

# Run specific test
python -m pytest tests/test_core.py::TestPdfFontChecker::test_parse_mutool_fonts_various_formats

Code Quality

# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

# Type checking (if mypy is installed)
mypy src/

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for your changes
  5. Ensure all tests pass (python -m pytest)
  6. Format your code (black src/ tests/)
  7. Commit your changes (git commit -m 'Add amazing feature')
  8. Push to the branch (git push origin feature/amazing-feature)
  9. Open a Pull Request

Troubleshooting

Common Issues

Q: "mutool not found" error
A: Install MuPDF tools using your system package manager. See the Dependencies section above.

Q: "Permission denied" when auto-installing
A: The automatic installation requires admin privileges on some systems. Install MuPDF manually or run with sudo (Linux) or as Administrator (Windows).

Q: No fonts detected in PDF
A: The PDF may consist of scanned images rather than text, or it may embed fonts in a format that mutool doesn't recognize.

Q: Installation fails on Windows
A: Automatic installation is not supported on Windows. Download MuPDF from mupdf.com and ensure mutool.exe is in your PATH.
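
When results look wrong, it can be useful to see what mutool itself reports, independent of this package. The sketch below runs `mutool info -F`, the mutool option that lists fonts. Which mutool subcommand the package calls internally is not stated here, so treat this as a separate diagnostic; `mutool_font_command` and `raw_font_report` are hypothetical names.

```python
import shutil
import subprocess


def mutool_font_command(pdf_path):
    """Build the mutool command line that lists fonts for pdf_path."""
    return ["mutool", "info", "-F", pdf_path]


def raw_font_report(pdf_path):
    """Return mutool's raw font report, or its error output if nothing was printed."""
    if shutil.which("mutool") is None:
        raise RuntimeError("mutool not found on PATH")
    result = subprocess.run(
        mutool_font_command(pdf_path),
        capture_output=True,
        text=True,
    )
    return result.stdout or result.stderr
```

If `raw_font_report("document.pdf")` shows no font entries either, the PDF most likely contains no text fonts and the empty result is not a bug in the wrapper.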

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built on top of MuPDF - a lightweight PDF toolkit
  • Inspired by the need for simple font analysis in PDF workflows

Changelog

v0.1.0 (2025-09-02)

  • Initial release
  • Basic font extraction functionality
  • Cross-platform automatic dependency installation
  • Command-line interface
  • Python API
