CLI for PDF text extraction using Meta's Nougat model with GPU acceleration

Nougat OCR CLI

A command-line tool for OCR processing using Meta's Nougat model. Extract text from PDFs with GPU acceleration (CUDA and Apple Metal).

Installation

Requires Python 3.11. A GPU is recommended but not required.

pip install nougat-ocr-cli

Or from source:

git clone https://github.com/r-uben/nougat-ocr-cli.git
cd nougat-ocr-cli
uv sync

Quick start

# Process a single file
nougat-ocr paper.pdf

# Process a directory
nougat-ocr ./papers/ -o ./results/

# Preview what would be processed (no model loading)
nougat-ocr ./papers/ --dry-run

# Process specific pages (zero-indexed)
nougat-ocr paper.pdf --pages 0-5

# Use CPU instead of GPU
nougat-ocr paper.pdf --device cpu
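
The commands above can also be assembled from a script. The following is a minimal sketch, not part of the package: `build_command` is a hypothetical helper that only emits flags documented in this README and leaves actually running the tool to `subprocess`.

```python
from pathlib import Path
from typing import Optional, Union

# Device names accepted by --device, as listed in this README.
DEVICES = {"auto", "cuda", "mps", "cpu"}


def build_command(
    input_path: Union[str, Path],
    output_dir: Optional[Union[str, Path]] = None,
    pages: Optional[str] = None,
    device: Optional[str] = None,
    dry_run: bool = False,
) -> list[str]:
    """Build an argv list for nougat-ocr (hypothetical helper, documented flags only)."""
    if device is not None and device not in DEVICES:
        raise ValueError(f"unsupported device: {device!r}")
    cmd = ["nougat-ocr", str(input_path)]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    if pages is not None:
        cmd += ["--pages", pages]
    if device is not None:
        cmd += ["--device", device]
    if dry_run:
        cmd.append("--dry-run")
    return cmd
```

With the package installed, `subprocess.run(build_command("paper.pdf", pages="0-5"), check=True)` would run the same command as the page-range example above.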

Options

Usage: nougat-ocr [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH           Output directory (default: <input_dir>/nougat_ocr_output/)
  --model TEXT                    Nougat model tag (default: 0.1.0-base)
  --batch-size N                  Batch size for inference (auto-detected if not set)
  --full-precision                Use FP32 instead of BF16 (slower but more accurate)
  --pages TEXT                    Page range (e.g., '0-5' or '1,3,5')
  --device [auto|cuda|mps|cpu]    Device for inference (default: auto)

  --reprocess                     Reprocess already-processed files
  --dry-run                       List files without loading the model
  -q, --quiet                     Suppress all output except errors
  -v, --verbose                   Enable verbose/debug output
  --info                          Show device and system info
  --version                       Show version
  --help                          Show this message
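
To illustrate the `--pages` syntax, here is a small sketch of how a specification such as '0-5' or '1,3,5' could be expanded into zero-indexed page numbers. `parse_pages` is a hypothetical function, not the tool's own parser. It assumes that ranges include their end page and that ranges and single pages can be mixed in one specification; neither is confirmed by this README.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec like '0-5' or '1,3,5' into sorted zero-indexed pages.

    Assumption: 'a-b' is inclusive on both ends (the real CLI may differ).
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("0-5"))    # [0, 1, 2, 3, 4, 5]
print(parse_pages("1,3,5"))  # [1, 3, 5]
```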

Output structure

nougat_ocr_output/
├── document_name/
│   └── document_name.md        # OCR markdown (clean text only)
├── another_document/
│   └── ...
└── metadata.json               # processing stats, checksums, file list
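
A downstream script can rely on this layout to find the results. The sketch below uses hypothetical helpers, not part of the package. It assumes only what the tree above shows: one folder per document containing a markdown file of the same name, plus a `metadata.json` at the top level. The keys inside `metadata.json` are not documented here, so the parsed JSON is returned as-is.

```python
import json
from pathlib import Path
from typing import Any, Optional, Union


def collect_markdown(output_dir: Union[str, Path]) -> dict[str, Path]:
    """Map each document name to its <name>/<name>.md file, if present."""
    root = Path(output_dir)
    found: dict[str, Path] = {}
    for doc_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        md_file = doc_dir / f"{doc_dir.name}.md"
        if md_file.is_file():
            found[doc_dir.name] = md_file
    return found


def load_metadata(output_dir: Union[str, Path]) -> Optional[Any]:
    """Return the parsed metadata.json, or None if the file does not exist."""
    meta_file = Path(output_dir) / "metadata.json"
    if not meta_file.is_file():
        return None
    return json.loads(meta_file.read_text(encoding="utf-8"))
```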

Device selection

With --device auto (the default), the tool picks the first available device in this order:

  1. CUDA — NVIDIA GPUs (fastest)
  2. MPS — Apple Metal on M-series Macs
  3. CPU — fallback (slow, not recommended for large documents)

Override with --device cuda|mps|cpu.
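
The priority order above can be sketched as follows. This is an illustration, not the tool's actual code: `pick_device` and `probe_torch` are hypothetical names, and the sketch assumes availability is checked through PyTorch's `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(
    requested: str = "auto",
    cuda_available: bool = False,
    mps_available: bool = False,
) -> str:
    """Resolve a --device value using the CUDA > MPS > CPU priority."""
    if requested != "auto":
        if requested not in {"cuda", "mps", "cpu"}:
            raise ValueError(f"unsupported device: {requested!r}")
        return requested  # an explicit choice overrides auto-detection
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def probe_torch() -> tuple[bool, bool]:
    """Return (cuda_available, mps_available); both False if torch is missing."""
    try:
        import torch
    except ImportError:
        return False, False
    cuda = torch.cuda.is_available()
    # torch.backends.mps is absent in older PyTorch releases.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend is not None and mps_backend.is_available())
    return cuda, mps


cuda_ok, mps_ok = probe_torch()
print(pick_device("auto", cuda_ok, mps_ok))
```

Keeping the availability flags as parameters makes the priority logic testable on machines without a GPU or without PyTorch installed.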

Development

# Install dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Lint
uv run ruff check .

# Format
uv run ruff format .

# Type check
uv run mypy nougat_ocr/ --ignore-missing-imports

Limitations

  • Python 3.11 only (nougat-ocr dependency constraint)
  • Model weights: ~1.3 GB (auto-downloaded on first run)
  • GPU strongly recommended for reasonable performance
  • Supported formats: PDF, JPG, JPEG, PNG, WEBP, BMP, TIFF

License

MIT License - see LICENSE for details.
