CLI for PDF text extraction using Meta's Nougat model with GPU acceleration
Nougat OCR CLI
A command-line tool for OCR processing using Meta's Nougat model. Extract text from PDFs with GPU acceleration (CUDA and Apple Metal).
Installation
Requires Python 3.11. A GPU is recommended but optional.
pip install nougat-ocr-cli
Or from source:
git clone https://github.com/r-uben/nougat-ocr-cli.git
cd nougat-ocr-cli
uv sync
Quick start
# Process a single file
nougat-ocr paper.pdf
# Process a directory
nougat-ocr ./papers/ -o ./results/
# Preview what would be processed (no model loading)
nougat-ocr ./papers/ --dry-run
# Process specific pages (zero-indexed)
nougat-ocr paper.pdf --pages 0-5
# Use CPU instead of GPU
nougat-ocr paper.pdf --device cpu
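When the CLI is driven from a script, it can help to assemble the arguments in one place. The following is a minimal sketch, not part of the package: `build_command` is a hypothetical helper that only uses flags shown in the examples above and returns an argument list suitable for `subprocess.run`.

```python
from __future__ import annotations


def build_command(
    input_path: str,
    output_dir: str | None = None,
    pages: str | None = None,
    device: str = "auto",
    dry_run: bool = False,
) -> list[str]:
    """Build an argv list for nougat-ocr (hypothetical helper, not shipped with the CLI)."""
    cmd = ["nougat-ocr", input_path]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if pages is not None:
        cmd += ["--pages", pages]
    if device != "auto":
        # "auto" is the documented default, so the flag is only added for overrides.
        cmd += ["--device", device]
    if dry_run:
        cmd.append("--dry-run")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.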
Options
Usage: nougat-ocr [OPTIONS] INPUT_PATH

Options:
  -o, --output-dir PATH         Output directory (default: <input_dir>/nougat_ocr_output/)
  --model TEXT                  Nougat model tag (default: 0.1.0-base)
  --batch-size N                Batch size for inference (auto-detected if not set)
  --full-precision              Use FP32 instead of BF16 (slower but more accurate)
  --pages TEXT                  Page range (e.g., '0-5' or '1,3,5')
  --device [auto|cuda|mps|cpu]  Device for inference (default: auto)
  --reprocess                   Reprocess already-processed files
  --dry-run                     List files without loading the model
  -q, --quiet                   Suppress all output except errors
  -v, --verbose                 Enable verbose/debug output
  --info                        Show device and system info
  --version                     Show version
  --help                        Show this message
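To illustrate the two `--pages` forms listed above ('0-5' and '1,3,5'), here is a rough sketch of how such a specification could be expanded into zero-indexed page numbers. `parse_pages` is a hypothetical function, not the CLI's own parser, and it assumes that the end of a range is inclusive, which the help text does not state.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as '0-5' or '1,3,5' into sorted page indices.

    Assumption: ranges are inclusive on both ends ('0-5' covers six pages).
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as '1,,3'
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

Under the inclusive assumption, `parse_pages("0-5")` yields the six indices 0 through 5, and `parse_pages("1,3,5")` yields exactly those three pages.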
Output structure
nougat_ocr_output/
├── document_name/
│ └── document_name.md # OCR markdown (clean text only)
├── another_document/
│ └── ...
└── metadata.json # processing stats, checksums, file list
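Downstream scripts usually need to find the generated Markdown files and the run metadata. The sketch below walks the layout shown above. `collect_outputs` is a hypothetical helper, and because the schema of `metadata.json` is not documented here, the file is loaded as plain JSON without assuming any particular keys.

```python
import json
from pathlib import Path


def collect_outputs(output_dir):
    """Return ({document_name: markdown_path}, metadata_or_None) for an output directory."""
    root = Path(output_dir)
    documents = {}
    for md_path in sorted(root.glob("*/*.md")):
        # Layout from the tree above: <document_name>/<document_name>.md
        if md_path.stem == md_path.parent.name:
            documents[md_path.stem] = md_path
    metadata_path = root / "metadata.json"
    metadata = None
    if metadata_path.exists():
        # Loaded as-is; no keys are assumed because the schema is not documented here.
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    return documents, metadata
```

A caller can then read each result with `documents[name].read_text(encoding="utf-8")`.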
Device selection
By default (--device auto), nougat-ocr detects the best available device:
- CUDA — NVIDIA GPUs (fastest)
- MPS — Apple Metal on M-series Macs
- CPU — fallback (slow, not recommended for large documents)
Override with --device cuda|mps|cpu.
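The preference order above can be sketched in a few lines. This is an illustration of the idea rather than the CLI's actual implementation: `pick_device` and `detect_with_torch` are hypothetical names, and the sketch assumes an explicit override is returned unchanged without checking that the device exists.

```python
def pick_device(cuda_available: bool, mps_available: bool, requested: str = "auto") -> str:
    """Choose a device name following the CUDA > MPS > CPU preference."""
    if requested not in {"auto", "cuda", "mps", "cpu"}:
        raise ValueError(f"unknown device: {requested!r}")
    if requested != "auto":
        # Assumption: an explicit choice is honored as-is, with no availability check.
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_with_torch(requested: str = "auto") -> str:
    """Query PyTorch for availability and delegate to pick_device."""
    import torch  # imported lazily so pick_device works without PyTorch installed

    return pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
        requested,
    )
```

Keeping the decision in a pure function makes the preference order easy to test on machines without a GPU.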
Development
# Install dev dependencies
uv sync --extra dev
# Run tests
uv run pytest
# Lint
uv run ruff check .
# Format
uv run ruff format .
# Type check
uv run mypy nougat_ocr/ --ignore-missing-imports
Limitations
- Python 3.11 only (nougat-ocr dependency constraint)
- Model weights: ~1.3 GB (auto-downloaded on first run)
- GPU strongly recommended for reasonable performance
- Supported formats: PDF, JPG, JPEG, PNG, WEBP, BMP, TIFF
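To check ahead of time which files in a folder match the supported formats listed above, a small filter like the following can be used. `find_supported_files` is a hypothetical helper, not the CLI's own discovery logic. It assumes a non-recursive scan and case-insensitive extension matching, and it includes only the extensions named in the list (so `.tif` is deliberately left out).

```python
from pathlib import Path

# Extensions taken from the supported formats list; nothing else is assumed.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff"}


def find_supported_files(directory):
    """Return sorted paths of files directly inside `directory` with a supported extension."""
    root = Path(directory)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

For the authoritative list of what a run would process, `nougat-ocr ./papers/ --dry-run` remains the reference.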
License
MIT License - see LICENSE for details.
Download files
File details
Details for the file nougat_ocr_cli-0.3.0.tar.gz.
File metadata
- Download URL: nougat_ocr_cli-0.3.0.tar.gz
- Upload date:
- Size: 114.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b305d2d5fdb8c69a04a2eee1597e4d1d27f87afb48b58880aa3ff89444376f1b |
| MD5 | 1465e8f171b5c60c9f956787c20cc672 |
| BLAKE2b-256 | 9d17b010e07b16f78e9be5399154b3c2635f72dad335de32636745c08327cf18 |
File details
Details for the file nougat_ocr_cli-0.3.0-py3-none-any.whl.
File metadata
- Download URL: nougat_ocr_cli-0.3.0-py3-none-any.whl
- Upload date:
- Size: 14.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd5ea678af7bb44b5af8a70ea9cc6c5d1a108606794a67b6a80578bf2eecab22 |
| MD5 | cb4c239d7aa4b6e747c0cf908e43b0bc |
| BLAKE2b-256 | 7421d450a2d093073ba4bdb4222bcaba94bf6d6c9561f7b5dd8ed3c0aaee9ee8 |