Nougat OCR CLI

Simple, batteries-included CLI wrapper for Nougat OCR with GPU acceleration.

Features

  • GPU acceleration (CUDA & Apple Metal)
  • Simple command-line interface
  • Batch processing support
  • Clean Markdown output
  • Automatic model downloading
  • Python API with type hints

Installation

From PyPI

pip install nougat-ocr-cli

From GitHub

pip install git+https://github.com/rubenffuertes/nougat-ocr-cli.git

From source

git clone https://github.com/rubenffuertes/nougat-ocr-cli.git
cd nougat-ocr-cli
uv pip install -e .

CLI Usage

# Basic usage - outputs to current directory
nougat-ocr-cli document.pdf

# Specify output directory
nougat-ocr-cli document.pdf -o output/

# Process specific pages (zero-indexed)
nougat-ocr-cli document.pdf --pages 0-5
nougat-ocr-cli document.pdf --pages 1,3,5,7

# Use smaller model for faster processing
nougat-ocr-cli document.pdf --model 0.1.0-small

# Use full precision (FP32) for better accuracy
nougat-ocr-cli document.pdf --full-precision

# Set batch size manually
nougat-ocr-cli document.pdf --batch-size 4

CLI Options

Option            Description
input             Input PDF file to process
-o, --output      Output directory (default: current directory)
--model           Model version (default: 0.1.0-base)
--batch-size      Batch size for processing (default: auto-detected)
--full-precision  Use FP32 instead of BF16
--no-markdown     Disable Markdown post-processing
--pages           Zero-indexed page range (e.g., '0-5' or '1,3,5')
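The `--pages` option accepts either a range or a comma-separated list of zero-indexed pages. The following is a minimal sketch of how such a spec could be expanded into page numbers. The helper `parse_page_spec` is hypothetical and is not part of this package, and the assumption that the end of a range is inclusive is not confirmed by the documentation above.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as '0-5' or '1,3,5' into sorted, unique page indices."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            # Assumption: the end of a range is inclusive.
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_spec("0-5"))      # [0, 1, 2, 3, 4, 5]
print(parse_page_spec("1,3,5,7"))  # [1, 3, 5, 7]
```

The resulting list has the same shape as the `pages=[0, 1, 2]` argument shown in the Python API examples.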

Python API

from nougat_wrapper import NougatOCR
from pathlib import Path

# Initialize (loads model to GPU automatically)
ocr = NougatOCR()

# Extract text from PDF
result = ocr.extract_text(Path("paper.pdf"))

print(f"Extracted {result.pages} pages")
print(f"Failed pages: {result.placeholder_pages}")
print(result.text)  # Markdown output
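
To convert a whole folder of PDFs, the same `extract_text` call can be wrapped in a loop. This is only a sketch built on the API shown above: the helper `ocr_directory` and the `<stem>.md` output naming are assumptions for illustration, not something the package provides.

```python
from pathlib import Path


def ocr_directory(ocr, input_dir: Path, output_dir: Path) -> list[Path]:
    """Run `ocr.extract_text` on every PDF in `input_dir`; write one .md file each."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        result = ocr.extract_text(pdf_path)
        # Hypothetical naming scheme: paper.pdf -> paper.md
        out_path = output_dir / f"{pdf_path.stem}.md"
        out_path.write_text(result.text, encoding="utf-8")
        written.append(out_path)
    return written


# Usage, assuming the package is installed:
#   from nougat_wrapper import NougatOCR
#   ocr_directory(NougatOCR(), Path("papers"), Path("output"))
```

Passing the `ocr` object in, rather than creating it inside the function, means the model is loaded once and reused for every file.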

Advanced Usage

from pathlib import Path

from nougat_wrapper import NougatOCR

ocr = NougatOCR(
    model_tag="0.1.0-small",  # Use smaller model
    batch_size=4,             # Process 4 pages at once
    full_precision=True,      # Use FP32 instead of BF16
)

pdf_path = Path("paper.pdf")

# Only OCR pages 0, 1, 2 (zero-indexed)
result = ocr.extract_text(pdf_path, pages=[0, 1, 2])
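
Because `extract_text` accepts an explicit list of pages, a long document could be processed in slices, for example to save partial results along the way. The sketch below only generates the page lists. `page_chunks` is a hypothetical helper, and the total page count is assumed to be known from elsewhere, since the documentation above does not show how to obtain it.

```python
from collections.abc import Iterator


def page_chunks(total_pages: int, chunk_size: int) -> Iterator[list[int]]:
    """Yield consecutive lists of zero-indexed page numbers."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, total_pages, chunk_size):
        yield list(range(start, min(start + chunk_size, total_pages)))


for chunk in page_chunks(total_pages=7, chunk_size=3):
    print(chunk)
    # With a loaded model: result = ocr.extract_text(pdf_path, pages=chunk)
```

For seven pages and a chunk size of three, this prints `[0, 1, 2]`, `[3, 4, 5]`, and `[6]`.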

Requirements

  • Python 3.11+
  • GPU recommended (CUDA or Apple Metal)
  • ~1.3 GB for model weights (auto-downloaded)
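
A quick way to check these requirements before the first run is sketched below. It assumes PyTorch is the backend (Nougat is a PyTorch model) and uses PyTorch's own availability checks. How this wrapper actually selects a device is not documented above, so `detect_device` and `python_is_supported` are illustrative helpers only.

```python
import sys


def detect_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so no GPU backend is usable
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch releases have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check the Python 3.11+ requirement."""
    return tuple(version_info[:2]) >= (3, 11)


print(f"Python OK: {python_is_supported()}, device: {detect_device()}")
```

A result of `cpu` does not necessarily mean the tool will refuse to run, since the GPU is listed as recommended rather than required.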

License

MIT
