Mistral OCR

A simple CLI to extract text from documents using the Mistral OCR API.

Installation

pip install mistral-ocr-tool

Or install from source:

git clone https://github.com/aburkard/mistral-ocr.git
cd mistral-ocr
pip install .

Configuration

Set your Mistral API key as an environment variable or in a .env file:

MISTRAL_API_KEY="your-api-key"
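
If you want to confirm from a script that the key is discoverable before calling the tool, something like the following may help. It is a minimal sketch using only the standard library. The helper name `read_api_key`, the choice to let the environment variable take precedence over the file, and the simple `NAME=value` parsing are all assumptions made for illustration; the README does not say how mistral-ocr itself resolves the key.

```python
import os
from pathlib import Path


def read_api_key(env_file=".env"):
    """Return MISTRAL_API_KEY from the environment, else from a .env file, else None.

    Hypothetical helper: the lookup order here is an assumption, not
    documented behavior of mistral-ocr.
    """
    key = os.environ.get("MISTRAL_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines that are not NAME=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "MISTRAL_API_KEY":
            # Drop optional surrounding quotes, as in the example above.
            return value.strip().strip("\"'") or None
    return None


if __name__ == "__main__":
    print("key found" if read_api_key() else "MISTRAL_API_KEY is not set")
```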

Usage

mistral-ocr <document_source> [options]

The document source can be a URL, a local file path, or - to read from stdin.
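
The three kinds of source can be told apart with a few lines of Python, which may be useful when wrapping the CLI in a script. This is an illustrative sketch only: `classify_source` is a hypothetical helper, and treating just `http` and `https` as URLs is an assumption rather than a description of how mistral-ocr decides.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Label a document source as 'stdin', 'url', or 'file'.

    Hypothetical helper; the rules below are assumptions, not the
    tool's actual logic.
    """
    if source == "-":
        return "stdin"
    # Assumption: only http(s) sources count as URLs.
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    # Everything else is treated as a local file path.
    return "file"


if __name__ == "__main__":
    for src in ("https://example.com/document.pdf", "./invoice.pdf", "-"):
        print(src, "->", classify_source(src))
```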

Examples

# Process a PDF from a URL
mistral-ocr https://example.com/document.pdf

# Process a local file
mistral-ocr ./invoice.pdf

# Pipe from stdin
cat document.pdf | mistral-ocr -

# Process specific pages only (0-indexed)
mistral-ocr large-doc.pdf --pages 0,2,5

# Output as JSON (great for piping to jq)
mistral-ocr document.pdf --json | jq '.pages[0].markdown'

# Extract tables as HTML
mistral-ocr document.pdf --table-format html

# Include headers and footers
mistral-ocr document.pdf --extract-headers --extract-footers

# Include base64-encoded images in response
mistral-ocr document.pdf --include-images

# Check page count and estimated cost before processing
mistral-ocr large-doc.pdf --dry-run
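
The --json example above can also be consumed from Python instead of jq. The sketch below assumes the JSON output has a top-level `pages` list whose entries each carry a `markdown` string, which is all that the jq filter `.pages[0].markdown` implies; any other field is unknown here. The function names `pages_to_markdown` and `ocr_markdown` are hypothetical, and `ocr_markdown` requires mistral-ocr to be installed and a valid API key to be configured.

```python
import json
import subprocess


def pages_to_markdown(json_text):
    """Join the markdown of every page in a --json response.

    Assumes the shape {"pages": [{"markdown": "..."}, ...]}.
    """
    doc = json.loads(json_text)
    return "\n\n".join(page.get("markdown", "") for page in doc.get("pages", []))


def ocr_markdown(source):
    """Run `mistral-ocr <source> --json` and return the combined markdown."""
    result = subprocess.run(
        ["mistral-ocr", source, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return pages_to_markdown(result.stdout)
```

Keeping the parsing in its own function means it can be exercised on a saved response without calling the API again.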

Options

Option              Description
-p, --pages         Comma-separated page numbers to process (0-indexed)
--json              Output full JSON response instead of markdown
--table-format      Table output format: markdown or html
--extract-headers   Include page headers
--extract-footers   Include page footers
--include-images    Include base64-encoded images in response
--image-limit N     Maximum number of images to extract
--image-min-size N  Minimum image dimension in pixels
--model NAME        Model override (default: mistral-ocr-latest)
--dry-run           Show page count and estimated cost without processing
-v, --verbose       Enable verbose logging
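
When calling the tool from a script, it can be convenient to assemble the argument list programmatically. The following is a rough sketch that uses only flags listed in the table above; `build_command` and its keyword names are hypothetical and not part of the package.

```python
def build_command(source, pages=None, json_output=False, table_format=None,
                  include_images=False, dry_run=False):
    """Build an argv list for mistral-ocr from a few of the documented options.

    Hypothetical helper for illustration; it does not run anything.
    """
    cmd = ["mistral-ocr", source]
    if pages:
        if any(p < 0 for p in pages):
            raise ValueError("page numbers are 0-indexed and cannot be negative")
        # --pages takes a comma-separated list, e.g. 0,2,5
        cmd += ["--pages", ",".join(str(p) for p in pages)]
    if json_output:
        cmd.append("--json")
    if table_format is not None:
        if table_format not in ("markdown", "html"):
            raise ValueError("table_format must be 'markdown' or 'html'")
        cmd += ["--table-format", table_format]
    if include_images:
        cmd.append("--include-images")
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    print(build_command("large-doc.pdf", pages=[0, 2, 5], json_output=True))
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues with file names that contain spaces.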

Development

uv sync --group dev
uv run pytest tests/
