Mistral OCR
A simple CLI to extract text from documents using the Mistral OCR API.
Installation
pip install mistral-ocr-tool
Or install from source:
git clone https://github.com/aburkard/mistral-ocr.git
cd mistral-ocr
pip install .
Configuration
Set your Mistral API key as an environment variable or in a .env file:
MISTRAL_API_KEY="your-api-key"
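The README does not say which of these two places the tool checks first. The stdlib-only sketch below shows one common lookup, assuming the environment variable takes precedence over a simple KEY=value .env file. read_api_key is a hypothetical helper, not part of mistral-ocr-tool, and it ignores .env features such as variable expansion or "export" prefixes.

```python
import os
from pathlib import Path
from typing import Optional


def read_api_key(env_file: Path = Path(".env")) -> Optional[str]:
    """Return MISTRAL_API_KEY, preferring the environment over a .env file.

    Assumed precedence (not documented by the tool): environment first, then .env.
    """
    key = os.environ.get("MISTRAL_API_KEY")
    if key:
        return key
    if not env_file.is_file():
        return None
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == "MISTRAL_API_KEY":
            # Drop surrounding whitespace and quotes, as in the README example.
            return value.strip().strip('"').strip("'")
    return None
```

With the README's example line in .env and no MISTRAL_API_KEY exported, read_api_key() returns "your-api-key".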
Usage
mistral-ocr <document_source> [options]
The document source can be a URL, a local file path, or a single hyphen (-) to read the document from stdin.
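To illustrate how these three kinds of source can be told apart, here is a minimal sketch. classify_source is a hypothetical helper, not the tool's actual code; it assumes only http and https URLs are fetched remotely and treats everything else, including Windows drive paths, as a local file.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return 'stdin', 'url', or 'file' for a document_source argument (hypothetical)."""
    if source == "-":
        return "stdin"  # a lone hyphen means: read the document bytes from stdin
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    # Anything else (relative paths, absolute paths, "C:\..." drive paths) is a file.
    return "file"
```

For example, classify_source("./invoice.pdf") gives "file" and classify_source("-") gives "stdin".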
Examples
# Process a PDF from a URL
mistral-ocr https://example.com/document.pdf
# Process a local file
mistral-ocr ./invoice.pdf
# Pipe from stdin
cat document.pdf | mistral-ocr -
# Process specific pages only (0-indexed)
mistral-ocr large-doc.pdf --pages 0,2,5
# Output as JSON (great for piping to jq)
mistral-ocr document.pdf --json | jq '.pages[0].markdown'
# Extract tables as HTML
mistral-ocr document.pdf --table-format html
# Include headers and footers
mistral-ocr document.pdf --extract-headers --extract-footers
# Save markdown and images to a directory
mistral-ocr document.pdf -o output/
# Include base64 images in JSON output (for programmatic use)
mistral-ocr document.pdf --json --include-images
# Check page count and estimated cost before processing
mistral-ocr large-doc.pdf --dry-run
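When --json output is consumed from Python rather than jq, the response can be walked directly. This sketch assumes the shape implied by the jq example above (a top-level pages list whose entries carry markdown) and, for --include-images, per-page images entries with id and image_base64 fields. That matches the Mistral OCR API response, but this README does not document the tool's JSON schema, so treat the field names as assumptions. page_markdown and save_images are hypothetical helpers.

```python
import base64
from pathlib import Path


def page_markdown(response: dict) -> str:
    """Join every page's markdown in page order, separated by blank lines."""
    pages = sorted(response.get("pages", []), key=lambda p: p.get("index", 0))
    return "\n\n".join(page.get("markdown", "") for page in pages)


def save_images(response: dict, out_dir: Path) -> list[Path]:
    """Decode base64 image payloads (present only with --include-images) into files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for page in response.get("pages", []):
        for image in page.get("images", []):
            payload = image.get("image_base64")
            if not payload:
                continue  # image metadata without an embedded payload
            # The payload may be a data URI ("data:image/jpeg;base64,..."); keep the base64 part.
            if payload.startswith("data:"):
                payload = payload.split(",", 1)[1]
            # Keep only the final path component of the id so it cannot escape out_dir.
            name = Path(image.get("id") or f"image-{len(written)}").name
            path = out_dir / name
            path.write_bytes(base64.b64decode(payload))
            written.append(path)
    return written
```

For example, after mistral-ocr document.pdf --json --include-images > response.json, calling page_markdown(json.loads(Path("response.json").read_text())) returns the concatenated markdown, and save_images on the same dict writes the images to disk.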
Options
| Option | Description |
|---|---|
| -p, --pages | Comma-separated page numbers to process (0-indexed) |
| --json | Output full JSON response instead of markdown |
| -o, --output-dir | Save markdown and images to a directory |
| --table-format | Table output format: markdown or html |
| --extract-headers | Include page headers |
| --extract-footers | Include page footers |
| --include-images | Include images (requires --json or -o) |
| --image-limit N | Maximum number of images to extract |
| --image-min-size N | Minimum image dimension in pixels |
| --model NAME | Model override (default: mistral-ocr-latest) |
| --dry-run | Show page count and estimated cost without processing |
| -v, --verbose | Enable verbose logging |
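Because --pages is 0-indexed, the first page of a document is page 0, so the pages a PDF viewer labels 1, 3 and 6 are passed as --pages 0,2,5. The sketch below shows that conversion and a defensive parse of the comma-separated value. Both functions are hypothetical, and the de-duplication and sorting in parse_pages are choices of this sketch, not documented behavior of the tool.

```python
def viewer_pages_to_arg(viewer_pages: list[int]) -> str:
    """Convert 1-indexed page numbers (as a PDF viewer shows them) to a --pages value."""
    if any(p < 1 for p in viewer_pages):
        raise ValueError("viewer page numbers start at 1")
    return ",".join(str(p - 1) for p in viewer_pages)


def parse_pages(spec: str) -> list[int]:
    """Parse a --pages value such as "0,2,5" into sorted, unique 0-indexed pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "0,,2"
        page = int(part)  # raises ValueError for non-numeric input
        if page < 0:
            raise ValueError(f"pages are 0-indexed and must be >= 0, got {page}")
        pages.add(page)
    return sorted(pages)
```

So viewer_pages_to_arg([1, 3, 6]) produces "0,2,5", the value used in the --pages example above.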
Development
uv sync --group dev
uv run pytest tests/
File details
Details for the file mistral_ocr_tool-1.1.0.tar.gz.
File metadata
- Download URL: mistral_ocr_tool-1.1.0.tar.gz
- Size: 53.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv 0.10.12 (uv publish, run in CI on Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 249e364fcd2f268fc81be5f56d80f1475246773b0b00d7a3eedf19f3e6ad46dd |
| MD5 | eeb15f7fac5efeecd6fcd735857ec6dc |
| BLAKE2b-256 | 6f315bbfb70a886ebd17eeeba5cd2bc2797cb810120cd922c596a501123b70b9 |
File details
Details for the file mistral_ocr_tool-1.1.0-py3-none-any.whl.
File metadata
- Download URL: mistral_ocr_tool-1.1.0-py3-none-any.whl
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv 0.10.12 (uv publish, run in CI on Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 02c1a80742d07baebe364a921f8baa7680cd83faf217627207991fda37a9ecb0 |
| MD5 | 808f65ae3fbe03cd249625c42a1ff149 |
| BLAKE2b-256 | abc33b29f7459abc7c6c0b721db0c8d8a19528c39d78205f47900b244ab70086 |