Mistral OCR
A simple CLI to extract text from documents using the Mistral OCR API.
Installation
pip install .
Or with uv:
uv sync
Configuration
Set your Mistral API key as an environment variable or in a .env file:
MISTRAL_API_KEY="your-api-key"
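The lookup described above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the function name `load_api_key` is hypothetical, and the precedence (environment variable first, then `.env`) is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: str = ".env", name: str = "MISTRAL_API_KEY") -> Optional[str]:
    """Return the API key from the environment, falling back to a .env file.

    Hypothetical helper: the real CLI may resolve the key differently.
    """
    # Assumption: a variable set in the environment wins over the .env file.
    value = os.environ.get(name)
    if value:
        return value

    path = Path(env_file)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes, as in MISTRAL_API_KEY="your-api-key".
            return val.strip().strip("\"'")
    return None
```

Calling `load_api_key()` returns `None` when the key is set in neither place, so a caller can fail early with a clear message.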
Usage
mistral-ocr <document_source> [options]
The document source can be a URL, a local file path, or - to read from stdin.
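One way to tell the three kinds of source apart is sketched below. The function `classify_source` is hypothetical and only illustrates the rule stated above. Treating only `http` and `https` as URL schemes is an assumption.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return 'stdin', 'url', or 'file' for a document source argument.

    Hypothetical helper: the real CLI may apply different rules.
    """
    if source == "-":
        return "stdin"
    # Assumption: only http(s) counts as a URL; anything else is a local path.
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    return "file"


for example in ("https://example.com/document.pdf", "./invoice.pdf", "-"):
    print(example, "->", classify_source(example))
```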
Examples
# Process a PDF from a URL
mistral-ocr https://example.com/document.pdf
# Process a local file
mistral-ocr ./invoice.pdf
# Pipe from stdin
cat document.pdf | mistral-ocr -
# Process specific pages only (0-indexed)
mistral-ocr large-doc.pdf --pages 0,2,5
# Output as JSON (great for piping to jq)
mistral-ocr document.pdf --json | jq '.pages[0].markdown'
# Extract tables as HTML
mistral-ocr document.pdf --table-format html
# Include headers and footers
mistral-ocr document.pdf --extract-headers --extract-footers
# Include base64-encoded images in response
mistral-ocr document.pdf --include-images
# Check page count and estimated cost before processing
mistral-ocr large-doc.pdf --dry-run
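Because `--pages` is 0-indexed, page 1 of a document is page `0` on the command line. The sketch below shows how a comma-separated value such as `0,2,5` could be parsed, and how 1-indexed page numbers could be converted into a `--pages` argument. Both function names are hypothetical and are not taken from the tool's source.

```python
from typing import Iterable, List


def parse_pages(spec: str) -> List[int]:
    """Parse a comma-separated, 0-indexed page list such as '0,2,5'."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        number = int(part)  # raises ValueError for non-numeric input
        if number < 0:
            raise ValueError(f"page numbers must be >= 0, got {number}")
        pages.append(number)
    return pages


def to_pages_arg(one_indexed: Iterable[int]) -> str:
    """Convert 1-indexed page numbers into a 0-indexed --pages value."""
    numbers = list(one_indexed)
    if any(n < 1 for n in numbers):
        raise ValueError("1-indexed page numbers must be >= 1")
    return ",".join(str(n - 1) for n in numbers)
```

For example, `to_pages_arg([1, 3, 6])` produces the string `0,2,5` used in the example above.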
Options
| Option | Description |
|---|---|
| `-p, --pages` | Comma-separated page numbers to process (0-indexed) |
| `--json` | Output full JSON response instead of markdown |
| `--table-format` | Table output format: markdown or html |
| `--extract-headers` | Include page headers |
| `--extract-footers` | Include page footers |
| `--include-images` | Include base64-encoded images in response |
| `--image-limit N` | Maximum number of images to extract |
| `--image-min-size N` | Minimum image dimension in pixels |
| `--model NAME` | Model override (default: mistral-ocr-latest) |
| `--dry-run` | Show page count and estimated cost without processing |
| `-v, --verbose` | Enable verbose logging |
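The `--json` output can also be consumed from a script. The sketch below is an illustration under stated assumptions: it relies only on the `pages[].markdown` shape used in the `jq` example above, assumes that standard output contains nothing but the JSON document, and uses hypothetical function names.

```python
import json
import subprocess


def run_ocr_json(source: str) -> dict:
    """Run `mistral-ocr <source> --json` and return the parsed response.

    Assumption: stdout holds only the JSON document.
    """
    result = subprocess.run(
        ["mistral-ocr", source, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)


def join_markdown(response: dict) -> str:
    """Concatenate the markdown of every page, separated by blank lines."""
    pages = response.get("pages", [])
    return "\n\n".join(page.get("markdown", "") for page in pages)
```

A call such as `join_markdown(run_ocr_json("document.pdf"))` would then yield the whole document as a single markdown string, provided the CLI is installed and the API key is configured.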
Development
uv sync --group dev
uv run pytest tests/