Mistral OCR CLI
Modern, polished CLI to extract text from PDFs using the Mistral OCR API.
Features
- Elegant TUI with progress bars and rich output
- Single file or batch processing
- Output in text, JSON, or Markdown
- Parallel batch processing with --jobs
- Config helper and .env support
Quickstart
- Install
uv tool install mistral-ocr-cli # via pipx-like tool install
# or
uv pip install mistral-ocr-cli # into current environment
- Configure API key
export MISTRAL_API_KEY=your_key_here
# or
echo "MISTRAL_API_KEY=your_key_here" >> .env
- Extract text
ocr extract file.pdf -o out.txt
ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4
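The key lookup from the "Configure API key" step can be pictured roughly as follows. This is a minimal sketch, not the tool's actual code: `resolve_api_key` is a hypothetical helper, and the precedence (environment variable first, then `.env`) is an assumption rather than documented behavior.

```python
import os
from pathlib import Path


def resolve_api_key(env_file=".env", environ=None):
    """Return MISTRAL_API_KEY from the environment, else from a .env file.

    Hypothetical helper; the environment-first precedence is an assumption.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("MISTRAL_API_KEY")
    if key:
        return key

    path = Path(env_file)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not NAME=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "MISTRAL_API_KEY":
            # Tolerate optional surrounding quotes around the value
            return value.strip().strip("\"'") or None
    return None
```

Calling `resolve_api_key()` before a run is one way to fail early with a clear message when no key is configured.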
Usage
ocr extract [OPTIONS] FILES...
Options:
-o, --output PATH Output file (single-file mode)
-f, --format [text|json|markdown] Output format
-b, --batch Enable batch mode
-O, --output-dir PATH Directory for batch outputs
-j, --jobs INTEGER RANGE Parallel jobs for batch [default: 1]
-v, --verbose Verbose logs
-q, --quiet Only errors
--version Show version
--help Show help
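When the CLI is driven from a script, the options above map onto an argument list fairly directly. The sketch below uses only flags listed in this section; `build_extract_command` and `run_extract` are hypothetical helper names, and the assumption that `ocr` exits with a non-zero status on failure is not confirmed by this README.

```python
import subprocess


def build_extract_command(files, output=None, fmt=None, output_dir=None, jobs=None):
    """Assemble an `ocr extract` argument list from the documented options."""
    cmd = ["ocr", "extract", *files]
    if fmt is not None:
        cmd += ["--format", fmt]
    if output_dir is not None:
        # Batch mode: one output per input file, written to output_dir
        cmd += ["--batch", "--output-dir", str(output_dir)]
        if jobs is not None:
            cmd += ["--jobs", str(jobs)]
    elif output is not None:
        # Single-file mode
        cmd += ["--output", str(output)]
    return cmd


def run_extract(files, **options):
    """Run the CLI; raises CalledProcessError if the exit status is non-zero."""
    return subprocess.run(build_extract_command(files, **options), check=True)
```

For example, `build_extract_command(["file1.pdf", "file2.pdf"], output_dir="outputs", jobs=4)` yields the same arguments as the batch command in the Quickstart.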
Programmatic use
from ocr.pdf2text import pdf_to_text
text = pdf_to_text("/path/file.pdf")
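For several files, the same call can be fanned out over a thread pool, similar in spirit to `--jobs`. This is a sketch under assumptions: `extract_many` is a hypothetical helper, the extractor is passed in as a callable, and it is assumed to return a string and to be safe to call from multiple threads (neither is stated above).

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_many(pdf_paths, extract, output_dir, jobs=1):
    """Run `extract` on each PDF with up to `jobs` threads and write .txt files.

    `extract` is any callable taking a path string and returning text.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def process(pdf_path):
        text = extract(str(pdf_path))
        # Inputs that share a file stem would overwrite each other here
        target = out_dir / (Path(pdf_path).stem + ".txt")
        target.write_text(text, encoding="utf-8")
        return target

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns results in input order, whatever order tasks finish in
        return list(pool.map(process, pdf_paths))
```

With the import shown above, a call might look like `extract_many(["a.pdf", "b.pdf"], pdf_to_text, "outputs", jobs=4)`. Threads suit this workload because the time is spent waiting on network requests rather than on local computation.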
Development
uv pip install -e ".[dev]" # quoted so shells such as zsh do not glob the brackets
uv run pre-commit install
uv run pytest -q
Releasing is handled via standard tags and GitHub Releases.
License
MIT
Test coverage
# Terminal report
make coverage
# HTML report in htmlcov/
make coverhtml