Modern CLI to extract text from PDFs using Mistral cloud or local Ollama models (glm-ocr, deepseek-ocr, LightOnOCR-2).
Mistral OCR CLI
Modern, polished CLI to extract text from PDFs using the Mistral OCR API.
Features
- Elegant TUI with progress bars and rich output
- Single file or batch processing
- Output in text, JSON, or Markdown
- Parallel batch processing with --jobs
- Config helper and .env support
Quickstart
- Install
uv tool install mistral-ocr-cli # isolated, pipx-style tool install
# or
uv pip install mistral-ocr-cli # into current environment
- Configure API key
export MISTRAL_API_KEY=your_key_here
# or
echo "MISTRAL_API_KEY=your_key_here" >> .env
- Extract text
ocr extract file.pdf -o out.txt
ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4
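The key lookup described in the "Configure API key" step can be sketched in Python as follows. This is a minimal illustration, not the CLI's actual implementation: the helper names `read_dotenv` and `resolve_api_key` are hypothetical, and the precedence (environment variable first, then `.env`) is an assumption.

```python
import os
from pathlib import Path


def read_dotenv(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(dotenv_path=".env", environ=None):
    """Return MISTRAL_API_KEY from the environment, else from .env, else None.

    Assumption: the environment variable takes precedence over the file.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("MISTRAL_API_KEY")
    if key:
        return key
    return read_dotenv(dotenv_path).get("MISTRAL_API_KEY")
```

Calling `resolve_api_key()` before running a batch is a cheap way to fail early when no key is configured.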
Usage
ocr extract [OPTIONS] FILES...
Options:
-o, --output PATH Output file (single-file mode)
-f, --format [text|json|markdown]
-b, --batch Enable batch mode
-O, --output-dir PATH Directory for batch outputs
-j, --jobs INTEGER RANGE Parallel jobs for batch [default: 1]
-v, --verbose Verbose logs
-q, --quiet Only errors
--version Show version
--help Show help
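To drive the CLI from a script, the options above can be assembled into an argument list. The sketch below uses only flags listed in this section; the helper `build_extract_command` is hypothetical, and two details are assumptions: that `--batch` is always paired with `--output-dir`, and that `--jobs` must be at least 1.

```python
def build_extract_command(files, output=None, output_dir=None, fmt=None, jobs=None):
    """Build an argv list for `ocr extract` from the documented options."""
    if not files:
        raise ValueError("at least one input file is required")
    cmd = ["ocr", "extract"]
    if fmt is not None:
        if fmt not in ("text", "json", "markdown"):
            raise ValueError(f"unsupported format: {fmt}")
        cmd += ["--format", fmt]
    if output_dir is not None:
        # Batch mode: one output per input, written under output_dir.
        cmd += ["--batch", "--output-dir", str(output_dir)]
        if jobs is not None:
            if jobs < 1:
                raise ValueError("jobs must be >= 1")  # assumed lower bound
            cmd += ["--jobs", str(jobs)]
    elif output is not None:
        # Single-file mode: write everything to one output file.
        cmd += ["--output", str(output)]
    cmd += [str(f) for f in files]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.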
Programmatic use
from ocr.pdf2text import pdf_to_text
text = pdf_to_text("/path/file.pdf")
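For several files, the same call can be fanned out over a thread pool, loosely mirroring what `--jobs` does on the command line. This is a sketch under assumptions: `extract_many` is a hypothetical helper, and it assumes `pdf_to_text` takes a path, returns the extracted text, and is safe to call from multiple threads. The extractor is passed in as a parameter so the helper itself does not depend on the package.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_many(paths, extract, jobs=4):
    """Run `extract` over `paths` concurrently.

    Returns a dict mapping each path to its text, or to the exception
    raised for that path, so one failure does not abort the batch.
    """
    paths = [str(p) for p in paths]
    results = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Submit everything first so the work overlaps, then collect in order.
        futures = {p: pool.submit(extract, p) for p in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:
                results[path] = exc
    return results
```

With the package installed, usage would look like `extract_many(["a.pdf", "b.pdf"], pdf_to_text, jobs=2)`. Threads suit this workload because each call mostly waits on network I/O.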
Development
uv pip install -e ".[dev]" # quoted so shells like zsh do not glob the brackets
uv run pre-commit install
uv run pytest -q
Releasing is handled via standard tags and GitHub Releases.
License
MIT
Test coverage
# Terminal report
make coverage
# HTML report in htmlcov/
make coverhtml
Download files
Source Distribution
upspawn_ocr_cli-0.1.0b4.tar.gz (14.1 kB)
Built Distribution
File details
Details for the file upspawn_ocr_cli-0.1.0b4.tar.gz.
File metadata
- Download URL: upspawn_ocr_cli-0.1.0b4.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1523557c27256f324146a437bb428e1f225e71ffd68cd2da7159a54ccc841407 |
| MD5 | 8b3853cf43d1ea6d876b3bca8c0ed2e9 |
| BLAKE2b-256 | 07cb212b9a8f41da3de5baa3187281e05851c5239f063f53c9b7180cf182e2c0 |
File details
Details for the file upspawn_ocr_cli-0.1.0b4-py3-none-any.whl.
File metadata
- Download URL: upspawn_ocr_cli-0.1.0b4-py3-none-any.whl
- Upload date:
- Size: 10.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e70bae6bc212b885c79684fe0a07192f6e0b104db1fe5086190a40e918e9ddc5 |
| MD5 | 0f87a58cada0b11a21a212b89919ca6a |
| BLAKE2b-256 | e7075817618e71c8cd0dd39178efb7ac1cbdf0b05de3ec4be4854829fb1bfa4d |