Apple Vision Framework Python Utilities
Fast and accurate OCR on images and PDFs using the Apple Vision framework (pyobjc-framework-Vision), directly from the command line.
Features
- Fast and accurate, with multi-language support (-l, --lang), powered by Apple's industry-strength Vision framework (pyobjc-framework-Vision).
- Supports all common input image formats: PNG, JPEG, TIFF and WebP.
- Supports PDF input (the file is converted to images first). This tool does NOT assume a file is a PDF just because it has a .pdf extension; you need to pass the -p, --pdf flag.
- Outputs only the extracted text by default, but can output JSON containing the recognition confidence for each line with the -j, --json flag.
- Supports text clipping based on start and end markers (-s, -S, -e, -E).
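Because the tool does not infer PDF input from the file extension, a wrapper script has to decide when to pass -p. The sketch below shows one way to do that, assuming a file counts as a PDF when it begins with the %PDF- signature. looks_like_pdf and build_ocr_command are hypothetical helper names and are not part of apple-vision-utils; only the flags -p, -j and -l come from the tool itself.

```python
import os

# Assumption: a PDF file starts with these five bytes.
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature bytes."""
    with open(path, "rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE


def build_ocr_command(path, lang=None, as_json=False):
    """Build an apple-ocr argument list, adding -p only for actual PDFs."""
    cmd = ["apple-ocr"]
    if looks_like_pdf(path):
        cmd.append("-p")
    if as_json:
        cmd.append("-j")
    if lang is not None:
        cmd += ["-l", lang]
    cmd.append(os.fspath(path))
    return cmd
```

On macOS, the returned list could be passed to subprocess.run(cmd, capture_output=True, text=True, check=True) to capture the recognized text.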
Demo
Below is the output of running the tests:
https://g.teddysc.me/96d5b1217b90035c163b3c97ce99112f
Installation
Requires Python >= 3.11, <4.0.
Since this package uses Apple's Vision framework, it only works on macOS.
To OCR PDFs with -p, you need to install the required dependency poppler with brew install poppler (detailed guide).
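A script that depends on PDF support may want to check these requirements before calling the tool. The following is a minimal sketch, assuming that poppler's pdftoppm executable being on PATH is a reasonable sign that poppler is installed; missing_prerequisites is a hypothetical helper, not something the package provides.

```python
import shutil
import sys


def missing_prerequisites(platform=sys.platform, which=shutil.which):
    """Return a list of problems; an empty list means PDF OCR should be usable."""
    problems = []
    if platform != "darwin":
        problems.append("Apple's Vision framework is only available on macOS")
    # Assumption: pdftoppm, a command-line tool shipped with poppler,
    # is used as the indicator that poppler is installed.
    if which("pdftoppm") is None:
        problems.append("poppler not found; run: brew install poppler")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The platform and which parameters exist only so the check can be exercised without changing the real environment.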
pipx
This is the recommended installation method.
$ pipx install apple-vision-utils
pip
$ pip install apple-vision-utils
uv tool installation doesn't work
I tried installing this with uv tool install using different Python versions on an Apple Silicon Mac, and it didn't work. This may be caused by some peculiarities of the Objective-C interfacing libraries. Just use pipx for now.
Usage
Command Line
$ apple-ocr --help
usage: apple-ocr [-h] [-j] [-p] [-l LANG] [--pdf2image-only] [--pdf2image-dir PDF2IMAGE_DIR] [-s START_MARKER_INCLUSIVE] [-S START_MARKER_EXCLUSIVE] [-e END_MARKER_INCLUSIVE] [-E END_MARKER] [-V] file_path
Extract text from an image or PDF using Apple's Vision framework.
positional arguments:
file_path Path to the image or PDF file.
options:
-h, --help show this help message and exit
-j, --json Output results in JSON format.
-p, --pdf Specify if the input file is a PDF.
-l LANG, --lang LANG Specify the language for text recognition (e.g., eng,
fra, deu, zh-Hans for Simplified Chinese, zh-Hant for
Traditional Chinese). Default is 'zh-Hant', which
works with images containing both Chinese characters
and latin letters.
--pdf2image-only Only convert PDF to images without performing OCR.
--pdf2image-dir PDF2IMAGE_DIR
Specify the directory to store output images. By
default, a secure temporary directory is created.
-s START_MARKER_INCLUSIVE, --start-marker-inclusive START_MARKER_INCLUSIVE
Specify the start marker (included, as the first line of the extracted text) for text extraction in PDF.
-S START_MARKER_EXCLUSIVE, --start-marker-exclusive START_MARKER_EXCLUSIVE
Specify the start marker (excluded, as the first line of the extracted text) for text extraction in PDF.
-e END_MARKER_INCLUSIVE, --end-marker-inclusive END_MARKER_INCLUSIVE
Specify the end marker (included, as the last line of the extracted text) for text extraction in PDF.
-E END_MARKER, --end-marker END_MARKER
Specify the end marker (excluded, as the last line of the extracted text) for text extraction in PDF.
-V, --version show program's version number and exit
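The difference between the inclusive and exclusive marker options can be illustrated on a plain list of lines. This is only a sketch of the semantics described in the help text, not the package's implementation. It assumes that a marker matches the first line containing it as a substring, that a missing start marker yields no output, and that a missing end marker keeps everything to the end; clip_lines and its parameters are hypothetical names.

```python
def clip_lines(lines, start=None, start_inclusive=True, end=None, end_inclusive=False):
    """Return the slice of lines between a start marker and an end marker."""
    first = 0
    if start is not None:
        for i, line in enumerate(lines):
            if start in line:
                # Inclusive keeps the marker line, exclusive starts just after it.
                first = i if start_inclusive else i + 1
                break
        else:
            # Assumption: no start marker found means nothing is extracted.
            return []
    last = len(lines)
    if end is not None:
        for j in range(first, len(lines)):
            if end in lines[j]:
                # Inclusive keeps the marker line, exclusive stops just before it.
                last = j + 1 if end_inclusive else j
                break
    return lines[first:last]


page = ["Title", "Abstract", "body 1", "body 2", "References", "ref 1"]
# Comparable to -s Abstract -E References
print(clip_lines(page, start="Abstract", end="References"))
# Comparable to -S Abstract -e References
print(clip_lines(page, start="Abstract", start_inclusive=False,
                 end="References", end_inclusive=True))
```

The first call keeps the "Abstract" line and drops "References"; the second does the opposite.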
As a Library
You can also use the utility functions in your own Python code:
from apple_vision_utils.utils import image_to_text, pdf_to_images, process_pdf, clip_results
# Extract text from an image
results = image_to_text("path/to/image.png", lang="eng")
# Convert PDF to images
images = pdf_to_images("path/to/document.pdf")
# Process PDF for text recognition
pdf_results = process_pdf("path/to/document.pdf", lang="eng")
# Clip text results based on markers
clipped_results = clip_results(results, start_marker_inclusive="Start", end_marker_exclusive="End")
Develop
$ git clone https://github.com/tddschn/apple-vision-utils.git
$ cd apple-vision-utils
$ poetry install
Test
# in the root of the project
poetry install
poetry shell
cd tests && ./test.sh