Read inventory cards, extract images, and create structured data.
Project description
inventory-ocr
This is a tool to extract structured data from fixed layout documents such as inventory cards, inventory books or tables. It requires layout definitions provided in inventory_ocr/config/regions.yaml. Currently, it is optimized to work with a collection of jewelry from the Schmuckmuseum Pforzheim, but with minor adaptations it should be adaptable to other use cases.
Specifically, to adapt this for your own data:
- Define a different document layout in
inventory_ocr/config/regions.yaml - Implement your own post-processing by subclassing the abstract PostProcessor class in
inventory_ocr/postprocessor.py
Installation
Set up your environment using your favourite environment management tool. We recommend venv.
MacOS
Installation using pip install inventory-ocr will enable the usage of the apple vision API for ocr.
Linux/Windows
Install using pip install inventory-ocr[pero] or pip install inventory-ocr[mistral] depending on the OCR backend you want to use.
Usage
Invoke using inventory-ocr <input folder> with input folder specifying a path to the inventory cards to be processed. Results will be saved to output/ relative to the current working directory.
Command Line Arguments
inventory-ocr <input_dir> [options]
Required Arguments:
input_dir: Path to the input directory containing files to process
Optional Arguments:
--output_dir: Path to the output directory (default:./output)--layout_config: Path to the layout configuration file (default:config/regions.yaml)--annotate: Launch the interactive layout annotation UI to create/save a regions file--ocr_engine: OCR engine to use (choices:auto,ocrmac,pero,mistral,dummy; default:auto)--eval: Run in evaluation mode (uses dummy detector and benchmarking postprocessor)
Annotation / Generating Regions
- The repository contains an example regions file under
inventory_ocr/config/example_regions.yaml. If present, that example will be used by default as the layout definition. - To create your own layout interactively, run the tool with the
--annotateflag. This launches a small Gradio UI that lets you mark regions on a template image. When you press "Extract Layout" the regions are written to the layout configuration path (see--layout_config, defaultinventory_ocr/config/regions.yaml) and the annotation UI will close. - If you prefer not to use the example, simply remove
inventory_ocr/config/example_regions.yaml(or replace it with your ownregions.yaml), or run with--annotateto generate a new regions file which will be saved to the configured--layout_configpath.
Example:
# use the bundled example regions (if present)
inventory-ocr ./data
# interactively generate and save regions to the default config path
inventory-ocr ./data --annotate
# specify a custom path for the generated regions
inventory-ocr ./data --annotate --layout_config ./my_regions.yaml
Generated YAML uses a top-level regions: mapping where each field name maps to a 4-element list [x1, y1, x2, y2] with normalized coordinates (0..1).
Supported OCR Engines
The tool supports multiple OCR engines with automatic platform-based selection:
- PERO OCR (
pero): Default for non-macOS platforms - Apple Vision API (
ocrmac): Default for macOS platforms - Mistral OCR (
mistral): AI-powered OCR using Mistral models - Dummy (
dummy): For development and testing purposes - Auto (
auto): Automatically selectsocrmacon macOS,peroon other platforms
Installing Optional Dependencies
Different OCR engines require additional dependencies:
For PERO OCR (non-macOS default):
pip install inventory-ocr[pero]
For Mistral OCR:
pip install inventory-ocr[mistral]
You'll also need to obtain a Mistral API key and store it in a .env file in your project root:
-
Copy
.env.templateto.env:cp .env.template .env
-
Edit
.envand replaceINSERT_YOUR_KEY_HEREwith your actual Mistral API key:MISTRAL_API_KEY=your_actual_api_key_here
Note: Never share your API key publicly or commit it to version control.
For Apple Vision API (macOS only):
The ocrmac package is automatically installed on macOS systems. No additional installation required.
Note: The auto engine selection will use Apple Vision API on macOS (if available) and PERO OCR on other platforms.
Configuration
Region Definition
Regions (in relative coordinates) and field names are stored in inventory_ocr/config/regions.yaml (or the path you pass via --layout_config) and can be adapted according to the use case.
The region definition format uses normalized coordinates where each region is defined as [x1, y1, x2, y2] with values between 0 and 1:
x1, y1: Top-left corner coordinates (relative to document width and height)x2, y2: Bottom-right corner coordinates (relative to document width and height)
custom_header_mappings: am: 'erworben am' # Maps "am" field to "erworben am" in output
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file inventory_ocr-0.0.2.tar.gz.
File metadata
- Download URL: inventory_ocr-0.0.2.tar.gz
- Upload date:
- Size: 41.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
731587c35dd04e253e64b21baba42e6e868ecd92ca2517e8df4404ec78a37d8b
|
|
| MD5 |
179194a23519b7232cb18d70b7809f61
|
|
| BLAKE2b-256 |
79a728f5cfd9f2601bfdd91b53c5e49b754e8fb8272a9532eab887fa5e7472eb
|
File details
Details for the file inventory_ocr-0.0.2-py3-none-any.whl.
File metadata
- Download URL: inventory_ocr-0.0.2-py3-none-any.whl
- Upload date:
- Size: 41.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d150307194a1303425dab7f593846eaf8b0effc758d6e7f70fecdef125597b08
|
|
| MD5 |
8ff9cb49e2ae1483ef45747bfef2f6e4
|
|
| BLAKE2b-256 |
173f5814d741263eb9572f0620d94a5952c3dc0ed80f7c181ff6bf522f6b25be
|