Handwriting JSON
Turn handwritten forms, notes, and scanned paperwork into automation-ready JSON.
Handwriting JSON is a Python package and CLI for automating handwritten document workflows. It uses vision LLMs and optional schema guidance to convert PDFs/images into structured JSON your applications can use.
It is built for the messy documents that still slow teams down: registration forms, field notes, inspection sheets, school permission slips, donation forms, clinic intake paperwork, KYC forms, surveys, maintenance reports, delivery notes, lab slips, and old scanned records.
The project was inspired by OmmSai, a healthcare automation project that processed roughly 15,000 handwritten prescription files for a charitable healthcare event. Prescriptions are now just one use case: the package targets handwritten-document automation in general.
Why It Exists
Many business workflows still start with paper. A person fills out a form, writes a note, signs a slip, or scans an old record. OCR can return text, but automation usually needs structured data.
Handwriting JSON focuses on the automation step:
handwritten document -> schema-guided extraction -> structured JSON -> downstream workflow
Use it to turn:
- handwritten signup sheets into CRM records
- field notes into tickets
- inspection forms into compliance reports
- school slips into student records
- donation forms into spreadsheets
- clinic intake forms into review queues
- scanned records into searchable JSON
Features
- Extract structured JSON from handwritten PDFs and images.
- Guide extraction with JSON Schema or an example JSON object.
- Use multiple vision LLM providers through LiteLLM.
- Process one document or a directory of documents from the CLI.
- Use the same extraction path from Python code or the command line.
- Keep domain-specific behavior in examples and presets.
Install
pip install handwriting-json
Provider credentials are configured through the environment variables expected by LiteLLM for the model you choose.
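As a rough pre-flight check, the sketch below reports which environment variable appears to be missing for a given model string. The variable names are the ones LiteLLM documents for these three providers; the `missing_credential` helper and its small lookup table are hypothetical and not part of handwriting-json.

```python
import os

# Environment variable names documented by LiteLLM for a few common providers.
# This table is illustrative and deliberately incomplete.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def missing_credential(model, environ=None):
    """Return the unset env var name for `model`, or None if set or unknown."""
    environ = os.environ if environ is None else environ
    # Model strings such as "anthropic/claude-sonnet-4-5" start with the provider.
    provider = model.split("/", 1)[0]
    var_name = PROVIDER_ENV_VARS.get(provider)
    if var_name is None:
        return None  # provider is not covered by this sketch
    return None if environ.get(var_name) else var_name


missing = missing_credential("anthropic/claude-sonnet-4-5")
if missing:
    print(f"Set {missing} before running an extraction.")
```

Providers that need more than one variable (for example cloud-hosted deployments) are out of scope here; check the LiteLLM provider docs for those.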
For local development:
git clone https://github.com/ramdhavepreetam/handwriting-json.git
cd handwriting-json
python3 -m pip install -e ".[dev]"
Python API
from handwriting_json import extract

result = extract(
    "handwritten_registration_form.jpg",
    model="anthropic/claude-sonnet-4-5",
    schema={
        "full_name": "",
        "phone": "",
        "email": "",
        "address": "",
        "date": "",
        "notes": "",
        "signature_present": False,
    },
)

print(result.data)
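Before handing extracted data to a downstream system, it can help to flag fields the model left blank. The following is a minimal sketch that assumes `result.data` is a plain `dict` keyed like the schema; that shape is an assumption, and `empty_fields` is a hypothetical helper rather than something the package provides. A stand-in dict is used so the snippet runs without a model call.

```python
def empty_fields(data, schema):
    """List schema keys whose extracted value is missing or still empty."""
    missing = []
    for key in schema:
        value = data.get(key)
        # False and 0 are legitimate extracted values, so only None, "" and []
        # count as "empty" here.
        if value is None or value == "" or value == []:
            missing.append(key)
    return missing


schema = {"full_name": "", "phone": "", "email": "", "signature_present": False}

# Stand-in for result.data (assumed to be a dict).
data = {"full_name": "Asha Rao", "phone": "", "signature_present": False}

needs_review = empty_fields(data, schema)
print(needs_review)
```

Documents with a non-empty list could be routed to a human review queue instead of being imported automatically.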
CLI
handwriting-json extract \
  --input handwritten_signup_form.jpg \
  --schema examples/signup_form_schema.json \
  --output result.json \
  --model anthropic/claude-sonnet-4-5
Batch mode:
handwriting-json batch \
  --input-dir ./forms \
  --output results.jsonl \
  --model anthropic/claude-sonnet-4-5
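A batch run writes JSON Lines, which many spreadsheet tools cannot open directly. The sketch below converts JSONL text to CSV using only the standard library. It assumes each line is a flat JSON object of extracted fields; the real layout of `results.jsonl` may wrap the fields in an envelope, so adjust accordingly. `jsonl_to_csv` is a hypothetical helper and the sample records are invented for illustration.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_text):
    """Convert JSON Lines text (one object per line) into CSV text."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

    # Column order: union of keys across records, in first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Nested lists/dicts are serialised as JSON so each stays in one cell.
        row = {
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in record.items()
        }
        writer.writerow(row)  # keys absent from a record are left blank
    return buffer.getvalue()


sample = (
    '{"full_name": "Asha Rao", "phone": "555-0100"}\n'
    '{"full_name": "Ben Lee", "email": "ben@example.org"}\n'
)
print(jsonl_to_csv(sample))
```

To convert a real batch output, read the file first, for example `jsonl_to_csv(open("results.jsonl", encoding="utf-8").read())`.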
Check installation:
handwriting-json version
python3 -m handwriting_json --help
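The same check can be made from Python with the standard library, which is handy inside setup scripts. This sketch only looks up the installed distribution metadata; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="handwriting-json"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(found if found else "handwriting-json is not installed")
```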
Example Schemas
Registration form:
{
  "full_name": "",
  "phone": "",
  "email": "",
  "address": "",
  "date": "",
  "notes": "",
  "signature_present": false
}
Field inspection note:
{
  "site_name": "",
  "inspection_date": "",
  "inspector": "",
  "issues": [],
  "recommended_action": "",
  "urgency": ""
}
School permission slip:
{
  "student_name": "",
  "parent_name": "",
  "class": "",
  "event": "",
  "consent_given": false,
  "emergency_contact": ""
}
More examples live in examples/.
Schema Guidance
You can pass either:
- a formal JSON Schema, or
- a simpler example JSON object.
The schema is injected into the prompt so the model knows the desired output shape. When you supply a formal JSON Schema, the extracted output is also validated against it after extraction.
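To make the two input styles concrete, here is a minimal sketch of how a guide could be classified, how an example object could be turned into a loose JSON Schema, and how the result could be placed in a prompt. This is not the package's actual implementation: the detection heuristic, the prompt wording, and the names `is_json_schema`, `schema_from_example`, and `build_prompt` are all assumptions made for illustration.

```python
import json


def is_json_schema(obj):
    """Heuristic: a dict with $schema, or an object type with properties."""
    if not isinstance(obj, dict):
        return False
    if "$schema" in obj:
        return True
    return obj.get("type") == "object" and isinstance(obj.get("properties"), dict)


def schema_from_example(example):
    """Derive a loose JSON Schema from an example value (types only)."""
    if isinstance(example, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(example, int):
        return {"type": "integer"}
    if isinstance(example, float):
        return {"type": "number"}
    if isinstance(example, str):
        return {"type": "string"}
    if isinstance(example, list):
        if example:
            return {"type": "array", "items": schema_from_example(example[0])}
        return {"type": "array"}
    if isinstance(example, dict):
        return {
            "type": "object",
            "properties": {k: schema_from_example(v) for k, v in example.items()},
        }
    return {}  # None or unknown type: accept anything


def build_prompt(guide):
    """Embed the schema (given or derived) in an extraction prompt."""
    schema = guide if is_json_schema(guide) else schema_from_example(guide)
    return (
        "Extract the handwritten content of the attached document.\n"
        "Reply with a single JSON object matching this JSON Schema:\n"
        + json.dumps(schema, indent=2)
    )


print(build_prompt({"site_name": "", "issues": [], "consent_given": False}))
```

A derived schema like this only constrains types. Required fields, formats, and enums still need a hand-written JSON Schema.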
Why This Is Not Just OCR
OCR asks:
What text is visible?
Handwriting JSON asks:
What structured data should this document become so software can use it?
That distinction matters for automation. A CRM, ticketing system, spreadsheet import, compliance workflow, or review queue does not need a paragraph of text. It needs fields.
Why LiteLLM, Not LangChain/LangGraph?
V0.1 is a focused extraction library: normalize input, build a schema-guided prompt, call a vision model, parse JSON, and optionally validate the output.
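The "parse JSON" step deserves care because vision models often wrap their answer in a Markdown code fence or surround it with commentary. The sketch below shows one tolerant way to recover the object. It is an assumption about how such a step might look, not the library's own parser, and `parse_json_reply` is a hypothetical name.

```python
import json
import re

# Built from single backticks so this example can itself sit inside a fence.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_reply(reply):
    """Extract a JSON value from a model reply that may contain extra text."""
    text = reply.strip()

    # Prefer the contents of the first fenced block, if there is one.
    fenced = FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1).strip()

    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the span between the first "{" and the last "}".
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model reply")
        return json.loads(text[start:end + 1])


reply = "Here is the result:\n" + FENCE + 'json\n{"full_name": "Asha Rao"}\n' + FENCE
print(parse_json_reply(reply))
```

If the fallback also fails, the reply is a candidate for the validation repair loop mentioned in the roadmap.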
LiteLLM solves the provider-routing problem without adding orchestration complexity. LangChain or LangGraph may become useful later for multi-step workflows such as OCR fallback, validation repair loops, routing by document type, and human review queues.
Roadmap
- V0.1: Python package, CLI, schema guidance, LiteLLM provider abstraction.
- V0.1.x: stronger examples, provider setup docs, README demos.
- V0.2: checkpointed batch processing and validation repair loop.
- Later: Docker image, REST API mode, OCR fallback, cost reporting, field-level evidence.
Links
- GitHub: https://github.com/ramdhavepreetam/handwriting-json
- PyPI: https://pypi.org/project/handwriting-json/
License
MIT
File details
Details for the file handwriting_json-0.1.1.tar.gz.
File metadata
- Download URL: handwriting_json-0.1.1.tar.gz
- Upload date:
- Size: 17.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6a00510e57aece2eb856871bf6c5d6ee3665882d63abf62a3d461cb31cd50e9 |
| MD5 | 7dbe2fbba70f8af60a01c9ff5e65561f |
| BLAKE2b-256 | cef3a6dbe360dd4ea7918c9e90984b42ac3505e71d4a547fadeae07721db9d01 |
File details
Details for the file handwriting_json-0.1.1-py3-none-any.whl.
File metadata
- Download URL: handwriting_json-0.1.1-py3-none-any.whl
- Upload date:
- Size: 15.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd77004ede1aa316ea5b7caa2fa4c8e4edc1ebaef5adafc660c17008ef01917a |
| MD5 | ff18c75a8cba48f85c5856cdc3614a10 |
| BLAKE2b-256 | 55d25348b9b4c01b89a8d85e8599c5b4d62661a3e6d86282fe53f617364474e8 |
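If you download a release file manually, you can compare its SHA256 digest with the value published for that file. This is a small standard-library sketch; `sha256_of` and `matches_published` are hypothetical helper names, and the expected digests are copied from the file hashes listed for release 0.1.1.

```python
import hashlib

# SHA256 digests as published for the 0.1.1 release files.
PUBLISHED_SHA256 = {
    "handwriting_json-0.1.1.tar.gz":
        "e6a00510e57aece2eb856871bf6c5d6ee3665882d63abf62a3d461cb31cd50e9",
    "handwriting_json-0.1.1-py3-none-any.whl":
        "fd77004ede1aa316ea5b7caa2fa4c8e4edc1ebaef5adafc660c17008ef01917a",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, filename):
    """True if the file at `path` has the digest published for `filename`."""
    return sha256_of(path) == PUBLISHED_SHA256[filename]
```

For example, `matches_published("handwriting_json-0.1.1.tar.gz", "handwriting_json-0.1.1.tar.gz")` should be true for an intact download saved in the current directory.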