
VLM Run Hub for various industry-specific schemas


VLM Run Hub

Website | Platform | Docs | Blog | Discord | Catalog


Welcome to VLM Run Hub, a comprehensive repository of pre-defined Pydantic schemas for extracting structured data from unstructured visual domains such as images, videos, and documents. Designed for Vision Language Models (VLMs) and optimized for real-world use cases, VLM Run Hub simplifies the integration of visual ETL into your workflows.

Example JSON extracted from a driver's license image:
{
  "issuing_state": "MT",
  "license_number": "0812319684104",
  "first_name": "Brenda",
  "middle_name": "Lynn",
  "last_name": "Sample",
  "address": {
    "street": "123 MAIN STREET",
    "city": "HELENA",
    "state": "MT",
    "zip_code": "59601"
  },
  "date_of_birth": "1968-08-04",
  "gender": "F",
  "height": "5'06\"",
  "weight": 150.0,
  "eye_color": "BRO",
  "issue_date": "2015-02-15",
  "expiration_date": "2023-08-04",
  "license_class": "D"
}

💡 Motivation

While vision models like OpenAI's GPT-4o and Anthropic's Claude Vision excel in exploratory tasks like "chat with images," they often lack practicality for automation and integration, where strongly-typed, validated outputs are crucial.

The Structured Outputs API (popularized by GPT-4o, Gemini) addresses this by constraining LLMs to return data in precise, strongly-typed formats such as Pydantic models. This eliminates complex parsing and validation, ensuring outputs conform to expected types and structures. These schemas can be nested and include complex types like lists and dictionaries, enabling seamless integration with existing systems while leveraging the full capabilities of the model.
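To make the idea of nested, strongly-typed output concrete, here is a minimal sketch using Pydantic v2. The `Address` and `DriversLicense` classes below are hypothetical, trimmed-down stand-ins written for illustration; they are not the hub's actual driver's license schema. The payload reuses a few fields from the example JSON at the top of this page.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class DriversLicense(BaseModel):
    # Hypothetical, trimmed-down model; not the hub's real schema.
    issuing_state: str
    license_number: str
    first_name: str
    last_name: str
    address: Address  # nested model
    date_of_birth: date  # ISO string in JSON, parsed into a date
    weight: Optional[float] = None


raw = """
{
  "issuing_state": "MT",
  "license_number": "0812319684104",
  "first_name": "Brenda",
  "last_name": "Sample",
  "address": {
    "street": "123 MAIN STREET",
    "city": "HELENA",
    "state": "MT",
    "zip_code": "59601"
  },
  "date_of_birth": "1968-08-04",
  "weight": 150.0
}
"""

# Parse and validate in one step; fields come back as typed Python objects.
parsed = DriversLicense.model_validate_json(raw)
print(parsed.address.city, parsed.date_of_birth.year)
```

The same model class can then be handed to a provider as the response schema, which is the pattern the Getting Started examples follow.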

🧰 Why use this hub of pre-defined Pydantic schemas?

  • 📚 Easy to use: Pydantic is a well-understood and battle-tested data model for structured data.
  • 🔋 Batteries included: Each schema in this repo has been validated across real-world industry use cases, from healthcare to finance to media, saving you weeks of development effort.
  • 🔍 Automatic Data-validation: Built-in Pydantic validation ensures your extracted data is clean, accurate, and reliable, reducing errors and simplifying downstream workflows.
  • 🔌 Type-safety: With Pydantic's type-safety and compatibility with tools like mypy and pyright, you can build composable, modular systems that are robust and maintainable.
  • 🧰 Model-agnostic: Use the same schema with multiple VLM providers; there is no need to rewrite prompts for different VLMs.
  • 🚀 Optimized for Visual ETL: Purpose-built for extracting structured data from images, videos, and documents, this repo bridges the gap between unstructured data and actionable insights.
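As a rough illustration of the validation point above, the sketch below shows Pydantic v2 rejecting a malformed payload. `LineItem` and `validation_errors` are hypothetical names made up for this example, not part of the hub.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    # Hypothetical model for illustration; not a hub schema.
    description: str
    quantity: int
    unit_price: float


def validation_errors(payload: dict) -> List[str]:
    """Return the dotted locations of fields that failed validation."""
    try:
        LineItem.model_validate(payload)
    except ValidationError as exc:
        return [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    return []


good = {"description": "Parts AAA", "quantity": 1, "unit_price": 100.0}
bad = {"description": "Parts BBB", "quantity": "two", "unit_price": 50.0}

print(validation_errors(good))  # no failing fields
print(validation_errors(bad))   # "quantity" cannot be parsed as an int
```

Collecting error locations like this makes it easy to log or retry only the extractions that failed, instead of letting bad values flow downstream.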

📖 Schema Catalog

The VLM Run Hub maintains a comprehensive catalog of all available schemas in the vlmrun/hub/catalog.yaml file. The catalog is automatically validated to ensure consistency and completeness of schema documentation. See catalog-spec.yaml for the full YAML specification.

| Category | Domains |
| --- | --- |
| 📄 Document Processing | document.bank-statement, document.invoice, document.receipt, document.resume, document.us-drivers-license, document.utility-bill, document.us-passport, document.business-card, document.insurance-claim, document.bank-check, document.request-for-proposal, document.india.aadhaar-card, document.india.pan-card |
| 💰 Accounting & Finance | accounting.form-w2, accounting.form-payslip, finance.balance-sheet |
| 🏥 Healthcare | healthcare.medical-insurance-card, healthcare.hipaa-release, healthcare.pathology-report |
| 🛒 Retail | retail.ecommerce-product-caption, retail.product-catalog, food.nutrition-facts-label |
| 📺 Media | media.tv-news, media.nba-game-state, media.nfl-game-state |
| 🏭 Other Industries | aerospace.remote-sensing, logistics.bill-of-lading, real-estate.lease-agreement, social.twitter-card |

If you have a new schema you want to add to the catalog, please refer to the SCHEMA-GUIDELINES.md for the full guidelines.

🚀 Getting Started

Suppose we want to extract invoice metadata from an invoice image. You can use the Invoice schema defined under vlmrun.hub.schemas.document.invoice with any VLM of your choosing.

For a comprehensive walkthrough of available schemas and their usage, check out our Schema Showcase Notebook.

💾 Installation

pip install vlmrun-hub

With VLM Run Python SDK

import os
from PIL import Image
from vlmrun.client import VLMRun
from vlmrun.client.types import PredictionResponse
from vlmrun.common.utils import download_image

VLMRUN_BASE_URL = os.getenv("VLMRUN_BASE_URL", "https://api.vlm.run/v1")
VLMRUN_API_KEY = os.getenv("VLMRUN_API_KEY", None)

client = VLMRun(base_url=VLMRUN_BASE_URL, api_key=VLMRUN_API_KEY)

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"
image: Image.Image = download_image(IMAGE_URL)

response: PredictionResponse = client.image.generate(
    images=[image],
    domain="document.invoice",
)

With Instructor / OpenAI

import instructor
from openai import OpenAI

from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

client = instructor.from_openai(
    OpenAI(), mode=instructor.Mode.MD_JSON
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Extract the invoice in JSON."},
            # "detail" belongs inside the "image_url" object
            {"type": "image_url", "image_url": {"url": IMAGE_URL, "detail": "auto"}}
        ]}
    ],
    response_model=Invoice,
    temperature=0,
)

JSON Response:
{
  "invoice_id": "9999999",
  "period_start": null,
  "period_end": null,
  "invoice_issue_date": "2023-11-11",
  "invoice_due_date": null,
  "order_id": null,
  "customer_id": null,
  "issuer": "Anytown, USA",
  "issuer_address": {
    "street": "123 Main Street",
    "city": "Anytown",
    "state": "USA",
    "postal_code": "01234",
    "country": null
  },
  "customer": "Fred Davis",
  "customer_email": "email@invoice.com",
  "customer_phone": "(800) 123-4567",
  "customer_billing_address": {
    "street": "1335 Martin Luther King Jr Ave",
    "city": "Dunedin",
    "state": "FL",
    "postal_code": "34698",
    "country": null
  },
  "customer_shipping_address": {
    "street": "249 Windward Passage",
    "city": "Clearwater",
    "state": "FL",
    "postal_code": "33767",
    "country": null
  },
  "items": [
    {
      "description": "Service",
      "quantity": 1,
      "currency": null,
      "unit_price": 200.0,
      "total_price": 200.0
    },
    {
      "description": "Parts AAA",
      "quantity": 1,
      "currency": null,
      "unit_price": 100.0,
      "total_price": 100.0
    },
    {
      "description": "Parts BBB",
      "quantity": 2,
      "currency": null,
      "unit_price": 50.0,
      "total_price": 100.0
    }
  ],
  "subtotal": 400.0,
  "tax": null,
  "total": 400.0,
  "currency": null,
  "notes": "",
  "others": null
}
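Because the response is plain, typed data, downstream checks stay simple. As a rough sketch, the snippet below reconciles the line items with the reported subtotal, using a copy of the relevant fields from the JSON above. The helper name `items_match_subtotal` and the tolerance value are hypothetical choices for this example, not part of the hub.

```python
import math


def items_match_subtotal(invoice: dict, tol: float = 0.01) -> bool:
    """Check that line-item totals add up to the reported subtotal."""
    subtotal = invoice.get("subtotal")
    if subtotal is None:
        return False  # nothing to compare against
    items = invoice.get("items") or []
    # Treat a missing or null total_price as 0.0 so the sum never fails.
    items_total = sum(item.get("total_price") or 0.0 for item in items)
    return math.isclose(items_total, subtotal, abs_tol=tol)


# Relevant fields copied from the JSON response shown above.
invoice = {
    "items": [
        {"description": "Service", "quantity": 1, "unit_price": 200.0, "total_price": 200.0},
        {"description": "Parts AAA", "quantity": 1, "unit_price": 100.0, "total_price": 100.0},
        {"description": "Parts BBB", "quantity": 2, "unit_price": 50.0, "total_price": 100.0},
    ],
    "subtotal": 400.0,
    "total": 400.0,
}

print(items_match_subtotal(invoice))
```

With a validated Invoice instance instead of a raw dict, the same check could presumably run on `response.model_dump()`.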

With OpenAI Structured Outputs API

from openai import OpenAI

from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Extract the invoice in JSON."},
            # "detail" belongs inside the "image_url" object
            {"type": "image_url", "image_url": {"url": IMAGE_URL, "detail": "auto"}}
        ]},
    ],
    response_format=Invoice,
    temperature=0,
)

When working with the OpenAI Structured Outputs API, make sure that response_format is a Pydantic model that uses only the types the API supports.
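One way to check what the API will be asked to honor is to inspect the JSON Schema that Pydantic generates for a model. The sketch below uses two hypothetical stand-in models (`MiniInvoice` and `Address`), not the hub's Invoice schema. Which types count as supported is defined by OpenAI's documentation, not by this snippet.

```python
import json
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):
    # Hypothetical stand-in; not the hub's address model.
    street: Optional[str] = None
    city: Optional[str] = None


class MiniInvoice(BaseModel):
    # Hypothetical stand-in; not the hub's Invoice schema.
    invoice_id: Optional[str] = None
    total: Optional[float] = None
    issuer_address: Optional[Address] = None


schema = MiniInvoice.model_json_schema()

# Top-level field names, and the nested models referenced through "$defs".
print(sorted(schema["properties"]))
print(sorted(schema.get("$defs", {})))

# Dump the full schema to review every type before sending it to a provider.
print(json.dumps(schema, indent=2))
```

The same call, `Invoice.model_json_schema()`, is what the Ollama example passes as `format`, so the dump is also a handy way to compare what different providers receive.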

Locally with Ollama

Note: For certain vlmrun.common utilities, you will need to install our main Python SDK via pip install vlmrun.

from ollama import chat

from vlmrun.common.image import encode_image
from vlmrun.common.utils import remote_image
from vlmrun.hub.schemas.document.invoice import Invoice


IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

img = remote_image(IMAGE_URL)
chat_response = chat(
    model="llama3.2-vision:11b",
    format=Invoice.model_json_schema(),
    messages=[
        {
            "role": "user",
            "content": "Extract the invoice in JSON.",
            "images": [encode_image(img, format="JPEG").split(",")[1]],
        },
    ],
    options={
        "temperature": 0
    },
)
response = Invoice.model_validate_json(
    chat_response.message.content
)

📖 Qualitative Results

We periodically run popular VLMs on each of the examples & schemas in the catalog.yaml file and publish the results in the benchmarks directory.

| Provider | Model | Date | Results |
| --- | --- | --- | --- |
| OpenAI | gpt-4o-2024-11-20 | 2025-01-09 | link |
| OpenAI | gpt-4o-mini-2024-07-18 | 2025-01-09 | link |
| Gemini | gemini-2.0-flash-exp | 2025-01-10 | link |
| Ollama | llama3.2-vision:11b | 2025-01-10 | link |
| Ollama | Qwen2.5-VL-7B-Instruct:Q4_K_M_benxh | 2025-02-20 | link |
| Ollama + Instructor | Qwen2.5-VL-7B-Instruct:Q4_K_M_benxh | 2025-02-20 | link |
| Microsoft | phi-4 | 2025-01-10 | link |

📂 Directory Structure

Schemas are organized by industry for easy navigation:

vlmrun
└── hub
    ├── schemas
    │   ├── <industry>
    │   │   ├── <use-case-1>.py
    │   │   ├── <use-case-2>.py
    │   │   └── ...
    │   ├── aerospace
    │   │   └── remote_sensing.py
    │   ├── document  # all document schemas are here
    │   │   ├── invoice.py
    │   │   ├── us_drivers_license.py
    │   │   └── ...
    │   ├── healthcare
    │   │   └── medical_insurance_card.py
    │   ├── retail
    │   │   └── ecommerce_product_caption.py
    │   └── contrib  # all contributions are welcome here!
    │       └── <schema-name>.py
    └── version.py
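Comparing the catalog's domain names (for example document.us-drivers-license) with the file names in the tree (us_drivers_license.py) suggests a simple naming convention: dots separate packages and hyphens become underscores. The helper below is a sketch of that inferred convention. `domain_to_module` is a hypothetical function, not part of the package, and the mapping is an assumption that may not hold for every catalog entry.

```python
def domain_to_module(domain: str) -> str:
    """Map a catalog domain to the module path it would have under the
    inferred convention (an assumption, not a documented API)."""
    # Python module names cannot contain hyphens, so swap them for underscores.
    return "vlmrun.hub.schemas." + domain.replace("-", "_")


print(domain_to_module("document.invoice"))
print(domain_to_module("document.us-drivers-license"))
```

For document.invoice this yields vlmrun.hub.schemas.document.invoice, which matches the import path used in the Getting Started examples.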

✨ How to Contribute

We're building this hub for the community, and contributions are always welcome! Follow CONTRIBUTING and SCHEMA-GUIDELINES.md to get started.
