vlmrun-hub

VLM Run Hub for various industry-specific schemas

VLM Run Hub

Welcome to VLM Run Hub, a comprehensive repository of pre-defined Pydantic schemas for extracting structured data from unstructured visual domains such as images, videos, and documents. Designed for Vision Language Models (VLMs) and optimized for real-world use cases, VLM Run Hub simplifies the integration of visual ETL into your workflows.

💡 Motivation

While vision models like OpenAI’s GPT-4o and Anthropic’s Claude Vision excel at exploratory tasks like "chat with images," they are often impractical for automation and integration, where strongly-typed, validated outputs are crucial.

The Structured Outputs API (popularized by OpenAI’s GPT-4o and Google’s Gemini) addresses this by constraining the model to return data in a precise, strongly-typed format, such as one defined by a Pydantic model. This removes most hand-written parsing and validation and ensures that outputs conform to the expected types and structure. These schemas can be nested and can include complex types like lists and dictionaries, enabling seamless integration with existing systems while leveraging the full capabilities of the model.

Why use this repo's pre-defined Pydantic schemas?

📚 Easy to use: Pydantic is a well-understood, battle-tested library for defining and validating structured data.
🔋 Batteries included: Each schema in this repo has been validated across real-world industry use cases (from healthcare to finance to media), saving you weeks of development effort.
🔍 Automatic data validation: Built-in Pydantic validation checks that extracted data matches the expected types and structure, catching malformed outputs early and simplifying downstream workflows.
🔌 Type-safety: With Pydantic’s type-safety and compatibility with tools like mypy and pyright, you can build composable, modular systems that are robust and maintainable.
🧰 Model-agnostic: Use the same schema with multiple VLM providers; there is no need to rewrite prompts for different VLMs.
🚀 Optimized for Visual ETL: Purpose-built for extracting structured data from images, videos, and documents, this repo bridges the gap between unstructured data and actionable insights.

🚀 Getting Started

Let's say we want to extract invoice metadata from an invoice image. You can use the Invoice schema defined in vlmrun.hub.schemas.document.invoice with any VLM of your choosing.

With Instructor / OpenAI

import instructor
from openai import OpenAI

from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

# Wrap the OpenAI client so that `response_model` is parsed and validated with Pydantic
client = instructor.from_openai(
    OpenAI(), mode=instructor.Mode.MD_JSON
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Extract the invoice in JSON."},
            # `detail` belongs inside the `image_url` object
            {"type": "image_url", "image_url": {"url": IMAGE_URL, "detail": "auto"}},
        ]}
    ],
    response_model=Invoice,
    temperature=0,
)
# `response` is a validated Invoice instance
print(response.model_dump_json(indent=2))

JSON Response:

{
  "invoice_id": "9999999",
  "period_start": null,
  "period_end": null,
  "invoice_issue_date": "2023-11-11",
  "invoice_due_date": null,
  "order_id": null,
  "customer_id": null,
  "issuer": "Anytown, USA",
  "issuer_address": {
    "street": "123 Main Street",
    "city": "Anytown",
    "state": "USA",
    "postal_code": "01234",
    "country": null
  },
  "customer": "Fred Davis",
  "customer_email": "email@invoice.com",
  "customer_phone": "(800) 123-4567",
  "customer_billing_address": {
    "street": "1335 Martin Luther King Jr Ave",
    "city": "Dunedin",
    "state": "FL",
    "postal_code": "34698",
    "country": null
  },
  "customer_shipping_address": {
    "street": "249 Windward Passage",
    "city": "Clearwater",
    "state": "FL",
    "postal_code": "33767",
    "country": null
  },
  "items": [
    {
      "description": "Service",
      "quantity": 1,
      "currency": null,
      "unit_price": 200.0,
      "total_price": 200.0
    },
    {
      "description": "Parts AAA",
      "quantity": 1,
      "currency": null,
      "unit_price": 100.0,
      "total_price": 100.0
    },
    {
      "description": "Parts BBB",
      "quantity": 2,
      "currency": null,
      "unit_price": 50.0,
      "total_price": 100.0
    }
  ],
  "subtotal": 400.0,
  "tax": null,
  "total": 400.0,
  "currency": null,
  "notes": "",
  "others": null
}
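Because the output is typed, downstream checks become plain Python. As a minimal sketch (not part of vlmrun-hub), the stdlib-only snippet below loads a trimmed copy of the JSON above and checks that each line item's total_price equals quantity * unit_price, that the items sum to the subtotal, and that subtotal plus tax equals the total. The field names come from the output shown; `check_invoice_totals` is a hypothetical helper, and treating a null tax as zero is an assumption.

```python
import json
import math

# Trimmed copy of the extracted invoice above (only the fields this check needs)
INVOICE_JSON = """
{
  "items": [
    {"description": "Service", "quantity": 1, "unit_price": 200.0, "total_price": 200.0},
    {"description": "Parts AAA", "quantity": 1, "unit_price": 100.0, "total_price": 100.0},
    {"description": "Parts BBB", "quantity": 2, "unit_price": 50.0, "total_price": 100.0}
  ],
  "subtotal": 400.0,
  "tax": null,
  "total": 400.0
}
"""


def check_invoice_totals(invoice: dict) -> list[str]:
    """Hypothetical helper: return a list of arithmetic inconsistencies (empty if none)."""
    problems = []
    items_sum = 0.0
    for item in invoice["items"]:
        expected = item["quantity"] * item["unit_price"]
        if not math.isclose(expected, item["total_price"]):
            problems.append(f"{item['description']}: {expected} != {item['total_price']}")
        items_sum += item["total_price"]
    if invoice["subtotal"] is not None and not math.isclose(items_sum, invoice["subtotal"]):
        problems.append(f"items sum {items_sum} != subtotal {invoice['subtotal']}")
    # Assumption: a missing (null) tax is treated as zero
    tax = invoice["tax"] or 0.0
    if invoice["subtotal"] is not None and invoice["total"] is not None:
        if not math.isclose(invoice["subtotal"] + tax, invoice["total"]):
            problems.append(f"subtotal + tax != total {invoice['total']}")
    return problems


invoice = json.loads(INVOICE_JSON)
print(check_invoice_totals(invoice))  # []
```

For this invoice, 1 * 200.0 + 1 * 100.0 + 2 * 50.0 = 400.0, which matches both the subtotal and the total, so no problems are reported.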

With VLM Run

import requests

from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

json_data = {
    "image": IMAGE_URL,
    "model": "vlm-1",
    "domain": "document.invoice",
    # Send the Pydantic model's JSON Schema so the output follows the Invoice structure
    "json_schema": Invoice.model_json_schema(),
}
response = requests.post(
    "https://api.vlm.run/v1/image/generate",
    headers={"Authorization": "Bearer <your-api-key>"},  # replace with your VLM Run API key
    json=json_data,
)
response.raise_for_status()  # raise on HTTP errors, e.g. an invalid API key

With OpenAI Structured Outputs API

from openai import OpenAI

from vlmrun.hub.schemas.document.invoice import Invoice

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Extract the invoice in JSON."},
            # `detail` belongs inside the `image_url` object
            {"type": "image_url", "image_url": {"url": IMAGE_URL, "detail": "auto"}},
        ]},
    ],
    response_format=Invoice,
    temperature=0,
)
# The SDK parses the reply into an Invoice instance (None if the model refused)
invoice = completion.choices[0].message.parsed

When using the OpenAI Structured Outputs API, make sure that response_format is a Pydantic model that uses only the types Structured Outputs supports; the API accepts only a subset of JSON Schema and rejects schemas that fall outside it.

Locally with Ollama

from ollama import chat

from vlmrun.hub.schemas.document.invoice import Invoice
from vlmrun.hub.utils import encode_image, remote_image

IMAGE_URL = "https://storage.googleapis.com/vlm-data-public-prod/hub/examples/document.invoice/invoice_1.jpg"

img = remote_image(IMAGE_URL)  # load the remote image
chat_response = chat(
    model="llama3.2-vision:11b",
    # Constrain the output to the Invoice JSON Schema
    format=Invoice.model_json_schema(),
    messages=[
        {
            "role": "user",
            "content": "Extract the invoice in JSON.",
            # Ollama expects bare base64, so drop the "data:image/jpeg;base64," prefix
            "images": [encode_image(img, format="JPEG").split(",")[1]],
        },
    ],
    options={
        "temperature": 0
    },
)
# Validate the raw JSON string against the Invoice schema
response = Invoice.model_validate_json(
    chat_response.message.content
)
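The `.split(",")[1]` step above exists because `encode_image` appears to return a data URI (`data:image/jpeg;base64,...`), while Ollama's `images` field takes the bare base64 payload. If you would rather not depend on `vlmrun.hub.utils`, a minimal stdlib sketch of the same idea looks like this; `strip_data_uri` and `to_ollama_image` are hypothetical helpers, not part of either library.

```python
import base64


def strip_data_uri(data_uri: str) -> str:
    """Return the base64 payload of a data URI such as 'data:image/jpeg;base64,<payload>'."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:"):
        raise ValueError("expected a data URI of the form 'data:<mime>;base64,<payload>'")
    return payload


def to_ollama_image(image_bytes: bytes) -> str:
    """Encode raw image bytes (e.g. the contents of a JPEG file) as bare base64 text."""
    return base64.b64encode(image_bytes).decode("ascii")


print(strip_data_uri("data:image/jpeg;base64,QUJD"))  # QUJD
print(to_ollama_image(b"ABC"))  # QUJD
```

With a local file, `to_ollama_image(pathlib.Path("invoice_1.jpg").read_bytes())` produces a string that can go directly into the `images` list.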

📖 Schema Catalog

The VLM Run Hub maintains a comprehensive catalog of all available schemas in the vlmrun/hub/catalog.yaml file. This catalog provides:

Domain-specific schema references
Detailed descriptions and prompts
Sample data references
Version information
Metadata including relevant tags

The catalog is automatically validated to ensure consistency and completeness of schema documentation. See catalog-spec.yaml for the full YAML specification.

📖 Qualitative Results

We periodically run popular VLMs on each of the examples and schemas in the catalog.yaml file and publish the results in the benchmarks directory.

Provider	Model	Date	Results
OpenAI + Instructor	gpt-4o-2024-11-20	2025-01-06	link

📂 Directory Structure

Schemas are organized by industry for easy navigation:

vlmrun
└── hub
    ├── schemas
    │   ├── <industry>
    │   │   ├── <use-case-1>.py
    │   │   ├── <use-case-2>.py
    │   │   └── ...
    │   ├── aerospace
    │   │   └── remote_sensing.py
    │   ├── document  # all document schemas are here
    │   │   ├── invoice.py
    │   │   ├── us_drivers_license.py
    │   │   └── ...
    │   ├── healthcare
    │   │   └── medical_insurance_card.py
    │   ├── retail
    │   │   └── ecommerce_product_caption.py
    │   └── contrib  # all contributions are welcome here!
    │       └── <schema-name>.py
    └── version.py
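Judging from the layout above and the `"domain": "document.invoice"` value used in the VLM Run example, a domain name appears to map onto a module under `vlmrun.hub.schemas` by simple prefixing. The sketch below assumes that convention holds for every schema (it is inferred, not documented here); `domain_to_module` and `load_schema` are hypothetical helpers, and the schema class name (e.g. `Invoice`) still has to be supplied by the caller.

```python
import importlib

# Assumed root package, taken from the directory layout above
SCHEMAS_PACKAGE = "vlmrun.hub.schemas"


def domain_to_module(domain: str) -> str:
    """Map a domain such as 'document.invoice' to its assumed module path."""
    parts = domain.split(".")
    if len(parts) < 2 or not all(p.isidentifier() for p in parts):
        raise ValueError(f"not an '<industry>.<use-case>' domain: {domain!r}")
    return f"{SCHEMAS_PACKAGE}.{domain}"


def load_schema(domain: str, class_name: str) -> type:
    """Import the schema module for `domain` and return the class `class_name` from it."""
    module = importlib.import_module(domain_to_module(domain))
    return getattr(module, class_name)


print(domain_to_module("document.invoice"))  # vlmrun.hub.schemas.document.invoice
# With vlmrun-hub installed: Invoice = load_schema("document.invoice", "Invoice")
```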

✨ How to Contribute

We’re building this hub for the community, and contributions are always welcome! Follow the CONTRIBUTING guide and SCHEMA-GUIDELINES.md to get started.

🔗 Quick Links

💬 Send us an email at support@vlm.run or join our Discord for help.
📣 Follow us on Twitter and LinkedIn to keep up to date on our products.

Release history

0.1.35 (Dec 15, 2025)
0.1.34 (Apr 7, 2025)
0.1.33 (Feb 25, 2025)
0.1.31 (Feb 25, 2025)
0.1.30 (Feb 20, 2025)
0.1.29 (Feb 17, 2025)
0.1.28 (Feb 14, 2025)
0.1.27 (Feb 14, 2025)
0.1.26 (Feb 13, 2025)
0.1.25 (Feb 13, 2025)
0.1.24 (Feb 13, 2025)
0.1.22 (Feb 6, 2025)
0.1.20 (Feb 5, 2025)
0.1.19a0, pre-release (Feb 5, 2025)
0.1.18 (Jan 16, 2025)
0.1.17 (Jan 16, 2025)
0.1.16 (Jan 16, 2025)
0.1.15 (Jan 14, 2025)
0.1.14 (Jan 11, 2025)
0.1.13 (Jan 10, 2025)
0.1.12 (Jan 10, 2025)
0.1.11 (Jan 9, 2025), this version
0.1.10 (Jan 6, 2025)
0.1.8 (Jan 5, 2025)
0.1.7 (Jan 5, 2025)
0.1.6 (Jan 5, 2025)
0.1.5 (Jan 5, 2025)
0.1.4 (Jan 4, 2025)
0.1.3 (Jan 3, 2025)
0.1.2 (Dec 24, 2024)
0.1.1 (Dec 24, 2024)
0.1.0 (Dec 18, 2024)

Download files

Source Distribution

vlmrun_hub-0.1.11.tar.gz (29.7 kB, uploaded Jan 9, 2025)

Built Distribution

vlmrun_hub-0.1.11-py3-none-any.whl (19.9 kB, uploaded Jan 9, 2025, Python 3)

File details

Details for the file vlmrun_hub-0.1.11.tar.gz.

File metadata

Download URL: vlmrun_hub-0.1.11.tar.gz
Upload date: Jan 9, 2025
Size: 29.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for vlmrun_hub-0.1.11.tar.gz
Algorithm	Hash digest
SHA256	`2892e2459b64d0b1d4afceb9ada930be8e47b5f231d79eac39e5ef0f6a9b8b84`
MD5	`176f468af2d53e0df3a8fb8e400ee899`
BLAKE2b-256	`9af00e156e0d0990770ac33262db2ebe7750a7d7626699555d3e412168e11633`

File details

Details for the file vlmrun_hub-0.1.11-py3-none-any.whl.

File metadata

Download URL: vlmrun_hub-0.1.11-py3-none-any.whl
Upload date: Jan 9, 2025
Size: 19.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for vlmrun_hub-0.1.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`efe10f68107328d39d15c007b5268af5c2aaae21ebc1efa5fa8c3293a038afb7`
MD5	`1d28afd91de7b347d720474d556536eb`
BLAKE2b-256	`4ea5bcdad6040d1bb65bb4b2176e2f34c0624cf11b0f4f860ea1823efd76bb2d`
