Skip to main content

A fork of 1rgs' jsonformer maintained by Guardrails AI

Project description

Jsonformer: A Bulletproof Way to Generate Structured JSON from Language Models.

Problem: Getting models to output structured JSON is hard

Solution: Only generate the content tokens and fill in the fixed tokens

colab

cover

Generating structured JSON from language models is a challenging task. The generated JSON must be syntactically correct, and it must conform to a schema that specifies the structure of the JSON.

Current approaches to this problem are brittle and error-prone. They rely on prompt engineering, fine-tuning, and post-processing, but they still fail to generate syntactically correct JSON in many cases.

Jsonformer is a new approach to this problem. In structured data, many tokens are fixed and predictable. Jsonformer is a wrapper around Hugging Face models that fills in the fixed tokens during the generation process, and only delegates the generation of content tokens to the language model. This makes it more efficient and bulletproof than existing approaches.

This currently supports a subset of JSON Schema. Below is a list of the supported schema types:

  • number
  • boolean
  • string
  • array
  • object

Example

from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")

json_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}

prompt = "Generate a person's information based on the following schema:"
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()

print(generated_data)

Jsonformer works on complex schemas, even with tiny models. Here is an example of a schema with nested objects and arrays, generated by a 3B parameter model.

{"type": "object", "properties": {"car": {"type": "object", "properties": {"make": {"type": "string"}, "model": {"type": "string"}, "year": {"type": "number"}, "colors": {"type": "array", "items": {"type": "string"}}, "features": {"type": "object", "properties": {"audio": {"type": "object", "properties": {"brand": {"type": "string"}, "speakers": {"type": "number"}, "hasBluetooth": {"type": "boolean"}}}, "safety": {"type": "object", "properties": {"airbags": {"type": "number"}, "parkingSensors": {"type": "boolean"}, "laneAssist": {"type": "boolean"}}}, "performance": {"type": "object", "properties": {"engine": {"type": "string"}, "horsepower": {"type": "number"}, "topSpeed": {"type": "number"}}}}}}}, "owner": {"type": "object", "properties": {"firstName": {"type": "string"}, "lastName": {"type": "string"}, "age": {"type": "number"}}}}}
{
  car: {
    make: "audi",
    model: "model A8",
    year: 2016.0,
    colors: [
      "blue"
    ],
    features: {
      audio: {
        brand: "sony",
        speakers: 2.0,
        hasBluetooth: True
      },
      safety: {
        airbags: 2.0,
        parkingSensors: True,
        laneAssist: True
      },
      performance: {
        engine: "4.0",
        horsepower: 220.0,
        topSpeed: 220.0
      }
    }
  },
  owner: {
    firstName: "John",
    lastName: "Doe",
    age: 40.0
  }
}

Features

  • Bulletproof JSON generation: Jsonformer ensures that the generated JSON is always syntactically correct and conforms to the specified schema.
  • Efficiency: By generating only the content tokens and filling in the fixed tokens, Jsonformer is more efficient than generating a full JSON string and parsing it.
  • Flexible and extendable: Jsonformer is built on top of the Hugging Face transformers library, making it compatible with any model that supports the Hugging Face interface.

Installation

pip install jsonformer

Development

Poetry is used for dependency management.

poetry install
poetry run python -m jsonformer.example

License

Jsonformer is released under the MIT License. You are free to use, modify, and distribute this software for any purpose, commercial or non-commercial, as long as the original copyright and license notice are included.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

guardrails_jsonformer-0.13.1.tar.gz (5.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

guardrails_jsonformer-0.13.1-py3-none-any.whl (6.7 kB view details)

Uploaded Python 3

File details

Details for the file guardrails_jsonformer-0.13.1.tar.gz.

File metadata

  • Download URL: guardrails_jsonformer-0.13.1.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.0 CPython/3.13.7 Linux/6.11.0-1018-azure

File hashes

Hashes for guardrails_jsonformer-0.13.1.tar.gz
Algorithm Hash digest
SHA256 857e7518d7d4251c518e4f567485a90af56adaabe13a685c2f9fc8b5811b53a6
MD5 249f4d2e4f1e227d654384cf34354230
BLAKE2b-256 113a3c14b7d9f5b3be9dadc7b3af9b4641eb0ac965f77f46b2e3be80e12735f5

See more details on using hashes here.

File details

Details for the file guardrails_jsonformer-0.13.1-py3-none-any.whl.

File metadata

File hashes

Hashes for guardrails_jsonformer-0.13.1-py3-none-any.whl
Algorithm Hash digest
SHA256 748f8ef574ae7a2661e3f62f72b6ff86642aa3f717033fc9bb047b39b3dd0ae3
MD5 733602c385255f6b61fea1d8ec48c5f7
BLAKE2b-256 c9fa61e8b5d427d737f23249f42bb5340206bab70ba39e22cdb2c96ef8f78fe5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page