A Python library for dynamic JSON generation based on schemas using language models.

jsonAI

jsonAI is a Python library for generating structured data based on JSON schemas using pre-trained language models. It supports a wide range of data types and four output formats (JSON, XML, YAML, and CSV), which suits applications that need structured data generated on demand.

Features

  • Dynamic JSON Generation: Generate JSON objects based on schemas with support for complex types.
  • Output Formats: Supports JSON, XML, YAML, and CSV.
  • Validation: Validate generated data against schemas.
  • Tool Integration: Execute tools based on generated data.
  • Async Support: Asynchronous generation and tool execution.

Installation

pip install jsonAI

Architecture Overview

The jsonAI library is modular and consists of the following components:

  • Jsonformer: Orchestrates the generation process, handles output formatting, and validates data.
  • TypeGenerator: Generates values for individual data types.
  • OutputFormatter: Converts generated data into the desired format.
  • SchemaValidator: Validates data against JSON schemas.
  • ToolRegistry: Manages tools for execution.
  • AsyncJsonformer: Provides asynchronous support for generation and tool execution.
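To make the TypeGenerator role concrete, here is a self-contained sketch of schema-driven type dispatch. It is not jsonAI's code: `generate` and `fake_value` are hypothetical names, and `fake_value` stands in for the language model by returning a fixed placeholder per type.

```python
from typing import Any, Callable, Dict

def fake_value(type_name: str) -> Any:
    """Hypothetical stand-in for a language model: a fixed placeholder per JSON type."""
    defaults = {"string": "text", "integer": 0, "number": 0.0, "boolean": False, "null": None}
    return defaults[type_name]

def generate(schema: Dict[str, Any], value_source: Callable[[str], Any] = fake_value) -> Any:
    """Recursively walk a JSON schema and produce a value with the matching shape."""
    schema_type = schema.get("type")
    if schema_type == "object":
        # Generate each declared property in order.
        return {name: generate(sub, value_source) for name, sub in schema.get("properties", {}).items()}
    if schema_type == "array":
        # A real generator would let the model choose the length; one item keeps the sketch simple.
        return [generate(schema.get("items", {}), value_source)]
    if "enum" in schema:
        # Enumerated values are chosen from the allowed list, not generated freely.
        return schema["enum"][0]
    return value_source(schema_type)

person = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "isStudent": {"type": "boolean"},
    },
}
print(generate(person))  # {'name': 'text', 'age': 0, 'isStudent': False}
```

A real generator would ask the model for each leaf value; the recursion over objects and arrays is the part this sketch is meant to show.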

Testing

The project includes tests for each component and for the interactions between them:

  • Unit Tests: Test individual components.
  • Integration Tests: Validate the interaction between components.

To run tests:

pytest tests/

Examples

Basic JSON Generation

from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonAI.main import Jsonformer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "isStudent": {"type": "boolean"}
    }
}

prompt = "Generate a person's profile."
jsonformer = Jsonformer(model, tokenizer, schema, prompt)
output = jsonformer()
print(output)
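For a quick sanity check of the returned dict, a minimal validator covering only type, properties, items, and enum might look like the following. It is a hypothetical stand-in, not jsonAI's SchemaValidator; for full JSON Schema support the separate jsonschema package is the usual choice.

```python
from typing import Any, Dict, List

# JSON Schema type name -> accepted Python type(s).
_PY_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return human-readable errors; an empty list means the value conforms.

    Only type, properties, items and enum are checked.
    """
    errors: List[str] = []
    expected = schema.get("type")
    if expected in _PY_TYPES:
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        wrong_bool = isinstance(value, bool) and expected in ("integer", "number")
        if wrong_bool or not isinstance(value, _PY_TYPES[expected]):
            errors.append(f"{path}: expected {expected}, got {type(value).__name__}")
            return errors
    if expected == "object":
        for name, sub in schema.get("properties", {}).items():
            if name in value:  # missing properties are skipped; this sketch ignores "required"
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    elif expected == "array":
        for i, item in enumerate(value):
            errors.extend(validate(item, schema.get("items", {}), f"{path}[{i}]"))
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    return errors

person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "isStudent": {"type": "boolean"},
    },
}
print(validate({"name": "Ada", "age": 36, "isStudent": False}, person_schema))    # []
print(validate({"name": "Ada", "age": "36", "isStudent": False}, person_schema))  # ['$.age: expected integer, got str']
```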

YAML Output

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"}
    }
}
prompt = "Generate a city profile."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="yaml")
output = jsonformer()
print(output)

CSV Output

schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "score": {"type": "number"}
        }
    }
}
prompt = "Generate a list of students and their scores."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="csv")
output = jsonformer()
print(output)
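CSV is inherently tabular, which is why this example uses an array of objects: each object becomes a row and its keys become the header. A standard-library sketch of that mapping, not jsonAI's OutputFormatter (the function name `rows_to_csv` is hypothetical):

```python
import csv
import io
from typing import Any, Dict, List

def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize a list of flat dicts to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

students = [{"name": "Ada", "score": 91.5}, {"name": "Alan", "score": 88.0}]
print(rows_to_csv(students))
```

The output is a `name,score` header followed by one line per student. A nested object or array in a cell would simply be written as its Python string form, which is the practical reason nested schemas do not map well to CSV.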

CLI Example

jsonai generate --schema schema.json --prompt "Generate a product" --output-format json
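The --schema flag in the command above points at a JSON file. One way to create schema.json from Python; the product fields are hypothetical:

```python
import json

# Schema for the CLI example; the property names are illustrative only.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "inStock": {"type": "boolean"},
    },
}

# Write it where the CLI command expects it (the current directory).
with open("schema.json", "w", encoding="utf-8") as f:
    json.dump(product_schema, f, indent=2)
```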

Tool Calling Example

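# Assumes model and tokenizer from the basic example, and ToolRegistry
# imported from the jsonAI package.
def send_email(email):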
    print(f"Sending email to {email}")
    return "Email sent"

tool_registry = ToolRegistry()
tool_registry.register_tool("send_email", send_email)

schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email"}
    },
    "x-jsonai-tool-call": {
        "name": "send_email",
        "arguments": {"email": "email"}
    }
}
prompt = "Generate a user email."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, tool_registry=tool_registry)
output = jsonformer()
print(output)
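The x-jsonai-tool-call block reads as: after generation, call the tool registered under `name`, passing each generated field listed in `arguments`. The sketch below shows that reading with a hypothetical `MiniToolRegistry` and `run_tool_call`; it assumes `arguments` maps parameter names to field names, and it is not jsonAI's ToolRegistry.

```python
from typing import Any, Callable, Dict

class MiniToolRegistry:
    """Hypothetical registry illustrating name -> callable lookup (not jsonAI's ToolRegistry)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register_tool(self, name: str, func: Callable[..., Any]) -> None:
        self._tools[name] = func

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"no tool registered under {name!r}")
        return self._tools[name](**kwargs)

def run_tool_call(schema: Dict[str, Any], data: Dict[str, Any], registry: MiniToolRegistry) -> Any:
    """Map each tool parameter to the generated field it names, then invoke the tool."""
    spec = schema["x-jsonai-tool-call"]
    kwargs = {param: data[field] for param, field in spec["arguments"].items()}
    return registry.call(spec["name"], **kwargs)

registry = MiniToolRegistry()
registry.register_tool("send_email", lambda email: f"Email sent to {email}")
email_schema = {
    "type": "object",
    "properties": {"email": {"type": "string", "format": "email"}},
    "x-jsonai-tool-call": {"name": "send_email", "arguments": {"email": "email"}},
}
print(run_tool_call(email_schema, {"email": "ada@example.com"}, registry))  # Email sent to ada@example.com
```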

MCP Integration Example

def mcp_callback(tool_name, server_name, kwargs):
    # Simulate MCP call
    return f"Called {tool_name} on {server_name} with {kwargs}"

schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"}
    },
    "x-jsonai-tool-call": {
        "name": "search_tool",
        "arguments": {"query": "query"}
    }
}
prompt = "Generate a search query."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, mcp_callback=mcp_callback)
output = jsonformer()
print(output)
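For tests, it can help to pass a callback that records what would be sent instead of contacting a server. This sketch keeps the (tool_name, server_name, kwargs) signature from the example above; `RecordingMcpCallback` and the server name "demo-server" are hypothetical, since how jsonAI chooses the server name is not described here. An instance would be passed as `mcp_callback=callback`.

```python
from typing import Any, Dict, List, Tuple

class RecordingMcpCallback:
    """Test double with the (tool_name, server_name, kwargs) signature used above.

    It records each call instead of contacting an MCP server, so tests can
    assert on what would have been sent.
    """

    def __init__(self, reply: str = "ok") -> None:
        self.reply = reply
        self.calls: List[Tuple[str, str, Dict[str, Any]]] = []

    def __call__(self, tool_name: str, server_name: str, kwargs: Dict[str, Any]) -> str:
        # Copy kwargs so later mutation by the caller does not alter the record.
        self.calls.append((tool_name, server_name, dict(kwargs)))
        return self.reply

callback = RecordingMcpCallback(reply="3 results")
print(callback("search_tool", "demo-server", {"query": "json schema"}))  # 3 results
print(callback.calls)
```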

Complex Schema Example

schema = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {
                "id": {"type": "uuid"},
                "name": {"type": "string"},
                "email": {"type": "string", "format": "email"}
            }
        },
        "roles": {
            "type": "array",
            "items": {"type": "string", "enum": ["admin", "user", "guest"]}
        },
        "profile": {
            "oneOf": [
                {"type": "object", "properties": {"age": {"type": "integer"}}},
                {"type": "object", "properties": {"birthdate": {"type": "date"}}}
            ]
        }
    },
    "x-jsonai-tool-call": {
        "name": "send_welcome_email",
        "arguments": {"email": "user.email"}
    }
}
# ...set up model, tokenizer, prompt, and a tool_registry with send_welcome_email registered...
jsonformer = Jsonformer(model, tokenizer, schema, prompt, tool_registry=tool_registry)
output = jsonformer()
print(output)
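Here the tool argument "user.email" appears to be a dotted path into the nested user object. Assuming that reading, resolving such a path against the generated data could be sketched like this (`resolve_path` and the sample data are hypothetical):

```python
from functools import reduce
from typing import Any, Dict

def resolve_path(data: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as "user.email" through nested dicts."""
    return reduce(lambda node, key: node[key], dotted.split("."), data)

generated = {
    "user": {"id": "123e4567-e89b-12d3-a456-426614174000", "name": "Ada", "email": "ada@example.com"},
    "roles": ["admin"],
}
tool_spec = {"name": "send_welcome_email", "arguments": {"email": "user.email"}}

# Build the keyword arguments the tool would receive.
kwargs = {param: resolve_path(generated, path) for param, path in tool_spec["arguments"].items()}
print(kwargs)  # {'email': 'ada@example.com'}
```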

XML Output

schema = {
    "type": "object",
    "properties": {
        "book": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "author": {"type": "string"},
                "year": {"type": "integer"}
            }
        }
    }
}

prompt = "Generate details for a book."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="xml")
output = jsonformer()
print(output)
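JSON has no single canonical XML form, so any converter has to pick conventions. The standard-library sketch below uses one possible set: a `root` wrapper element, an `item` element per list entry, and an empty element for null. These are assumptions for illustration; jsonAI's XML layout may differ.

```python
import xml.etree.ElementTree as ET
from typing import Any

def to_element(tag: str, value: Any) -> ET.Element:
    """Build an XML element from a nested dict/list/scalar (one possible mapping)."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_element(key, child))
    elif isinstance(value, list):
        # Lists have no native XML form; wrap each entry in an <item> element.
        for item in value:
            element.append(to_element("item", item))
    elif value is not None:
        element.text = str(value)
    # None leaves the element empty, which serializes the same as an empty string.
    return element

book = {"book": {"title": "Dune", "author": "Frank Herbert", "year": 1965}}
print(ET.tostring(to_element("root", book), encoding="unicode"))
# <root><book><title>Dune</title><author>Frank Herbert</author><year>1965</year></book></root>
```

The null case is exactly the caveat flagged for XML in the coverage table: an empty element cannot tell null apart from an empty string without an extra convention.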

Output Format × Type Coverage

Type       Example                                 JSON  XML  YAML  CSV*
number     3.14                                    ✅    ✅   ✅    ✅
integer    42                                      ✅    ✅   ✅    ✅
boolean    true                                    ✅    ✅   ✅    ✅
string     "hello"                                 ✅    ✅   ✅    ✅
datetime   "2023-06-29T12:00:00Z"                  ✅    ✅   ✅    ✅
date       "2023-06-29"                            ✅    ✅   ✅    ✅
time       "12:00:00"                              ✅    ✅   ✅    ✅
uuid       "123e4567-e89b-12d3-a456-426614174000"  ✅    ✅   ✅    ✅
binary     "SGVsbG8="                              ✅    ✅   ✅    ✅
null       null                                    ✅    ⚠️   ✅    ⚠️
array      [1,2,3]                                 ✅    ✅   ✅    ⚠️
object     {"a":1}                                 ✅    ✅   ✅    ⚠️
enum       "red"                                   ✅    ✅   ✅    ✅
p_enum     "blue"                                  ✅    ✅   ✅    ✅
p_integer  7                                       ✅    ✅   ✅    ✅

✅ = Supported
⚠️ = Supported with caveats (e.g., nulls in XML/CSV, arrays/objects in CSV)
* CSV: only arrays of objects (tabular data) are practical.
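The ⚠️ for null under CSV reflects a limitation of CSV itself rather than anything jsonAI-specific: there is no standard null marker. Python's csv module, for example, writes None as an empty field, so it becomes indistinguishable from an empty string:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["name", "nickname"])
writer.writerow(["Ada", None])  # csv writes None as an empty field
writer.writerow(["Alan", ""])   # an empty string is written the same way
print(buffer.getvalue())

# Reading the text back yields "" for both rows; the None is lost.
print(list(csv.reader(io.StringIO(buffer.getvalue()))))
```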

Integrations & Capabilities

  • LLM Integration: Use with HuggingFace Transformers, OpenAI, vLLM, Ollama, etc.
  • FastAPI: Serve generation endpoints via FastAPI (see examples/fastapi_example.py).
  • Tool Registry: Register and call Python or MCP tools from schemas.
  • Async Support: Use AsyncJsonformer for async workflows.

See the examples/ directory for more advanced usage and integration patterns.

License

This project is licensed under the MIT License.

Streaming Support

jsonAI now supports streaming data generation for real-time applications. Use the stream_generate_data method in Jsonformer or AsyncJsonformer to generate data incrementally.

Example

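import asyncio

# Assumes Jsonformer and AsyncJsonformer are imported from jsonAI and that
# model_backend, json_schema and prompt are already defined.

# Streaming with Jsonformer: each chunk is printed as soon as it is produced
jsonformer = Jsonformer(model_backend, json_schema, prompt)
for data_chunk in jsonformer.stream_generate_data():
    print(data_chunk)

# Streaming with AsyncJsonformer, which wraps an existing Jsonformer
async def async_stream():
    async_jsonformer = AsyncJsonformer(jsonformer)
    async for data_chunk in async_jsonformer.stream_generate_data():
        print(data_chunk)

asyncio.run(async_stream())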
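The AsyncJsonformer example wraps an existing Jsonformer. One common way to expose a blocking generator as an async iterator, shown here as a generic pattern rather than AsyncJsonformer's actual implementation, is to run each `next()` call in a worker thread with `asyncio.to_thread` (Python 3.9+). `fake_stream` and its chunk shapes are hypothetical, as the README does not specify what a chunk contains.

```python
import asyncio
from typing import AsyncIterator, Iterator, List, TypeVar

T = TypeVar("T")
_SENTINEL = object()  # marks exhaustion without relying on StopIteration crossing threads

async def aiter_from_sync(iterator: Iterator[T]) -> AsyncIterator[T]:
    """Yield items from a blocking iterator without blocking the event loop."""
    while True:
        # next(iterator, _SENTINEL) runs in a worker thread; the loop stays free meanwhile.
        item = await asyncio.to_thread(next, iterator, _SENTINEL)
        if item is _SENTINEL:
            return
        yield item

def fake_stream() -> Iterator[dict]:
    # Hypothetical stand-in for a stream of partial results.
    yield {"name": "Ada"}
    yield {"name": "Ada", "age": 36}

async def main() -> List[dict]:
    chunks = []
    async for chunk in aiter_from_sync(fake_stream()):
        chunks.append(chunk)
    return chunks

print(asyncio.run(main()))  # [{'name': 'Ada'}, {'name': 'Ada', 'age': 36}]
```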


Download files

Source Distribution

jsonai-0.13.0.tar.gz (21.1 kB)

Uploaded Source

Built Distribution

jsonai-0.13.0-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file jsonai-0.13.0.tar.gz.

File metadata

  • Download URL: jsonai-0.13.0.tar.gz
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.5 Linux/6.11.0-1015-azure

File hashes

Hashes for jsonai-0.13.0.tar.gz
Algorithm Hash digest
SHA256 14ec7e163917443b010eb0038648319e7d114e8b67e3a618110f2a8d9d4c7d18
MD5 b4926c6ef9e391bc1b9c4e08e910fc0d
BLAKE2b-256 582b829f18cf9e4230b3b8da060c0dcb2891d463752388f9745105bb1d19a669
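To check a downloaded archive against the SHA256 digest listed above, the standard-library hashlib module is enough. The sketch assumes the sdist is in the current directory under its original file name; pip can also enforce digests at install time through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# SHA256 for jsonai-0.13.0.tar.gz, copied from the hash table above.
EXPECTED = "14ec7e163917443b010eb0038648319e7d114e8b67e3a618110f2a8d9d4c7d18"

archive = Path("jsonai-0.13.0.tar.gz")
if archive.exists():
    print("OK" if sha256_of(str(archive)) == EXPECTED else "MISMATCH")
```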

File details

Details for the file jsonai-0.13.0-py3-none-any.whl.

File metadata

  • Download URL: jsonai-0.13.0-py3-none-any.whl
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.5 Linux/6.11.0-1015-azure

File hashes

Hashes for jsonai-0.13.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3f0d17f923e1d1f8366f5e35fb1de0dd60f3de922150e57bf93e41421da00fa7
MD5 2caefee167b169edf385cf4c05759c6d
BLAKE2b-256 005c6058f06fdc866d7d99f51aecc2a6aa2db8e2814082e69708a5f6c4f29381
