A Python library for dynamic JSON generation based on schemas using language models.
jsonAI
Table of Contents
- Features
- Installation
- Architecture Overview
- Testing
- Examples
- Output Format × Type Coverage
- Integrations & Capabilities
- License
jsonAI is a Python library for generating structured data based on JSON schemas using pre-trained language models. It supports a wide range of data types and output formats, making it ideal for applications requiring dynamic data generation.
Features
- Dynamic JSON Generation: Generate JSON objects based on schemas with support for complex types.
- Output Formats: Supports JSON, XML, YAML, and CSV.
- Validation: Validate generated data against schemas.
- Tool Integration: Execute tools based on generated data.
- Async Support: Asynchronous generation and tool execution.
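To make the validation feature concrete, here is a minimal sketch of checking a generated object against the `type` keywords of a schema. It uses only the standard library. `check_types` and `JSON_TYPES` are hypothetical names written for this illustration, not part of jsonAI, and the sketch covers only a small subset of JSON Schema.

```python
# Hypothetical helper for illustration; not part of the jsonAI API.
# Maps JSON Schema type names to the Python types that satisfy them.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_types(data, schema, path="$"):
    """Return a list of type errors found in `data` (empty if none)."""
    errors = []
    expected = schema.get("type")
    py_type = JSON_TYPES.get(expected)
    # bool is a subclass of int in Python, so reject it for numeric types.
    is_bool_as_number = isinstance(data, bool) and expected in ("integer", "number")
    if py_type is not None and (not isinstance(data, py_type) or is_bool_as_number):
        errors.append(f"{path}: expected {expected}, got {type(data).__name__}")
        return errors
    if expected == "object":
        for key, subschema in schema.get("properties", {}).items():
            if key in data:
                errors.extend(check_types(data[key], subschema, f"{path}.{key}"))
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(data):
            errors.extend(check_types(item, schema["items"], f"{path}[{i}]"))
    return errors


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "isStudent": {"type": "boolean"},
    },
}

good = {"name": "Ada", "age": 36, "isStudent": False}
bad = {"name": "Ada", "age": "36", "isStudent": False}

print(check_types(good, schema))
print(check_types(bad, schema))
```

Returning a list of errors rather than raising on the first one makes it easier to report every mismatch in a generated object at once.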
Installation
pip install jsonAI
Architecture Overview
The jsonAI library is modular and consists of the following components:
- Jsonformer: Orchestrates the generation process, handles output formatting, and validates data.
- TypeGenerator: Generates values for individual data types.
- OutputFormatter: Converts generated data into the desired format.
- SchemaValidator: Validates data against JSON schemas.
- ToolRegistry: Manages tools for execution.
- AsyncJsonformer: Provides asynchronous support for generation and tool execution.
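The division of labour between an orchestrator and a per-type generator can be sketched as follows. This is a toy model under stated assumptions: `StubTypeGenerator` and `generate` are hypothetical names, and the stub returns fixed placeholder values where the real library would query a language model.

```python
# Toy model of schema-driven generation; all names here are hypothetical
# stand-ins, not jsonAI's real classes or signatures.
import json


class StubTypeGenerator:
    """Produces a value per primitive type (fixed placeholders, no model)."""

    def generate(self, type_name):
        placeholders = {"string": "text", "integer": 0, "number": 0.0, "boolean": False}
        return placeholders[type_name]


def generate(schema, type_generator):
    """Walk the schema and delegate each primitive to the type generator."""
    kind = schema["type"]
    if kind == "object":
        return {
            key: generate(sub, type_generator)
            for key, sub in schema.get("properties", {}).items()
        }
    if kind == "array":
        # Emit a single element; a real generator decides the length itself.
        return [generate(schema["items"], type_generator)]
    return type_generator.generate(kind)


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "age": {"type": "integer"},
    },
}

data = generate(schema, StubTypeGenerator())
print(json.dumps(data))  # formatting step: plain JSON in this sketch
```

Because the schema fixes the structure and only the leaf values come from the generator, the result has the declared keys by construction.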
Testing
The project includes comprehensive tests for each component and integration:
- Unit Tests: Test individual components.
- Integration Tests: Validate the interaction between components.
To run tests:
pytest tests/
Examples
Basic JSON Generation
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonAI.main import Jsonformer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "isStudent": {"type": "boolean"}
    }
}
prompt = "Generate a person's profile."

jsonformer = Jsonformer(model, tokenizer, schema, prompt)
output = jsonformer()
print(output)
YAML Output
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"}
    }
}
prompt = "Generate a city profile."

jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="yaml")
output = jsonformer()
print(output)
CSV Output
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "score": {"type": "number"}
        }
    }
}
prompt = "Generate a list of students and their scores."

jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="csv")
output = jsonformer()
print(output)
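The CSV example uses an array of flat objects because that shape maps directly onto rows and columns. The sketch below shows that mapping with the standard `csv` module. `rows_to_csv` is a hypothetical helper and the sample rows are hand-written, so this illustrates the idea and is not jsonAI's own formatter.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of flat dicts to CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # Column order follows the keys of the first row.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written sample data standing in for generated output.
students = [
    {"name": "Ana", "score": 91.5},
    {"name": "Ben", "score": 78.0},
]

csv_text = rows_to_csv(students)
print(csv_text)
```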
CLI Example
jsonai generate --schema schema.json --prompt "Generate a product" --output-format json
Tool Calling Example
# Assumes ToolRegistry has been imported from jsonAI and that model and
# tokenizer are set up as in the basic example.
def send_email(email):
    print(f"Sending email to {email}")
    return "Email sent"

tool_registry = ToolRegistry()
tool_registry.register_tool("send_email", send_email)

schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email"}
    },
    # Names the tool to call and maps its argument to a generated field.
    "x-jsonai-tool-call": {
        "name": "send_email",
        "arguments": {"email": "email"}
    }
}
prompt = "Generate a user email."

jsonformer = Jsonformer(model, tokenizer, schema, prompt, tool_registry=tool_registry)
output = jsonformer()
print(output)
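One plausible reading of the `x-jsonai-tool-call` block is that `name` selects a registered tool and `arguments` maps each tool parameter to a field of the generated object. The sketch below implements that reading with the standard library only. `MiniRegistry` and `dispatch_tool_call` are hypothetical names, and jsonAI's own `ToolRegistry` may behave differently.

```python
# Hypothetical re-implementation for illustration; jsonAI's ToolRegistry
# and its dispatch logic may differ.
class MiniRegistry:
    def __init__(self):
        self._tools = {}

    def register_tool(self, name, func):
        self._tools[name] = func

    def get_tool(self, name):
        return self._tools[name]


def dispatch_tool_call(schema, data, registry):
    """Call the tool named in the schema with arguments taken from `data`."""
    call = schema["x-jsonai-tool-call"]
    tool = registry.get_tool(call["name"])
    # Each entry maps a tool parameter name to a top-level field in `data`.
    kwargs = {param: data[field] for param, field in call["arguments"].items()}
    return tool(**kwargs)


def send_email(email):
    return f"Email sent to {email}"


registry = MiniRegistry()
registry.register_tool("send_email", send_email)

schema = {
    "type": "object",
    "properties": {"email": {"type": "string", "format": "email"}},
    "x-jsonai-tool-call": {"name": "send_email", "arguments": {"email": "email"}},
}
generated = {"email": "user@example.com"}  # stand-in for model output

result = dispatch_tool_call(schema, generated, registry)
print(result)
```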
MCP Integration Example
# Assumes model, tokenizer, and prompt are defined as in the earlier examples.
def mcp_callback(tool_name, server_name, kwargs):
    # Simulate MCP call
    return f"Called {tool_name} on {server_name} with {kwargs}"

schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"}
    },
    "x-jsonai-tool-call": {
        "name": "search_tool",
        "arguments": {"query": "query"}
    }
}

jsonformer = Jsonformer(model, tokenizer, schema, prompt, mcp_callback=mcp_callback)
output = jsonformer()
print(output)
Complex Schema Example
schema = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {
                "id": {"type": "uuid"},
                "name": {"type": "string"},
                "email": {"type": "string", "format": "email"}
            }
        },
        "roles": {
            "type": "array",
            "items": {"type": "string", "enum": ["admin", "user", "guest"]}
        },
        "profile": {
            "oneOf": [
                {"type": "object", "properties": {"age": {"type": "integer"}}},
                {"type": "object", "properties": {"birthdate": {"type": "date"}}}
            ]
        }
    },
    "x-jsonai-tool-call": {
        "name": "send_welcome_email",
        "arguments": {"email": "user.email"}
    }
}
# ...setup model, tokenizer, tool_registry, etc...
jsonformer = Jsonformer(model, tokenizer, schema, prompt, tool_registry=tool_registry)
output = jsonformer()
print(output)
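The argument mapping `"user.email"` in the schema above looks like a dotted path into the nested output. Assuming that interpretation, such a path could be resolved as in this sketch. `resolve_path` is a hypothetical helper and the sample object is hand-written.

```python
from functools import reduce


def resolve_path(data, dotted_path):
    """Follow a dotted path such as "user.email" through nested dicts."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), data)


# Hand-written stand-in for generated output matching the schema above.
generated = {
    "user": {
        "id": "123e4567-e89b-12d3-a456-426614174000",
        "name": "Ada",
        "email": "ada@example.com",
    },
    "roles": ["admin"],
    "profile": {"age": 36},
}

email = resolve_path(generated, "user.email")
print(email)
```

A missing key raises `KeyError` here, which surfaces a schema and output mismatch before any tool is called.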
XML Output
schema = {
    "type": "object",
    "properties": {
        "book": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "author": {"type": "string"},
                "year": {"type": "integer"}
            }
        }
    }
}
prompt = "Generate details for a book."

jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="xml")
output = jsonformer()
print(output)
Output Format × Type Coverage
| Type | Example | JSON | XML | YAML | CSV* |
|---|---|---|---|---|---|
| number | 3.14 | ✅ | ✅ | ✅ | ✅ |
| integer | 42 | ✅ | ✅ | ✅ | ✅ |
| boolean | true | ✅ | ✅ | ✅ | ✅ |
| string | "hello" | ✅ | ✅ | ✅ | ✅ |
| datetime | "2023-06-29T12:00:00Z" | ✅ | ✅ | ✅ | ✅ |
| date | "2023-06-29" | ✅ | ✅ | ✅ | ✅ |
| time | "12:00:00" | ✅ | ✅ | ✅ | ✅ |
| uuid | "123e4567-e89b-12d3-a456-426614174000" | ✅ | ✅ | ✅ | ✅ |
| binary | "SGVsbG8=" | ✅ | ✅ | ✅ | ✅ |
| null | null | ✅ | (⚠️) | ✅ | (⚠️) |
| array | [1,2,3] | ✅ | ✅ | ✅ | (⚠️) |
| object | {"a":1} | ✅ | ✅ | ✅ | (⚠️) |
| enum | "red" | ✅ | ✅ | ✅ | ✅ |
| p_enum | "blue" | ✅ | ✅ | ✅ | ✅ |
| p_integer | 7 | ✅ | ✅ | ✅ | ✅ |
✅ = Supported
⚠️ = Supported with caveats (e.g., nulls in XML/CSV, arrays/objects in CSV)
*CSV: Only arrays of objects (tabular) are practical.
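The null caveat for XML arises because XML has no native null value, so a converter has to pick a convention. The sketch below uses `xml.etree.ElementTree` and marks nulls with a `nil="true"` attribute. Both that convention and the `to_xml` helper are assumptions made for illustration, not necessarily what jsonAI emits.

```python
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Convert a JSON-like value to an Element, marking None explicitly."""
    element = ET.Element(tag)
    if value is None:
        # Assumed convention: an attribute distinguishes null from "".
        element.set("nil", "true")
    elif isinstance(value, dict):
        for key, child in value.items():
            element.append(to_xml(key, child))
    elif isinstance(value, list):
        for child in value:
            element.append(to_xml("item", child))
    elif isinstance(value, bool):
        element.text = "true" if value else "false"
    else:
        element.text = str(value)
    return element


record = {"title": "Dune", "subtitle": None, "year": 1965}
xml_text = ET.tostring(to_xml("book", record), encoding="unicode")
print(xml_text)
```

Without some marker of this kind, a null field and an empty string both serialize to an empty element and cannot be told apart when the XML is read back.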
Integrations & Capabilities
- LLM Integration: Use with HuggingFace Transformers, OpenAI, vLLM, Ollama, etc.
- FastAPI: Serve generation endpoints via FastAPI (see examples/fastapi_example.py).
- Tool Registry: Register and call Python or MCP tools from schemas.
- Async Support: Use AsyncJsonformer for async workflows.
See the examples/ directory for more advanced usage and integration patterns.
License
This project is licensed under the MIT License.
Streaming Support
jsonAI now supports streaming data generation for real-time applications. Use the stream_generate_data method in Jsonformer or AsyncJsonformer to generate data incrementally.
Example
import asyncio

# Assumes Jsonformer and AsyncJsonformer have been imported from jsonAI and
# that model_backend, json_schema, and prompt are already defined.

# Streaming with Jsonformer
jsonformer = Jsonformer(model_backend, json_schema, prompt)
for data_chunk in jsonformer.stream_generate_data():
    print(data_chunk)

# Streaming with AsyncJsonformer
async def async_stream():
    async_jsonformer = AsyncJsonformer(jsonformer)
    async for data_chunk in async_jsonformer.stream_generate_data():
        print(data_chunk)

asyncio.run(async_stream())
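The incremental pattern behind the two loops above can be illustrated with a plain generator and an async generator wrapper. `stream_fields`, `astream_fields`, and `collect` are hypothetical stand-ins. What the real `stream_generate_data` yields per chunk is not specified here, so the "growing partial object" shape is an assumption.

```python
import asyncio


def stream_fields(record):
    """Yield a growing partial object, one field at a time."""
    partial = {}
    for key, value in record.items():
        partial[key] = value
        yield dict(partial)  # copy so earlier chunks are not mutated


async def astream_fields(record):
    """Async wrapper that yields the same chunks cooperatively."""
    for chunk in stream_fields(record):
        await asyncio.sleep(0)  # give other tasks a chance to run
        yield chunk


async def collect(record):
    return [chunk async for chunk in astream_fields(record)]


record = {"name": "Ada", "age": 36}  # stand-in for model output
sync_chunks = list(stream_fields(record))
async_chunks = asyncio.run(collect(record))
print(sync_chunks)
print(async_chunks)
```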