A Python library for dynamic, schema-driven JSON generation using language models.
JsonAI - Production-Ready Structured JSON Generation with LLMs
JsonAI is a comprehensive Python library for generating structured JSON data using Large Language Models (LLMs). It provides enterprise-grade features including robust JSON schema validation, multiple model backends, REST API, React frontend, CLI interface, and production deployment configurations.
🚀 Features
Core Capabilities
- Multiple LLM Backends: Support for Ollama, OpenAI, and HuggingFace models
- Complete JSON Schema Support: All JSON schema types including primitives, arrays, objects, enums, and complex nested structures
- Performance Optimization: Advanced caching, batch processing, and async operations
- Production Ready: Docker deployment, Kubernetes configs, monitoring, and scaling
Interfaces & APIs
- REST API: FastAPI-based service with OpenAPI documentation
- React Frontend: Modern web interface for JSON generation
- CLI Interface: Powerful command-line tools for automation and batch processing
- Python Library: Direct programmatic access with async support
Enterprise Features
- Caching System: Intelligent multi-level caching with TTL and LRU strategies
- Batch Processing: Concurrent processing of multiple requests
- Performance Monitoring: Built-in metrics and performance tracking
- Schema Validation: Comprehensive validation with custom rules support
- Multiple Output Formats: JSON, YAML, XML, and CSV support
📦 Installation
Option 1: pip (Recommended)
pip install jsonai
Option 2: From Source
git clone https://github.com/yourusername/JsonAI.git
cd JsonAI
poetry install
Option 3: Docker
# Quick start with Docker
docker run -p 8000:8000 jsonai:latest
# Full stack with Docker Compose
docker-compose up -d
Architecture Overview
The jsonAI library is modular and consists of the following components:
- Jsonformer: Orchestrates the generation process, handles output formatting, and validates data.
- TypeGenerator: Generates values for individual data types.
- OutputFormatter: Converts generated data into the desired format.
- SchemaValidator: Validates data against JSON schemas.
- ToolRegistry: Manages tools for execution.
- AsyncJsonformer: Provides asynchronous support for generation and tool execution.
Testing
The project includes comprehensive tests for each component and integration:
- Unit Tests: Test individual components.
- Integration Tests: Validate the interaction between components.
To run tests:
pytest tests/
Examples
Basic JSON Generation
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonAI.main import Jsonformer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "isStudent": {"type": "boolean"}
    }
}
prompt = "Generate a person's profile."
jsonformer = Jsonformer(model, tokenizer, schema, prompt)
output = jsonformer()
print(output)
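Independently of the backend, you can sanity-check generated output against the schema yourself. The helper below is a minimal illustrative sketch covering only a small subset of JSON Schema (object, string, integer, boolean); it is not jsonAI's SchemaValidator:

```python
def conforms(value, schema):
    """Minimal structural check for a small JSON Schema subset.

    Illustration only -- covers object/string/integer/boolean and
    ignores everything else (required, formats, enums, ...).
    """
    t = schema.get("type")
    if t == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        return all(k not in value or conforms(value[k], s) for k, s in props.items())
    checks = {"string": str, "integer": int, "boolean": bool}
    if t in checks:
        # bool is a subclass of int in Python, so reject it for "integer"
        if t == "integer" and isinstance(value, bool):
            return False
        return isinstance(value, checks[t])
    return True  # unknown types pass in this sketch

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "isStudent": {"type": "boolean"}
    }
}
print(conforms({"name": "Alice", "age": 30, "isStudent": False}, schema))  # True
```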
YAML Output
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"}
    }
}
prompt = "Generate a city profile."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="yaml")
output = jsonformer()
print(output)
CSV Output
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "score": {"type": "number"}
        }
    }
}
prompt = "Generate a list of students and their scores."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="csv")
output = jsonformer()
print(output)
CLI Example
Basic CLI Usage
python -m jsonAI.cli generate --schema schema.json --prompt "Generate a product" --output-format json
Using the Ollama Backend (Recommended)
python -m jsonAI.cli generate --schema complex_schema.json --prompt "Generate a comprehensive person profile as JSON." --use-ollama --ollama-model qwen3:1.7b
Features
- Robustly extracts the first valid JSON object from any LLM output (even if wrapped in tags or surrounded by extra text)
- Supports all JSON schema types: primitives, enums, arrays, objects, null, oneOf, nested/complex
- Validates output against the schema and warns if invalid
- Pretty-prints objects/arrays, prints primitives/null as-is
- Production-ready for any schema and LLM output style
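The extraction behavior described above can be approximated with the standard library alone. The sketch below is an illustration of the idea, not jsonAI's actual implementation: it scans the raw model text for the first position where a complete JSON value decodes.

```python
import json

def extract_first_json(text):
    """Return the first valid JSON object or array found in text, or None.

    Illustrative sketch of 'robust extraction' from LLM output that may be
    wrapped in tags or surrounded by extra prose.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _ = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue
    return None

raw = '<think>Sure, here is the profile:</think> {"name": "Ada", "age": 36} Hope this helps!'
print(extract_first_json(raw))  # {'name': 'Ada', 'age': 36}
```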
Example Output
{
  "id": "profile with all supported JSON schema types.",
  "name": "re",
  "age": 30,
  "is_active": true,
  "email": "example@example.com",
  "roles": ["admin", "user"],
  "address": {"street": "123 Main St", "city": "Anytown", "zip": "12345", "country": "USA"},
  "preferences": {"newsletter": true, "theme": "dark", "language": "en"},
  "tags": ["tech", "developer"],
  "score": 95,
  "metadata": {"key1": "value1", "key2": "value2"},
  "status": "active",
  "history": [{"date": "2023-01-01", "event": "joined", "details": "Account created"}],
  "profile_picture": "https://example.com/avatar.jpg",
  "settings": {"notifications": true, "privacy": "private"},
  "null_field": null
}
See complex_schema.json for a comprehensive schema example.
Tool Calling Example
from jsonAI.tool_registry import ToolRegistry

def send_email(email):
    print(f"Sending email to {email}")
    return "Email sent"

tool_registry = ToolRegistry()
tool_registry.register_tool("send_email", send_email)

schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email"}
    },
    "x-jsonai-tool-call": {
        "name": "send_email",
        "arguments": {"email": "email"}
    }
}
prompt = "Generate a user email."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, tool_registry=tool_registry)
output = jsonformer()
print(output)
MCP Integration Example
def mcp_callback(tool_name, server_name, kwargs):
    # Simulate an MCP call
    return f"Called {tool_name} on {server_name} with {kwargs}"

schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"}
    },
    "x-jsonai-tool-call": {
        "name": "search_tool",
        "arguments": {"query": "query"}
    }
}

prompt = "Generate a search query."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, mcp_callback=mcp_callback)
output = jsonformer()
print(output)
Complex Schema Example
schema = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {
                "id": {"type": "uuid"},
                "name": {"type": "string"},
                "email": {"type": "string", "format": "email"}
            }
        },
        "roles": {
            "type": "array",
            "items": {"type": "string", "enum": ["admin", "user", "guest"]}
        },
        "profile": {
            "oneOf": [
                {"type": "object", "properties": {"age": {"type": "integer"}}},
                {"type": "object", "properties": {"birthdate": {"type": "date"}}}
            ]
        }
    },
    "x-jsonai-tool-call": {
        "name": "send_welcome_email",
        "arguments": {"email": "user.email"}
    }
}
# ...setup model, tokenizer, tool_registry, etc...
jsonformer = Jsonformer(model, tokenizer, schema, prompt, tool_registry=tool_registry)
output = jsonformer()
print(output)
XML Output
schema = {
    "type": "object",
    "properties": {
        "book": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "author": {"type": "string"},
                "year": {"type": "integer"}
            }
        }
    }
}
prompt = "Generate details for a book."
jsonformer = Jsonformer(model, tokenizer, schema, prompt, output_format="xml")
output = jsonformer()
print(output)
Tool Chaining Example
You can chain multiple tools together using the x-jsonai-tool-chain schema key. Each tool in the chain receives arguments from the generated data and/or previous tool outputs.
from jsonAI.main import Jsonformer
from jsonAI.tool_registry import ToolRegistry
def add(x, y):
    return {"sum": x + y}

def multiply(sum, factor):
    return {"product": sum * factor}

registry = ToolRegistry()
registry.register_tool("add", add)
registry.register_tool("multiply", multiply)

schema = {
    "type": "object",
    "properties": {
        "x": {"type": "integer"},
        "y": {"type": "integer"},
        "factor": {"type": "integer"}
    },
    "x-jsonai-tool-chain": [
        {
            "name": "add",
            "arguments": {"x": "x", "y": "y"}
        },
        {
            "name": "multiply",
            "arguments": {"sum": "sum", "factor": "factor"}
        }
    ]
}

prompt = "Calculate (x + y) * factor."
jsonformer = Jsonformer(
    model_backend=None,  # Not used in this example
    json_schema=schema,
    prompt=prompt,
    tool_registry=registry
)
# Provide input data (simulate generated data)
jsonformer.value = {"x": 2, "y": 3, "factor": 4}
generated = jsonformer.generate_data()
result = jsonformer._execute_tool_call(generated)
print(result)
# Output will include all intermediate and final tool results.
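The chaining semantics -- each step drawing its arguments from the generated data merged with earlier tool outputs -- can be sketched in plain Python. The run_tool_chain helper below is hypothetical and for illustration only; it is not part of the library:

```python
def run_tool_chain(data, chain, tools):
    """Resolve each step's arguments from the generated data plus all
    prior tool outputs, mirroring the x-jsonai-tool-chain idea above."""
    context = dict(data)
    results = []
    for step in chain:
        # Map each parameter name to a key in the accumulated context.
        kwargs = {param: context[source] for param, source in step["arguments"].items()}
        output = tools[step["name"]](**kwargs)
        context.update(output)  # later steps can reference these keys
        results.append(output)
    return results

tools = {
    "add": lambda x, y: {"sum": x + y},
    "multiply": lambda sum, factor: {"product": sum * factor},
}
chain = [
    {"name": "add", "arguments": {"x": "x", "y": "y"}},
    {"name": "multiply", "arguments": {"sum": "sum", "factor": "factor"}},
]
print(run_tool_chain({"x": 2, "y": 3, "factor": 4}, chain, tools))
# [{'sum': 5}, {'product': 20}]
```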
Output Format × Type Coverage
| Type | Example | JSON | XML | YAML | CSV* |
|---|---|---|---|---|---|
| number | 3.14 | ✅ | ✅ | ✅ | ✅ |
| integer | 42 | ✅ | ✅ | ✅ | ✅ |
| boolean | true | ✅ | ✅ | ✅ | ✅ |
| string | "hello" | ✅ | ✅ | ✅ | ✅ |
| datetime | "2023-06-29T12:00:00Z" | ✅ | ✅ | ✅ | ✅ |
| date | "2023-06-29" | ✅ | ✅ | ✅ | ✅ |
| time | "12:00:00" | ✅ | ✅ | ✅ | ✅ |
| uuid | "123e4567-e89b-12d3-a456-426614174000" | ✅ | ✅ | ✅ | ✅ |
| binary | "SGVsbG8=" | ✅ | ✅ | ✅ | ✅ |
| null | null | ✅ | (⚠️) | ✅ | (⚠️) |
| array | [1,2,3] | ✅ | ✅ | ✅ | (⚠️) |
| object | {"a":1} | ✅ | ✅ | ✅ | (⚠️) |
| enum | "red" | ✅ | ✅ | ✅ | ✅ |
| p_enum | "blue" | ✅ | ✅ | ✅ | ✅ |
| p_integer | 7 | ✅ | ✅ | ✅ | ✅ |
✅ = Supported. ⚠️ = Supported with caveats (e.g., nulls in XML/CSV, arrays/objects in CSV). *CSV: only arrays of objects (tabular data) are practical.
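The CSV caveat is easy to see in practice: only an array of flat objects maps cleanly onto rows and columns. A standard-library sketch of that tabular case:

```python
import csv
import io

# An array of flat objects -- the one shape that serializes cleanly to CSV.
rows = [
    {"name": "Alice", "score": 91.5},
    {"name": "Bob", "score": 78.0},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "score"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Nested objects or arrays inside a row have no natural column representation, which is why they carry a caveat in the table above.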
Integrations & Capabilities
- LLM Integration: Use with HuggingFace Transformers, OpenAI, vLLM, Ollama, etc.
- FastAPI: Serve generation endpoints via FastAPI (see examples/fastapi_example.py).
- Tool Registry: Register and call Python or MCP tools from schemas.
- Async Support: Use AsyncJsonformer for async workflows.
See the examples/ directory for more advanced usage and integration patterns.
License
This project is licensed under the MIT License.
Streaming Support
jsonAI now supports streaming data generation for real-time applications. Use the stream_generate_data method in Jsonformer or AsyncJsonformer to generate data incrementally.
Example
# Streaming with Jsonformer
jsonformer = Jsonformer(model_backend, json_schema, prompt)
for data_chunk in jsonformer.stream_generate_data():
    print(data_chunk)

# Streaming with AsyncJsonformer
import asyncio

async def async_stream():
    async_jsonformer = AsyncJsonformer(jsonformer)
    async for data_chunk in async_jsonformer.stream_generate_data():
        print(data_chunk)

asyncio.run(async_stream())
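Conceptually, streaming just means iterating over partial results as fields are filled in. The generator below is a stub stand-in for stream_generate_data, for illustration only:

```python
def stream_generate_data():
    """Stub stand-in for Jsonformer.stream_generate_data: yields a snapshot
    of the object as each field is filled in."""
    partial = {}
    for key, value in [("name", "Ada"), ("age", 36), ("isStudent", False)]:
        partial[key] = value
        yield dict(partial)  # copy, so consumers can keep each snapshot

chunks = list(stream_generate_data())
print(chunks[-1])  # {'name': 'Ada', 'age': 36, 'isStudent': False}
```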
File details
Details for the file jsonai-0.15.0.tar.gz.
File metadata
- Download URL: jsonai-0.15.0.tar.gz
- Upload date:
- Size: 32.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.13.5 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a198e25598aae5f22bc124a8abdb1086a07b9363477b056756fd7f161ab89e06 |
| MD5 | 446eecc02b1bc9df97e58481d465d96c |
| BLAKE2b-256 | fce661c5bdf8c1ecea346fe6985331892ecf9b50b71b0267188bf78a7c0e3010 |
File details
Details for the file jsonai-0.15.0-py3-none-any.whl.
File metadata
- Download URL: jsonai-0.15.0-py3-none-any.whl
- Upload date:
- Size: 36.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.13.5 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 87d3af51ba6d0353454bda4e28ea450543985670fe3f7875f34c7f0dd7df141c |
| MD5 | b873fad3581dd6b3ddc83f5dd5b22d2d |
| BLAKE2b-256 | 3045d076760a08dd59880cb65c9eb1fed746799372a5773a9bf932a7a9c8533a |