A library for generating structured JSON using GPT-4o.
Project description
jsonpaws
jsonpaws is an open-source Python library for generating structured and consistent JSON outputs using GPT-4o. It provides modes for analysis, synthesis, and image analysis to extract and generate structured JSON data from unstructured text or based on a given schema.
Features
- Analysis Mode: Extract structured information from unstructured text. Perfect for data extraction and transformation tasks.
- Synthesis Mode: Generate realistic structured JSON data from a specified schema, ideal for creating synthetic datasets for simulations or testing.
- Image Analysis Mode: Analyze images to extract structured data and generate JSON outputs, making it suitable for computer vision applications.
- Customizable: Allows users to configure the OpenAI model and temperature settings for tailored data generation.
- Easy Integration: Seamlessly integrates into existing Python projects with a straightforward API and minimal setup.
Installation
Install jsonpaws using pip:
pip install json_paws
Getting Started
Setting the API Key
To use jsonpaws, you'll need an OpenAI API key. You can set it as an environment variable or pass it directly to the library.
Environment Variable
export OPENAI_API_KEY=your_api_key
Directly in Your Code
import openai
openai.api_key = "your_api_key"
Importing jsonpaws
Begin by importing the necessary components from the library:
from jsonpaws import JSONSchemaParser, PromptGenerator, ContentGenerator, JSONProcessor
Usage
Analysis Mode
In Analysis Mode, jsonpaws extracts structured data from unstructured text using a predefined JSON schema.
Example
import json
from jsonpaws import JSONSchemaParser, PromptGenerator, ContentGenerator, JSONProcessor
# Set your OpenAI API key here
api_key = "OPENAI-API-KEY"
# Define the JSON schema
json_schema = {
"type": "object",
"properties": {
"report_date": {"type": "string"},
"patients": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {"type": "string"},
"firstName": {"type": "string"},
"lastName": {"type": "string"},
"age": {"type": "number", "minimum": 0, "maximum": 120},
"gender": {"type": "string", "enum": ["male", "female"]},
"diagnosis": {"type": "string"},
"medications": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
}
}
# Initialize the schema parser
schema_parser = JSONSchemaParser(json_schema)
# For analysis mode
instructions = """
Extract patient information from the text and populate the JSON schema accordingly.
The report dated August 4, 2024, details the patient records for the day.
Among the patients, there is John Doe, a 45-year-old male diagnosed with hypertension, who is currently prescribed lisinopril and atorvastatin.
Another patient, Jane Smith, a 30-year-old female, has been diagnosed with diabetes and is on metformin.
The list also includes Sam Brown, a 60-year-old male suffering from arthritis, for which he is taking ibuprofen and methotrexate.
"""
prompt_generator_analysis = PromptGenerator(mode='analysis')
content_generator_analysis = ContentGenerator(api_key=api_key, model='gpt-4o', mode='analysis', instructions=instructions)
analysis_processor = JSONProcessor(schema_parser, prompt_generator_analysis, content_generator_analysis, mode='analysis')
# Process the data
generated_json_analysis = analysis_processor.process(instructions=instructions, schema=json_schema)
# Print the extracted structured JSON
print("Generated JSON (Analysis):", json.dumps(generated_json_analysis, indent=4))
Output
Generated JSON (Analysis): {
"report_date": "August 4, 2024",
"patients": [
{
"id": "1",
"firstName": "John",
"lastName": "Doe",
"age": 45,
"gender": "male",
"diagnosis": "hypertension",
"medications": [
"lisinopril",
"atorvastatin"
]
},
{
"id": "2",
"firstName": "Jane",
"lastName": "Smith",
"age": 30,
"gender": "female",
"diagnosis": "diabetes",
"medications": [
"metformin"
]
},
{
"id": "3",
"firstName": "Sam",
"lastName": "Brown",
"age": 60,
"gender": "male",
"diagnosis": "arthritis",
"medications": [
"ibuprofen",
"methotrexate"
]
}
]
}
Synthesis Mode
In Synthesis Mode, jsonpaws generates realistic structured JSON data from a specified schema.
Example
import json
from jsonpaws import JSONSchemaParser, PromptGenerator, ContentGenerator, JSONProcessor
# Set your OpenAI API key here
api_key = "OPENAI-API-KEY"
# Define the JSON schema
json_schema = {
"type": "object",
"properties": {
"report_date": {"type": "string"},
"patients": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {"type": "string"},
"firstName": {"type": "string"},
"lastName": {"type": "string"},
"age": {"type": "number", "minimum": 0, "maximum": 120},
"gender": {"type": "string", "enum": ["male", "female"]},
"diagnosis": {"type": "string"},
"medications": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
}
}
# Initialize the schema parser
schema_parser = JSONSchemaParser(json_schema)
# Instructions for synthesis mode
instructions = """
Generate a JSON with a report date and a list of patients, where each patient has fields like id, firstName, lastName, age, gender, diagnosis, and medications.
"""
# For synthesis mode
prompt_generator_synthesis = PromptGenerator(mode='synthesis')
content_generator_synthesis = ContentGenerator(api_key=api_key, model='gpt-4o', mode='synthesis', instructions=instructions)
synthesis_processor = JSONProcessor(schema_parser, prompt_generator_synthesis, content_generator_synthesis, mode='synthesis')
# Generate the synthetic JSON data
generated_json_synthesis = synthesis_processor.process(instructions=instructions, schema=json_schema)
# Print the generated synthetic JSON
print("Generated JSON (Synthesis):", json.dumps(generated_json_synthesis, indent=4))
Output
Generated JSON (Synthesis): {
"report_date": {
"report_date": "2023-10-18",
"patients": [
{
"id": "P001",
"firstName": "John",
"lastName": "Doe",
"age": 45,
"gender": "male",
"diagnosis": "Hypertension",
"medications": [
"Lisinopril",
"Amlodipine"
]
},
{
"id": "P002",
"firstName": "Jane",
"lastName": "Smith",
"age": 30,
"gender": "female",
"diagnosis": "Diabetes Type 2",
"medications": [
"Metformin",
"Glyburide"
]
},
{
"id": "P003",
"firstName": "Emily",
"lastName": "Johnson",
"age": 60,
"gender": "female",
"diagnosis": "Osteoarthritis",
"medications": [
"Ibuprofen",
"Glucosamine"
]
},
{
"id": "P004",
"firstName": "Michael",
"lastName": "Brown",
"age": 72,
"gender": "male",
"diagnosis": "Congestive Heart Failure",
"medications": [
"Furosemide",
"Digoxin"
]
}
]
},
"patients": {
"report_date": "2023-10-15",
"patients": [
{
"id": "P001",
"firstName": "John",
"lastName": "Doe",
"age": 45,
"gender": "male",
"diagnosis": "Hypertension",
"medications": [
"Lisinopril",
"Amlodipine"
]
},
{
"id": "P002",
"firstName": "Jane",
"lastName": "Smith",
"age": 34,
"gender": "female",
"diagnosis": "Type 2 Diabetes",
"medications": [
"Metformin",
"Glipizide"
]
},
{
"id": "P003",
"firstName": "Emily",
"lastName": "Johnson",
"age": 28,
"gender": "female",
"diagnosis": "Anxiety Disorder",
"medications": [
"Sertraline"
]
},
{
"id": "P004",
"firstName": "Michael",
"lastName": "Williams",
"age": 62,
"gender": "male",
"diagnosis": "Chronic Obstructive Pulmonary Disease",
"medications": [
"Tiotropium",
"Albuterol"
]
},
{
"id": "P005",
"firstName": "David",
"lastName": "Brown",
"age": 50,
"gender": "male",
"diagnosis": "Hyperlipidemia",
"medications": [
"Atorvastatin"
]
},
{
"id": "P006",
"firstName": "Sophia",
"lastName": "Davis",
"age": 40,
"gender": "female",
"diagnosis": "Asthma",
"medications": [
"Fluticasone",
"Salbutamol"
]
},
{
"id": "P007",
"firstName": "Daniel",
"lastName": "Miller",
"age": 75,
"gender": "male",
"diagnosis": "Heart Failure",
"medications": [
"Furosemide",
"Carvedilol"
]
},
{
"id": "P008",
"firstName": "Olivia",
"lastName": "Garcia",
"age": 22,
"gender": "female",
"diagnosis": "Major Depressive Disorder",
"medications": [
"Citalopram"
]
},
{
"id": "P009",
"firstName": "James",
"lastName": "Martinez",
"age": 30,
"gender": "male",
"diagnosis": "Bipolar Disorder",
"medications": [
"Lithium",
"Lamotrigine"
]
},
{
"id": "P010",
"firstName": "Ava",
"lastName": "Hernandez",
"age": 55,
"gender": "female",
"diagnosis": "Osteoarthritis",
"medications": [
"Ibuprofen",
"Glucosamine"
]
}
]
}
}
Image Analysis Mode
In Image Analysis Mode, jsonpaws analyzes images to extract structured JSON data.
Example
import json
from jsonpaws import JSONSchemaParser, PromptGenerator, ContentGenerator, JSONProcessor
# Set your OpenAI API key here
api_key = 'OPENAI-API-KEY'
# Define the JSON schema
json_schema = {
"type": "object",
"properties": {
"logos_in_image": {
"type": "array",
"items": {
"type": "string"
}
},
"scene": {
"type": "array",
"items": {
"type": "string"
}
},
"demographics": {
"type": "object",
"properties": {
"age_group": {
"type": "array",
"items": {
"type": "string",
"enum": [
"Under 5 years", "5 to 9 years", "10 to 14 years",
"15 to 19 years", "20 to 24 years", "25 to 29 years",
"30 to 34 years", "35 to 39 years", "40 to 44 years",
"45 to 49 years", "50 to 54 years", "55 to 59 years",
"60 to 64 years", "65 to 69 years", "70 to 74 years",
"75 to 79 years", "80 to 84 years", "85 years and over"
]
}
},
"races": {
"type": "array",
"items": {
"type": "string",
"enum": [
"White", "African American", "American Indian", "Asian", "Other"
]
}
},
"gender": {
"type": "array",
"items": {
"type": "string",
"enum": ["Male", "Female", "Non-binary", "Other"]
}
}
}
},
"objects": {
"type": "array",
"items": {
"type": "string"
}
},
"activity": {
"type": "array",
"items": {
"type": "string"
}
}
}
}
# Instructions for image analysis
instructions = '''
Analyze the following image and provide the results in JSON
'''
image_url = "https://cdn1.picuki.com/hosted-by-instagram/q/0exhNuNYnjBGZDHIdN5WmL9I2PwkAQ9OKftSQ7e71yJjMBhsLH6QvJA0mpCl6yRxIwVgFDeSYztj4oouUFlQAz17NEfWT7yKRTxV7q2RVean0Fph9JBikLc0LnIbZ3ap8MQvU2bABCxWFOkXULjh7uZDu7%7C%7CzNnZSyWaRMdsCmWYK4dv1CPol9YIosuzX2A3a5YcOLCkX+2UyMEgvsNzX5DwDWeKiYIMm66d5R%7C%7CkKiMQB5aHgnjH+LmMpRG1%7C%7CA23O6tqHoOAAuizgd2ge8VGoXo0fNk4G2WTsvDgntakgpIajOctq0Ppl4qLWFGQEDWpp9005xJG7liKaazL43hdQxjLXkOfmI6dgo5H9eNKtau24nAThT5D%7C%7CNf1PXnhSV7GDFVDUfaXmOOlgt61KCuhegEql+yeRQLnGxCxdIwNYuT+CIsNmcunN5vyiwiLcii+ChQs%7C%7CqP39dLYBngh%7C%7Cppyuz1Y9RnLFOttGP2mEgAl7EIY=.jpeg"
# Initialize components for image analysis
schema_parser = JSONSchemaParser(json_schema)
prompt_generator = PromptGenerator(mode='image')
content_generator = ContentGenerator(api_key=api_key, mode='image', instructions=instructions, temperature=0)
json_processor = JSONProcessor(
schema_parser=schema_parser,
prompt_generator=prompt_generator,
content_generator=content_generator,
mode='image'
)
result = json_processor.process(instructions=instructions, schema=json_schema, image_url=image_url)
print("Generated JSON (Image Analysis):", json.dumps(result, indent=4))
Output
Generated JSON (Image Analysis): {
"logos_in_image": [
"Adidas"
],
"scene": [
"indoor",
"cafe",
"relaxing area"
],
"demographics": {
"age_group": [
"20 to 24 years"
],
"races": [
"Other"
],
"gender": [
"Female"
]
},
"objects": [
"green smoothie",
"couch",
"plants",
"sunglasses"
],
"activity": [
"sitting",
"drinking"
]
}
Customization
jsonpaws allows users to customize the OpenAI model and temperature settings:
- Model: Specify the model to use (e.g., gpt-4o, gpt-4o-mini).
- Temperature: Control the randomness of the output. A higher temperature results in more random output. Example
# Customize the content generator
content_generator_custom = ContentGenerator(
api_key=api_key,
model='gpt-4o',
mode='synthesis',
temperature=0.8 # Higher temperature for more randomness
)
# Use the custom content generator in your processor
synthesis_processor_custom = JSONProcessor(schema_parser, prompt_generator_synthesis, content_generator_custom, mode='synthesis')
# Generate the synthetic JSON data
generated_json_custom = synthesis_processor_custom.process(data={}, schema=json_schema)
print("Generated JSON (Custom):", json.dumps(generated_json_custom, indent=4))
Contributing
Contributions are welcome! Please submit a pull request or open an issue to discuss potential improvements or features.
License
This project is licensed under the MIT License.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file json_paws-0.1.9.tar.gz
.
File metadata
- Download URL: json_paws-0.1.9.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b11a7610fa6de209e943ea67f4bd9ed37930f44b6ff180227c646a13dcbf8100 |
|
MD5 | cfc5701e42b476a24416285d30bf55c7 |
|
BLAKE2b-256 | 9b99d719257164bd0a460cbae73d7f9b9aa6b91a6a908790d830ff7116644499 |
File details
Details for the file json_paws-0.1.9-py3-none-any.whl
.
File metadata
- Download URL: json_paws-0.1.9-py3-none-any.whl
- Upload date:
- Size: 12.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | df035c05ec666435c58a110a9afb575ec4f6d24eec0f0c9df4777724a0b5a23e |
|
MD5 | 97a317be281cda9a2a7dde78fa261048 |
|
BLAKE2b-256 | 8aec46994dcd1cd528df5bfa3b3b35e037a7975ee3c87c10e6f74fe18ac3e8e0 |