A library for generating structured JSON using GPT-4o.

jsonpaws

jsonpaws is an open-source Python library for generating structured, consistent JSON output with GPT-4o. It offers three modes: analysis and image analysis extract structured JSON from unstructured text or images, and synthesis generates JSON data from a given schema.

Features

  • Analysis Mode: Extract structured information from unstructured text. Perfect for data extraction and transformation tasks.
  • Synthesis Mode: Generate realistic structured JSON data from a specified schema, ideal for creating synthetic datasets for simulations or testing.
  • Image Analysis Mode: Analyze images to extract structured data and generate JSON outputs, making it suitable for computer vision applications.
  • Customizable: Allows users to configure the OpenAI model and temperature settings for tailored data generation.
  • Easy Integration: Seamlessly integrates into existing Python projects with a straightforward API and minimal setup.

Installation

Install jsonpaws using pip:

pip install json_paws

Getting Started

Setting the API Key

To use jsonpaws, you'll need an OpenAI API key. You can set it as an environment variable or pass it directly to the library.

Environment Variable

export OPENAI_API_KEY=your_api_key

Directly in Your Code

import openai

openai.api_key = "your_api_key"
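
The examples below pass the key to ContentGenerator through its api_key argument. A minimal sketch of reading that key from the environment variable instead of hard-coding it might look like the following; get_openai_api_key is a hypothetical helper and not part of jsonpaws.

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in env_var, or raise a clear error.

    Hypothetical helper for illustration; not part of jsonpaws.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running the examples."
        )
    return api_key


# Typical use, once the variable has been exported:
# api_key = get_openai_api_key()
```

The returned value can then stand in for the "OPENAI-API-KEY" placeholder used in the examples, which keeps the secret out of source control.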

Importing jsonpaws

Begin by importing the necessary components from the library:

from jsonpaws import JSONSchemaParser, PromptGenerator, ContentGenerator, JSONProcessor

Usage

Analysis Mode

In Analysis Mode, jsonpaws extracts structured data from unstructured text using a predefined JSON schema.

Example

import json
from jsonpaws import JSONSchemaParser, PromptGenerator, ContentGenerator, JSONProcessor

# Set your OpenAI API key here
api_key = "OPENAI-API-KEY"

# Define the JSON schema
json_schema = {
    "type": "object",
    "properties": {
        "report_date": {"type": "string"},
        "patients": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "firstName": {"type": "string"},
                    "lastName": {"type": "string"},
                    "age": {"type": "number", "minimum": 0, "maximum": 120},
                    "gender": {"type": "string", "enum": ["male", "female"]},
                    "diagnosis": {"type": "string"},
                    "medications": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                }
            }
        }
    }
}

# Initialize the schema parser
schema_parser = JSONSchemaParser(json_schema)

# For analysis mode
instructions = """
Extract patient information from the text and populate the JSON schema accordingly.
The report dated August 4, 2024, details the patient records for the day.
Among the patients, there is John Doe, a 45-year-old male diagnosed with hypertension, who is currently prescribed lisinopril and atorvastatin.
Another patient, Jane Smith, a 30-year-old female, has been diagnosed with diabetes and is on metformin.
The list also includes Sam Brown, a 60-year-old male suffering from arthritis, for which he is taking ibuprofen and methotrexate.
"""
prompt_generator_analysis = PromptGenerator(mode='analysis')
content_generator_analysis = ContentGenerator(api_key=api_key, model='gpt-4o', mode='analysis', instructions=instructions)
analysis_processor = JSONProcessor(schema_parser, prompt_generator_analysis, content_generator_analysis, mode='analysis')

# Process the data
generated_json_analysis = analysis_processor.process(instructions=instructions, schema=json_schema)

# Print the extracted structured JSON
print("Generated JSON (Analysis):", json.dumps(generated_json_analysis, indent=4))

Output

Generated JSON (Analysis): {
    "report_date": "August 4, 2024",
    "patients": [
        {
            "id": "1",
            "firstName": "John",
            "lastName": "Doe",
            "age": 45,
            "gender": "male",
            "diagnosis": "hypertension",
            "medications": [
                "lisinopril",
                "atorvastatin"
            ]
        },
        {
            "id": "2",
            "firstName": "Jane",
            "lastName": "Smith",
            "age": 30,
            "gender": "female",
            "diagnosis": "diabetes",
            "medications": [
                "metformin"
            ]
        },
        {
            "id": "3",
            "firstName": "Sam",
            "lastName": "Brown",
            "age": 60,
            "gender": "male",
            "diagnosis": "arthritis",
            "medications": [
                "ibuprofen",
                "methotrexate"
            ]
        }
    ]
}
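
Since the extracted result is an ordinary Python dict, it can be passed on to other tools for the transformation step. The sketch below flattens the patients array into CSV text using only the standard library; patients_to_csv is a hypothetical helper, not part of jsonpaws, and the sample holds two records copied from the output above.

```python
import csv
import io

FIELDS = ["id", "firstName", "lastName", "age", "gender", "diagnosis", "medications"]


def patients_to_csv(result: dict) -> str:
    """Flatten the "patients" array of an extracted report into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for patient in result.get("patients", []):
        row = {name: patient.get(name, "") for name in FIELDS}
        # CSV cells are flat, so join the medications list into one string.
        row["medications"] = "; ".join(patient.get("medications", []))
        writer.writerow(row)
    return buffer.getvalue()


# Two records copied from the sample output above.
sample = {
    "report_date": "August 4, 2024",
    "patients": [
        {"id": "1", "firstName": "John", "lastName": "Doe", "age": 45,
         "gender": "male", "diagnosis": "hypertension",
         "medications": ["lisinopril", "atorvastatin"]},
        {"id": "2", "firstName": "Jane", "lastName": "Smith", "age": 30,
         "gender": "female", "diagnosis": "diabetes",
         "medications": ["metformin"]},
    ],
}

print(patients_to_csv(sample))
```

Missing fields are written as empty cells rather than raising, which seems a reasonable default when the source text does not mention every attribute.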

Synthesis Mode

In Synthesis Mode, jsonpaws generates realistic structured JSON data from a specified schema.

Example

import json
from jsonpaws import JSONSchemaParser, PromptGenerator, ContentGenerator, JSONProcessor

# Set your OpenAI API key here
api_key = "OPENAI-API-KEY"

# Define the JSON schema
json_schema = {
    "type": "object",
    "properties": {
        "report_date": {"type": "string"},
        "patients": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "firstName": {"type": "string"},
                    "lastName": {"type": "string"},
                    "age": {"type": "number", "minimum": 0, "maximum": 120},
                    "gender": {"type": "string", "enum": ["male", "female"]},
                    "diagnosis": {"type": "string"},
                    "medications": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                }
            }
        }
    }
}

# Initialize the schema parser
schema_parser = JSONSchemaParser(json_schema)

# Instructions for synthesis mode
instructions = """
Generate a JSON with a report date and a list of patients, where each patient has fields like id, firstName, lastName, age, gender, diagnosis, and medications.
"""

# For synthesis mode
prompt_generator_synthesis = PromptGenerator(mode='synthesis')
content_generator_synthesis = ContentGenerator(api_key=api_key, model='gpt-4o', mode='synthesis', instructions=instructions)
synthesis_processor = JSONProcessor(schema_parser, prompt_generator_synthesis, content_generator_synthesis, mode='synthesis')

# Generate the synthetic JSON data
generated_json_synthesis = synthesis_processor.process(instructions=instructions, schema=json_schema)

# Print the generated synthetic JSON
print("Generated JSON (Synthesis):", json.dumps(generated_json_synthesis, indent=4))

Output

Generated JSON (Synthesis): {
    "report_date": {
        "report_date": "2023-10-18",
        "patients": [
            {
                "id": "P001",
                "firstName": "John",
                "lastName": "Doe",
                "age": 45,
                "gender": "male",
                "diagnosis": "Hypertension",
                "medications": [
                    "Lisinopril",
                    "Amlodipine"
                ]
            },
            {
                "id": "P002",
                "firstName": "Jane",
                "lastName": "Smith",
                "age": 30,
                "gender": "female",
                "diagnosis": "Diabetes Type 2",
                "medications": [
                    "Metformin",
                    "Glyburide"
                ]
            },
            {
                "id": "P003",
                "firstName": "Emily",
                "lastName": "Johnson",
                "age": 60,
                "gender": "female",
                "diagnosis": "Osteoarthritis",
                "medications": [
                    "Ibuprofen",
                    "Glucosamine"
                ]
            },
            {
                "id": "P004",
                "firstName": "Michael",
                "lastName": "Brown",
                "age": 72,
                "gender": "male",
                "diagnosis": "Congestive Heart Failure",
                "medications": [
                    "Furosemide",
                    "Digoxin"
                ]
            }
        ]
    },
    "patients": {
        "report_date": "2023-10-15",
        "patients": [
            {
                "id": "P001",
                "firstName": "John",
                "lastName": "Doe",
                "age": 45,
                "gender": "male",
                "diagnosis": "Hypertension",
                "medications": [
                    "Lisinopril",
                    "Amlodipine"
                ]
            },
            {
                "id": "P002",
                "firstName": "Jane",
                "lastName": "Smith",
                "age": 34,
                "gender": "female",
                "diagnosis": "Type 2 Diabetes",
                "medications": [
                    "Metformin",
                    "Glipizide"
                ]
            },
            {
                "id": "P003",
                "firstName": "Emily",
                "lastName": "Johnson",
                "age": 28,
                "gender": "female",
                "diagnosis": "Anxiety Disorder",
                "medications": [
                    "Sertraline"
                ]
            },
            {
                "id": "P004",
                "firstName": "Michael",
                "lastName": "Williams",
                "age": 62,
                "gender": "male",
                "diagnosis": "Chronic Obstructive Pulmonary Disease",
                "medications": [
                    "Tiotropium",
                    "Albuterol"
                ]
            },
            {
                "id": "P005",
                "firstName": "David",
                "lastName": "Brown",
                "age": 50,
                "gender": "male",
                "diagnosis": "Hyperlipidemia",
                "medications": [
                    "Atorvastatin"
                ]
            },
            {
                "id": "P006",
                "firstName": "Sophia",
                "lastName": "Davis",
                "age": 40,
                "gender": "female",
                "diagnosis": "Asthma",
                "medications": [
                    "Fluticasone",
                    "Salbutamol"
                ]
            },
            {
                "id": "P007",
                "firstName": "Daniel",
                "lastName": "Miller",
                "age": 75,
                "gender": "male",
                "diagnosis": "Heart Failure",
                "medications": [
                    "Furosemide",
                    "Carvedilol"
                ]
            },
            {
                "id": "P008",
                "firstName": "Olivia",
                "lastName": "Garcia",
                "age": 22,
                "gender": "female",
                "diagnosis": "Major Depressive Disorder",
                "medications": [
                    "Citalopram"
                ]
            },
            {
                "id": "P009",
                "firstName": "James",
                "lastName": "Martinez",
                "age": 30,
                "gender": "male",
                "diagnosis": "Bipolar Disorder",
                "medications": [
                    "Lithium",
                    "Lamotrigine"
                ]
            },
            {
                "id": "P010",
                "firstName": "Ava",
                "lastName": "Hernandez",
                "age": 55,
                "gender": "female",
                "diagnosis": "Osteoarthritis",
                "medications": [
                    "Ibuprofen",
                    "Glucosamine"
                ]
            }
        ]
    }
}
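
In the sample output above, each top-level key holds a complete report object, whereas the schema declares report_date as a string and patients as an array. Generated output may not always match the schema, so a check before using the result can help. The sketch below is a hypothetical validate helper, not part of jsonpaws, and it handles only the schema keywords that appear in this README (type, properties, items, enum, minimum, maximum); a dedicated JSON Schema validator would be the better choice for anything broader.

```python
def validate(schema: dict, value, path: str = "$") -> list:
    """Return a list of problems found; an empty list means the value conforms."""
    type_checks = {
        "object": lambda v: isinstance(v, dict),
        "array": lambda v: isinstance(v, list),
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    }
    expected = schema.get("type")
    if expected in type_checks and not type_checks[expected](value):
        # A wrong type makes the remaining checks meaningless, so stop here.
        return [f"{path}: expected {expected}, got {type(value).__name__}"]

    problems = []
    if "enum" in schema and value not in schema["enum"]:
        problems.append(f"{path}: {value!r} is not one of {schema['enum']}")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            problems.append(f"{path}: {value} is below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            problems.append(f"{path}: {value} is above maximum {schema['maximum']}")
    if isinstance(value, dict):
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                problems += validate(subschema, value[key], f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            problems += validate(schema["items"], item, f"{path}[{index}]")
    return problems


# Abbreviated version of the patient schema used in the example.
schema = {
    "type": "object",
    "properties": {
        "report_date": {"type": "string"},
        "patients": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "age": {"type": "number", "minimum": 0, "maximum": 120},
                    "gender": {"type": "string", "enum": ["male", "female"]},
                },
            },
        },
    },
}

# Same shape as the sample output above, with the patient records trimmed.
nested = {
    "report_date": {"report_date": "2023-10-18", "patients": []},
    "patients": {"report_date": "2023-10-15", "patients": []},
}
# The shape the schema actually describes.
flat = {"report_date": "2023-10-18", "patients": [{"age": 45, "gender": "male"}]}

print(validate(schema, nested))
print(validate(schema, flat))
```

The first call reports a type mismatch for both top-level keys, while the second returns an empty list.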

Image Analysis Mode

In Image Analysis Mode, jsonpaws analyzes images to extract structured JSON data.

Example

import json
from jsonpaws import JSONSchemaParser, PromptGenerator, ContentGenerator, JSONProcessor


# Set your OpenAI API key here
api_key = 'OPENAI-API-KEY'

# Define the JSON schema
json_schema = {
    "type": "object",
    "properties": {
        "logos_in_image": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "scene": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "demographics": {
            "type": "object",
            "properties": {
                "age_group": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": [
                            "Under 5 years", "5 to 9 years", "10 to 14 years",
                            "15 to 19 years", "20 to 24 years", "25 to 29 years",
                            "30 to 34 years", "35 to 39 years", "40 to 44 years",
                            "45 to 49 years", "50 to 54 years", "55 to 59 years",
                            "60 to 64 years", "65 to 69 years", "70 to 74 years",
                            "75 to 79 years", "80 to 84 years", "85 years and over"
                        ]
                    }
                },
                "races": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": [
                            "White", "African American", "American Indian", "Asian", "Other"
                        ]
                    }
                },
                "gender": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": ["Male", "Female", "Non-binary", "Other"]
                    }
                }
            }
        },
        "objects": {
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "activity": {
            "type": "array",
            "items": {
                "type": "string"
            }
        }
    }
}


# Instructions for image analysis
instructions = '''
Analyze the following image and provide the results in JSON
'''
image_url = "https://cdn1.picuki.com/hosted-by-instagram/q/0exhNuNYnjBGZDHIdN5WmL9I2PwkAQ9OKftSQ7e71yJjMBhsLH6QvJA0mpCl6yRxIwVgFDeSYztj4oouUFlQAz17NEfWT7yKRTxV7q2RVean0Fph9JBikLc0LnIbZ3ap8MQvU2bABCxWFOkXULjh7uZDu7%7C%7CzNnZSyWaRMdsCmWYK4dv1CPol9YIosuzX2A3a5YcOLCkX+2UyMEgvsNzX5DwDWeKiYIMm66d5R%7C%7CkKiMQB5aHgnjH+LmMpRG1%7C%7CA23O6tqHoOAAuizgd2ge8VGoXo0fNk4G2WTsvDgntakgpIajOctq0Ppl4qLWFGQEDWpp9005xJG7liKaazL43hdQxjLXkOfmI6dgo5H9eNKtau24nAThT5D%7C%7CNf1PXnhSV7GDFVDUfaXmOOlgt61KCuhegEql+yeRQLnGxCxdIwNYuT+CIsNmcunN5vyiwiLcii+ChQs%7C%7CqP39dLYBngh%7C%7Cppyuz1Y9RnLFOttGP2mEgAl7EIY=.jpeg"

# Initialize components for image analysis
schema_parser = JSONSchemaParser(json_schema)
prompt_generator = PromptGenerator(mode='image')
content_generator = ContentGenerator(api_key=api_key, mode='image', instructions=instructions, temperature=0)

json_processor = JSONProcessor(
    schema_parser=schema_parser,
    prompt_generator=prompt_generator,
    content_generator=content_generator,
    mode='image'
)

result = json_processor.process(instructions=instructions, schema=json_schema, image_url=image_url)
print("Generated JSON (Image Analysis):", json.dumps(result, indent=4))

Output

Generated JSON (Image Analysis): {
    "logos_in_image": [
        "Adidas"
    ],
    "scene": [
        "indoor",
        "cafe",
        "relaxing area"
    ],
    "demographics": {
        "age_group": [
            "20 to 24 years"
        ],
        "races": [
            "Other"
        ],
        "gender": [
            "Female"
        ]
    },
    "objects": [
        "green smoothie",
        "couch",
        "plants",
        "sunglasses"
    ],
    "activity": [
        "sitting",
        "drinking"
    ]
}
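
The image schema above repeats the same "array of strings" fragment several times. Because the schema is a plain Python dict, it can also be assembled with small helper functions. The sketch below shows one way to do that; string_array and object_of are hypothetical helpers, not part of jsonpaws, and age_group is left out for brevity since it follows the same pattern as gender and races.

```python
def string_array(enum=None) -> dict:
    """Schema for an array of strings, optionally restricted to enum values."""
    items = {"type": "string"}
    if enum is not None:
        items["enum"] = list(enum)
    return {"type": "array", "items": items}


def object_of(**properties) -> dict:
    """Schema for an object with the given named properties."""
    return {"type": "object", "properties": properties}


json_schema = object_of(
    logos_in_image=string_array(),
    scene=string_array(),
    demographics=object_of(
        races=string_array(
            ["White", "African American", "American Indian", "Asian", "Other"]
        ),
        gender=string_array(["Male", "Female", "Non-binary", "Other"]),
    ),
    objects=string_array(),
    activity=string_array(),
)
```

The result is an ordinary dict with the same structure as the handwritten schema, so it can be passed to JSONSchemaParser in the same way.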

Customization

jsonpaws allows users to customize the OpenAI model and temperature settings:

  • Model: Specify the model to use (e.g., gpt-4o, gpt-4o-mini).
  • Temperature: Control the randomness of the output. A higher temperature results in more random output.

Example

# Customize the content generator
content_generator_custom = ContentGenerator(
    api_key=api_key,
    model='gpt-4o',
    mode='synthesis',
    temperature=0.8  # Higher temperature for more randomness
)

# Use the custom content generator in your processor
synthesis_processor_custom = JSONProcessor(schema_parser, prompt_generator_synthesis, content_generator_custom, mode='synthesis')

# Generate the synthetic JSON data (instructions as defined in the Synthesis Mode example)
generated_json_custom = synthesis_processor_custom.process(instructions=instructions, schema=json_schema)
print("Generated JSON (Custom):", json.dumps(generated_json_custom, indent=4))
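
To give a rough intuition for why a higher temperature produces more varied output, the sketch below applies the usual temperature-scaled softmax to a few made-up scores. It illustrates the general idea only and is not a description of how jsonpaws or the OpenAI API works internally; the function name and the toy scores are assumptions chosen for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for temperature in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperatures almost all of the probability lands on the top-scoring option, while higher temperatures spread it more evenly across the alternatives. A temperature of 0, as used in the image analysis example, is the limiting case in which the top-scoring option is always chosen; this sketch rejects it only to avoid dividing by zero.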

Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss potential improvements or features.

License

This project is licensed under the MIT License.
