A Strict JSON Framework for LLM Outputs, that fixes problems that json.loads() cannot solve

These details have not been verified by PyPI

Project links

GitHub Statistics

Project description

Strict JSON

A Strict JSON Framework for LLM Outputs, that fixes problems that json.loads() cannot solve

Works for JSON outputs with multiple ' or " or { or } or \ or unmatched braces/brackets that may break a json.loads()
Updated 5 Feb 2024 (v2.2.0)
- HUGE: Nested output formats of multiple lists and dictionaries are now supported!
- Now supports int, float, str, dict, list, Dict[], List[], Enum[] type forcing with LLM-based error correction
- Better handling of naming of variables in strict_function by using list of variables using variable_names
- Removed input_type and output_type from strict_function. Reason: input_type not needed as LLMs can flexibly perceive inputs, output_type is now handled with type forcing

Previous Versions

8 Jan 2024 (v2.0.2) [New: Installable by pip, Support for OpenAI JSON Mode, Functions]
Created: 28 Oct 2023
Collaborators welcome
Video tutorial: https://www.youtube.com/watch?v=IjTUKAciTCg
Discussion Channel (my discord - John's AI Group): discord.gg/bzp87AHJy5

How do I use this?

Download package via command line pip install strictjson
Set up your OpenAPI API Key. Refer to Tutorial.ipynb for how to do it for Jupyter Notebooks.
Import the required functions from strictjson and use them!

How does it work?

Extract JSON values as a string using a special regex (add delimiters to key to make ###key###) to split keys and values. (New!) Also works for nested datatypes by splitting recursively.
Uses ast.literal_eval to best match the extracted output value to a literal (e.g. int, string, dict).
Ensures that all JSON fields are output by LLM, with optional type checking, if not it will feed in error message to LLM to iteratively correct its generation (default: 3 tries)

Features:

Basic Generation

system_prompt: Write in whatever you want the LLM to become. "You are a <purpose in life>"
user_prompt: The user input. Later, when we use it as a function, this is the function input
output_format: JSON of output variables in a dictionary, with the key as the output key, and the value as the output description
- The output keys will be preserved exactly, while GPT will generate content to match the description of the value as best as possible

Example Usage

res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'List of adjectives',
                                    'Words': 'Number of words'})
                                    
print(res)

Example output

{'Sentiment': 'positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}

Advanced Generation

More advanced demonstration involving code that would typically break json.loads()

Example Usage

res = strict_json(system_prompt = 'You are a code generator, generating code to fulfil a task',
                    user_prompt = 'Given array p, output a function named func_sum to return its sum',
                    output_format = {'Elaboration': 'How you would do it',
                                     'C': 'Code',
                                    'Python': 'Code'})
                                    
print(res)

Example output

{'Elaboration': 'To calculate the sum of an array, we can iterate through each element of the array and add it to a running total.',

'C': 'int func_sum(int p[], int size) {\n int sum = 0;\n for (int i = 0; i < size; i++) {\n sum += p[i];\n }\n return sum;\n}',

'Python': 'def func_sum(p):\n sum = 0\n for num in p:\n sum += num\n return sum'}

Type forcing

Generally, strict_json will infer the data type automatically for you for the output fields
However, if you would like very specific data types, you can do data forcing using type: <data_type> at the last part of the output field description
<data_type> must be of the form int, float, str, dict, list, Dict[], List[], Enum[]
The Enum and List are not case sensitive, so enum and list works just as well
For Enum[list_of_category_names], it is best to give an "Other" category in case the LLM fails to classify correctly with the other options.
If list or List[] is not formatted correctly in LLM's output, we will correct it by asking the LLM to list out the elements line by line
For dict, we can further check whether keys are present using Dict[list_of_key_names]
Other types will first be forced by rule-based conversion, any further errors will be fed into LLM's error feedback mechanism
If <data_type> is not the specified data types, it can still be useful to shape the output for the LLM. However, no type checking will be done.

Example Usage 1

res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment, type: Enum["Pos", "Neg", "Other"]',
                                    'Adjectives': 'List of adjectives, type: List[str]',
                                    'Words': 'Number of words, type: int'})
                                    
print(res)

Example output 1

{'Sentiment': 'Pos', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}

Example Usage 2

res = strict_json(system_prompt = 'You are an expert at organising birthday parties',
                    user_prompt = 'Give me some information on how to organise a birthday',
                    output_format = {'Key steps for organising birthdays': 'list of 3 steps, type: list',
                                    'Lucky draw numbers': '3 numbers from 1-50, type: List[int]',
                                    'Sample venues': 'Describe two venues, type: List[Dict["Venue", "Description", "Cost"]]'})

print(res)

Example output 2

{'Key steps for organising birthdays': ['1. Determine the budget and guest list', '2. Choose a theme and plan the decorations', '3. Arrange for food, drinks, and entertainment'],

'Lucky draw numbers': [10, 25, 42],

'Sample venues': [{'Venue': 'Local park', 'Description': 'Outdoor space with picnic tables and playground', 'Cost': 'Free'},

{'Venue': 'Party venue', 'Description': 'Indoor space with party rooms and entertainment options', 'Cost': 'Varies depending on package'}]}

Strict JSON Functions

Enhances strict_json() with a function-like interface for repeated use of modular LLM-based functions
Inputs (compulsory):
- fn_description - Function description to describe process of transforming input variables to output variables
- output_format - Dictionary containing output variables names and description for each variable. There must be at least one output variable
Inputs (optional):
- examples - Examples in Dictionary form with the input and output variables (list if more than one)
- variable_names - How the variables should be named in a list
- kwargs - Additional arguments you would like to pass on to the strict_json function
Outputs: JSON of output variables in a dictionary (similar to strict_json)

Example Usage 1 (Description only)

# Construct the function: var1 will be first input variable, var2 will be second input variable and so on
fn = strict_function(fn_description = 'Output a sentence with words var1 and var2 in the style of var3', 
                     output_format = {'output': 'sentence'})

# Use the function
fn('ball', 'dog', 'happy')

Example Output 1

{'output': 'The happy dog chased the ball.'}

Example Usage 2 (Examples only)

# Construct the function: infer pattern from just examples without description (here it is multiplication)
fn = strict_function(fn_description = 'Map input to output based on examples', 
                     output_format = {'output': 'final answer'}, 
                     examples = [{'var1': 3, 'var2': 2, 'output': 6}, 
                                 {'var1': 5, 'var2': 3, 'output': 15}, 
                                 {'var1': 7, 'var2': 4, 'output': 28}])

# Use the function
fn(2, 10)

Example Output 2

{'output': 20}

Example Usage 3 (Description and Variable Names and Examples)

# Construct the function: description and examples with variable names
# variable names will be referenced in order of input
fn = strict_function(fn_description = 'Output the sum and difference of num1 and num2', 
                 output_format = {'sum': 'sum of two numbers', 
                                  'difference': 'absolute difference of two numbers'}, 
                 variable_names = ['num1', 'num2'],
                 examples = {'num1': 2, 'num2': 4, 'sum': 6, 'difference': 2})

# Use the function
fn(3, 4)

Example Output 3

{'sum': 7, 'difference': 1}

Integrating with OpenAI JSON Mode

If you want to use the OpenAI JSON Mode (which is pretty good btw), you can simply add in openai_json_mode = True in strict_json or strict_function
Note that the model must be one of gpt-4-1106-preview or gpt-3.5-turbo-1106. We will set it to gpt-3.5-turbo-1106 by default if you provide an invalid model

Example Usage

res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'List of adjectives',
                                    'Words': 'Number of words'},
                    openai_json_mode = True) # Toggle this to True
                                    
print(res)

Example output

{'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 6}

Future Features:

Agents with Tool Use
Conversational Agents

Project details

These details have not been verified by PyPI

Project links

GitHub Statistics

Release history Release notifications | RSS feed

5.1.1

Jul 25, 2024

5.1.0

Jul 10, 2024

5.0.0

Jul 5, 2024

4.1.0

May 5, 2024

4.0.1

Apr 22, 2024

4.0.0

Mar 4, 2024

3.0.2

Feb 16, 2024

3.0.1

Feb 15, 2024

2.2.2

Feb 9, 2024

2.2.1

Feb 8, 2024

This version

2.2.0

Feb 4, 2024

2.1.0

Feb 3, 2024

2.0.2

Jan 8, 2024

2.0.1

Jan 6, 2024

2.0.0

Jan 5, 2024

0.0.1

Jan 5, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

strictjson-2.2.0.tar.gz (14.7 kB view hashes)

Uploaded Feb 4, 2024 Source

Built Distribution

strictjson-2.2.0-py3-none-any.whl (12.1 kB view hashes)

Uploaded Feb 4, 2024 Python 3

Hashes for strictjson-2.2.0.tar.gz

Hashes for strictjson-2.2.0.tar.gz
Algorithm	Hash digest
SHA256	`86b3149c149f963c1a47e861737aa870e276acbdf5e360981820ed051cae78c4`
MD5	`0e0b9bd1b2a38dd4d17be023b4badafc`
BLAKE2b-256	`e1ccb287b7003c470afdfc088cb3ad5d827c80933be752b323913cfe369a8f31`

Hashes for strictjson-2.2.0-py3-none-any.whl

Hashes for strictjson-2.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`64df6a362f0af52a8dd0c306bcd6b943a59d3df05b2ec12f9bc8feba6454d344`
MD5	`799a8c1cc56590e2747b3e2565b66a5c`
BLAKE2b-256	`a9ad782478b1c252ae41c39291ca9b5a205dc7110a9953f7d08f6e7a8cd63d72`

strictjson 2.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Project description

Strict JSON

How do I use this?

How does it work?

Features:

Basic Generation

Example Usage

Example output

Advanced Generation

Example Usage

Example output

Type forcing

Example Usage 1

Example output 1

Example Usage 2

Example output 2

Strict JSON Functions

Example Usage 1 (Description only)

Example Output 1

Example Usage 2 (Examples only)

Example Output 2

Example Usage 3 (Description and Variable Names and Examples)

Example Output 3

Integrating with OpenAI JSON Mode

Example Usage

Example output

Future Features:

Project details

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution