A Strict JSON Framework for LLM Outputs, that fixes problems that json.loads() cannot solve
Strict JSON
- Works for JSON outputs containing multiple ' or " or { or } or \ characters, or unmatched braces/brackets, any of which may break json.loads()
- Updated 5 Feb 2024 (v2.2.0)
- HUGE: Nested output formats of multiple lists and dictionaries are now supported!
- Now supports int, float, str, dict, list, Dict[], List[], Enum[] type forcing with LLM-based error correction
- Better handling of naming of variables in strict_function by passing a list of variable names through variable_names
- Removed input_type and output_type from strict_function. Reason: input_type is not needed as LLMs can flexibly perceive inputs, and output_type is now handled with type forcing
Previous Versions
- 8 Jan 2024 (v2.0.2) [New: Installable by pip, Support for OpenAI JSON Mode, Functions]
- Created: 28 Oct 2023
- Collaborators welcome
- Video tutorial: https://www.youtube.com/watch?v=IjTUKAciTCg
- Discussion Channel (my discord - John's AI Group): discord.gg/bzp87AHJy5
How do I use this?
- Download package via command line
pip install strictjson
- Set up your OpenAI API Key. Refer to Tutorial.ipynb for how to do it for Jupyter Notebooks.
- Import the required functions from strictjson and use them!
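Outside a notebook, one common way to handle the key is to read it from an environment variable before making any calls. The sketch below assumes the key is stored in OPENAI_API_KEY, which is the variable the official OpenAI Python client reads by default. The helper name require_api_key is hypothetical and is not part of strictjson.

```python
import os


def require_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, or raise a clear error if it is unset.

    Hypothetical helper for illustration only; not part of strictjson.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before calling the LLM")
    return key
```

Calling require_api_key() once at start-up fails early with a readable message instead of surfacing an authentication error on the first LLM call.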
How does it work?
- Extract JSON values as a string using a special regex (add delimiters to key to make ###key###) to split keys and values. (New!) Also works for nested datatypes by splitting recursively.
- Uses ast.literal_eval to best match the extracted output value to a literal (e.g. int, string, dict).
- Ensures that all JSON fields are output by LLM, with optional type checking. If not, it will feed an error message to the LLM to iteratively correct its generation (default: 3 tries)
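The three steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's actual implementation: the function names (coerce, parse_delimited, generate_with_retries), the exact regex, and the wording of the error feedback are all hypothetical, the LLM is represented by any callable that maps a prompt string to a text reply, and nested datatypes are not split recursively here.

```python
import ast
import re


def coerce(raw):
    """Best-effort conversion of an extracted value string to a Python literal."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a valid literal (e.g. unescaped quotes inside): keep it as text,
        # dropping one pair of matching outer quotes if present.
        if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
            return raw[1:-1]
        return raw


def parse_delimited(text, keys):
    """Split text on '###key###' markers and coerce the value after each one."""
    names = "|".join(re.escape(k) for k in keys)
    pattern = r"""['"]###(""" + names + r""")###['"]\s*:"""
    matches = list(re.finditer(pattern, text))
    result = {}
    for i, m in enumerate(matches):
        last = i + 1 == len(matches)
        end = len(text) if last else matches[i + 1].start()
        raw = text[m.end():end].strip()
        if last and raw.endswith("}"):
            raw = raw[:-1].strip()         # closing brace of the outer object
        elif not last:
            raw = raw.rstrip(",").strip()  # comma before the next field
        result[m.group(1)] = coerce(raw)
    missing = [k for k in keys if k not in result]
    if missing:
        raise ValueError(f"missing output fields: {missing}")
    return result


def generate_with_retries(llm, prompt, keys, num_tries=3):
    """Call llm until every key is present, feeding each error back into the prompt."""
    feedback = ""
    for _ in range(num_tries):
        try:
            return parse_delimited(llm(prompt + feedback), keys)
        except ValueError as exc:
            feedback = f"\nPrevious attempt was rejected: {exc}. Output every field."
    raise RuntimeError(f"no valid output after {num_tries} tries")
```

Because the text is split on the delimited keys rather than parsed as a whole, a value such as 'print("it's {x}")' with mixed quotes and braces is still recovered as a string, even though json.loads() would reject the full output.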
Features:
Basic Generation
- system_prompt: Write in whatever you want the LLM to become. "You are a <purpose in life>"
- user_prompt: The user input. Later, when we use it as a function, this is the function input
- output_format: JSON of output variables in a dictionary, with the key as the output key, and the value as the output description
- The output keys will be preserved exactly, while GPT will generate content to match the description of the value as best as possible
Example Usage
from strictjson import strict_json

res = strict_json(system_prompt = 'You are a classifier',
                  user_prompt = 'It is a beautiful and sunny day',
                  output_format = {'Sentiment': 'Type of Sentiment',
                                   'Adjectives': 'List of adjectives',
                                   'Words': 'Number of words'})
print(res)
Example output
{'Sentiment': 'positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}
Advanced Generation
- More advanced demonstration involving code that would typically break json.loads()
Example Usage
res = strict_json(system_prompt = 'You are a code generator, generating code to fulfil a task',
                  user_prompt = 'Given array p, output a function named func_sum to return its sum',
                  output_format = {'Elaboration': 'How you would do it',
                                   'C': 'Code',
                                   'Python': 'Code'})
print(res)
Example output
{'Elaboration': 'To calculate the sum of an array, we can iterate through each element of the array and add it to a running total.',
'C': 'int func_sum(int p[], int size) {\n int sum = 0;\n for (int i = 0; i < size; i++) {\n sum += p[i];\n }\n return sum;\n}',
'Python': 'def func_sum(p):\n sum = 0\n for num in p:\n sum += num\n return sum'}
Type forcing
- Generally, strict_json will infer the data type automatically for you for the output fields
- However, if you would like very specific data types, you can do data forcing using type: <data_type> at the last part of the output field description
- <data_type> must be of the form int, float, str, dict, list, Dict[], List[], Enum[]
- Enum and List are not case sensitive, so enum and list work just as well
- For Enum[list_of_category_names], it is best to give an "Other" category in case the LLM fails to classify correctly with the other options.
- If list or List[] is not formatted correctly in the LLM's output, we will correct it by asking the LLM to list out the elements line by line
- For dict, we can further check whether keys are present using Dict[list_of_key_names]
- Other types will first be forced by rule-based conversion, and any further errors will be fed into the LLM's error feedback mechanism
- If <data_type> is not one of the specified data types, it can still be useful to shape the output for the LLM. However, no type checking will be done.
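To make the rules above concrete, here is a minimal sketch of how a type: <data_type> suffix could be split off a description and applied with rule-based conversion. It is an illustration under assumptions, not strictjson's actual code: split_type_hint, bracket_items and force_type are hypothetical names, only int, float, str, Enum[] and Dict[] are handled, and a raised ValueError stands in for the error that would be fed back to the LLM.

```python
import ast


def split_type_hint(description):
    """Return (description, data_type); data_type is the text after the last 'type:'."""
    head, sep, tail = description.rpartition("type:")
    if not sep:
        return description.strip(), None
    return head.rstrip(" ,"), tail.strip()


def bracket_items(data_type):
    """Parse the [...] part of Enum[...] or Dict[...] into a Python list."""
    return list(ast.literal_eval(data_type[data_type.index("["):]))


def force_type(value, data_type):
    """Convert or check value against data_type; raise ValueError if it cannot be forced."""
    kind = data_type.lower()  # Enum/enum, List/list etc. are treated alike
    if kind in ("int", "float", "str"):
        return {"int": int, "float": float, "str": str}[kind](value)
    if kind.startswith("enum["):
        options = bracket_items(data_type)
        for option in options:
            if str(option).lower() == str(value).strip().lower():
                return option
        raise ValueError(f"{value!r} is not one of {options}")
    if kind.startswith("dict["):
        if not isinstance(value, dict):
            raise ValueError(f"{value!r} is not a dict")
        missing = [k for k in bracket_items(data_type) if k not in value]
        if missing:
            raise ValueError(f"dict is missing keys: {missing}")
        return value
    # Any other type string only shapes the prompt; no checking is done here
    return value
```

For example, split_type_hint('Number of words, type: int') yields the description 'Number of words' and the type 'int', after which force_type('7', 'int') converts the extracted string to an integer.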
Example Usage 1
res = strict_json(system_prompt = 'You are a classifier',
                  user_prompt = 'It is a beautiful and sunny day',
                  output_format = {'Sentiment': 'Type of Sentiment, type: Enum["Pos", "Neg", "Other"]',
                                   'Adjectives': 'List of adjectives, type: List[str]',
                                   'Words': 'Number of words, type: int'})
print(res)
Example output 1
{'Sentiment': 'Pos', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}
Example Usage 2
res = strict_json(system_prompt = 'You are an expert at organising birthday parties',
                  user_prompt = 'Give me some information on how to organise a birthday',
                  output_format = {'Key steps for organising birthdays': 'list of 3 steps, type: list',
                                   'Lucky draw numbers': '3 numbers from 1-50, type: List[int]',
                                   'Sample venues': 'Describe two venues, type: List[Dict["Venue", "Description", "Cost"]]'})
print(res)
Example output 2
{'Key steps for organising birthdays': ['1. Determine the budget and guest list', '2. Choose a theme and plan the decorations', '3. Arrange for food, drinks, and entertainment'],
'Lucky draw numbers': [10, 25, 42],
'Sample venues': [{'Venue': 'Local park', 'Description': 'Outdoor space with picnic tables and playground', 'Cost': 'Free'},
{'Venue': 'Party venue', 'Description': 'Indoor space with party rooms and entertainment options', 'Cost': 'Varies depending on package'}]}
Strict JSON Functions
- Enhances strict_json() with a function-like interface for repeated use of modular LLM-based functions
- Inputs (compulsory):
  - fn_description - Function description to describe process of transforming input variables to output variables
  - output_format - Dictionary containing output variable names and a description for each variable. There must be at least one output variable
- Inputs (optional):
  - examples - Examples in Dictionary form with the input and output variables (list if more than one)
  - variable_names - How the variables should be named in a list
  - kwargs - Additional arguments you would like to pass on to the strict_json function
- Outputs: JSON of output variables in a dictionary (similar to strict_json)
Example Usage 1 (Description only)
from strictjson import strict_function

# Construct the function: var1 will be first input variable, var2 will be second input variable and so on
fn = strict_function(fn_description = 'Output a sentence with words var1 and var2 in the style of var3',
                     output_format = {'output': 'sentence'})
# Use the function
fn('ball', 'dog', 'happy')
Example Output 1
{'output': 'The happy dog chased the ball.'}
Example Usage 2 (Examples only)
# Construct the function: infer pattern from just examples without description (here it is multiplication)
fn = strict_function(fn_description = 'Map input to output based on examples',
                     output_format = {'output': 'final answer'},
                     examples = [{'var1': 3, 'var2': 2, 'output': 6},
                                 {'var1': 5, 'var2': 3, 'output': 15},
                                 {'var1': 7, 'var2': 4, 'output': 28}])
# Use the function
fn(2, 10)
Example Output 2
{'output': 20}
Example Usage 3 (Description and Variable Names and Examples)
# Construct the function: description and examples with variable names
# variable names will be referenced in order of input
fn = strict_function(fn_description = 'Output the sum and difference of num1 and num2',
                     output_format = {'sum': 'sum of two numbers',
                                      'difference': 'absolute difference of two numbers'},
                     variable_names = ['num1', 'num2'],
                     examples = {'num1': 2, 'num2': 4, 'sum': 6, 'difference': 2})
# Use the function
fn(3, 4)
Example Output 3
{'sum': 7, 'difference': 1}
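The examples above rely on positional arguments being matched to var1, var2, ... or to the entries of variable_names, in order. A minimal sketch of that pairing is shown below. The helper name_inputs is hypothetical, and the strict length check is an assumption of this sketch rather than documented strict_function behaviour.

```python
def name_inputs(args, variable_names=None):
    """Pair positional arguments with names, in order.

    Uses variable_names when given, otherwise var1, var2, ... (hypothetical helper).
    """
    if variable_names is None:
        variable_names = [f"var{i}" for i in range(1, len(args) + 1)]
    if len(variable_names) != len(args):
        raise ValueError(f"expected {len(variable_names)} inputs, got {len(args)}")
    return dict(zip(variable_names, args))
```

With this pairing, the call fn(3, 4) from Example Usage 3 corresponds to the inputs {'num1': 3, 'num2': 4}, which is the form the example dictionaries use.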
Integrating with OpenAI JSON Mode
- If you want to use the OpenAI JSON Mode (which is pretty good btw), you can simply add in openai_json_mode = True in strict_json or strict_function
- Note that the model must be one of gpt-4-1106-preview or gpt-3.5-turbo-1106. We will set it to gpt-3.5-turbo-1106 by default if you provide an invalid model
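The model fallback described above amounts to a membership check. The sketch below is illustrative only: the constant JSON_MODE_MODELS and the function pick_json_mode_model are hypothetical names, and the model list is copied from the note above rather than from the library source.

```python
# Models this README lists as valid for JSON Mode (v2.2.0)
JSON_MODE_MODELS = ("gpt-4-1106-preview", "gpt-3.5-turbo-1106")


def pick_json_mode_model(model, default="gpt-3.5-turbo-1106"):
    """Return model if it is listed for JSON Mode, otherwise fall back to default."""
    return model if model in JSON_MODE_MODELS else default
```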
Example Usage
res = strict_json(system_prompt = 'You are a classifier',
                  user_prompt = 'It is a beautiful and sunny day',
                  output_format = {'Sentiment': 'Type of Sentiment',
                                   'Adjectives': 'List of adjectives',
                                   'Words': 'Number of words'},
                  openai_json_mode = True) # Toggle this to True
print(res)
Example output
{'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 6}
Future Features:
- Agents with Tool Use
- Conversational Agents