Parse JSON outputs with logprobs from the OpenAI API to convert enum fields into probability distributions.
distenum
distenum parses JSON outputs with logprobs from the OpenAI API and converts enum-type string fields into probability distributions instead of a single label.
What it does
With structured outputs, the API returns a single enum value (e.g. "positive"). If you request logprobs=True, you also get token-level logprobs. distenum turns those logprobs into a probability distribution over your enum options so you can see how confident the model was in each choice.
Example: For a sentiment field with enum: ["positive", "negative", "neutral"]:
| From the API (content only) | With distenum (using logprobs) |
|---|---|
| "sentiment": "positive" | "sentiment": {"positive": 0.72, "negative": 0.18, "neutral": 0.10} |
So instead of a single label, you get a distribution you can use for uncertainty, ranking, or thresholding (e.g. only accept when positive probability > 0.8).
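As a minimal sketch of the thresholding and ranking uses mentioned above, the helpers below work on a plain dict shaped like the distenum output. The names `accept_label` and `top_label` are hypothetical and are not part of distenum.

```python
def accept_label(distribution, label, threshold=0.8):
    """Return True when `label` holds strictly more than `threshold` probability."""
    return distribution.get(label, 0.0) > threshold


def top_label(distribution):
    """Return the (label, probability) pair with the highest probability."""
    return max(distribution.items(), key=lambda item: item[1])


# Distribution copied from the example table above.
sentiment = {"positive": 0.72, "negative": 0.18, "neutral": 0.10}

print(top_label(sentiment))                  # ('positive', 0.72)
print(accept_label(sentiment, "positive"))   # False, because 0.72 is not > 0.8
```

A single label would have reported "positive" here; the distribution shows that the prediction falls short of a 0.8 acceptance threshold.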
Install
pip install distenum
To run the example script (calls the OpenAI API):
pip install "distenum[openai]"
Quick start
Install the package and the OpenAI client: pip install "distenum[openai]" (the quotes keep shells such as zsh from expanding the brackets). Then call the API with logprobs=True and top_logprobs=20, and pass the response logprobs into distenum:
```python
from openai import OpenAI
from distenum import parse_using_schema_and_logprobs

# JSON Schema with one enum field to turn into a distribution.
schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"]
        }
    }
}

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Your prompt"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "my_schema", "strict": True, "schema": schema}
    },
    logprobs=True,
    top_logprobs=20,
)

logprobs_data = response.choices[0].logprobs
parsed = parse_using_schema_and_logprobs(schema, logprobs_data)
# parsed["sentiment"] might be: {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
```
Enum design tips
- Different prefixes: Enum values are matched to token logprobs by prefix. Prefer enum labels that do not share a common prefix (e.g. "positive", "negative", "neutral" are good; "pos" and "positive" can blur probabilities).
- Fewer is better: The API returns at most 20 logprobs per token (top_logprobs=20). With many enum values, most will get no mass; keep enums small for meaningful distributions.
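To illustrate why shared prefixes blur probabilities, here is a rough sketch of prefix matching. It is not distenum's actual implementation: the function `enum_distribution`, the even split of ambiguous mass, and the hand-written token logprobs are all assumptions made for this example.

```python
import math


def enum_distribution(enum_values, top_logprobs):
    """Spread each candidate token's probability over the enum labels it prefixes.

    `top_logprobs` maps candidate tokens to log-probabilities at the position
    where the enum value starts. A token that is a prefix of several labels has
    its mass split evenly among them (an assumption of this sketch).
    """
    mass = {label: 0.0 for label in enum_values}
    for token, logprob in top_logprobs.items():
        text = token.strip().strip('"')
        if not text:
            continue
        matches = [label for label in enum_values if label.startswith(text)]
        for label in matches:
            mass[label] += math.exp(logprob) / len(matches)
    total = sum(mass.values())
    if total == 0:
        return mass
    # Renormalize so the matched mass sums to 1.
    return {label: value / total for label, value in mass.items()}


# Distinct prefixes: every token points at exactly one label.
distinct = enum_distribution(
    ["positive", "negative", "neutral"],
    {"pos": math.log(0.6), "neg": math.log(0.2), "neut": math.log(0.1), "{": math.log(0.1)},
)
print({label: round(p, 3) for label, p in distinct.items()})

# Shared prefix: the token "pos" cannot distinguish "pos" from "positive".
blurred = enum_distribution(
    ["pos", "positive", "negative"],
    {"pos": math.log(0.8), "neg": math.log(0.2)},
)
print({label: round(p, 3) for label, p in blurred.items()})
```

In the second call the 0.8 mass on the token "pos" has to be shared between two labels, which is the blurring the first tip warns about.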
API
- parse_using_schema_and_logprobs(schema_dict, logprobs_data)
  Parses the logprobs stream according to the JSON Schema. Fields of type string with an enum are returned as a dict mapping each enum label to a probability (non-negative, summing to 1). Other fields are parsed as normal JSON values.
- tokenize(logprobs_data)
  Low-level generator that yields tokens and their top-logprobs from the OpenAI logprobs content.
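For orientation, the sketch below walks a logprobs object of the shape the OpenAI Chat Completions API returns (a `content` list whose entries carry `token`, `logprob`, and `top_logprobs`). It is not distenum's `tokenize`; the generator `iter_token_alternatives` and the hand-built `fake_logprobs` stand-in are hypothetical and only show what kind of data a token-level walk has to work with.

```python
import math
from types import SimpleNamespace


def iter_token_alternatives(logprobs_data):
    """Yield (token, {alternative_token: probability}) for each generated token."""
    for entry in logprobs_data.content:
        alternatives = {alt.token: math.exp(alt.logprob) for alt in entry.top_logprobs}
        yield entry.token, alternatives


def _alt(token, probability):
    """Build one top-logprobs entry from a plain probability (illustration only)."""
    return SimpleNamespace(token=token, logprob=math.log(probability))


# Stand-in for response.choices[0].logprobs, built by hand for illustration.
fake_logprobs = SimpleNamespace(content=[
    SimpleNamespace(token='{"', logprob=0.0, top_logprobs=[_alt('{"', 1.0)]),
    SimpleNamespace(
        token="positive",
        logprob=math.log(0.72),
        top_logprobs=[_alt("positive", 0.72), _alt("negative", 0.18), _alt("neutral", 0.10)],
    ),
])

for token, alternatives in iter_token_alternatives(fake_logprobs):
    print(repr(token), {alt: round(p, 2) for alt, p in alternatives.items()})
```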
Performance
The parser walks token-level logprobs and builds probability distributions for enum fields, so it is slower than parsing the same JSON with the standard library. A rough comparison (same logical structure, 100k iterations):
| Parser | Time (100k parses) | Throughput | Avg per parse |
|---|---|---|---|
| json.loads | ~0.16 s | ~630k/sec | ~1.6 µs |
| distenum | ~3.0 s | ~33k/sec | ~30 µs |
So distenum is typically about 15–20× slower than json.loads for the same structure. In absolute terms, ~30 µs per parse is negligible compared to an OpenAI API call (typically hundreds of milliseconds to several seconds). Parsing a single response adds no meaningful latency.
To run the benchmark yourself from the repo root:
PYTHONPATH=. python scripts/benchmark_parser.py
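If the repository is not at hand, the json.loads baseline can be approximated with the standard library alone. This is a sketch, not the contents of scripts/benchmark_parser.py: the helper `time_parser` and the sample payload are assumptions, and absolute numbers will vary by machine.

```python
import json
import timeit


def time_parser(parse, payload, iterations=100_000):
    """Return (total_seconds, microseconds_per_parse) for `parse(payload)`."""
    total = timeit.timeit(lambda: parse(payload), number=iterations)
    return total, total / iterations * 1e6


# Sample payload with the same logical structure as the sentiment example.
payload = '{"sentiment": "positive"}'

total_seconds, micros_per_parse = time_parser(json.loads, payload)
print(f"json.loads: {total_seconds:.3f} s total, {micros_per_parse:.2f} µs per parse")
```

The same helper could time distenum by passing a callable that wraps parse_using_schema_and_logprobs with a fixed schema and a saved logprobs object as the payload.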
Example script
From the repo root (with distenum[openai] installed):
python example_sentiment_openai.py
License
MIT