Parse JSON outputs with logprobs from the OpenAI API to convert enum fields into probability distributions.

distenum

distenum parses JSON outputs with logprobs from the OpenAI API and converts enum-type string fields into probability distributions instead of a single label.

What it does

With structured outputs, the API returns a single enum value (e.g. "positive"). If you request logprobs=True, you also get token-level logprobs. distenum turns those logprobs into a probability distribution over your enum options so you can see how confident the model was in each choice.
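The core conversion can be sketched in a few lines. This is a minimal illustration of the idea, not distenum's actual implementation: the helper name `logprobs_to_distribution` and the sample logprob values are assumptions made up for this example. It exponentiates each log-probability and renormalizes so the values sum to 1.

```python
import math


def logprobs_to_distribution(label_logprobs):
    """Turn {label: logprob} into {label: probability} summing to 1.

    Hypothetical helper for illustration; not part of distenum.
    """
    # exp() converts a natural-log probability back into a probability.
    raw = {label: math.exp(lp) for label, lp in label_logprobs.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no probability mass to normalize")
    # Renormalize, since the listed options rarely cover all of the mass.
    return {label: p / total for label, p in raw.items()}


# Made-up logprobs for the three sentiment labels.
dist = logprobs_to_distribution(
    {"positive": -0.33, "negative": -1.71, "neutral": -2.30}
)
```

The renormalization step is what makes the result a proper distribution over the enum options even when some probability mass went to tokens that match no option.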

Example: For a sentiment field with enum: ["positive", "negative", "neutral"]:

From the API (content only) | With distenum (using logprobs)
"sentiment": "positive" | "sentiment": {"positive": 0.72, "negative": 0.18, "neutral": 0.10}

So instead of a single label, you get a distribution you can use for uncertainty, ranking, or thresholding (e.g. only accept when positive probability > 0.8).
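As a rough sketch of ranking and thresholding, the snippet below works on a plain dict shaped like the distribution in the example. The helper names `rank_labels` and `accept_label` are hypothetical and not part of distenum.

```python
def rank_labels(distribution):
    """Return (label, probability) pairs, most likely first."""
    return sorted(distribution.items(), key=lambda item: item[1], reverse=True)


def accept_label(distribution, label, threshold=0.8):
    """Return True only when `label` has more than `threshold` probability."""
    return distribution.get(label, 0.0) > threshold


# Distribution copied from the example above.
sentiment = {"positive": 0.72, "negative": 0.18, "neutral": 0.10}

ranked = rank_labels(sentiment)
top_label, top_prob = ranked[0]

# 0.72 does not clear the 0.8 bar, so this answer would be rejected
# (or routed to a fallback such as human review).
confident = accept_label(sentiment, "positive", threshold=0.8)
```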

Install

pip install distenum

To run the example script (calls the OpenAI API):

pip install "distenum[openai]"

Quick start

Install the package with the OpenAI client extra: pip install "distenum[openai]" (the quotes keep shells such as zsh from interpreting the brackets). Then call the API with logprobs=True and top_logprobs=20, and pass the response's logprobs to distenum:

from openai import OpenAI
from distenum import parse_using_schema_and_logprobs

schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"]
        }
    }
}

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Your prompt"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "my_schema", "strict": True, "schema": schema}
    },
    logprobs=True,
    top_logprobs=20,
)

logprobs_data = response.choices[0].logprobs
parsed = parse_using_schema_and_logprobs(schema, logprobs_data)
# parsed["sentiment"] might be: {"positive": 0.72, "negative": 0.18, "neutral": 0.10}

Enum design tips

  • Different prefixes: Enum values are matched to token logprobs by prefix. Prefer enum labels that do not share a common prefix (e.g. "positive", "negative", "neutral" are good; "pos" and "positive" can blur probabilities).
  • Fewer is better: The API returns at most 20 logprobs per token (top_logprobs=20). Enum values whose tokens fall outside those top 20 candidates receive no probability mass, so keep enums small for meaningful distributions.
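To see why shared prefixes blur probabilities, here is a simplified sketch of prefix matching. It is not distenum's real algorithm: the function `distribute_by_prefix`, the even split of ambiguous mass, and the token strings are all assumptions chosen for illustration.

```python
import math


def distribute_by_prefix(enum_values, top_logprobs):
    """Assign candidate-token probability mass to enum labels by prefix.

    Hypothetical sketch; distenum's actual matching may differ.
    `top_logprobs` maps a candidate token to its logprob.
    """
    mass = {label: 0.0 for label in enum_values}
    for token, logprob in top_logprobs.items():
        matches = [
            label for label in enum_values if token and label.startswith(token)
        ]
        if not matches:
            continue  # this token does not begin any enum label
        # Simplification: an ambiguous token is split evenly across matches.
        share = math.exp(logprob) / len(matches)
        for label in matches:
            mass[label] += share
    total = sum(mass.values())
    return {label: (m / total if total else 0.0) for label, m in mass.items()}


# Distinct prefixes: each candidate token points at exactly one label.
distinct = distribute_by_prefix(
    ["positive", "negative", "neutral"],
    {"pos": math.log(0.7), "neg": math.log(0.2), "neu": math.log(0.1)},
)

# Shared prefix: the token "pos" matches both labels and cannot be attributed.
shared = distribute_by_prefix(
    ["pos", "positive"],
    {"pos": math.log(0.9), "xyz": math.log(0.1)},
)
```

In the second call both labels end up with the same share, because the candidate token alone cannot tell them apart. A label whose tokens never appear among the returned candidates would likewise end up with zero mass.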

API

  • parse_using_schema_and_logprobs(schema_dict, logprobs_data)
    Parses the logprobs stream according to the JSON Schema. Fields of type string with an enum are returned as a dict mapping each enum label to a probability (non-negative, summing to 1). Other fields are parsed as normal JSON values.

  • tokenize(logprobs_data)
    Low-level generator that yields tokens and their top-logprobs from the OpenAI logprobs content.
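For orientation, the sketch below walks the kind of data these functions consume. In the OpenAI Python client, `response.choices[0].logprobs.content` is a list of per-token entries, each with `token`, `logprob`, and a `top_logprobs` list of alternatives. The generator `iter_token_alternatives` is a hypothetical stand-in, not distenum's `tokenize`, whose exact yield shape is not documented here; the mock objects are hand-built so the example runs without an API call.

```python
from types import SimpleNamespace


def iter_token_alternatives(logprobs_data):
    """Yield (token, {alternative_token: logprob}) for each generated token.

    Hypothetical stand-in for illustration; distenum.tokenize may yield
    a different shape.
    """
    for item in logprobs_data.content or []:
        alternatives = {alt.token: alt.logprob for alt in item.top_logprobs}
        yield item.token, alternatives


# Hand-built mock mimicking the attributes of the OpenAI logprobs object.
mock_logprobs = SimpleNamespace(content=[
    SimpleNamespace(token='{"', logprob=-0.01, top_logprobs=[
        SimpleNamespace(token='{"', logprob=-0.01),
    ]),
    SimpleNamespace(token="positive", logprob=-0.33, top_logprobs=[
        SimpleNamespace(token="positive", logprob=-0.33),
        SimpleNamespace(token="negative", logprob=-1.71),
    ]),
])

pairs = list(iter_token_alternatives(mock_logprobs))
```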

Performance

The parser walks token-level logprobs and builds probability distributions for enum fields, so it is slower than parsing the same JSON with the standard library. A rough comparison (same logical structure, 100k iterations):

Parser | Time (100k parses) | Throughput | Avg per parse
json.loads | ~0.16 s | ~630k/sec | ~1.6 µs
distenum | ~3.0 s | ~33k/sec | ~30 µs

So distenum is typically about 15–20× slower than json.loads for the same structure. In absolute terms, ~30 µs per parse is negligible compared to an OpenAI API call (typically hundreds of milliseconds to several seconds). Parsing a single response adds no meaningful latency.
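The derived columns and the slowdown factor follow directly from the two total times. The short calculation below reproduces them from the rounded figures in the table; it is arithmetic only and does not rerun the benchmark, and the helper name `summarize` is made up for this example.

```python
ITERATIONS = 100_000


def summarize(total_seconds, iterations=ITERATIONS):
    """Return (parses per second, microseconds per parse)."""
    throughput = iterations / total_seconds
    per_parse_us = total_seconds / iterations * 1e6
    return throughput, per_parse_us


# Total times taken from the table (rounded values).
json_throughput, json_us = summarize(0.16)   # about 625k/sec, 1.6 µs
dist_throughput, dist_us = summarize(3.0)    # about 33k/sec, 30 µs

# Ratio of total times: 3.0 / 0.16 = 18.75, inside the quoted 15-20x range.
slowdown = 3.0 / 0.16
```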

To run the benchmark yourself from the repo root:

PYTHONPATH=. python scripts/benchmark_parser.py

Example script

From the repo root (with distenum[openai] installed):

python example_sentiment_openai.py

License

MIT
