
Parse JSON outputs with logprobs from the OpenAI API to convert enum fields into probability distributions.


distenum

distenum parses JSON outputs with logprobs from the OpenAI API and converts enum-type string fields into probability distributions instead of a single label.

What it does

With structured outputs, the API returns a single enum value (e.g. "positive"). If you also set logprobs=True and top_logprobs, the response includes token-level log probabilities, including the most likely alternative tokens at each position. distenum turns those logprobs into a probability distribution over your enum options so you can see how confident the model was in each choice.
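To make the idea concrete, here is a minimal sketch of turning one token's top logprobs into a distribution over enum labels. It is not distenum's implementation: the function name enum_distribution is hypothetical, it assumes you already have the (token, logprob) alternatives at the position where the enum value starts, and it uses a simple rule of crediting each alternative token to every label that starts with it, then renormalizing over the labels.

```python
import math


def enum_distribution(
    alternatives: list[tuple[str, float]], labels: list[str]
) -> dict[str, float]:
    """Hypothetical sketch: turn (token, logprob) alternatives into label probabilities.

    Each alternative's probability exp(logprob) is split evenly among the labels
    that start with that token; tokens matching no label are ignored. The result
    is renormalized so the label probabilities sum to 1.
    """
    mass = {label: 0.0 for label in labels}
    for token, logprob in alternatives:
        if not token:
            continue  # an empty token would be a prefix of every label
        matches = [label for label in labels if label.startswith(token)]
        for label in matches:
            mass[label] += math.exp(logprob) / len(matches)
    total = sum(mass.values())
    if total == 0.0:
        raise ValueError("no alternative token matched any enum label")
    return {label: m / total for label, m in mass.items()}


# Made-up alternatives at the token where the "sentiment" value starts.
alternatives = [
    ("positive", math.log(0.60)),
    ("negative", math.log(0.15)),
    ("neutral", math.log(0.10)),
    ("Positive", math.log(0.05)),  # matches no label (case-sensitive), so ignored
]
dist = enum_distribution(alternatives, ["positive", "negative", "neutral"])
print({label: round(p, 3) for label, p in dist.items()})
```

Here the 0.05 on "Positive" is dropped and the remaining mass is renormalized, so "positive" ends up near 0.71. Splitting mass evenly among labels that share a token is one arbitrary choice; it also shows why labels where one is a prefix of another blur together.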

Example: For a sentiment field with enum: ["positive", "negative", "neutral"]:

From the API (content only):     "sentiment": "positive"
With distenum (using logprobs):  "sentiment": {"positive": 0.72, "negative": 0.18, "neutral": 0.10}

So instead of a single label, you get a distribution you can use for uncertainty, ranking, or thresholding (e.g. only accept when positive probability > 0.8).
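A minimal sketch of the three uses mentioned above, working on a plain dict like parsed["sentiment"]. The helper names accept_label, ranked, and entropy_bits are hypothetical and not part of distenum; Shannon entropy is just one common way to express uncertainty.

```python
import math
from typing import Optional


def accept_label(dist: dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the most likely label if its probability exceeds threshold, else None."""
    label, prob = max(dist.items(), key=lambda item: item[1])
    return label if prob > threshold else None


def ranked(dist: dict[str, float]) -> list[tuple[str, float]]:
    """Labels sorted from most to least likely."""
    return sorted(dist.items(), key=lambda item: item[1], reverse=True)


def entropy_bits(dist: dict[str, float]) -> float:
    """Shannon entropy in bits: 0 for a certain answer, log2(n) for a uniform one."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


dist = {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
print(accept_label(dist))        # None: 0.72 is not above 0.8
print(accept_label(dist, 0.7))   # positive
print(ranked(dist)[0])           # ('positive', 0.72)
print(round(entropy_bits(dist), 2))
```

Returning None below the threshold leaves the fallback (manual review, a retry, a default label) to the caller.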

Install

pip install distenum

To run the example script (calls the OpenAI API):

pip install "distenum[openai]"

Quick start

Install the package together with the OpenAI client (pip install "distenum[openai]"), then call the Chat Completions API with logprobs=True and top_logprobs=20 and pass response.choices[0].logprobs to distenum:

from openai import OpenAI
from distenum import parse_using_schema_and_logprobs

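schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"]
        }
    },
    # Strict structured outputs require every property to be listed in
    # "required" and additionalProperties to be False.
    "required": ["sentiment"],
    "additionalProperties": False
}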

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Your prompt"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "my_schema", "strict": True, "schema": schema}
    },
    logprobs=True,
    top_logprobs=20,
)

logprobs_data = response.choices[0].logprobs
parsed = parse_using_schema_and_logprobs(schema, logprobs_data)
# parsed["sentiment"] might be: {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
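The API section below describes what comes back: each string field with an enum becomes a dict of non-negative probabilities summing to 1. A minimal sketch, using hypothetical helpers that are not part of distenum, for listing which fields of a schema that applies to and for sanity-checking a returned distribution. The "summary" field is an illustrative addition, and the walker only follows nested "properties"; other JSON Schema keywords are ignored.

```python
import math
from typing import Any


def enum_field_paths(schema: dict[str, Any], path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Hypothetical helper: paths of string fields that declare an enum."""
    if schema.get("type") == "string" and "enum" in schema:
        return [path]
    paths: list[tuple[str, ...]] = []
    for name, subschema in schema.get("properties", {}).items():
        paths.extend(enum_field_paths(subschema, path + (name,)))
    return paths


def is_valid_distribution(dist: dict[str, float], labels: list[str], tol: float = 1e-6) -> bool:
    """Hypothetical check: keys match the enum, values are non-negative and sum to 1."""
    return (
        set(dist) == set(labels)
        and all(p >= 0 for p in dist.values())
        and math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
    )


schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "summary": {"type": "string"},  # plain string: parsed as a normal JSON value
    },
    "required": ["sentiment", "summary"],
    "additionalProperties": False,
}
print(enum_field_paths(schema))  # [('sentiment',)]

example = {"positive": 0.72, "negative": 0.18, "neutral": 0.10}
print(is_valid_distribution(example, ["positive", "negative", "neutral"]))  # True
```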

Enum design tips

  • Different prefixes: Enum values are matched to token logprobs by prefix. Prefer enum labels where no label is a prefix of another (e.g. "positive", "negative", "neutral" are good; "pos" and "positive" can blur probabilities).
  • Fewer is better: The API returns at most 20 alternatives per token (top_logprobs=20). Enum values that never appear among those alternatives get no probability mass, so keep enums small for meaningful distributions.
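Both tips can be checked mechanically before you ship a schema. A minimal sketch; check_enum_labels is a hypothetical helper, not part of distenum. It compares label strings rather than real tokenizer output, so it cannot detect labels whose first tokens coincide even though the strings differ.

```python
from itertools import permutations

MAX_TOP_LOGPROBS = 20  # the largest top_logprobs value the API accepts


def check_enum_labels(labels: list[str], top_logprobs: int = MAX_TOP_LOGPROBS) -> list[str]:
    """Hypothetical lint for the two enum design tips; returns human-readable warnings."""
    warnings = []
    # Tip 1: flag labels that are a prefix of another label (e.g. "pos" / "positive").
    for a, b in permutations(labels, 2):
        if a != b and b.startswith(a):
            warnings.append(f"{a!r} is a prefix of {b!r}; their probabilities may blur")
    # Tip 2: with more labels than alternatives per token, some may receive no mass.
    if len(labels) > top_logprobs:
        warnings.append(
            f"{len(labels)} labels but at most {top_logprobs} alternatives per token; "
            "some labels may get no probability mass"
        )
    return warnings


print(check_enum_labels(["positive", "negative", "neutral"]))  # []
print(check_enum_labels(["pos", "positive", "neutral"]))
```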

API

  • parse_using_schema_and_logprobs(schema_dict, logprobs_data)
    Parses the logprobs stream according to the JSON Schema. Fields of type string with an enum are returned as a dict mapping each enum label to a probability (non-negative, summing to 1). Other fields are parsed as normal JSON values.

  • tokenize(logprobs_data)
    Low-level generator that yields tokens and their top-logprobs from the OpenAI logprobs content.
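For orientation, the OpenAI Python SDK exposes logprobs as response.choices[0].logprobs.content, a list of entries with token, logprob, and top_logprobs (each alternative again has token and logprob). A minimal sketch of walking that structure, in the spirit of what tokenize is described as doing; iter_token_alternatives and fake_entry are hypothetical names, not distenum's API, and the usage builds made-up stand-in objects with SimpleNamespace so it runs without an API call.

```python
from types import SimpleNamespace
from typing import Any, Iterator


def iter_token_alternatives(logprobs_data: Any) -> Iterator[tuple[str, list[tuple[str, float]]]]:
    """Hypothetical sketch: yield (token, [(alt_token, alt_logprob), ...]) per position."""
    for entry in logprobs_data.content or []:
        alternatives = [(alt.token, alt.logprob) for alt in entry.top_logprobs]
        yield entry.token, alternatives


def fake_entry(token: str, alternatives: list[tuple[str, float]]) -> SimpleNamespace:
    """Made-up stand-in for one SDK logprob entry (not real API output)."""
    top = [SimpleNamespace(token=t, logprob=lp) for t, lp in alternatives]
    return SimpleNamespace(token=token, logprob=alternatives[0][1], top_logprobs=top)


fake_logprobs = SimpleNamespace(content=[
    fake_entry('{"sentiment":"', [('{"sentiment":"', 0.0)]),
    fake_entry("positive", [("positive", -0.33), ("negative", -1.71), ("neutral", -2.30)]),
    fake_entry('"}', [('"}', 0.0)]),
])

tokens = list(iter_token_alternatives(fake_logprobs))
print("".join(token for token, _ in tokens))  # {"sentiment":"positive"}
print(tokens[1][1])  # alternatives at the position where the enum value starts
```

Real token boundaries depend on the model's tokenizer; the three-token split above is invented for illustration.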

Performance

The parser walks token-level logprobs and builds probability distributions for enum fields, so it is slower than parsing the same JSON with the standard library. A rough comparison (same logical structure, 100k iterations):

Parser       Time (100k parses)   Throughput    Avg per parse
json.loads   ~0.16 s              ~630k/sec     ~1.6 µs
distenum     ~3.0 s               ~33k/sec      ~30 µs

So distenum is roughly 15–20× slower than json.loads for the same structure (about 30 µs vs 1.6 µs per parse in this benchmark). In absolute terms, ~30 µs per parse is negligible compared to an OpenAI API call (typically hundreds of milliseconds to several seconds), so parsing a single response adds no meaningful latency.
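The table's last two columns follow from the first: throughput is iterations divided by total time, and time per parse is total time divided by iterations. A minimal sketch that applies this to a json.loads timing with timeit; the payload is a hypothetical stand-in for the benchmark's structure, and this is not scripts/benchmark_parser.py. Timing distenum the same way would need real logprobs data, which is omitted here.

```python
import json
import timeit


def summarize(total_seconds: float, iterations: int) -> tuple[float, float]:
    """Return (parses per second, microseconds per parse)."""
    return iterations / total_seconds, total_seconds / iterations * 1e6


# Check the arithmetic against the distenum row of the table: 3.0 s for 100k parses.
per_sec, us_per_parse = summarize(3.0, 100_000)
print(f"~{per_sec:,.0f}/sec, ~{us_per_parse:.0f} us per parse")  # ~33,333/sec, ~30 us per parse

# Time json.loads on a small, made-up payload ("us" = microseconds).
payload = '{"sentiment": "positive"}'
iterations = 100_000
elapsed = timeit.timeit(lambda: json.loads(payload), number=iterations)
per_sec, us_per_parse = summarize(elapsed, iterations)
print(f"json.loads: {elapsed:.2f} s, ~{per_sec:,.0f}/sec, ~{us_per_parse:.1f} us per parse")
```

Absolute numbers depend on the machine and payload; the ratio between the two parsers is the more useful figure.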

To run the benchmark yourself from the repo root:

PYTHONPATH=. python scripts/benchmark_parser.py

Example script

From the repo root (with distenum[openai] installed):

python example_sentiment_openai.py

License

MIT



Download files


Source Distribution

distenum-0.1.0.tar.gz (11.0 kB)

Uploaded Source

Built Distribution


distenum-0.1.0-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file distenum-0.1.0.tar.gz.

File metadata

  • Download URL: distenum-0.1.0.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for distenum-0.1.0.tar.gz
Algorithm Hash digest
SHA256 35e424a47e38587f55d26870915e948b43b6a8a255074c4695b8e92290976a94
MD5 deb95158cb9d0c698d36ad5d47f6e34a
BLAKE2b-256 a151e899a627e6dd7ff3e70cdba84759ea335b9158730040cfd1a3f11a914659


File details

Details for the file distenum-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: distenum-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for distenum-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c6b97ba0a182c5e19d3ee4923e778d3fc58b79ef7ff2a4e13c1be9d1c3ab28bc
MD5 d370edb1cf1428b355581ec678ab90d3
BLAKE2b-256 3f9b63d19a567c66b10169254682ff5297142ee4f120d93ab8c5b31807fafe19

