A Python library for constraining language model outputs to follow context-free grammars (CFG), regular expressions (REGEX) and JSON schemas (experimental).

Features

  • Zero dependencies
  • Parses all context-free grammars, including ambiguous ones
  • Optionally constrains the returned tokens to a given vocabulary
  • Type annotations, checked with mypy
  • Includes a Rust implementation

Quick Start

Installation

pip install lextrail

Usage Modes

The library supports two ways to generate constrained text, depending on your use case:

Trail

Use a Trail object when you want to generate the complete next element without vocabulary constraints.

CFG

from lextrail.guide import trail_cfg

example = r"""
    start: expression

    expression: term (("+" | "-") term)*

    term: factor (("*" | "/") factor)*

    factor: NUMBER

    NUMBER: /-?[0-9]+/
"""

trail = trail_cfg(example)
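To see which strings this grammar describes, here is a small recursive-descent recognizer written by hand for it. It is an illustration only, independent of lextrail, and it assumes that the operator groups repeat (as in `term (("+" | "-") term)*`) and that no whitespace is allowed, since the grammar declares none.

```python
import re
from typing import Callable

# Terminal from the grammar: NUMBER: /-?[0-9]+/
NUMBER = re.compile(r"-?[0-9]+")


class ArithmeticRecognizer:
    """Hand-written recursive-descent recognizer (illustration only).

    Assumes:  expression: term (("+" | "-") term)*
              term:       factor (("*" | "/") factor)*
              factor:     NUMBER
    and no whitespace between symbols.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def factor(self) -> bool:
        match = NUMBER.match(self.text, self.pos)
        if match is None:
            return False
        self.pos = match.end()
        return True

    def repeated(self, item: Callable[[], bool], operators: str) -> bool:
        # One item, then any number of "operator item" pairs.
        if not item():
            return False
        while self.pos < len(self.text) and self.text[self.pos] in operators:
            self.pos += 1  # consume the operator
            if not item():
                return False
        return True

    def term(self) -> bool:
        return self.repeated(self.factor, "*/")

    def expression(self) -> bool:
        return self.repeated(self.term, "+-")


def accepts(text: str) -> bool:
    """True if the whole text is derivable from `start: expression`."""
    recognizer = ArithmeticRecognizer(text)
    return recognizer.expression() and recognizer.pos == len(text)


for sample in ["1+2*3", "-4/2-3", "1+", "1 + 2"]:
    print(sample, accepts(sample))
```

Because a number can start with `-`, a `-` right after an operator is read as a sign, so `1--2` is also accepted.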

Regex

from lextrail.guide import trail_rex

example = r"[a-z]+@[a-z]+\.(com|org|net)"

trail = trail_rex(example)
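Before handing a pattern to trail_rex, it can be worth checking which strings it accepts. The quick check below uses only the standard-library `re` module and assumes that lextrail reads this pattern the same way Python's `re` does.

```python
import re

# Same pattern as the trail_rex example above.
EMAIL_PATTERN = re.compile(r"[a-z]+@[a-z]+\.(com|org|net)")


def matches_email(text: str) -> bool:
    """True if the entire string matches the pattern (not just a prefix)."""
    return EMAIL_PATTERN.fullmatch(text) is not None


for candidate in ["alice@example.com", "bob@site.org", "Alice@example.com", "carol@site.io"]:
    print(f"{candidate}: {matches_email(candidate)}")
```

`fullmatch` is used rather than `match` so that a valid prefix followed by extra characters is rejected.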

You can also mix quoted terminals and /regex/ patterns in a single expression using trail_exp.

from lextrail.guide import trail_exp

example = r'/[0-9]\.[0-9]/ "+" /[0-9]\.[0-9]/'

trail = trail_exp(example)
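In the mixed syntax, /pattern/ items are regular expressions and quoted items are literal terminals. As a mental model (an assumption, not lextrail's implementation), the example behaves like a single regex in which each literal is escaped and the items are concatenated. The hypothetical `mixed_to_regex` helper below performs that translation for simple expressions, assuming whitespace between items is only a separator.

```python
import re

# Hypothetical helper, not part of lextrail. It assumes that whitespace
# between items is only a separator and that items are simply concatenated.
ITEM = re.compile(r'\s*(?:/((?:\\.|[^/\\])*)/|"((?:\\.|[^"\\])*)")')


def mixed_to_regex(expression: str) -> str:
    """Join /regex/ items verbatim and "literal" items escaped into one regex."""
    expression = expression.strip()
    parts: list[str] = []
    pos = 0
    while pos < len(expression):
        match = ITEM.match(expression, pos)
        if match is None:
            raise ValueError(f"cannot parse item at position {pos}: {expression[pos:]!r}")
        regex_body, literal = match.groups()
        # Escape sequences inside "..." literals are not interpreted here.
        parts.append(regex_body if regex_body is not None else re.escape(literal))
        pos = match.end()
    return "".join(parts)


EXPRESSION = r'/[0-9]\.[0-9]/ "+" /[0-9]\.[0-9]/'
pattern = mixed_to_regex(EXPRESSION)
print(pattern)  # [0-9]\.[0-9]\+[0-9]\.[0-9]
print(re.fullmatch(pattern, "1.5+2.0") is not None)  # True
```

The literal `+` has to be escaped because it is a quantifier in regex syntax; quoting it in the mixed expression avoids writing that escape by hand.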

JSON

JSON support is experimental and not intended for production use.

  • Supported JSON Schema keywords: type, enum, const, properties, required, items, prefixItems, oneOf
  • Constraint intersection (e.g., combining prefixItems with items, or const with enum) is not yet implemented
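
Because only these keywords are handled, a schema that uses others (for example `minLength` or `pattern`) may not be constrained as expected. The hypothetical pre-flight helper `check_schema` below, which is not part of lextrail, walks a schema and reports keywords outside the list above, plus the two combinations listed as not yet implemented. How lextrail itself treats unknown keywords is not stated here, so treat the report as a list of warnings.

```python
from typing import Any

# Keywords listed as supported in the README.
SUPPORTED = {"type", "enum", "const", "properties", "required", "items", "prefixItems", "oneOf"}
# Combinations the README lists as not yet implemented.
UNSUPPORTED_PAIRS = [("prefixItems", "items"), ("const", "enum")]


def check_schema(schema: Any, path: str = "#") -> list[str]:
    """Hypothetical helper: list keywords/combinations outside the supported subset."""
    if not isinstance(schema, dict):
        return []
    problems = [f"{path}: unsupported keyword {key!r}" for key in schema if key not in SUPPORTED]
    for first, second in UNSUPPORTED_PAIRS:
        if first in schema and second in schema:
            problems.append(f"{path}: {first!r} combined with {second!r}")
    # Recurse into every place where a subschema can appear.
    for name, subschema in schema.get("properties", {}).items():
        problems += check_schema(subschema, f"{path}/properties/{name}")
    if isinstance(schema.get("items"), dict):
        problems += check_schema(schema["items"], f"{path}/items")
    for keyword in ("prefixItems", "oneOf"):
        for index, subschema in enumerate(schema.get(keyword, [])):
            problems += check_schema(subschema, f"{path}/{keyword}/{index}")
    return problems


print(check_schema({"type": "object", "properties": {"a": {"type": "string", "minLength": 1}}}))
```

Property names under `properties` are user data, not keywords, which is why the helper recurses into their values instead of checking the names themselves.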
from lextrail.json import trail_json

example = r"""
    {
        "type": "object",
        "properties": {
            "user": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"}
                },
                "required": ["email"]
            }
        }
    }
"""

trail = trail_json(example)
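To make the schema's meaning concrete: `required` applies only inside `user`, so `email` is mandatory there while `name` is optional, and `user` itself may be omitted. The hand-written checker below covers just the keywords this example uses (`type` for objects and strings, `properties`, `required`). It is an illustration, not lextrail code and not a full JSON Schema validator.

```python
from typing import Any

# The schema from the trail_json example above, as a Python dict.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
            },
            "required": ["email"],
        }
    },
}


def conforms(instance: Any, schema: dict[str, Any]) -> bool:
    """Check `instance` against the small keyword subset used above."""
    expected = schema.get("type")
    if expected == "string":
        return isinstance(instance, str)
    if expected == "object":
        if not isinstance(instance, dict):
            return False
        if any(key not in instance for key in schema.get("required", [])):
            return False
        # Properties are only checked when present; extra keys are allowed,
        # matching JSON Schema's default additionalProperties behavior.
        return all(
            conforms(instance[key], subschema)
            for key, subschema in schema.get("properties", {}).items()
            if key in instance
        )
    return True  # other types are out of scope for this sketch


print(conforms({"user": {"email": "ada@example.com"}}, SCHEMA))  # True
print(conforms({"user": {"name": "Ada"}}, SCHEMA))  # False: email missing
```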

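Then, run a random simulation: starting from an empty value, repeatedly ask for the allowed next values and pick one at random until the returned list is empty.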

import random

from lextrail.guide import get_next_values

response, value = [], ""

while values := get_next_values(trail, value):
    value = random.choice(values)
    response.append(value)

print("".join(response))
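In practice the random choice is where a language model plugs in: score each allowed value and sample accordingly. The runnable sketch below uses hypothetical stand-ins rather than lextrail objects: `ToyTrail` accepts only `yes!` or `no!`, its `next_values` method plays the role of `get_next_values(trail, value)`, and `prefer_no` stands in for model scores.

```python
import random
from typing import Callable


class ToyTrail:
    """Hypothetical stand-in for a Trail: accepts only "yes!" or "no!"."""

    def __init__(self) -> None:
        self.step = 0

    def next_values(self, value: str) -> list[str]:
        # Mirrors the loop protocol above: receives the last chosen value,
        # returns the allowed next values; an empty list ends generation.
        # This toy ignores `value` and only tracks how many steps were taken.
        self.step += 1
        if self.step == 1:
            return ["yes", "no"]
        if self.step == 2:
            return ["!"]
        return []


def generate(
    next_values: Callable[[str], list[str]],
    score: Callable[[str, str], float],
    rng: random.Random,
) -> str:
    """Sample each step in proportion to score(prefix, candidate)."""
    response: list[str] = []
    value = ""
    while values := next_values(value):
        prefix = "".join(response)
        weights = [score(prefix, candidate) for candidate in values]
        value = rng.choices(values, weights=weights, k=1)[0]
        response.append(value)
    return "".join(response)


def prefer_no(prefix: str, candidate: str) -> float:
    """Hypothetical scorer that never picks "yes"."""
    return 0.0 if candidate == "yes" else 1.0


print(generate(ToyTrail().next_values, prefer_no, random.Random(0)))  # no!
```

Whatever the scores, the output stays inside the constrained language, because the sampler only ever chooses among the values it is offered.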

ASM

Use an ASM object when you need to constrain the next token to a predefined vocabulary.

Example

from lextrail.assemble import asm_cfg

example = r"""
    start: L0

    L0: ("A" | "B")+ L1

    L1: ("C" | "D") L2

    L2: "E" L3*

    L3: /FGH/
"""

asm = asm_cfg(example, ["AD", "EF", "GH"])

If you run a simulation, the proposals are elements of the provided vocabulary.

import random

from lextrail.assemble import get_next_tokens

response, value = [], ""

while values := get_next_tokens(asm, value):
    value = random.choice(values)
    response.append(value)

print("".join(response))

assert response == ["AD", "EF", "GH", ""]
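
The result is deterministic because, with this vocabulary, only one token keeps the output viable at each step. One common way to compute such a filter (not necessarily how lextrail works internally) is to run the text so far through an automaton for the grammar's language, then keep the vocabulary tokens that do not lead to a dead state. Below is a sketch with a DFA built by hand for `[AB]+[CD]E(FGH)*`, the language the example grammar describes. `allowed_tokens` is a hypothetical helper that takes the whole prefix rather than the last value, and it returns `""` when stopping is allowed, mirroring the final `""` in the assertion above.

```python
from typing import Optional

# Hand-built DFA for [AB]+[CD]E(FGH)*, i.e. the example grammar
# (derived manually; every state listed here can still reach acceptance).
TRANSITIONS: dict[tuple[int, str], int] = {
    (0, "A"): 1, (0, "B"): 1,
    (1, "A"): 1, (1, "B"): 1, (1, "C"): 2, (1, "D"): 2,
    (2, "E"): 3,
    (3, "F"): 4,
    (4, "G"): 5,
    (5, "H"): 3,
}
ACCEPTING = {3}
VOCABULARY = ["AD", "EF", "GH"]


def run(state: Optional[int], text: str) -> Optional[int]:
    """Feed text through the DFA; None means the input was rejected."""
    for char in text:
        if state is None:
            return None
        state = TRANSITIONS.get((state, char))
    return state


def allowed_tokens(prefix: str) -> list[str]:
    """Vocabulary tokens that keep prefix + token viable; "" means "may stop"."""
    state = run(0, prefix)
    if state is None:
        return []
    tokens = [token for token in VOCABULARY if run(state, token) is not None]
    if state in ACCEPTING:
        tokens.append("")
    return tokens


response: list[str] = []
while tokens := allowed_tokens("".join(response)):
    token = tokens[0]  # in this example every step has exactly one option
    response.append(token)
    if token == "":
        break  # "" ends generation
print(response)  # ['AD', 'EF', 'GH', '']
```

Note that tokens such as `AD` and `EF` cross rule boundaries (`L0` into `L1`, `L2` into `L3`), which is why the filter works on characters of the whole language rather than on individual grammar terminals.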

The same works with every input format: each trail_* constructor has an asm_* counterpart that additionally takes the vocabulary list.

# CFG
from lextrail.assemble import asm_cfg

asm_cfg(.., [..])

# REGEX
from lextrail.assemble import asm_rex

asm_rex(.., [..])

# MIXED
from lextrail.assemble import asm_exp

asm_exp(.., [..])

# JSON
from lextrail.json import asm_json

asm_json(.., [..])

Playground

I've built a playground to showcase the different simulations; it accepts either a Trail object or an ASM object.

from lextrail.guide import trail_cfg
from lextrail.playground import run_playground

example = r"""
    start: expression

    expression: term* (( "+" | "-") term)+

    term: factor* (("*" | "/") factor)+

    factor: NUMBER?

    NUMBER: /[0-1]+/
"""

trail = trail_cfg(example)

run_playground(trail)


Download files

Download the file for your platform.

Source Distribution

lextrail-0.1.0.tar.gz (156.7 kB)

Uploaded Source

Built Distribution

lextrail-0.1.0-py3-none-any.whl (165.6 kB)

Uploaded Python 3

File details

Details for the file lextrail-0.1.0.tar.gz.

File metadata

  • Download URL: lextrail-0.1.0.tar.gz
  • Upload date:
  • Size: 156.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for lextrail-0.1.0.tar.gz
  • SHA256: d4d018877c172e8456cd4a95e654007b8df5c0447d936b2ebb3f51a5c6b40bca
  • MD5: c13cf8f2e38293535b2128d7cf251f5d
  • BLAKE2b-256: 92262eb994cf8b8e2b9ac6e659ffe832a34b6cb64117172abc7c900cc52da50d

File details

Details for the file lextrail-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: lextrail-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 165.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for lextrail-0.1.0-py3-none-any.whl
  • SHA256: 8321017cd5f39405035aa688df1c50046d59abb0070d907b70e1d5ba188b3c9a
  • MD5: 4e493677579fd953a918b656c364a071
  • BLAKE2b-256: b8ef02450e9d4b00b9ad83b178aacbacb44d0dffe460a2ab1757f8311b138577
