Utilities for regex and grammar parsing and constraining

Project description

Grammar utilities

This repository contains Python utilities (backed by Rust) to parse and constrain text with regular expressions and context-free grammars (LR(1)). Parsing is supported both for prefixes and full strings.

Context-free grammars already included in this repository are:

  • JSON
  • SPARQL

Installation

You can install the Python package from PyPI:

pip install grammar-utils

Only Linux is currently supported when installing from PyPI. Windows is causing some issues in CI, so builds for this platform are not yet available.

Alternatively, you can clone this repository and build the package yourself:

git clone https://github.com/bastiscode/grammar-utils
cd grammar-utils
pip install "maturin[patchelf]"
maturin develop --release

Usage

Two use cases are supported by this library: parsing and constraining.

Parsing

Given a context-free grammar, parse a string and return the corresponding parse tree.

from grammar_utils.parse import load_lr1_parser

parser = load_lr1_parser("json")
tree = parser.parse('{"key": "value"}')
print(tree)
# you can also get a pruned parse tree by skipping empty nodes and collapsing nodes with a single child
pruned_tree = parser.parse('{"key": "value"}', skip_empty=True, collapse_single=True)
print(pruned_tree)

Parsing is also supported for prefixes, in which case the input should be given as bytes rather than a string. A tree for the terminals that are already fixed is returned, together with the remaining suffix of the input for which the next terminal is not yet known.

from grammar_utils.parse import load_lr1_parser

parser = load_lr1_parser("json")
tree, rest = parser.prefix_parse(b'{"key"')
print(tree)
print(rest)
# pruning is also supported here
pruned_tree, rest = parser.prefix_parse(b'{"key"', skip_empty=True, collapse_single=True)
print(pruned_tree)
print(rest)

You can also use your own grammars.

from grammar_utils import load_byte_vocab
from grammar_utils.parse import LR1Parser

# define your own grammar and lexer
grammar = "..."
lexer = "..."
vocab = load_byte_vocab()
parser = LR1Parser(grammar, lexer, vocab)

Constraining

Constraints are used to check what symbols from the vocabulary can follow the current prefix such that the regular expression or context-free grammar can still be satisfied.

import random
from grammar_utils import load_byte_vocab
from grammar_utils.constrain import load_lr1_constraint, load_regex_constraint

vocab = load_byte_vocab()
constraint = load_lr1_constraint("json", vocab)
# reset constraint to a given prefix, default is an empty prefix
constraint.reset(b'{"key"')
# get the next possible symbols
next_indices = constraint.get()
# the indices refer to the vocabulary (decode only for human-readable strings)
print(f"allowed continuations: {[bytes(vocab[i]).decode() for i in next_indices]}")
# you can forward the constraint with a valid index
constraint.next(random.choice(next_indices))
# check if constraint is satisfied (should be False)
print(constraint.is_match())

# same for regular expressions
constraint = load_regex_constraint("boolean", vocab)
constraint.reset(b"tr")
next_indices = constraint.get()
# should only be 'u'
print(f"allowed continuations: {[bytes(vocab[i]).decode() for i in next_indices]}")
constraint.next(next_indices[0])
print(constraint.is_match())
next_indices = constraint.get()
# should only be 'e'
print(f"allowed continuations: {[bytes(vocab[i]).decode() for i in next_indices]}")
constraint.next(next_indices[0])
# should be True
print(constraint.is_match())
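To make the reset / get / next / is_match protocol concrete, here is a toy pure-Python constraint that accepts only "true" or "false" over a byte-level vocabulary. This is only an illustrative sketch of the interface; it is not the library's implementation (which is backed by Rust), and ToyBooleanConstraint is a made-up name.

```python
class ToyBooleanConstraint:
    """Toy constraint matching the interface shown above for {"true", "false"}."""

    TARGETS = [b"true", b"false"]

    def __init__(self, vocab: list[bytes]):
        self.vocab = vocab
        self.prefix = b""

    def reset(self, prefix: bytes = b"") -> None:
        self.prefix = bytes(prefix)

    def get(self) -> list[int]:
        # indices of vocab entries that keep the prefix a valid prefix of a target
        return [
            i for i, tok in enumerate(self.vocab)
            if any(t.startswith(self.prefix + tok) for t in self.TARGETS)
        ]

    def next(self, index: int) -> None:
        self.prefix += self.vocab[index]

    def is_match(self) -> bool:
        return self.prefix in self.TARGETS


vocab = [bytes([b]) for b in range(256)]  # byte-level vocabulary
constraint = ToyBooleanConstraint(vocab)
constraint.reset(b"tr")
next_indices = constraint.get()
# only b"u" can follow b"tr"
print([vocab[i].decode() for i in next_indices])
```

The real constraints behave the same way at the interface level, but compute the allowed indices from compiled automata rather than by scanning the vocabulary.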

You can also use your own grammars and regexes.

from grammar_utils import load_byte_vocab
from grammar_utils.constrain import LR1Constraint, RegexConstraint

vocab = load_byte_vocab()

# define your own grammar and lexer
grammar = "..."
lexer = "..."
constraint = LR1Constraint(grammar, lexer, vocab)

# define your own regex
regex = "..."
constraint = RegexConstraint(regex, vocab)

Use cases

Forcing a language model to generate structured text

The following example shows how to use a regex constraint to force GPT2 to output either "true" or "false" after a given prompt.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from grammar_utils.constrain import load_regex_constraint

gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
vocab = [
    token.replace("Ġ", " ").encode()
    for token, _ in sorted(tokenizer.get_vocab().items(), key=lambda x: x[1])
]
constraint = load_regex_constraint("boolean", vocab)
prefix = "Constrained decoding is cool: "
input_ids = tokenizer.encode(prefix)
while not (constraint.is_match() or constraint.is_invalid()):
    input_tensor = torch.tensor([input_ids])
    logits = gpt2(input_tensor).logits
    valid_indices = torch.from_numpy(constraint.get())
    valid_logits = logits[0, -1, valid_indices]
    index = valid_indices[torch.argmax(valid_logits)].item()
    constraint.next(index)
    input_ids.append(index)
    print(tokenizer.decode(input_ids))
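The crucial step in the loop above is restricting the argmax to the indices the constraint allows. Stripped of torch and the model, that selection step is just the following (pick_valid_token is a hypothetical helper for illustration; the scores are made up):

```python
def pick_valid_token(logits: list[float], valid_indices: list[int]) -> int:
    """Pick the allowed token index with the highest score."""
    return max(valid_indices, key=lambda i: logits[i])

# index 1 has the highest score overall but is not allowed
logits = [0.1, 2.5, -0.3, 1.7]
print(pick_valid_token(logits, [0, 2, 3]))  # prints 3
```

Any other decoding strategy (sampling, beam search) works the same way: restrict the candidate set to the constraint's indices before choosing.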
