Skip to main content

A composable Python DSL for building GBNF grammars compatible with llama.cpp

Project description

pygbnf

A composable Python DSL for building GBNF grammars compatible with llama.cpp.

  1. Define context-free grammars using expressive Python functions,

  2. Compile them into valid GBNF strings for constrained LLM generation.

  3. Real-time rule matching during inference.

Quick Start

from pygbnf import Grammar, GrammarLLM, select

g = Grammar()

@g.rule
def answer():
    return select(["yes", "no", "maybe"])

g.start("answer")

llm = GrammarLLM("http://localhost:8080/v1")

for token, _ in llm.stream(
    messages=[{"role": "user", "content": "Is the sky blue?"}],
    grammar=g,
):
    print(token, end="", flush=True)
print()

The grammar constrains the LLM output — it can only produce yes, no, or maybe.

Guidance-Style GBNF

import pygbnf as cfg
from pygbnf import select, one_or_more, zero_or_more

g = cfg.Grammar()

@g.rule
def number():
    n = one_or_more(select("0123456789"))
    return select(['-' + n, n])

@g.rule
def operator():
    return select(['+', '*', '**', '/', '-'])

@g.rule
def expression():
    return select([
        number(),
        expression() + zero_or_more(" ") + operator()
            + zero_or_more(" ") + expression(),
        "(" + expression() + ")"
    ])

g.start("expression")
print(g.to_gbnf())

Output:

root ::= expression

number ::= "-" [0123456789]+ | [0123456789]+
operator ::= "+" | "*" | "**" | "/" | "-"
expression ::=
    number
  | expression " "* operator " "* expression
  | "(" expression ")"

Installation

pip install pygbnf

Or install from source:

git clone https://github.com/al/pygbnf.git
cd pygbnf
pip install -e .

LLM Usage

pygbnf includes GrammarLLM, a thin wrapper around any OpenAI-compatible endpoint (llama.cpp, vLLM, Ollama…) that injects the GBNF grammar automatically.

Streaming with rule matching

Enable match=True (or pass only/exclude) to get real-time RuleEvents as the LLM generates tokens:

from pygbnf import Grammar, GrammarLLM, select, one_or_more

g = Grammar()

@g.rule
def name():
    """A person's name."""
    return one_or_more(select("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "))

@g.rule
def greeting():
    """A greeting message."""
    return select(["hello", "hi", "hey"]) + " " + name()

g.start("greeting")

llm = GrammarLLM("http://localhost:8080/v1")

for token, events in llm.stream(
    messages=[{"role": "user", "content": "Greet Alice."}],
    grammar=g,
    match=True,
):
    print(token, end="", flush=True)
    if events:
        for ev in events:
            print(f"\n  ← [{ev.rule}] {ev.text!r} (doc: {ev.doc})")
print()

Each RuleEvent carries:

  • rule — the matched rule name
  • text — the matched text
  • fn — the original Python function
  • doc — the function's docstring

Non-streaming completion

text, events = llm.complete(
    messages=[{"role": "user", "content": "Is the sky blue?"}],
    grammar=g,
    match=True,
)
print(text)
for ev in events:
    print(f"  [{ev.rule}] {ev.text!r}")

Schema-based grammar with LLM

Combine grammar_from_type with GrammarLLM to constrain output to a JSON schema:

from dataclasses import dataclass
from pygbnf import grammar_from_type, GrammarLLM

@dataclass
class City:
    name: str
    country: str
    population: int

g = grammar_from_type(City)
llm = GrammarLLM("http://localhost:8080/v1")

text, _ = llm.complete(
    messages=[{"role": "user", "content": "Describe Tokyo in JSON."}],
    grammar=g,
)
print(text)
# → {"name": "Tokyo", "country": "Japan", "population": 13960000}

Tool calling with Toolkit

Toolkit is a decorator-based tool registry. Register functions with @toolkit.tool, then pass the toolkit to llm.stream() or llm.complete() — the grammar and system prompt are injected automatically.

import enum
from pygbnf import GrammarLLM, Toolkit

toolkit = Toolkit()

class Units(enum.Enum):
    CELSIUS = "celsius"
    FAHRENHEIT = "fahrenheit"

@toolkit.tool
def get_weather(city: str, units: Units = Units.CELSIUS) -> str:
    """Get current weather for a city."""
    return f"22° {units.value} in {city}"

@toolkit.tool
def search_web(query: str, max_results: int = 5) -> str:
    """Search the web."""
    return f"Found {max_results} results for {query!r}"

llm = GrammarLLM("http://localhost:8080/v1")

# Stream with toolkit — grammar + system prompt auto-injected
result = ""
for token, _ in llm.stream(
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    toolkit=toolkit,
):
    print(token, end="", flush=True)
    result += token

# Dispatch the JSON result to the matching function
output = toolkit.dispatch(result)
print(output)  # → "22° celsius in Tokyo"

The toolkit:

  • Builds a GBNF grammar constraining the LLM to produce {"function": "...", "arguments": {...}} with only registered tool names and typed arguments
  • Generates a system prompt listing available tools with signatures and docstrings
  • Dispatches the parsed JSON to the right function, converting enum strings back to Python Enum instances automatically

You can also use llm.tool_call() as a one-liner that streams + dispatches:

output = llm.tool_call(toolkit, "Weather in Tokyo?")
print(output)  # → "22° celsius in Tokyo"

Note: GrammarLLM requires the openai package: pip install openai. The LLM server must support the grammar field in its API (llama.cpp does natively).

Architecture

AST Nodes

Every grammar construct is a frozen dataclass node. Nodes compose via + (sequence) and | (alternative):

Node Description GBNF
Literal Double-quoted string "hello"
CharacterClass Character class [0-9]
Sequence Ordered concatenation a b c
Alternative Choice between options a | b | c
Repeat Quantified repetition x+, x*, x?, x{2,5}
RuleReference Reference to named rule expression
TokenReference Token-level constraint <think>, <[1000]>
Group Parenthesised group (a b)
Optional_ Optional element x?

DSL Combinators

from pygbnf import select, one_or_more, zero_or_more, optional, repeat, group

# Character class from string
select("0123456789")          # → [0123456789]

# Alternative from list
select(["+", "-", "*"])       # → "+" | "-" | "*"

# Repetition
one_or_more(x)                # → x+
zero_or_more(x)               # → x*
optional(x)                   # → x?
repeat(x, 2, 5)              # → x{2,5}

# Grouping
group(a + b)                  # → (a b)

# Operators
a + b                         # → a b   (sequence)
a | b                         # → a | b (alternative)

Rule Definition

Rules are defined with the @g.rule decorator. Calling a rule function inside another rule creates a rule reference (not an inline expansion):

g = cfg.Grammar()

@g.rule
def digit():
    return select("0123456789")

@g.rule
def number():
    return one_or_more(digit())  # → digit+  (reference, not inlined)

Forward references work naturally — rules can reference rules defined later.

Token Constraints

llama.cpp supports token-level matching:

from pygbnf import token, token_id, not_token, not_token_id

token("think")        # → <think>
token_id(1000)        # → <[1000]>
not_token("think")    # → !<think>
not_token_id(1001)    # → !<[1001]>

Grammar Helpers

Common patterns prebuilt:

from pygbnf import (
    WS, ws, ws_required,           # whitespace
    keyword, identifier, number,    # basic tokens
    float_number, string_literal,   # complex tokens
    comma_list, between,           # structural patterns
    separated_by, spaced_comma_list,
)

comma_list(identifier())   # → ident ("," " "* ident)*
between("(", expr, ")")    # → "(" expr ")"

Recursion Analysis

Detect left recursion in your grammar:

cycles = g.detect_left_recursion()
# Warns: "Left recursion detected: expression -> expression"
# Suggests: rewrite as base (op base)*

Examples

See the examples/ directory:

File Description
quickstart.py The quick-start example from this README
arithmetic.py Arithmetic expressions with operator precedence
csv_grammar.py CSV file format
json_grammar.py Full JSON grammar
simple_lang.py A small programming language
token_demo.py Token-level constraints
demo_llm.py LLM constrained generation (requires llama-server)
demo_schema.py Schema → grammar examples
demo_hybrid.py DSL + Python types mixed
demo_simple_lang.py Mini-language generation with LLM
demo_vision.py Vision + grammar: solve math from an image

Run any example:

python examples/arithmetic.py

Schema Generation

Auto-generate grammars from Python types and dataclasses:

from dataclasses import dataclass
from pygbnf import grammar_from_type

@dataclass
class Movie:
    title: str
    year: int
    rating: float

g = grammar_from_type(Movie)
print(g.to_gbnf())

Also supports function signatures:

from pygbnf import grammar_from_args

def search(query: str, limit: int = 10):
    ...

g = grammar_from_args(search)
print(g.to_gbnf())

Requirements

  • Python 3.8+
  • No external dependencies

Acknowledgements

  • guidance-ai — pygbnf's composable API is inspired by their approach to constrained generation
  • llama.cpp — for the GBNF format and the underlying inference engine

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pygbnf-0.2.0.tar.gz (99.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pygbnf-0.2.0-py3-none-any.whl (37.0 kB view details)

Uploaded Python 3

File details

Details for the file pygbnf-0.2.0.tar.gz.

File metadata

  • Download URL: pygbnf-0.2.0.tar.gz
  • Upload date:
  • Size: 99.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for pygbnf-0.2.0.tar.gz
Algorithm Hash digest
SHA256 60c43d80c4f0b670007a55209d349e12c650bab049716c15ee990c60326a8d6b
MD5 f1ca0e80a2f4a6008bf79280018cfad9
BLAKE2b-256 25abeb5976a1f5a5d40881ffb2cce5937376364cd53c9b9822b220c1f8945a33

See more details on using hashes here.

File details

Details for the file pygbnf-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: pygbnf-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 37.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for pygbnf-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c2c3a8914c319759ac9c4734236f546c646dc25750e516dde1e974d2779e4502
MD5 5868c2273cc94640d45f3d5e99c1c621
BLAKE2b-256 c036ba0074fd81edff9b6ea90a60437f8ead343255273e12a94bc1a8dac8aaef

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page