Force LLMs to use a specific context-free grammar in completions

parserLLM

Use a context-free grammar and a parser generator to determine valid next tokens for an LLM generation.

Extending ReLLM to handle context-free grammars in addition to regular expressions.

Usage

pip install parserllm

See examples/example.py for an example of how to use this library.

Run it with python examples/example.py.

How does it work?

See this post for a more in-depth explanation.

The general strategy goes like this:

First, define a context-free grammar. You might use this for a simplified version of JSON (in EBNF form):

    ?start: value

    ?value: object
          | array
          | string
          | "true"             -> true
          | "false"            -> false
          | "null"             -> null

    array  : "[" [value ("," value)*] "]"
    object : "{" [pair ("," pair)*] "}"
    pair   : string ":" value
    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS

Next, to practically support multiple CFGs, use a parser generator to parse the language. This library uses Lark, simply because it’s written in Python and fairly easy to use.

Next, run the partial output through the parser. At step zero, this is just the empty string. The parser will return all of the possible next tokens. The valid first completion of this grammar is any “value,” which can be an object, array, string, true, false, or null. This means the valid starting tokens are {, [, ", true, false, and null.

Next, compile those tokens to their regular expressions. Now we have an equivalent problem to ReLLM. Simply run the regexps through ReLLM to generate the next possible token. ReLLM will squash the logits of the non-matching characters and the LLM will only consider valid partial or full next tokens.
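The masking idea can be sketched without any model at all. The example below is a deliberately simplified stand-in, not ReLLM's implementation: it handles only literal terminals, because general regex terminals need partial matching, which the standard `re` module does not offer (the third-party `regex` package does). The vocabulary, the scores, and the name `mask_logits` are all invented for illustration.

```python
import math


def mask_logits(logits, vocab, allowed_literals):
    """Set the score of every vocabulary token that cannot begin (or equal)
    one of the allowed literals to -inf, so it can never be selected."""
    masked = []
    for token, score in zip(vocab, logits):
        ok = token != "" and any(lit.startswith(token) for lit in allowed_literals)
        masked.append(score if ok else -math.inf)
    return masked


# Toy vocabulary and raw scores; "hello" is the model's unconstrained favourite.
vocab = ["{", "[", "tr", "true", "fal", "hello", "}", "nul", " "]
logits = [0.1, 0.3, 1.2, 0.9, 0.2, 5.0, 2.0, 0.4, 1.5]

# Literal terminals that may start a value in the simplified JSON grammar.
allowed = ["{", "[", "true", "false", "null"]

masked = mask_logits(logits, vocab, allowed)
best = vocab[max(range(len(vocab)), key=lambda i: masked[i])]
print(best)
```

Here the partial token "tr" survives the mask because it is a prefix of "true", while "hello", "}" and the space are squashed even though their raw scores are higher.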

Iterate until max tokens are reached, or the parser sees only an empty string or stop token as the next token.
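The loop itself can be sketched with toy stand-ins for both the parser and the model. Everything here is hypothetical: `allowed_next` hand-codes the valid next tokens for a tiny grammar (`value: "x" | "[" value "]"`), `toy_model` returns fixed scores instead of real logits, and `END` plays the role of the stop token.

```python
import math

END = "<eos>"  # hypothetical stop token


def allowed_next(text):
    """Stand-in for the parser: valid next tokens for the toy grammar
    value: "x" | "[" value "]"."""
    depth = text.count("[") - text.count("]")
    if text == "" or text.endswith("["):
        return {"x", "["}
    return {"]"} if depth > 0 else {END}


def toy_model(text):
    """Stand-in for an LLM: a score for every vocabulary token."""
    scores = {"[": 1.0, "x": 0.5, "]": 0.1, "y": 3.0, END: 0.0}
    if len(text) >= 2:
        scores["x"] = 2.0  # after two tokens the toy model starts preferring "x"
    return scores


def generate(max_tokens):
    text = ""
    for _ in range(max_tokens):
        allowed = allowed_next(text)
        if allowed == {END}:          # parser only accepts the stop token
            break
        scores = toy_model(text)
        # Pick the best-scoring token among those the grammar allows.
        text += max(allowed, key=lambda tok: scores.get(tok, -math.inf))
    return text


print(generate(10))
print(generate(2))
```

With a budget of 10 tokens the loop stops on its own once the brackets are balanced. With a budget of 2 it is cut off mid-structure, so the result is a valid prefix of the grammar but not a complete sentence.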

Some interesting features:

  • You can describe the syntax of most programming and configuration languages as a CFG.
  • The LLM won’t produce an invalid result, but there’s no guarantee it will finish and produce a stop token.
