gramforge ⚒️

gramforge (formerly unigram) is a Pythonic library for random depth-first generation from context-sensitive grammars (context-free grammars are supported too), aimed at synthetic data creation. A distinctive feature is parallel generation in multiple languages: a single derivation can be rendered, for example, as both TPTP and pseudo-English.

Installation:
pip install gramforge

Example with the LogicNLI grammar:

from gramforge import init_grammar, generate

def LogicNLI():
    ADJECTIVES = ['rich', 'quiet', 'old', 'tall', 'kind', 'brave', 'wise',
                  'happy', 'strong', 'curious', 'patient', 'funny', 'generous', 'humble']
    # (We selected adjectives with no clear semantic interference)
    NAMES = ['mary', 'paul', 'fred', 'alice', 'john', 'susan', 'lucy']

    R = init_grammar(['tptp','eng'])
    R('start(' + ','.join(['rule']*16) + ',' + ','.join(['fact']*8) + ')',
      '&\n'.join([f'({i})' for i in range(24)]),
      '\n'.join([f'{i}' for i in range(24)]))

    R('hypothesis(person,a)', '1(0)', '0 is 1')
    for a in ADJECTIVES:
        R('adj', a)
        R('adj', f'~{a}', f'not {a}', weight=0.2)

    R('property(adj,adj)', '(0(?)&1(?))', 'both 0 and 1')
    R('property(adj,adj)', '(0(?)|1(?))', '0 or 1')
    R('property(adj,adj)', '(0(?)<~>1(?))', 'either 0 or 1', weight=0.5)
    R('property(adj)', '0(?)', '0')

    R('rule(property,property)', '![X]:(0[?←X]=>1[?←X])',
      'everyone who is 0 is 1')
    R('rule(property,property)', '![X]:(0[?←X]<=>1[?←X])',
      'everyone who is 0 is 1 and vice versa')

    for p in NAMES:
        R('person', p)

    R('fact(person,property)', '1[?←0]', '0 is 1')
    R('fact(property)', '?[X]:(0[?←X])', 'someone is 0', weight=0.2)
    R('rule(fact,fact)', '(0)=>(1)', 'if 0 then 1')
    R('rule(fact,fact)', '(0)<=>(1)', 'if 0 then 1 and vice versa')
    return R

eng, tptp = "eng", "tptp"
grammar = LogicNLI()
x = generate(grammar)
print(x @ eng)   # pseudo-English rendering
print(x @ tptp)  # TPTP rendering of the same derivation
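
To make the idea of parallel, multi-language rendering concrete, here is a minimal standard-library sketch. It does not use gramforge and is not its implementation: the `RULES` table, `derive`, and `render` are hypothetical names introduced only for illustration. The sketch assumes, as in the templates above, that a digit in a template stands for the child at that position.

```python
import random
import re

# Hypothetical toy rules: symbol -> list of (child symbols, {language: template}).
# A digit in a template refers to the rendered child at that position.
RULES = {
    "fact": [(["person", "adj"], {"tptp": "1(0)", "eng": "0 is 1"})],
    "person": [([], {"tptp": "mary", "eng": "mary"}),
               ([], {"tptp": "paul", "eng": "paul"})],
    "adj": [([], {"tptp": "rich", "eng": "rich"}),
            ([], {"tptp": "~quiet", "eng": "not quiet"})],
}

def derive(symbol, rng):
    """Pick one rule for `symbol` and derive its children depth first."""
    children, templates = rng.choice(RULES[symbol])
    return templates, [derive(child, rng) for child in children]

def render(tree, lang):
    """Render one derivation in `lang` by replacing each digit with a child."""
    templates, children = tree
    rendered = [render(child, lang) for child in children]
    return re.sub(r"\d", lambda m: rendered[int(m.group())], templates[lang])

tree = derive("fact", random.Random(0))
print(render(tree, "eng"))
print(render(tree, "tptp"))
```

Because both strings are rendered from the same derivation tree, the English sentence and the TPTP formula always agree on the person and the adjective.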

Pre-loaded grammars

We feature pre-written grammars including:

  • tinypy_grammar: reproduces tinypy, a synthetic toy grammar of Python for LLM training and evaluation
  • FOL_grammar: a sophisticated controlled grammar for first-order logic (TPTP) aligned with simplified English
  • arith_grammar: a simple grammar for arithmetic
  • regex_grammar: a grammar generating regular expressions
  • dyck_grammar: nested parentheses

Example:

from gramforge.grammars import FOL_grammar, tinypy_grammar
from gramforge import generate
g = tinypy_grammar()
x = generate(g)
print(x @ 'py')

Abstract syntax trees

Generated expressions (the objects returned by generate, such as x above) behave like anytree trees. They fully expose the abstract syntax tree, which can be helpful for debugging, visualizing, or analyzing the generated examples.

Depth constraints

Generating synthetic data requires complexity management. gramforge implements efficient management of min_depth and max_depth constraints, with a "bushiness" knob (default=0.7) that keeps generated expressions from degenerating into "spikes": trees in which a single long branch satisfies the minimum depth requirement while every other branch stays shallow.
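
The following standard-library sketch shows one plausible way such constraints can interact. It is an assumption made for illustration, not gramforge's actual algorithm: `gen`, `height`, and the `must_grow` flag are hypothetical, and `bushiness` is interpreted here as the probability that a sibling branch also has to reach the minimum depth.

```python
import random

def gen(depth, min_depth, max_depth, bushiness, rng, must_grow=True):
    """Return a random binary tree with height in [min_depth, max_depth].

    Leaves are the string "x"; internal nodes are ("op", left, right).
    """
    if depth >= max_depth:
        return "x"  # depth budget exhausted: must stop here
    need_growth = must_grow and depth < min_depth
    if not need_growth and rng.random() < 0.5:
        return "x"  # no obligation to grow: free to stop early
    # One randomly chosen child (the "spine") always inherits the obligation
    # to reach min_depth; the other inherits it with probability `bushiness`.
    spine = rng.randrange(2)
    kids = [
        gen(depth + 1, min_depth, max_depth, bushiness, rng,
            must_grow=need_growth and (i == spine or rng.random() < bushiness))
        for i in range(2)
    ]
    return ("op", *kids)

def height(tree):
    """Number of edges on the longest root-to-leaf path."""
    return 0 if tree == "x" else 1 + max(height(kid) for kid in tree[1:])

tree = gen(0, min_depth=3, max_depth=5, bushiness=0.7, rng=random.Random(0))
print(height(tree))
```

In this sketch, bushiness=0 forces only the spine to reach min_depth, which tends to produce spikes, while bushiness=1 forces every branch to reach it.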

Citation for the gramforge framework:

@inproceedings{sileo-2024-scaling,
    title = "Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.301/",
    doi = "10.18653/v1/2024.emnlp-main.301",
    pages = "5275--5283",
}
