unigram

Unigram is a library for random (depth-first) generation from context-sensitive grammars, intended for synthetic data creation. Context-free grammars are supported as well.

A distinctive feature is the option to generate in multiple languages in parallel: the same sample can be rendered, for example, in both TPTP and pseudo-English.
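
To make the idea of parallel generation concrete, here is a minimal pure-Python sketch that does not use unigram at all. The `RULES` table, the `expand` function, and the convention that a digit in a template stands for the child at that index are assumptions made for illustration; they are not unigram's internals, and the sketch is context-free only.

```python
import random
import re

# Hypothetical toy rules, NOT unigram's data model: each nonterminal maps to a
# list of (child nonterminals, {language: template}, weight) alternatives.
# A digit in a template stands for the rendering of the child at that index.
RULES = {
    "fact": [(("person", "adj"), {"tptp": "1(0)", "eng": "0 is 1"}, 1.0)],
    "person": [
        ((), {"tptp": "mary", "eng": "mary"}, 1.0),
        ((), {"tptp": "paul", "eng": "paul"}, 1.0),
    ],
    "adj": [
        ((), {"tptp": "rich", "eng": "rich"}, 1.0),
        ((), {"tptp": "~rich", "eng": "not rich"}, 0.2),
    ],
}


def expand(symbol, rng):
    """Depth-first expansion: pick one weighted alternative, then recurse into each child."""
    alternatives = RULES[symbol]
    weights = [weight for _, _, weight in alternatives]
    children, templates, _ = rng.choices(alternatives, weights=weights)[0]
    rendered = [expand(child, rng) for child in children]
    # Fill every language's template from the same derivation,
    # so the outputs in all languages stay aligned.
    return {
        lang: re.sub(r"\d+", lambda m: rendered[int(m.group())][lang], template)
        for lang, template in templates.items()
    }


sample = expand("fact", random.Random(0))
print(sample["eng"])
print(sample["tptp"])
```

Because both strings are filled from a single derivation, the English sentence and the TPTP formula always describe the same fact.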

Example with LogicNLI grammar:

pip install unigram

from unigram import init_grammar, generate
def LogicNLI():
    ADJECTIVES = ['rich', 'quiet', 'old', 'tall', 'kind', 'brave', 'wise',
                  'happy', 'strong', 'curious', 'patient', 'funny', 'generous', 'humble']
    # (We selected adjectives with no clear semantic interference)
    NAMES = ['mary', 'paul', 'fred', 'alice', 'john', 'susan', 'lucy']

    R = init_grammar(['tptp','eng'])
    R('start(' + ','.join(['rule']*16) + ',' + ','.join(['fact']*8) + ')',
      '&\n'.join([f'({i})' for i in range(24)]),
      '\n'.join([f'{i}' for i in range(24)]))

    R('hypothesis(person,a)', '1(0)', '0 is 1')
    for a in ADJECTIVES:
        R('adj', a)
        R('adj', f'~{a}', f'not {a}', weight=0.2)

    R('property(adj,adj)', '(0(?)&1(?))', 'both 0 and 1')
    R('property(adj,adj)', '(0(?)|1(?))', '0 or 1')
    R('property(adj,adj)', '(0(?)<~>1(?))', 'either 0 or 1', weight=0.5)
    R('property(adj)', '0(?)', '0')

    R('rule(property,property)', '![X]:(0[?←X]=>1[?←X])',
      'everyone who is 0 is 1')
    R('rule(property,property)', '![X]:(0[?←X]<=>1[?←X])',
      'everyone who is 0 is 1 and vice versa')

    for p in NAMES:
        R('person', p)

    R('fact(person,property)', '1[?←0]', '0 is 1')
    R('fact(property)', '?[X]:(0[?←X])', 'someone is 0', weight=0.2)
    R('rule(fact,fact)', '(0)=>(1)', 'if 0 then 1')
    R('rule(fact,fact)', '(0)<=>(1)', 'if 0 then 1 and vice versa')
    return R


eng, tptp = "eng", "tptp"
grammar = LogicNLI()
x = generate(grammar)
# Render the same generated sample in each language
print(x @ eng)
print(x @ tptp)
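
The `start` rule in the example builds its three arguments with string operations, which can be hard to read at a glance. The sketch below rebuilds those strings in plain Python, without calling unigram, to show what they contain. The variable names are illustrative, and the reading that each number refers to the argument at that position is inferred from the example rather than taken from the library's documentation.

```python
# Rebuild the three string arguments of the 'start' rule outside of unigram.
signature = 'start(' + ','.join(['rule'] * 16) + ',' + ','.join(['fact'] * 8) + ')'
tptp_template = '&\n'.join(f'({i})' for i in range(24))
eng_template = '\n'.join(str(i) for i in range(24))

# The signature lists 16 'rule' arguments followed by 8 'fact' arguments.
args = signature[len('start('):-1].split(',')
print(len(args))                        # 24
print(tptp_template.splitlines()[:3])   # ['(0)&', '(1)&', '(2)&']
print(eng_template.splitlines()[:3])    # ['0', '1', '2']
```

Under that reading, the first template joins the 24 parenthesized children into one conjunction, and the second lists the same 24 children one per line.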

Pre-loaded grammars

We provide pre-written grammars, including:

  • tinypy_grammar, which reproduces tinypy, a synthetic toy grammar of Python for LLM training/evaluation
  • FOL_grammar, a sophisticated controlled grammar for first-order logic aligned with simplified English
  • arith_grammar, a simple grammar for arithmetic
  • regex_grammar, a grammar generating regular expressions

Example:

from unigram.grammars import FOL_grammar, tinypy_grammar
from unigram import generate
g = tinypy_grammar()
x = generate(g)
print(x @ 'py')

Citation for the unigram framework:

@inproceedings{sileo-2024-scaling,
    title = "Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.301/",
    doi = "10.18653/v1/2024.emnlp-main.301",
    pages = "5275--5283",
}
