unigram

Unigram is a library for random, depth-first generation from context-sensitive grammars (context-free grammars are also supported), intended for synthetic data creation.

A distinctive feature is the ability to generate in several languages in parallel (for example, TPTP and pseudo-English).
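To make the idea concrete, here is a minimal pure-Python sketch of weighted, depth-first random generation with one template per language. It does not use unigram and is not meant to mirror its internals: the `RULES` table, the `expand` function, and the convention that a digit in a template stands for the child at that position are assumptions made for illustration.

```python
import random
import re

# Hypothetical toy rule table (an assumption, not unigram's data model).
# symbol -> list of (child symbols, {language: template}, weight)
RULES = {
    "fact": [(("person", "adj"), {"tptp": "1(0)", "eng": "0 is 1"}, 1.0)],
    "person": [
        ((), {"tptp": "mary", "eng": "mary"}, 1.0),
        ((), {"tptp": "paul", "eng": "paul"}, 1.0),
    ],
    "adj": [
        ((), {"tptp": "kind", "eng": "kind"}, 1.0),
        ((), {"tptp": "~kind", "eng": "not kind"}, 0.2),  # rarer negated form
    ],
}


def expand(symbol, rng):
    """Expand `symbol` depth first and return {language: rendered string}."""
    options = RULES[symbol]
    weights = [weight for _, _, weight in options]
    child_symbols, templates, _ = rng.choices(options, weights=weights)[0]
    # Recurse into each child before rendering the parent (depth first).
    children = [expand(child, rng) for child in child_symbols]
    rendered = {}
    for lang, template in templates.items():
        # Single-pass substitution: digit N becomes child N in the same language.
        rendered[lang] = re.sub(
            r"\d", lambda m, lang=lang: children[int(m.group())][lang], template
        )
    return rendered


sample = expand("fact", random.Random(0))
print(sample["eng"])
print(sample["tptp"])
```

Because both renderings come from the same derivation, the English sentence and the TPTP formula always describe the same person and the same adjective.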

Installation:

pip install unigram

Example with the LogicNLI grammar:

from unigram import init_grammar, generate
def LogicNLI():
    ADJECTIVES = ['rich', 'quiet', 'old', 'tall', 'kind', 'brave', 'wise',
                  'happy', 'strong', 'curious', 'patient', 'funny', 'generous', 'humble']
    # (We selected adjectives with no clear semantic interference)
    NAMES = ['mary', 'paul', 'fred', 'alice', 'john', 'susan', 'lucy']

    R = init_grammar(['tptp','eng'])
    R('start(' + ','.join(['rule']*16) + ',' + ','.join(['fact']*8) + ')',
      '&\n'.join([f'({i})' for i in range(24)]),
      '\n'.join([f'{i}' for i in range(24)]))

    R('hypothesis(person,a)', '1(0)', '0 is 1')
    for a in ADJECTIVES:
        R('adj', a)
        R('adj', f'~{a}', f'not {a}', weight=0.2)

    R('property(adj,adj)', '(0(?)&1(?))', 'both 0 and 1')
    R('property(adj,adj)', '(0(?)|1(?))', '0 or 1')
    R('property(adj,adj)', '(0(?)<~>1(?))', 'either 0 or 1', weight=0.5)
    R('property(adj)', '0(?)', '0')

    R('rule(property,property)', '![X]:(0[?←X]=>1[?←X])',
      'everyone who is 0 is 1')
    R('rule(property,property)', '![X]:(0[?←X]<=>1[?←X])',
      'everyone who is 0 is 1 and vice versa')

    for p in NAMES:
        R('person', p)

    R('fact(person,property)', '1[?←0]', '0 is 1')
    R('fact(property)', '?[X]:(0[?←X])', 'someone is 0', weight=0.2)
    R('rule(fact,fact)', '(0)=>(1)', 'if 0 then 1')
    R('rule(fact,fact)', '(0)<=>(1)', 'if 0 then 1 and vice versa')
    return R


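eng, tptp = "eng", "tptp"
grammar = LogicNLI()
x = generate(grammar)  # sample one random derivation
print(x@eng)           # render it as pseudo-English
print(x@tptp)          # render the same derivation as TPTP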
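The `?` and `[?←X]` notation in the TPTP templates is not explained in this description. One plausible reading, inferred from the example alone, is that `?` marks an open argument slot in a rendered child and that `0[?←X]` inserts child 0 with that slot filled by `X` (or by another child when the value is an index, as in `1[?←0]`). The sketch below implements that reading in plain Python; `render` is a hypothetical helper, not a unigram function, and the library's actual semantics may differ.

```python
import re


def render(template, children):
    """Fill a template from already-rendered children (hypothetical helper).

    "N" inserts child N unchanged; "N[?←v]" inserts child N with every "?"
    replaced by v, where v is either a literal name or another child index.
    """
    def substitute(match):
        child = children[int(match.group(1))]
        value = match.group(2)
        if value is None:
            return child              # plain positional reference
        if value.isdigit():
            value = children[int(value)]  # slot filled by another child
        return child.replace("?", value)

    return re.sub(r"(\d)(?:\[\?←(\w+)\])?", substitute, template)


# A property keeps its slot open until a rule or fact binds it.
prop = render("(0(?)&1(?))", ["rich", "tall"])
rule = render("![X]:(0[?←X]=>1[?←X])", [prop, "kind(?)"])
fact = render("1[?←0]", ["mary", "kind(?)"])
print(prop)  # (rich(?)&tall(?))
print(rule)  # ![X]:((rich(X)&tall(X))=>kind(X))
print(fact)  # kind(mary)
```

Under this reading, the same property can be reused under a universal quantifier, an existential quantifier, or a named person, which is where the context sensitivity of the grammar shows up.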

Pre-loaded grammars

We provide pre-written grammars, including:

  • tinypy_grammar, which reproduces tinypy, a synthetic toy grammar of Python for LLM training/evaluation
  • FOL_grammar, a sophisticated controlled grammar for first-order logic aligned with simplified English
  • arith_grammar, a simple grammar for arithmetic
  • regex_grammar, a grammar generating regular expressions

Example:

from unigram.grammars import FOL_grammar, tinypy_grammar
from unigram import generate
g = tinypy_grammar()
x = generate(g)   # sample one random program
print(x@'py')     # render it in the 'py' language

Citation for the unigram framework:

@inproceedings{sileo-2024-scaling,
    title = "Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.301/",
    doi = "10.18653/v1/2024.emnlp-main.301",
    pages = "5275--5283",
}
