unigram

Unigram is a library for random, depth-first generation from context-sensitive grammars (context-free grammars are supported as well), intended for synthetic data creation.

A distinctive feature is the ability to generate in several languages in parallel (for example, TPTP and pseudo-English).

Installation:

pip install unigram

Example with the LogicNLI grammar:

from unigram import init_grammar, generate
def LogicNLI():
    ADJECTIVES = ['rich', 'quiet', 'old', 'tall', 'kind', 'brave', 'wise',
                  'happy', 'strong', 'curious', 'patient', 'funny', 'generous', 'humble']
    # (We selected adjectives with no clear semantic interference)
    NAMES = ['mary', 'paul', 'fred', 'alice', 'john', 'susan', 'lucy']

    R = init_grammar(['tptp','eng'])
    R('start(' + ','.join(['rule']*16) + ',' + ','.join(['fact']*8) + ')',
      '&\n'.join([f'({i})' for i in range(24)]),
      '\n'.join([f'{i}' for i in range(24)]))

    R('hypothesis(person,a)', '1(0)', '0 is 1')
    for a in ADJECTIVES:
        R('adj', a)
        R('adj', f'~{a}', f'not {a}', weight=0.2)

    R('property(adj,adj)', '(0(?)&1(?))', 'both 0 and 1')
    R('property(adj,adj)', '(0(?)|1(?))', '0 or 1')
    R('property(adj,adj)', '(0(?)<~>1(?))', 'either 0 or 1', weight=0.5)
    R('property(adj)', '0(?)', '0')

    R('rule(property,property)', '![X]:(0[?←X]=>1[?←X])',
      'everyone who is 0 is 1')
    R('rule(property,property)', '![X]:(0[?←X]<=>1[?←X])',
      'everyone who is 0 is 1 and vice versa')

    for p in NAMES:
        R('person', p)

    R('fact(person,property)', '1[?←0]', '0 is 1')
    R('fact(property)', '?[X]:(0[?←X])', 'someone is 0', weight=0.2)
    R('rule(fact,fact)', '(0)=>(1)', 'if 0 then 1')
    R('rule(fact,fact)', '(0)<=>(1)', 'if 0 then 1 and vice versa')
    return R


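eng, tptp = "eng", "tptp"  # language names, as passed to init_grammar
grammar = LogicNLI()
x = generate(grammar)  # draw one random sample from the grammar
print(x@eng)   # pseudo-English rendering
print(x@tptp)  # TPTP rendering of the same sample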
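
To make the idea of parallel generation more concrete, here is a minimal, library-independent sketch. It is not unigram's implementation and does not use its API: the `RULES` table and the `expand` function are hypothetical names introduced for illustration, and the toy grammar is context-free only. It assumes, as in the example above, that digits in a template refer to the rendered children by position and that a weight biases the random choice among rules for the same symbol.

```python
import random
import re

# Hypothetical toy rule table (not unigram's internal format):
# (symbol, child symbols, {language: template}, weight).
# A digit in a template stands for the rendered child at that position.
RULES = [
    ("fact", ["person", "adj"], {"tptp": "1(0)", "eng": "0 is 1"}, 1.0),
    ("person", [], {"tptp": "mary", "eng": "mary"}, 1.0),
    ("person", [], {"tptp": "paul", "eng": "paul"}, 1.0),
    ("adj", [], {"tptp": "rich", "eng": "rich"}, 1.0),
    ("adj", [], {"tptp": "~rich", "eng": "not rich"}, 0.2),
]


def expand(symbol, rng):
    """Expand `symbol` depth first and return one rendering per language."""
    candidates = [rule for rule in RULES if rule[0] == symbol]
    weights = [rule[3] for rule in candidates]
    # Weighted random choice among the rules that rewrite this symbol.
    _, children, templates, _ = rng.choices(candidates, weights=weights)[0]
    # Each child is fully expanded before the parent template is filled in.
    rendered = [expand(child, rng) for child in children]
    # A single regex pass replaces each digit with the child's rendering in
    # the same language, so all languages share one derivation.
    return {
        lang: re.sub(r"\d", lambda m: rendered[int(m.group())][lang], template)
        for lang, template in templates.items()
    }


sample = expand("fact", random.Random(0))
print(sample["eng"])
print(sample["tptp"])
```

Because both renderings are produced from the same sequence of rule choices, the pseudo-English sentence and the TPTP formula always describe the same fact.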

Citation:

@inproceedings{sileo-2024-scaling,
    title = "Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.301/",
    doi = "10.18653/v1/2024.emnlp-main.301",
    pages = "5275--5283",
}
