Skip to main content

No project description provided

Project description

unigram

Unigram is a library for random (depth first) generation with context-sensitive grammars (but also context free grammars) for synthetic data creation.

One particularity is the option to generate in multiple languages in parallel (for example, tptp and pseudo-english).

Example with LogicNLI grammar:

pip install unigram

from unigram import init_grammar, generate
def LogicNLI():
    ADJECTIVES = ['rich', 'quiet', 'old', 'tall', 'kind', 'brave', 'wise',
                  'happy', 'strong', 'curious', 'patient', 'funny', 'generous', 'humble']
    # (We selected adjectives with no clear semantic interference)
    NAMES = ['mary', 'paul', 'fred', 'alice', 'john', 'susan', 'lucy']

    R = init_grammar(['tptp','eng'])
    R('start(' + ','.join(['rule']*16) + ',' + ','.join(['fact']*8) + ')',
      '&\n'.join([f'({i})' for i in range(24)]),
      '\n'.join([f'{i}' for i in range(24)]))

    R('hypothesis(person,a)', '1(0)', '0 is 1')
    for a in ADJECTIVES:
        R('adj', a)
        R('adj', f'~{a}', f'not {a}', weight=0.2)

    R('property(adj,adj)', '(0(?)&1(?))', 'both 0 and 1')
    R('property(adj,adj)', '(0(?)|1(?))', '0 or 1')
    R('property(adj,adj)', '(0(?)<~>1(?))', 'either 0 or 1', weight=0.5)
    R('property(adj)', '0(?)', '0')

    R('rule(property,property)', '![X]:(0[?←X]=>1[?←X])',
      'everyone who is 0 is 1')
    R('rule(property,property)', '![X]:(0[?←X]<=>1[?←X])',
      'everyone who is 0 is 1 and vice versa')

    for p in NAMES:
        R('person', p)

    R('fact(person,property)', '1[?←0]', '0 is 1')
    R('fact(property)', '?[X]:(0[?←X])', 'someone is 0', weight=0.2)
    R('rule(fact,fact)', '(0)=>(1)', 'if 0 then 1')
    R('rule(fact,fact)', '(0)<=>(1)', 'if 0 then 1 and vice versa')
    return R


eng, tptp = "eng","tptp"
grammar = LogicNLI()
x=generate(grammar)
print(x@eng)
print(x@tptp)

Citation:

@inproceedings{sileo-2024-scaling,
    title = "Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.301/",
    doi = "10.18653/v1/2024.emnlp-main.301",
    pages = "5275--5283",
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

unigram-0.9.0.tar.gz (22.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

unigram-0.9.0-py3-none-any.whl (25.0 kB view details)

Uploaded Python 3

File details

Details for the file unigram-0.9.0.tar.gz.

File metadata

  • Download URL: unigram-0.9.0.tar.gz
  • Upload date:
  • Size: 22.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for unigram-0.9.0.tar.gz
Algorithm Hash digest
SHA256 3c9f860fecd35634d5a0101f88143fde2397b4e85af8be0389f80a4e8e0a30f0
MD5 125915b1d96f4d17105503d836c8543d
BLAKE2b-256 41ddbb2c690ed2a381ab29143123bec88a728b899ec35ae078468c3b0ab2bb4d

See more details on using hashes here.

Provenance

The following attestation bundles were made for unigram-0.9.0.tar.gz:

Publisher: python-publish.yml on sileod/unigram

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file unigram-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: unigram-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 25.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for unigram-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 20a5da89ecabc8d132b790f93e4f317dce5b0013f52d51a382442204e453e131
MD5 c73eaf30dc1c8780657221e6231236d9
BLAKE2b-256 7c93a3c9baf4d61cc4969e08ede94885e7aad56584f43b8e371165c2fab2d3e9

See more details on using hashes here.

Provenance

The following attestation bundles were made for unigram-0.9.0-py3-none-any.whl:

Publisher: python-publish.yml on sileod/unigram

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page