unigram
Unigram is a library for random, depth-first generation from context-sensitive grammars (context-free grammars are supported as well), intended for synthetic data creation.
A distinctive feature is the option to generate in several languages in parallel (for example, TPTP and pseudo-English).
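To make the idea of parallel generation concrete, here is a minimal, library-independent sketch. It is not unigram's implementation and does not use its API: the `RULES` table, the `expand` helper, and the `{0}`/`{1}` placeholder notation are all hypothetical and chosen only for illustration. It also covers only the context-free case. Each rule carries one template per language, and each child is expanded once and reused in every template, so the outputs stay aligned.

```python
import random

# Hypothetical toy rules (not unigram's format):
# (left-hand side, child symbols, one template per language).
# "{0}" and "{1}" stand for the realization of the first and second child.
RULES = [
    ("fact", ("person", "adj"), {"tptp": "{1}({0})", "eng": "{0} is {1}"}),
    ("person", (), {"tptp": "mary", "eng": "mary"}),
    ("person", (), {"tptp": "paul", "eng": "paul"}),
    ("adj", (), {"tptp": "rich", "eng": "rich"}),
    ("adj", (), {"tptp": "~rich", "eng": "not rich"}),
]


def expand(symbol, rng):
    """Depth-first random expansion; returns one string per language."""
    candidates = [rule for rule in RULES if rule[0] == symbol]
    _, children, templates = rng.choice(candidates)
    # Expand every child exactly once, then reuse it in each language template.
    realized = [expand(child, rng) for child in children]
    return {
        lang: template.format(*(child[lang] for child in realized))
        for lang, template in templates.items()
    }


sample = expand("fact", random.Random(0))
print(sample["eng"])
print(sample["tptp"])
```

Because both strings come from the same derivation, a sentence such as "mary is not rich" is always paired with `~rich(mary)`, never with a formula about a different person or adjective.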
Example with LogicNLI grammar:
pip install unigram
from unigram import init_grammar, generate

def LogicNLI():
    ADJECTIVES = ['rich', 'quiet', 'old', 'tall', 'kind', 'brave', 'wise',
                  'happy', 'strong', 'curious', 'patient', 'funny', 'generous', 'humble']
    # (We selected adjectives with no clear semantic interference)
    NAMES = ['mary', 'paul', 'fred', 'alice', 'john', 'susan', 'lucy']

    R = init_grammar(['tptp','eng'])
    R('start(' + ','.join(['rule']*16) + ',' + ','.join(['fact']*8) + ')',
      '&\n'.join([f'({i})' for i in range(24)]),
      '\n'.join([f'{i}' for i in range(24)]))

    R('hypothesis(person,a)', '1(0)', '0 is 1')

    for a in ADJECTIVES:
        R('adj', a)
        R('adj', f'~{a}', f'not {a}', weight=0.2)

    R('property(adj,adj)', '(0(?)&1(?))', 'both 0 and 1')
    R('property(adj,adj)', '(0(?)|1(?))', '0 or 1')
    R('property(adj,adj)', '(0(?)<~>1(?))', 'either 0 or 1', weight=0.5)
    R('property(adj)', '0(?)', '0')

    R('rule(property,property)', '![X]:(0[?←X]=>1[?←X])',
      'everyone who is 0 is 1')
    R('rule(property,property)', '![X]:(0[?←X]<=>1[?←X])',
      'everyone who is 0 is 1 and vice versa')

    for p in NAMES:
        R('person', p)

    R('fact(person,property)', '1[?←0]', '0 is 1')
    R('fact(property)', '?[X]:(0[?←X])', 'someone is 0', weight=0.2)
    R('rule(fact,fact)', '(0)=>(1)', 'if 0 then 1')
    R('rule(fact,fact)', '(0)<=>(1)', 'if 0 then 1 and vice versa')
    return R

eng, tptp = "eng", "tptp"
grammar = LogicNLI()
x = generate(grammar)
print(x@eng)
print(x@tptp)
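The three arguments of the start rule above are built with ordinary string operations, so they can be inspected without installing the library. The snippet below evaluates the same expressions; the variable names are introduced here for illustration only. Reading the numbers in the two templates as positional references to the 24 children (16 rules, then 8 facts) is an interpretation of the example, not documented behaviour.

```python
# Same expressions as in the start rule of the example; names are illustrative.
signature = 'start(' + ','.join(['rule']*16) + ',' + ','.join(['fact']*8) + ')'
tptp_template = '&\n'.join([f'({i})' for i in range(24)])
eng_template = '\n'.join([f'{i}' for i in range(24)])

print(signature.count('rule'), signature.count('fact'))  # 16 8
print(tptp_template.splitlines()[:2])                    # ['(0)&', '(1)&']
print(tptp_template.splitlines()[-1])                    # (23)
print(eng_template.splitlines()[-1])                     # 23
```

The TPTP template therefore joins the 24 children into one conjunction, one parenthesized child per line, while the English template lists the same children one per line.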
Citation:
@inproceedings{sileo-2024-scaling,
    title = "Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.301/",
    doi = "10.18653/v1/2024.emnlp-main.301",
    pages = "5275--5283",
}