A package for generating multilingual symbolic GSM math problems
multilingual-gsm-symbolic
A Python package for generating synthetic multilingual math problems from symbolic templates. See the Data section for available languages.
Installation
pip install multilingual-gsm-symbolic
Quickstart
from multilingual_gsm_symbolic import load_data, load_replacements, available_languages
# see possible languages
languages = available_languages()
lang = "eng"
print(languages[lang])
# {"number of samples": 100}
# Load English templates (default)
templates = load_data(lang)
# Load language-specific replacement values (used in some templates)
replacements = load_replacements(lang)
# Generate concrete questions from a template
template = templates[0]
questions = template.generate_questions(n=10, language="eng", replacements=replacements)
for q in questions:
    print(q.question)
    print(q.answer)
    print()
Template format
Templates are JSON files with four fields:
| Field | Description |
|---|---|
| question | Concrete question (the original example) |
| answer | Concrete answer with calculation steps |
| question_annotated | Template with variable placeholders and #init / #conditions / #answer sections |
| answer_annotated | Answer template with inline expressions |
Annotated question syntax
{variable, default_value} — placeholder in the question text
#init:
- $var = range(low, high) — variable sampled from a range
- $var = sample([a, b, c]) — variable sampled from a list
#conditions:
- is_int(x / y) — constraint that must hold for a combination to be valid
#answer: x * y + z — Python expression evaluated to produce the numeric answer
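To make the section layout concrete, here is a minimal sketch that splits a question_annotated string into its parts. The function name split_annotated and the returned dictionary keys are hypothetical and chosen for illustration; this is not the package's own parser, which may handle more cases.

```python
import re


def split_annotated(question_annotated: str) -> dict:
    """Split an annotated question into its text and directive sections.

    Illustrative only: assumes the sections appear in the order
    #init, #conditions, #answer, as in the examples in this README.
    """
    text, _, rest = question_annotated.partition("\n#init:")
    init_part, _, rest = rest.partition("#conditions:")
    cond_part, _, answer_part = rest.partition("#answer:")

    def bullets(block: str) -> list[str]:
        # Keep only "- ..." lines and strip the bullet marker.
        return [line[2:].strip() for line in block.splitlines() if line.startswith("- ")]

    # Each init line looks like "$var = expression".
    init = {}
    for line in bullets(init_part):
        name, _, expr = line.partition("=")
        init[name.strip().lstrip("$")] = expr.strip()

    # Placeholders in the text look like {variable, default_value}.
    defaults = dict(re.findall(r"\{(\w+)\s*,\s*([^}]*)\}", text))

    return {
        "text": text,
        "defaults": defaults,
        "init": init,
        "conditions": bullets(cond_part),
        "answer": answer_part.strip(),
    }


example = (
    "A fog bank rolls in over a city at {speed,3} miles/hour. "
    "The city is {width,42} miles wide. "
    "How many hours will it take for the fog bank to cover the city?"
    "\n#init:\n- $speed = range(1, 20)\n- $width = range(2, 100)"
    "\n#conditions:\n- is_int(width / speed)"
    "\n#answer: width // speed"
)
parsed = split_annotated(example)
print(parsed["init"])
print(parsed["conditions"])
print(parsed["answer"])
```

The default values come back as strings here; converting them to numbers is left out to keep the sketch short.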
Example: fog bank problem
{
"question": "A fog bank rolls in over a city at 3 miles/hour. The city is 42 miles wide. How many hours will it take for the fog bank to cover the city?",
"question_annotated": "A fog bank rolls in over a city at {speed,3} miles/hour. The city is {width,42} miles wide. How many hours will it take for the fog bank to cover the city?\n#init:\n- $speed = range(1, 20)\n- $width = range(2, 100)\n#conditions:\n- is_int(width / speed)\n#answer: width // speed",
"answer": "At 3 miles/hour, it will take 42/3=14 hours for the fog to cover the city.",
"answer_annotated": "At {speed} miles/hour, it will take {width}/{speed}={width//speed} hours for the fog to cover the city."
}
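One way to read the fog bank template is as a search over variable combinations: every (speed, width) pair from the #init ranges is a candidate, and the #conditions line filters out pairs that do not divide evenly. The sketch below illustrates that idea with a brute-force enumeration. The is_int stand-in and the valid_assignments function are assumptions written for this example, and the package may sample assignments differently.

```python
import itertools
import random


def is_int(x) -> bool:
    # Stand-in for the package helper: True if x has no fractional part.
    return float(x).is_integer()


# Variable domains taken from the template's #init section.
domains = {"speed": range(1, 20), "width": range(2, 100)}
condition = "is_int(width / speed)"
answer_expr = "width // speed"


def valid_assignments(domains, condition):
    """Yield every combination of values that satisfies the condition."""
    names = list(domains)
    for values in itertools.product(*domains.values()):
        env = dict(zip(names, values))
        # eval is acceptable here only because the template text is trusted.
        if eval(condition, {"is_int": is_int}, env):
            yield env


candidates = list(valid_assignments(domains, condition))

# Draw a few valid assignments and compute the numeric answer for each.
rng = random.Random(0)
for env in rng.sample(candidates, 3):
    print(env, "->", eval(answer_expr, {}, env))
```

The original example values (speed 3, width 42) are among the candidates and give the answer 14.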
Example: shopping problem
{
"question": "A store sells apples for $2 each and oranges for $3 each. If you buy 4 apples and 5 oranges, how much do you spend?",
"question_annotated": "A store sells apples for ${apple_price,2} each and oranges for ${orange_price,3} each. If you buy {n_apples,4} apples and {n_oranges,5} oranges, how much do you spend?\n#init:\n- $apple_price = range(1, 10)\n- $orange_price = range(1, 10)\n- $n_apples = range(1, 20)\n- $n_oranges = range(1, 20)\n#conditions:\n- True\n#answer: apple_price * n_apples + orange_price * n_oranges",
"answer": "You spend 4*2 + 5*3 = 8 + 15 = $23.",
"answer_annotated": "You spend {n_apples}*{apple_price} + {n_oranges}*{orange_price} = {n_apples*apple_price} + {n_oranges*orange_price} = ${apple_price*n_apples + orange_price*n_oranges}."
}
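The answer_annotated field mixes plain variable names with inline expressions inside braces. A minimal sketch of how such a string could be rendered, assuming each braced expression is evaluated as Python against the current assignment, is shown below. The render function is hypothetical; the package exposes format_answer for this purpose and its internals may differ.

```python
import re


def render(template: str, assignments: dict) -> str:
    """Replace each {expression} with its value under the given assignments."""

    def substitute(match: re.Match) -> str:
        # Evaluate the expression with the assignments as local variables.
        # Only suitable for trusted template text.
        return str(eval(match.group(1), {}, dict(assignments)))

    return re.sub(r"\{([^{}]+)\}", substitute, template)


answer_annotated = (
    "You spend {n_apples}*{apple_price} + {n_oranges}*{orange_price} = "
    "{n_apples*apple_price} + {n_oranges*orange_price} = "
    "${apple_price*n_apples + orange_price*n_oranges}."
)
defaults = {"apple_price": 2, "orange_price": 3, "n_apples": 4, "n_oranges": 5}

print(render(answer_annotated, defaults))
# You spend 4*2 + 5*3 = 8 + 15 = $23.
```

With the default values from the template, the rendered string matches the concrete answer field of the shopping example.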
Available helper functions
| Function | Description |
|---|---|
| range(start, end[, step]) | All integers in [start, end) |
| sample([a, b, c]) | One value from the list |
| range_sample(start, end, step) | Uniform sample from a range |
| sample_sequential(items, n) | n consecutive items from a list |
| arange_sample(start, end, step) | Sample from np.arange(start, end, step) |
| is_int(x) | True if x is an integer |
| divides(a, b) | True if a divides b |
| frac_format(x) | Format x as a fraction string |
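For readers who want a feel for what these helpers do, the sketch below gives plausible stand-in implementations for four of them, based only on the one-line descriptions in the table. They are assumptions, not the package's code: in particular, the exact output style of frac_format and the optional rng parameter of sample_sequential are guesses made for this example.

```python
import random
from fractions import Fraction


def is_int(x) -> bool:
    # True if x has no fractional part, e.g. 4 or 4.0 but not 4.5.
    return float(x).is_integer()


def divides(a: int, b: int) -> bool:
    # "a divides b" means b is a whole multiple of a.
    return a != 0 and b % a == 0


def frac_format(x) -> str:
    # Assumed output style "numerator/denominator"; whole numbers print plainly.
    f = Fraction(x).limit_denominator()
    return str(f.numerator) if f.denominator == 1 else f"{f.numerator}/{f.denominator}"


def sample_sequential(items, n, rng=random):
    # Pick a random start index, then take n neighbouring items in order.
    start = rng.randrange(len(items) - n + 1)
    return items[start:start + n]


days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
print(is_int(42 / 3), divides(3, 42), frac_format(0.75))
print(sample_sequential(days, 3))
```

Helpers like sample_sequential are useful for templates that mention runs of related items, such as consecutive weekdays.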
📖 API reference
load_data(language="eng", directory=None) → list[AnnotatedQuestion]
Load symbolic templates.
- language: "eng" (default) or "dan", or any language code for which a template folder exists
- directory: override the bundled data; load templates from this path instead
load_replacements(language="eng") → dict
Load language-specific named values (e.g. lists of names, places) used inside templates.
load_gsm(language="eng", directory=None) → list[GSMProblem]
Load the bundled concrete problems for a given language.
AnnotatedQuestion
Core class. Constructed from a JSON template file via AnnotatedQuestion.from_json(path).
Key methods:
| Method | Description |
|---|---|
| generate_questions(n, language, replacements) | Generate n concrete Question instances |
| get_default_assignments(replacements) | Extract the example variable values from the template |
| format_question(assignments, language) | Render the question text for a given assignment |
| format_answer(assignments, language) | Render the answer text for a given assignment |
Question
Dataclass holding a single generated problem: question, answer, id_orig, id_shuffled.
GSMProblem
Pydantic model for a concrete problem loaded from disk: question, answer, id_orig, filepath.
Data
The English templates are derived from Apple's GSM-Symbolic paper. The Danish templates are manual translations and localizations of the English set, validated both computationally and manually. The original concrete problems are from GSM8k.
| Language | Code | Templates |
|---|---|---|
| English | eng | 100 |
| Danish | dan | 100 |
Acknowledgement
The symbolic template engine and the Danish subset were originally developed as part of the m-gsm-symbolic project at the Centre for Humanities Computing.
The initial template format was derived from Apple's GSM-Symbolic paper and the original concrete problems are from GSM8k.