A package for generating multilingual symbolic GSM math problems

multilingual-gsm-symbolic

A Python package for generating synthetic multilingual math problems from symbolic templates. See the Data section for available languages.

[Figure: Example of a symbolic template and generated questions]

Installation

pip install multilingual-gsm-symbolic

Quickstart

from multilingual_gsm_symbolic import load_data, load_replacements, available_languages

# see possible languages
languages = available_languages()

lang = "eng"
print(languages[lang])
# {"number of samples": 100}

# Load English templates (default)
templates = load_data(lang)

# Load language-specific replacement values (used in some templates)
replacements = load_replacements(lang)

# Generate concrete questions from a template
template = templates[0]
questions = template.generate_questions(n=10, language="eng", replacements=replacements)

for q in questions:
    print(q.question)
    print(q.answer)
    print()

Template format

Templates are JSON files with four fields:

Field               Description
question            Concrete question (the original example)
answer              Concrete answer with calculation steps
question_annotated  Template with variable placeholders and #init / #conditions / #answer sections
answer_annotated    Answer template with inline expressions

Annotated question syntax

{variable, default_value}   — placeholder in the question text
#init:
- $var = range(low, high)   — variable sampled from a range
- $var = sample([a, b, c])  — variable sampled from a list
#conditions:
- is_int(x / y)             — constraint that must hold for a combination to be valid
#answer: x * y + z          — Python expression evaluated to produce the numeric answer
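
The section markers above can be separated with a few lines of string handling. The following is a minimal sketch, not the package's own parser: the function name split_annotated and the returned dictionary layout are assumptions made for illustration.

```python
def split_annotated(question_annotated: str) -> dict:
    """Split an annotated question into its text, #init, #conditions and #answer parts.

    Hypothetical helper for illustration; the package may parse templates differently.
    """
    sections = {"text": [], "init": [], "conditions": [], "answer": ""}
    current = "text"
    for line in question_annotated.split("\n"):
        stripped = line.strip()
        if stripped == "#init:":
            current = "init"
        elif stripped == "#conditions:":
            current = "conditions"
        elif stripped.startswith("#answer:"):
            # The answer expression sits on the same line as its marker.
            sections["answer"] = stripped[len("#answer:"):].strip()
            current = "answer"
        elif current in ("init", "conditions") and stripped.startswith("-"):
            # List items look like "- $var = range(low, high)".
            sections[current].append(stripped[1:].strip())
        elif current == "text":
            sections["text"].append(line)
    sections["text"] = "\n".join(sections["text"]).strip()
    return sections


annotated = (
    "A fog bank rolls in over a city at {speed,3} miles/hour. "
    "The city is {width,42} miles wide. "
    "How many hours will it take for the fog bank to cover the city?\n"
    "#init:\n- $speed = range(1, 20)\n- $width = range(2, 100)\n"
    "#conditions:\n- is_int(width / speed)\n"
    "#answer: width // speed"
)

parts = split_annotated(annotated)
print(parts["init"])        # ['$speed = range(1, 20)', '$width = range(2, 100)']
print(parts["conditions"])  # ['is_int(width / speed)']
print(parts["answer"])      # width // speed
```

The question text keeps its {variable, default_value} placeholders, so it can be filled in later for any valid assignment.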

Example: fog bank problem

{
  "question": "A fog bank rolls in over a city at 3 miles/hour. The city is 42 miles wide. How many hours will it take for the fog bank to cover the city?",
  "question_annotated": "A fog bank rolls in over a city at {speed,3} miles/hour. The city is {width,42} miles wide. How many hours will it take for the fog bank to cover the city?\n#init:\n- $speed = range(1, 20)\n- $width = range(2, 100)\n#conditions:\n- is_int(width / speed)\n#answer: width // speed",
  "answer": "At 3 miles/hour, it will take 42/3=14 hours for the fog to cover the city.",
  "answer_annotated": "At {speed} miles/hour, it will take {width}/{speed}={width//speed} hours for the fog to cover the city."
}
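
To make the roles of #init, #conditions and #answer concrete, the sketch below enumerates every assignment the fog bank template allows. It is an illustration written by hand for this one template, assuming range(low, high) excludes the upper bound as in Python; the is_int function here is a stand-in, not the package's helper, and the package itself may sample rather than enumerate.

```python
from itertools import product
import random


def is_int(x) -> bool:
    # Stand-in for the template helper of the same name (assumed behaviour).
    return float(x).is_integer()


# "#init" section: each variable gets a pool of candidate values.
speeds = range(1, 20)
widths = range(2, 100)

# "#conditions" section filters the combinations; "#answer" is evaluated per combination.
valid = [
    {"speed": speed, "width": width, "answer": width // speed}
    for speed, width in product(speeds, widths)
    if is_int(width / speed)
]

# Draw a few valid assignments, seeded so the draw is repeatable.
rng = random.Random(0)
for assignment in rng.sample(valid, 3):
    print(assignment)
```

The template's own default values (speed 3, width 42, answer 14) are one of the valid combinations.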

Example: shopping problem

{
  "question": "A store sells apples for $2 each and oranges for $3 each. If you buy 4 apples and 5 oranges, how much do you spend?",
  "question_annotated": "A store sells apples for ${apple_price,2} each and oranges for ${orange_price,3} each. If you buy {n_apples,4} apples and {n_oranges,5} oranges, how much do you spend?\n#init:\n- $apple_price = range(1, 10)\n- $orange_price = range(1, 10)\n- $n_apples = range(1, 20)\n- $n_oranges = range(1, 20)\n#conditions:\n- True\n#answer: apple_price * n_apples + orange_price * n_oranges",
  "answer": "You spend 4*2 + 5*3 = 8 + 15 = $23.",
  "answer_annotated": "You spend {n_apples}*{apple_price} + {n_oranges}*{orange_price} = {n_apples*apple_price} + {n_oranges*orange_price} = ${apple_price*n_apples + orange_price*n_oranges}."
}
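
The inline expressions in answer_annotated can be understood as small Python expressions evaluated against the chosen variable values. The sketch below reproduces both example answers that way. The render function is hypothetical and uses eval, which is only reasonable for templates you trust; the package's actual rendering logic is not shown in this document.

```python
import re


def render(template: str, assignments: dict) -> str:
    """Replace each {expression} with its value under the given assignments."""

    def evaluate(match: re.Match) -> str:
        expression = match.group(1)
        # Builtins are emptied so only the template variables are visible.
        # This is not a sandbox; evaluate trusted templates only.
        return str(eval(expression, {"__builtins__": {}}, dict(assignments)))

    return re.sub(r"\{([^{}]+)\}", evaluate, template)


fog = (
    "At {speed} miles/hour, it will take {width}/{speed}={width//speed} "
    "hours for the fog to cover the city."
)
print(render(fog, {"speed": 3, "width": 42}))
# At 3 miles/hour, it will take 42/3=14 hours for the fog to cover the city.

shopping = (
    "You spend {n_apples}*{apple_price} + {n_oranges}*{orange_price} = "
    "{n_apples*apple_price} + {n_oranges*orange_price} = "
    "${apple_price*n_apples + orange_price*n_oranges}."
)
values = {"apple_price": 2, "orange_price": 3, "n_apples": 4, "n_oranges": 5}
print(render(shopping, values))
# You spend 4*2 + 5*3 = 8 + 15 = $23.
```

With the default values, both outputs match the "answer" fields of the two examples.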

Available helper functions

Function                          Description
range(start, end[, step])         Integers in [start, end), every step-th one if step is given
sample([a, b, c])                 One value from the list
range_sample(start, end, step)    Uniform sample from a range
sample_sequential(items, n)       n consecutive items from a list
arange_sample(start, end, step)   Sample from np.arange(start, end, step)
is_int(x)                         True if x is an integer
divides(a, b)                     True if a divides b
frac_format(x)                    Format x as a fraction string
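
For readers who want a feel for the predicate and formatting helpers, here are plausible stand-ins written from the one-line descriptions in the table. They are assumptions, not the package's code: edge cases and the exact string produced by frac_format may differ in the real implementation.

```python
from fractions import Fraction


def is_int(x) -> bool:
    # True when x has no fractional part, e.g. 14.0 or 14.
    return float(x).is_integer()


def divides(a: int, b: int) -> bool:
    # True when a divides b without remainder; zero is treated as dividing nothing.
    return a != 0 and b % a == 0


def frac_format(x) -> str:
    # Render x as "numerator/denominator"; whole numbers come out without a slash.
    return str(Fraction(x).limit_denominator(1000))


print(is_int(42 / 3))     # True
print(is_int(7 / 2))      # False
print(divides(3, 42))     # True
print(frac_format(0.75))  # 3/4
```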

📖 API reference

load_data(language="eng", directory=None) → list[AnnotatedQuestion]

Load symbolic templates.

  • language: "eng" (default) or "dan", or any language code for which a template folder exists
  • directory: override the bundled data; load templates from this path instead

load_replacements(language="eng") → dict

Load language-specific named values (e.g. lists of names, places) used inside templates.

load_gsm(language="eng", directory=None) → list[GSMProblem]

Load the bundled concrete problems for a given language.

AnnotatedQuestion

Core class. Constructed from a JSON template file via AnnotatedQuestion.from_json(path).

Key methods:

Method                                          Description
generate_questions(n, language, replacements)   Generate n concrete Question instances
get_default_assignments(replacements)           Extract the example variable values from the template
format_question(assignments, language)          Render the question text for a given assignment
format_answer(assignments, language)            Render the answer text for a given assignment
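
The idea behind default assignments can be sketched without the package: each {variable, default_value} placeholder already carries the value used in the original problem. The functions default_assignments and default_question below are hypothetical illustrations and do not reflect the signatures or internals of get_default_assignments or format_question.

```python
import re

# Matches placeholders of the form {name,default} or {name, default}.
PLACEHOLDER = re.compile(r"\{(\w+)\s*,\s*([^{}]+)\}")


def default_assignments(question_annotated: str) -> dict:
    """Collect the default value attached to each placeholder."""
    text = question_annotated.split("\n#init:")[0]
    defaults = {}
    for name, raw in PLACEHOLDER.findall(text):
        raw = raw.strip()
        # Keep integers as int; leave anything else (names, places) as a string.
        defaults[name] = int(raw) if raw.lstrip("-").isdigit() else raw
    return defaults


def default_question(question_annotated: str) -> str:
    """Rebuild the original question by substituting each default value."""
    text = question_annotated.split("\n#init:")[0]
    return PLACEHOLDER.sub(lambda m: m.group(2).strip(), text)


annotated = (
    "A fog bank rolls in over a city at {speed,3} miles/hour. "
    "The city is {width,42} miles wide. "
    "How many hours will it take for the fog bank to cover the city?\n"
    "#init:\n- $speed = range(1, 20)\n- $width = range(2, 100)\n"
    "#conditions:\n- is_int(width / speed)\n#answer: width // speed"
)

print(default_assignments(annotated))  # {'speed': 3, 'width': 42}
print(default_question(annotated))
```

Substituting the defaults recovers the concrete "question" field of the fog bank example.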

Question

Dataclass holding a single generated problem: question, answer, id_orig, id_shuffled.

GSMProblem

Pydantic model for a concrete problem loaded from disk: question, answer, id_orig, filepath.

Data

The English templates are derived from Apple's GSM-Symbolic paper. The Danish templates are manual translations and localizations of the English set, validated both computationally and manually. The original concrete problems are from GSM8k.

Language  Code  Templates
English   eng   100
Danish    dan   100

Acknowledgement

The symbolic template engine and the Danish subset were originally developed as part of the m-gsm-symbolic project at the Centre for Humanities Computing by:

The initial template format was derived from Apple's GSM-Symbolic paper and the original concrete problems are from GSM8k.
