
A package for generating multilingual symbolic GSM math problems


multilingual-gsm-symbolic

A Python package for generating synthetic multilingual math problems from symbolic templates. See the Data section for available languages.

[Figure: example of a symbolic template and generated questions]

Installation

pip install multilingual-gsm-symbolic

Quickstart

from multilingual_gsm_symbolic import load_data, load_replacements, available_languages

# List the available languages and their sample counts
languages = available_languages()

lang = "eng"
print(languages[lang])
# {"number of samples": 100}

# Load English templates (default)
templates = load_data(lang)

# Load language-specific replacement values (used in some templates)
replacements = load_replacements(lang)

# Generate concrete questions from a template
template = templates[0]
questions = template.generate_questions(n=10, language="eng", replacements=replacements)

for q in questions:
    print(q.question)
    print(q.answer)
    print()

Template format

Templates are JSON files with four fields:

Field               Description
question            Concrete question (the original example)
answer              Concrete answer with calculation steps
question_annotated  Template with variable placeholders and #init / #conditions / #answer sections
answer_annotated    Answer template with inline expressions
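As a minimal sketch of the file format, the snippet below writes the fog bank template from this README to a temporary JSON file and reads it back with the standard library, checking that the four fields are present. The helper `read_template` and the file name `fog_bank.json` are hypothetical and not part of the package; in practice `load_data` is the documented way to load templates.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = {"question", "answer", "question_annotated", "answer_annotated"}


def read_template(path):
    """Read one template file and check that the four documented fields exist.

    Hypothetical helper for illustration; not part of the package API.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"{path}: missing fields {sorted(missing)}")
    return data


# The fog bank template from this README.
fog_bank = {
    "question": (
        "A fog bank rolls in over a city at 3 miles/hour. The city is 42 miles wide. "
        "How many hours will it take for the fog bank to cover the city?"
    ),
    "question_annotated": (
        "A fog bank rolls in over a city at {speed,3} miles/hour. "
        "The city is {width,42} miles wide. "
        "How many hours will it take for the fog bank to cover the city?\n"
        "#init:\n- $speed = range(1, 20)\n- $width = range(2, 100)\n"
        "#conditions:\n- is_int(width / speed)\n"
        "#answer: width // speed"
    ),
    "answer": "At 3 miles/hour, it will take 42/3=14 hours for the fog to cover the city.",
    "answer_annotated": (
        "At {speed} miles/hour, it will take {width}/{speed}={width//speed} hours "
        "for the fog to cover the city."
    ),
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fog_bank.json"  # hypothetical file name
    path.write_text(json.dumps(fog_bank), encoding="utf-8")
    loaded = read_template(path)

print(sorted(loaded))
```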

Annotated question syntax

{variable, default_value}   — placeholder in the question text
#init:
- $var = range(low, high)   — variable sampled from a range
- $var = sample([a, b, c])  — variable sampled from a list
#conditions:
- is_int(x / y)             — constraint that must hold for a combination to be valid
#answer: x * y + z          — Python expression evaluated to produce the numeric answer
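To make the section layout concrete, here is a rough sketch that splits an annotated question into its question text, #init lines, #conditions lines, and #answer expression. The function `split_annotated` is hypothetical and only mirrors the syntax described above; the package's own parser may handle more cases.

```python
def split_annotated(annotated):
    """Split annotated text into question text, init lines, condition lines, and answer.

    Hypothetical helper that follows the documented layout only.
    """
    text_lines, init, conditions, answer = [], [], [], None
    section = "text"
    for line in annotated.splitlines():
        stripped = line.strip()
        if stripped == "#init:":
            section = "init"
        elif stripped == "#conditions:":
            section = "conditions"
        elif stripped.startswith("#answer:"):
            answer = stripped[len("#answer:"):].strip()
            section = "answer"
        elif section == "text":
            text_lines.append(line)
        elif section == "init" and stripped.startswith("- "):
            init.append(stripped[2:])
        elif section == "conditions" and stripped.startswith("- "):
            conditions.append(stripped[2:])
    return {
        "text": "\n".join(text_lines).strip(),
        "init": init,
        "conditions": conditions,
        "answer": answer,
    }


# Annotated question taken from the fog bank example in this README.
annotated = (
    "A fog bank rolls in over a city at {speed,3} miles/hour. "
    "The city is {width,42} miles wide. "
    "How many hours will it take for the fog bank to cover the city?\n"
    "#init:\n- $speed = range(1, 20)\n- $width = range(2, 100)\n"
    "#conditions:\n- is_int(width / speed)\n"
    "#answer: width // speed"
)

parts = split_annotated(annotated)
print(parts["init"])
print(parts["conditions"])
print(parts["answer"])
```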
Example: fog bank problem
{
  "question": "A fog bank rolls in over a city at 3 miles/hour. The city is 42 miles wide. How many hours will it take for the fog bank to cover the city?",
  "question_annotated": "A fog bank rolls in over a city at {speed,3} miles/hour. The city is {width,42} miles wide. How many hours will it take for the fog bank to cover the city?\n#init:\n- $speed = range(1, 20)\n- $width = range(2, 100)\n#conditions:\n- is_int(width / speed)\n#answer: width // speed",
  "answer": "At 3 miles/hour, it will take 42/3=14 hours for the fog to cover the city.",
  "answer_annotated": "At {speed} miles/hour, it will take {width}/{speed}={width//speed} hours for the fog to cover the city."
}
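One way to picture how the #init ranges and the #conditions line interact in the fog bank template: enumerate every (speed, width) pair, keep those that satisfy the condition, and pick one. This is a sketch under assumptions. The local `is_int` is a stand-in for the package helper, and the package's actual sampling strategy is not documented here and may differ.

```python
import itertools
import random


def is_int(x):
    """Stand-in for the package helper: True if x has no fractional part."""
    return float(x).is_integer()


speeds = range(1, 20)   # $speed = range(1, 20)
widths = range(2, 100)  # $width = range(2, 100)

# Keep only the combinations for which the condition is_int(width / speed) holds.
valid = [
    (speed, width)
    for speed, width in itertools.product(speeds, widths)
    if is_int(width / speed)
]

rng = random.Random(0)  # fixed seed so repeated runs pick the same pair
speed, width = rng.choice(valid)
answer = width // speed  # the #answer expression

print(f"speed={speed}, width={width}, answer={answer}")
```

The default assignment from the original question (speed 3, width 42) is one of the valid pairs, since 42 / 3 = 14 is an integer.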
Example: shopping problem
{
  "question": "A store sells apples for $2 each and oranges for $3 each. If you buy 4 apples and 5 oranges, how much do you spend?",
  "question_annotated": "A store sells apples for ${apple_price,2} each and oranges for ${orange_price,3} each. If you buy {n_apples,4} apples and {n_oranges,5} oranges, how much do you spend?\n#init:\n- $apple_price = range(1, 10)\n- $orange_price = range(1, 10)\n- $n_apples = range(1, 20)\n- $n_oranges = range(1, 20)\n#conditions:\n- True\n#answer: apple_price * n_apples + orange_price * n_oranges",
  "answer": "You spend 4*2 + 5*3 = 8 + 15 = $23.",
  "answer_annotated": "You spend {n_apples}*{apple_price} + {n_oranges}*{orange_price} = {n_apples*apple_price} + {n_oranges*orange_price} = ${apple_price*n_apples + orange_price*n_oranges}."
}
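The relationship between the annotated fields and the concrete ones can be sketched as follows: read the default values from the `{variable, default_value}` placeholders, then evaluate each `{expression}` in answer_annotated with those values. The functions `default_assignments` and `render_answer` are hypothetical illustrations rather than the package's implementation, and the `int` conversion is an assumption that only fits numeric defaults such as the ones in this example.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+),\s*([^{}]+)\}")  # {variable, default_value}
EXPRESSION = re.compile(r"\{([^{}]+)\}")            # {inline expression}


def default_assignments(question_annotated):
    """Collect the default value of every placeholder (numeric defaults assumed)."""
    return {name: int(value) for name, value in PLACEHOLDER.findall(question_annotated)}


def render_answer(answer_annotated, assignments):
    """Replace each {expression} with its value under the given assignments."""
    def evaluate(match):
        # eval is used only because templates are trusted local data.
        return str(eval(match.group(1), {"__builtins__": {}}, dict(assignments)))
    return EXPRESSION.sub(evaluate, answer_annotated)


question_annotated = (
    "A store sells apples for ${apple_price,2} each and oranges for ${orange_price,3} each. "
    "If you buy {n_apples,4} apples and {n_oranges,5} oranges, how much do you spend?"
)
answer_annotated = (
    "You spend {n_apples}*{apple_price} + {n_oranges}*{orange_price} = "
    "{n_apples*apple_price} + {n_oranges*orange_price} = "
    "${apple_price*n_apples + orange_price*n_oranges}."
)

defaults = default_assignments(question_annotated)
print(render_answer(answer_annotated, defaults))
# You spend 4*2 + 5*3 = 8 + 15 = $23.
```

With the default values, the rendered text matches the concrete "answer" field of the shopping template.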

Available helper functions

Function                         Description
range(start, end[, step])        All integers in [start, end)
sample([a, b, c])                One value from the list
range_sample(start, end, step)   Uniform sample from a range
sample_sequential(items, n)      n consecutive items from a list
arange_sample(start, end, step)  Sample from np.arange(start, end, step)
is_int(x)                        True if x is an integer
divides(a, b)                    True if a divides b
frac_format(x)                   Format x as a fraction string
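For intuition, a few of these helpers can be approximated with the standard library. The versions below are assumptions inferred from the one-line descriptions above, not the package's code; details such as floating-point tolerance, the exact fraction format, or how the random source is chosen may differ in the real implementations.

```python
import random
from fractions import Fraction


def is_int(x):
    """True if x is an integer value (assumed: no fractional part)."""
    return float(x).is_integer()


def divides(a, b):
    """True if a divides b, i.e. b is a multiple of a."""
    return a != 0 and b % a == 0


def frac_format(x):
    """Format x as a fraction string such as '3/4' (assumed output format)."""
    frac = Fraction(x).limit_denominator()
    return f"{frac.numerator}/{frac.denominator}"


def sample_sequential(items, n, rng=random):
    """Return n consecutive items starting at a random position in the list."""
    start = rng.randrange(len(items) - n + 1)
    return list(items[start:start + n])


print(is_int(42 / 3), is_int(7 / 2))
print(divides(3, 42), divides(5, 42))
print(frac_format(0.75))
print(sample_sequential(["Mon", "Tue", "Wed", "Thu", "Fri"], 2))
```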

📖 API reference

load_data(language="eng", directory=None) → list[AnnotatedQuestion]

Load symbolic templates.

  • language: "eng" (default), "dan", or any other language code for which a template folder exists
  • directory: overrides the bundled data; templates are loaded from this path instead

load_replacements(language="eng") → dict

Load language-specific named values (e.g. lists of names, places) used inside templates.

load_gsm(language="eng", directory=None) → list[GSMProblem]

Load the bundled concrete problems for a given language.

AnnotatedQuestion

Core class. Constructed from a JSON template file via AnnotatedQuestion.from_json(path).

Key methods:

Method                                         Description
generate_questions(n, language, replacements)  Generate n concrete Question instances
get_default_assignments(replacements)          Extract the example variable values from the template
format_question(assignments, language)         Render the question text for a given assignment
format_answer(assignments, language)           Render the answer text for a given assignment

Question

Dataclass holding a single generated problem: question, answer, id_orig, id_shuffled.
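Because generated problems are plain records with four fields, they are easy to serialize, for example as one JSON object per line. The sketch below uses a hypothetical stand-in class, `QuestionRecord`, with the same field names; the field types and the id values are assumptions. If Question is a standard dataclass, `dataclasses.asdict` should work on it in the same way.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QuestionRecord:
    """Hypothetical stand-in with the four fields listed for Question."""
    question: str
    answer: str
    id_orig: int      # assumed type
    id_shuffled: int  # assumed type


record = QuestionRecord(
    question=(
        "A fog bank rolls in over a city at 3 miles/hour. The city is 42 miles wide. "
        "How many hours will it take for the fog bank to cover the city?"
    ),
    answer="At 3 miles/hour, it will take 42/3=14 hours for the fog to cover the city.",
    id_orig=0,      # made-up id for illustration
    id_shuffled=0,  # made-up id for illustration
)

# One JSON object per line is a common format for generated datasets.
line = json.dumps(asdict(record), ensure_ascii=False)
restored = QuestionRecord(**json.loads(line))
print(restored == record)
```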

GSMProblem

Pydantic model for a concrete problem loaded from disk: question, answer, id_orig, filepath.

Data

The English templates are derived from Apple's GSM-Symbolic paper. The Danish templates are manual translations and localizations of the English set, validated both computationally and manually. The original concrete problems are from GSM8k.

Language  Code  Templates
English   eng   100
Danish    dan   100

Acknowledgement

The symbolic template engine and the Danish subset were originally developed as part of the m-gsm-symbolic project at the Centre for Humanities Computing by:

The initial template format was derived from Apple's GSM-Symbolic paper and the original concrete problems are from GSM8k.
