A tool designed to evaluate the performance of large language models on mathematical tasks.

Project description

LLM-Math

A plug-and-play math evaluation package adapted from math-evaluation-harness.

Usage

  1. set_seed(seed)

    Sets the global random seed.

  2. basic_check(A, B)

    Checks whether two pure mathematical expressions, A and B, are equivalent; returns True or False.

  3. check(prompt_type, data_name, target, pred)

    Checks whether pred matches target; returns True or False. target is a single row of the dataset.

  4. engine = MathEval(model_path, args)

    Loads the model. args are the loading arguments; see vllm.LLM.

  5. engine.set_sampling_args(args)

    Sets the inference (sampling) parameters; calling it again updates them. See vllm.SamplingParams.

  6. results = engine.generate(inputs)

    Runs batch inference.

  7. results = engine.chat(messages)

    Runs a single conversation.

  8. engine.test(datasets=["gsm8k", "math"], prompt_type="direct", args)

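    Runs the evaluation. Available parameters:

    prompt_type="cot": the prompt type to use.

    split="test": which split of the dataset to evaluate on.

    num_test_sample=-1: the number of randomly selected samples to test.

    shuffle=True: whether to shuffle the test set.

    save_outputs=True: whether to save the model outputs.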

  • Supported prompt types: tool-integrated, direct, cot, pal, self-instruct, self-instruct-boxed, tora, wizard_zs, platypus_fs, deepseek-math, kpmath.

  • Supported datasets: gsm8k, math, svamp, asdiv, mawps, tabmwp, mathqa, mmlu_stem, sat_math.
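
The parameters of engine.test listed above can be gathered and checked before an evaluation run. The following is a minimal sketch, not part of the package: build_test_kwargs is a hypothetical helper, the defaults are copied from the parameter list, and it assumes engine.test accepts these values as keyword arguments.

```python
# Names copied from the supported lists above; the helper itself is hypothetical.
SUPPORTED_PROMPT_TYPES = {
    "tool-integrated", "direct", "cot", "pal", "self-instruct",
    "self-instruct-boxed", "tora", "wizard_zs", "platypus_fs",
    "deepseek-math", "kpmath",
}
SUPPORTED_DATASETS = {
    "gsm8k", "math", "svamp", "asdiv", "mawps", "tabmwp",
    "mathqa", "mmlu_stem", "sat_math",
}


def build_test_kwargs(datasets, prompt_type="cot", split="test",
                      num_test_sample=-1, shuffle=True, save_outputs=True):
    """Validate dataset and prompt names, then collect the arguments in a dict."""
    unknown = [name for name in datasets if name not in SUPPORTED_DATASETS]
    if unknown:
        raise ValueError(f"unsupported datasets: {unknown}")
    if prompt_type not in SUPPORTED_PROMPT_TYPES:
        raise ValueError(f"unsupported prompt type: {prompt_type!r}")
    return {
        "datasets": list(datasets),
        "prompt_type": prompt_type,
        "split": split,
        "num_test_sample": num_test_sample,
        "shuffle": shuffle,
        "save_outputs": save_outputs,
    }


kwargs = build_test_kwargs(["gsm8k", "math"], prompt_type="direct")
# Assumption: with a loaded engine, the call would then be engine.test(**kwargs).
```

Validating the names up front means a typo in a dataset or prompt type fails immediately instead of after the model has been loaded.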

Notes

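  • The model must be supported by vLLM.

  • The stop_word set via set_sampling_args has no effect in test.

  • Setting the global seed also sets the seed used for model loading and inference, so there is no need to set a seed again for either step.

  • tensor_parallel_size defaults to torch.cuda.device_count().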
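
The note on seeding above can be illustrated with a generic helper. This is a sketch of what a global seed function commonly does, not the package's set_seed implementation: the name seed_everything is hypothetical, and the NumPy and PyTorch calls only run when those libraries can be imported.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional in this sketch


# Re-seeding with the same value reproduces the same random sequence.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)  # prints True
```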


Download files

Source Distribution

llm_math-0.2.0.tar.gz (6.5 MB)

Uploaded Source

Built Distribution

llm_math-0.2.0-py3-none-any.whl (6.6 MB)

Uploaded Python 3
