A task package for ML Research Bench

ML Research Benchmark Tasks

This repository contains the tasks for ML Research Benchmark, a benchmark designed to evaluate the capabilities of AI agents in accelerating ML research and development. The benchmark consists of 9 competition-level tasks that span the spectrum of activities typically undertaken by ML researchers.

Introduction

MLRB aims to measure how much AI agents can accelerate ML research and development. It focuses on competition-level tasks that reflect the current frontiers of ML research, providing a more nuanced and challenging evaluation environment than existing benchmarks.

arXiv

Installation

pip install mlrb-agent-tasks

Usage

The library exposes a single function, get_task, which takes three arguments:

  • path: path to copy the task to
  • benchmark: name of the benchmark
  • task: name of the task

The function copies the task to the specified path and returns a dictionary with the task name and prompt:

{
    "name": str,    # name of the task
    "prompt": str,  # prompt for the task
}
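
For readers who use type checkers, the return shape above can be written down as a TypedDict. This is only a sketch: TaskInfo and is_task_info are hypothetical names introduced here for illustration and are not part of mlrb-agent-tasks.

```python
from typing import Any, TypedDict


class TaskInfo(TypedDict):
    """Assumed shape of the dictionary returned by get_task (hypothetical name)."""

    name: str    # name of the task
    prompt: str  # prompt for the task


def is_task_info(value: Any) -> bool:
    """Return True if value is a dict with string "name" and "prompt" entries."""
    return (
        isinstance(value, dict)
        and isinstance(value.get("name"), str)
        and isinstance(value.get("prompt"), str)
    )
```

A check like this can be run on the result of get_task before the prompt is handed to an agent, so that a malformed result fails early.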

Example Usage

from mlrb_agent_tasks import get_task

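# Copy the "llm_efficiency" task from "full_benchmark" into the current directory
result = get_task("./", "full_benchmark", "llm_efficiency")
# The returned dictionary holds the task name and its prompt
print(result['prompt'])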
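
When several tasks are fetched, it may be convenient to give each one its own directory instead of copying into "./". The sketch below shows one way to do that. fetch_into_workspace and the GetTask alias are hypothetical helpers, not part of the package; the (path, benchmark, task) call order is taken from the example above. Whether get_task needs the destination to exist beforehand is not documented here, so creating it up front is an assumption. The function is passed in as an argument so the helper can be exercised without the package installed.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

# Assumed call shape, based on the example above: (path, benchmark, task) -> dict
GetTask = Callable[[str, str, str], Dict[str, str]]


def fetch_into_workspace(get_task_fn: GetTask, root: str,
                         benchmark: str, task: str) -> Tuple[Path, str]:
    """Hypothetical helper: copy one task under root/<task>; return (dir, prompt)."""
    target = Path(root) / task
    # Assumption: creating the destination in advance is harmless for get_task.
    target.mkdir(parents=True, exist_ok=True)
    result = get_task_fn(str(target), benchmark, task)
    return target, result["prompt"]


# With the package installed, a call would look like:
#   from mlrb_agent_tasks import get_task
#   workdir, prompt = fetch_into_workspace(
#       get_task, "./workspace", "full_benchmark", "llm_efficiency")
```

Keeping each task in a separate directory makes it easier to clean up or rerun a single task without touching the others.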

Contributing

We welcome contributions to the ML Research Benchmark! Please read our CONTRIBUTING.md file for guidelines on how to submit issues, feature requests, and pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions or feedback, please open an issue in this repository or contact matt@algorithmicresearchgroup.com.
