A task package for ML Research Bench
ML Research Benchmark Tasks
This repository contains the tasks for ML Research Benchmark, a benchmark designed to evaluate the capabilities of AI agents in accelerating ML research and development. The benchmark consists of 9 competition-level tasks that span the spectrum of activities typically undertaken by ML researchers.
Introduction
MLRB aims to measure how much AI agents can accelerate ML research and development. It focuses on competition-level tasks that reflect the current frontiers of ML research, providing a more nuanced and challenging evaluation environment than existing benchmarks.
- :paperclip: ML Research Benchmark Paper
- :robot: ML Research Agent
- :white_check_mark: ML Research Tasks
- :chart_with_upwards_trend: ML Research Evaluation
Installation
pip install mlrb-agent-tasks
Usage
The library exposes a single function, get_task.
get_task:
- path: path to copy the task to
- benchmark: name of the benchmark
- task: name of the task
The function copies the task to the specified path and returns a dictionary with the task name and prompt:
{
    "name": str,    # name of the task
    "prompt": str,  # prompt for the task
}
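The return shape described above can be written down as a type, which helps when the dictionary is passed on to other code. The following is a minimal sketch, not part of the library: the names `TaskInfo` and `check_task_info` are hypothetical, and the check assumes only the two string fields documented above.

```python
from typing import TypedDict


class TaskInfo(TypedDict):
    """Hypothetical type name for the dictionary described above."""

    name: str    # name of the task
    prompt: str  # prompt for the task


def check_task_info(result: dict) -> TaskInfo:
    """Raise TypeError unless both documented fields are present as strings."""
    for key in ("name", "prompt"):
        if not isinstance(result.get(key), str):
            raise TypeError(f"expected a string under {key!r}")
    # Only the two documented keys are kept; any extra keys are dropped.
    return TaskInfo(name=result["name"], prompt=result["prompt"])
```

Passing the value returned by get_task through `check_task_info` fails early if the shape ever differs from what is documented here.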
Example Usage
from mlrb_agent_tasks import get_task

# Copy the llm_efficiency task from full_benchmark into the current directory
result = get_task("./", "full_benchmark", "llm_efficiency")

# Print the prompt for the task
print(result['prompt'])
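Because get_task copies files into the path it is given, it can be convenient to point it at a fresh directory rather than the current one. The sketch below shows one way to do that, under stated assumptions: `fetch_into_fresh_dir` and `fake_get_task` are hypothetical names, and the stand-in only mimics the documented call signature and return shape so the example runs without the package installed.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_into_fresh_dir(
    fetch: Callable[[str, str, str], dict], benchmark: str, task: str
) -> tuple[Path, dict]:
    """Call fetch(path, benchmark, task) with a newly created directory."""
    workdir = Path(tempfile.mkdtemp(prefix=f"{task}_"))
    result = fetch(str(workdir), benchmark, task)
    return workdir, result


def fake_get_task(path: str, benchmark: str, task: str) -> dict:
    """Stand-in for get_task (assumption: same arguments and return keys)."""
    (Path(path) / "README.txt").write_text(f"{benchmark}/{task}\n")
    return {"name": task, "prompt": f"Placeholder prompt for {task}"}


workdir, result = fetch_into_fresh_dir(
    fake_get_task, "full_benchmark", "llm_efficiency"
)
print(workdir, result["name"])
```

With the package installed, passing the real get_task in place of `fake_get_task` should work the same way, since it is called with the same three positional arguments as in the example above.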
Contributing
We welcome contributions to the ML Research Benchmark! Please read our CONTRIBUTING.md file for guidelines on how to submit issues, feature requests, and pull requests.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
For questions or feedback, please open an issue in this repository or contact matt@algorithmicresearchgroup.com.
File details
Details for the file mlrb_agent_tasks-0.0.23.tar.gz.
File metadata
- Download URL: mlrb_agent_tasks-0.0.23.tar.gz
- Upload date:
- Size: 38.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e3e78b5c7cb9160205d4b206803b303a7b5b0d1da4a1225cf34f536acf5ff41 |
| MD5 | 35e93eb48bf3dbc77abb92c9eaceffd9 |
| BLAKE2b-256 | 7237cce338189d644ea6fa831d5300c25e760bb75441143236dd6d5e55dd19ce |
File details
Details for the file mlrb_agent_tasks-0.0.23-py3-none-any.whl.
File metadata
- Download URL: mlrb_agent_tasks-0.0.23-py3-none-any.whl
- Upload date:
- Size: 56.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2fe007b53bcb3a543a6833234c8474e7198c1c8144c4c0c60056ff66b2d7018 |
| MD5 | 91aee643f2cc028675ca04a911894dfe |
| BLAKE2b-256 | 0d5a54639025c455d5ac622c6a5c626a1332b4e17fb5ec87812e74420e2cd177 |
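A downloaded file can be compared against the SHA256 digests listed above using Python's standard hashlib module. This is a minimal sketch; the helper names `sha256_of` and `matches_published` are hypothetical, and it assumes the file has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published(Path("mlrb_agent_tasks-0.0.23.tar.gz"), "1e3e78b5c7cb9160205d4b206803b303a7b5b0d1da4a1225cf34f536acf5ff41")` should return True for an intact copy of the source distribution.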