4 projects
swebench
The official SWE-bench package - a benchmark for evaluating language models on software engineering tasks
sweagent
The official SWE-agent package - an open-source Agent-Computer Interface for running language models as software engineers
intercode-bench
The official InterCode benchmark package - a framework for interactive code tasks
webshop
Python package for the WebShop environment