Open-ended Scientific Discovery via Bayesian Surprise
Link to our NeurIPS 2025 paper: AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise
Deployment
Image Tagging Strategy
The autodiscovery Docker image follows an environment-based tagging strategy:
- Dev environment (main branch): :dev, :dev-${commit_sha}
- Prod environment (env/prod branch): :prod, :prod-${commit_sha}
Images are automatically built and pushed by GitHub Actions when changes merge to main or env/prod.
Note: We do not use :latest tags. All deployments must explicitly specify :dev or :prod to prevent accidental environment mixing.
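The tagging rule above can be sketched as a small function. This is only an illustration of the rule as described, not the project's GitHub Actions workflow; the names image_tags and BRANCH_TO_ENV are hypothetical.

```python
# Hypothetical helper that mirrors the tagging rule described above.
BRANCH_TO_ENV = {"main": "dev", "env/prod": "prod"}


def image_tags(branch: str, commit_sha: str) -> list[str]:
    """Return the image tags for a branch, or raise if no image is built for it."""
    env = BRANCH_TO_ENV.get(branch)
    if env is None:
        raise ValueError(f"no image is built for branch {branch!r}")
    # No ':latest' tag is ever produced, matching the note above.
    return [f":{env}", f":{env}-{commit_sha}"]
```

Keeping the branch-to-environment mapping in one place makes it harder to deploy an image built for the wrong environment.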
Deploying to Cloud Run
Deploy or update the Cloud Run Job from the root of the repo:
For development environment:
make deploy-autodiscovery
# Or with explicit env tag:
ENV_TAG=dev SKIP_BUILD=true make deploy-autodiscovery
For production environment:
ENV_TAG=prod SKIP_BUILD=true make deploy-autodiscovery
Setting SKIP_BUILD=true skips the image build and deploys the image already built by GitHub Actions. Omit it to build the image locally.
Datasets
DiscoveryBench
git clone https://github.com/allenai/discoverybench.git temp_db
cp -r temp_db/discoverybench discoverybench
rm -rf temp_db
Blade
git clone https://github.com/behavioral-data/BLADE.git temp_db
cp -r temp_db/blade_bench/datasets blade
rm -rf temp_db
BYO-Datasets!
You can also use your own datasets. To do so, pass a dataset metadata JSON file (the --dataset_metadata argument in the command below) that lists the dataset paths, relative to the metadata file, and describes each dataset's columns in natural language. The metadata files in the DiscoveryBench directory cloned above serve as examples.
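Because dataset paths are resolved relative to the metadata file, a quick check before launching a run can catch misplaced files. The sketch below is an assumption-laden illustration: it assumes a top-level "datasets" list whose entries hold the file path under "name", modeled on the DiscoveryBench metadata files, so verify the field names against those examples. The function name missing_dataset_files is hypothetical and not part of the package.

```python
import json
from pathlib import Path


def missing_dataset_files(metadata_path: str) -> list[str]:
    """Return dataset paths listed in the metadata file that do not exist on disk."""
    meta_file = Path(metadata_path)
    metadata = json.loads(meta_file.read_text(encoding="utf-8"))
    missing = []
    # ASSUMPTION: a top-level "datasets" list with a "name" path per entry,
    # modeled on the DiscoveryBench metadata files; adjust if your schema differs.
    for entry in metadata.get("datasets", []):
        # Resolve against the metadata file's directory, not the working directory.
        candidate = meta_file.parent / entry["name"]
        if not candidate.is_file():
            missing.append(entry["name"])
    return missing
```

An empty list means every listed dataset file was found next to the metadata file.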
Run AutoDS (MCTS-based hypothesis search and verification)
For example, to explore the DiscoveryBench NLS SES dataset, the following command can be used:
# From the repo root
uv run --package autodiscovery python -m autodiscovery.run \
--work_dir="work" \
--out_dir="outputs" \
--dataset_metadata="discoverybench/real/test/nls_ses/metadata.json" \
--n_experiments=16 \
--model="gemini-3-flash-preview" \
--belief_model="gemini-3-flash-preview" \
--vision_model="gemini-3-flash-preview"
To resume a previous exploration, use the --continue_from_dir flag to specify the directory containing the previous
exploration logs. This will allow the script to continue from where it left off, using the MCTS nodes it had generated
so far.
✍️ Get in touch!
If you have any problems running the code, please reach out by email or open a GitHub issue: dagarwal@cs.umass.edu (Dhruv Agarwal), bodhisattwam@allenai.org (Bodhisattwa Prasad Majumder).
📄 Citation
If you find our work useful, please cite our paper:
@inproceedings{
agarwal2025autodiscovery,
title={AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise},
author={Dhruv Agarwal and Bodhisattwa Prasad Majumder and Reece Adamson and Megha Chakravorty and Satvika Reddy Gavireddy and Aditya Parashar and Harshit Surana and Bhavana Dalvi Mishra and Andrew McCallum and Ashish Sabharwal and Peter Clark},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=kJqTkj2HhF}
}