Open-ended Scientific Discovery via Bayesian Surprise

Link to our NeurIPS 2025 paper: AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

Deployment

Image Tagging Strategy

The autodiscovery Docker image follows an environment-based tagging strategy:

  • Dev environment (main branch): :dev, :dev-${commit_sha}
  • Prod environment (env/prod branch): :prod, :prod-${commit_sha}

Images are automatically built and pushed by GitHub Actions when changes merge to main or env/prod.

Note: We do not use :latest tags. All deployments must explicitly specify :dev or :prod to prevent accidental environment mixing.
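
The tagging rules above can be sketched as a small lookup. This is an illustration only, not the project's GitHub Actions workflow: the names `BRANCH_TO_ENV`, `image_tags`, and `validate_deploy_tag` are hypothetical, and the sketch assumes branches other than main and env/prod produce no image.

```python
# Hypothetical helpers mirroring the tagging rules described above.
# None of these names come from the repository.
BRANCH_TO_ENV = {"main": "dev", "env/prod": "prod"}


def image_tags(branch: str, commit_sha: str) -> list[str]:
    """Return the tags an image built from `branch` at `commit_sha` would get."""
    env = BRANCH_TO_ENV.get(branch)
    if env is None:
        # Assumption: other branches do not trigger a build.
        return []
    return [f":{env}", f":{env}-{commit_sha}"]


def validate_deploy_tag(tag: str) -> str:
    """Reject :latest and any tag that does not name an environment explicitly."""
    name = tag.lstrip(":")
    env = name.split("-", 1)[0]
    if env not in BRANCH_TO_ENV.values():
        raise ValueError(f"tag {tag!r} must start with :dev or :prod")
    return name
```

A check like `validate_deploy_tag` before deploying would catch a stray `:latest` early, which is the failure mode the note above is meant to prevent.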

Deploying to Cloud Run

Deploy or update the Cloud Run Job from the root of the repo:

For the development environment:

make deploy-autodiscovery
# Or with explicit env tag:
ENV_TAG=dev SKIP_BUILD=true make deploy-autodiscovery

For the production environment:

ENV_TAG=prod SKIP_BUILD=true make deploy-autodiscovery

Setting SKIP_BUILD=true skips the local image build and deploys the image already built by GitHub Actions. Omit it to build the image locally.

Datasets

DiscoveryBench

git clone https://github.com/allenai/discoverybench.git temp_db
cp -r temp_db/discoverybench discoverybench
rm -rf temp_db

Blade

git clone https://github.com/behavioral-data/BLADE.git temp_db
cp -r temp_db/blade_bench/datasets blade
rm -rf temp_db

BYO-Datasets!

You can also use your own datasets. To do this, pass a dataset metadata JSON file that lists the dataset paths (relative to the metadata file) and describes their columns in natural language. The metadata files in the DiscoveryBench directory cloned above serve as examples.
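
As a rough sketch of what such a file might involve, the snippet below writes a metadata JSON and resolves each dataset path relative to it. The schema is an assumption made for illustration: the keys `datasets`, `name`, `description`, and `columns`, and both helper functions, are hypothetical. Check the DiscoveryBench metadata files for the layout AutoDS actually expects.

```python
import json
from pathlib import Path


def write_metadata(metadata_path: Path, datasets: list[dict]) -> None:
    """Write a metadata file using an assumed, illustrative schema."""
    metadata_path.write_text(json.dumps({"datasets": datasets}, indent=2))


def resolve_dataset_paths(metadata_path: Path) -> list[Path]:
    """Resolve each dataset path relative to the metadata file's directory."""
    metadata = json.loads(metadata_path.read_text())
    base = metadata_path.parent
    return [base / entry["name"] for entry in metadata["datasets"]]


# Hypothetical dataset entry: a path plus natural-language column descriptions.
EXAMPLE_DATASETS = [
    {
        "name": "data/survey.csv",  # path relative to the metadata file
        "description": "Survey responses collected for an example study.",
        "columns": [
            {"name": "age", "description": "Respondent age in years."},
            {"name": "income", "description": "Annual household income."},
        ],
    }
]
```

Resolving paths against the metadata file's own directory, rather than the current working directory, keeps the dataset folder relocatable.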

Run AutoDS (MCTS-based hypothesis search and verification)

For example, to explore the DiscoveryBench NLS SES dataset, run:

# From the repo root
uv run --package autodiscovery python -m autodiscovery.run \
    --work_dir="work" \
    --out_dir="outputs" \
    --dataset_metadata="discoverybench/real/test/nls_ses/metadata.json" \
    --n_experiments=16 \
    --model="gemini-3-flash-preview" \
    --belief_model="gemini-3-flash-preview" \
    --vision_model="gemini-3-flash-preview"

To resume a previous exploration, pass the --continue_from_dir flag with the directory containing that exploration's logs. The script then continues from where it left off, reusing the MCTS nodes it had generated so far.

✍️ Get in touch!

If you run into problems running the code, please email us or open a GitHub issue: dagarwal@cs.umass.edu (Dhruv Agarwal), bodhisattwam@allenai.org (Bodhisattwa Prasad Majumder).

📄 Citation

If you find our work useful, please cite our paper:

@inproceedings{
agarwal2025autodiscovery,
title={AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise},
author={Dhruv Agarwal and Bodhisattwa Prasad Majumder and Reece Adamson and Megha Chakravorty and Satvika Reddy Gavireddy and Aditya Parashar and Harshit Surana and Bhavana Dalvi Mishra and Andrew McCallum and Ashish Sabharwal and Peter Clark},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=kJqTkj2HhF}
}
