Learning to Discover at Test Time - RL at test time for LLMs

🔬 TTT-Discover

Learning to Discover at Test Time

Mert Yuksekgonul*, Daniel Koceja*, Xinhao Li*, Federico Bianchi*
Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou†, Carlos Guestrin†, Yu Sun*

Stanford · NVIDIA · Astera Institute · UC San Diego · Together AI


TTT-Discover performs reinforcement learning at test time, allowing the LLM to continue training on experience specific to the problem at hand. We achieve new state-of-the-art results across mathematics, GPU kernels, algorithms, and biology.

Installation

pip install ttt-discover

Or from source:

pip install -e .

Set environment variables:

export HF_TOKEN="..."
export TINKER_API_KEY="..."      
export WANDB_API_KEY="..."       
export WANDB_ENTITY="..."        
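
Before launching a run, it can help to confirm that these variables are actually visible to the Python process. The following is a minimal sketch, not part of ttt_discover: the helper name `missing_env_vars` is hypothetical, and treating all four variables as required is an assumption, since the text above does not say which are optional.

```python
import os

# Variable names copied from the export lines above.
# Assumption: all four are needed; the README does not say which are optional.
REQUIRED_VARS = ("HF_TOKEN", "TINKER_API_KEY", "WANDB_API_KEY", "WANDB_ENTITY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the current process environment (os.environ).
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no arguments returns an empty list when everything is set, so a launcher script could print the returned names and exit early otherwise.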

Making your own Environment

To use TTT-Discover for your own application, create a new environment. The general steps are:

  1. Create a new environment that inherits from ttt_discover.Environment.

  2. Define a reward evaluator that inherits from ttt_discover.BaseRewardEvaluator. Optionally, use ttt_discover.SandboxRewardEvaluator to run generated code in sandboxes.

  3. (Optional) Add an initial state definition to your environment.

  4. Define a config and run it with ttt_discover.discover.

Here is a sample skeleton for a new environment.

# Import required ttt_discover objects
from ttt_discover import Environment, BaseRewardEvaluator, State, DiscoverConfig, discover


# Define your reward function
class YourReward(BaseRewardEvaluator):

    def get_reward(self, code: str, state: State) -> dict:
        # ...add logic here that computes `raw_score` and `reward`

        return {
            "reward": reward,
            "correctness": 1.0,
            "raw_score": raw_score,
            "msg": f"Success; raw_score={raw_score}",
            "result_construction": [], # Could reuse
            "stdout": "", # No stdout
        }


class YourEnv(Environment):
    reward_function = YourReward
    state_type = State # You may define your own state if you wish

    def get_question(self) -> str:
        state_ctx = self.initial_state.to_prompt(100, metric_name="performance")

        return f"""You are an expert mathematician specializing in combinatorial problems and computational geometry. Your task is to ... {state_ctx}."""


config = DiscoverConfig(
    env_type=YourEnv,
    experiment_name="test-run",
    wandb_project="",
)

# Run discovery
discover(config)

Check examples/circle_packing for a fully implemented example.
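
To give a feel for the kind of logic that could sit behind get_reward, here is a standalone sketch for a circle-packing-style task. It is not taken from examples/circle_packing and does not use the ttt_discover API. The scoring rule (sum of radii for a valid packing in the unit square, zero otherwise) and the helper names `score_packing` and `_result` are assumptions made for illustration. Only the dictionary keys are borrowed from the skeleton above.

```python
import math
from itertools import combinations


def _result(correctness, raw_score, msg):
    # Keys mirror a subset of the dictionary returned by the skeleton's get_reward.
    return {
        "reward": raw_score if correctness else 0.0,
        "correctness": correctness,
        "raw_score": raw_score,
        "msg": msg,
    }


def score_packing(circles, tol=1e-9):
    """Score a list of (x, y, r) circles placed in the unit square.

    Hypothetical rule: a valid packing scores the sum of its radii;
    any invalid packing gets reward 0.0.
    """
    # Every circle must have a positive radius and stay inside [0, 1] x [0, 1].
    for x, y, r in circles:
        if r <= 0:
            return _result(0.0, 0.0, "Invalid: non-positive radius")
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return _result(0.0, 0.0, "Invalid: circle leaves the unit square")

    # No two circles may overlap (touching is allowed up to the tolerance).
    for (x1, y1, r1), (x2, y2, r2) in combinations(circles, 2):
        if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - tol:
            return _result(0.0, 0.0, "Invalid: circles overlap")

    raw_score = sum(r for _, _, r in circles)
    return _result(1.0, raw_score, f"Success; raw_score={raw_score}")
```

In a real environment, a function like this would be called from get_reward after the generated code has been executed and its output parsed into a list of circles.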

Key Results

                Mathematics       Kernel A100   Kernel H100   Algorithms   Biology
                Erdős Overlap ↓   TriMul ↓      TriMul ↓      AtCoder ↑    Denoising ↑
Best Human      0.380927          4531 μs       1371 μs       566,997      0.64
Prev. Best AI   0.380924          -             -             558,026      -
TTT-Discover    0.380876          2198 μs       1161 μs       567,062      0.71

Domains

Mathematics: Classic open problems in combinatorics and analysis

Task            Erdős Min. Overlap ↓   Autocorr. (AC1) ↓   Autocorr. (AC2) ↑
Best Human      0.380927               1.50973             0.9015
Prev. Best AI   0.380924               1.50314             0.9610
TTT-Discover    0.380876               1.50287             0.9591

Kernel Engineering: GPUMode TriMul competition for triangular matrix multiplication

Task            A100 ↓    H100 ↓    B200 ↓    MI300x ↓
Best Human      4531 μs   1371 μs   1005 μs   2462 μs
TTT-Discover    2198 μs   1161 μs   905 μs    1596 μs

Algorithm Engineering: AtCoder Heuristic Contests on real-world optimization [AHC39] [AHC58]

Task            AHC39 (Geometry) ↑   AHC58 (Scheduling) ↑
Best Human      566,997              847,674,723
Prev. Best AI   558,026              848,373,282
TTT-Discover    567,062              848,414,228

Biology: Single-cell RNA-seq denoising on OpenProblems benchmark

Task            PBMC ↑   Tabula ↑
Best Human      0.64     0.64
TTT-Discover    0.71     0.73
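
The kernel runtimes are easier to compare as speedup ratios. The short sketch below does that arithmetic on the Kernel Engineering numbers reported above. The `speedup` helper is introduced here for illustration, and the ratio is simply the best human runtime divided by the TTT-Discover runtime.

```python
def speedup(baseline_us, new_us):
    """Runtime ratio baseline/new; values above 1 mean the new kernel is faster."""
    return baseline_us / new_us


# Runtimes in microseconds, copied from the Kernel Engineering results.
best_human = {"A100": 4531, "H100": 1371, "B200": 1005, "MI300x": 2462}
ttt_discover = {"A100": 2198, "H100": 1161, "B200": 905, "MI300x": 1596}

speedups = {gpu: speedup(best_human[gpu], ttt_discover[gpu]) for gpu in best_human}

for gpu, ratio in speedups.items():
    print(f"{gpu}: {ratio:.2f}x faster than best human")
```

This works out to roughly 2.06x on A100, 1.18x on H100, 1.11x on B200, and 1.54x on MI300x.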

The environments that reproduce the results from our paper are under examples/. To run them, see reproducing.md.

Submitit

We provide a submitit script that launches TTT-Discover as a Slurm job across multiple nodes with Ray. See submitit_launch.sh for an example.

Security Notice

When using Ray, run all jobs on an isolated network or VPN. Ray has minimal built-in security protections and should not be exposed on a public or shared network.
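
As a coarse sanity check before starting a cluster, one could verify that the address the head node will bind to is a loopback or private-range IP. The sketch below uses only Python's standard ipaddress module. The helper name `is_non_public_address` is hypothetical, and passing this check does not by itself make a deployment safe, since a private address can still sit on a shared network.

```python
import ipaddress


def is_non_public_address(host):
    """Return True if `host` is a loopback or private-range IP address literal."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (for example a hostname); this sketch does not resolve DNS.
        return False
    return addr.is_loopback or addr.is_private
```

For example, `is_non_public_address("10.0.0.5")` is true, while a public address such as `"8.8.8.8"` or an unresolved hostname is rejected.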

Acknowledgments

This work builds on several outstanding projects and communities:

  • GPU Mode — Community for GPU kernel optimization and the TriMul competition
  • ALE-Bench — AtCoder-based benchmark for LLM evaluation
  • Tinker — LLM training recipes and RL framework

Citation

@article{ttt-discover2026,
  title   = {Learning to Discover at Test Time},
  author  = {Yuksekgonul, Mert and Koceja, Daniel and Li, Xinhao 
             and Bianchi, Federico and McCaleb, Jed and Wang, Xiaolong 
             and Kautz, Jan and Choi, Yejin and Zou, James 
             and Guestrin, Carlos and Sun, Yu},
  journal = {arXiv preprint arXiv:2601.16175},
  year    = {2026}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
