Official implementation for HYDRA.

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

This is the code for the paper HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning, accepted at ECCV 2024 [Project Page]. We have also released the code that uses reinforcement learning (DQN) to fine-tune the LLM 🔥🔥🔥

Release

[2025/02/11] 🤖 HYDRA with RL is released.
[2024/08/05] 🚀 PyPI package is released.
[2024/07/29] 🔥 HYDRA is open-sourced on GitHub.

TODOs

gpt-3.5-turbo-0613 has been deprecated, and gpt-3.5 will be replaced by gpt-4o-mini. We will release another version of HYDRA.

"As of July 2024, gpt-4o-mini should be used in place of gpt-3.5-turbo, as it is cheaper, more capable, multimodal, and just as fast." (OpenAI API page)

We also noticed that OpenAI has updated its embedding model, as shown in this link. Because future embedding model updates from OpenAI are uncertain, we suggest that you train a new version of the RL controller yourself and update the RL models.

GPT-4o-mini replacement.
Llama 3.1 (Ollama) replacement.
Gradio demo.
GPT-4o version.
HYDRA with RL (DQN).
HYDRA with DeepSeek R1.

https://github.com/user-attachments/assets/39a897ab-d457-49d2-8527-0d6fe3a3b922

Installation

Requirements

Python >= 3.10
conda

Please follow the instructions below to install the required packages and set up the environment.

1. Clone this repository.

git clone https://github.com/ControlNet/HYDRA

2. Set up the conda environment and install dependencies.

Option 1: Using pixi (recommended):

pixi install
pixi shell

Option 2: Building from source:

bash -i build_env.sh

If you encounter errors, consider going through the build_env.sh file and installing the packages manually.

3. Configure the environment

Edit the .env file, or set the variables in your shell, to configure the environment variables.

OPENAI_API_KEY=your-api-key  # if you want to use OpenAI LLMs
OLLAMA_HOST=http://ollama.server:11434  # if you want to use your Ollama server for Llama or DeepSeek
# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
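Before launching anything, it can help to sanity-check the settings above. The following is a minimal sketch in plain Python, not part of HYDRA: the helper names `parse_env_text` and `missing_settings` are hypothetical, the parser is deliberately simplistic (it treats everything after `#` as a comment), and the rule that at least one of OPENAI_API_KEY or OLLAMA_HOST must be set is an assumption drawn from the comments in the example file.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        # Simplification: anything after '#' is treated as a comment,
        # so values that contain '#' are not supported by this sketch.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # The README asks that TORCH_HOME is left unchanged.
    if env.get("TORCH_HOME") != "./pretrained_models":
        problems.append("TORCH_HOME should stay ./pretrained_models")
    # Assumption: at least one LLM backend has to be configured.
    if not env.get("OPENAI_API_KEY") and not env.get("OLLAMA_HOST"):
        problems.append("set OPENAI_API_KEY or OLLAMA_HOST")
    return problems


example = """\
OPENAI_API_KEY=your-api-key  # if you want to use OpenAI LLMs
# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
"""
settings = parse_env_text(example)
print(missing_settings(settings))
```

To check a real file, pass its contents instead of the inline example, for instance `parse_env_text(open(".env").read())`.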

4. Download the pretrained models

Run the following script to download the pretrained models to the ./pretrained_models directory.

python -m hydra_vl4ai.download_model --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>

For example,

python -m hydra_vl4ai.download_model --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml

Inference

A worker is required to run inference. Start it with:

python -m hydra_vl4ai.executor --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>

Inference with a single image and prompt

python demo_cli.py \
  --image <IMAGE_PATH> \
  --prompt <PROMPT> \
  --base_config <YOUR-CONFIG-DIR> \
  --model_config <MODEL-PATH>

Inference with Gradio GUI

python demo_gradio.py \
  --base_config <YOUR-CONFIG-DIR> \
  --model_config <MODEL-PATH>

Inference on a dataset

python main.py \
  --data_root <YOUR-DATA-ROOT> \
  --base_config <YOUR-CONFIG-DIR> \
  --model_config <MODEL-PATH>

The inference results are then saved to the ./result directory for evaluation.

Evaluation

python evaluate.py <RESULT_JSON_PATH> <DATASET_NAME>

For example,

python evaluate.py result/result_okvqa.jsonl okvqa
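The dataset inference and evaluation steps above can be chained from a small driver script. The sketch below only assembles the two documented command lines as argument lists; the function name `build_commands` is hypothetical, and the data root `../coco2014` is an assumption borrowed from the training example in this README. It also assumes the worker is already running in another process.

```python
import shlex


def build_commands(data_root, base_config, model_config, result_path, dataset_name):
    """Return the dataset-inference and evaluation commands as argument lists."""
    inference = [
        "python", "main.py",
        "--data_root", data_root,
        "--base_config", base_config,
        "--model_config", model_config,
    ]
    evaluation = ["python", "evaluate.py", result_path, dataset_name]
    return [inference, evaluation]


commands = build_commands(
    data_root="../coco2014",  # assumption: same data root as the training example
    base_config="./config/okvqa.yaml",
    model_config="./config/model_config_1gpu.yaml",
    result_path="result/result_okvqa.jsonl",
    dataset_name="okvqa",
)
for command in commands:
    # Print the commands in copy-pasteable shell form instead of running them.
    print(shlex.join(command))
```

To actually execute the steps, each list can be passed to `subprocess.run(command, check=True)` from the repository root, so that evaluation only starts if inference succeeded.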

Training the Controller with RL (DQN)

python train.py \
    --data_root <IMAGE_PATH> \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH> \
    --dqn_config <YOUR-DQN-CONFIG-DIR>

For example,

python train.py \
    --data_root ../coco2014 \
    --base_config ./config/okvqa.yaml \
    --model_config ./config/model_config_1gpu.yaml \
    --dqn_config ./config/dqn_debug.yaml
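For readers unfamiliar with DQN, the quantity at the heart of the method is the one-step temporal-difference target. The sketch below is a generic textbook illustration in plain Python and is not taken from HYDRA's training code; the function names, the discount factor, and the toy numbers are all assumptions chosen for the example.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step DQN target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_value, reward, next_q_values, done, gamma=0.99):
    """Difference between the target and the current estimate Q(s, a)."""
    return td_target(reward, next_q_values, done, gamma) - q_value


# Toy transition: reward 1.0, two possible next actions with Q-values 0.5 and 2.0.
print(td_target(1.0, [0.5, 2.0], done=False, gamma=0.5))
print(td_error(1.5, 1.0, [0.5, 2.0], done=False, gamma=0.5))
```

In a full DQN, the Q-values come from a neural network and the squared TD error is minimized over batches sampled from a replay buffer; the options in the file passed via --dqn_config presumably control such details, but this README does not describe them.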

Citation

@inproceedings{ke2024hydra,
  title={HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning},
  author={Ke, Fucai and Cai, Zhixi and Jahangard, Simindokht and Wang, Weiqing and Haghighi, Pari Delir and Rezatofighi, Hamid},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer},
  doi={10.1007/978-3-031-72661-3_8},
  isbn={978-3-031-72661-3},
  pages={132--149},
}

Acknowledgements

Some code and prompts are based on cvlab-columbia/viper.

Release history

0.0.6 (Sep 6, 2025)
0.0.5 (Jun 28, 2025)
0.0.4 (Jun 27, 2025)
0.0.3 (Mar 28, 2025)
0.0.2 (Mar 28, 2025)
0.0.1 (Aug 5, 2024)
0.0.0 (Aug 5, 2024)

Download files

Source Distribution

hydra_vl4ai-0.0.5.tar.gz (147.1 kB)

Uploaded Jun 28, 2025 Source

Built Distribution

hydra_vl4ai-0.0.5-py3-none-any.whl (186.0 kB)

Uploaded Jun 28, 2025 Python 3

File details

Details for the file hydra_vl4ai-0.0.5.tar.gz.

File metadata

Download URL: hydra_vl4ai-0.0.5.tar.gz
Upload date: Jun 28, 2025
Size: 147.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for hydra_vl4ai-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`70d7109a8a2eb33b2a1ac51a8573ac0f9f6ad4d41d9b1f4717ec7df2289a676b`
MD5	`e8876b126c9baa46c28cd1e618746960`
BLAKE2b-256	`84ebb72b2fdf89675ac07e972deedcefcefaeea62ba49d59a8f8e66a0f06beab`

File details

Details for the file hydra_vl4ai-0.0.5-py3-none-any.whl.

File metadata

Download URL: hydra_vl4ai-0.0.5-py3-none-any.whl
Upload date: Jun 28, 2025
Size: 186.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for hydra_vl4ai-0.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c4ba6000fe9f2f396bf851c2d24d7f5cb75bc7eda4e36672af752b600eeb8eb6`
MD5	`6fc5b4f0b58d2aa6ef23d41d7ba3458f`
BLAKE2b-256	`846c729d10df28636970f1d845fcb18b9b29bd270e0ce434d69238e3866968ed`
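The digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_downloads` are hypothetical, and the sketch assumes the files were saved under their original names in the directory being checked.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "hydra_vl4ai-0.0.5.tar.gz":
        "70d7109a8a2eb33b2a1ac51a8573ac0f9f6ad4d41d9b1f4717ec7df2289a676b",
    "hydra_vl4ai-0.0.5-py3-none-any.whl":
        "c4ba6000fe9f2f396bf851c2d24d7f5cb75bc7eda4e36672af752b600eeb8eb6",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file found in `directory` to True (match) or False (mismatch)."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = sha256_of(path) == expected
    return results


# Files that are not present are simply left out of the result.
print(verify_downloads())
```

A `False` entry means the file differs from the one described here and should not be installed.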
