
GritHopper: Decomposition-Free Multi-Hop Dense Retrieval





GritHopper is the first decoder-based multi-hop dense retrieval model and achieves state-of-the-art performance on both in-distribution and out-of-distribution benchmarks for decomposition-free multi-hop dense retrieval. Built on GRITLM, it is trained on diverse datasets spanning question answering and fact-checking. Unlike decomposition-based approaches, GritHopper retrieves passages iteratively without explicitly decomposing the question into sub-questions: at each step, the retrieved evidence is concatenated with the query.

Because it uses the decoder model in an encoder-only fashion (as MDR does), GritHopper performs each retrieval step in a single forward pass. Compared with previous state-of-the-art BERT-based approaches such as BeamRetriever and MDR, it generalizes significantly better to out-of-distribution data.

Key Strengths of GritHopper

  • Encoder-Only Efficiency: Each retrieval iteration requires only a single forward pass (rather than multiple autoregressive steps).
  • Out-of-Distribution Robustness: Achieves state-of-the-art performance compared to other decomposition-free methods on multiple OOD benchmarks.
  • Unified Training: Combines dense retrieval with generative objectives and explores how adding post-retrieval information to the generation loss improves dense retrieval performance.
  • Stopping: GritHopper uses its generative capabilities via ReAct to control its own state, so it can decide when to stop through causal next-token prediction.
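The stopping behaviour can be illustrated with a small sketch. It assumes, without confirmation from the package, that the continue/stop decision comes down to the logits of two hypothetical action tokens, normalized with a two-way softmax so that continue_probability and stop_probability sum to 1, as they do in the example outputs further down. The function names stop_continue_probabilities and should_stop are hypothetical; only the decision rule (stop when stop_probability exceeds continue_probability) is the one documented for iterative_retrieve.

```python
import math
from typing import Dict


def stop_continue_probabilities(continue_logit: float, stop_logit: float) -> Dict[str, float]:
    """Two-way softmax over the logits of two hypothetical action tokens."""
    # Subtract the larger logit before exponentiating to avoid overflow.
    m = max(continue_logit, stop_logit)
    e_continue = math.exp(continue_logit - m)
    e_stop = math.exp(stop_logit - m)
    total = e_continue + e_stop
    return {
        "continue_probability": e_continue / total,
        "stop_probability": e_stop / total,
    }


def should_stop(probs: Dict[str, float]) -> bool:
    """Documented rule: stop when stop_probability > continue_probability."""
    return probs["stop_probability"] > probs["continue_probability"]
```

With a stop logit well above the continue logit, should_stop returns True; equal logits give 0.5 each and the loop would continue, since the rule uses a strict comparison.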

Getting Started with GritHopper

GritHopper is trained on MuSiQue, 2WikiMultiHopQA, HotpotQA, EX-FEVER, and HoVer.

GritHopper Models

Model Name    | Datasets     | Description                                                          | Model Size
GritHopper-7B | All datasets | GritHopper trained with answers as post-retrieval information (SOTA) | 7B

1. Installation

pip install grithopper

2. Initialization

from grithopper import GritHopper

# Initialize GritHopper with your GRITLM model checkpoint or a Hugging Face model path
hopper = GritHopper(
    model_name_or_path="UKPLab/GritHopper",
    device="cuda"  # or "cpu"
)

3. Load Document Candidates

You can either load candidates from a list of (title, passage) pairs and optionally dump them to a file:

documents = [
    ("Title A", "Passage text for document A."),
    ("Title B", "Passage text for document B."),
    # ...
]

hopper.load_document_candidates(
    document_candidates=documents,
    device="cuda",
    output_directory_candidates_dump="my_candidates.pkl"  # optional
)

Or load them from a pre-encoded dump:

hopper.load_candidates_from_file(
    dump_filepath="my_candidates.pkl",
    device="cuda"
)
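The .pkl extension of the dump suggests Python's pickle format, but the layout GritHopper actually writes is not documented here. The following generic sketch shows the same caching idea with hypothetical (title, passage, vector) triples; dump_candidates and load_candidates are illustrative names, not the package's API. Because pickle.load can execute arbitrary code embedded in a file, only load dumps from sources you trust.

```python
import os
import pickle
import tempfile


def dump_candidates(candidates, path):
    """Write hypothetical (title, passage, vector) triples to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(candidates, f)


def load_candidates(path):
    """Read the triples back. Never unpickle files from untrusted sources."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip a tiny cache inside a temporary directory.
cached = [
    ("Title A", "Passage text for document A.", [0.1, 0.2]),
    ("Title B", "Passage text for document B.", [0.3, 0.4]),
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_candidates.pkl")
    dump_candidates(cached, path)
    restored = load_candidates(path)
```

Encoding is the expensive part, so caching the encoded candidates once and reloading them avoids re-running the model over the whole corpus.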

4. Encode a Query

question = "Who wrote the novel that was adapted into the film Blade Runner?"
previous_evidences = [("Blade Runner (Movie)", " The Movie....")] # optional


query_vector = hopper.encode_query(
    multi_hop_question=question,
    previous_evidences=previous_evidences, # optional
    instruction_type="multi-hop"  # or "fact-check"; alternatively, pass a custom instruction via instruction="your_instruction"
)
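As described in the introduction, GritHopper conditions each hop on the evidence retrieved so far by concatenating it with the query. The template encode_query actually uses is not shown in this README, so the sketch below uses a hypothetical newline-joined format and a placeholder instruction; build_hop_input and HYPOTHETICAL_INSTRUCTION are illustrative names only. It shows how the query input grows by one passage per hop.

```python
from typing import List, Optional, Tuple

# (title, passage) pairs, matching the previous_evidences argument above.
Evidence = Tuple[str, str]

# Placeholder instruction; NOT the prompt GritHopper actually uses.
HYPOTHETICAL_INSTRUCTION = "Retrieve the next passage needed to answer the question."


def build_hop_input(
    question: str,
    previous_evidences: Optional[List[Evidence]] = None,
    instruction: str = HYPOTHETICAL_INSTRUCTION,
) -> str:
    """Concatenate instruction, question, and prior evidence into one string."""
    parts = [instruction, question]
    for title, passage in previous_evidences or []:
        # Strip surrounding whitespace so " The Movie...." becomes "The Movie....".
        parts.append(f"{title}: {passage.strip()}")
    return "\n".join(parts)
```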

5. Single-Step Retrieval

result = hopper.retrieve_(
    query=query_vector,
    top_k=1,
    get_stopping_probability=True
)

# {
#   "retrieved": [
#       {
#         "title": "Title B",
#         "passage": "Passage text for document B.",
#         "score": 0.873
#       }
#   ],
#   "continue_probability": 0.65,  # present if get_stopping_probability=True
#   "stop_probability": 0.35
# }

If you prefer to pass the question string directly:

result = hopper.retrieve_(
    query="Who is the mother of the writer who wrote the novel that was adapted into the film Blade Runner?",
    # optional previous_evidences=[("Blade Runner (Movie)", " The Movie....")],
    top_k=1,
    get_stopping_probability=True,
)

# {
#   "retrieved": [
#       { "title": "Blade Runner (Movie)", "passage": "...", "score": 0.92 }
#   ],
#   "continue_probability": 0.75,
#   "stop_probability": 0.25
# }
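Conceptually, a single retrieval step scores the encoded query against every pre-encoded candidate and returns the best matches. The sketch below does this by brute force in pure Python, assuming cosine similarity as the scoring function; the similarity GritHopper actually uses, and any indexing it applies, are not documented here. retrieve_top_k and the (title, passage, vector) candidate layout are hypothetical, while the returned dictionary mirrors the "retrieved" part of the documented output.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical candidate layout: (title, passage, embedding vector).
Candidate = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec: Sequence[float], candidates: List[Candidate], top_k: int = 1) -> Dict[str, list]:
    """Score every candidate against the query and keep the top_k best."""
    scored = [
        {"title": title, "passage": passage, "score": cosine(query_vec, vec)}
        for title, passage, vec in candidates
    ]
    # Highest similarity first.
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return {"retrieved": scored[:top_k]}
```

A linear scan like this costs one similarity computation per candidate; for large corpora an approximate nearest-neighbour index is the usual replacement.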

6. Iterative (Multi-Hop) Retrieval

chain_of_retrieval = hopper.iterative_retrieve(
    multi_hop_question="Who wrote the novel that was adapted into the film Blade Runner?",
    instruction_type="multi-hop",
    automatic_stopping=True,
    max_hops=4
)

# [
#   {
#     "retrieved": [
#       { "title": "Blade Runner (Movie)", "passage": "...", "score": 0.92 }
#     ],
#     "continue_probability": 0.75,
#     "stop_probability": 0.25
#   },
#   {
#     "retrieved": [
#       { "title": "Philip K.", "passage": "...", "score": 0.88 }
#     ],
#     "continue_probability": 0.65,
#     "stop_probability": 0.35
#   },
#   ...
# ]

This process continues until one of the following occurs:

1. The model decides to stop (if automatic_stopping=True and stop_probability > continue_probability).
2. It reaches max_hops.
3. No documents can be retrieved at a given step.
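The three stopping conditions can be expressed as a plain loop. This sketch is not the package's implementation: retrieve_step is a hypothetical callback standing in for one call to retrieve_ with the evidence gathered so far, only the top-ranked passage is carried forward, and it is an assumption that the hop on which the model votes to stop is kept in the chain.

```python
from typing import Callable, Dict, List, Tuple

# (title, passage) pairs, as in the previous_evidences argument.
Evidence = Tuple[str, str]
# Hypothetical callback: (question, evidences so far) -> one step result dict
# shaped like the documented retrieve_ output.
RetrieveStep = Callable[[str, List[Evidence]], Dict]


def iterative_loop(
    question: str,
    retrieve_step: RetrieveStep,
    max_hops: int = 4,
    automatic_stopping: bool = True,
) -> List[Dict]:
    chain: List[Dict] = []
    evidences: List[Evidence] = []
    for _ in range(max_hops):  # Condition 2: hop budget.
        result = retrieve_step(question, evidences)
        if not result["retrieved"]:  # Condition 3: nothing retrieved.
            break
        chain.append(result)
        best = result["retrieved"][0]
        # Carry the top passage forward as evidence for the next hop.
        evidences.append((best["title"], best["passage"]))
        # Condition 1: the model votes to stop (this hop's result is kept).
        if automatic_stopping and result["stop_probability"] > result["continue_probability"]:
            break
    return chain
```

The returned list has the same shape as the chain_of_retrieval example: one step result per hop, in retrieval order.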

Citation

If you use GritHopper in your research, please cite the following paper:

TBD

Contact

Contact person: Justus-Jonas Erker, justus-jonas.erker@tu-darmstadt.de

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions. This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

License

GritHopper is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Acknowledgement

This model is based on GRITLM.


Download files


Source Distribution

grithopper-0.0.3.tar.gz (13.0 kB)

Uploaded Source

Built Distribution


grithopper-0.0.3-py3-none-any.whl (13.3 kB)

Uploaded Python 3

File details

Details for the file grithopper-0.0.3.tar.gz.

File metadata

  • Download URL: grithopper-0.0.3.tar.gz
  • Upload date:
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for grithopper-0.0.3.tar.gz
Algorithm Hash digest
SHA256 5514d8555d50b325fb9f0e2fa8e004d3b32c3a07cbb6b652218c332475c463e6
MD5 80e8cfbea8f4196c4de27b4904d713d4
BLAKE2b-256 90b68e7629288cd9cbc7453c5233d594ea15668ca3a79a89d8091354f1022297


File details

Details for the file grithopper-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: grithopper-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 13.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for grithopper-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 bcd3fbf1acb4a61d7de10ed41d4ca655812f6943a8780a5e2cb686ac765352d5
MD5 9b7dee8a6a77d9d43d01fb33763b524f
BLAKE2b-256 1d4011aee7a3f5869179f214cae1741941afc5cf4142de06c86b8ca91f8202f0

