RAGBooster

RAGBooster improves the performance of retrieval-based large language models by learning which data sources are important for retrieving high-quality data.

We provide an example notebook that shows how we boost RedPajama-INCITE-Instruct-3B-v1, a small LLM with 3 billion parameters, to be on par with OpenAI's GPT-3.5 (175 billion parameters) on a question answering task by using Bing web search and RAGBooster.

Furthermore, we have an additional example notebook, where we demonstrate how to boost a tiny QA model to within 5% of the accuracy of GPT-3.5 on a data imputation task.

Core classes

At the core of RAGBooster are RetrievalAugmentedModels, which fetch external data to improve prediction quality. Retrieval augmentation requires two components:

  • A retriever, which retrieves external data for a prediction sample. We currently only implement a BingRetriever, which queries Microsoft's Bing Websearch API.
  • A generator, which generates the final prediction from the prediction sample and the external data. This is typically a large language model. We provide the Generator interface, which makes it very easy to leverage LLMs available via an API, for example from OpenAI.
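The interplay of the two components can be sketched as follows. This is a minimal illustration, not RAGBooster's actual API: the names StaticRetriever, LastWordGenerator and ToyRetrievalAugmentedModel are hypothetical, the retriever serves canned snippets instead of querying Bing, and the majority vote over per-snippet answers is an assumed way to combine them.

```python
from collections import Counter
from typing import Protocol


class Retriever(Protocol):
    def retrieve(self, query: str) -> list[tuple[str, str]]:
        """Return (source, text) pairs of external data for a query."""
        ...


class Generator(Protocol):
    def generate(self, question: str, context: str) -> str:
        """Produce an answer from the question and one retrieved text."""
        ...


class StaticRetriever:
    """Hypothetical stand-in for a web search retriever: serves canned snippets."""

    def __init__(self, corpus: dict[str, list[tuple[str, str]]]):
        self.corpus = corpus

    def retrieve(self, query: str) -> list[tuple[str, str]]:
        return self.corpus.get(query, [])


class LastWordGenerator:
    """Toy generator standing in for an LLM: answers with the snippet's last word."""

    def generate(self, question: str, context: str) -> str:
        words = context.rstrip(".").split()
        return words[-1] if words else ""


class ToyRetrievalAugmentedModel:
    """Generates one answer per retrieved snippet and returns the majority answer."""

    def __init__(self, retriever: Retriever, generator: Generator, k: int = 3):
        self.retriever = retriever
        self.generator = generator
        self.k = k

    def predict(self, question: str) -> str:
        snippets = self.retriever.retrieve(question)[: self.k]
        answers = [self.generator.generate(question, text) for _, text in snippets]
        if not answers:
            return ""
        return Counter(answers).most_common(1)[0][0]


corpus = {
    "capital of France?": [
        ("a.example", "The capital is Paris."),
        ("b.example", "It is Lyon."),
        ("c.example", "Paris"),
    ]
}
model = ToyRetrievalAugmentedModel(StaticRetriever(corpus), LastWordGenerator())
print(model.predict("capital of France?"))
```

Keeping the source next to each retrieved text matters for the next step, because importance is learned per retrieval source.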

Once you have defined your retrieval-augmented model, you can leverage RAGBooster to boost its performance by learning the data importance of retrieval sources (e.g., web domains). This often increases accuracy by a few percent.
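To give an intuition for what "data importance of retrieval sources" can mean, here is a small sketch that scores each source by leave-one-out validation accuracy and prunes sources that hurt. This is a deliberately simplified stand-in: the algorithm RAGBooster actually uses is described in the paper, and all function names and the example data below are hypothetical.

```python
from collections import Counter


def majority_answer(retrieved, excluded=frozenset()):
    """Majority vote over (source, answer) pairs, ignoring excluded sources."""
    votes = Counter(answer for source, answer in retrieved if source not in excluded)
    return votes.most_common(1)[0][0] if votes else None


def accuracy(validation, excluded=frozenset()):
    """Fraction of validation samples whose majority answer matches the label."""
    correct = sum(
        majority_answer(retrieved, excluded) == label
        for retrieved, label in validation
    )
    return correct / len(validation)


def leave_one_out_importance(validation):
    """Importance of a source = accuracy with all sources minus accuracy without it."""
    sources = sorted({source for retrieved, _ in validation for source, _ in retrieved})
    baseline = accuracy(validation)
    return {s: baseline - accuracy(validation, frozenset({s})) for s in sources}


def harmful_sources(validation):
    """Sources whose removal increases validation accuracy."""
    return {s for s, value in leave_one_out_importance(validation).items() if value < 0}


# Hypothetical validation set: per sample, the answers generated from each
# retrieved snippet (tagged with its source domain) and the correct label.
validation = [
    ([("wiki.example", "Paris"), ("wiki.example", "Paris"),
      ("news.example", "Paris"), ("news.example", "Paris"),
      ("spam.example", "Lyon")], "Paris"),
    ([("spam.example", "Munich"), ("spam.example", "Munich"),
      ("wiki.example", "Berlin")], "Berlin"),
    ([("spam.example", "Milan"), ("spam.example", "Milan"),
      ("news.example", "Rome")], "Rome"),
    ([("wiki.example", "Madrid"), ("wiki.example", "Madrid"),
      ("spam.example", "Seville")], "Madrid"),
]

importance = leave_one_out_importance(validation)
to_prune = harmful_sources(validation)
print(importance)
print(accuracy(validation), "->", accuracy(validation, frozenset(to_prune)))
```

In this toy data, removing spam.example raises validation accuracy from 0.5 to 1.0, so it receives a negative importance and is pruned, while wiki.example receives a positive one.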

Background

Have a look at our paper on Improving Retrieval-Augmented Large Language Models with Data-Centric Refinement for detailed algorithms, proofs and experimental results.

Installation

RAGBooster is available as a pip package and can be installed as follows:

pip install ragbooster
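To confirm that the package landed in the active environment, one option is to query the installed distribution metadata with Python's standard library. The helper name installed_version is hypothetical; it returns None when the distribution is not installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if ragbooster is installed, otherwise None.
print(installed_version("ragbooster"))
```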

Installation for Development

  • Requires Python 3.9 and Rust to be available
  1. Clone the repository: git clone git@github.com:amsterdata/ragbooster.git
  2. Change to the project directory: cd ragbooster
  3. Create a virtualenv: python3.9 -m venv venv
  4. Activate the virtualenv: source venv/bin/activate
  5. Install the dev dependencies: pip install ".[dev]"
  6. Build the project: maturin develop --release
  • Optional steps:
    • Run the tests with cargo test --release
    • Run the benchmarks with RUSTFLAGS="-C target-cpu=native" cargo bench
    • Run linting for the Python code with flake8 python
    • Start jupyter with jupyter notebook and run the example notebooks


