RAGBooster
RAGBooster improves the performance of retrieval-based large language models by learning which data sources are important for retrieving high-quality data.
We provide an example notebook that shows how we boost RedPajama-INCITE-Instruct-3B-v1, a small LLM with 3 billion parameters, to be on par with OpenAI's GPT-3.5 (175 billion parameters) in a question answering task by using Bing web search and ragbooster.
A second example notebook demonstrates how to boost a tiny QA model to within 5% of the accuracy of GPT-3.5 on a data imputation task.
Core classes
At the core of RAGBooster are RetrievalAugmentedModels, which fetch external data to improve prediction quality. Retrieval augmentation requires two components:
- A retriever, which retrieves external data for a prediction sample. We currently only implement a BingRetriever, which queries Microsoft's Bing Web Search API.
- A generator, which generates the final prediction from the prediction sample and the external data. This is typically a large language model. We provide the Generator interface, which makes it very easy to leverage LLMs available via an API, for example from OpenAI.
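The interplay of the two components can be illustrated with a minimal, self-contained sketch. All class names here (ToyRetriever, ToyGenerator, ToyRetrievalAugmentedModel) are hypothetical stand-ins and not the ragbooster API: the retriever searches a small in-memory corpus instead of calling Bing, and the generator returns the last word of a snippet instead of querying an LLM. The final prediction is a majority vote over the per-snippet answers, which is an assumption made for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple

# All names below are hypothetical stand-ins, not the ragbooster API.
Snippet = Tuple[str, str]  # (source domain, retrieved text)


class ToyRetriever:
    """Retrieves snippets from an in-memory corpus by word overlap with the query."""

    def __init__(self, corpus: List[Snippet]) -> None:
        self.corpus = corpus

    def retrieve(self, query: str) -> List[Snippet]:
        query_words = set(query.lower().split())
        return [
            (source, text)
            for source, text in self.corpus
            if query_words & set(text.lower().split())
        ]


class ToyGenerator:
    """Stands in for an LLM: 'answers' with the last word of the snippet."""

    def generate(self, question: str, snippet: str) -> str:
        return snippet.rstrip(".").split()[-1]


class ToyRetrievalAugmentedModel:
    """Generates one answer per retrieved snippet and returns the most common one."""

    def __init__(self, retriever: ToyRetriever, generator: ToyGenerator) -> None:
        self.retriever = retriever
        self.generator = generator

    def predict(self, question: str) -> Optional[str]:
        snippets = self.retriever.retrieve(question)
        if not snippets:
            return None  # nothing retrieved, so no prediction
        answers = [self.generator.generate(question, text) for _, text in snippets]
        return Counter(answers).most_common(1)[0][0]


corpus = [
    ("a.example", "The capital of the Netherlands is Amsterdam."),
    ("b.example", "Netherlands capital: Amsterdam"),
    ("c.example", "The capital of the Netherlands is Rotterdam."),
    ("d.example", "Bananas are yellow."),
]
model = ToyRetrievalAugmentedModel(ToyRetriever(corpus), ToyGenerator())
print(model.predict("capital of the Netherlands"))  # prints: Amsterdam
```

Two of the three matching snippets agree on the answer, so the incorrect snippet from c.example is outvoted. A real setup would replace both toy classes with a web retriever and an LLM-backed generator.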
Once you have defined your retrieval-augmented model, you can leverage RAGBooster to boost its performance by learning the data importance of retrieval sources (e.g., domains in the web). This often increases accuracy by a few percent.
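To make the idea of source importance concrete, here is a toy sketch. It uses leave-one-out importance, a deliberately simple stand-in and not the algorithm from the paper or the ragbooster API: the importance of a source is the validation accuracy with all sources minus the accuracy without that source. The data, the domain names, and the rule "answer with the highest-ranked retained result" are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy setup, not the ragbooster API or the algorithm from the paper.
# Each validation sample holds ranked (source, answer) pairs and the correct answer.
Sample = Tuple[List[Tuple[str, str]], str]


def predict(ranked: List[Tuple[str, str]], kept: Set[str]) -> Optional[str]:
    """Answer with the highest-ranked result whose source is retained."""
    for source, answer in ranked:
        if source in kept:
            return answer
    return None


def accuracy(samples: List[Sample], kept: Set[str]) -> float:
    """Fraction of validation samples answered correctly using only `kept` sources."""
    hits = sum(predict(ranked, kept) == label for ranked, label in samples)
    return hits / len(samples)


def leave_one_out_importance(samples: List[Sample], sources: Set[str]) -> Dict[str, float]:
    """Importance of a source = accuracy with all sources - accuracy without it."""
    full = accuracy(samples, sources)
    return {s: full - accuracy(samples, sources - {s}) for s in sorted(sources)}


samples: List[Sample] = [
    ([("spam.example", "B"), ("good.example", "A")], "A"),
    ([("good.example", "C"), ("spam.example", "D")], "C"),
    ([("ok.example", "E"), ("good.example", "E")], "E"),
    ([("spam.example", "F"), ("ok.example", "G")], "G"),
]
sources = {"good.example", "ok.example", "spam.example"}

importance = leave_one_out_importance(samples, sources)
# Retain only the sources that do not hurt validation accuracy.
kept = {s for s, value in importance.items() if value >= 0}

print(importance)
print(accuracy(samples, sources), "->", accuracy(samples, kept))
```

On this toy data, spam.example receives a negative importance because its results outrank correct ones, and dropping it raises validation accuracy. The gain here is exaggerated by construction; it says nothing about the gains on real data.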
Background
Have a look at our paper on Improving Retrieval-Augmented Large Language Models with Data-Centric Refinement for detailed algorithms, proofs and experimental results.
Installation
RAGBooster is available as a pip package and can be installed as follows:
pip install ragbooster
Installation for Development
- Requires Python 3.9 and Rust to be available
- Clone the repository:
  git clone git@github.com:amsterdata/ragbooster.git
- Change to the project directory:
  cd ragbooster
- Create a virtualenv:
  python3.9 -m venv venv
- Activate the virtualenv:
  source venv/bin/activate
- Install the dev dependencies with
  pip install ".[dev]"
- Build the project:
  maturin develop --release
- Optional steps:
  - Run the tests with
    cargo test --release
  - Run the benchmarks with
    RUSTFLAGS="-C target-cpu=native" cargo bench
  - Run linting for the Python code with
    flake8 python
  - Start jupyter with
    jupyter notebook
    and run the example notebooks