RAGBooster
RAGBooster improves the performance of retrieval-augmented large language models by learning which data sources are most useful for retrieving high-quality data.
We provide an example notebook that shows how we boost RedPajama-INCITE-Instruct-3B-v1, a small LLM with 3 billion parameters, to be on par with OpenAI's GPT-3.5 (175 billion parameters) on a question answering task by using Bing web search and RAGBooster.
A second example notebook demonstrates how to boost a tiny QA model to within 5% of GPT-3.5's accuracy on a data imputation task.
Core classes
At the core of RAGBooster are RetrievalAugmentedModels, which fetch external data to improve prediction quality. Retrieval augmentation requires two components:
- A retriever, which retrieves external data for a prediction sample. Currently, the only implementation is BingRetriever, which queries Microsoft's Bing Web Search API.
- A generator, which produces the final prediction from the prediction sample and the retrieved external data. This is typically a large language model. The Generator interface makes it easy to use LLMs that are available via an API, for example from OpenAI.
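The split into a retriever and a generator can be illustrated with a minimal, self-contained sketch in plain Python. All class and method names here (`RetrievedDoc`, `KeywordRetriever`, `LastWordGenerator`, `ToyRAGModel`, `retrieve`, `generate`, `predict`) are hypothetical stand-ins chosen for illustration, not ragbooster's actual API. The toy retriever ranks an in-memory corpus instead of calling Bing, and the toy generator stands in for an LLM call.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class RetrievedDoc:
    source: str  # retrieval source, e.g. the web domain a snippet came from
    text: str


class RetrieverLike(Protocol):
    def retrieve(self, question: str, k: int) -> List[RetrievedDoc]: ...


class GeneratorLike(Protocol):
    def generate(self, question: str, docs: List[RetrievedDoc]) -> str: ...


class KeywordRetriever:
    """Toy retriever: ranks an in-memory corpus by word overlap with the question."""

    def __init__(self, corpus: List[RetrievedDoc]):
        self.corpus = corpus

    def retrieve(self, question: str, k: int) -> List[RetrievedDoc]:
        words = set(question.lower().split())
        ranked = sorted(
            self.corpus,
            key=lambda doc: len(words & set(doc.text.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class LastWordGenerator:
    """Toy generator: answers with the last word of the top-ranked document.
    A real generator would prompt an LLM with the question and the documents."""

    def generate(self, question: str, docs: List[RetrievedDoc]) -> str:
        return docs[0].text.split()[-1] if docs else "unknown"


class ToyRAGModel:
    """Retrieve external data for a sample, then generate a prediction from both."""

    def __init__(self, retriever: RetrieverLike, generator: GeneratorLike, k: int = 3):
        self.retriever = retriever
        self.generator = generator
        self.k = k

    def predict(self, question: str) -> str:
        docs = self.retriever.retrieve(question, self.k)
        return self.generator.generate(question, docs)


corpus = [
    RetrievedDoc("example-wiki.org", "the capital of france is paris"),
    RetrievedDoc("spam.example", "buy cheap tickets now"),
]
model = ToyRAGModel(KeywordRetriever(corpus), LastWordGenerator(), k=2)
print(model.predict("What is the capital of France?"))  # -> paris
```

Replacing `KeywordRetriever` with a web-search retriever, or `LastWordGenerator` with an LLM call, leaves `ToyRAGModel` unchanged, which is the benefit of keeping the two components separate.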
Once you have defined your retrieval-augmented model, you can use RAGBooster to boost its performance by learning the data importance of retrieval sources (e.g., domains on the web). This often increases accuracy by a few percent.
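RAGBooster's own algorithm for learning data importance is described in the paper and is not reproduced here. As a much simpler stand-in, the sketch below illustrates the general idea of refining retrieval sources with greedy backward elimination: on a labelled validation set, it repeatedly drops the source (here, a web domain) whose exclusion most improves accuracy, and stops once no single exclusion helps. The data layout, the majority-vote "generator", and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical data layout: each retrieved item is (source_domain, candidate_answer),
# and each validation example pairs the retrieved items with the gold answer.
Retrieved = List[Tuple[str, str]]
Example = Tuple[Retrieved, str]


def predict(retrieved: Retrieved, excluded: Set[str]) -> str:
    """Majority vote over answers from sources that are not excluded."""
    votes = Counter(answer for source, answer in retrieved if source not in excluded)
    return votes.most_common(1)[0][0] if votes else "unknown"


def accuracy(data: List[Example], excluded: Set[str]) -> float:
    hits = sum(predict(retrieved, excluded) == gold for retrieved, gold in data)
    return hits / len(data)


def prune_sources(data: List[Example]) -> Set[str]:
    """Greedily exclude the source whose removal most improves validation accuracy."""
    sources = {source for retrieved, _ in data for source, _ in retrieved}
    excluded: Set[str] = set()
    best = accuracy(data, excluded)
    while True:
        candidates = [
            (accuracy(data, excluded | {s}), s) for s in sorted(sources - excluded)
        ]
        if not candidates:
            return excluded
        score, source = max(candidates)
        if score <= best:  # no single exclusion helps any more
            return excluded
        best = score
        excluded.add(source)


validation = [
    ([("good.org", "paris"), ("spam.biz", "london"), ("spam.biz", "london")], "paris"),
    ([("good.org", "berlin"), ("spam.biz", "rome"), ("spam.biz", "rome")], "berlin"),
    ([("good.org", "madrid"), ("other.net", "madrid")], "madrid"),
]
print(prune_sources(validation))  # -> {'spam.biz'}
```

Because the excluded sources are chosen on validation data, the resulting accuracy should be confirmed on a separate held-out set. Each round also re-evaluates every remaining source, so this greedy search gets expensive as the number of sources grows.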
Background
See our paper, "Improving Retrieval-Augmented Large Language Models with Data-Centric Refinement", for detailed algorithms, proofs, and experimental results.
Installation
RAGBooster is available as a pip package and can be installed as follows:
pip install ragbooster
Installation for Development
- Python 3.9 and Rust must be available
- Clone the repository:
git clone git@github.com:amsterdata/ragbooster.git
- Change to the project directory:
cd ragbooster
- Create a virtualenv:
python3.9 -m venv venv
- Activate the virtualenv:
source venv/bin/activate
- Install the dev dependencies with:
pip install ".[dev]"
- Build the project:
maturin develop --release
- Optional steps:
  - Run the tests with:
    cargo test --release
  - Run the benchmarks with:
    RUSTFLAGS="-C target-cpu=native" cargo bench
  - Run linting for the Python code with:
    flake8 python
  - Start Jupyter with the command below and run the example notebooks:
    jupyter notebook