ragpipe: iterate quickly on your RAG pipelines.


Introduction

Ragpipe helps you extract insights from large document repositories quickly.

Ragpipe is lean and nimble. It makes it easy to iterate fast and tweak the components of your RAG pipeline until you get the desired responses.

Watch a quick video intro.

Note: Under active development. Expect breaking changes.


Instead of the usual chunk-embed-match-rank flow, Ragpipe adopts a holistic, end-to-end view of the pipeline:

  • build a hierarchical document model,
  • decompose a complex query into sub-queries,
  • resolve the sub-queries and obtain responses,
  • aggregate the sub-query responses.
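The decompose-resolve-aggregate flow can be sketched in a few lines of plain Python. This is only an illustration, not Ragpipe's API: `decompose`, `answer_query`, `lookup` and the toy `facts` table are hypothetical names, and the splitter is a naive stand-in for whatever decomposition a real pipeline uses.

```python
import re
from typing import Callable

def decompose(query: str) -> list[str]:
    """Naive stand-in for query decomposition: split on ' and ' or ';'."""
    parts = re.split(r"\s+and\s+|;", query)
    return [p.strip(" ?.") for p in parts if p.strip(" ?.")]

def answer_query(query: str, resolve: Callable[[str], str]) -> str:
    """Resolve each sub-query independently, then aggregate the responses."""
    sub_queries = decompose(query)
    responses = [resolve(sq) for sq in sub_queries]
    return "\n".join(f"{sq}: {r}" for sq, r in zip(sub_queries, responses))

# Toy resolver (made-up data): a real one would run retrieval plus an LLM call.
facts = {
    "what is the deductible": "500 USD",
    "what is the coverage limit": "1M USD",
}

def lookup(sub_query: str) -> str:
    return facts.get(sub_query.lower(), "unknown")

result = answer_query("What is the deductible and what is the coverage limit?", lookup)
print(result)
```

Each sub-query is resolved on its own, so the resolver can be swapped out without touching the decomposition or aggregation steps.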

How do we resolve each sub-query?

  • choose representations for document parts relevant to a sub-query,
  • specify the bridges among those representations,
  • merge the retrieved docs across bridges to set up a context,
  • present the query and context to a language model to compute the final response.
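The last two steps, setting up a context and presenting it with the query, might look like the sketch below. The function names, the character budget and the prompt wording are all assumptions made for illustration; the actual LLM call is left out.

```python
def build_context(ranked_docs: list[tuple[str, float]], max_chars: int = 120) -> str:
    """Concatenate top-ranked document texts until a character budget is hit."""
    parts, used = [], 0
    for text, _score in ranked_docs:
        if used + len(text) > max_chars:
            break
        parts.append(text)
        used += len(text)
    return "\n".join(parts)

def build_prompt(query: str, context: str) -> str:
    """Assemble the text that would be sent to a language model."""
    return (
        "Answer the question using only the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Toy merged list of (text, score) pairs, best first (made-up data).
merged = [("Flood damage is covered.", 0.9), ("Fire damage is excluded.", 0.4)]
context = build_context(merged)
prompt = build_prompt("Is flood damage covered?", context)
print(prompt)
```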

The represent-bridge-merge pattern is expressive: it lets us build and iterate over many kinds of complex retrieval pipelines, including those based on the traditional retrieve-rank-rerank pattern and more recent advanced RAG patterns.

Installation

pip install ragpipe

Key Ideas

Representations. Choose the query/document fields as well as how to represent each chosen query / document field to aid similarity/relevance computation (bridges) over the entire document repository. Representations can be text strings, dense/sparse vector embeddings or arbitrary data objects, and help bridge the gap between the query and the documents.
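As a rough illustration of holding several representations of one field side by side, the sketch below computes a raw string, a token list and a sparse bag-of-words for a piece of text. The `represent` helper and its keys are hypothetical; a real pipeline would likely use BM25 statistics or embedding models instead of raw counts.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def represent(text: str) -> dict:
    """Compute several representations of the same field."""
    tokens = tokenize(text)
    return {
        "text": text,               # raw string
        "tokens": tokens,           # token list
        "sparse": Counter(tokens),  # sparse bag-of-words vector
    }

reps = represent("Flood damage is covered; fire damage is not.")
print(reps["sparse"].most_common(2))
```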

Bridges. Choose a pair of query and document representations to bridge. A bridge serves as a relevance indicator: one of several criteria for identifying the documents relevant to a query. In practice, several bridges together determine the degree to which a document is relevant to a query. Computing each bridge produces its own ranked list of documents.
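A single bridge can be pictured as a function that scores one query representation against one document representation and returns a ranked list. The sketch below uses cosine similarity over bag-of-words counts as the matching function; that choice, the `bridge` name and the toy documents are assumptions for illustration only.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def bridge(query_rep: Counter, doc_reps: dict[str, Counter]) -> list[tuple[str, float]]:
    """Score every document against the query; return (doc_id, score), best first."""
    scored = [(doc_id, cosine(query_rep, rep)) for doc_id, rep in doc_reps.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy document representations (made-up data).
docs = {
    "d1": Counter("flood damage is covered".split()),
    "d2": Counter("fire damage is excluded".split()),
    "d3": Counter("premium is due monthly".split()),
}
ranked = bridge(Counter("flood damage".split()), docs)
print(ranked)
```

Swapping in a different representation pair or a different scoring function yields a different bridge, and therefore a different ranked list.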

Merges. Specify how to combine the bridges, e.g., combine multiple ranked lists of documents into a single ranked list.
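One common way to combine ranked lists is reciprocal rank fusion (RRF). It is shown here only as an example of a merge; the source does not say which merge methods Ragpipe implements, and the list names below are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of doc ids; each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_ranked = ["d1", "d2", "d3"]   # ranked list from one bridge
dense_ranked = ["d2", "d4", "d1"]  # ranked list from another bridge
merged = reciprocal_rank_fusion([bm25_ranked, dense_ranked])
print(merged)
```

Documents that appear near the top of several lists accumulate the highest fused score, so the merge rewards agreement between bridges.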

Data Model. A hierarchical data structure that consists of all the (nested) documents. The data model is created from the original document files and is retained over the entire pipeline. We compute representations for arbitrary nested fields of the data, without flattening the data tree.
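To make the idea of addressing nested fields without flattening concrete, here is a small sketch that walks a nested dict/list tree and yields each matching field with its address. The tree layout, the address syntax and the `iter_field` helper are hypothetical and are not Ragpipe's actual data model.

```python
from typing import Any, Iterator

def iter_field(node: Any, path: list[str], prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (address, value) for every value reached by following `path`.

    Lists found along the way are expanded, so one path can match many nodes.
    """
    if not path:
        yield prefix, node
        return
    key, rest = path[0], path[1:]
    child = node[key]
    here = f"{prefix}.{key}" if prefix else key
    if isinstance(child, list):
        for i, item in enumerate(child):
            yield from iter_field(item, rest, f"{here}[{i}]")
    else:
        yield from iter_field(child, rest, here)

# Toy hierarchical document model (made-up data).
doc_model = {
    "title": "Policy",
    "sections": [
        {"heading": "Coverage", "paragraphs": [
            {"text": "Flood damage is covered."},
            {"text": "Fire damage is excluded."},
        ]},
        {"heading": "Payments", "paragraphs": [
            {"text": "Premium is due monthly."},
        ]},
    ],
}

fields = list(iter_field(doc_model, ["sections", "paragraphs", "text"]))
for address, value in fields:
    print(address, "->", value)
```

Because each value keeps its address in the tree, a representation computed for a paragraph can still be traced back to its section and document.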

To query over a data repository, we

  • compute the data model over the original data repository,
  • specify the document fields and the (multiple) representations to be computed for each field,
  • specify which representations to compute for the query,
  • specify bridges: which pairs of query and document field representations should be matched,
  • specify merges: how to combine multiple bridges, sequentially or in parallel, to yield a curated ranked list of relevant documents,
  • specify gen-response: how to generate a response to the query using the relevant document list and a large language model.
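The steps above lend themselves to a declarative description. The dictionary below is a hypothetical sketch of such a specification, together with a small consistency check; the key names and layout are assumptions and Ragpipe's real configuration format may differ.

```python
# Hypothetical pipeline specification; not Ragpipe's actual config schema.
pipeline_spec = {
    "representations": {
        "query": {"text": ["bm25", "dense"]},
        "doc": {"sections.paragraphs.text": ["bm25", "dense"]},
    },
    "bridges": {
        "b_bm25": {"query": ("text", "bm25"),
                   "doc": ("sections.paragraphs.text", "bm25")},
        "b_dense": {"query": ("text", "dense"),
                    "doc": ("sections.paragraphs.text", "dense")},
    },
    "merges": {
        "m_all": {"bridges": ["b_bm25", "b_dense"], "method": "reciprocal_rank_fusion"},
    },
}

def validate(spec: dict) -> list[str]:
    """Check that bridges use declared representations and merges use declared bridges."""
    errors = []
    reps = spec["representations"]
    for name, bridge in spec["bridges"].items():
        for side in ("query", "doc"):
            field, rep = bridge[side]
            if rep not in reps[side].get(field, []):
                errors.append(f"bridge {name}: undeclared {side} representation {field}#{rep}")
    for name, merge in spec["merges"].items():
        for bridge_name in merge["bridges"]:
            if bridge_name not in spec["bridges"]:
                errors.append(f"merge {name}: unknown bridge {bridge_name}")
    return errors

print(validate(pipeline_spec))
```

Keeping the specification separate from the code that executes it is one way to make it cheap to add a bridge or change a merge while iterating.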

Quick Start

See the example in the examples/insurance directory.

Key Dependencies

Ragpipe relies on

  • LlamaIndex: for parsing markdown documents
  • rank_bm25: for BM25 based retrieval
  • fastembed: for dense and sparse embeddings
  • litellm: for interacting with LLM APIs
