A Python framework for multi-modal document understanding with generative AI
Rhubarb
Rhubarb is a lightweight Python framework that makes it easy to build document understanding applications using multi-modal Large Language Models (LLMs) and embedding models. Rhubarb is built from the ground up to work with Amazon Bedrock, the Anthropic Claude V3 multi-modal language models, and the Amazon Titan Multi-modal Embedding model.
What can I do with Rhubarb?
Rhubarb can perform multiple document processing tasks, such as:
- ✅ Document Q&A
- ✅ Streaming chat with documents (Q&A)
- ✅ Document Summarization
  - 🚀 Page level summaries
  - 🚀 Full summaries
  - 🚀 Summaries of specific pages
  - 🚀 Streaming Summaries
- ✅ Extraction based on a JSON schema
  - 🚀 Key-value extractions
  - 🚀 Table extractions
- ✅ Named entity recognition (NER)
  - 🚀 With 50 built-in common entities
- ✅ PII recognition with built-in entities
- ✅ Figure and image understanding from documents
  - 🚀 Explain charts, graphs, and figures
  - 🚀 Perform table reasoning (as figures)
- ✅ Document Classification with vector sampling using multi-modal embedding models
- ✅ Auto generation of JSON schema from natural language prompts for document extraction
- ✅ Logs token usage to help keep track of costs
Rhubarb comes with built-in system prompts that make it easy to use for a number of different document understanding use cases. You can customize Rhubarb by passing in your own system prompts. It supports exact JSON-schema-based output generation, which makes it easy to integrate into downstream applications.
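To make the idea of schema-based output concrete, here is a minimal sketch of what a JSON schema for key-value extraction could look like, together with a small check that a downstream application might run on the returned data. The schema, its field names, the sample response, and the `check_against_schema` helper are all hypothetical and are not part of Rhubarb; how a schema is passed to Rhubarb is not shown here.

```python
import json

# Hypothetical schema for illustration only; the field names are assumptions
# and do not come from Rhubarb.
EMPLOYEE_SCHEMA = {
    "type": "object",
    "properties": {
        "employee_name": {"type": "string"},
        "employee_id": {"type": "string"},
        "start_year": {"type": "integer"},
    },
    "required": ["employee_name", "employee_id"],
}

# Map JSON Schema type names to the Python types produced by json.loads.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_against_schema(payload: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the payload passed.

    Only top-level required keys and primitive types are checked. Nested
    schemas would need a full JSON Schema validator.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in payload:
            problems.append(f"missing required key: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in payload:
            continue
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue
        value = payload[key]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: expected {type_name}")
    return problems


# Stand-in for a model response; a real one would come from the model.
sample_response = json.loads(
    '{"employee_name": "Jane Doe", "employee_id": "E-1001", "start_year": 2021}'
)
problems = check_against_schema(sample_response, EMPLOYEE_SCHEMA)
```

Because the output is expected to follow the schema, a lightweight check like this can act as a guard before the data is handed to the rest of an application.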
- Supports PDF, TIFF, PNG, JPG files
- Performs document to image conversion internally to work with the multi-modal models
- Works on local files or files stored in S3
- Supports specifying page numbers for multi-page documents
- Supports chat-history based chat for documents
- Supports streaming and non-streaming mode
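Since a file path can point either to a local file or to an object in S3, an application may want to check which kind of location it has before handing the path over. The following is a minimal sketch using only the Python standard library; `describe_location` is a hypothetical helper and does not reflect how Rhubarb resolves paths internally.

```python
from urllib.parse import urlparse


def describe_location(file_path: str) -> dict:
    """Classify a path as an S3 URI or a local path (hypothetical helper)."""
    parsed = urlparse(file_path)
    if parsed.scheme == "s3":
        # For s3://bucket/key, the bucket is the network location and the
        # key is the path without its leading slash.
        return {
            "kind": "s3",
            "bucket": parsed.netloc,
            "key": parsed.path.lstrip("/"),
        }
    # Anything without the s3 scheme is treated as a local path.
    return {"kind": "local", "path": file_path}


s3_info = describe_location("s3://my-bucket/docs/doc.pdf")
local_info = describe_location("./path/to/doc/doc.pdf")
```

The bucket name `my-bucket` is a placeholder for illustration.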
Installation
Start by installing Rhubarb using pip. Rhubarb is currently available on Test PyPI, and you can install it as follows:
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ rhubarb_dev
Usage
Create a boto3 session.
import boto3
session = boto3.Session()
Call Rhubarb
Local file
from rhubarb import DocAnalysis
da = DocAnalysis(file_path="./path/to/doc/doc.pdf",
                 boto3_session=session)
resp = da.run(message="What is the employee's name?")
resp
With file in Amazon S3
from rhubarb import DocAnalysis
da = DocAnalysis(file_path="s3://path/to/doc/doc.pdf",
                 boto3_session=session)
resp = da.run(message="What is the employee's name?")
resp
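The same question can be asked of several documents by looping over their paths. The sketch below relies only on the `DocAnalysis(file_path=..., boto3_session=...)` constructor and the `run(message=...)` call shown above; `ask_documents` and `_EchoAnalysis` are hypothetical names introduced for illustration, and the stand-in class exists only so the example runs without calling Amazon Bedrock.

```python
from typing import Any, Callable, Dict, Iterable


def ask_documents(
    make_analysis: Callable[[str], Any],
    file_paths: Iterable[str],
    question: str,
) -> Dict[str, Any]:
    """Ask one question of each document and collect responses by path."""
    answers = {}
    for file_path in file_paths:
        # make_analysis builds an object exposing run(message=...).
        analysis = make_analysis(file_path)
        answers[file_path] = analysis.run(message=question)
    return answers


class _EchoAnalysis:
    """Stand-in that mimics the run(message=...) call without any AWS access."""

    def __init__(self, file_path: str):
        self.file_path = file_path

    def run(self, message: str) -> dict:
        return {"file": self.file_path, "question": message}


demo = ask_documents(
    _EchoAnalysis,
    ["a.pdf", "s3://bucket/b.pdf"],
    "What is the employee's name?",
)
```

With Rhubarb installed, the factory could instead be `lambda p: DocAnalysis(file_path=p, boto3_session=session)`, so that each path gets its own `DocAnalysis` instance.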
For more usage examples see cookbooks.
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.