A Python framework for multi-modal document understanding with generative AI

Rhubarb

Rhubarb is a lightweight Python framework that makes it easy to build document understanding applications using multi-modal large language models (LLMs) and embedding models. Rhubarb is built from the ground up to work with Amazon Bedrock, the Anthropic Claude V3 multi-modal language models, and the Amazon Titan multi-modal embedding model.

What can I do with Rhubarb?

Rhubarb can perform multiple document processing tasks, such as:

  • ✅ Document Q&A
  • ✅ Streaming chat with documents (Q&A)
  • ✅ Document Summarization
    • 🚀 Page level summaries
    • 🚀 Full summaries
    • 🚀 Summaries of specific pages
    • 🚀 Streaming Summaries
  • ✅ Extraction based on a JSON schema
    • 🚀 Key-value extractions
    • 🚀 Table extractions
  • ✅ Named entity recognition (NER)
    • 🚀 With 50 built-in common entities
  • ✅ PII recognition with built-in entities
  • ✅ Figure and image understanding from documents
    • 🚀 Explain charts, graphs, and figures
    • 🚀 Perform table reasoning (as figures)
  • ✅ Document Classification with vector sampling using multi-modal embedding models
  • ✅ Auto generation of JSON schema from natural language prompts for document extraction
  • ✅ Logs token usage to help keep track of costs
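Token counts become cost estimates once they are multiplied by per-token rates. The sketch below shows only that arithmetic. The `usage` dictionary shape, the `estimate_cost` helper, and both rates are assumptions made for illustration; they are not Rhubarb's response format and not real Bedrock pricing.

```python
from decimal import Decimal


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_1k: Decimal,
                  output_rate_per_1k: Decimal) -> Decimal:
    """Return the estimated cost of one call from per-1,000-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    thousand = Decimal(1000)
    return (Decimal(input_tokens) / thousand * input_rate_per_1k
            + Decimal(output_tokens) / thousand * output_rate_per_1k)


# Hypothetical usage record; the key names are assumed for this sketch.
usage = {"input_tokens": 2000, "output_tokens": 500}

# Placeholder rates, NOT real pricing. Look up current rates before relying on this.
cost = estimate_cost(usage["input_tokens"], usage["output_tokens"],
                     input_rate_per_1k=Decimal("0.01"),
                     output_rate_per_1k=Decimal("0.02"))
print(cost)
```

`Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding error.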

Rhubarb comes with built-in system prompts that make it easy to use for a number of different document understanding use cases. You can customize Rhubarb by passing in your own system prompts. It supports output generation that conforms exactly to a JSON schema, which makes it easy to integrate into downstream applications.
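To illustrate why schema-shaped output helps downstream code, here is a minimal sketch of a JSON schema for a key-value extraction and a check that a parsed response carries the required keys. The field names, the `missing_required` helper, and the sample response string are all hypothetical; the response is hand-written for this example and is not real model output. It uses only the standard library and checks top-level required keys only, so it is not a full JSON Schema validator.

```python
import json

# Hypothetical JSON schema for a key-value extraction task.
schema = {
    "type": "object",
    "properties": {
        "employee_name": {"type": "string"},
        "employee_id": {"type": "string"},
    },
    "required": ["employee_name"],
}


def missing_required(schema: dict, payload: dict) -> list[str]:
    """Return the required top-level keys that are absent from payload."""
    return [key for key in schema.get("required", []) if key not in payload]


# Illustrative response text, hand-written for this sketch.
raw = '{"employee_name": "Jane Doe"}'
record = json.loads(raw)

problems = missing_required(schema, record)
print(problems or "all required keys present")
```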

  • Supports PDF, TIFF, PNG, JPG files
  • Performs document to image conversion internally to work with the multi-modal models
  • Works on local files or files stored in S3
  • Supports specifying page numbers for multi-page documents
  • Supports chat-history based chat for documents
  • Supports streaming and non-streaming mode
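Because the same `file_path` argument accepts either a local path or an `s3://` URI, calling code sometimes needs to tell the two apart, for example to check that a local file exists before submitting it. The helper below is a sketch of one way to do that with the standard library. `split_s3_uri` is a hypothetical name and is not part of Rhubarb, and this is not a description of how Rhubarb resolves paths internally.

```python
from urllib.parse import urlparse


def split_s3_uri(path: str):
    """Return (bucket, key) for an s3:// URI, or None for any other path."""
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        return None
    # netloc holds the bucket name; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://my-bucket/docs/doc.pdf"))
print(split_s3_uri("./path/to/doc/doc.pdf"))
```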

Installation

Start by installing Rhubarb using pip. Rhubarb is currently available on TestPyPI, from which you can install it as follows:

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ rhubarb_dev

Usage

Create a boto3 session.

import boto3
session = boto3.Session()
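A bare `boto3.Session()` picks up credentials and region from the environment. If you need a specific profile or region instead, one option is to pass `profile_name` and `region_name`, which are standard `boto3.Session` parameters. The sketch below wraps that in two hypothetical helpers, `session_kwargs` and `make_session`. The profile and region values in the usage comment are placeholders, and model availability on Amazon Bedrock varies by region.

```python
def session_kwargs(profile=None, region=None):
    """Build keyword arguments for boto3.Session, skipping unset values."""
    kwargs = {}
    if profile:
        kwargs["profile_name"] = profile
    if region:
        kwargs["region_name"] = region
    return kwargs


def make_session(profile=None, region=None):
    """Create a boto3 session from an optional profile and region."""
    import boto3  # imported here so session_kwargs can be used without boto3 installed

    return boto3.Session(**session_kwargs(profile, region))


# Example (placeholder values, requires boto3 and configured credentials):
# session = make_session(profile="my-profile", region="us-east-1")
print(session_kwargs(region="us-east-1"))
```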

Call Rhubarb

Local file

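from rhubarb import DocAnalysis

# Point DocAnalysis at a local document and reuse the boto3 session created above.
da = DocAnalysis(file_path="./path/to/doc/doc.pdf",
                 boto3_session=session)
# Ask a question about the document.
resp = da.run(message="What is the employee's name?")
print(resp)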

With file in Amazon S3

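from rhubarb import DocAnalysis

# Same call as for a local file, but file_path is an S3 URI.
da = DocAnalysis(file_path="s3://path/to/doc/doc.pdf",
                 boto3_session=session)
# Ask a question about the document.
resp = da.run(message="What is the employee's name?")
print(resp)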

For more usage examples, see the cookbooks.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Download files

Source Distribution

pyrhubarb-0.0.1.tar.gz (33.4 kB view details)

Uploaded Source

Built Distribution

pyrhubarb-0.0.1-py3-none-any.whl (43.1 kB view details)

Uploaded Python 3

File details

Details for the file pyrhubarb-0.0.1.tar.gz.

File metadata

  • Download URL: pyrhubarb-0.0.1.tar.gz
  • Upload date:
  • Size: 33.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.14 Linux/6.5.0-1018-azure

File hashes

Hashes for pyrhubarb-0.0.1.tar.gz
Algorithm Hash digest
SHA256 9aa27d5d59fd26c2758b508dee399837ad3fb7e89abfbbf420305b9601d4ba53
MD5 7b8df9d1c4cfb5012a3b871795fd987f
BLAKE2b-256 c980ed7f3edd1ccdc749d5922c4f4fdc87273f1e8226a88d1253eb05f137c5eb

File details

Details for the file pyrhubarb-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: pyrhubarb-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 43.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.14 Linux/6.5.0-1018-azure

File hashes

Hashes for pyrhubarb-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b6d6ce59c2888cf6e05144158601d6cb9dcf6a40e010c68b2177d3d17eae933a
MD5 24a65f1a4812f665b3dc8f80e2a82a72
BLAKE2b-256 a5d88c4e4c9dd5142c941710ee345793e161a1537d5ff986f390297c945bf080
