
Easy-to-use wrapper for question answering over text documents using RAG and LLMs

Project description


ReportMiner is an easy-to-use wrapper for multimodal retrieval-augmented generation (RAG) tasks on technical reports. With ReportMiner, you can take any HuggingFace or Byaldi model and apply it to your reports.

🤖 Installation

Make sure to install Poppler (a PDF rendering library) first. On Linux:

sudo apt-get install -y poppler-utils

Then install the package:

pip install reportminer
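Because PDF processing depends on Poppler, a quick sanity check before running the pipeline can save a confusing error later. The sketch below assumes only that `poppler-utils` puts its command-line tools, such as `pdftoppm` and `pdfinfo`, on your `PATH`; the helper name `poppler_available` is hypothetical and not part of ReportMiner.

```python
import shutil


def poppler_available(tools=("pdftoppm", "pdfinfo")):
    """Return True if every Poppler command-line tool in `tools` is on PATH.

    Hypothetical helper (not part of ReportMiner): it only checks that the
    executables installed by poppler-utils can be found.
    """
    return all(shutil.which(tool) is not None for tool in tools)


if __name__ == "__main__":
    if poppler_available():
        print("Poppler found")
    else:
        print("Poppler not found: install poppler-utils first")
```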

🚀 Use RAG in 3 simple steps

Using ReportMiner takes just a few lines of code.

1. Set up the embedding model and the visual language model

By default, ReportMiner uses ColPali-1.2 as the embedding model and SmolVLM as the visual language model. This combination runs well even on a free-tier Colab T4 GPU with low memory.

from reportminer import rag

# Set up the embedding and visual language models
rag_models = rag.setup_model()

If you have access to a Colab Pro A100 GPU with high memory, you can take advantage of Qwen2-VL as the visual language model and enable Flash Attention 2.
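The T4/A100 distinction matters because FlashAttention-2 kernels target NVIDIA Ampere or newer GPUs (CUDA compute capability 8.0 and above): the T4 is a Turing card (7.5), while the A100 is Ampere (8.0). The sketch below checks this before you try to enable Flash Attention 2. It assumes PyTorch is installed for the live check, the function names are hypothetical, and it does not show how ReportMiner itself switches to Qwen2-VL or Flash Attention 2.

```python
def supports_flash_attention_2(capability):
    """Return True if a CUDA compute capability (major, minor) is Ampere or newer.

    FlashAttention-2 targets sm80+ GPUs, e.g. A100 (8.0); a T4 is 7.5.
    """
    major, minor = capability
    return (major, minor) >= (8, 0)


def current_gpu_supports_flash_attention_2():
    """Check the first visible CUDA GPU; return False if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # get_device_capability returns a (major, minor) tuple
    return supports_flash_attention_2(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    print("T4:", supports_flash_attention_2((7, 5)))    # False
    print("A100:", supports_flash_attention_2((8, 0)))  # True
    print("This machine:", current_gpu_supports_flash_attention_2())
```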

2. Upload your report

# Path to the PDF report
pdf_file = '/content/15-9-19a-core.pdf'

# Process PDF report by converting into embeddings
rag_models = rag.process_PDF(pdf_file, rag_models, dpi=200, index_name='pvt-rag')
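The `dpi` argument presumably sets the resolution at which each PDF page is rendered to an image before embedding (an assumption based on the Poppler dependency). Higher values preserve small print in tables and plots but produce larger images and use more memory. The hypothetical helper below does the arithmetic: a page's pixel size is its physical size in inches multiplied by the dpi.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm, height_mm, dpi):
    """Return the (width, height) in pixels of a page rendered at `dpi`.

    Hypothetical helper for illustration; not part of ReportMiner.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


if __name__ == "__main__":
    # A4 is 210 mm x 297 mm
    for dpi in (100, 200, 300):
        print(dpi, page_pixels(210, 297, dpi))
```

At `dpi=200`, an A4 page comes out at about 1654 x 2339 pixels, roughly four times the pixel count of the same page at 100 dpi.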

3. Run a query

ReportMiner supports two minimal tasks. First, you can retrieve the most relevant page using the embedding model:

rag.RAG('What does the Klinkenberg-corrected horizontal gas permeability versus porosity look like at core 7?', 
        rag_models, k=1)
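Retrieval with ColPali works by late interaction: the query and each page are embedded as many vectors (one per query token or image patch), and a page's score is the sum, over query vectors, of the highest dot product with any of that page's vectors (ColBERT-style MaxSim). The `k` argument in the call above presumably controls how many top-scoring pages are returned. The pure-Python sketch below illustrates this scoring on toy vectors; it is not ReportMiner's or Byaldi's actual implementation, and the names `maxsim_score` and `top_k_pages` are hypothetical.

```python
def maxsim_score(query_vecs, page_vecs):
    """Late-interaction (MaxSim) score: for each query vector, take its best
    dot product against any page vector, then sum over query vectors."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def top_k_pages(query_vecs, pages, k=1):
    """Rank pages (a dict of page number -> list of vectors) by MaxSim score
    and return the k best as (page_number, score) pairs."""
    scored = [(page_no, maxsim_score(query_vecs, vecs)) for page_no, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D "embeddings": two query token vectors, three pages of patch vectors
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = {
        1: [[0.9, 0.1], [0.1, 0.2]],
        2: [[0.2, 0.8], [0.7, 0.3]],
        3: [[0.1, 0.1], [0.2, 0.1]],
    }
    print(top_k_pages(query, pages, k=1))
```

Page 2 wins here because it has a strong match for each query vector, even though page 1 has the single best match for the first one.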

Second, you can ask a question and get an answer from the visual language model:

rag.Ask('How many cored intervals are there in the report? Mention the depth of each', rag_models)

👨‍💻 Future development

The following features are planned for the next version:

  1. Support for multiple PDF reports in a directory structure (the most common industrial use case)
  2. Support for fine-tuning on user reports from a specific domain
  3. Integration with vector databases for robust document indexing



Download files


Source Distribution

reportminer-0.1.3.tar.gz (7.2 kB)

File details

Details for the file reportminer-0.1.3.tar.gz.

File metadata

  • Download URL: reportminer-0.1.3.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for reportminer-0.1.3.tar.gz
Algorithm Hash digest
SHA256 43c17b4c1d2e15ce25e63b36411fd005e8bbad7a3bb6531c2604cfdb9ac2aef4
MD5 6d7ed2ea3754086b3da33ee3bb913c03
BLAKE2b-256 5c611441a9a58f1dd94682acee0f4b9b4525b599b51af8edf05d939ba00acb2d

