
Scientific Document Insight Q/A



title: Scientific Document Insights Q/A
emoji: 📝
colorFrom: yellow
colorTo: pink
sdk: streamlit
sdk_version: 1.27.2
app_file: streamlit_app.py
pinned: false
license: apache-2.0

DocumentIQA: Scientific Document Insights Q/A

Work in progress

Introduction

Question answering on scientific documents using LLMs (OpenAI, Mistral, Llama2, etc.). This application is the frontend for testing the RAG (Retrieval Augmented Generation) approach on scientific documents that we are developing at NIMS. Unlike most similar projects, we focus on scientific articles. We work only on the full text extracted with Grobid, which provides cleaner results than a raw PDF2Text converter (the approach used by most other solutions).

NER in LLM response: The responses from the LLMs are post-processed to extract physical quantities and measurements (with grobid-quantities) and material mentions (with grobid-superconductors).


Getting started

  • Select the model+embedding combination you want to use. Llama2 was removed due to API limitations (it required acknowledging its licence on both meta.com and Hugging Face).
  • Enter your API key (OpenAI or Hugging Face).
  • Upload a scientific article as a PDF document. A spinner or loading indicator is shown while processing is in progress.
  • Once the spinner stops, you can proceed to ask your questions.


Options

Context size

Changes the number of embedding chunks that are considered when answering. Each text chunk is around 250 tokens, and each question uses around 1000 tokens.
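The relation between context size and token usage can be sketched with simple arithmetic. This is a rough estimate, not the application's own accounting: the function name `estimated_context_tokens` is hypothetical, and the observation that about 1000 tokens corresponds to four chunks of about 250 tokens is an inference from the two figures above, not a documented default.

```python
# Approximate chunk size taken from the description above.
TOKENS_PER_CHUNK = 250


def estimated_context_tokens(context_size: int,
                             tokens_per_chunk: int = TOKENS_PER_CHUNK) -> int:
    """Estimate the tokens of retrieved text sent along with one question.

    `context_size` is the number of embedding chunks selected in the UI.
    The question itself and any prompt template are not counted.
    """
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    return context_size * tokens_per_chunk


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(f"{n} chunk(s): about {estimated_context_tokens(n)} tokens")
```

Raising the context size gives the model more of the document to work with, at the cost of proportionally more tokens per question.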

Query mode

By default, the mode is set to LLM (Language Model), which enables question answering. You can ask questions about the document content directly, and the system will answer them using content from the document. If you switch the mode to "Embedding", the system will instead return the chunks of the document that are semantically related to your query. This mode helps to investigate why answers are sometimes unsatisfying or incomplete.
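The difference between the two modes can be illustrated with a minimal, self-contained sketch. It is not the application's implementation: the toy word-count "embedding" stands in for a real embedding model, and the names `embed`, `cosine`, `retrieve`, `answer` and the `llm` callable are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lower-cased word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, context_size):
    """Return the `context_size` chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:context_size]


def answer(query, chunks, context_size, mode="LLM", llm=None):
    """Both modes retrieve chunks; only "LLM" mode passes them to a model."""
    selected = retrieve(query, chunks, context_size)
    if mode == "Embedding":
        return selected  # show the raw chunks for inspection
    if llm is None:
        raise ValueError("LLM mode needs an `llm` callable")
    prompt = "Context:\n" + "\n".join(selected) + f"\n\nQuestion: {query}"
    return llm(prompt)


if __name__ == "__main__":
    chunks = [
        "The sample becomes superconducting below a critical temperature of 39 K.",
        "Samples were prepared by solid state reaction.",
        "The authors thank the funding agency.",
    ]
    print(answer("What is the critical temperature?", chunks, 1, mode="Embedding"))
```

Inspecting the output of "Embedding" mode shows which passages the model would have received, which makes it easier to tell a retrieval problem from a generation problem.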

Development notes

To release a new version:

  • bump-my-version bump patch
  • git push --tags

To use docker:

  • docker run lfoppiano/document-insights-qa:latest

To install the library from PyPI:

  • pip install document-qa-engine
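After installing, one way to confirm that the package is present is to query its distribution metadata with the standard library. This is only a sketch: the helper name `installed_version` is hypothetical, and the check relies on the distribution name `document-qa-engine` used in the command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="document-qa-engine"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "document-qa-engine is not installed")
```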

Acknowledgement

This project is developed at the National Institute for Materials Science (NIMS) in Japan in collaboration with the Lambard-ML-Team.
