LLM-based tools and agents to facilitate scientific research
readet
🚧 Until I prepare more comprehensive documentation, use this README to work with the package.
⚠️ If you run this package on a Windows machine, make sure you define the paths to files accordingly.
⚠️ This documentation explains how to use the package with a minimal set of inputs and the default arguments. You can control other parameters if you want; I will add the details to the documentation soon.
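As a small sketch for the Windows note above: Python's standard pathlib module builds paths with the right separator for whatever operating system the code runs on. The folder and file names here are placeholders, and I assume the classes in this package expect plain string paths, so the path is converted with str() before use.

```python
from pathlib import Path

# Placeholder folder and file names; replace them with your own.
pdf_file = Path("my_papers") / "paper.pdf"

# pathlib picks the separator for the current operating system:
# my_papers/paper.pdf on Linux and macOS, my_papers\paper.pdf on Windows
print(pdf_file)

# Check that the file is really there before spending time on it
print(pdf_file.exists())

# Assumption: pass a plain string wherever a path argument is expected
pdf_path = str(pdf_file)
```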
readet is a package developed using LangChain for perusing scientific and technical literature, but all of its tools are applicable to other contexts.
Even though several functionalities are included in this package, such as multi-agent systems, these modules are used most frequently:
➡️ summarizers that are used to summarize a text, mostly pdf files.
➡️ RAGs or Retrieval Augmented Generation tools which can be used to ask questions about a document.
➡️ prebuilt agents that are used to download papers and patents in bulk.
Here is the current directory tree of the package:
readet
├── __init__.py
├── bots
│ ├── __init__.py
│ ├── agents.py
│ ├── chat_tools.py
│ ├── components.py
│ ├── multi_agents.py
│ └── prebuilt.py
├── core
│ ├── __init__.py
│ ├── chains.py
│ ├── knowledge_graphs.py
│ ├── rags.py
│ ├── retrievers.py
│ ├── summarizers.py
│ └── tools.py
└── utils
  ├── __init__.py
  ├── docs.py
  ├── io.py
  ├── models.py
  ├── save_load.py
  └── schemas.py
👉 How to install
I recommend setting up a virtual environment with Python 3.10:
conda create -n <name> python=3.10
Then you can activate the environment using
conda activate <name>
This keeps the package dependencies inside the virtual environment. The package can be installed using
```console
pip3 install readet
```
I also included the _requirements.txt_ file.
👉 How to use
This package uses several APIs that need API keys. Fortunately, all of them are free for a while (or forever if you do not use them too often). Here is the list of APIs:
1️⃣ OpenAI
2️⃣ Serp API
3️⃣ Anthropic
4️⃣ Tavily Search
5️⃣ LangChain
6️⃣ Hugging Face
Apply for 1️⃣ to 3️⃣ first. With these API keys you can use most of the functionalities in this package, but it is good to obtain all of the keys at some point.
The easiest way is to define all API keys in a keys.env file and load it in your environment. The keys.env file is structured as:
OPENAI_API_KEY=""
TAVILY_API_KEY=""
SERP_API_KEY=""
ANTHROPIC_API_KEY=""
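Before running anything, it can help to check that the keys you need are actually filled in. The sketch below is a standalone illustration of reading the KEY="value" format shown above; it is not how load_keys is implemented, and parse_env_text and missing_keys are hypothetical helper names. The default list of required keys follows the advice above to obtain the OpenAI, Serp API and Anthropic keys first.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines (the keys.env format) into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines, comments and anything without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # tolerate spaces around '=' and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=("OPENAI_API_KEY", "SERP_API_KEY", "ANTHROPIC_API_KEY")):
    """Return the required key names that are absent or empty."""
    return [name for name in required if not values.get(name)]


# Placeholder content; a real file would hold your own keys.
example = 'OPENAI_API_KEY ="sk-placeholder"\nSERP_API_KEY=""\n'
print(missing_keys(parse_env_text(example)))
# prints ['SERP_API_KEY', 'ANTHROPIC_API_KEY']
```

To check a real file, read it first, for example with `pathlib.Path('keys.env').read_text()`, and pass the text to parse_env_text.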
👉 example use case 1
📖 summarizers
I use the PlainSummarizer as an example:
First, import the necessary functions and classes:
# use this function to load your API keys from keys.env file
from readet.utils.io import load_keys
load_keys('keys.env')
from readet.core.summarizers import PlainSummarizer
Now define parameters:
# you can define any model from openai. Include 'openai-' before the model name.
# example: 'openai-gpt-4o'
chat_model = 'openai-gpt-4o-mini'
# degree of improvisation given to the model; 0 is preferred
temperature = 0
# instantiate the summarizer
plain_summarizer = PlainSummarizer(chat_model = chat_model, temperature = temperature)
Now specify the path to your pdf file and run the summarizer:
# note that your path might be different on Windows, macOS or Linux; use the exact path to your file
pdf_file = '../files/my_file.pdf'
response = plain_summarizer(pdf_file)
You can print the response to see the summary.
You may also run the callable on as many pdf files as you want:
pdf_files = ['./my_papers/paper.pdf', './my_patents/patent.pdf']
responses = {}
for count, pdf in enumerate(pdf_files):
    responses[f'summary_{count}'] = plain_summarizer(pdf)
Note that ingesting pdf files may take some time. For a general scientific paper it may take about 12 seconds. Later when I explain RAGs, I will describe a method to store ingested pdf files to avoid spending too much time reading pdf files from scratch.
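When summarizing many files, one unreadable pdf should not stop the whole loop. Here is a minimal sketch of a batch helper that times each call and collects failures separately. summarize_many is a hypothetical helper, and fake_summarizer is a stand-in so the example runs without API keys; in practice you would pass a summarizer instance such as plain_summarizer instead.

```python
import time
from pathlib import Path


def summarize_many(summarizer, pdf_files):
    """Call `summarizer` on each path; return (summaries, failures) keyed by file name."""
    summaries, failures = {}, {}
    for pdf in pdf_files:
        name = Path(pdf).name
        start = time.perf_counter()
        try:
            summaries[name] = summarizer(pdf)
        except Exception as error:
            # keep going if one file cannot be processed
            failures[name] = str(error)
        else:
            print(f"{name}: {time.perf_counter() - start:.1f} s")
    return summaries, failures


# Stand-in for a real summarizer (assumption: any callable that takes a path)
def fake_summarizer(pdf):
    if not str(pdf).endswith(".pdf"):
        raise ValueError("not a pdf file")
    return f"summary of {pdf}"


summaries, failures = summarize_many(fake_summarizer, ["./my_papers/paper.pdf", "./notes.txt"])
```

Keying the results by file name instead of a counter makes it easier to match a summary to its paper later.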
👉 example use case 2
📑 RAGs
RAGs are used to ask questions about a document. Say you have a pdf file and you want to ask questions about the content without reading it. A RAG ingests the pdf file, stores it in a database (a vector store), and uses an LLM to respond to your questions based on what the database holds. All RAGs in this package can keep their database on your local computer, so you do not need to add pdf files from scratch every time.
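To make the retrieval step concrete, here is a toy sketch in plain Python. It is not how the RAGs in this package are implemented: real vector stores use learned embeddings, while this one uses simple word counts and cosine similarity. ToyVectorStore and the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self):
        self.entries = []  # list of (chunk, vector) pairs

    def add(self, chunk):
        self.entries.append((chunk, embed(chunk)))

    def retrieve(self, question, k=1):
        """Return the k chunks most similar to the question."""
        query = embed(question)
        ranked = sorted(self.entries, key=lambda entry: cosine(query, entry[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = ToyVectorStore()
store.add("The Reynolds number compares inertial forces to viscous forces.")
store.add("The suspension viscosity increases with the particle volume fraction.")
store.add("The authors thank the funding agency for support.")

# The retrieved chunk is what would be handed to the LLM as context
context = store.retrieve("How does viscosity depend on volume fraction?")
print(context)
```

In a real RAG, the retrieved chunks and the question are then sent to the LLM, which writes the answer from that context.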
readet contains several RAGs, and all of them are used in the same way. Here is the list:
1️⃣ PlainRAG: simple but useful RAG to ask questions about a pdf file
2️⃣ RAGWithCitations: similar to PlainRAG, but returns the reference as well (see an example below)
3️⃣ AgenticRAG: RAG with extra checks to make sure the answer is relevant to the context of the document
4️⃣ SelfRAG: RAG with introspection, to avoid hallucination
5️⃣ AdaptiveRAG: RAG that screens the question for relevance to the document. If the question is not relevant, it answers using a Google search instead. For example, it will not try to answer a question about salsa dancing from a fluid dynamics text
I start with PlainRAG, which is the simplest model:
from readet.utils.io import load_keys
load_keys('keys.env')
from readet.core.rags import PlainRAG
You can define a RAG from scratch, or initialize it from saved data. I start with the former case:
pdf_file = './my_papers/fluidflow.pdf'
# define your RAG store path here
store_path = './myRAGS'
rag = PlainRAG(documents = pdf_file, store_path = store_path)
This will give you a function for asking questions:
rag("who are the authors of this work?")
rag("what is the relationship between fluid pressure and solid content?")
Let's start the RAG from the previously saved database (or "vector store"). This allows you to add new pdf files, or to keep asking questions about the old files.
Here are the parameters that you need to pass to the class:
# this parameter can also be None, if you do not want to add any new pdf file
new_pdf_file = './my_papers/turbulence.pdf'
# directory path
store_path = './myRAGS'
# either use a version number (e.g. 0, 1, ...) or pass 'last'
load_version_number = 'last'
rag2 = PlainRAG(documents = new_pdf_file, store_path = store_path, load_version_number = load_version_number)
Now you can ask questions.
rag2("what is the relationship between inertia and viscosity?")
Let's use RAGWithCitations as well:
from readet.utils.io import load_keys
load_keys('keys.env')
from readet.core.rags import RAGWithCitations
pdf_file = './files/HaddadiMorrisJFM2014.pdf'
store_path = './RAGStore'
rag = RAGWithCitations(pdf_file, store_path = store_path)
rag("what is the relationship between inertia and normal stress?")
And here is the answer:
'Inertia affects the normal stress in suspensions by influencing the distribution of particles and their interactions under shear flow. As inertia increases, it can lead to higher particle pressure and changes in the normal stress differences, particularly the first normal stress difference (N1), which becomes more negative with increasing inertia and volume fraction. This relationship highlights the complex interplay between inertia and stress in particle-laden fluids, where increased inertia amplifies the effects of excluded volume and alters the stress distribution within the suspension.',
'Haddadi, H. & Morris, J. F. (2023). Microstructure and Rheology of Finite Inertia Suspensions. J. Fluid Mech.'
I show one more example, using AdaptiveRAG, and then move on to the next use case. All other RAGs mentioned above work the same way:
from readet.core.rags import AdaptiveRAG
from readet.utils.io import load_keys
load_keys('keys.env')
# None loads from the existing database; to add a new file, pass its path instead, e.g. './files/fluidflow.pdf'
pdf_file = None
store_path = './RAGFluid'
# if you want to load from the database, choose a version number or 'last'; else None
load_version_number = 'last'
rag = AdaptiveRAG(documents = pdf_file, store_path = store_path, load_version_number = load_version_number)
rag("what is relationship between Reynolds number and viscosity?")
And here is the answer:
The Reynolds number (Re) is a dimensionless quantity that characterizes the flow regime in fluid dynamics, influenced by factors such as velocity, characteristic length, and viscosity. Generally, as Re increases, the effects of inertia become more significant compared to viscous forces, which can lead to changes in flow behavior. However, the viscosity itself may not show significant changes with varying Re, as indicated in the context provided.
👉 example use case 3
📚 search and download several papers from Google Scholar and arXiv
This tool has been a real convenience for me and I hope it helps you as well. Here I explain how it works. I have also included this tool as an agent in a multi-agent chatbot, which I will deploy soon. You can use this tool together with the summarizers and RAGs to peruse a large number of papers.
⚠️ To use the Download functionality, you need OpenAI and Serp API keys. Use the links in the first part of this README to obtain the API keys.
⚠️ ⚠️ Prompting is important when using this agent. Make sure to mention "search and download" if you want the agent to download the files for you. Otherwise, it will only output a list of papers with their information and download links.
from readet.utils.io import load_keys
load_keys('keys.env')
from readet.bots.prebuilt import Download
Now you can define the parameters: a path to save the downloaded files and the maximum number of papers to download. Note that if a download hits a publisher paywall, the pdf file is not downloaded. But you can use the list of papers that were found to identify those papers and ask someone to download them for you.
save_path = './pdfs'
max_results = 100
downloader = Download(save_path = save_path, max_results = max_results)
# NOTE: if you want to download the papers, explicitly mention the word 'download'
downloader("search and download all papers related to finite inertia suspension flow of ellipsoidal particles")
The downloaded files are stored in save_path. A '.txt' file containing information about the papers is also stored in the save_path directory.
For example, the first record in this file is:
*******************
Title: Numerical study of filament suspensions at finite inertia
Authors: AA Banaei, ME Rosti, L Brandt
Citation Count: 36
PDF Link: https://www.cambridge.org/core/services/aop-cambridge-core/content/view/5FA754F237DC68A6721F7C055FA08CEC/S0022112019007948a.pdf/div-class-title-numerical-study-of-filament-suspensions-at-finite-inertia-div.pdf
For example, you can send this file to colleagues via email.
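If you want to work with the records programmatically, for example to sort papers by citation count, the text file can be parsed with a few lines of Python. This sketch assumes every record follows the layout of the record shown above (a line of asterisks, then "Field: value" lines). parse_records is a hypothetical helper, the second record is made up, and the links are placeholders.

```python
import re


def parse_records(text):
    """Split the text on lines of asterisks and read the 'Field: value' lines."""
    records = []
    for block in re.split(r"^\*+[ \t]*$", text, flags=re.MULTILINE):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


# First record taken from the example above (placeholder link);
# the second record is hypothetical.
sample = """\
*******************
Title: Numerical study of filament suspensions at finite inertia
Authors: AA Banaei, ME Rosti, L Brandt
Citation Count: 36
PDF Link: https://example.org/placeholder-1.pdf
*******************
Title: A hypothetical second record
Authors: A Author, B Author
Citation Count: 3
PDF Link: https://example.org/placeholder-2.pdf
"""

records = parse_records(sample)

# Most cited papers first
records.sort(key=lambda record: int(record["Citation Count"]), reverse=True)
for record in records:
    print(record["Citation Count"], record["Title"])
```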
I am continuously adding more functionalities. I hope this package is useful for your scientific discovery 🤞