A Python package Leveraging LLMs for Research Synthesis
Research summarizer
Leveraging LLMs for Research Synthesis
This package uses Large Language Models (LLMs) to summarize research papers. It combines Natural Language Processing (NLP) techniques with LLMs to extract and summarize key sections from each paper. The summarizer focuses on the methodology, results, discussion, and conclusion sections and returns a high-level summary of the key findings and conclusions (you could extend it to cover the introduction or other parts of the paper).
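To illustrate the idea of pulling out only the sections of interest, here is a minimal sketch of heading-based section splitting. It is not the package's own implementation: the function name `split_sections`, the heading lists, and the assumption that each heading sits alone on its own line (optionally numbered) are all hypothetical.

```python
import re

# Assumed heading names; the first set mirrors the sections listed above.
TARGET_HEADINGS = {"methodology", "methods", "results", "discussion",
                   "conclusion", "conclusions"}
# Other common headings, recognized only so they end the current section.
OTHER_HEADINGS = {"abstract", "introduction", "related work",
                  "references", "acknowledgments"}

# A heading is assumed to be a line holding only an optional number and a
# title made of letters and spaces, e.g. "3. Results" or "Discussion".
HEADING_RE = re.compile(r"^\s*(?:\d+(?:\.\d+)*\.?\s+)?([A-Za-z ]+?)\s*$")


def split_sections(text):
    """Return {heading: body} for the target sections found in text."""
    collected = {}
    current = None
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        title = match.group(1).strip().lower() if match else None
        if title in TARGET_HEADINGS:
            current = title            # start collecting a wanted section
            collected[current] = []
        elif title in OTHER_HEADINGS:
            current = None             # skip sections we do not summarize
        elif current is not None:
            collected[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in collected.items()}


if __name__ == "__main__":
    paper = "1. Introduction\nBackground.\n2. Methods\nWe sampled 40 plots."
    print(split_sections(paper))
```

Real PDFs often break this assumption (headings merged with body text, running headers, two-column layouts), so a production version would need more robust detection.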
Features
- PDF Extraction: Extract text content from PDF files.
- Text Preprocessing: Clean and preprocess the extracted text for better summarization.
- Section Extraction: Identify and extract specific sections from the research paper.
- Text Summarization: Generate high-level summaries of the extracted sections using open-source LLMs such as Llama 3 and OpenAI's GPT-4 model.
- It can batch process multiple research papers at once.
- Upload a folder containing multiple research papers, and the summarizer processes all of them and returns a summary of each paper.
- The summaries are saved to a folder on your machine.
- Streamlit Interface: A user-friendly web interface for uploading PDF files and displaying summaries. You can access the web app via this link
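The batch-processing behaviour described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the package's API: `summarize_folder`, the `_summary.txt` naming scheme, and the two callables `extract_text` and `summarize` (stand-ins for the PDF extraction and LLM steps) are hypothetical names.

```python
from pathlib import Path


def summarize_folder(input_dir, output_dir, extract_text, summarize):
    """Summarize every PDF in input_dir and write one text file per paper.

    extract_text(path) -> str and summarize(text) -> str are supplied by the
    caller; they stand in for the PDF extraction and LLM summarization steps.
    """
    in_path = Path(input_dir)
    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if needed

    written = []
    for pdf in sorted(in_path.glob("*.pdf")):    # non-PDF files are ignored
        text = extract_text(pdf)
        summary = summarize(text)
        target = out_path / f"{pdf.stem}_summary.txt"  # assumed naming scheme
        target.write_text(summary, encoding="utf-8")
        written.append(target)
    return written
```

Passing the extraction and summarization steps in as functions keeps the loop independent of any particular PDF library or LLM provider.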
Installation
Clone the repository:
git clone https://github.com/drhammed/res-sum.git
Set up a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use venv\Scripts\activate
Install the required packages:
pip install -r requirements.txt
Download NLTK data:
python -m nltk.downloader punkt wordnet
Configuration
- Google Drive API Credentials:
  - Create a project on the Google Cloud Console.
  - Enable the Google Drive API.
  - Create credentials (OAuth 2.0 Client IDs) and download the credentials.json file.
  - Place the credentials.json file in the project directory. For full instructions, see my GDriveOps Python package.
- Groq API Key: Obtain an API key from Groq.
- OpenAI API Key: Obtain an API key from OpenAI.
You can then set the API keys in the .env file or in the .env.local file.
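As a rough sketch of how keys stored in a .env file can reach the program, the snippet below reads `KEY=VALUE` lines into the process environment using only the standard library. The variable names `GROQ_API_KEY` and `OPENAI_API_KEY` and the helper `load_env_file` are assumptions for illustration; the package may expect different names or use a dedicated loader.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from path into os.environ; return the parsed pairs."""
    parsed = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed                      # a missing file is not an error here
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue                       # skip blanks, comments, malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes, if any
        parsed[key] = value
        os.environ.setdefault(key, value)   # variables already set take priority
    return parsed


if __name__ == "__main__":
    load_env_file(".env")
    # Assumed variable names; check the project for the ones it actually reads.
    for name in ("GROQ_API_KEY", "OPENAI_API_KEY"):
        print(name, "is set" if name in os.environ else "is missing")
```

Keep the .env file out of version control so the keys are not published with the repository.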
Usage
Acknowledgments
- This project uses the Groq API and OpenAI's GPT-4 model for text summarization.
- Thanks to Groq for providing free-tier access to their models.
- Thanks to the Google Drive API for providing the tools to interact with Google Drive.