A Python package leveraging LLMs for research synthesis

Research summarizer

Leveraging LLMs for Research Synthesis

This package uses Large Language Models (LLMs) to summarize research papers. It combines Natural Language Processing (NLP) techniques with LLMs to extract and summarize key sections of a paper. The summarizer focuses on the methodology, results, discussion, and conclusion sections and produces a high-level summary of the key findings and conclusions. It could be extended to cover the introduction or other parts of a paper.

Features

  • PDF Extraction: Extract text content from PDF files.
  • Text Preprocessing: Clean and preprocess the extracted text for better summarization.
  • Section Extraction: Identify and extract specific sections from the research paper.
  • Text Summarization: Generate high-level summaries of the extracted sections using open-source LLMs such as Llama 3, as well as OpenAI's GPT-4 model.
  • Batch Processing: Process multiple research papers at once. Point the summarizer at a folder of papers and it returns a summary of each one.
  • Saved Summaries: The summaries are saved to a folder on your machine.
  • Streamlit Interface: A user-friendly web interface for uploading PDF files and displaying summaries. A hosted version of the web app is also available.
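
The features above can be pictured as a small pipeline. The following is a minimal sketch, not the package's actual implementation: the function names (`clean_text`, `extract_sections`, `summarize_folder`), the heading regex, and the `_summary.txt` file naming are all assumptions made for illustration. PDF reading and the LLM call are passed in as caller-supplied callables so the sketch needs only the standard library.

```python
import re
from pathlib import Path

# Hypothetical heading pattern: an optional section number, then one of the
# section names the summarizer focuses on, alone on its own line.
HEADING_RE = re.compile(
    r"^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?"
    r"(methodology|methods|results|discussion|conclusions?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def clean_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks and tidy whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)  # "sam-\npled" -> "sampled"
    text = re.sub(r"[ \t]+", " ", text)          # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)       # at most one blank line
    return text.strip()


def extract_sections(text: str) -> dict[str, str]:
    """Map each recognised heading (lower-cased) to the text that follows it."""
    matches = list(HEADING_RE.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section ends where the next recognised heading starts.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Keep the first occurrence if a heading name repeats.
        sections.setdefault(match.group(1).lower(), text[match.end():end].strip())
    return sections


def summarize_folder(input_dir, output_dir, extract_text, summarize):
    """Summarize every PDF in input_dir and save one text file per paper.

    extract_text(path) -> str and summarize(sections) -> str are supplied by
    the caller (for example a PDF reader and an LLM client); both are
    placeholders here, not functions of this package.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        sections = extract_sections(clean_text(extract_text(pdf)))
        target = out / f"{pdf.stem}_summary.txt"
        target.write_text(summarize(sections), encoding="utf-8")
        written.append(target)
    return written
```

In this sketch the last recognised section runs to the end of the text, so it may pick up references or appendices; a real extractor would likely need extra stop headings.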

Installation

  1. Clone the repository:

    git clone https://github.com/drhammed/res-sum.git
    cd res-sum

  2. Set up a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use venv\Scripts\activate

  3. Install the required packages:

    pip install -r requirements.txt

  4. Download NLTK data:

    python -m nltk.downloader punkt wordnet

Configuration

  1. Google Drive API Credentials:

  • Create a project on the Google Cloud Console.
  • Enable the Google Drive API.
  • Create credentials (OAuth 2.0 Client IDs) and download the credentials.json file.
  • Place the credentials.json file in the project directory. For full instructions, see my GDriveOps Python package.

  2. Groq API Key: Obtain an API key from Groq.

  3. OpenAI API Key: Obtain an API key from OpenAI.

You can then set the API keys in the .env file or in the .env.local file.
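
As an illustration of how keys in these files might be read, here is a minimal standard-library sketch. The variable names `GROQ_API_KEY` and `OPENAI_API_KEY` are assumptions (they are the names the Groq and OpenAI Python clients look for by default, but this README does not say which names the package expects), and `load_env_file` and `load_api_keys` are hypothetical helpers, not part of the package. Letting .env.local take precedence over .env is also an assumed convention.

```python
import os
from pathlib import Path


def load_env_file(path, environ=os.environ):
    """Read KEY=VALUE lines from path into environ; return the parsed pairs.

    Variables already present in environ are left untouched.
    """
    parsed = {}
    file = Path(path)
    if not file.is_file():
        return parsed
    for line in file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        parsed[key] = value
        environ.setdefault(key, value)
    return parsed


def load_api_keys(directory=".", environ=os.environ):
    """Load .env.local first, then .env, so .env.local values take precedence."""
    for name in (".env.local", ".env"):
        load_env_file(Path(directory) / name, environ)
    # Assumed variable names; a missing key comes back as None.
    return {name: environ.get(name) for name in ("GROQ_API_KEY", "OPENAI_API_KEY")}
```

A matching .env file would contain one `NAME=value` line per key. The python-dotenv package's `load_dotenv` function is a common ready-made alternative to the hand-written parser.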

Usage

Acknowledgments

  • This project uses the Groq API and OpenAI's GPT-4 model for text summarization.
  • Thanks to Groq for providing free-tier access to their models.
  • Thanks to the Google Drive API for providing the tools to interact with Google Drive.

Download files

Source Distribution

res_sum-0.1.0.tar.gz (10.0 kB)

Uploaded Source

Built Distribution

res_sum-0.1.0-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file res_sum-0.1.0.tar.gz.

File metadata

  • Download URL: res_sum-0.1.0.tar.gz
  • Upload date:
  • Size: 10.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for res_sum-0.1.0.tar.gz

    Algorithm    Hash digest
    SHA256       5d28ff3d80ff64d184ee32ad0b272dd3f6ae75b1adbbc6ae86e21d260b267831
    MD5          c3100719c9b8f27e8cfda91855692b49
    BLAKE2b-256  22b556c49dba7e3dcc8121c25965d28fbb861d1e249ba7a4afab9f897edb468d

File details

Details for the file res_sum-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: res_sum-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for res_sum-0.1.0-py3-none-any.whl

    Algorithm    Hash digest
    SHA256       f681d16436e475a9cd8a1a8c2d1030feacaf6ad4fa287cceb64c3cdf2f765f18
    MD5          c0f119d64df6a477211416a4a31ef3d7
    BLAKE2b-256  77a2fce52a4ff694360dc8c0eb6a9a99006e09494519277cafd03aa93543d9ff
