A happy toolkit for arxiv paper summarization and understanding.

Arxiv Summarizer

ArxivSummarizer is a Python class for summarizing arXiv papers using Hugging Face's Transformers library. It can be configured with a custom SummarizationModel, with a pre-trained model specified by name, or with built-in defaults.

Installation

Make sure you have Python 3.8 or later installed, then install Arxiv Summarizer from PyPI:

pip install arxiv-summarizer

Developers who want to tinker with the code can clone this repository and install it from the local checkout:

pip install .

Usage

As CLI

Arxiv Summarizer can be used as a CLI tool to summarize a paper by its arXiv ID:

$ python3 -m arxiv_summarizer 1234.56789v1
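
To summarize several papers in one go, the CLI call above can be wrapped in a small script. The following is a minimal sketch, not part of the package: build_command and summarize_many are hypothetical helper names, and the sketch assumes the CLI prints the summary to standard output and exits with a non-zero status on failure.

```python
import subprocess
import sys
from typing import Dict, Iterable, List


def build_command(arxiv_id: str) -> List[str]:
    """Build the argument list for one CLI call: <python> -m arxiv_summarizer <id>."""
    # sys.executable keeps the call inside the current interpreter or virtualenv.
    return [sys.executable, "-m", "arxiv_summarizer", arxiv_id]


def summarize_many(arxiv_ids: Iterable[str]) -> Dict[str, str]:
    """Run the CLI once per ID and collect whatever it prints to stdout.

    Assumption: a zero exit status means the summary was printed successfully.
    """
    summaries: Dict[str, str] = {}
    for arxiv_id in arxiv_ids:
        completed = subprocess.run(
            build_command(arxiv_id),
            capture_output=True,
            text=True,
            check=False,  # Keep going if one paper fails.
        )
        if completed.returncode == 0:
            summaries[arxiv_id] = completed.stdout.strip()
    return summaries
```

Calling summarize_many(["1234.56789v1"]) would return a dictionary that maps each successfully processed ID to the captured output.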

With Custom SummarizationModel

If you have a custom SummarizationModel and Tokenizer, you can use them with ArxivSummarizer directly:

from arxiv_summarizer import SummarizationModel, ArxivSummarizer

# Initialize your custom SummarizationModel
custom_model = SummarizationModel(
    model="your_custom_model",
    tokenizer="your_custom_tokenizer",
    max_length=512,
    do_sample=True,
)

# Initialize ArxivSummarizer with your custom model
summarizer = ArxivSummarizer(summarizer=custom_model)

# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)

With Pre-trained Model by Name

You can use a pre-trained model from Hugging Face's model hub by specifying its name:

from arxiv_summarizer import ArxivSummarizer

# Initialize ArxivSummarizer with a pre-trained model by name
summarizer = ArxivSummarizer(model="facebook/bart-large-cnn")

# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)

With Default Models

If you don't provide a specific model, ArxivSummarizer will use default models based on GPU availability:

from arxiv_summarizer import ArxivSummarizer

# Initialize ArxivSummarizer with default models
summarizer = ArxivSummarizer()

# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)
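
The README does not say which defaults are chosen, so the selection logic can only be sketched. The example below shows one way to pick a model name depending on whether a CUDA device is visible. The DEFAULT_MODELS mapping and both helper names are assumptions made for illustration; the package's real defaults may differ.

```python
def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


# Hypothetical mapping from "GPU present?" to a Hugging Face model name.
# These are NOT confirmed to be the defaults used by ArxivSummarizer.
DEFAULT_MODELS = {
    True: "facebook/bart-large-cnn",         # larger model when a GPU is present
    False: "sshleifer/distilbart-cnn-12-6",  # smaller distilled model for CPU
}


def pick_default_model(has_gpu: bool) -> str:
    """Look up the illustrative default model name for the given hardware."""
    return DEFAULT_MODELS[has_gpu]
```

The result can be passed through the model argument shown earlier, for example ArxivSummarizer(model=pick_default_model(gpu_available())), if you want to make the choice explicit rather than rely on the built-in behavior.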

Fetching a list of papers

First, we can search for a list of papers directly using the fetch_paper() function.

from rich.progress import Progress
from rich.console import Console
from rich.table import Table

from typing import List
from arxiv_summarizer.fetch_paper import fetch_paper, ArxivPaper

# Get the list of papers
papers = fetch_paper("Yoshua Bengio", max_docs=15)
results: List[ArxivPaper] = list(papers)

print(f"{len(results)} Papers Found !!!")

This will load the papers, their metadata and their summaries. Now we can download the content of each paper and show the progress using a progress bar from rich.

# Download the papers
with Progress() as progress:
    task = progress.add_task("[cyan] Downloading content...", total = len(results))

    for paper in results:
        progress.update(task, advance=1, description=f"Downloading content for paper {paper.arxiv_id}")
        _ = paper.content  # Accessing the attribute downloads the content automatically.
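
The download is triggered simply by reading the content attribute. How ArxivPaper implements this is not shown here, but the general pattern can be sketched with functools.cached_property. LazyPaper and fake_fetch below are hypothetical stand-ins for illustration, not the package's classes.

```python
from functools import cached_property
from typing import Callable, List


class LazyPaper:
    """Hypothetical stand-in for a paper whose content is fetched on demand."""

    def __init__(self, arxiv_id: str, fetch: Callable[[str], str]) -> None:
        self.arxiv_id = arxiv_id
        self._fetch = fetch

    @cached_property
    def content(self) -> str:
        # Runs only on first access; the result is then stored on the instance.
        return self._fetch(self.arxiv_id)


calls: List[str] = []


def fake_fetch(arxiv_id: str) -> str:
    """Pretend downloader that records each call instead of using the network."""
    calls.append(arxiv_id)
    return f"full text of {arxiv_id}"


paper = LazyPaper("1234.5678", fake_fetch)
first = paper.content   # Triggers the (fake) download.
second = paper.content  # Served from the cache; no second download.
```

With this pattern, a loop that touches content once per paper pays the download cost up front, and later reads, such as when building a table, are free.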

Once all the content has been downloaded, we can display the content in a tabular structure using a rich Table.

# Print the data
console = Console()

table = Table(show_header=True, header_style="bold magenta")
table.add_column("ID", style="dim")
table.add_column("Title", style="dim")
table.add_column("Authors", style="dim")
table.add_column("Content Size", style="dim")

for entry in results:
    entry: ArxivPaper  # Type hint for editors; it has no runtime effect.
    table.add_row(entry.arxiv_id, entry.name, ", ".join(entry.authors), str(len(entry.content)))

console.print(table)

Example output:

15 Papers Found !!!
Downloading content for paper 1203.4416v1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ ID           ┃ Title                                      ┃ Authors                                    ┃ Content Size ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ 1206.5533v2  │ Practical recommendations for              │ Yoshua Bengio                              │ 134815       │
│              │ gradient-based training of deep            │                                            │              │
│              │ architectures                              │                                            │              │
│ 1207.4404v1  │ Better Mixing via Deep Representations     │ Yoshua Bengio, Grégoire Mesnil, Yann       │ 31767        │
│              │                                            │ Dauphin, Salah Rifai                       │              │
│ 1305.0445v2  │ Deep Learning of Representations: Looking  │ Yoshua Bengio                              │ 121365       │
│              │ Forward                                    │                                            │              │
│ 1212.2686v1  │ Joint Training of Deep Boltzmann Machines  │ Ian Goodfellow, Aaron Courville, Yoshua    │ 13806        │
│              │                                            │ Bengio                                     │              │
│ 1703.07718v1 │ Independently Controllable Features        │ Emmanuel Bengio, Valentin Thomas, Joelle   │ 19385        │
│              │                                            │ Pineau, Doina Precup, Yoshua Bengio        │              │
│ 1211.5063v2  │ On the difficulty of training Recurrent    │ Razvan Pascanu, Tomas Mikolov, Yoshua      │ 50908        │
│              │ Neural Networks                            │ Bengio                                     │              │
│ 1206.5538v3  │ Representation Learning: A Review and New  │ Yoshua Bengio, Aaron Courville, Pascal     │ 194906       │
│              │ Perspectives                               │ Vincent                                    │              │
│ 1207.0057v1  │ Implicit Density Estimation by Local       │ Yoshua Bengio, Guillaume Alain, Salah      │ 35635        │
│              │ Moment Matching to Sample from             │ Rifai                                      │              │
│              │ Auto-Encoders                              │                                            │              │
│ 1305.6663v4  │ Generalized Denoising Auto-Encoders as     │ Yoshua Bengio, Li Yao, Guillaume Alain,    │ 33769        │
│              │ Generative Models                          │ Pascal Vincent                             │              │
│ 1311.6184v4  │ Bounding the Test Log-Likelihood of        │ Yoshua Bengio, Li Yao, Kyunghyun Cho       │ 23711        │
│              │ Generative Models                          │                                            │              │
│ 1510.02777v2 │ Early Inference in Energy-Based Models     │ Yoshua Bengio, Asja Fischer                │ 26477        │
│              │ Approximates Back-Propagation              │                                            │              │
│ 1509.05936v2 │ STDP as presynaptic activity times rate of │ Yoshua Bengio, Thomas Mesnard, Asja        │ 22030        │
│              │ change of postsynaptic activity            │ Fischer, Saizheng Zhang, Yuhuai Wu         │              │
│ 1103.2832v1  │ Autotagging music with conditional         │ Michael Mandel, Razvan Pascanu, Hugo       │ 47698        │
│              │ restricted Boltzmann machines              │ Larochelle, Yoshua Bengio                  │              │
│ 2007.15139v2 │ Deriving Differential Target Propagation   │ Yoshua Bengio                              │ 63661        │
│              │ from Iterating Approximate Inverses        │                                            │              │
│ 1203.4416v1  │ On Training Deep Boltzmann Machines        │ Guillaume Desjardins, Aaron Courville,     │ 20531        │
│              │                                            │ Yoshua Bengio                              │              │
└──────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴──────────────┘
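
Since each result exposes arxiv_id, name, authors and content, as used in the table code above, the list can also be sorted or filtered with ordinary Python. The sketch below uses a hypothetical PaperStub dataclass with placeholder values so that it runs without network access; the helper names are illustrative and not part of the package.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PaperStub:
    """Hypothetical stand-in that mirrors the attributes used in the table."""

    arxiv_id: str
    name: str
    authors: List[str]
    content: str


def largest_papers(papers: Sequence[PaperStub], top_n: int = 3) -> List[PaperStub]:
    """Return the top_n papers with the longest content, longest first."""
    return sorted(papers, key=lambda p: len(p.content), reverse=True)[:top_n]


def solo_papers(papers: Sequence[PaperStub], author: str) -> List[PaperStub]:
    """Return the papers whose only listed author is the given one."""
    return [p for p in papers if p.authors == [author]]


# Placeholder data, not real papers.
sample = [
    PaperStub("a", "First", ["Author One"], "x" * 10),
    PaperStub("b", "Second", ["Author One", "Author Two"], "x" * 30),
    PaperStub("c", "Third", ["Author Two"], "x" * 20),
]

top_two_ids = [p.arxiv_id for p in largest_papers(sample, top_n=2)]
solo_ids = [p.arxiv_id for p in solo_papers(sample, "Author One")]
```

The same helpers should work on the real results list as long as the objects provide the same attributes.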

Examples

For more detailed examples, refer to the Examples directory.

Contributing

Contributions are welcome! Please refer to the Contributing Guidelines for details on how to contribute to this project.

License

This project is licensed under the Apache License.
