A happy toolkit for arxiv paper summarization and understanding.
Arxiv Summarizer
The ArxivSummarizer is a Python class designed for summarizing arXiv documents using Hugging Face's Transformers library. It can be configured with a custom SummarizationModel or with pre-trained models, depending on user preference.
Installation
Make sure you have Python 3.8 or later installed. Then install Arxiv Summarizer and its dependencies from PyPI:
pip install arxiv-summarizer
Developers who want to tinker can git clone this repository and install it from the source tree:
pip install .
Usage
As CLI
Arxiv Summarizer can be used as a CLI tool to summarize a paper by its arXiv ID:
$ python3 -m arxiv_summarizer 1234.56789v1
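The identifier passed on the command line follows arXiv's new-style scheme: a YYMM prefix, a dot, a four- or five-digit sequence number, and an optional vN version suffix. The README does not say whether arxiv_summarizer validates identifiers itself, so the following is only a minimal sketch of such a check. The names parse_arxiv_id and ParsedId are hypothetical and not part of the package, and old-style identifiers such as hep-th/9901001 are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optionally followed by vN.
_ARXIV_ID = re.compile(r"(?P<base>\d{4}\.\d{4,5})(?:v(?P<version>\d+))?")


class ParsedId(NamedTuple):
    """Hypothetical container for the two parts of an identifier."""
    base: str
    version: Optional[int]


def parse_arxiv_id(raw: str) -> ParsedId:
    """Split a new-style arXiv identifier into its base and optional version."""
    match = _ARXIV_ID.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a new-style arXiv identifier: {raw!r}")
    version = match.group("version")
    return ParsedId(match.group("base"), int(version) if version else None)


print(parse_arxiv_id("1703.07718v1"))
print(parse_arxiv_id("1206.5533"))
```

Running a check like this before calling the CLI gives a clear error for a mistyped identifier instead of a failed download.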
With Custom SummarizationModel
If you have a custom SummarizationModel and Tokenizer, you can use them with ArxivSummarizer directly:
from arxiv_summarizer import SummarizationModel, ArxivSummarizer
# Initialize your custom SummarizationModel
custom_model = SummarizationModel(
    model="your_custom_model",
    tokenizer="your_custom_tokenizer",
    max_length=512,
    do_sample=True,
)
# Initialize ArxivSummarizer with your custom model
summarizer = ArxivSummarizer(summarizer=custom_model)
# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)
With Pre-trained Model by Name
You can use a pre-trained model from Hugging Face's model hub by specifying its name:
from arxiv_summarizer import ArxivSummarizer
# Initialize ArxivSummarizer with a pre-trained model by name
summarizer = ArxivSummarizer(model="facebook/bart-large-cnn")
# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)
With Default Models
If you don't provide a specific model, ArxivSummarizer will use default models based on GPU availability:
from arxiv_summarizer import ArxivSummarizer
# Initialize ArxivSummarizer with default models
summarizer = ArxivSummarizer()
# Generate a summary
summary = summarizer(arxiv_id="1234.5678")
print(summary)
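The README does not say which default models are chosen or how the GPU is detected. As a rough sketch of what "default models based on GPU availability" could look like, the snippet below checks for CUDA through PyTorch and picks a larger model on GPU and a distilled one on CPU. The function names and both model choices are assumptions made for illustration, not the package's actual defaults.

```python
def gpu_available() -> bool:
    """Return True if PyTorch is installed and reports a CUDA device."""
    try:
        import torch  # optional here; a missing install is treated as "no GPU"
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


# Assumed defaults for illustration only: a larger model when a GPU is
# present, a distilled (smaller, faster) one otherwise.
GPU_DEFAULT = "facebook/bart-large-cnn"
CPU_DEFAULT = "sshleifer/distilbart-cnn-12-6"


def pick_default_model(has_gpu: bool) -> str:
    """Choose a model name from the two assumed defaults."""
    return GPU_DEFAULT if has_gpu else CPU_DEFAULT


print(pick_default_model(gpu_available()))
```

The selected name could then be passed explicitly, for example ArxivSummarizer(model=pick_default_model(gpu_available())), which makes the choice visible and reproducible across machines.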
Fetching a list of papers
First, we can search for a list of papers directly using the fetch_paper() function.
from rich.progress import Progress
from rich.console import Console
from rich.table import Table
from typing import List
from arxiv_summarizer.fetch_paper import fetch_paper, ArxivPaper
# Get the list of papers
papers = fetch_paper("Yoshua Bengio", max_docs=15)
results: List[ArxivPaper] = list(papers)
print(f"{len(results)} Papers Found !!!")
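How fetch_paper() talks to arXiv is not documented here. For orientation, an equivalent search against the public arXiv API could be expressed as a query URL like the one built below. This sketch assumes the search is by author name (the au: field); the helper build_author_query is hypothetical and not part of arxiv_summarizer.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_author_query(author: str, max_docs: int = 15) -> str:
    """Build an arXiv API URL that searches by author name (hypothetical helper)."""
    params = {
        # Double quotes make the API treat the name as a single phrase.
        "search_query": f'au:"{author}"',
        "start": 0,
        "max_results": max_docs,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_author_query("Yoshua Bengio", max_docs=15))
```

The API answers such a request with an Atom feed, one entry per paper, which is where fields like the ID, title and authors would come from.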
The fetch_paper() call loads the papers, their metadata and their summaries. Next, we can download the content of each paper and show the progress using a progress bar from rich.
# Download the papers
with Progress() as progress:
    task = progress.add_task("[cyan] Downloading content...", total=len(results))
    for paper in results:
        progress.update(task, advance=1, description=f"Downloading content for paper {paper.arxiv_id}")
        _ = paper.content  # Accessing .content downloads it automatically.
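The loop above relies on .content being fetched on first access. One common way to get that behaviour in Python is functools.cached_property. The sketch below illustrates the pattern with a hypothetical LazyPaper class and a fake downloader; it is not ArxivPaper's actual implementation, which this README does not show.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import Callable, List


@dataclass
class LazyPaper:
    """Hypothetical stand-in for ArxivPaper with lazily loaded content."""
    arxiv_id: str
    loader: Callable[[str], str]  # injected so the sketch needs no network access

    @cached_property
    def content(self) -> str:
        # Runs once on first access; the result is then stored on the instance.
        return self.loader(self.arxiv_id)


calls: List[str] = []


def fake_download(arxiv_id: str) -> str:
    """Record the call and return placeholder text instead of downloading."""
    calls.append(arxiv_id)
    return f"full text of {arxiv_id}"


paper = LazyPaper("1203.4416v1", fake_download)
_ = paper.content  # first access triggers the loader
_ = paper.content  # second access is served from the cache
print(len(calls))  # prints 1
```

With this pattern, metadata-only operations such as listing titles stay cheap, and the expensive download happens only for papers whose content is actually read.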
Once all the content has been downloaded, we can display it in a tabular structure using a rich Table.
# Print the data
console = Console()
table = Table(show_header=True, header_style="bold magenta")
table.add_column("ID", style="dim")
table.add_column("Title", style="dim")
table.add_column("Authors", style="dim")
table.add_column("Content Size", style="dim")
for entry in results:
    entry: ArxivPaper
    table.add_row(entry.arxiv_id, entry.name, ", ".join(entry.authors), str(len(entry.content)))
console.print(table)
15 Papers Found !!!
Downloading content for paper 1203.4416v1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ ID ┃ Title ┃ Authors ┃ Content Size ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ 1206.5533v2 │ Practical recommendations for │ Yoshua Bengio │ 134815 │
│ │ gradient-based training of deep │ │ │
│ │ architectures │ │ │
│ 1207.4404v1 │ Better Mixing via Deep Representations │ Yoshua Bengio, Grégoire Mesnil, Yann │ 31767 │
│ │ │ Dauphin, Salah Rifai │ │
│ 1305.0445v2 │ Deep Learning of Representations: Looking │ Yoshua Bengio │ 121365 │
│ │ Forward │ │ │
│ 1212.2686v1 │ Joint Training of Deep Boltzmann Machines │ Ian Goodfellow, Aaron Courville, Yoshua │ 13806 │
│ │ │ Bengio │ │
│ 1703.07718v1 │ Independently Controllable Features │ Emmanuel Bengio, Valentin Thomas, Joelle │ 19385 │
│ │ │ Pineau, Doina Precup, Yoshua Bengio │ │
│ 1211.5063v2 │ On the difficulty of training Recurrent │ Razvan Pascanu, Tomas Mikolov, Yoshua │ 50908 │
│ │ Neural Networks │ Bengio │ │
│ 1206.5538v3 │ Representation Learning: A Review and New │ Yoshua Bengio, Aaron Courville, Pascal │ 194906 │
│ │ Perspectives │ Vincent │ │
│ 1207.0057v1 │ Implicit Density Estimation by Local │ Yoshua Bengio, Guillaume Alain, Salah │ 35635 │
│ │ Moment Matching to Sample from │ Rifai │ │
│ │ Auto-Encoders │ │ │
│ 1305.6663v4 │ Generalized Denoising Auto-Encoders as │ Yoshua Bengio, Li Yao, Guillaume Alain, │ 33769 │
│ │ Generative Models │ Pascal Vincent │ │
│ 1311.6184v4 │ Bounding the Test Log-Likelihood of │ Yoshua Bengio, Li Yao, Kyunghyun Cho │ 23711 │
│ │ Generative Models │ │ │
│ 1510.02777v2 │ Early Inference in Energy-Based Models │ Yoshua Bengio, Asja Fischer │ 26477 │
│ │ Approximates Back-Propagation │ │ │
│ 1509.05936v2 │ STDP as presynaptic activity times rate of │ Yoshua Bengio, Thomas Mesnard, Asja │ 22030 │
│ │ change of postsynaptic activity │ Fischer, Saizheng Zhang, Yuhuai Wu │ │
│ 1103.2832v1 │ Autotagging music with conditional │ Michael Mandel, Razvan Pascanu, Hugo │ 47698 │
│ │ restricted Boltzmann machines │ Larochelle, Yoshua Bengio │ │
│ 2007.15139v2 │ Deriving Differential Target Propagation │ Yoshua Bengio │ 63661 │
│ │ from Iterating Approximate Inverses │ │ │
│ 1203.4416v1 │ On Training Deep Boltzmann Machines │ Guillaume Desjardins, Aaron Courville, │ 20531 │
│ │ │ Yoshua Bengio │ │
└──────────────┴────────────────────────────────────────────┴────────────────────────────────────────────┴──────────────┘
Examples
For more detailed examples, refer to the Examples directory.
Contributing
Contributions are welcome! Please refer to the Contributing Guidelines for details on how to contribute to this project.
License
This project is licensed under the Apache License.