# thom

Summarize research papers from arXiv using LLMs.
Named after René Thom (1923-2002), the French mathematician who founded catastrophe theory and won the Fields Medal in 1958.
## Installation

```bash
pip install thom
```

Or install from source:

```bash
git clone https://github.com/thom-project/thom.git
cd thom
pip install -e .
```
## Quick Start

### Python Library

```python
import thom

# Fetch a paper from arXiv
paper = thom.fetch_paper("2301.00001")
print(paper.title)
print(paper.abstract)

# Summarize the paper
summary = thom.summarize(paper)
print(summary.summary)
print(summary.key_points)

# Search for papers
papers = thom.search("transformer attention mechanism", max_results=5)
for p in papers:
    print(f"{p.arxiv_id}: {p.title}")

# Compare multiple papers
papers = [thom.fetch_paper(id) for id in ["2301.00001", "2301.00002"]]
analysis = thom.compare(papers)
print(analysis)
```
### Command Line

```bash
# Summarize a paper
thom summarize 2301.00001

# Or use a URL
thom summarize https://arxiv.org/abs/2301.00001

# Fetch paper metadata only
thom fetch 2301.00001

# Search for papers
thom search "machine learning"

# Compare papers
thom compare 2301.00001 2301.00002

# Use different models
thom summarize 2301.00001 --model gpt-4o
thom summarize 2301.00001 --model claude-3-5-sonnet-20241022
thom summarize 2301.00001 --model ollama/llama3

# Adjust detail level
thom summarize 2301.00001 --detail brief
thom summarize 2301.00001 --detail detailed

# Output in different languages
thom summarize 2301.00001 --language french

# JSON output
thom summarize 2301.00001 --json

# List supported models
thom models
```
## Configuration

### API Keys

Set your API key via environment variables:

```bash
# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
```

Or set it programmatically:

```python
import thom

thom.set_api_key("openai", "sk-...")
```
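For a script that should run whether or not a hosted-API key is configured, you can choose the model at runtime based on which environment variables are set. `pick_model` below is a hypothetical helper, not part of thom, that falls back to a local Ollama model:

```python
import os

def pick_model() -> str:
    """Choose a summarization model based on available API keys.

    Hypothetical helper (not part of thom): prefers hosted models
    when a key is set, otherwise falls back to local Ollama.
    """
    if os.environ.get("OPENAI_API_KEY"):
        return "gpt-4o-mini"
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "claude-3-5-sonnet-20241022"
    return "ollama/llama3"
```

You could then call `thom.summarize(paper, model=pick_model())`.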
### Using Local Models with Ollama

```bash
# First, install and run Ollama
ollama run llama3

# Then use with thom
thom summarize 2301.00001 --model ollama/llama3
```

```python
import thom

paper = thom.fetch_paper("2301.00001")
summary = thom.summarize(paper, model="ollama/llama3")
```
## API Reference

### Core Functions

#### `thom.fetch_paper(identifier)`

Fetch a paper from arXiv by ID or URL.

```python
paper = thom.fetch_paper("2301.00001")
paper = thom.fetch_paper("https://arxiv.org/abs/2301.00001")
paper = thom.fetch_paper("https://arxiv.org/pdf/2301.00001.pdf")
```

Returns an `ArxivPaper` object with:

- `arxiv_id`: The arXiv identifier
- `title`: Paper title
- `authors`: List of author names
- `abstract`: Paper abstract
- `categories`: arXiv categories
- `published`: Publication date
- `pdf_url`: URL to PDF
- `arxiv_url`: URL to arXiv page
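All three identifier forms above reduce to the same arXiv ID, so normalizing them amounts to pulling that ID out of the string. A minimal sketch of such normalization (an illustration, not thom's actual code):

```python
import re

def extract_arxiv_id(identifier: str) -> str:
    # Match modern arXiv IDs like 2301.00001, optionally followed by a
    # version suffix (v2), inside bare IDs or abs/pdf URLs.
    m = re.search(r"(\d{4}\.\d{4,5})(?:v\d+)?", identifier)
    if not m:
        raise ValueError(f"no arXiv ID found in {identifier!r}")
    return m.group(1)
```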
#### `thom.summarize(paper, model="gpt-4o-mini", detail_level="medium", language="english")`

Generate a summary of a paper.

```python
summary = thom.summarize(paper)
summary = thom.summarize(paper, model="gpt-4o", detail_level="detailed")
summary = thom.summarize(paper, language="spanish")
```

Returns a `Summary` object with:

- `paper`: The original `ArxivPaper`
- `summary`: Generated summary text
- `key_points`: List of key points
- `model`: Model used for summarization
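These fields make it straightforward to render a summary as a report. A small formatting helper (hypothetical, not part of thom's API):

```python
def summary_to_markdown(title: str, summary_text: str, key_points: list[str]) -> str:
    # Render a paper title, summary text, and key points as Markdown.
    lines = [f"# {title}", "", summary_text, "", "## Key points"]
    lines.extend(f"- {point}" for point in key_points)
    return "\n".join(lines)
```

For example, `print(summary_to_markdown(summary.paper.title, summary.summary, summary.key_points))` writes a Markdown report to stdout.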
#### `thom.search(query, max_results=10, sort_by="relevance")`

Search for papers on arXiv.

```python
papers = thom.search("machine learning")
papers = thom.search("au:hinton", max_results=20)
papers = thom.search("cat:cs.LG", sort_by="submittedDate")
```
#### `thom.compare(papers, model="gpt-4o-mini")`

Generate a comparative analysis of multiple papers.

```python
papers = [thom.fetch_paper(id) for id in ids]
analysis = thom.compare(papers)
```
## Supported Models

thom uses LiteLLM for LLM support, which means you can use models from:

- OpenAI: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo`
- Anthropic: `claude-3-5-sonnet-20241022`, `claude-3-opus-20240229`, `claude-3-sonnet-20240229`
- Google: `gemini/gemini-1.5-pro`, `gemini/gemini-1.5-flash`
- Cohere: `command-r-plus`, `command-r`
- Ollama (local): `ollama/llama3`, `ollama/mistral`, `ollama/mixtral`
- Together AI: `together_ai/meta-llama/Llama-3-70b-chat-hf`
- And many more

List available models:

```python
models = thom.list_supported_models()
```
## License

MIT License
## File details

Details for the file `thom-0.1.1.tar.gz`.

### File metadata

- Download URL: thom-0.1.1.tar.gz
- Upload date:
- Size: 18.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4a558bd0b99c334c768d900c39be1f8d0387cd51512547e11429da6ce20ed4bd` |
| MD5 | `08893f4fc35bb16f756b9ee475427c08` |
| BLAKE2b-256 | `1ed5519de4d95bc3ba1caccbee02fc6a172dc7f4e150f8fcde3569f0f97d2edc` |
### Provenance

The following attestation bundles were made for `thom-0.1.1.tar.gz`:

Publisher: `python-publish.yml` on yanndebray/thom

Statement:

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: thom-0.1.1.tar.gz
- Subject digest: `4a558bd0b99c334c768d900c39be1f8d0387cd51512547e11429da6ce20ed4bd`
- Sigstore transparency entry: 805026293
- Sigstore integration time:
- Permalink: `yanndebray/thom@7d0e17791c2d4194368604d49d0a38c40805a39f`
- Branch / Tag: `refs/tags/v0.1.1`
- Owner: https://github.com/yanndebray
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `python-publish.yml@7d0e17791c2d4194368604d49d0a38c40805a39f`
- Trigger Event: release
## File details

Details for the file `thom-0.1.1-py3-none-any.whl`.

### File metadata

- Download URL: thom-0.1.1-py3-none-any.whl
- Upload date:
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `454d6e00151da347186fcbce950f9a12c8178deb3aa3b9c6f63c3bcde68480d1` |
| MD5 | `55a875b27e63046822ae91fdfd5a82d6` |
| BLAKE2b-256 | `b3e91ebc834f5492d143b54158369a7f0dae10315de40b1f5a0e4fe29c58fc6e` |
### Provenance

The following attestation bundles were made for `thom-0.1.1-py3-none-any.whl`:

Publisher: `python-publish.yml` on yanndebray/thom

Statement:

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: thom-0.1.1-py3-none-any.whl
- Subject digest: `454d6e00151da347186fcbce950f9a12c8178deb3aa3b9c6f63c3bcde68480d1`
- Sigstore transparency entry: 805026294
- Sigstore integration time:
- Permalink: `yanndebray/thom@7d0e17791c2d4194368604d49d0a38c40805a39f`
- Branch / Tag: `refs/tags/v0.1.1`
- Owner: https://github.com/yanndebray
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `python-publish.yml@7d0e17791c2d4194368604d49d0a38c40805a39f`
- Trigger Event: release