

thom

Summarize research papers from arXiv using LLMs.

Named after René Thom (1923-2002), the French mathematician who founded catastrophe theory and won the Fields Medal in 1958.

Installation

pip install thom

Or install from source:

git clone https://github.com/thom-project/thom.git
cd thom
pip install -e .

Quick Start

Python Library

import thom

# Fetch a paper from arXiv
paper = thom.fetch_paper("2301.00001")
print(paper.title)
print(paper.abstract)

# Summarize the paper
summary = thom.summarize(paper)
print(summary.summary)
print(summary.key_points)

# Search for papers
papers = thom.search("transformer attention mechanism", max_results=5)
for p in papers:
    print(f"{p.arxiv_id}: {p.title}")

# Compare multiple papers
papers = [thom.fetch_paper(id) for id in ["2301.00001", "2301.00002"]]
analysis = thom.compare(papers)
print(analysis)

Command Line

# Summarize a paper
thom summarize 2301.00001

# Or use a URL
thom summarize https://arxiv.org/abs/2301.00001

# Fetch paper metadata only
thom fetch 2301.00001

# Search for papers
thom search "machine learning"

# Compare papers
thom compare 2301.00001 2301.00002

# Use different models
thom summarize 2301.00001 --model gpt-4o
thom summarize 2301.00001 --model claude-3-5-sonnet-20241022
thom summarize 2301.00001 --model ollama/llama3

# Adjust detail level
thom summarize 2301.00001 --detail brief
thom summarize 2301.00001 --detail detailed

# Output in different languages
thom summarize 2301.00001 --language french

# JSON output
thom summarize 2301.00001 --json

# List supported models
thom models

Configuration

API Keys

Set your API key via environment variables:

# OpenAI
export OPENAI_API_KEY="sk-..."

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Or set programmatically
import thom
thom.set_api_key("openai", "sk-...")
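Environment variables and thom.set_api_key are two routes to the same credential. As an illustration of the conventional lookup order (an explicit key wins over the provider's environment variable — an assumption about typical behavior, not documented thom internals), here is a self-contained sketch with a hypothetical helper name:

```python
import os

def resolve_api_key(provider, explicit_key=None):
    """Return an API key for `provider`: an explicitly passed key wins,
    otherwise fall back to the conventional environment variable
    (OPENAI_API_KEY, ANTHROPIC_API_KEY, ...)."""
    if explicit_key:
        return explicit_key
    env_var = f"{provider.upper()}_API_KEY"
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"No API key found for {provider}; set {env_var}")
    return key
```

Failing loudly with the exact variable name to set makes missing-credential errors easy to diagnose.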

Using Local Models with Ollama

# First, install and run Ollama
ollama run llama3

# Then use with thom from the command line
thom summarize 2301.00001 --model ollama/llama3

Or from Python:

import thom

paper = thom.fetch_paper("2301.00001")
summary = thom.summarize(paper, model="ollama/llama3")

API Reference

Core Functions

thom.fetch_paper(identifier)

Fetch a paper from arXiv by ID or URL.

paper = thom.fetch_paper("2301.00001")
paper = thom.fetch_paper("https://arxiv.org/abs/2301.00001")
paper = thom.fetch_paper("https://arxiv.org/pdf/2301.00001.pdf")

Returns an ArxivPaper object with:

  • arxiv_id: The arXiv identifier
  • title: Paper title
  • authors: List of author names
  • abstract: Paper abstract
  • categories: arXiv categories
  • published: Publication date
  • pdf_url: URL to PDF
  • arxiv_url: URL to arXiv page
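Given the three accepted identifier forms (bare ID, abs URL, pdf URL), normalization is mechanical. A rough sketch of extracting the bare ID — illustrative only, not thom's internal code:

```python
import re

def normalize_arxiv_id(identifier):
    """Reduce any of the accepted forms (bare ID, abs URL, pdf URL)
    to the bare arXiv identifier, e.g. '2301.00001'."""
    identifier = identifier.strip()
    # Strip an arxiv.org URL prefix if present
    identifier = re.sub(r"^https?://arxiv\.org/(abs|pdf)/", "", identifier)
    # Drop a trailing .pdf extension
    identifier = re.sub(r"\.pdf$", "", identifier)
    return identifier
```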

thom.summarize(paper, model="gpt-4o-mini", detail_level="medium", language="english")

Generate a summary of a paper.

summary = thom.summarize(paper)
summary = thom.summarize(paper, model="gpt-4o", detail_level="detailed")
summary = thom.summarize(paper, language="spanish")

Returns a Summary object with:

  • paper: The original ArxivPaper
  • summary: Generated summary text
  • key_points: List of key points
  • model: Model used for summarization

thom.search(query, max_results=10, sort_by="relevance")

Search for papers on arXiv.

papers = thom.search("machine learning")
papers = thom.search("au:hinton", max_results=20)
papers = thom.search("cat:cs.LG", sort_by="submittedDate")
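The au: and cat: prefixes above are standard arXiv API field prefixes, and the arXiv search API lets you join such terms with AND. A small hypothetical helper (not part of thom's API) for building combined queries:

```python
def build_arxiv_query(**fields):
    """Combine arXiv API field prefixes (au:, ti:, cat:, all:) into a
    single query string joined with AND, as the arXiv search API expects."""
    parts = [f"{prefix}:{value}" for prefix, value in fields.items()]
    return " AND ".join(parts)
```

For example, build_arxiv_query(au="hinton", cat="cs.LG") yields a query string you could pass straight to thom.search.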

thom.compare(papers, model="gpt-4o-mini")

Generate a comparative analysis of multiple papers.

papers = [thom.fetch_paper(id) for id in ids]
analysis = thom.compare(papers)

Supported Models

thom uses LiteLLM as its LLM backend, so you can use models from:

  • OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
  • Anthropic: claude-3-5-sonnet-20241022, claude-3-opus-20240229, claude-3-sonnet-20240229
  • Google: gemini/gemini-1.5-pro, gemini/gemini-1.5-flash
  • Cohere: command-r-plus, command-r
  • Ollama (local): ollama/llama3, ollama/mistral, ollama/mixtral
  • Together AI: together_ai/meta-llama/Llama-3-70b-chat-hf
  • And many more

List available models:

models = thom.list_supported_models()
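Assuming list_supported_models() returns plain model-name strings like those listed above (an assumption — check the actual return type), narrowing the list to one provider is a one-liner:

```python
def models_for_provider(models, prefix):
    """Filter a list of model-name strings down to one provider,
    e.g. prefix='ollama/' for local Ollama models."""
    return [m for m in models if m.startswith(prefix)]
```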

License

MIT License

Download files

Download the file for your platform.

Source Distribution

thom-0.1.1.tar.gz (18.5 kB)


Built Distribution


thom-0.1.1-py3-none-any.whl (13.8 kB)


File details

Details for the file thom-0.1.1.tar.gz.

File metadata

  • Download URL: thom-0.1.1.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thom-0.1.1.tar.gz:

  • SHA256: 4a558bd0b99c334c768d900c39be1f8d0387cd51512547e11429da6ce20ed4bd
  • MD5: 08893f4fc35bb16f756b9ee475427c08
  • BLAKE2b-256: 1ed5519de4d95bc3ba1caccbee02fc6a172dc7f4e150f8fcde3569f0f97d2edc


Provenance

The following attestation bundles were made for thom-0.1.1.tar.gz:

Publisher: python-publish.yml on yanndebray/thom

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file thom-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: thom-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thom-0.1.1-py3-none-any.whl:

  • SHA256: 454d6e00151da347186fcbce950f9a12c8178deb3aa3b9c6f63c3bcde68480d1
  • MD5: 55a875b27e63046822ae91fdfd5a82d6
  • BLAKE2b-256: b3e91ebc834f5492d143b54158369a7f0dae10315de40b1f5a0e4fe29c58fc6e


Provenance

The following attestation bundles were made for thom-0.1.1-py3-none-any.whl:

Publisher: python-publish.yml on yanndebray/thom

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
