
res-sum

A Python package leveraging LLMs for research evidence synthesis

res-sum takes a folder of PDF research papers and produces structured summaries of each one using Large Language Models. It extracts text, builds a knowledge graph of entities and relationships across your papers, and uses hybrid retrieval (vector search + graph traversal) to produce contextually grounded summaries.

Built with ecology in mind, but works for any scientific field.

Features

  • Batch-summarize PDFs — point it at a folder, get a structured summary for each paper
  • Knowledge graph — extracts entities and relationships from your papers using LLMs, stored as a queryable NetworkX graph
  • Hybrid retrieval (GraphRAG) — combines vector similarity search (ChromaDB) with knowledge graph traversal
  • Domain-aware prompting — ecology-specific Chain-of-Thought prompts; custom domains via YAML
  • Multiple LLM providers — Ollama (local, free, default), Ollama Cloud, Groq, OpenAI, Anthropic
  • Multiple output formats — DOCX, JSON, CSV
  • Persistent storage — vector store + knowledge graph persist to disk; incremental ingestion for new papers

Installation

pip install res-sum

For additional LLM providers:

pip install res-sum[openai]        # OpenAI (GPT-4o)
pip install res-sum[anthropic]     # Anthropic (Claude)
pip install res-sum[ollama-cloud]  # Ollama Cloud API
pip install res-sum[all-providers] # All of the above

Default setup (Ollama — free, local, no API key)

If you have Ollama installed locally, res-sum works out of the box with no API key:

ollama pull llama3.2

That's it.

Quick start

Python API

from res_sum import ResSum

# Initialize (defaults: Ollama local, ecology domain)
rs = ResSum(
    llm_provider="ollama",       # or "ollama_cloud", "groq", "openai", "anthropic"
    domain="ecology",            # or "general", or path to custom YAML
)

# Ingest papers — extracts text, builds vector store + knowledge graph
rs.ingest_papers("./pdf_folder/")

# Summarize across all papers
summary = rs.summarize("What are the key findings on pollinator decline?")

# Or batch-summarize: one summary per paper, saved to disk
rs.summarize_papers(
    pdf_directory="./pdf_folder/",
    output_directory="./summaries/",
    output_format="docx",        # or "json", "csv"
)

Command line

# Batch summarize with Ollama (default)
res-sum summarize \
    --pdf_directory ./papers/ \
    --output_directory ./summaries/ \
    --domain ecology

# Use Groq instead (requires API key)
res-sum summarize \
    --pdf_directory ./papers/ \
    --output_directory ./summaries/ \
    --provider groq \
    --api_key $GROQ_API_KEY

# See available providers, models, and domains
res-sum info

LLM providers

  • Ollama (default): no API key; runs locally, so no rate limits. Install Ollama and pull a model.
  • Ollama Cloud: needs OLLAMA_API_KEY; rate limits based on plan. Use --provider ollama_cloud.
  • Groq: needs GROQ_API_KEY; free tier available. Use --provider groq.
  • OpenAI: needs OPENAI_API_KEY; pay-per-use. Use --provider openai.
  • Anthropic: needs ANTHROPIC_API_KEY; pay-per-use. Use --provider anthropic.

API keys can be passed directly or set as environment variables. They are never stored by the package.

Setting up API keys

Option 1 — Environment variables (recommended):

# Add to your ~/.zshrc or ~/.bashrc
export OLLAMA_API_KEY="your-key-here"    # for Ollama Cloud
export GROQ_API_KEY="your-key-here"      # for Groq
export OPENAI_API_KEY="your-key-here"    # for OpenAI
export ANTHROPIC_API_KEY="your-key-here" # for Anthropic

Then just specify the provider — the key is picked up automatically:

rs = ResSum(llm_provider="ollama_cloud")

Option 2 — Pass directly:

rs = ResSum(
    llm_provider="ollama_cloud",
    api_key="your-ollama-cloud-key-here",
)
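
Key resolution presumably follows the usual precedence: an explicitly passed `api_key` wins, otherwise the provider's environment variable is read. A minimal sketch of that logic (`resolve_api_key` is a hypothetical helper for illustration, not part of the res-sum API):

```python
import os

# Map each provider name to its expected environment variable.
ENV_VARS = {
    "ollama_cloud": "OLLAMA_API_KEY",
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def resolve_api_key(provider, api_key=None):
    """Prefer an explicitly passed key; fall back to the environment."""
    if api_key is not None:
        return api_key
    env_var = ENV_VARS.get(provider)
    return os.environ.get(env_var) if env_var else None
```

Local Ollama maps to no environment variable at all, which is why it needs no key.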

To get an Ollama Cloud API key, go to ollama.com/settings/keys.

Domain configurations

res-sum ships with two built-in domains:

  • ecology (default) — entity types: Species, Location, Method, Metric, Concept, Temporal. Includes ecology-specific section headers (Study Area, Field Methods, Statistical Analysis, etc.) and a 6-step Chain-of-Thought prompt.
  • general — broader entity types for any scientific field.

You can define your own domain with a YAML file:

# my_domain.yaml
name: biomedical
entity_types:
  - name: DRUG
    description: "Pharmaceutical compounds or treatments"
    examples: ["metformin", "aspirin"]
  - name: DISEASE
    description: "Medical conditions"
    examples: ["diabetes", "cancer"]
relationship_types:
  - TREATS
  - CAUSES
  - ASSOCIATED_WITH

Then point res-sum at it:

rs = ResSum(domain="./my_domain.yaml")
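
A custom domain needs at least a name, entity types, and relationship types. A quick stdlib sanity check over the parsed config, using a plain dict that mirrors the YAML above (`check_domain` is a hypothetical helper, not part of res-sum):

```python
def check_domain(cfg):
    """Return a list of problems with a parsed domain config (empty = OK)."""
    problems = []
    for key in ("name", "entity_types", "relationship_types"):
        if key not in cfg:
            problems.append(f"missing key: {key}")
    for et in cfg.get("entity_types", []):
        if "name" not in et or "description" not in et:
            problems.append(f"incomplete entity type: {et}")
    return problems

# Mirrors my_domain.yaml above.
biomedical = {
    "name": "biomedical",
    "entity_types": [
        {"name": "DRUG", "description": "Pharmaceutical compounds or treatments",
         "examples": ["metformin", "aspirin"]},
        {"name": "DISEASE", "description": "Medical conditions",
         "examples": ["diabetes", "cancer"]},
    ],
    "relationship_types": ["TREATS", "CAUSES", "ASSOCIATED_WITH"],
}
```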

Retrieval modes

  • hybrid (default): vector search + graph expansion + community context, re-ranked. Best for general summarization.
  • local: ChromaDB vector search only. Best for specific factual queries.
  • graph: graph traversal + vector lookup. Best for relational queries.
  • global: community-level summaries + vector search. Best for thematic synthesis across many papers.

summary = rs.summarize("...", mode="hybrid")  # or "local", "graph", "global"
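
The hybrid mode's re-ranking can be pictured as blending a vector-similarity score with a graph-proximity bonus. A toy illustration under assumed scoring (the function, weights, and data here are invented for the sketch, not taken from res-sum's internals):

```python
def hybrid_rank(chunks, alpha=0.7):
    """Blend vector similarity with a graph-proximity bonus.

    Each chunk is (text, vec_score in [0, 1], graph_hops from a query entity);
    fewer hops -> larger graph bonus. alpha weights the vector side.
    """
    def score(chunk):
        _, vec_score, hops = chunk
        graph_bonus = 1.0 / (1 + hops)
        return alpha * vec_score + (1 - alpha) * graph_bonus
    return sorted(chunks, key=score, reverse=True)

candidates = [
    ("wolf diet overlap", 0.62, 1),   # close in the graph
    ("pollinator decline", 0.80, 3),  # strong vector match, distant in graph
    ("soil microbiome", 0.40, 5),     # weak on both signals
]
```

A pure vector ranker would look only at the middle column; the graph bonus lets entity-adjacent chunks compete even with a lower similarity score.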

Explore your knowledge base

After ingesting papers, open an interactive dashboard to visualize and inspect everything:

rs.explore()  # opens in your browser

Or from the command line:

res-sum explore --data_dir ./knowledge_base

The dashboard has four tabs:

  • Overview — papers ingested, chunk counts, entity type breakdown, graph stats
  • Knowledge Graph — interactive graph visualization. Nodes colored by entity type, sized by connections. Click to see relationships, filter by type, search by name.
  • Vector Store — browse all text chunks by paper. See which section each chunk came from, expand to read full text.
  • Communities — entity clusters detected by the Leiden algorithm, with LLM-generated summaries explaining what connects each group.

It's a single HTML file — works offline, shareable with collaborators.

Programmatic access

# Query an entity
rs.query_graph("Canis lupus")

# Most connected entities
rs.get_central_entities(top_k=10)

# Community structure
rs.get_communities()

# Access the NetworkX graph directly
graph = rs.knowledge_graph.graph

The graph is saved as GraphML and can be imported into Neo4j or any graph visualization tool.
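
Because GraphML is plain XML, the saved graph can be inspected even without NetworkX. A stdlib sketch over a toy document (the node ids here are illustrative; the attribute keys in res-sum's actual export may differ):

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

def list_nodes(graphml_text):
    """Return the node ids found in a GraphML document."""
    root = ET.fromstring(graphml_text)
    return [n.attrib["id"] for n in root.iter(f"{GRAPHML_NS}node")]

# Toy GraphML with two entities and one relationship.
toy = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="Canis lupus"/>
    <node id="Yellowstone"/>
    <edge source="Canis lupus" target="Yellowstone"/>
  </graph>
</graphml>"""
```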

How it works

PDF files
  → Text extraction (pymupdf4llm — handles multi-column, tables)
  → Section detection (ecology-aware regex + Markdown headers)
  → Chunking (RecursiveCharacterTextSplitter)
  → ChromaDB (embed + store chunks)
  → LLM entity/relationship extraction → NetworkX knowledge graph
  → Community detection (Leiden/Louvain)
  → Hybrid retrieval (vector + graph + community)
  → LLM summarization (Chain-of-Thought prompting)
  → Output (DOCX / JSON / CSV)

All data persists to a data_dir/ folder. Adding new papers only processes what's new.
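
The chunking step relies on the recursive-splitting idea: try the coarsest separator first (paragraph breaks), and only fall back to finer ones (lines, sentences, words) for pieces that are still too long. A toy re-implementation of that idea in plain Python (not the library's actual code):

```python
def recursive_split(text, max_len=80, seps=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator that yields pieces under max_len,
    recursing with finer separators into any piece still too long."""
    if len(text) <= max_len or not seps:
        return [text]
    head, *rest = seps
    chunks = []
    for piece in text.split(head):
        if len(piece) <= max_len:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, max_len, tuple(rest)))
    return [c for c in chunks if c.strip()]
```

This keeps short paragraphs intact while long ones are broken at sentence or word boundaries rather than mid-token.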

Requirements

  • Python >= 3.9
  • Ollama installed locally (for default provider), or an API key for another provider

Contributing

Issues and pull requests are welcome on GitHub.

License

MIT
