# res-sum

A Python package leveraging LLMs for research evidence synthesis.
res-sum takes a folder of PDF research papers and produces structured summaries of each one using Large Language Models. It extracts text, builds a knowledge graph of entities and relationships across your papers, and uses hybrid retrieval (vector search + graph traversal) to produce contextually grounded summaries.
Built with ecology in mind, but works for any scientific field.
## Features
- Batch-summarize PDFs — point it at a folder, get a structured summary for each paper
- Knowledge graph — extracts entities and relationships from your papers using LLMs, stored as a queryable NetworkX graph
- Hybrid retrieval (GraphRAG) — combines vector similarity search (ChromaDB) with knowledge graph traversal
- Domain-aware prompting — ecology-specific Chain-of-Thought prompts; custom domains via YAML
- Multiple LLM providers — Ollama (local, free, default), Ollama Cloud, Groq, OpenAI, Anthropic
- Multiple output formats — DOCX, JSON, CSV
- Persistent storage — vector store + knowledge graph persist to disk; incremental ingestion for new papers
## Installation

```bash
pip install res-sum
```

For additional LLM providers:

```bash
pip install "res-sum[openai]"         # OpenAI (GPT-4o)
pip install "res-sum[anthropic]"      # Anthropic (Claude)
pip install "res-sum[ollama-cloud]"   # Ollama Cloud API
pip install "res-sum[all-providers]"  # All of the above
```
## Default setup (Ollama — free, local, no API key)

If you have Ollama installed locally, res-sum works out of the box with no API key:

```bash
ollama pull llama3.2
```

That's it.
## Quick start

### Python API

```python
from res_sum import ResSum

# Initialize (defaults: Ollama local, ecology domain)
rs = ResSum(
    llm_provider="ollama",  # or "ollama_cloud", "groq", "openai", "anthropic"
    domain="ecology",       # or "general", or path to custom YAML
)

# Ingest papers — extracts text, builds vector store + knowledge graph
rs.ingest_papers("./pdf_folder/")

# Summarize across all papers
summary = rs.summarize("What are the key findings on pollinator decline?")

# Or batch-summarize: one summary per paper, saved to disk
rs.summarize_papers(
    pdf_directory="./pdf_folder/",
    output_directory="./summaries/",
    output_format="docx",  # or "json", "csv"
)
```
### Command line

```bash
# Batch summarize with Ollama (default)
res-sum summarize \
    --pdf_directory ./papers/ \
    --output_directory ./summaries/ \
    --domain ecology

# Use Groq instead (requires API key)
res-sum summarize \
    --pdf_directory ./papers/ \
    --output_directory ./summaries/ \
    --provider groq \
    --api_key $GROQ_API_KEY

# See available providers, models, and domains
res-sum info
```
## LLM providers

| Provider | API key needed | Rate limits | How to use |
|---|---|---|---|
| Ollama (default) | No | None (runs locally) | Install Ollama, pull a model |
| Ollama Cloud | Yes (`OLLAMA_API_KEY`) | Based on plan | `--provider ollama_cloud` |
| Groq | Yes (`GROQ_API_KEY`) | Free tier available | `--provider groq` |
| OpenAI | Yes (`OPENAI_API_KEY`) | Pay-per-use | `--provider openai` |
| Anthropic | Yes (`ANTHROPIC_API_KEY`) | Pay-per-use | `--provider anthropic` |
API keys can be passed directly or set as environment variables. They are never stored by the package.
## Setting up API keys

Option 1 — Environment variables (recommended):

```bash
# Add to your ~/.zshrc or ~/.bashrc
export OLLAMA_API_KEY="your-key-here"     # for Ollama Cloud
export GROQ_API_KEY="your-key-here"       # for Groq
export OPENAI_API_KEY="your-key-here"     # for OpenAI
export ANTHROPIC_API_KEY="your-key-here"  # for Anthropic
```

Then just specify the provider — the key is picked up automatically:

```python
rs = ResSum(llm_provider="ollama_cloud")
```

Option 2 — Pass directly:

```python
rs = ResSum(
    llm_provider="ollama_cloud",
    api_key="your-ollama-cloud-key-here",
)
```
To get an Ollama Cloud API key, go to ollama.com/settings/keys.
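The two options above boil down to a simple fallback: an explicitly passed key wins, otherwise the environment is consulted. A minimal sketch of that pattern (`resolve_api_key` is a hypothetical helper name, not res-sum's internal API):

```python
import os

def resolve_api_key(env_var="OLLAMA_API_KEY", explicit=None):
    """Prefer an explicitly passed key, else fall back to the environment."""
    if explicit is not None:
        return explicit
    key = os.environ.get(env_var)
    if key is None:
        raise ValueError(f"No API key found: pass api_key=... or set {env_var}")
    return key
```

Either way, the key only lives in your process environment or call site; nothing is written to disk.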
## Domain configurations

res-sum ships with two built-in domains:

- `ecology` (default) — entity types: Species, Location, Method, Metric, Concept, Temporal. Includes ecology-specific section headers (Study Area, Field Methods, Statistical Analysis, etc.) and a 6-step Chain-of-Thought prompt.
- `general` — broader entity types for any scientific field.
You can define your own domain with a YAML file:

```yaml
# my_domain.yaml
name: biomedical
entity_types:
  - name: DRUG
    description: "Pharmaceutical compounds or treatments"
    examples: ["metformin", "aspirin"]
  - name: DISEASE
    description: "Medical conditions"
    examples: ["diabetes", "cancer"]
relationship_types:
  - TREATS
  - CAUSES
  - ASSOCIATED_WITH
```

```python
rs = ResSum(domain="./my_domain.yaml")
```
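A domain file is just data, so it is easy to sanity-check before handing it to res-sum. A minimal loader/validator for the schema shown above (illustrative only; `load_domain` is a hypothetical name and res-sum's own validation may check more):

```python
import yaml  # pip install pyyaml

REQUIRED_KEYS = {"name", "entity_types", "relationship_types"}

def load_domain(path):
    """Parse a domain YAML file and check the fields used by the schema above."""
    with open(path) as f:
        cfg = yaml.safe_load(f)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"Domain config missing keys: {sorted(missing)}")
    for entity in cfg["entity_types"]:
        if "name" not in entity:
            raise ValueError("Each entity type needs a 'name'")
    return cfg
```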
## Retrieval modes

| Mode | What it does | Best for |
|---|---|---|
| `hybrid` (default) | Vector search + graph expansion + community context, re-ranked | General summarization |
| `local` | ChromaDB vector search only | Specific factual queries |
| `graph` | Graph traversal + vector lookup | Relational queries |
| `global` | Community-level summaries + vector search | Thematic synthesis across many papers |

```python
summary = rs.summarize("...", mode="hybrid")  # or "local", "graph", "global"
```
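To make the `hybrid` idea concrete, here is one illustrative re-ranking scheme (not res-sum's actual algorithm): blend each chunk's vector-similarity score with a graph-proximity score based on how many of the chunk's entities sit within one hop of the query's seed entities.

```python
def hybrid_rank(vector_scores, adjacency, chunk_entities, seeds, alpha=0.7):
    """Rank chunk ids by alpha * vector score + (1 - alpha) * graph proximity.

    vector_scores:  {chunk_id: similarity in [0, 1]}
    adjacency:      {entity: set of neighbor entities} from the knowledge graph
    chunk_entities: {chunk_id: set of entities mentioned in the chunk}
    seeds:          entities extracted from the query
    """
    # Seed entities plus everything one hop away in the graph
    neighborhood = set(seeds)
    for seed in seeds:
        neighborhood |= adjacency.get(seed, set())

    scores = {}
    for chunk_id, vscore in vector_scores.items():
        ents = chunk_entities.get(chunk_id, set())
        gscore = len(ents & neighborhood) / len(ents) if ents else 0.0
        scores[chunk_id] = alpha * vscore + (1 - alpha) * gscore
    return sorted(scores, key=scores.get, reverse=True)
```

The graph term lets a chunk that is slightly less similar in embedding space outrank a more similar one when its entities are tightly connected to the query's entities.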
## Explore your knowledge base

After ingesting papers, open an interactive dashboard to visualize and inspect everything:

```python
rs.explore()  # opens in your browser
```

Or from the command line:

```bash
res-sum explore --data_dir ./knowledge_base
```
The dashboard has four tabs:
- Overview — papers ingested, chunk counts, entity type breakdown, graph stats
- Knowledge Graph — interactive graph visualization. Nodes colored by entity type, sized by connections. Click to see relationships, filter by type, search by name.
- Vector Store — browse all text chunks by paper. See which section each chunk came from, expand to read full text.
- Communities — entity clusters detected by the Leiden algorithm, with LLM-generated summaries explaining what connects each group.
It's a single HTML file — works offline, shareable with collaborators.
## Programmatic access

```python
# Query an entity
rs.query_graph("Canis lupus")

# Most connected entities
rs.get_central_entities(top_k=10)

# Community structure
rs.get_communities()

# Access the NetworkX graph directly
graph = rs.knowledge_graph.graph
```
The graph is saved as GraphML and can be imported into Neo4j or any graph visualization tool.
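Because GraphML is a plain, attribute-preserving XML format, the saved graph can be reloaded with NetworkX alone, outside res-sum. A sketch using a tiny stand-in graph (the node attributes are illustrative; check your data directory for the actual file name):

```python
import networkx as nx

# Build a small graph in the shape res-sum produces: typed entity nodes
# with labeled relationship edges, then round-trip it through GraphML
g = nx.Graph()
g.add_node("Canis lupus", entity_type="Species")
g.add_node("camera trap", entity_type="Method")
g.add_edge("Canis lupus", "camera trap", relation="OBSERVED_WITH")

nx.write_graphml(g, "example.graphml")
g2 = nx.read_graphml("example.graphml")

print(g2.nodes["Canis lupus"]["entity_type"])  # Species
```

The same file opens directly in Gephi or Cytoscape, or can be imported into Neo4j via its GraphML loader.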
## How it works

```
PDF files
  → Text extraction (pymupdf4llm — handles multi-column, tables)
  → Section detection (ecology-aware regex + Markdown headers)
  → Chunking (RecursiveCharacterTextSplitter)
  → ChromaDB (embed + store chunks)
  → LLM entity/relationship extraction → NetworkX knowledge graph
  → Community detection (Leiden/Louvain)
  → Hybrid retrieval (vector + graph + community)
  → LLM summarization (Chain-of-Thought prompting)
  → Output (DOCX / JSON / CSV)
```
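The community-detection step in the pipeline can be reproduced on any NetworkX graph. A sketch using NetworkX's built-in Louvain implementation (res-sum may use a different library for Leiden proper):

```python
import networkx as nx

# A toy entity graph with two clearly separated clusters
g = nx.Graph()
g.add_edges_from([
    ("bees", "pollination"), ("pollination", "crops"), ("bees", "crops"),
    ("wolves", "predation"), ("predation", "elk"), ("wolves", "elk"),
])

# Partition nodes into densely connected communities
communities = nx.community.louvain_communities(g, seed=42)
for community in communities:
    print(sorted(community))
```

Each community is a set of entities; res-sum then asks the LLM to summarize what connects the members of each set.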
All data persists to a `data_dir/` folder. Adding new papers only processes what's new.
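The "only processes what's new" behaviour can be approximated with a content-hash manifest; this is an illustrative sketch, not res-sum's actual on-disk bookkeeping:

```python
import hashlib
import json
from pathlib import Path

def new_papers(pdf_dir, manifest_path):
    """Return PDFs whose content is not yet recorded in the manifest."""
    manifest = {}
    mpath = Path(manifest_path)
    if mpath.exists():
        manifest = json.loads(mpath.read_text())

    fresh = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        digest = hashlib.sha256(pdf.read_bytes()).hexdigest()
        if manifest.get(pdf.name) != digest:
            fresh.append(pdf)          # new or changed since last run
            manifest[pdf.name] = digest

    mpath.write_text(json.dumps(manifest))
    return fresh
```

Hashing content rather than relying on file names means a revised PDF with the same name is also picked up.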
## Requirements

- Python >= 3.9
- Ollama installed locally (for the default provider), or an API key for another provider
## Contributing

Issues and pull requests are welcome on GitHub.

## License

MIT