Paper Tree
A Python library for building and analyzing academic paper citation trees using the Semantic Scholar API.
Features
- 🌳 Build Citation Trees: Construct citation networks starting from any paper
- 🔍 Automatic Deduplication: Efficiently handle papers cited multiple times
- 📊 Flexible Depth Control: Customize how deep to traverse the citation network
- 💾 Multiple Export Formats: Save to JSON or PostgreSQL
- 🚀 Rate Limit Handling: Built-in retry logic and rate limiting
- 📈 Rich Metadata: Track depth, citations, authors, and references
Installation
Basic Installation
pip install paper-tree
With PostgreSQL Support
pip install paper-tree[postgres]
Development Installation
git clone https://github.com/clodlingxi/paper_tree.git
cd paper_tree
pip install -e .[dev]
Quick Start
Building a Citation Tree
from paper_tree import CitationTreeBuilder
# Initialize builder with API key (optional but recommended)
builder = CitationTreeBuilder(api_key="your_semantic_scholar_api_key")
# Build citation tree starting from a paper
tree = builder.build_tree("ARXIV:1706.03762", max_depth=2)
print(f"Built tree with {len(tree)} papers")
print(f"Root paper: {tree.root_title}")
Exporting to JSON
from paper_tree import JSONExporter
# `tree` is the CitationTree returned by builder.build_tree(...) above
# Export to JSON file
exporter = JSONExporter()
exporter.export(tree, "citation_tree.json")
Exporting to PostgreSQL
from paper_tree import PostgreSQLExporter
# `tree` is the CitationTree returned by builder.build_tree(...) above
# Configure database connection
db_exporter = PostgreSQLExporter(
    host="localhost",
    database="paper_db",
    user="postgres",
    password="your_password",
    table_name="citation_tree",
)
# Export to database
db_exporter.export(tree, drop_existing=True)
Usage Examples
Basic Citation Tree
from paper_tree import CitationTreeBuilder, JSONExporter
# Create builder
builder = CitationTreeBuilder(api_key="your_api_key")
# Build tree from "Attention is All you Need" paper
tree = builder.build_tree("ARXIV:1706.03762", max_depth=2)
# Get statistics
stats = tree.get_statistics()
print(f"Total papers: {stats['total_papers']}")
print(f"Max depth: {stats['max_depth']}")
print(f"Papers by depth: {stats['papers_by_depth']}")
# Export to JSON
JSONExporter().export(tree, "attention_tree.json")
Building Multiple Trees
from paper_tree import CitationTreeBuilder
builder = CitationTreeBuilder(api_key="your_api_key")
# Build trees from multiple root papers
root_papers = [
    "ARXIV:1706.03762",  # Attention is All you Need
    "ARXIV:1512.03385",  # ResNet
    "ARXIV:1409.0473",   # Neural Machine Translation by Jointly Learning to Align and Translate
]
trees = builder.build_tree_from_multiple_roots(root_papers, max_depth=1)
for tree in trees:
    print(f"{tree.root_title}: {len(tree)} papers")
Working with Papers
# `tree` is a CitationTree returned by builder.build_tree(...)
# Get root paper
root = tree.get_root_paper()
print(f"Title: {root.title}")
print(f"Year: {root.year}")
print(f"Citations: {root.citation_count}")
# Get papers at specific depth
depth_1_papers = tree.get_papers_by_depth(1)
print(f"Found {len(depth_1_papers)} papers at depth 1")
# Check if paper exists in tree
if "some_paper_id" in tree:
    paper = tree.get_paper("some_paper_id")
    print(f"Found: {paper.title}")
Context Manager Usage
from paper_tree import CitationTreeBuilder
# Use context manager for automatic cleanup
with CitationTreeBuilder(api_key="your_api_key") as builder:
    tree = builder.build_tree("ARXIV:1706.03762", max_depth=2)
# API session is automatically closed
API Reference
CitationTreeBuilder
Main class for building citation trees.
Parameters:
- api_key (str, optional): Semantic Scholar API key
- rate_limit_delay (float): Delay between API requests in seconds (default: 1.5)
- max_retries (int): Maximum retry attempts for failed requests (default: 3)
Methods:
- build_tree(root_paper_id, max_depth=2, verbose=True): Build a citation tree
- build_tree_from_multiple_roots(root_paper_ids, max_depth=2, verbose=True): Build multiple trees
- close(): Close the API session
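To give a feel for what depth-limited traversal with deduplication involves, here is a minimal sketch. It is not the library's implementation: `build_depths`, `get_references`, and `toy_graph` are hypothetical names, the walk is assumed to follow each paper's references breadth-first, and an in-memory dictionary stands in for API responses.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def build_depths(
    root_id: str,
    get_references: Callable[[str], Iterable[str]],
    max_depth: int = 2,
) -> Dict[str, int]:
    """Breadth-first walk recording the depth at which each paper is first seen."""
    depths = {root_id: 0}  # paper_id -> depth; also serves as the "seen" set
    queue = deque([root_id])
    while queue:
        paper_id = queue.popleft()
        depth = depths[paper_id]
        if depth >= max_depth:
            continue  # papers at the depth limit are kept but not expanded
        for ref_id in get_references(paper_id):
            if ref_id not in depths:  # deduplication: first (shallowest) depth wins
                depths[ref_id] = depth + 1
                queue.append(ref_id)
    return depths


# Hypothetical reference graph standing in for Semantic Scholar responses.
toy_graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["E"],
}
depths = build_depths("A", lambda pid: toy_graph.get(pid, []), max_depth=2)
print(depths)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

Paper "C" is referenced by both "A" and "B" but is stored once, at depth 1, and "E" is never reached because "D" sits at the depth limit.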
CitationTree
Represents a citation tree structure.
Properties:
- root_title: Title of the root paper
- size: Total number of papers
- max_depth: Maximum depth in the tree
Methods:
- get_paper(paper_id): Get a paper by ID
- get_papers_by_depth(depth): Get all papers at a specific depth
- get_root_paper(): Get the root paper
- to_dict(): Convert to dictionary format
- get_statistics(): Get tree statistics
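As a rough illustration of the kind of summary get_statistics() reports, the sketch below computes the three keys used in the usage example (total_papers, max_depth, papers_by_depth) from plain dictionaries. The `summarize` helper, the input records, and the exact shape of papers_by_depth are assumptions for illustration, not the library's code.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(papers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute summary statistics from paper records that carry a 'depth' field."""
    by_depth = Counter(p["depth"] for p in papers)
    return {
        "total_papers": len(papers),
        "max_depth": max(by_depth) if by_depth else 0,
        "papers_by_depth": dict(sorted(by_depth.items())),
    }


# Hypothetical records; only the fields needed for the summary are shown.
papers = [
    {"paper_id": "p0", "depth": 0},
    {"paper_id": "p1", "depth": 1},
    {"paper_id": "p2", "depth": 1},
    {"paper_id": "p3", "depth": 2},
]
stats = summarize(papers)
print(stats)
# {'total_papers': 4, 'max_depth': 2, 'papers_by_depth': {0: 1, 1: 2, 2: 1}}
```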
JSONExporter
Export citation trees to JSON format.
Methods:
- export(tree, filename, indent=2, ensure_ascii=False): Export tree to JSON file
- load(filename): Load tree from JSON file (returns dict)
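The indent and ensure_ascii parameters share their names with options of Python's standard json module. Assuming they are passed through to it (an assumption, not confirmed here), the sketch below shows their effect on a hypothetical record with a non-ASCII title.

```python
import json

# Hypothetical record; the library's actual JSON layout may differ.
record = {"paper_id": "example-id", "title": "Über Attention"}

escaped = json.dumps(record)  # json defaults: one line, non-ASCII escaped
readable = json.dumps(record, indent=2, ensure_ascii=False)

print(escaped)   # the title is written as "\u00dcber Attention"
print(readable)  # pretty-printed over several lines, "Über" kept as-is

# Both forms decode back to the same data.
assert json.loads(escaped) == json.loads(readable) == record
```

With ensure_ascii=False, titles and author names in other scripts stay readable in the exported file, so the file should be opened with UTF-8 encoding when read back.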
PostgreSQLExporter
Export citation trees to PostgreSQL database.
Parameters:
- host (str): Database host (default: 'localhost')
- port (int): Database port (default: 5432)
- database (str): Database name (default: 'paper_db')
- user (str): Database user (default: 'postgres')
- password (str): Database password
- table_name (str): Table name (default: 'citation_tree')
Methods:
export(tree, drop_existing=False, verbose=True): Export tree to database
Data Structure
Paper Object
Each paper contains:
- paper_id: Unique identifier
- title: Paper title
- year: Publication year
- citation_count: Number of citations
- abstract: Paper abstract
- authors: List of authors (with id and name)
- depth: Depth in the citation tree (0 = root)
- references: List of referenced paper IDs
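The fields above can be pictured as a small dataclass. This is an illustrative stand-in only: `PaperRecord` is a hypothetical name, and the field types, defaults, and the "id"/"name" keys for authors are assumptions rather than the library's actual class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PaperRecord:
    """Illustrative stand-in for the paper object (not the library's class)."""

    paper_id: str
    title: Optional[str] = None
    year: Optional[int] = None
    citation_count: Optional[int] = None
    abstract: Optional[str] = None
    # Each author is assumed to look like {"id": ..., "name": ...}
    authors: List[Dict[str, str]] = field(default_factory=list)
    depth: int = 0  # 0 = root
    references: List[str] = field(default_factory=list)  # referenced paper IDs


root = PaperRecord(
    paper_id="example-id",
    title="Example Paper",
    year=2017,
    authors=[{"id": "a1", "name": "Example Author"}],
    references=["ref-1", "ref-2"],
)
print(root.depth, len(root.references))  # 0 2
```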
Database Schema
When exporting to PostgreSQL, the following table is created:
CREATE TABLE citation_tree (
    paper_id VARCHAR(255) PRIMARY KEY,
    title TEXT,
    year INTEGER,
    citation_count INTEGER,
    abstract TEXT,
    authors JSONB,
    depth INTEGER NOT NULL,
    "references" JSONB,
    root_title TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
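To show how a paper might map onto this table, here is a hedged sketch that turns a paper dictionary into a row, serializing the two JSONB columns with json.dumps. `INSERT_SQL`, `to_row`, and the sample paper are hypothetical; the statement uses psycopg2-style %s placeholders and is not necessarily the query the exporter runs. Note that "references" stays quoted because it is a reserved word in PostgreSQL.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical insert statement matching the schema above (created_at uses its default).
INSERT_SQL = (
    "INSERT INTO citation_tree "
    '(paper_id, title, year, citation_count, abstract, authors, depth, "references", root_title) '
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"
)


def to_row(paper: Dict[str, Any], root_title: str) -> Tuple[Any, ...]:
    """Order a paper's fields to match INSERT_SQL, encoding list fields as JSON text."""
    return (
        paper["paper_id"],
        paper.get("title"),
        paper.get("year"),
        paper.get("citation_count"),
        paper.get("abstract"),
        json.dumps(paper.get("authors", [])),
        paper["depth"],
        json.dumps(paper.get("references", [])),
        root_title,
    )


paper = {
    "paper_id": "example-id",
    "title": "Example Paper",
    "depth": 1,
    "authors": [{"id": "a1", "name": "Example Author"}],
    "references": ["ref-1"],
}
row = to_row(paper, root_title="Example Root")
print(row)
```

With an open psycopg2 cursor, such a row could be passed as `cursor.execute(INSERT_SQL, row)`.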
Configuration
API Rate Limits
The library automatically handles Semantic Scholar API rate limits:
- Default delay: 1.5 seconds between requests
- Automatic retry with exponential backoff on 429 errors
- Batch requests up to 500 papers per call
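The behaviour listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: `RateLimited`, `chunked`, and `with_backoff` are hypothetical names, and doubling the delay from a 1.5 second base is one common way to implement exponential backoff.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a fetch function raises on an HTTP 429 response."""


def chunked(items: Sequence[str], size: int = 500) -> Iterator[List[str]]:
    """Split paper IDs into batches of at most `size` per call."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimited with a delay that doubles each time."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1.5 s, 3 s, 6 s, ...
            attempt += 1


ids = [f"paper-{i}" for i in range(1200)]
print([len(batch) for batch in chunked(ids)])  # [500, 500, 200]

# A fake request that is rate limited twice before succeeding.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"


delays: List[float] = []
result = with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)  # ok [1.5, 3.0]
```

Passing `sleep` as a parameter keeps the example fast to run and makes the retry schedule easy to inspect.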
Getting an API Key
While optional, using an API key provides higher rate limits:
- Visit Semantic Scholar API
- Sign up for an API key
- Use it when initializing the builder:
from paper_tree import CitationTreeBuilder
builder = CitationTreeBuilder(api_key="your_api_key_here")
Examples
See the examples/ directory for more detailed examples:
- examples/basic_usage.py: Basic citation tree building
- examples/export_json.py: JSON export example
- examples/export_postgres.py: PostgreSQL export example
- examples/analyze_tree.py: Tree analysis and statistics
Requirements
- Python >= 3.8
- requests >= 2.28.0
- psycopg2-binary >= 2.9.0 (for PostgreSQL support)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this library in your research, please cite:
@software{paper_tree,
  title = {Paper Tree: A Python Library for Citation Tree Analysis},
  author = {Paper Tree Contributors},
  year = {2025},
  url = {https://github.com/clodlingxi/paper_tree}
}
Acknowledgments
- Data provided by Semantic Scholar API
- Inspired by the need for better citation network analysis tools
Support
- 📫 Issues: GitHub Issues
- 📖 Documentation: GitHub Wiki
- 💬 Discussions: GitHub Discussions