# 🦀 CrabScholar

Research paper analysis pipeline with citation crawling, pluggable LLM prompts, and knowledge graph building.
## Features

- Multi-input: analyze papers by title, DOI, keywords, URL, local PDF, or raw text
- Citation crawling: BFS traversal of references/citations via the Semantic Scholar API (configurable depth, default 3)
- 5 default analysis dimensions (LLM evaluation focus):
  - Paper Analysis: overview, contributions, methodology
  - Dataset Crafting: data creation, annotation, preprocessing
  - Evaluation Method: benchmarks, baselines, evaluation setup
  - Metrics: specific metrics, reported results
  - Statistical Tests: significance tests, confidence intervals, rigor
- Pluggable prompts: add YAML files for custom dimensions or override the defaults
- Knowledge graph: NetworkX-based graph with paper/author/method/dataset/metric entities
- Multi-provider LLM: via LiteLLM (OpenAI, Anthropic, Ollama, vLLM, etc.) with a fallback chain
- Export: JSON, GraphML, GEXF, CSV
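The fallback-chain behavior can be pictured as trying each configured model in order until one succeeds. A minimal sketch, where `call_model` stands in for a LiteLLM completion call; the function names here are illustrative, not CrabScholar's actual API:

```python
# Minimal sketch of a fallback chain: try each model in order until one
# succeeds; collect failures so the final error explains what went wrong.
def complete_with_fallback(prompt, models, call_model):
    errors = []
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # a real client would catch provider-specific errors
            errors.append((model, exc))
    raise RuntimeError(f"all models failed: {errors}")
```

With `default_model` tried first and `fallback_models` appended after it, a transient failure on the primary provider degrades to a cheaper model instead of aborting the whole analysis.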
## Installation

```bash
uv sync
```
## Quick Start

```bash
# Initialize project config
uv run crab init

# Edit .env with your API key
nano .env

# Analyze a paper by title
uv run crab analyze "attention is all you need"

# Search by keywords
uv run crab analyze --keywords "LLM evaluation, benchmark contamination"

# Analyze a local PDF
uv run crab analyze --pdf paper.pdf

# Control crawl depth
uv run crab analyze "GPT-4 Technical Report" --depth 5

# Search without analyzing
uv run crab search "transformer evaluation"

# Build knowledge graph from results
uv run crab build

# Export graph
uv run crab export json
uv run crab export graphml
uv run crab export csv

# List analysis dimensions
uv run crab dimensions

# Show config
uv run crab info
```
## Configuration

Settings are resolved in order of precedence: CLI flags > environment variables (`CRAB_` prefix) > `.env` > `crab.yaml` > defaults.

```yaml
# crab.yaml
default_model: openai/gpt-4o-mini
fallback_models:
  - openai/gpt-3.5-turbo
  - anthropic/claude-3-haiku-20240307
citation_depth: 3
max_papers: 50
output: output
concurrency: 4
```
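The precedence chain amounts to a layered merge: start from defaults, then let each higher-priority source override. A small sketch, illustrative only (not CrabScholar's actual resolver; keys and values are taken from the example config):

```python
# Layered settings merge: callers pass layers ordered highest priority first
# (CLI > env > .env > crab.yaml > defaults). We apply them low-to-high so
# higher-priority layers overwrite lower ones.
def resolve_settings(*layers):
    merged = {}
    for layer in reversed(layers):
        merged.update(layer)
    return merged

cli = {"citation_depth": 5}                 # e.g. --depth 5
env = {"max_papers": 20}                    # e.g. CRAB_MAX_PAPERS=20
yaml_cfg = {"citation_depth": 3, "max_papers": 50, "concurrency": 4}
defaults = {"citation_depth": 3, "max_papers": 50, "concurrency": 4, "output": "output"}

settings = resolve_settings(cli, env, {}, yaml_cfg, defaults)
# citation_depth comes from the CLI flag, max_papers from the env var
```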
## Custom Prompts

Create YAML files in a custom directory:

```yaml
# my_prompts/bias_analysis.yaml
name: bias_analysis
display_name: "Bias Analysis"
description: "Analyze papers for bias in LLM evaluation"
system_message: "You are a bias analysis expert..."
extraction_prompt: |
  Analyze the paper for potential biases...
  Paper: {title}
  Text: {paper_text}
  ...
```

Then use:

```bash
uv run crab analyze "paper" --prompts-dir my_prompts/
```
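Once parsed, a prompt file like the one above is essentially a template with `{title}` and `{paper_text}` placeholders. A minimal rendering sketch; the dict stands in for the parsed YAML, and `render_prompt` is illustrative, not CrabScholar's API:

```python
# Mirrors the bias_analysis.yaml example after YAML parsing.
prompt_spec = {
    "name": "bias_analysis",
    "system_message": "You are a bias analysis expert...",
    "extraction_prompt": (
        "Analyze the paper for potential biases...\n"
        "Paper: {title}\n"
        "Text: {paper_text}"
    ),
}

def render_prompt(spec, **fields):
    # Fill the template's placeholders with the paper's data.
    return spec["extraction_prompt"].format(**fields)

message = render_prompt(
    prompt_spec,
    title="Attention Is All You Need",
    paper_text="...",
)
```

Any field name used in `extraction_prompt` must match a value the pipeline supplies at analysis time, which is what makes the dimensions pluggable.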
## Python API

```python
from crab_scholar.pipeline import run_pipeline
from crab_scholar.config import CrabConfig

config = CrabConfig(
    default_model="openai/gpt-4o-mini",
    citation_depth=3,
)

kg = run_pipeline(input_query="attention is all you need", config=config)
print(f"Entities: {kg.entity_count}, Relations: {kg.relation_count}")
```
## Architecture

```
Input (query/DOI/PDF/text)
        ↓
Scholar API → Resolve paper
        ↓
BFS Crawler → Expand citations/references (depth=N)
        ↓
Fetcher → Download PDFs, extract text
        ↓
Analyzer → Run pluggable dimensions (5 defaults)
        ↓
Graph Builder → Entities + Relations → NetworkX
        ↓
Export → JSON / GraphML / GEXF / CSV
```
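The BFS Crawler step is a depth-limited breadth-first traversal. A sketch, where `get_references` stands in for a Semantic Scholar lookup; the real crawler also expands citations (not just references) and honors `max_papers`:

```python
from collections import deque

def crawl(seed, get_references, max_depth=3):
    """Depth-limited BFS from a seed paper; returns papers in visit order."""
    seen = {seed}
    queue = deque([(seed, 0)])
    order = []
    while queue:
        paper, depth = queue.popleft()
        order.append(paper)
        if depth == max_depth:
            continue  # don't expand beyond the configured depth
        for ref in get_references(paper):
            if ref not in seen:  # dedupe: papers are often cited repeatedly
                seen.add(ref)
                queue.append((ref, depth + 1))
    return order
```

BFS (rather than DFS) keeps the crawl centered on the seed paper: everything at depth 1 is collected before anything at depth 2, so a `max_papers` cap cuts off the most distant papers first.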
## License

MIT
## File details

### crab_scholar-0.2.0.tar.gz (source distribution)

- Size: 178.5 kB
- Uploaded via uv 0.9.24 (Trusted Publishing: no)

| Algorithm | Hash digest |
|---|---|
| SHA256 | 78731181d0fc3dc40c4011973708354f67f22e79410c871daef8f7de9bea09c0 |
| MD5 | 9c9ed23dc25176b6e8e6bba5a535f45e |
| BLAKE2b-256 | 4d0e92295e8c4116fdad68e63c40275ffbb8b77e9abcf8c14ef5171008ea69c5 |

### crab_scholar-0.2.0-py3-none-any.whl (built distribution, Python 3)

- Size: 49.1 kB
- Uploaded via uv 0.9.24 (Trusted Publishing: no)

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9919efd3084b24ecdae750307d97b43f59cd24d6b9fa45cbfa7a1be952759243 |
| MD5 | 9eb7b18b923f9679397b9357baafb5ae |
| BLAKE2b-256 | b8766ff71eb39725d8427a84d01c040c050e76206c5c577e8d005ca100cd1728 |