
🦀 CrabScholar

Research paper analysis pipeline with citation crawling, pluggable LLM prompts, and knowledge graph building.

Features

  • Multi-input: Analyze papers by title, DOI, keywords, URL, local PDF, or raw text
  • Citation Crawling: BFS traversal of references/citations via Semantic Scholar API (configurable depth, default 3)
  • 5 Default Analysis Dimensions (LLM Evaluation focus):
    1. Paper Analysis — overview, contributions, methodology
    2. Dataset Crafting — data creation, annotation, preprocessing
    3. Evaluation Method — benchmarks, baselines, evaluation setup
    4. Metrics — specific metrics, reported results
    5. Statistical Tests — significance tests, confidence intervals, rigor
  • Pluggable Prompts: Add YAML files for custom dimensions, override defaults
  • Knowledge Graph: NetworkX-based graph with paper/author/method/dataset/metric entities
  • Multi-Provider LLM: via LiteLLM (OpenAI, Anthropic, Ollama, vLLM, and more), with a fallback chain
  • Export: JSON, GraphML, GEXF, CSV
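The citation crawl above is a breadth-first traversal with a depth cutoff. A minimal sketch of the idea, where `fetch_related` stands in for a Semantic Scholar client call (the function names here are illustrative, not CrabScholar's actual API):

```python
from collections import deque

def bfs_crawl(seed_id, fetch_related, max_depth=3, max_papers=50):
    """Breadth-first traversal of a citation graph up to max_depth hops."""
    seen = {seed_id}
    queue = deque([(seed_id, 0)])
    order = []
    while queue and len(order) < max_papers:
        paper_id, depth = queue.popleft()
        order.append(paper_id)
        if depth == max_depth:
            continue  # do not expand past the configured depth
        for ref_id in fetch_related(paper_id):
            if ref_id not in seen:
                seen.add(ref_id)
                queue.append((ref_id, depth + 1))
    return order

# Toy citation graph: A cites B and C; B cites D.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
papers = bfs_crawl("A", graph.__getitem__, max_depth=1)  # ["A", "B", "C"]
```

The `max_papers` cap mirrors the `max_papers` setting shown in the configuration section, so a deep crawl cannot grow without bound.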

Installation

uv sync

Quick Start

# Initialize project config
uv run crab init

# Edit .env with your API key
nano .env

# Analyze a paper by title
uv run crab analyze "attention is all you need"

# Search by keywords
uv run crab analyze --keywords "LLM evaluation, benchmark contamination"

# Analyze a local PDF
uv run crab analyze --pdf paper.pdf

# Control crawl depth
uv run crab analyze "GPT-4 Technical Report" --depth 5

# Search without analyzing
uv run crab search "transformer evaluation"

# Build knowledge graph from results
uv run crab build

# Export graph
uv run crab export json
uv run crab export graphml
uv run crab export csv

# List analysis dimensions
uv run crab dimensions

# Show config
uv run crab info

Configuration

Settings are resolved in precedence order: CLI flags > environment variables (CRAB_ prefix) > .env > crab.yaml > built-in defaults.

# crab.yaml
default_model: openai/gpt-4o-mini
fallback_models:
  - openai/gpt-3.5-turbo
  - anthropic/claude-3-haiku-20240307

citation_depth: 3
max_papers: 50
output: output
concurrency: 4
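The precedence chain amounts to a layered dict merge, with later layers overriding earlier ones. A hedged sketch of that resolution (names and structure are illustrative, not CrabScholar's real loader):

```python
import os

# Lowest-priority layer: built-in defaults.
DEFAULTS = {"default_model": "openai/gpt-4o-mini", "citation_depth": 3}

def load_settings(yaml_cfg=None, dotenv=None, cli=None):
    """Merge layers lowest-to-highest: defaults < crab.yaml < .env < env vars < CLI flags."""
    # Environment variables with the CRAB_ prefix, lowercased to setting names.
    env = {
        k[len("CRAB_"):].lower(): v
        for k, v in os.environ.items()
        if k.startswith("CRAB_")
    }
    merged = {}
    for layer in (DEFAULTS, yaml_cfg or {}, dotenv or {}, env, cli or {}):
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged

# A CLI flag beats a crab.yaml value for the same key.
settings = load_settings(yaml_cfg={"citation_depth": 5}, cli={"citation_depth": 2})
```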

Custom Prompts

Create YAML files in a custom directory:

# my_prompts/bias_analysis.yaml
name: bias_analysis
display_name: "Bias Analysis"
description: "Analyze papers for bias in LLM evaluation"
system_message: "You are a bias analysis expert..."
extraction_prompt: |
  Analyze the paper for potential biases...
  Paper: {title}
  Text: {paper_text}
  ...

Then use: uv run crab analyze "paper" --prompts-dir my_prompts/
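A dimension spec like the YAML above is essentially a prompt template plus a system message. A minimal sketch of how it could be rendered into chat messages; the spec dict mirrors the file's fields (in practice it would come from `yaml.safe_load`), but this loader is an assumption, not CrabScholar's actual one:

```python
# Parsed form of a dimension file such as my_prompts/bias_analysis.yaml.
spec = {
    "name": "bias_analysis",
    "display_name": "Bias Analysis",
    "system_message": "You are a bias analysis expert...",
    "extraction_prompt": "Analyze the paper for potential biases...\n"
                         "Paper: {title}\nText: {paper_text}",
}

def build_messages(spec, title, paper_text):
    """Render a dimension spec into a chat-style message list for the LLM call."""
    return [
        {"role": "system", "content": spec["system_message"]},
        {"role": "user",
         "content": spec["extraction_prompt"].format(
             title=title, paper_text=paper_text)},
    ]

messages = build_messages(spec, "Example Paper", "Full text here...")
```

The `{title}` and `{paper_text}` placeholders from the YAML are filled with standard `str.format`, so any custom dimension can reference the same fields.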

Python API

from crab_scholar.pipeline import run_pipeline
from crab_scholar.config import CrabConfig

config = CrabConfig(
    default_model="openai/gpt-4o-mini",
    citation_depth=3,
)

kg = run_pipeline(input_query="attention is all you need", config=config)
print(f"Entities: {kg.entity_count}, Relations: {kg.relation_count}")

Architecture

Input (query/DOI/PDF/text)
    ↓
Scholar API → Resolve paper
    ↓
BFS Crawler → Expand citations/references (depth=N)
    ↓
Fetcher → Download PDFs, extract text
    ↓
Analyzer → Run pluggable dimensions (5 defaults)
    ↓
Graph Builder → Entities + Relations → NetworkX
    ↓
Export → JSON / GraphML / GEXF / CSV
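The last two stages can be sketched with NetworkX, which the project names as its graph backend. The node and edge attributes below are illustrative guesses at the entity/relation schema, not the actual one:

```python
import os
import tempfile

import networkx as nx

# Build a tiny knowledge graph with paper/author/metric entities.
g = nx.DiGraph()
g.add_node("paper:attention", kind="paper", title="Attention Is All You Need")
g.add_node("author:vaswani", kind="author", name="Ashish Vaswani")
g.add_node("metric:bleu", kind="metric", name="BLEU")
g.add_edge("author:vaswani", "paper:attention", relation="authored")
g.add_edge("paper:attention", "metric:bleu", relation="reports")

# NetworkX ships writers for two of the listed export formats.
with tempfile.TemporaryDirectory() as tmp:
    nx.write_graphml(g, os.path.join(tmp, "graph.graphml"))
    nx.write_gexf(g, os.path.join(tmp, "graph.gexf"))
```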

License

MIT
