
Citation portrait analysis tool: automatically crawls citing papers, identifies renowned scholars, and generates a visual HTML report




CitationClaw: A Lightweight Engine for Discovering Scientific Impact through Citations

Turning Every Citation into Explainable Impact

License: CC BY-NC 4.0

Input paper titles (or import from Google Scholar profiles), and generate a full citation portrait report in minutes.

🚀 Contribute with PRs

CitationClaw is community-driven and PR-friendly.

📢 News

  • 2026-03-18: Released beta v1.0.9 — Multi-paper dashboard dedup fix (title-based dedup key, correct KG edges); year-traverse no longer persisted across sessions; default parallel workers raised to 10; V-API Key registration link added; timeout log messages during LLM retries; SCOPE section scrollable with expand button; cache write throttling (every 10 items) to prevent large-file slowdowns.
  • 2026-03-18: Released beta v1.0.8 — UI improvements: default to Basic tier, maintenance notice for Advanced/Full, startup modal with usage notices, V-API Key labeling; fix SCOPE section showing incorrect citing paper count in multi-paper mode.
  • 2026-03-16: Released beta v1.0.7 — Reliability hardening: parallel quota-storm prevention (early-exit guards + 2h gather timeout), and empty/corrupt JSONL guards across all pipeline phases to ensure the pipeline always runs to completion.
  • 2026-03-15: Released beta v1.0.6 — English README as default, Chinese switch at top, and usage flow linked to Guidelines Quick Start.
  • 2026-03-14: Released v1.0.5 — AI assistant widgets for UI/report pages and reliability fixes.
  • 2026-03-14: Released v1.0.4 — improved UI and introduced Basic/Advanced/Full service tiers.
  • 2026-03-12: Released v1.0 — first public release.

Key Features

  • 🧠 Five-Phase Citation Pipeline: crawl -> author intelligence -> export -> citing description -> dashboard.
  • 🎯 Renowned Scholar Focus: auto-identifies high-impact scholars and generates dedicated outputs.
  • Tiered Analysis Modes: Basic / Advanced / Full for speed-cost-depth tradeoff.
  • 🔁 Resumable + Cache-Aware: supports resume-by-page, author cache, and citing-description cache.
  • 📊 Shareable HTML Report: standalone dashboard file, no extra server needed for viewing.
  • 🧩 Skills Runtime Inside: keeps five-phase logic while moving execution to modular skills.
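
The resumable, cache-aware behavior listed above can be pictured with a small sketch. This is not CitationClaw's actual cache code: the class name `ThrottledJsonlCache`, the `key`/`value` record layout, and the JSONL file format are assumptions made for illustration. It shows two ideas mentioned in the release notes: writing the cache to disk only every 10 items, and skipping empty or corrupt lines when reloading.

```python
import json
from pathlib import Path


class ThrottledJsonlCache:
    """Hypothetical cache: holds records in memory and rewrites the file
    only after every `flush_every` additions."""

    def __init__(self, path, flush_every=10):
        self.path = Path(path)
        self.flush_every = flush_every
        self.records = {}
        self._pending = 0
        self._load()

    def _load(self):
        # Tolerate a missing, empty, or partly corrupt cache file.
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    row = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip corrupt lines instead of aborting the run
                if isinstance(row, dict) and "key" in row:
                    self.records[row["key"]] = row.get("value")

    def put(self, key, value):
        self.records[key] = value
        self._pending += 1
        if self._pending >= self.flush_every:
            self.flush()

    def flush(self):
        # Rewrite the whole file; throttling keeps this from running per item.
        with self.path.open("w", encoding="utf-8") as fh:
            for key, value in self.records.items():
                row = {"key": key, "value": value}
                fh.write(json.dumps(row, ensure_ascii=False) + "\n")
        self._pending = 0
```

A caller would invoke `flush()` once more at the end of a phase so that the last few items are not lost.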

🏗️ Architecture

CitationClaw keeps deterministic business phases while using a skills-style runtime for orchestration.

UI/REST/WebSocket
      │
      ▼
TaskExecutor (Orchestrator)
      │
      ▼
Skills Runtime
  ├─ phase1_citation_fetch
  ├─ phase2_author_intel
  ├─ phase3_export
  ├─ phase4_citation_desc
  └─ phase5_report_generate

More details: Technical Report
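
As a rough illustration of the diagram above, the sketch below runs five phase skills in a fixed order over a shared context. Only the phase names come from the diagram; `Context`, `Skill`, `_stub`, `PHASES`, and `run_pipeline` are hypothetical names, and the stub skills merely record that they ran. The real TaskExecutor and Skills Runtime may be organized quite differently.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Skill = Callable[[Context], Context]


def _stub(name: str) -> Skill:
    """Build a placeholder skill that only records its own name."""
    def run(ctx: Context) -> Context:
        ctx.setdefault("completed", []).append(name)
        return ctx
    return run


# Phase order is fixed (deterministic); each entry is a swappable skill.
PHASES: List[Tuple[str, Skill]] = [
    (name, _stub(name))
    for name in (
        "phase1_citation_fetch",
        "phase2_author_intel",
        "phase3_export",
        "phase4_citation_desc",
        "phase5_report_generate",
    )
]


def run_pipeline(ctx: Context, phases: List[Tuple[str, Skill]] = PHASES) -> Context:
    """Run every phase in order, passing the context from one to the next."""
    for _name, skill in phases:
        ctx = skill(ctx)
    return ctx
```

Calling `run_pipeline({"titles": ["Some paper title"]})` returns the context with a `completed` list naming the five phases in order.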


📦 Install

Requires Python 3.10+ (Python 3.12 recommended).
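
If it is unclear which interpreter `pip` will use, a quick check along these lines may help. The helper name `meets_minimum` is illustrative and is not part of CitationClaw.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 10)):
    """Return True when the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Python 3.10+ is required; found", sys.version.split()[0])
```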

Install from PyPI (recommended)

pip install citationclaw
citationclaw                  # default: 127.0.0.1:8000
citationclaw --port 8080      # custom port

Install from source

git clone https://github.com/VisionXLab/CitationClaw.git
cd CitationClaw
pip install -r requirements.txt
python start.py               # default: 127.0.0.1:8000
python start.py --port 8080

🚀 Quick Start

For first-time users, follow the complete guide with screenshots in the Guidelines Quick Start.

⚙️ Configuration Highlights

  • Required keys:
    • ScraperAPI Key(s) for Google Scholar crawling
    • OpenAI-compatible API Key for LLM-based analysis
  • Recommended search model:
    • Keep gemini-3-flash-preview-search for search-capable stages
  • Service tiers:
    • Basic: lower cost and faster for first runs
    • Advanced: citing descriptions for renowned-scholar papers only
    • Full: citing descriptions for all citing papers
  • For papers with >1000 citations:
    • Enable year traverse mode
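
The options above can be summarized in a small sketch. The names `Tier`, `RunConfig`, `wants_citing_description`, and `suggest_year_traverse` are hypothetical and do not reflect CitationClaw's real configuration schema. The sketch also assumes that the Basic tier generates no citing descriptions, which the list above does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    BASIC = "basic"
    ADVANCED = "advanced"
    FULL = "full"


@dataclass
class RunConfig:
    scraper_api_keys: list[str]   # ScraperAPI key(s) for Google Scholar crawling
    llm_api_key: str              # OpenAI-compatible API key
    search_model: str = "gemini-3-flash-preview-search"
    tier: Tier = Tier.BASIC
    year_traverse: bool = False


def wants_citing_description(tier: Tier, by_renowned_scholar: bool) -> bool:
    """Decide whether a citing paper gets a citing description."""
    if tier is Tier.FULL:
        return True                   # all citing papers
    if tier is Tier.ADVANCED:
        return by_renowned_scholar    # renowned-scholar papers only
    return False                      # assumption: Basic skips descriptions


def suggest_year_traverse(citation_count: int, threshold: int = 1000) -> bool:
    """Suggest year traverse mode for papers above the citation threshold."""
    return citation_count > threshold
```

For example, `wants_citing_description(Tier.ADVANCED, by_renowned_scholar=False)` is `False`, and `suggest_year_traverse(2500)` is `True`.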

📁 Project Structure

citationclaw/
├── app/                 # FastAPI app, task orchestration, config, logs
├── core/                # scraping / search / export / dashboard engines
├── skills/              # skills runtime and five phase skills
├── static/              # frontend assets
└── templates/           # Jinja2 pages
docs/                    # docs and demos
test/                    # tests

📤 Outputs

Each run creates a timestamped folder, data/result-{timestamp}/, which usually includes:

  • paper_results.xlsx
  • paper_results_all_renowned_scholar.xlsx
  • paper_results_top-tier_scholar.xlsx
  • paper_results_with_citing_desc.xlsx
  • paper_results.json
  • paper_dashboard.html
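
For post-processing, the newest run folder can be located and its JSON output loaded with a short script such as the one below. The helper names are illustrative. The sketch assumes only the `data/result-*` folder pattern and the `paper_results.json` file name given above, and it makes no assumption about the structure inside the JSON file.

```python
import json
from pathlib import Path


def latest_result_dir(data_dir="data"):
    """Return the most recently modified result-* folder, or None if absent."""
    candidates = [p for p in Path(data_dir).glob("result-*") if p.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def load_paper_results(result_dir):
    """Load paper_results.json from a run folder; return None if it is missing."""
    path = Path(result_dir) / "paper_results.json"
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    run_dir = latest_result_dir()
    if run_dir is None:
        print("No result folders found under data/")
    else:
        results = load_paper_results(run_dir)
        print(run_dir, "loaded" if results is not None else "has no paper_results.json")
```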

🤝 Contribute & Roadmap

PRs are welcome and appreciated.

Suggested directions:

  • richer skill metadata and registry conventions
  • stronger retry and network-failure resilience
  • dashboard readability and UX improvement
  • tests for pipeline contracts and compatibility
  • provider/model compatibility presets



