
Paper citation portrait analysis tool: automatically crawls citing papers, identifies renowned scholars, and generates a visual HTML report




CitationClaw: A Lightweight Engine for Discovering Scientific Impact through Citations

Turning Every Citation into Explainable Impact

License: CC BY-NC 4.0

Input paper titles (or import from Google Scholar profiles), and generate a full citation portrait report in minutes.

🚀 Contribute with PRs

CitationClaw is community-driven and PR-friendly.

📢 News

  • 2026-03-16: Released beta v1.0.7 — Reliability hardening: parallel quota-storm prevention (early-exit guards + 2h gather timeout), and empty/corrupt JSONL guards across all pipeline phases to ensure the pipeline always runs to completion.
  • 2026-03-15: Released beta v1.0.6 — English README as default, Chinese switch at top, and usage flow linked to Guidelines Quick Start.
  • 2026-03-14: Released v1.0.5 — AI assistant widgets for UI/report pages and reliability fixes.
  • 2026-03-14: Released v1.0.4 — improved UI and introduced Basic/Advanced/Full service tiers.
  • 2026-03-12: Released v1.0 — first public release.
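The two reliability ideas named in the v1.0.7 note (JSONL guards and a bounded gather) can be illustrated roughly as follows. This is a minimal sketch, not CitationClaw's actual code: the function names `load_jsonl` and `gather_with_timeout` are hypothetical, and only the 2-hour default comes from the release note.

```python
import asyncio
import json
from pathlib import Path


def load_jsonl(path: Path) -> list:
    """Read a JSONL file, skipping blank or corrupt lines instead of raising.

    Hypothetical helper: shows the idea of an empty/corrupt JSONL guard.
    """
    if not path.exists() or path.stat().st_size == 0:
        return []  # missing or empty file: later phases just see no records
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # corrupt line: drop it and keep going
            if isinstance(obj, dict):
                records.append(obj)
    return records


async def gather_with_timeout(coros, timeout_s: float = 2 * 60 * 60) -> list:
    """Run coroutines concurrently and stop waiting after timeout_s seconds.

    Unfinished tasks are cancelled; results of tasks that finished without
    an exception are returned in submission order.
    """
    tasks = [asyncio.ensure_future(c) for c in coros]
    if not tasks:
        return []
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()
    # Let cancelled tasks settle so no "pending task" warnings are left behind.
    await asyncio.gather(*pending, return_exceptions=True)
    return [
        t.result()
        for t in tasks
        if t in done and not t.cancelled() and t.exception() is None
    ]
```

With guards like these, a phase that receives an empty or partially written file, or a batch where some workers hang, can still hand something to the next phase rather than aborting the run.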

Key Features

  • 🧠 Five-Phase Citation Pipeline: crawl -> author intelligence -> export -> citing description -> dashboard.
  • 🎯 Renowned Scholar Focus: auto-identifies high-impact scholars and generates dedicated outputs.
  • Tiered Analysis Modes: Basic / Advanced / Full for speed-cost-depth tradeoff.
  • 🔁 Resumable + Cache-Aware: supports resume-by-page, author cache, and citing-description cache.
  • 📊 Shareable HTML Report: standalone dashboard file, no extra server needed for viewing.
  • 🧩 Skills Runtime Inside: keeps five-phase logic while moving execution to modular skills.
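As an illustration of the resume-by-page idea listed above, the sketch below stores the last completed page in a small JSON file and skips ahead on the next run. The `PageCheckpoint` class, the `crawl` function, and the checkpoint file layout are all assumptions made for this example; CitationClaw's real cache format may differ.

```python
import json
from pathlib import Path


class PageCheckpoint:
    """Persist the last fully processed page so a rerun can skip ahead."""

    def __init__(self, path: Path):
        self.path = path

    def last_done(self) -> int:
        try:
            data = json.loads(self.path.read_text(encoding="utf-8"))
            return int(data["last_page"])
        except (FileNotFoundError, ValueError, KeyError, TypeError):
            return -1  # no usable checkpoint: start from page 0

    def mark_done(self, page: int) -> None:
        self.path.write_text(json.dumps({"last_page": page}), encoding="utf-8")


def crawl(total_pages: int, fetch_page, checkpoint: PageCheckpoint) -> list:
    """Fetch pages that have not been completed yet, recording progress."""
    fetched = []
    for page in range(checkpoint.last_done() + 1, total_pages):
        fetched.append(fetch_page(page))
        checkpoint.mark_done(page)  # written only after the page succeeded
    return fetched
```

Writing the checkpoint after each successful page means an interrupted run repeats at most the page that was in flight.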

🏗️ Architecture

CitationClaw keeps deterministic business phases while using a skills-style runtime for orchestration.

UI/REST/WebSocket
      │
      ▼
TaskExecutor (Orchestrator)
      │
      ▼
Skills Runtime
  ├─ phase1_citation_fetch
  ├─ phase2_author_intel
  ├─ phase3_export
  ├─ phase4_citation_desc
  └─ phase5_report_generate
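The diagram above might be read as a fixed phase order driven through a registry of named skills. The sketch below is one possible shape under that assumption: the phase names come from the diagram, while the `SkillsRuntime` class, the `register` decorator, the `execute` method, and the dict-based context are hypothetical and do not describe the project's real interfaces.

```python
from typing import Callable, Dict, List

Context = dict
Skill = Callable[[Context], Context]

# Phase names as shown in the architecture diagram.
PHASE_ORDER: List[str] = [
    "phase1_citation_fetch",
    "phase2_author_intel",
    "phase3_export",
    "phase4_citation_desc",
    "phase5_report_generate",
]


class SkillsRuntime:
    """Hypothetical registry that maps a skill name to a callable."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, name: str) -> Callable[[Skill], Skill]:
        def decorator(fn: Skill) -> Skill:
            self._skills[name] = fn
            return fn
        return decorator

    def run(self, name: str, ctx: Context) -> Context:
        if name not in self._skills:
            raise KeyError("skill not registered: " + name)
        return self._skills[name](ctx)


class TaskExecutor:
    """Runs the phases in a fixed order, passing the context along."""

    def __init__(self, runtime: SkillsRuntime, phases: List[str] = PHASE_ORDER):
        self.runtime = runtime
        self.phases = list(phases)

    def execute(self, ctx: Context) -> Context:
        for name in self.phases:
            ctx = self.runtime.run(name, ctx)
        return ctx


def build_demo_runtime() -> SkillsRuntime:
    """Register stub skills that only record their own name."""
    runtime = SkillsRuntime()
    for phase in PHASE_ORDER:
        def stub(ctx: Context, _phase: str = phase) -> Context:
            return {**ctx, "trace": ctx.get("trace", []) + [_phase]}
        runtime.register(phase)(stub)
    return runtime
```

Keeping the phase order in the executor and the implementations in the registry is what lets the business sequence stay deterministic while individual skills are swapped out.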

More details: Technical Report


📦 Install

Requires Python 3.10+ (Python 3.12 recommended).

Install from PyPI (recommended)

pip install citationclaw
citationclaw                  # default: 127.0.0.1:8000
citationclaw --port 8080      # custom port

Install from source

git clone https://github.com/VisionXLab/CitationClaw.git
cd CitationClaw
pip install -r requirements.txt
python start.py               # default: 127.0.0.1:8000
python start.py --port 8080

🚀 Quick Start

For first-time users, follow the complete guide with screenshots: Guidelines Quick Start

⚙️ Configuration Highlights

  • Required keys:
    • ScraperAPI Key(s) for Google Scholar crawling
    • OpenAI-compatible API Key for LLM-based analysis
  • Recommended search model:
    • Keep gemini-3-flash-preview-search for search-capable stages
  • Service tiers:
    • Basic: lower cost and faster for first runs
    • Advanced: citing descriptions for renowned-scholar papers only
    • Full: citing descriptions for all citing papers
  • For papers with >1000 citations:
    • Enable year traverse mode
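The tier and year-traverse rules above can be written down as a small decision helper. This is a sketch under stated assumptions: the `CitingPaper` fields and all function names are hypothetical, the assumption that Basic generates no citing descriptions is inferred rather than stated in the list, and slicing one publication year at a time is only one plausible reading of "year traverse mode".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitingPaper:
    title: str
    year: int
    has_renowned_author: bool


def papers_to_describe(tier: str, papers: List[CitingPaper]) -> List[CitingPaper]:
    """Pick the citing papers that get a citing description for a tier."""
    tier = tier.lower()
    if tier == "basic":
        return []  # assumption: Basic skips citing descriptions entirely
    if tier == "advanced":
        return [p for p in papers if p.has_renowned_author]
    if tier == "full":
        return list(papers)
    raise ValueError("unknown tier: " + tier)


def needs_year_traverse(citation_count: int, threshold: int = 1000) -> bool:
    """Papers with more than 1000 citations should use year traverse mode."""
    return citation_count > threshold


def year_slices(first_year: int, last_year: int) -> List[int]:
    """Assumed slicing scheme: one crawl pass per publication year."""
    return list(range(first_year, last_year + 1))
```

For example, a paper with 2,500 citations published in 2020 would return `True` from `needs_year_traverse(2500)` and could then be crawled once per entry of `year_slices(2020, 2026)`.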

📁 Project Structure

citationclaw/
├── app/                 # FastAPI app, task orchestration, config, logs
├── core/                # scraping / search / export / dashboard engines
├── skills/              # skills runtime and five phase skills
├── static/              # frontend assets
└── templates/           # Jinja2 pages
docs/                    # docs and demos
test/                    # tests

📤 Outputs

Each run creates a timestamped folder under data/result-{timestamp}/, usually including:

  • paper_results.xlsx
  • paper_results_all_renowned_scholar.xlsx
  • paper_results_top-tier_scholar.xlsx
  • paper_results_with_citing_desc.xlsx
  • paper_results.json
  • paper_dashboard.html
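To pick up the outputs of the most recent run from a script, something like the following could work. It is a minimal sketch: the file names and the `data/result-*` folder pattern come from the list above, while the helper names are hypothetical and "most recent" is decided by folder modification time because the exact timestamp format is not specified here.

```python
from pathlib import Path
from typing import List, Optional

# Output files listed above; a run may produce only some of them.
EXPECTED_OUTPUTS = [
    "paper_results.xlsx",
    "paper_results_all_renowned_scholar.xlsx",
    "paper_results_top-tier_scholar.xlsx",
    "paper_results_with_citing_desc.xlsx",
    "paper_results.json",
    "paper_dashboard.html",
]


def latest_result_dir(data_dir: Path) -> Optional[Path]:
    """Return the most recently modified result-* folder, or None."""
    runs = [p for p in data_dir.glob("result-*") if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def missing_outputs(run_dir: Path) -> List[str]:
    """List the expected output files that are absent from run_dir."""
    return [name for name in EXPECTED_OUTPUTS if not (run_dir / name).exists()]


if __name__ == "__main__":
    run = latest_result_dir(Path("data"))
    if run is None:
        print("no result folders found under data/")
    else:
        print("latest run:", run)
        print("not present:", missing_outputs(run))
```

A missing file is not necessarily an error, since the set of outputs is described as typical rather than guaranteed.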

🤝 Contribute & Roadmap

PRs are welcome and appreciated.

Suggested directions:

  • richer skill metadata and registry conventions
  • stronger retry and network-failure resilience
  • dashboard readability and UX improvement
  • tests for pipeline contracts and compatibility
  • provider/model compatibility presets

