Google Scholar self-citation analyzer

Scholar Citations

A Python tool for analyzing self-citation patterns in Google Scholar profiles.

Overview

Scholar Citations helps researchers and evaluators analyze self-citation patterns in Google Scholar profiles. Self-citations (authors citing their own earlier work) are a normal part of academic publishing, but excessive self-citation can inflate metrics such as the h-index and total citation count.

This tool allows you to:

  • Analyze any Google Scholar profile to identify self-citations
  • Calculate self-citation percentages and metrics
  • Generate detailed reports of self-citation patterns
  • Estimate self-citation counts for highly-cited papers using sampling
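
The sampling estimate in the last item can be illustrated with simple proportional scaling. This is a minimal sketch, not the tool's actual implementation: the function name `extrapolate_self_citations` and its parameters are hypothetical, and the README does not say how the tool draws or scales its sample.

```python
def extrapolate_self_citations(self_in_sample: int,
                               sample_size: int,
                               total_citations: int) -> float:
    """Scale the self-citation count seen in a sample up to all citations.

    Hypothetical helper: assumes the sample is representative, so the
    self-citation share in the sample equals the share overall.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= self_in_sample <= sample_size:
        raise ValueError("self_in_sample must be between 0 and sample_size")
    if total_citations < sample_size:
        raise ValueError("total_citations cannot be smaller than the sample")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return self_in_sample * total_citations / sample_size


# 3 self-citations found among 10 sampled citations of a paper cited 200 times.
print(extrapolate_self_citations(3, 10, 200))  # 60.0
```

An estimate like this is only as good as the sample. A small sample of a highly cited paper can be far from the true count.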

Installation

pip install scholar-citations

Requirements

  • Python 3.7 or higher
  • Google Chrome browser (for Selenium automation)

Usage

Basic Usage

scholar-citations "https://scholar.google.com/citations?user=USER_ID"

Replace USER_ID with the ID from the Google Scholar profile URL you want to analyze.
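
If you only have a full profile URL and want to confirm which ID it contains, the `user` query parameter can be read with the Python standard library. This is a small sketch, and `extract_user_id` is a hypothetical helper, not part of the package.

```python
from urllib.parse import parse_qs, urlparse


def extract_user_id(profile_url: str) -> str:
    """Return the value of the `user` query parameter of a profile URL."""
    query = parse_qs(urlparse(profile_url).query)
    values = query.get("user")
    if not values:
        raise ValueError(f"no 'user' parameter in URL: {profile_url!r}")
    return values[0]


# Extra parameters such as hl=en are ignored.
print(extract_user_id("https://scholar.google.com/citations?user=USER_ID&hl=en"))
```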

Advanced Options

# Analyze only the first 20 papers
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --max-papers 20

# Check only 10 citations per paper (for faster analysis)
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --max-citations 10

# Save detailed results to a JSON file
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --output results.json

# Show the browser window (useful for solving CAPTCHAs)
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --visible

# Enable debug logging
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --debug
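
The options above can be combined in one call. To drive the command from a script, one approach is to build the argument list in Python and pass it to `subprocess`. The sketch below uses only the flags listed in this section. `build_command` and `run_analysis` are hypothetical helper names, and the sketch assumes `scholar-citations` is on your PATH.

```python
import subprocess
from typing import List, Optional


def build_command(url: str,
                  max_papers: Optional[int] = None,
                  max_citations: Optional[int] = None,
                  output: Optional[str] = None,
                  visible: bool = False,
                  debug: bool = False) -> List[str]:
    """Build the argument list for the scholar-citations command line."""
    cmd = ["scholar-citations", url]
    if max_papers is not None:
        cmd += ["--max-papers", str(max_papers)]
    if max_citations is not None:
        cmd += ["--max-citations", str(max_citations)]
    if output is not None:
        cmd += ["--output", output]
    if visible:
        cmd.append("--visible")
    if debug:
        cmd.append("--debug")
    return cmd


def run_analysis(cmd: List[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


cmd = build_command("https://scholar.google.com/citations?user=USER_ID",
                    max_papers=20, max_citations=10, output="results.json")
print(cmd)
# run_analysis(cmd)  # uncomment to launch the analysis
```

Passing a list rather than a single shell string avoids quoting problems with the `?` and `=` characters in the URL.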

Command Help

scholar-citations --help
usage: scholar-citations [-h] [--max-papers MAX_PAPERS] [--max-citations MAX_CITATIONS] [--output OUTPUT] [--visible] [--debug] url

Analyze self-citations on Google Scholar

positional arguments:
  url                   Google Scholar profile URL

optional arguments:
  -h, --help            show this help message and exit
  --max-papers MAX_PAPERS
                        Maximum number of papers to analyze
  --max-citations MAX_CITATIONS
                        Maximum number of citations to check per paper
  --output OUTPUT       Output file for detailed results (JSON)
  --visible             Show browser window during analysis
  --debug               Enable debug logging

Features

  • Anti-detection measures: Uses browser fingerprint masking techniques to reduce the chance of being flagged as automated traffic
  • Robust author matching: Matches author names written in different formats to detect self-citations
  • Progress saving: Saves intermediate results to avoid losing progress if the process is interrupted
  • Sampling: For papers with many citations, examines a representative sample and extrapolates results
  • Detailed reporting: Provides both summary statistics and detailed paper-by-paper analysis
  • CAPTCHA handling: When run with --visible, allows you to solve CAPTCHAs if they appear
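
To illustrate what matching different author name formats can involve, here is a minimal sketch that reduces a name to a (last name, first initial) key. This is an assumption about one reasonable approach, not the tool's actual matching logic. `name_key` and `same_author` are hypothetical names.

```python
import re
import unicodedata
from typing import Tuple


def name_key(name: str) -> Tuple[str, str]:
    """Reduce an author name to (last name, first initial), lowercase ASCII.

    Handles "First Last", "F Last" and "Last, F." forms. Names written in
    non-Latin scripts are not supported by this sketch.
    """
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    if "," in text:
        last, first = (part.strip() for part in text.split(",", 1))
    else:
        tokens = text.split()
        if not tokens:
            return ("", "")
        last, first = tokens[-1], " ".join(tokens[:-1])
    last = re.sub(r"[^a-z]", "", last)
    first = re.sub(r"[^a-z]", "", first)
    return (last, first[:1])


def same_author(a: str, b: str) -> bool:
    """True when both names reduce to the same non-empty key."""
    key_a, key_b = name_key(a), name_key(b)
    return key_a[0] != "" and key_a == key_b


print(same_author("Rahul Vishwakarma", "Vishwakarma, R."))  # True
print(same_author("R Vishwakarma", "A Vishwakarma"))        # False
```

A key this coarse treats two different people who share a last name and first initial as the same author, so it can overcount self-citations.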

Example Output

======= RESULTS =======
Author: Rahul Vishwakarma
Papers analyzed: 103 of 103
Total citations: 129
Self-citations: 21
Self-citation percentage: 16.28%

Self-citation examples (first 5):
1. Original: System and method for efficient backup system awar...
   Citing: System and method for efficient backup system awar...
2. Original: Risk-Aware and Explainable Framework for Ensuring ...
   Citing: Uncertainty-Aware Unimodal and Multimodal Learning...
3. Original: Risk-Aware and Explainable Framework for Ensuring ...
   Citing: Uncertainty-Aware Hardware Trojan Detection Using ...
4. Original: Risk-Aware and Explainable Framework for Ensuring ...
   Citing: Reconfigurable Run-Time Hardware Trojan Mitigation...
5. Original: Risk-Aware and Explainable Framework for Ensuring ...
   Citing: Towards Uncertainty-Aware Hardware Trojan Detectio...

... and 16 more self-citations
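
The summary figures above follow directly from the two counts: 21 of 129 citations is 16.28%, and 21 self-citations minus the 5 listed leaves 16. A short check, using a hypothetical helper name:

```python
def self_citation_percentage(self_citations: int, total_citations: int) -> float:
    """Share of citations that are self-citations, as a percentage."""
    if total_citations == 0:
        return 0.0  # avoid dividing by zero for uncited profiles
    return 100.0 * self_citations / total_citations


print(f"{self_citation_percentage(21, 129):.2f}%")  # 16.28%
print(21 - 5)                                        # 16
```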

How It Works

  1. The tool visits the specified Google Scholar profile
  2. It extracts the list of publications by the author
  3. For each publication, it analyzes the "Cited by" list
  4. It compares author lists to identify overlaps (self-citations)
  5. It calculates statistics and generates a report
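
Steps 4 and 5 can be sketched on data that has already been collected. The example below assumes a hypothetical in-memory structure (a list of papers, each with its own author list and the author lists of its citing papers) and compares names by exact lowercase match. The tool's real data model and matching rules are not documented here and may differ.

```python
from typing import Dict, List


def is_self_citation(original_authors: List[str], citing_authors: List[str]) -> bool:
    """True when the two author lists share at least one name."""
    original = {name.strip().lower() for name in original_authors}
    citing = {name.strip().lower() for name in citing_authors}
    return bool(original & citing)


def summarize(papers: List[Dict]) -> Dict[str, float]:
    """Count citations and self-citations across all papers."""
    total = 0
    self_count = 0
    for paper in papers:
        for citing_authors in paper["citations"]:
            total += 1
            if is_self_citation(paper["authors"], citing_authors):
                self_count += 1
    percentage = 100.0 * self_count / total if total else 0.0
    return {"total": total, "self": self_count, "percentage": percentage}


# Made-up data: one paper cited three times, once by one of its own authors.
papers = [
    {
        "authors": ["A Author", "B Coauthor"],
        "citations": [
            ["C Other"],
            ["B Coauthor", "D Other"],
            ["E Other", "F Other"],
        ],
    }
]
print(summarize(papers))
```

In this sketch, any shared name between the two author lists counts as a self-citation, including a citation by a co-author.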

Development

Setup Development Environment

git clone https://github.com/yourusername/scholar_citations.git
cd scholar_citations
pip install -e .

Running Tests

pip install pytest
pytest tests/
============================= test session starts ==============================
platform darwin -- Python 3.9.21, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/rahul/Downloads/scholar_citations
configfile: pyproject.toml
plugins: cov-6.0.0
collected 3 items

tests/test_analyzer.py ...                                               [100%]

============================== 3 passed in 0.03s ===============================

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this tool in your research, please cite it as:

Vishwakarma, R. (2025). Scholar Citations: A tool for analyzing self-citation patterns in Google Scholar profiles. [Software]. Available from https://pypi.org/project/scholar-citations/

Disclaimer

This tool is intended for academic and research purposes only. Please use it responsibly and respect Google Scholar's terms of service. The tool includes rate limiting to reduce load on Google's servers, along with the anti-detection features described above.
