Google Scholar self-citation analyzer
Scholar Citations
A Python tool for analyzing self-citation patterns in Google Scholar profiles.
Overview
Scholar Citations is a command-line tool that helps researchers and evaluators analyze self-citation patterns in Google Scholar profiles. Self-citations (when authors cite their own previous work) are a normal part of academic publishing, but excessive self-citation can inflate metrics such as the h-index and total citation counts.
This tool allows you to:
- Analyze any Google Scholar profile to identify self-citations
- Calculate self-citation percentages and metrics
- Generate detailed reports of self-citation patterns
- Estimate self-citation counts for highly cited papers using sampling
Installation
pip install scholar-citations
Requirements
- Python 3.7 or higher
- Google Chrome browser (for Selenium automation)
Usage
Basic Usage
scholar-citations "https://scholar.google.com/citations?user=USER_ID"
Replace USER_ID with the value of the user parameter in the URL of the Google Scholar profile you want to analyze.
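If you are scripting the tool over several profiles, the ID can be pulled out of a profile URL with the standard library. The helper below is a hypothetical sketch, not part of the scholar-citations package; it simply reads the `user` query parameter.

```python
from urllib.parse import parse_qs, urlparse


def extract_user_id(profile_url: str) -> str:
    """Return the value of the `user` query parameter of a Scholar profile URL."""
    # parse_qs maps each query key to a list of values.
    values = parse_qs(urlparse(profile_url).query).get("user")
    if not values:
        raise ValueError(f"No 'user' parameter found in {profile_url!r}")
    return values[0]


print(extract_user_id("https://scholar.google.com/citations?user=USER_ID&hl=en"))  # USER_ID
```

Extra parameters such as `hl=en` are ignored, so URLs copied straight from the browser work as-is.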
Advanced Options
# Analyze only the first 20 papers
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --max-papers 20
# Check only 10 citations per paper (for faster analysis)
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --max-citations 10
# Save detailed results to a JSON file
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --output results.json
# Show the browser window (useful for solving CAPTCHAs)
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --visible
# Enable debug logging
scholar-citations "https://scholar.google.com/citations?user=USER_ID" --debug
Command Help
scholar-citations --help
usage: scholar-citations [-h] [--max-papers MAX_PAPERS] [--max-citations MAX_CITATIONS] [--output OUTPUT] [--visible] [--debug] url
Analyze self-citations on Google Scholar
positional arguments:
  url                   Google Scholar profile URL
optional arguments:
  -h, --help            show this help message and exit
  --max-papers MAX_PAPERS
                        Maximum number of papers to analyze
  --max-citations MAX_CITATIONS
                        Maximum number of citations to check per paper
  --output OUTPUT       Output file for detailed results (JSON)
  --visible             Show browser window during analysis
  --debug               Enable debug logging
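The help text has the layout that Python's argparse produces. Assuming the CLI is built on argparse (this page does not confirm it), a parser like the one below would accept the same options, which can be handy for wrapper scripts or tests. It is a sketch, not the package's source; in particular, the integer type for the two limits is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented scholar-citations options."""
    parser = argparse.ArgumentParser(
        prog="scholar-citations",
        description="Analyze self-citations on Google Scholar",
    )
    parser.add_argument("url", help="Google Scholar profile URL")
    # type=int is assumed; the help output does not state the value types.
    parser.add_argument("--max-papers", type=int, help="Maximum number of papers to analyze")
    parser.add_argument("--max-citations", type=int,
                        help="Maximum number of citations to check per paper")
    parser.add_argument("--output", help="Output file for detailed results (JSON)")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--visible", action="store_true", help="Show browser window during analysis")
    parser.add_argument("--debug", action="store_true", help="Enable debug logging")
    return parser


args = build_parser().parse_args(
    ["https://scholar.google.com/citations?user=USER_ID", "--max-papers", "20", "--visible"]
)
print(args.max_papers, args.visible, args.debug)  # 20 True False
```

argparse turns `--max-papers` into the attribute `max_papers`, and options that are not given default to `None`.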
Features
- Anti-detection measures: Uses browser-fingerprint evasion techniques to reduce the chance that automated access is detected and blocked
- Robust author matching: Matches author names written in different formats (e.g., full first name vs. initials) to detect self-citations
- Progress saving: Saves intermediate results to avoid losing progress if the process is interrupted
- Sampling: For papers with many citations, examines a sample of the citations and extrapolates the self-citation count
- Detailed reporting: Provides both summary statistics and detailed paper-by-paper analysis
- CAPTCHA handling: When run with --visible, lets you solve CAPTCHAs manually if they appear
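The sampling feature can be sketched as follows, assuming the estimate scales the self-citation rate observed in the checked citations up to the paper's full "Cited by" count. How the package actually chooses its sample (for example, the first N citations as capped by --max-citations, or a random subset) is not stated here, and the function name is hypothetical.

```python
from typing import List


def extrapolate_self_citations(checked: List[bool], total_citations: int) -> int:
    """Estimate a paper's self-citations from the citations actually examined.

    checked: one bool per examined citing paper (True means self-citation).
    total_citations: the paper's full "Cited by" count.
    """
    if not checked:
        return 0  # nothing examined, so no basis for an estimate
    if len(checked) >= total_citations:
        return sum(checked)  # every citation was checked: exact count
    # Scale the sample's self-citation rate to the full citation count.
    return round(sum(checked) * total_citations / len(checked))


# 2 self-citations among 10 examined citations of a paper cited 150 times
sample = [True, True] + [False] * 8
print(extrapolate_self_citations(sample, 150))  # 30
```

If the examined citations are simply the first ones Scholar lists rather than a random subset, the estimate can be biased, so extrapolated counts are best read as approximate.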
Example Output
======= RESULTS =======
Author: Rahul Vishwakarma
Papers analyzed: 103 of 103
Total citations: 129
Self-citations: 21
Self-citation percentage: 16.28%
Self-citation examples (first 5):
1. Original: System and method for efficient backup system awar...
   Citing: System and method for efficient backup system awar...
2. Original: Risk-Aware and Explainable Framework for Ensuring ...
   Citing: Uncertainty-Aware Unimodal and Multimodal Learning...
3. Original: Risk-Aware and Explainable Framework for Ensuring ...
   Citing: Uncertainty-Aware Hardware Trojan Detection Using ...
4. Original: Risk-Aware and Explainable Framework for Ensuring ...
   Citing: Reconfigurable Run-Time Hardware Trojan Mitigation...
5. Original: Risk-Aware and Explainable Framework for Ensuring ...
   Citing: Towards Uncertainty-Aware Hardware Trojan Detectio...
... and 16 more self-citations
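The percentage in this report is consistent with self-citations divided by total citations: 21 / 129 ≈ 0.1628, or 16.28%. A minimal sketch of that calculation; the function name is hypothetical and returning 0% for a profile with no citations is an assumption.

```python
def self_citation_percentage(self_citations: int, total_citations: int) -> float:
    """Share of citations that are self-citations, as a percentage."""
    if total_citations == 0:
        return 0.0  # assumption: report 0% when there are no citations at all
    return 100 * self_citations / total_citations


print(f"{self_citation_percentage(21, 129):.2f}%")  # 16.28%
```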
How It Works
- The tool visits the specified Google Scholar profile
- It extracts the list of publications by the author
- For each publication, it analyzes the "Cited by" list
- It compares author lists to identify overlaps (self-citations)
- It calculates statistics and generates a report
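The comparison step depends on matching author names written in different forms ("Rahul Vishwakarma", "R. Vishwakarma", "Vishwakarma, Rahul"). The package's actual matching rules are not documented on this page; the sketch below is one hypothetical approach that reduces each name to a (first initial, surname) key and flags a citation when the cited and citing author lists share a key. Coarse keys like this can produce false positives for different people who share an initial and surname.

```python
import re
import unicodedata
from typing import Iterable, Tuple


def name_key(name: str) -> Tuple[str, str]:
    """Reduce an author name to (first initial, surname), lowercase and accent-free."""
    # Drop accents so that "José" and "Jose" produce the same key.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    if "," in ascii_name:
        # "Surname, Given names" order
        surname, given = (part.strip() for part in ascii_name.split(",", 1))
    else:
        # "Given names Surname" order; the last token is taken as the surname
        parts = ascii_name.split() or [""]
        surname, given = parts[-1], " ".join(parts[:-1])
    initial = re.sub(r"[^a-z]", "", given.lower())[:1]
    surname = re.sub(r"[^a-z-]", "", surname.lower())
    return initial, surname


def is_self_citation(cited_authors: Iterable[str], citing_authors: Iterable[str]) -> bool:
    """True if any author of the cited paper also appears among the citing paper's authors."""
    # Keys with an empty surname (blank names) are ignored to avoid spurious matches.
    cited = {key for key in map(name_key, cited_authors) if key[1]}
    return any(key in cited for key in map(name_key, citing_authors) if key[1])


print(is_self_citation(["Rahul Vishwakarma", "A. Smith"], ["R Vishwakarma", "J Doe"]))  # True
```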
Development
Setup Development Environment
git clone https://github.com/yourusername/scholar_citations.git
cd scholar_citations
pip install -e .
Running Tests
pip install pytest
pytest tests/
================================================================== test session starts ==================================================================
platform darwin -- Python 3.9.21, pytest-8.3.4, pluggy-1.5.0
rootdir: /Users/rahul/Downloads/scholar_citations
configfile: pyproject.toml
plugins: cov-6.0.0
collected 3 items
tests/test_analyzer.py ... [100%]
=================================================================== 3 passed in 0.03s ===================================================================
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this tool in your research, please cite it as:
Vishwakarma, R. (2025). Scholar Citations: A tool for analyzing self-citation patterns in Google Scholar profiles. [Software]. Available from https://pypi.org/project/scholar-citations/
Disclaimer
This tool is meant for academic and research purposes only. Please use it responsibly and respect Google Scholar's terms of service. The tool includes rate limiting to minimize its impact on Google's servers, along with the anti-detection features described above.
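The rate limiting is not specified further on this page. One common approach is a randomized minimum delay between page loads, sketched below; the class name and the 5 to 10 second default range are assumptions, not the package's settings. The clock and sleep functions are injectable so the behavior can be tested without actually waiting.

```python
import random
import time


class PoliteRateLimiter:
    """Enforce a randomly jittered minimum gap between successive requests."""

    def __init__(self, min_delay=5.0, max_delay=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first one

    def wait(self):
        """Sleep until a random delay in [min_delay, max_delay] has passed since the last call."""
        if self._last is not None:
            delay = random.uniform(self.min_delay, self.max_delay)
            remaining = delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Calling `limiter.wait()` immediately before each page load spaces requests out; the first call returns without sleeping, and time already spent processing a page counts toward the next delay.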
Download files
File details
Details for the file scholar_citations-0.1.3.tar.gz.
File metadata
- Download URL: scholar_citations-0.1.3.tar.gz
- Upload date:
- Size: 14.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 519e1af9b0997e16b234b7609d5383742f8781ecbb65d99d73d4eec117b29dae |
| MD5 | 15678102fa830d59b7a81dbd1ba9e4d6 |
| BLAKE2b-256 | 28f164efd887af4e8b99bd43cf7bb2a29387ba6a05544af7f0967d82d7e7335c |
File details
Details for the file scholar_citations-0.1.3-py3-none-any.whl.
File metadata
- Download URL: scholar_citations-0.1.3-py3-none-any.whl
- Upload date:
- Size: 15.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9fc5654dfaa7c399452c227a55130d16b8d8743eeaed097f4e14145f62bac36 |
| MD5 | 96d2828f848002156a48f1e5bad65fe4 |
| BLAKE2b-256 | 0fac23c05b87a23f4f3c3624dbaeb6fca0fa546b7f26a12c69f4ff213304103c |
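To check a downloaded file against the SHA256 digests listed for version 0.1.3, hash it in chunks and compare. A minimal sketch using the standard library's hashlib; the helper names are hypothetical.

```python
import hashlib
import os

# SHA256 digests published for the 0.1.3 release files
EXPECTED_SHA256 = {
    "scholar_citations-0.1.3.tar.gz":
        "519e1af9b0997e16b234b7609d5383742f8781ecbb65d99d73d4eec117b29dae",
    "scholar_citations-0.1.3-py3-none-any.whl":
        "a9fc5654dfaa7c399452c227a55130d16b8d8743eeaed097f4e14145f62bac36",
}


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(os.path.basename(path))
    if expected is None:
        raise ValueError(f"No published digest for {path!r}")
    return sha256_of(path) == expected
```

The match is keyed on the file name, so keep the downloaded files under their original names.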