
A Python tool to rank Google Scholar publications by citations.

Project description

Sort Google Scholar by the Number of Citations


sortgs is a Python tool for ranking Google Scholar publications by the number of citations. It is useful for finding relevant papers in a specific field. The data acquired from Google Scholar includes Title, Citations, Links, and Rank, and sortgs adds a new column with the number of citations per year. In the background, it first tries to fetch results using Python requests; if that fails, it falls back to Selenium.
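The requests-then-Selenium fallback described above is a common scraping pattern. Below is a minimal sketch of that control flow, not sortgs's actual code: `fetch_with_fallback` and the fetcher functions are hypothetical, and they stand in for a `requests` call and a Selenium session so the logic can run without network access.

```python
from typing import Callable

# A fetcher takes a URL and returns the page HTML, raising an exception on failure.
Fetcher = Callable[[str], str]


def fetch_with_fallback(url: str, primary: Fetcher, fallback: Fetcher) -> str:
    """Try the lightweight fetcher first and use the heavier one only if it raises.

    In sortgs terms, `primary` stands in for a plain HTTP request and
    `fallback` for a Selenium-driven browser; both are hypothetical here.
    """
    try:
        return primary(url)
    except Exception as exc:  # e.g. an HTTP error or a detected robot check
        print(f"Primary fetch failed ({exc!r}); trying the fallback")
        return fallback(url)
```

In practice the primary fetcher could wrap `requests.get(url, timeout=10)` followed by `raise_for_status()`, so that HTTP errors trigger the fallback.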

🚀 Run it on Google Colab

  • No-Code Version (new!): No coding required! Perfect for a quick start!
  • Code Version: For developers who want full control of what's behind the scenes! 💻

💡 All you need is a Google Account to get started.
⚠️ Note: Google Scholar may start showing CAPTCHA checks and block access after too many repeated requests, so proceed mindfully!

📚 Colab No-Code Instructions

https://github.com/user-attachments/assets/25de7bad-2a5d-4bcf-b486-faa1d7a29eb3

Installation

You can install sortgs directly using pip:

pip install sortgs

This will install the latest version of sortgs and its dependencies.
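To confirm from Python that the package is installed and see which version pip picked, the standard library's `importlib.metadata` can be queried. `installed_version` is a hypothetical helper for illustration; `pip show sortgs` gives the same information from the terminal.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "sortgs") -> Optional[str]:
    """Return the installed version string of `dist`, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None
```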

Usage

Once installed, you can run sortgs directly from the command line:

sortgs "your keyword"

Replace "your keyword" with any keyword you'd like to search for. A CSV file with the name your_keyword.csv will be created in your current directory.
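The exported CSV can be loaded back for further filtering. A minimal sketch using only the standard library, assuming the file has `Title` and `Citations` header columns (the column names listed in the description above; the exact headers and number formatting in the file are assumptions). `top_papers` is a hypothetical helper.

```python
import csv
from pathlib import Path


def top_papers(csv_path: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most-cited (title, citations) pairs from a sortgs CSV.

    Assumes "Title" and "Citations" columns; other columns (Links, Rank, ...)
    are ignored. Empty citation cells are counted as 0.
    """
    rows = []
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            raw = (row.get("Citations") or "").strip()
            # float() first so values written as "12.0" are accepted too
            rows.append((row.get("Title", ""), int(float(raw or 0))))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]
```

For example, `top_papers("your_keyword.csv")` returns the five most-cited entries from the file created by the command above.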

Misc

For feedback, send me an email: fernando [dot] wittmann [at] gmail [dot] com

Command Line Arguments

usage: sortgs [-h] [--sortby SORTBY] [--langfilter LANGFILTER [LANGFILTER ...]]
              [--nresults NRESULTS] [--csvpath CSVPATH] [--notsavecsv]
              [--plotresults] [--startyear STARTYEAR] [--endyear ENDYEAR]
              [--debug] kw

positional arguments:
  kw                    Keyword to be searched. Use double quotes around single
                        quotes for an exact keyword.
                        Example: sortgs "'exact keyword'"

optional arguments:
  -h, --help            show this help message and exit
  --sortby SORTBY       Column to be sorted by. Default is "Citations". To sort
                        by citations per year, use --sortby "cit/year"
  --langfilter LANGFILTER [LANGFILTER ...]
                        Only languages listed are permitted to pass the filter. 
                        List of supported language codes: zh-CN, zh-TW, nl, en, fr,
                        de, it, ja, ko, pl, pt, es, tr
  --nresults NRESULTS   Number of articles to search on Google Scholar. Default
                        is 100. (careful with robot checking if value is high)
  --csvpath CSVPATH     Path to save the exported csv file. Default is the 
                        current folder
  --notsavecsv          By default, results are exported to a csv file. Select
                        this option to just print results but not store them
  --plotresults         Use this flag to plot results with the original rank on
                        the x-axis and the number of citations on the y-axis.
                        Default is False
  --startyear STARTYEAR
                        Start year when searching. Default is None
  --endyear ENDYEAR     End year when searching. Default is current year
  --debug               Debug mode. Used for unit testing. It will get pages
                        stored on web archive
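When sortgs is driven from another Python script, the flags above can be assembled into an argument list and run with `subprocess`. A minimal sketch that uses only the documented flags; `build_sortgs_command` and `run_sortgs` are hypothetical helpers, and running them assumes the `sortgs` command is on your PATH.

```python
import subprocess
from typing import Optional, Sequence


def build_sortgs_command(
    kw: str,
    sortby: Optional[str] = None,
    nresults: Optional[int] = None,
    csvpath: Optional[str] = None,
    startyear: Optional[int] = None,
    endyear: Optional[int] = None,
    langfilter: Sequence[str] = (),
    notsavecsv: bool = False,
) -> list[str]:
    """Assemble an argv list for the sortgs CLI using only the documented flags."""
    cmd = ["sortgs", kw]  # positional keyword first, so --langfilter cannot swallow it
    if sortby is not None:
        cmd += ["--sortby", sortby]
    if nresults is not None:
        cmd += ["--nresults", str(nresults)]
    if csvpath is not None:
        cmd += ["--csvpath", csvpath]
    if startyear is not None:
        cmd += ["--startyear", str(startyear)]
    if endyear is not None:
        cmd += ["--endyear", str(endyear)]
    if langfilter:
        cmd += ["--langfilter", *langfilter]
    if notsavecsv:
        cmd.append("--notsavecsv")
    return cmd


def run_sortgs(**kwargs) -> int:
    """Run sortgs as a subprocess and return its exit code."""
    return subprocess.run(build_sortgs_command(**kwargs), check=False).returncode
```

For example, `run_sortgs(kw="'machine learning'", sortby="cit/year")` corresponds to `sortgs "'machine learning'" --sortby "cit/year"`. Because the arguments are passed as a list rather than through a shell, the outer double quotes are not needed.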

Examples

  1. Default Search:

    sortgs "machine learning"
    

    This command searches for the top 100 results related to "machine learning" and saves them as a CSV file.

  2. Sort by Citations per Year:

    sortgs "machine learning" --sortby "cit/year"
    

    Search for "machine learning" and sort by the number of citations per year.

  3. Specify Date Range:

    sortgs "machine learning" --startyear 2005 --endyear 2015
    

    Search for papers from 2005 to 2015.

  4. Search for an Exact Keyword:

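    sortgs "'machine learning'"
    

    Wrapping the keyword in single quotes inside double quotes searches for the exact phrase "machine learning".
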
  5. Save Results in a Specific Path:

    sortgs 'neural networks' --csvpath './examples/'
    

    This will save the results under a subfolder called 'examples'.

  6. Multiple Keywords:

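    sortgs '"deep learning" OR "neural networks" OR "machine learning"' --sortby "cit/year"
    

    Search for papers matching any of the three exact phrases and sort them by the number of citations per year.
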
  7. Language Filter:

    sortgs "machine learning" --langfilter pt es fr de
    

    This will only include articles in Portuguese, Spanish, French, and German.
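Examples 2 and 6 sort by `cit/year`. The exact formula sortgs uses is not spelled out on this page; the sketch below assumes citations divided by the number of years since publication, counting the publication year itself, measured against a reference year such as the end year. `Paper`, `citations_per_year`, and `rank_by_cit_per_year` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Paper:
    title: str
    citations: int
    year: Optional[int]  # publication year; may be missing in scraped data


def citations_per_year(p: Paper, ref_year: int) -> float:
    """Citations divided by years since publication, counting the first year.

    Assumption: papers with an unknown year are treated as published in
    ref_year, so their citation count is divided by 1.
    """
    year = p.year if p.year is not None else ref_year
    age = max(ref_year - year + 1, 1)  # guard against years after ref_year
    return p.citations / age


def rank_by_cit_per_year(papers: list[Paper], ref_year: Optional[int] = None) -> list[Paper]:
    """Sort papers by citations per year, highest first (ref year defaults to now)."""
    ref = ref_year if ref_year is not None else date.today().year
    return sorted(papers, key=lambda p: citations_per_year(p, ref), reverse=True)
```

With a reference year of 2023, a paper from 2000 with 1000 citations scores about 41.7 per year, while a paper from 2020 with 300 citations scores 75, so the newer paper ranks first. This is why sorting by citations per year tends to surface recent work.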

Output Example

While running, sortgs will provide updates in the terminal:

❯ sortgs "'machine learning'"
Running with the following parameters:
Keyword: 'machine learning', Number of results: 100, Save database: True, Path: /Users/wittmann/sort-google-scholar, Sort by: Citations, Plot results: False, Start year: None, End year: 2023, Debug: False
Loading next 10 results
Loading next 20 results
...
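The "Loading next 10 results" lines reflect Google Scholar's paging, where results arrive ten per page. A sketch of how the page URLs for a keyword and year range could be built, assuming Google Scholar's public `q`, `start`, `as_ylo`, and `as_yhi` query parameters; sortgs's own URL construction may differ, and `result_page_urls` is a hypothetical helper.

```python
from typing import Iterator, Optional
from urllib.parse import urlencode

BASE = "https://scholar.google.com/scholar"
PAGE_SIZE = 10  # Google Scholar shows 10 results per page by default


def result_page_urls(
    keyword: str,
    nresults: int = 100,
    startyear: Optional[int] = None,
    endyear: Optional[int] = None,
) -> Iterator[str]:
    """Yield one Google Scholar search URL per page of results."""
    for start in range(0, nresults, PAGE_SIZE):
        params = {"q": keyword, "start": start}
        if startyear is not None:
            params["as_ylo"] = startyear  # lower bound of the year range
        if endyear is not None:
            params["as_yhi"] = endyear  # upper bound of the year range
        yield f"{BASE}?{urlencode(params)}"
```

Fetching 100 results this way means ten separate requests, which is why a high `--nresults` value makes a robot check more likely.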

Step-by-Step Installation

  1. Install Python 3 and the required dependencies (suggestion: use Anaconda https://www.anaconda.com/distribution/)
  2. In the terminal (or cmd if using Windows), run pip install sortgs
  3. Use the command sortgs "your keyword" (replace "your keyword" with any keyword you'd like to search for)
  4. A CSV file with the name your_keyword.csv should be created.

If those steps are too complicated for you, send me an email with a list of keywords you'd like ranked: fernando [dot] wittmann [at] gmail [dot] com

Conda Environment Setup

Creating the Environment

conda env create -f conda_environment.yml

Reset the environment

conda deactivate
conda remove --name sortgs --all
conda env create -f environment.yml

Activate the environment

conda activate sortgs

Running Project Using Docker

This guide will walk you through the process of installing Docker, pulling the fernandowittmann/sort-google-scholar Docker image, and running the project.

Step 1: Install Docker

Windows or Mac

  1. Download Docker Desktop: Go to the Docker Desktop website and download the appropriate installer for your operating system.
  2. Install Docker Desktop: Run the installer and follow the on-screen instructions.
  3. Verify Installation: Open a terminal (or command prompt on Windows) and run docker --version to verify that Docker has been installed successfully.

Linux

  1. Update Package Index: Run sudo apt-get update to update your package index.
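  2. Install Docker: Run sudo apt-get install docker-ce docker-ce-cli containerd.io to install Docker. These packages come from Docker's own apt repository, so set up that repository first by following Docker's installation guide for your distribution.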
  3. Start Docker: Run sudo systemctl start docker to start the Docker daemon.
  4. Verify Installation: Run docker --version to ensure Docker is installed correctly.

Step 2: Pull the Docker Image

  1. Pull Image: Run the following command to pull the fernandowittmann/sort-google-scholar image from Docker Hub:

    docker pull fernandowittmann/sort-google-scholar
    

Step 3: Run the Project

  1. Create a Results Directory: Create a directory on your host machine where you want the results to be saved. For example, mkdir ~/results.

  2. Run the Docker Container: Use the following command to run the container. This command mounts your results directory to the /results directory in the container and starts the sorting process for Google Scholar results based on your specified parameters.

    docker run -v "$PWD/results:/results" -it fernandowittmann/sort-google-scholar ./sortgs.py --kw "machine learning" --sortby "cit/year" --csvpath /results
    

    Replace $PWD/results with the absolute path to your results directory if you are not in the parent directory of results.
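The paragraph above asks for an absolute host path in the -v mount. A minimal Python sketch that builds the same docker run command as a list, resolving `~` and relative paths with `os.path` first; `docker_run_command` is a hypothetical helper, and the container arguments are copied from the command above.

```python
import os


def docker_run_command(results_dir: str, keyword: str, sortby: str = "cit/year") -> list[str]:
    """Build the docker run argv from this guide with an absolute host path.

    `~` is expanded and relative paths are resolved against the current
    directory, so the host side of the -v mount is always absolute.
    """
    host_dir = os.path.abspath(os.path.expanduser(results_dir))
    return [
        "docker", "run", "-v", f"{host_dir}:/results", "-it",
        "fernandowittmann/sort-google-scholar",
        "./sortgs.py", "--kw", keyword, "--sortby", sortby, "--csvpath", "/results",
    ]
```

For example, `shlex.join(docker_run_command("~/results", "machine learning"))` produces a shell-ready command string that can be copied into a terminal.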

Contributing

Just run:

python -m unittest

Then check that all tests pass. Alternatively, send a PR and GitHub Actions will run the tests for you.

About Robot Check

Google Scholar may start showing CAPTCHA checks and block access after too many repeated requests. If this issue arises, Selenium will be used to attempt to fetch the results, and you might be asked to solve a CAPTCHA manually. Ideally, you should use a VPN to avoid this issue. When using Selenium, you might need to install ChromeDriver: download it from https://developer.chrome.com/docs/chromedriver/downloads and add it to your PATH.
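When scraping, it also helps to recognise a robot check early and slow down rather than keep retrying. The sketch below is a crude heuristic, not part of sortgs: the marker strings in `BLOCK_MARKERS` are assumptions based on the wording of Google's block pages, the backoff parameters are arbitrary, and all function names are hypothetical.

```python
import random
import time

# Assumed phrases from Google's block / reCAPTCHA pages; not an official list.
BLOCK_MARKERS = ("unusual traffic", "not a robot")


def looks_blocked(html: str) -> bool:
    """Return True if a fetched page appears to be a robot check rather than results."""
    text = html.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def backoff_delays(retries: int, base: float = 5.0, cap: float = 120.0) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def polite_sleep(delay: float) -> None:
    """Sleep for `delay` plus up to 20% random jitter so requests are not evenly spaced."""
    time.sleep(delay * (1 + random.uniform(0, 0.2)))
```

A scraper would call `looks_blocked` on each page and, when it returns True, wait through `backoff_delays(...)` with `polite_sleep` before retrying or handing over to a browser where the CAPTCHA can be solved manually.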

LICENSE

  • MIT

Support My Work

If you find this project useful, consider supporting me:

Buy Me a Coffee



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sortgs-1.0.6.tar.gz (13.1 kB)

Uploaded Source

Built Distribution

sortgs-1.0.6-py3-none-any.whl (9.8 kB)

Uploaded Python 3

File details

Details for the file sortgs-1.0.6.tar.gz.

File metadata

  • Download URL: sortgs-1.0.6.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for sortgs-1.0.6.tar.gz
Algorithm Hash digest
SHA256 6e9bc09c8c28b0fad30f0e5cc02af3e97a62dc254784f801fc9aaafcf18e1054
MD5 1314eaeaebc792d6d8f6a459fe57ca75
BLAKE2b-256 0b9f32985d10bef11d2bcd95be649a00adbf52633c1733150553a1d62d0da971

See more details on using hashes here.
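To check a downloaded file against the SHA256 digest listed above, Python's `hashlib` is enough. A minimal sketch; `sha256_of` and `verify` are hypothetical helper names, and the expected digest is the one published for sortgs-1.0.6.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 published above for the sortgs 1.0.6 source distribution.
EXPECTED_SHA256 = "6e9bc09c8c28b0fad30f0e5cc02af3e97a62dc254784f801fc9aaafcf18e1054"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

`verify("sortgs-1.0.6.tar.gz")` returns True only if the file matches byte for byte. pip can enforce the same check at install time through its hash-checking mode (a `--hash=sha256:...` option on the requirement line).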

File details

Details for the file sortgs-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: sortgs-1.0.6-py3-none-any.whl
  • Upload date:
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for sortgs-1.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 fdf9c437d714c69cc1b6cfa16af06dbfc7013abfb44727afbfcd09caff9c4016
MD5 ef4621e049d7c37bf665f0a3bed505b3
BLAKE2b-256 fddb752c43b7eb4ebdb91d9b2fbcaa453bdfa6ebb44d3acfff39fc734d774c36

