A Python tool to rank Google Scholar publications by citations.
Sort Google Scholar by the Number of Citations
sortgs is a Python tool for ranking Google Scholar publications by the number of citations. It is useful for finding relevant papers in a specific field. The data acquired from Google Scholar includes Title, Citations, Links, Rank, and a new column with the number of citations per year. In the background, it first tries to fetch results using Python requests. If that fails, it falls back to Selenium to fetch the results.
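The "try requests first, then Selenium" behaviour described above is a common fallback pattern. The sketch below illustrates the pattern only; it is not taken from the sortgs source. The function fetch_with_fallback and both stand-in fetchers are hypothetical names, and the stand-ins replace real requests and Selenium calls so the example runs without network access.

```python
from typing import Callable


def fetch_with_fallback(
    url: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Return page HTML from `primary`, or from `fallback` if `primary` raises."""
    try:
        return primary(url)
    except Exception as exc:  # a real tool would likely catch narrower exceptions
        print(f"Primary fetch failed ({exc}); trying fallback")
        return fallback(url)


# Hypothetical stand-ins: the first plays the role of a plain HTTP request
# that gets blocked, the second the role of a browser-driven fetch.
def blocked_fetch(url: str) -> str:
    raise RuntimeError("blocked by robot check")


def browser_fetch(url: str) -> str:
    return "<html>results</html>"


html = fetch_with_fallback("https://example.com/search", blocked_fetch, browser_fetch)
print(html)
```

Passing the fetchers in as arguments keeps the fallback logic separate from the libraries doing the actual work, which also makes it easy to test offline.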
Try on Google Colab:
- No install requirements! Limitations: Can't handle robot checking, so use it carefully.
Installation
You can now install sortgs directly using pip:
pip install sortgs
This will install the latest version of sortgs and its dependencies.
Usage
Once installed, you can run sortgs directly from the command line:
sortgs "your keyword"
Replace "your keyword" with any keyword you'd like to search for. A CSV file with the name your_keyword.csv will be created in your current directory.
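Once the CSV exists, it can be inspected with Python's standard csv module. The sketch below is a minimal example under assumptions: the header names (Rank, Title, Citations, cit/year, Links) are inferred from the columns mentioned in this README and may not match the real file exactly, and the sample rows are made up so the example runs without a real export.

```python
import csv
import io

# Made-up rows standing in for your_keyword.csv; the header names are assumed.
SAMPLE_CSV = """Rank,Title,Citations,cit/year,Links
1,Paper A,120,12,https://example.com/a
2,Paper B,450,30,https://example.com/b
3,Paper C,75,25,https://example.com/c
"""


def load_rows(handle):
    """Read rows from an open CSV file and convert the citation count to int."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        row["Citations"] = int(row["Citations"])
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
top = max(rows, key=lambda r: r["Citations"])
print(top["Title"], top["Citations"])
```

To read a real export, open the file with open("your_keyword.csv", newline="") and pass the handle to load_rows instead of the in-memory sample.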
Misc
If this project was helpful to you in any way, feel free to buy me a cup of coffee :)
For feedback, send me an email: fernando [dot] wittmann [at] gmail [dot] com
Command Line Arguments
usage: sortgs [-h] [--sortby SORTBY] [--nresults NRESULTS] [--csvpath CSVPATH]
              [--notsavecsv] [--plotresults] [--startyear STARTYEAR]
              [--endyear ENDYEAR] [--debug] kw

positional arguments:
  kw                    Keyword to be searched. Use double quote followed by
                        simple quote for an exact keyword.
                        Example: sortgs "'exact keyword'"

optional arguments:
  -h, --help            show this help message and exit
  --sortby SORTBY       Column to be sorted by. Default is "Citations". To sort
                        by citations per year, use --sortby "cit/year"
  --nresults NRESULTS   Number of articles to search on Google Scholar. Default
                        is 100. (careful with robot checking if value is high)
  --csvpath CSVPATH     Path to save the exported csv file. Default is the
                        current folder
  --notsavecsv          By default, results are exported to a csv file. Select
                        this option to just print results but not store them
  --plotresults         Use this flag to plot results with the original rank on
                        the x-axis and the number of citations on the y-axis.
                        Default is False
  --startyear STARTYEAR
                        Start year when searching. Default is None
  --endyear ENDYEAR     End year when searching. Default is current year
  --debug               Debug mode. Used for unit testing. It will get pages
                        stored on web archive
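When sortgs is called from a script rather than typed by hand, the flags above can be assembled programmatically. The sketch below is an illustration, not part of sortgs: build_sortgs_command is a hypothetical helper, and it uses only flags listed in the usage text above. It builds the argument list without running anything.

```python
import shlex


def build_sortgs_command(keyword, sortby=None, nresults=None, csvpath=None,
                         startyear=None, endyear=None):
    """Assemble the argument list for a sortgs call from the documented flags."""
    cmd = ["sortgs", keyword]
    options = {
        "--sortby": sortby,
        "--nresults": nresults,
        "--csvpath": csvpath,
        "--startyear": startyear,
        "--endyear": endyear,
    }
    # Only include flags the caller actually set, so sortgs defaults apply otherwise.
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_sortgs_command("machine learning", sortby="cit/year",
                           startyear=2005, endyear=2015)
print(shlex.join(cmd))  # shell-quoted form, for display only
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where sortgs is installed. Passing a list avoids shell quoting problems with keywords that contain spaces or quotes.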
Examples
- Default Search:
sortgs "machine learning"
This command searches for the top 100 results related to "machine learning" and saves them as a CSV file.
- Sort by Citations per Year:
sortgs "machine learning" --sortby "cit/year"
Search for "machine learning" and sort by the number of citations per year.
- Specify Date Range:
sortgs "machine learning" --startyear 2005 --endyear 2015
Search for papers from 2005 to 2015.
- Search for an Exact Keyword:
sortgs "'machine learning'"
- Save Results in a Specific Path:
sortgs 'neural networks' --csvpath './examples/'
This will save the results under a subfolder called 'examples'.
- Multiple Keywords:
sortgs '"deep learning" OR "neural networks" OR "machine learning"' --sortby "cit/year"
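Sorting by "cit/year" can produce a very different order from sorting by raw citations, because older papers have had more time to accumulate them. The sketch below shows the idea with made-up data. The formula (citations divided by years since publication, with a minimum of one year) is an assumption; this README does not state how sortgs computes the column, and citations_per_year is a hypothetical helper.

```python
from datetime import date


def citations_per_year(citations, pub_year, current_year=None):
    """Average citations per year since publication (assumed formula)."""
    if current_year is None:
        current_year = date.today().year
    age = max(current_year - pub_year, 1)  # avoid division by zero for new papers
    return citations / age


# Made-up papers: (title, citations, publication year)
papers = [("Older paper", 1000, 2003), ("Newer paper", 400, 2021)]

by_citations = sorted(papers, key=lambda p: p[1], reverse=True)
by_rate = sorted(
    papers,
    key=lambda p: citations_per_year(p[1], p[2], current_year=2023),
    reverse=True,
)

print("By citations:", [p[0] for p in by_citations])
print("By cit/year: ", [p[0] for p in by_rate])
```

With these numbers the older paper leads on total citations (1000 vs 400), while the newer one leads on the yearly rate (200 vs 50).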
Output Example
While running, sortgs will provide updates in the terminal:
❯ sortgs "'machine learning'"
Running with the following parameters:
Keyword: 'machine learning', Number of results: 100, Save database: True, Path: /Users/wittmann/sort-google-scholar, Sort by: Citations, Plot results: False, Start year: None, End year: 2023, Debug: False
Loading next 10 results
Loading next 20 results
...
Step-by-Step Installation
- Install Python 3 and the dependencies listed under Requirements (suggestion: use Anaconda https://www.anaconda.com/distribution/)
- In the terminal (or cmd if using Windows), run
pip install sortgs
- Use the command
sortgs "your keyword"
(replace "your keyword" with any keyword that you'd like to search)
- A CSV file with the name your_keyword.csv should be created.
If those steps are too complicated for you, send me an email with a list of keywords that you'd like ranked: fernando [dot] wittmann [at] gmail [dot] com
Requirements
If you install Anaconda, all of these requirements (except Selenium) will be met:
- Python 2.7 or Python 3
- Install from the requirements file:
pip install -r requirements.txt
Highly suggested, if having problems with robot checking:
- ChromeDriver: http://chromedriver.chromium.org/
- After downloading chromedriver, rename it to chromedriver and add it to a folder accessible by the PATH (Example: your python directory. Mine is at /Users/.../anaconda/bin/)
Running Project Using Docker
This guide will walk you through the process of installing Docker, pulling the fernandowittmann/sort-google-scholar Docker image, and running the project.
Step 1: Install Docker
Windows or Mac
- Download Docker Desktop: Go to the Docker Desktop website and download the appropriate installer for your operating system.
- Install Docker Desktop: Run the installer and follow the on-screen instructions.
- Verify Installation: Open a terminal (or command prompt on Windows) and run docker --version to verify that Docker has been installed successfully.
Linux
- Update Package Index: Run sudo apt-get update to update your package index.
- Install Docker: Run sudo apt-get install docker-ce docker-ce-cli containerd.io to install Docker.
- Start Docker: Run sudo systemctl start docker to start the Docker daemon.
- Verify Installation: Run docker --version to ensure Docker is installed correctly.
Step 2: Pull the Docker Image
- Pull Image: Run the following command to pull the fernandowittmann/sort-google-scholar image from Docker Hub:
docker pull fernandowittmann/sort-google-scholar
Step 3: Run the Project
- Create a Results Directory: Create a directory on your host machine where you want the results to be saved. For example, mkdir ~/results.
- Run the Docker Container: Use the following command to run the container. This command mounts your results directory to the /results directory in the container and starts the sorting process for Google Scholar results based on your specified parameters.
docker run -v "$PWD/results:/results" -it fernandowittmann/sort-google-scholar ./sortgs.py --kw "machine learning" --sortby "cit/year" --csvpath /results
Replace $PWD/results with the absolute path to your results directory if you are not in the parent directory of results.
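If the container is launched from a script, the absolute host path can be resolved automatically instead of relying on $PWD. The sketch below is one possible approach, not part of the project: build_docker_command is a hypothetical helper that reuses the image name and arguments from the command in Step 3 and only builds the argument list.

```python
from pathlib import Path


def build_docker_command(results_dir, keyword, sortby="cit/year"):
    """Build the docker run argument list with an absolute host path for the volume."""
    # resolve() turns a relative directory such as "results" into an absolute path.
    host_path = Path(results_dir).expanduser().resolve()
    return [
        "docker", "run",
        "-v", f"{host_path}:/results",
        "-it", "fernandowittmann/sort-google-scholar",
        "./sortgs.py", "--kw", keyword,
        "--sortby", sortby,
        "--csvpath", "/results",
    ]


cmd = build_docker_command("results", "machine learning")
print(cmd)
```

The list can then be passed to subprocess.run(cmd, check=True) on a machine where Docker is installed and the image has been pulled.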
Contributing
All tests must pass before a contribution can be accepted. The tests run in DEBUG mode, which uses a URL from the web archive. By default, debug mode only works with the keywords 'machine learning'. If the URL you want to test is different from the one I already saved, please make sure to save it on the web archive first. There are 6 tests, each checking a different aspect of the results that should match when using SortGS. To run the test cases, just run:
$ python -m unittest
LICENSE
- MIT