Find Google Scholar Profiles
Project description
ai-scholar-toolbox
The python package provides an efficient way to get statistics of a scholar on Google Scholar given academic information of the scholar.
Install package
pip install ai-scholar-toolbox
Download Browser Binary and Browser Driver
By default, our package uses Chromium binary file. Please take care the compatibility between the binary file and the browser driver. Also, if your OS is not linux based or you install the browser in a directory other than default directory, please refer to Selenium Chrome requirements when instantiating the browser driver.
Download:
Install:
-
Linux
sudo apt update sudo apt upgrade wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb sudo apt install ./google-chrome-stable_current_amd64.deb
Get Started in ai-scholar-toolbox
-
Instantiate a
ScholarSearch
object. This will download 78k dataset to the local machine automatically.from ScholarSearch import ScholarSearch scholar_search = ScholarSearch()
-
Set attributes for the class:
# set the similarity ratio of comparing two strings when searching on Google Scholar webpage. If not given, default is 0.8. scholar_search.similarity_ratio = 0.8 # set the path of browser driver. scholar_search.driver_path = '../../chromedriver' # required: setup scholar_search.setup()
Optional: In case that you want to get responses of a list of scholars, the class method
get_profiles()
is implemented for you to load (could be multiple) json data files.# optional scholar_search.get_profiles(['../review_data/area_chair_id_to_profile.json', '../review_data/reviewer_id_to_profile.json'])
-
Search candidate scholars by matching a specific query:
If you want to input the information of a scholar on OpenReview and get related google scholar information, you can pass in a python dictionary with necessary features based on the OpenReview scholar profile page (for instance, from Zhijing's OpenReview profile). Note that filling in more information as recommended below will get a better search result. TODO: change the dict to be of a certain person.
# keys that are required: # scholar_info_dict['content']['gscholar']: the link to Google Scholar profile in the OpenReview webpage. If cannot be found, you can either choose not to include it or pass in an empty string. # scholar_info_dict['content']['history']: the most updated history of the scholar in the OpenReview webpage. Previous history is not needed. # scholar_info_dict['content']['relations']: all relations that the scholar list in the OpenReview webpage. We recommend to list all the relations here. Only name is needed. # scholar_info_dict['content']['expertise']: all keywords that the scholar label their academic research field. We recommend to list all the expertise keywords here. Only keyword is needed. # Most recommended: scholar_info_dict = { "profile": { "id": "~Zhijing_Jin1", # most important information to use "content": { "gscholar": "https://scholar.google.com/citations?user=RkI8h-wAAAAJ", "history": [ # second most important information to use { "position": "PhD student", "institution": { "domain": "mpg.de", "name": "Max-Planck Institute" } } ], "relations": [ { "name": "Bernhard Schoelkopf" }, { "name": "Rada Mihalcea" }, { "name": "Mrinmaya Sachan" }, { "name": "Ryan Cotterell" } ], "expertise": [ { "keywords": [ "causal inference" ] }, { "keywords": [ "computational social science" ] }, { "keywords": [ "social good" ] }, { "keywords": [ "natural language processing" ] } ] } } } # Minimum required but least recommended: scholar_info_dict = { "profile": { "id": "~Zhijing_Jin1", "content": {} } }
Then, you can pass the dictionary to the method
get_scholar()
to get possible candidates.# query: python dictionary that you just generated. resp = scholar_search.get_scholar(query=scholar_info_dict, simple=True, top_n=3, print_true=True) resp
Alternatively, if you just want to input the name of a scholar and get possible google scholar candidates, you can pass the name as a string directly to the function as the following:
# query: python str, the name of the scholar. resp = scholar_search.get_scholar(query='Zhijing Jin', simple=True, top_n=3, print_true=True) resp
Search Algorithms
The algorithm can be explained as follows if the input query is a python dictionary:
def get_candidates(openreview_dict, top_n_related):
if gs_sid in openreview_dict:
if gs_sid in 78k_scholar:
return dict(78k_scholar.loc[78k_scholar[‘gs_sid’]==gs_sid])
else:
response = search_directly_on_google_scholar_by_gssid(gs_sid)
return response
else:
name, email_suffix, position, organization, relations = extract_name_from_openreview_dict(openreview_dict)
response_78k = search_scholar_on_78k(name)
response_gs = search_scholar_on_google_scholar(name, email_suffix, position, organization, relations)
response = select_final_candidates(response_78k, response_gs, top_n_related = top_n_related)
return Response
Statistics Summary
Our 78k dataset has 78,066 AI scholars in total. Please check our 78k AI scholar dataset for more details.
Given all the chairs and reviewers in OpenReview (664 in total), our package achieves 93.02% precision, 85.11% recall, and 88.89% F1-score on a random subset of 50 scholars that don't have gs_sid
included in the input dict.
FAQ
TODO: add content
Support
If you have any questions, bug reports, or feature requests regarding either the codebase or the models released in the projects section, please don't hesitate to post on our Github Issues page.
License
The package is licensed under the MIT license. TODO: check licenses
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file ai_scholar_toolbox-0.0.1.tar.gz
.
File metadata
- Download URL: ai_scholar_toolbox-0.0.1.tar.gz
- Upload date:
- Size: 15.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.7.13
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a3d5bfbe626208d6d1b3efccd9442c9b8e85dad06aded7c001eb0d4e03c4f928 |
|
MD5 | 11e57cc3f2e326d0bff801391c042db2 |
|
BLAKE2b-256 | 418c9b3888fb451495969c8b1b1d71eda8182af93029b6101c2c2d58ce53e4a8 |
File details
Details for the file ai_scholar_toolbox-0.0.1-py3-none-any.whl
.
File metadata
- Download URL: ai_scholar_toolbox-0.0.1-py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.7.13
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 08708c7dafca60fd47c74af0622c026b2a294dba6738f8af737373b123e6dfd1 |
|
MD5 | 5fa170b7a5e2996b0df50716c808fe3a |
|
BLAKE2b-256 | f14b71b5a30fbe558bf3fc9e34b799c5b09913eb9890afdd5b7c4fa792476e35 |