Find Google Scholar Profiles

ai-scholar-toolbox

This Python package provides an efficient way to retrieve a scholar's statistics from Google Scholar, given the scholar's academic information.

Install package

pip install ai-scholar-toolbox

Download Browser Binary and Browser Driver

By default, our package uses the Chromium binary. Make sure that the browser binary and the browser driver are compatible with each other. If your OS is not Linux-based, or if the browser is installed somewhere other than the default directory, refer to the Selenium Chrome requirements when instantiating the browser driver.

Download:

Install:

  • Linux

    sudo apt update
    sudo apt upgrade
    wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    sudo apt install ./google-chrome-stable_current_amd64.deb
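
One way to check compatibility between the browser binary and the driver is to compare the major version numbers that the two programs report (for example from `google-chrome --version` and `chromedriver --version`). The sketch below is not part of ai-scholar-toolbox: the function names are hypothetical, the sample version strings are illustrative, and "same major version" is a rule of thumb rather than a guarantee.

```python
import re


def major_version(version_output: str) -> int:
    """Extract the major version from text such as 'Google Chrome 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+(?:\.\d+)?", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def is_compatible(browser_output: str, driver_output: str) -> bool:
    """Assumption: browser and driver are compatible when major versions match."""
    return major_version(browser_output) == major_version(driver_output)


# Illustrative strings, not output captured from a real machine.
print(is_compatible("Google Chrome 114.0.5735.90",
                    "ChromeDriver 114.0.5735.90 (abc123)"))
```

If the check returns False, download a driver build whose major version matches the installed browser before calling setup().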
    

Get Started in ai-scholar-toolbox

  1. Instantiate a ScholarSearch object. This automatically downloads the 78k dataset to the local machine.

    from ScholarSearch import ScholarSearch
    scholar_search = ScholarSearch()
    
  2. Set attributes for the class:

    # Similarity ratio used when comparing two strings during the search on the Google Scholar webpage. Defaults to 0.8 if not set.
    scholar_search.similarity_ratio = 0.8
    # Path to the browser driver.
    scholar_search.driver_path = '../../chromedriver'
    # Required: run setup after setting the attributes.
    scholar_search.setup()
    

    Optional: To get responses for a list of scholars, use the class method get_profiles() to load one or more JSON data files.

    # optional
    scholar_search.get_profiles(['../review_data/area_chair_id_to_profile.json', '../review_data/reviewer_id_to_profile.json'])
    
  3. Search candidate scholars by matching a specific query:

    To look up a scholar from OpenReview and get the related Google Scholar information, pass in a Python dictionary whose fields follow the scholar's OpenReview profile page (for instance, Zhijing's OpenReview profile). The more of the recommended fields below you fill in, the better the search result. TODO: change the dict to be of a certain person.

    # Keys used by the search, all under scholar_info_dict['profile']['content']:
    # ['gscholar']: the link to the Google Scholar profile shown on the OpenReview webpage. If there is none, either omit the key or pass an empty string.
    # ['history']: the most recent history entry of the scholar on the OpenReview webpage. Earlier entries are not needed.
    # ['relations']: the relations that the scholar lists on the OpenReview webpage. We recommend listing all of them. Only the name is needed.
    # ['expertise']: the keywords with which the scholar labels their research fields. We recommend listing all of them. Only the keywords are needed.
    
    # Most recommended:
    scholar_info_dict = {
       "profile": {
           "id": "~Zhijing_Jin1", # most important information to use
           "content": {
               "gscholar": "https://scholar.google.com/citations?user=RkI8h-wAAAAJ",
               "history": [ # second most important information to use
                   {
                       "position": "PhD student",
                       "institution": {
                           "domain": "mpg.de",
                           "name": "Max-Planck Institute"
                       }
                   }
               ],
               "relations": [
                   {
                       "name": "Bernhard Schoelkopf"
                   },
                   {
                       "name": "Rada Mihalcea"
                   },
                   {
                       "name": "Mrinmaya Sachan"
                   },
                   {
                       "name": "Ryan Cotterell"
                   }
               ],
               "expertise": [
                   {
                       "keywords": [
                         "causal inference"
                       ]
                   },
                   {
                       "keywords": [
                         "computational social science"
                       ]
                   },
                   {
                       "keywords": [
                         "social good"
                       ]
                   },
                   {
                       "keywords": [
                         "natural language processing"
                       ]
                   }
               ]
           }
       }
    }
    
    # Minimum required but least recommended:
    scholar_info_dict = {
       "profile": {
           "id": "~Zhijing_Jin1",
           "content": {}
       }
    }
    

    Then, you can pass the dictionary to the method get_scholar() to get possible candidates.

    # query: python dictionary that you just generated.
    resp = scholar_search.get_scholar(query=scholar_info_dict, simple=True, top_n=3, print_true=True)
    resp
    

    Alternatively, if you only have the name of a scholar and want possible Google Scholar candidates, pass the name directly to the method as a string:

    # query: python str, the name of the scholar.
    resp = scholar_search.get_scholar(query='Zhijing Jin', simple=True, top_n=3, print_true=True)
    resp
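
The similarity_ratio attribute set in step 2 acts as a threshold on how closely two strings must match. The package's own comparison function is not shown here, so the following is only a sketch of the idea, assuming a comparison based on difflib.SequenceMatcher from the standard library. The helper name is_similar is hypothetical.

```python
from difflib import SequenceMatcher


def is_similar(a: str, b: str, similarity_ratio: float = 0.8) -> bool:
    """Return True when the two strings are at least `similarity_ratio` alike.

    Assumption: case-insensitive comparison with SequenceMatcher; the real
    package may compare strings differently.
    """
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= similarity_ratio


# Spelling variants of one name pass the default threshold...
print(is_similar("Bernhard Schoelkopf", "Bernhard Schölkopf"))
# ...while two unrelated names do not.
print(is_similar("Zhijing Jin", "Rada Mihalcea"))
```

Under this reading, raising the ratio makes matching stricter and lowering it admits more candidates at the cost of more false matches.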
    

Search Algorithms

When the input query is a Python dictionary, the algorithm works as follows:

# Pseudocode: the helper names describe the steps and are not the package's public API.
def get_candidates(openreview_dict, top_n_related):
  if gs_sid in openreview_dict:  # the query carries a Google Scholar ID
    if gs_sid in scholar_78k:  # the ID is already in the 78k dataset
      return dict(scholar_78k.loc[scholar_78k['gs_sid'] == gs_sid])
    else:
      response = search_directly_on_google_scholar_by_gssid(gs_sid)
      return response
  else:  # no ID available: search by name and profile details
    name, email_suffix, position, organization, relations = extract_name_from_openreview_dict(openreview_dict)
    response_78k = search_scholar_on_78k(name)
    response_gs = search_scholar_on_google_scholar(name, email_suffix, position, organization, relations)
    response = select_final_candidates(response_78k, response_gs, top_n_related=top_n_related)
    return response
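
The first branch of the algorithm depends on a gs_sid being available. In the example query, the Google Scholar link carries the profile ID in its user parameter, so one plausible way to obtain it is to parse the URL. This is a sketch under that assumption. The function extract_gs_sid is hypothetical, and the package may extract the ID differently.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_gs_sid(gscholar_url: str) -> Optional[str]:
    """Return the value of the 'user' query parameter, or None if absent."""
    if not gscholar_url:
        return None  # the key was omitted or passed as an empty string
    params = parse_qs(urlparse(gscholar_url).query)
    values = params.get("user")
    return values[0] if values else None


print(extract_gs_sid("https://scholar.google.com/citations?user=RkI8h-wAAAAJ"))
# RkI8h-wAAAAJ
```

A None result corresponds to the else branch, where the search falls back to the name and the other profile fields.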

Statistics Summary

Our 78k dataset has 78,066 AI scholars in total. Please check our 78k AI scholar dataset for more details.

Given all the chairs and reviewers in OpenReview (664 in total), our package achieves 93.02% precision, 85.11% recall, and an 88.89% F1-score on a random subset of 50 scholars whose input dict does not include a gs_sid.
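
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two reported figures. The snippet uses only the numbers stated above. The function name f1_score is just for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall, in percent.
print(round(f1_score(93.02, 85.11), 2))  # 88.89
```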

FAQ

TODO: add content

Support

If you have any questions, bug reports, or feature requests regarding either the codebase or the models released in the projects section, please don't hesitate to post on our GitHub Issues page.

License

The package is licensed under the MIT license. TODO: check licenses
