
Graph Network Analysis for scraping Google Scholar authors.


Welcome to Scholar Network

This package is intended for people who want to scrape Google Scholar to build graph networks of Google Scholar authors and to identify network connections as opportunities for collaboration.
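Here is a minimal sketch of one common way a graph can surface collaboration opportunities, the common-neighbors heuristic: two authors who have not published together but share co-authors are suggested as a possible pair. The edges dictionary (author pair mapped to a weight) and the suggest_collaborators function are hypothetical illustrations, not the package's own API or method.

```python
from collections import defaultdict
from itertools import combinations


def suggest_collaborators(edges):
    """Rank unconnected author pairs by how many co-authors they share.

    `edges` is a hypothetical mapping of (author_a, author_b) -> weight.
    """
    # Build an adjacency set for every author
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    suggestions = []
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already collaborators, nothing to suggest
        shared = neighbors[a] & neighbors[b]
        if shared:
            suggestions.append(((a, b), len(shared)))

    # Most shared co-authors first; ties broken alphabetically
    suggestions.sort(key=lambda item: (-item[1], item[0]))
    return suggestions


# Hypothetical example data: (author_a, author_b) -> number of shared papers
edges = {
    ("Alice", "Bob"): 3,
    ("Bob", "Carol"): 2,
    ("Alice", "Dave"): 1,
    ("Carol", "Dave"): 4,
}
print(suggest_collaborators(edges))
```

Common neighbors is only one of several link-prediction heuristics; it is used here because it needs nothing beyond the edge list.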

Documentation

API reference documentation is available here.

Features

  1. Selenium-based web scraping
  2. Poetry-based dependency management
  3. Basic graph algorithms and metrics
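To give a rough idea of what basic graph metrics can look like, the sketch below computes two simple ones on a plain dictionary of edges: degree (distinct co-authors per author) and density (the share of possible author pairs that are connected). The data layout and the function names are assumptions for illustration; the package's own graph class may expose different metrics.

```python
from collections import Counter


def degree(edges):
    """Number of distinct co-authors per author.

    Assumes each unordered author pair appears once as a key of `edges`.
    """
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def density(edges):
    """Share of all possible author pairs that are connected (0.0 to 1.0)."""
    nodes = {name for pair in edges for name in pair}
    n = len(nodes)
    if n < 2:
        return 0.0
    possible_pairs = n * (n - 1) / 2
    return len(edges) / possible_pairs


# Hypothetical example data: (author_a, author_b) -> number of shared papers
edges = {
    ("Alice", "Bob"): 3,
    ("Bob", "Carol"): 2,
    ("Alice", "Carol"): 1,
    ("Carol", "Dave"): 4,
}
print(degree(edges).most_common())
print(round(density(edges), 3))
```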

Requirements

  • A Selenium web driver:
    • Chrome
      • brew install --cask chromedriver
    • Firefox
      • brew install geckodriver
    • Safari
      • Included with Safari 10 and later
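Before scraping, it can help to confirm that the driver executable is actually on your PATH. This standard-library sketch assumes the usual executable names (chromedriver, geckodriver, safaridriver); the available_drivers helper is hypothetical and is not provided by the package.

```python
import shutil

# Assumed executable names for each browser's WebDriver
DRIVER_EXECUTABLES = {
    "chrome": "chromedriver",
    "firefox": "geckodriver",
    "safari": "safaridriver",
}


def available_drivers():
    """Map each browser name to whether its driver executable is on PATH."""
    return {
        browser: shutil.which(executable) is not None
        for browser, executable in DRIVER_EXECUTABLES.items()
    }


if __name__ == "__main__":
    for browser, found in available_drivers().items():
        print(f"{browser:8} {'found' if found else 'missing'}")
```

On macOS, Safari's driver also has to be enabled once with safaridriver --enable before Selenium can use it.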

ToDo:

  • Write tests

Usage

To get started, clone the repository, install the dependencies, and activate the Poetry environment.

git clone https://github.com/UK-IPOP/scholar-network.git
cd scholar-network
poetry install --no-dev
poetry shell

Then start hacking! 😃

Examples

You must know each author's Google Scholar ID for this package to work.
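If you only have a profile link, the ID is the value of the user query parameter in the profile URL. Here is a minimal standard-library sketch. The example URL simply places the ID used below into the usual profile URL form, and scholar_id_from_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def scholar_id_from_url(url):
    """Return the `user` query parameter from a Google Scholar profile URL."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("user")
    if not ids:
        raise ValueError(f"no 'user' parameter in {url!r}")
    return ids[0]


print(scholar_id_from_url(
    "https://scholar.google.com/citations?user=ZmwzVQUAAAAJ&hl=en"
))  # ZmwzVQUAAAAJ
```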

Scraping one author (my wife, for example):

>>> import scholar_network as sn
>>> sn.scrape_single_author(scholar_id='ZmwzVQUAAAAJ', scholar_name='Michelle Duong')

The author's data is then saved to the data/scraped.json file.
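To inspect the scraped data yourself, you can load the file with the standard json module. The file's internal structure is not documented here, so this sketch only loads it and assumes nothing about its keys. load_scraped is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def load_scraped(path="data/scraped.json"):
    """Load the scraper's JSON output; no particular schema is assumed."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run a scrape first")
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

After a scrape, data = load_scraped() returns whatever JSON structure the scraper wrote.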

By default this uses the Safari web driver. You can also choose the driver explicitly with the driver argument; for example, to use Chrome:

>>> import scholar_network as sn
>>> sn.scrape_single_author(scholar_id='ZmwzVQUAAAAJ', scholar_name='Michelle Duong', driver='chrome')

Creating a graph from the new data is easy:

>>> g = sn.build_graph()
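Conceptually, a co-authorship graph can be built by counting how often each pair of authors appears on the same publication. The sketch below shows that counting step, assuming each publication is represented as a list of author names. build_coauthor_edges is hypothetical, and the package's build_graph may read and weight the data differently.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_edges(publications):
    """Count how many publications each pair of authors shares.

    `publications` is a hypothetical list of author-name lists, one per paper.
    """
    edges = Counter()
    for authors in publications:
        # Deduplicate and sort so ("A", "B") and ("B", "A") are the same edge
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


# Hypothetical example publications
papers = [
    ["Alice", "Bob", "Carol"],
    ["Bob", "Alice"],
    ["Carol", "Dave"],
]
print(build_coauthor_edges(papers))
```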

Then, to see the five most common connections:

>>> g.edge_rank(limit=5)
[(('David Burgess', 'Donna Burgess'), 64),
 (('Ashley Martinez', 'Daniela Moga'), 64),
 (('Daniela Moga', 'Erin Abner'), 62),
 (('Donna Burgess', 'Katie Wallace'), 62),
 (('Chang-Guo Zhan', 'Fang Zheng'), 60)]
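The ranking above amounts to sorting edges by weight and keeping the top entries. Here is a minimal sketch over a hypothetical pair-to-weight dictionary; this edge_rank is a stand-in written for illustration, not the package's implementation.

```python
def edge_rank(edges, limit=5):
    """Return the `limit` heaviest edges as ((author_a, author_b), weight)."""
    # Highest weight first; ties broken alphabetically by author pair
    ranked = sorted(edges.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical example data: (author_a, author_b) -> number of shared papers
sample = {("A", "B"): 3, ("A", "C"): 5, ("B", "C"): 3, ("C", "D"): 1}
print(edge_rank(sample, limit=2))
```

This sketch breaks ties alphabetically. The first two tied pairs in the output above are not in alphabetical order, so the package evidently orders ties differently.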


Download files

Source Distribution

scholar-network-0.2.6.tar.gz (7.3 kB)

Uploaded Source

Built Distribution

scholar_network-0.2.6-py3-none-any.whl (9.1 kB)

Uploaded Python 3

File details

Details for the file scholar-network-0.2.6.tar.gz.

File metadata

  • Download URL: scholar-network-0.2.6.tar.gz
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.10.5 Darwin/21.5.0

File hashes

Hashes for scholar-network-0.2.6.tar.gz
Algorithm Hash digest
SHA256 257aad73c12a7c92272803cfa89b1c6f7ab8def91eaf73c72679eab039424270
MD5 a5fa91001fd9c309269dd79626e97b15
BLAKE2b-256 76a40bcf38a036746014d46c10d294c845e59a1a8a30887c0e0f628b492b4159

File details

Details for the file scholar_network-0.2.6-py3-none-any.whl.

File metadata

  • Download URL: scholar_network-0.2.6-py3-none-any.whl
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.10.5 Darwin/21.5.0

File hashes

Hashes for scholar_network-0.2.6-py3-none-any.whl
Algorithm Hash digest
SHA256 248716c1af10f679e89b8cd8335811d3712e293f3a976c134a55314355d3cf5d
MD5 774ab293198ad726add5d88acc6b79da
BLAKE2b-256 b10e1ae9b4934d3c3e1dea1aa8a32e973b54a5f0f86f922ebf3c3225841f40b8
