Graph Network Analysis for scraping Google Scholar authors.
Welcome to Scholar Network
This package is intended for people who want to scrape Google Scholar, build graph networks of Google Scholar authors, and identify network connections that may be opportunities for collaboration.
Documentation
API Reference Documentation available here
Features
- Selenium-based web scraping
- Poetry-based dependency management
- Basic graph algorithms and metrics
Requirements
- A Selenium web driver
  - Chrome: brew install --cask chromedriver
  - Firefox: brew install geckodriver
  - Safari: comes included in Safari 10+
- Chrome
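Before scraping, it can help to confirm that a driver is actually reachable. The following is a minimal sketch, not part of this package, that looks up the driver executables on your PATH. The executable names chromedriver and geckodriver come from the install commands above; safaridriver is the name of the driver that ships with Safari. The dictionary keys and the function name find_drivers are hypothetical labels chosen for this example.

```python
import shutil

# Executable names for each browser's driver. The keys are labels used only
# in this sketch; they are not taken from the scholar_network API.
DRIVER_EXECUTABLES = {
    "chrome": "chromedriver",
    "firefox": "geckodriver",
    "safari": "safaridriver",
}


def find_drivers(executables=DRIVER_EXECUTABLES):
    """Return {label: path or None} for each driver executable on PATH."""
    return {label: shutil.which(exe) for label, exe in executables.items()}


if __name__ == "__main__":
    for label, path in find_drivers().items():
        print(f"{label}: {path or 'not found'}")
```

A path printed next to a label means Selenium should be able to locate that driver; "not found" suggests the install step for that browser is still needed.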
ToDo:
- Write tests
Usage
To get started, clone the repo and activate the Poetry environment.
git clone https://github.com/UK-IPOP/scholar-network.git
cd scholar-network
poetry install --no-dev
poetry shell
Then start hacking! 😃
Examples
You must know each author's Google Scholar ID for this package to work.
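If you only have an author's profile URL, the ID is the value of the user query parameter in that URL. The sketch below is a small standalone helper, not part of this package, that pulls it out with the standard library. The function name scholar_id_from_url is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def scholar_id_from_url(profile_url):
    """Return the `user` query parameter of a profile URL, or None if absent."""
    query = parse_qs(urlparse(profile_url).query)
    values = query.get("user")
    return values[0] if values else None


url = "https://scholar.google.com/citations?user=ZmwzVQUAAAAJ&hl=en"
print(scholar_id_from_url(url))  # prints ZmwzVQUAAAAJ
```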
Scraping one author (my wife, for example):
>>> import scholar_network as sn
>>> sn.scrape_single_author(scholar_id='ZmwzVQUAAAAJ', scholar_name='Michelle Duong')
The data for the author will then be in your data/scraped.json file.
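To take a first look at the scraped data, you can load the file with the standard json module. This is a minimal sketch that deliberately assumes nothing about the file's internal layout, since the schema is not described here; it only reports the top-level shape. The helper names load_scraped and summarize are hypothetical.

```python
import json
from pathlib import Path


def load_scraped(path="data/scraped.json"):
    """Parse the scraped JSON file and return whatever it contains."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def summarize(data):
    """Describe the top-level shape of the data without assuming a schema."""
    if isinstance(data, dict):
        return f"dict with {len(data)} keys"
    if isinstance(data, list):
        return f"list with {len(data)} items"
    return type(data).__name__


if __name__ == "__main__" and Path("data/scraped.json").exists():
    print(summarize(load_scraped()))
```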
This defaults to the Safari web driver, which we could also have specified explicitly. Alternatively, we can request the Chrome web driver:
>>> import scholar_network as sn
>>> sn.scrape_single_author(scholar_id='ZmwzVQUAAAAJ', scholar_name='Michelle Duong', driver='chrome')
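To scrape several authors, one option is a small loop around the call shown above. The sketch below is an assumption-laden illustration, not part of this package: scrape_authors and pause_seconds are hypothetical names, and the scraping function is passed in as an argument so the loop can be tried without a browser. Whether repeated calls append to or overwrite data/scraped.json is not stated here, so check the file after the first two authors.

```python
import time


def scrape_authors(authors, scrape, driver="chrome", pause_seconds=2.0):
    """Call `scrape` once per (scholar_id, scholar_name) pair.

    `scrape` is expected to accept the same keyword arguments as the
    sn.scrape_single_author calls in the examples above.
    """
    done = []
    for scholar_id, scholar_name in authors:
        scrape(scholar_id=scholar_id, scholar_name=scholar_name, driver=driver)
        done.append(scholar_id)
        time.sleep(pause_seconds)  # pause between requests (arbitrary value)
    return done


# Usage (requires the package and a web driver):
# import scholar_network as sn
# scrape_authors([("ZmwzVQUAAAAJ", "Michelle Duong")], scrape=sn.scrape_single_author)
```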
Creating a graph from this new data is easy:
>>> g = sn.build_graph()
Then, to see the five most common connections:
>>> g.edge_rank(limit=5)
[(('David Burgess', 'Donna Burgess'), 64),
 (('Ashley Martinez', 'Daniela Moga'), 64),
 (('Daniela Moga', 'Erin Abner'), 62),
 (('Donna Burgess', 'Katie Wallace'), 62),
 (('Chang-Guo Zhan', 'Fang Zheng'), 60)]
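For intuition about what a ranking like this measures, here is a minimal standalone sketch of counting co-author pairs. It is not this package's implementation: it assumes each publication is simply a list of author names, and the function name rank_edges and the sample data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def rank_edges(publications, limit=5):
    """Count how often each pair of authors appears on the same publication."""
    counts = Counter()
    for authors in publications:
        # Sort so ('A', 'B') and ('B', 'A') count as the same undirected edge.
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return counts.most_common(limit)


# Made-up sample data: each inner list is one publication's author list.
papers = [
    ["Ana", "Ben", "Cy"],
    ["Ana", "Ben"],
    ["Ana", "Ben"],
    ["Ben", "Cy"],
]
print(rank_edges(papers, limit=2))
# prints [(('Ana', 'Ben'), 3), (('Ben', 'Cy'), 2)]
```

Each result is a pair of authors followed by the number of publications they share, which matches the shape of the output shown above.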