A web crawler library that fetches and parses data from the Boston College Agora Portal
Project description
pygora
A web crawler library that fetches and parses data from the BC Agora Portal.
Getting started (Python 3):
pip install pygora-phchcc
Examples
Log in to Agora, then download and print links to all subject pages
from pygora import *
session, gen_time = get_session("myAgoraUsername", "myAgoraPassword", check_valid=True)
# if gen_time == 0, something went wrong (for example, the username or password may be incorrect)
print(gen_time)
subjects = download_subjects(session, simple=True)  # simple=True: each subject is a string
for i, line in enumerate(subjects):
    print(i, line)
# subjects = download_subjects(session)  # each subject is a dict with more information
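`get_session` reports a failed login by returning `gen_time == 0`. A script that cannot do anything useful without a session may want to stop right away instead of continuing. The helper below is a minimal sketch of that check. `require_session` is a hypothetical name and is not part of pygora. The sketch only assumes that `get_session(username, password, check_valid=True)` returns a `(session, gen_time)` pair, as in the example above. The login function is passed in as an argument, so the helper can be tried without network access.

```python
from typing import Any, Callable, Tuple


def require_session(
    get_session: Callable[..., Tuple[Any, float]],
    username: str,
    password: str,
) -> Any:
    """Hypothetical helper: log in and raise if gen_time signals a failure.

    Assumes get_session(username, password, check_valid=True) returns
    (session, gen_time), where gen_time == 0 means the login failed.
    """
    session, gen_time = get_session(username, password, check_valid=True)
    if gen_time == 0:
        raise RuntimeError("Agora login failed; check your username and password")
    return session
```

With pygora you would call it as `session = require_session(get_session, "myAgoraUsername", "myAgoraPassword")`.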
Cache the username and password so that you don't have to write them explicitly in every script
from pygora import *
# to set the credentials, run this once; the username and password are then stored locally
set_credential("myAgoraUsername", "myAgoraPassword")
# to clear the stored credentials
set_credential("", "")
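The README does not say where `set_credential` stores the credentials. The sketch below illustrates the general idea only, by keeping a username/password pair in a JSON file. The file path and the `save_credential`/`load_credential` names are hypothetical and do not describe pygora's actual storage. The sketch copies the README's convention that two empty strings clear the stored values. It writes the password in plain text, so keep the file private.

```python
import json
import os
from pathlib import Path
from typing import Tuple

# Hypothetical location; this is NOT where pygora itself stores credentials.
DEFAULT_PATH = Path.home() / ".pygora_credential_example.json"


def save_credential(username: str, password: str, path: Path = DEFAULT_PATH) -> None:
    """Store the pair as JSON; passing two empty strings clears the stored values."""
    path.write_text(json.dumps({"username": username, "password": password}))
    # Restrict to owner read/write where POSIX permissions apply.
    os.chmod(path, 0o600)


def load_credential(path: Path = DEFAULT_PATH) -> Tuple[str, str]:
    """Return (username, password), or ("", "") if nothing has been stored."""
    if not path.exists():
        return "", ""
    data = json.loads(path.read_text())
    return data.get("username", ""), data.get("password", "")
```

Because `load_credential` returns a tuple, it unpacks the same way as `get_credential` in the next example, for instance `get_session(*load_credential())`.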
Example of parse_subject_page: print all biology courses (school and subject codes can be found in subject.txt), assuming you have already run set_credential
from pygora import *
session, gen_time = get_session(*get_credential(), check_valid=True)
# if you are confident that your username and password are correct, you can skip the check:
# session, gen_time = get_session(*get_credential())
url = SUBJECT_URL.format('2MCAS', '2BIOL')  # build the subject page URL (school code, subject code)
resp = session.get(url)  # HTTP GET the URL with your session
courses = parse_subject_page(resp)  # parse the subject page
for course in courses:
    print(course)
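To crawl more than one subject, you can repeat the same three steps (format the URL, GET it, parse it) over a list of (school, subject) code pairs. The sketch below illustrates this and is not part of pygora. `crawl_subjects` is a hypothetical helper. The URL template and the fetch and parse functions are passed in; with pygora you would pass `SUBJECT_URL`, `session.get` and `parse_subject_page`. The one-second pause between requests is an arbitrary courtesy delay, chosen as an assumption to keep the load on the portal low.

```python
import time
from typing import Any, Callable, Dict, Iterable, Tuple


def crawl_subjects(
    url_template: str,
    pairs: Iterable[Tuple[str, str]],
    fetch: Callable[[str], Any],
    parse: Callable[[Any], Any],
    delay: float = 1.0,
) -> Dict[Tuple[str, str], Any]:
    """Hypothetical helper: fetch and parse one subject page per (school, subject) pair."""
    results: Dict[Tuple[str, str], Any] = {}
    for i, (school, subject) in enumerate(pairs):
        if i > 0 and delay > 0:
            time.sleep(delay)  # pause between requests, not before the first one
        url = url_template.format(school, subject)
        results[(school, subject)] = parse(fetch(url))
    return results
```

For example, `crawl_subjects(SUBJECT_URL, [('2MCAS', '2BIOL')], session.get, parse_subject_page)` maps the pair `('2MCAS', '2BIOL')` to the parsed biology courses. Passing the callables in keeps the loop independent of any network code, which also makes it easy to test.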
Example of parse_course_page: print all information on a course page (course codes can be found in the output of parse_subject_page)
from pygora import *
session, gen_time = get_session(*get_credential())
url = COURSE_URL.format('ACCT102101')
# an empty dict in this example; in practice it could hold a record fetched from your database
info_dict = dict()
resp = session.get(url)
parse_course_page(resp, info_dict)  # updates info_dict in place
for key, value in info_dict.items():
    print(key, value)
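`parse_course_page` fills a dict that you pass in. One natural pattern is therefore to keep one dict per course code (for instance, records loaded from a database) and refresh each one in turn. The sketch below assumes only that calling convention. `update_courses` is a hypothetical helper. The URL template and the fetch and parse callables stand in for `COURSE_URL`, `session.get` and `parse_course_page`.

```python
from typing import Any, Callable, Dict, MutableMapping


def update_courses(
    records: Dict[str, MutableMapping[str, Any]],
    url_template: str,
    fetch: Callable[[str], Any],
    parse_into: Callable[[Any, MutableMapping[str, Any]], Any],
) -> Dict[str, MutableMapping[str, Any]]:
    """Hypothetical helper: refresh each course record in place from its course page.

    records maps a course code (e.g. 'ACCT102101') to a dict that
    parse_into(response, record) is expected to update in place.
    """
    for code, record in records.items():
        resp = fetch(url_template.format(code))
        parse_into(resp, record)  # mutates record; any return value is ignored
    return records
```

With pygora this would be called as `update_courses(records, COURSE_URL, session.get, parse_course_page)`.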
Related Projects
the backend of EagleVision
the backend of New PEPS (in planning)
Join the Dev Team / Contact Us:
open an issue on GitHub to announce the feature or bug you want to work on,
or email (Haochen) at phchcc_at_gmail_dot_com,
or search for our names in the BC directory
Special Thanks
Special thanks to the people who brought EagleVision (this project's prototype) and pygora to life (names are listed in alphabetical order):
Baichuan (Patrick) Guo -- the original "Honest Team"
David Shen -- the EagleVision Dev Team
Estevan Feliz -- the original "Honest Team" & the EagleVision Dev Team
Roger Wang -- the EagleVision Dev Team
Yuning (Tommy) Yang -- the original "Honest Team"
Yuxuan (Jacky) Jin -- the EagleVision Dev Team
Download files
Source Distribution
pygora-phchcc-0.0.14.tar.gz (6.8 kB)
Built Distribution
pygora_phchcc-0.0.14-py3-none-any.whl (7.6 kB)
File details
Details for the file pygora-phchcc-0.0.14.tar.gz.
File metadata
- Download URL: pygora-phchcc-0.0.14.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 16c1809d355a0694b4568f8439a7f63fd6418e424ecd8aae545d710ad2fe4592
MD5 | f025abdbcd90878fab5b6a9ccebdde85
BLAKE2b-256 | 8b0e02e7d222825f0f040fdde831f8ba4665555d02d4ca44827c72fe6625ab6e
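The published digests let you check that a downloaded archive is intact. Below is a minimal sketch using Python's standard `hashlib` module. It reads the file in chunks, computes its SHA256 digest, and compares the result with the value listed in the table. The `sha256_of` and `verify_sha256` names are illustrative, not part of any tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.strip().lower()


# SHA256 digest published above for pygora-phchcc-0.0.14.tar.gz
EXPECTED_SDIST_SHA256 = "16c1809d355a0694b4568f8439a7f63fd6418e424ecd8aae545d710ad2fe4592"
```

The usage line below assumes the archive was saved to the current directory: `verify_sha256(Path("pygora-phchcc-0.0.14.tar.gz"), EXPECTED_SDIST_SHA256)`.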
File details
Details for the file pygora_phchcc-0.0.14-py3-none-any.whl.
File metadata
- Download URL: pygora_phchcc-0.0.14-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5a73a76cf429432ede2dac2c1401f009f39f648e61b4c13956c7631701a8fa28
MD5 | 315ca603366802a53ce28b7c4e27ba9d
BLAKE2b-256 | d487935585e65845d754d5ae8240b57aada7a9cfd1c6bb11a9c36a0370984768