
A web crawler library that fetches and parses data from Boston College Agora Portal

To Install (requires Python 3):

pip install pygora-phchcc

To Run:

Example: download all subject links with their school and subject codes and save them to subjects.txt (the username and password are not cached locally):
from pygora import *

session, gen_time = get_session("myAgoraUsername", "myAgoraPassword", check_valid=True)
# gen_time == 0 means something went wrong (e.g., incorrect credentials)
if gen_time == 0:
    raise SystemExit("Failed to log in to Agora")
subjects = download_subjects(session, simple=True)
# subjects = download_subjects(session)  # returns the full information

with open("subjects.txt", "w") as f:
    for line in subjects:
        f.write(line + "\n")
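Once subjects.txt exists, you can search it for the school and subject codes used in the examples below. This is a minimal sketch: find_subject_lines is a hypothetical helper, not part of pygora, and because the exact line format written by download_subjects is not documented here, it only does a case-insensitive substring match and returns matching lines unchanged.

```python
def find_subject_lines(lines, keyword):
    """Return the lines that contain `keyword` (case-insensitive).

    Hypothetical helper, not part of pygora. It makes no assumption about
    how download_subjects formats each line; it only filters them.
    `lines` can be any iterable of strings, such as an open file.
    """
    needle = keyword.lower()
    # Strip the trailing newline that file iteration leaves on each line
    return [line.rstrip("\n") for line in lines if needle in line.lower()]
```

Usage: `with open("subjects.txt") as f: print(find_subject_lines(f, "BIOL"))`.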
Example: cache the username and password so that you don't have to write them explicitly in a script:
from pygora import *

# run this once to store the username & password locally
set_credential("myAgoraUsername", "myAgoraPassword")

# to clear the stored credentials
set_credential("", "")
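If you would rather not store credentials locally at all, another option is to read them from environment variables at run time. A minimal sketch, assuming get_credential() returns a (username, password) pair (as its use with `*` below suggests); credentials_from_env and the AGORA_USERNAME / AGORA_PASSWORD variable names are hypothetical and not part of pygora.

```python
import os


def credentials_from_env(user_var="AGORA_USERNAME", pass_var="AGORA_PASSWORD"):
    """Read Agora credentials from environment variables.

    Hypothetical helper, not part of pygora; the variable names are
    assumptions. Returns a (username, password) tuple that can be unpacked
    into get_session the same way as get_credential().
    """
    username = os.environ.get(user_var, "")
    password = os.environ.get(pass_var, "")
    if not username or not password:
        # Fail early instead of sending empty credentials to the portal
        raise RuntimeError(f"set {user_var} and {pass_var} before running")
    return username, password
```

Usage: `session, gen_time = get_session(*credentials_from_env(), check_valid=True)`.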
Example of parse_subject_page: print all biology courses (school and subject codes can be found in subjects.txt), assuming you have already run set_credential:
from pygora import *

session, gen_time = get_session(*get_credential(), check_valid=True)
# if you are confident that your username & password are correct, do
# session, gen_time = get_session(*get_credential())

url = SUBJECT_URL.format('2MCAS', '2BIOL')  # get you a url string
resp = session.get(url)  # use your session to HTTP get the url
courses = parse_subject_page(resp)  # parse the subject page
for course in courses:
    print(course)
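To crawl several subjects, the same get-then-parse steps can be repeated in a loop with a pause between requests so the portal is not flooded. A minimal sketch: fetch_subjects is a hypothetical helper, not part of pygora; it assumes `session` has a requests-style `.get(url)` method (as used above) and that the URL template takes the school code and subject code in that order, as SUBJECT_URL does here. The 1-second default delay is an arbitrary choice.

```python
import time


def fetch_subjects(session, url_template, code_pairs, parse, delay=1.0):
    """Fetch and parse several subject pages, pausing between requests.

    Hypothetical helper, not part of pygora.
    session      -- object with a .get(url) method (e.g. from get_session)
    url_template -- format string with two placeholders (e.g. SUBJECT_URL)
    code_pairs   -- iterable of (school_code, subject_code) tuples
    parse        -- function turning a response into courses
                    (e.g. parse_subject_page)
    Returns a dict mapping each (school, subject) pair to parse's result.
    """
    results = {}
    for i, (school, subject) in enumerate(code_pairs):
        if i > 0:
            time.sleep(delay)  # wait between consecutive requests
        resp = session.get(url_template.format(school, subject))
        results[(school, subject)] = parse(resp)
    return results
```

Usage: `fetch_subjects(session, SUBJECT_URL, [('2MCAS', '2BIOL')], parse_subject_page)`, adding further code pairs taken from subjects.txt.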
Example of parse_course_page: print all information on a course page (the course code can be found in the output of parse_subject_page):
from pygora import *

session, gen_time = get_session(*get_credential())
url = COURSE_URL.format('ACCT102101')

# an empty dict in this example; it could instead be data fetched from your database
info_dict = dict()
resp = session.get(url)
parse_course_page(resp, info_dict)  # updates info_dict in place
for key, value in info_dict.items():
    print(key, value)
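Since info_dict could come from (and go back to) a database, here is one way to persist it. A minimal sketch using Python's built-in sqlite3 module; save_course_info, load_course_info and the `courses` table layout are hypothetical, not part of pygora. It assumes info_dict has string keys, and serializes values with `default=str` because the exact value types parse_course_page stores are not documented here.

```python
import json
import sqlite3


def _ensure_table(conn):
    # One row per course: the course code plus its info as a JSON string
    conn.execute(
        "CREATE TABLE IF NOT EXISTS courses (code TEXT PRIMARY KEY, info TEXT)"
    )


def save_course_info(conn, course_code, info_dict):
    """Insert or overwrite the stored info for course_code (hypothetical helper)."""
    _ensure_table(conn)
    conn.execute(
        "INSERT OR REPLACE INTO courses (code, info) VALUES (?, ?)",
        (course_code, json.dumps(info_dict, default=str)),
    )
    conn.commit()


def load_course_info(conn, course_code):
    """Return the stored dict for course_code, or None if it is absent."""
    _ensure_table(conn)
    row = conn.execute(
        "SELECT info FROM courses WHERE code = ?", (course_code,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Usage: `conn = sqlite3.connect("courses.db")`, then `save_course_info(conn, 'ACCT102101', info_dict)` after parsing.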

Used by

the backend of EagleVision
the backend of New PEPS (planning)

Join the Dev Team / Contact Us:

open an issue on GitHub to announce the feature or bug you want to work on
or email Haochen: phchcc_at_gmail_dot_com
or search for our names in the BC directory

Special Thanks

Special thanks to the people who brought EagleVision (this project's prototype) and pygora to life (names are listed in alphabetical order):
Ashkan Moghaddassi -- provides code reviews
Baichuan (Patrick) Guo -- the original "Honest Team"
Cecilia Wu -- contributed awesome ideas
David Shen -- the EagleVision Dev Team
Estevan Feliz -- the original "Honest Team" & the EagleVision Dev Team
Jacob Wolf -- provides code reviews
Jianxin (Jeff) Wang -- provides code reviews
Roger Wang -- the EagleVision Dev Team
Yingjian (Steven) Wu -- contributed awesome ideas
Yuning (Tommy) Yang -- the original "Honest Team"
Yuxuan (Jacky) Jin -- the EagleVision Dev Team

Download files

File | Size | Type | Python version
pygora_phchcc-0.0.7-py3-none-any.whl | 7.5 kB | Wheel | py3
pygora-phchcc-0.0.7.tar.gz | 6.5 kB | Source | None