A web crawler library that fetches and parses data from the Boston College Agora Portal

pygora

A web crawler library that fetches and parses data from the Boston College (BC) Agora Portal.

Getting started (Python 3):

pip install pygora-phchcc

Examples

Log in to Agora, then download and print the links to all subject pages:
from pygora import *

session, gen_time = get_session("myAgoraUsername", "myAgoraPassword", check_valid=True)
# gen_time == 0 means something went wrong (for example, the credentials were incorrect)
print(gen_time)

subjects = download_subjects(session, simple=True)  # simple: each subject is a string
for i, line in enumerate(subjects):
    print(i, line)

# subjects = download_subjects(session)  # each subject is a dict with more information
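
The comment above says that gen_time == 0 signals a failed login. A minimal sketch of failing fast on that condition follows. The helper name require_valid_session is hypothetical (it is not part of pygora), and it relies only on the "zero means failure" convention stated above, not on what non-zero values look like.

```python
def require_valid_session(session, gen_time):
    """Return the session unchanged, or raise if gen_time signals a failed login.

    Hypothetical helper, not part of pygora. It assumes only what the
    example above states: gen_time == 0 means something went wrong.
    """
    if gen_time == 0:
        raise RuntimeError("Agora login failed: check your username and password")
    return session


# Possible usage with pygora (not executed here, needs real credentials):
# session = require_valid_session(*get_session("user", "password", check_valid=True))
```

Because get_session returns a (session, gen_time) pair, the result can be unpacked straight into the helper, so a bad login stops the script before any download starts.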
Cache the username and password so that you don't have to write them explicitly in a script:
from pygora import *

# Set the credentials: run this once so that the username and password are stored locally
set_credential("myAgoraUsername", "myAgoraPassword")

# Clear the stored credentials
set_credential("", "")
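
Since the goal is to keep the password out of scripts, one option is to prompt for it once instead of typing it into a file. The sketch below uses only the standard library. The helper prompt_credentials is hypothetical (not part of pygora), and the prompt functions are passed in as parameters so the helper can be exercised without a terminal.

```python
import getpass


def prompt_credentials(ask_user=input, ask_password=getpass.getpass):
    """Ask for a username and password and return them as a tuple.

    Hypothetical helper, not part of pygora. Empty values are rejected
    because set_credential("", "") is how stored credentials are cleared.
    """
    username = ask_user("Agora username: ").strip()
    password = ask_password("Agora password: ")
    if not username or not password:
        raise ValueError("username and password must both be non-empty")
    return username, password


# Possible usage with pygora (not executed here, it is interactive):
# set_credential(*prompt_credentials())
```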
Example of parse_subject_page: print all biology courses (school and subject codes can be found in subject.txt). This assumes you have already run set_credential:
from pygora import *

session, gen_time = get_session(*get_credential(), check_valid=True)
# If you are confident that your username and password are correct, you can skip the validity check:
# session, gen_time = get_session(*get_credential())

url = SUBJECT_URL.format('2MCAS', '2BIOL')  # build the subject page URL (school code, subject code)
resp = session.get(url)  # fetch the page with an HTTP GET through your session
courses = parse_subject_page(resp)  # parse the subject page
for course in courses:
    print(course)
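
The example above fills SUBJECT_URL with a school code and a subject code, in that order. A minimal sketch of doing the same for several subjects at once follows. The actual value of SUBJECT_URL is not shown here, so DEMO_TEMPLATE is a made-up placeholder with two positional fields, and the second pair of codes is a placeholder too, not a real Agora code.

```python
def build_subject_urls(url_template, school_subject_pairs):
    """Fill a two-field URL template once per (school, subject) pair.

    Hypothetical helper, not part of pygora.
    """
    return [url_template.format(school, subject)
            for school, subject in school_subject_pairs]


# Placeholder template, standing in for pygora's SUBJECT_URL.
DEMO_TEMPLATE = "https://example.invalid/subjects?school={}&subject={}"

urls = build_subject_urls(
    DEMO_TEMPLATE,
    [("2MCAS", "2BIOL"), ("SCHOOL", "SUBJECT")],  # second pair is a placeholder
)
for url in urls:
    print(url)
```

With pygora installed, passing SUBJECT_URL as the template and real codes from subject.txt should give URLs that can be fetched with session.get and handed to parse_subject_page.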
Example of parse_course_page: print all the information on a course page (the course code can be found in the output of parse_subject_page):
from pygora import *

session, gen_time = get_session(*get_credential())
url = COURSE_URL.format('ACCT102101')

# An empty dict in this example; it could instead hold data fetched from your database
info_dict = dict()
resp = session.get(url)
parse_course_page(resp, info_dict)  # updates info_dict in place
for key, value in info_dict.items():
    print(key, value)
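
Because parse_course_page updates the dict it is given, the same pattern extends to several courses by using one dict per course code. The sketch below shows that loop with the session, URL template, and parser passed in as parameters. collect_course_info, FakeSession, and fake_parse are hypothetical stand-ins so the loop can run without Agora access. They are not part of pygora.

```python
def collect_course_info(session, url_template, parse_page, course_codes):
    """Fetch each course page and gather the parsed fields per course code.

    Hypothetical helper, not part of pygora. parse_page is expected to
    fill the dict it receives in place, as parse_course_page does above.
    """
    results = {}
    for code in course_codes:
        info = {}
        response = session.get(url_template.format(code))
        parse_page(response, info)
        results[code] = info
    return results


class FakeSession:
    """Stand-in for a logged-in session; returns the URL as the 'response'."""

    def get(self, url):
        return url


def fake_parse(response, info):
    """Stand-in parser that records which response it was given."""
    info["source"] = response


demo = collect_course_info(
    FakeSession(),
    "https://example.invalid/course/{}",  # placeholder for COURSE_URL
    fake_parse,
    ["ACCT102101"],
)
print(demo)
```

With pygora, the call would presumably look like collect_course_info(session, COURSE_URL, parse_course_page, codes), where codes come from the output of parse_subject_page.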

Related Projects

the backend of EagleVision
the backend of New PEPS (planning)

Join Dev Team / Contact Us:

Open an issue on GitHub to announce the feature or bug that you want to work on
or email us: (Haochen) phchcc_at_gmail_dot_com
or search for our names in the BC directory

Special Thanks

Special thanks to the people who brought EagleVision (this project's prototype) and pygora to life (names are listed in alphabetical order):

Baichuan (Patrick) Guo -- the original "Honest Team"
David Shen -- the EagleVision Dev Team
Estevan Feliz -- the original "Honest Team" & the EagleVision Dev Team
Roger Wang -- the EagleVision Dev Team
Yuning (Tommy) Yang -- the original "Honest Team"
Yuxuan (Jacky) Jin -- the EagleVision Dev Team

Download files

Source Distribution

pygora-phchcc-0.0.14.tar.gz (6.8 kB)

Uploaded Source

Built Distribution

pygora_phchcc-0.0.14-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file pygora-phchcc-0.0.14.tar.gz.

File metadata

  • Download URL: pygora-phchcc-0.0.14.tar.gz
  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.2

File hashes

Hashes for pygora-phchcc-0.0.14.tar.gz
Algorithm Hash digest
SHA256 16c1809d355a0694b4568f8439a7f63fd6418e424ecd8aae545d710ad2fe4592
MD5 f025abdbcd90878fab5b6a9ccebdde85
BLAKE2b-256 8b0e02e7d222825f0f040fdde831f8ba4665555d02d4ca44827c72fe6625ab6e

File details

Details for the file pygora_phchcc-0.0.14-py3-none-any.whl.

File metadata

  • Download URL: pygora_phchcc-0.0.14-py3-none-any.whl
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.2

File hashes

Hashes for pygora_phchcc-0.0.14-py3-none-any.whl
Algorithm Hash digest
SHA256 5a73a76cf429432ede2dac2c1401f009f39f648e61b4c13956c7631701a8fa28
MD5 315ca603366802a53ce28b7c4e27ba9d
BLAKE2b-256 d487935585e65845d754d5ae8240b57aada7a9cfd1c6bb11a9c36a0370984768
