Project description

Dawson College PyScrapper v1.0.1

A Python module containing useful functions to help scrape data from Dawson College, a CEGEP in Montreal, Quebec, Canada.

Features

  • Get information on all the programs offered by Dawson College (ex: Computer Science, Computer Engineering, etc.)
  • Get an estimate of the total number of students enrolled
  • Get the total number of faculty members
  • Get the general metrics of Dawson College (ex: total number of programs offered, number of programs, number of profiles, number of disciplines, number of special studies, number of general studies, etc.)

Usage

Installation

pip install git+ssh://git@github.com/jdboisvert/dawson-college-pyscrapper
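
Once installed, a quick import confirms the module is available (a minimal smoke test):

# Verify the package can be imported after installation.
import dawson_college_pyscrapper

print(dawson_college_pyscrapper.__name__)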

Using the core functionality

Getting program details for a specific program

import requests
from bs4 import BeautifulSoup

from dawson_college_pyscrapper.scrapper import get_program_details

program_url = "https://www.dawsoncollege.qc.ca/programs/program-name"

# PROGRAMS_LISTING_URL should point to Dawson College's programs listing page
# (assumed to be defined, e.g. imported from the package's constants if available).
# Get the BeautifulSoup Tag object of the program that is listed on the programs page.
listed_program = BeautifulSoup(requests.get(PROGRAMS_LISTING_URL).text.strip(), "html.parser").find("tr")

# Get the details of the program at the given URL.
program_details = get_program_details(program_url=program_url, listed_program=listed_program)
print(program_details)
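
Continuing the snippet above: the returned value is a model object, so individual attributes can be read directly. The field names below are assumed to match those shown for listed programs in the next example; adjust if they differ:

# Assumed field names (see the get_programs example below).
print(program_details.name)
print(program_details.program_type)
print(program_details.url)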

Get details of all programs

from dawson_college_pyscrapper.scrapper import get_programs

programs = get_programs()
for program in programs:
    print(f"Program Name: {program.name}")
    print(f"Modified Date: {program.modified_date}")
    print(f"Program Type: {program.program_type}")
    print(f"Program URL: {program.url}")
    print("\n")

Get the total number of students enrolled

from dawson_college_pyscrapper.scrapper import get_total_number_of_students

total_number_of_students = get_total_number_of_students()
print(f"Total number of students: {total_number_of_students}")

Get the total number of faculty members

from dawson_college_pyscrapper.scrapper import get_total_number_of_faculty_members

total_number_of_faculty_members = get_total_number_of_faculty_members()
print(f"Total number of faculty members: {total_number_of_faculty_members}")

Get the general metrics of Dawson College

from dawson_college_pyscrapper.scrapper import scrap

general_metrics = scrap()
print(f"Total programs offered: {general_metrics.total_programs_offered}")
print(f"Number of programs: {general_metrics.number_of_programs}")
print(f"Number of profiles: {general_metrics.number_of_profiles}")
print(f"Number of disciplines: {general_metrics.number_of_disciplines}")
print(f"Number of special studies: {general_metrics.number_of_special_studies}")
print(f"Number of general studies: {general_metrics.number_of_general_studies}")
print("\n")
print("Year count:")
for year, count in general_metrics.total_year_counts.items():
    print(f"{year}: {count}")

print("\n")
print("Programs:")
for program in general_metrics.programs:
    print(f"Program Name: {program.name}")
    print(f"Modified Date: {program.modified_date}")
    print(f"Program Type: {program.program_type}")
    print(f"Program URL: {program.url}")
    print("\n")

More examples

Check out the examples in the tests directory.

Development

Getting started

# install pyenv (if necessary)
brew install pyenv pyenv-virtualenv
echo """
export PYENV_VIRTUALENV_DISABLE_PROMPT=1
eval "$(pyenv init --path)"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
""" > ~/.zshrc
source ~/.zshrc

# create a virtualenv
pyenv install 3.11.1
pyenv virtualenv 3.11.1 dawson_college_pyscrapper
pyenv activate dawson_college_pyscrapper

# install dependencies
pip install -U pip
pip install -e ".[dev]"

Pre-commit

A number of pre-commit hooks are set up to ensure all commits meet basic code quality standards.

If one of the hooks changes a file, you will need to git add that file and re-run git commit before being able to continue.

To install the hooks, run: pre-commit install

Testing

pytest and tox are used for testing. tox is configured to run the tests against both Python 3.8 and Python 3.9 when they are available; if one is missing, tox skips it rather than failing.

# just the unit tests against your current python version
pytest

# just the unit tests with a matching prefix
pytest -k test_some_function

# full test suite and code coverage reporting
tox

Credits

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dawson_college_pyscrapper-1.0.1.tar.gz (17.7 kB)

Uploaded Source

Built Distribution

dawson_college_pyscrapper-1.0.1-py3-none-any.whl (9.8 kB)

File details

Details for the file dawson_college_pyscrapper-1.0.1.tar.gz.

File metadata

  • Download URL: dawson_college_pyscrapper-1.0.1.tar.gz
  • Upload date:
  • Size: 17.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/6.0.0 pkginfo/1.9.6 requests/2.28.2 requests-toolbelt/0.10.1 tqdm/4.64.1 CPython/3.11.1

File hashes

Hashes for dawson_college_pyscrapper-1.0.1.tar.gz

  • SHA256: b13e3f762840845ebe8c7d83020f273d79ce4bbf296c5e0971a2ddc5e97f6a09
  • MD5: 34689bbe6389696e2c6f5e3234204ff7
  • BLAKE2b-256: a5f5e531948ea8ba31f6fabcad16f909b7d5ad9b337bf1d143fa7078170f1612

See more details on using hashes here.
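
To check a downloaded archive against the published SHA256 digest above, a short sketch using the standard library (the local filename is assumed to match the source distribution name):

import hashlib

# Published SHA256 digest for dawson_college_pyscrapper-1.0.1.tar.gz (listed above).
EXPECTED_SHA256 = "b13e3f762840845ebe8c7d83020f273d79ce4bbf296c5e0971a2ddc5e97f6a09"

# Adjust the path if the archive was saved elsewhere.
with open("dawson_college_pyscrapper-1.0.1.tar.gz", "rb") as archive:
    digest = hashlib.sha256(archive.read()).hexdigest()

print("Hash matches" if digest == EXPECTED_SHA256 else f"Hash mismatch: {digest}")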


File details

Details for the file dawson_college_pyscrapper-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: dawson_college_pyscrapper-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/6.0.0 pkginfo/1.9.6 requests/2.28.2 requests-toolbelt/0.10.1 tqdm/4.64.1 CPython/3.11.1

File hashes

Hashes for dawson_college_pyscrapper-1.0.1-py3-none-any.whl

  • SHA256: 9c80a0717bfecf2cd3e235dacb12bc83054eb3692cc7f585e08403067d45e948
  • MD5: 5343e18e2618b1ef7ad275a72ebfc960
  • BLAKE2b-256: 8561d12f8a107e21a92270abcb6ea2174c8ef38f3f4d4dda1b9628aac3d9edf2

See more details on using hashes here.

