Module for scraping UC Merced's class schedules

Project description

UCMercedule: Scraper

A Python module that scrapes UC Merced class schedules for you!

API

Using this module pretty much just entails (1) creating a Schedule instance and (2) reading its data attributes; see below for more details.

`ucmscraper.Schedule`

A Schedule instance is a fully parsed UC Merced schedule page for a given term.

The Schedule class is a record type (a plain old data structure): it structures data into fields and provides very little functionality of its own. The Term, Course, and Section classes that make up a Schedule follow the same pattern. It is up to the client to implement their own functions for handling these types.
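Since these types carry no helper methods, client code will usually layer its own small functions on top. The sketch below is one hypothetical example, not part of ucmscraper: a generic `index_by` helper that groups namedtuple records by one or more field names. It uses a local `Course` stand-in built from the fields documented below, so it runs without the package installed.

```python
from collections import namedtuple

# Hypothetical stand-in with the documented Course fields; real objects
# come from schedule.courses (schedule.sections works the same way).
Course = namedtuple('Course', ['department_code', 'number', 'title', 'units'])


def index_by(records, *fields):
    """Group namedtuple records by the values of the given field names.

    Returns a dict mapping a tuple of field values to the list of records
    that share them, preserving the records' original order.
    """
    index = {}
    for record in records:
        key = tuple(getattr(record, field) for field in fields)
        index.setdefault(key, []).append(record)
    return index
```

For example, `index_by(schedule.sections, 'department_code', 'course_number')` would collect every section of each course under one key.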

Schedules can be created in three ways: two use factory class methods, and one is a plain constructor.

1. `ucmscraper.Schedule.fetch_latest()`

Performs an HTTP request and, if successful, returns a Schedule object for the latest term (Fall 2019 at the time of writing).

2. `ucmscraper.Schedule.fetch(term)`

Performs an HTTP request and, if successful, returns a Schedule object for the given Term object. Terms should be retrieved via ucmscraper.get_current_terms().

3. `ucmscraper.Schedule(schedule_html)`

Parses schedule_html and returns a Schedule object.
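Because `fetch_latest()` and `fetch(term)` each perform an HTTP request, it can be worth saving the raw HTML once and re-parsing it later with the plain constructor. The sketch below shows one way to do that. `load_or_fetch` is a hypothetical helper, not part of ucmscraper; it only assumes that the fetch callable returns an object with an `html` attribute and that the parse callable accepts raw HTML, as described above.

```python
import pathlib


def load_or_fetch(cache_path, fetch, parse):
    """Return a schedule, re-using cached HTML when it exists.

    fetch: zero-argument callable returning an object with an .html
           attribute (e.g. ucmscraper.Schedule.fetch_latest).
    parse: callable that builds a schedule from raw HTML
           (e.g. the one-argument ucmscraper.Schedule constructor).
    """
    path = pathlib.Path(cache_path)
    if path.exists():
        # No network access: rebuild the schedule from the saved page
        return parse(path.read_text(encoding='utf-8'))
    schedule = fetch()  # one HTTP request
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(schedule.html, encoding='utf-8')
    return schedule
```

For example, `load_or_fetch('example/latest.html', ucmscraper.Schedule.fetch_latest, ucmscraper.Schedule)`, assuming the one-argument constructor from item 3. Keep in mind that a cached page goes stale as seat counts change.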

Attributes

Schedule has the following data attributes:

schedule.html - a string of the raw HTML of the original schedule page

schedule.term - a Term object containing information about the term associated with this Schedule instance.

schedule.departments - an OrderedDict whose keys are department codes and whose values are the associated department titles, e.g.:

```python
{
    'ANTH': 'Anthropology',
    'BEST': 'Bio Engin Small Scale Tech',
    'BIO': 'Biological Sciences',
    'BIOE': 'Bioengineering',
    ...
}
```

Keys follow the order that they appear in schedule pages, which is alphabetical.
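Since `schedule.departments` is an ordinary OrderedDict, lookups and searches need nothing beyond the standard library. A minimal sketch, using the sample entries above in place of a live schedule; `department_title` and `codes_matching` are hypothetical helper names:

```python
from collections import OrderedDict

# Sample data copied from the example above; in real use this would be
# schedule.departments.
departments = OrderedDict([
    ('ANTH', 'Anthropology'),
    ('BEST', 'Bio Engin Small Scale Tech'),
    ('BIO', 'Biological Sciences'),
    ('BIOE', 'Bioengineering'),
])


def department_title(departments, code):
    """Return a department's title, falling back to the code if unknown."""
    return departments.get(code, code)


def codes_matching(departments, text):
    """Codes (in page order) whose title contains text, case-insensitively."""
    needle = text.lower()
    return [code for code, title in departments.items()
            if needle in title.lower()]
```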

schedule.courses - a tuple of Course namedtuples in the order that courses appear on the schedule page, e.g.:

```python
(
    Course(
        department_code='ANTH',
        number='001',
        title='Sociocultural Anthropology',
        units=4
    ),
    ...
    Course(
        department_code='WRI',
        number='131C',
        title='Undergraduate Research Journal',
        units=2
    )
)
```
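Course numbers are strings such as `'001'` and `'131C'`, with leading zeros and letter suffixes. When ordering courses numerically, parsing the leading digits avoids depending on consistent zero-padding. A sketch with hypothetical helpers and a local `Course` stand-in carrying the documented fields:

```python
import re
from collections import namedtuple

# Hypothetical stand-in with the documented Course fields; real objects
# come from schedule.courses.
Course = namedtuple('Course', ['department_code', 'number', 'title', 'units'])


def course_number_key(number):
    """Split a course number such as '131C' into (131, 'C') for sorting."""
    # (\d*) grabs any leading digits; (.*) keeps the rest as a suffix
    digits, suffix = re.match(r'(\d*)(.*)', number).groups()
    return (int(digits) if digits else 0, suffix)


def courses_in_department(courses, department_code):
    """Courses of one department, ordered by numeric course number."""
    matching = [c for c in courses if c.department_code == department_code]
    return sorted(matching, key=lambda c: course_number_key(c.number))
```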

schedule.sections - a tuple of Section namedtuples, one per non-exam row of the schedule page, in the order that sections appear on the page, e.g.:

```python
(
    Section(
        CRN=30250,
        department_code='ANTH',
        course_number='001',
        number='01',
        title='Sociocultural Anthropology',
        notes=('Must Also Register For A Corresponding Discussion',),
        activity='LECT',
        days='MW',
        start_time='1:30 PM',
        end_time='2:45 PM',
        location='ACS 120',
        instructor='DeLugan, Robin',
        max_seats=210,
        taken_seats=0,
        free_seats=210
    ),
    ...
    Section(
        CRN=34978,
        department_code='WRI',
        course_number='131C',
        number='01',
        title='Undergraduate Research Journal',
        notes=(),
        activity='SEM',
        days='W',
        start_time='9:30 AM',
        end_time='11:20 AM',
        location='CLSSRM 272',
        instructor='Staff',
        max_seats=20,
        taken_seats=0,
        free_seats=20
    )
)
```
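Section times are stored as display strings like `'1:30 PM'`, so scheduling logic (conflict checks, meeting lengths) first needs to parse them. The sketch below assumes the 12-hour `H:MM AM/PM` format shown above; the helper names are hypothetical, and the `Section` stand-in and samples are copied from the example.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in with the documented Section fields; real objects
# come from schedule.sections. The samples are copied from the example above.
Section = namedtuple(
    'Section',
    'CRN department_code course_number number title notes activity days '
    'start_time end_time location instructor max_seats taken_seats free_seats')

sample_sections = (
    Section(30250, 'ANTH', '001', '01', 'Sociocultural Anthropology',
            ('Must Also Register For A Corresponding Discussion',), 'LECT',
            'MW', '1:30 PM', '2:45 PM', 'ACS 120', 'DeLugan, Robin', 210, 0, 210),
    Section(34978, 'WRI', '131C', '01', 'Undergraduate Research Journal',
            (), 'SEM', 'W', '9:30 AM', '11:20 AM', 'CLSSRM 272', 'Staff', 20, 0, 20),
)


def parse_clock(text):
    """Turn '1:30 PM' into a datetime.time; None if it doesn't parse.

    Assumes English AM/PM markers (the default C locale) and returns None
    for any other value, e.g. a placeholder, if the page ever uses one.
    """
    try:
        return datetime.strptime(text, '%I:%M %p').time()
    except (TypeError, ValueError):
        return None


def duration_minutes(section):
    """Length of one meeting in minutes, or None if a time is unparseable."""
    start, end = parse_clock(section.start_time), parse_clock(section.end_time)
    if start is None or end is None:
        return None
    return (end.hour * 60 + end.minute) - (start.hour * 60 + start.minute)


def open_sections(sections, department_code, course_number):
    """Sections of one course that still report free seats."""
    return [s for s in sections
            if s.department_code == department_code
            and s.course_number == course_number
            and s.free_seats > 0]
```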

`ucmscraper.get_current_terms()`

When first called, performs an HTTP request and, if successful, returns a tuple of the terms currently available for viewing via the official schedule search form. Terms are represented by Term objects and follow the same order as in the official schedule search form.

Example return value:

```python
(Term(code='201910', name='Spring Semester 2019'),
 Term(code='201920', name='Summer Semester 2019 - All Courses'),
 Term(code='201920 - S6', name='Summer Semester 2019 - First 6-week Summer Session A'),
 Term(code='201920 - S62', name='Summer Semester 2019 - Second 6-week Summer Session C'),
 Term(code='201920 - S8', name='Summer Semester 2019 - 8-week Summer Session B'),
 Term(code='201930', name='Fall Semester 2019'))
```
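Because the return value is a plain tuple in form order, picking a term is ordinary indexing or filtering. A minimal sketch, using a local `Term` stand-in and the example tuple above; `latest_term` and `find_terms` are hypothetical helpers, and "last is latest" relies on the oldest-first ordering seen in this example.

```python
from collections import namedtuple

# Hypothetical stand-in; real Term objects come from get_current_terms().
Term = namedtuple('Term', ['code', 'name'])

terms = (
    Term('201910', 'Spring Semester 2019'),
    Term('201920', 'Summer Semester 2019 - All Courses'),
    Term('201920 - S6', 'Summer Semester 2019 - First 6-week Summer Session A'),
    Term('201920 - S62', 'Summer Semester 2019 - Second 6-week Summer Session C'),
    Term('201920 - S8', 'Summer Semester 2019 - 8-week Summer Session B'),
    Term('201930', 'Fall Semester 2019'),
)


def latest_term(terms):
    """Last term in form order (the newest one in the example above)."""
    return terms[-1]


def find_terms(terms, keyword):
    """Terms whose name contains keyword, case-insensitively."""
    return [t for t in terms if keyword.lower() in t.name.lower()]
```

The chosen Term can then be passed to `ucmscraper.Schedule.fetch(term)`.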

Note: old terms no longer on the official schedule search form have their access restricted, so this module cannot retrieve them. I may maintain schedule pages from old terms, so contact me if you want access to them.

Term has the following data attributes:

Term.code - a string containing a validterm value from the official schedule search form. When you choose a term via one of the "Select a Term" radio buttons, you are selecting a validterm to be submitted when you click "View Class Schedule".

Term.name - a string containing a term name associated with one of the aforementioned radio buttons.
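The example codes above suggest a pattern: four digits of year, two digits of season (`10` Spring, `20` Summer, `30` Fall), and an optional ` - ` session suffix for summer sessions. That layout is inferred only from these examples and is not documented by ucmscraper, so treat the following decoder as a hypothetical sketch:

```python
# Season digits inferred from the example terms above (an assumption).
SEASONS = {'10': 'Spring', '20': 'Summer', '30': 'Fall'}


def split_term_code(code):
    """Split e.g. '201920 - S62' into (2019, 'Summer', 'S62').

    Returns None as the session part when there is no ' - ' suffix, and
    the raw season digits when they are not in SEASONS.
    """
    base, _, session = code.partition(' - ')
    year, season_digits = int(base[:4]), base[4:6]
    return year, SEASONS.get(season_digits, season_digits), session or None
```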

Installation

```
pipenv install ucmscraper
```

Example usage

```python
import json
import pathlib
import ucmscraper

# Create example folder to store output files
pathlib.Path('./example').mkdir(exist_ok=True)

# get_current_terms() returns a tuple in the same order as the official
# schedule search form, so the last Term is the latest one
latest_term = ucmscraper.get_current_terms()[-1]
try:
    # Re-use a previously saved page instead of making another HTTP request
    with open('example/{}.html'.format(latest_term.name), 'r') as f:
        schedule_html = f.read()
    schedule = ucmscraper.Schedule(schedule_html, latest_term)
except FileNotFoundError:
    schedule = ucmscraper.Schedule.fetch_latest()

term = schedule.term.name
with open('example/{}.html'.format(term), 'w') as f:
    f.write(schedule.html)
# OrderedDicts don't need sort_keys=True
with open('example/{} - Departments.json'.format(term), 'w') as f:
    json.dump(schedule.departments, f, indent=4)
# Course and Section are namedtuples; _asdict() keeps their field names in
# the JSON output (a bare namedtuple would be written as a plain list)
with open('example/{} - Courses.json'.format(term), 'w') as f:
    json.dump([t._asdict() for t in schedule.courses], f, indent=4)
with open('example/{} - Sections.json'.format(term), 'w') as f:
    json.dump([t._asdict() for t in schedule.sections], f, indent=4)
```

Check out the resulting schedule files in the example folder.
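To work with the saved files later without scraping again, the JSON lists can be turned back into namedtuples. This is a sketch with a hypothetical `load_records` helper and a local `Course` stand-in; it assumes each file holds a list of objects written with `_asdict()`, as in the example above.

```python
import json
from collections import namedtuple

# Hypothetical stand-in with the documented Course fields.
Course = namedtuple('Course', ['department_code', 'number', 'title', 'units'])


def load_records(path, record_type):
    """Rebuild a tuple of namedtuples from a JSON list of field dicts."""
    with open(path, encoding='utf-8') as f:
        return tuple(record_type(**row) for row in json.load(f))
```

JSON has no tuple type, so tuple-valued fields such as `Section.notes` come back as lists; convert them with `tuple(...)` if you need the original types.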

Project details

Release history

2.2.0 - Sep 7, 2019 (this version)

2.1.0 - Mar 13, 2019

2.0.0 - Mar 11, 2019

1.5.1 - Mar 11, 2019

1.5.0 - Mar 10, 2019

1.4.6 - Jun 30, 2018

1.4.5 - Jun 29, 2018

1.4.4 - Jun 27, 2018

1.4.3 - Jun 24, 2018

1.4.2 - Jun 24, 2018

1.4.1 - Jun 10, 2018

1.4.0 - Jun 8, 2018

1.3.0 - Jun 8, 2018

1.2.0 - Jun 8, 2018

1.1.0 - Jun 6, 2018

1.0.0 - Jun 5, 2018

Download files


Source Distribution

ucmscraper-2.2.0.tar.gz (7.0 kB)

Uploaded Sep 7, 2019 Source

File details

Details for the file ucmscraper-2.2.0.tar.gz.

File metadata

Download URL: ucmscraper-2.2.0.tar.gz
Upload date: Sep 7, 2019
Size: 7.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4

File hashes

Hashes for ucmscraper-2.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c0f3eed188d8d574e5a851f3e9316624050d9ebecade0b06790f83e211ccfaac`
MD5	`e2201e2dcdb12ee4c70b39f4f473b228`
BLAKE2b-256	`c2236c236ce978b48067f019125aae184e67dd1e82ded00adcc64d6e5b76dafa`
