Celcat Calendar Scraper 📆
An asynchronous Python library for scraping Celcat calendar systems.
Installation 🚀
pip install celcat-scraper
Features 🌟
- Event data filtering 🧹
- Async/await support for better performance 🔀
- Rate limiting with adaptive backoff ⏳
- Optional caching support 💾
- Optional reusable aiohttp session ♻️
- Automatic session management 🍪
- Batch processing of events 📦
- Error handling and retries 🚨
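To illustrate the adaptive-backoff idea behind the rate-limiting feature, here is a generic, library-independent sketch (this is not the library's internal implementation; the function name and parameters are illustrative): the delay doubles after each failure, with a little jitter, up to a cap.

```python
import asyncio
import random


async def fetch_with_backoff(fetch, max_retries=5, base_delay=0.5, max_delay=30.0):
    """Retry an async callable, doubling the wait after each failure.

    A small random jitter is added to each sleep so that concurrent
    clients do not retry in lockstep.
    """
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return await fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
            delay = min(delay * 2, max_delay)
```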
Usage ⚙️
Basic example of retrieving calendar events:
```python
import asyncio
from datetime import date, timedelta

from celcat_scraper import CelcatConfig, CelcatScraperAsync


async def main():
    # Configure the scraper
    config = CelcatConfig(
        url="https://university.com/calendar",
        username="your_username",
        password="your_password",
        include_holidays=True,
    )

    # Create scraper instance and get events
    async with CelcatScraperAsync(config) as scraper:
        start_date = date.today()
        end_date = start_date + timedelta(days=30)

        # Storing events locally is recommended to reduce the amount of requests
        file_path = "store.json"
        events = scraper.deserialize_events(file_path)

        events = await scraper.get_calendar_events(
            start_date, end_date, previous_events=events
        )

        for event in events:
            print(f"Event {event['id']}")
            print(f"Course: {event['category']} - {event['course']}")
            print(f"Time: {event['start']} to {event['end']}")
            print(
                f"Location: {', '.join(event['rooms'])} at "
                f"{', '.join(event['sites'])} - {event['department']}"
            )
            print(f"Professors: {', '.join(event['professors'])}")
            print("---")

        # Save events for a future refresh
        scraper.serialize_events(events, file_path)


if __name__ == "__main__":
    asyncio.run(main())
```
Filtering 🧹
Celcat Calendar data is often messy and needs to be processed before it can be used. For example, the same course may appear under several different names across events. Filtering standardizes these attributes.
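Conceptually, standardization boils down to mapping each raw name onto a canonical one. A minimal, library-independent sketch (the function, dict, and names here are illustrative, not part of the library's API):

```python
def standardize(name, replacements):
    """Map a raw course name to a canonical form.

    Exact replacements win; otherwise fall back to simple title-casing.
    """
    name = name.strip()
    if name in replacements:
        return replacements[name]
    return name.title()


# Example replacement table: raw name -> canonical name
replacements = {"English - S2": "English", "MATHS CM": "Maths"}
```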
Usage ⚙️
ℹ️ Info: Each filter argument is optional. When course_strip_redundant is enabled, using remembered_strips is recommended.
⚠️ Warning: Disabling filters will require you to reset your previous events and refetch to undo changes.
```python
import asyncio
import json
from datetime import date, timedelta

from celcat_scraper import (
    CelcatConfig,
    CelcatFilterConfig,
    CelcatScraperAsync,
    FilterType,
)


async def main():
    # Load remembered_strips from a file
    try:
        with open("remembered_strips.json", "r") as f:
            remembered_strips = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        remembered_strips = []

    # Map raw course names to manual replacements
    course_replacements = {"English - S2": "English", "Mathematics": "Maths"}

    # Configure a filter
    filter_config = CelcatFilterConfig(
        filters={
            FilterType.COURSE_TITLE,
            FilterType.COURSE_STRIP_MODULES,
            FilterType.COURSE_STRIP_CATEGORY,
            FilterType.COURSE_STRIP_PUNCTUATION,
            FilterType.COURSE_GROUP_SIMILAR,
            FilterType.COURSE_STRIP_REDUNDANT,
            FilterType.PROFESSORS_TITLE,
            FilterType.ROOMS_TITLE,
            FilterType.ROOMS_STRIP_AFTER_NUMBER,
            FilterType.SITES_TITLE,
            FilterType.SITES_REMOVE_DUPLICATES,
        },
        course_remembered_strips=remembered_strips,
        course_replacements=course_replacements,
    )

    config = CelcatConfig(
        url="https://university.com/calendar",
        username="your_username",
        password="your_password",
        include_holidays=True,
        # Pass the filter as an argument
        filter_config=filter_config,
    )

    async with CelcatScraperAsync(config) as scraper:
        start_date = date.today()
        end_date = start_date + timedelta(days=30)

        file_path = "store.json"
        events = scraper.deserialize_events(file_path)
        events = await scraper.get_calendar_events(
            start_date, end_date, previous_events=events
        )
        scraper.serialize_events(events, file_path)

        # Save the updated remembered_strips back to file
        with open("remembered_strips.json", "w") as f:
            json.dump(scraper.filter_config.course_remembered_strips, f)


if __name__ == "__main__":
    asyncio.run(main())
```
Available Filters 🧹
| Filter | Description | Example |
|---|---|---|
| *_TITLE | Capitalize only the first letter of each word | MATHS CLASS -> Maths Class |
| COURSE_STRIP_MODULES | Remove module codes from course names | Maths [DPAMAT2D] -> Maths |
| COURSE_STRIP_CATEGORY | Remove category from course names | Maths CM -> Maths |
| COURSE_STRIP_PUNCTUATION | Remove ".,:;!?" from text | Math. -> Math |
| COURSE_GROUP_SIMILAR | Group course names that contain another, shorter course name | Maths, Maths S1 -> Maths |
| COURSE_STRIP_REDUNDANT | Extract parts removed by the previous filter and remove them from all other courses | Physics S1 -> Physics |
| ROOMS_STRIP_AFTER_NUMBER | Remove all text after the first number found | Room 403 32 seats -> Room 403 |
| SITES_REMOVE_DUPLICATES | Remove duplicates from the list | Building A, Building A -> Building A |
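To make the containment-based grouping concrete, here is a simplified sketch of how names containing a shorter existing name can be collapsed onto it (illustrative only, not the library's exact algorithm):

```python
def group_similar(names):
    """Map each name to the shortest name in the set it contains.

    'Maths S1' collapses onto 'Maths' when both occur; names with no
    shorter match map to themselves.
    """
    canonical = sorted(set(names), key=len)  # try shortest names first
    result = {}
    for name in names:
        for short in canonical:
            if short != name and short in name:
                result[name] = short
                break
        else:
            result[name] = name  # no shorter containing match found
    return result
```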