Automated downloader for Universiti Malaya past year exam papers

UM Past Year Paper Downloader - PaperFetch

One-click bulk download solution for Universiti Malaya (UM) past year exam papers

Automate the tedious process of manually downloading past year papers one by one. Simply provide your UM credentials and subject code, and get all available papers in a single organized ZIP file.


Key Features

Core Functionality

  • One-Click Bulk Download: Download all past year papers for any subject code automatically
  • Smart ZIP Organization: Automatically organizes papers by year and creates a structured ZIP archive
  • Secure Authentication: Handles complex UM OpenAthens authentication flow seamlessly
  • Concurrent Downloads: Multi-threaded downloading for faster performance
  • Auto-Retry Logic: Robust error handling with configurable retry attempts
  • Real-time Progress: Live progress bars and detailed status updates

File Organization

  • Hierarchical Structure: Papers organized by subject → year → semester
  • Smart File Naming: Automatically detects and preserves meaningful filenames
  • Auto-Generated README: Includes download summary and paper inventory in ZIP
  • Organized Output: Individual PDFs + consolidated ZIP file
  • Optional Cleanup: Choice to keep individual files or ZIP only

User Experience

  • Terminal-Based Interface: Clean, intuitive command-line interface
  • Interactive Mode: Prompts for credentials and settings when needed
  • Command-Line Mode: Full automation with command-line arguments
  • Custom Download Locations: Choose where to save your papers
  • Browser Options: Support for Edge and Chrome, with auto-detection
  • Comprehensive Logging: Detailed logs for troubleshooting

Security & Reliability

  • Secure Password Input: Hidden password entry (never stored/logged)
  • Session Cleanup: Automatic browser data cleanup after use
  • Download Verification: Validates PDF integrity after download
  • HTTPS Enforcement: Secure connections to UM servers
  • Configurable Timeouts: Customizable session and download timeouts
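
Hidden password entry of the kind listed above is typically done with Python's standard `getpass` module. The sketch below is only an illustration under that assumption: `read_credentials`, its parameters, and the domain-stripping step are hypothetical and not taken from PaperFetch's source.

```python
import getpass
from typing import Callable, Optional, Tuple

EMAIL_SUFFIX = "@siswa.um.edu.my"  # domain mentioned in the command reference


def read_credentials(
    username: Optional[str] = None,
    ask: Callable[[str], str] = input,
    ask_secret: Callable[[str], str] = getpass.getpass,
) -> Tuple[str, str]:
    """Return (username, password); the password is never echoed or logged here."""
    if not username:
        username = ask("UM username: ")
    username = username.strip()
    # Assumption: tolerate a pasted full address by dropping the domain part.
    if username.lower().endswith(EMAIL_SUFFIX):
        username = username[: -len(EMAIL_SUFFIX)]
    # getpass.getpass reads from the terminal without showing the typed text.
    password = ask_secret("Password (input hidden): ")
    return username, password
```

The `ask` and `ask_secret` parameters exist only so the prompt functions can be swapped out in tests; in normal use, `read_credentials()` with no arguments would prompt on the terminal.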

Complete Command Reference

Available Commands (9 total)

| Command | Short | Description | Default |
|---|---|---|---|
| --username | -u | UM username (without @siswa.um.edu.my) | prompted |
| --subject-code | -s | Subject code to search for (e.g., WIA1005) | prompted |
| --output-dir | -o | Custom download directory | ./downloads |
| --browser | -b | Browser choice: auto, chrome, edge | edge |
| --timeout |  | Session timeout in seconds | 30 |
| --max-retries |  | Maximum retry attempts for failed downloads | 3 |
| --show-browser |  | Show browser window (disable headless mode) | false |
| --no-location-prompt |  | Skip interactive location selection | false |
| --verbose | -v | Enable detailed debug logging | false |
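
For readers curious how the nine options above map onto a parser, here is a minimal `argparse` sketch. It mirrors only the flags, short forms, and defaults from the reference; `build_parser` is a hypothetical name, and how `main.py` actually parses its arguments is not shown in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the nine documented options."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Download UM past year papers for a subject code.",
    )
    # Options left unset here are the ones the reference marks as "prompted".
    parser.add_argument("-u", "--username",
                        help="UM username (without @siswa.um.edu.my)")
    parser.add_argument("-s", "--subject-code",
                        help="Subject code to search for, e.g. WIA1005")
    parser.add_argument("-o", "--output-dir", default="./downloads",
                        help="Custom download directory")
    parser.add_argument("-b", "--browser", default="edge",
                        choices=["auto", "chrome", "edge"],
                        help="Browser choice")
    parser.add_argument("--timeout", type=int, default=30,
                        help="Session timeout in seconds")
    parser.add_argument("--max-retries", type=int, default=3,
                        help="Maximum retry attempts for failed downloads")
    # Boolean switches default to False and flip to True when present.
    parser.add_argument("--show-browser", action="store_true",
                        help="Show browser window (disable headless mode)")
    parser.add_argument("--no-location-prompt", action="store_true",
                        help="Skip interactive location selection")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable detailed debug logging")
    return parser
```

Calling `build_parser().parse_args(["-s", "WIA1005", "--timeout", "60"])` would yield a namespace whose `subject_code` is `"WIA1005"` and whose `timeout` is `60`, with every other option at its documented default.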

Usage Examples

1. Interactive Mode (Recommended for first-time users)

python main.py

Prompts for username, password, subject code, and download location

2. Quick Command-Line Mode

python main.py --username john_doe --subject-code WIA1005

Only prompts for password

3. Fully Automated Mode

python main.py -u student123 -s WXES1116 -o "C:/Downloads/Papers" --no-location-prompt

No prompts except secure password entry

4. Debug Mode with Visible Browser

python main.py --subject-code WIA1005 --show-browser --verbose

Shows browser actions and detailed logging

5. Slow-Connection Mode

python main.py -s WIA1005 --max-retries 5 --timeout 60

Extended timeouts and retries for slow connections

6. Custom Browser Selection

python main.py --browser chrome --subject-code CSC1025

Force use of Chrome browser


Quick Start Guide

Prerequisites

  • Python 3.8+ installed
  • One of these browsers: Microsoft Edge (recommended), Google Chrome
  • UM student account with active credentials
  • Stable internet connection

Installation

# 1. Clone/download this repository
git clone <repository-url>
cd um-past-year-downloader

# 2. Install dependencies
pip install -r requirements.txt

# 3. Ready to use!
python main.py

First Run

python main.py

Follow the interactive prompts:

  1. Enter your UM username (without @siswa.um.edu.my)
  2. Enter your password securely
  3. Enter subject code (e.g., WIA1005)
  4. Choose download location
  5. Confirm download of found papers

What You Get

Organized File Structure

downloads/
├── WIA1005/
│   ├── Year_2023/
│   │   ├── WIA1005_Final_2023_S1.pdf
│   │   └── WIA1005_Final_2023_S2.pdf
│   ├── Year_2022/
│   │   ├── WIA1005_Final_2022_S1.pdf
│   │   └── WIA1005_Final_2022_S2.pdf
│   └── Unsorted/
│       └── WIA1005_Additional_Papers.pdf
├── WIA1005_past_years.zip
└── WIA1005_README.txt
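
One way to arrive at a layout like the tree above is to read the year out of each file name and fall back to an `Unsorted` folder when no year is present. The sketch below assumes names shaped like the examples in the tree; the regular expressions and the names `PaperInfo`, `parse_paper_name`, and `target_path` are illustrative guesses, not PaperFetch's actual logic.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class PaperInfo(NamedTuple):
    subject: Optional[str]
    year: Optional[int]
    semester: Optional[int]


# Assumed patterns, inferred only from the example file names in the tree.
_SUBJECT = re.compile(r"(?<![A-Za-z0-9])([A-Z]{3,4}\d{4})(?![A-Za-z0-9])")
# The lookbehind keeps digits inside a subject code (e.g. WIA2001) from
# being mistaken for a year.
_YEAR = re.compile(r"(?<![A-Za-z0-9])((?:19|20)\d{2})(?!\d)")
_SEMESTER = re.compile(
    r"(?<![A-Za-z0-9])S(?:em(?:ester)?)?[ _-]?([1-3])(?!\d)", re.IGNORECASE
)


def parse_paper_name(name: str) -> PaperInfo:
    """Extract subject code, year, and semester from a paper name, if present."""
    subject = _SUBJECT.search(name.upper())
    year = _YEAR.search(name)
    semester = _SEMESTER.search(name)
    return PaperInfo(
        subject.group(1) if subject else None,
        int(year.group(1)) if year else None,
        int(semester.group(1)) if semester else None,
    )


def target_path(name: str, fallback_subject: str) -> PurePosixPath:
    """Build Subject/Year_YYYY/name, or Subject/Unsorted/name without a year."""
    info = parse_paper_name(name)
    subject = info.subject or fallback_subject
    folder = f"Year_{info.year}" if info.year else "Unsorted"
    return PurePosixPath(subject) / folder / name
```

With these assumptions, `target_path("WIA1005_Additional_Papers.pdf", "WIA1005")` lands in `WIA1005/Unsorted/`, matching the last folder in the tree.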

ZIP Archive Contents

  • Hierarchical Organization: Subject → Year → Files
  • Automatic README: Download summary and file inventory
  • Optimized Compression: Balanced compression for size/speed
  • Preserve Metadata: Original filenames and dates maintained

Generated Reports

  • Download Summary: Shows total papers found and downloaded
  • Failed Downloads: Lists any papers that couldn't be downloaded
  • File Inventory: Complete list of papers with years and types
  • Timestamp: When the download was performed

Advanced Configuration

Browser Selection Guide

| Browser | Best For | Advantages | Notes |
|---|---|---|---|
| Edge | Windows users | Built-in, no driver conflicts, memory efficient | Recommended |
| Chrome | Mac/Linux users | Wide compatibility, stable | May need driver updates |
| Auto | Uncertain | Detects best available | Falls back to Edge → Chrome |
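
The "Auto" row describes a fallback order of Edge first, then Chrome. As a rough sketch of that selection rule only, the function below picks a browser from a list of installed ones; `resolve_browser` and the way installed browsers are detected are assumptions, since the document does not show how PaperFetch performs detection.

```python
from typing import Iterable

# Order taken from the "Auto" row: try Edge first, then Chrome.
FALLBACK_ORDER = ("edge", "chrome")


def resolve_browser(choice: str, installed: Iterable[str]) -> str:
    """Return the browser to launch for a --browser value."""
    available = {name.lower() for name in installed}
    choice = choice.lower()
    if choice == "auto":
        for candidate in FALLBACK_ORDER:
            if candidate in available:
                return candidate
        raise RuntimeError("No supported browser found (need Edge or Chrome)")
    if choice not in FALLBACK_ORDER:
        raise ValueError(f"Unsupported browser: {choice!r}")
    if choice not in available:
        raise RuntimeError(f"Requested browser is not installed: {choice}")
    return choice
```

For example, `resolve_browser("auto", ["chrome"])` would return `"chrome"` because Edge is missing from the installed list.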

Performance Tuning

# For slow connections
python main.py --timeout 60 --max-retries 5

# For fast connections  
python main.py --timeout 15 --max-retries 2

# For debug/troubleshooting
python main.py --verbose --show-browser

Output Directory Options

  • Default: ./downloads (project folder)
  • Custom: Any valid path (e.g., C:/Users/Student/Papers)
  • Interactive: Choose during runtime
  • Auto: Use --no-location-prompt to skip selection

Testing & Validation

Built-in Test Scripts

1. Complete System Test

python test_setup.py

Tests Python environment, dependencies, browser drivers, and network connectivity

2. Authentication Test

python test_login.py

Tests only the UM login process (useful for credential verification)

3. Search Functionality Test

python test_search_debug.py

Tests paper search without downloading

Validation Features

  • PDF Integrity Check: Verifies downloaded files are valid PDFs
  • Size Validation: Ensures files aren't empty or corrupted
  • Download Verification: Confirms all expected papers were downloaded
  • ZIP Integrity: Validates ZIP file creation and contents
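
The checks above can be approximated with the standard library alone. This sketch assumes a "valid PDF" means a non-empty file that starts with the `%PDF-` signature and has an `%%EOF` marker near its end; PaperFetch's real validation may be stricter or different. The function names are hypothetical.

```python
import zipfile
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # every PDF file begins with this signature


def looks_like_pdf(path: Path) -> bool:
    """Cheap integrity check: non-empty, PDF signature, %%EOF near the end."""
    size = path.stat().st_size if path.is_file() else 0
    if size == 0:
        return False
    with path.open("rb") as handle:
        if handle.read(len(PDF_MAGIC)) != PDF_MAGIC:
            return False  # e.g. an HTML login page saved with a .pdf name
        handle.seek(max(0, size - 1024))
        return b"%%EOF" in handle.read()


def zip_is_intact(path: Path) -> bool:
    """Return True when the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first corrupt member name, or None if all pass.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False
```

A header check like this catches the common failure where an expired session returns an HTML page instead of the paper, which a size check alone would miss.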

Technical Architecture

Modular Components

1. Authentication (auth/um_authenticator.py)

  • Handles complex UM OpenAthens SAML authentication
  • Manages session cookies and security tokens
  • Supports multiple browser backends

2. Paper Discovery (scraper/paper_scraper.py)

  • Searches UM repository by subject code
  • Extracts paper metadata (year, semester, type)
  • Handles pagination and result filtering

3. Download Engine (downloader/pdf_downloader.py)

  • Concurrent multi-threaded downloads
  • Progress tracking with visual indicators
  • Retry logic with exponential backoff
  • File integrity validation
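
To make the combination of concurrent downloads and retry with exponential backoff concrete, here is a small standard-library sketch. It is not the code in `downloader/pdf_downloader.py`: `with_retries`, `download_all`, the `fetch` callable, and the one-second base delay are all assumptions chosen for illustration. The default of four workers follows the limit stated under Optimization Features.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def with_retries(task: Callable[[], bytes], max_retries: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Run task(); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")


def download_all(urls: Iterable[str], fetch: Callable[[str], bytes],
                 max_workers: int = 4, max_retries: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep,
                 ) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Fetch URLs in a thread pool; return (successes, failures) keyed by URL."""
    results: Dict[str, bytes] = {}
    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            # u=url binds the current URL to each lambda.
            pool.submit(with_retries, lambda u=url: fetch(u),
                        max_retries, base_delay, sleep): url
            for url in urls
        }
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                failures[url] = exc
    return results, failures
```

In this sketch `fetch` would be whatever function performs one authenticated HTTP download; passing it in keeps the retry and concurrency logic independent of the session handling.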

4. Archive Creator (utils/zip_creator.py)

  • Intelligent file organization by year/semester
  • Optimized compression algorithms
  • Auto-generated documentation
  • Metadata preservation
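
A structured archive with a generated summary can be built with Python's `zipfile` module. The sketch below assumes the PDFs already sit in `Subject/Year_YYYY/` folders on disk; `create_archive` and the README wording are hypothetical, not the contents of `utils/zip_creator.py`.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def create_archive(subject_dir: Path, zip_path: Path) -> int:
    """Zip every PDF under subject_dir, keeping its folder layout.

    Returns the number of PDFs written. A README.txt listing the files
    is added inside the archive.
    """
    pdfs = sorted(subject_dir.rglob("*.pdf"))
    summary = [
        f"Past year papers for {subject_dir.name}",
        f"Generated: {datetime.now():%Y-%m-%d %H:%M:%S}",
        f"Total papers: {len(pdfs)}",
        "",
    ]
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for pdf in pdfs:
            # Store paths relative to the subject folder, e.g.
            # WIA1005/Year_2023/WIA1005_Final_2023_S1.pdf
            arcname = (Path(subject_dir.name) / pdf.relative_to(subject_dir)).as_posix()
            archive.write(pdf, arcname)  # write() keeps the file's modification time
            summary.append(arcname)
        archive.writestr(f"{subject_dir.name}/README.txt", "\n".join(summary) + "\n")
    return len(pdfs)
```

`ZIP_DEFLATED` is the standard general-purpose compression method; it is used here as a reasonable default, not as a claim about which method PaperFetch selects.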

5. Logging System (utils/logger.py)

  • Structured logging with multiple levels
  • Separate log files for debugging
  • Performance metrics and timing
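
A common way to get multi-level logging with a separate debug file is to attach two handlers to one logger. The sketch below shows that pattern with the standard `logging` module; the logger name `paperfetch`, the file name, and `setup_logging` itself are assumptions and may not match `utils/logger.py`.

```python
import logging
from pathlib import Path


def setup_logging(log_dir: Path, verbose: bool = False) -> logging.Logger:
    """Console shows INFO (DEBUG when verbose); the log file always gets DEBUG."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("paperfetch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    console = logging.StreamHandler()
    console.setLevel(logging.DEBUG if verbose else logging.INFO)
    console.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    file_handler = logging.FileHandler(log_dir / "paperfetch.log", encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

Under this arrangement a `--verbose` flag only changes what reaches the terminal, while the file keeps full detail for troubleshooting either way.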

Dependencies

  • selenium - Web automation and browser control
  • requests - HTTP session management
  • beautifulsoup4 - HTML parsing and data extraction
  • tqdm - Progress bars and status indicators
  • webdriver-manager - Automatic browser driver management

Troubleshooting

Common Issues & Solutions

Login Failed

  • Verify username/password are correct
  • Check if your UM account is active
  • Try using Edge browser: --browser edge
  • Enable debug mode: --verbose --show-browser

No Papers Found

  • Verify subject code is correct (e.g., WIA1005, not wia1005)
  • Check if papers exist for that subject
  • Try different semester/year variations

Download Errors

  • Check internet connection stability
  • Increase timeout: --timeout 60
  • Increase retries: --max-retries 5
  • Check disk space in output directory

Browser/WebDriver Issues

  • Windows users: Use Edge first: --browser edge
  • Update browser to latest version
  • Try: pip install --upgrade webdriver-manager
  • See TROUBLESHOOTING.md for detailed solutions

Exit Codes

  • 0 - Success
  • 1 - Authentication failure
  • 2 - Network connectivity issues
  • 3 - No papers found or download failed
  • 4 - File system permissions error
  • 130 - User cancelled (Ctrl+C)
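
The exit codes above lend themselves to an enumeration so that scripts calling the tool can branch on a named value. This is a sketch only: the `ExitCode` names and the mapping from Python exceptions to codes in `run` are assumptions, since the document lists the codes but not how they are produced.

```python
from enum import IntEnum
from typing import Callable


class ExitCode(IntEnum):
    SUCCESS = 0
    AUTH_FAILURE = 1
    NETWORK_ERROR = 2
    NO_PAPERS_OR_DOWNLOAD_FAILED = 3
    FILESYSTEM_ERROR = 4
    USER_CANCELLED = 130  # conventional code for termination by Ctrl+C


def run(main: Callable[[], None]) -> int:
    """Call main() and translate common failures into documented exit codes."""
    try:
        main()
    except KeyboardInterrupt:
        return ExitCode.USER_CANCELLED
    except PermissionError:
        # Assumed mapping: permission problems -> file system error (4).
        return ExitCode.FILESYSTEM_ERROR
    except ConnectionError:
        # Assumed mapping: connection problems -> network error (2).
        return ExitCode.NETWORK_ERROR
    return ExitCode.SUCCESS
```

A wrapper script could then call `sys.exit(run(main))` so the shell sees the same numbers listed above.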

Performance Metrics

Typical Performance

  • Authentication: 5-10 seconds
  • Paper Search: 2-5 seconds
  • Download Speed: 2-5 MB/s per file (concurrent)
  • ZIP Creation: 1-3 seconds
  • Total Time: 30 seconds - 2 minutes (depending on paper count)

Optimization Features

  • Concurrent Downloads: Up to 4 simultaneous downloads
  • Intelligent Caching: Avoids re-downloading existing files
  • Compressed Archives: ZIP compression reduces file size by 10-30%
  • Progress Tracking: Real-time ETA and speed indicators

Legal & Academic Use

Terms of Use

  • Educational Purpose Only: For UM students' academic use
  • Respect UM Policies: Adheres to university terms of service
  • No Circumvention: Uses standard authentication methods
  • Rate Limiting: Respects server load limits
  • Valid Credentials Required: Must have active UM account

What This Tool Does NOT Do

  • Bypass any security measures
  • Access restricted content
  • Store or share credentials
  • Violate copyright or academic policies
  • Access content you don't have permission for

Tips for Best Experience

For Windows Users

# Recommended command for Windows
python main.py --browser edge --subject-code WIA1005

For Mac/Linux Users

# Recommended command for Mac/Linux
python main.py --browser chrome --subject-code WIA1005

For Slow Connections

python main.py --timeout 90 --max-retries 5 --subject-code WIA1005

For Batch Processing

# Create a batch script for multiple subjects
python main.py -s WIA1005 --no-location-prompt -o "./Papers/WIA1005"
python main.py -s WXES1116 --no-location-prompt -o "./Papers/WXES1116"

Support & Contributing

Getting Help

  1. Read TROUBLESHOOTING.md - Comprehensive solution guide
  2. Check logs - Review log files for detailed error information
  3. Run tests - Use python test_setup.py to validate environment
  4. Try Edge browser - Often resolves driver issues: --browser edge

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request
  5. Follow existing code style and documentation standards

Disclaimer

Disclaimer: This tool is an unofficial utility created to help UM students access past year papers more efficiently. It is not affiliated with or endorsed by Universiti Malaya. Users are responsible for complying with UM's terms of service and academic policies.


Quick Command Cheat Sheet

# Basic usage
python main.py

# Fast automated mode  
python main.py -u username -s WIA1005 --no-location-prompt

# Debug mode
python main.py --verbose --show-browser -s WIA1005

# Slow or unreliable connection (more retries, longer timeout)
python main.py --max-retries 5 --timeout 60 -s WXES1116

# Custom location
python main.py -o "C:/Papers" -s CSC1025

# Windows optimized
python main.py --browser edge -s WIA1005

Time to lock in for your final

Download files

Source Distribution

umpaper_fetch-1.0.3.tar.gz (60.3 kB)

Uploaded Source

Built Distribution

umpaper_fetch-1.0.3-py3-none-any.whl (46.8 kB)

Uploaded Python 3

File details

Details for the file umpaper_fetch-1.0.3.tar.gz.

File metadata

  • Download URL: umpaper_fetch-1.0.3.tar.gz
  • Upload date:
  • Size: 60.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.0.7 tqdm/4.66.1 importlib-metadata/6.8.0 keyring/25.6.0 rfc3986/1.5.0 colorama/0.4.6 CPython/3.11.9

File hashes

Hashes for umpaper_fetch-1.0.3.tar.gz
Algorithm Hash digest
SHA256 7185624d5cb5cbaa1f32980b96704d79c6b03eb99061a35038e629a9cb86aca2
MD5 0da9661f085034b5f32ab58c456cda5e
BLAKE2b-256 079de4df15ba29c6fa490c57bc96c5aef795541ea5ba5795ff7d91f93b4c068a

File details

Details for the file umpaper_fetch-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: umpaper_fetch-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 46.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.0.7 tqdm/4.66.1 importlib-metadata/6.8.0 keyring/25.6.0 rfc3986/1.5.0 colorama/0.4.6 CPython/3.11.9

File hashes

Hashes for umpaper_fetch-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1a94781bffe70069a83111ba18e3ba1ae108facfcff539b5e5afd91a34cd66ee
MD5 2ae6a2b0ce58376db462384d377b1a22
BLAKE2b-256 fbe4e4dbb93e22ba58e2a5df2044f74658cc0a779401de98b3dc58977aaf58e3
