Automated downloader for University Malaya past year exam papers
UM Past Year Paper Downloader - PaperFetch
One-click bulk download solution for University Malaya (UM) past year exam papers
Automate the tedious process of manually downloading past year papers one by one. Simply provide your UM credentials and subject code, and get all available papers in a single organized ZIP file.
Key Features
Core Functionality
- One-Click Bulk Download: Download all past year papers for any subject code automatically
- Smart ZIP Organization: Automatically organizes papers by year and creates a structured ZIP archive
- Secure Authentication: Handles complex UM OpenAthens authentication flow seamlessly
- Concurrent Downloads: Multi-threaded downloading for faster performance
- Auto-Retry Logic: Robust error handling with configurable retry attempts
- Real-time Progress: Live progress bars and detailed status updates
File Organization
- Hierarchical Structure: Papers organized by subject → year → semester
- Smart File Naming: Automatically detects and preserves meaningful filenames
- Auto-Generated README: Includes download summary and paper inventory in ZIP
- Organized Output: Individual PDFs + consolidated ZIP file
- Optional Cleanup: Choice to keep individual files or ZIP only
User Experience
- Terminal-Based Interface: Clean, intuitive command-line interface
- Interactive Mode: Prompts for credentials and settings when needed
- Command-Line Mode: Full automation with command-line arguments
- Custom Download Locations: Choose where to save your papers
- Browser Options: Support for Edge, Chrome with auto-detection
- Comprehensive Logging: Detailed logs for troubleshooting
Security & Reliability
- Secure Password Input: Hidden password entry (never stored/logged)
- Session Cleanup: Automatic browser data cleanup after use
- Download Verification: Validates PDF integrity after download
- HTTPS Enforcement: Secure connections to UM servers
- Configurable Timeouts: Customizable session and download timeouts
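As an illustration of the hidden password entry described above, here is a minimal sketch built on Python's standard `getpass` module. The function name `prompt_credentials`, its parameters, and the stripping of a pasted `@siswa.um.edu.my` suffix are assumptions made for this example; the tool's actual prompt code may differ.

```python
import getpass
from typing import Callable, Optional, Tuple

EMAIL_SUFFIX = "@siswa.um.edu.my"


def prompt_credentials(
    username: Optional[str] = None,
    ask: Callable[[str], str] = input,
    ask_secret: Callable[[str], str] = getpass.getpass,
) -> Tuple[str, str]:
    """Return (username, password), prompting only for what is missing."""
    if not username:
        username = ask("UM username (without @siswa.um.edu.my): ")
    username = username.strip()
    # Assumption: tolerate a pasted full address by dropping the domain part.
    if username.lower().endswith(EMAIL_SUFFIX):
        username = username[: -len(EMAIL_SUFFIX)]
    # getpass.getpass does not echo the typed characters to the terminal.
    password = ask_secret("UM password: ")
    return username, password
```

The prompt functions are passed in as parameters so the logic can be exercised without a real terminal. The password is only returned to the caller and is never written to disk or to a log.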
Complete Command Reference
Available Commands (9 total)
| Command | Short | Description | Default |
|---|---|---|---|
| --username | -u | UM username (without @siswa.um.edu.my) | prompted |
| --subject-code | -s | Subject code to search for (e.g., WIA1005) | prompted |
| --output-dir | -o | Custom download directory | ./downloads |
| --browser | -b | Browser choice: auto, chrome, edge | edge |
| --timeout | | Session timeout in seconds | 30 |
| --max-retries | | Maximum retry attempts for failed downloads | 3 |
| --show-browser | | Show browser window (disable headless mode) | false |
| --no-location-prompt | | Skip interactive location selection | false |
| --verbose | -v | Enable detailed debug logging | false |
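For readers who want to see how these options fit together, the table can be mirrored with Python's standard `argparse` module. This is a sketch under the assumption that the tool parses its arguments in roughly this way; `build_parser` is a hypothetical name, and only the flags, short forms, and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (a sketch, not the project's code)."""
    parser = argparse.ArgumentParser(prog="main.py", description="UM past year paper downloader")
    parser.add_argument("-u", "--username", help="UM username (without @siswa.um.edu.my)")
    parser.add_argument("-s", "--subject-code", help="Subject code to search for, e.g. WIA1005")
    parser.add_argument("-o", "--output-dir", default="./downloads", help="Custom download directory")
    parser.add_argument("-b", "--browser", choices=["auto", "chrome", "edge"], default="edge")
    parser.add_argument("--timeout", type=int, default=30, help="Session timeout in seconds")
    parser.add_argument("--max-retries", type=int, default=3, help="Retry attempts for failed downloads")
    parser.add_argument("--show-browser", action="store_true", help="Disable headless mode")
    parser.add_argument("--no-location-prompt", action="store_true", help="Skip location selection")
    parser.add_argument("-v", "--verbose", action="store_true", help="Enable debug logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-s", "WIA1005", "--timeout", "60"])
    print(args.subject_code, args.timeout, args.browser)
```

Options left out on the command line keep their defaults; `--username` and `--subject-code` come back as `None`, which is where interactive prompting would take over.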
Usage Examples
1. Interactive Mode (Recommended for first-time users)
python main.py
Prompts for username, password, subject code, and download location
2. Quick Command-Line Mode
python main.py --username john_doe --subject-code WIA1005
Only prompts for password
3. Fully Automated Mode
python main.py -u student123 -s WXES1116 -o "C:/Downloads/Papers" --no-location-prompt
No prompts except secure password entry
4. Debug Mode with Visible Browser
python main.py --subject-code WIA1005 --show-browser --verbose
Shows browser actions and detailed logging
5. Slow-Connection Mode
python main.py -s WIA1005 --max-retries 5 --timeout 60
Extended timeouts and retries for slow connections
6. Custom Browser Selection
python main.py --browser chrome --subject-code CSC1025
Force use of Chrome browser
Quick Start Guide
Prerequisites
- Python 3.8+ installed
- One of these browsers: Microsoft Edge (recommended), Google Chrome
- UM student account with active credentials
- Stable internet connection
Installation
# 1. Clone/download this repository
git clone <repository-url>
cd um-past-year-downloader
# 2. Install dependencies
pip install -r requirements.txt
# 3. Ready to use!
python main.py
First Run
python main.py
Follow the interactive prompts:
- Enter your UM username (without @siswa.um.edu.my)
- Enter your password securely
- Enter subject code (e.g., WIA1005)
- Choose download location
- Confirm download of found papers
What You Get
Organized File Structure
downloads/
├── WIA1005/
│   ├── Year_2023/
│   │   ├── WIA1005_Final_2023_S1.pdf
│   │   └── WIA1005_Final_2023_S2.pdf
│   ├── Year_2022/
│   │   ├── WIA1005_Final_2022_S1.pdf
│   │   └── WIA1005_Final_2022_S2.pdf
│   └── Unsorted/
│       └── WIA1005_Additional_Papers.pdf
├── WIA1005_past_years.zip
└── WIA1005_README.txt
ZIP Archive Contents
- Hierarchical Organization: Subject → Year → Files
- Automatic README: Download summary and file inventory
- Optimized Compression: Balanced compression for size/speed
- Preserve Metadata: Original filenames and dates maintained
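The year-based layout can be sketched with the standard `zipfile` module. This example assumes the year appears in the filename between underscores, as in the sample filenames in this README; `archive_path`, `create_zip`, and `demo` are hypothetical names, and the real archive creator may detect years differently.

```python
import re
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable, List

# Assumption: a four-digit year delimited by underscores, e.g. "_2023_".
YEAR_IN_NAME = re.compile(r"_((?:19|20)\d{2})(?=[_.])")


def archive_path(subject: str, filename: str) -> str:
    """Return SUBJECT/Year_YYYY/file, or SUBJECT/Unsorted/file when no year is found."""
    match = YEAR_IN_NAME.search(filename)
    folder = f"Year_{match.group(1)}" if match else "Unsorted"
    return f"{subject}/{folder}/{filename}"


def create_zip(subject: str, pdf_paths: Iterable[Path], zip_path: Path) -> List[str]:
    """Write the PDFs into a compressed ZIP and return the stored member names."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for pdf in sorted(pdf_paths):
            archive.write(pdf, arcname=archive_path(subject, pdf.name))
        return archive.namelist()


def demo() -> List[str]:
    """Create two placeholder files and archive them in a temporary folder."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        names = ["WIA1005_Final_2023_S1.pdf", "WIA1005_Additional_Papers.pdf"]
        for name in names:
            (folder / name).write_bytes(b"%PDF-1.7\n%%EOF\n")
        return create_zip("WIA1005", [folder / n for n in names], folder / "WIA1005_past_years.zip")
```

Requiring an underscore before the year keeps digits inside the subject code itself from being mistaken for a year.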
Generated Reports
- Download Summary: Shows total papers found and downloaded
- Failed Downloads: Lists any papers that couldn't be downloaded
- File Inventory: Complete list of papers with years and types
- Timestamp: When the download was performed
Advanced Configuration
Browser Selection Guide
| Browser | Best For | Advantages | Notes |
|---|---|---|---|
| Edge | Windows users | Built-in, no driver conflicts, memory efficient | Recommended |
| Chrome | Mac/Linux users | Wide compatibility, stable | May need driver updates |
| Auto | Uncertain | Detects best available | Falls back to Edge → Chrome |
Performance Tuning
# For slow connections
python main.py --timeout 60 --max-retries 5
# For fast connections
python main.py --timeout 15 --max-retries 2
# For debug/troubleshooting
python main.py --verbose --show-browser
Output Directory Options
- Default: ./downloads (project folder)
- Custom: Any valid path (e.g., C:/Users/Student/Papers)
- Interactive: Choose during runtime
- Auto: Use --no-location-prompt to skip selection
Testing & Validation
Built-in Test Scripts
1. Complete System Test
python test_setup.py
Tests Python environment, dependencies, browser drivers, and network connectivity
2. Authentication Test
python test_login.py
Tests only the UM login process (useful for credential verification)
3. Search Functionality Test
python test_search_debug.py
Tests paper search without downloading
Validation Features
- PDF Integrity Check: Verifies downloaded files are valid PDFs
- Size Validation: Ensures files aren't empty or corrupted
- Download Verification: Confirms all expected papers were downloaded
- ZIP Integrity: Validates ZIP file creation and contents
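The checks listed above can be approximated with a few lines of standard-library Python. This is a heuristic sketch, not the tool's actual validator: `is_valid_pdf`, `zip_is_intact`, and `make_zip` are hypothetical names, and a file that passes the header and trailer check can still be damaged in between.

```python
import io
import zipfile


def is_valid_pdf(data: bytes) -> bool:
    """Heuristic: non-empty, starts with the PDF header, and ends with an EOF marker."""
    if not data.startswith(b"%PDF-"):
        return False
    # The trailer marker normally sits within the last kilobyte of the file.
    return b"%%EOF" in data[-1024:]


def zip_is_intact(source) -> bool:
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(source) as archive:
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


def make_zip(members: dict) -> bytes:
    """Build a small in-memory ZIP, used here only to exercise zip_is_intact."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```

A header check like this is useful for catching an HTML login or error page that was saved with a .pdf extension, which is a common failure when a session expires mid-download.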
Technical Architecture
Modular Components
1. Authentication (auth/um_authenticator.py)
- Handles complex UM OpenAthens SAML authentication
- Manages session cookies and security tokens
- Supports multiple browser backends
2. Paper Discovery (scraper/paper_scraper.py)
- Searches UM repository by subject code
- Extracts paper metadata (year, semester, type)
- Handles pagination and result filtering
3. Download Engine (downloader/pdf_downloader.py)
- Concurrent multi-threaded downloads
- Progress tracking with visual indicators
- Retry logic with exponential backoff
- File integrity validation
4. Archive Creator (utils/zip_creator.py)
- Intelligent file organization by year/semester
- Optimized compression algorithms
- Auto-generated documentation
- Metadata preservation
5. Logging System (utils/logger.py)
- Structured logging with multiple levels
- Separate log files for debugging
- Performance metrics and timing
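The download engine's combination of concurrent downloads and retry with exponential backoff can be sketched as follows. The names `fetch_with_retry` and `download_all`, the one-second base delay, and the injected `fetch` callable are assumptions for illustration; the actual downloader in downloader/pdf_downloader.py may be structured differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def fetch_with_retry(
    fetch: Callable[[str], bytes],
    url: str,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch(url); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries:
                return None  # Retries exhausted; report the failure to the caller.
            sleep(base_delay * (2 ** attempt))
    return None


def download_all(
    fetch: Callable[[str], bytes],
    urls: Iterable[str],
    max_workers: int = 4,
    **retry_options,
) -> Dict[str, Optional[bytes]]:
    """Download every URL on a thread pool; failed URLs map to None."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda u: fetch_with_retry(fetch, u, **retry_options), url_list)
        return dict(zip(url_list, results))
```

With the default settings a failing download is attempted four times in total, with waits of 1, 2, and 4 seconds between attempts. Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library.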
Dependencies
- selenium - Web automation and browser control
- requests - HTTP session management
- beautifulsoup4 - HTML parsing and data extraction
- tqdm - Progress bars and status indicators
- webdriver-manager - Automatic browser driver management
Troubleshooting
Common Issues & Solutions
Login Failed
- Verify username/password are correct
- Check if your UM account is active
- Try using Edge browser: --browser edge
- Enable debug mode: --verbose --show-browser
No Papers Found
- Verify subject code is correct (e.g., WIA1005, not wia1005)
- Check if papers exist for that subject
- Try different semester/year variations
Download Errors
- Check internet connection stability
- Increase timeout: --timeout 60
- Increase retries: --max-retries 5
- Check disk space in output directory
Browser/WebDriver Issues
- Windows users: Use Edge first: --browser edge
- Update browser to latest version
- Try: pip install --upgrade webdriver-manager
- See TROUBLESHOOTING.md for detailed solutions
Exit Codes
- 0 - Success
- 1 - Authentication failure
- 2 - Network connectivity issues
- 3 - No papers found or download failed
- 4 - File system permissions error
- 130 - User cancelled (Ctrl+C)
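When the tool is called from another script, the exit codes above can be turned into readable messages. The sketch below is a hypothetical helper (`describe_exit_code` is not part of the tool); only the code-to-meaning mapping comes from this README.

```python
# Meanings as documented in the exit code list of this README.
EXIT_CODES = {
    0: "Success",
    1: "Authentication failure",
    2: "Network connectivity issues",
    3: "No papers found or download failed",
    4: "File system permissions error",
    130: "User cancelled (Ctrl+C)",
}


def describe_exit_code(code: int) -> str:
    """Translate a process return code into the documented meaning."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")
```

A wrapper script could pass the `returncode` from `subprocess.run([...])` to `describe_exit_code` and decide whether to retry, for example only when the code is 2.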
Performance Metrics
Typical Performance
- Authentication: 5-10 seconds
- Paper Search: 2-5 seconds
- Download Speed: 2-5 MB/s per file (concurrent)
- ZIP Creation: 1-3 seconds
- Total Time: 30 seconds - 2 minutes (depending on paper count)
Optimization Features
- Concurrent Downloads: Up to 4 simultaneous downloads
- Intelligent Caching: Avoids re-downloading existing files
- Compressed Archives: ZIP compression reduces file size by 10-30%
- Progress Tracking: Real-time ETA and speed indicators
Legal & Academic Use
Terms of Use
- Educational Purpose Only: For UM students' academic use
- Respect UM Policies: Adheres to university terms of service
- No Circumvention: Uses standard authentication methods
- Rate Limiting: Respects server load limits
- Valid Credentials Required: Must have active UM account
What This Tool Does NOT Do
- Bypass any security measures
- Access restricted content
- Store or share credentials
- Violate copyright or academic policies
- Access content you don't have permission for
Tips for Best Experience
For Windows Users
# Recommended command for Windows
python main.py --browser edge --subject-code WIA1005
For Mac/Linux Users
# Recommended command for Mac/Linux
python main.py --browser chrome --subject-code WIA1005
For Slow Connections
python main.py --timeout 90 --max-retries 5 --subject-code WIA1005
For Batch Processing
# Create a batch script for multiple subjects
python main.py -s WIA1005 --no-location-prompt -o "./Papers/WIA1005"
python main.py -s WXES1116 --no-location-prompt -o "./Papers/WXES1116"
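The same batch run can also be driven from a short Python script instead of a shell batch file. This is a sketch: `build_command` and `run_batch` are hypothetical helpers, and the script assumes it is started from the project folder that contains main.py. Since no username is passed, each run will still prompt for credentials.

```python
import subprocess
import sys
from typing import Callable, Dict, Iterable, List


def build_command(subject: str, base_dir: str = "./Papers") -> List[str]:
    """Build the documented command line for one subject code."""
    return [
        sys.executable, "main.py",
        "-s", subject,
        "--no-location-prompt",
        "-o", f"{base_dir}/{subject}",
    ]


def run_batch(
    subjects: Iterable[str],
    runner: Callable[[List[str]], int] = subprocess.call,
) -> Dict[str, int]:
    """Run the downloader once per subject and collect each exit code."""
    return {subject: runner(build_command(subject)) for subject in subjects}


if __name__ == "__main__":
    for subject, code in run_batch(["WIA1005", "WXES1116"]).items():
        print(f"{subject}: exit code {code}")
```

Using `sys.executable` makes the child process run under the same Python interpreter as the script, which helps when a virtual environment is active.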
Support & Contributing
Getting Help
- Read TROUBLESHOOTING.md - Comprehensive solution guide
- Check logs - Review log files for detailed error information
- Run tests - Use python test_setup.py to validate environment
- Try Edge browser - Often resolves driver issues: --browser edge
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
- Follow existing code style and documentation standards
Disclaimer
Disclaimer: This tool is an unofficial utility created to help UM students access past year papers more efficiently. It is not affiliated with or endorsed by University Malaya. Users are responsible for complying with UM's terms of service and academic policies.
Quick Command Cheat Sheet
# Basic usage
python main.py
# Fast automated mode
python main.py -u username -s WIA1005 --no-location-prompt
# Debug mode
python main.py --verbose --show-browser -s WIA1005
# Slow connections (longer timeout, more retries)
python main.py --max-retries 5 --timeout 60 -s WXES1116
# Custom location
python main.py -o "C:/Papers" -s CSC1025
# Windows optimized
python main.py --browser edge -s WIA1005
Time to lock in for your final