A comprehensive toolkit for scraping high school data
hs-scraper-toolkit
A comprehensive toolkit for scraping high school sports data from various athletic websites including MaxPreps and Athletic.net.
Features
- MaxPreps Roster Scraping: Extract detailed roster information including player names, numbers, positions, grades, and more
- Athletic.net Track & Field Data: Scrape athlete rosters and event schedules for track & field and cross country
- Flexible Filtering: Filter data by sport, gender, season, and competition level
- Easy Integration: Simple Python classes with pandas DataFrame outputs
Installation
# Install from local directory
pip install .
# Install with development dependencies
pip install -e ".[dev]"
Dependencies
- beautifulsoup4>=4.9.0 - HTML parsing
- requests>=2.25.0 - HTTP requests
- pandas>=1.3.0 - Data manipulation
- selenium>=4.0.0 - Web automation (for Athletic.net)
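To confirm that an environment meets the minimum versions listed here, something like the following sketch may help. It uses only the standard library; `version_tuple` and `check_requirements` are hypothetical helper names (not part of hs-scraper-toolkit), and the version comparison is deliberately simplified to the leading numeric parts.

```python
from importlib import metadata

# Minimum versions copied from the Dependencies list in this README.
REQUIREMENTS = {
    "beautifulsoup4": "4.9.0",
    "requests": "2.25.0",
    "pandas": "1.3.0",
    "selenium": "4.0.0",
}


def version_tuple(version):
    """Turn '4.12.3' into (4, 12, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdecimal():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: problem} for every missing or too-old package."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    # Prints one line per problem; prints nothing when everything is satisfied.
    for package, problem in check_requirements().items():
        print(f"{package}: {problem}")
```

Because only the leading numeric parts are compared, pre-release versions such as `2.0.0rc1` are treated as older than `2.0.0`.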
Quick Start
MaxPreps Roster Scraping
from hs_scraper_toolkit.Athletics.MaxPrepRoster import MaxPrepRoster
# Initialize scraper with team URL
scraper = MaxPrepRoster("https://www.maxpreps.com/il/chicago/northside-mustangs")
# Scrape all available sports
roster_data = scraper.scrape()
# Filter by specific criteria
basketball_data = scraper.scrape(
    sports=['basketball'],
    genders=['boys'],
    seasons=['winter'],
    levels=['varsity']
)
print(f"Found {len(roster_data)} athletes")
print(roster_data.head())
Athletic.net Track & Field Scraping
from hs_scraper_toolkit.Athletics.AthleticNetTrackField import AthleticNetTrackField
# Initialize scraper with team URL
scraper = AthleticNetTrackField("https://www.athletic.net/team/19718")
# Scrape athlete rosters
athletes = scraper.scrape_athletes(['cross-country', 'track-and-field-outdoor'])
# Scrape event schedules
events = scraper.scrape_events(['cross-country'], [2024, 2025])
print(f"Found {len(athletes)} athletes")
print(f"Found {len(events)} events")
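Since Selenium-based scraping can be slow (the Requirements note later in this README mentions several minutes), it may be worth caching results between runs. The sketch below is one possible approach, not something the toolkit provides: `cached_frame` is a hypothetical helper, and `fetch` is any zero-argument callable that returns a DataFrame, for example `lambda: scraper.scrape_athletes(['cross-country'])`.

```python
from pathlib import Path

import pandas as pd


def cached_frame(cache_path, fetch):
    """Return the DataFrame stored at cache_path, calling fetch() only on a miss."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Pickle keeps dtypes and literal strings such as "N/A" intact.
        return pd.read_pickle(cache_path)
    frame = fetch()
    frame.to_pickle(cache_path)
    return frame
```

Pickle is used here rather than CSV because `pandas.read_csv` converts strings like `N/A` to missing values by default. Only load pickle files you created yourself.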
Athletics Module
MaxPrepRoster
Scrapes roster data from MaxPreps team pages.
Supported Data:
- Athlete names and jersey numbers
- Sports, seasons, and competition levels
- Player positions and grade levels
- Gender categories
Supported Sports: All sports available on MaxPreps (basketball, football, soccer, etc.)
AthleticNetTrackField
Scrapes track & field and cross country data from Athletic.net using Selenium WebDriver.
Supported Data:
- Athlete rosters with names and gender
- Event schedules with dates and locations
- Meet information and venues
Supported Sports:
- Cross Country (cross-country)
- Outdoor Track & Field (track-and-field-outdoor)
- Indoor Track & Field (track-and-field-indoor)
Requirements:
- ChromeDriver must be installed and accessible in PATH
- Stable internet connection (scraping may take several minutes)
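A quick way to check the ChromeDriver requirement before starting a long scrape is to look the executable up on PATH. This is a minimal sketch using the standard library; `find_driver` and `require_driver` are hypothetical helper names, and the executable name `chromedriver` is an assumption that may differ on some systems.

```python
import shutil


def find_driver(executable="chromedriver"):
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


def require_driver(executable="chromedriver"):
    """Raise a readable error before any scraping starts if the driver is missing."""
    path = find_driver(executable)
    if path is None:
        raise RuntimeError(
            f"{executable} was not found on PATH; install it or add its "
            "directory to PATH before using AthleticNetTrackField."
        )
    return path
```

Calling `require_driver()` at the top of a script fails early with a clear message instead of partway through a scrape.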
Data Output
Both scrapers return pandas DataFrames with standardized column structures:
Athlete Data Columns
- name: Athlete name
- number: Jersey number (0 for Athletic.net)
- sport: Sport type
- season: Season (fall/winter/spring)
- level: Competition level (varsity/jv/freshman)
- gender: Gender (boys/girls)
- grade: Grade level (9/10/11/12 or N/A)
- position: Player position (N/A for track/field)
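As a rough illustration of working with these columns, the sketch below checks that a DataFrame has the expected athlete columns and filters it by column values. The rows are invented placeholders rather than scraped data, `missing_columns` and `select` are hypothetical helpers (not toolkit functions), and storing `grade` as a string is an assumption based on the "N/A" value above.

```python
import pandas as pd

# Column names taken from the athlete column list in this README.
ATHLETE_COLUMNS = [
    "name", "number", "sport", "season", "level", "gender", "grade", "position",
]


def missing_columns(frame, expected=ATHLETE_COLUMNS):
    """Return the expected column names that are absent from frame."""
    return [column for column in expected if column not in frame.columns]


def select(frame, **criteria):
    """Keep rows where every named column equals the given value."""
    mask = pd.Series(True, index=frame.index)
    for column, value in criteria.items():
        mask &= frame[column] == value
    return frame[mask]


# Placeholder rows standing in for scraper output (assumed value formats).
sample = pd.DataFrame(
    [
        ["Sample Athlete A", 12, "basketball", "winter", "varsity", "boys", "11", "G"],
        ["Sample Athlete B", 0, "cross-country", "fall", "varsity", "girls", "N/A", "N/A"],
        ["Sample Athlete C", 7, "soccer", "fall", "jv", "girls", "9", "MF"],
    ],
    columns=ATHLETE_COLUMNS,
)

varsity_girls = select(sample, level="varsity", gender="girls")
```

The same `select` call should work on a real roster DataFrame as long as its column values match the formats assumed here.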
Event Data Columns (Athletic.net only)
- name: Event/meet name
- date: Event date
- time: Event time
- gender: Gender category
- sport: Sport type
- level: Competition level
- opponent: Opposing teams
- location: Event venue
- home: Home event indicator
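The event columns can be summarized in a similar way. The sketch below uses invented placeholder rows and assumes that `home` is stored as a boolean and that dates are plain strings; the actual formats returned by `scrape_events` are not documented here, so treat both as assumptions. `home_events` and `events_per_sport` are hypothetical helper names.

```python
import pandas as pd

# Column names taken from the event column list in this README.
EVENT_COLUMNS = [
    "name", "date", "time", "gender", "sport", "level",
    "opponent", "location", "home",
]

# Placeholder rows standing in for scraper output (assumed value formats).
sample_events = pd.DataFrame(
    [
        ["Sample Home Invite", "2024-09-14", "9:00 AM", "boys", "cross-country",
         "varsity", "Multiple schools", "Home course", True],
        ["Sample Away Meet", "2024-09-21", "10:00 AM", "girls", "cross-country",
         "varsity", "Multiple schools", "Away venue", False],
        ["Sample Spring Relays", "2025-04-12", "4:30 PM", "girls",
         "track-and-field-outdoor", "varsity", "Multiple schools", "Away venue", False],
    ],
    columns=EVENT_COLUMNS,
)


def home_events(events):
    """Return only the rows flagged as home events (assumes a boolean column)."""
    return events[events["home"].eq(True)]


def events_per_sport(events):
    """Count events for each sport and return a plain dict."""
    return events.groupby("sport").size().to_dict()
```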
Examples
See the example/main.py file for comprehensive usage examples.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
MIT License - see LICENSE file for details.
Support
For issues, questions, or contributions, please visit the GitHub repository.
File details
Details for the file hs_scraper_toolkit-1.0.1.tar.gz.
File metadata
- Download URL: hs_scraper_toolkit-1.0.1.tar.gz
- Upload date:
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 51afc1d92827a5a7bb706a2d56490f46eb5b9620469b58da18654898ab7718da |
| MD5 | f395739c2f01477d71ca1fb841604b30 |
| BLAKE2b-256 | cd8aeba8e421bf87718d759cd078efbb75684429f21c087bd524b28dc200b2da |
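A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; `sha256_of` and `matches` are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for hs_scraper_toolkit-1.0.1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "51afc1d92827a5a7bb706a2d56490f46eb5b9620469b58da18654898ab7718da"
)


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 with an expected hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call (requires the archive to be present locally):
# matches("hs_scraper_toolkit-1.0.1.tar.gz", EXPECTED_SDIST_SHA256)
```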
File details
Details for the file hs_scraper_toolkit-1.0.1-py3-none-any.whl.
File metadata
- Download URL: hs_scraper_toolkit-1.0.1-py3-none-any.whl
- Upload date:
- Size: 10.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 420d18690b254d6701f7edb2648f26404047b2143086aeea743f88d06c602faf |
| MD5 | b62dfa34036206e8ee5f955e884014b3 |
| BLAKE2b-256 | 1e9031d9f09ca07e53573b1a21ed34dc5899807a47c4c61ac3c6c96cfe41943e |