A comprehensive toolkit for scraping high school sports data

hs-scraper-toolkit

A comprehensive toolkit for scraping high school sports data from various athletic websites including MaxPreps and Athletic.net.

Features

  • MaxPreps Roster Scraping: Extract detailed roster information including player names, numbers, positions, grades, and more
  • Athletic.net Track & Field Data: Scrape athlete rosters and event schedules for track & field and cross country
  • Flexible Filtering: Filter data by sport, gender, season, and competition level
  • Easy Integration: Simple Python classes with pandas DataFrame outputs

Installation

# Install from local directory
pip install .

# Install with development dependencies
pip install -e ".[dev]"

Dependencies

  • beautifulsoup4>=4.9.0 - HTML parsing
  • requests>=2.25.0 - HTTP requests
  • pandas>=1.3.0 - Data manipulation
  • selenium>=4.0.0 - Web automation (for Athletic.net)
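
To confirm that an environment has these packages, the installed versions can be read with the standard library. This is a minimal sketch that only reports what is installed and does not compare against the minimums above. The helper name `installed_versions` is illustrative and not part of the toolkit.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the dependency list above.
REQUIRED = ("beautifulsoup4", "requests", "pandas", "selenium")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```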

Quick Start

MaxPreps Roster Scraping

from hs_scraper_toolkit.Athletics.MaxPrepRoster import MaxPrepRoster

# Initialize scraper with team URL
scraper = MaxPrepRoster("https://www.maxpreps.com/il/chicago/northside-mustangs")

# Scrape all available sports
roster_data = scraper.scrape()

# Filter by specific criteria
basketball_data = scraper.scrape(
    sports=['basketball'],
    genders=['boys'],
    seasons=['winter'],
    levels=['varsity']
)

print(f"Found {len(roster_data)} athletes")
print(roster_data.head())
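
Because the result is an ordinary DataFrame, the usual pandas operations apply. The sketch below uses a small hand-built stand-in with the column names listed under Data Output, so it runs without network access. The rows are invented, and storing `grade` as strings is an assumption. With a real scrape, use the DataFrame returned by `scraper.scrape()` instead.

```python
import pandas as pd

# Hypothetical stand-in for scraper.scrape() output; every row is invented.
roster_data = pd.DataFrame(
    {
        "name": ["A. Example", "B. Sample", "C. Placeholder"],
        "number": [12, 7, 30],
        "sport": ["basketball", "basketball", "soccer"],
        "season": ["winter", "winter", "fall"],
        "level": ["varsity", "jv", "varsity"],
        "gender": ["boys", "boys", "girls"],
        "grade": ["12", "10", "11"],  # assumed to be strings
        "position": ["G", "F", "MF"],
    }
)

# Keep only varsity athletes, then count them per sport.
varsity = roster_data[roster_data["level"] == "varsity"]
varsity_counts = varsity.groupby("sport").size()

print(varsity_counts)
```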

Athletic.net Track & Field Scraping

from hs_scraper_toolkit.Athletics.AthleticNetTrackField import AthleticNetTrackField

# Initialize scraper with team URL
scraper = AthleticNetTrackField("https://www.athletic.net/team/19718")

# Scrape athlete rosters
athletes = scraper.scrape_athletes(['cross-country', 'track-and-field-outdoor'])

# Scrape event schedules
events = scraper.scrape_events(['cross-country'], [2024, 2025])

print(f"Found {len(athletes)} athletes")
print(f"Found {len(events)} events")

Athletics Module

MaxPrepRoster

Scrapes roster data from MaxPreps team pages.

Supported Data:

  • Athlete names and jersey numbers
  • Sports, seasons, and competition levels
  • Player positions and grade levels
  • Gender categories

Supported Sports: All sports available on MaxPreps (basketball, football, soccer, etc.)

AthleticNetTrackField

Scrapes track & field and cross country data from Athletic.net using Selenium WebDriver.

Supported Data:

  • Athlete rosters with names and gender
  • Event schedules with dates and locations
  • Meet information and venues

Supported Sports:

  • Cross Country (cross-country)
  • Outdoor Track & Field (track-and-field-outdoor)
  • Indoor Track & Field (track-and-field-indoor)
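
Since a scrape can take minutes, it may help to catch a mistyped slug before starting one. The sketch below checks a list against the three slugs above. `validate_sports` is a hypothetical helper, not part of the toolkit, and whether the toolkit performs its own validation is not documented here.

```python
# Slugs listed under "Supported Sports" above.
SUPPORTED_SPORTS = frozenset(
    {"cross-country", "track-and-field-outdoor", "track-and-field-indoor"}
)


def validate_sports(sports):
    """Return the slugs as a list, raising ValueError on any unknown slug."""
    sports = list(sports)
    unknown = sorted(set(sports) - SUPPORTED_SPORTS)
    if unknown:
        raise ValueError(f"Unsupported sport slug(s): {', '.join(unknown)}")
    return sports
```

The validated list can then be passed on, for example `scraper.scrape_athletes(validate_sports(['cross-country']))`.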

Requirements:

  • ChromeDriver must be installed and accessible in PATH
  • Stable internet connection (scraping may take several minutes)
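
A quick preflight check for the first requirement can be written with the standard library. This sketch only looks for an executable named `chromedriver` on PATH. It does not verify that the driver matches the installed Chrome version, and `find_on_path` is an illustrative name, not a toolkit function.

```python
import shutil


def find_on_path(executable="chromedriver"):
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_on_path()
    if path is None:
        print("chromedriver not found on PATH; install it before scraping Athletic.net")
    else:
        print(f"Using ChromeDriver at {path}")
```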

Data Output

Both scrapers return pandas DataFrames with standardized column structures:

Athlete Data Columns

  • name: Athlete name
  • number: Jersey number (0 for Athletic.net)
  • sport: Sport type
  • season: Season (fall/winter/spring)
  • level: Competition level (varsity/jv/freshman)
  • gender: Gender (boys/girls)
  • grade: Grade level (9/10/11/12 or N/A)
  • position: Player position (N/A for track & field)
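
One practical consequence of the N/A placeholders: if they are stored as the literal text "N/A" (an assumption; the exact representation is not stated here), a CSV round trip through pandas will silently turn them into missing values, because `read_csv` treats "N/A" as NaN by default. The sketch below shows the difference on invented rows.

```python
import io

import pandas as pd

# Hypothetical athlete rows; values are invented for illustration.
athletes = pd.DataFrame(
    {
        "name": ["A. Example", "B. Sample"],
        "number": [0, 23],
        "grade": ["N/A", "11"],
        "position": ["N/A", "Guard"],
    }
)

buffer = io.StringIO()
athletes.to_csv(buffer, index=False)

# Default parsing turns the text "N/A" into a missing value (NaN).
buffer.seek(0)
default_read = pd.read_csv(buffer)

# keep_default_na=False keeps "N/A" as the literal string that was saved.
buffer.seek(0)
literal_read = pd.read_csv(buffer, keep_default_na=False)

print(default_read["position"].isna().tolist())
print(literal_read["position"].tolist())
```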

Event Data Columns (Athletic.net only)

  • name: Event/meet name
  • date: Event date
  • time: Event time
  • gender: Gender category
  • sport: Sport type
  • level: Competition level
  • opponent: Opposing teams
  • location: Event venue
  • home: Home event indicator
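
Downstream code that relies on these columns can check for them up front. The sketch below assumes the DataFrame column labels match the lowercase names in the two lists above exactly. `missing_columns` is a hypothetical helper that accepts any iterable of names, such as `events.columns`.

```python
ATHLETE_COLUMNS = (
    "name", "number", "sport", "season", "level", "gender", "grade", "position",
)
EVENT_COLUMNS = (
    "name", "date", "time", "gender", "sport", "level", "opponent", "location", "home",
)


def missing_columns(actual, expected):
    """Return the expected column names that are absent from `actual`, in order."""
    present = set(actual)
    return [column for column in expected if column not in present]
```

For example, `missing_columns(events.columns, EVENT_COLUMNS)` returns an empty list when every documented event column is present.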

Examples

See the example/main.py file for comprehensive usage examples.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

For issues, questions, or contributions, please visit the GitHub repository.
