Skip to main content

A package to scrape professional and college swimming data.

Project description

SwimScraper

Installation

  • You can install SwimScraper using pip: pip install SwimScraper
  • An example of one way to use the scraping functions:
from SwimScraper import SwimScraper as ss

#gets Pitt men roster for 2020
pitt_M_roster_2020 = ss.getRoster(team = 'University of Pittsburgh', team_ID = 405, gender = 'M', year = 2020)

#gets list of all meets that Pitt participated in for 2020
pitt_meetlist_2020 = ss.getTeamMeetList(team_name = 'University of Pittsburgh', team_ID = 405, year = 2020)

Scraping Functions

Getting Team Data

  • getCollegeTeams(team_names, conference_names, division_names) -> returns list of teams where each team has a team_name, team_ID, team_state, team_division, team_division_ID, team_conference, team_conference_ID
    • Select one of the three inputs:
    • team_names - team_list = ss.getCollegeTeams(team_names = ['University of Pittsburgh', 'University of Louisville'])
    • conference_names - ACC_teams = ss.getCollegeTeams(conference_names = ['ACC'])
    • division_names - d1_teams = ss.getCollegeTeams(division_names = ['Division 1'])
  • getTeamRankingsList(gender, season_ID, year) -> returns list of top 50 countries where each team has a team_name, team_ID, and swimcloud_points (score given by swimcloud.com based on team's fastest times)
    • Select a gender and either a season_ID (e.g., 19 for the 2015-16 season, 24 for the 2020-21 season) or year
    • season_ID - male_rankings_2015 = ss.getTeamRankingsList('M', season_ID = 19)
    • year - female_rankings_2019 = ss.getTeamRankingsList('F', year = 2019)

Getting Roster Data

  • getRoster(team, gender, team_ID, season_ID, year, pro) -> returns list of swimmers where each swimmer has a swimmer_name, swimmer_ID, team_name, team_ID, grade, hometown_state, hometown_city, HS_power_index (a score given to high school students for recruiting - scale is from 1.00 (best) to 100.00)
    • Select a gender, a team name or team_ID, a season_ID or year, and set pro = True for non-College teams
    • team - pitt_F_roster_2020 = ss.getRoster(team = 'University of Pittsburgh', gender = 'F', year = 2020)
    • team_ID - boston_college_M_roster_2018 = ss.getRoster(team = '', team_ID = 228, gender = 'M', season_ID = 22)
    • pro - japan_M_roster_2020 = ss.getRoster(team = 'Japan', team_ID = 10008082, gender = 'M', year = 2020, pro = True)
  • getHSRecruitRankings(class_year, gender, state, state_abbreviation, international) -> returns list of the top 200 High School recruits from the specified class where each swimmer has a swimmer_name, swimmer_ID, team_name, team_ID, hometown_state, hometown_city, HS_power_index
    • Select a year, gender, a state or state_abbreviation, and set international = True for international HS students
    • male_recruits_2018 = ss.getHSRecruitRankings(2018, 'M')
    • state - female_recruits_2020_Hawaii = ss.getHSRecruitRankings(2020, 'F', state = 'Hawaii')
    • state_abbreviation - female_recruits_2020_Hawaii = ss.getHSRecruitRankings(2020, 'F', state_abbreviation = 'HI')

Getting Swimmer Data

  • getPowerIndex(swimmer_ID) -> returns a swimmer's HS recruiting power index
    • swimmer_433591_power_index = ss.getPowerIndex(433591)
  • getSwimmerEvents(swimmer_ID) -> returns a list of all events that the specified swimmer has participated in
    • swimmer_362091_event_list = ss.getSwimmerEvents(362091)
  • getSwimmerTimes(swimmer_ID, event_name, event_ID) -> returns a list of all of the swimmer's times in the specified event where each time has a swimmer_ID, pool_type, event, event_ID, time, meet_name, year, date, improvement (improvement from last time)
    • event_name - swimmer_257824_50free_times = ss.getSwimmerTimes(257824, '50 Free')
    • event_ID - swimmer_257824_50free_times = ss.getSwimmerTimes(257824, '', event_ID = 150)

Getting Meet Data

  • getTeamMeetList(team_name, team_ID, season_ID, year) -> returns a list of all the meets the team has competed in for the specififed season or year where each meet has a team_ID, meet_ID, meet_name, meet_date, meet_location
    • pitt_2019_meet_list = ss.getTeamMeetList(team_name = 'University of Pittsburgh', year = 2019)
    • USA_2019_meet_list = ss.getTeamMeetList(team_name = '', team_ID = 10008158, season_ID = 23)
  • getMeetEventList(meet_ID) -> returns a list of which events took place at the specified meet where each event has an event_name, event_ID and an event_href which can be used as an input in the following functions that get meet results
    • olympics_2012_event_list = ss.getMeetEventList(196380)
  • getCollegeMeetResults(meet_ID, event_name, gender, event_ID, event_href) -> returns a list of all times for the specified event where each time has a meet_ID, swimmer_name, swimmer_ID, team_name, team_ID, event_name, event_ID, event_type (prelims, finals,...), time, score, and improvement
    • event_name - pitt_army_100free_results = ss.getCollegeMeetResults(190690,'100 Free', 'F')
    • event_ID - pitt_army_100free_results = ss.getCollegeMeetResults(190690, '', 'F', event_ID = 1100)
    • event_href (from getMeetEventList) - pitt_army_100free_results = ss.getCollegeMeetResults(190690, '', 'F', event_href = '/results/190690/event/17/')
  • getProMeetResults(meet_ID, event_name, gender, event_ID, event_href) -> returns a list of all times for the specified event where each time has a meet_ID, swimmer_name, swimmer_ID, team_name, team_ID, event_name, event_ID, event_type (prelims, finals,...), time, FINA_score, and improvement
    • olympics2016_200free_male_times = ss.getProMeetResults(106117, event_name = '200 Free', gender = 'M')
    • olympics2016_400medleyrelay_women_times = ss.getProMeetResults(106117, event_name = '', gender = 'F', event_ID = 7400)
    • olympics2012_50free_women_times = ss.getProMeetResults(196380, event_name = '', gender = 'F', event_href = '/results/196380/event/1/')

Other Helper Functions

  • getTeamID(team_name) - gets corresponding team_ID for the specified team *currently only for college teams
  • getTeamName(team_ID) - gets team_name for the specified team_ID *currently only for college teams
  • getSeasonID(year) - gets season ID for a specified year
  • getYear(season_ID) - gets year for a specified season_ID
  • getEventID(event_name) - gets event_ID for a specified event_name
  • getEventName(event_ID) - gets event_name for a specified event_ID
  • convertTime(display_time) - converts a time of the format minutes:seconds (1:53.8) to seconds (113.8)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

SwimScraper-0.0.4.tar.gz (15.4 kB view hashes)

Uploaded Source

Built Distribution

SwimScraper-0.0.4-py3-none-any.whl (14.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page