
A package to scrape professional and college swimming data.

Project description

SwimScraper

Installation

  • You can install SwimScraper using pip: pip install SwimScraper
  • An example of one way to use the scraping functions:
from SwimScraper import SwimScraper as ss

# Get the Pitt men's roster for 2020
pitt_M_roster_2020 = ss.getRoster(team = 'University of Pittsburgh', team_ID = 405, gender = 'M', year = 2020)

# Get the list of all meets Pitt competed in during 2020
pitt_meetlist_2020 = ss.getTeamMeetList(team_name = 'University of Pittsburgh', team_ID = 405, year = 2020)

Scraping Functions

Getting Team Data

  • getCollegeTeams(team_names, conference_names, division_names) -> returns a list of teams where each team has a team_name, team_ID, team_state, team_division, team_division_ID, team_conference, and team_conference_ID
    • Select one of the three inputs:
    • team_names - team_list = ss.getCollegeTeams(team_names = ['University of Pittsburgh', 'University of Louisville'])
    • conference_names - ACC_teams = ss.getCollegeTeams(conference_names = ['ACC'])
    • division_names - d1_teams = ss.getCollegeTeams(division_names = ['Division 1'])
  • getTeamRankingsList(gender, season_ID, year) -> returns a list of the top 50 teams where each team has a team_name, team_ID, and swimcloud_points (a score assigned by swimcloud.com based on the team's fastest times)
    • Select a gender and either a season_ID (e.g., 19 for the 2015-16 season, 24 for the 2020-21 season) or year
    • season_ID - male_rankings_2015 = ss.getTeamRankingsList('M', season_ID = 19)
    • year - female_rankings_2019 = ss.getTeamRankingsList('F', year = 2019)
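The season_ID values used in this README are consistent with season_ID = starting year minus 1996 (19 for 2015-16, 22 for 2018, 23 for 2019, 24 for 2020-21). The sketch below encodes that inferred offset so the two forms can be converted by hand; the offset and the helper names are assumptions drawn from the examples, not from SwimScraper's source, so prefer ss.getSeasonID and ss.getYear when the package is available.

```python
# Hypothetical helpers: SEASON_OFFSET is inferred from this README's
# examples (season_ID 19 -> 2015-16, 24 -> 2020-21), not from the library.
SEASON_OFFSET = 1996


def season_id_from_year(year: int) -> int:
    """Return the season_ID for the season starting in `year` (e.g. 2015 -> 19)."""
    return year - SEASON_OFFSET


def year_from_season_id(season_id: int) -> int:
    """Return the starting year of the season with this season_ID."""
    return season_id + SEASON_OFFSET


def season_label(season_id: int) -> str:
    """Format a season_ID as a 'YYYY-YY' label (e.g. 24 -> '2020-21')."""
    start = year_from_season_id(season_id)
    return f"{start}-{(start + 1) % 100:02d}"


print(season_id_from_year(2015))  # 19
print(season_label(24))           # 2020-21
```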

Getting Roster Data

  • getRoster(team, gender, team_ID, season_ID, year, pro) -> returns a list of swimmers where each swimmer has a swimmer_name, swimmer_ID, team_name, team_ID, grade, hometown_state, hometown_city, and HS_power_index (a high school recruiting score on a scale from 1.00 (best) to 100.00)
    • Select a gender, a team name or team_ID, and a season_ID or year; set pro = True for non-college teams (e.g., national teams)
    • team - pitt_F_roster_2020 = ss.getRoster(team = 'University of Pittsburgh', gender = 'F', year = 2020)
    • team_ID - boston_college_M_roster_2018 = ss.getRoster(team = '', team_ID = 228, gender = 'M', season_ID = 22)
    • pro - japan_M_roster_2020 = ss.getRoster(team = 'Japan', team_ID = 10008082, gender = 'M', year = 2020, pro = True)
  • getHSRecruitRankings(class_year, gender, state, state_abbreviation, international) -> returns a list of the top 200 high school recruits in the specified class, where each swimmer has a swimmer_name, swimmer_ID, team_name, team_ID, hometown_state, hometown_city, and HS_power_index
    • Select a class_year and gender, optionally a state or state_abbreviation, and set international = True for international high school students
    • male_recruits_2018 = ss.getHSRecruitRankings(2018, 'M')
    • state - female_recruits_2020_Hawaii = ss.getHSRecruitRankings(2020, 'F', state = 'Hawaii')
    • state_abbreviation - female_recruits_2020_Hawaii = ss.getHSRecruitRankings(2020, 'F', state_abbreviation = 'HI')
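Because a lower HS_power_index is better (1.00 is best), picking the strongest recruits from a roster or recruit list means sorting in ascending order. The sketch below assumes getRoster and getHSRecruitRankings return lists of dicts keyed by the field names listed above; best_recruits is a hypothetical helper and the records are placeholders, not real swimmers.

```python
from typing import List


def best_recruits(recruits: List[dict], n: int = 10) -> List[dict]:
    """Return the n recruits with the best (lowest) HS_power_index.

    Assumes each recruit is a dict with an 'HS_power_index' key holding a
    number (or numeric string) on the 1.00 (best) to 100.00 scale.
    Entries without an index are skipped.
    """
    rated = [r for r in recruits if r.get("HS_power_index") is not None]
    return sorted(rated, key=lambda r: float(r["HS_power_index"]))[:n]


# Placeholder records for illustration only.
recruits = [
    {"swimmer_name": "Swimmer A", "HS_power_index": 12.5},
    {"swimmer_name": "Swimmer B", "HS_power_index": 3.1},
    {"swimmer_name": "Swimmer C", "HS_power_index": None},
]
for r in best_recruits(recruits, n=2):
    print(r["swimmer_name"], r["HS_power_index"])
```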

Getting Swimmer Data

  • getPowerIndex(swimmer_ID) -> returns a swimmer's HS recruiting power index
    • swimmer_433591_power_index = ss.getPowerIndex(433591)
  • getSwimmerEvents(swimmer_ID) -> returns a list of all events that the specified swimmer has participated in
    • swimmer_362091_event_list = ss.getSwimmerEvents(362091)
  • getSwimmerTimes(swimmer_ID, event_name, event_ID) -> returns a list of all of the swimmer's times in the specified event, where each time has a swimmer_ID, pool_type, event, event_ID, time, meet_name, year, date, and improvement (change from the swimmer's previous time)
    • event_name - swimmer_257824_50free_times = ss.getSwimmerTimes(257824, '50 Free')
    • event_ID - swimmer_257824_50free_times = ss.getSwimmerTimes(257824, '', event_ID = 150)
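Since getSwimmerTimes mixes times from different pools, a common next step is finding the fastest swim per pool_type. The sketch below assumes the result is a list of dicts and that each 'time' is already a number of seconds (for example after converting display times with ss.convertTime); best_by_pool is a hypothetical helper, and the pool_type labels and values are placeholders.

```python
from typing import Dict, List


def best_by_pool(times: List[dict]) -> Dict[str, dict]:
    """Return the fastest entry for each pool_type.

    Assumes each entry is a dict with 'pool_type' and a numeric 'time'
    in seconds; lower times are faster.
    """
    best: Dict[str, dict] = {}
    for entry in times:
        pool = entry["pool_type"]
        if pool not in best or entry["time"] < best[pool]["time"]:
            best[pool] = entry
    return best


# Placeholder entries; labels and values are illustrative only.
times = [
    {"pool_type": "Y", "time": 21.4, "meet_name": "Meet 1"},
    {"pool_type": "Y", "time": 20.9, "meet_name": "Meet 2"},
    {"pool_type": "L", "time": 23.8, "meet_name": "Meet 3"},
]
for pool, entry in best_by_pool(times).items():
    print(pool, entry["time"], entry["meet_name"])
```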

Getting Meet Data

  • getTeamMeetList(team_name, team_ID, season_ID, year) -> returns a list of all the meets the team competed in during the specified season or year, where each meet has a team_ID, meet_ID, meet_name, meet_date, and meet_location
    • pitt_2019_meet_list = ss.getTeamMeetList(team_name = 'University of Pittsburgh', year = 2019)
    • USA_2019_meet_list = ss.getTeamMeetList(team_name = '', team_ID = 10008158, season_ID = 23)
  • getMeetEventList(meet_ID) -> returns a list of the events held at the specified meet, where each event has an event_name, event_ID, and event_href; the event_href can be passed to the meet-results functions below
    • olympics_2012_event_list = ss.getMeetEventList(196380)
  • getCollegeMeetResults(meet_ID, event_name, gender, event_ID, event_href) -> returns a list of all times for the specified event where each time has a meet_ID, swimmer_name, swimmer_ID, team_name, team_ID, event_name, event_ID, event_type (prelims, finals,...), time, score, and improvement
    • event_name - pitt_army_100free_results = ss.getCollegeMeetResults(190690, '100 Free', 'F')
    • event_ID - pitt_army_100free_results = ss.getCollegeMeetResults(190690, '', 'F', event_ID = 1100)
    • event_href (from getMeetEventList) - pitt_army_100free_results = ss.getCollegeMeetResults(190690, '', 'F', event_href = '/results/190690/event/17/')
  • getProMeetResults(meet_ID, event_name, gender, event_ID, event_href) -> returns a list of all times for the specified event where each time has a meet_ID, swimmer_name, swimmer_ID, team_name, team_ID, event_name, event_ID, event_type (prelims, finals,...), time, FINA_score, and improvement
    • olympics2016_200free_male_times = ss.getProMeetResults(106117, event_name = '200 Free', gender = 'M')
    • olympics2016_400medleyrelay_women_times = ss.getProMeetResults(106117, event_name = '', gender = 'F', event_ID = 7400)
    • olympics2012_50free_women_times = ss.getProMeetResults(196380, event_name = '', gender = 'F', event_href = '/results/196380/event/1/')
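The event_href values in the examples above share the shape /results/<meet_ID>/event/<n>/. The sketch below splits such an href into its two numbers, assuming that shape holds generally (it is inferred from only two examples). The trailing number does not appear to be the event_ID: the Pitt vs. Army 100 Free uses event_ID = 1100 but href /results/190690/event/17/, so it is more likely a per-meet index. parse_event_href is a hypothetical helper, not part of SwimScraper.

```python
import re
from typing import Tuple

# Pattern inferred from this README's example hrefs,
# e.g. '/results/190690/event/17/'; other shapes are rejected.
HREF_PATTERN = re.compile(r"^/results/(\d+)/event/(\d+)/?$")


def parse_event_href(href: str) -> Tuple[int, int]:
    """Split an event_href into (meet_ID, per-meet event number)."""
    match = HREF_PATTERN.match(href)
    if match is None:
        raise ValueError(f"unrecognized event_href: {href!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_event_href("/results/190690/event/17/"))  # (190690, 17)
```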

Other Helper Functions

  • getTeamID(team_name) - gets the team_ID for the specified team (currently college teams only)
  • getTeamName(team_ID) - gets the team_name for the specified team_ID (currently college teams only)
  • getSeasonID(year) - gets season ID for a specified year
  • getYear(season_ID) - gets year for a specified season_ID
  • getEventID(event_name) - gets event_ID for a specified event_name
  • getEventName(event_ID) - gets event_name for a specified event_ID
  • convertTime(display_time) - converts a time of the format minutes:seconds (1:53.8) to seconds (113.8)
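The conversion convertTime performs is simple arithmetic: 1:53.8 is 1 × 60 + 53.8 = 113.8 seconds. The sketch below reproduces that arithmetic for offline use; it is an illustration of the described behavior, not the library's implementation, and the name convert_time is hypothetical. Folding each colon-separated field into a running total also handles plain seconds ("22.45") and, as an extension beyond the described format, h:mm:ss.s.

```python
def convert_time(display_time: str) -> float:
    """Convert a display time like '1:53.8' to seconds (113.8).

    Each colon-separated field multiplies the running total by 60
    before being added, so 'ss.s', 'm:ss.s' and 'h:mm:ss.s' all work.
    """
    total = 0.0
    for part in display_time.strip().split(":"):
        total = total * 60 + float(part)
    return total


print(round(convert_time("1:53.8"), 2))  # 113.8
print(convert_time("22.45"))             # 22.45
```

Rounding when printing avoids showing floating-point noise from the addition; compare converted times with a small tolerance rather than exact equality.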



Download files


Source Distribution

SwimScraper-0.0.2.tar.gz (15.3 kB)

Uploaded Source

Built Distribution

SwimScraper-0.0.2-py3-none-any.whl (14.3 kB)

Uploaded Python 3

File details

Details for the file SwimScraper-0.0.2.tar.gz.

File metadata

  • Download URL: SwimScraper-0.0.2.tar.gz
  • Upload date:
  • Size: 15.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.2

File hashes

Hashes for SwimScraper-0.0.2.tar.gz
Algorithm Hash digest
SHA256 19bb770cf4eb7fef16ea1ba938adc3893f97eaa10379ae7fbf6311006d82144b
MD5 1437fc8ebf6db6db5f3c0e31dfc9f087
BLAKE2b-256 dbdacc031aaa604d8d2a99b894ab6eeb107a5093618acfc332be7363bd21d6e1


File details

Details for the file SwimScraper-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: SwimScraper-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 14.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.23.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.2

File hashes

Hashes for SwimScraper-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4be4431b471cabb5da8a0c78e5fa1c37b830aaedf66fa33d23ab20752e4aabec
MD5 366b25612e1231b84d694bce0365d8d6
BLAKE2b-256 8fb59b741b2150b0ffd3862f0eb576449196fb583e33e0c4a7eeadc86cea5e1e

