A package to scrape professional and college swimming data.

Project description

SwimScraper

Installation

  • You can install SwimScraper using pip: pip install SwimScraper
  • An example of one way to use the scraping functions:
from SwimScraper import SwimScraper as ss

# Get the Pitt men's roster for 2020
pitt_M_roster_2020 = ss.getRoster(team = 'University of Pittsburgh', team_ID = 405, gender = 'M', year = 2020)

# Get the list of all meets that Pitt participated in during 2020
pitt_meetlist_2020 = ss.getTeamMeetList(team_name = 'University of Pittsburgh', team_ID = 405, year = 2020)

Scraping Functions

Getting Team Data

  • getCollegeTeams(team_names, conference_names, division_names) -> returns list of teams where each team has a team_name, team_ID, team_state, team_division, team_division_ID, team_conference, team_conference_ID
    • Select one of the three inputs:
    • team_names - team_list = ss.getCollegeTeams(team_names = ['University of Pittsburgh', 'University of Louisville'])
    • conference_names - ACC_teams = ss.getCollegeTeams(conference_names = ['ACC'])
    • division_names - d1_teams = ss.getCollegeTeams(division_names = ['Division 1'])
  • getTeamRankingsList(gender, season_ID, year) -> returns list of the top 50 teams where each team has a team_name, team_ID, and swimcloud_points (score given by swimcloud.com based on the team's fastest times)
    • Select a gender and either a season_ID (e.g., 19 for the 2015-16 season, 24 for the 2020-21 season) or year
    • season_ID - male_rankings_2015 = ss.getTeamRankingsList('M', season_ID = 19)
    • year - female_rankings_2019 = ss.getTeamRankingsList('F', year = 2019)
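
The season_ID values used in this README (19 for the 2015-16 season, 24 for the 2020-21 season) suggest a fixed offset between a season_ID and the year the season starts. The sketch below only illustrates that inferred relationship; the names `SEASON_OFFSET`, `season_id_from_year`, and `year_from_season_id` are hypothetical and are not part of SwimScraper, and the package's own getSeasonID and getYear helpers may behave differently.

```python
# Assumption: season_ID = starting year - 1996, inferred only from the
# examples in this README (19 <-> 2015-16, 24 <-> 2020-21).
SEASON_OFFSET = 1996  # hypothetical constant, not taken from the library


def season_id_from_year(year: int) -> int:
    """Return the season_ID for the season that starts in `year`."""
    return year - SEASON_OFFSET


def year_from_season_id(season_id: int) -> int:
    """Return the starting year of the season with the given season_ID."""
    return season_id + SEASON_OFFSET


if __name__ == "__main__":
    print(season_id_from_year(2015))
    print(year_from_season_id(24))
```

If the inferred offset is wrong for older or newer seasons, prefer the package's own helper functions over this arithmetic.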

Getting Roster Data

  • getRoster(team, gender, team_ID, season_ID, year, pro) -> returns list of swimmers where each swimmer has a swimmer_name, swimmer_ID, team_name, team_ID, grade, hometown_state, hometown_city, HS_power_index (a score given to high school students for recruiting - scale is from 1.00 (best) to 100.00)
    • Select a gender, a team name or team_ID, a season_ID or year, and set pro = True for non-College teams
    • team - pitt_F_roster_2020 = ss.getRoster(team = 'University of Pittsburgh', gender = 'F', year = 2020)
    • team_ID - boston_college_M_roster_2018 = ss.getRoster(team = '', team_ID = 228, gender = 'M', season_ID = 22)
    • pro - japan_M_roster_2020 = ss.getRoster(team = 'Japan', team_ID = 10008082, gender = 'M', year = 2020, pro = True)
  • getHSRecruitRankings(class_year, gender, state, state_abbreviation, international) -> returns list of the top 200 High School recruits from the specified class where each swimmer has a swimmer_name, swimmer_ID, team_name, team_ID, hometown_state, hometown_city, HS_power_index
    • Select a class year and gender; optionally filter by state or state_abbreviation, or set international = True for international HS students
    • class year and gender only - male_recruits_2018 = ss.getHSRecruitRankings(2018, 'M')
    • state - female_recruits_2020_Hawaii = ss.getHSRecruitRankings(2020, 'F', state = 'Hawaii')
    • state_abbreviation - female_recruits_2020_Hawaii = ss.getHSRecruitRankings(2020, 'F', state_abbreviation = 'HI')
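
Because the HS_power_index scale runs from 1.00 (best) to 100.00, a lower value means a stronger recruit. The sketch below shows one way to rank records by that field. It assumes each record is a dict keyed by the field names listed above, which this README does not confirm, and the records themselves are placeholders rather than scraped data. `top_recruits` is a hypothetical helper, not a SwimScraper function.

```python
# Placeholder records shaped like the fields listed above. The names, IDs and
# index values are invented for illustration and are not real scraped data.
recruits = [
    {"swimmer_name": "Swimmer A", "swimmer_ID": 1, "HS_power_index": 12.50},
    {"swimmer_name": "Swimmer B", "swimmer_ID": 2, "HS_power_index": 3.75},
    {"swimmer_name": "Swimmer C", "swimmer_ID": 3, "HS_power_index": 48.00},
]


def top_recruits(records, limit=2):
    """Return the `limit` records with the lowest (best) HS_power_index."""
    # float() guards against the index arriving as a string such as "3.75".
    ranked = sorted(records, key=lambda record: float(record["HS_power_index"]))
    return ranked[:limit]


if __name__ == "__main__":
    for recruit in top_recruits(recruits):
        print(recruit["swimmer_name"], recruit["HS_power_index"])
```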

Getting Swimmer Data

  • getPowerIndex(swimmer_ID) -> returns a swimmer's HS recruiting power index
    • swimmer_433591_power_index = ss.getPowerIndex(433591)
  • getSwimmerEvents(swimmer_ID) -> returns a list of all events that the specified swimmer has participated in
    • swimmer_362091_event_list = ss.getSwimmerEvents(362091)
  • getSwimmerTimes(swimmer_ID, event_name, event_ID) -> returns a list of all of the swimmer's times in the specified event where each time has a swimmer_ID, pool_type, event, event_ID, time, meet_name, year, date, improvement (improvement from last time)
    • event_name - swimmer_257824_50free_times = ss.getSwimmerTimes(257824, '50 Free')
    • event_ID - swimmer_257824_50free_times = ss.getSwimmerTimes(257824, '', event_ID = 150)

Getting Meet Data

  • getTeamMeetList(team_name, team_ID, season_ID, year) -> returns a list of all the meets the team has competed in for the specified season or year where each meet has a team_ID, meet_ID, meet_name, meet_date, meet_location
    • pitt_2019_meet_list = ss.getTeamMeetList(team_name = 'University of Pittsburgh', year = 2019)
    • USA_2019_meet_list = ss.getTeamMeetList(team_name = '', team_ID = 10008158, season_ID = 23)
  • getMeetEventList(meet_ID) -> returns a list of the events that took place at the specified meet where each event has an event_name, event_ID, and an event_href that can be passed to the meet results functions below
    • olympics_2012_event_list = ss.getMeetEventList(196380)
  • getCollegeMeetResults(meet_ID, event_name, gender, event_ID, event_href) -> returns a list of all times for the specified event where each time has a meet_ID, swimmer_name, swimmer_ID, team_name, team_ID, event_name, event_ID, event_type (prelims, finals,...), time, score, and improvement
    • event_name - pitt_army_100free_results = ss.getCollegeMeetResults(190690,'100 Free', 'F')
    • event_ID - pitt_army_100free_results = ss.getCollegeMeetResults(190690, '', 'F', event_ID = 1100)
    • event_href (from getMeetEventList) - pitt_army_100free_results = ss.getCollegeMeetResults(190690, '', 'F', event_href = '/results/190690/event/17/')
  • getProMeetResults(meet_ID, event_name, gender, event_ID, event_href) -> returns a list of all times for the specified event where each time has a meet_ID, swimmer_name, swimmer_ID, team_name, team_ID, event_name, event_ID, event_type (prelims, finals,...), time, FINA_score, and improvement
    • olympics2016_200free_male_times = ss.getProMeetResults(106117, event_name = '200 Free', gender = 'M')
    • olympics2016_400medleyrelay_women_times = ss.getProMeetResults(106117, event_name = '', gender = 'F', event_ID = 7400)
    • olympics2012_50free_women_times = ss.getProMeetResults(196380, event_name = '', gender = 'F', event_href = '/results/196380/event/1/')
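
The event_href values in the examples above follow the shape /results/<meet_ID>/event/<number>/. The sketch below splits such a string into its parts and rebuilds it, which can help when checking that an href belongs to the meet being queried. The pattern is inferred only from the hrefs shown in this README, and `EventRef`, `parse_event_href`, and `build_event_href` are hypothetical names, not SwimScraper functions.

```python
import re
from typing import NamedTuple


class EventRef(NamedTuple):
    meet_id: int
    event_number: int


# Assumption: hrefs look like "/results/190690/event/17/", as in the examples.
_HREF_PATTERN = re.compile(r"^/results/(\d+)/event/(\d+)/?$")


def parse_event_href(event_href: str) -> EventRef:
    """Split an event_href into its meet ID and trailing event number."""
    match = _HREF_PATTERN.match(event_href)
    if match is None:
        raise ValueError(f"unrecognized event_href: {event_href!r}")
    return EventRef(int(match.group(1)), int(match.group(2)))


def build_event_href(meet_id: int, event_number: int) -> str:
    """Rebuild an href in the same shape as the examples."""
    return f"/results/{meet_id}/event/{event_number}/"


if __name__ == "__main__":
    print(parse_event_href("/results/190690/event/17/"))
```

Note that the trailing number does not appear to be the event_ID: the same 100 Free example uses event_ID = 1100 alongside /event/17/, so hrefs are best taken from getMeetEventList rather than constructed by hand.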

Other Helper Functions

  • getTeamID(team_name) - gets the corresponding team_ID for the specified team (currently college teams only)
  • getTeamName(team_ID) - gets the team_name for the specified team_ID (currently college teams only)
  • getSeasonID(year) - gets season ID for a specified year
  • getYear(season_ID) - gets year for a specified season_ID
  • getEventID(event_name) - gets event_ID for a specified event_name
  • getEventName(event_ID) - gets event_name for a specified event_ID
  • convertTime(display_time) - converts a time of the format minutes:seconds (1:53.8) to seconds (113.8)
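
The conversion described for convertTime can be illustrated with a small standalone function. This is a minimal sketch of the minutes:seconds arithmetic only, not the package's implementation; `convert_time_sketch` is a hypothetical name, and handling of plain-seconds input such as "23.45" is an assumption.

```python
def convert_time_sketch(display_time: str) -> float:
    """Convert a 'minutes:seconds' display time to total seconds."""
    if ":" in display_time:
        minutes, seconds = display_time.split(":", 1)
        return int(minutes) * 60 + float(seconds)
    # Assumption: a value without a colon is already expressed in seconds.
    return float(display_time)


if __name__ == "__main__":
    # 1 minute plus 53.8 seconds; rounded to avoid floating-point noise.
    print(round(convert_time_sketch("1:53.8"), 2))
```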
