Python horse racing class library

This project aims to provide horse racing data management and analysis services in Python.

Installation

Prior to using racing_data, the package must be installed in your current Python environment. In most cases, an automated installation via PyPI and pip will suffice, as follows:

pip install racing_data

If you would prefer to gain access to new (unstable) features via a pre-release version of the package, pass the --pre option to pip, as follows:

pip install --pre racing_data

To gain access to bleeding edge developments, the package can be installed from a source distribution. To do so, you will need to clone the git repository and execute the setup.py script from the root directory of the source tree, as follows:

git clone https://github.com/justjasongreen/racing_data.git
cd racing_data
python setup.py install

If you would prefer to install the package as a symlink to the source distribution (for development purposes), execute the setup.py script with the ‘develop’ option instead, as follows:

python setup.py develop

Basic Usage

To access the functionality described below, you must first create an instance of the racing_data.Provider class. To do so, you will need to provide a compatible web scraper and a database connection. The web scraper can be any object that implements the punters_client.Scraper API, supporting calls such as the following:

meets = scraper.scrape_meets(date)
races = scraper.scrape_races(meet)
runners = scraper.scrape_runners(race)
horse = scraper.scrape_horse(runner)
jockey = scraper.scrape_jockey(runner)
trainer = scraper.scrape_trainer(runner)
performances = scraper.scrape_performances(horse)
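
For experiments without network access, an object exposing the same method names could stand in for the scraper. The following is a minimal sketch only: the StubScraper class, its return shapes (plain dicts and lists of dicts) and every field value are hypothetical, and the fields racing_data actually expects from a scraper are not documented here.

```python
from datetime import datetime


class StubScraper:
    """Hypothetical stand-in exposing the method names listed above."""

    def scrape_meets(self, date):
        # One made-up meet for the requested date
        return [{'date': date, 'track': 'Example Park'}]

    def scrape_races(self, meet):
        return [{'track': meet['track'], 'number': 1}]

    def scrape_runners(self, race):
        return [{'race_number': race['number'], 'number': 1}]

    def scrape_horse(self, runner):
        return {'name': 'Example Horse'}

    def scrape_jockey(self, runner):
        return {'name': 'Example Jockey'}

    def scrape_trainer(self, runner):
        return {'name': 'Example Trainer'}

    def scrape_performances(self, horse):
        # No history for the made-up horse
        return []


scraper = StubScraper()
meets = scraper.scrape_meets(datetime(2016, 2, 1))
races = scraper.scrape_races(meets[0])
runners = scraper.scrape_runners(races[0])
```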

The database connection can be any object that implements the pymongo.Database API, supporting calls such as the following:

documents = database[collection_name].find(query)
document['_id'] = database[collection_name].insert_one(document).inserted_id
database[collection_name].replace_one({'_id': document['_id']}, document)
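
To make the shape of these three calls concrete, here is a minimal in-memory sketch that supports them. It is an illustration rather than a replacement for pymongo: StubDatabase, StubCollection and InsertResult are hypothetical names, find only handles simple equality queries and returns a list instead of a cursor, and whether racing_data works against such a stub has not been verified. The code uses Python 3 syntax.

```python
import itertools
from collections import defaultdict


class InsertResult:
    """Carries the inserted_id attribute, as the call above expects."""

    def __init__(self, inserted_id):
        self.inserted_id = inserted_id


class StubCollection:
    def __init__(self):
        self._documents = []
        self._ids = itertools.count(1)

    @staticmethod
    def _matches(document, query):
        # Equality matching only: every query key must equal the document value
        return all(document.get(key) == value for key, value in query.items())

    def find(self, query):
        return [d for d in self._documents if self._matches(d, query)]

    def insert_one(self, document):
        # Assign an _id to the document if it does not have one yet
        document.setdefault('_id', next(self._ids))
        self._documents.append(document)
        return InsertResult(document['_id'])

    def replace_one(self, query, document):
        # Replace the first matching document, if any
        for index, existing in enumerate(self._documents):
            if self._matches(existing, query):
                self._documents[index] = document
                break


class StubDatabase(defaultdict):
    """database[collection_name] creates the collection on first access."""

    def __init__(self):
        super().__init__(StubCollection)


database = StubDatabase()
document = {'track': 'Example Park'}
document['_id'] = database['meets'].insert_one(document).inserted_id
document['track'] = 'Renamed Park'
database['meets'].replace_one({'_id': document['_id']}, document)
found = database['meets'].find({'track': 'Renamed Park'})
```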

racing_data has only been tested with punters_client.Scraper as the web scraper and pymongo.Database as the database connection. To set up the required dependencies in your own project using the same packages, execute the following code in your Python interpreter:

>>> import pymongo
>>> database_uri = 'mongodb://localhost:27017/racing_data'
>>> database_client = pymongo.MongoClient(database_uri)
>>> database = database_client.get_default_database()
>>> import cache_requests
>>> http_client = cache_requests.Session()
>>> from lxml import html
>>> html_parser = html.fromstring
>>> import punters_client
>>> scraper = punters_client.Scraper(http_client, html_parser)

With these dependencies in place, you can now create an instance of the racing_data.Provider class as follows:

>>> import racing_data
>>> provider = racing_data.Provider(database, scraper)

The provider instance can now be used to scrape, store and access a range of racing data, as illustrated in the following sections.

Getting Meets

Meets represent a collection of races occurring at a given track on a given date. To get a list of meets occurring on a given date, execute the following code in your Python interpreter:

>>> from datetime import datetime
>>> date = datetime(2016, 2, 1)
>>> meets = provider.get_meets_by_date(date)

The get_meets_by_date method will return a list of Meet objects. Meet objects are derived from Python’s built-in dict type, so a meet’s details can be accessed as follows:

>>> meet = meets[index]
>>> track = meet['track']

Getting Races

Races represent a collection of runners competing in a single event at a meet. To get a list of races occurring at a given meet, execute the following code in your Python interpreter (where meet is an existing Meet object obtained via the provider.get_meets_by_date method):

>>> races = meet.races

The meet.races property will return a list of Race objects. Race objects are derived from Python’s built-in dict type, so a race’s details can be accessed as follows:

>>> race = races[index]
>>> number = race['number']

In addition to the dictionary values, Race objects also provide a ‘meet’ property that can be used to get the meet at which the race occurs:

>>> meet = race.meet
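
The combination of dictionary access for stored values and dot-notation for related objects can be pictured with a small sketch. These are not racing_data's actual classes: the constructor signature and the way the related meet is held are assumptions made purely to show the pattern of a dict subclass with an extra property (Python 3 syntax).

```python
class Meet(dict):
    """Illustrative dict subclass, not the library's Meet class."""


class Race(dict):
    """Illustrative dict subclass that also exposes its meet as a property."""

    def __init__(self, meet, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Kept as an attribute, so it does not appear among the dict keys
        self._meet = meet

    @property
    def meet(self):
        return self._meet


meet = Meet(track='Example Park')
race = Race(meet, number=3)

number = race['number']        # stored value, via dictionary access
track = race.meet['track']     # related object, via dot-notation
```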

Getting Runners

Runners represent a single combination of horse, jockey and trainer competing in a race. To get a list of runners competing in a given race, execute the following code in your Python interpreter (where race is an existing Race object obtained via the Meet.races property):

>>> runners = race.runners

The race.runners property will return a list of Runner objects. Runner objects are derived from Python’s built-in dict type, so a runner’s details can be accessed as follows:

>>> runner = runners[index]
>>> number = runner['number']

In addition to the dictionary values, Runner objects also provide a ‘race’ property that can be used to get the race in which the runner competes:

>>> race = runner.race

Furthermore, Runner objects also offer the following calculated values as properties that can be accessed using dot-notation:

Property         Description
---------------  -----------
actual_distance  The race distance adjusted for the runner’s barrier and race track circ/straight values
actual_weight    The average weight of a racehorse plus the actual weight being carried by the horse
age              The age of the horse as at the date of the race
carrying         The weight being carried by the horse after allowances
result           The final result for the runner (if the race has already been run)
spell            The number of days since the horse’s previous race
starting_price   The starting price for the runner (if the race has already been run)
up               The number of races run by the horse (including this one) since a rest period of 90 days or more
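
The spell and up definitions can be illustrated with a short sketch. This is not the library's implementation: the function names, the list-of-dates input, and the choice to return None when the horse has no earlier race are all assumptions made for illustration.

```python
from datetime import date

REST_DAYS = 90  # rest period threshold taken from the description of 'up'


def spell(previous_dates, race_date):
    """Days since the most recent earlier race, or None if there is none."""
    prior = [d for d in previous_dates if d < race_date]
    if not prior:
        return None
    return (race_date - max(prior)).days


def up(previous_dates, race_date):
    """Runs (including this one) since the last gap of REST_DAYS or more."""
    runs = sorted(d for d in previous_dates if d < race_date) + [race_date]
    count = 1
    # Walk backwards over consecutive runs until a rest period is found
    for later, earlier in zip(reversed(runs), reversed(runs[:-1])):
        if (later - earlier).days >= REST_DAYS:
            break
        count += 1
    return count


history = [date(2015, 6, 1), date(2015, 11, 1),
           date(2015, 11, 20), date(2016, 1, 10)]
race_date = date(2016, 2, 1)

days_since_last_run = spell(history, race_date)  # 22
runs_this_campaign = up(history, race_date)      # 4
```

In this made-up history, the gap between 1 June 2015 and 1 November 2015 exceeds 90 days, so only the three runs after it plus the current race are counted.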

In addition to the properties listed above, Runner objects offer the following performance lists (see below) as properties that can be accessed using dot-notation:

Property              Description
--------------------  -----------
at_distance           All prior performances for the horse within 100m of the current race distance
at_distance_on_track  All prior performances for the horse within 100m of the current race distance on the same track
at_up                 All prior performances for the horse with the same UP number as the current run
career                All performances for the horse prior to the current race date
last_10               The last 10 performances for the horse prior to the current race
last_12_months        All performances for the horse within 12 months prior to the current race date
on_firm               All prior performances for the horse on FIRM tracks
on_good               All prior performances for the horse on GOOD tracks
on_heavy              All prior performances for the horse on HEAVY tracks
on_soft               All prior performances for the horse on SOFT tracks
on_synthetic          All prior performances for the horse on SYNTHETIC tracks
on_track              All prior performances for the horse on the same track as the current race
on_turf               All prior performances for the horse on turf tracks (that is, NOT synthetic tracks)
since_rest            All prior performances for the horse since its last rest period of 90 days or more
with_jockey           All prior performances for the horse with the same jockey
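
Each of these lists is essentially a filter over the horse's prior performances. The sketch below shows how three of them might be expressed over plain dicts. The 'date', 'track' and 'distance' keys, the function signatures, and the decision to treat "within 100m" as inclusive are assumptions, not details confirmed by the library.

```python
from datetime import date


def at_distance(performances, race_distance, tolerance=100):
    """Performances run within `tolerance` metres of the race distance."""
    return [p for p in performances
            if abs(p['distance'] - race_distance) <= tolerance]


def on_track(performances, track):
    """Performances run on the given track."""
    return [p for p in performances if p['track'] == track]


def at_distance_on_track(performances, race_distance, track):
    """Combine the two filters above."""
    return on_track(at_distance(performances, race_distance), track)


# Made-up prior performances for one horse
prior = [
    {'date': date(2015, 10, 1), 'track': 'A', 'distance': 1200},
    {'date': date(2015, 11, 1), 'track': 'B', 'distance': 1300},
    {'date': date(2015, 12, 1), 'track': 'A', 'distance': 1600},
]

similar_distance = at_distance(prior, 1250)
same_track = on_track(prior, 'A')
both = at_distance_on_track(prior, 1250, 'A')
```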

Performance Lists

The performance list properties described above return PerformanceList objects. The PerformanceList class is derived from Python’s built-in list type, allowing easy access to the individual performances contained in the list. In addition to the built-in list functionality, PerformanceList objects also offer the following calculated values as properties that can be accessed via dot-notation:

Property            Description
------------------  -----------
earnings            The total amount earned by the horse and connections in the list
earnings_potential  The total earnings as a percentage of the total prize pools in the list
fourths             The number of fourth placed performances in the list
fourth_pct          The percentage of fourth placed performances in the list
momentums           Returns a tuple containing minimum, maximum and average momentum for the list
places              The number of placing (first/second/third) performances in the list
place_pct           The percentage of placing (first/second/third) performances in the list
result_potential    Returns 1.0 - (the sum of all results / the sum of all starters) in the list
roi                 The total return on investment if a $1 WIN bet were placed on all performances in the list
seconds             The number of second placed performances in the list
second_pct          The percentage of second placed performances in the list
starting_prices     Returns a tuple containing minimum, maximum and average starting prices for the list
starts              The total number of starts in the list
thirds              The total number of third placed performances in the list
third_pct           The percentage of third placed performances in the list
wins                The total number of winning performances in the list
win_pct             The percentage of winning performances in the list
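
As a worked illustration of a few of these values, the sketch below computes them over a list of plain dicts. It is not the PerformanceList class itself: the SketchPerformanceList name, the 'result', 'starters' and 'starting_price' keys, the use of fractions (0.25 rather than 25) for percentages, and the ROI formula (net profit divided by total stake, with starting prices treated as decimal odds) are all assumptions. The code relies on Python 3 division.

```python
class SketchPerformanceList(list):
    """Illustrative list subclass with a few calculated properties."""

    @property
    def starts(self):
        return len(self)

    @property
    def wins(self):
        return sum(1 for p in self if p['result'] == 1)

    @property
    def places(self):
        return sum(1 for p in self if p['result'] in (1, 2, 3))

    @property
    def win_pct(self):
        return self.wins / self.starts if self.starts else None

    @property
    def place_pct(self):
        return self.places / self.starts if self.starts else None

    @property
    def result_potential(self):
        # 1.0 - (sum of all results / sum of all starters)
        starters = sum(p['starters'] for p in self)
        if not starters:
            return None
        return 1.0 - sum(p['result'] for p in self) / starters

    @property
    def roi(self):
        # $1 staked per start; a win returns its starting price
        if not self.starts:
            return None
        returned = sum(p['starting_price'] for p in self if p['result'] == 1)
        return (returned - self.starts) / self.starts


runs = SketchPerformanceList([
    {'result': 1, 'starters': 10, 'starting_price': 4.0},
    {'result': 3, 'starters': 8, 'starting_price': 6.0},
    {'result': 6, 'starters': 12, 'starting_price': 11.0},
    {'result': 2, 'starters': 10, 'starting_price': 3.0},
])
```

With this made-up data there are 4 starts, 1 win and 3 places; the results sum to 12 and the starters to 40, so result_potential is 1.0 - 12/40 = 0.7, and the single win at 4.0 returns exactly the $4 staked, giving an ROI of 0.0.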

Getting Horses, Jockeys and Trainers

To get the horse, jockey or trainer associated with a given runner, execute the following code in your Python interpreter (where runner is an existing Runner object obtained via the Race.runners property):

>>> horse = runner.horse
>>> jockey = runner.jockey
>>> trainer = runner.trainer

The runner.horse, runner.jockey and runner.trainer properties will return Horse, Jockey and Trainer objects respectively. Horse, Jockey and Trainer objects are derived from Python’s built-in dict type, so a horse/jockey/trainer’s details can be accessed as follows:

>>> name = horse['name']
>>> name = jockey['name']
>>> name = trainer['name']

Getting Performances

Performances represent the results of completed runs by horses and jockeys. To get a list of performances for a given horse, execute the following code in your Python interpreter (where horse is an existing Horse object obtained via the Runner.horse property):

>>> performances = horse.performances

The horse.performances property will return a list of Performance objects. Performance objects are derived from Python’s built-in dict type, so a performance’s details can be accessed as follows:

>>> performance = performances[index]
>>> result = performance['result']

In addition to the dictionary values, Performance objects also provide ‘horse’ and ‘jockey’ properties that can be used to get the horse/jockey associated with the performance:

>>> horse = performance.horse
>>> jockey = performance.jockey

Furthermore, Performance objects also offer the following calculated values as properties that can be accessed using dot-notation:

Property         Description
---------------  -----------
actual_distance  The actual distance covered by the horse in the winning time
actual_weight    The average weight of a racehorse plus the actual weight being carried by the horse
momentum         The average momentum achieved by the horse/jockey during the run
profit           The profit/loss on a $1 bet on this performance
result           The final result for the runner (if the race has already been run)
speed            The average speed achieved by the horse/jockey during the run
spell            The number of days since the horse’s previous race
up               The number of races run by the horse (including this one) since a rest period of 90 days or more

(NOTE: Jockey objects also provide a ‘performances’ property that can be used to access a list of performances associated with that jockey. Unlike the Horse.performances property though, the Jockey.performances property will not scrape the web in search of relevant data. Instead, the Jockey.performances property will only return relevant performances that already exist in the database. This is due to the vast number of past performances associated with any given jockey, and the inherent difficulty in scraping such a vast amount of data in a timely fashion from most data providers.)

Development and Testing

The source distribution includes a test suite based on pytest. To ensure compatibility with all supported versions of Python, it is recommended that the test suite be run via tox.

To install all development and test requirements into your current Python environment, execute the following command from the root directory of the source tree:

pip install -e .[dev,test]

To run the test suite included in the source distribution, execute the tox command from the root directory of the source tree as follows:

tox
