Skip to main content

Python bindings to the Chadwick library

Project description

pychadwick

A Python package to interface with the Chadwick libray.
Chadwick is a set of tools for parsing retrosheet data and is available at

http://chadwick.sourceforge.net/doc/index.html

https://github.com/chadwickbureau/chadwick

Features

As of now this package supports retrosheet event data only.

Installation

$ pip install pychadwick

Example use

Python replacement for cwevent

When you install pychadwick, it will install a Python exe that mimic the cwevent exe from the chadwick project. It reads a set of event files and prints them out in csv format to stdout.

This downloads a fresh copy of the retrosheet event files, and parses them with 7 CPUs

$ time pycwevent -n 7  > /tmp/events1.csv
stderr: data_root not given as argument, downloading fresh copy of retrosheet events...
stderr: found 2254 files
Warning: Invalid integer value 'b'

real	3m14.517s
user	12m18.104s
sys	0m25.264s

$ wc -l /tmp/events1.csv 
13976191 /tmp/events1.csv

This uses a pre-downloaded copy of the retrosheet event files, with 7 CPUs

$ time pycwevent -n 7 --data-root /tmp/retrosheet-master/event/regular/ > /tmp/events2.csv
stderr: found 2254 files
Warning: Invalid integer value 'b'

real	1m57.499s
user	9m52.236s
sys	0m17.672s

$ wc -l /tmp/events2.csv 
13976184 /tmp/events2.csv

Python interface to cwevent

Load events

Load events for a game from a file stored on the web

>>> from pychadwick.chadwick import Chadwick                                                                                    

>>> chadwick = Chadwick()                                                                                                       

>>> file_path = "https://raw.githubusercontent.com/chadwickbureau/retrosheet/master/event/regular/1982OAK.EVA" 

>>> games = chadwick.games(file_path)                                                                                           

>>> game = next(games)                                                                                                          

>>> df = chadwick.game_to_dataframe(game)                                                                                       

>>> df                                                                                                                           
         GAME_ID AWAY_TEAM_ID  INN_CT  BAT_HOME_ID  ...  ASS9_FLD_CD  ASS10_FLD_CD  UNKNOWN_OUT_EXC_FL UNCERTAIN_PLAY_EXC_FL
0   OAK198204060          CAL       1            0  ...            0             0                   F                     F
1   OAK198204060          CAL       1            0  ...            0             0                   F                     F
2   OAK198204060          CAL       1            0  ...            0             0                   F                     F
3   OAK198204060          CAL       1            1  ...            0             0                   F                     F
4   OAK198204060          CAL       1            1  ...            0             0                   F                     F
..           ...          ...     ...          ...  ...          ...           ...                 ...                   ...
81  OAK198204060          CAL      11            1  ...            0             0                   F                     F
82  OAK198204060          CAL      11            1  ...            0             0                   F                     F
83  OAK198204060          CAL      11            1  ...            0             0                   F                     F
84  OAK198204060          CAL      11            1  ...            0             0                   F                     F
85  OAK198204060          CAL      11            1  ...            0             0                   F                     F

[86 rows x 159 columns]

Load events for a game from a local file

>>> file_path = " /tmp/retrosheet-master/event/regular/1982OAK.EVA"

>>> games = chadwick.games(file_path)                                                                                           

>>> game = next(games)                                                                                                          

>>> df = chadwick.game_to_dataframe(game)                                                                                       

>>> df                                                                                                                           
         GAME_ID AWAY_TEAM_ID  INN_CT  BAT_HOME_ID  ...  ASS9_FLD_CD  ASS10_FLD_CD  UNKNOWN_OUT_EXC_FL UNCERTAIN_PLAY_EXC_FL
0   OAK198204060          CAL       1            0  ...            0             0                   F                     F
1   OAK198204060          CAL       1            0  ...            0             0                   F                     F
2   OAK198204060          CAL       1            0  ...            0             0                   F                     F
3   OAK198204060          CAL       1            1  ...            0             0                   F                     F
4   OAK198204060          CAL       1            1  ...            0             0                   F                     F
..           ...          ...     ...          ...  ...          ...           ...                 ...                   ...
81  OAK198204060          CAL      11            1  ...            0             0                   F                     F
82  OAK198204060          CAL      11            1  ...            0             0                   F                     F
83  OAK198204060          CAL      11            1  ...            0             0                   F                     F
84  OAK198204060          CAL      11            1  ...            0             0                   F                     F
85  OAK198204060          CAL      11            1  ...            0             0                   F                     F

[86 rows x 159 columns]

Check which columns are defined

>>>  chadwick.all_headers

Check which columns are enabled

>>>  chadwick.active_headers

Disable all columns, and add only GAME_ID and BAT_ID

>>> _ = [chadwick.unset_event_field(e) for e in chadwick.all_headers]                                                          

>>> chadwick.active_headers                                                                                                    
[]

>>> chadwick.set_event_field("GAME_ID")                                                                                        

>>> chadwick.set_event_field("BAT_ID")                                                                                         

>>> games = chadwick.games(file_path)                                                                                          

>>>  game = next(games)                                                                                                         

>>> df = chadwick.game_to_dataframe(game)                                                                                      

>>> df

         GAME_ID    BAT_ID
0   OAK198204060  burlr001
1   OAK198204060  lynnf001
2   OAK198204060  carer001
3   OAK198204060  hendr001
4   OAK198204060  murpd002
..           ...       ...
81  OAK198204060  meyed001
82  OAK198204060  armat001
83  OAK198204060  grosw001
84  OAK198204060  spenj101
85  OAK198204060  loped001

[86 rows x 2 columns]

Activate all the columns again

>>> chadwick.set_all_headers()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pychadwick-0.5.0.tar.gz (119.9 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page