
Python bindings to the Chadwick library

Project description

pychadwick

A Python package that interfaces with the Chadwick library.
Chadwick is a set of tools for parsing Retrosheet data and is available at:

http://chadwick.sourceforge.net/doc/index.html

https://github.com/chadwickbureau/chadwick

Features

Currently, this package supports Retrosheet event data only.

Installation

$ pip install pychadwick

Example use

Python replacement for cwevent

Installing pychadwick also installs a Python executable, pycwevent, that mimics the cwevent executable from the Chadwick project. It reads a set of event files and prints them in CSV format to stdout.

This downloads a fresh copy of the Retrosheet event files and parses them using 7 CPUs:

$ time pycwevent -n 7  > /tmp/events1.csv
stderr: data_root not given as argument, downloading fresh copy of retrosheet events...
stderr: found 2254 files
Warning: Invalid integer value 'b'

real	3m14.517s
user	12m18.104s
sys	0m25.264s

$ wc -l /tmp/events1.csv 
13976191 /tmp/events1.csv

This uses a pre-downloaded copy of the Retrosheet event files, again with 7 CPUs:

$ time pycwevent -n 7 --data-root /tmp/retrosheet-master/event/regular/ > /tmp/events2.csv
stderr: found 2254 files
Warning: Invalid integer value 'b'

real	1m57.499s
user	9m52.236s
sys	0m17.672s

$ wc -l /tmp/events2.csv 
13976184 /tmp/events2.csv
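The two shell invocations above can also be driven from a Python script. The following is a minimal sketch, assuming `pycwevent` is on the `PATH` after installation and using only the `-n` and `--data-root` options shown above. The helper names `build_pycwevent_command` and `run_pycwevent` are hypothetical and not part of pychadwick.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pycwevent_command(n_cpus: int, data_root: Optional[str] = None) -> List[str]:
    """Build the argument list for the pycwevent invocations shown above.

    Hypothetical helper: only the -n and --data-root options from the
    examples are used; no other flags are assumed to exist.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    cmd = ["pycwevent", "-n", str(n_cpus)]
    if data_root is not None:
        # Without --data-root, pycwevent downloads a fresh copy of the events.
        cmd += ["--data-root", data_root]
    return cmd


def run_pycwevent(output_csv: str, n_cpus: int, data_root: Optional[str] = None) -> None:
    """Run pycwevent and write its stdout (the CSV) to output_csv."""
    with Path(output_csv).open("w") as out:
        # stderr is left attached to the console so progress messages stay visible.
        subprocess.run(build_pycwevent_command(n_cpus, data_root), stdout=out, check=True)
```

Calling `run_pycwevent("/tmp/events2.csv", 7, "/tmp/retrosheet-master/event/regular/")` would correspond to the second shell example.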

Python interface to cwevent

Load events

Load events for a game from a file stored on the web

>>> from pychadwick.chadwick import Chadwick                                                                                    

>>> chadwick = Chadwick()                                                                                                       

>>> file_path = "https://raw.githubusercontent.com/chadwickbureau/retrosheet/master/event/regular/1982OAK.EVA" 

>>> games = chadwick.games(file_path)                                                                                           

>>> game = next(games)                                                                                                          

>>> df = chadwick.game_to_dataframe(game)                                                                                       

>>> df                                                                                                                           
         GAME_ID AWAY_TEAM_ID  INN_CT  BAT_HOME_ID  ...  ASS9_FLD_CD  ASS10_FLD_CD  UNKNOWN_OUT_EXC_FL UNCERTAIN_PLAY_EXC_FL
0   OAK198204060          CAL       1            0  ...            0             0                   F                     F
1   OAK198204060          CAL       1            0  ...            0             0                   F                     F
2   OAK198204060          CAL       1            0  ...            0             0                   F                     F
3   OAK198204060          CAL       1            1  ...            0             0                   F                     F
4   OAK198204060          CAL       1            1  ...            0             0                   F                     F
..           ...          ...     ...          ...  ...          ...           ...                 ...                   ...
81  OAK198204060          CAL      11            1  ...            0             0                   F                     F
82  OAK198204060          CAL      11            1  ...            0             0                   F                     F
83  OAK198204060          CAL      11            1  ...            0             0                   F                     F
84  OAK198204060          CAL      11            1  ...            0             0                   F                     F
85  OAK198204060          CAL      11            1  ...            0             0                   F                     F

[86 rows x 159 columns]

Load events for a game from a local file

>>> file_path = "/tmp/retrosheet-master/event/regular/1982OAK.EVA"

>>> games = chadwick.games(file_path)                                                                                           

>>> game = next(games)                                                                                                          

>>> df = chadwick.game_to_dataframe(game)                                                                                       

>>> df                                                                                                                           
         GAME_ID AWAY_TEAM_ID  INN_CT  BAT_HOME_ID  ...  ASS9_FLD_CD  ASS10_FLD_CD  UNKNOWN_OUT_EXC_FL UNCERTAIN_PLAY_EXC_FL
0   OAK198204060          CAL       1            0  ...            0             0                   F                     F
1   OAK198204060          CAL       1            0  ...            0             0                   F                     F
2   OAK198204060          CAL       1            0  ...            0             0                   F                     F
3   OAK198204060          CAL       1            1  ...            0             0                   F                     F
4   OAK198204060          CAL       1            1  ...            0             0                   F                     F
..           ...          ...     ...          ...  ...          ...           ...                 ...                   ...
81  OAK198204060          CAL      11            1  ...            0             0                   F                     F
82  OAK198204060          CAL      11            1  ...            0             0                   F                     F
83  OAK198204060          CAL      11            1  ...            0             0                   F                     F
84  OAK198204060          CAL      11            1  ...            0             0                   F                     F
85  OAK198204060          CAL      11            1  ...            0             0                   F                     F

[86 rows x 159 columns]
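The examples above convert only the first game in the file. To process every game, the same two calls can be wrapped in a generator. This is a sketch that assumes `chadwick.games(...)` can be iterated to exhaustion in the same way `next(games)` is used above; `iter_game_frames` is a hypothetical helper name, not part of pychadwick.

```python
from typing import Any, Iterator


def iter_game_frames(chadwick: Any, file_path: str) -> Iterator[Any]:
    """Yield one DataFrame per game found in file_path.

    Uses only the two calls shown above: games() and game_to_dataframe().
    """
    for game in chadwick.games(file_path):
        yield chadwick.game_to_dataframe(game)
```

With pandas available, `pandas.concat(list(iter_game_frames(chadwick, file_path)), ignore_index=True)` would combine the per-game frames into a single DataFrame.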

Check which columns are defined

>>> chadwick.all_headers

Check which columns are enabled

>>> chadwick.active_headers
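Comparing the two attributes shows which columns are currently switched off. This is a small sketch, assuming both attributes are iterables of column-name strings (as their use elsewhere in this README suggests); `disabled_headers` is a hypothetical helper name.

```python
from typing import Any, List


def disabled_headers(chadwick: Any) -> List[str]:
    """Return the defined columns that are not currently enabled,
    in the order they appear in all_headers."""
    active = set(chadwick.active_headers)
    return [header for header in chadwick.all_headers if header not in active]
```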

Disable all columns, and add only GAME_ID and BAT_ID

>>> _ = [chadwick.unset_event_field(e) for e in chadwick.all_headers]                                                          

>>> chadwick.active_headers                                                                                                    
[]

>>> chadwick.set_event_field("GAME_ID")                                                                                        

>>> chadwick.set_event_field("BAT_ID")                                                                                         

>>> games = chadwick.games(file_path)                                                                                          

>>> game = next(games)

>>> df = chadwick.game_to_dataframe(game)                                                                                      

>>> df

         GAME_ID    BAT_ID
0   OAK198204060  burlr001
1   OAK198204060  lynnf001
2   OAK198204060  carer001
3   OAK198204060  hendr001
4   OAK198204060  murpd002
..           ...       ...
81  OAK198204060  meyed001
82  OAK198204060  armat001
83  OAK198204060  grosw001
84  OAK198204060  spenj101
85  OAK198204060  loped001

[86 rows x 2 columns]

Activate all the columns again

>>> chadwick.set_all_headers()
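The disable, select, and re-enable steps above can be bundled into a context manager so the columns are always reactivated, even if parsing raises. This is a sketch built only from the calls shown above (`unset_event_field`, `set_event_field`, `set_all_headers`, `all_headers`); `only_event_fields` is a hypothetical helper name. Note that it re-enables all columns on exit rather than restoring whatever selection was active before.

```python
from contextlib import contextmanager
from typing import Any, Iterable, Iterator


@contextmanager
def only_event_fields(chadwick: Any, fields: Iterable[str]) -> Iterator[Any]:
    """Temporarily enable only the given columns, then re-enable all of them."""
    try:
        # Switch every defined column off first.
        for header in list(chadwick.all_headers):
            chadwick.unset_event_field(header)
        # Then switch on just the requested ones.
        for field in fields:
            chadwick.set_event_field(field)
        yield chadwick
    finally:
        # Runs on normal exit and on exceptions alike.
        chadwick.set_all_headers()
```

Inside `with only_event_fields(chadwick, ["GAME_ID", "BAT_ID"]):`, a call to `chadwick.game_to_dataframe(game)` would be expected to return only those two columns.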
