Python package for loading and caching CSVs hosted on GitHub into pandas dataframes
nfelo DCM
nfelo DCM is an abstraction layer for loading and saving NFL-related CSVs stored on the web. DCM stands for Dataframe-CSV Mapping. The goal of the DCM is to load fresh data into pandas dataframes in a way that balances simplicity, efficiency, and performance.
import nfelodcm
import pandas as pd
# Load two dataframes
db = nfelodcm.load(['pbp', 'games'])
# Access the play-by-play (pbp) dataframe
db['pbp']
Maps
Maps are config files that tell the DCM where data CSVs are located, how they should be retrieved, and which fields to pull. Each CSV has its own config, where parameters can be set for things such as freshness SLAs, CSV parsing engines, and assignments (aka mutations).
An important characteristic of these maps, and of the overall framework, is that every field must be 1) specified in the map and 2) typed. Fields not listed in the map will not be loaded, and untyped fields will throw an error.
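The two rules above can be illustrated with a small, self-contained sketch. The helper names `validate_map` and `fields_to_load` are hypothetical and are not part of nfelodcm; the sketch simply treats any field whose dtype is empty or `null` as untyped and raises, and drops any CSV column that the map does not list.

```python
from typing import Dict, List, Optional


def validate_map(field_map: Dict[str, Optional[str]]) -> None:
    """Hypothetical check: every field in the map must declare a dtype."""
    untyped = [name for name, dtype in field_map.items() if not dtype]
    if untyped:
        raise ValueError(f"Untyped fields in map: {untyped}")


def fields_to_load(csv_header: List[str], field_map: Dict[str, Optional[str]]) -> List[str]:
    """Keep only the CSV columns listed in the map; unlisted columns are skipped."""
    validate_map(field_map)
    return [col for col in csv_header if col in field_map]


# Usage: 'extra_col' is not in the map, so it is not loaded.
print(fields_to_load(["game_id", "season", "extra_col"], {"game_id": "object", "season": "int32"}))
```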
Here is a sample config:
{
  "name": "games",
  "description": "nflgamedata games",
  "last_local_update": "2023-12-16T22:42:41.040569",
  "download_url": "https://raw.githubusercontent.com/nflverse/nfldata/master/data/games.csv",
  "compression": null,
  "engine": "c",
  "freshness": {
    "type": "gh_commit",
    "gh_api_endpoint": "https://api.github.com/repos/nflverse/nfldata/commits",
    "gh_release_tag": null,
    "sla_seconds": null
  },
  "iter": {
    "type": null,
    "start": null
  },
  "assignments": [
    "game_id_repl"
  ],
  "map": {
    "game_id": "object",
    "season": "int32",
    "game_type": "object",
    "week": "int32",
    "gameday": "object",
    "weekday": "object",
    "gametime": "object",
    "away_team": "object",
    "away_score": "float32",
    "home_team": "object",
    "home_score": "float32",
    "location": "object",
    "result": "float32",
    "total": "float32",
    "overtime": "float32",
    "old_game_id": "float32",
    "gsis": "float32",
    "nfl_detail_id": "object",
    "pfr": "object",
    "pff": "float32",
    "espn": "int32",
    "ftn": "float32",
    "away_rest": "int32",
    "home_rest": "int32",
    "away_moneyline": "float32",
    "home_moneyline": "float32",
    "spread_line": "float32",
    "away_spread_odds": "float32",
    "home_spread_odds": "float32",
    "total_line": "float32",
    "under_odds": "float32",
    "over_odds": "float32",
    "div_game": "int32",
    "roof": "object",
    "surface": "object",
    "temp": "float32",
    "wind": "float32",
    "away_qb_id": "object",
    "home_qb_id": "object",
    "away_qb_name": "object",
    "home_qb_name": "object",
    "away_coach": "object",
    "home_coach": "object",
    "referee": "object",
    "stadium_id": "object",
    "stadium": "object"
  }
}
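As a rough illustration of what the `map`, `engine`, and `compression` keys correspond to, the sketch below applies a trimmed copy of the sample config to an in-memory CSV using `pandas.read_csv`. This is an assumption about how such a config could be consumed, not a description of nfelodcm's internals, and the CSV rows are placeholder values.

```python
import io

import pandas as pd

# Trimmed-down config in the same shape as the sample above (illustrative only).
config = {
    "name": "games",
    "compression": None,
    "engine": "c",
    "map": {
        "game_id": "object",
        "season": "int32",
        "week": "int32",
        "home_score": "float32",
    },
}

# Placeholder stand-in for the downloaded CSV; "extra_col" is not in the map.
raw_csv = io.StringIO(
    "game_id,season,week,home_score,extra_col\n"
    "g1,2023,1,20,x\n"
    "g2,2023,1,,y\n"
)

df = pd.read_csv(
    raw_csv,
    usecols=list(config["map"]),  # only fields listed in the map are read
    dtype=config["map"],          # each field gets the dtype declared in the map
    engine=config["engine"],
    compression=config["compression"],
)
print(df.dtypes)
```

Note that `read_csv` raises an error if a column typed `int32` contains missing values; nullable numeric columns need a float dtype (or pandas' nullable `Int32`).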
Data
When a CSV is translated into a dataframe, a copy of the data is stored locally so that later calls can be served from the cache, based on the map's SLA and freshness settings. For data stored on GitHub, freshness is determined by either the latest release or the latest commit. Presently, data is stored locally as CSVs.
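A freshness decision of this kind might look like the following sketch. It assumes the cached copy's timestamp is the map's `last_local_update` value, that the remote timestamp (for example, the date of the latest commit, which GitHub's commits API reports under `commit.committer.date`) has already been fetched, and that naive timestamps are UTC. The function names `parse_ts` and `is_stale` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp; naive values are assumed to be UTC (assumption)."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def is_stale(
    last_local_update: Optional[str],
    latest_remote_update: Optional[str] = None,
    sla_seconds: Optional[int] = None,
    now: Optional[datetime] = None,
) -> bool:
    """Hypothetical freshness rule: refetch if nothing is cached, if the remote
    commit/release is newer than the cache, or if the cache is older than the SLA."""
    if last_local_update is None:
        return True  # nothing cached yet
    local = parse_ts(last_local_update)
    if latest_remote_update is not None and parse_ts(latest_remote_update) > local:
        return True  # remote data changed after the cached copy was written
    if sla_seconds is not None:
        now = now or datetime.now(timezone.utc)
        return (now - local).total_seconds() > sla_seconds
    return False


# Usage: a commit made after the local update marks the cache as stale.
print(is_stale("2023-12-16T22:42:41.040569", "2023-12-17T01:00:00Z"))
```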
Assignments
Assignment is the pandas vernacular for mutate. In the DCM, "assignments" are functions that take a dataframe as input and return a mutated (assigned) dataframe. Assignments can be added to the assignments folder and referenced by name in config files.
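A minimal sketch of the assignment pattern, assuming a simple name-to-function registry in place of the assignments folder. `add_point_diff`, `ASSIGNMENTS`, and `apply_assignments` are hypothetical; in particular, `add_point_diff` is not the `game_id_repl` assignment referenced in the sample config, whose behavior is not described here.

```python
from typing import Callable, Dict, List

import pandas as pd

# An assignment takes a dataframe and returns a new, mutated dataframe.
Assignment = Callable[[pd.DataFrame], pd.DataFrame]


def add_point_diff(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical assignment: home team's margin (home_score - away_score)."""
    return df.assign(point_diff=df["home_score"] - df["away_score"])


# Stand-in for functions discovered in the assignments folder.
ASSIGNMENTS: Dict[str, Assignment] = {"add_point_diff": add_point_diff}


def apply_assignments(df: pd.DataFrame, names: List[str]) -> pd.DataFrame:
    """Apply assignments in the order they are listed in a config."""
    for name in names:
        df = ASSIGNMENTS[name](df)
    return df
```

Because each assignment returns a new dataframe via `DataFrame.assign`, the input is left unchanged and assignments can be chained in the order they appear in the config.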
Retrieval
To load data, pass a list of table names to the load() function. The name passed for each table should match the name of its map file (i.e., passing 'pbp' retrieves whatever data is specified in pbp.json). When this function is called, all freshness checks, caching, downloading, field typing, and mutations are handled automatically behind the scenes.
File details
Details for the file nfelodcm-0.1.7.tar.gz
File metadata
- Download URL: nfelodcm-0.1.7.tar.gz
- Size: 17.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | 19a4930bf68b5ca85d89481131aaf5c3ace140dd7e2d76c39b03831fa49cb3e5
MD5 | be997dff6bd2def5af0b4bdd02a4387f
BLAKE2b-256 | 1744792ad717ceacbcce8a823a5a3b21f1b8255e43607661a79cb1b9e0cc2ddd
File details
Details for the file nfelodcm-0.1.7-py3-none-any.whl
File metadata
- Download URL: nfelodcm-0.1.7-py3-none-any.whl
- Size: 25.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | e74c507557ef76273757fd2d474af4eed834c9c647bb88b3ba730ded7cd62cd8
MD5 | 59414ef8dbb6c0923c9b5a1240fd5748
BLAKE2b-256 | 010c7409f6c95f75dd53ce62952af48a0c6db811a9dfd76683e2b7bf77b035b6