Python package for loading and caching CSVs hosted on GitHub into pandas DataFrames

nfelo DCM

nfelo DCM is an abstraction layer for loading and saving NFL-related CSVs stored on the web. DCM stands for Dataframe-CSV Mapping. The goal of the DCM is to load fresh data into pandas DataFrames in a way that balances simplicity, efficiency, and performance.

import nfelodcm

# Load two tables; the result is keyed by table name
db = nfelodcm.load(['pbp', 'games'])

# Access the play-by-play (PBP) dataframe
pbp = db['pbp']

Maps

Maps are config files that tell the DCM where data CSVs are located, how they should be retrieved, and which fields to pull. Each CSV has its own config, in which parameters can be set for freshness SLAs, the CSV parsing engine, and assignments (aka mutations).

An important characteristic of these maps, and of the framework overall, is that every field must be 1) specified in the map and 2) typed. Fields not listed in the map are not loaded, and untyped fields throw an error.
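To make the two rules concrete, here is a minimal sketch of how a loader could enforce them against a CSV header. The helper `select_fields` is hypothetical and not part of nfelodcm's API; it assumes the map is a plain dict of field name to dtype string, as in the sample config.

```python
from typing import Dict, List, Optional


def select_fields(csv_header: List[str], field_map: Dict[str, Optional[str]]) -> List[str]:
    """Return the CSV columns to load, enforcing the two map rules.

    Hypothetical helper: illustrates the rules, not nfelodcm's own code.
    """
    # Rule 2: every field listed in the map must carry a type
    untyped = [field for field, dtype in field_map.items() if not dtype]
    if untyped:
        raise ValueError(f"untyped fields in map: {untyped}")
    # Rule 1: only columns listed in the map are loaded; others are ignored
    return [col for col in csv_header if col in field_map]
```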

Here is a sample config:

{
  "name": "games",
  "description": "nflgamedata games",
  "last_local_update": "2023-12-16T22:42:41.040569",
  "download_url": "https://raw.githubusercontent.com/nflverse/nfldata/master/data/games.csv",
  "compression": null,
  "engine": "c",
  "freshness": {
    "type": "gh_commit",
    "gh_api_endpoint": "https://api.github.com/repos/nflverse/nfldata/commits",
    "gh_release_tag": null,
    "sla_seconds": null
  },
  "iter": {
    "type": null,
    "start": null
  },
  "assignments": [
    "game_id_repl"
  ],
  "map": {
    "game_id": "object",
    "season": "int32",
    "game_type": "object",
    "week": "int32",
    "gameday": "object",
    "weekday": "object",
    "gametime": "object",
    "away_team": "object",
    "away_score": "float32",
    "home_team": "object",
    "home_score": "float32",
    "location": "object",
    "result": "float32",
    "total": "float32",
    "overtime": "float32",
    "old_game_id": "float32",
    "gsis": "float32",
    "nfl_detail_id": "object",
    "pfr": "object",
    "pff": "float32",
    "espn": "int32",
    "ftn": "float32",
    "away_rest": "int32",
    "home_rest": "int32",
    "away_moneyline": "float32",
    "home_moneyline": "float32",
    "spread_line": "float32",
    "away_spread_odds": "float32",
    "home_spread_odds": "float32",
    "total_line": "float32",
    "under_odds": "float32",
    "over_odds": "float32",
    "div_game": "int32",
    "roof": "object",
    "surface": "object",
    "temp": "float32",
    "wind": "float32",
    "away_qb_id": "object",
    "home_qb_id": "object",
    "away_qb_name": "object",
    "home_qb_name": "object",
    "away_coach": "object",
    "home_coach": "object",
    "referee": "object",
    "stadium_id": "object",
    "stadium": "object"
  }
}
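As a rough illustration of how such a map could drive parsing, the sketch below passes the `map`, `engine`, and `compression` entries straight to `pandas.read_csv`. The function `read_with_map` is hypothetical, and the assumption that nfelodcm uses `usecols` and `dtype` this way is not confirmed by this page.

```python
import io

import pandas as pd


def read_with_map(csv_source, config: dict) -> pd.DataFrame:
    """Hypothetical: parse a CSV using a DCM-style config dict."""
    field_map = config["map"]
    return pd.read_csv(
        csv_source,
        usecols=list(field_map),  # load only the mapped fields
        dtype=field_map,  # cast each field to its declared type
        engine=config.get("engine") or "c",
        compression=config.get("compression"),  # JSON null -> None -> no decompression
    )


# Trimmed version of the sample config, with an inline CSV instead of a download
config = {"engine": "c", "compression": None,
          "map": {"game_id": "object", "season": "int32", "away_score": "float32"}}
games = read_with_map(io.StringIO("game_id,season,stadium,away_score\n2023_01_DET_KC,2023,Arrowhead,21\n"), config)
```

One likely reason the score columns are typed `float32` rather than `int32`: `read_csv` raises an error when a NumPy integer column contains missing values (for example, scores of games not yet played), while a float column can hold them as NaN.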

Data

When a CSV is translated into a DataFrame, a copy of the data is stored locally so that later loads can be served from cache, subject to each table's SLA and freshness settings. For data stored on GitHub, freshness is determined by either the latest release or the latest commit. Presently, data is stored locally as CSVs.
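A freshness check along these lines could compare the map's `last_local_update` against the newest GitHub commit for the file, with `sla_seconds` acting as a maximum cache age. This is a sketch of one plausible policy, not nfelodcm's actual logic. It assumes `last_local_update` is a naive UTC timestamp and that the file's path inside the repo (e.g. `data/games.csv`) is known; `latest_commit_time` and `is_stale` are hypothetical names. The GitHub REST "list commits" endpoint does accept `path` and `per_page` query parameters.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def latest_commit_time(api_endpoint: str, path: str) -> datetime:
    """Hypothetical: committer timestamp of the newest commit touching `path`.

    Note that unauthenticated GitHub API calls are rate limited.
    """
    url = f"{api_endpoint}?path={path}&per_page=1"
    with urllib.request.urlopen(url) as resp:
        commits = json.load(resp)
    # GitHub returns ISO 8601 with a trailing "Z", e.g. "2023-12-16T22:42:41Z"
    stamp = commits[0]["commit"]["committer"]["date"]
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def is_stale(
    last_local_update: str,
    remote_updated: Optional[datetime] = None,
    sla_seconds: Optional[int] = None,
    now: Optional[datetime] = None,
) -> bool:
    """Hypothetical: decide whether the cached CSV should be re-downloaded."""
    # Assumption: the map stores a naive timestamp that is in UTC
    local = datetime.fromisoformat(last_local_update).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # SLA rule: the cache expires once it is older than sla_seconds
    if sla_seconds is not None and (now - local).total_seconds() > sla_seconds:
        return True
    # Commit rule: stale if the source changed after the last local update
    if remote_updated is not None and remote_updated > local:
        return True
    return False
```

For the sample `games` map, a caller might pass `latest_commit_time(cfg["freshness"]["gh_api_endpoint"], "data/games.csv")` as `remote_updated`.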

Assignments

Assignment is the pandas vernacular (as in DataFrame.assign) for a mutation. In the DCM, "assignments" are functions that take a DataFrame as input and return a mutated (assigned) DataFrame. Assignments can be added to the assignments folder and referenced by name in config files.
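Because an assignment is just a function from DataFrame to DataFrame, a name-to-function registry plus a loop is enough to illustrate the idea. This sketch is hypothetical: `point_diff` is an invented example assignment (this page does not say what `game_id_repl` does), and the dict stands in for however the package discovers functions in its assignments folder.

```python
from typing import Callable, Dict, List

import pandas as pd

Assignment = Callable[[pd.DataFrame], pd.DataFrame]


def point_diff(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical assignment: home score minus away score."""
    # DataFrame.assign returns a new frame and leaves the input untouched
    return df.assign(point_diff=df["home_score"] - df["away_score"])


# Hypothetical registry standing in for the assignments folder
ASSIGNMENTS: Dict[str, Assignment] = {"point_diff": point_diff}


def apply_assignments(df: pd.DataFrame, names: List[str],
                      registry: Dict[str, Assignment] = ASSIGNMENTS) -> pd.DataFrame:
    """Run the named assignments in config order."""
    for name in names:
        df = registry[name](df)  # each step receives the previous step's output
    return df
```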

Retrieval

To load data, pass a list of table names to nfelodcm.load(). Each name should match the name of a map file (i.e., passing 'pbp' retrieves the data specified in pbp.json). When load() is called, freshness checks, caching, downloading, field typing, and assignments are all handled automatically behind the scenes.
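The retrieval steps can be sketched end to end as below. This is a simplified, hypothetical pipeline, not nfelodcm's implementation: `load_tables`, the `maps_dir`/`cache_dir` layout, and the injected `is_stale` and `fetch_csv` callables are assumptions made so the flow can be shown (and tested) without network access. It also does not write `last_local_update` back to the map, which a real cache would need to do.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

import pandas as pd

Assignment = Callable[[pd.DataFrame], pd.DataFrame]


def load_tables(
    names: List[str],
    maps_dir: Path,
    cache_dir: Path,
    is_stale: Callable[[dict], bool],
    fetch_csv: Callable[[str], str],
    assignments: Dict[str, Assignment],
) -> Dict[str, pd.DataFrame]:
    """Hypothetical loader mirroring the steps described above."""
    db: Dict[str, pd.DataFrame] = {}
    for name in names:
        # The table name selects the map file: 'games' -> games.json
        config = json.loads((maps_dir / f"{name}.json").read_text())
        cache_path = cache_dir / f"{name}.csv"
        # Download only when there is no local copy or the copy is stale
        if not cache_path.exists() or is_stale(config):
            cache_path.write_text(fetch_csv(config["download_url"]))
        # Load only the mapped fields, cast to their declared types
        df = pd.read_csv(
            cache_path,
            usecols=list(config["map"]),
            dtype=config["map"],
            engine=config.get("engine") or "c",
        )
        # Run the configured assignments in order
        for assignment_name in config.get("assignments", []):
            df = assignments[assignment_name](df)
        db[name] = df
    return db
```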

Download files

Source Distribution

nfelodcm-0.1.18.tar.gz (24.1 kB)

Uploaded Source

Built Distribution

nfelodcm-0.1.18-py3-none-any.whl (38.8 kB)

Uploaded Python 3

File details

Details for the file nfelodcm-0.1.18.tar.gz.

File metadata

  • Download URL: nfelodcm-0.1.18.tar.gz
  • Upload date:
  • Size: 24.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for nfelodcm-0.1.18.tar.gz
Algorithm Hash digest
SHA256 7940fe873cc773c95102d965037ba76a0c6e6c27ee34b5ee0b8119f1fb742974
MD5 733c794ef5031e04c162143bb9dae64e
BLAKE2b-256 ac4d6de6516cbf6064ec9842bbd8712e8d9ce3d1f44bfb105dde80a8e4106117

File details

Details for the file nfelodcm-0.1.18-py3-none-any.whl.

File metadata

  • Download URL: nfelodcm-0.1.18-py3-none-any.whl
  • Upload date:
  • Size: 38.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for nfelodcm-0.1.18-py3-none-any.whl
Algorithm Hash digest
SHA256 264fe8c7f8ea548467d72d85c0052f14d68dc814a80c2c7addae0ff3f8c28ccf
MD5 47dffbe5b6bd4dce31f9fe042bb4cd38
BLAKE2b-256 6f4ef9faf16cb27b1277048e149e5c57e6e5a97b19952663e49caf72da5059ed
