panelsplit: a tool for panel data analysis

panelsplit is a Python package designed to facilitate time series cross-validation when working with multiple entities (aka panel data). This tool is useful for handling panel data in various stages throughout the data pipeline, including feature engineering, hyper-parameter tuning, and model estimation.

Installation

panelsplit is tested for compatibility with Python versions >= 3.11. You can install panelsplit using pip:

pip install panelsplit

Documentation

To read the documentation, visit here.

Example Usage

import numpy as np
import pandas as pd
from panelsplit.cross_validation import PanelSplit

# Generate example data
num_countries = 2
years = range(2001, 2004)
num_years = len(years)

data_dict = {
    'country_id': [c for c in range(1, num_countries + 1) for _ in years],
    'year': [year for _ in range(num_countries) for year in years],
    'y': np.random.normal(0, 1, num_countries * num_years),
    'x1': np.random.normal(0, 1, num_countries * num_years),
    'x2': np.random.normal(0, 1, num_countries * num_years)
}

panel_data = pd.DataFrame(data_dict)
panel_split = PanelSplit(periods=panel_data.year, n_splits=2)

splits = panel_split.split()

# Show the rows selected for training and testing in each split
for train_idx, test_idx in splits:
    print("Train:")
    print(panel_data.loc[train_idx])
    print("Test:")
    print(panel_data.loc[test_idx])
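
To make the idea behind these splits concrete, the sketch below builds expanding-window splits over the unique periods of a panel and maps them back to row labels, so that all entities observed in a test period land in the same test fold. It is an illustration under stated assumptions, not panelsplit's implementation: `expanding_panel_splits` is a hypothetical helper, and it assumes that the last `n_splits` periods each serve as one test period while all earlier periods form the training window.

```python
import numpy as np
import pandas as pd


def expanding_panel_splits(periods: pd.Series, n_splits: int):
    """Hypothetical helper (not part of panelsplit): yield (train, test) row labels.

    Assumes the last `n_splits` unique periods are the test periods, one per
    split, and that every earlier period belongs to the training window.
    """
    unique_periods = np.sort(periods.unique())
    if n_splits >= len(unique_periods):
        raise ValueError("n_splits must be smaller than the number of unique periods")
    for test_period in unique_periods[-n_splits:]:
        train_mask = periods < test_period   # strictly earlier periods only
        test_mask = periods == test_period   # every entity in the test period
        yield periods.index[train_mask].to_numpy(), periods.index[test_mask].to_numpy()


# Two countries observed over 2001-2003, in the same layout as the example above.
panel = pd.DataFrame({
    "country_id": [1, 1, 1, 2, 2, 2],
    "year": [2001, 2002, 2003, 2001, 2002, 2003],
})

sketch_splits = list(expanding_panel_splits(panel["year"], n_splits=2))
for train_idx, test_idx in sketch_splits:
    train_years = sorted(panel.loc[train_idx, "year"].unique().tolist())
    test_years = sorted(panel.loc[test_idx, "year"].unique().tolist())
    print("train years:", train_years, "| test years:", test_years)
```

Splitting on unique periods rather than on rows is what keeps observations from the same period on the same side of each cut, regardless of how many entities the panel contains.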

Spatio-Temporal Cross-Validation

panelsplit can also handle combined spatio-temporal holdouts by taking entity groupings (e.g., states or cities) into account, which prevents leakage at the cluster level. This lets you validate on unseen time periods and unseen groups at the same time:

from sklearn.model_selection import StratifiedGroupKFold

# Combine the temporal splits with group-level (spatial) holdouts:
panel_split = PanelSplit(
    periods=panel_data.year,
    n_splits=2,
    groups=panel_data["country_id"],
    group_splitter=StratifiedGroupKFold(n_splits=3),  # any valid scikit-learn group splitter
)

# Nested multi-column groups can also be passed,
# e.g. groups = panel_data[["country_id", "city_id"]].
# PanelSplit flattens them internally into a single composite group identifier.

# X and y are passed lazily through to the StratifiedGroupKFold splitter.
splits = panel_split.split(X=panel_data, y=panel_data["y"])
# Yields 6 sub-splits in total (2 temporal cuts x 3 stratified spatial holdouts).
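
One way to picture the combined holdout is as a cross product: each temporal cut is paired with each group fold, training rows come from earlier periods and groups that are not held out, and test rows come from the test period and the held-out groups. The sketch below encodes that reading in plain NumPy. It is an assumption about the semantics, not panelsplit's implementation: `spatio_temporal_splits` is a hypothetical helper, and it assigns groups to folds round-robin instead of using a scikit-learn splitter.

```python
from itertools import product

import numpy as np


def spatio_temporal_splits(periods, groups, n_time_splits, n_group_folds):
    """Hypothetical helper: cross expanding-window time cuts with group folds."""
    periods = np.asarray(periods)
    groups = np.asarray(groups)
    unique_periods = np.unique(periods)  # np.unique returns sorted values
    unique_groups = np.unique(groups)
    if n_time_splits >= len(unique_periods):
        raise ValueError("n_time_splits must be smaller than the number of periods")
    if n_group_folds > len(unique_groups):
        raise ValueError("n_group_folds cannot exceed the number of groups")

    test_periods = unique_periods[-n_time_splits:]
    # Round-robin assignment of groups to folds (an illustrative choice).
    group_folds = [unique_groups[i::n_group_folds] for i in range(n_group_folds)]

    for test_period, held_out in product(test_periods, group_folds):
        in_held_out = np.isin(groups, held_out)
        # Train on earlier periods, excluding every row of the held-out groups.
        train = np.flatnonzero((periods < test_period) & ~in_held_out)
        # Test on the held-out groups in the test period only.
        test = np.flatnonzero((periods == test_period) & in_held_out)
        yield train, test


# Three entities over three years: 2 temporal cuts x 3 group folds = 6 sub-splits.
periods = np.tile([2001, 2002, 2003], 3)
groups = np.repeat(["A", "B", "C"], 3)
sub_splits = list(
    spatio_temporal_splits(periods, groups, n_time_splits=2, n_group_folds=3)
)
print(len(sub_splits))
```

Under this reading, no group ever appears on both sides of a sub-split, and no training row comes from the test period or later.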

For more examples and detailed usage instructions, refer to the examples directory in this repository. Also feel free to check out an introductory article on panelsplit.

Background

Work on panelsplit started at EconAI in December 2023 and has been under active development since then.

Contributing

Contributions to panelsplit are welcome! If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License - see the LICENSE file for details.
