panelsplit: a tool for panel data analysis

panelsplit is a Python package that facilitates time series cross-validation on panel data, that is, data with multiple entities observed over time. It is useful at several stages of the data pipeline, including feature engineering, hyper-parameter tuning, and model estimation.

Installation

panelsplit is tested for compatibility with Python versions >= 3.11. You can install it with pip:

pip install panelsplit

Documentation

To read the documentation, visit here.

Example Usage

import numpy as np
import pandas as pd
from panelsplit.cross_validation import PanelSplit

# Generate example data
num_countries = 2
years = range(2001, 2004)
num_years = len(years)

data_dict = {
    'country_id': [c for c in range(1, num_countries + 1) for _ in years],
    'year': [year for _ in range(num_countries) for year in years],
    'y': np.random.normal(0, 1, num_countries * num_years),
    'x1': np.random.normal(0, 1, num_countries * num_years),
    'x2': np.random.normal(0, 1, num_countries * num_years)
}

panel_data = pd.DataFrame(data_dict)
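# Split on the time column; every entity shares the same temporal cut points
panel_split = PanelSplit(periods=panel_data.year, n_splits=2)

splits = panel_split.split()

# Inspect the rows that fall into each train/test fold
for train_idx, test_idx in splits:
    print("Train:")
    print(panel_data.loc[train_idx])
    print("Test:")
    print(panel_data.loc[test_idx])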
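
To make the idea behind the example above concrete, here is a minimal, dependency-free sketch of an expanding-window split over the unique periods of a panel. It is an illustration only, not panelsplit's implementation: the helper name `panel_time_splits` is hypothetical, and it assumes one test period per split and that every entity shares the same cut points.

```python
def panel_time_splits(periods, n_splits):
    """Yield (train_rows, test_rows) as lists of row positions.

    Hypothetical helper, not part of panelsplit: an expanding window over
    the sorted unique periods, with exactly one test period per split.
    """
    unique_periods = sorted(set(periods))
    if n_splits >= len(unique_periods):
        raise ValueError("n_splits must be smaller than the number of unique periods")
    first_test = len(unique_periods) - n_splits
    for k in range(first_test, len(unique_periods)):
        train_periods = set(unique_periods[:k])  # everything before the test period
        test_period = unique_periods[k]
        train_rows = [i for i, p in enumerate(periods) if p in train_periods]
        test_rows = [i for i, p in enumerate(periods) if p == test_period]
        yield train_rows, test_rows


# Two countries observed in 2001-2003, stacked country by country
years = [2001, 2002, 2003, 2001, 2002, 2003]
splits = list(panel_time_splits(years, n_splits=2))
for train_rows, test_rows in splits:
    print("train rows:", train_rows, "test rows:", test_rows)
```

Because the cut is made on periods rather than on rows, both countries enter each test fold for the same year, and no training row comes from the test period or later.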

Spatio-Temporal Cross-Validation

panelsplit can also handle combined spatio-temporal holdouts, using entity groupings (e.g., states or cities) to prevent cluster-level leakage. This lets you validate on unseen time periods and held-out groups at the same time:

from sklearn.model_selection import StratifiedGroupKFold

# Combine the temporal splits with group-level (spatial) holdouts
panel_split = PanelSplit(
    periods=panel_data.year,
    n_splits=2,
    groups=panel_data["country_id"],
    group_splitter=StratifiedGroupKFold(n_splits=3)  # any scikit-learn group splitter can be used
)

# Nested, multi-column groups are also accepted.
# PanelSplit internally flattens them into a single composite group identifier.
# e.g., groups = panel_data[["country_id", "city_id"]]

# X and y are passed lazily through to the group splitter.
# Note: StratifiedGroupKFold stratifies on class labels, so y should be discrete here.
splits = panel_split.split(X=panel_data, y=panel_data["y"])
# Yields 6 sub-splits in total (2 temporal cuts x 3 spatial stratified holdouts)
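
As a rough illustration of how temporal cuts and group holdouts can combine into 2 x 3 = 6 sub-splits, the following dependency-free sketch crosses an expanding time window with a list of held-out group folds. The helper `spatio_temporal_splits` and its semantics (train on earlier periods of the retained groups, test on the test period of the held-out groups) are assumptions made for this example and may differ from what panelsplit does internally.

```python
def spatio_temporal_splits(periods, groups, n_time_splits, group_folds):
    """Yield (train_rows, test_rows) for every temporal cut x group fold.

    Hypothetical helper, not part of panelsplit. Assumed semantics:
    train = earlier periods, groups that are not held out;
    test  = the test period, held-out groups only.
    """
    unique_periods = sorted(set(periods))
    first_test = len(unique_periods) - n_time_splits
    for k in range(first_test, len(unique_periods)):
        train_periods = set(unique_periods[:k])
        test_period = unique_periods[k]
        for fold in group_folds:
            held_out = set(fold)
            rows = list(enumerate(zip(periods, groups)))
            train = [i for i, (p, g) in rows
                     if p in train_periods and g not in held_out]
            test = [i for i, (p, g) in rows
                    if p == test_period and g in held_out]
            yield train, test


# Three countries observed in 2001-2003, stacked country by country
years = [2001, 2002, 2003] * 3
countries = [1, 1, 1, 2, 2, 2, 3, 3, 3]
group_folds = [[1], [2], [3]]  # hold out one country per fold

st_splits = list(spatio_temporal_splits(years, countries, 2, group_folds))
print(len(st_splits), "sub-splits")
```

In every sub-split the test rows come from a later period and from different countries than the training rows, which is the property that guards against both temporal and cluster-level leakage.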

For more examples and detailed usage instructions, refer to the examples directory in this repository. Also feel free to check out an introductory article on panelsplit.

Background

Work on panelsplit started at EconAI in December 2023 and has been under active development since then.

Contributing

Contributions to panelsplit are welcome! If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License - see the LICENSE file for details.
