Convenient helpers for splitting DataFrames into features/target and creating train/dev/test splits.

data-spliter

A small, well-tested Python package that provides three conveniences on top of scikit-learn:

  1. x_y_data_spliter – split a DataFrame into a feature matrix X and a target vector y, selecting the target by column name or index.
  2. train_test_data_spliter – a thin, validated wrapper around sklearn.model_selection.train_test_split.
  3. train_dev_test_data_spliter – split data into three sets (train / dev / test), with dev and test sizes expressed as fractions of the full dataset.
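
Because the dev and test sizes are fractions of the full dataset, a three-way split built on two calls to train_test_split has to rescale the dev fraction for the second call. The sketch below illustrates that arithmetic. It is not the package's actual code: the function name three_way_split and the random_state parameter are hypothetical, chosen only for this example.

```python
from sklearn.model_selection import train_test_split


def three_way_split(X, y, dev_size=0.1, test_size=0.2, random_state=0):
    """Hypothetical helper (not the package's implementation): split into
    train/dev/test with sizes given as fractions of the FULL dataset."""
    if not (0 < dev_size < 1 and 0 < test_size < 1 and dev_size + test_size < 1):
        raise ValueError("dev_size and test_size must lie in (0, 1) and sum to less than 1")

    # First cut: hold out the test set as a fraction of the full data.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # Second cut: dev_size refers to the full data, so convert it into a
    # fraction of what is left once the test set has been removed.
    dev_fraction_of_rest = dev_size / (1.0 - test_size)
    X_train, X_dev, y_train, y_dev = train_test_split(
        X_rest, y_rest, test_size=dev_fraction_of_rest, random_state=random_state
    )
    return X_train, X_dev, X_test, y_train, y_dev, y_test
```

With dev_size=0.1 and test_size=0.2, the second call uses 0.1 / 0.8 = 0.125 of the remaining rows, which corresponds to 10% of the original data.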

Installation

pip install data-spliter

Or from source:

git clone https://github.com/Fares-Ayman-1/data-spliter.git
cd data-spliter
pip install -e ".[dev]"

Quick start

import pandas as pd
# Hyphens are not valid in Python identifiers, so the underscore
# spellings of the module and function names are assumed here.
from data_spliter import (
    x_y_data_spliter,
    train_test_data_spliter,
    train_dev_test_data_spliter,
)

df = pd.read_csv("my_data.csv")

# Split features from target, either by column name...
X, y = x_y_data_spliter(df, column_name="price")
# ...or by column position (-1 selects the last column)
X, y = x_y_data_spliter(df, column_index=-1)

# Train / test split
x_train, x_test, y_train, y_test = train_test_data_spliter(X, y, test_size=0.2)

# Train / dev / test split (dev_size and test_size are fractions of the full dataset)
x_train, x_dev, x_test, y_train, y_dev, y_test = train_dev_test_data_spliter(
    X, y, dev_size=0.1, test_size=0.2
)
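
For readers curious what the feature/target split amounts to, the same result can be obtained with plain pandas. The following is a minimal sketch under stated assumptions, not the package's source: the function name split_features_target and its validation rules are hypothetical, and only the column_name / column_index keywords are taken from the quick start above.

```python
import pandas as pd


def split_features_target(df, column_name=None, column_index=None):
    """Hypothetical stand-in: return (X, y), where y is the chosen column
    and X is every other column of the DataFrame."""
    # Require exactly one way of identifying the target column.
    if (column_name is None) == (column_index is None):
        raise ValueError("pass exactly one of column_name or column_index")

    if column_index is not None:
        # Negative positions count from the end, as with Python lists.
        column_name = df.columns[column_index]
    elif column_name not in df.columns:
        raise KeyError(f"column {column_name!r} not found")

    y = df[column_name]
    X = df.drop(columns=[column_name])
    return X, y


# Small illustrative frame; the column names are made up for this example.
frame = pd.DataFrame({"rooms": [2, 3], "area": [50, 80], "price": [100, 200]})
X_by_name, y_by_name = split_features_target(frame, column_name="price")
X_by_pos, y_by_pos = split_features_target(frame, column_index=-1)
```

Both calls select the same target here, since "price" is the last column of the example frame.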

Running tests

pytest --cov=data_spliter

License

MIT
