Convenient helpers for splitting DataFrames into features/target and creating train/dev/test splits.

mlsplitter

A small, well-tested Python package that provides three conveniences on top of scikit-learn:

  1. x_y_splitter – split a DataFrame into a feature matrix X and a target vector y, selecting the target column by name or by index.
  2. train_test_splitter – a thin, validated wrapper around sklearn.model_selection.train_test_split.
  3. train_dev_test_splitter – split data into three sets (train / dev / test), with dev_size and test_size expressed as fractions of the full dataset.
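To make the first helper concrete, here is a minimal sketch of a features/target split written in plain pandas. It is not the package's implementation: the function name split_features_target_sketch and its validation rule are assumptions made for illustration, and only the column_name / column_index parameters mirror the Quick start calls.

```python
import pandas as pd


def split_features_target_sketch(df, column_name=None, column_index=None):
    """Hypothetical stand-in for a features/target splitter (not mlsplitter's code)."""
    # Assumption: exactly one of the two selectors must be given.
    if (column_name is None) == (column_index is None):
        raise ValueError("pass exactly one of column_name or column_index")
    # Resolve a positional index (negative values count from the end) to a label.
    if column_name is None:
        column_name = df.columns[column_index]
    y = df[column_name]                  # target vector (a Series)
    X = df.drop(columns=[column_name])   # every remaining column is a feature
    return X, y


demo = pd.DataFrame({"area": [50, 80, 120], "rooms": [2, 3, 4], "price": [100, 160, 250]})
X_by_name, y_by_name = split_features_target_sketch(demo, column_name="price")
X_by_pos, y_by_pos = split_features_target_sketch(demo, column_index=-1)
```

Both calls select the same target column here because "price" is the last column of the demo frame. The sketch assumes column labels are unique.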

Installation

pip install mlsplitter

Or from source:

git clone https://github.com/Fares-Ayman-1/mlsplitter.git
cd mlsplitter
pip install -e ".[dev]"

Quick start

import pandas as pd
from mlsplitter import x_y_splitter, train_test_splitter, train_dev_test_splitter

df = pd.read_csv("my_data.csv")

# Split features from target (by name or by position)
X, y = x_y_splitter(df, column_name="price")
X, y = x_y_splitter(df, column_index=-1)

# Train / test split
x_train, x_test, y_train, y_test = train_test_splitter(X, y, test_size=0.2)

# Train / dev / test split
x_train, x_dev, x_test, y_train, y_dev, y_test = train_dev_test_splitter(
    X, y, dev_size=0.1, test_size=0.2
)
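Because dev_size and test_size are fractions of the full dataset, a three-way split cannot simply pass dev_size to a second train_test_split call: by then the test rows are already gone, so the same fraction would yield a smaller dev set. The sketch below shows one way to get the intended sizes with scikit-learn alone, by converting both fractions to absolute row counts first. The function three_way_split_sketch is hypothetical and is not taken from mlsplitter, so the package may handle this differently.

```python
from sklearn.model_selection import train_test_split


def three_way_split_sketch(X, y, dev_size=0.1, test_size=0.2, random_state=None):
    """Hypothetical train/dev/test split; sizes are fractions of the full dataset."""
    if not (0 < dev_size < 1 and 0 < test_size < 1 and dev_size + test_size < 1):
        raise ValueError("dev_size and test_size must be in (0, 1) and sum to less than 1")
    n_rows = len(X)
    # Convert fractions of the FULL dataset to absolute counts up front,
    # so the second split is not taken relative to the already-reduced data.
    n_test = round(test_size * n_rows)
    n_dev = round(dev_size * n_rows)
    # First split: carve off the test set (an int test_size means a row count).
    x_rest, x_test, y_rest, y_test = train_test_split(
        X, y, test_size=n_test, random_state=random_state
    )
    # Second split: carve the dev set out of what remains.
    x_train, x_dev, y_train, y_dev = train_test_split(
        x_rest, y_rest, test_size=n_dev, random_state=random_state
    )
    return x_train, x_dev, x_test, y_train, y_dev, y_test


features = list(range(100))
targets = [value * 2 for value in features]
parts = three_way_split_sketch(features, targets, dev_size=0.1, test_size=0.2, random_state=0)
sizes = [len(part) for part in parts]
```

With 100 rows, dev_size=0.1 and test_size=0.2, the sizes come out as 70 / 10 / 20 for train / dev / test. An equivalent approach keeps fractions throughout and passes dev_size / (1 - test_size) to the second call.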

Running tests

pytest --cov=mlsplitter

License

MIT
