
Soft Tech Talks: data loading, cleaning, and splitting in 1-3 calls.


softdata

softdata is the "data in, ready out" layer of Soft Tech Talks. One-liners to load, clean, and split your dataset with safe defaults and friendly errors.

from softdata import load, clean, split

df = load("iris")                           # or load("csv", path="students.csv")
df = clean(df, impute="median", encode="auto")
Xtr, Xval, Xte, y = split(df, target="target", strategy="stratified")

Install (local)

pip install -e .
# or build: python -m build  (needs `pip install build`)

API

load(source, **kwargs)

  • Built-ins: "iris", "wine", "breast_cancer" (scikit-learn toy datasets)
  • Files: "csv" and "parquet" (both require path=)
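The dispatch described above can be sketched as a small registry with friendly errors. This is a minimal sketch, not softdata's actual code: it assumes the built-ins map to scikit-learn's `load_iris`, `load_wine`, and `load_breast_cancer` with `as_frame=True`, and that file sources go through `pandas.read_csv` / `pandas.read_parquet`. The name `load_sketch` and its error messages are hypothetical.

```python
from typing import Any


def load_sketch(source: str, **kwargs: Any):
    """Hypothetical stand-in for softdata.load; not the package's real code."""
    builtins = {"iris", "wine", "breast_cancer"}
    file_readers = {"csv", "parquet"}

    if source in builtins:
        # Imported lazily so the error paths below work without scikit-learn.
        from sklearn import datasets

        loader = getattr(datasets, f"load_{source}")
        # as_frame=True returns a Bunch whose .frame holds the features plus a "target" column.
        return loader(as_frame=True).frame

    if source in file_readers:
        path = kwargs.get("path")
        if path is None:
            # Friendly error: name the missing argument and show an example call.
            raise ValueError(f'load("{source}") needs a path=, e.g. load("{source}", path="data.{source}")')
        import pandas as pd

        reader = pd.read_csv if source == "csv" else pd.read_parquet
        return reader(path)

    known = sorted(builtins | file_readers)
    raise ValueError(f"Unknown source {source!r}; expected one of {known}")
```

Checking arguments before importing pandas or scikit-learn keeps the error messages fast and dependency-free.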

clean(df, impute="median", encode="auto", drop_leaky=None, datetime_auto=True)

  • Detects numeric/categorical/date columns
  • Imputes missing values: numeric columns with the median or mean (impute="median" or "mean"), categorical columns with the most frequent value
  • One-hot encodes categorical columns (dropping the first level) when encode="auto"
  • Leaves the target column as-is; encoding is applied to feature columns only
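The imputation and encoding steps listed above correspond roughly to the following pandas operations. This is a sketch under assumptions: `clean_sketch` is a hypothetical name, the target column is passed explicitly as `target=` (the `clean` signature shown above has no such parameter, so how softdata identifies the target is not documented here), and date handling and `drop_leaky` are omitted.

```python
from typing import Optional

import pandas as pd


def clean_sketch(df: pd.DataFrame, target: Optional[str] = None, impute: str = "median") -> pd.DataFrame:
    """Hypothetical stand-in for softdata.clean; not the package's real code."""
    if impute not in {"median", "mean"}:
        raise ValueError(f'impute must be "median" or "mean", got {impute!r}')

    out = df.copy()  # never modify the caller's DataFrame
    features = [c for c in out.columns if c != target]

    numeric = [c for c in features if pd.api.types.is_numeric_dtype(out[c])]
    # Non-numeric, non-datetime features are treated as categorical; dates are skipped here.
    categorical = [
        c for c in features
        if c not in numeric and not pd.api.types.is_datetime64_any_dtype(out[c])
    ]

    # Numeric columns: fill missing values with the column median or mean.
    for col in numeric:
        fill = out[col].median() if impute == "median" else out[col].mean()
        out[col] = out[col].fillna(fill)

    # Categorical columns: fill missing values with the most frequent value.
    for col in categorical:
        mode = out[col].mode(dropna=True)
        if not mode.empty:
            out[col] = out[col].fillna(mode.iloc[0])

    # One-hot encode feature columns only; drop_first=True drops one level per column.
    if categorical:
        out = pd.get_dummies(out, columns=categorical, drop_first=True)
    return out
```

Dropping the first level avoids a perfectly collinear set of dummy columns, which matters mostly for linear models.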

split(df, target, strategy="auto", test_size=0.2, val_size=0.1, random_state=42)

  • With strategy="auto", uses a stratified split when the target looks discrete (20 or fewer unique values) and a random split otherwise
  • Returns X_train, X_val, X_test, y_dict, where y_dict maps "train", "val", and "test" to the matching target values
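The `strategy="auto"` rule and the three-way split can be sketched with pandas and NumPy. Assumptions to flag: `split_sketch` is a hypothetical name, `val_size` is treated as a fraction of the whole dataset (the description above does not say which base it uses), and the stratified branch shuffles and slices each class by hand; a real implementation might instead call scikit-learn's `train_test_split` with `stratify=`.

```python
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd


def split_sketch(
    df: pd.DataFrame,
    target: str,
    strategy: str = "auto",
    test_size: float = 0.2,
    val_size: float = 0.1,
    random_state: int = 42,
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame, Dict[str, pd.Series]]:
    """Hypothetical stand-in for softdata.split; not the package's real code."""
    y = df[target]
    if strategy == "auto":
        # Discrete-looking targets (<= 20 unique values) get a stratified split.
        strategy = "stratified" if y.nunique() <= 20 else "random"

    rng = np.random.default_rng(random_state)
    # Stratified: split each class separately so class proportions are preserved.
    if strategy == "stratified":
        groups = [g.index for _, g in df.groupby(target)]
    else:
        groups = [df.index]

    parts: Dict[str, List] = {"train": [], "val": [], "test": []}
    for idx in groups:
        shuffled = rng.permutation(idx.to_numpy())
        n_test = round(len(shuffled) * test_size)
        n_val = round(len(shuffled) * val_size)
        parts["test"].extend(shuffled[:n_test])
        parts["val"].extend(shuffled[n_test:n_test + n_val])
        parts["train"].extend(shuffled[n_test + n_val:])

    X = df.drop(columns=[target])
    y_dict = {name: y.loc[rows] for name, rows in parts.items()}
    return X.loc[parts["train"]], X.loc[parts["val"]], X.loc[parts["test"]], y_dict
```

With 150 rows in three balanced classes and the defaults above, each class contributes 10 test rows, 5 validation rows, and 35 training rows.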

Example

from softdata import load, clean, split
df = load("iris")
df = clean(df)
Xtr, Xval, Xte, y = split(df, target="target")
print(Xtr.shape, Xval.shape, Xte.shape)

Tests

pip install -r requirements-dev.txt
pytest -q




Download files

Download the file for your platform.

Source Distribution

softdata-0.1.0.tar.gz (4.7 kB)

Uploaded Source

Built Distribution


softdata-0.1.0-py3-none-any.whl (6.3 kB)

Uploaded Python 3

File details

Details for the file softdata-0.1.0.tar.gz.

File metadata

  • Download URL: softdata-0.1.0.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for softdata-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2a727b569cbc8c4551717e4ab669f61d532524b37b2ba276eda77207a995a974
MD5 b73610936bbc3bf0dff2724fad4116eb
BLAKE2b-256 c643fbf297bedf109899b6548208c26c0c596d5699b39047bf010f716b80dc72
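To check a downloaded file against the SHA256 digest listed above, you can hash it locally with Python's standard `hashlib`. The helper name `sha256_of` is illustrative. pip can also enforce digests itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file, with `--require-hashes`).

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 digest for softdata-0.1.0.tar.gz (from the table above).
expected = "2a727b569cbc8c4551717e4ab669f61d532524b37b2ba276eda77207a995a974"
# Usage, after downloading the sdist into the current directory:
# assert sha256_of("softdata-0.1.0.tar.gz") == expected
```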


File details

Details for the file softdata-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: softdata-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for softdata-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 299cd591c794034510e4feeb36ad93190fd0fb255d9b3f954898f0a7661090e0
MD5 bd5e5af985d4a399212789ecef6f9e66
BLAKE2b-256 7b623fe6fefe69858685bfd74b34d5e30dadb1eb48a198a5f466fd68cff3f578

