Soft Tech Talks: data loading, cleaning, and splitting in 1-3 calls.

softdata

softdata is the "data in, ready out" layer of Soft Tech Talks. It provides one-line calls to load, clean, and split a dataset, with safe defaults and friendly errors.

from softdata import load, clean, split

df = load("iris")                           # or load("csv", path="students.csv")
df = clean(df, impute="median", encode="auto")
Xtr, Xval, Xte, y = split(df, target="target", strategy="stratified")  # "target" is the iris label column, as in the Example section

Install (local)

pip install -e .
# or build: python -m build  (needs `pip install build`)

API

load(source, **kwargs)

  • Built-ins: "iris", "wine", "breast_cancer" (scikit-learn toys)
  • Files: "csv" (needs path=), "parquet" (needs path=)
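To make the dispatch above concrete, here is a minimal sketch of how a loader with this interface could be built on scikit-learn and pandas. `load_sketch` and `_BUILTINS` are hypothetical names used for illustration only; softdata's actual implementation and error messages may differ.

```python
import pandas as pd
from sklearn import datasets

# Hypothetical stand-in for softdata.load; not the library's real code.
_BUILTINS = {
    "iris": datasets.load_iris,
    "wine": datasets.load_wine,
    "breast_cancer": datasets.load_breast_cancer,
}


def load_sketch(source, **kwargs):
    """Return a DataFrame for a built-in dataset name or a file type."""
    if source in _BUILTINS:
        # as_frame=True gives a Bunch whose .frame holds the features
        # plus a "target" column.
        return _BUILTINS[source](as_frame=True).frame
    if source in ("csv", "parquet"):
        if "path" not in kwargs:
            raise ValueError(f'load("{source}") needs path=')
        path = kwargs.pop("path")
        reader = pd.read_csv if source == "csv" else pd.read_parquet
        # Remaining keyword arguments are passed through to the pandas reader.
        return reader(path, **kwargs)
    raise ValueError(f"Unknown source: {source!r}")


df = load_sketch("iris")
print(df.shape)  # (150, 5): four feature columns plus "target"
```

Raising a `ValueError` when `path=` is missing is one way to get the "friendly errors" behavior mentioned earlier; the real messages are not documented here.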

clean(df, impute="median", encode="auto", drop_leaky=None, datetime_auto=True)

  • Detects numeric/categorical/date columns
  • Imputes numeric columns (median or mean) and categorical columns (most frequent value)
  • One-hot encodes categorical columns (dropping the first level) when encode="auto"
  • Leaves the target column unchanged; only feature columns are encoded
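The steps above can be approximated in plain pandas. This is a sketch under assumptions, not softdata's code: `clean_sketch` is a hypothetical name, and it takes an explicit `target` argument (which the real `clean` signature does not have) to show how a label column can be kept out of imputation and encoding.

```python
import pandas as pd


def clean_sketch(df, target=None, impute="median"):
    """Impute missing values and one-hot encode categorical feature columns."""
    out = df.copy()
    features = [c for c in out.columns if c != target]
    num_cols = list(out[features].select_dtypes(include="number").columns)
    cat_cols = [c for c in features if c not in num_cols]

    # Numeric columns: fill gaps with the column median or mean.
    for c in num_cols:
        fill = out[c].median() if impute == "median" else out[c].mean()
        out[c] = out[c].fillna(fill)

    # Categorical columns: fill gaps with the most frequent value.
    for c in cat_cols:
        modes = out[c].mode()
        if not modes.empty:
            out[c] = out[c].fillna(modes.iloc[0])

    # One-hot encode features only; drop_first removes one level per column.
    if cat_cols:
        out = pd.get_dummies(out, columns=cat_cols, drop_first=True)
    return out


raw = pd.DataFrame({
    "age": [20.0, None, 40.0, 30.0],
    "city": ["Oslo", "Rome", None, "Oslo"],
    "label": ["a", "b", "a", "b"],
})
cleaned = clean_sketch(raw, target="label")
print(list(cleaned.columns))  # ['age', 'label', 'city_Rome']
```

Dropping the first level avoids a redundant column: with two cities, a single `city_Rome` indicator carries the same information as two indicator columns.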

split(df, target, strategy="auto", test_size=0.2, val_size=0.1, random_state=42)

  • If strategy="auto", uses a stratified split when the target is discrete (<= 20 unique values) and a random split otherwise
  • Returns X_train, X_val, X_test, y_dict, where y_dict is keyed by "train", "val", and "test"
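For readers who want to see what such a three-way split involves, here is a rough equivalent built on scikit-learn's `train_test_split`. `split_sketch` is a hypothetical name, and it assumes `test_size` and `val_size` are both fractions of the full dataset; softdata may compute the sizes differently.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_sketch(df, target, strategy="auto", test_size=0.2, val_size=0.1,
                 random_state=42):
    """Split df into train/val/test features plus a dict of target series."""
    X, y = df.drop(columns=[target]), df[target]
    if strategy == "auto":
        # Treat targets with few distinct values as discrete and stratify them.
        strategy = "stratified" if y.nunique() <= 20 else "random"
    stratify = strategy == "stratified"

    # Convert fractions to row counts so both are relative to the full dataset.
    n_test = round(test_size * len(df))
    n_val = round(val_size * len(df))

    # First carve off the test set, then take validation rows from the rest.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=n_test, random_state=random_state,
        stratify=y if stratify else None)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=n_val, random_state=random_state,
        stratify=y_rest if stratify else None)
    return X_train, X_val, X_test, {"train": y_train, "val": y_val, "test": y_test}


demo = pd.DataFrame({
    "x1": range(100),
    "x2": [i % 7 for i in range(100)],
    "label": [i % 2 for i in range(100)],
})
Xtr, Xval, Xte, y = split_sketch(demo, target="label")
print(len(Xtr), len(Xval), len(Xte))  # 70 10 20
```

Returning the targets as a dict keeps the return value at four items while still letting each target series be looked up by split name, for example `y["val"]`.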

Example

from softdata import load, clean, split
df = load("iris")
df = clean(df)
Xtr, Xval, Xte, y = split(df, target="target")
print(Xtr.shape, Xval.shape, Xte.shape)

Tests

pip install -r requirements-dev.txt
pytest -q
