Soft Tech Talks: data loading, cleaning, and splitting in 1-3 calls.
softdata
softdata is the "data in, ready out" layer of Soft Tech Talks: one-liners to load, clean, and split a dataset, with safe defaults and friendly errors.
```python
from softdata import load, clean, split

df = load("iris")  # or: load("csv", path="students.csv")
df = clean(df, impute="median", encode="auto")
Xtr, Xval, Xte, y = split(df, target="target", strategy="stratified")
```
Install (local)
```bash
pip install -e .
# or build a distribution: python -m build (requires `pip install build`)
```
API
`load(source, **kwargs)`
- Built-ins: `"iris"`, `"wine"`, `"breast_cancer"` (scikit-learn toy datasets)
- Files: `"csv"` (needs `path=`), `"parquet"` (needs `path=`)
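The rules above mean `load` has to tell built-in names apart from file formats and insist on `path=` for the latter. The snippet below is a hypothetical sketch of that validation step only, in plain Python; `resolve_source`, `BUILTINS`, and `FILE_FORMATS` are illustrative names, not part of softdata, and the error wording is an assumption about what a "friendly error" could look like.

```python
# Source names are taken from the README; the validation logic is a hypothetical sketch.
BUILTINS = ("iris", "wine", "breast_cancer")
FILE_FORMATS = ("csv", "parquet")


def resolve_source(source, **kwargs):
    """Validate load()-style arguments and return (source, path); path is None for built-ins."""
    if source in BUILTINS:
        return source, None  # built-in toy dataset: no path needed
    if source in FILE_FORMATS:
        path = kwargs.get("path")
        if path is None:
            # Friendly error: say what is missing and show a corrected call.
            raise ValueError(
                f'load("{source}") needs path=, e.g. load("{source}", path="data.{source}")'
            )
        return source, path
    options = ", ".join(repr(s) for s in BUILTINS + FILE_FORMATS)
    raise ValueError(f"unknown source {source!r}; expected one of: {options}")


print(resolve_source("csv", path="students.csv"))  # → ('csv', 'students.csv')
```

Failing early with the accepted options listed keeps a typo such as `load("irs")` from surfacing later as a confusing file-not-found error.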
`clean(df, impute="median", encode="auto", drop_leaky=None, datetime_auto=True)`
- Detects numeric, categorical, and date columns
- Imputes missing numeric values (median or mean) and missing categorical values (most frequent)
- One-hot encodes categorical columns (drop-first) when `encode="auto"`
- Preserves the original target column (encoding is applied only to features)
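To make the imputation and drop-first encoding rules concrete, here is a minimal plain-Python sketch on a column-oriented table (a dict of column name to list, with `None` marking a missing value). It is not softdata's implementation: `clean_sketch` is hypothetical, and its `target=` parameter is an assumption added so the target can be skipped, since the documented `clean` signature has no target argument. Date detection, `drop_leaky`, and `datetime_auto` are left out.

```python
from collections import Counter
from statistics import mean, median


def _is_numeric(present):
    """True if every non-missing value is an int or float (bools excluded)."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present)


def clean_sketch(columns, target=None, impute="median"):
    """Impute, then one-hot encode (drop-first) a dict of name -> list of values.

    `target` is a hypothetical parameter for this sketch; None marks a missing value.
    """
    if impute not in ("median", "mean"):
        raise ValueError(f"impute must be 'median' or 'mean', got {impute!r}")
    numeric_fill = median if impute == "median" else mean
    out = {}
    for name, values in columns.items():
        if name == target:
            out[name] = list(values)  # target copied unchanged: no imputation, no encoding
            continue
        present = [v for v in values if v is not None]
        if not present:
            raise ValueError(f"column {name!r} has no non-missing values to impute from")
        if _is_numeric(present):
            fill = numeric_fill(present)
            out[name] = [fill if v is None else v for v in values]
        else:
            fill = Counter(present).most_common(1)[0][0]  # most frequent category
            filled = [fill if v is None else v for v in values]
            categories = sorted(set(filled), key=str)
            for cat in categories[1:]:  # drop-first: the first category is the baseline
                out[f"{name}_{cat}"] = [int(v == cat) for v in filled]
    return out


data = {
    "age": [20, None, 40, 50],
    "city": ["A", "B", None, "B"],
    "label": ["x", "y", "x", "y"],
}
print(clean_sketch(data, target="label"))
# → {'age': [20, 40, 40, 50], 'city_B': [0, 1, 1, 1], 'label': ['x', 'y', 'x', 'y']}
```

Dropping the first category avoids a set of indicator columns that always sums to 1, which would be perfectly collinear with an intercept in linear models.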
`split(df, target, strategy="auto", test_size=0.2, val_size=0.1, random_state=42)`
- If `strategy="auto"`, uses a stratified split for discrete targets (<= 20 unique values), otherwise a random split
- Returns `X_train, X_val, X_test, y_dict`, where `y_dict` has the keys `"train"`, `"val"`, and `"test"`
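The `strategy="auto"` rule and the three-way return value can be sketched in plain Python as follows. This is an illustration under assumptions, not softdata's code: `split_sketch` and `choose_strategy` are hypothetical names, the sketch takes `X` and `y` as separate lists instead of a DataFrame plus `target`, and it treats `test_size` and `val_size` as fractions of each shuffled pool (each class, when stratifying). softdata may compute the validation fraction differently.

```python
import random
from collections import defaultdict


def choose_strategy(y, max_classes=20):
    """README rule for strategy="auto": <= 20 unique target values -> stratified."""
    return "stratified" if len(set(y)) <= max_classes else "random"


def _partition(indices, test_size, val_size):
    """Cut shuffled indices into (train, val, test); sizes are fractions of this pool."""
    n_test = round(len(indices) * test_size)
    n_val = round(len(indices) * val_size)
    return indices[n_test + n_val:], indices[n_test:n_test + n_val], indices[:n_test]


def split_sketch(X, y, strategy="auto", test_size=0.2, val_size=0.1, random_state=42):
    """Hypothetical three-way split returning X_train, X_val, X_test, y_dict."""
    if strategy == "auto":
        strategy = choose_strategy(y)
    rng = random.Random(random_state)  # seeded so the split is reproducible

    if strategy == "stratified":
        # Group row indices by label so every split keeps the class proportions.
        groups = defaultdict(list)
        for i, label in enumerate(y):
            groups[label].append(i)
        pools = list(groups.values())
    elif strategy == "random":
        pools = [list(range(len(y)))]
    else:
        raise ValueError(f"unknown strategy {strategy!r}; use 'auto', 'stratified' or 'random'")

    train, val, test = [], [], []
    for pool in pools:
        rng.shuffle(pool)
        tr, va, te = _partition(pool, test_size, val_size)
        train += tr
        val += va
        test += te

    def rows(ids):
        return [X[i] for i in ids]

    y_dict = {"train": [y[i] for i in train], "val": [y[i] for i in val], "test": [y[i] for i in test]}
    return rows(train), rows(val), rows(test), y_dict


X = [[i] for i in range(150)]
y = [i % 3 for i in range(150)]  # 3 balanced classes, 50 rows each
Xtr, Xval, Xte, y_split = split_sketch(X, y)
print(len(Xtr), len(Xval), len(Xte))  # → 105 15 30
```

With three balanced classes, each class contributes 35/5/10 rows to train/val/test, so all three splits keep the 1:1:1 class ratio.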
Example
```python
from softdata import load, clean, split

df = load("iris")
df = clean(df)  # defaults: impute="median", encode="auto"
Xtr, Xval, Xte, y = split(df, target="target")  # y is a dict with "train", "val", "test"
print(Xtr.shape, Xval.shape, Xte.shape)
```
Tests
```bash
pip install -r requirements-dev.txt
pytest -q
```
Download files
File details
Details for the file softdata-0.1.0.tar.gz.
File metadata
- Download URL: softdata-0.1.0.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a727b569cbc8c4551717e4ab669f61d532524b37b2ba276eda77207a995a974 |
| MD5 | b73610936bbc3bf0dff2724fad4116eb |
| BLAKE2b-256 | c643fbf297bedf109899b6548208c26c0c596d5699b39047bf010f716b80dc72 |
File details
Details for the file softdata-0.1.0-py3-none-any.whl.
File metadata
- Download URL: softdata-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 299cd591c794034510e4feeb36ad93190fd0fb255d9b3f954898f0a7661090e0 |
| MD5 | bd5e5af985d4a399212789ecef6f9e66 |
| BLAKE2b-256 | 7b623fe6fefe69858685bfd74b34d5e30dadb1eb48a198a5f466fd68cff3f578 |