Soft Tech Talks: data loading, cleaning, and splitting in 1-3 calls.
softdata
softdata is the "data in, ready out" layer of Soft Tech Talks. One-liners to load, clean, and split your dataset with safe defaults and friendly errors.
from softdata import load, clean, split
df = load("iris") # or load("csv", path="students.csv")
df = clean(df, impute="median", encode="auto")
Xtr, Xval, Xte, y = split(df, target="target", strategy="stratified")
Install (local)
pip install -e .
# or build: python -m build (needs `pip install build`)
API
load(source, **kwargs)
- Built-ins: "iris", "wine", "breast_cancer" (scikit-learn toys)
- Files: "csv" (needs path=), "parquet" (needs path=)
clean(df, impute="median", encode="auto", drop_leaky=None, datetime_auto=True)
- Detects numeric/categorical/date columns
- Imputes numeric (median/mean) and categorical (most frequent)
- Encodes categorical columns with one-hot (drop-first) when encode="auto"
- Preserves the original target column (do encoding only on features)
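To make the cleaning steps above concrete, here is a minimal standard-library sketch of median imputation, most-frequent imputation, and drop-first one-hot encoding on a list of dict rows. It is not softdata's implementation: `clean_sketch`, the sample rows, and the `column_level` naming of encoded columns are assumptions made for illustration only.

```python
from collections import Counter
from statistics import median


def clean_sketch(rows, target):
    """Impute and one-hot encode feature columns in a list of dict rows.

    Hypothetical helper; assumes every column has at least one non-missing value.
    """
    columns = [c for c in rows[0] if c != target]
    out = [{target: r[target]} for r in rows]  # target is copied unchanged
    for col in columns:
        present = [r[col] for r in rows if r[col] is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fill = median(present)  # numeric column: fill gaps with the median
            for r, o in zip(rows, out):
                o[col] = fill if r[col] is None else r[col]
        else:
            fill = Counter(present).most_common(1)[0][0]  # most frequent value
            levels = sorted(set(present))[1:]  # drop-first: skip the first level
            for r, o in zip(rows, out):
                value = fill if r[col] is None else r[col]
                for level in levels:
                    o[f"{col}_{level}"] = int(value == level)
    return out


rows = [
    {"age": 20, "city": "Oslo", "passed": "yes"},
    {"age": None, "city": "Rome", "passed": "no"},
    {"age": 30, "city": None, "passed": "yes"},
    {"age": 40, "city": "Rome", "passed": "no"},
]
cleaned = clean_sketch(rows, target="passed")
print(cleaned[1])
```

The missing age is filled with the median of the observed ages, the missing city with the most frequent city, and the target column "passed" is never imputed or encoded.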
split(df, target, strategy="auto", test_size=0.2, val_size=0.1, random_state=42)
- If strategy="auto", uses stratified split for discrete targets (<= 20 unique values), else random
- Returns X_train, X_val, X_test, y_dict where y_dict has "train"|"val"|"test"
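The strategy="auto" rule and the default sizes can be sketched with the standard library alone. This is an illustration, not softdata's code: `pick_strategy` and `split_indices` are hypothetical names, and the sketch assumes that test_size and val_size are both fractions of the full dataset, which the description above does not state.

```python
import random
from collections import defaultdict


def pick_strategy(y, max_classes=20):
    """Stratify when the target has <= 20 unique values, else split randomly."""
    return "stratified" if len(set(y)) <= max_classes else "random"


def split_indices(y, test_size=0.2, val_size=0.1, random_state=42):
    """Return train/val/test row indices (assumption: sizes are fractions of all rows)."""
    rng = random.Random(random_state)
    if pick_strategy(y) == "stratified":
        groups = defaultdict(list)
        for i, label in enumerate(y):
            groups[label].append(i)
        buckets = list(groups.values())  # one bucket per class
    else:
        buckets = [list(range(len(y)))]  # a single bucket with every row
    parts = {"train": [], "val": [], "test": []}
    for idx in buckets:
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        n_val = round(len(idx) * val_size)
        parts["test"] += idx[:n_test]
        parts["val"] += idx[n_test:n_test + n_val]
        parts["train"] += idx[n_test + n_val:]
    return parts


# 150 rows with three balanced classes, similar in shape to iris
y = ["a"] * 50 + ["b"] * 50 + ["c"] * 50
parts = split_indices(y)
print({name: len(idx) for name, idx in parts.items()})
# → {'train': 105, 'val': 15, 'test': 30}
```

Because each class is split separately, every class contributes the same share of rows to the train, validation, and test sets.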
Example
from softdata import load, clean, split
df = load("iris")
df = clean(df)
Xtr, Xval, Xte, y = split(df, target="target")
print(Xtr.shape, Xval.shape, Xte.shape)
Tests
pip install -r requirements-dev.txt
pytest -q