A reusable data science toolkit for production-ready pipelines
deepsim-dskit - A Reusable Data Science Framework
deepsim-dskit is an installable Python package for reproducible, configuration-driven
data science pipelines. It provides reusable building blocks for loading data,
preprocessing, splitting, modeling, artifact management, and experiment runs.
Installation
pip install -e ".[dev]"
Optional extras:
pip install "deepsim-dskit[polars]"
pip install "deepsim-dskit[yaml]"
Quick Start
from dskit import load_dataset, create_split
df = load_dataset("data/advertising.csv", index_col=0)
split = create_split(df, target="sales", test_size=0.2, random_state=42)
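To make the role of `test_size` and `random_state` concrete, here is a minimal, standard-library sketch of a seeded split. It is not dskit's implementation: `split_indices` is a hypothetical helper, and treating both fractions as shares of the full dataset is an assumption that may differ from how `create_split` interprets them.

```python
import random


def split_indices(n_rows, test_size=0.2, val_size=0.0, random_state=42):
    """Return (train, val, test) index lists from a seeded shuffle.

    Hypothetical helper for illustration; fractions are taken relative
    to the full dataset, which is an assumption.
    """
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(random_state).shuffle(indices)

    n_test = round(n_rows * test_size)
    n_val = round(n_rows * val_size)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(200, test_size=0.2, val_size=0.1, random_state=42)
print(len(train), len(val), len(test))  # 140 20 40
```

Calling the function again with the same `random_state` returns the same partitions, which is the property that makes a split reproducible across runs.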
Run a full experiment from a config dictionary:
from dskit import run_full_pipeline
config = {
    "experiment_id": "advertising_baseline",
    "seed": 42,
    "data": {
        "path": "data/advertising.csv",
        "target": "sales",
        "read_kwargs": {"index_col": 0},
    },
    "splitting": {"test_size": 0.2, "val_size": 0.1, "random_state": 42},
    "preprocessing": {
        "missing": {"strategies": {}, "indicator_columns": []},
        "outliers": {"columns": [], "method": "iqr", "multiplier": 1.5},
        "scaling": {"columns": ["TV", "radio", "newspaper"], "method": "standard"},
    },
    "models": {
        "linear": {"class": "LinearRegression", "params": {}},
        "ridge": {"class": "Ridge", "params": {"alpha": 1.0}},
    },
    "output": {
        "experiments_dir": "experiments",
        "registry_path": "registry/experiments.json",
    },
}
result = run_full_pipeline(config)
print(result["best_model_name"])
print(result["metrics"]["test_r2"])
The `output` block may also include `logs_dir` (defaults to `"logs"`).
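The `"iqr"` outlier method with `"multiplier": 1.5` and the `"standard"` scaling method in the config refer to well-known techniques. The sketch below shows what they usually mean, using only the standard library. It is an illustration under assumptions, not dskit's code: the helper names are hypothetical, clipping to the fences is one of several possible outlier treatments, and the quantile convention and the use of the population standard deviation are choices made here.

```python
import statistics


def iqr_bounds(values, multiplier=1.5):
    """Return (lower, upper) fences: Q1 - m * IQR and Q3 + m * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def clip(values, lower, upper):
    """Pull values outside [lower, upper] back to the nearest fence."""
    return [min(max(v, lower), upper) for v in values]


def fit_standard_scaler(values):
    """Learn the mean and population standard deviation from training data."""
    return statistics.fmean(values), statistics.pstdev(values)


def standard_scale(values, mean, std):
    """Apply z = (x - mean) / std with previously fitted statistics."""
    return [(v - mean) / std for v in values]


train = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]  # toy data; 95.0 is an outlier
test = [11.0, 14.0]

# Fit every statistic on the training data only.
lower, upper = iqr_bounds(train, multiplier=1.5)
train_clipped = clip(train, lower, upper)
mean, std = fit_standard_scaler(train_clipped)

# Reuse the fitted statistics for both splits.
train_scaled = standard_scale(train_clipped, mean, std)
test_scaled = standard_scale(clip(test, lower, upper), mean, std)

print(lower, upper)  # 9.0 15.0
```

Fitting the fences and the scaler on the training split and reusing them on the test split keeps information from the test data out of the preprocessing step.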
CLI
dskit-run --version
dskit-run --config configs/advertising.json --dry-run
dskit-run --config configs/advertising.json --env production
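The commands above read a JSON config file. Assuming that file uses the same schema as the dictionary passed to `run_full_pipeline` (the source does not confirm this), one way to produce it is to serialize the dictionary with the standard `json` module. The config here is abbreviated, and the temporary directory stands in for a project's `configs/` folder.

```python
import json
import tempfile
from pathlib import Path

# Abbreviated config; assumed to share its schema with the Quick Start dictionary.
config = {
    "experiment_id": "advertising_baseline",
    "seed": 42,
    "data": {
        "path": "data/advertising.csv",
        "target": "sales",
        "read_kwargs": {"index_col": 0},
    },
    "splitting": {"test_size": 0.2, "val_size": 0.1, "random_state": 42},
}

# Written under a temporary directory so the sketch has no side effects;
# in a project this would be a path such as configs/advertising.json.
config_path = Path(tempfile.mkdtemp()) / "configs" / "advertising.json"
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")

# Reading the file back confirms the config survives the round trip unchanged.
loaded = json.loads(config_path.read_text(encoding="utf-8"))
print(loaded == config)  # True
```

Keeping one config as the source for both the Python API and the CLI avoids the two drifting apart.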
What's Included
| Module | Purpose |
|---|---|
| `data_io` | Load, validate, and save datasets |
| `eda` | Exploratory summaries |
| `preprocessing` | Imputation, outlier treatment, scaling |
| `splitting` | Reproducible train/test/validation splits |
| `pipeline` | Fit/transform preprocessing pipeline |
| `feature_engineering` | Encoding and feature construction |
| `modeling` | Training, evaluation, and `ModelRegistry` |
| `persistence` | Save and load artifacts |
| `artifacts` | Experiment artifacts and registry helpers |
| `reproducibility` | Config-driven experiment execution |
| `config` | Config validation and environment profiles |
| `performance` | Profiling and optimization helpers |
License
MIT License. See LICENSE.
Author
Shouke Wei, PhD · Deepsim Press Author Page
Affiliation: Deepsim Intelligence Technology Inc. (deepsim.ca)
File details
Details for the file deepsim_dskit-1.0.0.tar.gz.
File metadata
- Download URL: deepsim_dskit-1.0.0.tar.gz
- Upload date:
- Size: 56.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf0adba60eae5dfe41ba676efa6337ef91201aead15985d46a0d87a46b81bffb |
| MD5 | 6c1a1ed64a7e77e159d11f19cd7889f9 |
| BLAKE2b-256 | dda8a821a7cca7beef7e2b3bb079cfdf0e85f445390e8bf82e3502329c82bd85 |
File details
Details for the file deepsim_dskit-1.0.0-py3-none-any.whl.
File metadata
- Download URL: deepsim_dskit-1.0.0-py3-none-any.whl
- Upload date:
- Size: 56.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e2b95628eea1f6610ffc9f8fbd855f407e74f1929762c53ee227d5b9e2ec482 |
| MD5 | 7ed23926d045db7617219af02fd764af |
| BLAKE2b-256 | 858f50ff074fdef4fecc1518f1a10861eb3213a2d42c44b2fcd65e0790fac9e5 |