Data Science Toolkit (DST) is a Python library that helps implement data science projects with ease.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Data Science Toolkit (DST)

Data Science Toolkit (DST) is a Python library that helps implement data science projects with ease: from data ingestion and preprocessing to modeling, geospatial analysis, computer vision, text vectorization, and reinforcement learning.

It bundles practical, production-friendly utilities and higher-level abstractions so you can move faster while keeping control over the details.

Key Features

- Data handling: DataFrame for loading CSV/JSON/Excel/Parquet, cleaning, transforming, and streaming large datasets.
- Modeling: Model for traditional ML and deep learning training, cross-validation, metrics, and GPU helpers.
- Text & NLP: Vectorizer for bag-of-words/TF-IDF, tokenization, cosine similarity, and projections.
- Charts: Chart utilities for quick exploratory visuals with Matplotlib/Seaborn/Plotly.
- GIS: GIS for geospatial data layers, joins, CRS transforms, area/perimeter, and exports.
- Computer Vision: ImageFactory for resizing, cropping, contour detection, blending, and basic filters.
- Reinforcement Learning: Environment and R3 tools to explore policies and custom environments.
- Crop Simulation: CSM modules for crop water requirement, ET simulations, and monitoring pipelines.
- Utilities: Lib with climate, math, text processing, IO helpers, and more.

Installation

DST is published as data-science-toolkit.

pip install data-science-toolkit

If you’re installing from source (for development):

git clone https://github.com/elhachimi-ch/dst.git
cd dst
pip install -e .

Notes:

- Requires Python 3.5+.
- Some features (e.g., deep learning, GIS, CV) pull in heavier dependencies (TensorFlow, CatBoost, OpenCV, the geospatial stack), so installation can take noticeably longer.
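
To see which of the heavier optional dependencies are already present in an environment, a quick check along the following lines may help. It is only a sketch: the import names (tensorflow, catboost, cv2, geopandas) are assumptions about what these features import, not something taken from DST's requirements, so adjust them as needed.

```python
from importlib.util import find_spec

# Import names below are assumptions, not taken from DST's requirements.
OPTIONAL_MODULES = {
    "deep learning": "tensorflow",
    "gradient boosting": "catboost",
    "computer vision": "cv2",
    "geospatial": "geopandas",
}


def available_features(modules=OPTIONAL_MODULES):
    """Map each feature label to True if its module can be located (nothing is imported)."""
    return {label: find_spec(name) is not None for label, name in modules.items()}


if __name__ == "__main__":
    for label, ok in available_features().items():
        print("{:18s} {}".format(label, "installed" if ok else "missing"))
```

Because find_spec only locates a top-level module, the check stays fast even when TensorFlow is installed.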

Quickstart

from data_science_toolkit.dataframe import DataFrame
from data_science_toolkit.model import Model

# Load a toy dataset
data = DataFrame()
data.load_dataset('iris')
y = data.get_column('target')
data.drop_column('target')

# Fit a decision tree
model = Model(data_x=data.get_dataframe(), data_y=y, model_type='dt', training_percent=0.8)
model.train()
model.report()          # classification metrics
model.cross_validation(5)
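
For readers new to the terms, training_percent=0.8 and cross_validation(5) correspond to a hold-out split and 5-fold cross-validation. The sketch below shows the index bookkeeping behind both ideas in plain Python. It is not DST's implementation: the function names are hypothetical, and whether DST shuffles or stratifies is not stated here.

```python
import random


def holdout_split(n_samples, training_percent=0.8, seed=0):
    """Return (train_idx, test_idx) after shuffling the indices 0..n_samples-1."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * training_percent)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, validation_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


train, test = holdout_split(150)  # the iris dataset has 150 rows
print(len(train), len(test))      # 120 30
for fold, (tr, va) in enumerate(k_fold_indices(150, k=5)):
    print(fold, len(tr), len(va))
```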

Work with Parquet (large data)

from data_science_toolkit.dataframe import DataFrame

# Stream a Parquet dataset efficiently
df = DataFrame(data_path="path/to/parquet/dir", data_type="parquet", n_workers="auto")
summary = df.describe()  # computes per-column stats without loading entire data into RAM
print(summary)
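
The idea of computing per-column statistics without holding the whole dataset in memory can be illustrated with a running accumulator that is fed one chunk at a time (Welford's method). This is a generic sketch, not DST's implementation; the RunningStats class and the in-memory chunks standing in for file reads are hypothetical.

```python
import math


class RunningStats:
    """Single-pass count, mean, standard deviation, min and max for one numeric column."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, values):
        """Fold one chunk of values into the running statistics."""
        for x in values:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)
            self.min = min(self.min, x)
            self.max = max(self.max, x)

    @property
    def std(self):
        """Sample standard deviation; NaN until at least two values have been seen."""
        if self.count < 2:
            return float("nan")
        return math.sqrt(self._m2 / (self.count - 1))


# Stand-in for chunks that would be read one at a time from disk.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
stats = RunningStats()
for chunk in chunks:
    stats.update(chunk)
print(stats.count, stats.mean, stats.min, stats.max, round(stats.std, 4))
```

Only a handful of numbers per column are kept between chunks, so memory use does not grow with the number of rows.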

Text Vectorization

from data_science_toolkit.vectorizer import Vectorizer

documents = [
    "data science is fun",
    "toolkits help data workflows",
    "science advances with good tools",
]

vec = Vectorizer(documents_as_list=documents, vectorizer_type='tfidf', ngram_tuple=(1, 2))
matrix = vec.get_matrix()
features = vec.get_features_names()
print(len(features), features[:10])
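
To make the TF-IDF and cosine similarity terms concrete, here is a small standard-library sketch of both on the same three documents. It is an illustration under assumptions, not DST's Vectorizer: it uses whitespace tokenization, unigrams only (no n-grams), and one common smoothed IDF variant; the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, rows) with one L2-normalized TF-IDF row per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocabulary, rows


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


documents = [
    "data science is fun",
    "toolkits help data workflows",
    "science advances with good tools",
]
vocab, rows = tfidf_vectors(documents)
print(vocab)
# The first two documents share only the term "data".
print(round(cosine_similarity(rows[0], rows[1]), 3))
```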

Geospatial Utilities

from data_science_toolkit.gis import GIS

gis = GIS()
gis.add_data_layer("parcels", "data/parcels.geojson", data_type="sf")
gis.add_area_column("parcels", unit="ha")
gis.to_crs("parcels", epsg="3857")
gis.export("parcels", "out/parcels_3857", file_format="geojson")
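
An area column in hectares ultimately comes down to a planar polygon area, computed in a projected CRS with metre units and divided by 10,000. The sketch below shows that calculation with the shoelace formula. It is not DST's implementation, and the function names are hypothetical. Note that Web Mercator (EPSG:3857) is not an equal-area projection, so areas measured in it are inflated away from the equator; an equal-area or local UTM CRS is usually a safer basis for area figures.

```python
def polygon_area_m2(ring):
    """Planar area of a simple polygon given as [(x, y), ...] in metres (shoelace formula)."""
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def m2_to_ha(area_m2):
    """Convert square metres to hectares (1 ha = 10,000 m^2)."""
    return area_m2 / 10000.0


# A 200 m x 150 m rectangular parcel in projected coordinates.
parcel = [(0.0, 0.0), (200.0, 0.0), (200.0, 150.0), (0.0, 150.0)]
print(m2_to_ha(polygon_area_m2(parcel)))  # 3.0
```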

Computer Vision Helpers

from data_science_toolkit.imagefactory import ImageFactory

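# Load an image from disk
img = ImageFactory("data/sample.jpg")
# Convert to grayscale
img.to_gray_scale()
# Apply a Gaussian blur; (5, 5) is presumably the kernel size
img.gaussian_blur((5, 5))
# Write the processed image
img.save("out/processed.jpg")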
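
For intuition about what grayscale conversion and Gaussian blurring do to pixel values, the following standard-library sketch applies both to a tiny 2x2 image. It is not how ImageFactory works internally: the sigma value, the clamped border handling, and all function names are assumptions made for illustration, and the kernel size is assumed to be odd.

```python
import math


def to_gray(pixel):
    """Luma of an (R, G, B) pixel using the ITU-R BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def gaussian_kernel_2d(size=5, sigma=1.0):
    """size x size Gaussian weights normalized to sum to 1 (sigma is an assumed value)."""
    half = size // 2
    one_d = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(one_d) ** 2
    # A 2D Gaussian is separable: the outer product of the 1D kernel with itself.
    return [[a * b / total for b in one_d] for a in one_d]


def blur_pixel(gray, row, col, kernel):
    """Weighted average around (row, col); out-of-range neighbours are clamped to the edge."""
    half = len(kernel) // 2
    height, width = len(gray), len(gray[0])
    acc = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r = min(max(row + dr, 0), height - 1)
            c = min(max(col + dc, 0), width - 1)
            acc += gray[r][c] * kernel[dr + half][dc + half]
    return acc


rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray = [[to_gray(p) for p in row] for row in rgb]
kernel = gaussian_kernel_2d(5, sigma=1.0)
blurred = [[blur_pixel(gray, r, c, kernel) for c in range(2)] for r in range(2)]
print(blurred)
```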

Documentation

Full API docs and tutorials live at: https://data-science-toolkit.readthedocs.io

Contributing

Contributions and suggestions are welcome via GitHub pull requests.

Typical workflow:

- Fork the repo and create a feature branch.
- Install dev dependencies with pip install -e . (run from the repository root).
- Add tests or notebook snippets where relevant.
- Open a PR with a clear description and examples.

Maintainership

We’re actively enhancing the repo with new algorithms and utilities. Feedback on priorities is appreciated.

License

MIT License. See the LICENSE file for details.

Citation

If you use DST in academic work, please cite the repository and (optionally) reference the Code Ocean capsule for reproducibility: https://codeocean.com/capsule/1309232/tree

Release history

- 0.1.67 (this version): Feb 3, 2026
- 0.1.65: Jun 25, 2024
- 0.1.64: Jan 15, 2024
- 0.1.63: Jan 15, 2024
- 0.1.6: Jan 15, 2024
- 0.1.4: Jun 20, 2023
- 0.1.3: Jun 16, 2023
- 0.1.2: Jun 14, 2023
- 0.1.0: May 13, 2023
- 0.0.999: May 13, 2023
- 0.0.998: May 12, 2023
- 0.0.997: Mar 15, 2023
- 0.0.994: Nov 21, 2022
- 0.0.993: Nov 21, 2022
- 0.0.992: Nov 21, 2022
- 0.0.991: Oct 26, 2022
- 0.0.990: Oct 24, 2022
- 0.0.989: Oct 12, 2022
- 0.0.988: Oct 6, 2022
- 0.0.987: Oct 5, 2022
- 0.0.986: Oct 5, 2022
- 0.0.985: Oct 5, 2022
- 0.0.984: Jul 20, 2022
- 0.0.983: Jun 8, 2022
- 0.0.982: May 20, 2022
- 0.0.981: May 20, 2022
- 0.0.980: May 20, 2022
- 0.0.979: May 20, 2022
- 0.0.978: May 19, 2022
- 0.0.977: Apr 25, 2022
- 0.0.976: Apr 14, 2022
- 0.0.975: Apr 14, 2022
- 0.0.974: Apr 14, 2022
- 0.0.973: Apr 14, 2022
- 0.0.971: Apr 14, 2022
- 0.0.95: Apr 14, 2022
- 0.0.94: Apr 3, 2022
- 0.0.93: Mar 27, 2022
- 0.0.91: Mar 26, 2022
- 0.0.88: Mar 26, 2022
- 0.0.86: Mar 25, 2022

Download files

Source Distribution

data_science_toolkit-0.1.67.tar.gz (212.4 kB), uploaded Feb 3, 2026, source

Built Distribution

data_science_toolkit-0.1.67-py3-none-any.whl (217.0 kB), uploaded Feb 3, 2026, Python 3

File details

Details for the file data_science_toolkit-0.1.67.tar.gz.

File metadata

Download URL: data_science_toolkit-0.1.67.tar.gz
Upload date: Feb 3, 2026
Size: 212.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for data_science_toolkit-0.1.67.tar.gz
Algorithm	Hash digest
SHA256	`0376a08bf100ca2b03ee5fd9ba8bbcdb8a42dbe11d16053a4c8a36615be3561d`
MD5	`37a60591eb1b29d6353e442816f8c3ed`
BLAKE2b-256	`7a1c05e50779522b5ca686cb12d8deb24aab61e03f2986b1a98fbb02df9f5f06`

File details

Details for the file data_science_toolkit-0.1.67-py3-none-any.whl.

File metadata

Download URL: data_science_toolkit-0.1.67-py3-none-any.whl
Upload date: Feb 3, 2026
Size: 217.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for data_science_toolkit-0.1.67-py3-none-any.whl
Algorithm	Hash digest
SHA256	`714371e90c9aaaed9fc5f6339e1ca056e515c2bc902dcc4565be35ae0e1d8a62`
MD5	`a5380314428f76840385c4fe5d453914`
BLAKE2b-256	`efa2eaeff787a5c8fed72c68b13053837cee1039afb762b54c1bf1913467b78e`
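
The SHA256 digests listed above can be checked against a downloaded file with Python's standard hashlib module. The sketch below is one way to do it; the helper names are hypothetical, and the file is assumed to sit in the current directory under its original name.

```python
import hashlib

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "data_science_toolkit-0.1.67.tar.gz":
        "0376a08bf100ca2b03ee5fd9ba8bbcdb8a42dbe11d16053a4c8a36615be3561d",
    "data_science_toolkit-0.1.67-py3-none-any.whl":
        "714371e90c9aaaed9fc5f6339e1ca056e515c2bc902dcc4565be35ae0e1d8a62",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()
```

A typical call would be verify(name, EXPECTED_SHA256[name]) with name set to one of the two file names; a False result means the file differs from the published one and should not be installed.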
