DSTools: Data Science Tools Library
DSTools is a Python library designed to assist data scientists and researchers by providing a collection of helpful functions for various stages of a data science project, from data exploration and preprocessing to model evaluation and synthetic data generation.
Features
- Data Exploration: Quickly get statistics for numerical and categorical features (describe_numeric, describe_categorical), check for missing values (check_NINF), and visualize correlations (corr_matrix).
- Model Evaluation: Comprehensive classification model evaluation (evaluate_classification, compute_metrics) with clear visualizations (plot_confusion_matrix).
- Data Preprocessing: Encode categorical variables (labeling), handle outliers (remove_outliers_iqr), and scale features (min_max_scale).
- Time Series Analysis: Test for stationarity using the Dickey-Fuller test (test_stationarity).
- Synthetic Data Generation: Create complex numerical distributions matching specific statistical moments (generate_distribution, generate_distribution_from_metrics).
- Advanced Statistics: Calculate non-parametric correlation (chatterjee_correlation), entropy, and KL-divergence.
- Utilities: Save/load DataFrames to/from ZIP archives, generate random alphanumeric codes, and more.
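To make the preprocessing bullet concrete, here is a minimal, standard-library sketch of the two techniques it names: IQR-based outlier removal and min-max scaling. The function names, the `k=1.5` multiplier, and the choice of quartile method are assumptions made for illustration; this is not the library's implementation, and `remove_outliers_iqr` and `min_max_scale` may behave differently.

```python
import statistics


def remove_outliers_iqr_sketch(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    # "inclusive" uses linear interpolation between the sample min and max.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max_scale_sketch(values):
    """Rescale values linearly to the range [0, 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


raw = [1, 2, 3, 4, 5, 100]
cleaned = remove_outliers_iqr_sketch(raw)
scaled = min_max_scale_sketch(cleaned)
print(cleaned)
print(scaled)
```

Removing outliers before scaling matters here: with the value 100 left in, min-max scaling would squeeze the other five points into the bottom 4% of the range.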
Installation
Clone the Repository
git clone https://github.com/s-kav/ds_tools.git
Navigate to the Project Directory
cd ds_tools
Install Dependencies
Ensure you have Python 3.8 or higher, then install the required packages:
pip install -r requirements.txt
Usage
Here's a simple example of how to use the library to evaluate a classification model.
import numpy as np
from ds_tools import DSTools
# 1. Initialize the toolkit
tools = DSTools()
# 2. Generate some dummy data
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_probs = np.array([0.1, 0.8, 0.6, 0.3, 0.9, 0.2, 0.4, 0.7])
# 3. Get a comprehensive evaluation report
# This will print metrics and show plots for ROC and Precision-Recall curves.
results = tools.evaluate_classification(true_labels=y_true, pred_probs=y_probs)
# The results are also returned as a dictionary
print(f"\nROC AUC Score: {results['roc_auc']:.4f}")
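For readers who want to sanity-check the reported score, the sketch below computes ROC AUC from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The helper name `roc_auc_pairwise` is hypothetical, and this is not necessarily how `evaluate_classification` computes the value internally.

```python
def roc_auc_pairwise(y_true, y_score):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # Count wins over all positive/negative pairs; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Same dummy data as in the usage example above.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_probs = [0.1, 0.8, 0.6, 0.3, 0.9, 0.2, 0.4, 0.7]
auc = roc_auc_pairwise(y_true, y_probs)
print(f"Pairwise ROC AUC: {auc:.4f}")
```

In this dummy dataset every positive example scores higher than every negative one, so the pairwise AUC is exactly 1.0. The quadratic loop is fine for small checks; rank-based formulas are preferable for large datasets.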
The full code for testing the other functions is available in the GitHub repository (https://github.com/s-kav/ds_tools).
Function Overview
The library provides a wide range of functions. To see a full, formatted list of available tools, you can use the function_list method:
from ds_tools import DSTools
tools = DSTools()
tools.function_list()
Example
Generating a Synthetic Distribution: Need to create a dataset with specific statistical properties? generate_distribution_from_metrics can do that.
import numpy as np
from ds_tools import DSTools, DistributionConfig
tools = DSTools()
# Define the desired metrics
metrics_config = DistributionConfig(
    mean=1042,
    median=330,
    std=1500,
    min_val=1,
    max_val=120000,
    skewness=13.2,
    kurtosis=245,  # Excess kurtosis
    n=10000
)
# Generate the data
generated_data = tools.generate_distribution_from_metrics(n=10000, metrics=metrics_config)
# Compare the generated sample with the requested metrics
print(f"Generated Mean: {np.mean(generated_data):.2f}")
print(f"Generated Std: {np.std(generated_data):.2f}")
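The example above checks only the mean and standard deviation. To compare the higher moments as well, a small helper like the one below can compute skewness and excess kurtosis for any sequence of numbers. It is a standard-library sketch with a hypothetical name (`sample_moments`), not part of DSTools; it uses population (biased) moment estimators, which matches the default behavior of `np.std`.

```python
import math


def sample_moments(data):
    """Return mean, std, skewness and excess kurtosis (population estimators)."""
    n = len(data)
    mean = sum(data) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        # Subtracting 3 gives excess kurtosis (0 for a normal distribution).
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


moments = sample_moments([1, 2, 3, 4, 5])
for name, value in moments.items():
    print(f"{name}: {value:.4f}")
```

Passing the generated data to this helper (for example `sample_moments(list(generated_data))`, assuming the result is a one-dimensional sequence of numbers) gives values that can be set side by side with the skewness and kurtosis requested in the configuration.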
Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue on the GitHub repository.
To contribute:
1. Fork the repository.
2. Create a new branch for your feature or bugfix.
3. Commit your changes with clear messages.
4. Push to your fork and submit a pull request.
Please ensure your code adheres to PEP 8 standards and includes appropriate docstrings and comments.
References
To cite this project, use:
Sergii Kavun. (2025). s-kav/ds_tools: Version 0.9.1 (v0.9.1). Zenodo. https://doi.org/10.5281/zenodo.15864146
License
This project is licensed under the MIT License - see the LICENSE file for details.