Helper tools for the data science workflow
datamallet
datamallet is a collection of helper functions and modules built by data scientists, for data scientists, to
speed up the data science workflow.
From a technical standpoint, datamallet is built on top of the following libraries:
- Scikit-learn (for creating the transformer classes).
- Plotly (for the automatic visualization function).
- Pandas (for creating scikit-learn compatible transformers, and for creating utility functions for wrangling data).
- NumPy.
- SciPy.
The goal of this project is to help data scientists become more efficient in their roles by
providing commonly used functionality that has been battle tested and contributed by
other data scientists.
Installation
datamallet is available on PyPI and can be installed with pip using the command below:
pip install datamallet
Tests
From the main directory, you can run the tests with the pytest command:
pytest
Quick start
Automatic Data Visualization
import plotly.express as px
from datamallet.visualization import AutoPlot
# Load the sample tips dataset bundled with Plotly
tips = px.data.tips()
autoplot = AutoPlot(df=tips, include_scatter=True, include_pie=True, include_box=True,
                    include_sunburst=True, include_violin=True, include_treemap=True,
                    include_histogram=True, include_correlation=True,
                    create_html=True, filename='autoplot')
# show() builds the charts and returns them as a list of Plotly graph objects
list_of_charts = autoplot.show()
An HTML file named autoplot.html is created in the current directory (a sample is also
included in this repo).
The show method also returns a list of the Plotly graph objects, so you have the option of skipping the HTML file and
using the list of graph objects to display the charts:
for chart in list_of_charts:
    chart.show()
Modules
datamallet currently has the following modules
Visualization module, which contains helper functions for automatic visualization and for creating different types of charts such as:
- Scatter plots.
- Correlation plots.
- Histograms.
- Box plots.
- Violin plots.
- Treemaps.
- Sunburst charts.
- Pie charts.
- Density contour charts.
- Density heatmaps.
All these charts can be created automatically using the AutoPlot class in the visualization module;
they can also be created using individual functions in the plot module.
Tabular module, which contains scikit-learn compatible transformers for manipulating tabular data, that is, data held in a table (a pandas DataFrame), whether purely tabular or time series. The classes in the tabular module can be used in a scikit-learn pipeline.
The Tabular module contains the following submodules:
- features, which contains scikit-learn compatible transformer classes for creating new features (more classes are welcome).
- timeseries, which contains transformers for manipulating time series data.
- utils, which contains helper functions for data wrangling and carrying out checks.
- preprocess, which contains transformers for preprocessing data.
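For readers unfamiliar with what "scikit-learn compatible" means in practice, the sketch below shows the general pattern: a class with `fit` and `transform` methods that accepts and returns a pandas DataFrame, so it can be placed in a `Pipeline`. The `ColumnRatio` class is hypothetical and is not one of datamallet's transformers; it only illustrates the convention.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnRatio(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: adds numerator / denominator as a new column."""

    def __init__(self, numerator, denominator, new_name="ratio"):
        # Store constructor arguments unchanged, as scikit-learn expects.
        self.numerator = numerator
        self.denominator = denominator
        self.new_name = new_name

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        X[self.new_name] = X[self.numerator] / X[self.denominator]
        return X


pipeline = Pipeline([
    ("tip_rate", ColumnRatio("tip", "total_bill", new_name="tip_rate")),
])
frame = pd.DataFrame({"total_bill": [10.0, 20.0], "tip": [1.0, 5.0]})
result = pipeline.fit_transform(frame)
```

Here `result` is a new DataFrame with an added tip_rate column, and `frame` is left untouched. Further transformers or a final estimator could be appended to the same pipeline.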
File details
Details for the file datamallet-0.22.0.tar.gz.
File metadata
- Download URL: datamallet-0.22.0.tar.gz
- Upload date:
- Size: 23.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d64a5aa170cd5f6cbc52d0ab11fb7231f213b6f814a4cadc56dfb4284c282bda |
| MD5 | aea86ba02c1d2dbe2584490e94d4e901 |
| BLAKE2b-256 | c971e9642a2d54eebd7d49aa192a5d8d782c996aa8102a6298f48e62e6b6e9ba |
File details
Details for the file datamallet-0.22.0-py3-none-any.whl.
File metadata
- Download URL: datamallet-0.22.0-py3-none-any.whl
- Upload date:
- Size: 28.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ab9e8dcd81baed4e6c500aef895d38ea464a8760d332df04707f2338e5168fd |
| MD5 | bf22b34ccb37276ae60bc0c8f1d09c43 |
| BLAKE2b-256 | 4ab1173477ea5eab25c62dd46488bc31dd222ca17f7123363bd02b066f8732bf |