
Helper tools for the data science workflow

Project description

datamallet


Datamallet is a collection of helper functions and modules built by data scientists for data scientists to speed up the data science workflow.
From a technical standpoint, datamallet is built on top of the following libraries:

  1. Scikit-learn (for creating the transformer classes).
  2. Plotly (for the automatic visualization function).
  3. Pandas (for creating scikit-learn compatible transformers and utility functions for wrangling data).
  4. NumPy.
  5. SciPy.

The goal of this project is to help data scientists work more efficiently by providing commonly used functionality that has been battle-tested and contributed by other data scientists.

Installation

datamallet is available on PyPI and can be installed with pip:

pip install datamallet

Tests

From the project's root directory, run the test suite with the pytest command:

pytest

Quick start

Automatic Data Visualization

import plotly.express as px
from datamallet.visualization import AutoPlot

# Plotly's built-in restaurant tips dataset
tips = px.data.tips()

autoplot = AutoPlot(
    df=tips,
    include_scatter=True,
    include_pie=True,
    include_box=True,
    include_sunburst=True,
    include_violin=True,
    include_treemap=True,
    include_histogram=True,
    include_correlation=True,
    create_html=True,
    filename='autoplot',
)

list_of_charts = autoplot.show()

This creates an HTML file named autoplot.html in the current directory (a sample is also included in this repo). The show method also returns a list of Plotly graph objects, so instead of relying on the HTML file you can display the charts directly from that list:

for chart in list_of_charts:
    chart.show()

Modules

datamallet currently has the following modules:

  1. Visualization module, which contains helper functions for automatic visualization and for creating the following chart types:
    - Scatter plots.
    - Correlation plots.
    - Histograms.
    - Box plots.
    - Violin plots.
    - Treemaps.
    - Sunburst charts.
    - Pie charts.
    - Density contour charts.
    - Density heatmaps.

All of these charts can be created automatically with the AutoPlot class in the visualization module, or individually with the functions in the plot module.

  2. Tabular module, which contains scikit-learn compatible transformers for manipulating tabular data (data that fits in a table such as a pandas DataFrame, whether purely tabular or time series). The classes in the tabular module can be used in a scikit-learn pipeline.
    The tabular module contains the following submodules:
    - features: scikit-learn compatible transformer classes for creating new features (contributions of more classes are welcome).
    - timeseries: transformers for manipulating time series data.
    - utils: helper functions for data wrangling and for carrying out checks.
    - preprocess: transformers for preprocessing data.
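Being scikit-learn compatible generally means a class implements fit and transform (usually by subclassing BaseEstimator and TransformerMixin) so it can be chained in a Pipeline. The sketch below illustrates that pattern with two hypothetical classes, RatioFeature (in the spirit of the features submodule) and DatePartExtractor (in the spirit of the timeseries submodule). Neither class is part of datamallet; their names and parameters are assumptions made for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class RatioFeature(BaseEstimator, TransformerMixin):
    """Hypothetical: add numerator / denominator as a new column."""

    def __init__(self, numerator, denominator, new_column):
        self.numerator = numerator
        self.denominator = denominator
        self.new_column = new_column

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn, but fit must return self.
        return self

    def transform(self, X):
        X = X.copy()  # do not mutate the caller's DataFrame
        X[self.new_column] = X[self.numerator] / X[self.denominator]
        return X


class DatePartExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical: replace a date column with year, month and day-of-week columns."""

    def __init__(self, column):
        self.column = column

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        dates = pd.to_datetime(X[self.column])
        X[f"{self.column}_year"] = dates.dt.year
        X[f"{self.column}_month"] = dates.dt.month
        X[f"{self.column}_dayofweek"] = dates.dt.dayofweek  # Monday=0
        return X.drop(columns=[self.column])


df = pd.DataFrame(
    {
        "date": ["2021-01-04", "2021-02-14", "2021-03-31"],
        "tip": [1.0, 2.5, 3.0],
        "total_bill": [10.0, 20.0, 15.0],
    }
)

pipeline = Pipeline(
    [
        ("tip_pct", RatioFeature("tip", "total_bill", "tip_pct")),
        ("dates", DatePartExtractor("date")),
    ]
)
result = pipeline.fit_transform(df)
```

Because both classes accept and return DataFrames, column names survive every step, which keeps pipeline output easy to inspect.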



Download files


Source Distribution

datamallet-0.22.0.tar.gz (23.0 kB)

Uploaded Source

Built Distribution

datamallet-0.22.0-py3-none-any.whl (28.0 kB)

Uploaded Python 3

File details

Details for the file datamallet-0.22.0.tar.gz.

File metadata

  • Download URL: datamallet-0.22.0.tar.gz
  • Upload date:
  • Size: 23.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.6

File hashes

Hashes for datamallet-0.22.0.tar.gz
Algorithm Hash digest
SHA256 d64a5aa170cd5f6cbc52d0ab11fb7231f213b6f814a4cadc56dfb4284c282bda
MD5 aea86ba02c1d2dbe2584490e94d4e901
BLAKE2b-256 c971e9642a2d54eebd7d49aa192a5d8d782c996aa8102a6298f48e62e6b6e9ba


File details

Details for the file datamallet-0.22.0-py3-none-any.whl.

File metadata

  • Download URL: datamallet-0.22.0-py3-none-any.whl
  • Upload date:
  • Size: 28.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.6

File hashes

Hashes for datamallet-0.22.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2ab9e8dcd81baed4e6c500aef895d38ea464a8760d332df04707f2338e5168fd
MD5 bf22b34ccb37276ae60bc0c8f1d09c43
BLAKE2b-256 4ab1173477ea5eab25c62dd46488bc31dd222ca17f7123363bd02b066f8732bf

