A toolbox for data science

Project description

Introduction

This repo contains a packaged toolbox of frequently used data science code snippets and functions. The classes are intended to be compatible with the sklearn library.

See https://dfds-ds-toolbox.readthedocs.io for the user guide.

Already implemented:

  • Model selector for regression and classification problems
  • Profiling tool for generating stats files of the execution time of a function
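To illustrate what "generating stats files of the execution time of a function" can look like, here is a minimal sketch built only on the standard library modules cProfile and pstats. It is not the toolbox's own API: the helper name profile_to_file and the toy function slow_sum are hypothetical and exist only for this example. See the user guide for the actual profiling tool.

```python
import cProfile
import pstats
import tempfile
from pathlib import Path
from typing import Any, Callable


def profile_to_file(
    func: Callable[..., Any], stats_path: Path, *args: Any, **kwargs: Any
) -> Any:
    """Run ``func`` under cProfile and dump the timing stats to ``stats_path``.

    Hypothetical helper for illustration; not part of dfds_ds_toolbox.
    """
    profiler = cProfile.Profile()
    # runcall executes the function with profiling enabled and returns its result
    result = profiler.runcall(func, *args, **kwargs)
    profiler.dump_stats(str(stats_path))
    return result


def slow_sum(n: int) -> int:
    """Toy workload to profile."""
    return sum(i * i for i in range(n))


stats_file = Path(tempfile.mkdtemp()) / "slow_sum.stats"
total = profile_to_file(slow_sum, stats_file, 10_000)

# Load the stats file again and print the five most expensive calls
pstats.Stats(str(stats_file)).sort_stats("cumulative").print_stats(5)
```

The stats file can also be opened later with any tool that reads the pstats format, which is useful when the profiled run happens on another machine.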

To be implemented in the future:

  • Preprocessing

    • Imbalanced datasets
    • Outlier detection & handling
    • Missing value imputation
  • Feature generation

    • Binning
    • Type variables, create multiple features
    • Timestamp, seasonality variables
    • Object: onehot, grouping, etc.
  • Performance analysis (plots, summary, error analysis)

More ideas might arise in the future and should be added to the list.

A guide on how to install the package and some working examples of how to use the classes can be found in later sections.

Getting Started

Install locally

We use poetry as the package manager and build tool. Make sure you have poetry installed locally, then run

poetry install

Run the tests to check that everything works

poetry run pytest

Install this library in another repo

Make sure your virtual environment is activated, then upgrade pip

python -m pip install --upgrade pip

To install a specific version of the dfds_ds_toolbox package, for example 0.8.0, run

pip install dfds_ds_toolbox==0.8.0
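After installing, it can be handy to confirm which version ended up in the active environment. The sketch below uses only the standard library module importlib.metadata. The helper name installed_version is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of ``package``, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("dfds_ds_toolbox")
if found is None:
    print("dfds_ds_toolbox is not installed in this environment")
else:
    print(f"dfds_ds_toolbox {found} is installed")
```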

Versions

See the changelog on GitHub.

Contribute

We want this library to be useful across many data science projects. If you have some standard utilities that you keep using in your projects, please add them here and make a PR.

Releasing a new version

When you want to release a new version of this library to PyPI, there are a few steps you must follow.

  1. Update the version in pyproject.toml. We follow Semantic Versioning, so consider whether your release contains any breaking changes when you increment the version.
  2. Draft a new release on GitHub. You can follow this link or click the "Draft a new release" button on the "releases" page.
    1. Here you must add a tag in the form "v" followed by the version number, for example "v0.9.2". The title should be the same as the tag.
    2. Add release notes. The easiest way is to use the "Auto-generate release notes" button, which pulls in the titles of completed pull requests. Modify as needed.
  3. Click "Publish release". That starts a GitHub Action that builds the package and uploads it to PyPI. It also builds the documentation website.
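The version increment and tag naming in the steps above can be sketched as follows. This is an illustration under simple assumptions (plain MAJOR.MINOR.PATCH versions without pre-release suffixes), not a script used by this repo. The function names bump_version and release_tag are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_version(version: str, part: str) -> str:
    """Increment one part of the version and reset the parts to its right."""
    major, minor, patch = parse_version(version)
    if part == "major":  # breaking changes
        return f"{major + 1}.0.0"
    if part == "minor":  # new, backwards-compatible functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backwards-compatible bug fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


def release_tag(version: str) -> str:
    """Build the release tag, i.e. the version prefixed with 'v'."""
    parse_version(version)  # raises ValueError if the version is malformed
    return f"v{version}"


new_version = bump_version("0.9.1", "patch")
print(new_version, release_tag(new_version))  # prints: 0.9.2 v0.9.2
```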

Documentation

Website

The full documentation of this package is available at https://dfds-ds-toolbox.readthedocs.io

To build the documentation locally run:

pip install -r docs/requirements.txt
cd docs/
sphinx-apidoc -o . ../dfds_ds_toolbox/ ../*tests*
make html

Now, you can open the documentation site in docs/_build/index.html.

Style

We use Google's Python style guide conventions for docstrings. This allows us to generate an up-to-date documentation website for the package.

In short, every function should have a short one-line description, optionally a longer description afterwards and a list of parameters. For example

from typing import Optional


def example_function(some_parameter: str, optional_param: Optional[int] = None) -> bool:
    """This function does something super smart.

    Here I will dive into more detail about the smart things.
    I can use several lines for that.

    Args:
        some_parameter: Name of whatever.
        optional_param: Number of widgets or something. Only included when all the stars align.

    Returns:
        An indicator describing if something is true.
    """

There are many other style issues that we can run into, but if you follow the Google style guide, you will probably be fine.

Examples

To show the intended use and outcome of some of the included methods, we have included a gallery of plots in examples/. To make a new example, create a new file and name it something like plot_<whatever>.py. Start this file with a docstring, for example

"""
Univariate plots
================

For a list of features, separate each feature into bins and analyse the target distribution in both train and test.
"""

and after this add the Python code needed to create the example plot.
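As a rough aid when writing a new example file, the header convention shown above (a module docstring with a title line underlined by "=" characters) can be checked with the standard library module ast. This is only a sketch: the function name example_title is hypothetical, and the exact rules enforced by the documentation build may differ.

```python
import ast
from typing import Optional


def example_title(source: str) -> Optional[str]:
    """Return the title of an example file, or None if the header is malformed.

    Assumes a module docstring whose first line is the title and whose
    second line underlines it with '=' characters.
    """
    docstring = ast.get_docstring(ast.parse(source))
    if docstring is None:
        return None
    lines = docstring.splitlines()
    if len(lines) < 2:
        return None
    title, underline = lines[0].strip(), lines[1].strip()
    # The underline must consist only of '=' and be at least as long as the title
    if not title or set(underline) != {"="} or len(underline) < len(title):
        return None
    return title


source = (
    '"""\n'
    "Univariate plots\n"
    "================\n"
    "\n"
    "Short description of the example.\n"
    '"""\n'
    'print("plotting code goes here")\n'
)
print(example_title(source))
```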
