
A data science library for data science and data analysis teams


Dataramp


Dataramp is a Python library that streamlines data science and data analysis workflows. It bundles utility functions for tasks that data science teams repeat across projects: project setup, model persistence, data exploration, feature engineering, model evaluation, and scaling.

Key Features

1. Project Management

  • Simplify project setup with a single function call to generate a standardized project directory structure.
  • Organize datasets, model outputs, scripts, notebooks, and more in predefined folders for better project management.

2. Model Saving and Loading

  • Save trained machine learning models to disk and load them back with a single call.
  • Supports joblib, pickle, and Keras formats, covering a range of model types.
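The page does not document Dataramp's save and load functions, so here is only a minimal sketch of the pickle round trip such helpers typically wrap. save_model and load_model are hypothetical names, and a plain dict stands in for a trained estimator. joblib.dump and joblib.load follow the same pattern and are often preferred for models holding large NumPy arrays; Keras models are normally saved with their own model.save method instead.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize a model object to disk with pickle (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create outputs/models/
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    """Load a pickled model object from disk (hypothetical helper)."""
    # Only unpickle files you trust: unpickling can execute arbitrary code.
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # A plain dict stands in for a trained model in this sketch.
    model = {"weights": [0.5, -1.2], "bias": 0.1}
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_model(model, Path(tmp) / "outputs" / "models" / "model.pkl")
        print(load_model(saved) == model)  # True
```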

3. Data Exploration and Visualization

  • Explore datasets and generate summary statistics with ease.
  • Visualize feature distributions and missing data patterns to gain insights into your data.
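To illustrate the kind of summary statistics meant here, the following minimal sketch summarizes each numeric column of a small table held as a dict of lists, using only the standard library. The summarize function and the dict-of-lists layout are assumptions for this example, not Dataramp's API; with pandas, DataFrame.describe() produces a comparable table.

```python
import statistics


def summarize(columns):
    """Return count, mean, std, min and max for each numeric column.

    columns maps column names to lists of values; None marks a missing
    value and is excluded from the statistics (assumed layout).
    """
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present or not all(isinstance(v, (int, float)) for v in present):
            continue  # skip empty or non-numeric columns
        summary[name] = {
            "count": len(present),
            "mean": statistics.mean(present),
            # Sample standard deviation (n - 1), the convention describe() uses.
            "std": statistics.stdev(present) if len(present) > 1 else float("nan"),
            "min": min(present),
            "max": max(present),
        }
    return summary


if __name__ == "__main__":
    # Toy values in the shape of the iris dataset.
    toy = {
        "sepal_length": [5.1, 4.9, None, 4.6],
        "species": ["setosa", "setosa", "setosa", "setosa"],
    }
    print(summarize(toy))
```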

4. Feature Engineering

  • Handle missing data and outliers.
  • Drop columns whose share of missing values exceeds a user-defined threshold, and detect outliers with Tukey's interquartile range (IQR) method.
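Both techniques can be sketched in plain Python. The helper names, the default threshold of 0.5 and the None-as-missing convention are assumptions for illustration, not Dataramp's actual signatures. Tukey's method flags values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where IQR = Q3 - Q1.

```python
import statistics


def drop_missing_columns(columns, threshold=0.5):
    """Keep only columns whose fraction of None values is <= threshold."""
    kept = {}
    for name, values in columns.items():
        missing_frac = sum(v is None for v in values) / len(values) if values else 0.0
        if missing_frac <= threshold:
            kept[name] = values
    return kept


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    # "inclusive" interpolates linearly between sorted data points.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < lower or v > upper]


if __name__ == "__main__":
    columns = {"age": [34, None, None, 51], "income": [42_000, 39_500, None, 41_200]}
    print(list(drop_missing_columns(columns, threshold=0.4)))  # ['income']
    print(iqr_outliers([10, 12, 11, 13, 12, 95]))  # [95]
```

The inclusive quartile method uses the same linear interpolation as NumPy's default percentile, so the fences should agree with a NumPy-based implementation.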

5. Model Evaluation and Cross-Validation

  • Evaluate model performance with metrics such as accuracy, F1-score, precision, and recall.
  • Generate classification reports and run cross-validation for more reliable performance estimates.
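For reference, these metrics reduce to counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them for binary labels and generates unshuffled k-fold splits. binary_metrics and k_fold_indices are illustrative names, not Dataramp functions; scikit-learn provides tested equivalents in sklearn.metrics (accuracy_score, precision_score, recall_score, f1_score, classification_report) and sklearn.model_selection.KFold.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold cross-validation."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    scores = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    print(scores["accuracy"])  # 0.6
    for train, test in k_fold_indices(10, k=3):
        print(len(train), len(test))
```

As in scikit-learn's KFold, folds are contiguous blocks of rows, so shuffle the data first if it is ordered by class.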

6. Scaling and Normalization

  • Scale and normalize data using min-max scaling and z-score normalization techniques.
  • Bring features to a common scale for improved model performance.
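The two transforms are x' = (x - min) / (max - min) for min-max scaling and z = (x - mean) / std for z-score normalization. Here is a minimal standard-library sketch, with min_max_scale and z_score as illustrative names rather than Dataramp's functions. It uses the population standard deviation, as scikit-learn's StandardScaler does, and maps constant features to zeros instead of dividing by zero; that fallback is a design choice of this sketch.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: avoid division by zero
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(min_max_scale([2, 4, 6]))  # [0.0, 0.5, 1.0]
    print(z_score([2, 4, 6]))
```

In a modelling pipeline, compute min, max, mean and standard deviation on the training data only and reuse those values to transform validation and test data.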

By bundling these utilities, Dataramp aims to cut repetitive setup and preprocessing work so that teams can focus on deriving meaningful insights from their data.

Quickstart

To get started with Dataramp in your data science projects, install it via pip:

pip install dataramp 

To upgrade an existing installation of Dataramp, use:

pip install --upgrade dataramp

Getting Started

Once installed, you can import the library and explore its functionality:

import dataramp as dr

Creating a New Project

To create a new project using Dataramp, run:

dr.core.create_project("project-name")

This creates a project-name/ directory with the standardized layout shown below.

Project Directory Structure

project-name/
├── datasets
│   └── dataset.csv
├── outputs
│   └── models
├── README.md
└── src
    ├── notebooks
    │   └── notebook.ipynb
    └── scripts
        ├── ingest
        └── tests
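The page does not show how create_project builds this tree. As a hypothetical sketch, assuming the tree above is the complete layout, the same skeleton can be produced with pathlib. make_project_skeleton, DIRS and FILES are illustrative names, not part of Dataramp, and the placeholder files are created empty.

```python
import tempfile
from pathlib import Path

# Layout taken from the directory tree above (assumed to be complete).
DIRS = [
    "datasets",
    "outputs/models",
    "src/notebooks",
    "src/scripts/ingest",
    "src/scripts/tests",
]
# Empty placeholder files; a real notebook would need valid .ipynb JSON.
FILES = ["README.md", "datasets/dataset.csv", "src/notebooks/notebook.ipynb"]


def make_project_skeleton(root):
    """Create the project tree under root (hypothetical helper).

    Existing directories are reused and existing files keep their contents.
    """
    root = Path(root)
    for rel_dir in DIRS:
        (root / rel_dir).mkdir(parents=True, exist_ok=True)
    for rel_file in FILES:
        (root / rel_file).touch(exist_ok=True)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = make_project_skeleton(Path(tmp) / "project-name")
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```

Because every step uses exist_ok=True, re-running the function on an existing project only fills in missing folders and files without overwriting work.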

Sample Usage

import dataramp as dr  # import the dataramp library
import pandas as pd

# Helpers exported by dataramp.utils (only display_missing is used below)
from dataramp.utils import (
    describe_df,
    get_cat_vars,
    feature_summary,
    display_missing,
    get_unique_counts,
)

df = pd.read_csv("data/iris.csv")  # load the iris dataset

print(df.head())  # snapshot of the first five rows

missing = display_missing(df)  # report of missing values in df
print(missing)
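The page does not say exactly what display_missing returns. Assuming it reports missing-value counts and percentages per column, the same numbers come from df.isna().sum() and df.isna().mean() * 100 in pandas. The dependency-free sketch below computes them from CSV rows; missing_report is a hypothetical name, and treating empty strings as missing is an assumption of this example.

```python
import csv
import io


def missing_report(rows):
    """Count and percentage of missing values per column.

    rows is an iterable of dicts such as csv.DictReader yields; empty
    strings and None are treated as missing (an assumption of this sketch).
    """
    rows = list(rows)
    if not rows:
        return {}
    report = {}
    for col in rows[0]:
        n_missing = sum(row.get(col) in ("", None) for row in rows)
        report[col] = {"missing": n_missing, "percent": 100 * n_missing / len(rows)}
    return report


# Toy CSV in the shape of the iris dataset, with a few blank cells.
sample = io.StringIO(
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    ",3.0,setosa\n"
    ",,setosa\n"
    "4.6,3.1,setosa\n"
)
for col, stats in missing_report(csv.DictReader(sample)).items():
    print(col, stats)
```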



Download files


Source Distribution

dataramp-1.0.1.dev184.tar.gz (13.9 kB)

Built Distribution

dataramp-1.0.1.dev184-py2.py3-none-any.whl (14.9 kB), for Python 2 and Python 3

File details

Details for the file dataramp-1.0.1.dev184.tar.gz.

File metadata

  • Download URL: dataramp-1.0.1.dev184.tar.gz
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.6

File hashes

Hashes for dataramp-1.0.1.dev184.tar.gz
Algorithm     Hash digest
SHA256        c1cebc919820b4459a6560ff0ae9cce9f119d53968e308f92c1d969ea6142c24
MD5           775d0f4c5b3adbe9b3ef2f7fcc474fc1
BLAKE2b-256   921f173af26a71551e098a663cebedfa4a8f9fd9abf63f3bfab74e0eea438b9c


File details

Details for the file dataramp-1.0.1.dev184-py2.py3-none-any.whl.


File hashes

Hashes for dataramp-1.0.1.dev184-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        4eee1e04346024d9ae1cefdac648daf54e5ac40080c0b94e0ac07cae5cbdb9c5
MD5           4e28b947955ef99a8e285b67dfc18845
BLAKE2b-256   1945cd0b2ef4c6f4884843370ecd92a5e86f882c14a2c7b32a247787e4480e64

