
Dataprep: Data Preparation in Python


Documentation | Mail List & Forum

Dataprep is a collection of functions that help you with the data preparation tasks that come before building a predictive model.

Implementation Status

Currently, you can use dataprep to:

  • Collect data from common data sources (through dataprep.data_connector)
  • Do your exploratory data analysis (through dataprep.eda)
  • ...

Installation

pip install dataprep

dataprep is in its alpha stage now, so please specify the version number manually when installing, for example pip install dataprep==0.2.0.

Examples & Usages

More detailed examples can be found in the examples folder.

Data Connector

You can download Yelp business search results into a pandas DataFrame with two lines of code, without digging deep into the Yelp documentation!

from dataprep.data_connector import Connector

dc = Connector("yelp", auth_params={"access_token":"<Your yelp access token>"})
df = dc.query("businesses", term="ramen", location="vancouver")
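The access token above is written inline as a placeholder. A minimal sketch of one way to keep it out of source code follows, assuming the token is stored in an environment variable. The variable name YELP_ACCESS_TOKEN and both helper functions are hypothetical and not part of dataprep; only the Connector and query calls are taken from the example above.

```python
import os


def yelp_auth_params(env=os.environ):
    """Build the auth_params dict for Connector from an environment mapping.

    YELP_ACCESS_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = env.get("YELP_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("Set YELP_ACCESS_TOKEN before querying Yelp")
    return {"access_token": token}


def fetch_ramen_businesses():
    """Run the same query as the example above, reading the token from the environment."""
    # Imported here so this module still loads where dataprep is not installed.
    from dataprep.data_connector import Connector

    dc = Connector("yelp", auth_params=yelp_auth_params())
    return dc.query("businesses", term="ramen", location="vancouver")
```

Calling fetch_ramen_businesses() needs network access and a valid token; yelp_auth_params can be checked on its own by passing a plain dict.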


EDA

Exploratory data analysis involves some common tasks, such as taking a quick look at the distribution of each column or understanding the correlations between columns.

The EDA module groups these tasks into functions, so you can finish each one with a single function call.

  • Want to understand the distributions for each DataFrame column? Use plot.
from dataprep.eda import plot

df = ...

plot(df)
  • Want to understand the correlation between columns? Use plot_correlation.
from dataprep.eda import plot_correlation

df = ...

plot_correlation(df)
  • Or, if you want to understand the impact of the missing values for each column, use plot_missing.
from dataprep.eda import plot_missing

df = ...

plot_missing(df)
  • You can even drill down for more information by giving plot, plot_correlation, or plot_missing a column name.
from dataprep.eda import plot_missing

df = ...

plot_missing(df, x="some_column_name")
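To make the questions behind plot_correlation and plot_missing concrete, here is a dependency-free sketch of two of the underlying quantities: the fraction of missing values in a column and the Pearson correlation between two columns. This is not how dataprep computes or renders its plots; the helper names and the toy table are hypothetical and exist only for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(column):
    """Share of entries in a column that are missing."""
    return sum(is_missing(v) for v in column) / len(column)


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not is_missing(x) and not is_missing(y)]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Toy table: column name -> list of values (made-up data).
table = {
    "price": [1.0, 2.0, None, 4.0],
    "rating": [2.0, 4.0, 6.0, 8.0],
}

missing = {name: missing_fraction(values) for name, values in table.items()}
corr = pearson(table["price"], table["rating"])
```

Here one of the four price values is missing, and rating is exactly twice price on the remaining rows, so the correlation comes out at 1 up to floating-point rounding.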

Don't forget to check out the examples folder for detailed demonstrations!

Contribution

Contributions are always welcome. If you want to contribute to dataprep, be sure to read the contribution guidelines.


