Dataprep: Data Preparation in Python
DataPrep
Documentation | Mail List & Forum
Dataprep is a collection of functions that help you accomplish the tasks that come before building a predictive model.
Implementation Status
Currently, you can use dataprep to:
- Collect data from common data sources (through dataprep.data_connector)
- Do your exploratory data analysis (through dataprep.eda)
- ...
Installation
pip install dataprep
dataprep is in its alpha stage, so please specify the version number manually.
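With pip, a version is pinned using the == specifier, for example pip install dataprep==0.2.0 (0.2.0 is the release described in the file details on this page). As a minimal sketch, the following standard-library snippet checks which version ended up installed. The helper name installed_version and the PINNED constant are illustrative assumptions and are not part of dataprep.

```python
from importlib import metadata  # standard library, Python 3.8+
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumption: 0.2.0 is the release you pinned with `pip install dataprep==0.2.0`.
PINNED = "0.2.0"

found = installed_version("dataprep")
if found is None:
    print("dataprep is not installed")
elif found != PINNED:
    print(f"dataprep {found} is installed, expected {PINNED}")
else:
    print(f"dataprep {PINNED} is installed")
```

Running this after installation gives a quick confirmation that the environment matches the version you intended to pin.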
Examples & Usages
More detailed examples can be found at the examples folder.
Data Connector
You can download Yelp business search results into a pandas DataFrame with two lines of code, without digging deep into the Yelp documentation!
from dataprep.data_connector import Connector
dc = Connector("yelp", auth_params={"access_token":"<Your yelp access token>"})
df = dc.query("businesses", term="ramen", location="vancouver")
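The example above places the access token directly in the source. One possible alternative, sketched below, reads it from an environment variable and builds the same auth_params dictionary. The helper yelp_auth_params and the variable name YELP_ACCESS_TOKEN are assumptions made for illustration; dataprep does not define them.

```python
import os


def yelp_auth_params(env_var: str = "YELP_ACCESS_TOKEN") -> dict:
    """Build the auth_params dict from an environment variable.

    Both this helper and the default variable name are hypothetical.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Yelp access token"
        )
    return {"access_token": token}


# Usage with the connector shown above (requires dataprep and a valid token):
# from dataprep.data_connector import Connector
# dc = Connector("yelp", auth_params=yelp_auth_params())
# df = dc.query("businesses", term="ramen", location="vancouver")
```

Keeping the token out of the script makes it easier to share notebooks and examples without leaking credentials.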
EDA
Common tasks during the exploratory data analysis stage include taking a quick look at the distribution of each column and understanding the correlations between columns.
The EDA module groups these tasks into functions, so that each one can be finished with a single function call.
- Want to understand the distributions for each DataFrame column? Use plot.
from dataprep.eda import plot
df = ...
plot(df)
- Want to understand the correlation between columns? Use plot_correlation.
from dataprep.eda import plot_correlation
df = ...
plot_correlation(df)
- Or, if you want to understand the impact of the missing values for each column, use plot_missing.
from dataprep.eda import plot_missing
df = ...
plot_missing(df)
- You can even drill down to get more information by giving plot, plot_correlation and plot_missing a column name.
df = ...
plot_missing(df, x="some_column_name")
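The snippets above leave df unspecified. As a minimal sketch, assuming pandas is installed, the following builds a small toy DataFrame and computes the per-column missing fraction and the pairwise correlation with plain pandas, as a rough numeric counterpart to what the plotting functions are described as showing. The column names and values are invented for illustration, and the dataprep calls are left commented out so the block runs without dataprep.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are made up for illustration.
df = pd.DataFrame(
    {
        "price": [10.0, 12.5, None, 8.0, 15.0],
        "rating": [4.0, 4.5, 3.5, None, 5.0],
        "city": ["vancouver", "vancouver", None, "burnaby", "burnaby"],
    }
)

# Fraction of missing values per column (plain pandas).
missing_fraction = df.isna().mean()

# Pearson correlation between the numeric columns (plain pandas).
correlation = df[["price", "rating"]].corr()

print(missing_fraction)
print(correlation)

# With dataprep installed, the same frame can be passed to the functions above:
# from dataprep.eda import plot, plot_correlation, plot_missing
# plot(df)
# plot_correlation(df)
# plot_missing(df, x="price")
```

A frame like this, with a few missing cells and a mix of numeric and text columns, is a convenient input for trying each function before moving to real data.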
Don't forget to check out the examples folder for detailed demonstrations!
Contribution
Contributions are always welcome. If you want to contribute to dataprep, be sure to read the contribution guidelines.
Download files
Source Distribution
Built Distribution
File details
Details for the file dataprep-0.2.0.tar.gz.
File metadata
- Download URL: dataprep-0.2.0.tar.gz
- Upload date:
- Size: 42.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.5 CPython/3.7.5 Linux/4.4.0-169-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 120e2b9b1b918716852b664e93297101b0aa333782f7d665853412f573473f41
MD5 | 1c66715e8c8278304e9a031faee7c758
BLAKE2b-256 | 2d914cc201188f3ce2afe9c6382c76b1c401e36f2689b7f6ef84d5f5a650d222
File details
Details for the file dataprep-0.2.0-py3-none-any.whl.
File metadata
- Download URL: dataprep-0.2.0-py3-none-any.whl
- Upload date:
- Size: 53.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.5 CPython/3.7.5 Linux/4.4.0-169-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3155a73ab72384d29786f76a801456b5578392cea299115ec73610c0c942a728
MD5 | 8ba3d5ac578743111eb61e44772bdefa
BLAKE2b-256 | 13bfa3c287c71bb03f3ffaa6835f28adf3f6f107fa0e9111da1252281156d3d7